Lexical Enrichment of Biomedical Ontologies

Nils Reiter (Heidelberg University, Germany) and Paul Buitelaar (National University of Ireland, Galway, Ireland)
DOI: 10.4018/978-1-60566-274-9.ch007

Abstract

This chapter is concerned with lexical enrichment of ontologies, that is, how to enrich a given ontology with lexical information derived from a semantic lexicon such as WordNet or from other lexical resources. The authors present an approach towards the integration of both types of resources, in particular for the human anatomy domain as represented by the Foundational Model of Anatomy and for the molecular biology domain as represented by an ontology of biochemical substances. The chapter describes their approach to enriching these biomedical ontologies with information derived from WordNet and Wikipedia by matching ontology class labels to entries in WordNet and Wikipedia. In the first case the authors acquire WordNet synonyms for the ontology class label, whereas in the second case they acquire multilingual translations as provided by Wikipedia. A particular point of emphasis is on selecting the appropriate interpretation of ambiguous ontology class labels through sense disambiguation, which the authors address with a simple algorithm that selects the most likely sense for an ambiguous term by the statistical significance of co-occurring words in a domain corpus. Acquired synonyms and translations are added to the ontology by use of the LingInfo model, which provides an ontology-based lexicon model for the annotation of ontology classes with (multilingual) terms and their linguistic properties.
Chapter Preview

Introduction

As information systems become more open (for example, by including web content) and more complex (for example, by dynamically integrating web services for specific tasks), data and process integration becomes an ever more pressing need, particularly in the context of biomedical information systems. A wide variety of data and processes must be integrated in a seamless way to provide the biomedical professional with fast and efficient access to the right information at the right time.

A promising approach to information integration is based on the use of ontologies that act as a formalized interlingua onto which various data sources as well as processes can be mapped. As defined by Gruber (1993), an ontology is an explicit, formal specification of a shared conceptualization of a domain of interest, where ‘formal’ implies that the ontology should be machine-readable and ‘shared’ that it is accepted by a community of stakeholders in the domain of interest. Ontologies represent the common knowledge of this community, allowing its members and associated automatic processes to easily exchange and integrate information as defined by this knowledge.

For instance, by mapping a database of patient radiology reports as well as publicly accessible scientific literature on related medical conditions onto the same ontological representation, a service can be built that provides the biomedical professional with patient-specific information on up-to-date scientific research. Scenarios like these can, however, only work if data can be mapped to ontologies on a large scale, which implies automating this process through automatic semantic annotation. As a large part of biomedical data is available only in textual form (e.g. scientific literature, diagnosis reports), such systems will also need knowledge of (multilingual) terminology in order to correctly map text data to ontologies.

This chapter is therefore concerned with the enrichment of ontologies with (multilingual) terminology. We describe an approach to enrich biomedical ontologies with WordNet (Fellbaum, 1998) synonyms for ontology class labels, as well as multilingual translations as provided by Wikipedia. A particular point of emphasis is on selecting the appropriate interpretation of ambiguous ontology class labels through sense disambiguation. Acquired synonyms and translations are added to the ontology by use of the LingInfo model, which provides an ontology-based lexicon model for the annotation of ontology classes with (multilingual) terms and their linguistic properties.

Work related to this chapter concerns word sense disambiguation, and specifically domain-specific word sense disambiguation, as a central aspect of our algorithm lies in selecting the most likely sense for ambiguous labels on ontology classes. The work presented here is based directly on Buitelaar & Sacaleanu (2001) and similar approaches (McCarthy et al., 2004a; Koeling & McCarthy, 2007). Related to this work is the assignment of domain tags to WordNet synsets (Magnini & Cavaglia, 2000), which helps in the automatic assignment of the most likely synset in a given domain, as shown in Magnini et al. (2001). An alternative to this idea is to simply extract that part of WordNet that is directly relevant to the domain of discourse (Cucchiarelli & Velardi, 1998; Navigli & Velardi, 2002).

Work more directly in line with ours, on enriching a given ontology with lexical information derived from a semantic lexicon, is presented in Pazienza and Stellato (2006). In contrast to Pazienza and Stellato (2006), the approach we present in this chapter uses a domain corpus as additional evidence for the statistical significance of a synset.
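
To make the idea of using a domain corpus as evidence for a synset more concrete, the following is a minimal sketch and not the algorithm described in the chapter, whose details are not given in this preview. It assumes a toy sense inventory standing in for WordNet, two tiny token lists standing in for a domain corpus and a general reference corpus, and a smoothed log-ratio of relative frequencies as a stand-in for a proper significance test. All names (`SENSES`, `domain_relevance`, `select_sense`) and all data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sense inventory standing in for WordNet synsets of one label.
SENSES = {
    "vessel": {
        "vessel.n.01": ["vessel", "vas"],         # anatomical tube
        "vessel.n.02": ["vessel", "watercraft"],  # ship
        "vessel.n.03": ["vessel", "container"],   # receptacle
    }
}


def domain_relevance(word, domain_counts, reference_counts):
    """Log-ratio of add-one-smoothed relative frequencies (domain vs. reference).

    Positive values mean the word is over-represented in the domain corpus.
    This is an illustrative score, not a statistical significance test.
    """
    vocab_size = len(set(domain_counts) | set(reference_counts))
    domain_total = sum(domain_counts.values())
    reference_total = sum(reference_counts.values())
    p_domain = (domain_counts[word] + 1) / (domain_total + vocab_size)
    p_reference = (reference_counts[word] + 1) / (reference_total + vocab_size)
    return math.log(p_domain / p_reference)


def select_sense(label, domain_tokens, reference_tokens):
    """Return (best_sense_id, scores) for an ambiguous class label."""
    domain_counts = Counter(domain_tokens)
    reference_counts = Counter(reference_tokens)
    scores = {}
    for sense_id, synonyms in SENSES[label].items():
        # The label itself occurs in every candidate sense, so only the
        # other synonyms can discriminate between senses.
        others = [s for s in synonyms if s != label]
        if not others:
            scores[sense_id] = 0.0
            continue
        relevances = [
            domain_relevance(s, domain_counts, reference_counts) for s in others
        ]
        scores[sense_id] = sum(relevances) / len(relevances)
    best_sense = max(scores, key=scores.get)
    return best_sense, scores


# Toy corpora (assumed data, for illustration only).
domain_tokens = (
    "the blood vessel wall and the vas deferens are each a vessel near the duct"
).split()
reference_tokens = (
    "the vessel left the harbour as a watercraft while a container held water"
).split()

best, scores = select_sense("vessel", domain_tokens, reference_tokens)
print(best)
```

With these toy corpora the anatomical sense wins, because its synonym "vas" occurs in the domain text and not in the reference text. In a realistic setting the sense inventory would come from WordNet, the corpora would be far larger, and the log-ratio would be replaced by an actual significance measure.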

Finally, some work on the definition of ontology-based lexicon models (Alexa et al., 2002; Gangemi et al., 2003; Buitelaar et al., 2006) is of (indirect) relevance to the work presented here as the derived lexical information needs to be represented in such a way that it can be easily accessed and used by natural language processing components as well as ontology management and reasoning tools.
