Applications of Ontologies and Text Mining in the Biomedical Domain

A. Jimeno-Yepes (European Bioinformatics Institute, UK), R. Berlanga-Llavori (Universitat Jaume I, Spain) and D. Rebholz-Schuchmann (European Bioinformatics Institute, UK)
Copyright: © 2010 | Pages: 23
DOI: 10.4018/978-1-61520-859-3.ch012


Ontologies represent domain knowledge that improves user interaction and interoperability between applications. In addition, ontologies provide valuable input to text mining techniques in the biomedical domain, which might improve performance in different text mining tasks. This chapter explores the mutual benefits between ontologies and text mining techniques. Ontology development is a time-consuming task. Most of the effort is spent on acquiring the terms that represent real-world concepts. This process can draw on the existing scientific literature and the World Wide Web. Identifying concept labels, i.e. terms, in these sources with text mining solutions supports ontology development, since the literature refers to existing terms and concepts. Conversely, automatic text processing techniques profit from ontological resources in different tasks, for example the disambiguation of terms and the enrichment of the terminological resources used by the text mining solution. Among the most important text mining tasks that exploit ontological resources are the mapping of concepts to terms in textual sources (e.g. named entity recognition, semantic indexing) and the expansion of queries in information retrieval.
Chapter Preview

Use Of Ontologies In Text Mining

Text mining is the processing and analysis of data stored in textual form. It extracts facts from text to populate databases or to improve the exploitation of document content through better retrieval or navigation within documents. Text mining consists of two main sub-tasks: information retrieval (IR) and information extraction (IE). IR techniques aim at recovering relevant documents from a large textual repository in order to satisfy the user’s information need, as expressed in a retrieval query. IE solutions glean facts from a set of documents.

In text mining systems, IR and IE are usually interlinked (see Figure 1). IR is used to retrieve relevant documents or parts of documents (e.g., paragraphs or sentences), which may then be processed further by IE methods. Conversely, IE may feed its results back into an IR system to improve retrieval. For example, the IR system can build an enriched index from the output of the IE system. In the following sections, we present the text mining components involved in more detail and show how ontological resources are used and exploited in each.

Figure 1. Information retrieval and information extraction interaction
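
The interplay between IE and IR described above can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon whose concept identifiers (`C:0001`, `C:0002`) are invented for illustration and do not come from any real ontology; the function names and sample documents are likewise hypothetical. The IE step is a plain dictionary lookup, and the IR step indexes each document under both its word tokens and the concepts found in it.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: surface term -> concept identifier.
# The identifiers are invented for illustration only.
LEXICON = {
    "heart attack": "C:0001",
    "myocardial infarction": "C:0001",
    "aspirin": "C:0002",
    "acetylsalicylic acid": "C:0002",
}

def extract_concepts(text):
    """IE step: find lexicon terms in the lower-cased text by dictionary lookup."""
    lowered = text.lower()
    found = set()
    for term, concept_id in LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept_id)
    return found

def build_enriched_index(docs):
    """IR step: inverted index over word tokens plus extracted concept ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
        for concept_id in extract_concepts(text):
            index[concept_id].add(doc_id)
    return index

def search(index, query):
    """Search by concept when the query maps to one, else AND over word tokens."""
    keys = extract_concepts(query)
    if not keys:
        keys = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = [index.get(key, set()) for key in keys]
    return set.intersection(*hits) if hits else set()

# Hypothetical sample documents.
DOCS = {
    "d1": "Aspirin reduces the risk of myocardial infarction.",
    "d2": "Patients with a heart attack were given acetylsalicylic acid.",
    "d3": "Gene expression in yeast.",
}
INDEX = build_enriched_index(DOCS)
CONCEPT_HITS = search(INDEX, "heart attack")
```

In this sketch, a query for "heart attack" is mapped to a concept identifier and therefore also reaches the document that only mentions "myocardial infarction", which a purely token-based index would miss.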


Ontologies And Information Retrieval

The main task of an Information Retrieval (IR) system is to recover, from a collection of documents, the most relevant set of documents available in response to the user’s information need. Figure 2 shows the schema of a typical IR system setup. The input to the system is a collection of documents and a query. The output is a selection of documents (ranked or not) that match the relevance criterion for retrieval. Relevance feedback, i.e. user feedback on the relevance of the retrieved documents, might improve retrieval performance.
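
As a rough illustration of this setup, the sketch below ranks a small collection against a query and then applies relevance feedback. The chapter preview does not name a ranking model or a feedback method, so the choices here are assumptions: TF-IDF weighting with cosine similarity for ranking, and a Rocchio-style update that uses positive feedback only. The documents, function names, and parameter values are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vectors(docs):
    """Return one TF-IDF vector per document, plus the IDF table."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for doc_id, toks in tokenized.items()
    }
    return vectors, idf

def query_vector(query, idf):
    """Weight query terms with the collection IDF; unknown terms are dropped."""
    counts = Counter(tokenize(query))
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}

def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(qvec, vectors):
    """Return ids of documents with non-zero similarity, best first."""
    scored = [(cosine(qvec, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]

def rocchio(qvec, relevant_ids, vectors, alpha=1.0, beta=0.75):
    """Rocchio-style update: move the query toward the relevant documents."""
    updated = {term: alpha * w for term, w in qvec.items()}
    for doc_id in relevant_ids:
        for term, w in vectors[doc_id].items():
            updated[term] = updated.get(term, 0.0) + beta * w / len(relevant_ids)
    return updated

# Hypothetical collection and query.
DOCS = {
    "d1": "aspirin lowers risk of heart attack",
    "d2": "myocardial infarction patients received aspirin",
    "d3": "heart rate variability in athletes",
    "d4": "myocardial infarction and troponin levels",
}
VECTORS, IDF = build_vectors(DOCS)
QUERY = query_vector("heart attack", IDF)
INITIAL = rank(QUERY, VECTORS)

# The user marks d1 as relevant; the query is updated and the collection re-ranked.
RERANKED = rank(rocchio(QUERY, ["d1"], VECTORS), VECTORS)
```

In this toy collection the initial ranking only contains documents sharing a word with the query. After `d1` is marked relevant, the updated query also carries the term "aspirin", so `d2` enters the result list even though it never mentions "heart attack".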
