Verification of Uncurated Protein Annotations

Francisco M. Couto (Universidade de Lisboa, Portugal), Mário J. Silva (Universidade de Lisboa, Portugal), Vivian Lee (European Bioinformatics Institute, UK), Emily Dimmer (European Bioinformatics Institute, UK), Evelyn Camon (European Bioinformatics Institute, UK) and Rolf Apweiler (European Bioinformatics Institute, UK)
DOI: 10.4018/978-1-60566-274-9.ch016

Abstract

Molecular Biology research projects have produced vast amounts of data, part of which has been preserved in a variety of public databases. However, a large portion of these data contains a significant number of errors and therefore requires careful verification by curators, a painful and costly task, before it is reliable enough to support valid conclusions. Meanwhile, research in biomedical information retrieval and information extraction is now delivering Text Mining solutions that can help curators work more efficiently and deliver better data resources. Over the past decades, automatic text processing systems have successfully exploited the biomedical scientific literature to reduce the effort researchers need to keep up to date, but many of these systems still rely on manually integrated domain knowledge, which leads to unnecessary overheads and restricts their use. A more efficient approach would acquire the domain knowledge automatically from publicly available biological sources, such as BioOntologies, rather than relying on manually inserted domain knowledge. An example of this approach is GOAnnotator, a tool that assists the verification of uncurated protein annotations. It provided correct evidence text to the curators at 93% precision, a promising result. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.

Background

Much of the information discovered in Molecular Biology has been published mainly in BioLiterature. However, analysing and identifying information in a large collection of unstructured texts is a painful and hard task, even for an expert.
