BioTextRetriever: A Tool to Retrieve Relevant Papers

Célia Talma Gonçalves (Instituto Superior de Contabilidade e Administração do Porto & CEISE-STI, Portugal), Rui Camacho (Universidade do Porto, Portugal) and Eugénio Oliveira (Universidade do Porto, Portugal)
Copyright: © 2011 | Volume: 2 | Issue: 3 | Article: 2 | Pages: 16
ISSN: 1947-9115 | EISSN: 1947-9123 | EISBN13: 9781613508176 | DOI: 10.4018/jkdb.2011070102

MLA

Gonçalves, Célia Talma, Rui Camacho and Eugénio Oliveira. "BioTextRetriever: A Tool to Retrieve Relevant Papers." IJKDB 2.3 (2011): 21-36. Web. 11 Jan. 2020. doi:10.4018/jkdb.2011070102

APA

Gonçalves, C. T., Camacho, R., & Oliveira, E. (2011). BioTextRetriever: A Tool to Retrieve Relevant Papers. International Journal of Knowledge Discovery in Bioinformatics (IJKDB), 2(3), 21-36. doi:10.4018/jkdb.2011070102

Chicago

Gonçalves, Célia Talma, Rui Camacho and Eugénio Oliveira. "BioTextRetriever: A Tool to Retrieve Relevant Papers," International Journal of Knowledge Discovery in Bioinformatics (IJKDB) 2, no. 3 (2011): 21-36, accessed January 11, 2020, doi:10.4018/jkdb.2011070102

Abstract

Whenever new DNA or protein sequences are decoded, it is almost compulsory to look at similar sequences, and at the papers describing them, both to collect information about the function and activity of the new sequences and to learn what is already known about similar ones. Current web sites and databases of sequences usually link a set of curated paper references to each sequence. Those links are a good starting point when looking for information relevant to a set of sequences. One way to implement this approach is to run a BLAST search with the newly decoded sequences, collect the similar sequences, and then look at the papers linked to them. Most often the number of papers retrieved this way is small, and one has to search large databases for further relevant papers. This paper proposes a process for generating a classifier from the initial set of relevant papers. First, the authors collect similar sequences using an alignment algorithm such as BLAST. Then, the authors use the enlarged set of papers to construct a classifier. Finally, the classifier is used to automatically enlarge the set of relevant papers by searching MEDLINE.
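The last two steps of the process (build a classifier from the papers linked to similar sequences, then use it to pick further papers from a larger collection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which learning algorithm the authors use, so a small multinomial naive Bayes model stands in for it; the BLAST search and the MEDLINE query are replaced by hard-coded toy strings; and the names `train_naive_bayes`, `relevance_score`, and `enlarge_relevant_set` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(relevant, irrelevant):
    """Fit a two-class multinomial naive Bayes model with add-one smoothing."""
    docs = {"relevant": relevant, "irrelevant": irrelevant}
    counts = {
        label: Counter(tok for doc in texts for tok in tokenize(doc))
        for label, texts in docs.items()
    }
    vocab = set(counts["relevant"]) | set(counts["irrelevant"])
    total_docs = len(relevant) + len(irrelevant)
    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for label, texts in docs.items():
        model["log_prior"][label] = math.log(len(texts) / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        model["log_like"][label] = {
            word: math.log((counts[label][word] + 1) / denom) for word in vocab
        }
    return model


def relevance_score(model, text):
    """Return the log-odds of 'relevant' versus 'irrelevant' for one text."""
    score = model["log_prior"]["relevant"] - model["log_prior"]["irrelevant"]
    for tok in tokenize(text):
        if tok in model["vocab"]:  # words never seen in training are ignored
            score += (model["log_like"]["relevant"][tok]
                      - model["log_like"]["irrelevant"][tok])
    return score


def enlarge_relevant_set(model, candidates, threshold=0.0):
    """Keep the candidate texts whose log-odds exceed the threshold."""
    return [c for c in candidates if relevance_score(model, c) > threshold]


# Toy stand-ins (assumed data): abstracts linked to BLAST hits, unrelated
# abstracts used as negative examples, and unlabelled candidates to screen.
seed_papers = [
    "kinase phosphorylation of serine residues in yeast protein",
    "protein kinase domain structure and substrate binding",
]
background_papers = [
    "survey of hospital staffing levels and patient satisfaction",
    "economic cost of seasonal influenza vaccination campaigns",
]
candidates = [
    "novel serine kinase regulates substrate phosphorylation",
    "patient satisfaction survey in rural hospital",
]

model = train_naive_bayes(seed_papers, background_papers)
selected = enlarge_relevant_set(model, candidates)
print(selected)
```

In this toy run only the kinase-related candidate scores above the threshold. A real pipeline would need a source of negative examples, text preprocessing, and a properly evaluated classifier, none of which the abstract specifies.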

