
Combining Supervised Learning Techniques to Key-Phrase Extraction for Biomedical Full-Text

Yanliang Qi, Min Song, Suk-Chung Yoon, Lori deVersterre

Source Title: Organizational Efficiency through Intelligent Information Technologies

DOI: 10.4018/978-1-4666-2047-6.ch003

Abstract

Key-phrase extraction plays a useful role in research areas of Information Systems (IS) such as digital libraries. Short metadata like key phrases help searchers understand the concepts found in documents. This paper evaluates the effectiveness of different supervised learning techniques on biomedical full-text: Sequential Minimal Optimization (SMO) and K-Nearest Neighbor, both of which could be embedded inside an information system for document search. The authors use these techniques to extract key phrases from PubMed and evaluate the performance of these systems using the holdout validation method. This paper compares different classifier techniques and the performance differences between the full-text and its abstract. Compared with the authors’ previous work, which investigated the performance of Naïve Bayes, Linear Regression and SVM(reg1/2), this paper finds that SVMreg-1 performs best in key-phrase extraction for full-text, whereas Naïve Bayes performs best for abstracts. These techniques should be considered for use in information system search functionality. Additional research issues are also identified.

Introduction

In recent years, there has been a tremendous increase in the number of biomedical documents in digital libraries that give users (researchers, readers) access to the scientific and technical literature in those documents (articles or abstracts) (Liu, 2007). For example, the PubMed digital library (a free search engine for accessing the MEDLINE database of biomedical research articles) currently contains over 18 million citations from various types of biomedical documents published in the past several decades (www.pubmed.gov). With the rapid growth in the number of biomedical documents, it has become increasingly difficult for users to identify the relevant documents in a large dataset. Because examining complete documents to decide whether they are useful is a challenging task for a reader, short semantic metadata such as key phrases offer an alternative way to understand the concept of a document (Hamdi, 2008). Key phrases are increasingly used as brief descriptors of text document content. However, not all of the biomedical documents in digital libraries have key phrases, so readers have to read through the documents to determine whether they are relevant to their research. Therefore, automatically presenting key phrases from a document has become an important task in the biomedical domain.

Automatic key-phrase extraction can be defined as the process of extracting from a document the key phrases that an author (or a professional indexer) would be likely to assign to it (El-Beltagy, 2006). Automatic extraction therefore makes it feasible to generate key phrases for a large number of full-text documents that do not have manually assigned key phrases. It also reduces the cost and time spent manually assigning key phrases to documents (Zhang, Zincir-Heywood, et al., 2005). Key phrases, as short semantic metadata, are useful for various purposes, including summarization and search engine optimization. How key phrases are used for full-text documents varies: when they are presented on the first page of the document, the goal is summarization, which enables users to quickly determine the concept of the document; when they are entered in a search engine query box in a digital library, the goal is to enable users to make the search more precise (Turney, 2000). Therefore, they play an important role in document description and document search in digital libraries, e.g., PubMed.

Traditionally, key phrases are assigned manually to documents by authors or professional indexers. The indexers often choose key phrases from a predefined controlled vocabulary such as Medical Subject Headings (MeSH). Authors usually choose key phrases to present their work in a certain way or to maximize its chance of being noticed by particular searchers. However, manual assignment of key phrases has several drawbacks: (1) it is a time-consuming process, (2) it requires knowledge of the subject matter, and (3) it requires an up-to-date controlled vocabulary list (Witten, Paynter et al., 1999; Kumar & Srinathan, 2008). Automatic key-phrase extraction can be a good practical alternative.

Key-phrases can be automatically generated in two ways: (1) key-phrase assignment (controlled-vocabulary indexing based), which assigns key-phrases from a controlled vocabulary to documents or (2) key-phrase extraction (free-term indexing based), which identifies and selects the most descriptive phrases in that document (Dumais, Platt et al., 1998).
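
The difference between the two approaches can be illustrated with a minimal Python sketch. It is not the method evaluated in this chapter: the three-term vocabulary is a hypothetical stand-in for a controlled vocabulary such as MeSH, the function names and sample text are invented for illustration, and ranking two-word phrases by raw frequency is a deliberately simple substitute for a trained classifier.

```python
import re
from collections import Counter

# Hypothetical stand-in for a controlled vocabulary such as MeSH (assumption).
CONTROLLED_VOCABULARY = {"gene expression", "breast neoplasms", "protein binding"}

# Small illustrative stopword list (assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
             "of", "on", "the", "to", "was", "with"}

# Invented sample text, used only to exercise the two functions.
SAMPLE_TEXT = (
    "Gene expression profiling of tumor samples revealed altered gene expression. "
    "MicroRNA regulation is linked to gene expression, and microRNA regulation "
    "was absent from the index."
)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def assign_key_phrases(text, vocabulary):
    """Key-phrase assignment: keep only vocabulary terms that occur in the text."""
    normalized = " " + " ".join(tokenize(text)) + " "
    # Padding with spaces ensures that only whole-word matches are counted.
    return sorted(term for term in vocabulary if f" {term} " in normalized)


def extract_key_phrases(text, top_n=2):
    """Key-phrase extraction: rank two-word phrases taken from the text itself."""
    tokens = tokenize(text)
    bigrams = [
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]
    return [phrase for phrase, _ in Counter(bigrams).most_common(top_n)]


if __name__ == "__main__":
    print("assigned: ", assign_key_phrases(SAMPLE_TEXT, CONTROLLED_VOCABULARY))
    print("extracted:", extract_key_phrases(SAMPLE_TEXT))
```

In this sketch, assignment can only return phrases that already exist in the vocabulary, whereas extraction can surface any phrase that appears in the document, including ones no indexer has listed.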

In domain-specific controlled indexing, key phrases are chosen from a controlled vocabulary such as the MeSH terminology list (Medelyan & Witten, 2006). MeSH provides a consistent way to assign phrases to biomedical documents that share the same concept. However, such lists are expensive to build and maintain, so they are not always up to date, and potentially useful phrases are ignored if they are not in the list (Jones & Paynter, 2003).
