1. Introduction
A text is a collection of words. A large part of any text consists of function words, which are necessary to make sentences meaningful and grammatically correct. Beyond these, an author uses many other words related to the theme of the text. Such words carry important information about the text, and this information is useful in many tasks such as information retrieval, natural language processing, text summarization, and document categorization. These words can be described as keywords. The automatic extraction of keywords is therefore an important research direction in the field of text mining.

Keyword extraction is the task of finding the words that are sufficiently informative to represent a text. It is challenging to define a rule that generalizes to every text, since different texts may have different linguistic features. To address these challenges, researchers have made continuous efforts to establish relationships among linguistic features and the laws of mathematics and physics. Keyword extraction methods fall under three broad categories: linguistic, machine learning, and statistical methods. Linguistic methods focus on the syntactic and semantic aspects of words, their morphological features, and linguistic relationships among words such as synonymy, hypernymy, and hyponymy. In machine learning methods, a learning algorithm is first trained on a tagged training set and then evaluated on a tagged test set.

The weighting of words in a text plays an important role in information retrieval. Weighting schemes were initially defined in terms of the frequency of words in a text. Term frequency (tf) and inverse document frequency (idf) were the first weighting schemes used for the weighting of words.
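As an illustration, the classical tf-idf weighting mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not any particular paper's implementation; the toy corpus and whitespace tokenization are invented for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf weights for each word in each tokenized document.

    tf(w, d) = count of w in d / total words in d
    idf(w)   = log(N / number of documents containing w)
    """
    n_docs = len(docs)
    # document frequency: in how many documents each word appears
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            w: (c / total) * math.log(n_docs / df[w])
            for w, c in counts.items()
        })
    return weights

# toy corpus, tokenized by whitespace (illustrative only)
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "entropy measures information".split(),
]
w = tf_idf(docs)
```

A rare, topical word such as "entropy" (appearing in one document) receives a higher weight than a function word such as "the" (appearing in most documents), which is exactly the intuition behind idf.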
Luhn (1958) introduced an early idea of the importance of words in a text, building on Zipf's analysis of word frequency. Since then, a number of approaches for measuring the importance of words in a text have appeared in the literature; detailed accounts of weighting schemes in information retrieval can be found in the books of Dominich (2008) and Manning and Schütze (1999). Earlier methods were based on the frequency of words in a text; later, many other aspects were considered by different researchers. Turney (2000) applied a supervised learning approach to keyword extraction. Ortuño et al. (2002) used the standard deviation of the distance between successive occurrences of a word as a parameter to extract keywords; they found that relevant words have a greater standard deviation, since their spatial distribution is more inhomogeneous than that of irrelevant words. Hulth (2003) suggested a keyword extraction method based on linguistic knowledge such as syntactic features. Studies of the fractal structure of text can be found in Andres et al. (2010) and Andres et al. (2011). Yang et al. (2013) used the difference in Shannon entropy between the intrinsic and extrinsic modes of a word to determine its relevance in a text. Najafi and Darooneh (2015) used the concept of fractal dimension for keyword extraction. Jamaati and Mehri (2018) used Tsallis entropy to rank the relevance of terms, taking advantage of the spatial correlation length. Mehri et al. (2019) used distorted entropy for word ranking.
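The spacing-based idea attributed above to Ortuño et al. (2002) can be illustrated with a toy sketch: collect the positions of a word, take the gaps between successive occurrences, and measure how inhomogeneous those gaps are. Normalizing the standard deviation by the mean gap, so that frequency alone does not dominate, is one common variant of this measure; the exact formula of the original paper may differ.

```python
import statistics

def spacing_sigma(tokens, word):
    """Normalized standard deviation of the gaps between successive
    occurrences of `word` in `tokens`. Clustered (relevant) words have
    inhomogeneous spacings and hence a larger value than words spread
    uniformly through the text."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    if len(positions) < 3:
        return 0.0  # too few occurrences to estimate a spread
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean_gap = statistics.mean(gaps)
    return statistics.pstdev(gaps) / mean_gap if mean_gap else 0.0

# synthetic text: "uniform" occurs at regular intervals,
# "clustered" bunches up early and reappears once near the end
tokens = ["x"] * 40
for i in (0, 10, 20, 30):
    tokens[i] = "uniform"
for i in (1, 2, 3, 35):
    tokens[i] = "clustered"
```

Both words occur four times, yet the clustered word scores higher, mirroring the observation that topically relevant words tend to appear in bursts while irrelevant words are spread evenly.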