Unsupervised Keyword Extraction Methods Based on a Word Graph Network

Hongbin Wang, Jingzhen Ye, Zhengtao Yu, Jian Wang, Cunli Mao

Source Title: International Journal of Ambient Computing and Intelligence (IJACI) 11(2)

DOI: 10.4018/IJACI.2020040104

Abstract

Supervised keyword extraction methods usually require a large human-annotated corpus to train the model. Because manual labeling is expensive, unsupervised techniques based on word graph networks are attractive. Traditional word graph networks consider only the co-occurrence relationships of words or the topological structure of the network, ignoring the influence of semantic relations between words on keyword extraction. To address these problems, an unsupervised keyword extraction method based on word graph networks is proposed for both Chinese and English. The method uses word embeddings to compute a “word attraction score” that reflects the semantic relevance between words in a document. The bias weight of each node is then combined with a weighted PageRank algorithm to compute the final word scores. The experimental results demonstrate that the method is more effective than traditional methods.

Introduction

The term “keyword” refers to a word or phrase extracted directly from the title or content of a document. Because they are simple and objective, keywords are a concise representation of a text and an effective reflection of its theme. Keywords currently underpin many natural language processing subfields, such as text classification, clustering, information extraction, recommendation systems, and automatic text summarization (Chen, Jiang, & Bian, 2014). Before automatic keyword extraction technology existed, this task was performed manually, which is inefficient and time-consuming. Furthermore, when large corpora are processed by multiple annotators at once, the way keywords are extracted varies from person to person, which makes the labels inconsistent and reduces the accuracy of the resulting text descriptions. Many studies have proposed keyword extraction algorithms based on supervised or unsupervised learning (Hasan & Ng, 2014).

In supervised learning, keyword extraction is treated as a binary classification problem: each candidate keyword is classified as either true (i.e., a keyword) or false (i.e., a non-keyword) (Hulth, 2003). These methods use texts with manually tagged keywords as training data and apply classification algorithms, such as decision trees, support vector machines, and logistic regression, to extract keywords (Jiang, Hu, & Li, 2009). Although supervised methods often outperform unsupervised ones, they require large manually annotated corpora. Improving unsupervised methods is therefore an attractive research direction, because they need no manual annotation (Florescu & Caragea, 2017).
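As an illustration of the supervised formulation, the sketch below trains a logistic regression classifier with plain gradient descent and labels each candidate word as keyword or non-keyword. The three features (term frequency, relative position of first occurrence, and whether the word appears in the title) and the labeled examples are hypothetical, chosen only to show the binary setup; they are not taken from the cited studies.

```python
import math

# Hypothetical training data: (term_frequency, first_position_ratio, in_title) -> label,
# where label 1 = keyword and 0 = non-keyword.
TRAIN = [
    ((5, 0.05, 1), 1),
    ((4, 0.10, 1), 1),
    ((3, 0.20, 0), 1),
    ((1, 0.90, 0), 0),
    ((1, 0.70, 0), 0),
    ((2, 0.80, 0), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(data, lr=0.1, epochs=2000):
    """Fit weights and a bias term by batch gradient descent on the log loss."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log loss with respect to the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * gw / len(data) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def is_keyword(features, w, b, threshold=0.5):
    """Binary decision: True = keyword, False = non-keyword."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b) >= threshold


w, b = train_logistic_regression(TRAIN)
print(is_keyword((5, 0.05, 1), w, b))  # frequent, early, in the title
print(is_keyword((1, 0.90, 0), w, b))  # rare, late, not in the title
```

A decision tree or support vector machine could replace the classifier without changing the surrounding pipeline of candidate generation, feature extraction, and per-candidate binary labeling; the cost the text points to lies in obtaining the labels in `TRAIN`.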

Unsupervised keyword extraction algorithms can be divided into three categories, based on (1) statistical features, (2) topic models, and (3) graph models.

Statistical methods do not require a pre-labeled training corpus; document keywords are usually extracted using features such as word frequency, word length, and word position. The drawback of this approach is that in some specialized documents, such as biology and medicine journal articles, a keyword may appear only once. In this case, the statistical model considers the word unimportant and therefore ignores it (Chen & Lin, 2010).
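The minimal sketch below scores words using only frequency, position, and length, and reproduces the weakness just described: a domain term that occurs once near the end of the text ranks below frequent general words. The stopword list, the position weight (2.0 for the first token, falling toward 1.0 at the end), and the length penalty are illustrative assumptions, not a published scoring scheme.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use a fuller list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "from", "one", "was", "were", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def statistical_scores(text):
    """Score = frequency x position weight x length weight (hypothetical weighting)."""
    tokens = tokenize(text)
    n = len(tokens)
    tf = Counter(tokens)
    first = {}
    for i, tok in enumerate(tokens):
        first.setdefault(tok, i)  # index of the first occurrence
    scores = {}
    for word, count in tf.items():
        position_weight = 2.0 - first[word] / n  # 2.0 for the first token, near 1.0 at the end
        length_weight = 1.0 if len(word) > 3 else 0.5  # down-weight very short words
        scores[word] = count * position_weight * length_weight
    return scores


text = ("The study measured expression levels in tissue samples. "
        "Expression levels rose in treated samples, and samples from the control group "
        "showed stable expression. One sample carried a rare telomerase mutation.")
scores = statistical_scores(text)
top3 = sorted(scores, key=scores.get, reverse=True)[:3]
print(top3)
print(round(scores["telomerase"], 2))
```

Even if "telomerase" were the central term of a specialized article, its single late occurrence gives it one of the lowest scores, which is exactly the failure mode described above.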

Topic-model-based keyword extraction methods view a document as a mixture of topics, each of which assigns a different probability to every word. Therefore, once the topics of a document are determined, the representative words of each topic capture the core content of the document and can be considered its keywords. TopicRank (Bougouin, Boudin, & Daille, 2013), for example, uses hierarchical clustering to group candidate words into topics, and the document keywords are then obtained through algorithms such as PageRank. However, because it models a document as a mixture of topics, this approach performs well on long documents but is difficult to extend to short ones.
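The mixture-of-topics idea can be made concrete by scoring each word with P(w | d) = Σ_k P(w | z_k) P(z_k | d). In the sketch below, the topic-word and document-topic distributions are hand-specified toy values rather than the output of a trained topic model. This is a simplification for illustration and not TopicRank, which instead clusters candidate phrases into topics and ranks the topics on a graph.

```python
# Hand-specified toy distributions; in practice these would come from a topic model.
TOPIC_WORD = {  # P(w | z)
    "genetics": {"gene": 0.4, "mutation": 0.3, "protein": 0.2, "data": 0.1},
    "computing": {"algorithm": 0.4, "data": 0.3, "graph": 0.2, "gene": 0.1},
}
DOC_TOPIC = {"genetics": 0.7, "computing": 0.3}  # P(z | d)


def word_scores(doc_topic, topic_word):
    """P(w | d) = sum over topics z of P(w | z) * P(z | d)."""
    scores = {}
    for topic, p_topic in doc_topic.items():
        for word, p_word in topic_word[topic].items():
            scores[word] = scores.get(word, 0.0) + p_word * p_topic
    return scores


def top_k(scores, k):
    """Return the k highest-scoring words."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


scores = word_scores(DOC_TOPIC, TOPIC_WORD)
print(top_k(scores, 3))
```

The document is dominated by the "genetics" topic, so that topic's representative words rank highest, while "data" gains from appearing in both topics. A short document gives too little evidence to estimate `DOC_TOPIC` reliably, which is one way to see why the approach is hard to extend to short texts.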

Keyword extraction based on a graph model constructs a semantically weighted network of the document and then, by analyzing this word graph network, selects its most important nodes as keywords. Because it considers relationships between words (e.g., co-occurrence frequency) together with other statistical features, this approach achieves better extraction results (Chang, Zhang, Wang, Wan, & Xiao, 2018). TextRank demonstrates the scalability and accuracy of the word graph network approach (Mihalcea & Tarau, 2004).
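A minimal TextRank-style sketch follows: words that co-occur within a sliding window are linked, edge weights count co-occurrences, and a weighted PageRank iteration ranks the nodes using WS(V_i) = (1 - d) + d Σ_j (w_ji / Σ_k w_jk) WS(V_j). The input is assumed to be already tokenized and filtered (TextRank also applies part-of-speech filtering, omitted here); the damping factor 0.85 and a window of 2 are common defaults, not values taken from this article.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link words that appear within `window` tokens of each other.

    Edge weights count co-occurrences; the graph is undirected.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:
                graph[w][u] += 1.0
                graph[u][w] += 1.0
    return graph


def weighted_pagerank(graph, d=0.85, iterations=50):
    """TextRank-style update: WS(v) = (1 - d) + d * sum_u (w_uv / out_u) * WS(u)."""
    nodes = list(graph)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        score = {
            n: (1 - d) + d * sum(graph[m][n] / out_weight[m] * score[m] for m in graph[n])
            for n in nodes
        }
    return score


tokens = ["graph", "keyword", "extraction", "keyword", "ranking", "keyword", "network"]
scores = weighted_pagerank(build_cooccurrence_graph(tokens))
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Here "keyword" co-occurs with every other word, so it becomes the hub of the graph and receives the highest score. Note that the edge weights encode only co-occurrence, not semantic similarity, which is the limitation the abstract attributes to traditional word graph networks.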

In this research, the word graph network is built first. A word attraction score, which combines Word2vec embeddings with the Dice coefficient (Dice, 1945), is then used to capture the semantic relationships between words. Next, the bias weight of each node is incorporated into the PageRank algorithm (Brin & Page, 1998). Finally, after multiple PageRank iterations, the words are sorted by score and the top-K are extracted as document keywords.
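The pipeline described in the preceding paragraph can be sketched as follows, under several assumptions because this excerpt does not give the exact formulas. The edge weight is assumed to be a gravity-style attraction force, freq(a) * freq(b) / ||e_a - e_b||^2, multiplied by the Dice coefficient 2 * cooc(a, b) / (freq(a) + freq(b)). The node bias is assumed to be position-based (the sum of inverse positions of a word's occurrences, normalized to 1). The 2-D vectors in `EMB` are hypothetical stand-ins for trained Word2vec embeddings. The ranking step is a biased weighted PageRank, S(v) = (1 - d) B(v) + d Σ_u (w_uv / Σ_k w_uk) S(u).

```python
from collections import Counter, defaultdict

# Hypothetical 2-D vectors standing in for trained Word2vec embeddings.
EMB = {
    "keyword": (0.9, 0.1),
    "extraction": (0.8, 0.2),
    "ranking": (0.7, 0.3),
    "graph": (0.2, 0.9),
    "network": (0.1, 0.8),
}


def attraction_graph(tokens, embeddings, window=2, eps=1e-6):
    """Edge weight = attraction force x Dice coefficient (assumed formulation)."""
    freq = Counter(tokens)
    cooc = Counter()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                cooc[frozenset((w, tokens[j]))] += 1
    graph = defaultdict(dict)
    for pair, count in cooc.items():
        a, b = tuple(pair)
        dist_sq = sum((x - y) ** 2 for x, y in zip(embeddings[a], embeddings[b]))
        force = freq[a] * freq[b] / max(dist_sq, eps)  # frequent, semantically close words attract more
        dice = 2 * count / (freq[a] + freq[b])  # Dice coefficient of co-occurrence
        graph[a][b] = graph[b][a] = force * dice
    return graph


def position_bias(tokens):
    """Assumed bias: sum of 1/position over a word's occurrences, normalized to 1."""
    raw = defaultdict(float)
    for i, w in enumerate(tokens):
        raw[w] += 1.0 / (i + 1)
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}


def biased_weighted_pagerank(graph, bias, d=0.85, iterations=100, tol=1e-12):
    """S(v) = (1 - d) * B(v) + d * sum_u (w_uv / out_u) * S(u)."""
    nodes = list(graph)
    z = sum(bias[n] for n in nodes)
    b = {n: bias[n] / z for n in nodes}  # renormalize over nodes present in the graph
    out_w = {n: sum(graph[n].values()) for n in nodes}
    score = dict(b)
    for _ in range(iterations):
        new = {
            n: (1 - d) * b[n] + d * sum(graph[m][n] / out_w[m] * score[m] for m in graph[n])
            for n in nodes
        }
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:  # stop early once scores have stabilized
            break
    return score


def extract_keywords(tokens, embeddings, k=3, window=2):
    """Build the attraction-weighted graph, rank its nodes, and return the top-K words."""
    graph = attraction_graph(tokens, embeddings, window)
    scores = biased_weighted_pagerank(graph, position_bias(tokens))
    return sorted(scores, key=scores.get, reverse=True)[:k]


tokens = ["graph", "keyword", "extraction", "keyword", "ranking", "keyword", "network"]
print(extract_keywords(tokens, EMB, k=3))
```

Because the bias sums to 1 and every node has at least one positively weighted edge, the scores also sum to 1 at each iteration. In this toy run, "extraction", the word closest to "keyword" in the embedding space, ranks second, illustrating how the attraction weights let semantic similarity influence the ranking. Trying a different attraction formula or bias definition only requires changing `attraction_graph` or `position_bias`.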
