
A Novel Tagging Augmented LDA Model for Clustering

Yi Zhao, Yu Qiao, Keqing He

Source Title: International Journal of Web Services Research (IJWSR) 16(3)

DOI: 10.4018/IJWSR.2019070104


Abstract

Clustering has become an increasingly important task in the analysis of large document collections. Clustering aims to organize these documents and facilitate better search and knowledge extraction. Most existing clustering methods that use user-generated tags consider only the positive influence of those tags on automatic clustering performance. The authors argue that not all user-generated tags provide useful information for clustering. In this article, the authors propose a new clustering solution named HRT-LDA (High Representation Tags Latent Dirichlet Allocation), which considers the effects of different tags on clustering performance. To this end, the authors apply a tag filtering strategy and a tag appending strategy based on transfer learning, Word2vec, TF-IDF, and semantic computing. Extensive experiments on real-world datasets demonstrate that HRT-LDA outperforms state-of-the-art tagging augmented LDA methods for clustering.

Article Preview


Introduction

The explosive growth of information on the Web has sharply increased both the variety and the quantity of data, which greatly limits the accuracy and efficiency of data mining and knowledge discovery. Clustering is a major approach to addressing this challenge. The goal of clustering is to divide the acquired data into several classes according to given principles. Data in the same class should share general features with respect to the classifying attributes; grouping data in this way helps overcome the disadvantages of centralized data storage and improves the working efficiency of a database. Automatically clustering data into semantic groups promises improved knowledge sharing and inquiry. In this area, data are commonly clustered and retrieved using the information carried by tags. Traditional clustering methods consider only the positive influence of tags (Tian, He, Wang, Sun, & Xu, 2015; Chae, Park, Park, Yeo, & Shi, 2016). However, a document with many tags may appear to belong to several different topics, so if all of its tags were used for clustering, performance would be poor.
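To make the last point concrete, here is a toy illustration assuming documents are compared by the Jaccard similarity of their tag sets. Jaccard similarity is a simple stand-in chosen for illustration, not a measure the article attributes to existing methods, and the tag sets are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical tag sets: doc_b has three on-topic tags plus three off-topic ones.
doc_a = {"lda", "topic-model", "clustering"}
doc_b = {"lda", "topic-model", "clustering", "java", "cloud", "rest"}
doc_c = {"java", "cloud", "rest", "microservice"}

# With all tags, doc_b looks almost as close to doc_c as to doc_a.
print(f"{jaccard(doc_a, doc_b):.3f}")  # 0.500
print(f"{jaccard(doc_b, doc_c):.3f}")  # 0.429

# Keeping only doc_b's on-topic tags removes the ambiguity.
doc_b_filtered = doc_b & {"lda", "topic-model", "clustering"}
print(f"{jaccard(doc_a, doc_b_filtered):.3f}")  # 1.000
print(f"{jaccard(doc_b_filtered, doc_c):.3f}")  # 0.000
```

The extra tags pull doc_b toward an unrelated cluster, which is the kind of noise the authors argue should be filtered out.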

Despite their effectiveness, we argue that existing tagging augmented clustering approaches suffer from two limitations:

• Lack of consideration of noisy tags: Each tag has its own semantics and context, and, more importantly, strong relationships exist between tags and knowledge. However, existing models have largely ignored the fact that noisy tags cannot adequately represent this knowledge;
• Missing relevant tags: Tagging augmented clustering methods usually consider only user-generated tags, yet other words extracted from the document may represent it better than these tags.
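A minimal sketch of how noisy tags might be filtered, assuming each tag and document word has a Word2vec-style embedding and that a tag is kept only if its cosine similarity to the centroid of the document's word vectors reaches a threshold. The toy vectors, the `filter_tags` helper, and the 0.8 threshold are hypothetical; this is not the authors' degree-of-representation (DR) measure.

```python
import math
from typing import Optional

# Hypothetical 3-d vectors standing in for trained Word2vec embeddings.
EMBEDDINGS = {
    "topic": [0.9, 0.1, 0.0],
    "model": [0.8, 0.2, 0.1],
    "clustering": [0.7, 0.3, 0.0],
    "lda": [0.9, 0.2, 0.1],
    "java": [0.0, 0.1, 0.9],
    "funny": [0.1, 0.9, 0.2],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(words: list[str]) -> Optional[list[float]]:
    """Mean vector of the words that have embeddings, or None if none do."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def filter_tags(tags: list[str], doc_words: list[str], threshold: float = 0.8) -> list[str]:
    """Keep tags whose embedding is close to the document centroid.

    Tags without an embedding are dropped; if no document word has an
    embedding, the tags are returned unchanged.
    """
    center = centroid(doc_words)
    if center is None:
        return list(tags)
    return [t for t in tags if t in EMBEDDINGS and cosine(EMBEDDINGS[t], center) >= threshold]


doc_words = ["topic", "model", "clustering"]
print(filter_tags(["lda", "java", "funny"], doc_words))  # ['lda']
```

Dropping out-of-vocabulary tags is one possible policy; keeping them would be equally defensible when embeddings are sparse.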

To address these limitations, we propose a new LDA-based solution named Highly Relevant Tags-Latent Dirichlet Allocation (HRT-LDA). Specifically, as preprocessing, we apply tag filtering and appending strategies built on Word2vec, TF-IDF, semantic computing, and the degree of representation (DR). We then construct a new tag list for each document, which is used in LDA topic training. Moreover, to support knowledge sharing and inquiry, we discuss applying the HRT-LDA model to knowledge recommendation and other scenarios.
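The final step trains LDA on the rebuilt tag lists. The sketch below is a generic collapsed Gibbs sampler for standard LDA that treats each document's tag list as its token sequence. The `lda_gibbs` name, the hyperparameter values, and the choice to assign each document to its most probable topic are assumptions for illustration, not details given in the article.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA.

    docs is a list of token lists (here, each document's rebuilt tag list).
    Returns theta, the per-document topic distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    topics = range(K)

    n_dk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in topics]    # occurrences of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed document-topic distributions; each row sums to 1.
    return [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]


# Hypothetical rebuilt tag lists for four documents.
tag_lists = [
    ["lda", "topic-model", "clustering"],
    ["lda", "clustering", "tagging"],
    ["soap", "wsdl", "web-service"],
    ["wsdl", "web-service", "composition"],
]
theta = lda_gibbs(tag_lists, num_topics=2)
# Assumption: each document joins the cluster of its most probable topic.
clusters = [max(range(2), key=lambda k: row[k]) for row in theta]
print(clusters)
```

A fixed seed makes runs reproducible; the actual cluster labels depend on the sampler's trajectory, so no expected output is shown.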

To summarize, the main contributions of this work are:

• A novel tagging augmented LDA model that considers both the positive and the negative effects of user-generated tags on clustering results;
• A tag appending strategy based on transfer learning that captures more important feature tags of documents, improving clustering performance and, to some extent, alleviating the cold-start problem (when a document has no or only a few original tags);
• Extensive experiments on real-world datasets showing that our method outperforms several existing tagging augmented methods.
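As a rough illustration of tag appending for cold-start documents, the sketch below ranks a document's words by TF-IDF and appends the top-scoring ones until the document has a minimum number of tags. It covers only the TF-IDF part; the transfer-learning and Word2vec components of HRT-LDA are not modeled, and `append_tags`, `min_tags`, and the corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """TF-IDF score for each distinct token in doc_tokens.

    corpus should contain doc_tokens itself; a token with zero document
    frequency is treated as if it appeared in one document.
    """
    n_docs = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    return {w: (c / total) * math.log(n_docs / max(df[w], 1)) for w, c in tf.items()}


def append_tags(tags: list[str], doc_tokens: list[str], corpus: list[list[str]],
                min_tags: int = 3) -> list[str]:
    """Append the highest-TF-IDF words until the document has min_tags tags."""
    result = list(tags)
    if len(result) >= min_tags:
        return result
    scores = tfidf_scores(doc_tokens, corpus)
    # Highest score first; ties broken alphabetically so the output is deterministic.
    for word, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(result) >= min_tags:
            break
        if score > 0 and word not in result:
            result.append(word)
    return result


corpus = [
    ["web", "service", "composition", "qos", "service"],
    ["web", "service", "discovery", "wsdl"],
    ["web", "clustering", "lda", "topic", "lda"],
]
# A cold-start document with a single user-generated tag.
print(append_tags(["clustering"], corpus[2], corpus))  # ['clustering', 'lda', 'topic']
```

Words that occur in every document (here "web") get an IDF of zero and are never appended, which keeps generic terms out of the tag list.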

The remainder of this paper is organized as follows. We begin by reviewing existing work in this area. We then present the HRT-LDA approach, followed by an experimental comparison of HRT-LDA with existing methods. Finally, we summarize our conclusions and future work.

Related Work

With the rapid development of big data and cloud computing, data mining has recently attracted significant attention. Moreover, a large number of published research papers demonstrate that clustering methods (Rego et al., 2013; Xu, Yang, & Ma, 2011; Schenk & Lungu, 2013) are effective approaches for enhancing the performance of knowledge integration. In this article, we focus on clustering methods that use a tagging augmented model.

1. Tagging augmented clustering methods that do not consider the negative effect of noisy tags.
