Dynamic Data Retrieval Using Incremental Clustering and Indexing

Uma Priya D, Santhi Thilagam P

Source Title: International Journal of Information Retrieval Research (IJIRR) 10(3)

DOI: 10.4018/IJIRR.2020070105

Abstract

The evolution of the Internet and real-time applications has produced massive volumes of unstructured data, which makes efficient retrieval of dynamic data more complex. Existing research uses clustering methods and indexes to speed up retrieval. However, the quality of clustering depends on the data representation model, and existing models suffer from dimensionality explosion and sparsity. In addition, as documents evolve, rebuilding the index from scratch is expensive. In this work, compact document vectors generated by the Doc2Vec model are used to cluster the documents, and the indexes are updated incrementally, at lower cost, using the diff method. The probabilistic ranking scheme BM25+ is used to improve the quality of retrieval for user queries. The experimental analysis demonstrates that the proposed system significantly improves clustering performance and reduces the retrieval time needed to obtain the top-k results.
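The abstract names BM25+ as the ranking scheme but this preview does not show the scoring details. As a minimal sketch, assuming the commonly stated BM25+ formulation (term-frequency saturation, length normalization, and a lower-bound constant added for each matching term), ranking could look like the following. The function names and the parameter values `k1=1.2`, `b=0.75`, and `delta=1.0` are illustrative assumptions, not values taken from the article.

```python
import math
from collections import Counter


def bm25_plus_scores(query_terms, docs, k1=1.2, b=0.75, delta=1.0):
    """Score each tokenized document in `docs` against `query_terms`.

    Assumed formulation: for every query term t that occurs in document d,
    add idf(t) * ((k1 + 1) * tf / (k1 * (1 - b + b * |d| / avgdl) + tf) + delta).
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1.0 - b + b * len(d) / avg_len)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue  # only terms present in the document contribute
            idf = math.log((n_docs + 1) / df[t])
            score += idf * ((k1 + 1) * tf[t] / (norm + tf[t]) + delta)
        scores.append(score)
    return scores


def top_k(query_terms, docs, k=2):
    """Return the indices of the k highest-scoring documents."""
    scores = bm25_plus_scores(query_terms, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In a clustered setting, `docs` would typically be the documents of the cluster closest to the query rather than the whole collection, which is one way the clustering step can reduce retrieval time.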

Introduction

Over the past two decades, technological innovation, the emergence of social networks, the adoption of hand-held computing devices, and the explosive growth in the use of the Internet and computing services have driven a tremendous increase in heterogeneous data of structured, semi-structured, and unstructured types, commonly known as Big Data. Every day, 2.5 quintillion bytes of data are generated (EDBD Statistics, 2015) as emails, audio, video, web pages, social media messages, and so forth, 90% of which is unstructured. This growth makes efficient retrieval increasingly complex. Conventional methods are well suited to static data, but dynamic unstructured text data demands a more efficient way of organizing and processing it.

In this big data era, querying large data requires organized storage in which incoming data (usually represented as vectors) are grouped by vector similarity. Similar documents can then be retrieved quickly for user queries instead of processing the entire collection at query time. As documents evolve, clustering algorithms must cope with the dynamic nature of the data with minimal loss of clustering quality. Several clustering algorithms have been proposed with different data representation models (Ding and He, 2004; Campr and Jezek, 2015), similarity measures (Audhkhasi and Verma, 2007; Huang, 2008), and grouping techniques (Dhillon et al., 2004; Shindler et al., 2011; Cai et al., 2013). The data representation refers to the number of classes and the available patterns applicable to the clustering algorithm. Good representations capture a vast number of possible patterns. Hence, the quality of clustering algorithms is highly dependent on representation learning. To transform the data into a more cluster-friendly form, representation learning models (Mikolov et al., 2013b; Pennington et al., 2014; Yang et al., 2016; Kim et al., 2017; Joshi et al., 2018; Ren et al., 2019) are used to generate distributed representations of words. Traditional machine learning models always result in a locally optimal solution, whereas distributed representation learners are trained on many samples to learn the representation. In terms of expressiveness, traditional machine learning models such as decision trees and support vector machines (SVMs) require O(N) input samples to distinguish O(N) regions. In contrast, distributed representation learning models can represent O(2^k) regions with the same samples (Bengio et al., 2013), where k denotes the number of non-zero elements in the distributed representation.
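The O(N) versus O(2^k) contrast can be made concrete with a deliberately simplified illustration. The sketch below assumes k independent binary features and simply counts the distinct codes each scheme can assign; it is not the construction used by Bengio et al., and the function names are hypothetical.

```python
from itertools import product


def local_codes(n):
    """One-hot (local) codes: n units can label exactly n distinct regions."""
    return [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]


def distributed_codes(k):
    """Binary distributed codes: k units can label up to 2**k distinct regions."""
    return list(product((0, 1), repeat=k))
```

With 10 units, the local scheme distinguishes 10 regions, while the distributed scheme can in principle distinguish 1024, which is why compact distributed vectors can stay expressive without the dimensionality growth of sparse representations.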

While clustering organizes the data efficiently to improve retrieval performance, the complexity of the search operation on dynamic data is yet another challenge. Proper indexing improves query processing by reducing the complexity of the search operation. Because the input has an unordered form, it is searched by its content, i.e., by keyword. In practice, the inverted index is the most popular indexing method for keyword search on unstructured data. Given the dynamic nature of the data, the index must also be dynamic for retrieval to remain efficient. Existing research mainly concentrates on reducing index build time and keyword query processing time, and most of it focuses on static data. In contrast, this work aims to improve accuracy on dynamic clustered data with less retrieval time.
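To illustrate why a dynamic index avoids a full rebuild, here is a minimal sketch of an inverted index that is updated incrementally. It assumes a simple set-difference approach: when a document changes, only the terms that were added or removed touch the postings. The article's own diff method is not described in this preview, so the class and method names below are hypothetical and the logic is a generic illustration.

```python
from collections import defaultdict


class IncrementalInvertedIndex:
    """Keyword index that applies per-document diffs instead of rebuilding."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # doc id -> terms currently indexed for it

    def upsert(self, doc_id, tokens):
        """Insert a new document or apply the term-level diff of an updated one."""
        new_terms = set(tokens)
        old_terms = self.doc_terms.get(doc_id, set())
        for term in new_terms - old_terms:   # terms added by this version
            self.postings[term].add(doc_id)
        for term in old_terms - new_terms:   # terms removed by this version
            self._discard(term, doc_id)
        self.doc_terms[doc_id] = new_terms

    def delete(self, doc_id):
        """Remove a document from every posting list it appears in."""
        for term in self.doc_terms.pop(doc_id, set()):
            self._discard(term, doc_id)

    def _discard(self, term, doc_id):
        ids = self.postings.get(term)
        if ids is None:
            return
        ids.discard(doc_id)
        if not ids:                          # drop empty posting lists
            del self.postings[term]

    def search(self, query_terms):
        """Return ids of documents that contain all query terms."""
        sets = [self.postings.get(term, set()) for term in query_terms]
        if not sets:
            return set()
        return set.intersection(*sets)
```

The cost of an update is proportional to the number of changed terms in the document rather than to the size of the collection, which is the property that makes incremental maintenance attractive for evolving data.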
