An Automatic User Interest Mining Technique for Retrieving Quality Data

Shilpa Sethi, Ashutosh Dixit

Source Title: International Journal of Business Analytics (IJBAN) 4(2)

DOI: 10.4018/IJBAN.2017040104

Abstract

Search engines act as intermediaries between the user and the web. A search engine takes the user's query as input and retrieves pages matching the query terms from its database, which is populated in advance from the World Wide Web. It then applies a ranking algorithm to sort the retrieved pages and returns the results to the user, often running to millions of web pages. However, most pages in the result are not useful to the user. This problem arises because the search engine retrieves results based on query keywords alone and makes no attempt to incorporate the user's interests during ranking. Because there is no automatic mechanism for tracking users' browsing patterns, users seldom find relevant results in the top ten links. To cater to the needs of individual users, this paper proposes an automatic user interest mining technique for retrieving quality data. The mechanism provides satisfactory results because each user's interests are maintained separately, without any effort on the user's part.

1. Introduction

The World Wide Web (WWW) is a large repository of interconnected web documents containing text, images, multimedia, and many other items of information, referred to as information resources (Sethi & Dixit, 2015). Statistics from authoritative websites show that the indexed web contained at least 4.78 billion pages as of 27 March 2016, and many more pages lie in the hidden web. The collection is growing exponentially, at a rate of 25% per year. People use information retrieval tools such as search engines to find information in this huge collection of documents.
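
To make the 25% figure concrete, the derivation below reads it as yearly compounding (an assumption; the growth model is not specified). Here N_0 is the collection size at a reference date, such as the 4.78 billion indexed pages of 27 March 2016, and t is measured in years.

```latex
N(t) = N_0 \,(1.25)^{t}, \qquad
T_2 = \frac{\ln 2}{\ln 1.25} \approx \frac{0.693}{0.223} \approx 3.1 \ \text{years}
```

At that rate the indexed web would double roughly every three years.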

A basic search engine has five main components: the user interface, the crawler (also known as a spider), the indexing module, the query processing module, and the ranking module, as shown in Figure 1 in the Appendix (Mudgil, Sharma, & Gupta, 2013).

When the user submits an information need at the user interface in the form of a set of keywords, referred to as a query, the search engine takes only a few seconds to retrieve web pages and present the result list. This short retrieval time is possible because documents are retrieved from the engine's own local database, which the crawling and indexing modules have built well before any query arrives. The crawler is a program that traverses the web at specified intervals and downloads web documents from different web servers (Sethi & Dixit, 2015). These documents are then parsed to extract text and hyperlinks, which are stored separately in different files. The crawler uses the extracted hyperlinks to download further pages, and the text is stored in a repository. The indexing module takes the text from the repository and constructs an inverted index of the terms in each document (Hao, Guolian, & Lizhu, 2013; Bilimoria & Patel, 2015; Kalra, 2012). The index is essentially a list of terms, each linked to multiple postings; the number of postings for a term equals the number of documents containing that term. Each document posting stores the document ID, the number of incoming links, the number of outgoing links from the document, the document's depth, and the frequency of the term in the document. This list is further linked to a third list recording the exact position of every occurrence of the term in the document. The query processor executes the user query against this inverted index and retrieves the matching documents.
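
The crawl, index, and query steps described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory `WEB` dictionary stands in for real web servers, and the names `crawl`, `Posting`, `build_index`, and `process_query` are hypothetical. Each posting carries the fields listed above (document ID, incoming and outgoing link counts, depth, term frequency) plus the list of term positions.

```python
import re
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Hypothetical in-memory "web" standing in for real servers:
# URL -> (page text, outgoing hyperlinks). Names are illustrative only.
WEB = {
    "a.com": ("java programming language tutorial", ["b.com", "c.com"]),
    "b.com": ("java coffee beans and coffee roasting", ["c.com"]),
    "c.com": ("learn java programming with examples", ["a.com"]),
}


def crawl(seeds, web, max_depth=2):
    """Breadth-first crawl: fetch a page, keep its text, queue its hyperlinks."""
    pages, depth = {}, {}
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, d = queue.popleft()
        if url in depth or url not in web or d > max_depth:
            continue  # already fetched, unreachable, or too deep
        depth[url] = d
        pages[url] = web[url]
        for link in web[url][1]:
            queue.append((link, d + 1))
    return pages, depth


@dataclass
class Posting:
    doc_id: int
    in_links: int
    out_links: int
    depth: int
    freq: int  # frequency of the term in this document
    positions: list = field(default_factory=list)  # offset of every occurrence


def build_index(pages, depth):
    """Inverted index: term -> one Posting per document containing the term."""
    doc_ids = {url: i for i, url in enumerate(sorted(pages))}
    in_links = defaultdict(int)
    for _, links in pages.values():
        for link in set(links):
            if link in pages:
                in_links[link] += 1
    index = defaultdict(list)
    for url, (text, links) in pages.items():
        positions = defaultdict(list)
        for pos, term in enumerate(re.findall(r"\w+", text.lower())):
            positions[term].append(pos)
        for term, offsets in positions.items():
            index[term].append(
                Posting(doc_ids[url], in_links[url], len(links),
                        depth[url], len(offsets), offsets))
    return index, doc_ids


def process_query(index, query):
    """Query processor: IDs of documents containing every query term (AND)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = {p.doc_id for p in index.get(terms[0], [])}
    for term in terms[1:]:
        result &= {p.doc_id for p in index.get(term, [])}
    return result


pages, depth = crawl(["a.com"], WEB)
index, doc_ids = build_index(pages, depth)
print(sorted(process_query(index, "java programming")))  # [0, 2]
```

A Boolean AND keeps the query processor simple; the link counts, depth, and frequency stored in each posting are the inputs a separate ranking module would then consume.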

This set of documents is then sorted by the ranking module using content and link structure mining mechanisms, and the sorted list is finally presented to the user in response to the query. In short, information retrieval is based purely on keyword matching. But users of search engines range from novices to computer specialists and vary widely in their skill at retrieving information from the internet. As a result, the keywords a user enters are sometimes not enough to express the information need clearly, or are too ambiguous to identify a distinct need. Moreover, different users use the same word to seek different information. For example, for the query JAVA, some users may be interested in documents about the Java programming language, whereas others may be looking for Java coffee beans. Yet traditional search engines return the same ranked list to all users, regardless of whether they are interested in the programming language or the coffee. Hence, it is difficult for a novice user to get relevant information.
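
The limitation of pure keyword matching can be seen in a small sketch. The toy collection `DOCS` and the function `keyword_rank` are hypothetical, and summed term frequency stands in for the content-based part of a real ranking module. Because the function takes no user information, every user who types "java" receives the same list.

```python
import re
from collections import Counter

# Hypothetical toy collection; titles and texts are illustrative only.
DOCS = {
    "java-tutorial": "java programming java classes and java objects",
    "java-coffee": "java coffee beans java roast",
    "java-jobs": "java developer jobs",
}


def keyword_rank(query, docs):
    """Rank documents by how often the query terms occur (keywords only)."""
    terms = re.findall(r"\w+", query.lower())
    scores = {}
    for name, text in docs.items():
        counts = Counter(re.findall(r"\w+", text.lower()))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[name] = score
    # Descending score; ties broken by name so the order is deterministic.
    return sorted(scores, key=lambda n: (-scores[n], n))


# No user argument exists, so a programmer and a coffee buyer
# typing "java" receive exactly the same ranked list.
print(keyword_rank("java", DOCS))
# ['java-tutorial', 'java-coffee', 'java-jobs']
```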

To predict such information needs precisely, web usage mining can be considered as a solution. It can be defined as the collection of techniques that analyze users' access patterns with the aim of inferring their search needs. Many algorithms based on explicit user feedback forms, collaborative filtering (Ekstrand, Riedl, & Konstan, 2010), click history (Leung, Ng, & Lee, 2008), session usage (Duhan & Sharma, 2010), and so on have been proposed in the past. However, all of these approaches require some involvement from the user in order to mine user interest. This paper proposes a novel, hassle-free user interest learning mechanism that dynamically evaluates the user's interest factor in different domains; these factors can then be used in the ranking process to sort results according to user expectations.
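
The exact formula behind the interest factor is not given in this introduction, so the following is only a hypothetical sketch of the general idea: an interest factor per domain is learned from an implicitly collected browsing log and then used to re-rank keyword results. The log format, the time-share definition of the factor, the `(1 + interest)` boost, and the names `interest_factors` and `personalized_rank` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical implicit browsing log for one user: (domain of the visited
# page, seconds spent on it). Collected in the background, no feedback form.
LOG = [("programming", 120), ("programming", 300), ("coffee", 30), ("sports", 50)]

# Hypothetical keyword-only results for "java": (doc, domain, keyword score).
RESULTS = [("java-coffee", "coffee", 3),
           ("java-tutorial", "programming", 2),
           ("java-jobs", "programming", 1)]


def interest_factors(log):
    """Assumed interest factor: share of total browsing time spent per domain."""
    time_by_domain = defaultdict(float)
    for domain, seconds in log:
        time_by_domain[domain] += seconds
    total = sum(time_by_domain.values())
    if total == 0:
        return {}
    return {d: t / total for d, t in time_by_domain.items()}


def personalized_rank(results, interests):
    """Re-rank by keyword score * (1 + interest in the document's domain)."""
    def boosted(item):
        _, domain, score = item
        return score * (1 + interests.get(domain, 0.0))
    return [doc for doc, _, _ in sorted(results, key=boosted, reverse=True)]


interests = interest_factors(LOG)  # programming 0.84, sports 0.10, coffee 0.06
print(personalized_rank(RESULTS, interests))
# ['java-tutorial', 'java-coffee', 'java-jobs']
```

Multiplying by `(1 + interest)` keeps the keyword score as the base, so interest alone cannot promote a document that does not match the query, and an empty profile falls back to the plain keyword order (`java-coffee` first here).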

The rest of the paper is structured as follows: Section 2 discusses related work in this area. Section 3 describes the proposed user interest mining system in detail, with illustrative examples. In Section 4, a sample query set is analyzed to verify that user profile information can be used to retrieve relevant pages from the search engine database. Section 5 concludes the paper.
