Self-Adaptive Ontology based Focused Crawler for Social Bookmarking Sites

Aamir Khan, Dilip Kumar Sharma
Copyright: © 2017 |Pages: 17
DOI: 10.4018/IJIRR.2017040104

Abstract

No single user can explore or surf every website pertaining to his or her topic. A user may not get the results he or she expects from a search engine, while another user may already know of a website containing the information the first user's query seeks. Users share such knowledge on a common platform known as Social Bookmarking Sites (SBS). In an SBS, a user posts a question seeking knowledge about a certain topic, and users who know of websites related to that topic post the URLs of those websites. This paper presents a novel method to verify the authenticity and validity of the URLs posted in an SBS. The method's performance is further improved by a dictionary-based learning methodology that finds contextually similar words, which are then added to the ontology.
Article Preview

Introduction

The internet has become an important part of our day-to-day life. In 2014 there were approximately 2.91 billion internet users, and this number is increasing every day. As internet users have grown, so have their requirements, and the size of the World Wide Web (WWW) has grown accordingly: the number of websites rose from a mere 65 million in 1995 to roughly 970 million in 2014. Despite this abundance, a user browses only a small percentage of the websites relevant to him or her, for a number of reasons:

  • 1.

    Low recall of the search engine algorithm/system;

  • 2.

    Websites having synonymous or similar names;

  • 3.

    Websites in different languages; and

  • 4.

    Non-indexed websites (either new or old). Many websites are not indexed by any search engine and reside in the deep web; they have few or no links to indexed websites or web pages, which makes them difficult to access even when they contain information more relevant to the user than the indexed pages (D. K. Sharma & A. K. Sharma, 2011).

That is where Social Bookmarking Sites (SBS) come to our aid. SBS are centralized services that allow users to store and share internet bookmarks, and to annotate, add, edit, and share the bookmarks of web documents. Suppose a user posts a request for a website selling artefacts from mainland China; other active users then supply the information the first user needs or provide links to websites selling the product.

Most traditional crawlers impose heavy communication loads because they use an improper ontology for query processing, as stated in D. K. Sharma and A. K. Sharma (2009). Focused crawlers differ from traditional crawlers in that they apply specific predicates to the crawl frontier, thereby steering and maintaining the hyperlink exploration process. For example, a crawler's main objective might be to extract pages only from the '.ac.in' domain. A focused crawler estimates the probability that a page is relevant to the user's query topic before downloading the webpage, whereas a traditional or standard crawler downloads webpages irrespective of their topic relevance. Focused crawlers extract the thematic words pertaining to the content with the help of dictionary-based or other methods.
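The frontier filtering described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's algorithm: the anchor-text overlap scorer, the `.ac.in` domain predicate, and the relevance threshold are all assumptions chosen to show how a focused crawler can rank candidate links before downloading anything.

```python
import heapq
from urllib.parse import urlparse

def relevance_score(anchor_text: str, topic_terms: set[str]) -> float:
    """Toy relevance estimate: fraction of the topic vocabulary found in
    a link's anchor text, computed before the page is ever fetched."""
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms) if topic_terms else 0.0

def crawl_order(frontier, topic_terms, domain_suffix=None, threshold=0.25):
    """Return candidate URLs in best-first order, dropping links that fail
    the domain predicate or fall below the relevance threshold."""
    heap = []
    for url, anchor in frontier:
        if domain_suffix and not urlparse(url).netloc.endswith(domain_suffix):
            continue  # hard predicate, e.g. restrict the crawl to '.ac.in'
        score = relevance_score(anchor, topic_terms)
        if score >= threshold:
            heapq.heappush(heap, (-score, url))  # max-heap via negated score
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

topic = {"ontology", "focused", "crawler", "bookmarking"}
frontier = [
    ("http://cs.example.ac.in/ontology-crawler", "ontology based focused crawler"),
    ("http://cs.example.ac.in/sports", "cricket scores today"),
    ("http://shop.example.com/crawler", "focused crawler ontology tools"),
]
# Only the relevant '.ac.in' link survives the two filters.
print(crawl_order(frontier, topic, domain_suffix=".ac.in"))
```

A real focused crawler would replace the anchor-text scorer with a trained classifier, but the control flow, score first, fetch only what passes, is the same.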

The focused crawler structure was first proposed by Chakrabarti et al. (1999); its general architecture is shown in Figure 1. Instead of searching the complete web, a focused crawler searches only a specific domain. The search process of a focused crawler may depend on many factors, but these fall broadly into two categories: first, the user's area of interest, and second, the predefined set of topics already held by the search engine. A focused crawler that is specifically topic driven, i.e., one that retrieves only web documents or web pages belonging to a particular topic group, is known as a topic-driven crawler.

Figure 1.

Generalized Framework of a focused crawler


The focused crawling process depends on two components: a classifier and a distiller (Chakrabarti et al., 1999). The classifier calculates the relevance of a retrieved document to the search topic, i.e., this module of the focused crawler separates relevant documents from non-relevant ones. The distiller searches for valuable access points: pages from which a small number of links lead to a large number of suitable documents, i.e., it explores the web graph for the access points that reach the relevant destinations in the fewest hops or the least time.
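The classifier/distiller split can be sketched as below. This is an illustrative sketch under stated assumptions, not Chakrabarti et al.'s exact algorithm: the vocabulary-overlap classifier and the count-of-relevant-outlinks hub score stand in for the paper's learned classifier and hub analysis.

```python
def classify(text: str, topic_terms: set[str], threshold: float = 0.3) -> bool:
    """Classifier stand-in: label a fetched document relevant when enough
    of the topic vocabulary occurs in its text."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms) >= threshold

def distill(link_graph: dict[str, list[str]], relevant: set[str]) -> list[str]:
    """Distiller stand-in: rank pages as hubs by how many relevant
    documents they link to, so a few access points cover many good pages."""
    hub_score = {page: sum(1 for dst in outlinks if dst in relevant)
                 for page, outlinks in link_graph.items()}
    return sorted(hub_score, key=hub_score.get, reverse=True)

graph = {
    "hub1": ["a", "b", "c"],   # links to three relevant documents
    "hub2": ["a", "x"],        # links to one relevant document
    "page": ["x", "y"],        # links to none
}
relevant_docs = {"a", "b", "c"}
print(distill(graph, relevant_docs))  # hub1 ranks first
```

In a full crawler the classifier's labels feed the distiller, and the distiller's top hubs are re-queued on the frontier, closing the loop between the two modules.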
