Named Entity Based Ranking with Term Proximity for XML Retrieval

Abubakar Roko, Shyamala Doraisamy, Azreen Azman, Azrul Hazri Jantan

Source Title: International Journal of Information Retrieval Research (IJIRR) 8(2)

DOI: 10.4018/IJIRR.2018040104

Abstract

In this article, an indexing scheme that includes the named entity category for each indexed term is proposed. Based on this scheme, two methods are proposed: the first infers the semantics of an XML element from its data content, expressed as the confidence value of the element, and the second computes the proximity scores of the query terms. The confidence value of an element is obtained from the probability of a named entity category in the data content of the underlying XML element. The proximity score of the query terms measures the proximity and ordering of the query terms within an XML element. The article then shows how a ranking function uses the confidence value of an XML element and the proximity score to mitigate the impact of higher frequency terms and to compute the relevance between a keyword query and an XML fragment. Finally, a keyword search system is introduced, and experiments show that the proposed system outperforms existing approaches in terms of search quality and achieves higher efficiency.

1. Introduction

In recent years, Extensible Mark-up Language (XML) has become one of the most widely used mark-up languages for information representation and exchange over the Internet. Many documents are now represented and stored as XML documents on the web, so the need for effective and user-friendly XML search systems cannot be overemphasised. There are two fundamental methods for searching XML documents: structured queries and keyword search. Structured queries are composed using query languages such as XQuery and XPath. Although these queries are effective, they generally return an unranked set of results (Cohen et al., 2003; Kim et al., 2009). Keyword queries are generally more user-friendly, since users need not learn a query language or remember the schema of the XML data in order to compose them. However, keyword queries are inherently ambiguous, and users cannot specify their intentions precisely. Keyword search engines therefore inevitably generate a large number of results, and these systems need to return the relevant results early in the result list. This implies that keyword search systems with relevance-oriented ranking functions are needed.
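The contrast between the two query styles can be illustrated with a minimal sketch. It assumes a toy document and uses the limited XPath subset of Python's standard `xml.etree.ElementTree` module; the element names, the data, and the `keyword_hits` helper are hypothetical and are not taken from the article.

```python
import xml.etree.ElementTree as ET

# Toy XML document (hypothetical data, for illustration only).
doc = ET.fromstring(
    "<library>"
    "<book><title>XML Retrieval</title><author>Smith</author></book>"
    "<book><title>Databases</title><author>Jones</author></book>"
    "</library>"
)

# Structured query: the user must know the schema (book/author/title).
# The result is an unranked list of matching elements.
structured_hits = [
    book.findtext("title") for book in doc.findall("./book[author='Smith']")
]

def keyword_hits(root, keyword):
    """Naive keyword search: no schema knowledge needed, but every element
    whose text contains the keyword matches, with no ranking applied."""
    needle = keyword.lower()
    return [el.tag for el in root.iter() if el.text and needle in el.text.lower()]

print(structured_hits)
print(keyword_hits(doc, "xml"))
```

The keyword version is easier to write but returns every textual match in document order, which is why a relevance-oriented ranking function is needed on top of it.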

Several keyword search systems for XML retrieval with different result ranking capabilities have been proposed, among them query structuring systems (Hummel et al., 2011; Li et al., 2010; Petkova et al., 2009; Li et al., 2009). A query structuring system converts a user keyword query into a set of structured queries and selects the best structured query or queries that match the given input query. However, existing query structuring systems either do not consider relevance ranking or use traditional text IR relevance ranking techniques that favour XML fragments with higher term frequencies. For example, Hummel et al. (2011) has no ranking function, while Li et al. (2009) has a ranking scheme that computes the relevance between a keyword query and an XML fragment based on the tightness of the XML elements and a tf-idf score, which favours elements with higher term frequencies. The scheme does not take the semantics of XML documents into account. It therefore returns misleading results, because it cannot recognise irrelevant results that have high term frequencies, which limits its performance.

To address these problems, a ranking function called NEBTOP is first proposed. It builds on the concept of a confidence value, which represents the weight of an XML node with respect to a query keyword and is computed from the data value of the node in question. To compute the confidence value, each keyword in the data value of a node is converted into its corresponding named entity category (NEC), which is one of Person, Organization, or Others. The confidence value of a leaf node with respect to an NEC is the probability of that NEC in the node. Then, a function that computes query term proximity scores is proposed; it rewards a node more highly if the node contains the query terms in the order in which they appear in the query. NEBTOP combines the confidence value and the term proximity score to normalise the impact of higher term frequencies in the existing ranking scheme. Existing approaches lack this boost score and therefore cannot recognise irrelevant results that have high term frequencies. Secondly, an XML Keyword Query Structuring System (XKQSS) is proposed, which uses NEBTOP as its ranking function in order to improve retrieval performance.
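As a rough illustration of the confidence value described above, the sketch below estimates the probability of each NEC in a leaf node as a relative frequency over the node's terms. This is an assumption: the preview does not give the article's exact formula. The toy gazetteer stands in for a real named entity recogniser, and all names in the code are hypothetical.

```python
from collections import Counter

# Hypothetical toy gazetteer; a real system would use a named entity recogniser.
TOY_GAZETTEER = {
    "smith": "Person",
    "jones": "Person",
    "ibm": "Organization",
}

def named_entity_category(term):
    """Map a term to Person, Organization, or Others (the three NECs in the text)."""
    return TOY_GAZETTEER.get(term.lower(), "Others")

def confidence_values(node_text):
    """Assumed estimate: the share of the node's terms that fall in each NEC."""
    terms = node_text.split()
    if not terms:
        return {}
    counts = Counter(named_entity_category(t) for t in terms)
    return {nec: n / len(terms) for nec, n in counts.items()}

# A node whose content is mostly person names gets a high Person confidence.
print(confidence_values("Smith Jones IBM report"))
```

Under this reading, an `author` element filled with person names would receive a Person confidence close to 1, which is the kind of signal a ranking function could use to prefer it for a person-name query even when another element has a higher raw term frequency.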

The contributions of this paper can be summarised as follows:

• An index scheme is proposed which stores the named entity category of each indexed term, in addition to the usual term frequencies and term positions.
• A field-based ranking function (NEBTOP) is proposed which allows the term proximity score and the nodes’ confidence values to be incorporated into the BM25F scoring formula. Specifically, the concept of the confidence value of a node is first introduced, which is the probability of a named entity category in the data content of the node. Then, the classical BM25TP is extended and a new term proximity score for each query term is proposed. This score considers how the query terms appear in the underlying node.
• NEBTOP is included in the XKQSS search system, and an experiment is conducted to compare the effectiveness of the enhanced XKQSS system with some state-of-the-art systems.
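The order-sensitive proximity idea in the second contribution can be sketched as follows. This is not the article's formula, which the preview does not show. It assumes inverse-square distance weighting between adjacent query terms within a small window, in the spirit of BM25TP-style proximity scoring, and adds a hypothetical `order_bonus` multiplier when the terms occur in query order.

```python
def proximity_score(query_terms, node_terms, order_bonus=2.0, window=5):
    """Illustrative score: sum 1/d^2 over occurrences of adjacent query terms
    that lie within `window` positions of each other in the node, multiplied
    by `order_bonus` (an assumed parameter) when they appear in query order."""
    positions = {}
    for i, term in enumerate(node_terms):
        positions.setdefault(term, []).append(i)

    score = 0.0
    for first, second in zip(query_terms, query_terms[1:]):
        for p1 in positions.get(first, []):
            for p2 in positions.get(second, []):
                d = p2 - p1
                if d == 0 or abs(d) > window:
                    continue  # same position or too far apart
                weight = 1.0 / (d * d)
                if d > 0:  # terms occur in the same order as in the query
                    weight *= order_bonus
                score += weight
    return score

query = ["xml", "retrieval"]
print(proximity_score(query, ["xml", "retrieval", "systems"]))  # in query order
print(proximity_score(query, ["retrieval", "xml", "systems"]))  # reversed order
```

With these assumed parameters, a node containing the terms in query order scores higher than one containing them reversed at the same distance, and such a score could be added to a BM25F-style field score as a boost.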
