Semantic Models in Information Retrieval

Edmond Lassalle, Emmanuel Lassalle

Source Title: Next Generation Search Engines: Advanced Models for Information Retrieval

DOI: 10.4018/978-1-4666-0330-1.ch007


Abstract

Robertson and Spärck Jones pioneered experimental probabilistic models (the Binary Independence Model) that combine a typology generalizing the Boolean model, frequency counts used to compute elementary term weights, and the combination of those weights into a global probabilistic estimate. However, this model did not take into account dependencies between indexing terms. An extension to mixture models (e.g., using a 2-Poisson law) made it possible to account for these dependencies from a macroscopic point of view (BM25), as well as to apply shallow linguistic processing of co-references. Newer approaches (language models, for example “bag of words” models, probabilistic dependencies between queries and documents, and consequently Bayesian inference using conjugate Dirichlet priors) furnished new solutions for document structuring (categorization) and for index smoothing. At present, the main issues in these probabilistic models have been addressed only from a formal point of view, so linguistic properties are neglected in the indexing language. The authors examine how linguistic and semantic modeling can be integrated into indexing languages and set up a hybrid model that makes it possible to deal with different information retrieval problems in a unified way.
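The abstract names Okapi BM25 and Dirichlet-prior smoothing without giving their formulas. For orientation only, the Python sketch below implements the textbook forms of both on a hypothetical toy corpus; it is not the authors' hybrid model. Assumptions to note: the idf uses the log(1 + (N - n + 0.5)/(n + 0.5)) variant of the Robertson/Spärck Jones weight, which some implementations adopt to keep weights positive, and k1 = 1.2, b = 0.75 and mu are commonly cited defaults rather than values taken from the chapter (mu is lowered to 10 for the tiny corpus).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # n_t: number of documents that contain term t.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Document-length normalization of the term-frequency saturation.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            # RSJ-style idf; the "1 +" variant keeps the weight positive.
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(s)
    return scores


def dirichlet_log_likelihood(query, docs, mu=2000.0):
    """Query likelihood log p(q|d) with Dirichlet-prior (conjugate) smoothing."""
    collection = Counter(t for d in docs for t in d)
    total = sum(collection.values())
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            p_c = collection[t] / total  # background (collection) language model
            if p_c == 0:
                continue  # skip terms that never occur in the collection
            s += math.log((tf[t] + mu * p_c) / (len(d) + mu))
        scores.append(s)
    return scores


# Hypothetical toy corpus, already tokenized.
docs = [
    "probabilistic model of relevance".split(),
    "language model with dirichlet smoothing".split(),
    "boolean retrieval model".split(),
]
query = "dirichlet smoothing".split()
print(bm25_scores(query, docs))
print(dirichlet_log_likelihood(query, docs, mu=10.0))
```

Only the second document contains the query terms, so it ranks first under both functions. BM25 gives the other two documents a score of exactly zero, whereas the Dirichlet model still assigns them a finite log-probability through the collection model, which is the smoothing effect the abstract refers to.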

Chapter Preview


Introduction: Using Semantics in Information Retrieval

Several tasks in IR are based on the common principle of content matching. This is apparent not only in search engines, which compare a query with the various documents in their database, but also in other major tasks such as document categorization (or clustering) and question answering (QA). For example, in document categorization, content matching is performed between a document and the descriptive sections of a category system, and the document is then assigned to the best-matching category. Another example is producing an abridgment of a document: this can be done by splitting the text into sentences, comparing each of them with the entire text, and keeping the few that produce the best matches. However, these tasks differ in complexity:

• Document categorization is simpler to achieve than clustering.
• Document retrieval is easier than question answering (QA).
• Within the QA task, the simplest systems deal with factual information (the height of a monument, the capital of a country), whereas the most complex state-of-the-art systems try to answer “why” or “how” questions.
• Extracting the answer in QA systems requires a filtering mechanism that selects sentences from the documents, much as automatic abridgment does. However, the difficulty in QA lies in matching a sentence against the question, as opposed to matching a sentence against the document in automatic abridgment. Since a question provides less information than a document, the added difficulty is easy to imagine.
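The shared principle of content matching described above can be illustrated with a single similarity function reused by three tasks: retrieval, categorization, and abridgment. This is a minimal sketch that assumes a plain bag-of-words representation and cosine similarity, both choices made here for illustration rather than the chapter's own model; the function names `bag`, `match`, `search`, `categorize`, and `abridge`, as well as the category descriptions, are hypothetical.

```python
import math
import re
from collections import Counter


def bag(text):
    """Bag-of-words representation: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def match(a, b):
    """Content matching as cosine similarity between two bags of words."""
    dot = sum(count * b[t] for t, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, documents):
    """Retrieval: rank documents by how well they match the query."""
    q = bag(query)
    return sorted(documents, key=lambda d: match(q, bag(d)), reverse=True)


def categorize(document, categories):
    """Categorization: pick the category whose description matches best."""
    d = bag(document)
    return max(categories, key=lambda name: match(d, bag(categories[name])))


def abridge(text, k=1):
    """Abridgment: keep the k sentences that best match the whole text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    whole = bag(text)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: match(bag(sentences[i]), whole), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]  # keep original order


categories = {  # hypothetical category descriptions
    "sports": "football match team goal player",
    "computing": "search engine index query document retrieval",
}
print(categorize("The search engine ranks each document for a query.", categories))
# prints: computing
print(abridge("Search engines match queries to documents. Cats sleep. "
              "Search engines rank documents by match.", k=2))
# prints: ['Search engines match queries to documents.', 'Search engines rank documents by match.']
```

Only the comparison target changes between tasks (a query, a category description, the whole text), which is the kind of reuse a unified model would aim for; what such a shallow matcher cannot do is the deeper semantic analysis the following paragraphs discuss.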

These different tasks are handled by specific implementations that are not reusable from one task to another. If we want to unify the processing of all these tasks in a reusable mechanism, we must define a generic model that covers them all. The difference in complexity between two similar tasks lies in the fact that one requires a deeper “semantic analysis” than the other, as the following comparisons show:

• If we compare clustering with categorization, categorization starts from a precomputed classification nomenclature and predefined comparison criteria for each section. Categorizing a document consists in extracting from it the elements that allow comparison with those criteria. Clustering, on the other hand, must first build a classification nomenclature and compute the comparison characteristics of each section. This is done by analyzing a reference corpus of documents, which amounts to an a priori semantic processing.
• While the QA task must supply one or several concise answers to a composed question, the search engine task returns a list of documents in response to a query, leaving it to the user to view each document and estimate its relevance to her/his query. A QA system may be seen as a search engine coupled with a post-processing system: the question is first treated as a query by the search engine component, and the contents of the documents returned are then parsed to extract salient elements that may be possible answers to the question. The complexity in this case is due to an additional, a posteriori semantic processing.
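The view of a QA system as a search engine followed by a post-processing step can be sketched in a few lines. This is a deliberately naive illustration, assuming term overlap as the only matching criterion; the stopword list, the function names `terms`, `retrieve`, and `answer`, and the document collection are all hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption, not from the chapter).
STOPWORDS = {"the", "a", "an", "of", "is", "what", "in", "to", "it"}


def terms(text):
    """Content terms: lowercase words minus a few stopwords."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def retrieve(question, documents, n=2):
    """Search-engine component: the question is treated as a plain query."""
    q = terms(question)
    return sorted(documents, key=lambda d: len(q & terms(d)), reverse=True)[:n]


def answer(question, documents):
    """Post-processing: split retrieved documents into sentences and return
    the sentence sharing the most terms with the question."""
    q = terms(question)
    candidates = [s for doc in retrieve(question, documents)
                  for s in re.split(r"(?<=[.!?])\s+", doc) if s]
    return max(candidates, key=lambda s: len(q & terms(s)))


docs = [  # hypothetical document collection
    "Paris is the capital of France. It hosts the Eiffel Tower.",
    "The Eiffel Tower is 330 metres high. It was completed in 1889.",
    "Berlin is the capital of Germany.",
]
print(answer("How high is the Eiffel Tower?", docs))
# prints: The Eiffel Tower is 330 metres high.
```

Phrased with “height” instead of “high”, the same question returns “It hosts the Eiffel Tower.”, because surface overlap cannot relate the two word forms. Bridging that kind of gap is the a posteriori semantic processing that makes QA harder than plain retrieval.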

If we could handle the deepest level of semantic analysis, we would obtain a generic model and could then deal with IR problems in a unified way. The difficulty of semantic approaches stems from the high cost of manually building large knowledge bases. For simple semantic tasks, resorting to manual analysis (such as editorial work) or semi-automatic processing (such as statistical analysis of query logs followed by manual revision) can sometimes improve the quality of a search engine. Thus, fully automatic semantic processing is of real interest only if we manage to build large knowledge bases automatically at a reasonable cost.

Exploiting the collaborative knowledge bases available on the Internet is a rather good alternative, at least insofar as experimental implementations have validated it. However, a problem remains when moving to full-scale applications: harmonizing or completing collaborative knowledge bases becomes essential to cover the needs of real applications. Ad hoc methods have been developed to merge databases with heterogeneous formats and contents. Yet, because of the explosion of such resources, automatic methods are needed, which inevitably leads to a classic machine learning problem.
