NLP for Search

Christian F. Hempelmann

Source Title: Applied Natural Language Processing: Identification, Investigation and Resolution

DOI: 10.4018/978-1-60960-741-8.ch004

Abstract

This chapter presents an account of key NLP issues in search, sketches current solutions, and then outlines in detail an approach for deep-meaning representation, ontological semantic technology (OST), for a specific, complex NLP application: a meaning-based search engine. The aim is to provide a general overview of NLP and search, ignoring non-NLP issues and solutions, and to show how OST, as an example of a semantic approach, is implemented for search. OST parses natural language text and transposes it into a representation of its meaning, structured around events and their participants as mentioned in the text and as known from the OST resources. Queries can then be matched to this meaning representation in anticipation of any of the permutations in which they can surface in text. These permutations centrally include overspecification (e.g., not listing all synonyms, which non-semantic search engines require their users to do) and, more importantly, underspecification (as language does in principle). For the latter case, ambiguity can only be reduced by giving the search engine what humans use for disambiguation, namely knowledge of the world as represented in an ontology.

Introduction

This chapter could have been written as an intro to applying standard Information Retrieval (IR) techniques to internet search, as these techniques are the basis for most approaches to search today (“have method, looking for application”). In a nutshell, IR techniques operate by identifying desired keywords or their clusters in a collection of texts and retrieving documents, for example web pages, that match the keywords. But such introductions have been done elsewhere and better than this author could. This chapter could also have been written as a theoretical comparison of IR and Information Extraction (IE) techniques, based on the tenets of research in Artificial Intelligence (AI), which is where NLP contributions to search seem to be headed, as I will argue. To put it simply, IE techniques aim to ‘understand’ text to varying degrees and extract the relevant small bits in relation to the information needs of users (for a generic system, see Hobbs 1993). But such introductions have also been done elsewhere and better than this author could. Instead, this chapter is going to sketch these issues in its introduction, before focusing on one application in this new direction, based on the experience of its author, namely in building and improving a search engine that is based on representation of meaning with the help of linguistic AI and that facilitates IE-style search. As such, this chapter will largely ignore non-linguistic problems and solutions in search.

As in the majority of areas in NLP, text search is largely dominated by statistical approaches. The basic issue for any applied NLP (ANLP) is that the complexity, some call it mess, of natural language needs to be made palatable to the computer: some formal structure has to be discovered in, or imposed on, the unstructured mess of language. This formal representation of language, and hopefully some aspect of its meaning gleaned from its surface structure, can then be used by the computer with any formal algorithm, hopefully one suggested by a theory for a given application, but often just the favored algorithm of a research group in search of new applications.

Such non-linguistic approaches choose to ignore that language is language and operate under the assumption that its surface manifestation, in particular co-occurrence in its surface representation, is a sufficient window on the underlying meaning. After all, meaning is what all approaches are after, because it is the level at which humans interface with each other through language, and the meaning of language does indeed correlate with its surface manifestation, the text, to a large degree. But the degree to which meaning doesn’t surface repeatedly and regularly in natural language text is inaccessible to statistical methods and responsible for there being an ultimate limit to what these methods can achieve. Furthermore, “language events” are very sparse, which can be gleaned from the famous observation that in a large corpus, trigrams are 85% unique (which can be alleviated to some degree through smoothing and extraction). In other words, of the sequences of three words in a text, the large majority does not recur.
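The sparsity of trigrams can be checked on any tokenized text. The following is a minimal sketch, not the measurement behind the 85% figure: the function name `trigram_singleton_rate`, the whitespace tokenization, and the reading of "unique" as "distinct trigrams that occur exactly once" are all assumptions made for illustration.

```python
from collections import Counter


def trigram_singleton_rate(tokens):
    """Return the fraction of distinct trigrams that occur exactly once.

    Assumption: "unique" is read as "occurs only once in the corpus".
    """
    # Build all overlapping three-word sequences.
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


# Toy corpus (hypothetical); a real measurement needs a large corpus.
text = "the cat sat on the mat and the cat sat on the rug"
tokens = text.lower().split()
rate = trigram_singleton_rate(tokens)
print(f"share of trigrams seen only once: {rate:.3f}")
```

Even in this deliberately repetitive toy sentence, most distinct trigrams occur only once; on real text the share of non-recurring trigrams is typically much higher, which is the sparsity problem described above.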

Another approach, actually the rationale of IR in contrast to IE, is to assume that ultimately humans will be the consumers of the application’s output. Under this assumption, human searchers are sufficiently served by documents that the computer deems relevant to their query because of overlap with the query and other relatively easily formalized ranking factors. The humans can then extract from those documents, on their own, the information that fills their information needs; that is, the machine doesn’t have to do semantics, since the human is at the end of the processing chain. In contrast to this, the assumption in the main part of this chapter is that giving the machine semantics to use in matching and ranking will improve its performance and decrease human workload, both common motivations for automation. In sum, on the basis of the unit concept, the computer represents the meaning of documents and fills the information need of the human from a knowledge base, not a document base (cf. Spärck Jones 1990).
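The IR assumption described above, that documents are relevant to the extent that they overlap with the query, can be illustrated with a minimal sketch. This is not the chapter's system and not any particular search engine's ranking function; `tokenize`, `rank_by_overlap`, and the three sample documents are hypothetical names and data chosen for illustration, and word overlap stands in for the "relatively easily formalized ranking factors".

```python
def tokenize(text):
    """Lowercase and split on whitespace; returns a set of surface words."""
    return set(text.lower().split())


def rank_by_overlap(query, documents):
    """Rank document ids by how many query words they contain.

    Documents sharing no word with the query are not returned.
    Ties are broken by document id to keep the output deterministic.
    """
    query_words = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_words & tokenize(text))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical document collection.
documents = {
    "d1": "the bank raised interest rates",
    "d2": "we walked along the river bank",
    "d3": "interest in river cruises is growing",
}

ranking = rank_by_overlap("river bank", documents)
print(ranking)
```

The sketch returns the financial document `d1` for the query "river bank" because it only sees the surface word "bank", and it would miss a document that says "shore" instead. Sorting out which results actually answer the information need is left to the human reader, which is exactly the division of labor that the semantic approach in this chapter aims to shift toward the machine.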
