A Cognitive-Based Approach to Identify Topics in Text Using the Web as a Knowledge Source

Louis Massey, Wilson Wong

Source Title: Ontology Learning and Knowledge Discovery Using the Web: Challenges and Recent Advances

DOI: 10.4018/978-1-60960-625-1.ch004

Abstract

This chapter explores the problem of topic identification from text. It first argues that the conventional representation of text as bag-of-words vectors will have limited success in capturing the underlying meaning of text until two more fundamental issues are addressed: the assumption of feature independence in vector space and the ambiguity of natural language. Next, it proposes a novel approach to text representation and topic identification that departs radically from current techniques for document classification, text clustering, and concept discovery. The approach is inspired by human cognition: ‘meaning’ emerges naturally from the activation and decay of unstructured text information retrieved from the Web. This paradigm shift makes it possible to exploit, rather than avoid, the dependence between terms to derive meaning, without the complexity introduced by conventional natural language processing techniques. Using the unstructured text of Web pages as a knowledge source avoids the laborious handcrafting of formal knowledge bases and ontologies that many existing techniques require. Results from some initial experiments are presented in this chapter to illustrate the potential of this new approach.
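To make the feature-independence argument concrete, the following toy Python sketch computes cosine similarity between bag-of-words vectors. The sentences are invented examples, and tokenization is assumed to be plain lowercase whitespace splitting (real systems usually add stemming and stop-word removal). Because every distinct term is its own independent axis, two sentences with related meaning but no shared words score exactly 0, while two sentences that share only the ambiguous word "bank" still score above 0.

```python
from collections import Counter
import math


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count terms (no stemming or stop words)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity where each distinct term is an independent, orthogonal axis."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Related meaning, no shared terms: the vectors are orthogonal.
d1 = bag_of_words("physician treats patient")
d2 = bag_of_words("doctor cures sick person")

# Unrelated meaning, one shared ambiguous term ("bank").
d3 = bag_of_words("bank raises interest rate")
d4 = bag_of_words("river bank erodes")

print(cosine(d1, d2))  # 0.0
print(cosine(d3, d4))  # about 0.289
```

The synonym pair is invisible to the vector space, and the homonym pair is rewarded for a surface match; this is the combination of independence and ambiguity that the abstract identifies as the core limitation.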

Chapter Preview

Introduction

It has become somewhat of a cliché to say that a large quantity of human knowledge is stored as unstructured electronic text. This cliché is nevertheless an accurate description of the reality in corporations, governments, and even our everyday lives. Indeed, we are plagued by an increasing dependence on an ever-growing body of information on the Web. Common means to date for managing this information explosion include online directories and automated search engines, both of which rely heavily on the notion of topics. In this chapter, topics are keywords that represent and convey the themes or concepts addressed in a text document. In this sense, topics can be seen as lexical manifestations of the general meaning of documents.

The main issues that prevent existing computational methods from generating content-representative topics for managing information at Web scale are: (1) computational inefficiency; (2) the knowledge acquisition and training data bottleneck; and (3) the inherent challenges of processing natural language, such as handling ambiguity and metaphor. Existing computational methods bridge the semantic gap between raw text and its meaning by exploiting knowledge handcrafted by human experts. Natural language processing, for example, depends on linguistic and encyclopedic knowledge for syntactic and semantic processing, while supervised learning techniques rely on human guidance to classify documents. This acquisition bottleneck in turn leads to major scalability and robustness issues. Ideally, one would like a computational method that can identify topics without depending on any form of human intervention. The desirable properties of such a system are therefore autonomy and adaptability.

In this chapter, we present a computational method that is free of any dependence on expert-crafted knowledge resources or training data. This cognition-inspired paradigm for generating topics takes the stream of words from a single document and determines the main themes addressed in that document from the overlapping activation and decay of unstructured lexical information. This lexical information is retrieved from the Web by querying Web search engines. The approach exploits the information embedded in word order, but without traditional syntactic processing.
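As a rough illustration of how overlapping activation and decay could turn a word stream into topic candidates, here is a toy Python sketch. It is not the chapter's model. The `ASSOCIATIONS` table is a hypothetical, hard-coded stand-in for the lexical information the chapter retrieves from Web search engines, and the geometric decay factor, the activation boost, and the `top_k` cutoff are assumed parameters chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for Web-derived lexical information. In the chapter,
# this information comes from querying Web search engines; here it is hard-coded.
ASSOCIATIONS: dict[str, list[str]] = {
    "goalkeeper": ["soccer", "football", "match"],
    "penalty": ["soccer", "foul", "law"],
    "striker": ["soccer", "football", "strike"],
    "referee": ["soccer", "match", "foul"],
}


def identify_topics(words, decay=0.8, boost=1.0, top_k=2):
    """Read words in order. At each step, all current activations decay
    geometrically, then the word and its associated terms are activated."""
    activation: dict[str, float] = defaultdict(float)
    for word in words:
        # Older evidence fades: scale every existing activation by `decay`.
        for term in activation:
            activation[term] *= decay
        # Activate the word and its associations; activations that overlap
        # across different words accumulate on the shared terms.
        for term in [word, *ASSOCIATIONS.get(word, [])]:
            activation[term] += boost
    # Rank by activation (highest first), breaking ties alphabetically.
    ranked = sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


print(identify_topics(["goalkeeper", "penalty", "striker", "referee"]))
# ['soccer', 'foul']
print(identify_topics(["referee", "striker", "penalty", "goalkeeper"]))
# ['soccer', 'football']
```

"soccer" wins in both runs because every word reinforces it, even though it never appears in the input. Reversing the word order changes the runner-up, because decay favors recently reinforced terms. This is one simple way word order can influence the result without any syntactic parsing.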

The chapter is organized as follows. Section 2 presents a case study that illustrates some of the problems with existing topic identification methods and with vector representations of documents. In Section 3, we introduce the fundamentals of the proposed approach to representing text and identifying topics in documents. We then present and discuss the results obtained using a prototype computational model in Section 4. We conclude the chapter with an outlook on future work in Section 5.