
Semantic Document Networks to Support Concept Retrieval

Simon Boese, Torsten Reiners, Lincoln C. Wood

Source Title: Encyclopedia of Business Analytics and Optimization

DOI: 10.4018/978-1-4666-5202-6.ch192


Abstract

Many disciplines produce unstructured documents that must be pre-processed before they can be integrated into and used by IT systems. The predominance of the Internet and large corporate databases means that large volumes of documents need to be analysed and searched to retrieve information, particularly within the fields of machine translation, text analysis, semantic mining, and information extraction and retrieval. We explicate a framework based on concept-based indexing that supports the analysis, storage, and retrieval of documents. Natural-language reduction is used to calculate semantic cores for concept-based indexing of the concepts found within documents. The processed documents are stored within a semantic network, enabling effective analysis of core concepts within documents and rapid retrieval of specific ideas from multiple documents based on provided concepts.

Chapter Preview

Introduction

This chapter focuses on a framework to support advanced document storage and fast queries that retrieve documents based on concept-focused searches. These are ‘semantic’ searches, which evaluate and use the meanings of words and phrases, rather than ‘key-word’ searches. The framework rests on three stages: pre-processing (semantic analysis, which influences the storage quality within a semantic database), conceptualization (extraction of key concepts to establish document networks), and storage within a semantic database, which facilitates advanced future retrieval. The objective is to decompose documents and extract all relevant information about structure and content to allow comprehensive storage in a semantic document network, including the interpretation according to domains, contexts, languages, or readers. For example, the word ‘trunk’ may refer to a storage area (in the context of motor vehicles), a clothes storage box (in the context of travelling), or an elephant’s appendage (in the context of a safari); see Figure 1. The arrows in the figure represent parameters associated with relations. The related words can themselves have multiple meanings (e.g., Safari is also the name of an Internet browser), so it is only the clustering of words that provides the context that gives readers the meaning.

Figure 1. Evaluation of the meaning of 'trunk' based on the context. This supports semantic-based retrieval of documents rather than merely keyword-based retrieval [Source: Boese, Reiners, and Wood (2012, p. 5)].
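
The 'trunk' example can be illustrated with a minimal Python sketch that picks a sense by counting how many context words from the surrounding text cluster around each candidate meaning. This is a simple overlap heuristic chosen for illustration, not the method used in the chapter; the sense labels, the cue-word lists in SENSES, and the function names are all assumptions introduced here.

```python
import re

# Hypothetical sense inventory for 'trunk'. Each sense lists context words
# that tend to cluster around it; the word lists are illustrative only.
SENSES = {
    "car storage area": {"car", "vehicle", "repair", "engine", "tyre"},
    "travel storage box": {"travel", "luggage", "clothes", "packing", "journey"},
    "elephant appendage": {"elephant", "safari", "animal", "water", "tusk"},
}


def tokenize(text):
    """Return the set of lower-case alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(text, senses=SENSES):
    """Score each sense by the number of its cue words found in the text.

    Returns (best_sense, scores); best_sense is None when no cue word
    from any sense occurs in the text.
    """
    words = tokenize(text)
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


sense, scores = disambiguate("The elephant sprayed water from its trunk on safari")
print(sense)  # elephant appendage
```

A single cue word such as 'safari' could equally point to the Internet browser, which is why the sketch scores the whole cluster of context words rather than relying on any one of them.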

A brief introduction to conceptualization and the semantic document network provides an overview of how information can be stored in an interlinked network. Using a short sample, we demonstrate the calculation of the semantic core using concept-based indexing and how the concepts are embedded within the existing semantic document network.

Background

Organizations are facing increasingly significant document management challenges as they seek to leverage vast volumes of internally-focused documents (e.g., emails or internal reports) or provide document-based services to others. The challenge is to design document management systems that support the storage and retrieval of unstructured electronic documents; in contrast, there are well-established document management methods for structured documents, such as those used by libraries. Limited meta-information (particularly key terms) has historically been used to support simple indexing and classification procedures. However, the rise of user-generated content within Web 2.0 and the ongoing accumulation of digitized documents have made it a challenge to maintain, let alone increase, retrieval quality. Improved search engine capabilities enable users to consider synonyms, stem forms, and even translations (He & Wang, 2009). However, these elements all require a search request that is based on words within the document, while ignoring the meaning and context in which these words occur; that is, they ignore the semantic meaning behind the text. Semantic analysis can support the search by determining the key concepts and scenarios that may be associated with a term; e.g., the word ‘trunk’ may be used with a different meaning in documents about car repair, travel accessories, or safari reports. As the Web progresses and evolves, we anticipate that computers will continue to process information on increasingly high levels, and will soon enable search and retrieval of documents based on the meaning of words, rather than just the occurrence of words.

The underlying systems that support this process would also enable other applications for handling documents, allowing software agents to extract individualised information from databases, grade unstructured exams with minimal instructor setup, summarise correspondence or articles, and translate documents effectively. In all of these cases, the ability to understand natural, unstructured language is crucial to ensure the robustness and reliability of the results.

Key Terms in this Chapter

Semantic Document Network: A network that contains the semantic representation of the content of documents but not their textual content. The connections between nodes are formed by the intersection between the content of the documents and represent the overlap of their semantic content.

Concept Retrieval: The ability to query a document and extract particular segments of text that match concepts or ideas provided by a user.

Semantic Network: A network in which nodes, encapsulating data and information, are connected by edges that include information about how these nodes are related to one another.

Concept: One or multiple words associated with a category that was generated by abstracting the common characteristics from a range of particular ideas while removing the uncommon characteristics. The remaining common characteristic is that which is shared by all of the different individuals and represents the meaning, or sense, of the ideas.

Text Analysis: The process of deriving meaningful information from the data and ideas expressed within a document. It includes meta-information, structural information, and content information.

Semantic Core: The document-specific component of the semantic network that contains the ideas and concepts that best represent the meaning of the document, rather than the best-matching words.

Concept-Based Indexing (CBI): A method for indexing that differs from text-based indexing (which uses keywords or headings); CBI instead uses descriptions, ideas, and concepts to index documents.

Semantic Analysis: The elicitation of knowledge from documents, accounting for context and understanding. The units that are extracted are arranged and grouped within meaningful categories.
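
To make the terms above more concrete, the following minimal Python sketch shows one way concept-based indexing and a semantic document network could fit together. It is an illustration under stated assumptions, not the chapter's implementation: the word-to-concept lexicon CONCEPT_LEXICON, the class SemanticDocumentNetwork, and all method names are hypothetical, and a plain lexicon lookup stands in for the natural-language reduction the authors describe.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical lexicon mapping surface words to concepts (illustrative only).
CONCEPT_LEXICON = {
    "car": "vehicle", "truck": "vehicle", "automobile": "vehicle",
    "repair": "maintenance", "fix": "maintenance",
    "elephant": "wildlife", "lion": "wildlife",
    "safari": "travel", "journey": "travel", "luggage": "travel",
}


def extract_concepts(text):
    """Map the words of a text to the set of concepts they express."""
    words = re.findall(r"[a-z]+", text.lower())
    return {CONCEPT_LEXICON[w] for w in words if w in CONCEPT_LEXICON}


class SemanticDocumentNetwork:
    """Stores only the concepts of each document, not its text."""

    def __init__(self):
        self.concepts_by_doc = {}                 # document id -> concepts
        self.docs_by_concept = defaultdict(set)   # concept -> document ids

    def add(self, doc_id, text):
        """Index a document by its concepts; the text itself is discarded."""
        concepts = extract_concepts(text)
        self.concepts_by_doc[doc_id] = concepts
        for concept in concepts:
            self.docs_by_concept[concept].add(doc_id)

    def edges(self):
        """Connect document pairs that share concepts, labelled by the overlap."""
        result = {}
        for a, b in combinations(sorted(self.concepts_by_doc), 2):
            shared = self.concepts_by_doc[a] & self.concepts_by_doc[b]
            if shared:
                result[(a, b)] = shared
        return result

    def retrieve(self, *concepts):
        """Return the ids of documents that contain every requested concept."""
        sets = [self.docs_by_concept.get(c, set()) for c in concepts]
        return set.intersection(*sets) if sets else set()


net = SemanticDocumentNetwork()
net.add("d1", "How to fix a truck")
net.add("d2", "Car repair manual")
net.add("d3", "An elephant seen on safari")
net.add("d4", "Luggage for a long car journey")

print(sorted(net.retrieve("vehicle")))            # ['d1', 'd2', 'd4']
print(sorted(net.retrieve("vehicle", "travel")))  # ['d4']
```

Documents d1 and d2 share no keywords, yet both are retrieved for the concept 'vehicle' and are linked by an edge labelled with their shared concepts, which is the behaviour that distinguishes concept-based indexing from keyword matching.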
