Ontology Based Feature Extraction From Text Documents

Abirami A.M, Askarunisa A., Shiva Shankari R A, Revathy R.

Source Title: Applications of Security, Mobile, Analytic, and Cloud (SMAC) Technologies for Effective Information Processing and Management

DOI: 10.4018/978-1-5225-4044-1.ch009

Abstract

This chapter argues that semantic annotation is the most important requirement for categorizing labeled or unlabeled text documents. Categorization accuracy can be greatly improved if documents are indexed or modeled by their semantics rather than by the traditional term-frequency model. Such annotation has its own challenges in document categorization, notably synonymy and polysemy. The proposed model builds a domain ontology for the textual content so that synonymy and polysemy in text analysis are resolved to a greater extent. Latent Dirichlet Allocation (LDA), a topic modeling technique, is used to extract features from the documents. Using domain knowledge of the concepts together with the features grouped by LDA, the domain ontology is built hierarchically. Empirical results show that LDA is a better feature extraction technique for text documents than TF or TF-IDF indexing. The proposed model also improves the accuracy of document categorization when the domain ontology built using LDA is used for document indexing.
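
For reference, the two baseline indexing models the abstract compares against, TF and TF-IDF, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a naive whitespace tokenizer, raw counts for TF, and one common IDF variant, log(N / df); the chapter's exact weighting scheme is not specified here, and the toy corpus is invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption); a real pipeline would
    # also strip punctuation, drop stop words, and possibly stem.
    return text.lower().split()


def term_frequency(docs):
    # TF model: raw count of each term in each tokenized document.
    return [Counter(doc) for doc in docs]


def tf_idf(docs):
    # TF-IDF, one common variant: tf(t, d) * log(N / df(t)).
    # A term that occurs in every document gets weight 0.
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        for counts in term_frequency(docs)
    ]


# Hypothetical toy corpus of product comments.
CORPUS = [
    "the phone battery lasts long",
    "the mobile battery drains fast",
    "the laptop screen is bright",
]

if __name__ == "__main__":
    docs = [tokenize(s) for s in CORPUS]
    for weights in tf_idf(docs):
        print({t: round(w, 3) for t, w in weights.items()})
```

Under both models, "phone" and "mobile" become separate, unrelated dimensions. This is the synonymy problem that the abstract says a domain ontology is meant to resolve.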

Introduction

The need to annotate text documents has grown with the large number of documents on the World Wide Web that must be analyzed. Most of these documents are unstructured, so machines cannot process them directly. People who buy or sell products post comments, feedback, requests for additional features, and so on as text that is mostly unstructured. This voluminous text must be categorized to build business intelligence solutions: the huge amount of data available on the Internet has to be modeled and analyzed before decisions can be made. Retrieving information from unstructured text is a difficult task. Annotating documents with added semantics allows information and knowledge to be extracted from the repository intelligently.

Feature extraction is the process of starting from an initial set of measured data and building derived features that are intended to be informative and non-redundant. It reduces the resources required to represent a large set of data. Many algorithms are used to identify features in textual data, a task that requires grouping or classifying entities by their shared properties.
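
LDA, which the abstract names as the chapter's feature extractor, fits this description: it groups co-occurring words into topics and reduces each document from a vocabulary-sized term vector to a short vector of topic proportions. The sketch below assumes collapsed Gibbs sampling, one standard way to fit LDA; the chapter may use a different inference method or library, and the hyperparameters (alpha, beta, iters) and the toy corpus are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, theta, phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]      # tokens per (document, topic)
    nkw = [[0] * V for _ in topics]           # tokens per (topic, word)
    nk = [0] * n_topics                       # tokens per topic

    def bump(d, k, v, delta):
        # Add (delta=+1) or remove (delta=-1) one token from the count tables.
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Start from a random topic assignment z[d][i] for every token.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            bump(d, z[d][i], wid[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = wid[w]
                bump(d, z[d][i], v, -1)  # exclude the current token
                # Full conditional, proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + V * beta)
                           for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                bump(d, z[d][i], v, +1)

    # Point estimates: document-topic (theta) and topic-word (phi) distributions.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in topics]
    return vocab, theta, phi


def top_words(vocab, phi, n=3):
    # The n most probable words per topic: candidate features for ontology concepts.
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


# Hypothetical toy corpus, already tokenized.
CORPUS = [
    "battery charge power battery".split(),
    "screen display pixel screen".split(),
    "battery power charge".split(),
    "display pixel screen".split(),
]

if __name__ == "__main__":
    vocab, theta, phi = lda_gibbs(CORPUS, n_topics=2)
    print(top_words(vocab, phi))
```

The top words of each topic are the kind of grouped features that, according to the abstract, are combined with domain knowledge to build the ontology hierarchy. On a corpus this small the topics are not guaranteed to separate cleanly.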

Traditional feature extraction methods face several problems: (i) existing techniques cannot keep pace with the current size and growth rate of the Web, so automated techniques are essential for practical and scalable solutions; (ii) feature search processes lack semantic relations between concepts; (iii) the classification of feature reviews into finer degrees of polarity is imperfect; and (iv) textual features are misinterpreted for lack of prior knowledge.

An ontology is a set of concepts and categories in a subject area or domain, together with their properties and the relations between them. A domain-specific ontology represents the particular meanings that terms have in that domain. Semantic web technologies such as ontologies and RDF can be used to model textual data so that domain vocabularies and their relationships are represented explicitly. The analysis must ensure that the context intended by the writer matches the context understood by the reader. These challenges can be handled well by representing the domain's vocabularies and the relationships between its concepts. Ontology-based information extraction uses ontologies and their specifications to “drive” or inform the information extraction process: the terms and concepts in the source ontology form the basis for term matching when tagging text documents. The difficulties of feature extraction can be overcome if text documents are modeled with an ontology representation combined with topic modeling techniques. The objective of the proposed work is to build a domain ontology for a set of documents from relevant features extracted from those documents.
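
Term matching against a hierarchical ontology can be illustrated with a small sketch. It assumes a hand-written toy ontology (the concept names Phone, Hardware, Battery, Display, and Price are hypothetical) and a hypothetical overlap-based heuristic for polysemous terms; the chapter instead builds its ontology from LDA features plus domain knowledge, which this sketch does not reproduce.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    name: str
    parent: Optional[str] = None             # None marks the root concept
    terms: set = field(default_factory=set)  # surface forms (synonyms) of the concept


def build_term_index(ontology):
    # Map each surface form to every concept that lists it; more than one
    # concept means the term is polysemous within this ontology.
    index = defaultdict(set)
    for concept in ontology.values():
        for term in concept.terms:
            index[term].add(concept.name)
    return index


def index_document(tokens, ontology):
    """Index a tokenized document by concept counts instead of term counts."""
    index = build_term_index(ontology)
    present = set(tokens)
    counts = Counter()
    for tok in tokens:
        candidates = index.get(tok)
        if not candidates:
            continue  # out-of-ontology term: ignored in this sketch
        # Hypothetical disambiguation: choose the candidate concept whose other
        # terms overlap most with the document (ties broken alphabetically).
        node = max(sorted(candidates),
                   key=lambda name: len((ontology[name].terms - {tok}) & present))
        # Count the concept and every ancestor, so synonyms and sibling concepts
        # also contribute to their shared parent in the hierarchy.
        while node is not None:
            counts[node] += 1
            node = ontology[node].parent
    return counts


# Hypothetical toy domain ontology; "charge" is deliberately polysemous.
ONTOLOGY = {c.name: c for c in [
    Concept("Phone"),
    Concept("Hardware", "Phone"),
    Concept("Battery", "Hardware", {"battery", "charge", "power"}),
    Concept("Display", "Hardware", {"screen", "display"}),
    Concept("Price", "Phone", {"price", "cost", "charge"}),
]}

if __name__ == "__main__":
    print(index_document("the battery charge lasts".split(), ONTOLOGY))
    print(index_document("the charge and price are high".split(), ONTOLOGY))
```

Indexing by concept counts means that "screen" and "display" add to the same Display dimension, and each count also rolls up to Hardware and Phone. The overlap heuristic is deliberately crude; a real system would need richer context to disambiguate polysemous terms.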
