A Novel Approach for Ontology-Based Dimensionality Reduction for Web Text Document Classification

Mohamed K. Elhadad (Military Technical College, Cairo, Egypt), Khaled M. Badran (Military Technical College, Cairo, Egypt) and Gouda I. Salama (Military Technical College, Cairo, Egypt)
Copyright: © 2017 | Pages: 15
DOI: 10.4018/IJSI.2017100104

Abstract

Dimensionality reduction plays a vital role in enhancing text processing capabilities; it aims to reduce the size of the feature vector used in mining tasks (classification, clustering, etc.). This paper proposes an efficient approach for reducing the size of the feature vector in the web text document classification process. The approach uses the WordNet ontology, exploiting its hierarchical structure, to eliminate from the generated feature vector those words that have no relation to any of the WordNet lexical categories; this reduces the feature vector size without losing information about the text. For mining tasks, the Vector Space Model (VSM) is used to represent text documents, and Term Frequency-Inverse Document Frequency (TFIDF) is used as the term weighting method. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach in several experiments. The experimental results show that the authors' proposed approach outperforms the traditional approaches it was compared with, achieving better classification accuracy, F-measure, precision, and recall.
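The general idea of lexicon-based filtering followed by TFIDF weighting can be illustrated with a minimal sketch. It is not the authors' implementation. The names `TOY_LEXICON`, `tokenize`, `filter_by_lexicon`, and `tfidf_vectors` are hypothetical. The small hand-made word set stands in for a real WordNet lookup, and the weighting (raw term count times log(N/df)) is only one common TFIDF variant, assumed here because the abstract does not specify one.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: a tiny hand-made set of words
# assumed to belong to some lexical category. A real system would query
# WordNet instead of this set.
TOY_LEXICON = {"network", "router", "packet", "football", "match", "goal"}


def tokenize(text):
    """Lowercase the text and keep whitespace-separated alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def filter_by_lexicon(tokens, lexicon):
    """Drop every token that is not found in the lexicon."""
    return [t for t in tokens if t in lexicon]


def tfidf_vectors(docs_tokens):
    """Build a Vector Space Model: one TFIDF-weighted vector per document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    vocab = sorted({t for doc in docs_tokens for t in doc})
    n_docs = len(docs_tokens)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs_tokens for t in set(doc))
    vectors = []
    for doc in docs_tokens:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


docs = [
    "the router forwards each packet across the network",
    "the football match ended with a late goal xyzzy",
]

raw = [tokenize(d) for d in docs]
filtered = [filter_by_lexicon(toks, TOY_LEXICON) for toks in raw]

raw_vocab, _ = tfidf_vectors(raw)
vocab, vectors = tfidf_vectors(filtered)

# Compare the feature vector size before and after filtering.
print(len(raw_vocab), "->", len(vocab))
```

In this toy setting the filter removes function words and the out-of-lexicon token before weighting, so each document vector has one entry per retained term. Whether such filtering preserves classification quality on real data is the question the paper's experiments address; this sketch does not test it.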
Article Preview

In this section, we briefly review background research, including dimensionality reduction applied to document datasets and previous attempts to apply semantic knowledge to enhance classification accuracy.

In (Kaur, 2016), (Xie, 2016), and (Said, 2007), a full survey and a comparative study of dimensionality reduction techniques for the classification of text documents were introduced. These works concentrate on the filter approach to dimensionality reduction (DR) and propose DR techniques that improve classification accuracy while reducing the feature set size.
