In this section, we briefly review background research on three topics: extracting feature vectors from text documents, techniques previously applied to web text document classification, and earlier attempts to use semantic knowledge to improve classification accuracy.
The works in (Rasane, 2016; Uma, 2016; Venkata Sailaja, 2016) give a full review of current trends in text document classification, introduce classification algorithms, and describe the techniques used to extract feature vectors for different mining tasks. In addition, (Said, 2007) presents a comparative study of dimensionality reduction (DR) techniques that helps users make informed choices among the available techniques for enhancing automatic text categorization.
In (Davy, 2007), Principal Component Analysis (PCA) is used as an efficient dimensionality reduction technique for text document classification. The study applies two dimensionality reduction techniques, Document Frequency performed globally and PCA, with a KNN classification algorithm over two benchmark corpora (a subset of 20 Newsgroups and a subset of Reuters-21578). The experimental results show that dimensionality reduction significantly improves classification performance, and in both sets of experiments PCA outperformed Document Frequency performed globally.
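To make the simpler of the two techniques concrete, the following is a minimal sketch of global document-frequency feature selection followed by KNN classification. It is not the setup used in (Davy, 2007): the toy corpus, the labels, the `min_df` threshold, the cosine similarity measure, and all function names are assumptions chosen for illustration. PCA is not shown; it would instead project the term vectors onto the directions of greatest variance.

```python
from collections import Counter
import math


def tokenize(text):
    # Naive whitespace tokenizer; real pipelines would also strip punctuation.
    return text.lower().split()


def select_by_document_frequency(docs, min_df=2):
    """Keep terms that appear in at least `min_df` documents of the whole corpus."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return sorted(term for term, n in df.items() if n >= min_df)


def vectorize(doc, vocab):
    """Term-count vector restricted to the selected vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
    "the team won the match",
    "the match ended and the team lost",
]
train_labels = ["finance", "finance", "sport", "sport"]

vocab = select_by_document_frequency(train_docs, min_df=2)
train_vecs = [vectorize(doc, vocab) for doc in train_docs]
query_vec = vectorize("stocks and markets today", vocab)
prediction = knn_predict(query_vec, train_vecs, train_labels, k=3)
print(vocab, prediction)
```

In this sketch the threshold reduces the vocabulary from fourteen distinct terms to six. Note that common function words such as "the" survive a pure document-frequency filter, which is one reason such a filter is usually combined with stop-word removal.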