Optimization of Text Feature Selection Process Based on Advanced Searching for News Classification

Khin Sandar Kyaw, Somchai Limsiroratana
Copyright: © 2020 |Pages: 23
DOI: 10.4018/IJSIR.2020100101

Abstract

Nowadays, the way people access news around the world has shifted from paper to electronic formats, and the rate at which newspapers and magazines are published on websites has increased dramatically. As a result, selecting features from a high-dimensional text feature set has become a major challenge for automatic news classification models, because irrelevant features can degrade accuracy and increase computation time. In this article, six advanced search policies based on evolutionary, swarm, and nature-inspired intelligence are examined for finding the globally optimal feature subset that yields optimal accuracy in the news classification problem. According to the experimental results, the advanced search schemes provide flexibility in integrating a classifier in accordance with its objective function, such as optimal classification performance, by adjusting the rate of modification parameters for the testing data.

1. Introduction

In the age of information technology, the development of automatic news classification models has become a hot research topic. Meanwhile, feature selection for high-dimensional feature sets has become a significant issue in several disciplines, such as text mining, pattern recognition, and knowledge discovery. To address it, the randomized feature-searching capability of advanced search should be applied to select representative features from the feature hypothesis space. The purpose of feature selection is to remove irrelevant and/or redundant features that can hurt classification performance. The search policy should therefore be optimized with an intelligence-based search approach rather than a biased search policy such as hill-climbing or exhaustive search. To develop such a search model, meta-heuristic algorithms can be applied to the search process in feature selection, because they support various forms of natural intelligence, such as decentralizing the task across local and global optima with a random search policy.
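
To make the idea of a randomized, population-based search over feature subsets concrete, the following is a minimal sketch in Python. It is not one of the six search policies evaluated in the article. It is a generic evolutionary search (truncation selection, one-point crossover, bit-flip mutation) over binary feature masks, and everything in it is an assumption made for illustration: the toy data, the nearest-centroid classifier, the per-feature penalty of 0.01, and the names `make_toy_data`, `accuracy`, `fitness`, and `evolve`.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def make_toy_data(n_per_class=30, n_noise=4):
    """Hypothetical stand-in for term-weight vectors: 2 informative columns, then noise."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * (1 - label), 1.0)]
            noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y

def accuracy(train, test, mask):
    """Nearest-centroid accuracy using only the features switched on in mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in {t for _, t in train}:
        rows = [x for x, t in train if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        def sq_dist(label):
            return sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[label]))
        return min(centroids, key=sq_dist)

    return sum(predict(x) == t for x, t in test) / len(test)

X, y = make_toy_data()
samples = list(zip(X, y))
rng.shuffle(samples)
train, test = samples[:40], samples[40:]
n_features = len(X[0])

def fitness(mask):
    # Wrapper-style objective (assumed): held-out accuracy minus a small cost per kept feature.
    return accuracy(train, test, mask) - 0.01 * sum(mask)

def evolve(pop_size=20, generations=30, mutation_rate=0.1):
    """Evolve binary feature masks and return the best mask seen."""
    population = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Truncation selection: the better half of the population become parents.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        population = []
        while len(population) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit for bit in child]
            population.append(child)
        best = max(population + [best], key=fitness)     # keep the best mask so far
    return best

best_mask = evolve()
print("selected columns:", [j for j, keep in enumerate(best_mask) if keep])
print("fitness:", round(fitness(best_mask), 3))
```

Unlike hill-climbing, which moves from a single candidate to a neighbouring one, this kind of search keeps many candidate subsets at once and recombines them at random, which is one way to reduce the risk of stalling in a local optimum. The `mutation_rate` argument plays a role loosely comparable to a modification-rate parameter, although the article's own parameters may be defined differently.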

In the proposed model, useful knowledge is discovered through a data mining framework that includes data preprocessing, feature engineering, and training and testing of the classification model. The main purpose of this research is to describe the artificial intelligence (AI) community's understanding of the feature selection process when it is combined with advanced searching capability. To this end, we reviewed contemporary solutions that differ in search policy but share the same optimization objective: facilitating feature searching for the discovery of optimal features in the feature engineering process. Although the central objective of this paper is to show the ability of meta-heuristic search in a complex, multi-dimensional feature space, several diverse characteristics of feature selection and of the calculation schemes for measuring text features are also investigated:

  1. A general explanation of the problem definition and of the principal role of feature selection and search processes in optimization problems in AI and data mining;

  2. Application areas of feature selection with meta-heuristic search schemes based on swarm intelligence and other forms of natural intelligence for real-world problems;

  3. Different methods for calculating text features, such as statistical methods and others.
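
As an illustration of a statistical calculation of text features, the sketch below computes TF-IDF weights in plain Python. TF-IDF is chosen here only as a familiar example; the text above does not say which statistical measures the article uses. The function name `tf_idf`, the three sample headlines, and the specific variant (relative term frequency multiplied by the natural log of the inverse document frequency) are all assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical news headlines, tokenized by whitespace.
docs = [
    "stocks fall as markets react".split(),
    "team wins the final match".split(),
    "markets rally as stocks rise".split(),
]
weights = tf_idf(docs)

for doc, w in zip(docs, weights):
    top_term = max(w, key=w.get)
    print(" ".join(doc), "->", top_term, round(w[top_term], 3))
```

Terms that occur in only one headline, such as "team", receive a higher weight than terms shared across headlines, such as "stocks". Weights of this kind form the high-dimensional feature set from which a search procedure then has to select a subset.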

The paper is organized as follows: related work in Section 2, methodology in Section 3, system implementation in Section 4, experimental results and discussion in Section 5, and conclusion and future work in Section 6.

Many researchers have used various search approaches to handle feature selection optimization problems in different application areas, such as document classification (Allahverdipoor & Gharehchopogh, 2018) and clustering (Abualigah & Khader, 2017), pattern recognition, disease diagnosis using medical data sets (Kaur, Saini, & Gupta, 2018), and several other data mining applications (Mavrovouniotis, Li, & Yang, 2017). Artificial Bee Colony (ABC), one of the most popular meta-heuristic algorithms, is used in many different areas, such as training the weights of an Artificial Neural Network (ANN), classifying medical patterns, clustering problems that seek the k best clusters, and the Travelling Salesman Problem (Rai & Sharma, 2015). Holden and Freitas (2010) proposed a news web page classification system using the ant colony optimization (ACO) algorithm, compared the results with C5.0, and investigated the pros and cons of reduction methods such as WordNet and other preprocessing stages for the large numbers of attributes associated with web mining.
