1. Introduction
Over the past few years, dataset dimensionality has increased in various domains, such as text-based sentiment analysis and bioinformatics (Zhai et al., 2014). This has brought an intriguing challenge to the research field, as many Artificial Intelligence (AI) and Machine Learning (ML) methods are unable to manage high-dimensional input data. Indeed, if we examine the dimensionality of the datasets posted in the well-known UCI repository and the libSVM database (Chang, 2001), we can see that the largest dataset dimensionality has grown to approximately 30 million. Moreover, some of these algorithms are additionally affected when they face larger instance sizes. In this new situation, it is common to handle datasets that are much larger in both the number of features and the number of samples, so current learning techniques must be adapted.
To address this issue, dimensionality reduction methods can be applied to reduce the number of features and to enhance the performance of the resulting learning process. One of the most frequently used dimensionality reduction techniques is feature selection (FS), which reduces dimensionality by removing irrelevant and redundant features (Liu & Motoda, 1998). Since FS preserves the original features, it is particularly valuable for applications where model interpretation and knowledge extraction are important. However, existing FS techniques are not expected to scale well when handling a large-scale problem (large in both the number of features and the number of instances), so their efficiency may deteriorate significantly or they may even become inapplicable.
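To make the idea of removing irrelevant and redundant features concrete, the following is a minimal sketch of a filter-style selector. It is an illustration only, not the method proposed in this article: the use of Pearson correlation, the function names `pearson` and `select_features`, the two thresholds, and the toy data are all assumptions chosen for readability.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_features(columns, labels, min_relevance=0.3, max_redundancy=0.95):
    """Return the names of features that are relevant to the labels and not redundant.

    columns: dict mapping feature name -> list of numeric values (one per sample).
    Both thresholds are illustrative assumptions, not recommended settings.
    """
    # Relevance: absolute correlation between a feature and the class label.
    relevance = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)

    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            break  # every remaining feature is even less relevant
        # Redundancy: skip a feature that nearly duplicates one already kept.
        if any(abs(pearson(columns[name], columns[kept])) > max_redundancy
               for kept in selected):
            continue
        selected.append(name)
    return selected


# Toy data (hypothetical): one informative feature, a near-duplicate of it,
# a feature unrelated to the labels, and a constant feature.
data = {
    "f_signal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f_near_copy": [1.1, 1.9, 3.2, 3.8, 5.1, 5.9],
    "f_noise": [3.0, 1.0, 2.0, 2.0, 1.0, 3.0],
    "f_const": [7.0] * 6,
}
labels = [0, 0, 0, 1, 1, 1]
print(select_features(data, labels))
```

Because the selector only drops columns, the surviving features keep their original meaning, which is the property that makes FS attractive for interpretation. Note that the pairwise redundancy check grows with the number of selected features, which hints at why such methods can struggle at very large scale.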
Sentiment analysis is a way of identifying and classifying the emotions or opinions expressed in a piece of text, in particular to determine polarity, that is, whether the writer's disposition towards a particular topic or artefact is positive, negative, or neutral. For this purpose, sentiment analysis and classification combine machine learning (ML) and natural language processing (NLP). The rapid growth of online social media and web-based communities gives customers every opportunity to express their perceptions and exchange ideas about anything, for example social or political issues, articles, books, and films. This content usually takes the form of survey material, such as Likert-type scale data, or free text. Organizations today are quick to evaluate customers' perceptions of them or of their products from Internet-based social content (Parvathy & Bindhu, 2016). Online service providers in particular are engaged in evaluating social media data from blogs, online forums, tweets, comments, and product feedback surveys. Publicly shared reviews on websites or in articles are used to track customers' ongoing perception of a product or service, which supports marketing decisions and improvements to service or product quality (Stylios et al., 2014). The critical problem that arises when collecting information from a social media environment is that reviews contain a large amount of unwanted data, including HTML tags and linguistic and spelling errors, and the data is usually so bulky that removing those errors manually is a tedious and time-consuming task. An effective approach to this problem is to select the most relevant and significant features from the dataset and discard redundant or irrelevant ones.
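As a hedged illustration of this pipeline, the sketch below strips HTML residue from raw reviews and then ranks terms by a chi-square statistic against the polarity label. The chi-square criterion is a common filter choice swapped in here for illustration and is not claimed to be this article's method; the helper names (`clean_review`, `chi_square_scores`, `top_terms`) and the sample reviews are hypothetical.

```python
import html
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag matcher, enough for a sketch
TOKEN_RE = re.compile(r"[a-z']+")    # lowercase word tokens


def clean_review(raw):
    """Strip HTML tags and entities, lowercase, and split into word tokens."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return TOKEN_RE.findall(text.lower())


def chi_square_scores(docs, labels):
    """Score each term with a 2x2 chi-square statistic against a binary label.

    docs: list of token lists; labels: list of 0/1 polarity labels.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count document frequency, so each term counts once per review.
        (pos_df if label == 1 else neg_df).update(set(tokens))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive reviews containing the term
        b = neg_df[term]   # negative reviews containing the term
        c = n_pos - a      # positive reviews without it
        d = n_neg - b      # negative reviews without it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_terms(scores, k):
    """Return the k highest-scoring terms."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical sample reviews: (raw text, polarity) with 1 = positive, 0 = negative.
reviews = [
    ("<p>Great phone, <b>great</b> battery &amp; screen</p>", 1),
    ("<div>Great camera and the screen is sharp</div>", 1),
    ("<p>Terrible battery, the screen cracked</p>", 0),
    ("Terrible support<br/>and the phone died", 0),
]
docs = [clean_review(raw) for raw, _ in reviews]
labels = [label for _, label in reviews]
scores = chi_square_scores(docs, labels)
print(top_terms(scores, 2))
```

Terms that appear equally often in both classes, such as "phone" in the sample, receive a score of zero and would be discarded, while polarity-bearing words rise to the top. Spelling correction is deliberately left out of this sketch.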
Several pre-processing and data-cleaning techniques rely on feature selection. In the data mining process for high-dimensional datasets, feature selection serves as a highly effective pre-processing strategy. A taxonomy of feature selection methods is presented in Figure 1.
Figure 1. Feature selection methods taxonomy