Theoretical Analysis of Different Classifiers under Reduction Rough Data Set: A Brief Proposal

Shamim H. Ripon (Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh), Sarwar Kamal (Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh), Saddam Hossain (Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh) and Nilanjan Dey (Techno India College of Technology, Kolkata, India)
Copyright: © 2016 | Pages: 20
DOI: 10.4018/IJRSDA.2016070101

Abstract

Rough set theory plays a vital role in overcoming the complexity, vagueness, uncertainty, imprecision, and incompleteness of data during feature analysis. Classification is tested on datasets in which each record belongs to an exact class and key attributes determine that class. To achieve efficient, automated learning, algorithms are trained on training datasets. Generally, classification is supervised learning, whereas clustering is unsupervised. Classification under mathematical models involves rule mining and machine learning. The objective of this work is to establish a strong theoretical and manual analysis of three popular classifiers, namely K-nearest neighbor (K-NN), Naive Bayes, and the Apriori algorithm. Hybridizing these three classifiers with rough sets enables them to address larger datasets. The performance of the three classifiers has been tested both with and without rough sets. This work is in the implementation phase for DNA (deoxyribonucleic acid) datasets, and it will lead to an automated system for assessing classifiers in a machine learning environment.
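
To make the rough set side concrete, the following is a minimal Python sketch of Pawlak's standard definitions: indiscernibility classes, lower and upper approximations, the dependency degree gamma (the share of rows whose decision is determined with certainty), and a brute-force search for reducts, i.e. minimal condition-attribute subsets that keep gamma unchanged. The decision table and every function name are hypothetical illustrations, not the authors' implementation, and the exhaustive search is only practical for a handful of attributes.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]


def partition(table: Sequence[Row], attrs: Sequence[str]) -> List[Set[int]]:
    """Indiscernibility classes: row indices that agree on every attribute in attrs."""
    blocks: Dict[Tuple[Hashable, ...], Set[int]] = {}
    for i, row in enumerate(table):
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(blocks.values())


def approximations(table: Sequence[Row], attrs: Sequence[str],
                   target: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Lower approximation (blocks inside target) and upper (blocks touching target)."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for block in partition(table, attrs):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper


def dependency(table: Sequence[Row], attrs: Sequence[str], decision: str) -> float:
    """gamma = |POS_attrs(decision)| / |U|, where POS is the union of lower approximations."""
    positive: Set[int] = set()
    for decision_class in partition(table, [decision]):
        positive |= approximations(table, attrs, decision_class)[0]
    return len(positive) / len(table)


def reducts(table: Sequence[Row], conditions: Sequence[str],
            decision: str) -> List[Tuple[str, ...]]:
    """All minimal condition subsets whose gamma equals that of the full condition set."""
    full = dependency(table, conditions, decision)
    found: List[Tuple[str, ...]] = []
    for size in range(len(conditions) + 1):  # smallest subsets first
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it cannot be minimal
            if dependency(table, subset, decision) == full:
                found.append(subset)
    return found


# Hypothetical toy decision table: three condition attributes, decision "flu".
TABLE: List[Row] = [
    {"headache": "yes", "muscle_pain": "yes", "temp": "normal", "flu": "no"},
    {"headache": "yes", "muscle_pain": "yes", "temp": "high", "flu": "yes"},
    {"headache": "yes", "muscle_pain": "yes", "temp": "very_high", "flu": "yes"},
    {"headache": "no", "muscle_pain": "yes", "temp": "normal", "flu": "no"},
    {"headache": "no", "muscle_pain": "no", "temp": "high", "flu": "no"},
    {"headache": "no", "muscle_pain": "yes", "temp": "very_high", "flu": "yes"},
]

if __name__ == "__main__":
    flu_yes = {i for i, row in enumerate(TABLE) if row["flu"] == "yes"}
    lower, upper = approximations(TABLE, ["temp"], flu_yes)
    print(sorted(lower), sorted(upper))  # [2, 5] [1, 2, 4, 5]
    print(reducts(TABLE, ["headache", "muscle_pain", "temp"], "flu"))
    # [('headache', 'temp'), ('muscle_pain', 'temp')]
```

In this toy table, temperature alone classifies four of the six rows with certainty (gamma = 4/6), while either reduct preserves the full table's gamma of 1. This is the kind of attribute reduction that would shrink the input before K-NN, Naive Bayes, or Apriori is applied.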

1. Introduction

Computational biology (CB) is a dynamic and significant scientific field for automated analysis and processing methods, and rough set-based analysis permits specific operations on multiple datasets. Traditionally, biological research has handled data in systematic and regular ways. Over the last decade, technology has advanced to handle large volumes of data quickly, and computational tools now process biological data rapidly. In 1995, the first complete genome of a free-living organism was sequenced, and many other analyses have followed (Fleischmann et al., 1995; Berman et al., 2000). Accurate microarray data analysis is one of the fundamental supports, along with many other informatics analyses. Parallel DNA sequence processing is another significant contribution (Schena et al., 1995; Duggan et al., 1999).

Classifier techniques can help solve some challenging real-world problems. In a classifier system, multiple techniques are combined to build an efficient solution for a particular data classification task. One area of classification that has recently attracted researchers is meta-learning, or meta-classifiers. A meta-classifier handles a set of base predictors for a given classification task and then integrates their outputs using an integration technique. Such classifier combination appears under different names, such as decision combination, classifier ensembles, classifier fusion, consensus aggregation, and hybrid methods (Kuncheva, 2002; Dasarathy, 1994).
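
Majority (plurality) voting is one common integration technique for a set of base predictors. The minimal sketch below assumes each base predictor is a plain Python callable that maps a feature vector to a label. The three threshold rules are hypothetical stand-ins for trained classifiers such as K-NN or Naive Bayes.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Features = Sequence[float]
Predictor = Callable[[Features], Hashable]


def majority_vote(predictors: Sequence[Predictor], x: Features) -> Hashable:
    """Combine base predictors by plurality vote.

    Counter.most_common keeps first-encountered order among equal counts,
    so ties go to the label produced by the earliest predictor in the list.
    """
    votes = Counter(predict(x) for predict in predictors)
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical base predictors: simple threshold rules on a two-feature sample.
def rule_a(x: Features) -> str:
    return "pos" if x[0] > 0.5 else "neg"


def rule_b(x: Features) -> str:
    return "pos" if x[1] > 0.5 else "neg"


def rule_c(x: Features) -> str:
    return "pos" if x[0] + x[1] > 1.2 else "neg"


ENSEMBLE = [rule_a, rule_b, rule_c]

if __name__ == "__main__":
    for sample in ([0.7, 0.2], [0.7, 0.6], [0.4, 0.9]):
        print(sample, majority_vote(ENSEMBLE, sample))
```

Voting needs only the predicted labels, so base classifiers of very different kinds can be mixed. Weighted voting or stacking are alternatives when some members are known to be more reliable.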

Priority generates priority values for interlinked homologous datasets. These values are governed by priority association rules. The main purpose of the priority association rules and the rough data set is to improve on the performance of a single classifier. Different classifiers usually make different predictions on the same sample of data because of their diversity, and many research works have shown that the sets of samples misclassified by different classifiers do not necessarily overlap, which is what makes combining multiple classifiers worthwhile. The techniques used to develop a priority association table can be divided into two categories: classifier disturbance and sample disturbance. The first approach exploits the instability of base classifiers, such as neural networks, random forests, and decision trees, that are very sensitive to their initialization parameters. The second approach instead trains the classifiers on different sample subsets or in different feature subspaces.
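
Sample disturbance can be sketched as giving each ensemble member its own bootstrap sample of rows and its own random subset of features. This is a minimal sketch assuming numeric feature rows and a seeded `random.Random`. The helper names and demo data are hypothetical, and the step that trains a base classifier on each subset (and then combines the members, for example by voting) is deliberately left out.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def bootstrap_rows(n_rows: int, rng: random.Random) -> List[int]:
    """Sample disturbance: draw n_rows row indices with replacement (a bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_subspace(n_features: int, k: int, rng: random.Random) -> List[int]:
    """Feature-subspace disturbance: pick k distinct feature indices without replacement."""
    return sorted(rng.sample(range(n_features), k))


def disturbed_training_sets(X: Sequence[Sequence[float]], y: Sequence[str],
                            n_members: int, k_features: int,
                            seed: int = 0) -> List[Tuple[List[int], Matrix, List[str]]]:
    """Return one (feature indices, X_sub, y_sub) triple per ensemble member."""
    rng = random.Random(seed)  # seeded so the subsets are reproducible
    members = []
    for _ in range(n_members):
        rows = bootstrap_rows(len(X), rng)
        cols = random_subspace(len(X[0]), k_features, rng)
        X_sub = [[X[r][c] for c in cols] for r in rows]
        y_sub = [y[r] for r in rows]
        members.append((cols, X_sub, y_sub))
    return members


# Hypothetical demo data: four samples with four numeric features each.
X_DEMO = [[0.1, 0.5, 0.9, 0.3], [0.4, 0.2, 0.8, 0.6],
          [0.7, 0.1, 0.3, 0.2], [0.9, 0.8, 0.5, 0.4]]
Y_DEMO = ["a", "b", "a", "b"]

if __name__ == "__main__":
    for cols, X_sub, y_sub in disturbed_training_sets(X_DEMO, Y_DEMO, n_members=3, k_features=2):
        print(cols, y_sub)
```

Because each member sees a different slice of rows and features, members tend to err on different samples, which is the diversity that the combination step relies on.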

As with any classification problem, document classification consists of two stages: a feature extraction stage and a decision stage that actually assigns documents to classes based on the extracted features. Several feature extractors have been proposed by Jia et al. (2015), but by far the most popular have been variants of the term-frequency vector. A wealth of data mining and machine learning techniques has then been applied to and/or developed for document classification. These include the naive Bayes classifier, the k-nearest neighbor classifier, the Apriori algorithm, neural networks, decision trees, logistic regression and, most recently, support vector machines (SVM) (Joachims, 2001). While these advances have been significant both conceptually and in terms of classification accuracy, document classification methods have largely focused on intelligent mining of the data within the documents.
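
The two stages can be sketched for one of the listed classifiers. The minimal sketch below assumes lowercase alphanumeric tokenization as the feature extractor, producing a raw term-frequency vector, and a multinomial naive Bayes decision stage with additive (Laplace) smoothing. The class name, the smoothing choice, and the four training documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Sequence, Set


def term_frequencies(text: str) -> Counter:
    """Feature extractor: raw term-frequency vector over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


class MultinomialNaiveBayes:
    """Decision stage: multinomial naive Bayes with additive smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # alpha = 1.0 gives add-one (Laplace) smoothing

    def fit(self, docs: Sequence[str], labels: Sequence[str]) -> "MultinomialNaiveBayes":
        self.class_counts: Counter = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()
        for doc, label in zip(docs, labels):
            tf = term_frequencies(doc)
            self.word_counts[label].update(tf)  # accumulate per-class term counts
            self.vocab.update(tf)
        self.n_docs = len(docs)
        return self

    def log_score(self, text: str, label: str) -> float:
        """Unnormalised log posterior: log P(c) + sum_w tf(w) * log P(w | c)."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for word, freq in term_frequencies(text).items():
            if word in self.vocab:  # words never seen in training are skipped
                score += freq * math.log((counts[word] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical toy corpus.
DOCS = ["buy cheap pills", "cheap pills now", "meeting at noon", "lunch meeting today"]
LABELS = ["spam", "spam", "ham", "ham"]

if __name__ == "__main__":
    model = MultinomialNaiveBayes().fit(DOCS, LABELS)
    print(model.predict("Cheap pills!"))   # spam
    print(model.predict("meeting today"))  # ham
```

Working in log space avoids underflow when documents are long. The same term-frequency vectors could instead be fed to a k-nearest neighbor classifier with, for example, cosine similarity as the distance.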
