ACNB: Associative Classification Mining Based on Naïve Bayesian Method

Fadi Odeh, Nijad Al-Najdawi
DOI: 10.4018/jitwe.2013010102

Abstract

Integrating association rule discovery with classification yields a data mining approach known as associative classification. Associative classification is a promising approach that often constructs more accurate classification models (classifiers) than traditional classification approaches such as decision trees and rule induction. In this research, the authors investigate the use of associative classification on high-dimensional data in text categorization. This research focuses on prediction, a very important step in classification, and introduces a new prediction method called Associative Classification Mining based on Naïve Bayesian method (ACNB). The running time is decreased by removing the ranking procedure that is usually the first step in processing the derived Classification Association Rules. The prediction method is enhanced using the Naïve Bayesian algorithm. The results of the experiments demonstrate high classification accuracy.

1. Introduction

Given a training data set of historical transactions, the problem is to discover the classification association rules (CARs) with significant supports and high confidences (that is, rules over attribute values whose frequencies exceed user-specified minimum support and minimum confidence thresholds). A subset of the generated CARs is chosen to build an automatic model (classifier) that can be used to predict the classes of previously unseen data. This approach, which uses association rule mining to build classifiers, is called Associative Classification (AC) (Liu et al., 1998; Li et al., 2001; Thabtah et al., 2005; Yoon & Lee, 2008; Niu et al., 2009). In addition, unlike traditional data mining methods such as neural networks (Wiener et al., 1995) and probabilistic methods (Duda & Hart, 1973), which produce classification models that are hard for end users to understand or interpret, associative classification produces rules that end users can easily understand and manipulate.
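To make the support and confidence thresholds concrete, the following is a minimal sketch of CAR discovery by brute-force enumeration. It is an illustration only, not the authors' algorithm: the function name `mine_cars`, the `max_len` cap on condition size, and the toy weather data are all assumptions introduced here, and practical AC systems use far more efficient candidate generation.

```python
from collections import Counter
from itertools import combinations


def mine_cars(rows, labels, min_support, min_confidence, max_len=2):
    """Return (condition, class, support, confidence) for every rule
    that passes both thresholds. Hypothetical helper, brute force."""
    n = len(rows)
    cond_counts = Counter()  # occurrences of each condition (set of attribute values)
    rule_counts = Counter()  # occurrences of each condition together with a class
    for row, label in zip(rows, labels):
        items = sorted(row.items())
        for k in range(1, max_len + 1):
            for cond in combinations(items, k):
                cond_counts[cond] += 1
                rule_counts[(cond, label)] += 1

    rules = []
    for (cond, label), count in rule_counts.items():
        support = count / n                      # fraction of all rows matching condition and class
        confidence = count / cond_counts[cond]   # fraction of matching rows that carry this class
        if support >= min_support and confidence >= min_confidence:
            rules.append((cond, label, support, confidence))
    return rules


# Toy training data (assumed for illustration only).
ROWS = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
]
LABELS = ["play", "play", "stay", "play"]

for cond, label, support, confidence in mine_cars(ROWS, LABELS, 0.5, 0.8):
    print(dict(cond), "->", label, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Support here is measured over the whole training set, while confidence is measured only over the rows that match the rule's condition, which is why a rule can be frequent yet still be rejected for low confidence.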

There are many applications where associative classification is suitable, including credit applications, insurance fraud detection, medical diagnosis, text categorization and email classification. Text Categorization (TC) is one of the important problems in the data mining and machine learning communities. This problem is considered large and complex since the data are massive and have high dimensionality. Categorization involves building a model from classified documents in order to classify previously unseen documents as accurately as possible (Hadi et al., 2007).

Classifiers are widely used because the models or rule sets they generate are easy to interpret. These classifiers perform exceptionally well on complete data sets, that is, data that are clean, correct, and free of missing attribute values. To generate the model or train the classifier, the training process uses attributes and their values relative to each other to segregate the data and generate a rule for a particular class. This poses a problem with incomplete data, as the models produced by traditional rule-based classifiers are sensitive to missing attribute values in unseen data (Williams et al., 2012).
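The sensitivity to missing values can be seen in a small sketch of a first-match rule list. The rules, attribute names, and default class below are hypothetical and chosen only for illustration; they are not taken from the paper or from Williams et al.

```python
def classify(rules, default_class, instance):
    """Return the class of the first rule whose conditions all hold;
    fall back to the default class when no rule matches."""
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    return default_class


# Hypothetical rule list for a credit application example.
RULES = [
    ({"income": "high", "history": "good"}, "approve"),
    ({"history": "bad"}, "reject"),
]

complete = {"income": "high", "history": "good"}
incomplete = {"income": "high"}  # the "history" attribute is missing

print(classify(RULES, "review", complete))
print(classify(RULES, "review", incomplete))
```

Because every condition of a rule must match, a single missing attribute is enough to skip an otherwise applicable rule, and the instance falls through to the default class.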

1.1. Data Mining

Associative classification is a branch of a larger area of scientific study known as data mining. Fayyad et al. (1998) define data mining as one of the main phases in Knowledge Discovery from Databases (KDD), which extracts useful patterns from data. The availability of high-speed computers, automated data collection tools and large memory capacities has made it possible to assemble and store enormous quantities of information; for example, a big retail store records a large number of sales transactions in a single year. This enormous growth of stored databases has provided an opportunity for new intelligent data techniques, which can produce useful information from these databases. The process of extracting this useful knowledge is accomplished using data mining techniques.

KDD comprises several phases, of which data mining is one of the primary ones. Other phases in KDD are data selection, data cleansing, data reduction, pattern evaluation and visualization of the discovered information (Fayyad et al., 1998). Data mining can be used for many tasks including classification, clustering, association rule discovery, and outlier analysis (Witten & Frank, 2000). These tasks can be accomplished using various data mining techniques that are adopted from different scientific fields, particularly statistics and artificial intelligence. No single data mining technique is applicable to all tasks, and selecting a technique for a particular problem is a critical choice, as a technique that works well for one problem may perform poorly on another. Many factors should be considered before making such a decision, such as the size and nature of the data, attribute types (text, real, etc.), number of columns, output format, and, more importantly, the goal of the application. Some of the primary data mining tasks include classification, clustering, regression, association rule discovery, and outlier analysis.
