Scaling Fuzzy Models

Lawrence O. Hall, Dmitry B. Goldgof, Juana Canul-Reich, Prodip Hore, Weijian Cheng, Larry Shoemaker
DOI: 10.4018/978-1-60566-858-1.ch002

Abstract

This chapter examines how to scale algorithms which learn fuzzy models from the increasing amounts of labeled or unlabeled data that are becoming available. Large data repositories are increasingly available, such as records of network transmissions, customer transactions, medical data, and so on. A question arises about how to utilize the data effectively for both supervised and unsupervised fuzzy learning. This chapter will focus on ensemble approaches to learning fuzzy models for large data sets which may be labeled or unlabeled. Further, the authors examine ways of scaling fuzzy clustering to extremely large data sets. Examples from existing data repositories, some quite large, will be given to show the approaches discussed here are effective.

Introduction

Scaling fuzzy learning systems can be a challenge, because the search space for fuzzy models is larger than that of crisp models. Here, we are concerned with scaling fuzzy systems as the size of the data grows. There are now many collections of data that are terabytes in size, and we are moving towards petabyte collections such as the Sloan Digital Sky Survey (Giannella et al., 2006; Gray and Szalay, 2004).

If learning fuzzy models requires more computation time than learning crisp models, and it is already a struggle to make crisp learning models scale, can we scale fuzzy models of learning? The good news is that scalability is certainly possible as the number of examples grows large or very large. We do not examine the issues raised by large numbers of features, which are a significant problem, at least for supervised fuzzy learning.

Methods for scaling supervised fuzzy learning and unsupervised fuzzy learning (though only clustering algorithms) will be discussed. An obvious approach is to subsample the data such that each subset is of a size amenable to learning, but still captures the information inherent in the full data set. It is a good approach, but one with pitfalls in knowing when to stop adding data to the training set (Domingos and Hulten, 2000). Some good papers in the area of subsampling are (Provost and Kolluri, 1999; Wang et al., 2008; Provost et al., 1999; Pavlov et al., 2000). Decomposition of the data is the other major approach one can envision. It is this approach, leading to an ensemble or group of models, that is the focus of this chapter.
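
The contrast between the two strategies can be made concrete with a minimal sketch. The function names, data sizes, and the use of NumPy here are purely illustrative and not part of the chapter's method: subsampling keeps one manageable random subset, while decomposition splits the whole data set into disjoint chunks, each of which can be handled separately.

    import numpy as np

    def random_subsample(X, y, fraction, rng):
        """Subsampling: draw one manageable random subset of the full data."""
        n = int(fraction * len(X))
        idx = rng.choice(len(X), size=n, replace=False)
        return X[idx], y[idx]

    def disjoint_partitions(X, y, n_parts, rng):
        """Decomposition: split the full data into disjoint subsets."""
        idx = rng.permutation(len(X))
        return [(X[p], y[p]) for p in np.array_split(idx, n_parts)]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 10))      # stand-in for a large data set
    y = rng.integers(0, 2, size=100_000)

    X_sub, y_sub = random_subsample(X, y, fraction=0.01, rng=rng)
    parts = disjoint_partitions(X, y, n_parts=100, rng=rng)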

For labeled data, which enables supervised learning, we will show that an ensemble approach can be used to increase the accuracy of the fuzzy classifier. This is a necessary condition for working with disjoint subsets to enable the construction of fuzzy classifiers on very large data sets. However, we will focus on relatively small data sets where the goal is to increase accuracy, not to scale. The same approach using disjoint subsets allows scalable fuzzy classifiers to be developed. For unsupervised learning, examples will be given which show that the clustering approaches presented here produce data partitions comparable to those obtained when clustering all of the data.

Ensembles

An ensemble, for our purposes, is made up of a set of models. The models may be created through supervised or unsupervised learning. The models in the ensemble need to be diverse: the idea of diversity is that they make different types of errors, so that in the aggregate the errors are corrected (Banfield et al., 2005).

The models may be created from different underlying learning algorithms. However, the most common way to create an ensemble is to use different data sets with the same underlying learning algorithm. A common approach is bootstrap aggregation, or bagging (Breiman, 1996), which uses selection with replacement to create different training data sets. This has the effect of weighting the data, as some of it is left out (weight 0) and some of it is duplicated (doubled, tripled, or more in weight). On average, about 63% of the training data will appear in a given bag that is the same size as the training data. Bagging implicitly assumes that the training and test data are independently and identically distributed. The use of bagging to create an ensemble typically improves classification accuracy (Banfield et al., 2007; Dietterich, 2000).
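
A minimal sketch of how one bag is drawn and how the implicit weighting and the roughly 63% coverage arise is shown below; the training set size is illustrative, and the bag is just an index sample, independent of whatever base learner is trained on it.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000                                  # training set size (illustrative)

    # One "bag": draw n indices with replacement from the n training examples.
    bag = rng.choice(n, size=n, replace=True)

    # The implicit weighting: an example drawn 0 times has weight 0,
    # an example drawn twice has weight 2, and so on.
    weights = np.bincount(bag, minlength=n)

    print(f"fraction of examples present in the bag: {np.mean(weights > 0):.3f}")
    # ~0.632, i.e. roughly 63% of the training data appears at least once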

Boosting is another popular algorithm for creating ensembles of classifiers (Freund and Schapire, 1996). It focuses on misclassified examples by giving them a higher weight. For our purposes, it is a sequential algorithm (which examples are misclassified is not known until the current model/classifier in the ensemble is built, so the next one cannot be built in parallel). There have been efforts to make it scalable (Chawla, 2004), but they have not been applied to fuzzy classification approaches.
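
The sequential reweighting can be sketched as below. This is an AdaBoost-style update written for illustration only (binary labels in {-1, +1}); it is not the exact procedure of any algorithm discussed in this chapter, and the next classifier cannot be trained until this step has seen the current classifier's predictions.

    import numpy as np

    def boosting_reweight(weights, y_true, y_pred):
        """One AdaBoost-style reweighting step (sketch)."""
        miss = (y_true != y_pred)
        err = np.sum(weights[miss]) / np.sum(weights)        # weighted error of this classifier
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))  # this classifier's vote weight
        # Misclassified examples get a higher weight for the next classifier.
        new_w = weights * np.exp(np.where(miss, alpha, -alpha))
        return new_w / new_w.sum(), alpha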

As fuzzy learning algorithms typically scale poorly with the number of training examples, methods that allow for minimal training data set sizes but produce accuracy comparable to using all the data are desirable. Recent work has shown that an ensemble can be created from disjoint training data sets (data sets that have no overlap) and obtain accuracy on unseen test data that is equivalent to, or sometimes better than, that of training on all of the data (Chawla et al., 2001). For large data sets, this means classifiers can be built in parallel on subsets of the training data and achieve the same accuracy as training with all of the data. Now, you can train on data that would not fit in main memory, for example.
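
A minimal sketch of this disjoint-subset ensemble follows, assuming integer class labels. scikit-learn's DecisionTreeClassifier is used only as a stand-in for whatever fuzzy base learner is actually trained; since the chunks are independent, each per-subset fit could be dispatched to a separate processor or machine.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier  # stand-in base learner, not the chapter's fuzzy method

    def train_on_disjoint_subsets(X, y, n_subsets, rng):
        """Fit one model per disjoint chunk of the training data."""
        idx = rng.permutation(len(X))
        return [DecisionTreeClassifier().fit(X[p], y[p])
                for p in np.array_split(idx, n_subsets)]

    def majority_vote(models, X):
        """Combine the ensemble by unweighted voting over predicted class labels."""
        votes = np.stack([m.predict(X) for m in models]).astype(int)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)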
