Scaling Fuzzy Models

Lawrence O. Hall, Dmitry B. Goldgof, Juana Canul-Reich, Prodip Hore, Weijian Cheng, Larry Shoemaker
DOI: 10.4018/978-1-60566-858-1.ch002

Abstract

This chapter examines how to scale algorithms which learn fuzzy models from the increasing amounts of labeled or unlabeled data that are becoming available. Large data repositories are increasingly available, such as records of network transmissions, customer transactions, medical data, and so on. A question arises about how to utilize the data effectively for both supervised and unsupervised fuzzy learning. This chapter will focus on ensemble approaches to learning fuzzy models for large data sets which may be labeled or unlabeled. Further, the authors examine ways of scaling fuzzy clustering to extremely large data sets. Examples from existing data repositories, some quite large, will be given to show the approaches discussed here are effective.

Introduction

Scaling fuzzy learning systems can be a challenge, because the search space for fuzzy models is larger than that of crisp models. Here, we are concerned with scaling fuzzy systems as the size of the data grows. There are now many collections of data that are terabytes in size, and we are moving towards petabyte collections such as the Sloan Digital Sky Survey (Giannella et al., 2006; Gray and Szalay, 2004).

If learning fuzzy models requires more computation time than learning crisp models, and it is already a struggle to make crisp learning models scale, can we scale fuzzy models of learning? The good news is that scalability is certainly possible as the number of examples grows large or very large. We do not examine the issues raised by large numbers of features, which are a significant problem, at least for supervised fuzzy learning.

Methods for scaling supervised fuzzy learning and unsupervised fuzzy learning (though only clustering algorithms) will be discussed. An obvious approach is to subsample the data such that each subset is of a size amenable to learning, but still captures the information inherent in the full data set. It is a good approach, but one with pitfalls in knowing when to stop adding data to the training set (Domingos and Hulten, 2000). Some good papers in the area of subsampling are (Provost and Kolluri, 1999; Wang et al., 2008; Provost et al., 1999; Pavlov et al., 2000). Decomposition of the data is the other major approach one can envision. It is this approach, leading to an ensemble or group of models, that is the focus of this chapter.
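
The contrast between the two strategies can be made concrete with a minimal sketch. The function names, data sizes, and the use of NumPy here are purely illustrative and not part of the chapter's method: subsampling keeps one manageable random subset, while decomposition splits the whole data set into disjoint chunks, each of which can be handled separately.

    import numpy as np

    def random_subsample(X, y, fraction, rng):
        """Subsampling: draw one manageable random subset of the full data."""
        n = int(fraction * len(X))
        idx = rng.choice(len(X), size=n, replace=False)
        return X[idx], y[idx]

    def disjoint_partitions(X, y, n_parts, rng):
        """Decomposition: split the full data into disjoint subsets."""
        idx = rng.permutation(len(X))
        return [(X[p], y[p]) for p in np.array_split(idx, n_parts)]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 10))      # stand-in for a large data set
    y = rng.integers(0, 2, size=100_000)

    X_sub, y_sub = random_subsample(X, y, fraction=0.01, rng=rng)
    parts = disjoint_partitions(X, y, n_parts=100, rng=rng)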

For labeled data, which enables supervised learning, we will show that an ensemble approach can be used to increase the accuracy of the fuzzy classifier. This is a necessary condition for working with disjoint subsets to enable the construction of fuzzy classifiers on very large data sets. However, we will focus on relatively small data sets where the goal is to increase accuracy, not to scale. The same approach using disjoint subsets allows scalable fuzzy classifiers to be developed. For unsupervised learning, examples will be given which show that the clustering approaches presented here produce data partitions comparable to those obtained when clustering all of the data.

Ensembles

An ensemble, for our purposes, is made up of a set of models. The models may be created through supervised or unsupervised learning. The models in the ensemble need to be diverse: the idea of diversity is that they make different types of errors, so that in the aggregate the errors are corrected (Banfield et al., 2005).

The models may be created from different underlying learning algorithms. However, the most common way to create an ensemble is to use different data sets with the same underlying learning algorithm. A common approach is bootstrap aggregation, or bagging (Breiman, 1996), which uses selection with replacement to create different training data sets. This has the effect of weighting the data, as some of it is left out (weight 0) and some of it is duplicated (doubled, tripled, or more in weight). On average, about 63% of the training data will appear in a given bag that is the same size as the training data. Bagging implicitly assumes that the training and test data are independently and identically distributed. The use of bagging to create an ensemble typically improves classification accuracy (Banfield et al., 2007; Dietterich, 2000).
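
A minimal sketch of how one bag is drawn and how the implicit weighting and the roughly 63% coverage arise is shown below; the training set size is illustrative, and the bag is just an index sample, independent of whatever base learner is trained on it.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000                                  # training set size (illustrative)

    # One "bag": draw n indices with replacement from the n training examples.
    bag = rng.choice(n, size=n, replace=True)

    # The implicit weighting: an example drawn 0 times has weight 0,
    # an example drawn twice has weight 2, and so on.
    weights = np.bincount(bag, minlength=n)

    print(f"fraction of examples present in the bag: {np.mean(weights > 0):.3f}")
    # ~0.632, i.e. roughly 63% of the training data appears at least once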

Boosting is another popular algorithm for creating ensembles of classifiers (Freund and Schapire, 1996). It focuses on misclassified examples by giving them a higher weight. For our purposes, it is a sequential algorithm (which examples are misclassified is not known until the current model/classifier in the ensemble is built, so the next one cannot be built in parallel). There have been efforts to make it scalable (Chawla, 2004), but they have not been applied to fuzzy classification approaches.
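
The sequential reweighting can be sketched as below. This is an AdaBoost-style update written for illustration only (binary labels in {-1, +1}); it is not the exact procedure of any algorithm discussed in this chapter, and the next classifier cannot be trained until this step has seen the current classifier's predictions.

    import numpy as np

    def boosting_reweight(weights, y_true, y_pred):
        """One AdaBoost-style reweighting step (sketch)."""
        miss = (y_true != y_pred)
        err = np.sum(weights[miss]) / np.sum(weights)        # weighted error of this classifier
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))  # this classifier's vote weight
        # Misclassified examples get a higher weight for the next classifier.
        new_w = weights * np.exp(np.where(miss, alpha, -alpha))
        return new_w / new_w.sum(), alpha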

As fuzzy learning algorithms typically scale poorly with the number of training examples, methods that allow for minimal training data set sizes but produce accuracy comparable to using all the data are desirable. Recent work has shown that an ensemble can be created from disjoint training data sets (data sets that have no overlap) and obtain accuracy on unseen test data that is equivalent to, or sometimes better than, that of training on all of the data (Chawla et al., 2001). For large data sets, this means classifiers can be built in parallel on subsets of the training data and achieve the same accuracy as training with all of the data. Now, you can train on data that would not fit in main memory, for example.
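
A minimal sketch of this disjoint-subset ensemble follows, assuming integer class labels. scikit-learn's DecisionTreeClassifier is used only as a stand-in for whatever fuzzy base learner is actually trained; since the chunks are independent, each per-subset fit could be dispatched to a separate processor or machine.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier  # stand-in base learner, not the chapter's fuzzy method

    def train_on_disjoint_subsets(X, y, n_subsets, rng):
        """Fit one model per disjoint chunk of the training data."""
        idx = rng.permutation(len(X))
        return [DecisionTreeClassifier().fit(X[p], y[p])
                for p in np.array_split(idx, n_subsets)]

    def majority_vote(models, X):
        """Combine the ensemble by unweighted voting over predicted class labels."""
        votes = np.stack([m.predict(X) for m in models]).astype(int)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)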
