Global Induction of Classification and Regression Trees


Copyright: © 2014 | Pages: 10
DOI: 10.4018/978-1-4666-5202-6.ch098


Introduction

Decision trees are one of the most widely used prediction techniques in data mining (Fayyad, Piatetsky-Shapiro, Smyth, & Uthurusamy, 1996). Their similarity to the human reasoning process makes them popular not only among data analysts but also among business users. Tree-based approaches are easy to understand, visualize, and interpret, and their output model can be explained in typical business terms. The ease of application, fast operation, and, perhaps most importantly, the effectiveness of decision trees make them a powerful and popular tool (Kotsiantis, 2013).

Two main types of decision tree approaches can be distinguished by the type of problem they are applied to. Tree predictors can be used to classify existing data (classification trees) or to approximate real-valued functions (regression trees) (see Figure 1). In each leaf, a classification tree assigns a class label (usually the majority class of all instances that reach that leaf), while a regression tree holds a constant value (usually the average value of the target attribute). The model tree is an extension of the regression tree (see Figure 1). The main difference between them is that, in a model tree, the constant value in the terminal node is replaced by a regression function. Each leaf of the model tree holds a linear (or nonlinear) model whose output is the final prediction value.
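To make the distinction concrete, the following minimal Python sketch (illustrative names only, not code from any of the systems cited here) shows what each kind of leaf stores and how it produces a prediction:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ClassificationLeaf:
    label: str                    # majority class of instances reaching the leaf
    def predict(self, x: List[float]) -> str:
        return self.label

@dataclass
class RegressionLeaf:
    value: float                  # mean target value of instances in the leaf
    def predict(self, x: List[float]) -> float:
        return self.value

@dataclass
class ModelLeaf:
    weights: List[float]          # coefficients of a linear model fitted in the leaf
    intercept: float
    def predict(self, x: List[float]) -> float:
        # the linear model output is the final prediction value
        return self.intercept + sum(w * xi for w, xi in zip(self.weights, x))
```

In each case the internal nodes of the tree are identical; only the payload of the leaves, and therefore the form of the prediction, changes.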

Figure 1.

An example of a univariate decision tree with tests on nominal and continuous-valued features. Depending on the tree type, the leaves contain a class label (classification tree), a constant value (regression tree), or some kind of model (model tree).


A typical tree-based system applies a greedy search for decision tree induction. One major drawback of greedy algorithms is that they search only for the locally best split at each node, which does not guarantee a globally optimal solution. Hence, the application of Evolutionary Algorithms (EAs) to the problem of decision tree induction has become an increasingly popular alternative. Instead of a local search, EAs perform a global search in the space of candidate solutions.
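The following sketch outlines the evolutionary loop that such global approaches share. It is a hedged illustration, not the algorithm of any particular system: the tree representation, the variation operators, the tournament size, and the complexity weight alpha are all assumptions supplied for the example.

```python
import random
from typing import Callable

def evolve_trees(init_tree: Callable[[], object],
                 mutate: Callable[[object], object],
                 crossover: Callable[[object, object], object],
                 accuracy: Callable[[object], float],
                 size: Callable[[object], int],
                 pop_size: int = 50,
                 generations: int = 100,
                 alpha: float = 0.01) -> object:
    """Global search over whole trees: fitness rewards predictive accuracy
    and penalizes complexity, so structure and tests evolve jointly."""
    def fitness(tree: object) -> float:
        return accuracy(tree) - alpha * size(tree)

    population = [init_tree() for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # tournament selection of two parents
            p1 = max(random.sample(population, 3), key=fitness)
            p2 = max(random.sample(population, 3), key=fitness)
            child = crossover(p1, p2)
            if random.random() < 0.2:        # assumed mutation rate
                child = mutate(child)
            offspring.append(child)
        # elitism: carry the best individual into the next generation
        best = max(population, key=fitness)
        population = offspring[:-1] + [best]
    return max(population, key=fitness)
```

The key design point is that fitness is evaluated on whole trees, so the structure and all tests are optimized jointly rather than node by node.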

The purpose of this chapter is to illustrate the application of EAs to the problem of decision tree induction. The objective is to show that evolutionary optimization, compared to greedy search, may find globally optimal solutions that are significantly less complex while remaining highly competitive in prediction accuracy. We will cover the global induction of classification, regression, and model trees.


Background

Inducing an optimal decision tree is known to be NP-complete (Naumov, 1991). Consequently, practical decision-tree learning algorithms are based on heuristics, such as greedy algorithms in which a locally optimal split is made at each node. The most popular induction strategy is the top-down approach (Rokach & Maimon, 2005). Top-down induction starts from the root node, where the locally best split (test) is searched for according to a given optimality measure (e.g., the Gini, twoing, or entropy rule for classification trees, and the least squares or least absolute deviation error criterion for regression trees). Next, the training instances are redirected to the newly created nodes, and this process is repeated for each node until some stopping rule is satisfied. The recursive partitioning of the dataset may lead to over-fitting; therefore, decision tree pruning (Esposito, Malerba, & Semeraro, 1997) is applied to improve the generalization power of the predictor.
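As a concrete illustration of one greedy step, the sketch below picks the locally best inequality split on a single continuous feature by maximizing the reduction in Gini impurity. The function names are assumptions chosen for illustration; a full inducer would repeat this search over all features and recurse on the resulting partitions.

```python
from collections import Counter
from typing import Sequence, Tuple

def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs: Sequence[float], ys: Sequence[str]) -> Tuple[float, float]:
    """Return (threshold, impurity_reduction) for the locally best binary
    split x <= threshold of one continuous feature."""
    parent = gini(ys)
    best_t, best_gain = float("nan"), 0.0
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue                       # degenerate split, skip
        w = len(left) / len(ys)
        gain = parent - (w * gini(left) + (1 - w) * gini(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

For example, best_split([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]) returns (2.0, 0.5): splitting at 2.0 separates the two classes perfectly and removes all of the parent impurity.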

Most tree-inducing algorithms partition the feature space with axis-parallel hyper-planes. Trees of this type are called univariate decision trees. The split at each non-terminal node usually involves a single feature. For a continuous-valued feature, an inequality test with binary outcomes is usually applied; for a nominal attribute, mutually exclusive groups of attribute values are associated with the outcomes. One of the first and best-known systems applicable to both classification and regression problems is CART (Breiman, Friedman, Olshen, & Stone, 1984). Good representatives of univariate inducers are also the systems developed by Quinlan: C4.5 (1993) for classification and M5 (1992) for regression.
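The two univariate test forms just described can be sketched as follows; these helpers are illustrative and are not taken from the CART, C4.5, or M5 source code:

```python
from typing import Dict

def continuous_test(value: float, threshold: float) -> int:
    # inequality test with binary outcomes:
    # 0 = left branch (value <= threshold), 1 = right branch
    return 0 if value <= threshold else 1

def nominal_test(value: str, groups: Dict[str, int]) -> int:
    # mutually exclusive groups of attribute values map to branch indices,
    # e.g. {"red": 0, "green": 0, "blue": 1}
    return groups[value]
```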

Key Terms in this Chapter

Decision Tree Pruning: An approach that reduces the size of a decision tree by removing parts of the tree that provide little power to classify or predict the data. This technique improves the generalization power of the decision tree and reduces over-fitting.

Model Tree: An extension of the regression tree where the constant value in the terminal node is replaced by a regression plane.

Decision Tree: A decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision. Each branch of the decision tree represents a possible decision or occurrence. The tree structure shows how one choice leads to the next, and the use of branches indicates that each option is mutually exclusive.

Global Induction: A method of decision tree generation in which both the tree structure and all tests are searched for at the same time; usually based on an evolutionary approach, in contrast to top-down induction.

Top-Down Induction: A recursive method of decision tree generation. It starts with the entire input dataset in the root node, where a locally optimal test for splitting the data is searched for and branches corresponding to the test outcomes are created. The test search and data splitting are repeated in the created nodes until the stopping condition is met.

Over-Fitting: A problem in supervised learning that occurs when a classifier or regressor predicts the training data perfectly but performs much worse on an unseen testing set. The problem can emerge when a machine learning algorithm fits noise or insufficient data, i.e., when it learns from irrelevant facts or generalizes from specific cases.

Multivariate Decision Tree: A decision tree with tests involving several attributes. The most common form of such tests is an oblique split, which is based on a linear combination of features (hyper-plane).

Univariate Decision Tree: A decision tree that partitions the feature space with axis-parallel hyper-planes. Each split at a non-terminal node involves a single feature.
