Text Classification: New Fuzzy Decision Tree Model

Ben Elfadhl Mohamed Ahmed, Ben Abdessalem Wahiba

Source Title: Handbook of Research on Machine Learning Innovations and Trends

DOI: 10.4018/978-1-5225-2229-4.ch033

Abstract

This chapter proposes a supervised approach to automatic text document classification based on fuzzy decision trees. Whatever algorithm is used to build a fuzzy decision tree, it needs a criterion for choosing the discriminating attribute at each node to be partitioned. Two heuristics are usually used for this choice in fuzzy decision trees, and one of them has not yet been tested in the field of text document classification. This chapter tests that heuristic. It was presented by Yuan and Shaw (1995) and applied in a context other than text classification. Here it is analyzed and adapted to the authors' approach to text document classification.

Chapter Preview

Introduction

The amount of information in the world grows rapidly every day, and managing it well requires effective tools. Classifying a large information source beforehand makes its content easier to access and to process later. This principle is applied in various fields such as databases, the press, e-mail, and some websites (Yahoo's hierarchical directory, for example). Classification is divided into two branches (Sebastiani, 2002; Raheel, 2010): supervised classification (also called categorization) and unsupervised classification (also known as segmentation or clustering). This chapter focuses on the first type.

Supervised classification automatically assigns documents to one or more predefined categories (Sebastiani, 2002). The best-known supervised classification techniques include Bayesian networks, support vector machines, k-nearest neighbors, and decision trees. Of these, only decision trees readily produce a set of rules that justifies the classification decisions they make; the other techniques can produce such rules only with considerably more difficulty.

Despite their widespread use, decision trees suffer from a problem that can reduce their effectiveness: the handling of attributes with continuous values. Consider a tree that classifies two men by height. The first is 181 cm tall and the second 180 cm. The tree classifies a man as tall if his height is strictly greater than 180 cm. It will therefore classify the first man as tall but not the second, even though the 1 cm difference between them is imperceptible in the real world.

One solution to this problem, where a small change in a continuous value causes an abrupt change in the classification result, is to integrate fuzzy set theory with decision trees. Fuzzy set theory takes into account the continuity of the values that describe real-world phenomena and describes them gradually, in a way that is closer to reality (Janikow and Kawa, 2005).
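The height example can be made concrete with a minimal sketch that contrasts the crisp test "strictly greater than 180 cm" with a fuzzy membership function for "tall". The linear ramp and its breakpoints (170 cm and 190 cm) are illustrative assumptions, not values taken from the chapter.

```python
def crisp_is_tall(height_cm: float, threshold: float = 180.0) -> bool:
    """Crisp test from the example: tall only if strictly above the threshold."""
    return height_cm > threshold


def fuzzy_tall(height_cm: float, low: float = 170.0, high: float = 190.0) -> float:
    """Degree of membership in 'tall' using a linear ramp.

    Below `low` the degree is 0, above `high` it is 1, and in between it rises
    linearly. The breakpoints 170 cm and 190 cm are assumptions for illustration.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


for h in (180.0, 181.0):
    print(h, crisp_is_tall(h), round(fuzzy_tall(h), 2))
```

With these assumed breakpoints the crisp test flips from `False` to `True` between 180 cm and 181 cm, while the fuzzy degree only moves from 0.5 to 0.55, which is the gradual behaviour fuzzy set theory is meant to provide.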

Fuzzy decision trees combine the advantages of classical decision trees and fuzzy set theory: the fuzzy representation and approximate reasoning ability of fuzzy sets, together with the symbolic power and easy interpretation of classical decision trees.
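To illustrate how a fuzzy tree keeps readable rules while reasoning approximately, the sketch below classifies one fuzzified document with a small hand-written tree. Each branch is read as a rule; the rule's firing strength is the minimum of the memberships along the branch (min as fuzzy AND), and each class takes the maximum over branches. The min/max operators are one common choice and an assumption here, as are the attribute names (`goal`, `election`), the classes, and all degrees, which are hypothetical.

```python
from typing import Dict, List, Tuple

# A fuzzified document: attribute -> {linguistic term: membership degree}.
FuzzyDoc = Dict[str, Dict[str, float]]
# A branch: the (attribute, term) tests from root to leaf, plus the leaf's class degrees.
Branch = Tuple[List[Tuple[str, str]], Dict[str, float]]


def classify(doc: FuzzyDoc, branches: List[Branch]) -> Dict[str, float]:
    """Fuzzy inference over every branch (assumed: min for AND, max to aggregate)."""
    scores: Dict[str, float] = {}
    for tests, leaf in branches:
        # Degree to which the document satisfies all tests on this branch.
        strength = min((doc[attr].get(term, 0.0) for attr, term in tests), default=1.0)
        for label, certainty in leaf.items():
            scores[label] = max(scores.get(label, 0.0), min(strength, certainty))
    return scores


# Hypothetical tree and document, for illustration only.
tree: List[Branch] = [
    ([("goal", "high")], {"sport": 0.9, "politics": 0.1}),
    ([("goal", "low"), ("election", "high")], {"sport": 0.1, "politics": 0.8}),
    ([("goal", "low"), ("election", "low")], {"sport": 0.4, "politics": 0.3}),
]
doc: FuzzyDoc = {"goal": {"low": 0.3, "high": 0.7}, "election": {"low": 0.8, "high": 0.2}}

scores = classify(doc, tree)
print(scores, max(scores, key=scores.get))
```

Because every branch is still an explicit chain of tests, the rule that produced the winning score can be reported to the user as the justification for the decision.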

Fuzzy decision trees are a good choice for text classification because they can manage one of its major difficulties: the uncertainty and ambiguity inherent in the human-language terms used in the documents to be classified. Besides handling noise and missing or erroneous attributes, classification with fuzzy decision trees keeps the advantage of being easy to understand and interpret.

Different models have been developed in the literature to construct fuzzy decision trees. Most of them are based on the fuzzy ID3 algorithm (Matiasko et al., 2006), a direct extension of the ID3 algorithm (Quinlan, 1986).

These models usually differ in the criterion used to select the discriminating attribute and in the way the membership degrees of the variables are determined.

Two heuristics have been used in the literature to select the discriminating attribute: the first minimizes fuzzy entropy, and the second minimizes classification ambiguity (Wang et al., 2000). In text classification with fuzzy decision trees, the authors' literature search found that only the first heuristic has been implemented and tested (Wang and Wang, 2005). The second heuristic, based on minimizing classification ambiguity, has not yet been implemented or tested in this area, even though it seems well suited to text classification given the ambiguity that comes from using human-language terms, which cannot always describe exactly what is meant.
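The first heuristic can be illustrated with a minimal sketch of one common fuzzy-ID3 style formulation; it is not necessarily the exact formulation used by Wang and Wang (2005). Class counts are replaced by sums of membership degrees, the entropy of each linguistic term of an attribute is computed from those fuzzy counts, and the attribute with the lowest weighted entropy is chosen. The sketch assumes the root node (every training document has membership 1), crisp class labels, and hypothetical attributes `goal` and `vote` with made-up degrees.

```python
import math
from typing import Dict, List, Tuple

# Training example: fuzzified attributes (attribute -> {term: degree}) and a crisp class label.
Example = Tuple[Dict[str, Dict[str, float]], str]


def entropy(weights_by_class: Dict[str, float]) -> float:
    """Shannon entropy (base 2) of class proportions computed from fuzzy counts."""
    total = sum(weights_by_class.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights_by_class.values() if w > 0)


def fuzzy_entropy_of_split(examples: List[Example], attribute: str) -> float:
    """Weighted fuzzy entropy after splitting (at the root) on every term of `attribute`."""
    terms = {t for values, _ in examples for t in values[attribute]}
    per_term: Dict[str, Dict[str, float]] = {}
    for term in terms:
        counts: Dict[str, float] = {}
        for values, label in examples:
            counts[label] = counts.get(label, 0.0) + values[attribute].get(term, 0.0)
        per_term[term] = counts
    grand_total = sum(sum(c.values()) for c in per_term.values())
    if grand_total == 0:
        return 0.0
    return sum(sum(c.values()) / grand_total * entropy(c) for c in per_term.values())


def best_attribute(examples: List[Example]) -> str:
    """Heuristic 1: pick the attribute whose split minimizes fuzzy entropy."""
    return min(examples[0][0].keys(), key=lambda a: fuzzy_entropy_of_split(examples, a))


# Hypothetical fuzzified training documents.
examples: List[Example] = [
    ({"goal": {"low": 0.1, "high": 0.9}, "vote": {"low": 0.8, "high": 0.2}}, "sport"),
    ({"goal": {"low": 0.2, "high": 0.8}, "vote": {"low": 0.3, "high": 0.7}}, "sport"),
    ({"goal": {"low": 0.9, "high": 0.1}, "vote": {"low": 0.2, "high": 0.8}}, "politics"),
    ({"goal": {"low": 0.8, "high": 0.2}, "vote": {"low": 0.7, "high": 0.3}}, "politics"),
]
print(best_attribute(examples))
```

In this toy data `goal` separates the two classes much better than `vote`, so its weighted fuzzy entropy is lower and it is selected for the split.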

Yuan and Shaw (1995) used the minimization of classification ambiguity in their model, on an example that classifies which sport to practice according to the weather conditions, described by four attributes. In this chapter, the authors study this heuristic and apply it to text classification.
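The second heuristic can be sketched as follows, based on the classification ambiguity measure of Yuan and Shaw (1995) as commonly stated. The class possibility given fuzzy evidence E is the subsethood S(E, C) = M(E ∩ C) / M(E), where M is the sum of membership degrees; the distribution is normalized so its maximum is 1, sorted in decreasing order, and its ambiguity is g(π) = Σ (π_i − π_{i+1}) ln i with π_{n+1} = 0. An attribute's ambiguity is the average of g over its linguistic terms, weighted by each term's relative fuzzy cardinality. The sketch assumes the root node, crisp class labels (membership 1 or 0), min as fuzzy intersection, and hypothetical data; Yuan and Shaw's significance-level and stopping thresholds are omitted.

```python
import math
from typing import Dict, List, Tuple

# Training example: fuzzified attributes (attribute -> {term: degree}) and a crisp class label.
Example = Tuple[Dict[str, Dict[str, float]], str]


def nonspecificity(possibilities: List[float]) -> float:
    """Ambiguity g(pi): normalize to max 1, sort descending, sum (pi_i - pi_{i+1}) * ln(i)."""
    top = max(possibilities)
    if top == 0:
        return 0.0
    pi = sorted((p / top for p in possibilities), reverse=True) + [0.0]
    return sum((pi[i] - pi[i + 1]) * math.log(i + 1) for i in range(len(pi) - 1))


def class_possibilities(evidence: List[float], labels: List[str], classes: List[str]) -> List[float]:
    """pi(C|E) as subsethood S(E, C) = M(E and C) / M(E), with min as fuzzy AND."""
    size = sum(evidence)
    if size == 0:
        return [0.0] * len(classes)
    return [
        sum(min(e, 1.0 if lab == c else 0.0) for e, lab in zip(evidence, labels)) / size
        for c in classes
    ]


def classification_ambiguity(examples: List[Example], attribute: str) -> float:
    """G(P|E) at the root: ambiguity of each term, weighted by its relative fuzzy cardinality."""
    labels = [label for _, label in examples]
    classes = sorted(set(labels))
    terms = sorted({t for values, _ in examples for t in values[attribute]})
    evidence = {t: [values[attribute].get(t, 0.0) for values, _ in examples] for t in terms}
    total = sum(sum(e) for e in evidence.values())
    if total == 0:
        return 0.0
    return sum(
        sum(e) / total * nonspecificity(class_possibilities(e, labels, classes))
        for e in evidence.values()
    )


# Hypothetical fuzzified training documents.
examples: List[Example] = [
    ({"goal": {"low": 0.1, "high": 0.9}, "vote": {"low": 0.8, "high": 0.2}}, "sport"),
    ({"goal": {"low": 0.2, "high": 0.8}, "vote": {"low": 0.3, "high": 0.7}}, "sport"),
    ({"goal": {"low": 0.9, "high": 0.1}, "vote": {"low": 0.2, "high": 0.8}}, "politics"),
    ({"goal": {"low": 0.8, "high": 0.2}, "vote": {"low": 0.7, "high": 0.3}}, "politics"),
]
print(min(["goal", "vote"], key=lambda a: classification_ambiguity(examples, a)))
```

A distribution in which one class clearly dominates has ambiguity near 0, while two equally possible classes give ln 2, so the heuristic favours the attribute whose terms point most clearly to a single category.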

Key Terms in this Chapter

Fuzzification: The process that transforms variables with continuous values into linguistic variables.

Branch: A path in the decision tree that starts at the root and follows the nodes downward until it reaches a leaf.

Dataset: A set of documents that a system uses first for its learning step and then to test its performance. It is therefore composed of two groups of documents: training documents and test documents.

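TFxIDF: A measure used to calculate the weight of an attribute in a document, obtained by multiplying the attribute's frequency in the document (TF) by its inverse document frequency in the collection (IDF).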

Tree: A technique used for supervised classification that assigns objects to predefined classes. Its advantage is that it can justify the classification chosen for a particular object, an advantage that other classification techniques rarely offer.

Classification: The process of assigning objects to classes or categories. These classes are either predefined by the user or generated by the system itself.

Attribute: An element representing the text to classify; it can be a simple word, a lemma, an n-gram, etc.
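The TFxIDF and fuzzification terms can be linked in a minimal sketch: each term is weighted by raw term frequency times ln(N / df), and the weight is then mapped to the linguistic terms low, medium, and high with triangular membership functions. The IDF variant, the normalization by the largest weight in the corpus, the three triangular functions, and the toy documents are all assumptions for illustration; the chapter does not specify them here.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TFxIDF weight of each term in each tokenized document: tf * ln(N / df) (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def triangle(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership function; shoulders allowed when left == peak or peak == right."""
    if x == peak:
        return 1.0
    if x < peak:
        return max(0.0, (x - left) / (peak - left)) if peak > left else 0.0
    return max(0.0, (right - x) / (right - peak)) if right > peak else 0.0


def fuzzify(weight: float, max_weight: float) -> Dict[str, float]:
    """Map a TFxIDF weight, scaled to [0, 1] by the corpus maximum, to low / medium / high."""
    x = weight / max_weight if max_weight > 0 else 0.0
    return {
        "low": triangle(x, 0.0, 0.0, 0.5),
        "medium": triangle(x, 0.0, 0.5, 1.0),
        "high": triangle(x, 0.5, 1.0, 1.0),
    }


# Hypothetical tokenized documents.
docs = [["goal", "match", "goal"], ["vote", "election", "match"], ["goal", "vote"]]
weights = tfidf(docs)
max_weight = max(w for doc in weights for w in doc.values())
print(fuzzify(weights[0]["goal"], max_weight))
```

Under this IDF variant a term that occurs in every document gets weight 0 and is fuzzified as fully "low"; smoothed IDF formulas avoid that if it is undesirable.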
