XML Tree Classification on Evolving Data Streams

Albert Bifet, Ricard Gavaldà

Source Title: XML Data Mining: Models, Methods, and Applications

DOI: 10.4018/978-1-61350-356-0.ch009

Abstract

Nowadays, advanced analysis of data streams is quickly becoming a key area of data mining research, as the number of applications demanding such processing increases. Online mining of data streams that evolve over time, that is, when concepts drift or change completely, is becoming one of the core issues. At the same time, closure-based mining on relational data has recently provided some interesting algorithmic developments as well as practical uses. In this chapter we show how to use closure-based mining to drastically reduce the number of attributes in XML tree classification tasks. Moreover, using maximal frequent trees, we reduce the number of attributes needed in tree classification even further, in many cases without losing accuracy. We present a general framework to classify XML trees using subtree occurrence, composing a Tree XML Closed Frequent Miner with a classifier algorithm. We also present specific methods that can adaptively mine closed patterns from data streams that change over time.
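The idea of mining closed patterns from a stream that changes over time can be illustrated, very loosely, by recomputing closed patterns over a fixed-size sliding window so that old data is forgotten. This is a minimal sketch under simplifying assumptions: it uses itemsets rather than trees, a brute-force miner, and a fixed window size. It is not the adaptive method presented in the chapter, and the names `closed_itemsets` and `WindowedClosedMiner` are hypothetical.

```python
from collections import deque
from itertools import combinations


def closed_itemsets(transactions, min_support):
    """Brute-force closed frequent itemsets; only viable for tiny alphabets."""
    items = sorted(set().union(*transactions))
    support = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t)
            if s >= min_support:
                support[frozenset(cand)] = s
    # Closed: no proper superset has the same support.
    return {p: s for p, s in support.items()
            if not any(p < q and support[q] == s for q in support)}


class WindowedClosedMiner:
    """Hypothetical helper (fixed window, not adaptive): remembers only the
    last `window` transactions, so patterns from before a change disappear."""

    def __init__(self, window, min_support):
        self.buffer = deque(maxlen=window)  # oldest transaction drops out first
        self.min_support = min_support

    def add(self, transaction):
        self.buffer.append(frozenset(transaction))

    def closed(self):
        return closed_itemsets(list(self.buffer), self.min_support)


miner = WindowedClosedMiner(window=3, min_support=2)
stream = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"c", "d"}]
for i, t in enumerate(stream, start=1):
    miner.add(t)
    if i in (3, 5):
        print(i, miner.closed())
```

After the third transaction the only closed pattern is {a, b} (support 3). After the fifth, the window holds only the last three transactions and the closed patterns become {c} (support 3) and {c, d} (support 2): the model has followed the change. A fixed window forces a choice between reacting quickly and remembering enough data, which is one motivation for adaptive approaches.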

Introduction

Pattern classification and frequent pattern discovery have possibly become the most important data mining tasks over the last decade. Nowadays, they are becoming harder, as the size of the pattern datasets is increasing, data often comes from sequential, streaming sources, and we cannot assume that data has been generated from a static distribution. If we want accuracy in the results of our algorithms, we have to consider that the distribution that generates data may vary over time, often in an unpredictable and drastic way.

Tree mining is becoming an important field of research due, among other reasons, to the fact that XML patterns are tree patterns and that XML has become a standard for information representation and exchange over the Internet. The amount of XML data is growing, and it will soon constitute one of the largest collections of human knowledge. Other applications of tree mining appear in chemical informatics, computer vision, text retrieval, bioinformatics, and Web analysis (Nayak et al., 2009; Denoyer, Gallinari, & Vercoustre, 2006). XML tree classification (Denoyer & Gallinari, 2004) has traditionally been done using information retrieval techniques that treat the labels of nodes as bags of words (Campos, Fernandez-Luna, Huete, & Romero, 2008; Yang & Zhang, 2008), without taking into account the structure of the trees (Ceci & Appice, 2006). With the development of frequent tree miners, classification methods using frequent trees appeared (Zaki & Aggarwal, 2003; Kudo & Matsumoto, 2004; Collins & Duffy, 2001; Kashima & Koyanagi, 2002). Recently, closed frequent miners were proposed (Chi, Xia, Yang, & Muntz, 2005; Termier et al., 2008; Arimura & Uno, 2005), and using them for classification tasks is the next natural step (Kutty, Tran, Nayak, & Li, 2008; Candillier et al., 2007).
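The limitation of treating node labels as a bag of words can be seen on two tiny trees. The sketch below assumes a hypothetical representation of a tree as a `(label, children)` tuple. The two trees have identical label multisets but different shapes: the bag-of-labels view cannot tell them apart, while even a crude structural feature (counts of parent-to-child label pairs, chosen here only for illustration and not the subtree features used in this chapter) can.

```python
from collections import Counter

# A tree is a (label, children) pair, where children is a tuple of trees.


def label_bag(tree):
    """Bag of node labels: the bag-of-words view that ignores structure."""
    label, children = tree
    bag = Counter([label])
    for child in children:
        bag += label_bag(child)
    return bag


def edge_bag(tree):
    """Bag of (parent label, child label) pairs: a minimal structural view."""
    label, children = tree
    bag = Counter()
    for child in children:
        bag[(label, child[0])] += 1
        bag += edge_bag(child)
    return bag


# Same labels, different shape: <name> sits under <author> in t1
# but under <title> in t2.
t1 = ("book", (("title", ()), ("author", (("name", ()),))))
t2 = ("book", (("title", (("name", ()),)), ("author", ())))

print(label_bag(t1) == label_bag(t2))  # True
print(edge_bag(t1) == edge_bag(t2))    # False
```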

The main advantage of using closed patterns is that they still contain the essential information about frequent patterns while eliminating redundant ones. In this chapter we show how closure-based mining can be used to drastically reduce the number of attributes in tree classification tasks. We also show how to use maximal frequent trees to reduce the number of attributes needed in tree classification even further, in many cases without losing accuracy.
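Closed and maximal patterns are easiest to see on itemsets. The sketch below substitutes itemsets for trees, assuming the definitions carry over with "proper supertree" in place of "proper superset". A frequent pattern is closed if no proper super-pattern has the same support, and maximal if no proper super-pattern is frequent at all, so every maximal pattern is also closed. The function names and the toy data are hypothetical, and the brute-force enumeration is only suitable for a handful of items.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force support counting for every itemset with support >= min_support."""
    items = sorted(set().union(*transactions))
    support = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t)
            if s >= min_support:
                support[frozenset(cand)] = s
    return support


def closed(support):
    """Frequent itemsets with no proper superset of equal support."""
    return {p: s for p, s in support.items()
            if not any(p < q and support[q] == s for q in support)}


def maximal(support):
    """Frequent itemsets with no frequent proper superset."""
    return {p for p in support if not any(p < q for q in support)}


def support_from_closed(pattern, closed_support):
    """Recover a frequent itemset's support from the closed itemsets alone."""
    return max((s for q, s in closed_support.items() if pattern <= q), default=0)


data = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
freq = frequent_itemsets(data, min_support=2)
clos = closed(freq)
maxi = maximal(freq)
print(len(freq), len(clos), len(maxi))  # 7 3 1
assert all(support_from_closed(p, clos) == s for p, s in freq.items())
```

On this data 7 itemsets are frequent, 3 are closed ({c}, {a, b}, {a, b, c}) and only 1 is maximal ({a, b, c}). The final assertion checks that the support of every frequent itemset can be recovered from the closed ones, as the largest support among its closed supersets, which is the sense in which closed patterns keep the essential information. The single maximal itemset, by contrast, no longer records that {a, b} has support 3, so fewer attributes come at the price of discarding some support information.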

We present a general framework to classify XML trees based on subtree occurrence. It is formed by composing a tree XML closed frequent miner with a classification algorithm. We discuss specific methods for adaptively dealing with this problem on data streams that vary over time.
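The composition can be sketched as two stages: each tree becomes a binary attribute vector recording which patterns occur in it, and any classifier is then trained on those vectors. This is a minimal sketch under several assumptions: trees are hypothetical `(label, children)` tuples; occurrence is taken to mean an unordered, parent-child-preserving embedding (one of several possible notions, not necessarily the chapter's); the patterns are hand-picked stand-ins for a closed frequent miner's output; and a 1-nearest-neighbour rule stands in for whatever classifier is plugged in. All function names are hypothetical.

```python
# A tree is a (label, children) pair, where children is a tuple of trees.


def matches_at(pattern, node):
    """True if `pattern` can be embedded with its root at `node`, mapping
    each pattern child to a distinct child of the node (unordered)."""
    if pattern[0] != node[0]:
        return False
    return assign(list(pattern[1]), list(node[1]))


def assign(pattern_children, node_children):
    """Backtracking search for an injective child-to-child matching."""
    if not pattern_children:
        return True
    first, rest = pattern_children[0], pattern_children[1:]
    for i, candidate in enumerate(node_children):
        remaining = node_children[:i] + node_children[i + 1:]
        if matches_at(first, candidate) and assign(rest, remaining):
            return True
    return False


def occurs(pattern, tree):
    """True if `pattern` occurs at the root of `tree` or at any descendant."""
    return matches_at(pattern, tree) or any(occurs(pattern, c) for c in tree[1])


def to_features(tree, patterns):
    """Binary attribute vector: 1 if the i-th pattern occurs in the tree."""
    return tuple(int(occurs(p, tree)) for p in patterns)


def predict_1nn(train, features):
    """1-nearest neighbour under Hamming distance over the binary attributes."""
    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))
    return min(train, key=lambda pair: hamming(pair[0], features))[1]


# Hand-picked patterns standing in for the output of a closed frequent miner.
patterns = [("author", (("name", ()),)), ("title", (("name", ()),))]
t1 = ("book", (("title", ()), ("author", (("name", ()),))))
t2 = ("book", (("title", (("name", ()),)), ("author", ())))
train = [(to_features(t1, patterns), "A"), (to_features(t2, patterns), "B")]
query = ("book", (("author", (("name", ()),)),))
print(predict_1nn(train, to_features(query, patterns)))  # A
```

The backtracking in `assign` matters when siblings share labels: a greedy choice of the first matching child can block a later pattern child, whereas the search tries every injective assignment. Fewer patterns (closed or maximal instead of all frequent ones) means shorter vectors and cheaper occurrence tests for every tree that is classified.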

The rest of the chapter is organized as follows. Section “Background” discusses the data stream setting and briefly mentions some previous XML classification methods. Sections “Frequent Pattern Compression” and “Classification using Compressed Frequent Patterns” introduce a tree closure operator and the properties of it that are needed for XML classification. Section “XML Tree Classification framework on data streams” presents the tree classification framework and introduces an adaptive closed frequent mining method. Experimental results are discussed in Section “Experimental Evaluation”. Finally, Section “Conclusions and Future Works” concludes this chapter.
