Learning Concept Drift Using Adaptive Training Set Formation Strategy

Nabil M. Hewahi, Sarah N. Kohail
Copyright: © 2013 | Pages: 23
DOI: 10.4018/jtd.2013010103
Abstract

We live in a dynamic world, where change is a part of everyday life. When there is a shift in data, classification or prediction models need to adapt to the changes. In data mining, the phenomenon of change in data distribution over time is known as concept drift. In this research, the authors propose an adaptive supervised learning methodology with delayed labeling. As part of this methodology, the authors introduce the Adaptive Training Set Formation for Delayed Labeling algorithm (SFDL), which is based on selective training set formation. The proposed solution is considered the first systematic training set formation approach that takes the delayed labeling problem into account, and it can be used with any base classifier without changing the implementation or settings of that classifier. The authors test their implementation using synthetic and real datasets from various domains exhibiting different drift types (sudden, gradual, incremental, and recurring) with different speeds of change. The experimental results confirm an improvement in classification accuracy compared to an ordinary classifier for all drift types. The approach increases classification accuracy by 20% on average and by 56% in the best case, and it never performs worse than the ordinary classifier. Finally, a comparison is performed with four related methods for dealing with changing user interest over time and handling recurring drift: a simple incremental method, a time window approach with different window sizes, an instance weighting method, and the conceptual clustering and prediction (CCP) framework. Results indicate the effectiveness of the proposed method over the others in terms of classification accuracy.
Article Preview

1. Introduction

A key assumption in supervised learning is that the training data and the testing (or operational) data used with the classifier come from the same distribution. This means that the training data is representative and that the classifier will perform well on all future unseen instances. However, if the statistical properties of the target variable, which the model is trying to predict, change over time while the same classifier remains in use, its predictions will no longer be accurate. In machine learning, this phenomenon of change in data distribution over time is known as concept drift (Tsymbal, 2004). Concept drift has been listed among the ten most challenging problems facing researchers in data mining and machine learning (Yang & Wu, 2006).
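The effect of a drifting concept on a fixed classifier can be illustrated with a minimal synthetic sketch. Everything here is an illustrative assumption, not part of the paper: a one-feature concept whose labeling rule flips at drift time, and a "trained" classifier that simply memorizes the pre-drift rule.

```python
import random

random.seed(0)

# Illustrative concept: before drift, x > 0.5 means class 1;
# after a sudden drift, the rule flips to x < 0.5.
def true_label(x, drifted):
    return x < 0.5 if drifted else x > 0.5

# "Trained" classifier: memorizes the pre-drift concept.
def classifier(x):
    return x > 0.5

def accuracy(drifted, n=1000):
    points = [random.random() for _ in range(n)]
    return sum(classifier(x) == true_label(x, drifted) for x in points) / n

acc_before = accuracy(drifted=False)  # same distribution as in training
acc_after = accuracy(drifted=True)    # concept has drifted
```

On pre-drift data the classifier agrees with the concept on essentially every instance, while after the flip it is wrong on essentially every instance, which is the extreme case of the accuracy degradation described above.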

To illustrate the importance of this problem, consider a data mining application for spam filtering developed using the latest available spam dataset. As the filter adapts to today's types of spam email, spammers will try to bypass it by disguising their emails to look more legitimate. New spam will thus be generated, and the current application will have to rely on approximation to classify these unfamiliar patterns. As time goes by, this leads to lower accuracy, poorer performance, and incorrect knowledge. The dynamic nature of spam email means that any filter that is to remain successful at identifying spam must be updated over time (Delany, Cunningham, Tsymbal, & Coyle, 2005).

The main difficulty in mining non-stationary data such as spam, intrusions, stock market data, weather, and customer preferences is coping with the changing concept of the data. The fundamental processes generating most real-time data may change over years, months, or even seconds, at times drastically. Effective learning in environments with hidden contexts and concept drift requires a learning algorithm that can detect context changes without being explicitly informed about them, recover quickly from a context change and adjust itself to the new context, and make use of previous experience in situations where old contexts and their corresponding concepts reappear (Nishida & Yamauchi, 2009).
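The first of these requirements, detecting a context change without being told about it, can be sketched from the error stream alone: compare a recent windowed error rate against the long-run error rate and flag drift when the gap grows. This is a simplified stand-in for statistical detectors such as DDM, not the paper's detector; the class name, window size, and threshold are all illustrative.

```python
from collections import deque

class SimpleDriftDetector:
    """Flag drift when the error rate over a recent window rises well above
    the long-run error rate (illustrative; real detectors such as DDM use
    statistical confidence bounds instead of a fixed threshold)."""

    def __init__(self, window=50, threshold=0.2):
        self.recent = deque(maxlen=window)  # last `window` outcomes (1 = error)
        self.errors = 0
        self.total = 0
        self.threshold = threshold

    def update(self, correct):
        self.recent.append(0 if correct else 1)
        self.errors += 0 if correct else 1
        self.total += 1
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        recent_err = sum(self.recent) / len(self.recent)
        overall_err = self.errors / self.total
        return recent_err - overall_err > self.threshold

detector = SimpleDriftDetector()
# Stable phase: the classifier's predictions are correct.
flags = [detector.update(True) for _ in range(200)]
# Drift phase: predictions start failing, so the recent error rate climbs.
flags += [detector.update(False) for _ in range(100)]
```

During the stable phase no drift is signaled; shortly into the failing phase the windowed error rate pulls away from the long-run rate and the detector fires, which is the "detect context changes without being explicitly informed" behavior described above.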

In our research, we aim to contribute to solving the problem of concept drift in supervised learning when true labels become known only after a certain delay. The work presented in this paper is based on a training set formation strategy, which reforms the training set when concept drift is detected. Training set formation methods have an advantage over other adaptivity methods in that they do not require complicated parameterization and can be used for online learning by plugging in different types of base classifiers. We can summarize our contributions as follows:

  • We introduce the Adaptive Training Set Formation for Delayed Labeling algorithm (SFDL), which is based on selective training set formation. Our proposed solution is considered the first systematic training set formation approach that takes the delayed labeling problem into account. The algorithm can be used with any base classifier without changing the implementation or settings of that classifier;

  • We test our implementation using synthetic and real datasets from various domains exhibiting different drift types (sudden, gradual, incremental, and recurring) with different speeds of change. Experimental evaluation confirms an improvement in classification accuracy compared to an ordinary classifier for all drift types.
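The general idea of selective training set formation, as opposed to the authors' exact SFDL algorithm, which is specified later in the paper, can be sketched as follows: after drift is detected, keep the most recent labeled batch and re-admit only those older batches that a model trained on the recent data can still explain, so that recurring concepts are reused. The function names, the majority-label stand-in classifier, and the agreement threshold are all hypothetical.

```python
def reform_training_set(history, recent, model_factory, agreement=0.8):
    # Train a model on the most recent labeled batch (the current concept),
    # then re-admit older batches whose labels it can still reproduce.
    model = model_factory(recent)
    selected = list(recent)
    for batch in history:
        agree = sum(model(x) == y for x, y in batch) / len(batch)
        if agree >= agreement:
            selected.extend(batch)  # batch is consistent with the new concept
    return selected

# Hypothetical base classifier: predicts the majority label of its training data.
def majority_factory(data):
    ones = sum(y for _, y in data)
    majority = 1 if 2 * ones >= len(data) else 0
    return lambda x: majority

recent = [((i,), 1) for i in range(10)]     # current concept: label 1
old_same = [((i,), 1) for i in range(10)]   # older batch, same concept
old_clash = [((i,), 0) for i in range(10)]  # older batch, conflicting concept
new_train = reform_training_set([old_same, old_clash], recent, majority_factory)
```

Here the conflicting batch is discarded while the consistent one is re-admitted, so old experience is reused only when its concept recurs; the sketch also shows why such strategies need no changes to the base classifier itself.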

The rest of the paper is organized as follows: Section 2 presents related work and gives introductory background on the main topics of this research, namely the concept drift problem and the detectability of concept drift when labeling is delayed. Section 3 defines the training set formation strategy and summarizes the main contributions of our research. Section 4 describes our methodology and proposed algorithms. Experimental results are discussed in Section 5. Finally, Section 6 concludes the paper.
