Data Stream Mining Using Ensemble Classifier: A Collaborative Approach of Classifiers

Snehlata Sewakdas Dongre (Ghrce Nagpur, India) and Latesh G. Malik (Ghrce Nagpur, India)
Copyright: © 2017 | Pages: 14
DOI: 10.4018/978-1-5225-0489-4.ch013

Abstract

A data stream is a massive volume of data generated continuously and at a rapid rate by applications such as call detail records, log records, and sensor applications. Data stream mining has attracted the attention of many researchers. A growing problem in data streams is the handling of concept drift: a good algorithm should adapt to changes in the data and handle concept drift properly. An ensemble classification method is a group of classifiers that work in a collaborative manner; instead of a single classifier, the group is used to improve classification accuracy. This chapter aims to familiarize the reader with the data stream domain and data stream mining, to cover the aspects of data stream classification, and to discuss techniques that use collaborative filtering for data stream mining. Collaborative filtering plays an important role here in how the different classifiers within the ensemble work together to achieve a common goal.
Introduction

The need to handle large amounts of data gave rise to the field of data mining, whose algorithms are analyzed in terms of processing time and memory on static data sets. Data mining approaches can handle large data sets, but they fail when the data arrive continuously. Usually, a previously trained model cannot be updated when a new type of data arrives; the model cannot handle the new data and has to be retrained on it. Retraining the model every time new data arrive is a very tedious process. Stream data mining has recently emerged in response to this continuous supply of data. Data stream algorithms are able to deal with huge volumes of data, even when the nature of the data changes, which traditional data mining does not address.
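The contrast between retraining and updating can be made concrete with a small sketch. The class below is a hypothetical incremental nearest-centroid classifier, chosen only for illustration and not taken from the chapter: it keeps one running mean per class and absorbs each new example, including examples of a previously unseen class, without revisiting old data.

```python
class IncrementalCentroid:
    """Nearest-centroid classifier updated one example at a time (illustrative)."""

    def __init__(self):
        self.count = {}  # label -> number of examples seen so far
        self.mean = {}   # label -> running mean of the feature vectors

    def learn_one(self, x, y):
        # Update the running mean of class y in place; no old data is stored.
        n = self.count.get(y, 0) + 1
        old = self.mean.get(y, [0.0] * len(x))
        self.mean[y] = [m + (xi - m) / n for m, xi in zip(old, x)]
        self.count[y] = n

    def predict_one(self, x):
        # Return the label whose centroid is closest to x (None if untrained).
        if not self.mean:
            return None

        def sq_dist(label):
            return sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean[label]))

        return min(self.mean, key=sq_dist)


def demo():
    model = IncrementalCentroid()
    for x, y in [((0, 0), "a"), ((1, 1), "a"), ((10, 10), "b"), ((9, 9), "b")]:
        model.learn_one(x, y)
    query = (0, 8)
    before = model.predict_one(query)  # class "c" has not been seen yet
    # A new type of data arrives later; the model is updated, not retrained.
    for x, y in [((0, 10), "c"), ((1, 9), "c")]:
        model.learn_one(x, y)
    after = model.predict_one(query)
    return before, after
```

The memory used grows with the number of classes and features, not with the number of examples, which is the property a stream setting needs. The class and method names are assumptions made for this sketch.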

A data stream is a massive volume of data generated in an uncontrolled manner and at a rapid rate by applications such as call detail records, log records, sensor applications, emails, blogs, and Twitter posts. Data stream mining has gained the attention of many researchers and has become an active research topic. Because the volume is so large, it is very difficult to store all of the data, analyze them, and use their characteristics in the future. Massive volume and high speed are the characteristics that make data streams difficult to handle with traditional data mining techniques. In data mining the underlying data distribution is static, which is not the case for a data stream. In a data stream the underlying distribution changes, because the data may span a long period of time and the sources that generate them may themselves change. A data stream is therefore not stable but dynamic in nature: it changes with time, and these changes are known as concept drift. A growing problem in data streams is the handling of concept drift. These new challenges have given rise to the notion of stream data, and several algorithms have been developed to handle data streams.
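To illustrate concept drift and why a group of classifiers can help, here is a minimal sketch under stated assumptions. It is not the chapter's algorithm: it uses a generic chunk-based, accuracy-weighted ensemble, a toy one-dimensional stream whose labeling rule flips halfway through, and a deliberately simple base learner. All names (`BinMajority`, `ChunkEnsemble`, `make_chunks`, `run_demo`) are hypothetical.

```python
from collections import Counter


class BinMajority:
    """Toy base learner: majority label per bin of a 1-D feature in [0, 1)."""

    def __init__(self, chunk, bins=4):
        self.bins = bins
        votes = [Counter() for _ in range(bins)]
        for x, y in chunk:
            votes[self._bin(x)][y] += 1
        self.labels = [v.most_common(1)[0][0] if v else 0 for v in votes]

    def _bin(self, x):
        return min(int(x * self.bins), self.bins - 1)

    def predict(self, x):
        return self.labels[self._bin(x)]


def accuracy(model, chunk):
    return sum(model.predict(x) == y for x, y in chunk) / len(chunk)


class ChunkEnsemble:
    """Keeps the best few chunk-trained models, weighted by recent accuracy."""

    def __init__(self, max_members=3):
        self.max_members = max_members
        self.members = []  # list of (model, weight) pairs

    def predict(self, x):
        if not self.members:
            return 0
        score = Counter()
        for model, weight in self.members:
            score[model.predict(x)] += weight  # weighted vote
        return score.most_common(1)[0][0]

    def update(self, chunk):
        # Train one new member on the chunk, then weight every member by its
        # accuracy on that newest chunk and keep only the strongest ones.
        models = [m for m, _ in self.members] + [BinMajority(chunk)]
        scored = sorted(((m, accuracy(m, chunk)) for m in models),
                        key=lambda mw: mw[1], reverse=True)
        self.members = scored[:self.max_members]


def make_chunks(n_chunks=6, size=20, drift_chunk=3):
    # Before the drift y = 1 when x >= 0.5; afterwards the rule is inverted.
    chunks = []
    for c in range(n_chunks):
        chunk = []
        for i in range(c * size, (c + 1) * size):
            x = ((i * 7) % 20) / 20
            y = int(x >= 0.5) if c < drift_chunk else int(x < 0.5)
            chunk.append((x, y))
        chunks.append(chunk)
    return chunks


def run_demo():
    chunks = make_chunks()
    static = BinMajority(chunks[0])  # trained once, never updated
    ensemble = ChunkEnsemble()
    ensemble.update(chunks[0])
    ens_acc, static_acc = [], []
    for chunk in chunks[1:]:
        ens_acc.append(accuracy(ensemble, chunk))     # test first ...
        static_acc.append(accuracy(static, chunk))
        ensemble.update(chunk)                        # ... then train
    return ens_acc, static_acc
```

In this toy run both models fail on the first chunk after the flip, but the ensemble recovers on the next chunk because the outdated members receive zero weight, while the static model stays wrong. Weighting a new member by its accuracy on its own training chunk is optimistic and is used here only to keep the sketch short.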

Generally, in data mining the data can be stored in memory, because their size does not exceed the available storage, and processing time is not an issue. In data stream mining, however, the volume of data exceeds the storage capacity, and processing all of it would take more time than can be afforded. Mining data streams is therefore subject to both space and time limitations. The data in a stream may also evolve over time, so an algorithm that adapts automatically is required to handle the evolving data. An algorithm that adapts to changes and needs little time and memory but gives poor accuracy will not be accepted. The main issues in data stream mining are therefore time, space, concept drift, and, most important, accuracy. A good algorithm should give high accuracy, adapt to changes, require little time to process the data, and demand very little memory.
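One way to see how space, time, drift, and accuracy interact is a test-then-train loop over a bounded sliding window. The sketch below is an assumption-laden illustration rather than a method from the chapter: the learner is a 1-nearest-neighbour rule over the last `window_size` examples, the stream is a toy one-dimensional sequence whose labels flip at `drift_at`, and the function names are hypothetical.

```python
from collections import deque


def make_stream(n=200, drift_at=100):
    # Toy stream: y = 1 when x >= 0.5 before the drift, inverted afterwards.
    stream = []
    for i in range(n):
        x = ((i * 7) % 20) / 20
        y = int(x >= 0.5) if i < drift_at else int(x < 0.5)
        stream.append((x, y))
    return stream


def prequential_accuracy(stream, window_size):
    # Bounded memory: the deque silently discards the oldest example
    # once window_size examples are stored.
    window = deque(maxlen=window_size)
    correct = total = 0
    for x, y in stream:
        if window:
            # Test: predict with the nearest stored example (cost O(window_size)).
            _, predicted = min(window, key=lambda ex: abs(ex[0] - x))
            correct += predicted == y
            total += 1
        window.append((x, y))  # Train: store the labeled example.
    return correct / total if total else 0.0


stream = make_stream()
acc_small = prequential_accuracy(stream, window_size=20)
acc_large = prequential_accuracy(stream, window_size=200)
print(f"window=20:  {acc_small:.2f}")
print(f"window=200: {acc_large:.2f}")
```

On this stream the small window forgets the outdated concept and scores higher than the window that keeps everything, while also using less memory and less time per example. On a stream without drift the trade-off can go the other way, so the window size is a design choice rather than a fixed rule.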
