1. Introduction
Data stream mining is a method for extracting information from continuous, rapidly arriving data. Because computing power and memory are limited, applications that generate stream data must be mined in a single pass (or a small number of passes). Examples of such stream data include network traffic, telephone conversations, bank or ATM transactions, online shopping, web searches over the internet, and weather prediction.
Data stream classification has become an active field of research over the years, and it poses many challenges that remain to be addressed. This article addresses three of them. (1) Infinite length: instances arrive at a rapid rate, so the model cannot store all of the data for classification, and the stream must be divided into chunks. (2) Concept drift: the prediction target changes over time. For example, a model trained on summer data cannot be reused unchanged in winter. (3) Concept evolution: in stream data, the number of classes to predict is not always fixed. As concepts drift, a new class may also appear. Most existing algorithms address only two of these problems, infinite length and concept drift (Aggarwal, Han, Wang, & Yu, 2006). Because a data stream is infinite in length, it cannot be stored in main memory, so it is not possible to use all of the previous (historical) data for training (Hulten, Spencer, & Domingos, 2001). Traditional multi-pass algorithms perform poorly on data streams for this reason: they would need unbounded space to store and process the continuous stream (Scholz & Klinkenberg, 2005).
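The idea of dividing an unbounded stream into chunks can be illustrated with a minimal sketch. It is not the article's algorithm; the helper name `chunked` and the `chunk_size` parameter are assumptions introduced here for illustration. The point is only that each instance is read once and at most one chunk is held in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most chunk_size instances.

    Hypothetical helper: the stream is consumed in a single pass and
    only the current chunk is kept in memory.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(stream)
    while True:
        # islice pulls the next chunk_size items without buffering the rest.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # the (finite, for this demo) stream is exhausted
        yield chunk


if __name__ == "__main__":
    # A generator stands in for a real stream source.
    readings = (i * 0.5 for i in range(7))
    for chunk in chunked(readings, chunk_size=3):
        print(chunk)
```

In a real setting, each chunk would be handed to a classifier for training or prediction and then discarded, which keeps memory use bounded regardless of how long the stream runs.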
The target variable that the model attempts to predict is known as the concept (the target label to predict). In a stream, the concept may change over time. The challenge is therefore to handle concept drift by developing a model that continuously adapts to change and learns from the most recent concept. Most existing algorithms, however, ignore the third challenge, concept evolution, which occurs when a novel class appears in the data stream (Ren, Liao, Zhu, Li, Liu, & Li, 2018). To classify instances of a new class, the model must be able to identify the change and detect the novel class whenever it is present in the stream. The novel class should be detected before the model is trained with labeled instances of that class. In stream mining, the assumption of a fixed set of classes does not always hold, because a new class may arrive at any time as the concept changes. When a new class does emerge, most existing classification techniques fail to account for it.
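As a rough illustration of what detecting a novel class before labels arrive might involve, the sketch below summarizes each known class by a centroid and a radius, then flags an instance as a novel-class candidate when it falls outside every known class region. This is a simplified stand-in chosen for illustration, not the method of this article or of the cited works. The function names, the `slack` parameter, and the centroid-plus-radius summary are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Sequence[float]
Sphere = Tuple[List[float], float]  # (centroid, radius)


def fit_class_spheres(chunk: List[Tuple[Point, Hashable]]) -> Dict[Hashable, Sphere]:
    """Summarize each labeled class in a chunk by its centroid and radius.

    Hypothetical summary: the radius is the largest distance from the
    centroid to any training point of that class.
    """
    by_label: Dict[Hashable, List[Point]] = defaultdict(list)
    for features, label in chunk:
        by_label[label].append(features)

    spheres: Dict[Hashable, Sphere] = {}
    for label, points in by_label.items():
        dim = len(points[0])
        centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        radius = max(math.dist(p, centroid) for p in points)
        spheres[label] = (centroid, radius)
    return spheres


def is_novel_candidate(x: Point, spheres: Dict[Hashable, Sphere], slack: float = 1.5) -> bool:
    """Return True if x lies outside every known class region.

    slack is an assumed tolerance factor that widens each radius so that
    ordinary drift within a known class is not flagged as a new class.
    """
    return all(math.dist(x, centroid) > slack * radius
               for centroid, radius in spheres.values())


if __name__ == "__main__":
    labeled_chunk = [
        ((0.0, 0.0), "a"), ((0.0, 2.0), "a"),
        ((10.0, 10.0), "b"), ((10.0, 12.0), "b"),
    ]
    spheres = fit_class_spheres(labeled_chunk)
    print(is_novel_candidate((0.0, 1.2), spheres))  # inside class "a" region
    print(is_novel_candidate((5.0, 5.0), spheres))  # far from both classes
```

A single outlying instance is weak evidence of a new class. A practical detector would likely require several mutually close candidates before declaring one, and a class summarized from a single point has radius zero under this scheme, so it would need special handling.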