Novel Class Detection with Concept Drift in Data Stream - AhtNODE

Jay Gandhi, Vaibhav Gandhi
Copyright: © 2020 | Pages: 12
DOI: 10.4018/IJDST.2020010102

Abstract

Data stream mining has become an active research topic and an area of growing interest among knowledge discovery methods. Many applications process stream data, such as device networks and electronic networks. Our approach, AhtNODE (Adaptive Hoeffding Tree based NOvel class DEtection), detects novel classes in the presence of concept drift in streaming data. It addresses three challenges of streaming data: infinite length, concept drift, and concept evolution. The approach automatically detects a novel class whenever it arrives in the data stream. It is a multi-class approach that distinguishes a novel class from the existing classes. The authors apply the Adaptive Hoeffding Tree (AHT) as the classification model, which also handles concept drift; previous approaches used ensemble models for this purpose. With AHT, classification is done in a single pass. Experimental results demonstrate the effectiveness of AhtNODE compared with an existing ensemble classifier in terms of classification accuracy, speed, and memory usage.
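
The abstract names the Adaptive Hoeffding Tree as the classifier. A Hoeffding tree grows incrementally and splits a leaf only when the Hoeffding bound guarantees, with probability 1 - delta, that the attribute that currently looks best really is best. The sketch shows only this standard split test. The drift handling of the adaptive variant and AhtNODE's own novel class logic are not included. The function names and the tie threshold of 0.05 are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` is within epsilon of the mean of `n` observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Split a leaf once the best attribute's gain beats the runner-up's by more
    than epsilon, or when epsilon has shrunk below a tie-breaking threshold
    (the 0.05 default is an assumption)."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


# Two classes -> information gain lies in [0, log2(2)] = [0, 1].
print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))  # 0.0898
print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))   # True
print(should_split(0.30, 0.15, 1.0, 1e-7, 100))    # False
```

Because epsilon shrinks as n grows, a leaf waits for more instances when the top two attributes are close. This lets the tree be built in one pass without storing the stream.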

1. Introduction

Data stream mining is the process of extracting information from continuous data that arrives rapidly. Many applications generate stream data, and because computing power and memory are limited, mining streaming data must be performed in a single pass (or a small number of passes). Examples of stream data include network traffic, telephone conversations, bank or ATM transactions, online shopping, web searches over the internet, and weather prediction.

Data stream classification has become an active field of research over the years. It poses many challenges that must be addressed. This article addresses three of them: (1) Infinite length: instances arrive at a rapid rate, so the model cannot store all of the data for classification, and the stream must instead be divided into chunks. (2) Concept drift: the prediction target changes over time. For example, a model trained on summer data cannot be used unchanged in winter. (3) Concept evolution: in stream data, the number of classes to predict is not always fixed, and as concept drift occurs, a new class may arrive. Most existing algorithms address only two of these problems, infinite length and concept drift (Aggarwal, Han, Wang, & Yu, 2006). Because a data stream is infinite in length, it cannot be stored in main memory, so it is not possible to use all previous (historical) data for training (Hulten, Spencer, & Domingos, 2001). Traditional multi-pass algorithms perform poorly on data streams for this reason: they would need unbounded storage to hold and process a continuous stream (Scholz & Klinkenberg, 2005).
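
The infinite-length challenge is usually met by consuming the stream in fixed-size chunks, holding only the current chunk in memory. The sketch illustrates this with a chunking generator and a deliberately naive drift signal that compares error rates of consecutive chunks. Both helpers, `chunks` and `drift_suspected`, are hypothetical names. The chunk-to-chunk error comparison and its 0.1 tolerance are assumptions for illustration only. According to the abstract, AhtNODE handles drift inside the Adaptive Hoeffding Tree rather than this way.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunks(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of `size` items from a possibly unbounded stream.

    Only the current chunk is held in memory; the last chunk of a finite
    stream may be shorter than `size`.
    """
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def drift_suspected(prev_error: float, curr_error: float,
                    tolerance: float = 0.1) -> bool:
    """Naive drift signal (illustrative assumption): flag drift when the error
    rate on the newest chunk exceeds the previous chunk's by more than
    `tolerance`."""
    return curr_error - prev_error > tolerance


# An infinite stream (itertools.count) is consumed one chunk at a time.
stream_chunks = chunks(count(), 4)
print(next(stream_chunks))          # [0, 1, 2, 3]
print(next(stream_chunks))          # [4, 5, 6, 7]
print(drift_suspected(0.05, 0.30))  # True
```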

The variable that the model attempts to predict is known as the concept (the target label). In a streaming scenario, the data concept may change over time. The challenge, therefore, is to handle concept drift by building a model that continuously adapts to change and learns from the most recent concept. However, most existing algorithms ignore the third challenge, concept evolution, which occurs when a new (novel) class appears in the data stream (Ren, Liao, Zhu, Li, Liu, & Li, 2018). To classify new classes, a model must be able to identify change and detect a novel class whenever one is present in the data stream. A novel class should be detected before the model is trained with its labeled instances. In stream mining, the assumption of a fixed set of classes does not always hold, because a new class may arrive at any time as the concept changes. When a new class emerges, most existing classification techniques ignore this important aspect of novel class arrival.
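
Novel class detection requires flagging a new class before any of its instances are labeled. One common heuristic is sketched here purely as an assumption, since this preview does not describe AhtNODE's detection procedure. It summarizes each known class by a centroid and radius and treats an instance outside every class boundary as an outlier. A novel class is declared only when at least q outliers accumulate that are, on average, closer to each other than to any known class. The helper names and the parameter `q` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Boundary = Tuple[List[float], float]  # (centroid, radius)


def fit_class_boundaries(X: List[Point], y: List[str]) -> Dict[str, Boundary]:
    """Summarize each known class by its centroid and the largest distance
    from any training point of that class to the centroid."""
    by_class: Dict[str, List[Point]] = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    boundaries: Dict[str, Boundary] = {}
    for label, pts in by_class.items():
        dim = len(pts[0])
        centroid = [sum(p[i] for p in pts) / len(pts) for i in range(dim)]
        radius = max(math.dist(p, centroid) for p in pts)
        boundaries[label] = (centroid, radius)
    return boundaries


def is_outlier(x: Point, boundaries: Dict[str, Boundary]) -> bool:
    """An instance is an outlier if it lies outside every known class boundary."""
    return all(math.dist(x, c) > r for c, r in boundaries.values())


def detect_novel_class(buffer: List[Point], boundaries: Dict[str, Boundary],
                       q: int = 3) -> bool:
    """Declare a novel class when at least q buffered outliers are each, on
    average, closer to the other outliers than to the nearest known centroid."""
    if q < 2:
        raise ValueError("q must be at least 2")
    outliers = [x for x in buffer if is_outlier(x, boundaries)]
    if len(outliers) < q:
        return False
    centroids = [c for c, _ in boundaries.values()]
    cohesive = 0
    for i, x in enumerate(outliers):
        others = [o for j, o in enumerate(outliers) if j != i]
        mean_to_outliers = sum(math.dist(x, o) for o in others) / len(others)
        nearest_known = min(math.dist(x, c) for c in centroids)
        if mean_to_outliers < nearest_known:
            cohesive += 1
    return cohesive >= q


# Two known classes, then a tight group of unseen points far from both.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (11, 0), (10, 1), (11, 1)]
y = ["a"] * 4 + ["b"] * 4
bounds = fit_class_boundaries(X, y)
new_points = [(5, 10), (5.5, 10), (5, 10.5), (5.5, 10.5), (0.5, 0.5)]
print(detect_novel_class(new_points, bounds))  # True
```

Requiring both a minimum count and cohesion helps keep isolated noise points or scattered drifted instances from being mistaken for a new class.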
