Knowledge Discovery From Evolving Data Streams

Prasanna Lakshmi Kompalli (Gokaraju Rangaraju Institute of Engineering and Technology, India)
Copyright: © 2019 |Pages: 21
DOI: 10.4018/978-1-5225-3534-8.ch002

Abstract

Data coming from different sources is referred to as data streams. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Progress in technology has made it possible to monitor these data streams in real time, which has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, that it arrives in large amounts that are continuous and growing in nature, and that its characteristics might change over time, which is termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Isolating the relevant literature on these problems can be a grueling task for researchers and practitioners. This chapter tries to provide a solution by serving as an amalgamation of the techniques used for data stream mining with concept drift.

Background

Online learning and the design of new techniques for data stream mining are challenges faced by researchers. In data streams, data arrives rapidly, and if it is not processed instantly it is lost forever. Additionally, it is not practical to store this data in active storage media for a long period of time.

Extraction of knowledge from evolving data streams is becoming a key task for many researchers (Lakshmi & Reddy, 2010). Mining such evolving data streams has numerous applications and raises many daunting research issues. Handling concept drift in data streams can impact multidisciplinary domains (Khamassi et al., 2018). Before proceeding to further discussion, a study of the features related to data streams is presented.

Features of Data Streams

The following features act as constraints over a model built for working with streaming data.

  • 1. Voluminous data of unbounded size arrives continuously and rapidly, so it is not feasible to store it completely. Instead, a concise summary generated by modelling the data stream can be stored.

  • 2. Streams arrive fast, so each data element must be processed and discarded within a limited amount of time.

  • 3. Streams may change over time, which is called concept drift. Some applications, such as fraud detection, may need the complete knowledge gained from the beginning of the data stream, while others, such as sensor data streams, may need only current knowledge of the stream. Thus, past knowledge may be relevant or irrelevant depending on the application.

The constraints presented in features 1 and 2 limit the processing time of a streaming algorithm. Intense research is being carried out to develop algorithms that take a minimal amount of time to process each data element. Feature 3 presents a further constraint: data stream mining algorithms must sometimes work with only recent data and at other times with the whole data (Lakshmi & Reddy, 2015; Odysseas et al., 2015).
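The three features can be illustrated with a minimal Python sketch. It is not taken from the chapter: the class `StreamSummary`, the generator `simulated_stream`, the window size of 100, and the stream values are all hypothetical choices made for illustration. Each element is seen once and then discarded (features 1 and 2), while the sketch keeps two constant-memory summaries: a running mean over the whole stream and a mean over a bounded window of recent elements (feature 3).

```python
from collections import deque


class StreamSummary:
    """Constant-memory summary of a numeric stream (hypothetical helper)."""

    def __init__(self, window_size):
        self.count = 0      # number of elements seen so far
        self.mean = 0.0     # running mean over the whole stream
        # Bounded buffer: the oldest element is dropped automatically
        # once window_size elements are held.
        self.recent = deque(maxlen=window_size)

    def update(self, x):
        """Process one element in O(1) time; the element is not stored
        beyond the bounded window."""
        self.count += 1
        self.mean += (x - self.mean) / self.count  # incremental mean update
        self.recent.append(x)

    def recent_mean(self):
        """Mean of the elements currently in the window (O(window_size))."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0


def simulated_stream():
    """Assumed toy stream whose level shifts halfway, mimicking concept drift."""
    for _ in range(500):
        yield 1.0
    for _ in range(500):
        yield 5.0


summary = StreamSummary(window_size=100)
for value in simulated_stream():
    summary.update(value)

print("whole-stream mean:", summary.mean)
print("recent-window mean:", summary.recent_mean())
```

In this toy run the whole-stream mean ends close to 3.0, blending both levels, while the window mean reflects only the post-drift level of 5.0. Which of the two summaries is the useful one depends on the application, as feature 3 describes.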
