Background
A data stream is a fast-arriving, transient sequence of data values. Data streams’ characteristics (Domingos & Hulten, 2000; Gama, Rodrigues, & Aguilar-Ruiz, 2007) are summarized as follows:
- i) Data usually coming in at a detailed level
- ii) Streaming data arriving at a fast pace
- iii) Potentially unbounded observations of data
- iv) Possibly limited storage and memory resources for processing data streams
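Taken together, characteristics iii) and iv) mean that a stream generally cannot be stored in full and revisited, so any summary has to be updated as each value arrives. The following is a minimal sketch of that idea, not a method from the cited works: it maintains a running mean in constant memory. The names `running_mean` and `sensor_readings` are hypothetical, and the generator is only a finite stand-in for a real, possibly endless stream.

```python
from typing import Iterable, Iterator, Tuple


def running_mean(stream: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (count, mean) after each arriving value, using O(1) memory."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        # Incremental update: only the current count and mean are kept,
        # never the past values themselves.
        mean += (x - mean) / count
        yield count, mean


def sensor_readings(n: int) -> Iterator[float]:
    """Hypothetical stand-in for a data stream (a real one may never end)."""
    for i in range(n):
        yield float(i % 10)


# Consume the stream one value at a time; a summary is available at any point.
last = None
for last in running_mean(sensor_readings(1000)):
    pass
```

Because the summary is updated per value, memory use does not grow with the length of the stream, and no separate pass over the accumulated data is needed when the stream ends.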
High-volume, high-speed data streams challenge data miners to shift the data mining paradigm from mining in batches to mining incrementally. The incremental approach processes a small amount of data at a time, which lets data miners avoid an expensive pass over a potentially very large data set at the end of the stream. The criteria for a capable data stream clustering method include (Babcock, Babu, Datar, Motwani, & Widom, 2002; Barbará, 2002; Domingos & Hulten, 2001; Golab & Özsu, 2003):