Processing Data Streams

Parimala N. (Jawaharlal Nehru University, India)
Copyright: © 2020 | Pages: 20
DOI: 10.4018/978-1-7998-2975-1.ch002


A data stream is a real-time, continuous sequence of data or events. Processing a data stream differs from processing static data that resides in a database: stream data is seen only once and is too voluminous to store in its entirety. Instead, a small portion of the data, called a window, is considered at a time for querying, computing aggregates, and so on. In this chapter, the authors explain the different types of window movement over incoming data. A query on a stream is executed repeatedly on the new data exposed by the movement of the window. SQL extensions for handling continuous queries are also addressed. Streams that contain transactional data as well as those that contain events are considered.
Chapter Preview

Data Streams

A data stream is a continuous, real-time, ordered sequence of data values produced by a data source (Golab & Özsu, 2003a). Stream data is treated as a sequence of relational tuples or events ordered by time. The data values produced by a data source are seen only once, as old data moves out and new data takes its place. Further, the data changes constantly as new elements are produced. It does not take the form of a persistent relation (Raghavan & Henzinger, 1999).

Typical applications that generate stream data include temperature sensors, financial tickers, stock markets, and call detail records in telecommunications. The data is generated continuously and in real time. The applications generate new data at every tick, making the data large in volume and never-ending. In other words, the data is unbounded and voluminous. It is too large to store entirely, and only a portion of it is visible at any given point in time.

The data is considered ordered. The ordering may be explicit or implicit. Explicitly, a timestamp attribute associated with the data can be used to order it. Implicitly, the arrival time can be used for ordering. A basic property of a data stream is that it is generated continuously and automatically by various data sources. The data used to evaluate a query therefore changes constantly, since newly generated data is part of each new evaluation.
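The difference between the two orderings can be illustrated with a minimal Python sketch. The names StreamItem, receive, and arrival_seq are hypothetical and chosen only for illustration; the sketch assumes that each item carries a source-assigned timestamp and that the receiver stamps an arrival sequence number.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical item type: `timestamp` is set by the source (explicit order),
# `arrival_seq` is assigned by the receiver (implicit order).
@dataclass
class StreamItem:
    value: float
    timestamp: float
    arrival_seq: int

_arrival_counter = count()

def receive(value: float, timestamp: float) -> StreamItem:
    """Stamp an incoming reading with the next arrival sequence number."""
    return StreamItem(value, timestamp, next(_arrival_counter))

# Three readings that arrive out of timestamp order.
items = [
    receive(21.5, timestamp=10.0),
    receive(22.0, timestamp=9.0),
    receive(21.8, timestamp=11.0),
]

# Implicit ordering: the order in which the items arrived.
by_arrival = sorted(items, key=lambda item: item.arrival_seq)

# Explicit ordering: the order given by the timestamp attribute.
by_timestamp = sorted(items, key=lambda item: item.timestamp)
```

When items arrive late or out of order, the two orderings disagree, so an application has to choose which one its queries rely on.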

Because the stream is continuous and unbounded, it is not possible to store it in its entirety in a database. A traditional DBMS cannot be used to answer queries, because data arrives continuously. Therefore, a strategy known as continuous queries is adopted: a query is evaluated repeatedly as new data arrives, producing new results continuously. The main goal of a data stream management system (DSMS) is to output results even while data is continuously arriving and not all of the data can be stored in memory (Panigati, Schreiber, & Zaniolo, 2015). At best, synopses are stored for future use.
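A continuous query can be sketched in Python as a generator that re-evaluates an aggregate each time a new value arrives. This is an illustrative sketch only, not the SQL extensions discussed in the chapter: the function name continuous_average, the count-based window, and the choice of an average as the aggregate are all assumptions made for the example.

```python
from collections import deque
from typing import Iterable, Iterator

def continuous_average(stream: Iterable[float], window_size: int) -> Iterator[float]:
    """Yield the average of the most recent `window_size` values
    every time a new value arrives (hypothetical example query)."""
    window = deque()      # holds only the values currently in the window
    running_sum = 0.0     # small summary kept instead of the full stream
    for value in stream:
        window.append(value)
        running_sum += value
        if len(window) > window_size:
            # The oldest value moves out of the window and is discarded.
            running_sum -= window.popleft()
        yield running_sum / len(window)

# A finite list stands in for the stream; a real source would never end.
results = list(continuous_average([10.0, 20.0, 30.0, 40.0], window_size=2))
# results == [10.0, 15.0, 25.0, 35.0]
```

Only the window contents and a running sum are kept in memory, so the cost per arrival stays constant regardless of how long the stream runs.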
