Traditional decision support systems (DSS) and executive information systems (EIS) gather and present information from several sources for business purposes. They are information technologies that help knowledge workers (executives, managers, analysts) make faster and better decisions. Until recently, the underlying data were stored statically and persistently in a database, typically a data warehouse. Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In a typical scenario, an organization stores a detailed record of its operations in a database, which is then analyzed to improve efficiency, detect sales opportunities, and so on. Performing complex analysis on these data is an essential component of these organizations’ businesses. Chaudhuri and Dayal (1997) present an excellent survey on decision-making and online analytical processing (OLAP) technologies for traditional database systems.
In many applications, however, it may not be possible to process queries within a database management system (DBMS). These applications involve data items that arrive online from multiple sources in a continuous, rapid and time-varying fashion (Babcock et al., 2002). These data may or may not be stored in a database. As a result, a new class of data-intensive applications has recently attracted a lot of attention: applications in which the data are modeled not as persistent relations but rather as transient data streams. Examples include financial applications (streams of transactions or ticks), network monitoring (streams of packets), security, telecommunication data management (streams of calls or call packets), web applications (clickstreams), manufacturing, wireless sensor networks (measurements), RFID data, and others.
In data streams we usually have “continuous” queries (Terry et al., 1992; Babu & Widom, 2002) rather than “one-time” queries. The answer to a continuous query is produced over time, reflecting the stream data seen so far. Answers may be stored and updated as new data arrive or may be produced as data streams themselves. Continuous queries can be used for monitoring, alerting, security, personalization, etc. Data streams can be either transactional (i.e., they log interactions between entities, such as credit card purchases, web clickstreams, and phone calls) or measurement (i.e., they monitor the evolution of entity states, such as physical phenomena, road traffic, temperature, and network conditions).
How to best model, express and evaluate complex queries over data streams is an open and difficult problem. This involves data modeling, rich querying capabilities to support real-time decision support and mining, and novel evaluation and optimization processing techniques. In addition, the kind of decision support over data streams is quite different from “traditional” decision-making: decisions are “tactical” rather than “strategic.” Research on data streams is currently among the most active areas in the database research community. Flexible and efficient stream querying will be a crucial component of any future data management and decision support system (Abiteboul et al., 2005).
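To make the notion of a continuous query more concrete, the following is a minimal sketch in Python, not the method of any system discussed in this chapter. It assumes a measurement stream of (timestamp, value) pairs and computes a moving average over the most recent items, emitting an updated answer each time a new item arrives. The function name `continuous_average`, the `window_size` parameter, and the sample readings are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def continuous_average(
    stream: Iterable[Tuple[int, float]], window_size: int
) -> Iterator[Tuple[int, float]]:
    """Yield (timestamp, average of the last `window_size` values) per arriving item."""
    # Bounded state: once the deque is full, appending drops the oldest value,
    # so only a fixed-size part of the stream is ever retained.
    window = deque(maxlen=window_size)
    for timestamp, value in stream:
        window.append(value)
        # The answer is itself produced as a stream, one result per input item.
        yield timestamp, sum(window) / len(window)


# Hypothetical measurement stream: (timestamp, temperature reading).
readings = [(1, 20.0), (2, 22.0), (3, 24.0), (4, 30.0)]
answers = list(continuous_average(readings, window_size=3))
```

Unlike a one-time query, which would scan a stored table once and return a single result, this query never finishes on an unbounded input: each arriving reading refreshes the answer, and the fixed-size window keeps memory use constant however long the stream runs.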
The database research community has responded with an abundance of ideas, prototypes and architectures to address the new issues involved in data stream management systems (DSMS). STREAM is Stanford University’s approach for a general-purpose DSMS (Arasu et al., 2003); Telegraph and TelegraphCQ (Madden & Franklin, 2002; Chandrasekaran et al., 2003) are prototypes focused on handling measurements of sensor networks, developed at Berkeley; Aurora is a joint project between Brandeis University, Brown University and MIT (Carney et al., 2002) targeted towards stream monitoring applications; AT&T’s Hancock (Cortes et al., 2000) and Gigascope (Cranor et al., 2003) projects are special-purpose data stream systems for network management; Tribeca (Sullivan, 1996) and NiagaraCQ (Chen et al., 2000) are other well-known projects from Telcordia and the University of Wisconsin, respectively. The objective of all these projects is to develop systems that can support the challenging analysis requirements of streaming applications.
Key Terms in this Chapter
Data Streams: Data items that arrive online from multiple sources in a continuous, rapid, time-varying, possibly unpredictable fashion.
Measurement Data Streams: Data streams representing successive state information of one or more entities, such as sensor, climate or network measurements.
Continuous Queries: Queries that are evaluated continuously over arriving stream data rather than once. The answer to a continuous query is produced over time, reflecting the stream data seen so far. Answers may be stored and updated as new data arrive or may be produced as data streams themselves.
Data Stream Management Systems (DSMS): Data management systems that provide capabilities to query and process data streams and to store a bounded part of them.
Transactional Data Streams: Data streams that log interactions between entities, such as credit card transactions, phone calls and Web clickstreams.