A Framework for Data Warehousing and Mining in Sensor Stream Application Domains

Nan Jiang
DOI: 10.4018/978-1-60566-816-1.ch006

Abstract

Recent advances in sensor technologies have made these small devices much cheaper and more convenient to use in many different applications, for example, weather and environmental monitoring, hospital and factory operation sites, and sensor devices on roads and moving vehicles. The data collected from sensors forms a sensor stream and is transferred to a server, where data warehousing and mining tasks support data analysis by the end user. Several data preprocessing steps are necessary to enrich the data with domain information for the data warehousing and mining tasks in sensor stream applications. This chapter presents a general framework for domain-driven mining of sensor stream applications. The proposed framework is able to enrich sensor streams with additional domain information that meets the application requirements. Experimental studies of the proposed framework are performed on real data for two applications: a traffic management site and an environmental monitoring site.

Introduction

Networks of thousands of sensors present a feasible and economic solution to some of our most challenging problems, such as real-time traffic modeling, weather and environmental monitoring, and military sensing and tracking. Recent advances in sensor technology have made possible the development of relatively low-cost, low-energy-consumption micro sensors, which can be integrated in a wireless sensor network. These devices - Wireless Integrated Network Sensors (WINS) - will enable fundamental changes in applications spanning the home, office, clinic, factory, vehicle, metropolitan area, and the global environment.

To meet users' needs for knowledge discovery from sensor streams in these application domains, new data warehousing and data mining techniques have to be developed to extract meaningful, useful, and understandable patterns that end users can analyze. Many research projects on wireless sensor networks have been conducted by different organizations; however, few of them discuss the sensor stream processing infrastructure or the data warehousing and data mining issues that need to be addressed in sensor network application domains. New methodologies are therefore needed to extract interesting patterns in a sensor stream application domain. Since the semantics of sensor stream data are application dependent, the extraction of interesting, novel, and useful patterns from stream data applications becomes domain dependent.

Some data warehousing and data mining methods have recently been proposed to mine stream data. For example, in (Manku 2002, Chang 2003, Li 2004, Yang 2004, Yu 2004, Dang 2007), the authors proposed algorithms to find frequent patterns over the entire history of data streams. In (Giannella 2003, Chang 2004, Lin 2005, Koh 2006, Mozafari 2008), the authors use different sliding window models to find recently frequent patterns in data streams. These algorithms focus on mining frequent patterns with one scan over the entire data stream.
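
To make the sliding window idea concrete, the following is a minimal Python sketch, not the method of any of the cited papers. It assumes a count-based window of the most recent transactions and keeps counts only for itemsets up to a small maximum length; the class name `SlidingWindowItemsets` and its parameters are hypothetical and chosen for illustration. Each transaction is processed once when it arrives and once when it expires.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowItemsets:
    """Count itemsets over the most recent `window_size` transactions.

    Illustrative only: itemsets longer than `max_len` are not tracked.
    """

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size
        self.max_len = max_len
        self.window = deque()      # transactions currently in the window
        self.counts = Counter()    # itemset (sorted tuple) -> occurrence count

    def _subsets(self, transaction):
        # Enumerate all itemsets of size 1..max_len contained in the transaction.
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # Add the new transaction, then expire the oldest one if the window is full.
        self.window.append(transaction)
        self.counts.update(self._subsets(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(self._subsets(expired))

    def frequent(self, min_support):
        # Itemsets whose count within the current window reaches min_support.
        return {s: c for s, c in self.counts.items() if c >= min_support}


miner = SlidingWindowItemsets(window_size=3)
for reading in [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"]]:
    miner.add(reading)
print(miner.frequent(min_support=2))
```

Because counts are updated incrementally, the stream is never rescanned; the cost is memory proportional to the number of distinct itemsets seen in the window, which is why `max_len` is bounded in this sketch.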

In (Chi, 2004), Chi et al. consider the problem of mining closed frequent itemsets over a data stream sliding window with the Moment algorithm, and in (Li, 2006), the authors proposed the NewMoment algorithm, which uses a bit-sequence representation of items to reduce the time and memory needed. The CFI-Stream algorithm in (Jiang, 2006) directly computes the closed itemsets online and incrementally without the help of any support information. In (Li, 2008), Li et al. proposed improving the CFI-Stream algorithm with bitmap coding, in an algorithm named CLIMB (Closed Itemset Mining with Bitmap), over a data stream’s sliding window to reduce the memory cost.
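
As a small illustration of the two concepts mentioned above, the Python sketch below encodes each item as a bit sequence over the transactions in a window and then applies the standard definition of a closed itemset: a frequent itemset is closed if no proper superset has the same support. This is a brute-force sketch under that definition, not an implementation of Moment, NewMoment, CFI-Stream, or CLIMB, whose internal data structures are not described here; all function names are hypothetical.

```python
from itertools import combinations


def item_bitmaps(window):
    """Map each item to an integer whose bit i is set if transaction i contains it."""
    bitmaps = {}
    for i, transaction in enumerate(window):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(itemset, bitmaps, n):
    """Support of an itemset = number of set bits in the AND of its item bitmaps."""
    mask = (1 << n) - 1
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def closed_frequent_itemsets(window, min_support):
    """Enumerate all frequent itemsets (exponential; small windows only),
    then keep those with no proper superset of equal support."""
    bitmaps = item_bitmaps(window)
    n = len(window)
    items = sorted(bitmaps)
    frequent = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            s = support(itemset, bitmaps, n)
            if s >= min_support:
                frequent[frozenset(itemset)] = s
    return {
        itemset: s
        for itemset, s in frequent.items()
        if not any(itemset < other and s == t for other, t in frequent.items())
    }


window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
closed = closed_frequent_itemsets(window, min_support=2)
for itemset, s in sorted(closed.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this example the itemset {b} has support 3, but so does {a, b}, so {b} is not closed. The closed itemsets carry the same support information as the full set of frequent itemsets in a more compact form, which is the motivation for mining them over a window.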

As the number of data streaming applications grows, there is also an increasing need to perform association mining in data streams, in addition to pattern mining. One example application is to estimate missing data in sensor networks (Halatchev, 2005). Another example application is to predict the frequency of Internet packet streams (Demaine, 2002). In the MAIDS project (Cai, 2004), an association mining technique is used to find alarming incidents from data streams. Association mining can also be applied to monitor manufacturing flows (Kargupta, 2004) to predict failures or generate reports based on accumulated web log streams. In (Yang, 2004), (Halatchev, 2005), and (Shin, 2007), the authors proposed association rule mining methods based on two, three, and multiple frequent patterns.

In general, these approaches have focused on mining patterns and associations in data streams, without considering an application domain. As a consequence, these methods tend to discover general patterns, which for specific applications can be useless and uninteresting. Stream patterns are usually extracted based on the concept of pattern frequency. With no semantic or domain information, the discovered patterns cannot be applied directly to a specific domain.

In this chapter, we present a data warehousing and mining framework in which users give the data the semantics that are relevant to the application, so that the discovered patterns refer to a specific domain. We also discuss the issues that need to be considered in the data warehousing and mining components of this framework for sensor stream applications.
