Knowledge Discovery for Sensor Network Comprehension

Pedro Pereira Rodrigues (LIAAD - INESC Porto L.A. & University of Porto, Portugal), João Gama (LIAAD - INESC Porto L.A. & University of Porto, Portugal) and Luís Lopes (CRACS - INESC Porto L.A. & University of Porto, Portugal)
DOI: 10.4018/978-1-60566-328-9.ch006

Abstract

In this chapter we explore the characteristics of sensor networks that define new requirements for knowledge discovery, with the common goal of extracting some comprehension of sensor data and sensor networks. We focus on clustering techniques, which provide useful information about sensor networks because they represent the interactions between sensors. This network comprehension ability is related to sensor data clustering and to the clustering of the data streams produced by the sensors. A wide range of techniques already exists to assess these interactions in centralized scenarios, but the sizable processing abilities of sensors, when used in distributed algorithms, present several benefits that should be considered in future designs. Sensors also produce data at a high rate. Often, human experts need to inspect these data streams visually in order to decide on corrective or proactive operations (Rodrigues & Gama, 2008). Visualization of data streams, and of data mining results, is therefore extremely relevant to sensor data management; it can enhance sensor network comprehension and should be addressed in future work.

1 Introduction

Knowledge discovery is a wide area of research where machine learning, data mining and data warehousing techniques converge on the common goal of describing and understanding the world. Nowadays, applications produce potentially unbounded streams of data distributed across wide sensor networks. This ubiquitous scenario raises several obstacles to the usual knowledge discovery workflow, creating the need for new techniques with different conceptualizations and adaptive decision making. The current setting, a web of sensory devices, some of them with processing ability, represents a new knowledge discovery environment, possibly not completely observable, that is much less controlled by both the human user and a common centralized control process. This ubiquitous and fast-changing scenario is now subject to the same interactions required by previous static and centralized applications. Hence the need to inspect how different knowledge discovery techniques adapt to ubiquitous scenarios such as wired and wireless sensor networks.

1.1 Sensor Network Data Streams

Sensors are usually small, low-cost devices capable of sensing some attribute of a physical phenomenon. In terms of hardware development, the state of the art is well represented by a class of multi-purpose sensor nodes called motes (Culler & Mulder, 2004). In most current applications, sensor nodes are controlled by module-based operating systems such as TinyOS (TinyOS, 2000) and are programmed using arguably somewhat ad hoc languages such as nesC (Gay et al., 2003). Sensor networks are composed of a variable number of sensors (depending on the application), and they have several features that put them in an entirely new class when compared to other wireless networks, namely: (a) the number of nodes is potentially very large, and thus scalability is a problem, (b) the individual sensors are prone to failure given the often challenging conditions they experience in the field, (c) the network topology changes dynamically, (d) broadcast protocols are used to route messages in the network, (e) power, computational, and memory capacity are limited, and (f) global identifiers are lacking (Akyildiz et al., 2002).

Sensor network applications are, for the most part, data-centric in that they focus on gathering data about some attribute of a physical phenomenon. The data is usually returned in the form of streams of simple data types, without any local processing. In some cases, more complex data patterns or processing are possible. Data aggregation is used to solve routing problems (e.g. implosion, overlap) in data-centric networks (Akyildiz et al., 2002). In this approach, the data gathered from a neighborhood of sensor nodes is combined in a receiving node along the path to the sink. Data aggregation uses the limited processing power and memory of the sensing devices to process data online.
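
The aggregation step described above can be illustrated with a minimal sketch. It is not taken from the chapter or from any particular routing protocol: the `PartialAggregate` type and the `aggregate_at_node` function are hypothetical names, and the choice of count, sum, minimum and maximum as the forwarded summary is an assumption made for illustration. The point is that a receiving node merges fixed-size summaries from its neighborhood and forwards a single summary toward the sink instead of relaying every raw reading.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartialAggregate:
    """Fixed-size summary of the readings seen so far (hypothetical type)."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def from_reading(value: float) -> "PartialAggregate":
        # A single reading is a summary of one element.
        return PartialAggregate(1, value, value, value)

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        # Merging is associative and commutative, so the order in which
        # neighbor summaries arrive does not change the result.
        return PartialAggregate(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate_at_node(own_reading: float,
                      neighbor_summaries: Iterable[PartialAggregate]) -> PartialAggregate:
    """Combine this node's reading with the summaries received from neighbors."""
    summary = PartialAggregate.from_reading(own_reading)
    for received in neighbor_summaries:
        summary = summary.merge(received)
    return summary


# Two leaf nodes report to a relay node, which forwards one summary to the sink.
leaf_a = aggregate_at_node(20.0, [])
leaf_b = aggregate_at_node(24.0, [])
relay = aggregate_at_node(22.0, [leaf_a, leaf_b])
print(relay.count, relay.mean, relay.minimum, relay.maximum)
```

Because each summary has constant size, the message sent by the relay costs the same whether it covers three readings or three thousand, which is what makes this kind of processing feasible on devices with limited memory and power.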

Sensor data is usually produced at a high rate, in a stream. A data stream is an ordered sequence of instances that can be read only once, or a small number of times, using limited computing and storage capabilities (Gama & Rodrigues, 2007a). The data elements in the stream arrive online, and the stream is potentially unbounded in size. Once an element from a data stream has been processed, it is discarded or archived. It cannot be retrieved easily unless it is explicitly stored in memory, which is small relative to the size of the data streams. These sources of data are characterized by being open-ended, flowing at high speed, and generated by non-stationary distributions.
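
To make the single-pass, bounded-memory constraint concrete, the sketch below maintains the mean and variance of a stream while reading each element exactly once and then discarding it. It uses Welford's online update, a standard technique chosen here for illustration and not a method described in the chapter. The `RunningStats` class and the `sensor_stream` generator are hypothetical names, and the readings are made-up values.

```python
from typing import Iterator


class RunningStats:
    """Constant-memory summary of a stream (Welford's online update)."""

    def __init__(self) -> None:
        self.n = 0        # number of elements seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each element is used once; nothing about it is kept afterwards.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two elements have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream() -> Iterator[float]:
    """Stand-in for a sensor feed; a real stream would not be a finite list."""
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield reading


stats = RunningStats()
for value in sensor_stream():
    stats.update(value)  # the element is discarded after this call

print(stats.n, stats.mean, stats.variance)
```

The summary occupies three numbers regardless of how long the stream runs. It does not, by itself, handle the non-stationary distributions mentioned above; that would require some form of forgetting, such as a sliding window or a decay factor, which this sketch leaves out.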
