3D Real-Time Reconstruction Approach for Multimedia Sensor Networks

Ahmed Mostefaoui (University of Franche-Comte (LIFC), France) and Benoit Piranda (LASELDI, France)
DOI: 10.4018/978-1-4666-1577-9.ch018


Multimedia sensor networks have emerged thanks to major advances in the miniaturization of multimedia hardware and to the application potential they offer. However, the time-sensitive nature of multimedia data makes it difficult to handle, especially in constrained environments. In this paper, the authors present a novel approach for video surveillance applications based on continuous, real-time 3D reconstruction of the monitored area. Real-time 3D reconstruction significantly reduces network bandwidth use, because sensor nodes send descriptive information to the fusion server instead of heavy video streams. In return, each node must perform additional processing to extract this descriptive information in real time: video capture, data analysis, and extraction of the features needed for 3D reconstruction. The authors focus on the design and implementation of such a sensor node and validate their approach through experiments conducted on a real video sensor.
Chapter Preview


Wireless Sensor Networks (WSN) have emerged as one of the most promising technologies (Akyildiz, Su, Sankarasubramaniam, & Cayirci, 2002) because of their potential in many real-world applications, ranging from health care to military surveillance. Recent advances in miniaturised hardware components, radio frequency communication technologies and low-power embedded computing devices have made large-scale networks of small devices a reality. In such networks, each node is able to collect information from its physical environment (temperature, humidity, etc.), to perform simple processing on the collected data (averaging, for instance) and finally to transmit it, through multi-hop radio communications, to a remote base station (sink) for data fusion. Nevertheless, managing sensor networks raises a number of challenging research issues, on the one hand because of the nodes' restricted capabilities (computing, storage, etc.) and on the other hand because of their limited lifetime, as nodes are battery powered. In addition, nodes are usually subject to frequent failures. Despite these limitations, wireless sensor networks are now used in several applications that are delay tolerant and have low bandwidth requirements.
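The collect, process and transmit cycle described above can be illustrated with a minimal sketch. It is not taken from the chapter: the function names, node names and temperature samples are all hypothetical, and the radio transmission step is reduced to passing a dictionary to the sink.

```python
from statistics import mean

def summarize_readings(readings):
    """Reduce a window of raw scalar samples to a single average value."""
    if not readings:
        raise ValueError("need at least one reading")
    return mean(readings)

def fuse_at_sink(node_averages):
    """Fuse the per-node averages at the sink by averaging them again."""
    return mean(list(node_averages.values()))

# Hypothetical temperature samples (degrees Celsius) collected by two nodes.
node_a = [20.0, 21.0, 22.0]
node_b = [24.0, 24.0, 24.0]

# Each node sends one number instead of its whole sample window.
reports = {
    "node_a": summarize_readings(node_a),
    "node_b": summarize_readings(node_b),
}
fused = fuse_at_sink(reports)
print(reports, fused)
```

The point of the sketch is only that scalar data can be summarized on the node and fused at the sink with trivial arithmetic, which keeps both processing and bandwidth needs low.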

More recently, the availability of miniaturized multimedia hardware (CMOS cameras and microphones, for instance) at reasonable prices has encouraged the emergence of Wireless Multimedia Sensor Networks (WMSN) (Akyildiz, Melodia, & Chowdhury, 2007) for a large number of real applications (home automation, assistance to senior people, etc.). What distinguishes WMSN from classical sensor networks is their ability to collect multimedia content (video and audio streams, still images, etc.) in addition to scalar data.

Managing multimedia data (storing, retrieving, real-time processing, fusing, correlating, etc.) within constrained environments such as sensor networks raises new issues at every level of the application, for two reasons. First, the volume of data produced is huge: videos and images are far larger than scalar data. Second, multimedia data is subject to the “continuous constraint”, also called the real-time constraint: its delivery is time sensitive. Managing and delivering multimedia data therefore requires far more processing power and network resources, mainly bandwidth. Furthermore, correlating and fusing multimedia data differs fundamentally from doing so with scalar data: the latter mostly amounts to computing averages, while the former is much more complex. What does the fusion of two images or two videos mean? To the best of our knowledge, no general approach is currently available, only approaches tailored to specific contexts, such as the one reported in Zhang (2007).
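To give a rough sense of the gap in data volume between video and scalar data, the following back-of-the-envelope sketch compares the two rates. Every number in it is an illustrative assumption (a 320x240 grayscale uncompressed stream at 10 frames per second, and a 4-byte scalar sample once per second); none of them comes from the chapter or from its experiments.

```python
def raw_video_rate(width, height, bytes_per_pixel, fps):
    """Bytes per second produced by an uncompressed video stream."""
    return width * height * bytes_per_pixel * fps

def scalar_rate(bytes_per_sample, samples_per_second):
    """Bytes per second produced by a scalar sensor (e.g. temperature)."""
    return bytes_per_sample * samples_per_second

# Assumed, illustrative parameters (not measurements from the chapter).
video_bps = raw_video_rate(width=320, height=240, bytes_per_pixel=1, fps=10)
scalar_bps = scalar_rate(bytes_per_sample=4, samples_per_second=1)

ratio = video_bps / scalar_bps
print(f"video: {video_bps} B/s, scalar: {scalar_bps} B/s, ratio: {ratio:.0f}x")
```

Under these assumptions the raw stream is several orders of magnitude heavier than the scalar one. Compression would narrow the gap but would also add processing load on the node.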

In this paper, we develop a new approach to handling the large volume of multimedia sensor data generated in a video surveillance context. The key idea behind our proposal is to “continuously” construct a 3D representation of the monitored area, in which the video streams originating from the video sensors are fused. In other words, the “views” of the sensor nodes are merged into the 3D scene of the monitored region. This approach has several advantages, in particular for resource-limited environments such as sensor networks.

The first important advantage of a 3D representation is its flexibility: it allows a comprehensive representation of the observed scene. Indeed, it is well known that raw video data is very hard, or even impossible, to exploit directly unless it has been pre-processed or annotated by experts (Mostefaoui, 2006). Moreover, it is difficult to extract comprehensive information from tens or hundreds of simultaneously delivered streams without focusing on only one or two streams at a time. By contrast, the 3D scene can deliver more comprehensive information to the observer, in particular for surveillance purposes. It is also easier to track “objects” in a 3D environment than on tens or hundreds of monitors!
