Introduction
In recent years, considerable research effort has been devoted to defining and developing methods capable of representing multimedia digital content, with the aim of providing effective and efficient access to multimedia information. Various types of multimedia information have been considered, such as text, images, graphics, video and audio files, and 3D objects. In general, the set of technologies used to search for various types of digital multimedia is referred to as multimedia information retrieval (Lew et al., 2006; Datta et al., 2008).
Some issues have to be considered in multimedia information retrieval, regardless of the particular medium being searched. In particular, effective representations are needed that capture the relevant information in compact descriptors and that can be compared efficiently, so as to provide meaningful retrieval results in large databases. One of the main challenges here is extracting semantics from the multimedia content.
Among emerging media, 3D models of natural or artificial objects, and 3D scans of indoor or outdoor environments, have recently gained relevance as a means of providing realistic representations of reality. This has been made possible mainly by the availability of 3D scanning devices of increasing quality at reasonable cost. As a consequence, large repositories of 3D objects are becoming common, and methods to access such archives effectively and efficiently are now required. 3D models are also widely used in recognition applications, where the use of 3D information has been shown to improve recognition performance. The main difficulties in recognition problems are very similar to those encountered in retrieval applications: definition of compact and effective descriptions of the objects; definition of similarity measures between the descriptions that can discriminate between different objects while capturing the similarity between objects belonging to the same category; efficient computation of the similarity in large archives; and semantic analysis of the objects so as to automatically classify 3D shapes.
In many applications, 2D and 3D information is used jointly to improve recognition results. One particular application scenario is face recognition, where the use of 3D face models has been explored only recently. In fact, there is recent evidence that the structural 3D information captured by face scans can improve face recognition results, particularly in situations involving pose variations and illumination changes. Following this idea, several works have appeared recently that address 3D-3D face recognition (Kakadiaris et al., 2007; Mian et al., 2007, 2008; Samir et al., 2009; Berretti et al., in press). However, 3D-3D face recognition is suited only to very particular application scenarios, where cooperation between the subjects and the recognition/verification system is assumed both during enrollment of subjects into the gallery of known individuals and when testing the identity of new subjects (probes). Looking ahead, it would be interesting to acquire complete, high-resolution 3D face scans during enrollment, whereas it should be possible to acquire probes in non-cooperative environments as well, using video-camera systems that track and capture the subject's face and can compare the reconstructed 3D information against the 3D gallery models. The aim is to define innovative hybrid solutions to the recognition problem that exploit the complementary advantages offered by different media to improve recognition accuracy.