Audiovisual resources in the form of still pictures, graphics, 3D models, audio, speech, and video play an increasingly pervasive role in our lives, and there will be a growing need to manage all these multimedia objects. This task is of increasing importance for users who need to archive, organize, and search their multimedia collections in an appropriate fashion. To cope with this situation, much effort has been put into developing standards both for multimedia data (natural and synthetic (e.g., photography, face animation), continuous and static (e.g., video, image)) and for data describing multimedia content (metadata). The aim is to describe open multimedia frameworks and achieve a reasonable and interoperable use of multimedia data in a distributed environment.
Metadata are a representation of the administrative, descriptive, preservation, usage, and technical characteristics associated with multimedia objects; they can be extracted manually or automatically from multimedia documents. This value-added information helps bridge the semantic gap, described as: “The lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation” (Smeulders, Worring, Santini, Gupta, & Jain, 2000).
Because of the high cost and subjectivity associated with human-generated metadata, a large number of research initiatives are focusing on technologies to enable automatic classification and segmentation of digital resources. Many consortia are working on projects to define multimedia metadata standards, which are intended to describe multimedia content in many different domains and to support sharing, exchange, and interoperability across different networks. These standards fall into two groups (Salvetti, Pieri, & Di Bono, 2004):
Standardised description schemes that are directly related to the representation of multimedia content for a specific domain (like METS, MPEG-7).
Standardised metadata frameworks that consider the possibility of integrating several metadata standards mapped onto different application domains, providing rich metadata models for media descriptions together with languages allowing one to define other description schemes for arbitrary domains (like PICS, RDF, MPEG-21).
For example, the vision of MPEG-21 is to define a multimedia framework to enable augmented and transparent use of multimedia resources across a wide range of networks and devices used by different communities. The intent is that this framework will cover the entire multimedia content delivery chain, including creation, production, delivery, personalization, presentation, and trade.
The development of metadata standards will increase the value of multimedia data, which are used by various applications. Nevertheless, current metadata representation schemes have disadvantages (Smith & Schirling, 2006). These include cost, unreliability, subjectivity, lack of authentication, and limited interoperability with respect to syntax, semantics, vocabularies, and languages (Salvetti et al., 2004).
It is necessary to have a common understanding of the semantic relationships between metadata terms from different domains. Representation and semantic annotation of multimedia content have been identified as an important step toward more efficient manipulation and retrieval of multimedia. In order to achieve semantic analysis of multimedia content, ontologies are essential to express semantics in a formal machine-processable representation (Staab & Studer, 2004).
Professional groups are increasingly building metadata vocabularies (or ontologies). A number of research and standards groups are working on the development of common conceptual models (or upper ontologies) to facilitate interoperability between metadata vocabularies and the integration of information from different domains.
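As a minimal sketch of how an upper ontology can act as a bridge between metadata vocabularies, the following Python fragment maps the terms of two vocabularies onto shared upper-level concepts and uses those concepts to find corresponding terms. All names here (the concepts, the two vocabularies, and the function `related_terms`) are hypothetical and chosen only for illustration; they are not taken from any of the standards mentioned above.

```python
# Hypothetical upper-ontology concepts shared by every vocabulary.
UPPER_CONCEPTS = {"Agent", "Event", "MediaObject"}

# Hypothetical domain vocabularies: each local term points to an upper concept.
ARCHIVE_VOCAB = {
    "photographer": "Agent",
    "exhibition": "Event",
    "photo": "MediaObject",
}
BROADCAST_VOCAB = {
    "director": "Agent",
    "broadcast": "Event",
    "clip": "MediaObject",
}


def related_terms(term, source, target):
    """Return the terms in `target` that share the upper concept of `term` in `source`."""
    concept = source.get(term)
    if concept is None or concept not in UPPER_CONCEPTS:
        return []  # unknown term, or a term not anchored in the upper ontology
    return sorted(t for t, c in target.items() if c == concept)


# "photographer" and "director" are both kinds of Agent in this toy mapping.
print(related_terms("photographer", ARCHIVE_VOCAB, BROADCAST_VOCAB))
```

In this toy setting the two vocabularies never refer to each other directly; each only needs to be aligned with the shared upper-level concepts.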
Key Terms in this Chapter
Multimedia Core Foundational Ontologies: They are conceptualizations that contain specifications of domain independent concepts and relations based on formal principles derived from philosophy, mathematics, linguistics, and psychology. They are used as a starting point for the construction of new ontologies or as a bridge between existing ontologies.
Multimedia Upper Ontologies: Upper level ontologies are intended for more general use and describe higher level concepts that can be refined by domain ontologies, in order to make multimedia-handling procedures more homogeneous.
Ontology: An ontology is a formal, explicit specification of a domain. It deals with what can be rationally understood, at least partially. Typically, an ontology consists of concepts, concept properties, and relationships between concepts. In a typical ontology, concepts are represented by terms.
Specific Domain Ontologies: They have been created to serve a particular domain; they consist of terms that represent concepts particular to that domain using constructs of content structure.
Multimedia Ontology: In a multimedia ontology concepts might be represented by multimedia entities (images, graphics, video, audio, segments, etc.) or terms. A multimedia ontology is a model of multimedia data, especially of sounds, still images and videos, in terms of low-level features and media structure. Multimedia ontologies enable the inclusion and exchange of multimedia content through a common understanding of the multimedia content description and semantic information.
Artificial Intelligence: AI is a branch of computer science that deals with intelligent behaviour, learning, and adaptation in machines.
Content Structure Ontologies: Ontologies that focus on the description of multimedia content structure. They should be capable of capturing low-level descriptor information, representing several different audiovisual attributes (e.g., color, shape, texture, motion, localization, etc.) depending on the concept, and allowing for basic and complex data types.
Metadata: They are “data about other data”; they are data segments that describe structural, behavioural, or functional aspects of other data segments. Multimedia metadata are a representation of the administrative, descriptive, preservation, usage, and technical characteristics associated with multimedia objects; they can be extracted manually or automatically from multimedia documents.
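The Ontology and Multimedia Ontology entries above describe an ontology as concepts, concept properties, and relationships between concepts, where a concept may be represented by a term or by multimedia entities. A minimal Python sketch of such a structure follows. The class names, field names, file name, and sample concepts are all assumptions made for illustration, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    term: str                                            # term that names the concept
    properties: dict = field(default_factory=dict)       # concept properties
    media_examples: list = field(default_factory=list)   # media entities representing it


@dataclass
class Ontology:
    concepts: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)        # (subject, predicate, object)

    def add_concept(self, concept):
        self.concepts[concept.term] = concept

    def relate(self, subject, predicate, obj):
        # Relationships may only connect concepts that are already defined.
        if subject not in self.concepts or obj not in self.concepts:
            raise KeyError("both concepts must be defined before relating them")
        self.relations.append((subject, predicate, obj))

    def related(self, subject, predicate):
        return [o for s, p, o in self.relations if s == subject and p == predicate]


# Hypothetical content: a still image concept refined from a general media concept.
onto = Ontology()
onto.add_concept(Concept("MediaObject"))
onto.add_concept(Concept("StillImage",
                         properties={"dominant_color": "string"},
                         media_examples=["sunset.jpg"]))
onto.relate("StillImage", "is_a", "MediaObject")
print(onto.related("StillImage", "is_a"))
```

Storing relationships as subject, predicate, object triples keeps the sketch close to the way frameworks such as RDF express statements, although no RDF library is used here.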