The naturalness and flexibility of the dialogue between users and multimodal systems can produce more than one interpretation and, consequently, ambiguities. This chapter addresses the problem of correctly recognizing user input in order to enable natural interaction. In particular, it analyses approaches that cope with issues arising in the interpretation process, dividing them into recognition-based, decision-based, and hybrid multilevel fusion strategies, and describing examples of each. Moreover, the chapter classifies ambiguities at a general level into recognition, segmentation, and target ambiguities, and at a more detailed level into lexical, syntactical, and pragmatic ambiguities. Based on these classifications, the chapter analyses how interpretation methods support the correct recognition of ambiguities. Finally, it presents methods that are applied after the interpretation process and complement it, solving different classes of ambiguities through the dialogue between the user and the system.
Introduction
A multimodal system can receive, interpret, and process input, and generate output, through two or more interaction modalities in an integrated and coordinated way. The purpose is to make the system's communication characteristics closer to the human approach, which is often multimodal, and this is achieved by combining different modalities.
In fact, multimodal systems combine visual information (such as images, text, and sketches) with voice, gestures, and other modalities (see Figure 1), providing natural, intelligent, and personalized dialog.
Figure 1. Example of modalities involved during multimodal interaction
In particular, the communication between a user and a multimodal system involves sending and receiving a multimodal representation (message) between the two, and this message can be modal or multimodal.
Figure 2 shows that, during multimodal dialog, the user's actions or commands produce a message that has to be interpreted by the system and, vice versa, the materialization produced by the system has to be interpreted by the user (Caschera et al., 2007c).
Figure 2. User-system multimodal dialog
In particular, the input digital channels acquire information from the user's input, which is transformed by a sequence of processing activities; this sequential transformation of the input defines the interpretation function. In the output direction, information is manipulated in order to be made perceivable by the user, and this sequence of transformations defines the rendering function (Coutaz et al., 1993).
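The idea of the interpretation function as a sequential transformation of channel input can be sketched in code. This is an illustrative sketch, not an implementation from the chapter: the stage names (`capture`, `recognize`, `parse`) and the toy speech channel are assumptions introduced here.

```python
# Sketch of an interpretation function built as a sequential composition
# of processing stages over raw channel input (stage names are hypothetical).

def compose(*stages):
    """Chain processing stages into a single transformation."""
    def pipeline(data):
        for stage in stages:
            data = stage(data)
        return data
    return pipeline

def capture(raw):
    # Digital channel acquisition: wrap the raw signal.
    return {"signal": raw}

def recognize(frame):
    # Signal -> tokens (a toy recognizer: lowercase and split).
    frame["tokens"] = frame["signal"].lower().split()
    return frame

def parse(frame):
    # Tokens -> a candidate meaning (here, simply rejoined tokens).
    frame["meaning"] = " ".join(frame["tokens"])
    return frame

interpretation = compose(capture, recognize, parse)
print(interpretation("DELETE this file")["meaning"])  # delete this file
```

A rendering function would be the symmetric composition in the output direction, transforming system information until it is perceivable by the user.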
However, the naturalness and flexibility of a dialogue that uses different interaction modalities can produce more than one interpretation of the user input and, consequently, ambiguity. Identifying one and only one meaning of the multimodal input is therefore a crucial aspect of providing a flexible and powerful dialog between the user and the system.
Therefore, the focus of this chapter is to deal with the ambiguities connected to the interpretation process. In this scenario, interpretation methods play a very important role, since their purpose is to identify the meaning of the user input and to find the association that best matches the user's intention.
In multimodal systems, the combination of different modalities means that the ambiguities of each single modality coexist within the multimodal system. Ambiguities can be due to the incorrect interpretation of one modality; moreover, they can appear even when each modality is correctly interpreted, but the combination of the information produced by each modality is not coherent at the semantic level. In fact, the information coming from each separate input modality can be correctly and univocally interpreted by the multimodal system, while the interpretation can become ambiguous once that information is combined.
Therefore, during multimodal interaction, ambiguities can be produced by: the propagation to the multimodal level of a modal input ambiguity; and the combination of unambiguous modal information containing contrasting concepts in the multimodal dialogue.
In particular, suppose that the system allows the user to interact using voice and sketch input. In this case the system combines the speech and sketch modalities and can be affected by both speech and sketch ambiguities. Multimodal ambiguities can thus be caused both by the interpretation process of each modality and by the interpretation process of the combined multimodal input.
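The second source of ambiguity above, unambiguous modal inputs whose combination clashes semantically, can be illustrated with a minimal fusion check. All names and the toy compatibility table below are assumptions for illustration, not part of any system described in the chapter.

```python
# Sketch of a fusion step that flags a multimodal ambiguity: each modality
# is unambiguous on its own, but the combined concepts clash semantically.

def fuse(speech_concept, sketch_concept, compatible):
    """Combine two modal interpretations; report an ambiguity on a clash."""
    if sketch_concept in compatible.get(speech_concept, set()):
        return {"status": "coherent", "action": (speech_concept, sketch_concept)}
    return {"status": "ambiguous", "conflict": (speech_concept, sketch_concept)}

# Hypothetical compatibility table: which sketched targets make sense
# for which spoken commands.
COMPATIBLE = {
    "delete": {"circle", "rectangle"},
    "color":  {"circle", "rectangle", "line"},
}

print(fuse("delete", "circle", COMPATIBLE)["status"])  # coherent
print(fuse("delete", "arrow", COMPATIBLE)["status"])   # ambiguous
```

Each input ("delete", "arrow") is univocally interpreted by its own modality; the ambiguity only arises when the fusion step finds the combination semantically incoherent.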
To date, several tools have been developed to correctly interpret multimodal input by combining the inputs generated by different interaction modalities.
Interpretation approaches are mainly defined in order to specify the meaning of the user's multimodal input, and they are consequently correlated with the management of the ambiguities that can appear during the interpretation process. In particular, ambiguities are due to the possibility of interpreting the user input in more than one way.
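The possibility of interpreting an input in more than one way can be modelled as a set of ranked candidate interpretations; an input is ambiguous when several candidates remain plausible. The data structure and threshold below are illustrative assumptions, not a method from the chapter.

```python
# Sketch: an interpreter returns ranked candidate meanings; the input is
# ambiguous when more than one candidate survives a confidence threshold.

from dataclasses import dataclass

@dataclass
class Interpretation:
    meaning: str
    confidence: float

def ambiguous(candidates, threshold=0.3):
    """True when several interpretations remain plausible."""
    plausible = [c for c in candidates if c.confidence >= threshold]
    return len(plausible) > 1

candidates = [
    Interpretation("move the file", 0.55),
    Interpretation("remove the file", 0.40),
]
print(ambiguous(candidates))  # True: two plausible meanings
```

Under this toy model, an interpretation method succeeds when it narrows the plausible set down to exactly one candidate.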
Figure 3. Modal and multimodal ambiguities