A Usability Framework for the Design and Evaluation of Multimodal Interaction: Application to a Multimodal Mobile Phone

Jaeseung Chang (Handmade Mobile Entertainment Ltd., UK) and Marie-Luce Bourguet (University of London, UK)
DOI: 10.4018/978-1-60566-978-6.ch008

Currently, the lack of reliable methodologies for the design and evaluation of usable multimodal interfaces makes developing multimodal interaction systems a major challenge. In this paper, we present a usability framework to support the design and evaluation of multimodal interaction systems. First, elementary multimodal commands are elicited using traditional usability techniques. Next, based on the CARE (Complementarity, Assignment, Redundancy, and Equivalence) properties and the FSM (Finite State Machine) formalism, the original set of elementary commands is expanded to form a comprehensive set of multimodal commands. Finally, this new set of multimodal commands is evaluated in two ways: user testing and error-robustness evaluation. The framework thus provides a structured, general methodology for both the design and the evaluation of multimodal interaction. We have implemented supporting software tools and applied the methodology to the design of a multimodal mobile phone to illustrate the use and potential of the proposed framework.
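
To make the expansion step more concrete, the following Python sketch shows one possible way to expand elementary commands under CARE-style modality choices and compile them into a finite state machine. It is an illustration under stated assumptions, not the authors' tool: the command names (`call_contact`, `end_call`), the modality labels, and the encoding of a redundant input as the composite symbol `speech+touch` are all hypothetical. Each action lists the modalities that may express it (several for Equivalence, exactly one for Assignment), and mixing modalities across the actions of one command gives Complementarity.

```python
from itertools import product

# Hypothetical elementary commands: an ordered list of (action, modalities).
# Several modalities for an action = Equivalence; exactly one = Assignment.
# "speech+touch" is an assumed composite symbol for a Redundant input.
ELEMENTARY_COMMANDS = {
    "call_contact": [
        ("call", {"speech", "keypad"}),
        ("select_contact", {"speech", "touch", "speech+touch"}),
    ],
    "end_call": [
        ("hang_up", {"keypad"}),  # Assignment: only one modality allowed
    ],
}


def expand(command):
    """Yield every concrete multimodal variant of one elementary command."""
    actions = [action for action, _ in command]
    choices = [sorted(modalities) for _, modalities in command]
    for combo in product(*choices):
        # Each symbol pairs a modality with the action it expresses.
        yield tuple(f"{m}:{a}" for m, a in zip(combo, actions))


def build_fsm(commands):
    """Compile all variants into a deterministic FSM shaped as a prefix tree."""
    transitions = {}  # (state, symbol) -> next state
    accepting = {}    # accepting state -> command name
    next_state = 1    # state 0 is the start state
    for name, command in commands.items():
        for variant in expand(command):
            state = 0
            for symbol in variant:
                if (state, symbol) not in transitions:
                    transitions[(state, symbol)] = next_state
                    next_state += 1
                state = transitions[(state, symbol)]
            accepting[state] = name
    return transitions, accepting


def recognise(fsm, symbols):
    """Return the command accepted by the symbol sequence, or None."""
    transitions, accepting = fsm
    state = 0
    for symbol in symbols:
        state = transitions.get((state, symbol))
        if state is None:
            return None
    return accepting.get(state)


fsm = build_fsm(ELEMENTARY_COMMANDS)
print(recognise(fsm, ["keypad:call", "touch:select_contact"]))  # call_contact
print(sum(1 for c in ELEMENTARY_COMMANDS.values() for _ in expand(c)))  # 7
```

Enumerating the variants this way makes the size of the expanded command set explicit (seven variants in this toy example), and that set is what user testing and error-robustness evaluation would then need to cover.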

Multimodal interfaces, characterized by multiple parallel recognition-based input modes such as speech and hand gestures, have been of research interest for some years. A common claim is that they can provide greater usability than more traditional user interfaces. For example, they have the potential to be more intuitive and easier to learn because they implement means of interaction that are close to those used in everyday human-human communication. When users are free to use the interaction modalities of their choice, multimodal systems can also be more flexible and efficient. In particular, mobile devices, which generally suffer from usability problems due to their small size and their typical use in adverse and changing environments, can benefit greatly from multimodal interfaces. Moreover, the emergence of novel pervasive computing applications, which combine active interaction modes with passive modality channels based on perception, context, environment and ambience (e.g. Salber, 2000; Feki et al., 2004), opens new possibilities for the development of effective multimodal mobile devices. For example, context-aware systems can sense and incorporate data about lighting, noise level, location, time, people other than the user, and many other pieces of information to adjust their model of the user’s environment. More robust interaction is then obtained by fusing explicit user inputs (the active modes) with implicit contextual information (the passive modes). In affective computing, sensors that capture data about the user’s physical state or behaviour are used to gather cues that help the system perceive users’ emotions (Kapoor & Picard, 2005).
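
As an illustration of fusing active and passive modes, the Python sketch below uses hypothetical names and thresholds. Two passive context readings, ambient noise in dB and light level in lux, set a reliability weight for each active modality, and the recognisers' hypotheses are then combined by weighted score. The linear noise ramp and the lux cut-off are illustrative assumptions, not values taken from this chapter or from the cited systems.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hypothesis:
    modality: str      # active mode, e.g. "speech" or "touch"
    command: str       # interpretation proposed by that recogniser
    confidence: float  # recogniser score in [0, 1]


def modality_weights(noise_db, lux):
    """Hypothetical reliability weights derived from passive context readings."""
    # Speech weight falls linearly from 1 at 40 dB to 0 at 80 dB (illustrative).
    speech = min(1.0, max(0.0, (80.0 - noise_db) / 40.0))
    # Touch is assumed less reliable in very dim light (illustrative cut-off).
    touch = 1.0 if lux >= 10.0 else 0.5
    return {"speech": speech, "touch": touch}


def fuse(hypotheses, noise_db, lux):
    """Pick the command with the highest context-weighted total score."""
    weights = modality_weights(noise_db, lux)
    scores = defaultdict(float)
    for h in hypotheses:
        scores[h.command] += weights.get(h.modality, 0.0) * h.confidence
    if not scores:
        return None
    return max(scores, key=scores.get)


hyps = [Hypothesis("speech", "call_home", 0.9), Hypothesis("touch", "open_menu", 0.6)]
print(fuse(hyps, noise_db=40, lux=200))  # call_home (quiet: speech trusted)
print(fuse(hyps, noise_db=75, lux=200))  # open_menu (noisy: speech down-weighted)
```

The same explicit inputs yield different interpretations depending on the sensed context, which is the intended effect of combining active and passive channels.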

However, our limited understanding of how recognition-based technologies can best be used and combined in the user interface often leads to interface designs with poor usability and added complexity. For designers and developers in industry, developing multimodal interaction systems presents a number of challenges: how to choose optimal combinations of modalities, how to deal with uncertainty and error-prone natural human behaviour, how to integrate and interpret combinations of modalities, and how to evaluate sets of multimodal commands. These challenges are the prime motivation for our research. In particular, we aim to develop methodologies that help developers efficiently produce multimodal systems while providing a better user experience.

Existing related work includes the CrossWeaver platform (Sinha & Landay, 2003), a visual prototyping tool that allows non-programmer designers to sketch multimodal storyboards, which can then be executed to test the interaction quickly. CrossWeaver offers a practical and simple solution for building multimodal prototypes. However, design representations are left in an informal, sketched form, and the analysis of the design and of user test results is itself an informal process.

Flippo et al. (2003) propose an object-oriented framework that enables existing applications to be equipped with a multimodal interface with relatively little effort. The framework adds multimodal functionality (such as modality fusion, multimodal dialog management, and ambiguity resolution) to existing application code, relying on the declaration of command frames. The command frames embody the interaction design, but very little detail is provided about how these frames are declared or about any support for making design decisions and evaluating usability.
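
A command frame can be pictured as an action with named slots that different modalities fill. The Python sketch below is hypothetical and does not reproduce the API of the framework by Flippo et al.; it only shows how a spoken request and a pointing gesture might jointly complete one frame, with the list of still-missing slots available to a dialog manager that could prompt the user.

```python
from dataclasses import dataclass, field


@dataclass
class CommandFrame:
    """Hypothetical command frame: an action plus named slots to be filled."""
    action: str
    required_slots: tuple
    filled: dict = field(default_factory=dict)  # slot -> (value, modality)

    def fill(self, slot, value, modality):
        # The first input to fill a slot wins; later inputs are ignored here.
        if slot in self.required_slots and slot not in self.filled:
            self.filled[slot] = (value, modality)

    def missing(self):
        # Unfilled slots that a dialog manager could prompt the user for.
        return [s for s in self.required_slots if s not in self.filled]

    def is_complete(self):
        return not self.missing()


# A spoken request creates the frame; a pointing gesture supplies the recipient.
frame = CommandFrame("send_message", ("recipient", "body"))
frame.fill("recipient", "Alice", "touch")
print(frame.missing())      # ['body']
frame.fill("body", "Running late", "speech")
print(frame.is_complete())  # True
```

The "first input wins" policy is a simplification chosen for the example; ambiguity resolution in a real system would need a richer rule for conflicting inputs.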

ICARE (Interaction CARE) (Bouchet et al., 2004) is a component-based approach to multimodal interface development. It defines two types of software components: elementary components, for the development of pure modalities, and composition components, for combining modalities. ICARE, however, assumes that the design of the different modality combinations has been thought through before they are implemented, and it offers no support for making design decisions.
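
To illustrate what composition components do, the following Python sketch implements simplified Equivalence, Redundancy and Complementarity compositions over timestamped events produced by elementary (single-modality) components. It is not ICARE's actual component API. The `Event` type, the one-second fusion window, and the grouping rule are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    modality: str  # source elementary component, e.g. "speech"
    content: str   # recognised content
    time: float    # timestamp in seconds


def equivalence(events):
    """Any single modality may carry a command: pass every event through."""
    return [e.content for e in events]


def redundancy(events, window=1.0):
    """Emit content when two different modalities agree within `window` s."""
    out = []
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            if (a.modality != b.modality and a.content == b.content
                    and abs(a.time - b.time) <= window):
                out.append(a.content)
    return out


def _merge(group):
    # A group drawn from one modality only is not a complementary combination.
    if len({e.modality for e in group}) < 2:
        return None
    return " ".join(e.content for e in group)


def complementarity(events, window=1.0):
    """Merge events from different modalities that start within `window` s."""
    fused, group = [], []
    for e in sorted(events, key=lambda ev: ev.time):
        if group and e.time - group[0].time > window:
            fused.append(_merge(group))
            group = []
        group.append(e)
    if group:
        fused.append(_merge(group))
    return [f for f in fused if f is not None]


events = [Event("speech", "call", 0.0), Event("touch", "Alice", 0.4),
          Event("speech", "hello", 5.0)]
print(complementarity(events))  # ['call Alice']
print(redundancy([Event("speech", "call", 0.0), Event("keypad", "call", 0.5)]))  # ['call']
```

A fixed time window keeps the sketch short. A practical fusion engine would also have to deal with timeouts, out-of-order events and recognition errors.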

Finally, FAME (Duarte & Carriço, 2006) is a model-based architecture for adaptive multimodal applications, accompanied by a set of guidelines to assist the development process. Here, the emphasis is on introducing adaptive capabilities into the multimodal interface to accommodate diverse users and changes in environmental conditions.
