Analyzing Multimodal Interaction

Fernando Ferri; Stefano Paolozzi

doi:10.4018/978-1-60566-386-9.ch002

Source Title: Multimodal Human Computer Interaction and Pervasive Services

Abstract

Human-to-human conversation remains a significant part of our working activities because of its naturalness. Multimodal interaction systems combine visual information with voice, gestures, and other modalities to provide flexible and powerful dialogue approaches. Integrating multiple input modes lets users benefit from the natural approach used in human communication. In this paper, after introducing some definitions and concepts related to multimodal environments, we describe the different approaches used in the multimodal interaction field, covering both theoretical research and the implementation of multimodal systems. In particular, we address approaches that use the new semantic technologies as well as those related to the Semantic Web.

Introduction And Background

There is great potential for combining speech, gestures, and other “modalities” to improve human-computer interaction, because this kind of communication increasingly resembles the natural communication humans use with each other every day.

Nowadays, there is an increasing demand for a human-centred system architecture with which humans can interact naturally, so that they no longer have to adapt to computers, but vice versa. It is therefore important that the user can interact with the system in the same way as with other humans, via different modalities such as speech, sketch, and gestures. This kind of multimodal human-machine interaction makes communication easier for the user, but it is quite challenging from the system’s point of view.

For example, the system has to cope with spontaneous speech and gestures, poor acoustic and visual conditions, different dialects, different lighting conditions in a room, and even ungrammatical or elliptical utterances, all of which still have to be understood correctly. Therefore, we need multimodal interaction in which missing or wrongly recognized information can be resolved by adding information from other knowledge sources.

The advantages of multimodal interaction are evident if we consider practical examples. A typical example of multimodal man-machine interaction involving speech and gestures is Bolt’s “Put That There” (Bolt, 1980). Since then, much research has been done in the areas of speech recognition and dialogue management, so that we are now in a position to integrate continuous speech and to achieve a more natural interaction. Although the technology was far less advanced at that time, the vision was very similar: to build an integrated multimodal architecture that fulfils human needs. The two modalities can complement each other easily, so that ambiguities can be resolved by sensor fusion. This complementarity has already been evaluated by different researchers, and the results showed that users are able to work with a multimodal system in a more robust and stable way than with a unimodal one. The analysis of the input of each modality can therefore serve for mutual disambiguation. For example, gestures can easily complement pure speech input for anaphora resolution.
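The idea of gestures complementing speech for anaphora resolution can be illustrated with a minimal sketch. It is not the method of Bolt's system or of any system discussed in this chapter: the names `Word`, `PointingGesture`, and `resolve_deictics`, the list of deictic words, and the one-second alignment window are all assumptions made for illustration. The sketch simply binds each deictic word to the pointing gesture that is closest to it in time.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, deliberately small set of deictic words (an assumption).
DEICTIC_WORDS = {"this", "that", "here", "there"}


@dataclass
class Word:
    text: str
    time: float  # seconds at which the word was spoken


@dataclass
class PointingGesture:
    target: str  # identifier of the object or location pointed at
    time: float  # seconds at which the gesture was observed


def resolve_deictics(words: List[Word],
                     gestures: List[PointingGesture],
                     max_gap: float = 1.0) -> List[str]:
    """Replace each deictic word with the target of the nearest unused gesture.

    A gesture is only used if it occurs within `max_gap` seconds of the word
    (an assumed threshold); otherwise the word is left unresolved.
    """
    unused = list(gestures)
    resolved = []
    for word in words:
        if word.text.lower() in DEICTIC_WORDS and unused:
            nearest = min(unused, key=lambda g: abs(g.time - word.time))
            if abs(nearest.time - word.time) <= max_gap:
                unused.remove(nearest)  # each gesture binds to one word only
                resolved.append(f"<{nearest.target}>")
                continue
        resolved.append(word.text)
    return resolved


# A "Put That There"-style utterance with two pointing gestures.
utterance = [Word("put", 0.0), Word("that", 0.4), Word("there", 1.2)]
pointing = [PointingGesture("blue_square", 0.5),
            PointingGesture("top_left_corner", 1.3)]
print(resolve_deictics(utterance, pointing))
```

Real systems would also have to handle recognition confidence, gestures that precede or follow the word by variable delays, and gestures that refer to groups of objects; the sketch ignores all of these.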

Another reason for multimodal interaction is that in some cases the verbal description of a specific concept is too long or too complicated compared to the corresponding gesture (or even a sketch), and in these cases humans tend to prefer deictic gestures (or simple sketches) to spoken words. On the other hand, considering for example the interaction of the speech and gesture modalities, there are cases where deictic gestures are not used because the object in question is too small, is too far away from the user, belongs to a group of objects, etc. Here, the principles of Gestalt theory, which determine whether somebody pointed to a single object or to a group of objects, also have to be taken into account.

Moreover, it has also been empirically demonstrated that user performance is better in multimodal systems than in unimodal ones, as explained in several works (Oviatt, 1999; Cohen et al., 1997; Oviatt et al., 2004). Of course, the importance of having a multimodal system rather than a unimodal one depends strictly on the type of action being performed by the user. For instance, as mentioned by Oviatt (Oviatt, 1999), gesture-based inputs are advantageous whenever spatial tasks have to be done. Although there are no actual spatial tasks in our case, there are some situations where a verbal description is much more difficult than a gesture, and in these cases users may prefer gestures.

As shown by several studies, speech seems to be the more important modality, which is supported by gestures as in natural human-human communication (Corradini et al., 2002). This means that the spoken language guides the interpretation of the gesture; for example, the use of demonstrative pronouns indicates the possible appearance of a gesture. Several studies have also shown that the speech and gesture modalities are co-expressive (Quek et al., 2002; McNeill & Duncan, 2000), which means that they present the same semantic concept, although different modalities are used. This observation can also be extended to other modalities that may interact with the speech modality.
