Designing Usable Speech Input for Virtual Environments

Alex Stedmon (University of Nottingham, UK)
DOI: 10.4018/978-1-61350-516-8.ch008


Speech is the primary mode of communication between humans, and something most people are able to use on a daily basis in order to interact with other people (Stedmon & Baber, 1999). It has been possible to interact with machines using speech input for over 70 years (Ullman, 1987); however, it remains an elusive concept without widespread use or acceptance.

With so many technical advances, especially with the development of computing and distributed interaction technologies, the question remains: why has speech input not matured into a more usable state with far-reaching applications? There are a number of reasons for this:

  • Speech recognition technology is still trying to grasp the subtleties of human speech processing, recognition and interaction;

  • The uptake of speech input has been slower than might have been expected due to unrealistic user expectations (and marketing promises) about how speech input can be used and what it can achieve;

  • Whilst applications are evolving that use speech input, more traditional input devices are still commonplace.

Input devices are the medium through which users interact with a computer interface and, more specifically in the context of this chapter, a virtual environment (VE) (Stedmon et al., 2003a). Currently, there is an increasing variety of input devices on the market that have been designed for virtual reality (VR) use, such as traditional keyboard, mouse and joystick devices, wands, data-gloves and speech input. With such a variety, there is a danger of users selecting an inappropriate input device, which could compromise the overall effectiveness of a VR application and undermine the user’s experience and satisfaction.

This chapter discusses the importance of speech input, focussing on a number of key areas:

  • Speech as an input modality for VR applications

  • Human factors issues of speech input

  • Incorporating speech in the development of VR applications

  • Developing a guidance framework for speech input

  • Guidelines for speech input


Speech as an Input Modality for VR Applications

Speech is the most natural form of human communication; it is our primary medium of communication for human-human interaction (HHI), which most of us are able to employ in our daily lives from an early age. It is still not fully understood how we learn the subtle rules of syntax and grammar and this is perhaps why it is so difficult to develop such a framework artificially for speech recognition and speech input purposes. What is clear, however, is that speech is a “familiar, convenient, [and] spontaneous part of the capabilities the human brings to the situation of interacting with machines” (Lea, 1980, p.4).

Speech is also the “human’s highest-capacity output communication channel” that offers immense potential for human-to-computer communication (Lea, 1980, p.6). It also has inherent advantages over more conventional interaction modes. Whereas untrained users may find reading, writing, keyboard skills or manual input difficult without prior learning or practice, speech input (if designed and implemented correctly) can be an intuitive medium for human-machine interaction (HMI).

One of the benefits of speech input is that it can be exploited in situations where other input devices might not be as successful (for example, in the dark or around objects or obstacles). As a medium of communication (sound) and through the medium in which it transfers (air), speech travels omni-directionally without light, in a way that conventional writing, typing and button pressing are unable to do. As Lea (1980, p.8) states, in “using switches, typewriters, cathode ray tube displays, and even the more unusual graphical input devices … and joysticks, the user must either be in physical contact with the computer console or terminal or must be orientated in a fixed direction to produce input commands and monitor computer outputs”. Speech input therefore presents a basis for remote Human-Machine Interaction (r-HMI) procedures whilst also supporting multi-modal user tasks, through distributed inputs and actions.
