Speech-Based UI Design for the Automobile
Bent Schmidt-Nielsen, Bret Harsham, Bhiksha Raj, and Clifton Forlines (Mitsubishi Electric Research Labs, USA)
Copyright: © 2008
In this chapter we discuss a variety of topics relating to speech-based user interfaces for use in an automotive environment. We begin by presenting a number of principles for the design of such interfaces, derived from several decades of combined experience in developing and evaluating spoken user interfaces (UIs) for automobiles, along with three case studies of current automotive navigation interfaces. Finally, we present a new model for speech-based user interfaces in automotive environments that recasts the goal of the UI from supporting navigation among, and selection from, multiple states to that of selecting the desired command from a short list. We also present experimental evidence that UIs based on this approach can impose significantly lower cognitive load on a driver than conventional UIs.
Key Terms in this Chapter
Driver Distraction: A measure of the degree to which attention is taken away from the driving task.
Push and Hold: A type of speech interaction where the user must hold down a button while speaking to the system. This kind of system is familiar to most users as it is reminiscent of a walkie-talkie.
Misrecognition: A speech recognition result which does not accurately represent what was spoken by the user. In spoken command recognition, recognizing the exact words spoken is not necessary to avoid a misrecognition as long as the correct command is recognized.
Cognitive Load: A measure of the mental effort required to carry out a given task.
Telematics: Broadly, telematics refers to the combination of telecommunication and computation. More specifically, telematics has come to refer to mobile systems which combine wireless data communications with local computation resources. Voice communication and/or location information provided by GPS are often assumed.
Recognition Lattice: A directed graph of candidate words considered by a speech recognizer. This graph often contains alternate words with similar phonetics, and its edges typically carry confidence weights.
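The structure can be illustrated with a minimal sketch, assuming a lattice stored as a list of weighted edges between topologically numbered nodes (the node numbering, edge format, and word weights here are illustrative, not any particular recognizer's format):

```python
from collections import defaultdict

# Edges: (from_node, to_node, word, confidence weight).
# Phonetically similar alternates ("call" vs. "hall") span the same nodes.
edges = [
    (0, 1, "call", 0.8),
    (0, 1, "hall", 0.2),
    (1, 2, "home", 0.7),
    (1, 2, "hold", 0.3),
]

def best_path(edges, start=0, end=2):
    """Return the highest-confidence word sequence through the lattice.

    Assumes nodes are numbered in topological order, so a single pass
    in node order is a valid dynamic-programming sweep.
    """
    graph = defaultdict(list)
    for src, dst, word, weight in edges:
        graph[src].append((dst, word, weight))
    best = {start: (1.0, [])}  # node -> (best score, word sequence)
    for node in sorted(graph):
        if node not in best:
            continue
        score, words = best[node]
        for dst, word, weight in graph[node]:
            candidate = (score * weight, words + [word])
            if dst not in best or candidate[0] > best[dst][0]:
                best[dst] = candidate
    return best[end]

score, words = best_path(edges)
# Highest-weight path is "call home", with combined confidence 0.8 * 0.7.
```

Real lattices are far larger and use log-probabilities, but the principle is the same: alternate hypotheses coexist in one graph until a search extracts the best-scoring path (or an N-best list).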
Speech-Based [User] Interface (SUI): A user interface which uses utterances spoken by the user as a primary input mode. A speech-based interface may also have other input modes, such as dedicated buttons or softkey input, and may also provide voice feedback and/or visual feedback.
Listening Tone: A sound generated by a speech-based user interface when it is ready to accept spoken input.
SILO: A speech-based user interface which returns a shortlist of possible responses, from which the user must make a final selection. We refer to such interfaces as Speech-In List-Out, or SILO.
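The SILO pattern can be sketched as follows; this is a minimal illustration, not the chapter's implementation, and the recognizer stub, hypothesis list, and `choose` callback are all hypothetical:

```python
def recognize_nbest(utterance, n=4):
    """Stand-in for a recognizer returning (command, confidence) pairs."""
    hypotheses = [
        ("call home", 0.56),
        ("call John", 0.21),
        ("hall home", 0.14),
        ("recall home", 0.09),
    ]
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)[:n]

def silo_select(utterance, choose):
    """Speech-In List-Out: show a shortlist; the user makes the final pick.

    Rather than committing to the top recognition hypothesis, the
    interface defers the decision to the user, who selects from the
    displayed list (e.g. via a jog dial or touch).
    """
    shortlist = recognize_nbest(utterance)
    index = choose([command for command, _ in shortlist])
    return shortlist[index][0]

# Here the user picks the second item from the displayed shortlist:
command = silo_select("call john", choose=lambda items: 1)
```

The key design point is that a misrecognition at the top of the list is not a failure: as long as the intended command appears somewhere in the shortlist, the user can still complete the task with one selection.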
Lombard Effect: The specific changes in style of speech caused by the presence of noise. In particular, speech becomes louder and higher frequencies are emphasized.
Push and Release: A type of speech interaction where the user must depress a button prior to the start of speech. This type of interaction is unfamiliar to some users, but is easy to learn given the proper affordances.