Advances in Computer Speech Synthesis and Implications for Assistive Technology

H. Timothy Bunnell (Alfred I. duPont Hospital for Children, USA) and Christopher A. Pennington (AgoraNet, Inc., USA)
DOI: 10.4018/978-1-61520-725-1.ch005

Abstract

The authors review developments in Computer Speech Synthesis (CSS) over the past two decades, focusing on the relative advantages and disadvantages of the two dominant technologies: rule-based synthesis and data-based synthesis. Based on this discussion, they conclude that data-based synthesis is presently the best technology for use in Speech Generating Devices (SGDs) used as communication aids. They examine the benefits of data-based synthesis, such as personal voices, greater intelligibility, and improved naturalness. They also discuss problems unique to data-based synthesis systems and highlight areas where all types of CSS need to improve for use in assistive devices. Much of this discussion is from the perspective of the ModelTalker project, a data-based CSS system for voice banking that provides practical, affordable personal synthetic voices for people who use SGDs to communicate. The authors conclude by considering some emerging technologies that may prove promising in future SGDs.
Chapter Preview

Background

Figure 1 illustrates the three primary stages of processing that CSS requires. When the input is English text, the first stage, text normalization, converts the text into a sequence of words or tokens that are all "speakable" items. The figure illustrates this with a simple address in which several abbreviations and numbers must be interpreted to determine exactly which words one would use if the address were read aloud. First, "Dr." must be read as "doctor" (and not "drive"). The street number 523 is usually spoken as "five twenty-three" and not "five hundred and twenty-three." The street name should probably end in "drive" (and not "doctor"), and so forth.
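The context-dependent decisions described above can be sketched in a few lines of rule-based code. The following is a minimal illustration under stated assumptions, not the normalizer used in ModelTalker or any other system. The input is assumed to be a single address-like line in which every 1 to 4 digit token is a house number. "Dr." is resolved by a hypothetical heuristic: it is read as "drive" when a house number appears within the two preceding tokens, and as "doctor" when it comes directly before a capitalized name. The example sentence is our own and is not the address shown in Figure 1.

```python
import re

# Spelling tables for 0-99.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Assumption: any bare 1-4 digit token is a house number.
HOUSE_NUMBER = re.compile(r"^\d{1,4}$")


def two_digits(n: int) -> str:
    """Spell out 0-99, e.g. 23 -> 'twenty-three'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def read_house_number(n: int) -> str:
    """Read a house number in paired style: 523 -> 'five twenty-three'."""
    head, tail = divmod(n, 100)
    if head == 0:
        return two_digits(tail)
    if tail == 0:
        return f"{two_digits(head)} hundred"
    if tail < 10:
        return f"{two_digits(head)} oh {ONES[tail]}"  # 1905 -> 'nineteen oh five'
    return f"{two_digits(head)} {two_digits(tail)}"


def expand_dr(tokens: list[str], i: int) -> str:
    """Hypothetical heuristic for the ambiguous abbreviation 'Dr.'."""
    recent = tokens[max(0, i - 2):i]
    if any(HOUSE_NUMBER.match(t) for t in recent):
        return "drive"  # e.g. '523 Oak Dr.' is a street type
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return "doctor" if nxt[:1].isupper() else "drive"  # 'Dr. Smith' is a title


def normalize(text: str) -> list[str]:
    """Convert an address-like string into speakable lowercase words."""
    tokens = [t.rstrip(",;:") for t in text.split()]
    words: list[str] = []
    for i, tok in enumerate(tokens):
        if tok == "Dr.":
            words.append(expand_dr(tokens, i))
        elif HOUSE_NUMBER.match(tok):
            words.extend(read_house_number(int(tok)).split())
        else:
            words.append(tok.strip(".").lower())
    return words
```

For example, `" ".join(normalize("Dr. Smith lives at 523 Oak Dr."))` yields `"doctor smith lives at five twenty-three oak drive"`. Hand-written rules like these break down quickly on real text (for example, "Call Dr. Smith at 523 Oak Dr. tomorrow" would need more context). This is why practical normalizers combine much larger rule sets with lexicons and, often, statistical disambiguation.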

Figure 1.

Typical CSS processing stages. Input text passes through a normalization stage that converts all non-word input to words, then through a process that converts words to a phonetic representation. The phonetic representation is then converted to sound.
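The three stages in Figure 1 compose as a simple pipeline, with each stage's output feeding the next. The sketch below shows only that architecture, under loudly stated assumptions. `ABBREVIATIONS` and `LEXICON` are tiny hypothetical tables: the lexicon uses ARPAbet-style phone symbols, and a real system would fall back to letter-to-sound rules for unknown words. The third stage is a placeholder that emits one tone per phone. It produces no speech, and it marks the point where a rule-based or data-based synthesizer would plug in.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed output sample rate

# Stage 1 data: hypothetical, context-free abbreviation table.
ABBREVIATIONS = {"st.": "street", "ave.": "avenue", "no.": "number"}

# Stage 2 data: hypothetical lexicon with ARPAbet-style pronunciations.
LEXICON = {
    "main": ["M", "EY1", "N"],
    "street": ["S", "T", "R", "IY1", "T"],
}


def normalize(text: str) -> list[str]:
    """Stage 1: turn raw text into speakable lowercase words."""
    return [ABBREVIATIONS.get(tok, tok.strip(".,")) for tok in text.lower().split()]


def to_phones(words: list[str]) -> list[str]:
    """Stage 2: dictionary lookup only; unknown words raise an error here."""
    phones: list[str] = []
    for w in words:
        if w not in LEXICON:
            raise KeyError(f"no pronunciation for {w!r}")
        phones.extend(LEXICON[w])
    return phones


def to_samples(phones: list[str], ms_per_phone: int = 80) -> list[float]:
    """Stage 3 PLACEHOLDER: one fixed-length sine tone per phone, not speech."""
    n = SAMPLE_RATE * ms_per_phone // 1000  # samples per phone
    samples: list[float] = []
    for phone in phones:
        freq = 100 + 10 * (sum(map(ord, phone)) % 40)  # arbitrary per-phone pitch
        samples.extend(0.5 * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
                       for t in range(n))
    return samples


def synthesize(text: str) -> list[float]:
    """Text -> words -> phones -> samples, mirroring Figure 1."""
    return to_samples(to_phones(normalize(text)))
```

Calling `synthesize("Main St.")` passes `["main", "street"]` to the lexicon, which yields eight phones, and returns 8 × 1280 samples. Keeping the stages separate means the waveform generator can be swapped without changing text processing. That separation is what allows the rule-based and data-based technologies discussed in this chapter to share the same front end.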
