Digital Speech Technology: An Overview

H. S. Venkatagiri
DOI: 10.4018/978-1-61520-725-1.ch003

Abstract

Speech generating devices (SGDs), whether dedicated devices or general-purpose computers with suitable hardware and software, are important to children and adults who might otherwise be unable to communicate adequately through speech. These devices generate speech in one of two ways: they play back previously recorded speech (digitized speech) or synthesize speech from text (text-to-speech, or TTS, synthesis). This chapter places digitized and synthesized speech within the broader domain of digital speech technology. The technical requirements for digitized and synthesized speech are discussed, along with recent advances in improving the accuracy, intelligibility, and naturalness of synthesized speech. The factors to consider in selecting digitized or synthesized speech for augmenting expressive communication abilities in people with disabilities are also discussed. Finally, research needs in synthesized speech are identified.

Many Facets of Speech Technology

Digital speech technology, the technology that makes talking machines possible, is a burgeoning field with many interrelated applications, as shown in Figure 1. The overlapping circles indicate that these diverse applications share a common knowledge base, although each also requires a set of solutions unique to it. Speech coding, an essential part of every digital speech application, is the process of generating a compressed (compact) digital representation of speech for storage and transmission (Spanias, 1994). The familiar MP3 (MPEG-1 Audio Layer 3; Brandenburg, 1999) is an efficient coding technique for music; some coding techniques used in machine-generated speech are discussed later in this chapter. Acoustic speech analysis, which involves analyzing and graphically displaying the frequency, intensity, and durational parameters of speech (Kent & Read, 1992), has provided the foundational data necessary for implementing TTS synthesis, especially the type known as synthesis by rule, or formant synthesis (Rigoll, 1987). Formant synthesis is discussed in a later section.
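As a rough illustration of what "a compressed digital representation of speech" can mean in practice, the sketch below uses mu-law companding, a long-established telephony coding idea. It is chosen here only because it is simple; it is not necessarily one of the techniques this chapter goes on to discuss. The sketch uses the continuous mu-law formula rather than the segmented approximation found in deployed telephony codecs, and the function names (`mu_law_encode`, `mu_law_decode`) are hypothetical. Each sample is stored in 8 bits instead of the 16 bits typical of uncompressed audio, at the cost of a small reconstruction error.

```python
import math

MU = 255  # companding constant conventionally paired with 8-bit mu-law codes


def mu_law_encode(sample: float) -> int:
    """Map one sample in [-1.0, 1.0] to an 8-bit code in 0..255."""
    sample = max(-1.0, min(1.0, sample))  # clip out-of-range input
    # Logarithmic compression: quiet samples get finer resolution than loud ones.
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    # Shift from [-1, 1] to [0, 255] and quantize to an integer code.
    return int(round((compressed + 1.0) / 2.0 * 255))


def mu_law_decode(code: int) -> float:
    """Invert mu_law_encode approximately (quantization loses information)."""
    compressed = code / 255 * 2.0 - 1.0
    # Inverse of the compression curve: ((1 + MU) ** |y| - 1) / MU.
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(MU)) / MU, compressed
    )


if __name__ == "__main__":
    # A synthetic 200 Hz tone sampled at 8 kHz stands in for real speech here.
    tone = [0.8 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
    codes = [mu_law_encode(s) for s in tone]
    decoded = [mu_law_decode(c) for c in codes]
    worst = max(abs(a - b) for a, b in zip(tone, decoded))
    print(f"{len(codes)} samples stored as 8-bit codes; worst error {worst:.4f}")
```

The logarithmic curve is the design choice worth noticing: it spends more of the 256 available codes on low-amplitude samples, where small errors are more audible in speech, than on loud ones.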

Figure 1. The world of speech technology
