Computer Synthesized Speech Technologies: Tools for Aiding Impairment

John Mullennix (University of Pittsburgh at Johnstown, USA) and Steven Stern (University of Pittsburgh at Johnstown, USA)
Indexed In: SCOPUS
Release Date: January, 2010|Copyright: © 2010 |Pages: 342
ISBN13: 9781615207251|ISBN10: 1615207252|EISBN13: 9781615207268|DOI: 10.4018/978-1-61520-725-1

Description

While the use of technology to compensate for individual shortcomings is nothing new, there has been tremendous progress in the application of technology toward assisting individuals with disabilities, particularly with the use of computer synthesized speech (CSS) to help speech impaired people communicate using voice.

Computer Synthesized Speech Technologies: Tools for Aiding Impairment provides information to current and future practitioners that will allow them to better assist speech disabled individuals who wish to utilize CSS technology. Just as important as the practitioner's knowledge of the latest advances in speech technology is the practitioner's understanding of how specific client needs affect the use of CSS, how cognitive factors related to comprehension of CSS affect its use, and how social factors related to perceptions of the CSS user affect their interaction with others. This cutting edge book addresses the topics pertinent to understanding the myriad concerns involved in the implementation of CSS so that CSS technologies may continue to evolve and improve for speech impaired individuals.

Topics Covered

The many academic areas covered in this publication include, but are not limited to:

  • Aided Language Development
  • Assistive Technology
  • Augmentative and Alternative Communication
  • Computer Synthesized Speech
  • Digital Speech Technology
  • Dysarthria
  • Hidden Markov Modeling
  • Speech-generating Devices
  • Text-to-Speech Devices
  • Vox Artificialis

Reviews and Testimonials

In this book, an international panel of experts across numerous disciplines cover a variety of areas pertinent to understanding the many concerns in the implementation of CSS for practitioners working with speech disabled populations. This book serves to ground this work in current theory and research while at the same time existing as an approachable book used in the classroom or used as a reference book from one's bookshelf. Each chapter is geared toward providing information that practitioners should know, or even better, can use.

– John Mullennix, Steven Stern

This unique book addresses in some detail an area of AAC [alternative and augmentative communication] that will surely increase as technology advances.

– Dr. Adrienne B. Hancock, George Washington University

Table of Contents and List of Contributors

Preface

As social scientists often define it, technology refers to devices and processes that extend our natural capabilities. Microscopes make it possible to see smaller things and telescopes enable us to see things that are further away. Cars extend the amount of space that we are able to travel far beyond where our feet can take us during a given period of time.

To us, this definition is most applicable and particularly pragmatic when we consider people whose natural capabilities are limited by a disability. There is nothing particularly new about using technologies to make up for individual shortcomings. Eyeglasses have been around since the thirteenth century. Carved earlike extensions that served as early hearing aids have been around since at least the sixteenth century.

With the advent of electronics and computers, as well as advancements in engineering, medicine and related fields, there has been tremendous, if not miraculous, progress in the application of technology toward assisting people with disabilities. This book focuses on just one technology as applied toward one specific disability: the use of computer synthesized speech (CSS) to help speech impaired people communicate using voice.

CSS is commonly used for a variety of applications, such as talking computer terminals, training devices, warning and alarm systems and information databases. Most importantly, CSS is a valuable assistive technology for people with speech and visual impairments. Other technologies such as the internet are made more accessible to people with disabilities through the use of CSS.

When a person loses their voice, or is speech impaired, they are encumbered by a tremendously inconvenient disability coupled with a powerful stigma. The inability to speak is often accompanied by decreased feelings of self-worth and increased incidence of depression, feelings of isolation, and social withdrawal. The use of CSS or other assistive technologies is only one of many adaptations that a person with a serious speech impairment can make, particularly if the underlying cause (e.g., stroke, thoracic cancer) creates other difficulties for the person outside of speech problems.

OVERALL MISSION/OBJECTIVE OF THE BOOK

Our mission is to provide practitioners and future practitioners with information that will allow them to better assist the speech disabled who wish to utilize CSS technology. In this book, an international panel of experts across numerous disciplines covers a variety of areas pertinent to understanding the many concerns in the implementation of CSS for practitioners working with speech disabled populations. This book serves to ground this work in current theory and research while remaining approachable enough to be used in the classroom or kept as a reference on one’s bookshelf. Each chapter is geared toward providing information that practitioners should know, or even better, can use.

Throughout the book, there are a number of terms referring to various speech technologies, many of which overlap. Although we favor the acronym CSS to refer to computer synthesized speech, our contributors may use terms such as Speech Generating Devices (SGDs) or Voice Output Communication Aids (VOCAs), or may refer to more general Augmentative and Alternative Communication (AAC) aids. There are several ways in which CSS can be generated. One technique is synthesis by rule, which refers to synthetic speech that is generated via linguistic rules that are represented in a computer program. Another technique involves what is called concatenated speech, which is synthetic speech composed of prerecorded human phonemes (bits of speech) strung together. Both techniques can be embedded into what are called Text-to-Speech (TTS) systems, where the user inputs text through a keyboard and then an output device creates the audible speech. CSS systems should be distinguished from digitized human speech samples, where prerecorded messages are used in such applications as voice banking and telephone voice menus. In terms of assistive speaking aids, CSS is the "gold standard" because of its flexibility and its ability to be tailored to many different situations.
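The idea behind concatenated speech can be illustrated with a toy sketch. The Python example below is only an illustration under stated assumptions: the unit inventory, the lexicon, and the function names are hypothetical, and short sine tones stand in for the prerecorded human speech units that a real system would store. It does not represent the method of any system discussed in this book.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this toy example)
UNIT_SAMPLES = 400   # length of every stand-in unit, in samples (assumed)


def tone(freq_hz):
    """Return a sine tone as a list of floats; a stand-in for one recorded unit."""
    return [
        math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(UNIT_SAMPLES)
    ]


# Hypothetical unit inventory. A real concatenative system would hold
# prerecorded human speech segments here, not synthetic tones.
UNITS = {"HH": tone(300), "AY": tone(500), "B": tone(200)}

# Hypothetical lexicon mapping typed words to sequences of unit labels.
LEXICON = {"hi": ["HH", "AY"], "bye": ["B", "AY"]}


def synthesize(text):
    """Look up each word's units and string their samples together."""
    samples = []
    for word in text.lower().split():
        for label in LEXICON[word]:
            samples.extend(UNITS[label])
    return samples


waveform = synthesize("hi bye")
print(len(waveform))  # four units of 400 samples each
```

Typed text goes in and a single waveform comes out, which mirrors the TTS arrangement described above. Practical systems additionally smooth the joins between units and adjust pitch and timing, which this sketch leaves out.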

In preparing this book, we had five objectives. In overview:

• To provide an overview of CSS technology and its history.

• To present recent developments in CSS and novel applications of this evolving technology.

• To examine how CSS is used as a speaking aid for people with various speech impairments and how CSS is used in these cases as a speaking prosthesis.

• To better understand how social perceptions of CSS users are affected by attitudes toward CSS users, including prejudice and stereotyping.

• To provide case study examples of the issues that practitioners and users face when adopting CSS technology as a speaking aid.

Section I: CSS Technology and History

CSS systems have evolved much over time. The history of these developments is covered in the book along with an explanation of the various types of techniques used in generating CSS in typical systems used today.

Section II: Emerging Technologies

The successful implementation of a CSS system is affected by the quality of the synthetic voices used. Some CSS systems are more intelligible, natural sounding, and comprehensible than others. In addition, there is evidence that listening to synthetic speech places greater demands on the listener's attention. We devoted a portion of this book to an examination of cutting-edge approaches to CSS systems that will result in improved, more user-friendly CSS. Higher-quality CSS output will help to minimize the cognitive requirements incurred by attending to synthetic speech and will facilitate comprehension of CSS output.

Section III: Specific Applications

There are numerous concerns regarding the use of CSS technology for people with speech disabilities. Many concerns are rooted in the physical realities of the presenting disorder. Disorders that progress slowly permit the patient more time to learn the technology than disorders that have a sudden onset. Some disorders also leave the patient more able to use their hands than others. Several chapters are written by experts on the application of CSS with children, people with intellectual disabilities and people with articulatory disorders for which CSS may offer new avenues of treatment.

Section IV: Social Factors

Those who work with CSS users can benefit from an understanding of how the combination of disability and technology affects social interactions between CSS users and other people. Two chapters discuss how attitudes toward CSS users (including stereotyping, prejudice, and discrimination) can affect how people react to CSS speech output from users with speech impairments.

Section V: Case Studies

Finally, we felt that the practical value of this book would be enhanced by including case studies of people with speech impairments who are adopting CSS technology as a speaking aid. Two chapters were contributed by practitioners working directly with clients with significant speech impairments who were learning how to use CSS as an assistive speaking aid. In these chapters, the day to day issues and obstacles encountered by both the clients and the practitioners are highlighted.

Overview of Individual Chapters

To introduce the major themes of the book, John W. Mullennix and Steven E. Stern, in Overview: Important Issues for Researchers and Practitioners using Computer Synthesized Speech as an Assistive Aid, present a brief overview of the current research topics and future directions of research in the area encompassing CSS as used in augmentative and alternative communication for people with speech impairments. Issues that are especially important for practitioners who work with people with speech impairments are also discussed. This overview presents an integrated vision of research where practitioners need to be apprised of the latest research and technological developments and where researchers need to solicit feedback from practitioners in order to pursue fruitful future directions for research.

The first section of this book comprises two chapters that provide the reader with an overview of the past and present of the technology behind computer synthesized speech. In Debbie Rowe’s chapter, From Wood to Bits to Silicon Chips: A History of Developments in Computer Synthesized Speech, the development of computer synthesized speech technology over time is delineated. Beginning with early synthesis machines from the 18th century, the progression of individual and industrial advancements over time is briefly discussed. The chapter proceeds to examine modern (and more recent) developments from the business and industry sector involved in creating assistive and educational technologies using CSS. The chapter concludes with a discussion of CSS developments related to the fields of neuroprosthetics, robotics, composition and the arts, as well as how CSS has become a part of popular culture as captured through the medium of film.

In H.S. Venkatagiri’s chapter, Digital Speech Technology: An Overview, the current status of digital speech technology is reviewed. Digital speech is divided into the categories of digitized human speech and synthesized speech. A detailed review of how speech is digitized is presented. Then, the manner in which speech is synthesized is covered in detail, with various algorithmic implementations discussed. The chapter concludes with an extended discussion of the considerations that must be taken into account when deciding whether digitized speech or synthesized speech is the best choice for a person in need of an augmented expressive communication capability.

The second section moves past the current state of CSS and examines emerging technologies. Four chapters examine some of the most recent advancements in the technology and application of CSS.

In D. Jeffery Higginbotham’s chapter, he provides a look at where CSS has been and where it is going, with a description of how CSS is currently used in Speech Generating Devices (SGDs) and how speech intelligibility, sentence and discourse comprehension, social interaction, and emotion and identity factor into the use of SGDs by people with speech impairments. Of particular importance is the use of SGDs in social interaction, and recent developments oriented towards facilitating social interaction are discussed. As well, the importance of having personalized and emotive voices is considered as part of what the future holds for developing more functional SGDs for the people who use these devices.

In H. Timothy Bunnell, Chris Pennington, and Debra Yarrington’s chapter, Advances in Computer Speech Synthesis and Implications for Assistive Technology, a cutting-edge concatenation-based speech synthesis system, the ModelTalker TTS system, is described. The pros and cons of rule-based speech synthesis versus concatenation-based speech synthesis are briefly discussed, followed by a description of a new approach to building personalized voices for users of SGDs. Issues of intelligibility and naturalness are considered, as well as the technical constraints and numerous user issues that must be taken into account with such a system. The ultimate goal of this work is to give users of this technology the ability to use fully natural sounding and expressive speech to communicate with others. The work the researchers discuss in this chapter represents a significant step forward in terms of developing user-friendly computer-based speech for people with speech impairments.

Sarah Creer, Phil Green, Stuart Cunningham, and Junichi Yamagishi’s chapter, Building Personalized Synthetic Voices for Individuals with Dysarthria using the HTS Toolkit, focuses on developing personalized CSS voices for people suffering from dysarthria, an articulatory disorder affecting movement of speech articulators and control of respiration. The chapter discusses various reasons for development of natural sounding synthesized voices, especially the facilitation of social interaction with others. A brief review of current voice personalization techniques is followed by a detailed description of a Hidden Markov Modeling (HMM) based synthesis system designed to create an acceptable synthesized voice for a dysarthric individual. A study evaluating the system is described and the results summarized in terms of the efficacy of the authors' system.

Gérard Bailly, Pierre Badin, Denis Beautemps, and Frédéric Elisei’s Speech Technologies for Augmented Communication describes an innovative approach to using artificially generated speech via hypothetical visual humanoid displays. The concept revolves around using signals originating at some point in the speech production system of the speech impaired individual. A brief overview of the speech production process and the recording of speech signals is provided. Methods of mapping input signals to speech representations are discussed, with the emphasis on a priori knowledge to facilitate the process. Specific applications including communication enhancement, aids for the speech impaired and language training are discussed.

The third section of this book describes specific applications of CSS on different populations with specific disabilities. In particular, five chapters examine the use of CSS with children, individuals with Broca’s and global aphasias, adults with intellectual disabilities and the perception of CSS when used by people with intellectual and communicative disabilities.

Kathryn D.R. Drager and Joe Reichle, in CSS and Children: Research Results and Future Directions, review the research literature on use of CSS with children. The factors that influence the intelligibility of CSS for children are examined, including context, speech rate, age, the listener's native language, experience with CSS and background noise. Comprehension of CSS by children is also discussed. The chapter concludes with an overview of children's preferences and attitudes toward CSS and the special considerations that should be factored into providing a means of spoken output for children who possess communicative disabilities.

Rajinder Koul, Diana Petroi, and Ralf Schlosser, in Use of Speech Generating Devices by Individuals with Aphasia: A Meta-Analysis, describe the results of a large meta-analysis of studies from 1980 to 2007 evaluating the effects of augmentative and alternative communication (AAC) intervention using speech generating devices (SGDs) on several quantitative outcome measures in individuals with severe Broca's and global aphasia. The data extracted from the studies included participant characteristics, treatment characteristics, treatment integrity design, and outcomes. Each study was assessed for methodological quality. The results are valuable for interpreting the efficacy of SGDs with aphasic populations and are important in terms of future applications with aphasic individuals.

Dean Sutherland, Jeff Sigafoos, Ralf W. Schlosser, Mark F. O'Reilly, and Giulio E. Lancioni, in Are Speech-Generating Devices Viable AAC Options for Adults with Intellectual Disabilities? describe the use of speech generating devices (SGDs) with the intellectually disabled. The chapter begins with a full description and definition of intellectual disability. Various issues resulting in a reluctance to use SGDs as interventions with the intellectually disabled are considered. A large scale systematic empirical review of intervention studies that involve teaching the use of SGDs to the intellectually disabled is described. The results of the review provide valuable evidence-based information to guide clinicians who work with this particular population in terms of the suitability for using SGDs as an intervention.

Rajinder Koul and James Dembowski, in Synthetic Speech Perception in Individuals with Intellectual and Communicative Disabilities, review the research on perception of CSS by individuals with intellectual, language and hearing impairments. Perception by the intellectually impaired (ranging from mild to severe) is examined in terms of perception of single words, sentences, discourse and how practice with CSS affects listening performance. Perception of CSS by those with hearing impairment and specific language impairment is also covered. The chapter concludes with a discussion of the role of CSS in the acquisition and learning of graphic symbols by individuals with little to no functional speech capability.

Oscar Saz, Eduardo Lleida, Victoria Rodriguez, W.-Ricardo Rodriguez, and Carlos Vaquero, in The Use of Synthetic Speech in Language Teaching Tools: Review and a Case Study, discuss the use of CSS in the development of speech therapy tools for the improvement of communication abilities in handicapped individuals. CSS is required for providing alternative communication to users with different impairments and for reinforcing the correct oral pronunciation of words and sentences. Different techniques can be used, such as pre-recorded audio, embedded Text-to-Speech (TTS) devices, talking faces, etc. These possibilities are reviewed and the implications of their use with handicapped individuals are discussed, including the experience of the authors in the development of tools for Spanish speech therapy. Finally, a preliminary experience in the use of computer-based tools for the teaching of Spanish to young children shows how removing the synthetic speech feature in the language teaching tool produces increased difficulty for the students.

The fourth section of this book contains two chapters that focus on social psychological approaches to understanding how users of CSS are evaluated by others.

Steven E. Stern, John W. Mullennix, Ashley Davis Fortier and Elizabeth Steinhauser’s Stereotypes Of People With Physical Disabilities and Speech Impairments as Detected by Partially Structured Attitude Measures focuses on stereotypes that people hold toward people with speech impairment and physical disabilities. The literature on stereotypes of people with physical disabilities is examined. Two empirical studies are described that examine six specific stereotypes. Their research provides evidence that people with physical disabilities and speech impairments are stereotyped as being asexual, unappealing, dependent, entitled, isolated, and unemployable.

John W. Mullennix and Steven E. Stern, in Attitudes toward Computer Synthesized Speech, examine attitudes toward users of CSS technology as an assistive aid. The research literature on attitudes toward the speech disabled and users of augmented and alternative communication is briefly reviewed and then discussed within the larger context of people's reactions to speaking computers. Research on attitudes towards CSS and persuasion of CSS is examined as a function of people's prejudicial attitudes toward the disabled. The chapter concludes with a discussion about the social factors that affect listeners' perception of CSS speech that go beyond simple intelligibility of CSS.

The book concludes with two chapters on specific case studies that focus on practical issues encountered in the process of implementing CSS. In Martine Smith, Janice Murray, Stephen von Tetzchner, and Pearl Langan’s chapter, A Tale of Transitions: The Challenges of Integrating Speech Synthesis in Aided Communication, aided language development in persons with communicative disability is addressed. Aided language development refers to the fact that persons using technology aids to communicate must adapt to many changes in the technology over time. The focus of this chapter is on the issues that occur when a switch is made from a manual communication board to an electronic device. The chapter begins with a brief review of simple and complex aided communication and aided communication competence. Then, the complexity of the issues encountered during transition from one technology to another is aptly illustrated through two detailed case studies of aided communicators. Overall, the chapter provides excellent insight into the practical problems that occur in this situation and the factors that affect the adoption of high tech devices using voice output.

In Jeff Chaffee’s Tossed in the Deep End… Now What?! …, the author provides some useful strategies for the practitioner in order to help minimize the shock and stigma of adding device users to a caseload in a school, medical, or rehabilitation setting. To this end, the author provides a number of strategic rules for adapting the device to the therapy setting and a number of strategic rules for improving carryover into activities of daily living, the classroom, and other settings with caregivers and loved ones. To illustrate each of these strategies, a detailed and in-depth case of Corey, an adult AAC device user, is presented. His case illustrates many of the difficulties that are encountered during the adoption of an SGD for a client and highlights the need for clinicians and support staff to work together towards the common goal of improving communication through the use of CSS.

WHO WILL THIS BOOK BENEFIT?

This book is oriented towards educators, students, and practitioners in the areas of Psychology, Communication Disorders, Speech Pathology, Computer Science, Rehabilitation Sciences, Social Work, Gerontology, Nursing, Special Education, and any other discipline where the use of CSS is applicable. The book's primary emphasis is on providing information based on scholarly and clinical work that will assist both clinical practitioners and future practitioners in making informed decisions about applications of synthetic speech with the speech disabled. Additionally, as the book is based on scholarly research with an applied perspective, researchers across multiple disciplines will find inspiration for future research ideas. Although the book is focused on CSS and speech disorders, scholars and practitioners in the more encompassing areas of human factors, human-computer interaction, disability legislation, and product development may find that the issues addressed are applicable to other forms of computer mediated communication as well.

We also hope that this book will be adopted as a primary or supplemental text for courses at the graduate and undergraduate level. These courses potentially span a number of different disciplines including but not limited to Communication Disorders and Sciences, Rehabilitation Sciences, Health-Related Fields, and Social and Behavioral Sciences. We also expect that the book will be useful to individual faculty, researchers and scholars as a reference and research source.

CONCLUSION

When a person with a speech impairment has an opportunity to use CSS as an assistive technology, they are gaining a measure of control that would have been unheard of half a century ago. Today, practitioners must be familiar with the latest advances in speech technology in order to properly serve their clients. However, just as important is the practitioner’s understanding of how specific client needs affect the use of CSS, how cognitive factors related to comprehension of CSS affect its use, and how social factors related to perceptions of the CSS user affect interactions with other people. Armed with this information, we can hope for improved outcomes in the future for people using CSS as a speaking aid.

Author(s)/Editor(s) Biography

John Mullennix is a Professor of Psychology at the University of Pittsburgh at Johnstown. He received a B.S. in Psychology from the University of Pittsburgh and a Ph.D. in Psychology from SUNY-Buffalo. His area of research encompasses speech perception, psycholinguistics, and speech technology. He has numerous scholarly publications in the areas of Psychology and Speech & Hearing and has received federal research funding for his work on speech perception. Currently, he is working on research projects related to earwitness testimony and attitudes toward users of computerized speech technology.

Steven Stern is a Professor of Psychology at the University of Pittsburgh at Johnstown. He received a B.A. in Psychology from Clark University and a Ph.D. in Psychology from Temple University. He is one of a small group of psychologists who study the social psychological implications of technology. He has published several articles on how technologies affect how people view themselves and interact with each other. As well as examining how people react toward assistive technologies, he is currently studying how cellular telephones alter interpersonal communication and people’s relationships.
