Building Personalized Synthetic Voices for Individuals with Dysarthria using the HTS Toolkit

Sarah Creer (University of Sheffield, UK), Phil Green (University of Sheffield, UK), Stuart Cunningham (University of Sheffield, UK) and Junichi Yamagishi (Centre for Speech Technology Research (CSTR), University of Edinburgh, UK)
DOI: 10.4018/978-1-61520-725-1.ch006

Abstract

An individual with a speech impairment may need to use a device that produces synthesized speech to assist their communication. To fully support all functions of human speech communication (communicating information, maintaining social relationships and displaying identity), the voice must be intelligible and natural-sounding. Ideally, it must also be capable of conveying the speaker’s vocal identity. A new approach based on hidden Markov models (HMMs) has been proposed as a way of capturing sufficient information about an individual’s speech to enable a personalized speech synthesizer to be developed. This approach adapts a statistical model of speech towards the vocal characteristics of an individual. This chapter describes this approach and how it can be implemented using the HTS toolkit. Results are reported from a study that built personalized synthetic voices for two individuals with dysarthria. An evaluation of the voices by the participants themselves suggests that this technique shows promise for building personalized voices for individuals with progressive dysarthria, even when their speech has begun to deteriorate.

Introduction

Adult speech impairment can be congenital, caused by conditions such as cerebral palsy, or acquired through conditions such as motor neurone disease (MND), stroke or traumatic head injury. In some acquired conditions, such as MND, diminishing neurological function contributes to a progressive loss of speech ability. Such neurologically-based motor speech impairments are known as dysarthria and are characterized by impaired movement of the articulators and control of respiration (Duffy, 2005). In the case of acquired conditions such as MND and Parkinson’s disease (PD), the progressive loss of speech motor control results in increasingly severe impairment.

Synthesized voices currently available on communication aids are highly intelligible and can approach human-like naturalness, but there are limited opportunities to personalize the output to more closely match the speech of an individual user. However, recent advances in technology offer the prospect of using probabilistic models of speech to generate high-quality personalized synthetic speech with minimal input requirements from a participant speaker.

The aims of this chapter are: to describe the need for the personalization of speech synthesis for use with communication aids; to set out currently available techniques for personalization and their limitations for people with speech disorders; to assess whether personalized voices can be built successfully with probabilistic models for individuals whose speech has begun to deteriorate; and, finally, to implement this technique for those individuals and allow them to evaluate the personalized synthetic voices.
