Automatic Determination of Pauses in Speech for Classification of Stuttering Disorder

João Paulo Teixeira (Polytechnic Institute of Bragança, Portugal), Maria Goreti Fernandes (Polytechnic Institute of Bragança, Portugal) and Rita Alexandra Costa (Polytechnic Institute of Bragança, Portugal)
DOI: 10.4018/978-1-5225-1724-5.ch008


An algorithm that automatically identifies segments of silence and speech is presented. The algorithm was developed to measure the silence periods in spontaneous and read speech. These silence periods are one of the parameters used to assess the degree of severity of stuttered speech. For this purpose, the three longest disfluent events (pauses or other disfluent events) and the percentage of silence are useful. The algorithm evaluates the energy and the zero crossing rate of the signal and compares them with threshold values previously determined in silence. One experiment with eight subjects is described, using the Stuttering Severity Instrument for Children and Adults (SSI) and the percentage of silence in speech. It was concluded that the percentage of silence is good enough to separate stuttered from normal speech, but alone it is not capable of measuring the degree of severity of the stuttered speech.
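The general idea of an energy and zero crossing rate detector can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the frame length, the `margin` factor, the rule that a frame counts as speech when either feature exceeds its threshold, and all function names are assumptions made for illustration. The thresholds are taken from the first few frames, which are assumed to contain only background noise.

```python
import math


def frame_features(samples, frame_len):
    """Return (energy, zero crossing rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Count sign changes between consecutive samples of the frame.
        crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def label_silence(samples, frame_len, n_ref_frames, margin=3.0):
    """Return one bool per frame, True where the frame is judged silent.

    Assumption: the first n_ref_frames frames hold only background noise,
    and each threshold is `margin` times the largest value seen there.
    """
    feats = frame_features(samples, frame_len)
    ref = feats[:n_ref_frames]
    energy_thr = margin * max(e for e, _ in ref)
    zcr_thr = margin * max(z for _, z in ref)
    # Assumed rule: silent only if both features stay at or below threshold.
    return [e <= energy_thr and z <= zcr_thr for e, z in feats]


def silence_percentage(labels):
    """Percentage of frames labelled as silence."""
    return 100.0 * sum(labels) / len(labels)


def longest_pauses(labels, n=3):
    """Lengths, in frames, of the n longest runs of silent frames."""
    runs, current = [], 0
    for silent in labels:
        if silent:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sorted(runs, reverse=True)[:n]


# Synthetic test signal (hypothetical data): a faint 50 Hz hum stands in for
# silence, and a loud 200 Hz tone between samples 1600 and 3199 for speech.
RATE = 8000   # samples per second
FRAME = 160   # 20 ms frames
signal = [
    0.5 * math.sin(2 * math.pi * 200 * i / RATE) if 1600 <= i < 3200
    else 0.01 * math.sin(2 * math.pi * 50 * i / RATE)
    for i in range(6400)
]
labels = label_silence(signal, FRAME, n_ref_frames=5)
print(silence_percentage(labels))  # 75.0
print(longest_pauses(labels))      # [20, 10]
```

On this synthetic signal, 30 of the 40 frames are labelled silent, and the two pauses last 20 and 10 frames (0.4 s and 0.2 s). Real recordings would need thresholds tuned to the recording conditions, and probably some smoothing of the frame labels.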
Chapter Preview


Speech is one of the most fundamental and complex human cognitive acts. Normal speech is the final product of a complex network of linguistic, cognitive and sensorimotor processes. Its production requires the coordinated activation of distinct muscle systems and the vocal tract (Juste et al., 2012; McClean & Tasko, 2004).

From the point of view of speech signal analysis, there are three different states of speech: silence, unvoiced speech and voiced speech. In the silence state, no speech is produced and the muscles within the vocal folds are relaxed. In the unvoiced state, the folds are closer together and tenser than in the silence state, allowing turbulence to be generated at the folds themselves. In the voiced state, active and passive contractions of the chest and abdominal wall generate a subglottic pressure that exceeds the closure force of the adducted vocal folds. The transglottic air pressure differential produces an airflow that is modulated by the vocal folds to produce a time-varying longitudinal air pressure wave. This pressure wave is shaped by the vocal tract to create the sound we hear as the normal human voice (Plant & Younger, 2000).

For a given language, there is a set of phonemes that characterizes it. These phonemes can be divided into vowels and consonants. The vowel group contains oral vowels, nasal vowels and semi-vowels (glides). The consonants are divided into plosives, liquids, fricatives and vibrants. Plosive consonants consist of an occlusive part (an almost or completely occluded sound) and a plosive part, which is generally followed by a vowel but sometimes by another consonant. Each of these consonants can be voiced or unvoiced. Voiced sounds are produced with vibration of the vocal cords, whereas unvoiced sounds are produced without vibration of the vocal cords and with the glottis open. Voiced sounds generally have low-frequency energy, whereas unvoiced sounds have higher-frequency energy. The frequency of vibration of the vocal cords is known as the fundamental frequency (F0), which is controlled by the tension and length of the vocal cords. Greater tension and length correspond to higher-frequency tones (Seeley, Stephens & Tate, 2006).

Speech disorders are human disabilities that affect millions of people worldwide and are usually treated with behavioral therapy (Barnes et al., 2016). It is estimated that 40 million Americans have a communication disorder (Ancelle, 2015). The study and evaluation of human speech disorders may lead to a wider array of treatment options and provide key insights into the genetic and neural underpinnings of human speech. Developmental stuttering is the principal disorder of fluency (Ancelle, 2015). This speech disorder is characterized by frequent repetitions or prolongations of syllables, words, and sounds, as well as involuntary hesitations or pauses that disrupt the rhythmic flow of speech (Wieland et al., 2015). In addition to the changes in the rhythm of speech, stuttering is commonly accompanied by body movements, such as tremors, spasms of the oro-facial and laryngeal muscles, and other abnormal involuntary movements (tics) (Mulligan et al., 2003; Riva-Posse et al., 2008; Rogić et al., 2016). Stuttering usually begins between the ages of two and five years, when children begin to form simple sentences (Vanhoutte et al., 2016). Recent studies have indicated that the incidence of stuttering is approximately 5%; however, the majority of affected children (about 80%) recover during puberty (Wieland et al., 2015; Rogić et al., 2016; Ancelle, 2015). It is estimated that 1-2% of the world's adult population continues to suffer from severe stuttering (often called Persistent Developmental Stuttering) (Prado-Velasco & Fernández-Perunchena, 2011).
