Speech-Based Clinical Diagnostic Systems

Jesús Bernardino Alonso Hernández (University of Las Palmas de Gran Canaria, Spain) and Patricia Henríquez Rodríguez (University of Las Palmas de Gran Canaria, Spain)
DOI: 10.4018/978-1-60960-561-2.ch401


It is possible to implement diagnostic support systems that evaluate the phonatory system from the speech signal by means of techniques based on expert systems. Applications of these techniques include the early detection of alterations in the phonatory system and the monitoring over time of patients undergoing a given treatment. Measuring the voice quality of a speaker from a digital recording consists of quantifying different acoustic characteristics of speech, so that they can be compared with reference patterns identified previously by a “clinical expert”.

A measurement of acoustic speech quality based on auditory assessment is very hard to use as a comparative reference across different voices and across the different human experts carrying out the evaluation. In the literature, some attempts have been made to obtain objective measures of speech quality by means of multidimensional clinical measurements based on auditory methods. Well-known examples are: the GRBAS scale from Japan (Hirano, 1981) and its extension developed and applied in Europe (Dejonckere, Remacle, Fresnel-Elbaz, Woisard, Crevier-Buchman & Millet, 1996); a set of perceptual and acoustic characteristics in Sweden (Hammarberg & Gauffin, 1995); and a set of phonetic characteristics with added information about the excitation of the vocal tract. The aim of these speech quality measurement procedures is to obtain an objective measurement from a subjective evaluation.

Several works propose objective measurements of speech quality obtained from a recording (Alonso, 2006; Boyanov & Hadjitodorov, 1997; Hansen, Gavidia-Ceballos & Kaiser, 1998; Hadjitodorov & Mitev, 2002; Michaelis, Frohlich & Strube, 1998; Boyanov, Doskov, Mitev, Hadjitodorov & Teston, 2000; Godino-Llorente, Aguilera-Navarro & Gomez-Vilda, 2000). In these works a sustained voiced sound (usually a vowel) is recorded and then used to compute speech quality measurements. A sustained voiced sound is used because, while producing it, the speech system employs almost all of its mechanisms (constant glottal airflow, continuous vocal fold vibration, …), which makes it possible to detect any anomaly in these mechanisms. Each work suggests a different set of measurements for quantifying speech quality objectively. Together they reveal one important fact: several different measurements of the speech signal are needed in order to capture the different aspects of its acoustic characteristics.
Chapter Preview


A speech recording gives different characteristics of the speech quality of a speaker. The recorded speech signal can be represented in different domains. Each domain shows some of the speech characteristics in a preferential way. The main domains studied in speech processing are:

  • Time Domain

  • Spectral Domain

  • Cepstral Domain

  • Inverse Model Domain

Most works in digital speech signal processing are based on these domains. However, other works use new domains derived from the former ones.
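
As a rough illustration of how one recorded frame can be viewed in the first three domains listed above, the following Python sketch computes a magnitude spectrum and a real cepstrum with NumPy. The function name domain_views, the Hamming window, and the small constant added before the logarithm are assumptions made for this example, not details taken from the chapter. The inverse model domain is not covered here.

```python
import numpy as np


def domain_views(frame, sample_rate):
    """Return spectral and cepstral views of one time-domain frame.

    Hypothetical helper for illustration only.
    Returns (freqs_hz, magnitude, quefrency_s, cepstrum).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)

    # Time domain -> spectral domain: windowed FFT magnitude.
    windowed = frame * np.hamming(n)
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs_hz = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Spectral domain -> cepstral domain: inverse FFT of the log magnitude.
    # The 1e-12 floor (an assumption) avoids taking log(0).
    log_magnitude = np.log(magnitude + 1e-12)
    cepstrum = np.fft.irfft(log_magnitude, n=n)
    quefrency_s = np.arange(n) / sample_rate

    return freqs_hz, magnitude, quefrency_s, cepstrum
```

For a sustained vowel, the spectrum shows the harmonic structure of the voice, and the cepstrum tends to show a peak at the quefrency that corresponds to the pitch period.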

In the following section the most important features of each domain are described.

Time Domain

A high quality speech signal has a more regular envelope than a low quality speech signal. This difference is more evident over short time intervals. The main phenomena that make it possible to distinguish between high quality speech and low quality speech are:

In low quality speech, the short-time energy of the signal changes considerably between two consecutive intervals, whereas in high quality speech the energy changes little (Figure 1).

Figure 1.

Speech signal in the time domain: the five sustained Spanish vowels are shown. The upper plot corresponds to a speaker with high quality speech. The lower plot corresponds to a speaker with low quality speech.


In low quality speech, aperiodic intervals (intervals without periodicity) appear during sustained voiced speech.
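
The two time-domain phenomena described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the measurements proposed by the authors: the function names, the non-overlapping framing, the relative-change index, and the use of the normalized autocorrelation peak as a periodicity score are all choices made for this example.

```python
import numpy as np


def frame_energies(signal, frame_len):
    """Energy of consecutive non-overlapping frames of frame_len samples."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sum(frames ** 2, axis=1)


def energy_variation(signal, frame_len):
    """Mean relative energy change between consecutive frames.

    Hypothetical index: near 0 for a steady envelope, larger when the
    energy jumps between neighbouring frames.
    """
    energies = frame_energies(signal, frame_len)
    change = np.abs(np.diff(energies))
    reference = np.maximum(energies[:-1], energies[1:]) + 1e-12
    return float(np.mean(change / reference))


def periodicity(frame, min_lag, max_lag):
    """Peak of the normalized autocorrelation for lags in [min_lag, max_lag].

    Values near 1 suggest a periodic frame; values near 0 suggest an
    aperiodic one. The estimate is biased: it shrinks as the lag grows.
    """
    x = np.asarray(frame, dtype=float)
    x = x - np.mean(x)
    energy = float(np.dot(x, x))
    if energy == 0.0:
        return 0.0
    scores = [
        float(np.dot(x[:-lag], x[lag:])) / energy
        for lag in range(min_lag, max_lag + 1)
    ]
    return max(scores)
```

With a sampling rate of 8000 Hz, for example, lags from 20 to 80 samples would cover pitch values between 100 Hz and 400 Hz. Any threshold that separates high quality from low quality speech would have to be set from labelled recordings.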
