Deep Learning-Based Voice Pathology Detection From Electroglottography

S. Revathi, K. Mohana Sundaram
Copyright: © 2024 | Pages: 22
DOI: 10.4018/979-8-3693-2238-3.ch010

Abstract

The detection of voice pathology is a critical field in the domain of speech and healthcare, with early and accurate diagnosis being pivotal for effective treatment. Electroglottography (EGG) has emerged as a promising tool for understanding the functioning of the vocal folds, offering valuable insights into voice disorders. This chapter highlights the current state of research in voice pathology detection using deep networks applied to EGG signals and examines various studies and methodologies in this area, emphasizing data collection and pre-processing techniques, the design of CNN architectures, training strategies, and performance evaluation metrics. Additionally, the chapter discusses the potential for further advancements, challenges, and opportunities in the field, emphasizing the importance of standardized datasets and the integration of CNN-based voice pathology detection models into clinical practice.

1. Introduction

The human voice is a remarkable and versatile instrument, enabling communication and expression. It carries not only our spoken words but also our emotions, intentions, and identity. Whether in our personal lives or professional endeavors, the quality of our voice plays an integral role in how we express ourselves and interact with the world. Any disruption of this vocal instrument can therefore have a profound impact on daily life. The American Speech-Language-Hearing Association describes voice disorders as a domain of critical importance within healthcare. They encompass a diverse range of conditions, each affecting an individual's ability to produce and control their voice. These conditions may manifest as alterations in voice quality, pitch, loudness, or endurance, often resulting in a voice that is incongruent with the individual's age, gender, ethnic background, or geographic location. Such alterations may cause temporary vocal problems or, in some cases, develop into complex chronic conditions (American Speech-Language-Hearing Association, 2016).

Voice pathology can worsen if it is left unaddressed and may lead to significant personal, social, and professional consequences. In educational settings, children with untreated voice disorders may struggle academically and face social isolation due to their communication difficulties. In the professional sphere, individuals with voice disorders may find it challenging to excel in vocally demanding careers, especially teaching, which can affect their job performance and prospects (Elena Nerriere et al., 2009). Furthermore, the emotional toll of living with a voice disorder can lead to anxiety, depression, and a diminished quality of life.

For these reasons, timely and accurate diagnosis of voice pathology is imperative. Traditionally, the assessment of voice disorders has relied on perceptual evaluation by trained clinicians, including speech-language pathologists and laryngologists. While this subjective evaluation remains an essential component of the diagnostic process, it can be influenced by individual variation, experience, and training, which limits its objectivity and accuracy. To address these limitations and enhance diagnostic precision, recent years have witnessed a surge in objective, quantitative methodologies that supplement clinical judgment. Among these, electroglottography (EGG), a technique that records vocal fold vibrations, has gained recognition as a powerful tool for gaining insight into vocal function (Atika A. Salih et al., 2015). Together with acoustic analysis, which measures parameters such as fundamental frequency and jitter, EGG provides objective data that help capture the underlying physiological factors of voice disorders (Patel Shaheen et al., 2018).
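
As a rough illustration of the two parameters named above, the following minimal Python sketch computes a mean fundamental frequency and a local jitter value from a sequence of glottal cycle periods. It is not taken from the chapter. The function names and the example periods are hypothetical, the sketch assumes the common "local jitter" definition (mean absolute difference between consecutive periods divided by the mean period), and it assumes the cycle periods have already been extracted from an EGG or acoustic recording by some cycle-detection step that is not shown.

```python
from statistics import mean


def mean_f0_hz(periods_s):
    """Mean fundamental frequency, taken here as 1 / (mean cycle period)."""
    return 1.0 / mean(periods_s)


def local_jitter_percent(periods_s):
    """Local jitter in percent.

    Assumed definition: mean absolute difference between consecutive
    cycle periods, divided by the mean period.
    """
    if len(periods_s) < 2:
        raise ValueError("need at least two cycle periods")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    return 100.0 * mean(diffs) / mean(periods_s)


# Hypothetical cycle periods in seconds (illustrative values, not real data).
example_periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]

if __name__ == "__main__":
    print(f"mean F0: {mean_f0_hz(example_periods):.1f} Hz")
    print(f"local jitter: {local_jitter_percent(example_periods):.2f} %")
```

A perfectly periodic sequence gives a jitter of zero, so larger values indicate greater cycle-to-cycle irregularity in vocal fold vibration. Any threshold for what counts as pathological would have to come from the clinical literature, not from this sketch.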

This chapter examines the use of EGG signals and standard voice databases for the early detection of disease, and their integration with advanced technologies such as deep neural networks, to improve diagnostic accuracy. In exploring this evolving field, we seek to highlight the significance of early detection and the potentially transformative role that technology can play in the assessment and management of speech disorders. Ultimately, this contributes to a growing body of knowledge that bridges healthcare and technology and offers new perspectives and opportunities for the future of voice pathology diagnosis and treatment.