An Intelligent System for the Diagnosis of Voice Pathology Based on Adversarial Pathological Response (APR) Net Deep Learning Model: An Intelligent System for the Diagnosis of Voice Pathology-Based Deep Learning

Vikas Mittal, R. K. Sharma
Copyright: © 2022 | Pages: 18
DOI: 10.4018/IJSI.312261

Abstract

This work investigates the use of two glottal-flow-derivative-based image variants of the input signal with an n-dilated (nD) inception-layer-based deep learning model for providing optimal labels. The authors propose an nD inception-layer-based adversarial pathological response (APR) net deep learning model. The model is trained on the two image databases separately in an adversarial manner, so that a common test image is applied to both networks. The results show mean accuracies of 96.82%, 96.36%, and 99.35% for the glottal inverse filtering with extended Kalman filter-Morse scalogram (GIFEKF-MS) APRNet, the glottal inverse filtering with extended Kalman filter-spectrogram (GIFEKF-S) APRNet, and the proposed APR fusion net, respectively, on the VOice ICar fEDerico II (VOICED) dataset, and mean accuracies of 95.67%, 93.27%, and 99.04% for the GIFEKF-MS APRNet, the GIFEKF-S APRNet, and the proposed APR fusion net, respectively, on the Saarbrucken voice database (SVD) dataset.
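
The abstract reports that a fusion of the two image-specific networks outperforms either network alone, but it does not state how the two outputs are combined. As a rough illustration only, the sketch below assumes a simple late fusion by weighted averaging of class probabilities. The function names, the two-class label set, and the probability values are hypothetical and are not taken from the paper.

```python
def fuse_probabilities(p_a, p_b, weight_a=0.5):
    """Weighted average of two class-probability vectors (assumed fusion rule)."""
    if len(p_a) != len(p_b):
        raise ValueError("both networks must score the same set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]


def predict_label(probs, labels=("healthy", "pathological")):
    """Return the label with the highest probability (labels are hypothetical)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


# Made-up outputs of the two networks for the same test recording.
scalogram_probs = [0.40, 0.60]    # stands in for the scalogram-based network
spectrogram_probs = [0.70, 0.30]  # stands in for the spectrogram-based network

fused = fuse_probabilities(scalogram_probs, spectrogram_probs)
print(predict_label(scalogram_probs), predict_label(spectrogram_probs), predict_label(fused))
```

In this toy case the two networks disagree, and the fused decision follows the more confident one. Whether the APR fusion net behaves this way is not stated in the abstract.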

1 Introduction

The speech signal is a convolution of the vocal filter and the source signal that originates from the lungs. Usually, an indicative breathy phonation is prominently noticed in the voice, which in most cases has been seen together with difficulty in swallowing, shortness of breath, and mild cough. However, the condition can be reversed after seeking counsel and treatment from an expert (Steffen et al., 2011). The pitch of a voice originates from the periodic opening and closing of the vocal folds. The vocal folds and vocal tract shape the sound originating from the lungs, producing the sound that we hear, which may be either normal or distorted. The vocal folds play a key role in the production of the glottal source excitation, and thus any deviation of this signal from its normal form can indicate the presence of voice pathology in the speaker. Gómez-Vilda et al. (2009) and Drugman et al. (2009) have investigated the application of the glottal source excitation signal for detecting pathologies in a voice signal. Popularly used features include the mucosal wave spectrum, glottal formant frequency, spectral balance, bandwidth, and average glottal source dynamics. Advanced deep learning models can be built on such features to give a very high classification rate. Several studies have performed voice pathology classification using conventional machine learning classifiers, but more recently deep learning techniques have evolved.
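
The source-filter view in the first sentence can be made concrete with a toy example. The sketch below assumes an idealized source (a periodic impulse train standing in for vocal fold opening and closing) and a single damped resonance standing in for the vocal filter. The sampling rate, pitch, resonance frequency, and bandwidth are illustrative values chosen here, not values from the paper.

```python
import math


def impulse_train(n_samples, period):
    """Idealized glottal source: one unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator_impulse_response(freq_hz, bandwidth_hz, fs, length):
    """Impulse response of one damped resonance (a toy 'vocal filter')."""
    decay = math.exp(-math.pi * bandwidth_hz / fs)
    omega = 2.0 * math.pi * freq_hz / fs
    return [(decay ** n) * math.cos(omega * n) for n in range(length)]


def convolve(x, h):
    """Full linear convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip silent source samples
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


fs = 8000          # sampling rate in Hz (assumed)
f0 = 100           # pitch in Hz (assumed)
period = fs // f0  # samples between glottal pulses

source = impulse_train(800, period)
h = resonator_impulse_response(700.0, 100.0, fs, 200)
speech = convolve(source, h)
print(len(speech), speech[0], speech[period])
```

Each pulse of the source launches one copy of the filter's impulse response, and overlapping copies add. Glottal inverse filtering works in the opposite direction: it tries to recover the source from the observed signal.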
