Acoustic Modeling of Speech Signal using Artificial Neural Network: A Review of Techniques and Current Trends

Mousmita Sarma, Kandarpa Kumar Sarma

Source Title: Psychology and Mental Health: Concepts, Methodologies, Tools, and Applications

DOI: 10.4018/978-1-5225-0159-6.ch008

Abstract

Acoustic modeling of the sound unit is a crucial component of an Automatic Speech Recognition (ASR) system. It is the process of establishing statistical representations for the feature vector sequences of a particular sound unit, so that a classifier for that sound unit can be designed for use in the ASR system. Current ASR systems use the Hidden Markov Model (HMM) to deal with temporal variability and the Gaussian Mixture Model (GMM) for acoustic modeling. Recently, machine learning paradigms have been explored for application in the speech recognition domain. In this regard, the Multi Layer Perceptron (MLP), the Recurrent Neural Network (RNN), and similar models are used extensively. Artificial Neural Networks (ANNs) are trained by back-propagating the error derivatives and therefore have the potential to learn much better models of nonlinear data. Recently, Deep Neural Networks (DNNs) with many hidden layers have been favored by researchers and accepted as suitable for speech signal modeling. This chapter describes various techniques and works on ANN-based acoustic modeling.

Introduction

Acoustic and temporal modeling are the two major issues associated with an Automatic Speech Recognition (ASR) system. Speech is dynamic by nature. The spectral and temporal variations of speech signals arise from the way speech is produced: the articulators move to the positions required for the target sound unit. Because of the variation in the articulators' motions, the speech signal is a sequence of trajectories or signatures rather than a sequence of clean, identical phonetic units. This makes it difficult to extract exact timing information as well as spectral information about the speech units from the speech signal. Modeling of the speech signal therefore needs to consider both issues.

An ASR system uses acoustic models to extract information from the acoustic signal. In the pattern recognition based approach to speech recognition, the basic recognition units are modeled acoustically based on some lexical description, which is essentially a mapping between acoustic measurements and phonemes. Such mappings are learned from a finite training set of utterances. The resulting speech units are called phone-like units (PLUs); each is an acoustic description of that speech unit as present in the training set (Brown, 1987).

Handling temporal and spectral variability is thus the main challenge of ASR, and the best known speech recognition technology currently relies on the Hidden Markov Model (HMM), which provides a solution to both problems. Acoustic modeling is performed by discrete density models and temporal modeling by state transitions (Xiong, 2009). The HMM treats the speech signal as quasi-static over short durations and models these frames for recognition. It breaks the feature vectors of the signal into a number of states and finds the probability of the signal transiting from one state to another (Rabiner & Juang, 1993). The Viterbi search, forward-backward, and Baum-Welch algorithms are used for parameter estimation and optimization (Rabiner, 1989; Juang & Rabiner, 1991). In speech recognition, however, HMM-based acoustic modeling has serious disadvantages. It suffers from quantization errors and poor parametric modeling. The standard Maximum Likelihood (ML) training criterion leads to poor discrimination between the acoustic models. In addition, the independence assumption makes it hard to exploit multiple input frames, and the first-order assumption makes it hard to model co-articulation and duration (Tebelskis, 1995).
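As an illustration of how state transitions and per-frame scores combine, the following is a minimal sketch of Viterbi decoding for a small HMM. It is not taken from the chapter: the function names `safe_log` and `viterbi`, the two-state left-to-right topology, and all probabilities are assumptions chosen only to keep the example small.

```python
import math

def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf

def viterbi(log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability.

    log_start[i]    : log P(first state = i)
    log_trans[i][j] : log P(next state = j | current state = i)
    log_emit[t][j]  : log P(frame t | state j)
    """
    n_frames, n_states = len(log_emit), len(log_start)
    # score[t][j]: best log-probability of any path ending in state j at frame t
    score = [[-math.inf] * n_states for _ in range(n_frames)]
    # back[t][j]: predecessor state on that best path
    back = [[0] * n_states for _ in range(n_frames)]

    for j in range(n_states):
        score[0][j] = log_start[j] + log_emit[0][j]

    for t in range(1, n_frames):
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[t - 1][i] + log_trans[i][j])
            back[t][j] = best_i
            score[t][j] = (score[t - 1][best_i] + log_trans[best_i][j]
                           + log_emit[t][j])

    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda j: score[-1][j])
    path = [last]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last]

# Hypothetical toy model: two states, left-to-right (state 1 cannot return to 0).
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
# Hypothetical per-frame emission probabilities for four frames.
emit = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]

path, log_prob = viterbi(
    [safe_log(p) for p in start],
    [[safe_log(p) for p in row] for row in trans],
    [[safe_log(p) for p in row] for row in emit],
)
print(path)  # [0, 0, 1, 1]
```

Working in log-probabilities avoids numerical underflow on long utterances, which is why the sketch adds scores rather than multiplying probabilities.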

Later, after the introduction of the Expectation Maximization (EM) algorithm (Rolf, 1974), the GMM came to be used for acoustic modeling. The probability distribution of the feature vectors associated with each HMM state can be modeled by a GMM with higher accuracy. This enabled the successful implementation of the GMM-HMM systems for speech recognition that present-day systems prefer.
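To make the role of the GMM concrete, here is a minimal sketch of how a diagonal-covariance GMM could score one feature vector for one HMM state. The function names, the two-component mixture, and the numeric values are assumptions for illustration and do not come from the chapter.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance, evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) = log sum_k w_k * N(x; mean_k, diag(var_k)).

    The sum is evaluated with the log-sum-exp trick for numerical stability.
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Hypothetical two-component GMM over 2-dimensional feature vectors.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]

near = gmm_log_likelihood([0.1, -0.2], weights, means, variances)
far = gmm_log_likelihood([10.0, 10.0], weights, means, variances)
print(near > far)  # True: the vector close to a component mean scores higher
```

In a GMM-HMM system, a score of this kind for each state and frame would fill the emission term that the HMM combines with its transition probabilities.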

Despite its outstanding accuracy, the GMM has some disadvantages: it requires a huge amount of training data and high processing speed. Its major drawback, however, is that it requires a large number of diagonal Gaussians or a large number of full-covariance Gaussians to model data that lies near a nonlinear surface in the data space (Hinton, Deng, Yu, Dahl, Mohamed, Jaitly, Senior, Vanhoucke, Nguyen, Sainath & Kingsbury, 2012). Using so many coefficients is not statistically efficient, since the underlying structure of the speech signal is much lower dimensional.
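A rough sense of the parameter cost can be had by counting the free parameters of a GMM. The sketch below is an illustration only: the function name `gmm_param_count`, the 39-dimensional feature vector, and the 16 mixture components are assumed values, not figures from the chapter.

```python
def gmm_param_count(n_components, dim, full_covariance):
    """Number of free parameters in a GMM over dim-dimensional vectors."""
    mean_params = dim
    # A symmetric covariance matrix has dim * (dim + 1) / 2 distinct entries;
    # a diagonal one has only dim.
    cov_params = dim * (dim + 1) // 2 if full_covariance else dim
    # Mixture weights sum to one, so only n_components - 1 of them are free.
    return n_components * (mean_params + cov_params) + (n_components - 1)

# Assumed sizes: 16 components, 39-dimensional features.
print(gmm_param_count(16, 39, full_covariance=False))  # 1263
print(gmm_param_count(16, 39, full_covariance=True))   # 13119
```

These counts apply to a single HMM state, so the total grows further with the number of states being modeled.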
