Speech Feature Evaluation for Bangla Automatic Speech Recognition

Mohammed Rokibul Alam Kotwal, Foyzul Hassan, Mohammad Nurul Huda

Source Title: Technical Challenges and Design Issues in Bangla Language Processing

DOI: 10.4018/978-1-4666-3970-6.ch009

Abstract

This chapter presents Bangla (widely known as Bengali) Automatic Speech Recognition (ASR) techniques by evaluating different speech features, such as Mel Frequency Cepstral Coefficients (MFCCs), Local Features (LFs), and phoneme probabilities extracted by time delay artificial neural networks of different architectures. Canonicalization of speech features is also performed for Gender-Independent (GI) ASR. In the canonicalization process, the authors design three classifiers, trained on male, female, and GI speakers, and extract the output probabilities from these classifiers in order to select the maximum. Maximizing the output probabilities for each speech file provides higher correctness and accuracy for GI speech recognition. In addition, dynamic parameters (velocity and acceleration coefficients) are used in the experiments to obtain higher accuracy in phoneme recognition. The experiments also show that dynamic parameters combined with hybrid features increase phoneme recognition performance to a certain extent. These parameters not only increase the accuracy of the ASR system, but also reduce the computational complexity of Hidden Markov Model (HMM)-based classifiers by requiring fewer mixture components.

Chapter Preview

Introduction

Conventional Automatic Speech Recognition (ASR) systems use stochastic pattern matching techniques, in which a word candidate is matched against word templates represented by Hidden Markov Models (HMMs) (Young, 2005). Although these techniques perform fairly well in limited applications, they incur a huge computational cost at the classifier stage, and they always reject new vocabulary, the so-called Out-Of-Vocabulary (OOV) words. A traditional segmentation-based phone decoding technique can address these problems, but its recognition accuracy has so far been far from sufficient.

These ASR systems cannot provide sufficient performance at any time and in every environment. One reason is that the Acoustic Models (AMs) of a Hidden Markov Model (HMM)-based classifier include many hidden factors, such as speaker-specific characteristics that cover gender type and speaking style. It is difficult to recognize speech affected by these factors, especially when an ASR system contains only a single acoustic model. One solution is to employ multiple acoustic models, one for each gender. Although each acoustic model is robust only to some extent, the ASR system as a whole can then handle gender effects appropriately.
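The idea of running several acoustic models and keeping the best-scoring result can be sketched as follows. This is a minimal illustration, assuming each model (male, female, gender-independent) has already produced a phoneme hypothesis with a log-likelihood score; the `Hypotheses` type, the function name, and the example scores are hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Tuple

# Hypothetical container: one decoded hypothesis per acoustic model.
# Each entry maps a model name to (phoneme sequence, log-likelihood score).
Hypotheses = Dict[str, Tuple[List[str], float]]


def select_best_hypothesis(hypotheses: Hypotheses) -> Tuple[str, List[str], float]:
    """Return (model name, phoneme sequence, score) for the highest-scoring model."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    best_model = max(hypotheses, key=lambda name: hypotheses[name][1])
    phonemes, score = hypotheses[best_model]
    return best_model, phonemes, score


# Illustrative scores only; these are not results from the chapter.
example: Hypotheses = {
    "male": (["a", "m", "i"], -412.7),
    "female": (["a", "m", "i"], -398.2),
    "gender_independent": (["a", "n", "i"], -405.9),
}
model, phonemes, score = select_best_hypothesis(example)
```

Selecting per utterance means no separate gender detector is needed: the model that explains the input best is assumed to be the matching one.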

Most of these ASR systems use Mel Frequency Cepstral Coefficients (MFCCs) of 39 dimensions (12-MFCC, 12-ΔMFCC, 12-ΔΔMFCC, P, ΔP, and ΔΔP, where P stands for the raw energy of the input speech signal). A Hamming window of 25 ms is used for feature extraction, and the pre-emphasis factor is 0.97. Although these standard MFCCs are prevalent in current ASR systems, they do not provide better performance because frequency-domain information is not incorporated into the feature vector during the extraction process.
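The two front-end settings named above, the 0.97 pre-emphasis factor and the 25 ms Hamming window, can be illustrated with a short sketch. It covers only pre-emphasis and windowed framing, not the full MFCC computation. The 16 kHz sample rate and the 10 ms frame shift are assumptions, since the text does not state them, and the function names are illustrative.

```python
import math
from typing import List

PRE_EMPHASIS = 0.97   # value stated in the text
WINDOW_MS = 25        # Hamming window length stated in the text
SAMPLE_RATE = 16000   # assumption: not given in the text
FRAME_SHIFT_MS = 10   # assumption: not given in the text


def pre_emphasize(signal: List[float], alpha: float = PRE_EMPHASIS) -> List[float]:
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def hamming(length: int) -> List[float]:
    """Standard Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_signal(
    signal: List[float],
    sample_rate: int = SAMPLE_RATE,
    window_ms: int = WINDOW_MS,
    shift_ms: int = FRAME_SHIFT_MS,
) -> List[List[float]]:
    """Split the signal into overlapping frames and apply the Hamming window to each."""
    win_len = sample_rate * window_ms // 1000
    shift = sample_rate * shift_ms // 1000
    window = hamming(win_len)
    frames = []
    # Only full-length frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(signal) - win_len + 1, shift):
        frames.append([signal[start + n] * window[n] for n in range(win_len)])
    return frames


# One second of a constant test signal, purely for illustration.
test_signal = [1.0] * SAMPLE_RATE
emphasized = pre_emphasize(test_signal)
frames = frame_signal(emphasized)
```

Under these assumptions each frame holds 400 samples and consecutive frames start 160 samples apart.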

Recently, dynamic parameters such as the velocity and acceleration coefficients of speech have proven necessary as embedded features for resolving the coarticulation effect, because they widen the context window. Although coarticulation effects can also be handled by incorporating triphone models (Young, 2005), a large-scale speech corpus is required to cover all the triphones. Moreover, training triphone models adds considerable complexity to HMM-based classifiers. To avoid this complexity and its cost, parameters such as the dynamic parameters are needed to address the problem of left and right context.
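A sketch of how velocity and acceleration coefficients are commonly derived may help here. The chapter preview does not give the formula, so the regression form below, d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²), the span of 2 frames, and the edge handling by frame replication are all assumptions, and the function names are illustrative.

```python
from typing import List

Frame = List[float]


def delta(features: List[Frame], span: int = 2) -> List[Frame]:
    """Regression-based delta over +/- span frames; edge frames are replicated."""
    if span < 1:
        raise ValueError("span must be at least 1")
    num_frames = len(features)
    if num_frames == 0:
        return []
    denom = 2 * sum(n * n for n in range(1, span + 1))

    def frame_at(t: int) -> Frame:
        # Clamp the index so that frames beyond either edge repeat the edge frame.
        return features[min(max(t, 0), num_frames - 1)]

    result = []
    for t in range(num_frames):
        row = []
        for d in range(len(features[t])):
            num = sum(
                n * (frame_at(t + n)[d] - frame_at(t - n)[d])
                for n in range(1, span + 1)
            )
            row.append(num / denom)
        result.append(row)
    return result


def add_dynamic_parameters(static: List[Frame]) -> List[Frame]:
    """Append velocity (delta) and acceleration (delta of delta) to each frame."""
    velocity = delta(static)
    acceleration = delta(velocity)
    return [s + v + a for s, v, a in zip(static, velocity, acceleration)]


# 13 static values per frame (12 MFCCs plus energy P) become 39 dimensions.
static_features = [[float(t)] * 13 for t in range(10)]
full_features = add_dynamic_parameters(static_features)
```

Because each delta looks two frames to either side, the acceleration coefficients draw on four frames on each side of the current one, which is how the dynamic parameters widen the effective context without triphone models.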

Contemporary Bangla automatic speech recognition suffers from several difficulties: (1) the lack of a large-scale speech corpus, (2) the unavailability of labeled speech data, and (3) insufficient research opportunities, even though more than 220 million people speak Bangla as their native language, which ranks it sixth by number of native speakers. These problems must be addressed in order to construct a Bangla ASR system.

The objective of this chapter is to design ASR systems on the basis described above and to incorporate additional speech features into the ASR system to improve performance by eliminating gender effects. The specific objectives are as follows.

1. To construct a phoneme recognizer based on standard MFCC features to solve the OOV problem.
2. To devise a canonicalization method that resolves the gender factor by incorporating both genders (male and female) in the process and then selecting the maximum hypothesis.
3. To extract a new feature, called the local feature, from the input speech in place of standard MFCCs, so that both time-domain and frequency-domain information is incorporated into the ASR system.
4. To extract phoneme probabilities with a time delay neural network that uses MFCCs as the input feature.
5. To embed dynamic parameters such as velocity (∆) and acceleration (∆∆) coefficients as features for resolving coarticulation effects.
6. To design a labeled medium-scale speech corpus for evaluating recognition performance.
7. To extract hybrid features based on phoneme probabilities extracted by a neural network and acoustic features derived from the input speech.
