Creating Sound Glyph Database for Video Subtitling

Chitralekha Ganapati Bhat, Sunil Kumar Kopparapu

Source Title: Multi-Core Computer Vision and Image Processing for Intelligent Applications

DOI: 10.4018/978-1-5225-0889-2.ch005

Abstract

Accessibility of speech information in videos is a major challenge for the hearing impaired, making a visual representation such as text subtitling essential. The unavailability of a good Automatic Speech Recognition (ASR) engine makes automatic generation of text subtitles extremely difficult for resource-deficient languages such as Indian languages. Techniques to build such an ASR engine using audio and corresponding transcriptions in the form of broadcast news or audio books have been proposed; however, these techniques require the transcriptions in editable text format, which is unavailable for resource-deficient languages. This chapter describes a novel technique for building a sound-glyph database for a resource-deficient language. The sound-glyph database can be used effectively to subtitle videos in the same language script. Considering the large volumes of data that need to be processed, we propose a parallel processing method in a multiresolution setup, harnessing the multi-core capacity of present-day computers.

Chapter Preview

Introduction

Science may have found a cure for most evils; but it has found no remedy for the worst of them all - the apathy of human beings. – Helen Keller

Accessibility is one of the key design aspects of any product. Ensuring that people with disabilities are able to use a product indicates a societal growth in which Helen Keller’s worst fears have a chance of being addressed. With increasing attention being paid to making digital content accessible, text subtitling or closed captioning for videos and TV programmes is gaining significance. Several countries have mandated that all broadcast videos be made accessible. The most common way of making videos accessible to the hearing impaired is to provide visual cues corresponding to the audio through subtitles in text format. Manually creating text subtitles for a video is long-drawn and tedious. Alternatively, an Automatic Speech Recognition (ASR) engine can convert the audio into text, which is then used to subtitle the video, either in real time or offline. This mechanism is efficient for resource-rich languages like English. However, for resource-deficient languages, especially Indian languages, this is not possible because no good ASR engine exists for those languages, primarily because a good speech corpus is not available.

A speech corpus is a collection of speech audio files and their corresponding transcriptions. The integrity of a speech corpus is measured by the quality of the audio in terms of noise and by the accuracy of the time alignment between the audio and its corresponding text. Current state-of-the-art ASR technologies use audio and transcriptions in editable text format. There exists a wealth of open access audio and corresponding transcription in the form of news data, audio books, etc. for various Indian languages. However, the transcripts of the news audio for several Indian languages are only available in non-editable form, meaning the transcripts corresponding to the audio cannot be converted into text to build a speech corpus. We propose a technique that uses the audio and the corresponding transcripts in image (non-editable) form to build a sound- and word-glyph database. We derive a correlation between audio clips and images of the script corresponding to these audio clips by exploiting speech and image processing techniques. The central idea is to build a database that represents the audio in terms of images of the script. Considering the large volumes of image data that need to be processed, we use multiresolution techniques on a multi-core processor to speed up the process. The main contribution of this chapter is a sound-glyph database for a resource-deficient language that aids in making video and audio accessible. We use a multiresolution technique to reduce the size of the image and exploit the parallelism inherent in the method of building the sound-glyph database.
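The combination of multiresolution reduction and multi-core processing mentioned above can be illustrated with a minimal sketch. It is not the chapter's implementation: the names `halve`, `pyramid`, `coarsest`, and `process_pages` are hypothetical, 2x2 block averaging is only one possible way to reduce image size, and the per-page task is a stand-in for whatever image analysis the method actually performs. The sketch assumes that transcript pages are independent grayscale images, so they can be handed to separate worker processes.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import List, Optional

Image = List[List[float]]  # grayscale page image as rows of pixel intensities


def halve(image: Image) -> Image:
    """Return the next-coarser level by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def pyramid(image: Image, levels: int) -> List[Image]:
    """Build a multiresolution pyramid: level 0 is the input, each next level is half size."""
    out = [image]
    for _ in range(levels):
        out.append(halve(out[-1]))
    return out


def coarsest(image: Image) -> Image:
    """Hypothetical per-page task: reduce a page to its level-2 (quarter-size) version."""
    return pyramid(image, 2)[-1]


def process_pages(pages: List[Image], executor: Optional[Executor] = None) -> List[Image]:
    """Apply `coarsest` to every page, in parallel when an executor is supplied."""
    if executor is None:
        return [coarsest(p) for p in pages]
    return list(executor.map(coarsest, pages))


if __name__ == "__main__":
    # Four synthetic 8x8 pages; page k is filled with the constant value k.
    pages = [[[float(k)] * 8 for _ in range(8)] for k in range(4)]
    with ProcessPoolExecutor() as pool:  # one worker process per core by default
        small = process_pages(pages, pool)
    print(len(small), len(small[0]), len(small[0][0]))
```

Passing the executor in as an argument keeps the per-page work free of any parallelism concerns, so the same function can run serially for debugging or across cores for large volumes of pages.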

The rest of the chapter is organized as follows. A background of the existing techniques for building a speech corpus for a resource-deficient language, and their limitations, is provided, followed by the methodology used to build the sound-glyph database using multiresolution and multi-core techniques.
