
Creating Sound Glyph Database for Video Subtitling

Chitralekha Ganapati Bhat (TCS Innovation Labs, India) and Sunil Kumar Kopparapu (TCS Innovation Labs, India)

Source Title: Artificial Intelligence: Concepts, Methodologies, Tools, and Applications

DOI: 10.4018/978-1-5225-1759-7.ch104


Abstract

Accessibility of speech information in videos is a major challenge for the hearing impaired, making a visual representation such as text subtitling essential. The unavailability of a good Automatic Speech Recognition (ASR) engine makes automatic generation of text subtitles extremely difficult for resource-deficient languages such as Indian languages. Techniques to build such an ASR engine using audio and corresponding transcriptions in the form of broadcast news or audio books have been proposed; however, these techniques require the transcriptions in editable text format, which is unavailable for resource-deficient languages. This chapter describes a novel technique for building a sound-glyph database for a resource-deficient language. The sound-glyph database can be used effectively to subtitle videos in the same language script. Considering the large volumes of data that need to be processed, we propose a parallel processing method in a multiresolution setup, harnessing the multi-core capacity of present-day computers.

Chapter Preview


Introduction

Science may have found a cure for most evils; but it has found no remedy for the worst of them all - the apathy of human beings. – Helen Keller

Treating accessibility as a key design aspect of any product, so that people with disabilities are able to use it, indicates a societal growth wherein Helen Keller’s worst fears have a chance of being addressed. With increasing attention being dedicated to making digital content accessible, text subtitling or closed captioning for videos and TV programmes is gaining significance. Several countries have mandated that all broadcast videos be made accessible. The most common way of making videos accessible to the hearing impaired is to provide visual cues corresponding to the audio through subtitles in text format. The process of manually creating text subtitles for a video is long-drawn and tedious. Alternatively, an Automatic Speech Recognition (ASR) engine can be employed to convert the audio into text, which is then used to subtitle the video, either in real time or offline. This mechanism is efficient for resource-rich languages like English. However, for resource-deficient languages, especially Indian languages, this is not possible because no good ASR engine exists for those languages. This is primarily due to the non-availability of a good speech corpus.

A speech corpus is a collection of speech audio files and their corresponding transcriptions. The quality of a speech corpus is measured by the noise level of the audio and by the accuracy of the time alignment between the audio and its corresponding text. Current state-of-the-art ASR technologies use audio and transcriptions in editable text format. There exists a wealth of open access audio and corresponding transcriptions in the form of news data, audio books, etc. for various Indian languages. However, the transcripts of the news audio for several Indian languages are only available in non-editable form, meaning that the transcripts corresponding to the audio cannot be converted into text to build a speech corpus. We propose a technique that uses the audio and the corresponding transcripts in image (non-editable) form to build a sound and word-glyph database. We derive a correlation between audio clips and images of the script corresponding to these audio clips by exploiting speech and image processing techniques. The central idea is to build a database which represents the audio in terms of images of the script. Considering the large volumes of image data that need to be processed, we use multiresolution techniques on a multi-core processor to speed up the process. The main contribution of this chapter is a sound-glyph database for a resource-deficient language, built to aid in making video and audio accessible. We use a multiresolution technique to reduce the size of the image and exploit the parallelism inherent in the method of building the sound-glyph database.
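The combination of multiresolution image reduction and multi-core processing can be illustrated with a minimal sketch. It is not the chapter's implementation: the 2x2 block averaging, the representation of a transcript image as nested lists of grayscale values, and the function names `halve`, `pyramid`, and `build_pyramids` are all assumptions made for illustration. It only shows how each transcript image can be reduced independently, which is what makes the work easy to spread across cores.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from functools import partial

# Assumption: a grayscale image is a list of rows of pixel intensities.
Image = list[list[float]]


def halve(image: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block of pixels.

    A trailing odd row or column is dropped.
    """
    rows = len(image) // 2
    cols = len(image[0]) // 2 if image else 0
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def pyramid(image: Image, levels: int) -> list[Image]:
    """Return the original image followed by `levels` successively halved copies."""
    out = [image]
    for _ in range(levels):
        out.append(halve(out[-1]))
    return out


def build_pyramids(images: list[Image], levels: int, executor: Executor) -> list[list[Image]]:
    """Build one pyramid per image, one task per image on the given executor."""
    return list(executor.map(partial(pyramid, levels=levels), images))


if __name__ == "__main__":
    # Hypothetical 8x8 checkerboard standing in for a scanned transcript page.
    page = [[float((r + c) % 2) for c in range(8)] for r in range(8)]
    with ProcessPoolExecutor() as pool:  # defaults to one worker per CPU core
        result = build_pyramids([page, page], levels=2, executor=pool)
    print([len(level) for level in result[0]])  # row count at each resolution
```

Passing the executor as a parameter keeps the sketch independent of the parallel backend: a process pool uses multiple cores for this CPU-bound work, while any other `Executor` can be substituted for experimentation.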

The rest of the chapter is organized as follows: a background of the existing techniques for building a speech corpus for a resource-deficient language, together with their limitations, is provided, followed by the methodology used in building the sound-glyph database using multiresolution and multi-core techniques.
