Visual Speech and Gesture Coding Using the MPEG-4 Face and Body Animation Standard

Eric Petajan

doi:10.4018/978-1-60566-186-5.ch004

Source Title: Visual Speech Recognition: Lip Segmentation and Mapping

Abstract

Speech, captured through Automatic Speech Recognition (ASR), is the most natural input modality from humans to machines. Speech input is especially in demand when the hands are busy or a full keyboard is not available. Since the most compelling application scenarios for ASR include noisy environments (mobile phones, public kiosks, cars), visual speech processing must be incorporated to provide robust performance. This chapter motivates and describes the MPEG-4 Face and Body Animation (FBA) standard for representing visual speech data as part of a whole virtual human specification. The super low bit-rate FBA codec included with the standard enables thin clients to access processing and communication services over any network, including enhanced visual communication, animated entertainment, man-machine dialog, and audio/visual speech recognition.

Chapter Preview

Introduction

In recent years, the number of people accessing the internet or using digital devices has exploded. In parallel, the mobile revolution is allowing consumers to access the internet on relatively powerful handheld devices. While the transmission and display of information are handled efficiently by maturing fixed and wireless data networks and terminal devices, the input of information from the user to the target system is often impeded by the lack of a keyboard, poor typing skills, or busy hands and eyes. The last barrier to efficient man-machine communication is the lack of accurate speech recognition in real-world environments. Given the importance of mobile communication and computing, and the ubiquitous internetworking of all terminal devices, the optimal system architecture calls for compute-intensive processes to be performed across the network. Support for thin mobile clients with limited memory, clock speed, battery life, and connection speed requires that visual speech and gesture information captured from the user be transformed into a representation that is both compact and computable on the terminal device.

The flow of audio/video data across a network is subject to a variety of bottlenecks that require lossy compression, which introduces artifacts and distortion that degrade the accuracy of scene analysis. Video with sufficient quality for facial capture must be either stored locally or analyzed in real time. Real-time video processing should be implemented close to the camera to avoid transmission costs and delays, and to more easily protect the user’s visual privacy. The recognition of the human face and body in a video stream results in a set of descriptors that ideally occur at the video frame rate. The human behavior descriptors should contain all information needed for the Human-Computer Interaction (HCI) system to understand the user’s presence, pose, facial expression, gestures, and visual speech. This data is highly compressible and, when standardized, can be used in a communication system. The MPEG-4 Face and Body Animation (FBA) standard^1,2 provides a complete set of Face and Body Animation Parameters (FAPs and BAPs) and a codec for super low bit-rate communication. This chapter describes the key features of the MPEG-4 FBA specification, its application to visual speech and gesture recognition, and architectural implications.
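As an illustration only, the per-frame descriptors described above might be held in a record like the one in the following Python sketch. The class name, the field names, and the helper function are hypothetical and are not taken from the MPEG-4 FBA specification; parameter values are keyed by an integer index purely for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FrameDescriptor:
    """Hypothetical per-frame record; NOT the normative FBA bitstream syntax."""
    frame_index: int                 # position of the frame in the video
    timestamp_s: float               # capture time in seconds
    person_present: bool = False     # whether a face or body was detected
    # Face and body parameter values, keyed by an illustrative integer index.
    face_params: Dict[int, int] = field(default_factory=dict)
    body_params: Dict[int, int] = field(default_factory=dict)


def empty_descriptors(num_frames: int, fps: float) -> List[FrameDescriptor]:
    """Create one empty descriptor per video frame at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [FrameDescriptor(i, i / fps) for i in range(num_frames)]
```

A capture client could fill one such record per frame and hand the sequence to an encoder; the point of the sketch is only that descriptors are produced at the video frame rate and carry far less data than the frames themselves.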

Visual control of a computer by a human is best implemented by processing video into features and descriptors that are accurate and compact. These descriptors should be only as abstract as network, storage capacity, and processing limitations require. The MPEG-4 FBA standard provides a level of description of human facial movements and skeleton joint angles that is both highly detailed and compressible to 2 kilobits per second for the face and 5-10 kilobits per second for the body. The MPEG-4 FBA stream can be transmitted over any network and can be used for visual speech recognition, identity verification, emotion recognition, gesture recognition, and visual communication with the option of an alternate appearance. Converting video into an MPEG-4 FBA stream is computationally intensive and may require dedicated hardware and HD video to accomplish fully. Recognition tasks on the FBA stream can be performed anywhere on the network without the risk to the user’s visual privacy that arises when video is transmitted. When coupled with voice recognition, FBA recognition should provide the robustness needed for effective HCI. As shown in Figure 1, the very low bit-rate FBA stream enables the separation of the HCI from higher-level recognition systems, applications, and databases that tend to consume more processing and storage than is available in a personal device. This client-server architecture supports all application domains including human-human communication, human-machine interaction, and local HCI (non-networked). While the Humanoid Player Client exists today on mobile phones, a mobile Face and Gesture Capture Client is still a few years away.

Figure 1. FBA-enabled client-server architecture
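To give a feel for the bit rates quoted above, the following Python sketch converts them into an average bit budget per video frame. The rates (2 kilobits per second for the face, 5-10 kilobits per second for the body) come from the text; the frame rate of 25 frames per second and the convention that 1 kilobit equals 1000 bits are assumptions made for this example.

```python
# Bit rates quoted in the text, in kilobits per second.
FACE_KBPS = 2.0
BODY_KBPS_RANGE = (5.0, 10.0)


def bits_per_frame(kbps: float, fps: float) -> float:
    """Average bits available per frame, assuming 1 kilobit = 1000 bits."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return kbps * 1000.0 / fps


def frame_budget(fps: float) -> dict:
    """Average per-frame bit budget for the face and body streams."""
    low, high = BODY_KBPS_RANGE
    return {
        "face": bits_per_frame(FACE_KBPS, fps),
        "body_low": bits_per_frame(low, fps),
        "body_high": bits_per_frame(high, fps),
    }


if __name__ == "__main__":
    # 25 frames per second is an assumed rate, not a figure from the chapter.
    print(frame_budget(25.0))
```

Under these assumptions the face stream averages 80 bits per frame and the body stream 200 to 400 bits per frame, which is why the stream suits thin clients and slow connections.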
