Spatial Audio Coding and Machine Learning

Karim Dabbabi
Copyright: © 2023 | Pages: 20
DOI: 10.4018/978-1-7998-9220-5.ch149

Abstract

Spatial audio coding plays a fundamental role in ultra-high-definition TV (UHDTV), the latest generation of television broadcasting, and other technological devices by delivering three-dimensional (3D) audio content to consumers. This article presents the fundamental concepts of spatial audio coding, including its techniques, standards, and applications. The object-based audio reproduction system is presented and compared with the traditional channel-based system, both to explain how it works and to show how it gives users more flexibility in choosing their preferred audio composition. The MPEG standard for encoding multi-channel audio signals is then described. Next, machine learning (ML) methods and their applications in acoustics and spatial audio scenes are reviewed. Finally, further research directions are discussed.

Background

Many new technologies have recently reached the market, among them three-dimensional (3D) audio, also called spatial audio (Rumsey, F., 2001). Spatial audio has many application areas, including digital entertainment media such as ultra-high-definition television (UHDTV) and next-generation television broadcasting. For the UHDTV standard, configurations of up to 20 loudspeakers have been explored to give users a realistic 3D audio perception. Because each loudspeaker is fed by its own audio channel, such systems require multi-channel audio signals. Broadcasters that have consistently adopted these technologies across the recording, production, and transmission chain include the BBC (UK) and NHK (Japan).

In spatial audio, perceptual audio coding plays a major role (Pan, D., 1995; Painter, T., & Spanias, A., 2000; Bosi, M., & Goldberg, R. E., 2002; Brandenburg, Faller, Herre, Johnston, & Kleijn, 2014).

Key Terms in this Chapter

Spatial Audio Object Coding: An MPEG spatialization technology that makes it possible to render a mix on a variable loudspeaker configuration from an object-oriented audio stream.

MPEG Surround: MPEG Surround represents the original multi-channel signal as two components: a stereo (or mono) downmix signal on the one hand, and a stream of spatialization data on the other.

MPEG-H 3D Audio Coding: An audio coding standard developed by the ISO/IEC Moving Picture Experts Group (MPEG) that supports coding audio as channels, objects, or Higher-Order Ambisonics (HOA).

Machine-Listening Systems: Systems that record, decode, and interpret sounds (voice, music, noise, etc.).

Machine Learning: A field of computer science that uses statistical methods to give computers the ability to learn from data without being explicitly programmed.

Binaural Recordings: Binaural recording is a method of recording sound with two microphones, designed to create a 3D stereo sound sensation that gives the listener the impression of actually being in the room with the performers or instruments.

Spatial Audio Coding: Spatial audio coding represents two or more audio channels by a downmix, together with parameters that describe the spatial attributes of the original audio signals that are lost in the downmix.
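The downmix-plus-parameters idea behind spatial audio coding can be made concrete with a deliberately simplified Python sketch. It assumes a stereo input and a single broadband parameter, a channel level difference (CLD) in dB. Real systems such as MPEG Surround estimate their spatial parameters (for example CLD and inter-channel coherence) per time/frequency tile in a filterbank domain, which this sketch does not attempt. The function names `encode_downmix_cld` and `decode_upmix_cld` are hypothetical.

```python
import math

def encode_downmix_cld(left, right, eps=1e-12):
    """Hypothetical encoder: mono downmix plus one broadband channel level difference (dB)."""
    # Downmix: average the two channels sample by sample.
    downmix = [(l + r) / 2.0 for l, r in zip(left, right)]
    # Spatial parameter: energy ratio between the channels, expressed in dB.
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    cld_db = 10.0 * math.log10((e_left + eps) / (e_right + eps))
    return downmix, cld_db

def decode_upmix_cld(downmix, cld_db):
    """Hypothetical decoder: rebuild two channels whose energy ratio matches the CLD."""
    ratio = 10.0 ** (cld_db / 10.0)  # E_left / E_right
    # Gains satisfy g_left**2 / g_right**2 == ratio and g_left**2 + g_right**2 == 2,
    # so a 0 dB CLD returns the downmix unchanged on both channels.
    g_left = math.sqrt(2.0 * ratio / (1.0 + ratio))
    g_right = math.sqrt(2.0 / (1.0 + ratio))
    return [g_left * m for m in downmix], [g_right * m for m in downmix]

if __name__ == "__main__":
    fs = 48000
    source = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(480)]
    left = [0.8 * s for s in source]  # source panned towards the left
    right = [0.6 * s for s in source]
    downmix, cld = encode_downmix_cld(left, right)
    left_hat, right_hat = decode_upmix_cld(downmix, cld)
    print(f"CLD = {cld:.2f} dB")
```

Because only the level ratio is transmitted, the decoder restores the energy balance between the two channels but not the exact original waveforms: the panned source above comes back with the correct left/right ratio but slightly different absolute levels. This is the kind of approximation parametric coding accepts in exchange for a much lower bitrate than coding each channel separately.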
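Rendering an object-oriented audio stream to a variable loudspeaker configuration, as described for Spatial Audio Object Coding, can be illustrated with a small Python sketch. SAOC itself transmits a downmix together with parametric object information, and the sketch does not model that. It only shows, under the simplifying assumption of a horizontal loudspeaker ring, how an object's azimuth metadata can be turned into loudspeaker gains using two-dimensional vector base amplitude panning (VBAP), a common object-rendering technique swapped in here for illustration. `vbap_2d_gains` is a hypothetical helper name.

```python
import math

def vbap_2d_gains(object_az_deg, speaker_az_deg):
    """Hypothetical renderer: 2D VBAP gains for one audio object on a horizontal ring.

    Azimuths are in degrees, measured counter-clockwise from the front.
    Returns one gain per loudspeaker, normalized to unit total power.
    """
    n = len(speaker_az_deg)
    # Visit loudspeakers in angular order so each (i, j) pair is adjacent on the ring.
    order = sorted(range(n), key=lambda k: speaker_az_deg[k] % 360.0)
    theta = math.radians(object_az_deg)
    px, py = math.cos(theta), math.sin(theta)
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]
        a1, a2 = math.radians(speaker_az_deg[i]), math.radians(speaker_az_deg[j])
        l1x, l1y = math.cos(a1), math.sin(a1)
        l2x, l2y = math.cos(a2), math.sin(a2)
        det = l1x * l2y - l2x * l1y
        if abs(det) < 1e-9:
            continue  # collinear pair (or a single loudspeaker): no unique solution
        # Solve p = g1 * l1 + g2 * l2 for the two gains with Cramer's rule.
        g1 = (px * l2y - l2x * py) / det
        g2 = (l1x * py - px * l1y) / det
        if g1 >= -1e-9 and g2 >= -1e-9:
            # Non-negative gains: the object direction lies between this pair.
            g1, g2 = max(g1, 0.0), max(g2, 0.0)
            norm = math.hypot(g1, g2)
            gains = [0.0] * n
            gains[i], gains[j] = g1 / norm, g2 / norm
            return gains
    raise ValueError("no loudspeaker pair encloses the object direction")

if __name__ == "__main__":
    stereo = [30.0, -30.0]
    five_channel = [0.0, 30.0, -30.0, 110.0, -110.0]
    # The same object metadata (azimuth 20 degrees) rendered to two different layouts.
    for layout in (stereo, five_channel):
        print(layout, [round(g, 3) for g in vbap_2d_gains(20.0, layout)])
```

Moving the object to a different layout changes only the loudspeaker list, not the object's metadata. In a channel-based mix, by contrast, each channel is tied to a fixed loudspeaker position.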
