A High Quality Audio Coder Using Proposed Psychoacoustic Model

DOI: 10.4018/978-1-61520-925-5.ch009

9.1. Structure Of Proposed Perceptual Audio Coder

The structure of the proposed high-quality perceptual audio encoder is shown in Figure 1 (He et al., 2008b). Input PCM audio samples are fed into the encoder. The time-to-frequency mapping uses the discrete wavelet packet transform (DWPT) to create a sub-sampled representation of the audio samples. The psychoacoustic model calculates the masking thresholds, which are later used to control quantization and coding. A bit allocation strategy assigns bits to each sub-band sample according to its perceptual importance. Typically, more bits are reserved for low-frequency samples, which are perceptually more important. Quantization is performed so that the quantization noise stays below the audible threshold, as required for transparent audio coding. The bit allocation information is transmitted together with the encoded audio as ancillary data (side information), which the audio decoder uses to reconstruct the PCM audio samples. Lossless coding, usually Huffman coding, is employed to further remove redundancy from the quantized values. The frame packing block packs the output of the quantizer and coding block together with the side information and yields the encoded audio stream.
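As a rough illustration of keeping quantization noise below a masking threshold, the sketch below derives a uniform quantizer step size from an allowed noise power. It relies on the standard approximation that a uniform quantizer with step size Δ has a noise power of about Δ²/12. The function names, the sample values, and the use of a plain uniform quantizer are assumptions made for illustration; this is not the quantizer described in the chapter.

```python
import math

def step_for_noise_power(allowed_noise_power):
    """Largest uniform step whose approximate noise power (step**2 / 12)
    does not exceed the allowed noise power."""
    return math.sqrt(12.0 * allowed_noise_power)

def quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Reconstruct values from quantizer indices."""
    return [i * step for i in indices]

coeffs = [0.91, -0.42, 0.07, 0.013]  # made-up sub-band samples
threshold = 1e-4                      # made-up allowed noise power
step = step_for_noise_power(threshold)
decoded = dequantize(quantize(coeffs, step), step)
max_error = max(abs(c - d) for c, d in zip(coeffs, decoded))
```

The error of any single sample is at most step / 2, so under this approximation the threshold is met on average over many samples rather than for every sample individually.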

Figure 1. Structure of perceptual audio encoder

Figure 2 shows the decoder of the proposed audio coding scheme. The encoded audio stream is fed into the frame unpacking block, which separates the compressed audio stream into the quantized samples and the side information. In the de-quantization and decoding block, Huffman decoding is performed first, followed by de-quantization using the side information extracted by the frame unpacking block. The output is the audio samples in the wavelet domain, which the inverse time/frequency mapping block then transforms back to the time domain to form the decoded PCM audio samples.

Figure 2. Structure of perceptual audio decoder

The time/frequency mapping block and the psychoacoustic model block are described in Chapter 7 as part of the proposed psychoacoustic model, so only the quantizer and coding block is explained in the following section.

9.2. Quantization And Huffman Coding

The quantization and Huffman coding employed in the proposed audio codec are similar to those of the MPEG-1 Layer III standard. The inputs to the quantization and Huffman coding block are the spectral values (wavelet coefficients) of the frame, the maximum number of bits available for Huffman coding, the critical band partition table, and the allowed distortion in each critical band (also called a scalefactor band in audio coding).

The maximum number of bits available for Huffman coding for one frame (called a granule in audio coding) is defined as

(9.1) where bit_rate is the actual bit rate, granul_size is the number of spectral values in one granule (1024 in our case), and the sampling frequency is 44.1 kHz for CD-quality audio.
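To give a feel for the size of this bit budget, the sketch below assumes that the budget equals the bit rate multiplied by the duration of one granule, that is, bit_rate × granul_size / sampling frequency, by analogy with the mean bits per granule in MPEG-1 Layer III. This expression is an assumption built from the quantities named above and is not reproduced from the chapter; the function name and the rounding down to a whole number of bits are likewise hypothetical.

```python
def max_bits_per_granule(bit_rate, granule_size=1024, sampling_frequency=44100):
    """Assumed bit budget: bits per second times the duration of one granule.

    The expression is an illustrative assumption, not the chapter's Equation 9.1.
    """
    # Integer arithmetic avoids floating-point rounding; the result is rounded down.
    return (bit_rate * granule_size) // sampling_frequency

budget_64k = max_bits_per_granule(64_000)
budget_128k = max_bits_per_granule(128_000)
```

Under this assumption, doubling the bit rate doubles the budget per granule (up to rounding), while the granule size and sampling frequency stay fixed for a given configuration.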

The allowed distortion in each scalefactor band is calculated as

(9.2) where sb is the scalefactor band index, thrn(sb) is the masking threshold estimated by the proposed psychoacoustic model, and bw(sb) is the bandwidth of scalefactor band sb, which can be read from Table 1.
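In an MPEG-1 Layer III style encoder, the allowed distortion is compared band by band with the noise that quantization actually introduces. The following is a minimal sketch of such a check, assuming the noise in a band is measured as the mean squared error over its coefficients. The function names and data are hypothetical, and the allowed distortion values are taken as inputs rather than computed from Equation 9.2.

```python
def band_noise(original, reconstructed, start, end):
    """Mean squared error over 1-based inclusive coefficient indices start..end."""
    width = end - start + 1
    err = sum((original[i] - reconstructed[i]) ** 2 for i in range(start - 1, end))
    return err / width

def bands_over_limit(original, reconstructed, bands, allowed):
    """Return the scalefactor band numbers whose noise exceeds the allowed distortion."""
    return [sb for sb, (start, end) in enumerate(bands, start=1)
            if band_noise(original, reconstructed, start, end) > allowed[sb - 1]]

bands = [(1, 4), (5, 8)]  # two 4-coefficient bands, 1-based inclusive
original = [1.0, 2.0, 3.0, 4.0, 0.5, 0.5, 0.5, 0.5]       # made-up coefficients
reconstructed = [1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 0.0, 1.0]  # made-up decoder output
allowed = [0.01, 0.1]     # made-up allowed distortion per band
violations = bands_over_limit(original, reconstructed, bands, allowed)
```

An encoder could use a list like violations to decide which bands need finer quantization before the granule is accepted.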

Table 1. Scalefactor band partition (sampling frequency 44.1 kHz)

Scalefactor band (sb)   Bandwidth (bw)   Index of start   Index of end
1                       4                1                4
2                       4                5                8
3                       8                9                16
4                       4                17               20
5                       4                21               24
6                       8                25               32
7                       4                33               36
8                       4                37               40
9                       8                41               48
10                      16               49               64
11                      8                65               72
12                      8                73               80
13                      16               81               96
14                      16               97               112
15                      16               113              128
16                      16               129              144
17                      16               145              160
18                      32               161              192
19                      64               193              256
20                      32               257              288
21                      32               289              320
22                      64               321              384
23                      128              385              512
24                      256              513              768
25                      256              769              1024
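The start and end indices in Table 1 follow directly from the bandwidths, since the bands are contiguous and cover all 1024 spectral values. A small sketch that rebuilds the partition from the bandwidth column and looks up the band of a given coefficient; the function and variable names are hypothetical.

```python
from itertools import accumulate

# Bandwidths bw(sb) for sb = 1..25, copied from Table 1.
BANDWIDTHS = [4, 4, 8, 4, 4, 8, 4, 4, 8, 16, 8, 8, 16, 16, 16, 16, 16,
              32, 64, 32, 32, 64, 128, 256, 256]

def band_edges(bandwidths):
    """Return 1-based inclusive (start, end) indices for each band."""
    ends = list(accumulate(bandwidths))
    return [(end - bw + 1, end) for bw, end in zip(bandwidths, ends)]

def band_of(index, edges):
    """Return the 1-based scalefactor band number containing a coefficient index."""
    for sb, (start, end) in enumerate(edges, start=1):
        if start <= index <= end:
            return sb
    raise ValueError(f"index {index} is outside the partition")

EDGES = band_edges(BANDWIDTHS)
```

For example, band_of(200, EDGES) returns 19, matching the row of Table 1 that spans indices 193 to 256.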
