Fast Mode Decision in H.264/AVC

Peter Lambert (Ghent University, Belgium), Stefaan Mys (Ghent University, Belgium), Jozef Škorupa (Ghent University, Belgium), Jürgen Slowack (Ghent University, Belgium), Rik Van de Walle (Ghent University, Belgium), Ming Yuan Yang (University of the West of Scotland, UK), Christos Grecos (University of the West of Scotland, UK) and Vassilios Argiriou (University of East London, UK)
DOI: 10.4018/978-1-61520-761-9.ch021

Abstract

The latest video coding standard, H.264/AVC (Wiegand, 2003), uses variable block sizes ranging from 16x16 to 4x4 for motion estimation in inter-frame coding and a rich set of prediction patterns for intra-frame coding. A robust RDO (Rate Distortion Optimization) technique is then employed to select the best coding mode and reference frame for each macroblock. As a result, H.264/AVC achieves high coding efficiency compared to older video coding standards [2, 3] and shows significant promise in the fields of video broadcasting and communication. However, this high coding efficiency comes at the cost of high computational complexity. Fast mode decision is one of the key techniques for significantly reducing computational complexity while maintaining similar RD (Rate Distortion) performance. This chapter provides an up-to-date critical survey of fast mode decision techniques for the H.264/AVC standard. Its motivation is twofold: first, to provide an up-to-date review of the existing techniques, and second, to offer some insights into the study of fast mode decision techniques.

Introduction

The H.264/AVC video coding standard is the newest video coding standard developed by the JVT (Joint Video Team). It adopts a number of new design features that significantly improve rate distortion performance compared to other standards. These features include variable block sizes and quarter-sample-accurate motion compensation, with motion vectors allowed to point outside picture boundaries; multiple reference frame selection; decoupling of referencing order from display order, which adds flexibility and removes the extra delay associated with bi-predictive coding; the use of bi-predictive pictures as references for better motion compensation; weighted offsetting of prediction signals for coding efficiency in scenes that include fades; and improved “skipped” and “direct” mode inference for better RD performance in video sequences where neighboring macroblocks (of the same scene object) move in a common direction. H.264/AVC further provides directional edge extrapolation in intra coded areas, which improves the quality of the prediction signal and allows prediction from neighboring areas that are inter coded; an in-loop de-blocking filter, which removes compression artifacts and provides better quality reconstructed signals for subsequent motion compensation; and hierarchical block size transforms, which enable signals with sufficient correlation to use longer basis functions than 4x4 transforms. There are also provisions for embedded processors, such as exact match inverse transforms for “drift free” decoded representations. Finally, the standard provides advanced entropy coding techniques such as CAVLC (Context Adaptive Variable Length Coding) and CABAC (Context Adaptive Binary Arithmetic Coding); arithmetic coding, in other forms, is also present in the H.263 and JPEG2000 standards.

However, the improvements in RD performance come with significant increases in complexity. The new features increase the complexity not only of H.264/AVC encoders but also of the corresponding decoders. On the encoder side, the main contributors are variable block size motion estimation and compensation, the Hadamard transform, RDO mode decision, displacement vector resolution and multiple reference frames. Ostermann (2004) presents an analysis of the complexity increase in the H.264/AVC video coding standard and compares it with previous standards. This computational complexity makes it very difficult to use the standard as it is in real-time applications. Reducing the complexity without degrading RD performance thus becomes a critical problem.
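To make the cost of RDO mode decision concrete, the following is a minimal sketch of Lagrangian mode selection, in which each candidate mode is scored as J = D + λR and the cheapest one is kept. It is an illustration under stated assumptions, not the reference encoder's implementation: the `ModeCandidate` type, the function names and the distortion and rate figures are hypothetical, and the multiplier 0.85 · 2^((QP − 12) / 3) is a widely cited heuristic for H.264/AVC mode decision that is assumed here rather than taken from this chapter.

```python
from dataclasses import dataclass


@dataclass
class ModeCandidate:
    """One coding mode evaluated for a macroblock (hypothetical container)."""
    name: str
    distortion: float  # e.g. sum of squared differences after reconstruction
    rate_bits: float   # bits needed to code the macroblock in this mode


def lagrange_multiplier(qp: int) -> float:
    # Assumed heuristic: lambda_mode = 0.85 * 2^((QP - 12) / 3).
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def rd_cost(candidate: ModeCandidate, lam: float) -> float:
    # Lagrangian rate-distortion cost J = D + lambda * R.
    return candidate.distortion + lam * candidate.rate_bits


def select_mode(candidates: list[ModeCandidate], qp: int) -> ModeCandidate:
    # Exhaustive decision: every candidate is fully evaluated, which is
    # exactly the work that fast mode decision techniques try to avoid.
    lam = lagrange_multiplier(qp)
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Illustrative numbers only; they are not measurements.
candidates = [
    ModeCandidate("SKIP", distortion=900.0, rate_bits=1.0),
    ModeCandidate("INTER_16x16", distortion=400.0, rate_bits=30.0),
    ModeCandidate("INTER_8x8", distortion=250.0, rate_bits=90.0),
]
best_low_qp = select_mode(candidates, qp=12)
best_high_qp = select_mode(candidates, qp=27)
```

With these made-up figures, a low QP favors the small-partition mode because rate is cheap, while a high QP favors the skip mode because each bit is weighted heavily. In a real encoder, obtaining D and R for each candidate requires motion estimation, transform, quantization and entropy coding, which is why the exhaustive loop is so expensive.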

To understand the complexity of H.264/AVC more clearly, a complexity analysis experiment was performed. The Intel® VTune™ Performance Analyzer 7.0 was used to evaluate the software performance and obtain the complexity profile of an H.264/AVC encoder. In this experiment, the Foreman sequence (100 frames, QCIF (Quarter Common Intermediate Format), Baseline profile) was encoded on an Intel Pentium-4 3.09 GHz PC with 768 MB of memory running the Microsoft Windows XP operating system. Figure 1 shows the proportion of complexity attributable to the different encoding modules in the H.264 JM8.1 [21] reference encoder.

According to Figure 1, the most time-consuming modules of the H.264/AVC encoder are Motion Estimation, Interpolation, SATD (Sum of Absolute Transformed Differences) and DCT (Discrete Cosine Transform), all of which are related to RDO-based motion estimation and mode decision. Because mode decision involves all four of these modules, a good fast mode decision algorithm for H.264/AVC is a promising way to reduce the complexity of video encoders.

Figure 1. Complexity proportion of different encoding modules in the H.264/AVC encoder, measured with Intel® VTune™
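As an illustration of one of the modules named above, the following is a minimal sketch of SATD for a single 4x4 block: the difference between the original and predicted samples is passed through a 4x4 Hadamard transform and the absolute values of the coefficients are summed. The function names are hypothetical, and the final division by 2 is an assumed normalization; encoders differ in how they scale this measure.

```python
# 4x4 Hadamard matrix; it is symmetric, so it equals its own transpose.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def satd4x4(orig, pred):
    """Sum of absolute Hadamard-transformed differences of two 4x4 blocks."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    # 2-D transform: H4 * diff * H4^T, with H4^T == H4.
    coeffs = matmul4(matmul4(H4, diff), H4)
    # Assumed normalization: halve the sum of absolute coefficients.
    return sum(abs(v) for row in coeffs for v in row) // 2


flat_orig = [[10] * 4 for _ in range(4)]
flat_pred = [[9] * 4 for _ in range(4)]
satd_flat = satd4x4(flat_orig, flat_pred)
```

A constant difference concentrates all of its energy in the single DC coefficient, so SATD reflects the cost of coding the transformed residual more closely than a plain sum of absolute differences would. Because this measure is evaluated for many candidate block sizes and motion vectors per macroblock, it accounts for a noticeable share of encoding time.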

The relation between complexity reduction and seamless video communication is best illustrated by Figure 2 (Hsu, 1997):

Figure 2. Time delay diagram of a video communication system
