Low-Complexity Encoding in Block-Based Hybrid Video Codecs by Moving Motion Estimation to Decoder Side

J. Karlsson (Embedded Systems Lab, Department of Applied Physics and Electronics, Umeå University, Umeå, Sweden)
DOI: 10.4018/ijaras.2014010102


In this paper the authors present an approach that provides efficient low-complexity encoding for block-based video coding schemes. The method removes the most time-consuming task, motion estimation, from the encoder. Instead, the decoder performs motion prediction based on the available decoded frames and sends the predicted motion vectors to the encoder. The results presented are based on a modified H.264 implementation and show that this approach can provide good coding efficiency even for relatively high network delays.

The most commonly used video codec standards today are based on the block-based hybrid video coding technique; see Figure 1 for an overview of this scheme. In block-based hybrid coding the image is first divided into a number of non-overlapping blocks. Usually two different block sizes are used: a typical block size of 8x8 pixels and a macroblock of 16x16 pixels. The image contains one luminance component and two color components. The color components are usually subsampled 2:1 in each dimension, so one 8x8 block of a color component corresponds to 16x16 pixels in the original image.
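The partitioning and subsampling described above can be sketched as follows. This is an illustrative example, not code from the paper; the frame dimensions and helper names are assumptions.

```python
# Sketch: partitioning a frame into 16x16 macroblocks, and 2:1 chroma
# subsampling in each dimension, where an 8x8 chroma block covers a
# 16x16 luma area. Illustrative only; names and dimensions are assumed.

def macroblock_grid(width, height, mb_size=16):
    """Return the number of macroblocks per row and per column."""
    return width // mb_size, height // mb_size

def subsample_2to1(chroma_plane):
    """2:1 subsampling in each dimension: keep every second sample.
    chroma_plane is a list of rows (lists) of samples."""
    return [row[::2] for row in chroma_plane[::2]]

# A 16x16 chroma area becomes a single 8x8 block after subsampling.
full = [[(y * 16 + x) for x in range(16)] for y in range(16)]
sub = subsample_2to1(full)
print(macroblock_grid(352, 288))  # CIF frame: (22, 18) macroblocks
print(len(sub), len(sub[0]))      # 8 8
```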

Figure 1.

Block diagram for the block based hybrid video codec

Each macroblock is first motion estimated (ME) against the previously decoded frame, called the reference frame. The motion vector specifies the displacement between the current block and the best matching block in the reference frame. For motion estimation a block matching scheme is commonly used, where the difference between the input and reference frames must be computed for each possible displacement vector. This is a very time-consuming task, and it is the main reason why the encoder has a much higher complexity than the decoder in the block-based hybrid video codec. In many implementations the encoder has a complexity 5-10 times higher than the decoder (Aaron et al., 2004). The motion vectors are used to create a motion-compensated (MC) block from the reference frame. The difference between the motion-compensated block and the original block is then transformed with the discrete cosine transform (DCT). This transformation to the frequency domain is used to exploit the correlation between error pixels. The quantized DCT coefficients and the motion vectors are then coded and transmitted.

A frame that is coded using the previous frame as reference is called an inter frame. A frame can also be coded without using a previous frame as reference; these frames are called intra frames. The first frame in a video sequence has no previous frame and must be coded as an intra frame. In many video codecs an intra frame is also inserted periodically.
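The periodic insertion of intra frames can be sketched as a simple scheduling rule. This is an illustrative sketch; the intra period used below is an assumed value, not one from the paper.

```python
# Sketch: with an assumed intra period N, frame 0 and every N-th frame
# thereafter is coded as an I-frame, all others as P-frames.

def frame_type(index, intra_period=15):
    """Return 'I' or 'P' for the frame at the given index."""
    return "I" if index % intra_period == 0 else "P"

# With an intra period of 5, the first twelve frames are scheduled as:
print("".join(frame_type(i, 5) for i in range(12)))  # IPPPPIPPPPIP
```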

The encoder and the decoder must use the same reference picture; the encoder must therefore include a decoder to create this reference picture. In the decoder part the DCT coefficients are first inverse quantized and then inverse DCT transformed. Each error block is then added to the corresponding motion-compensated block to create the decoded picture, which becomes the reference picture for the next frame. Since the decoder is passively controlled by the encoder, the two will lose synchronization if data is lost in the network. Different error control methods are used to solve this problem (Wang & Zhu, 1998). One of the simplest and most commonly used methods is to periodically insert frames that are not predictively coded, that is, I-frames. Since these frames have a much lower compression efficiency than the predictively encoded frames (P-frames), the overall coding efficiency drops when they are inserted.
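The reconstruction path shared by encoder and decoder can be sketched as follows. This is an illustrative sketch using a textbook orthonormal inverse DCT and a uniform quantization step, not H.264's integer transform or its quantization tables.

```python
import math

# Sketch of the decoder-side reconstruction described above: dequantize
# the received coefficients, apply the inverse 8x8 DCT, and add the
# result to the motion-compensated prediction block. Illustrative only;
# the quantization step and transform are assumed, not H.264's.

N = 8

def idct_2d(coeffs):
    """Naive 2-D inverse DCT (orthonormal) of an NxN coefficient block."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for v in range(N):
                for u in range(N):
                    s += (c(u) * c(v) * coeffs[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[y][x] = s
    return out

def reconstruct(quantized, mc_block, qstep=8):
    """Dequantize, inverse transform, and add the MC prediction."""
    deq = [[q * qstep for q in row] for row in quantized]
    err = idct_2d(deq)
    return [[round(mc_block[y][x] + err[y][x]) for x in range(N)]
            for y in range(N)]

# A flat residual: only the DC coefficient is non-zero.
quant = [[0] * N for _ in range(N)]
quant[0][0] = 2                       # dequantizes to 16 -> +2 per pixel
mc = [[100] * N for _ in range(N)]
print(reconstruct(quant, mc)[0][:4])  # [102, 102, 102, 102]
```

Because the encoder runs exactly this loop to build its own reference picture, any coefficients lost in transit cause the two reference pictures, and hence all subsequent predictions, to diverge.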

Recently there has been an increasing interest in wireless video sensor networks (Akyildiz, Melodia, & Chowdhury, 2007). The availability of low-cost CMOS image sensors has made it possible to build small, low-power, and low-cost image sensor nodes. These nodes usually have very limited resources in terms of computing, communication, and stored energy. Due to the limited communication capacity and the high energy cost of transmitting data, an efficient video compression algorithm is needed. However, due to the limited computing and energy resources, the video encoder on the sensor node must have a low complexity. These two requirements are hard to meet with standard video codecs: a video codec with high coding efficiency usually has a high encoding complexity, while low-complexity encoders have a low coding efficiency.
