Region-of-Interest Processing and Coding Techniques: Overview of Recent Trends and Directions

Dan Grois (Ben-Gurion University of the Negev, Israel) and Ofer Hadar (Ben-Gurion University of the Negev, Israel)
DOI: 10.4018/978-1-4666-2833-5.ch006


This chapter comprehensively covers Region-of-Interest (ROI) processing and coding for multimedia applications. The variety of end-user devices with different capabilities, ranging from cell phones with small screens and restricted processing power to high-end PCs with high-definition displays, has stimulated significant interest in effective technologies for video adaptation. Therefore, the authors place special emphasis on ROI processing and coding with regard to the relatively new H.264/SVC (Scalable Video Coding) standard, which has introduced various scalability domains, such as the spatial, temporal, and fidelity (SNR/quality) domains. The authors’ observations and conclusions are supported by a variety of experimental results, which are compared to the conventional Joint Scalable Video Model (JSVM).
Chapter Preview


The number of video applications has increased dramatically in the last decade for many reasons, such as rapid changes in the video coding standardization process driven by increases in computing power and significant developments in network infrastructure (Grois & Hadar, 2012). Nowadays, the most common video applications include wireless and wired Internet video streaming, high-quality video conferencing, High-Definition (HD) TV broadcasting, and HD DVD and Blu-ray storage, employing a variety of video transmission and storage systems (e.g., MPEG-2 for broadcasting services over satellite, cable, and terrestrial transmission channels, or H.320 for conversational video conferencing services) (Schwarz, et al., 2007). Also, most access networks are characterized by a wide range of connection qualities and by a wide range of end-user devices with different capabilities, from cell phones and other mobile devices with relatively small displays and limited computational resources to powerful Personal Computers (PCs) with high-resolution displays (Schwarz, et al., 2007).

As a result of the continuous need for scalability, much of the attention in the field of video processing and coding is currently directed to Scalable Video Coding (SVC), which was standardized in 2007 as an extension of H.264/AVC (Schwarz, et al., 2007). Bit-stream scalability is currently a very desirable feature for many multimedia applications (e.g., video conferencing, video surveillance, and telemedical applications). The need for scalability arises from the need to adapt to different spatial formats, bit-rates, or power constraints (Wiegand & Sullivan, 2003; Grois & Hadar, 2011a, 2011b). To fulfill these requirements, it would be beneficial to simultaneously transmit or store video in a variety of spatial/temporal resolutions and qualities, leading to video bit-stream scalability. A major requirement for Scalable Video Coding is to enable encoding of a high-quality video bitstream that contains one or more subset bitstreams, each of which can be transmitted and decoded to provide video services with lower temporal or spatial resolution, or with reduced fidelity, while retaining a reconstruction quality that is high relative to the rate of the subset bitstream. Scalable Video Coding therefore provides important functionalities, such as spatial, temporal, and SNR (quality) scalability, thereby also enabling power adaptation. In turn, these functionalities lead to enhancements of video transmission and storage applications.
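The subset-bitstream idea can be illustrated with a minimal Python sketch. It models each coded unit by three layer identifiers and derives a sub-stream by discarding every unit above a requested operating point. The identifier names mirror the dependency_id, temporal_id, and quality_id fields of the SVC NAL unit header extension; the `Unit` class, the `extract_substream` function, and the toy stream with its byte sizes are hypothetical and serve only for illustration. A real bitstream extractor handles many details that are omitted here (for example, priority identifiers and parameter sets).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Simplified stand-in for one coded unit of a scalable stream (hypothetical)."""
    dependency_id: int  # spatial (dependency) layer, 0 = base layer
    temporal_id: int    # temporal layer, 0 = lowest frame rate
    quality_id: int     # quality (SNR) refinement level, 0 = base quality
    size_bytes: int     # payload size, used here only to compare sub-stream sizes


def extract_substream(units, max_d, max_t, max_q):
    """Keep the units needed to decode the operating point (max_d, max_t, max_q).

    Simplified rule: drop units above the target temporal layer or above the
    target dependency layer; the quality limit is applied only within the
    target dependency layer, so lower layers keep all their refinements.
    """
    kept = []
    for u in units:
        if u.temporal_id > max_t or u.dependency_id > max_d:
            continue
        if u.dependency_id == max_d and u.quality_id > max_q:
            continue
        kept.append(u)
    return kept


def total_bytes(units):
    """Sum of payload sizes, a rough proxy for the sub-stream bit-rate."""
    return sum(u.size_bytes for u in units)


# Toy stream: two spatial layers, two temporal layers, and one quality
# refinement in the upper spatial layer. All sizes are made up.
STREAM = [
    Unit(0, 0, 0, 400),
    Unit(0, 1, 0, 150),
    Unit(1, 0, 0, 900),
    Unit(1, 0, 1, 500),
    Unit(1, 1, 0, 300),
    Unit(1, 1, 1, 200),
]

base_layer = extract_substream(STREAM, max_d=0, max_t=0, max_q=0)
low_rate_high_res = extract_substream(STREAM, max_d=1, max_t=0, max_q=0)
full_stream = extract_substream(STREAM, max_d=1, max_t=1, max_q=1)
```

In this toy stream the base-layer sub-stream is a small fraction of the full stream, which is the property that lets one encoded bitstream serve devices with different displays and connection qualities.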

SVC has achieved significant improvements in coding efficiency compared to the scalable profiles of prior video coding standards, owing to its greatly increased flexibility and adaptability (e.g., SVC provides graceful degradation in lossy transmission environments, as well as bit-rate, format, and power adaptation). In addition to temporal, spatial, and quality scalability, SVC supports Region-of-Interest (ROI) scalability. ROI scalability is a desirable feature in many future scalable video coding applications, such as mobile device applications, in which the video has to be adapted for display on a relatively small screen (thus, a mobile device user may need to extract and track only a predefined Region-of-Interest within the displayed video). At the same time, other users with a larger mobile device screen may wish to extract other ROI(s) to receive a higher video stream resolution (Grois & Hadar, 2011c, 2011d). Therefore, to fulfill these requirements, it would be beneficial to simultaneously transmit or store a video stream in a variety of Regions-of-Interest (e.g., each Region-of-Interest having a different spatial resolution, as presented in Figure 1), as well as to enable efficient tracking of the predefined Region-of-Interest.

Figure 1. Defining ROIs with different spatial resolutions (e.g., QCIF, CIF, SD/4CIF resolutions) to be provided within a scalable video coding stream
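To make the idea of ROIs at several spatial resolutions concrete, the following Python sketch computes crop rectangles of QCIF (176x144), CIF (352x288), and 4CIF (704x576) size around a given point of interest inside a 4CIF frame, snapped to the 16x16 macroblock grid used by H.264. It is only a sketch under stated assumptions: the `roi_rect` function, the `RESOLUTIONS` table, and the choice of center point are hypothetical, and the chapter's own ROI definition and tracking methods are not reproduced here.

```python
MB = 16  # H.264 macroblock size in luma samples

# Standard picture sizes (width, height) for the formats named in Figure 1.
RESOLUTIONS = {
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
}


def roi_rect(frame_w, frame_h, roi_w, roi_h, center_x, center_y):
    """Return (x, y, w, h) of an ROI of size roi_w x roi_h near the given center.

    The rectangle is clamped so that it stays inside the frame, and its
    top-left corner is snapped down to the macroblock grid (hypothetical
    policy chosen for this sketch).
    """
    if roi_w > frame_w or roi_h > frame_h:
        raise ValueError("ROI is larger than the frame")
    # Center the ROI on the point of interest.
    x = center_x - roi_w // 2
    y = center_y - roi_h // 2
    # Clamp the top-left corner so the ROI does not leave the frame.
    x = max(0, min(x, frame_w - roi_w))
    y = max(0, min(y, frame_h - roi_h))
    # Snap down to a macroblock boundary; this never moves the ROI outside.
    x -= x % MB
    y -= y % MB
    return (x, y, roi_w, roi_h)


# One ROI per resolution inside a 4CIF frame, around an assumed point of
# interest at (400, 300), e.g. a position reported by a tracker.
FRAME_W, FRAME_H = RESOLUTIONS["4CIF"]
rois = {
    name: roi_rect(FRAME_W, FRAME_H, w, h, 400, 300)
    for name, (w, h) in RESOLUTIONS.items()
}
```

Snapping to macroblock boundaries is one simple way to keep an ROI aligned with the coding units; other alignment policies are equally possible.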
