SSIM-Based Distortion Estimation for Optimized Video Transmission over Inherently Noisy Channels

Arun Sankisa (Northwestern University, Evanston, IL, USA), Katerina Pandremmenou (University of Ioannina, Ioannina, Greece), Peshala V. Pahalawatta (AT&T, Inc., El Segundo, CA, USA), Lisimachos P. Kondi (University of Ioannina, Ioannina, Greece) and Aggelos K. Katsaggelos (Northwestern University, Evanston, IL, USA)
DOI: 10.4018/IJMDEM.2016070103


The authors present two methods for examining video quality using the Structural Similarity (SSIM) index: Iterative Distortion Estimate (IDE) and Cumulative Distortion using SSIM (CDSSIM). In the first method, three types of slices are iteratively reconstructed frame-by-frame for three different combinations of packet loss and the resulting distortions are combined using their probabilities to give the total expected distortion. In the second method, a cumulative measure of the overall distortion is computed by summing the inter-frame propagation impact to all frames affected by a slice loss. Furthermore, the authors develop a No-Reference (NR) sparse regression framework for predicting the CDSSIM metric to circumvent the real-time computational complexity in streaming video applications. The two methods are evaluated in resource allocation and packet prioritization schemes and experimental results show improved performance and better end-user quality. The accuracy of the predicted CDSSIM values is studied using standard performance measures and a Quartile-Based Prioritization (QBP) scheme.

1. Introduction

Smartphones and mobile devices that contain sophisticated video processing elements have become an integral part of our daily lives and have created a dramatic rise in demand for multimedia services over wireless networks. This demand underscores the need for efficient algorithms that provide optimal end-user quality while taking into account capacity constraints such as storage and bandwidth. Limited resources on transmission systems inherently prone to dropping packets provide strong motivation for video processing and network entities to implement efficient encoding, resource allocation, packet prioritization and scheduling techniques. The end-user experience and overall perceived quality can be influenced by many factors, most notably by compression and transmission impairments. To this end, research in video codecs has moved at a fast pace through standards like H.264/Moving Picture Experts Group (MPEG) 4/Advanced Video Coding (AVC), Scalable Video Coding (SVC) and H.265/High Efficiency Video Coding (HEVC) (Wiegand, Sullivan, Bjøntegaard, & Luthra, 2003; Schwarz, Marpe, & Wiegand, 2007; Sullivan, Ohm, Han, & Wiegand, 2012). Similarly, wireless communication has made rapid strides with 3G Universal Mobile Telecommunications System (UMTS), High Speed Downlink/Uplink Packet Access (HSDPA/HSUPA), WiMax, 4G/Long Term Evolution (LTE), and plans to introduce 5G before the end of this decade (HSDPA, 2006; DC-HSPA, 2010; LTE; METIS, 2013). These advances help address the growing demand for video streaming services but also emphasize the need for innovative algorithms that offer complete end-to-end solutions. Research in cross-layer optimization, rate-distortion modeling, packet scheduling and resource allocation in multi-user environments (e.g., Maani, Pahalawatta, Berry, Pappas, & Katsaggelos, 2008; Li, Li, Chiang, & Calderbank, 2009; Luo, Ci, & Wu, 2011; Ismail, Zhuang, & Elhedhli, 2013; Sankisa, Katsaggelos, & Pahalawatta, 2015) has shown that content-aware transmission methods provide noticeable performance improvements over content-agnostic techniques.

The three components for defining a cross-layer, content-aware system are the encoding mechanism, the transmission network and the quality assessment technique. Video coding creates compression artifacts that directly translate to a perceived degradation in overall quality. During the encoding process, sequences are broken into frames and different coding modes are applied to their constituent units, MacroBlocks (MBs) and Group-Of-Blocks (GOBs). The decision about the coding modes usually depends on the frame in which a block resides, and a natural outcome of differentiated coding is the formation of data entities with unequal importance, a key incentive for defining a packet prioritization scheme. Additionally, the temporal, motion-compensated prediction commonly used by encoders leads to inter-frame dependence and error propagation that need to be taken into account when designing such a scheme. When an encoded sequence is ready for transmission, usually over a resource-constrained, loss-prone channel, it is split and packaged into units that each contain a portion of a video frame (for instance, a GOB). All packets belonging to a frame need to be correctly received for error-free reconstruction at the decoder. If some packets are lost, the missing data can be approximated by applying an appropriate error-concealment technique, although concealment is usually accompanied by propagation of errors between frames.
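
The inter-frame error propagation described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the authors' IDE or CDSSIM procedure: the decoder predicts each frame from the previous one with zero motion, a lost slice is concealed by copying the co-located rows of the previous decoded frame, and quality is measured with a single-window (whole-frame) SSIM rather than the usual locally windowed SSIM. The names `global_ssim`, `decode` and `cumulative_distortion` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over the whole frame (simplification of windowed SSIM)."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

def decode(first_frame, residuals, lost=None):
    """Toy decoder: frame t = frame t-1 + residual t (zero-motion prediction).

    lost = (frame_index, row_start, row_stop) marks one lost slice. Its
    residual is dropped, which is equivalent to concealing the slice by
    copying the co-located rows of the previous decoded frame.
    """
    frames = [first_frame.copy()]
    for t, residual in enumerate(residuals, start=1):
        residual = residual.copy()
        if lost is not None and lost[0] == t:
            residual[lost[1]:lost[2], :] = 0.0
        frames.append(frames[-1] + residual)
    return frames

def cumulative_distortion(first_frame, residuals, lost):
    """Sum of (1 - SSIM) over all frames, comparing loss-free and lossy decodes."""
    clean = decode(first_frame, residuals)
    damaged = decode(first_frame, residuals, lost)
    return sum(1.0 - global_ssim(c, d) for c, d in zip(clean, damaged))

# Synthetic data (assumption): one random 32x32 frame and five random residuals.
rng = np.random.default_rng(0)
first = rng.uniform(0.0, 255.0, size=(32, 32))
residuals = [rng.normal(0.0, 5.0, size=(32, 32)) for _ in range(5)]

early = cumulative_distortion(first, residuals, lost=(1, 8, 16))  # slice lost in frame 1
late = cumulative_distortion(first, residuals, lost=(5, 8, 16))   # slice lost in frame 5
print(f"early loss: {early:.6f}, late loss: {late:.6f}")
```

In this toy setting, a slice lost early in the prediction chain corrupts every later frame, whereas a slice lost in the final frame affects only that frame, so the two packets carry unequal importance. This is the kind of difference a packet prioritization scheme can exploit.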
