1. Introduction
Smartphones and mobile devices with sophisticated video processing capabilities have become an integral part of our daily lives and have driven a dramatic rise in demand for multimedia services over wireless networks. This demand underscores the need for efficient algorithms that provide optimal end-user quality while respecting capacity constraints such as storage and bandwidth. The limited resources of transmission systems that are inherently prone to packet loss strongly motivate video processing and network entities to implement efficient encoding, resource allocation, packet prioritization, and scheduling techniques. The end-user experience and overall perceived quality are influenced by many factors, most notably compression and transmission impairments. To this end, research in video codecs has moved at a fast pace through standards such as H.264/Moving Picture Experts Group (MPEG)-4/Advanced Video Coding (AVC), Scalable Video Coding (SVC), and H.265/High Efficiency Video Coding (HEVC) (Wiegand, Sullivan, Bjøntegaard, & Luthra, 2003; Schwarz, Marpe, & Wiegand, 2007; Sullivan, Ohm, Han, & Wiegand, 2012). Wireless communication has made similarly rapid strides with 3G Universal Mobile Telecommunications System (UMTS), High Speed Downlink/Uplink Packet Access (HSDPA/HSUPA), WiMAX, 4G/Long Term Evolution (LTE), and plans to introduce 5G before the end of this decade (HSDPA, 2006; DC-HSPA, 2010; LTE; METIS, 2013). These advances help address the growing demand for video streaming services but also emphasize the need for innovative algorithms that offer complete end-to-end solutions. Research in cross-layer optimization, rate-distortion modeling, packet scheduling, and resource allocation in multi-user environments (e.g., Maani, Pahalawatta, Berry, Pappas, & Katsaggelos, 2008; Li, Li, Chiang, & Calderbank, 2009; Luo, Ci, & Wu, 2011; Ismail, Zhuang, & Elhedhli, 2013; Sankisa, Katsaggelos, & Pahalawatta, 2015) has shown that content-aware transmission methods provide noticeable performance improvements over content-agnostic techniques.
A cross-layer, content-aware system is defined by three components: the encoding mechanism, the transmission network, and the quality assessment technique. Video coding creates compression artifacts that translate directly into perceived degradation of overall quality. During encoding, sequences are broken into frames, and different coding modes are applied to their constituent units, MacroBlocks (MBs) and Groups-Of-Blocks (GOBs). The choice of coding mode usually depends on the frame in which a block resides, and a natural outcome of this differentiated coding is the formation of data entities of unequal importance, a key incentive for defining a packet prioritization scheme. Additionally, the temporal, motion-compensated prediction commonly used by encoders leads to inter-frame dependence and error propagation that must be taken into account when designing such a scheme. When an encoded sequence is transmitted, usually over a resource-constrained, loss-prone channel, it is broken and packaged into units, each of which contains a portion of a video frame (for instance, a GOB). All packets belonging to a frame must be correctly received for error-free reconstruction at the decoder. If some packets are lost, data can be recovered by applying an appropriate error-concealment technique, although concealment errors typically propagate to subsequent frames.
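The interplay between inter-frame dependence and packet importance can be illustrated with a minimal sketch. The frame types, distortion values, and geometric attenuation model below are invented for demonstration and are not the prioritization scheme proposed in this work; the sketch only shows why, under motion-compensated prediction, losing an early frame in a prediction chain costs more than losing a late one.

```python
def propagated_impact(frames, idx, attenuation=0.8):
    """Total distortion caused by losing frame `idx`: its own
    concealment distortion plus an attenuated copy of that error
    in every later frame that predicts from it, until the next
    intra (I) frame resets the prediction chain."""
    own = frames[idx]["concealment_distortion"]
    impact = own
    decay = 1.0
    for f in frames[idx + 1:]:
        if f["type"] == "I":       # intra frame breaks the dependence chain
            break
        decay *= attenuation       # error attenuates through prediction
        impact += decay * own
    return impact

# A toy group of pictures: one I frame followed by three P frames
# (distortion numbers are arbitrary).
gop = [
    {"type": "I", "concealment_distortion": 10.0},
    {"type": "P", "concealment_distortion": 4.0},
    {"type": "P", "concealment_distortion": 4.0},
    {"type": "P", "concealment_distortion": 4.0},
]

# Rank frames (and hence their packets) by the damage a loss would cause.
priorities = sorted(range(len(gop)),
                    key=lambda i: propagated_impact(gop, i),
                    reverse=True)
print(priorities)  # earlier frames in the chain rank first: [0, 1, 2, 3]
```

A content-aware scheduler could use such a ranking to protect or retransmit high-impact packets first, whereas a content-agnostic scheme would treat all packets identically.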