K-Means Based Prediction of Transcoded JPEG File Size and Structural Similarity

Steven Pigeon, Stéphane Coulombe
DOI: 10.4018/jmdem.2012040103


The problem of efficiently adapting JPEG images to satisfy given constraints, such as maximum file size and resolution, arises in a number of applications, from universal media access for mobile browsing to multimedia messaging services. However, optimizing for perceived quality (user experience) carries a non-negligible computational cost, which the authors aim to minimize through the use of low-cost predictors. In previous work, the authors presented predictors and predictor-based systems to achieve low-cost and near-optimal adaptation of JPEG images under given constraints of file size and resolution. In this work, they extend and improve these solutions by including more information about images to obtain more accurate predictions of the file size and quality resulting from transcoding. The authors show that the proposed method, based on the clustering of transcoding operations represented as high-dimensional vectors, significantly outperforms previous methods in accuracy.

1. Introduction

The need for efficient image adaptation arises in a number of contexts, ranging from universal media access with varying browsing conditions (Han et al., 1998; Mohan, Smith, & Li, 1999), to multimedia messaging services (MMS) (Coulombe & Grassel, 2004). In the case of universal access, one uses a mobile device, such as a smartphone, PDA, or tablet, to access resources or services on the Web. The traditional response has been to use rather crude adaptation strategies (Han et al., 1998) such as simply preparing a single “mobile” version of the resource (Fling, 2009), but this one-size-fits-all solution will leave users at both ends of the device capability spectrum dissatisfied: some will find that the mobile version exceeds (or is cumbersome for) their devices’ capabilities, while others will find it inadequate and lacking.

In the context of MMS, to take another example, a receiving terminal is characterized by its capabilities (or, more exactly, its limitations), such as the maximum resolution of images it can display, the formats it can decode, and the maximum message size it can receive and interpret correctly (Open Mobile Alliance, 2010). Interoperability between MMS users requires server-side adaptation, as the sender’s device may be more capable than the receiver’s, and the receiving device will be unable to display correctly, if at all, a message that exceeds its capabilities. In this context, adaptation requires that the sender’s images be converted to comply with the receiver’s device capabilities, that is, changing the file size (by altering the compression parameters) and the resolution of images (by scaling them). Adaptation can also include the case where the compression format itself needs to be changed, but this is seldom a problem since MMS image traffic is mostly composed of JPEG images taken with the devices’ cameras. Accordingly, we will neglect the case where the format also needs to be adapted (for example, from PNG to GIF) and concentrate on the prevalent problem of JPEG-to-JPEG image adaptation subject to changes in compression parameters (e.g., the quality factor) and scaling (resolution).
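To make the constraints concrete, the following is a minimal sketch of how a receiving terminal's limits might be represented and checked. The `TerminalProfile` class, its field names, and both helper functions are hypothetical names introduced here for illustration; they are not taken from the article or from the Open Mobile Alliance specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminalProfile:
    """Hypothetical description of a receiving terminal's limits."""
    max_width: int    # maximum displayable image width, in pixels
    max_height: int   # maximum displayable image height, in pixels
    max_bytes: int    # maximum message (file) size, in bytes


def scaling_to_fit(width: int, height: int, profile: TerminalProfile) -> float:
    """Largest aspect-preserving scaling factor in (0, 1] that makes a
    width x height image fit the terminal's maximum resolution."""
    return min(1.0, profile.max_width / width, profile.max_height / height)


def complies(width: int, height: int, size_bytes: int,
             profile: TerminalProfile) -> bool:
    """True if the image already satisfies both the resolution and the
    file-size constraints, so that no adaptation is needed."""
    return (width <= profile.max_width
            and height <= profile.max_height
            and size_bytes <= profile.max_bytes)
```

Under these assumptions, a 1600×1200 image sent to a terminal limited to 640×480 needs a scaling factor of at most 0.4. Whether the scaled image also meets the file-size limit depends on the compression parameters, which is the part that is costly to determine without transcoding.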

Therefore, whether in the context of universal access or multimedia messaging, the challenge is to adapt images to fit given constraints, dictated by the network conditions and the receiving device capabilities, while simultaneously maximizing the user experience and minimizing the computational cost of adaptation. In the context of high-volume service provision, whether for MMS or universal media access, only the fastest adaptation algorithms yielding the best perceived quality can be considered.

Of course, previous studies have addressed the problem of efficient image adaptation, but the solutions they propose are either still computationally expensive (and require extensive modifications to existing JPEG manipulation libraries) (Ridge, 2003; Shu & Chau, 2005) or overly rigid, focusing on unrealistically constrained transformations such as scaling by powers of two (Lei & Georganas, 2002; Ratnakar & Ivashin, 2001; Ridge, 2003), or using a small, fixed number of possible adaptations, without real consideration for the perceived quality resulting from adaptation. For example, Ridge’s method is accurate, but requires the JPEG image to be partly decompressed so that the DCT coefficients are available, on which successive re-quantization passes are performed until the quality factor yielding the largest file not exceeding the constraint is found (Ridge, 2003). Other methods exploit the structure of the DCT to yield fast scaling algorithms in the (partially) compressed domain by manipulating the DCT coefficients directly, but such methods also require the image to be partly decoded so that the DCT coefficients are available, and they are constrained to scaling by powers of two. Furthermore, it is unclear what speed-ups can be expected, as the DCT-coefficient-based scaling algorithms are still relatively complex and may be comparable in computational complexity to an efficient implementation of scaling using space-domain filters. But, in our opinion, the principal shortcoming of previous methods is that they do not consider joint changes in compression parameters and scaling as a means of adaptation maximizing perceived quality.
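The search for “the quality factor yielding the largest file not exceeding the constraint” can be illustrated with a generic sketch. This is not Ridge’s algorithm, whose details are not given here; it is a plain binary search under the assumption that the encoded size does not decrease as the quality factor increases. The function `largest_qf_within` and the `encoded_size` callback are hypothetical names; in a real system each call to `encoded_size` would stand for one re-quantization and entropy-coding pass, which is where the computational cost lies.

```python
from typing import Callable, Optional


def largest_qf_within(max_bytes: int,
                      encoded_size: Callable[[int], int],
                      qf_min: int = 1,
                      qf_max: int = 100) -> Optional[int]:
    """Return the largest quality factor in [qf_min, qf_max] whose encoded
    size does not exceed max_bytes, or None if even qf_min is too large.

    Assumes encoded_size(qf) is non-decreasing in qf (an assumption of
    this sketch, not a guarantee of the JPEG format)."""
    best = None
    lo, hi = qf_min, qf_max
    while lo <= hi:
        mid = (lo + hi) // 2
        if encoded_size(mid) <= max_bytes:
            best = mid       # mid fits; try a higher quality factor
            lo = mid + 1
        else:
            hi = mid - 1     # mid is too large; try a lower quality factor
    return best


def toy_size(qf: int) -> int:
    """Made-up linear size model, used only to exercise the search."""
    return 2000 + 150 * qf
```

With the toy model, `largest_qf_within(10_000, toy_size)` returns 53, since 2000 + 150 × 53 = 9950 bytes fits and 54 gives 10100 bytes. Even with binary search, each image needs about seven encoding passes over the range 1 to 100, which is the kind of cost that low-cost predictors are meant to avoid.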
