Compression Schemes with Data Reordering for Ordered Data

Compression Schemes with Data Reordering for Ordered Data

Chun-Hee Lee (Department of Computer Science, KAIST (Korea Advanced Institute of Science and Technology), Daejeon, Korea) and Chin-Wan Chung (Department of Computer Science, KAIST (Korea Advanced Institute of Science and Technology), Daejeon, Korea)
Copyright: © 2014 |Pages: 28
DOI: 10.4018/jdm.2014010101

Abstract

Although there have been many compression schemes for reducing data effectively, most schemes do not consider the reordering of data. In the case of unordered data, if the users change the data order in a given data set, the compression ratio may be improved compared to the original compression before reordering data. However, in the case of ordered data, the users need a mapping table that maps the original position to the changed position in order to recover the original order. Therefore, reordering ordered data may be disadvantageous in terms of space. In this paper, the authors consider two compression schemes, run-length encoding and bucketing scheme as bases for showing the impact of data reordering in compression schemes. Also, the authors propose various optimization techniques related to data reordering. Finally, the authors show that the compression schemes with data reordering are better than the original compression schemes in terms of the compression ratio.
Article Preview

Introduction

Currently, a large volume of data in various environments is generated. Such a large volume of data consumes valuable resources such as space, network bandwidth, and CPU. In order to save the resources, data compression schemes have been developed and applied in many applications.

However, they do not consider the effect of data reordering. If we reorganize data, the compression ratio for the reorganized data may be improved compared to that for the original data. Some papers deal with data reordering problems in very limited environments (Apaydin, Tosun & Ferhatosmanoglu, 2008; Blandford & Blelloch, 2002; Johnson, Krishnan, & Chhugani, 2004; Pinar, Tao & Ferhatosmanoglu, 2005). The work of Apaydin et al. (2008), Blandford and Blelloch (2002), Johnson et al. (2004), and Pinar et al (2005) assumes that the order of data does not have to be preserved, that is, the input data is unordered data. However, in general, the order of data should be preserved and has the important information. For example, time series data should be ordered by the time. If we change the order of the time series data, it will lose much information. Therefore, the approaches in Apaydin et al. (2008), Blandford and Blelloch (2002), Johnson et al. (2004), and Pinar et al. (2005) cannot be applied to the ordered data such as time series data (Chen, Dong, Han, Wah, & Wan, 2002; Korn, Jagadish, & Faloutsos, 1997; Reeves, Liu, Nath, & Zhao, 2009; Elmeleegy, Elmagarmid, Cecchet, Aref, & Zwaenepoel, 2009).

Consider the run-length encoding with the ordered data D = {1, 1, 1, 3, 3, 1, 1, 4, 4, 4}. The run-length encoding is one of the widely used lossless compression schemes. It replaces repeated values with <value, count>, where count is the number of repeated values. We can represent D by the run-length encoding as follows. Note that RLE(D) is the compressed data for D using the run-length encoding. See Table 1 for the detailed notational convention.

RLE(D) = {<1,3>,<3,2>,<1,2>,4,3>},where in the pair <a,b>, a is value and b is count. Table 1.
Notational convention
NotationMeaning
RLE(D)Compressed data for D
using the run-length encoding
BUCKET(D, ε)Compressed data for D using the bucketing
scheme with an error bound ε
dithe i-th element in data set D
di,j{di, di+1, · · ·, dj−1, dj}, where i≤ j
<X>token X in the run-length encoding or
the bucketing scheme
Xmovement information X

Complete Article List

Search this Journal:
Reset
Open Access Articles
Volume 30: 4 Issues (2019): Forthcoming, Available for Pre-Order
Volume 29: 4 Issues (2018): 3 Released, 1 Forthcoming
Volume 28: 4 Issues (2017)
Volume 27: 4 Issues (2016)
Volume 26: 4 Issues (2015)
Volume 25: 4 Issues (2014)
Volume 24: 4 Issues (2013)
Volume 23: 4 Issues (2012)
Volume 22: 4 Issues (2011)
Volume 21: 4 Issues (2010)
Volume 20: 4 Issues (2009)
Volume 19: 4 Issues (2008)
Volume 18: 4 Issues (2007)
Volume 17: 4 Issues (2006)
Volume 16: 4 Issues (2005)
Volume 15: 4 Issues (2004)
Volume 14: 4 Issues (2003)
Volume 13: 4 Issues (2002)
Volume 12: 4 Issues (2001)
Volume 11: 4 Issues (2000)
Volume 10: 4 Issues (1999)
Volume 9: 4 Issues (1998)
Volume 8: 4 Issues (1997)
Volume 7: 4 Issues (1996)
Volume 6: 4 Issues (1995)
Volume 5: 4 Issues (1994)
Volume 4: 4 Issues (1993)
Volume 3: 4 Issues (1992)
Volume 2: 4 Issues (1991)
Volume 1: 2 Issues (1990)
View Complete Journal Contents Listing