1. Introduction
Data compression plays a vital role in data transmission and data storage applications because it reduces both the amount of data and the transmission time. Compression falls into two categories: lossy and lossless. In lossless compression, the original data can be reconstructed exactly; in lossy compression, some information may be lost. Many real-time applications that carry vital information therefore use lossless compression. Both statistical codes and universal codes have been used in lossless compression.
Codes constructed by statistical methods are called statistical codes; their construction is based on the probabilities of the symbols to be coded. The Huffman (Huffman, 1952) and Shannon-Fano (Salomon, 2000) algorithms are examples of statistical methods. Elias Gamma Code (EC) (Elias, 1975), Elias Delta Code (DC) (Elias, 1975), Golomb-Rice Code (RC) (Golomb, 1966; Rice, 1979), Fibonacci Code (FC) (Fenwick, 2002), Variable Byte Code (VBC) (Salomon, 2007), Nibble Code (NC) (Salomon, 2007), Extended Golomb Code (EGC) (Somasundaram & Domnic, 2007) and Fast Extended Golomb Code (FEGC) (Domnic & Glory, 2012) are universal codes, which do not require the symbol probabilities for their construction. Statistical methods normally take more time than universal codes to encode and decode the symbols of a file. Applications that require fast encoding and decoding of data therefore need universal codes rather than statistical codes. One such application is the Information Retrieval System.
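To illustrate why a universal code needs no probability table, the following is a minimal Python sketch of the Elias Gamma Code, assuming its standard textbook definition: a positive integer whose binary form has N bits is written as N-1 zeros followed by those N bits. The function names `elias_gamma_encode` and `elias_gamma_decode` are hypothetical and chosen only for this illustration, and bit strings are used in place of packed bits for readability.

```python
from typing import List


def elias_gamma_encode(n: int) -> str:
    """Return the Elias gamma codeword of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined only for positive integers")
    binary = bin(n)[2:]  # binary form without the '0b' prefix
    # Prefix with (length - 1) zeros so the decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str) -> List[int]:
    """Decode a concatenation of Elias gamma codewords into integers."""
    values = []
    pos = 0
    while pos < len(bits):
        zeros = 0
        # Count the leading zeros of the current codeword.
        while pos < len(bits) and bits[pos] == "0":
            zeros += 1
            pos += 1
        if pos + zeros + 1 > len(bits):
            raise ValueError("truncated codeword")
        # The next (zeros + 1) bits are the binary form of the value.
        values.append(int(bits[pos:pos + zeros + 1], 2))
        pos += zeros + 1
    return values


if __name__ == "__main__":
    stream = "".join(elias_gamma_encode(n) for n in (1, 5, 9))
    print(stream)
    print(elias_gamma_decode(stream))
```

The codeword depends only on the integer itself, so the encoder and decoder share no frequency model. A statistical method such as Huffman coding must first gather symbol probabilities and build a code table.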
An Information Retrieval System (IRS) is an information system that stores items of information that need to be processed, searched and retrieved in response to users' queries (Salton & McGill, 1983). IRSs are widely used in applications such as digital libraries, search engines, e-commerce, electronic news and genomic sequence analysis (Kobayashi & Takeda, 2000; Williams & Zobel, 2002). Indexing is an efficient technique for locating data for fast retrieval in an IRS. The most commonly used indexing structure is the inverted index (Zobel, Moffat & Ramamohanarao, 1998), which supports faster query evaluation than signature files (Faloutsos, 1985), PAT trees (Morrison, 1968) and bitmaps (Chan & Ioannidis, 1998).
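As a rough illustration of the inverted index idea, the sketch below maps each term to the sorted list of identifiers of the documents that contain it, and answers a conjunctive query by intersecting those lists. It is a toy example under stated assumptions: whitespace tokenization, lowercase folding, and the names `build_inverted_index` and `query_and` are all hypothetical and are not taken from the cited works.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: terms are whitespace-separated and case-insensitive.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def query_and(index: Dict[str, List[int]], terms: List[str]) -> List[int]:
    """Return the IDs of documents that contain every query term."""
    if not terms:
        return []
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = {
        1: "data compression codes",
        2: "universal codes",
        3: "data retrieval",
    }
    index = build_inverted_index(docs)
    print(index["codes"])
    print(query_and(index, ["data", "codes"]))
```

Because each list holds positive integers in increasing order, lists of this kind are a natural target for integer codes such as the universal codes mentioned earlier.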
An inverted file contains two components: