This chapter reviews information hiding methods, with a focus on steganography and steganalysis. First, the authors summarize the image data structures and image formats used on computers and the Internet. They then introduce several information hiding methods organized by image format: lossless (non-compression-based) formats, limited color-based image data, JPEG, and JPEG2000. The authors describe in detail a steganographic method based on image segmentation using a complexity measure. They also introduce a method for applying it to palette-based image formats, reversible information hiding for grayscale images, and JPEG2000 steganography. The steganographic methods for JPEG and JPEG2000 described in this chapter give particular consideration to the naturalness of the cover data. In the steganalysis section, the authors introduce two targeted steganalysis methods: one for LSB steganography and one for Bit-Plane Complexity Segmentation (BPCS) steganography.
Before describing detailed methods for the embedding and extraction of information in image data, we provide a summary of image data storage on computers and transmission over the Internet.
Digital Images
Digital images can be represented as a 2-dimensional discrete signal f(x,y), where x and y are the coordinates of a sampled point in the image and are bounded by the image width and height, respectively. The value of f(x,y) is quantized using N bits, i.e., 0 ≤ f(x,y) ≤ 2^N − 1. If f(x,y) is a scalar, the image is a grayscale image. Black and white correspond to 0 and 255, respectively, if N = 8. When dealing with a color image, f(x,y) can be regarded as a vector with three components: R, G, and B. In general, a full color image is represented as f(x,y) with each component quantized using 8 bits, i.e., 24 bits per pixel in total.
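The representation above can be illustrated with a minimal Python sketch. The names used here (N_BITS, gray, is_valid_gray, pack_rgb, unpack_rgb) are assumptions made for illustration and do not come from the chapter; the nested-list layout and the choice to place R in the high byte are likewise only one possible convention.

```python
# Illustrative sketch only; all names and the storage layout are assumptions.
N_BITS = 8
MAX_LEVEL = 2 ** N_BITS - 1  # largest quantized value for N bits

# A tiny grayscale image with 2 rows and 3 columns.
# gray[y][x] plays the role of the scalar f(x, y).
gray = [
    [0, 128, 255],
    [64, 192, 32],
]


def is_valid_gray(image, n_bits=N_BITS):
    """Return True if every pixel satisfies 0 <= f(x, y) <= 2**n_bits - 1."""
    top = 2 ** n_bits - 1
    return all(0 <= value <= top for row in image for value in row)


def pack_rgb(r, g, b):
    """Pack three 8-bit components into a single 24-bit integer (R in the high byte)."""
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Split a 24-bit integer back into its (R, G, B) components."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF
```

Packing the three components into one integer is only a way to make the 24-bit total visible; many programs keep R, G, and B as separate values instead.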
Palette-Based Images
The intensity of the three colors, i.e., red, green, and blue (denoted as R, G, and B), must be provided to display a color on a screen via a computer's frame buffer. This information is provided from the image data. We designate three images, the R image, G image, and B image, which correspond to the intensities of the R, G, and B color components for the color image, respectively. We refer to these R, G, and B images as the color component images.
A pixel is regarded as a triplet vector (R, G, B), known as a color vector. We assume that the values of R, G, and B are less than 256, which means that each can be represented using 8 bits. A palette contains a set of color vectors with a unique index assigned to each vector. If the palette contains at most 256 color vectors, each index can itself be represented using 8 bits. In a palette-based image, a pixel value does not contain color information directly; instead, it contains an index into the palette. We refer to such an image as an index image.
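As a minimal sketch of the relationship between an index image, its palette, and the color component images, consider the following Python fragment. The four-color palette, the sample index image, and the function names to_rgb and component_image are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; the palette, image, and function names are assumptions.

# A palette: each position (index) holds one color vector (R, G, B).
palette = [
    (0, 0, 0),      # index 0: black
    (255, 0, 0),    # index 1: red
    (0, 255, 0),    # index 2: green
    (0, 0, 255),    # index 3: blue
]

# An index image: every pixel stores a palette index, not a color.
index_image = [
    [0, 1, 1],
    [2, 3, 0],
]


def to_rgb(index_image, palette):
    """Replace each palette index by the color vector it refers to."""
    return [[palette[i] for i in row] for row in index_image]


def component_image(rgb_image, channel):
    """Extract one color component image (0 = R, 1 = G, 2 = B)."""
    return [[pixel[channel] for pixel in row] for row in rgb_image]


rgb_image = to_rgb(index_image, palette)
r_image = component_image(rgb_image, 0)
```

Because a pixel holds only an index, changing a palette entry changes the displayed color of every pixel that refers to it, while the index image itself stays the same.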
Digital images are stored on computers, USB drives, DVDs, and other media, and they can be transmitted via the Internet. Different image formats serve these purposes. Non-compression-based image formats store pixel values directly, without any compression. With a compression-based image format, a bit sequence of compressed data is stored instead of the pixel values themselves, and the pixel values are reconstructed from the compressed bit sequence. Compression-based image formats are of two types: lossless and lossy. Lossless compression-based formats, e.g., PNG, recover the original pixel values exactly, whereas lossy compression-based formats, e.g., JPEG and JPEG2000, may introduce errors, so the recovered pixel values can differ from the original ones.
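The difference between lossless and lossy recovery can be sketched in a few lines of Python. This is not an implementation of PNG, JPEG, or JPEG2000: zlib (the DEFLATE compressor that PNG also builds on) stands in for a lossless codec, and a simple uniform quantizer stands in for a lossy one. The sample pixel values, the step size, and the name lossy_round_trip are assumptions made for illustration.

```python
# Illustrative sketch only; this does not implement any real image format.
import zlib

# A short run of 8-bit pixel values (hypothetical sample data).
pixels = bytes([10, 11, 12, 13, 200, 201, 202, 203])

# Lossless stand-in: the compressed bit sequence is what gets stored,
# and decompression reconstructs the original values exactly.
compressed = zlib.compress(pixels)
restored = zlib.decompress(compressed)


def lossy_round_trip(data, step=8):
    """Stand-in for a lossy codec: round each value down to a multiple of step."""
    return bytes((value // step) * step for value in data)


# Lossy stand-in: the recovered values only approximate the originals.
approx = lossy_round_trip(pixels)
max_error = max(abs(a - b) for a, b in zip(pixels, approx))
```

In this sketch restored equals pixels, whereas approx differs from pixels by less than the quantization step. This distinction matters for information hiding, since any data embedded directly in pixel values must survive whatever errors the format introduces.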
In this section, we introduce several information hiding methods for image data. The hiding methods are categorized as lossless compression (non-compression-based) methods, limited color-based methods, JPEG-based methods, and JPEG2000-based methods. A reversible information hiding method for lossless and JPEG2000 formats is also introduced.