Article Preview
TopIntroduction
The concept of complexity relates to the presence of variation. In science there are many approaches that characterize complexity. A variety of scientific fields have dealt with complex mechanisms, simulations, systems, behavior and data complexity as those have always been a part of our environment. In this work, we focus on the topic of data complexity which is studied in information theory. While randomness is not considered complexity in certain areas such as those related to the study of complex systems, information theory tends to assign high values of complexity to random noise. Many fields benefit from the identification of content or noise related complex areas. In data hiding, adaptive steganography takes advantage of high concentration of self-information on high complexity areas originated from both content and noise to embed data. Solanki, Dabeer, Madhow, Manjunath, and Chandrasekaran (2003) describe the benefits of selective embedding related to the reduction of perceptual degradation for transform domain steganographic techniques. Bio diversity is another area where complexity can be used for identification and localization of different species. In this case, the complexity originated from content is more important than the one originated from noise.
Our goal in this paper is to introduce a variant of quad-trees for mining high complexity sub-domains of a data domain. A quad-tree is a tree structure defined on a finite set of nodes that either contains no nodes or is comprised of a root node and 4 quad-subtrees. In a full quad-tree, each node is either a leaf or has degree exactly 4. Our variant of quad-trees requires that each node that has descendants corresponds to a region that has a sufficient level of diversity as assessed by the value of an information-theoretical measure. We also present an alternative measure that has its roots in fractal geometry where the so called box-counting dimension (BCD) is used to determine the fractal dimension of a set S in a Euclidean space Rn. We then provide an algorithm to capture high complexity areas of 2 dimensional images domains and observe that diversity originated from both data content and noise are mined.
As an application, we used our proposed data structure in the initial stage of a crater detection algorithm. The algorithm is composed by two methods. The first method uses an information-theoretical approach with entropy quad-trees to create an edge filter that generates a binary image from complex areas which may contain edges. The second method applies a Circle Hough Transform (CHT) with modified threshold to detect the presence of circular shapes in complex areas. The new threshold is imposed to increase the quality of the results given the lack of prior knowledge about the number of craters in an image and the difficulty to estimate a good threshold for the minimum number of votes required in the parameter space to indicate true center points. Efficient methods for crater detection have been proposed (Bue & Stepinksi, 2007; Urbach & Stepinksi, 2009; Salamuniccar & Loncaric, 2008). We provide a distinct approach where no external pre-processing of the original image other than conversion to the JPEG format and resizing is needed. Likewise, no external image filters are used.