Understanding Digital Documents Using Gestalt Properties of Isothetic Components


Shyamosree Pal (Indian Institute of Technology, Kharagpur, India), Partha Bhowmick (Indian Institute of Technology, Kharagpur, India), Arindam Biswas (Bengal Engineering and Science University, Shibpur, Howrah, India) and Bhargab B. Bhattacharya (Indian Statistical Institute, Kolkata, India)
DOI: 10.4018/978-1-4666-0900-6.ch010


This paper shows how Gestalt properties can be used to identify various components in a document image. The observation that the human mind takes a holistic rather than a fragmented approach to vision is shown to be useful for document analysis. Since the major constituent components (textual or non-textual) of a document page are arranged in a rectilinear fashion, the page is decomposed into rectilinear (isothetic) components. After the page is represented as a feature set of polygonal covers corresponding to its distinct regions of interest, each polygon is iteratively decomposed into sub-polygons that tightly enclose the corresponding sub-components. This captures both the overall information and the necessary details to the desired level of precision. These components and sub-components are then analyzed using Gestalt laws/properties, which are explained in detail in the context of this work. Text regions, tabular structures, and various graphic objects readily admit some of the Gestalt properties. We have tested our algorithm on several benchmark datasets, and some relevant results are presented here to demonstrate the effectiveness and elegance of the proposed method.
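The iterative coarse-to-fine decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: they decompose isothetic polygons by analyzing their vertex sequences. Instead, this sketch assumes a simplified pixel-set model. Foreground pixels are grouped into 4-connected clusters of occupied g x g grid cells, and each cluster is refined at half the grid spacing until a hypothetical minimum spacing `g_min` is reached. All function names (`cells_of`, `connected_groups`, `decompose`, `decompose_page`) are illustrative.

```python
# Hypothetical sketch (not the authors' algorithm): hierarchical decomposition
# of foreground pixels into grid-aligned components at halving grid spacings.
from collections import deque
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a foreground pixel
Cell = Tuple[int, int]   # (row, column) index of a g x g grid cell


def cells_of(pixels: Set[Pixel], g: int) -> Dict[Cell, Set[Pixel]]:
    """Group foreground pixels by the g x g grid cell that contains them."""
    cells: Dict[Cell, Set[Pixel]] = {}
    for r, c in pixels:
        cells.setdefault((r // g, c // g), set()).add((r, c))
    return cells


def connected_groups(cells: Dict[Cell, Set[Pixel]]) -> List[Set[Pixel]]:
    """Split occupied cells into 4-connected groups; return each group's pixels."""
    seen: Set[Cell] = set()
    groups: List[Set[Pixel]] = []
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group: Set[Pixel] = set()
        while queue:  # breadth-first search over edge-adjacent occupied cells
            r, c = queue.popleft()
            group |= cells[(r, c)]
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in cells and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        groups.append(group)
    return groups


def decompose(pixels: Set[Pixel], g: int, g_min: int) -> dict:
    """Build a tree: each node is a component at spacing g, and its children
    are the sub-components found at spacing g // 2 (down to g_min)."""
    node = {"g": g, "pixels": len(pixels), "children": []}
    finer = g // 2
    if finer >= g_min:
        node["children"] = [decompose(part, finer, g_min)
                            for part in connected_groups(cells_of(pixels, finer))]
    return node


def decompose_page(pixels: Set[Pixel], g_max: int, g_min: int) -> List[dict]:
    """Top-level components at the coarsest spacing, each refined recursively."""
    return [decompose(part, g_max, g_min)
            for part in connected_groups(cells_of(pixels, g_max))]


if __name__ == "__main__":
    # Two 2x2 "words" separated by a three-pixel gap on the same "line".
    word_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
    word_b = {(0, 5), (0, 6), (1, 5), (1, 6)}
    for top in decompose_page(word_a | word_b, g_max=8, g_min=2):
        print(top["g"], top["pixels"], len(top["children"]))
```

In the usage example, the two blobs form a single component at g = 8 and g = 4. At g = 2 the grid is fine enough to separate them, so they become two sub-components of four pixels each. This mirrors the idea that a coarse cover captures the overall layout while finer covers expose the details.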
Chapter Preview


The theory of Gestalt psychology is based on the idea that an experienced human mind takes a holistic rather than a fragmented approach to vision. The mind can understand an image in such a way that its individual parts produce a collective impression. It assumes connections where it does not actually see them but finds them necessary for an overall perception (Sternberg, 2003). For this reason, Gestalt psychology has recently been used in several research paradigms in which visual information plays a significant role, e.g., musicology, automated building generation, semi-autonomous agents that help artists express ideas, and web page design (Leman, 1997; Z. Li, Yan, Ai, & Chen, 2004; Mason, Denzinger, & Carpendale, 2005; Wilson, Russell, Schraefel, & Smith, 2006).

We have used Gestalt properties for understanding digital documents. This is a contemporary problem of the digital era that requires state-of-the-art technologies for its effective solution (Chaudhuri, 2007). The solution depends largely on successfully identifying all kinds of structures present in a document image and then finding their associations with the different components of the document. Interestingly, a document page admits a characterization by the rectilinear arrangement of its major constituent components, such as paragraphs, lines, words, tabular structures, and graphics. Based on this simple yet useful property, we propose a novel geometric technique for the rectilinear decomposition of the components of a document page. It is followed by an effective method for indexing and organizing these components for the efficient retrieval of digital documents. An efficient and meaningful segmentation of these components from a document image is the first step towards indexing document pages.
The second phase involves storing these geometric structures in a systematic way in order to design a robust retrieval system. Given a gray-scale document image, our algorithm segments and recognizes its different components. It does so by analyzing the geometric features of their respective minimum-area rectilinear (isothetic) polygonal covers, computed for a few judiciously selected values of the grid spacing g. The shape and size of a polygonal cover depend on g: the lower the value of g, the tighter the cover, and vice versa. Each isothetic polygon is represented by an ordered sequence of its vertices. The spatial relationship between the polygons obtained for a higher grid spacing and those obtained for a lower one is therefore determined by a geometric analysis of the vertex sequences that represent them. Some results on a few datasets are shown in Figures 1 and 2 to give a preliminary idea. The next section discusses the important techniques related to document image segmentation and analysis, and the major steps of our algorithm are explained in the section Proposed Method. Experimental results are presented in the Results section to show the strength and efficacy of the algorithm. Concluding notes and future work that may benefit from the proposed algorithm are given in the section Conclusion.
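The dependence of a cover on the grid spacing g, and the containment relation between covers at different spacings (as in Figure 1), can be illustrated with a simplified model. This sketch approximates an outer isothetic cover as the union of g x g grid cells that contain at least one foreground pixel. It does not fill interior holes or trace the cover's boundary into an ordered vertex sequence, as the method described here does. The function names and the toy image `SAMPLE_PAGE` are hypothetical. The containment check also assumes that both grids start at pixel (0, 0) and that the coarser spacing is an integer multiple of the finer one.

```python
# Hypothetical simplification: approximate the outer isothetic cover of a
# binary image at grid spacing g by the union of occupied g x g cells.
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) index of a g x g grid cell


def occupied_cells(image: List[List[int]], g: int) -> Set[Cell]:
    """Return the grid cells of size g x g that contain a foreground (1) pixel."""
    cells: Set[Cell] = set()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value:
                cells.add((r // g, c // g))
    return cells


def cover_area(cells: Set[Cell], g: int) -> int:
    """Area, in pixels, of the union of the given g x g cells."""
    return len(cells) * g * g


def is_contained(fine: Set[Cell], g_fine: int,
                 coarse: Set[Cell], g_coarse: int) -> bool:
    """True if every fine cell lies inside some occupied coarse cell.
    Assumes both grids start at pixel (0, 0) and g_coarse is a multiple of g_fine."""
    if g_coarse < g_fine or g_coarse % g_fine != 0:
        raise ValueError("g_coarse must be a positive multiple of g_fine")
    k = g_coarse // g_fine  # fine cells per coarse cell along each axis
    return all((r // k, c // k) in coarse for r, c in fine)


SAMPLE_PAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    covers = {g: occupied_cells(SAMPLE_PAGE, g) for g in (1, 2, 4)}
    for g, cells in covers.items():
        print(f"g={g}: {len(cells)} cells, area {cover_area(cells, g)}")
    print(is_contained(covers[1], 1, covers[2], 2),
          is_contained(covers[2], 2, covers[4], 4))
```

On the toy image, the covered area grows from 5 to 20 to 32 pixels as g goes from 1 to 2 to 4. This is consistent with the statement that a lower g gives a tighter cover. Each finer cover also lies inside the next coarser one. With grids that are not nested (for example, g = 2 and g = 3), this containment need not hold. That is one reason to restrict the sketch to multiples.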

Figure 1.

A document page and its set of outer isothetic covers with their containment relation for different grid sizes

Figure 2.

(a, b) A tabular structure and its detected components. (c, d) A graphics object and its detected components
