Introduction
Gestalt psychology is based on the idea that an experienced human mind takes a holistic rather than a piecemeal approach to vision. The mind understands an image in such a way that its individual parts produce a collective impression, assuming connections that it does not actually see but finds necessary for an overall perception (Sternberg, 2003). Gestalt psychology has therefore been used recently in several research areas where visual information plays a significant role, e.g., musicology, automated building generation, the use of semi-autonomous agents to help artists express ideas, and the design of web pages (Leman, 1997; Z. Li, Yan, Ai, & Chen, 2004; Mason, Denzinger, & Carpendale, 2005; Wilson, Russell, Schraefel, & Smith, 2006). We have used Gestalt properties for understanding various digital documents, a contemporary problem of the digital era whose effective solution requires state-of-the-art technologies (Chaudhuri, 2007). The solution depends largely on successfully identifying all kinds of structures present in a document image and then finding their associations with the different components within the document. Interestingly, a document page can be characterized by the rectilinear arrangement of its major constituent components, such as paragraphs, lines, words, tabular structures, and graphics. Based on this simple yet useful property, a novel geometric technique is proposed for the rectilinear decomposition of the different components in a document page, followed by an effective method for indexing and organizing these components for efficient retrieval of digital documents. An efficient and meaningful segmentation of these components from a document image is the first step towards indexing document pages.
The second phase involves storing these geometric structures in a systematic way in order to design a robust retrieval system. Given a gray-scale document image, our algorithm performs the segmentation-cum-recognition of its different components by analyzing the geometric features of their respective minimum-area rectilinear (isothetic) polygonal covers, computed for a few judiciously selected values of the grid spacing, g. The shape and size of a polygonal cover depend on g: the lower the value of g, the tighter the polygonal cover, and vice versa. Since each isothetic polygon is represented by an ordered sequence of its vertices, the spatial relationship between the polygons corresponding to a higher grid spacing and those corresponding to a lower one is determined by an appropriate geometric analysis of the vertex sequences representing these polygons. Some results on a few datasets are shown in Figures 1 and 2 to give a preliminary idea.
After discussing the important techniques related to document image segmentation and analysis in the next section, we explain the major steps of our algorithm in the Proposed Method section. Experimental results are presented in the Results section to show the strength and efficacy of the algorithm. Concluding notes, and further work that may benefit from the proposed algorithm, are given in the Conclusion section.
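The dependence of a cover on the grid spacing g can be illustrated with a small sketch. It is not the authors' algorithm: instead of tracing a minimum-area isothetic polygon as a vertex sequence, it approximates the cover by the union of g x g grid cells that contain any foreground pixel. The helper names (`make_demo_image`, `occupied_cells`, `cover_area`, `contained_in`) and the toy binary image are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

Image = List[List[int]]  # 1 = foreground (ink), 0 = background


def make_demo_image() -> Image:
    """Hypothetical 16 x 16 binary image holding two word-like blobs."""
    image = [[0] * 16 for _ in range(16)]
    for y in range(2, 5):
        for x in range(1, 6):    # first blob: columns 1..5
            image[y][x] = 1
        for x in range(9, 15):   # second blob: columns 9..14
            image[y][x] = 1
    return image


def occupied_cells(image: Image, g: int) -> Set[Tuple[int, int]]:
    """Indices (row, col) of the g x g grid cells containing foreground."""
    cells = set()
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value:
                cells.add((y // g, x // g))
    return cells


def cover_area(image: Image, g: int) -> int:
    """Area, in pixels, of the cell-union approximation of the cover."""
    return len(occupied_cells(image, g)) * g * g


def contained_in(image: Image, g_fine: int, g_coarse: int) -> bool:
    """Check that the cover at g_fine lies inside the cover at g_coarse.

    Assumes g_coarse is a multiple of g_fine, so the two grids are nested.
    """
    if g_coarse % g_fine != 0:
        raise ValueError("g_coarse must be a multiple of g_fine")
    ratio = g_coarse // g_fine
    coarse = occupied_cells(image, g_coarse)
    return all((r // ratio, c // ratio) in coarse
               for r, c in occupied_cells(image, g_fine))


if __name__ == "__main__":
    img = make_demo_image()
    for g in (8, 4, 2, 1):
        print(f"g = {g}: cover area = {cover_area(img, g)} pixels")
    print("cover at g = 2 inside cover at g = 4:", contained_in(img, 2, 4))
```

With nested grid spacings such as 8, 4, 2, and 1, the approximate cover never grows as g decreases, and each finer cover lies inside the coarser one. This mirrors, in simplified form, the containment relation between covers at different grid sizes that Figure 1 depicts.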
Figure 1. A document page and its set of outer isothetic covers with their containment relation for different grid sizes
Figure 2. (a, b) A tabular structure and its detected components. (c, d) A graphics object and its detected components