From Image to XML: Monitoring a Page Layout Analysis Approach for the Visually Impaired

Robert Keefer, Nikolaos Bourbakis
DOI: 10.4018/ijmstr.2014010102

Abstract

Page layout analysis and the creation of an XML document from a document image are useful for many applications, including the preservation of archived documents, robust electronic access to printed documents, and access to print materials by the visually impaired. In this paper, the authors describe a document image processing pipeline composed of techniques for identifying article headings and the related body text, aggregating the body text with its headings, and creating an XML document. The pipeline was developed to support multiple document images captured by the head-mounted cameras of a reading device for the visually impaired. Both automatic and manual adaptations of the pipeline were used to process a sample of 25 newspaper document images. A comparison of the automatic and manual processes shows that, overall, the approach generates high-quality XML-encoded documents for further processing, such as text-to-speech conversion for the visually impaired.

1. Introduction

Page layout analysis and the creation of an XML document from a document image are useful for many applications, including the preservation of archived documents (Wang, et al., 2009) and accessibility for people with visual impairments. TYFLOS (Keefer, et al., 2009a,b) is a prototype wearable mobile reading device for the visually impaired. TYFLOS is equipped with two web cameras mounted on a pair of glasses, together with software for performing document image rectification and segmentation. Traditional document image analysis techniques play an important role in the operation of the TYFLOS prototype, including document image capture, binarization, three-dimensional page perspective correction, page curl correction, and page segmentation. In this paper we describe techniques for headline identification, page segment aggregation, and the creation of an XML document from the document image. The XML document supports various forms of interaction with the text of the document, including a voice user interface (Keefer, et al., 2013).
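
As a rough illustration of page segment aggregation, the following Python sketch attaches each body-text block to the nearest heading above it whose horizontal extent overlaps the block. This geometric rule, and the names Block, Article, and aggregate, are hypothetical and are not taken from the TYFLOS implementation; coordinates are assumed to be in pixels with y increasing downward.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical page segment: bounding box in pixels (y grows downward)."""
    x0: int
    y0: int
    x1: int
    y1: int
    text: str = ""


@dataclass
class Article:
    """A heading together with the body blocks aggregated under it."""
    heading: Block
    body: list = field(default_factory=list)


def horizontal_overlap(a: Block, b: Block) -> int:
    """Width of the overlap between the x-extents of two blocks (0 if none)."""
    return max(0, min(a.x1, b.x1) - max(a.x0, b.x0))


def aggregate(headings, bodies):
    """Assign each body block to the closest heading above it in the same column.

    Assumed rule: a heading can own a block if the heading ends at or above the
    block's top edge and their x-extents overlap. Unassigned blocks are returned
    as orphans.
    """
    articles = [Article(h) for h in headings]
    orphans = []
    for b in bodies:
        candidates = [a for a in articles
                      if a.heading.y1 <= b.y0 and horizontal_overlap(a.heading, b) > 0]
        if not candidates:
            orphans.append(b)
            continue
        # The closest heading above the block is the one with the lowest bottom edge.
        max(candidates, key=lambda a: a.heading.y1).body.append(b)
    for a in articles:
        # Top-to-bottom order only; multi-column reading order is not handled.
        a.body.sort(key=lambda blk: (blk.y0, blk.x0))
    return articles, orphans


# Example: two side-by-side newspaper articles.
headings = [Block(0, 0, 400, 30, "Council approves budget"),
            Block(420, 0, 800, 30, "Storm expected Friday")]
bodies = [Block(0, 40, 400, 300, "The city council..."),
          Block(420, 40, 800, 200, "Forecasters say..."),
          Block(0, 310, 400, 500, "Critics argued...")]
articles, orphans = aggregate(headings, bodies)
for art in articles:
    print(art.heading.text, "->", [blk.text for blk in art.body])
```

Requiring horizontal overlap keeps a heading in one column from claiming body text in a neighboring column; real newspaper layouts with headings spanning several columns would need additional rules.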

Considerable work has been done on identifying headlines within web sites and document images, both to improve access to documents for the visually impaired and to support digital access to archived documents. For example, Brudvick, et al. (2008) developed a method that predicts whether web page content semantically functions as a headline by considering the visual features of the text when rendered in a browser. Similarly, Kohlschütter, et al. (2010) describe a method for identifying text elements within a web page.
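
One of the simplest visual features for headline detection is text size relative to the surrounding body text. The sketch below is a simple heuristic, not the model of Brudvick, et al.: it takes the most common line height as the body-text height and flags lines that are substantially taller. The function names and the default ratio of 1.5 are hypothetical choices for illustration.

```python
from collections import Counter


def body_text_height(heights):
    """Most common line height, taken here as the body-text height."""
    return Counter(heights).most_common(1)[0][0]


def headline_candidates(lines, ratio=1.5):
    """Return the text of lines at least `ratio` times taller than body text.

    `lines` is a list of (text, height_in_pixels) pairs. The default ratio of
    1.5 is an assumed threshold for illustration only.
    """
    body = body_text_height([h for _, h in lines])
    return [text for text, h in lines if h >= ratio * body]


lines = [("STORM EXPECTED FRIDAY", 42),
         ("Forecasters say the", 14),
         ("front will arrive", 14),
         ("late in the day.", 15),
         ("Weather", 20)]
print(headline_candidates(lines))  # ['STORM EXPECTED FRIDAY']
```

Using the mode rather than the mean keeps a few very large headlines from inflating the body-text estimate.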

Document segmentation has been of interest to the document image processing community for many years. O’Gorman’s (1993) Docstrum method offered an original and well-organized approach to document layout analysis, applying K-nearest-neighbor analysis to connected components and grouping them into regions of text. Akram et al. (2010) review how documents are processed and how their layout is segmented. In another approach, Winder et al. (2011) describe a page segmentation method based on an analysis of the Voronoi zones of a histogram of the connected-component heights of image segments. Breuel et al. (2011) patented a method for document image layout deconstruction. Finally, Ferilli, et al. (2011) apply supervised machine learning techniques to document image layout analysis.
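
In Docstrum, each connected component is linked to its K nearest neighbors, and histograms of the resulting neighbor distances and angles are used to estimate skew and spacing before the links are merged into text lines and blocks. The sketch below shows only the first step: building the K-nearest-neighbor links between component centroids and taking the peak of the angle histogram as the dominant text orientation. It assumes centroids have already been extracted from a binarized image; the function names, k, and the 1-degree bin width are illustrative choices, not O’Gorman’s exact parameters.

```python
import math
from collections import Counter


def knn_pairs(centroids, k=5):
    """Link each connected-component centroid to its k nearest neighbors.

    Returns (i, j, distance, angle_deg) tuples, with the angle folded into
    [0, 180) so that a link and its reverse share the same orientation.
    Brute force, O(n^2 log n); a spatial index would be used at scale.
    """
    pairs = []
    for i, (xi, yi) in enumerate(centroids):
        neighbors = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(centroids) if j != i
        )
        for dist, j in neighbors[:k]:
            xj, yj = centroids[j]
            angle = math.degrees(math.atan2(yj - yi, xj - xi)) % 180.0
            pairs.append((i, j, dist, angle))
    return pairs


def dominant_angle(pairs, bin_width=1.0):
    """Peak of the neighbor-angle histogram, an estimate of text-line orientation."""
    hist = Counter(int(angle // bin_width) for _, _, _, angle in pairs)
    return hist.most_common(1)[0][0] * bin_width


# Two horizontal "text lines" of five character centroids each.
points = [(x, 0) for x in range(0, 50, 10)] + [(x, 30) for x in range(0, 50, 10)]
links = knn_pairs(points, k=2)
print(dominant_angle(links))  # 0.0
```

Because characters within a line are closer together than lines are to each other, most nearest-neighbor links run along the text lines, which is why the angle histogram peaks at the line orientation. Near-horizontal skew can split the peak between bins close to 0 and 180 degrees, which a fuller implementation would need to merge.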

To support robust interaction with document images converted to XML, Ishitani (2003) proposed a method for transforming a document image into XML. This method extracts document elements such as the title, headings, and body text from a document image. The hierarchical structure of the document is also extracted and described by a document object model (DOM). The XML document is then created by applying a set of transforms to the extracted document elements and the DOM.
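
The following minimal Python sketch illustrates the final step of such an approach: once the title, headings, and body text have been extracted and arranged hierarchically, the tree can be serialized directly as XML. It uses the standard library's xml.etree.ElementTree as a stand-in for a full DOM; the element names document, title, section, heading, and p are hypothetical and do not come from Ishitani (2003).

```python
import xml.etree.ElementTree as ET


def to_xml(title, sections):
    """Serialize extracted document elements as nested XML.

    `sections` is a list of (heading, [paragraph, ...]) pairs. The element
    names used here are illustrative, not Ishitani's schema.
    """
    root = ET.Element("document")
    ET.SubElement(root, "title").text = title
    for heading, paragraphs in sections:
        section = ET.SubElement(root, "section")
        ET.SubElement(section, "heading").text = heading
        for para in paragraphs:
            ET.SubElement(section, "p").text = para
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml("Daily News",
                  [("Storm expected Friday", ["Forecasters say the front will arrive late."])])
print(xml_text)
```

ElementTree escapes characters such as & and < automatically, so OCR output containing them still yields well-formed XML.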

WISDOM++ (Altamura, et al., 2001) is a document processing system that performs document analysis, classification, and text transformation to generate an XML document from a document image. Agrawal and Doermann (2010) also discuss a method for page segmentation that produces GEDI XML files.

To create an XML document from a document image, a segmentation method must separate images from text, identify the headings, and identify the article content. The methods described in (Ishitani, 2003), (Altamura, et al., 2001), (Agrawal and Doermann, 2010), (Antonacopoulos and Karatzas, 2004), and (Pletschacher and Antonacopoulos, 2010) all rely on robust document analysis to identify the structure and format of the document image, followed by an OCR step that converts the text within each segment into machine-readable text for encoding in XML.
