Discovering Spatio-Textual Association Rules in Document Images

Donato Malerba (Università degli Studi di Bari, Italy), Margherita Berardi (Università degli Studi di Bari, Italy) and Michelangelo Ceci (Università degli Studi di Bari, Italy)
Copyright: © 2008 | Pages: 22
DOI: 10.4018/978-1-59904-162-9.ch008

This chapter introduces a data mining method for discovering association rules in images of scanned paper documents. It argues that a document image is a multi-modal unit of analysis whose semantics are derived from a combination of its textual content, its layout structure, and its logical structure. Accordingly, it proposes a method that simultaneously considers three sources of information to generate interesting patterns: the spatial information derived from a complex document image analysis process (layout analysis), the information extracted from the logical structure of the document (document image classification and understanding), and the textual information extracted by means of OCR. The proposed method is based on an inductive logic programming approach, which is argued to be the most appropriate for analyzing data available in more than one modality. The chapter thus illustrates a possible evolution of the unimodal knowledge discovery scheme, in which the different types of data describing the units of analysis are handled by preprocessing techniques that transform them into a single double-entry table.

Complete Chapter List

Table of Contents
Pascal Poncelet, Maguelonne Teisseire, Florent Masseglia
Chapter 1
Dan A. Simovici
This chapter presents data mining techniques that make use of metrics defined on the set of partitions of finite sets. Partitions are naturally...
Metric Methods in Data Mining
Chapter 2
Osmar R. Zaïane, Mohammed El-Hajj
Frequent Itemset Mining (FIM) is a key component of many algorithms that extract patterns from transactional databases. For example, FIM can be...
Bi-Directional Constraint Pushing in Frequent Pattern Mining
Chapter 3
Hui Xiong, Pang-Ning Tan, Vipin Kumar, Wenjun Zhou
This chapter presents a framework for mining highly-correlated association patterns named hyperclique patterns. In this framework, an objective...
Mining Hyperclique Patterns: A Summary of Results
Chapter 4
Simona Ester Rombo, Luigi Palopoli
In recent years, the information stored in biological data sets has grown exponentially, and new methods and tools have been proposed to interpret...
Pattern Discovery in Biosequences: From Simple to Complex
Chapter 5
Gregor Leban, Minca Mramor, Blaž Zupan, Janez Demšar, Ivan Bratko
Data visualization plays a crucial role in data mining and knowledge discovery. Its use is, however, often difficult due to the large number of...
Finding Patterns in Class-Labeled Data Using Data Visualization
Chapter 6
Yeow Wei Choong, Anne Laurent, Dominique Laurent
In the context of multidimensional data, OLAP tools are appropriate for the navigation in the data, aiming at discovering pertinent and abstract...
Summarizing Data Cubes Using Blocks
Chapter 7
Yutaka Matsuo, Junichiro Mori, Mitsuru Ishizuka
This chapter describes social network mining from the Web. Since the end of the 1990s, several attempts have been made to mine social network...
Social Network Mining from the Web
Chapter 8
Donato Malerba, Margherita Berardi, Michelangelo Ceci
This chapter introduces a data mining method for the discovery of association rules from images of scanned paper documents. It argues that a...
Discovering Spatio-Textual Association Rules in Document Images
Chapter 9
Laurent Candillier, Ludovic Denoyer, Patrick Gallinari, Marie Christine Rousset, Alexandre Termier, Anne-Marie Vercoustre
XML documents are becoming ubiquitous because of their rich and flexible format that can be used for a variety of applications. Giving the...
Mining XML Documents
Chapter 10
Sascha Schulz, Myra Spiliopoulou, Rene Schult
We study the issue of discovering and tracing thematic topics in a stream of documents. This issue, often studied under the label “topic evolution”...
Topic and Cluster Evolution Over Noisy Document Streams
Chapter 11
Cyrille J. Joutard, Edoardo M. Airoldi, Stephen E. Fienberg, Tanzy M. Love
Statistical models involving a latent structure often support clustering, classification, and other data-mining tasks. Parameterizations...
Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice
About the Editors
About the Contributors