An XML-Based Database for Knowledge Discovery: Definition and Implementation

Rosa Meo (Università di Torino, Italy) and Giuseppe Psaila (Università di Bergamo, Italy)
Copyright: © 2009 | Pages: 17
DOI: 10.4018/978-1-60566-098-1.ch017


Inductive databases have been proposed as general-purpose databases to support the KDD process. Unfortunately, the heterogeneity of the discovered patterns, and of the different conceptual tools used to extract them from source data, makes their integration in a single framework difficult. In this chapter, we explore the feasibility of using XML as the unifying framework for inductive databases, and propose a new model, XML for data mining (XDM). We show the basic features of the model, which is based on the concepts of data item (source data and patterns) and statement (used to manage data and derive patterns). We make use of XML namespaces (to allow the effective coexistence and extensibility of data mining operators) and of XML Schema, by means of which we can define the schema, the state and the integrity constraints of an inductive database.
Chapter Preview


For centuries, humans have been feeling, recording and studying earthquake phenomena. Given that, worldwide, at least one earthquake of magnitude M < 3 occurs every second and at least one of magnitude M > 3 occurs every ten minutes, the collection of seismic data is huge and rapidly increasing. Scientists record this information in order to describe and study tectonic activity, which is characterized by attributes such as geographic information (epicenter location and disaster areas), time of event, magnitude, depth, and so forth.

On the other hand, computer engineers specialized in the area of Information & Knowledge Management find in this collection an invaluable “data treasure”, which they can process and analyze to help discover knowledge. Recently, a number of applications for the management and analysis of seismological or, in general, geophysical data have been proposed in the literature by Andrienko and Andrienko (1999), Kretschmer and Roccatagliata (2000), Theodoridis (2003), and Yu (2005). In general, the collaboration between the data mining community and physical scientists was launched only recently (Behnke & Dobinson, 2000).

Desirable components of a so-called seismic data management and mining system (SDMMS) include tools for quick and easy data exploration and inspection; algorithms for generating historic profiles of specific geographic areas and time periods; techniques for associating seismic data with other geophysical parameters of interest, such as geological morphology; and top-line visualization components that use geographic and other thematic-oriented (e.g., topological and climatic) maps to present data to the user and to support sophisticated user interaction.

In summary, we classify the users that an SDMMS should support into three profiles:

  • Researchers of geophysical sciences, interested in constructing and visualizing seismic profiles of certain regions during specific time periods or in discovering regions of similar seismic behavior.

  • Public administration officers, requesting information such as distances between epicenters and other demographic entities (schools, hospitals, heavy industries, etc.).

  • Citizens (“Web surfers”), searching for seismic activity and thus querying the system for seismic properties of general interest, for example, finding all epicenters of earthquakes within 50 km of their favorite place.
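
The citizen query in the last profile (all epicenters within 50 km of a chosen place) can be illustrated with a minimal sketch. It assumes events are held in memory as dictionaries with hypothetical `lat` and `lon` keys in decimal degrees, and it uses the haversine great-circle formula on a spherical Earth; neither the data layout nor the distance formula is prescribed by the text, and a real SDMMS would more likely run such a query against a spatial index in its database.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def epicenters_within(events, lat, lon, radius_km=50.0):
    """Return the events whose epicenter lies within radius_km of (lat, lon).

    `events` is an iterable of dicts with 'lat' and 'lon' keys
    (a hypothetical layout chosen for this illustration).
    """
    return [e for e in events
            if haversine_km(lat, lon, e["lat"], e["lon"]) <= radius_km]


# Illustrative, made-up events (not real catalogue data).
sample_events = [
    {"id": "near", "lat": 38.10, "lon": 23.80, "magnitude": 3.4},
    {"id": "far", "lat": 40.64, "lon": 22.94, "magnitude": 4.1},
]
nearby = epicenters_within(sample_events, lat=37.98, lon=23.73)
```

A linear scan like this is adequate for small result sets; for a large catalogue the same predicate would normally be pushed into the database layer.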

The availability of systems following the proposed SDMMS architecture provides users with a wealth of information about earthquakes, assisting awareness and understanding, two critical factors for decision making at either the individual or the administrative level.

The rest of the article is organized as follows. Initially, we sketch a desired SDMMS architecture, including its database and data warehouse design. The section that follows presents the querying, online analytical processing (OLAP) and data mining functionality an SDMMS could offer, with emphasis on the support of decision making. Furthermore, we survey and compare proposed systems and tools found in the literature for the management of seismological or, in general, earth science data. Conclusions are drawn in the last section.


The Architecture Of A Seismic Data Management And Mining System

Earthquake phenomena are instantly recorded by a number of organizations (e.g., Institutes of Geodynamics and Schools of Physics) worldwide. The architecture of an SDMMS might allow for the integration of several remote sources. The aim is to collect and analyze the most accurate seismic data available across the different sources. Obviously, some sources provide data about the same earthquakes, though with slight differences in their details (e.g., the magnitude or the exact timestamp of the recorded earthquake). An SDMMS should be able to integrate the remote sources properly by refining and homogenizing the raw data.
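
To make the homogenization step concrete, the following is a minimal sketch of one possible approach, not the method of any particular system. It assumes that two reports describe the same earthquake when their timestamps and epicenters differ by less than fixed tolerances, and it merges each group by averaging. The `Report` type, the tolerance values and the averaging rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Report:
    """One earthquake report from one remote source (hypothetical layout)."""
    source: str
    time_s: float      # event time, seconds since the epoch
    lat: float         # epicenter latitude, decimal degrees
    lon: float         # epicenter longitude, decimal degrees
    magnitude: float


def same_event(a, b, max_dt_s=10.0, max_deg=0.5):
    """Heuristic: reports match if close in time and in epicenter position.

    The tolerances are arbitrary illustrative values.
    """
    return (abs(a.time_s - b.time_s) <= max_dt_s
            and abs(a.lat - b.lat) <= max_deg
            and abs(a.lon - b.lon) <= max_deg)


def group_reports(reports):
    """Greedily group reports, comparing each one with a group's first report."""
    groups = []
    for report in sorted(reports, key=lambda r: r.time_s):
        for group in groups:
            if same_event(group[0], report):
                group.append(report)
                break
        else:
            groups.append([report])
    return groups


def homogenize(group):
    """Merge one group into a single record by averaging (one possible rule)."""
    return Report(
        source="+".join(sorted({r.source for r in group})),
        time_s=min(r.time_s for r in group),
        lat=mean(r.lat for r in group),
        lon=mean(r.lon for r in group),
        magnitude=mean(r.magnitude for r in group),
    )


# Illustrative, made-up reports (not real catalogue data).
raw = [
    Report("A", 1000.0, 38.00, 23.70, 5.0),
    Report("B", 1003.0, 38.10, 23.80, 5.2),
    Report("A", 9000.0, 35.00, 25.00, 4.0),
]
merged = [homogenize(g) for g in group_reports(raw)]
```

Averaging is only one choice; a system could instead prefer the report from the source it considers most accurate for a given region, which fits the stated aim of keeping the most accurate data among sources.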
