XML Mining for Semantic Web

Rafael Berlanga (Universitat Jaume I, Spain) and Victoria Nebot (Universitat Jaume I, Spain)
Copyright: © 2013 | Pages: 25
DOI: 10.4018/978-1-4666-2455-9.ch031


This chapter describes the convergence of two influential technologies of the last decade: data mining (DM) and the Semantic Web (SW). The wide acceptance of new SW formats for describing semantics-aware and semistructured content has spurred the massive generation of semantic annotations and of large-scale domain ontologies that define the concepts these annotations refer to. As a result, a huge amount of both knowledge and semantically annotated data is available on the Web. DM methods have been very successful in discovering interesting patterns hidden in very large amounts of data. However, they have largely been based on simple, flat data formats that are far from those available in the SW. This chapter reviews and discusses the main DM approaches proposed so far to mine SW data, as well as those that have used SW resources and tools to define semantics-aware methods.
Chapter Preview


XML (Bray, Paoli, Sperberg-McQueen, & Maler, 2000) has been used extensively to represent and publish semistructured data across the Web, in both the academic and business communities, because it provides interoperability and a well-defined, extensible, and machine-readable format. The widespread adoption of XML as the de facto standard has prompted the development of new techniques that address the problems of XML management and knowledge discovery. Many research efforts have been directed towards mining the structure of XML documents as a way to integrate data sources based on structural similarity. As a step forward, content features borrowed from the text mining field have been introduced to enrich the XML mining process. However, the growing volume and heterogeneity of XML-based applications demand new analysis techniques that consider semantic features during knowledge discovery, so that more meaningful analyses can be performed.
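
As a minimal illustration of what structure-based comparison can look like, the sketch below represents each XML document by its set of root-to-node tag paths and compares two documents with the Jaccard coefficient. This is only one simple choice of structural similarity, not a method taken from the chapter; the function names `tag_paths` and `structure_similarity` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths found in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        # Extend the path with the current tag and record it.
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def structure_similarity(xml_a, xml_b):
    """Jaccard similarity of the tag-path sets of two documents (content is ignored)."""
    a, b = tag_paths(xml_a), tag_paths(xml_b)
    # The union is never empty: every well-formed document has a root element.
    return len(a & b) / len(a | b)


# Hypothetical sample documents, for illustration only.
doc_a = "<paper><title>A</title><author><name>X</name></author></paper>"
doc_b = ("<paper><title>B</title><author><name>Y</name>"
         "<email>y@example.org</email></author></paper>")
doc_c = "<order><item><price>3</price></item></order>"

print(structure_similarity(doc_a, doc_b))  # share 4 of 5 distinct paths
print(structure_similarity(doc_a, doc_c))  # no paths in common
```

Because only tag paths are compared, two documents with identical markup but unrelated text content score 1.0, which is precisely the limitation that content-based and semantic features aim to address.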

Meanwhile, the Web of Data is coming into existence, in contrast to the classical Web of documents, through the Linked Data effort (Bizer, Heath, & Berners-Lee, 2009). The general idea is to extend the Web by creating typed entities and links between data resources in a way that is machine-readable and makes their meaning (i.e., semantics) explicit. This new data model, whose representation formats rely on XML, opens a new range of challenges and opportunities in data mining and knowledge discovery.
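
To make the idea of typed entities and typed links concrete, the following sketch stores a few subject-predicate-object statements as plain Python tuples and looks up the links of a resource. It assumes nothing beyond the standard library; the `http://example.org/` resources and the `hasAuthor` property are hypothetical, and only the `rdf:type` URI is a real W3C identifier.

```python
# Real W3C identifier used to state the type of a resource.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace, for illustration only.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "doc1", RDF_TYPE, EX + "Document"),
    (EX + "person1", RDF_TYPE, EX + "Person"),
    (EX + "doc1", EX + "hasAuthor", EX + "person1"),  # typed link between resources
}


def objects(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


print(objects(EX + "doc1", RDF_TYPE))          # the declared type of doc1
print(objects(EX + "doc1", EX + "hasAuthor"))  # resources linked as authors
```

Since both the entity types and the link types are themselves identifiers, a program can follow and interpret them without parsing free text, which is what distinguishes this model from the Web of documents.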

The aim of this chapter is to review the literature and discuss how semantic features have been incorporated and handled when mining complex structured and semistructured data. From the data viewpoint, we provide a state-of-the-art review of approaches for mining both complex semistructured data (i.e., XML sources) and SW data. We take SW data to include both formal knowledge resources created with clear and well-defined semantics (e.g., an ontology conceptualizing the human anatomy) and structured, semistructured, or unstructured data that have been enriched with semantics a posteriori (i.e., linked to a semantic knowledge resource, as advocated by the Linked Data effort) through the process of semantic annotation.

We believe that integrating heterogeneous data sources into a common semantic formalism, such as OWL-DL, is a great asset for enhancing the knowledge discovery process. We discuss the benefits provided by ontologies and knowledge representation formalisms (e.g., OWL-DL) and argue that semantics should be taken into account throughout the whole mining process.

Semantics-aware mining is a very young field of research. This chapter also aims to show how well-known statistics-based techniques from artificial intelligence (e.g., clustering and association rules) can benefit from the inferred information produced by the logic-based approaches followed in the Semantic Web. We provide a state-of-the-art review structured according to the mining phase in which semantics is incorporated.

The chapter is organized as follows. First, we present the motivation for integrating knowledge resources and data mining algorithms. Next, we introduce the Semantic Web scenario, which serves as the technological platform for all the semantics-aware mining methods. With this scenario in mind, we organize and discuss the existing literature according to the mining phase in which semantics is incorporated. Finally, we outline some future trends and conclusions.
