XML Mining for Semantic Web

Rafael Berlanga, Victoria Nebot

Source Title: Data Mining: Concepts, Methodologies, Tools, and Applications

DOI: 10.4018/978-1-4666-2455-9.ch031

Abstract

This chapter describes the convergence of two influential technologies of the last decade, namely data mining (DM) and the Semantic Web (SW). The wide acceptance of new SW formats for describing semantics-aware and semistructured content has spurred the massive generation of semantic annotations and of large-scale domain ontologies that formalize the concepts of those domains. As a result, a huge amount of both knowledge and semantically annotated data is available on the Web. DM methods have been very successful in discovering interesting patterns hidden in very large amounts of data. However, DM methods have largely been based on simple, flat data formats that are far from those available in the SW. This chapter reviews and discusses the main DM approaches proposed so far to mine SW data, as well as those that have used SW resources and tools to define semantics-aware methods.

Chapter Preview

Introduction

XML (Bray, Paoli, Sperberg-McQueen, & Maler, 2000) has been used extensively to represent and publish semistructured data on the Web in both the academic and business communities, because it provides interoperability and a well-defined, extensible, machine-readable format. The widespread adoption of XML as the de facto standard has prompted the development of new techniques for XML management and knowledge discovery. Many research efforts have been directed towards mining the structure of XML documents as a way to integrate data sources based on structural similarity. As a step forward, content features borrowed from the text mining field have been introduced to enrich the XML mining process. However, the growing volume and heterogeneity of XML-based applications demand new analysis techniques that consider semantic features during knowledge discovery, so that more meaningful analyses can be performed.
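
To make structure-based comparison concrete, the following Python sketch compares two XML documents by the sets of root-to-element tag paths they contain, using Jaccard similarity. This is one simple measure chosen for illustration, not the method of any particular work reviewed in this chapter; the function names `tag_paths` and `structural_similarity` are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text: str) -> set[str]:
    """Return the set of root-to-element tag paths, e.g. 'book/author/name'."""
    root = ET.fromstring(xml_text)
    paths: set[str] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        # Extend the path with this element's tag, then recurse into children.
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def structural_similarity(doc_a: str, doc_b: str) -> float:
    """Jaccard similarity between the tag-path sets of two XML documents."""
    a, b = tag_paths(doc_a), tag_paths(doc_b)
    # Both sets contain at least the root path, so the union is never empty.
    return len(a & b) / len(a | b)
```

For example, `<book><title/><author><name/></author></book>` and `<book><title/><publisher/></book>` share 2 of 5 distinct paths, giving a similarity of 0.4. Path sets ignore sibling order and repetition; tree edit distance is a common, more expensive alternative.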

On the other hand, the Web of Data, as opposed to the classical Web of documents, is currently coming into existence through the Linked Data effort (Bizer, Heath, & Berners-Lee, 2009). The general idea is to extend the Web by creating typed entities and links between data resources in a way that is machine-readable and whose meaning (i.e., semantics) is explicitly defined. This new data model, whose representation formats rely on XML, opens a new range of challenges and opportunities in the data mining and knowledge discovery area.
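
Linked Data describes resources as subject-predicate-object triples, and RDF/XML is one XML-based serialization of that model. The sketch below, assuming only the Python standard library, extracts triples from a deliberately restricted RDF/XML subset: `rdf:Description` elements with `rdf:about`, whose children carry either an `rdf:resource` link or a text literal. The `http://example.org/` URIs and the function name are hypothetical; real RDF/XML allows many more forms, so a full RDF library such as rdflib would be needed in practice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdfxml_to_triples(xml_text):
    """Extract (subject, predicate, object) triples from a restricted RDF/XML subset."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree expands 'ex:treats' to '{namespace}treats';
            # the predicate URI is the namespace followed by the local name.
            namespace, local = prop.tag[1:].split("}", 1)
            predicate = namespace + local
            link = prop.get(f"{{{RDF_NS}}}resource")
            # Either a typed link to another resource or a plain text literal.
            obj = link if link is not None else (prop.text or "")
            triples.append((subject, predicate, obj))
    return triples


# Hypothetical example resource: a typed entity with one link and one literal.
EXAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/vocab#">
  <rdf:Description rdf:about="http://example.org/drug/aspirin">
    <rdf:type rdf:resource="http://example.org/vocab#Drug"/>
    <ex:treats rdf:resource="http://example.org/disease/headache"/>
    <ex:label>Aspirin</ex:label>
  </rdf:Description>
</rdf:RDF>"""

triples = rdfxml_to_triples(EXAMPLE)
```

The `rdf:type` triple states the entity's type, and the `ex:treats` triple is a typed link to another resource; together they form a graph rather than the flat table most DM algorithms expect.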

The aim of this chapter is to review the literature and discuss how semantic features have been incorporated into, and handled during, the mining of complex structured and semistructured data. From the data viewpoint, we provide a state-of-the-art review of approaches focused on mining both complex semistructured data (i.e., XML sources) and SW data. We regard SW data as comprising both formal knowledge resources created with clear, well-defined semantics (e.g., an ontology conceptualizing human anatomy) and structured, semistructured, or unstructured data that has been enriched with semantics a posteriori through semantic annotation (i.e., linked to a semantic knowledge resource, as advocated by the Linked Data effort).
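
Semantic annotation, in its simplest form, links fragments of existing data to concepts of a knowledge resource. The sketch below assumes a dictionary-based annotator: the text of each XML element is matched against concept labels with whole-word regular expressions. The lexicon and the `anat:` concept identifiers are hypothetical placeholders for an anatomy ontology, not identifiers from any real ontology; practical annotators also handle synonyms, morphology, and ambiguity.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical label -> concept lexicon standing in for an anatomy ontology.
LEXICON = {
    "left ventricle": "anat:LeftVentricle",
    "heart": "anat:Heart",
    "aorta": "anat:Aorta",
}


def annotate(xml_text, lexicon=LEXICON):
    """Return (element tag, concept id) pairs for every label found in element text."""
    root = ET.fromstring(xml_text)
    annotations = []
    for elem in root.iter():
        # Only the element's own leading text is inspected (not its tail).
        text = (elem.text or "").lower()
        for label, concept in lexicon.items():
            # Whole-word match so that, e.g., 'heart' does not match 'heartburn'.
            if re.search(rf"\b{re.escape(label)}\b", text):
                annotations.append((elem.tag, concept))
    return annotations
```

Applied to `<report><finding>Hypertrophy of the left ventricle</finding><note>Aorta normal</note></report>`, the sketch links the `finding` element to `anat:LeftVentricle` and the `note` element to `anat:Aorta`, turning plain semistructured data into semantically annotated data.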

We believe that integrating heterogeneous data sources into a common semantic formalism, such as OWL-DL, is a great asset for enhancing the knowledge discovery process. We discuss the benefits provided by ontologies and knowledge representation formalisms (e.g., OWL-DL) and argue that semantics should be taken into account throughout the whole mining process.
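
Once data is expressed in a common formalism, a reasoner can derive facts that were never stated explicitly. OWL-DL reasoning is far richer than anything that fits here; the sketch below shows only the simplest such inference, the transitive closure of subclass axioms (as with `rdfs:subClassOf`), over a hypothetical drug taxonomy. The class names and function names are illustrative assumptions.

```python
from collections import deque

# Hypothetical toy taxonomy: class -> set of direct superclasses.
SUBCLASS_OF = {
    "Aspirin": {"NSAID"},
    "Ibuprofen": {"NSAID"},
    "NSAID": {"AntiInflammatoryDrug"},
    "AntiInflammatoryDrug": {"Drug"},
    "Paracetamol": {"Analgesic"},
    "Analgesic": {"Drug"},
}


def superclasses(cls, taxonomy=SUBCLASS_OF):
    """All transitive superclasses of cls, found by breadth-first search."""
    seen = set()
    queue = deque(taxonomy.get(cls, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(taxonomy.get(parent, ()))
    return seen


def inferred_types(asserted, taxonomy=SUBCLASS_OF):
    """Asserted types plus every type entailed by subclass transitivity."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls, taxonomy)
    return types
```

An individual asserted only as `Aspirin` is thereby also an `NSAID`, an `AntiInflammatoryDrug`, and a `Drug`, so any later mining step that works at the `Drug` level will count it.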

Semantics-aware mining is a young research field. This chapter shows how well-known statistics-based techniques from artificial intelligence (e.g., clustering and association rules) can benefit from information inferred by the logic-based approaches followed in the Semantic Web. We provide a state-of-the-art review structured according to the mining phase in which semantics is incorporated.
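
As an illustration of how inferred information can help a statistics-based technique, the sketch below extends each transaction with the ancestors of its items before counting itemset support, in the spirit of generalized association rule mining. The taxonomy, the prescription data, and the function names are hypothetical, and the pair enumeration is naive (no Apriori pruning).

```python
from itertools import combinations

# Hypothetical single-parent taxonomy: item -> parent class.
TAXONOMY = {
    "Aspirin": "NSAID",
    "Ibuprofen": "NSAID",
    "Omeprazole": "ProtonPumpInhibitor",
    "Paracetamol": "Analgesic",
}


def extend(transaction, taxonomy=TAXONOMY):
    """Return the transaction plus every ancestor of each of its items."""
    items = set(transaction)
    for item in transaction:
        while item in taxonomy:
            item = taxonomy[item]
            items.add(item)
    return items


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_pairs(transactions, min_support):
    """Naively enumerate 2-itemsets whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result
```

On the raw transactions `[{"Aspirin", "Omeprazole"}, {"Ibuprofen", "Omeprazole"}, {"Paracetamol"}, {"Ibuprofen"}]`, no pair reaches a minimum support of 0.5. After extension, `("NSAID", "ProtonPumpInhibitor")` reaches 0.5, a pattern visible only at the class level. The output also contains trivial pairs of an item and its own ancestor, such as `("Ibuprofen", "NSAID")`, which a real generalized miner would prune.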

The chapter is organized as follows. First, we introduce the motivation for integrating knowledge resources and data mining algorithms. Next, we introduce the Semantic Web scenario, which serves as the technological platform for all the semantics-aware mining methods. Building on this scenario, we organize and discuss the existing literature according to the mining phase in which semantics is incorporated. Finally, we outline future trends and present our conclusions.
