Organizing XML Documents on a Peer–to–Peer Network by Collaborative Clustering

Francesco Gullo, Giovanni Ponti, Sergio Greco

Source Title: XML Data Mining: Models, Methods, and Applications

DOI: 10.4018/978-1-61350-356-0.ch018

Abstract

In this chapter we address the problem of clustering XML documents in a collaborative distributed environment. We developed a clustering framework for XML sources distributed on a P2P network. XML documents are modeled based on a transactional representation which uses both XML structure and content information. The clustering method employs a centroid-based partitional scheme suitably adapted to work on a P2P network. Each peer is enabled to compute a clustering solution over its local repository and to exchange the resulting cluster representatives with the other peers. The exchanged cluster representatives are hence used to compute the global clustering solution in a collaborative way. Effectiveness and efficiency of the framework were evaluated on real XML document collections varying the number of peers. Experimental results have shown significant improvements of our collaborative distributed algorithm with respect to the centralized clustering setting in terms of execution time, achieving clustering solutions that still remain accurate with a moderately low number of nodes in the network.
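The collaborative scheme summarized above (local clustering on each peer, exchange of cluster representatives, collaborative computation of the global solution) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the chapter: documents are plain numeric vectors rather than XML transactions, a representative is a (local centroid, local cluster size) pair, and the global centroids are obtained by a size-weighted average. All function names are hypothetical, and the chapter's actual algorithm may differ.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
# Assumed form of a cluster representative exchanged between peers:
# (local centroid, number of local points it summarizes).
Representative = Tuple[Vector, int]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_representatives(
    points: Sequence[Vector], centroids: Sequence[Vector]
) -> List[Representative]:
    """Cluster one peer's local repository against the current global centroids."""
    groups: List[List[Vector]] = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        groups[j].append(p)
    reps: List[Representative] = []
    for group, old in zip(groups, centroids):
        if group:
            mean = tuple(sum(p[d] for p in group) / len(group) for d in range(len(old)))
            reps.append((mean, len(group)))
        else:
            # An empty local cluster contributes nothing to the merge.
            reps.append((old, 0))
    return reps


def merge_representatives(
    exchanged: Sequence[Sequence[Representative]], centroids: Sequence[Vector]
) -> List[Vector]:
    """Combine the representatives received from all peers into global centroids."""
    merged: List[Vector] = []
    for j, old in enumerate(centroids):
        total = sum(peer[j][1] for peer in exchanged)
        if total == 0:
            merged.append(old)  # no peer holds points for this cluster
            continue
        merged.append(tuple(
            sum(peer[j][0][d] * peer[j][1] for peer in exchanged) / total
            for d in range(len(old))
        ))
    return merged


def collaborative_clustering(
    peers: Sequence[Sequence[Vector]],
    initial_centroids: Sequence[Vector],
    rounds: int = 5,
) -> List[Vector]:
    """Alternate local clustering and representative exchange for a fixed number of rounds."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(rounds):
        exchanged = [local_representatives(points, centroids) for points in peers]
        centroids = merge_representatives(exchanged, centroids)
    return centroids
```

In this simplified setting only the representatives travel over the network, never the documents themselves, and the size-weighted merge yields the same centroids a centralized run over the union of all repositories would produce in each round.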

Chapter Preview

Introduction

The extensibility of its markup functionalities, along with its natural capability of representing complex real-world objects and their relationships, is key to the success of XML in enabling the development of domain-specific markup languages. This has had a strong impact on the role of XML on the Internet, where XML languages have been developed for a wide variety of application domains, ranging from multimedia and networking to Web content syndication and rendering, and from scientific data representation and literature to business processes.

In recent years, the use of XML for data representation and exchange has become central in high-demand environments. On the one hand, the growing availability of large XML document repositories has raised the need for fast and accurate organization of such data. In this respect, research on XML document clustering has produced a variety of approaches and methods, with different focuses on aspects such as the structure and/or content type of XML features, the XML data representation and summarization model, the XML similarity measures, and clustering strategies able to meet special requirements such as dealing with large document collections and high dimensionality, ease of browsing, and meaningfulness of cluster descriptions (Candillier, Tellier, & Torre, 2005; Denoyer & Gallinari, 2008; Doucet & Lehtonen, 2006; Kutty, Tran, Nayak, & Li, 2008; Lian, Cheung, Mamoulis, & Yiu, 2004; Nayak & Xu, 2006; Tran, Nayak, & Bruza, 2008; Tagarelli & Greco, 2010).

On the other hand, the inherently distributed nature of XML repositories also calls for adequate distributed processing techniques that can aid the efficient management and mining of XML data. As an example, consider Web news services that must very frequently gather up-to-date information from thousands of news sources: if such services aim to highlight (new) hot topics through the news channels or provide users with a (personalized) view of the news headlines, they might be required to apply clustering algorithms to the news articles every few minutes.

Clustering XML documents in such high-demand environments is hence challenging, as the algorithms must be able to meet tight requirements on both processing power and space resources. Existing methods for clustering XML documents are instead designed as centralized systems, which is mainly due to the difficulty of decentralizing most clustering strategies and, additionally, to a number of issues arising in the development of a convenient yet effective summarization of both structure and content information in XML documents.

XML distributed applications are increasingly in demand in several domains, such as software and multimedia sharing, product rating, personal profiling, and many others. Much of the credit for this extensive use of XML in distributed applications goes to the popularity of peer–to–peer (P2P) networks. A P2P network is a distributed system with the following main properties (Rodrigues & Druschel, 2010). It has a high degree of decentralization, since processing power, bandwidth, and space resources are contributed by the peers, which implement both the client and the server functionalities of the system. In general, there can be a high heterogeneity of resources in terms of hardware and software architecture, power supply, and geographic location. A P2P network is mostly self-organizing, since any newly introduced peer node requires little or no manual configuration for system maintenance. Multiple administrative domains usually characterize the system, as the peers are not owned or controlled by a single organization. The deployment costs of a P2P system are typically lower than those of client-server systems, thanks to its independence from dedicated infrastructure, and upgrading the system components is easier. Because there are few if any peers with centralized state, a P2P system is also more resilient to faults and attacks.
