On the Usage of Structural Information in Constrained Semi-Supervised Clustering of XML Documents


Eduardo Bezerra (CEFET/RJ, Federal Center of Technological Education CSF, Brazil), Geraldo Xexéo (Programa de Sistemas, COPPE, UFRJ, Institute of Mathematics, UFRJ, Brazil) and Marta Mattoso (Programa de Sistemas, COPPE/UFRJ, Brazil)
Copyright: © 2008 | Pages: 20
DOI: 10.4018/978-1-59904-645-7.ch004

Abstract

In this chapter, we consider the problem of constrained clustering of documents. We focus on documents that present some form of structural information and for which prior knowledge is provided; this additional information can guide the algorithm toward a better clustering model. Specifically, the information to be clustered consists of textual documents whose logical structure is represented in XML format. Based on this, we present algorithms that take advantage of XML metadata (structural information) to improve the quality of the generated clustering models. This chapter also addresses the problem of inconsistent constraints and defines algorithms that eliminate such inconsistencies, again relying on the structural information associated with the XML document collection.
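To make "inconsistent constraints" concrete, assume (as is common in constrained clustering, though the abstract does not say so) that prior knowledge arrives as pairwise must-link and cannot-link constraints between documents. Must-links are transitive, so a cannot-link between two documents joined by a chain of must-links can never be satisfied. The minimal sketch below detects such conflicts with a union-find structure. It is a generic illustration only, not the chapter's algorithm, which additionally uses the XML structural information; the names DisjointSet and inconsistent_cannot_links are hypothetical.

```python
from typing import Iterable


class DisjointSet:
    """Union-find over document ids, used to group must-linked documents."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def inconsistent_cannot_links(
    must_links: Iterable[tuple[str, str]],
    cannot_links: Iterable[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return the cannot-link pairs whose documents are already joined,
    directly or transitively, by must-link constraints (hypothetical helper)."""
    ds = DisjointSet()
    for a, b in must_links:
        ds.union(a, b)
    return [(a, b) for a, b in cannot_links if ds.find(a) == ds.find(b)]


if __name__ == "__main__":
    ml = [("d1", "d2"), ("d2", "d3")]  # d1, d2, d3 must share a cluster
    cl = [("d1", "d3"), ("d1", "d4")]  # d1-d3 contradicts the chain above
    print(inconsistent_cannot_links(ml, cl))  # [('d1', 'd3')]
```

Union-find is used here because it computes the transitive closure of the must-link constraints in near-linear time; deciding which of the conflicting constraints to discard is a separate step that this sketch leaves open.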
