An Interpretation Process for Clustering Analysis Based on the Ontology of Language
Wagner Francisco Castilho (Federal University of Rio Grande do Sul (UFRGS), Brazil and Federal Savings Bank (CEF), Brazil), Gentil José de Lucena Filho (Catholic University of Brasília, Brazil), Hércules Antonio do Prado (Catholic University of Brasília, Brazil and Embrapa Food Technology, Brazil) and Edilson Ferneda (Catholic University of Brasília, Brazil)
Copyright: © 2008
Given a set of objects, clustering analysis (CA) techniques estimate dense regions of points separated by sparse regions, according to the dimensions that describe those objects. Regardless of the nature of the data, structured or unstructured, we look for homogeneous clouds of points that define clusters, from which we want to extract some meaning. In other words, when doing CA, the analyst searches a multidimensional space for underlying structures to which some meaning can be assigned. Broadly, a CA application involves two main activities: generating cluster configurations by means of an algorithm, and interpreting these configurations in order to approximate a solution that contributes to the objective of the CA application. Generating a cluster configuration is typically a computational task, while the interpretation task carries a heavy burden of subjectivity. The literature presents many approaches for generating cluster configurations. Unfortunately, the interpretation task has not received as much attention, possibly because of the difficulty of modeling something that is subjective in nature. This chapter proposes a method to guide the interpretation of a cluster configuration. The inherent subjectivity is approached directly by describing the process with the apparatus of the Ontology of Language. The main contribution of this chapter is a sound conceptual basis to guide the analyst in extracting meaning from the patterns found in a set of data, whether that data comes from databases, a set of free texts, or a set of web pages.
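To make the two activities concrete, the sketch below separates them in code. It is only an illustration: the chapter does not prescribe an algorithm, so plain k-means is assumed here as a stand-in for the generation step, and the sample points, the function names `kmeans` and `summarize`, and the choice of summary statistics are all hypothetical. The summary is raw material for the analyst; it does not perform the interpretation, which is the subjective part the chapter addresses.

```python
from statistics import mean


def kmeans(points, k, iterations=20):
    """Plain k-means (an assumed stand-in for 'an algorithm').

    Initial centroids are simply the first k points, an arbitrary,
    deterministic choice made to keep the sketch short.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids


def summarize(points, labels, k):
    """Per-cluster size and centroid: input for the analyst, not a meaning."""
    summary = {}
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        summary[c] = {
            "size": len(members),
            "centroid": tuple(mean(dim) for dim in zip(*members)) if members else None,
        }
    return summary


# Hypothetical data: two dense clouds separated by a sparse region.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.3)]

labels, centroids = kmeans(points, k=2)    # activity 1: generate a configuration
summary = summarize(points, labels, k=2)   # input to activity 2: interpretation

for cluster_id, info in summary.items():
    print(cluster_id, info)
```

The code stops where the chapter's concern begins: deciding what each cluster means for the application's objective is left to the analyst.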