Clustering Schema Elements for Semantic Integration of Heterogeneous Data Sources

Clustering Schema Elements for Semantic Integration of Heterogeneous Data Sources

Huimin Zhao (University of Wisconsin - Milwaukee, USA) and Sudha Ram (University of Arizona, USA)
Copyright: © 2004 |Pages: 18
DOI: 10.4018/jdm.2004100105
OnDemand PDF Download:
$37.50

Abstract

Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important step in integrating the data sources. This article proposes a cluster analysis based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. The authors apply multiple clustering techniques, including K-means, hierarchical clustering, and self-organizing map (SOM) neural network, to identify similar schema elements from heterogeneous data sources, based on a combination of features such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. An SOM prototype the authors have developed provides users with a visualization tool for display of clustering results as well as for incremental evaluation of candidate similar elements.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 28: 4 Issues (2017): Forthcoming, Available for Pre-Order
Volume 27: 4 Issues (2016): 3 Released, 1 Forthcoming
Volume 26: 4 Issues (2015)
Volume 25: 4 Issues (2014)
Volume 24: 4 Issues (2013)
Volume 23: 4 Issues (2012)
Volume 22: 4 Issues (2011)
Volume 21: 4 Issues (2010)
Volume 20: 4 Issues (2009)
Volume 19: 4 Issues (2008)
Volume 18: 4 Issues (2007)
Volume 17: 4 Issues (2006)
Volume 16: 4 Issues (2005)
Volume 15: 4 Issues (2004)
Volume 14: 4 Issues (2003)
Volume 13: 4 Issues (2002)
Volume 12: 4 Issues (2001)
Volume 11: 4 Issues (2000)
Volume 10: 4 Issues (1999)
Volume 9: 4 Issues (1998)
Volume 8: 4 Issues (1997)
Volume 7: 4 Issues (1996)
Volume 6: 4 Issues (1995)
Volume 5: 4 Issues (1994)
Volume 4: 4 Issues (1993)
Volume 3: 4 Issues (1992)
Volume 2: 4 Issues (1991)
Volume 1: 2 Issues (1990)
View Complete Journal Contents Listing