Clustering Similar Schema Elements Across Heterogeneous Databases: A First Step in Database Integration

Huimin Zhao, Sudha Ram

Source Title: Advanced Topics in Database Research, Volume 5

ISBN13: 9781591409359 | ISBN10: 1591409357 | ISBN13 Softcover: 9781591409366 | EISBN13: 9781591409373

DOI: 10.4018/978-1-59140-935-9.ch013

MLA

Zhao, Huimin, and Sudha Ram. "Clustering Similar Schema Elements Across Heterogeneous Databases: A First Step in Database Integration." Advanced Topics in Database Research, Volume 5, edited by Keng Siau, IGI Global, 2006, pp. 227-248. https://doi.org/10.4018/978-1-59140-935-9.ch013

APA

Zhao, H. & Ram, S. (2006). Clustering Similar Schema Elements Across Heterogeneous Databases: A First Step in Database Integration. In K. Siau (Ed.), Advanced Topics in Database Research, Volume 5 (pp. 227-248). IGI Global. https://doi.org/10.4018/978-1-59140-935-9.ch013

Chicago

Zhao, Huimin, and Sudha Ram. "Clustering Similar Schema Elements Across Heterogeneous Databases: A First Step in Database Integration." In Advanced Topics in Database Research, Volume 5, edited by Keng Siau, 227-248. Hershey, PA: IGI Global, 2006. https://doi.org/10.4018/978-1-59140-935-9.ch013

Abstract

Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important first step in integrating the data sources. This chapter proposes a cluster analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple clustering techniques, including K-means, hierarchical clustering, and self-organizing map (SOM) neural network, to identify similar schema elements from heterogeneous data sources, based on multiple types of features, such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. We describe an SOM prototype we have developed that provides users with a visualization tool for displaying clustering results and for incremental evaluation of potentially similar elements. We also report on some empirical results demonstrating the utility of the proposed approach.
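
As a rough illustration of the idea in the abstract, the sketch below groups schema element names from two data sources by naming similarity alone, using a simple single-linkage hierarchical clustering with a similarity cutoff. It is not the chapter's implementation: the attribute names, the `name_similarity` and `cluster_elements` helpers, and the 0.6 threshold are all hypothetical, and the string measure is Python's standard `difflib.SequenceMatcher` rather than the features the authors describe (document similarity, schema specification, data patterns, and usage patterns are left out).

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] between two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_elements(names, threshold=0.6):
    """Single-linkage agglomerative clustering over element names.

    Two clusters are merged whenever their most similar cross-cluster pair
    reaches the threshold; merging repeats until no such pair remains.
    The threshold is an illustrative assumption, not a recommended value.
    """
    clusters = [[name] for name in names]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: the cluster-to-cluster score is the best pair.
            link = max(
                name_similarity(a, b) for a in clusters[i] for b in clusters[j]
            )
            if link >= threshold:
                clusters[i] = clusters[i] + clusters[j]
                del clusters[j]
                merged = True
                break  # cluster indices changed, so restart the scan
    return clusters


# Hypothetical attribute names drawn from two heterogeneous schemas.
elements = ["customer_name", "order_date", "cust_name", "ord_date", "phone"]
for group in cluster_elements(elements):
    print(group)
```

With these sample names, the two name variants of each attribute fall into the same group and `phone` stays on its own. A group produced this way is only a candidate set for a human analyst to review, which matches the semi-automated role the abstract assigns to clustering.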
