Exploring Similarities Across High-Dimensional Datasets

Karlton Sequeira (Rensselaer Polytechnic Institute, USA) and Mohammed J. Zaki (Rensselaer Polytechnic Institute, USA)
Copyright: © 2007 | Pages: 32
DOI: 10.4018/978-1-59904-271-8.ch003


Related data are often collected by several sources that cannot share their entire datasets for reasons such as confidentiality agreements or dataset size. However, these sources may be willing to share a condensed model of their datasets. If some substructures of the condensed models of such datasets, from different sources, are found to be unusually similar, policies successfully applied to one may be successfully applied to the others. In this chapter, we propose a framework for constructing condensed models of datasets, together with algorithms that find similar substructures in pairs of such models. The algorithms are based on the tensor product. We test our framework on pairs of synthetic datasets and compare our algorithms with an existing one. Finally, we apply the framework to basketball player statistics for two National Basketball Association (NBA) seasons, and to breast cancer datasets. The results are statistically more interesting than those obtained from independent analysis of the datasets.
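
To illustrate the tensor product mentioned above, the following is a minimal sketch and not the chapter's algorithm. It assumes, purely for illustration, that each condensed model can be written as a small symmetric weight matrix. The matrices `A` and `B` and the helpers `tensor_product` and `pair_weight` are hypothetical names introduced here. The sketch shows only that each entry of the product couples one pair of elements (one from each model) with another such pair.

```python
import numpy as np

# Hypothetical condensed models (an assumption for illustration):
# symmetric weight matrices over 2 and 3 model elements respectively.
A = np.array([[0.0, 0.9],
              [0.9, 0.0]])
B = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.2],
              [0.1, 0.2, 0.0]])


def tensor_product(a, b):
    """Return the Kronecker (tensor) product of two matrices."""
    return np.kron(a, b)


def pair_weight(product, n_b, i, k, j, l):
    """Look up the entry linking pair (i, k) to pair (j, l).

    i, j index elements of the first model; k, l index elements of the
    second model; n_b is the number of elements in the second model.
    The entry equals a[i, j] * b[k, l].
    """
    return product[i * n_b + k, j * n_b + l]


P = tensor_product(A, B)
# Entry coupling (A-element 0, B-element 0) with (A-element 1, B-element 1).
w = pair_weight(P, B.shape[0], 0, 0, 1, 1)
print(P.shape, w)
```

In this sketch, a large entry of `P` means that the corresponding relationships are strong in both models at once. This is one intuition for why a tensor product can be useful when looking for similar substructure, though the chapter's own formulation may differ.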
