Exploring Similarities Across High-Dimensional Datasets

Karlton Sequeira (Rensselaer Polytechnic Institute, USA) and Mohammed J. Zaki (Rensselaer Polytechnic Institute, USA)
Copyright: © 2007 | Pages: 32
DOI: 10.4018/978-1-59904-271-8.ch003

Related data are often collected by several sources that cannot share their entire datasets, for reasons such as confidentiality agreements or dataset size. These sources may, however, be willing to share condensed models of their datasets. If some substructures of the condensed models from different sources turn out to be unusually similar, policies that worked for one source may also work for the others. In this chapter, we propose a framework for constructing condensed models of datasets, together with algorithms, based on the tensor product, for finding similar substructures in pairs of such models. We test our framework on pairs of synthetic datasets and compare our algorithms with an existing one. Finally, we apply it to basketball player statistics from two National Basketball Association (NBA) seasons and to breast cancer datasets. The results are statistically more interesting than those obtained by analyzing each dataset independently.
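The abstract says only that the matching algorithms are based on the tensor product; their details are not reproduced on this page. As a hedged illustration of the general idea, and not the authors' method, the sketch below assumes each condensed model has already been turned into a small directed graph (an assumption) and scores node-to-node similarity with the well-known iteration of Blondel et al., S ← (B S Aᵀ + Bᵀ S A), normalized at each step. That update can be rewritten as power iteration on vec(S) with the Kronecker (tensor) product matrix A ⊗ B + Aᵀ ⊗ Bᵀ, which is what the code does. The helper names `kron`, `transpose`, `mat_add`, and `node_similarity` are hypothetical.

```python
import math


def kron(X, Y):
    """Kronecker (tensor) product of two dense matrices stored as lists of lists."""
    return [
        [X[i][j] * Y[k][l] for j in range(len(X[0])) for l in range(len(Y[0]))]
        for i in range(len(X))
        for k in range(len(Y))
    ]


def transpose(X):
    return [list(row) for row in zip(*X)]


def mat_add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def node_similarity(A, B, iterations=20):
    """Hypothetical helper: similarity between nodes of graph B (rows) and
    graph A (columns), via power iteration on the tensor product graph.

    Uses vec(B S A^T + B^T S A) = (A kron B + A^T kron B^T) vec(S),
    where vec stacks the columns of S.
    """
    n_a, n_b = len(A), len(B)
    M = mat_add(kron(A, B), kron(transpose(A), transpose(B)))
    s = [1.0] * (n_a * n_b)  # S_0: every node pair starts equally similar
    for _ in range(iterations):  # keep this even: Blondel et al. use the even iterates
        s = [sum(m * x for m, x in zip(row, s)) for row in M]
        norm = math.sqrt(sum(x * x for x in s))
        if norm == 0.0:  # no edges along which to propagate similarity
            break
        s = [x / norm for x in s]
    # Undo the column-stacking: S[i][j] is stored at index j * n_b + i.
    return [[s[j * n_b + i] for j in range(n_a)] for i in range(n_b)]


A = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # directed path 0 -> 1 -> 2
for row in node_similarity(A, A):
    print([round(x, 3) for x in row])
```

For a directed path compared with itself, the even iterates concentrate on the diagonal, so each node ends up most similar to its counterpart. Materializing the full Kronecker matrix needs O((n_A n_B)²) memory, so beyond toy graphs one would apply the equivalent update B S Aᵀ + Bᵀ S A directly.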

Table of Contents
David Taniar
Chapter 1
Combining Data Warehousing and Data Mining Techniques for Web Log Analysis
Torben Pedersen, Jesper Thorhauge, Søren Jespersen
Enormous amounts of information about Web site user behavior are collected in Web server logs. However, this information is only useful if it can be...
Chapter 2
Computing Dense Cubes Embedded in Sparse Data
Lixin Fu
In high-dimensional data sets, both the number of dimensions and the cardinalities of the dimensions are large and data is often very sparse, that...
Chapter 3
Exploring Similarities Across High-Dimensional Datasets
Karlton Sequeira, Mohammed J. Zaki
Very often, related data may be collected by a number of sources, which may be unable to share their entire datasets for reasons like...
Chapter 4
Pattern Comparison in Data Mining: A Survey
Irene Ntoutsi, Nikos Pelekis, Yannis Theodoridis
Many patterns are available nowadays due to the widespread use of knowledge discovery in databases (KDD), as a result of the overwhelming amount of...
Chapter 5
Mining Frequent Patterns Using Self-Organizing Map
Fedja Hadzic, Tharam Dillon, Henry Tan, Ling Feng, Elizabeth Chang
Association rule mining is one of the most popular pattern discovery methods used in data mining. Frequent pattern extraction is an essential step...
Chapter 6
An Efficient Compression Technique for Vertical Mining Methods
Mafruz Ashrafi, David Taniar, Kate Smith
Association rule mining is one of the most widely used data mining techniques. To achieve a better performance, many efficient algorithms have been...
Chapter 7
A Tutorial on Hierarchical Classification with Applications in Bioinformatics
Alex Freitas, André Carvalho
In machine learning and data mining, most of the works in classification problems deal with flat classification, where each instance is classified...
Chapter 8
Topological Analysis and Sub-Network Mining of Protein-Protein Interactions
Daniel Wu, Xiaohua Hu
In this chapter, we report a comprehensive evaluation of the topological structure of protein-protein interaction (PPI) networks, by mining and...
Chapter 9
Introduction to Data Mining Techniques via Multiple Criteria Optimization Approaches and Applications
Yong Shi, Yi Peng, Gang Kou, Zhengxin Chen
This chapter provides an overview of a series of multiple criteria optimization-based data mining methods, which utilize multiple criteria...
Chapter 10
Linguistic Rule Extraction from Support Vector Machine Classifiers
Xiuju Fu, Lipo Wang, GihGuang Hung, Liping Goh
Classification decisions from linguistic rules are more desirable compared to complex mathematical formulas from support vector machine (SVM)...
Chapter 11
Graph-Based Data Mining
Wenyuan Li, Wee-Keong Ng, Kok-Leong Ong
With the most expressive representation that is able to characterize the complex data, graph mining is an emerging and promising domain in data...
Chapter 12
Facilitating and Improving the Use of Web Services with Data Mining
Richi Nayak
Web services have recently received much attention in businesses. However, a number of challenges such as lack of experience in estimating the...
About the Authors