Finding Patterns in Class-Labeled Data Using Data Visualization
Gregor Leban (University of Ljubljana, Slovenia), Minca Mramor (University of Ljubljana, Slovenia), Blaž Zupan (University of Ljubljana, Slovenia), Janez Demšar (University of Ljubljana, Slovenia) and Ivan Bratko (University of Ljubljana, Slovenia)
Copyright: © 2008
Data visualization plays a crucial role in data mining and knowledge discovery. Its use is, however, often difficult because of the large number of possible data projections. Manually searching through such sets of projections can be prohibitively time-consuming or even impossible, especially in data analysis problems that involve many features. This chapter describes a method called VizRank, which automatically identifies interesting data projections for multivariate visualizations of class-labeled data. VizRank assigns an interestingness score to each considered projection based on the degree of separation between data instances with different class labels. We demonstrate the usefulness of this approach on six cancer gene expression data sets, showing that the method can reveal interesting data patterns and can further be used for data classification and outlier detection.
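The idea of ranking projections by class separation can be illustrated with a minimal sketch. The abstract does not state how the separation score is computed, so the choices here are assumptions made only for illustration: projections are limited to two-feature scatterplots, and separation is approximated by leave-one-out k-nearest-neighbour accuracy on the projected points. The function names (knn_loo_accuracy, rank_projections) and the toy data are hypothetical and are not taken from the chapter.

```python
from collections import Counter
from itertools import combinations
import math


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out k-NN accuracy on 2-D points (assumed stand-in for a separation score)."""
    correct = 0
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        neighbours = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )[:k]
        votes = Counter(label for _, label in neighbours)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)


def rank_projections(rows, labels, k=3):
    """Score every feature pair and return (score, (i, j)) tuples, best first."""
    n_features = len(rows[0])
    scored = []
    for i, j in combinations(range(n_features), 2):
        points = [(row[i], row[j]) for row in rows]  # the 2-D projection
        scored.append((knn_loo_accuracy(points, labels, k), (i, j)))
    # Highest score first; ties broken by feature indices for determinism.
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Toy data (made up): features 0 and 1 separate the classes, feature 2 does not.
rows = [
    (0.10, 0.20, 0.5),
    (0.20, 0.10, 0.9),
    (0.15, 0.25, 0.1),
    (0.90, 0.80, 0.6),
    (0.80, 0.90, 0.2),
    (0.85, 0.75, 0.8),
]
labels = ["A", "A", "A", "B", "B", "B"]

ranking = rank_projections(rows, labels)
for score, pair in ranking:
    print(pair, round(score, 2))
```

An exhaustive search over feature pairs grows quadratically with the number of features, so a sketch like this would need some form of feature preselection or search heuristic before it could be applied to gene expression data with thousands of features.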