A Visual Analytics Approach for Correlation, Classification, and Regression Analysis

Chad A. Steed (Oak Ridge National Laboratory, USA), J. Edward Swan (Bagley College of Engineering, Mississippi State University, USA), Patrick J. Fitzpatrick (Northern Gulf Institute, Mississippi State University, USA) and T.J. Jankun-Kelly (Bagley College of Engineering, Mississippi State University, USA)
Copyright: © 2014 | Pages: 21
DOI: 10.4018/978-1-4666-4309-3.ch002


New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today’s increasingly complex, multivariate data sets. This chapter describes a visual data mining framework, called the Multidimensional Data eXplorer (MDX), that addresses the challenges of today’s data by combining automated statistical analytics with a highly interactive, parallel-coordinates-based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. This chapter provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.
Chapter Preview


A byproduct of continued technological advances is increasingly complex multivariate data sets, which, in turn, yield information overload when explored with conventional visual analysis techniques. The ability to collect, model, and store information is growing at a much faster rate than our ability to analyze it. However, the transformation of these vast volumes of data into actionable insight is critical in many domains (e.g., climate change, cyber-security, financial analysis). Without the proper techniques, analysts are forced to reduce the problem and discard layers of information in order to fit the tools. New techniques and approaches are necessary to turn today’s flood of information into opportunity.

One of the most promising solutions for the so-called big data challenge lies in the continued development of techniques in the rapidly growing field of visual analytics. Visual analytics, also known as visual data mining, combines interactive visualizations with automated analytics that help the analyst discover and comprehend patterns in complicated, heterogeneous data sets. In general, visual analytics can be described as “the science of analytical reasoning facilitated by interactive visual interfaces” (Thomas, 2005). Visual analytics seeks to combine the strengths of humans with those of machines. While methods from knowledge discovery, statistics, and mathematics drive the automated analytics, human capabilities to perceive, relate, and conclude strengthen the iterative process.

In this chapter, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is presented that utilizes statistical analysis and data classification techniques in an interactive multivariate representation to improve knowledge discovery in the complex multivariate data sets that characterize today’s data (see Figure 1). In addition to intuitive interaction capabilities, this framework introduces a rich set of graphical statistical indicators, automated regression analysis, visual correlation indicators, optimal axis arrangement techniques, and data classification algorithms. These capabilities are combined into a parallel-coordinates-based framework for enhanced multivariate visual analysis.

Figure 1. The Multidimensional Data eXplorer (MDX) consists of a settings panel (upper left), a data table panel (bottom), and an interactive parallel coordinates panel (upper right).
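
To make the idea of correlation-driven axis arrangement more concrete, the following is a minimal sketch and not the MDX implementation, whose algorithms this preview does not specify. It assumes a simple greedy heuristic: starting from a chosen axis, each next axis is the unused one with the largest absolute Pearson correlation to the axis placed last. The function names `pearson` and `arrange_axes` and the toy values are hypothetical and chosen only for illustration; the values are made up and are not taken from the cars data set.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def arrange_axes(columns, start):
    """Greedy axis ordering (illustrative heuristic, not the MDX algorithm).

    Repeatedly appends the unused axis whose |r| with the most recently
    placed axis is largest, so strongly related axes end up adjacent.
    """
    order = [start]
    remaining = [name for name in columns if name != start]
    while remaining:
        last = columns[order[-1]]
        best = max(remaining, key=lambda name: abs(pearson(last, columns[name])))
        order.append(best)
        remaining.remove(best)
    return order


# Made-up toy values, for illustration only.
data = {
    "mpg": [30.0, 25.0, 20.0, 15.0, 10.0],
    "weight": [2000.0, 2500.0, 3100.0, 3500.0, 4200.0],
    "horsepower": [70.0, 90.0, 120.0, 150.0, 200.0],
    "year": [76.0, 82.0, 71.0, 79.0, 74.0],
}
order = arrange_axes(data, "mpg")
print(order)
```

A greedy pass like this is cheap but depends on the starting axis and does not guarantee a globally optimal ordering. It is shown only to illustrate why placing correlated axes next to each other can make patterns easier to see in a parallel coordinates plot.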

This chapter features an expanded version of MDX that builds on recent efforts in which MDX was applied to tropical cyclone climate studies. In Steed, Fitzpatrick, Jankun-Kelly, Yancey, and Swan II (2009b), the initial version of MDX, which lacked integrated statistical processes, was introduced and the system was demonstrated in a case study with a set of tropical cyclone predictors. Follow-on work by Steed, Fitzpatrick, Swan II, and Jankun-Kelly (2009a) and Steed, Swan II, Jankun-Kelly, and Fitzpatrick (2009c) presented an enhanced version of MDX that included statistical analytics and deeper analysis of the previously analyzed tropical cyclone predictors, as well as analysis of a new set of predictors. In the current work, the MDX visual data mining and knowledge discovery capabilities are featured. In addition to presenting new features that facilitate visual correlation mining and automated axis arrangements, this work contributes new data classification capabilities, a novel regression analysis interface that facilitates interactive model development and confirmation, and a detailed description of the visual and automated correlation mining capabilities.

The remainder of this chapter is organized as follows. To begin, a survey of related work is given, followed by a description of the cars data set, which is used in the examples throughout this chapter. Next, the graphical indicators of descriptive statistics are described. Then, a discussion is provided on the interactive correlation analysis indicators and interaction features that are available in the latest version of MDX. Next, the automated correlation analysis algorithms are described and the new automated data classification capabilities are discussed and demonstrated. Then, the details of the enhanced visual regression capabilities are described, including the closing of the iterative regression analysis loop. Next, the optimal axis arrangement capabilities are described. Significant findings from the development and use of MDX, visual design criteria, and domain expert feedback are given and, finally, conclusions and future work are discussed.
