Advanced Data Mining and Visualization Techniques with Probabilistic Principal Surfaces: Applications to Astronomy and Genetics
Antonino Staiano (University of Napoli, “Parthenope”, Italy), Lara De Vinco (Nexera S.c.p.A., Italy), Giuseppe Longo (University “Federico II” of Napoli Polo delle Scienze e della Tecnologia, Italy) and Roberto Tagliaferri (University of Salerno, Italy)
Copyright: © 2008
Probabilistic Principal Surfaces (PPS) is a nonlinear latent variable model with powerful visualization and classification capabilities that appear to overcome most of the shortcomings of other neural tools. PPS builds a probability density function for a given set of patterns lying in a high-dimensional space, expressed in terms of a fixed number of latent variables lying in a Q-dimensional latent space. Usually the latent space is two- or three-dimensional, so the density function can be used to visualize the data within it. The case Q = 3 allows the patterns to be projected onto a spherical manifold, which turns out to be optimal when dealing with sparse data. PPS may also be arranged in ensembles to tackle complex classification tasks. As template cases, we discuss the application of PPS to two real-world data sets from astronomy and genetics.
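To make the Q = 3 projection concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a simplified model with equal node priors and isotropic Gaussian noise, and it takes the data-space node centers as already fitted; the training step is omitted. The function names, the Fibonacci-lattice node placement, and the inverse-variance parameter `beta` are all illustrative assumptions. Each pattern is mapped to the responsibility-weighted mean of the latent nodes, rescaled to the unit sphere.

```python
import numpy as np


def sphere_nodes(n_nodes):
    """Quasi-uniform latent nodes on the unit sphere (Fibonacci lattice).

    The lattice is an illustrative choice, not prescribed by the text.
    """
    i = np.arange(n_nodes) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n_nodes)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.column_stack((
        np.sin(polar) * np.cos(azimuth),
        np.sin(polar) * np.sin(azimuth),
        np.cos(polar),
    ))


def responsibilities(data, centers, beta):
    """Posterior probability of each node given each pattern.

    Assumes equal priors and isotropic Gaussians with inverse variance
    `beta`; this is a simplification made for the sketch.
    """
    sq_dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    log_r = -0.5 * beta * sq_dist
    log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)


def project_to_sphere(data, centers, latent_nodes, beta):
    """Map patterns to the sphere via the normalized posterior-mean node."""
    r = responsibilities(data, centers, beta)
    mean = r @ latent_nodes
    norm = np.linalg.norm(mean, axis=1, keepdims=True)
    return mean / np.maximum(norm, 1e-12)  # guard against a zero mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nodes = sphere_nodes(50)
    # Hypothetical "fitted" centers: a random linear map into 10 dimensions.
    centers = nodes @ rng.normal(size=(3, 10))
    data = centers[:5] + 0.01 * rng.normal(size=(5, 10))
    print(project_to_sphere(data, centers, nodes, beta=100.0))
```

The projected points can then be plotted on a sphere for visual inspection; a larger `beta` concentrates each pattern on its nearest node, while a smaller value spreads it across neighboring nodes.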