A Free Service of IGI Global Publishing House

What is Sammon Error

Encyclopedia of Artificial Intelligence
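Error function whose minimization maximizes structure preservation in projected data. It is defined as

$$E = \frac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - \zeta_{ij})^2}{\delta_{ij}},$$

where $\delta_{ij}$ and $\zeta_{ij}$ are dissimilarity measures between two objects $i$, $j$ in the original and projected space, respectively.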
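As an illustration, the following is a minimal Python sketch of the Sammon error in its standard form: over all pairs of objects, the squared difference between the original-space and projected-space dissimilarities is divided by the original-space dissimilarity, and the sum is normalized by the total of the original-space dissimilarities. The function names, the choice of Euclidean distance as the dissimilarity measure, and the decision to skip coincident points are assumptions made for this example, not details taken from the chapter.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_error(original, projected, dist=euclidean):
    """Sammon error between a data set and its low-dimensional projection.

    original, projected: sequences of points, where projected[i] is the
    image of original[i]. `dist` is the dissimilarity measure (Euclidean
    here by assumption; any non-negative measure could be swapped in).
    """
    if len(original) != len(projected):
        raise ValueError("original and projected must have the same length")

    total = 0.0   # sum of original-space dissimilarities (normalizer)
    stress = 0.0  # sum of weighted squared differences
    for i, j in combinations(range(len(original)), 2):
        d_orig = dist(original[i], original[j])
        d_proj = dist(projected[i], projected[j])
        if d_orig == 0.0:
            # Assumption: coincident points are skipped to avoid
            # dividing by zero.
            continue
        total += d_orig
        stress += (d_orig - d_proj) ** 2 / d_orig

    if total == 0.0:
        raise ValueError("all original-space dissimilarities are zero")
    return stress / total
```

For example, `sammon_error([(0, 0), (3, 4)], [(0,), (4,)])` compares an original distance of 5 with a projected distance of 4 and evaluates to about 0.04, while a projection that preserves every pairwise distance yields 0.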
Published in Chapter:
Neural Network-Based Visual Data Mining for Cancer Data
Enrique Romero (Technical University of Catalonia, Spain), Julio J. Valdés (National Research Council Canada, Canada), and Alan J. Barton (National Research Council Canada, Canada)
Copyright: © 2009 | Pages: 7
DOI: 10.4018/978-1-59904-849-9.ch176
Abstract
According to the World Health Organization (http://www.who.int/cancer/en), cancer is a leading cause of death worldwide. From a total of 58 million deaths in 2005, cancer accounts for 7.6 million (or 13%) of all deaths. The main types of cancer leading to overall cancer mortality are i) Lung (1.3 million deaths/year), ii) Stomach (almost 1 million deaths/year), iii) Liver (662,000 deaths/year), iv) Colon (655,000 deaths/year) and v) Breast (502,000 deaths/year). Among men, the most frequent cancer types worldwide are (in order of number of global deaths): lung, stomach, liver, colorectal, oesophagus and prostate, while among women (in order of number of global deaths) they are: breast, lung, stomach, colorectal and cervical. Technological advancements in recent years are enabling the collection of large amounts of cancer-related data. In particular, in the field of Bioinformatics, high-throughput microarray gene experiments are possible, leading to an information explosion. This requires the development of data mining procedures that speed up the process of scientific discovery and support an in-depth understanding of the internal structure of the data. This is crucial for the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro & Smyth, 1996). Researchers need to understand their data rapidly and with greater ease. In general, objects under study are described in terms of collections of heterogeneous properties. It is typical for medical data to be composed of properties represented by nominal, ordinal or real-valued variables (scalar), as well as by others of a more complex nature, like images, time-series, etc. In addition, the information comes with different degrees of precision, uncertainty and information completeness (missing data is quite common).
Classical data mining and analysis methods are sometimes difficult to use, the output of many procedures may be large and time-consuming to analyze, and often their interpretation requires special expertise. Moreover, some methods are based on assumptions about the data which limit their application, especially for the purposes of exploration, comparison, hypothesis formation, etc., typical of the first stages of scientific investigation. This makes graphical representation directly appealing. Humans perceive most information through vision, in large quantities and at very high input rates. The human brain is extremely well qualified for the fast understanding of complex visual patterns, and still outperforms the computer. Several reasons make Virtual Reality (VR) a suitable paradigm: i) it is flexible (it allows the choice of different representation models to better suit human perception preferences), ii) it allows immersion (the user can navigate inside the data, and interact with the objects in the world), iii) it creates a living experience (the user is not merely a passive observer, but an actor in the world) and iv) it is broad and deep (the user may see the VR world as a whole, and/or concentrate on specific details of the world). Of no less importance is the fact that only minimal skills are required to interact with a virtual world. Visualization techniques may be very useful for medical decision support in the oncology area. In this paper, unsupervised neural networks are used for constructing VR spaces for visual data mining of gene expression cancer data. Three datasets are used in the paper, representative of three of the most important types of cancer in modern medicine: liver, stomach and lung. The data sets are composed of samples from normal and tumor tissues, described in terms of tens of thousands of variables, which are the corresponding gene expression intensities measured in microarray experiments.
Despite the very high dimensionality of the studied patterns, high quality visual representations in the form of structure-preserving VR spaces are obtained using SAMANN neural networks, enabling the differentiation of cancerous and noncancerous tissues. The same networks could be used as nonlinear feature generators in a preprocessing step for other data mining procedures.