This chapter concentrates on the use of swarm intelligence in data mining, focusing on the problem of medical data clustering. Clustering is a constantly growing area of current research; medicine, marketing, trade, and meteorology are among the numerous fields that benefit from its techniques. First, an introduction to data mining and cluster validation techniques is presented, followed by a review of ant-inspired concepts and applications. The chapter provides a reasonably deep insight into the most successful ant colony and swarm intelligence concepts, their paradigms, and their applications. The authors present a discussion, evaluation, and comparison of these techniques, together with important applications and recently achieved results. Finally, new and prospective future directions in this area are outlined and discussed.
This chapter concentrates on the use of swarm intelligence in data mining, in particular on the problem of data clustering in biomedical data processing. Clustering is a constantly growing area of current research; medicine, marketing, trade, and meteorology are some of the numerous fields that benefit from its techniques.
The objective of this chapter is to introduce clustering methods together with methods for evaluating the resulting clusterings. It presents the fundamentals of ant-inspired methods, followed by a compact review of the basic ant clustering models and their most successful variations and modifications. The second part presents the application of ant colony clustering in biomedical data processing.
In the last two decades, many advances in computer science have been based on the observation and emulation of processes in the natural world.
The coordination of an ant colony is local in nature. It relies mainly on indirect communication through pheromone, a mechanism known as stigmergy (the term was introduced by Grassé et al. (1959)), although direct ant-to-ant communication, in the form of antennation, has also been observed (Trianni, Labella, & Dorigo, 2004). By studying these paradigms, we have a good chance of discovering inspiring concepts for many successful metaheuristics. More information on the ant colony metaphors can be found in the section Ant Colony Optimization.
The author specializes in the use of such methods in biomedical data processing. This application is described in the section Applications.
The chapter is organized as follows. First, an introduction to data mining and clustering is presented together with a brief survey of ant colony inspired methods in clustering. Then, the natural background of the applied methods is presented, summarizing the most important properties of ant colonies that served as a source of inspiration for many of the algorithms described in the following part. The next section describes the most successful methods in data clustering: first the pioneering ant-inspired clustering algorithms, then the evolution of further ant-inspired clustering algorithms. Finally, applications of the algorithms and paradigms published by the author are presented, followed by the conclusion and future directions. At the end, a carefully selected list of relevant literature provides the reader with additional resources on the state of the art in the area.
Key Terms in this Chapter
Stigmergy: Indirect communication in social insect communities through changes in the environment. The term was introduced by Grassé et al. (1959).
Cluster Validity Measures: Indices that measure the quality of the clustering obtained. When the correct classification is known, external measures such as accuracy or the mutual information between the obtained clusters and the known classes can be used. Otherwise, internal measures such as the sum of squared errors (SSE), the Davies-Bouldin index, the Dunn index, or the Silhouette index can be used.
Data Mining: An important field in industry and the market, concerned with extracting useful information from large amounts of data. The data are usually heterogeneous and so voluminous that the use of computers is unavoidable.
Pheromone: A chemical substance deposited by ants to mark their path and to signal the importance of the prey found. Many ants are nearly blind and rely mainly on sensing the amount of pheromone deposited.
Clustering: An automated process for grouping similar data together. It minimizes the intra-cluster distance while maximizing the inter-cluster distance, so it can be viewed as a multi-objective optimization problem, and many of its formulations are NP-hard.
Bioinspired Informatics: A scientific field with industrial applicability that tries to reproduce, in a computer environment, the mental processes of the brain and the processes of biogenesis. These methods are often used to tackle NP-hard problems, for which known exact algorithms have exponential complexity.
Social Insect: An insect that cannot survive on its own but, as part of a colony, produces remarkable collective solutions. The behavior of an individual is usually very simple, yet the colony as a whole exhibits complex behavior (traversing open spaces, finding shortest paths, etc.).
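To make the cluster validity measures more concrete, here is a minimal sketch in Python that computes two of them, the sum of squared errors (SSE) and the Davies-Bouldin index, from a set of points and their cluster labels. It assumes Euclidean distance, points given as tuples of numbers, and the common textbook form of the Davies-Bouldin index, in which a cluster's scatter is the mean distance of its points to the centroid. The function names `centroids`, `sse`, and `davies_bouldin` are hypothetical and are not taken from the chapter.

```python
import math
from collections import defaultdict


def centroids(points, labels):
    """Return {label: centroid}, each centroid being the mean of its cluster's points."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in groups.items()
    }


def sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    cents = centroids(points, labels)
    return sum(math.dist(p, cents[lab]) ** 2 for p, lab in zip(points, labels))


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better); assumes at least two clusters."""
    cents = centroids(points, labels)
    # Scatter S_i: mean Euclidean distance of cluster i's points to its centroid.
    scatter = defaultdict(float)
    counts = defaultdict(int)
    for p, lab in zip(points, labels):
        scatter[lab] += math.dist(p, cents[lab])
        counts[lab] += 1
    for lab in scatter:
        scatter[lab] /= counts[lab]
    clusters = list(cents)
    worst = []
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster j:
        # (S_i + S_j) / distance between their centroids.
        worst.append(max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters if j != i
        ))
    # Average the worst-case similarities over all clusters.
    return sum(worst) / len(worst)
```

For the four points (0, 0), (0, 2), (10, 0), and (10, 2), the natural labeling [0, 0, 1, 1] gives an SSE of 4.0 and a Davies-Bouldin index of 0.2, while the mismatched labeling [0, 1, 0, 1] gives 100.0 and 5.0. For both measures, lower values indicate more compact and better separated clusters.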
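The two competing objectives named in the Clustering entry, small intra-cluster distance and large inter-cluster distance, can be measured directly. The sketch below, again assuming Euclidean distance and points as tuples, computes the largest intra-cluster distance (the cluster diameter, to be minimized) and the smallest distance between points of different clusters (the separation, to be maximized), then combines them into the Dunn index. It uses the single-linkage separation and complete-diameter form, which is only one of several Dunn variants; the helper names `group`, `max_diameter`, `min_separation`, and `dunn_index` are hypothetical.

```python
import math
from itertools import combinations


def group(points, labels):
    """Collect points into {label: [points]}."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def max_diameter(clusters):
    """Largest distance between two points of the same cluster (to be minimized)."""
    return max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,  # every cluster is a single point
    )


def min_separation(clusters):
    """Smallest distance between points of different clusters (to be maximized).

    Assumes at least two clusters.
    """
    return min(
        math.dist(a, b)
        for ci, cj in combinations(clusters.values(), 2)
        for a in ci
        for b in cj
    )


def dunn_index(points, labels):
    """Dunn index = separation / diameter; larger values mean better clusterings."""
    clusters = group(points, labels)
    diameter = max_diameter(clusters)
    if diameter == 0.0:
        # All clusters are singletons: the ratio is unbounded.
        return math.inf
    return min_separation(clusters) / diameter
```

On the points (0, 0), (0, 2), (10, 0), and (10, 2), the labeling [0, 0, 1, 1] yields a diameter of 2.0, a separation of 10.0, and a Dunn index of 5.0, whereas the labeling [0, 1, 0, 1] swaps the roles of the two distances and gives 0.2. Expressing the two objectives as a single ratio is one simple way to turn the multi-objective problem into a single score, at the cost of sensitivity to outliers, since both terms depend on extreme pairs of points.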