Cancer Biomarker Assessment Using Evolutionary Rough Multi-Objective Optimization Algorithm


Anasua Sarkar (Government College of Engineering and Leather Technology, India) and Ujjwal Maulik (Jadavpur University, India)
DOI: 10.4018/978-1-4666-7258-1.ch016


This chapter proposes a hybrid unsupervised learning algorithm, termed the Evolutionary Rough Multi-Objective Optimization (ERMOO) algorithm. It integrates the principles of rough set theory with the archived multi-objective simulated annealing approach. The boundary approximations of rough sets handle incompleteness in the dynamic classification method, with the quality of classification coefficient serving as the measure of classificatory competence, and they enable faster convergence of the Pareto-archived evolution strategy. The algorithm incorporates the rough set-based dynamic archive classification method. A measure of the amount of domination between two solutions determines the acceptance probability of a new solution, and adopting rough set theory improves the spread of the non-dominated solutions in the Pareto front. The performance is demonstrated on a real-life breast cancer dataset for the identification of cancer associated fibroblasts (CAFs) within the tumor stroma, and the identified biomarkers are reported. Moreover, biological significance tests are carried out for the obtained markers.
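As a rough illustration of the amount-of-domination idea, the sketch below assumes the definition commonly used in archived multi-objective simulated annealing: the product of the range-normalized differences over the objectives on which two solutions differ. The function names, the `ranges` parameter, the zero value for identical solutions, and the logistic acceptance rule are illustrative assumptions; this preview does not state the chapter's exact formula.

```python
import math

def amount_of_domination(a, b, ranges):
    """Product of |f_i(a) - f_i(b)| / R_i over objectives where a and b differ.

    `ranges` holds the assumed range R_i of each objective.
    Returns 0.0 for identical objective vectors (a choice made for this sketch).
    """
    product, differs = 1.0, False
    for fa, fb, r in zip(a, b, ranges):
        if fa != fb:
            product *= abs(fa - fb) / r
            differs = True
    return product if differs else 0.0

def acceptance_probability(delta_dom, temperature):
    """Assumed logistic rule: more domination or a lower temperature
    makes accepting the dominated candidate less likely."""
    exponent = min(delta_dom / temperature, 700.0)  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(exponent))
```

With this rule, a candidate that is not dominated at all (amount 0) is accepted with probability 0.5, and the probability falls toward 0 as the temperature decreases.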
Chapter Preview


The progress of microarray technology in cancer research has enabled scientists to measure the molecular signatures of cancer cells. Scientists today monitor the expression levels of differentially expressed cancer genes simultaneously over different time points and under different drug treatments (Tusher, 1940). In microarray analysis, the expression levels of two genes may rise and fall synchronously in response to environmental stimuli (Tusher, 1940), (Eisen, 1998). Efficient machine learning classifiers help in diagnosing cancer subtypes for patients (Spang, 2003).

Recently, researchers have been developing computational methods for analyzing RNA and gene expression profiles for cancer detection. Such methods are expected to support the experimental work that needs to be carried out in the wet laboratory for analyzing biomarker RNAs. Gene expression profiling stratifies breast tumors into different molecular subtypes, which also co-segregate with the receptor status of the tumor cells. Therefore, cancer associated fibroblasts (CAFs) within the tumor stroma may exhibit subtype-specific gene expression profiles. These onco-RNA signatures may be analyzed further to identify the most significant oncological biomarkers computationally.

Clustering is an unsupervised classification method based on maximum intra-class similarity and minimum inter-class similarity. Eisen et al. (Eisen, 1998) first classified groups of co-expressed genes using hierarchical clustering. Other previously proposed clustering methods that can be applied to cancer subtype detection are: self-organizing map (SOM) (Spang, 2003), K-means clustering (Tavazoie, 2001), (Hoon, 2004), simulated annealing (Lukashin, 1999), graph theoretic approach (Xu, 1999), fuzzy c-means clustering (Dembele, 2003), spectral clustering (Maulik, 2013), (Sarkar, 2011), scattered object clustering (de Souto, 2008) and symmetry based clustering (Maulik, 2012), (Sarkar, 2009). Several other methods, such as (Maulik, 2009), (Sarkar, 2009), (Bandyopadhyay, 2010), may also be applied efficiently to the cancer subtype detection problem.

Key Terms in this Chapter

Simulated Annealing: A probabilistic method for finding the global minimum of a cost function that may possess several local minima.
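A minimal single-objective sketch of this idea follows. The cost function `bumpy`, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration, not settings from the chapter.

```python
import math
import random

def simulated_annealing(cost, x0, t_start=1.0, t_end=1e-3, alpha=0.95,
                        steps_per_temp=50, step=0.5, seed=0):
    """Minimize a 1-D cost function; returns the best point seen and its cost."""
    rng = random.Random(seed)          # seeded for reproducible runs
    x, fx = x0, cost(x0)
    best_x, best_f = x, fx
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            cand = x + rng.uniform(-step, step)
            fc = cost(cand)
            # Always accept improvements; accept worse moves with
            # probability exp(-(increase in cost) / temperature).
            if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
                x, fx = cand, fc
                if fx < best_f:
                    best_x, best_f = x, fx
        t *= alpha                     # geometric cooling
    return best_x, best_f

def bumpy(x):
    """Hypothetical cost with several local minima."""
    return x * x + 3.0 * math.sin(5.0 * x)

best_x, best_f = simulated_annealing(bumpy, x0=4.0)
```

Accepting occasional uphill moves at high temperature is what lets the search leave a local minimum; as the temperature falls, the method behaves more like plain descent.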

Clustering: Assigning similar elements to the same group so as to increase intra-cluster similarity and decrease inter-cluster similarity.

Multi-Objective Optimization: A method for finding a set of representative Pareto-optimal solutions and quantifying the trade-offs in satisfying the multiple objectives.
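The notion of Pareto optimality can be sketched with a small dominance filter. The example assumes all objectives are minimized; the function names and sample points are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(points)   # [(1, 5), (2, 3), (4, 1)]
```

Here (3, 4) and (5, 5) are both dominated by (2, 3), while the three remaining points trade one objective against the other and so form the non-dominated set.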

Validity Index: An index that estimates the compactness of clusters, helping to identify properly distinguishable clusters.
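To make the compactness idea concrete, the sketch below computes a simple ratio of mean within-cluster distance to the smallest distance between cluster centroids. This particular ratio is an illustrative assumption and is not necessarily the index used in the chapter; lower values indicate tighter, better separated clusters.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def compactness_separation(clusters):
    """Mean point-to-centroid distance divided by the minimum
    centroid-to-centroid distance (needs at least two clusters)."""
    centers = [centroid(c) for c in clusters]
    total = sum(len(c) for c in clusters)
    intra = sum(math.dist(p, ctr)
                for c, ctr in zip(clusters, centers) for p in c) / total
    inter = min(math.dist(a, b)
                for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra / inter

good = compactness_separation([[(0, 0), (0, 1)], [(10, 0), (10, 1)]])
poor = compactness_separation([[(0, 0), (10, 0)], [(0, 1), (10, 1)]])
```

The same four points yield a much smaller value when grouped by proximity (`good`) than when grouped across the gap (`poor`).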

Gene Expression Data: Data measuring the process by which the information encoded in a gene is converted to messenger RNA and then to protein.

Cancer Biomarker: A substance or process that indicates the presence of cancer in the body, including genetic, epigenetic, proteomic, glycomic, and imaging biomarkers.

Rough Set: A set described by a pair of crisp sets, its lower and upper approximations, according to Pawlak's rough set theory; the elements that lie between the two approximations form the boundary region.
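The lower and upper approximations can be sketched as follows. Objects with identical attribute values are treated as indiscernible; the gene names and attribute values are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def approximations(objects, target):
    """Return (lower, upper) approximations of `target`.

    `objects` maps each object name to a tuple of attribute values;
    objects sharing the same tuple fall into one equivalence class.
    """
    classes = defaultdict(set)
    for name, attrs in objects.items():
        classes[attrs].add(name)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:        # class lies entirely inside the target
            lower |= block
        if block & target:         # class overlaps the target
            upper |= block
    return lower, upper

# Hypothetical data: (expression level, regulation direction) per gene.
objects = {
    "g1": ("high", "up"), "g2": ("high", "up"),
    "g3": ("low", "up"),
    "g4": ("low", "down"), "g5": ("low", "down"),
}
target = {"g1", "g2", "g4"}
lower, upper = approximations(objects, target)
boundary = upper - lower           # {"g4", "g5"}
```

Genes g4 and g5 cannot be told apart by their attributes, yet only g4 belongs to the target set, so both land in the boundary region rather than in the lower approximation.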
