Cancer Classification From DNA Microarray Using Genetic Algorithms and Case-Based Reasoning

Lilybert Machacha, Prabir Bhattacharya

Source Title: International Journal of Software Science and Computational Intelligence (IJSSCI) 13(1)

DOI: 10.4018/IJSSCI.2021010102

Abstract

Several types of cancer present similar symptoms, which sometimes makes it difficult for physicians to reach an accurate diagnosis. In addition, accurately classifying cancer cells in order to differentiate one type of cancer from another is a technical challenge. The DNA microarray technique (also called the DNA chip) has been used in the past for the classification of cancer, but it generates a large volume of noisy data with many features that is difficult to analyze directly. This paper proposes a new method that combines the genetic algorithm, case-based reasoning, and the k-nearest neighbor classifier and considerably improves classification performance. The authors also use the well-known Mahalanobis distance of multivariate statistics as a similarity measure, which improves the accuracy. Neither a case-based classifier combined with the genetic algorithm nor the Mahalanobis distance has been applied before to the classification of cancer; the proposed approach is thus a novel method for cancer classification. Furthermore, the results show that the proposed method performs considerably better than other algorithms. Experiments were conducted on several benchmark datasets, including the leukemia, lymphoma, ovarian cancer, and breast cancer datasets.

1. Introduction

Different types of cancer may have similar symptoms, so an accurate classification of the type of cancer is necessary in order to treat a patient properly. Various cancer classification techniques have been developed in the past, but most of them are based on the clinical analysis of morphological symptoms (Hong & Cho, 2004), and with such methods even a trained specialist may make diagnostic errors. To overcome these problems, classification techniques using human gene information have been investigated (e.g., Ben-Dor et al., 2000; Brazma & Vilo, 2000; Park & Cho, 2003). Gene information (usually called “gene expression data”) can be collected by the DNA microarray technique (Amaratunga et al., 2014), and it provides useful information for the classification of different kinds of cancers. Since the original format of the data is an array of numbers, it is not easy to analyze the data directly and discover useful classification rules. DNA microarray technology (Amaratunga et al., 2014) has been used to profile the global gene expression patterns of normal and transformed human cells in several types of cancers (Alizadeh et al., 2000; Alon et al., 1999; Bittner et al., 2000; Bubendorf et al., 1999; Golub et al., 1999; Perou et al., 2000). With the increase in cancer cases and the recurrence of cancer in many patients, better and faster solutions are clearly needed, which is the main motivation of this paper.

Microarray data comprise many genes but very few samples; finding subsets of genes that can discriminate between different classes of samples is therefore a multidimensional search problem. The Mahalanobis distance (e.g., Duda et al., 2001) is widely used as a multivariate outlier statistic for examining data profiles such as learning curves, serial position effects, and group profiles, and it has a lower confusion percentage than the Euclidean distance (Campbell, 1997). The metric essentially addresses the question of whether a particular case would be considered an outlier relative to a particular set of group data. Clinicians usually compute “z-scores” (see, e.g., Mitchell, 1997, p. 235) to determine percentile ranks (e.g., Li et al., 2000) and then correlate the client’s scores with the mean scores for a selected group. The problem with this approach is that it incorporates only the group mean values into the computation: the variability within each measure, and the correlations and variability between measures, are not taken into account. In effect, this correlation-based approach assumes that the measures in a profile are independent of each other.
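The contrast between per-measure z-scores and the Mahalanobis distance can be illustrated with a minimal sketch. The data below are synthetic, and the function names `mahalanobis` and `z_scores` are chosen here for illustration only; none of this is taken from the paper. The pseudo-inverse is an added assumption that keeps the computation defined when the covariance matrix is singular, as can happen when features outnumber samples.

```python
import numpy as np

def mahalanobis(x, group):
    """Mahalanobis distance from profile x to the sample in `group` (rows = cases)."""
    mean = group.mean(axis=0)
    cov = np.cov(group, rowvar=False)
    # Assumption: pseudo-inverse, so a singular covariance does not raise an error.
    cov_inv = np.linalg.pinv(cov)
    diff = x - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

def z_scores(x, group):
    """Per-measure z-scores; each measure is treated on its own."""
    return (x - group.mean(axis=0)) / group.std(axis=0, ddof=1)

# Synthetic group data with two strongly correlated measures (hypothetical).
rng = np.random.default_rng(0)
base = rng.normal(size=200)
group = np.column_stack([base, base + 0.1 * rng.normal(size=200)])

typical = np.array([1.0, 1.0])    # consistent with the correlation
atypical = np.array([1.0, -1.0])  # violates the correlation

print("z-scores   :", z_scores(typical, group), z_scores(atypical, group))
print("Mahalanobis:", mahalanobis(typical, group), mahalanobis(atypical, group))
```

Both profiles have z-scores of similar magnitude on each measure, yet the second one is far from the group in the Mahalanobis sense because it contradicts the correlation between the two measures.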

Several methods for selecting a subset of discriminative genes for sample classification have been proposed (e.g., Brown et al., 2000; Bubendorf et al., 1999; Campbell, 1997; Cho & Won, 2003; Dasarathy, 1991; Duda et al., 2001; Dudoit et al., 2000; Eisen et al., 1998; Fix & Hodges, 1951); these researchers applied neighborhood analysis methods to identify a subset of genes using a separation measure similar to the t-statistic. Several classification methods (both supervised and unsupervised) were applied, including K-NN (without gene selection) and the support vector machine (SVM) after gene selection. A boosting technique (Freund & Schapire, 1997) was used to search for a threshold (expression level) for each gene that would maximally discriminate between two types of samples (e.g., normal versus tumor). Several machine learning techniques have been used in classifying gene expression data, including the Fisher linear discriminant (Brazma & Vilo, 2000), K-nearest neighbor (Li, Weinberg, Darde et al., 2001), decision tree, multi-layer perceptron (Duda et al., 2001; Xu, Selaru, Yin, Zou, Shustova, & Mori, 2002), support vector machine (SVM) (Brown et al., 2000; Furey et al., 2000), boosting, and the self-organizing map (Golub et al., 1999; Tamayo et al., 1999). Feature selection algorithms have been used widely in building CBR classifiers to remove non-informative genes (Pedersen & Moult, 1996).
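To make the idea of gene-subset selection as a search problem concrete, the following is a minimal sketch of a genetic algorithm whose fitness is the leave-one-out accuracy of a k-nearest-neighbor classifier. It is not the authors' method: the data are synthetic, the labels are assumed binary, the Euclidean distance is used in place of the Mahalanobis distance for brevity, and all names and parameter values (`ga_select_genes`, `n_pick`, `pop_size`, the mutation rate) are hypothetical choices made for illustration.

```python
import numpy as np

def loo_knn_accuracy(X, y, k=3):
    """Leave-one-out accuracy of k-NN (Euclidean distance, binary labels 0/1)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a sample may not vote for itself
    nearest = np.argsort(d, axis=1)[:, :k]
    pred = (y[nearest].mean(axis=1) > 0.5).astype(int)  # majority vote, odd k
    return float((pred == y).mean())

def ga_select_genes(X, y, n_pick=5, pop_size=30, generations=40, seed=0):
    """Search for a subset of n_pick genes that maximizes k-NN accuracy."""
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]

    def fitness(genes):
        return loo_knn_accuracy(X[:, genes], y)

    # Each chromosome is an array of n_pick distinct gene indices.
    pop = [rng.choice(n_genes, n_pick, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), 2, replace=False)
            pool = np.union1d(elite[a], elite[b])      # crossover: pool parents' genes
            child = rng.choice(pool, n_pick, replace=False)
            if rng.random() < 0.3:                     # mutation: swap in a random gene
                new_gene = rng.integers(n_genes)
                if new_gene not in child:
                    child[rng.integers(n_pick)] = new_gene
            children.append(child)
        pop = elite + children                         # elitism keeps the best half
    best = max(pop, key=fitness)
    return np.sort(best), fitness(best)

# Synthetic data: 40 samples, 40 genes, only genes 0-4 carry the class signal.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 40))
X[:, :5] += 4.0 * y[:, None]

genes, acc = ga_select_genes(X, y)
print("selected genes:", genes, "leave-one-out accuracy:", acc)
```

Because the fitness is estimated on the same few samples used for the search, a real study would need a held-out test set to guard against selection bias; the sketch omits this for brevity.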
