An Ultra-Fast Method for Clustering of Big Genomic Data

Billel Kenidra (National Superior Institute of Computer Science (ESI), Constantine, Algeria) and Mohamed Benmohammed (Lire Laboratory, University of Constantine-2, Constantine, Algeria)
Copyright: © 2020 | Pages: 16
DOI: 10.4018/IJAMC.2020010104


Clustering is used to identify cancer subtypes from gene expression and DNA methylation datasets. Cancer subtype information is critically important for understanding tumor heterogeneity. Detecting previously unknown clusters of biological samples, which are usually associated with unknown types of cancer, in turn makes it possible to prescribe more effective treatments for patients, because cancer has varying subtypes that often respond differently to the same treatment. Because DNA methylation datasets are extremely large, running time remains a major challenge. Traditional clustering algorithms are too slow to handle high-dimensional biological datasets; they usually require large amounts of computational time. The proposed clustering algorithm outperforms the others in terms of running time: it is able to rapidly identify a set of biologically relevant clusters in large-scale DNA methylation datasets, and its superiority over the others has been demonstrated in terms of relative speed.

1. Background

1.1. Cancer and Bioinformatics

Cancer is a genetic disease caused by changes in the genes that control how our cells function, especially how they grow and divide. Normally, human cells grow and divide to form new cells as the body needs them. When cells grow old or become damaged, they die, and new cells take their place. When cancer develops, however, this orderly process breaks down. As cells become more and more abnormal, old or damaged cells survive when they should die, and new cells form when they are not needed. These extra cells can divide without stopping and may form growths called tumors, which can spread into surrounding tissues (Nielsen et al., 2010). Cancer is a complicated disease with complex treatments, because different causes of cancer lead to different prognoses and require different treatments. Moreover, the same treatment given to different patients with the same cancer may lead to different results (Xu et al., 2015).

The application of computer technology to the management of molecular biology data is known as bioinformatics. The ultimate goal of bioinformatics is to better understand a living cell and how it functions at the molecular level using computational tools. The work starts with storing and mining raw genomic data, proceeds to analyzing and interpreting the relations found within the data, and ends with deducing information and discovering meaningful knowledge from it. This knowledge is crucial for making the right decisions on diagnosis and prognosis. It also helps generate new insights and provides a global perspective on the cell, with the aim of exploring the genetic relationships of deadly diseases.
