Role of Supercomputers in Bioinformatics

Anamika Singh, Rajeev Singh, Neha Gupta
Copyright: © 2015 |Pages: 19
DOI: 10.4018/978-1-4666-7461-5.ch009

Abstract

Effective, user-friendly computing resources such as supercomputers have made rapid data analysis possible. In bioinformatics, they are expanding many areas of research, such as genomics, proteomics, and metabolomics. Structure-based drug design is one of the major areas of research aimed at curing human disease. This chapter opens with a discussion of supercomputing in sequence analysis, including a detailed table summarizing the software and Web-based programs used for sequence analysis. A brief account of supercomputing in virtual screening follows, introducing resources such as DOCK, ZINC, and EDULISS. The chapter then presents advanced Quantitative Structure-Activity Relationship (QSAR) technologies, including Fragment-Based 2D QSAR, Multiple-Field 3D QSAR, and Amino Acid-Based Peptide Prediction, at a level of detail similar to the concept of abstraction. Supercomputing in docking studies is emphasized, and docking software for Protein-Ligand, Protein-Protein, and Multi-Protein docking is listed. The chapter ends with applications of supercomputing in the widely used analysis of microarray data.

Introduction

A supercomputer is a computer built for very high-speed calculation. Supercomputers first came into use during the 1960s. Architecturally they resemble ordinary computers but have many more processors, which is what gives them their speed. Today, the early designs have been replaced by parallel supercomputers, in which thousands of processors are connected in a single system (Hoffman et al., 1990; Hill et al., 1999; Prodan et al., 2007). In this chapter we focus on some areas of biological research where supercomputing plays a vital role: sequence analysis, virtual screening, Quantitative Structure-Activity Relationship (QSAR) modeling, docking studies, and microarray data analysis.

Figure 1. A brief history of supercomputing

Supercomputing In Sequence Analysis

Sequence analysis explores DNA, RNA, and protein sequences to obtain information about the source organism, phylogeny, function, structure, and other characteristics. The methodologies used include sequence alignment and searches against biological databases, among others. Most often, the task is to search a DNA, protein, or genome database for regions that are similar to a query sequence.

These databases already hold billions of annotated sequences, and the amount of sequence information grows day by day. Manual searching is difficult and time consuming, and the reliability of its results is questionable. Even looking for an exact match between the query string and a substring of the database is computationally demanding, and a realistic database search must also allow for mutations, insertions, and deletions. Heuristic approaches such as BLAST and FASTA (Altschul et al., 1990, Casey, 2005) handle mutations, insertions, and deletions quickly, but they are less sensitive than dynamic programming algorithms such as Smith-Waterman (Lipman and Pearson, 1985, Pearson, W. R; Lipman, D. J. 1988) and are therefore less well suited where statistical rigor matters. Smith-Waterman takes far longer to compute, but it remains technically superior because it guarantees an optimal local alignment under the chosen scoring scheme.
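The dynamic programming idea behind Smith-Waterman can be illustrated with a minimal sketch. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, not parameters taken from the chapter; production tools use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    Illustrative scoring only: a fixed match/mismatch score and a linear
    gap penalty (assumed values, not from the chapter).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # match or substitution
                H[i - 1][j] + gap,      # gap in b (deletion)
                H[i][j - 1] + gap,      # gap in a (insertion)
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The table has one cell per pair of positions, so the cost grows with the product of the two sequence lengths. Repeating this for every entry in a large database is what makes the exhaustive search so demanding and a natural candidate for parallel hardware.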

NCBI BLAST is the Basic Local Alignment Search Tool (BLAST) provided by the National Center for Biotechnology Information (Altschul et al., 1990). It is one of the most widely used tools for sequence similarity searches. BLAST compares a protein or DNA query against a sequence database that holds diverse sequences from many sources. Different search methods rely on different algorithms; the mathematical approaches used for sequence-to-sequence comparison include genetic algorithms, Markov methods, and hidden Markov models (Eddy, 1998). HMMER is also used for similarity searches of sequence databases (Edgar, 2004). Beyond pairwise comparison, many tools can compare multiple sequences at once. These include HMMER and multiple sequence alignment programs such as ClustalW, Kalign, and others (“MUSCLE,” http://www.drive5.com/muscle/). Sequence analysis supports many kinds of analysis in molecular biology. It can compare two sequences for similarity and identity, and it helps to identify and analyze active sites, interaction sites, and regulatory sites. It can also identify mutations within genes and other sequences, and it supports studies of genetic diversity. Table 1 lists some examples of bioinformatics tools.
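As a small illustration of comparing two sequences for identity and locating mutations, the sketch below works on two sequences that have already been aligned to equal length, with `-` marking gaps. The helper names `percent_identity` and `find_substitutions` are hypothetical, and the identity definition used here (matches divided by all compared columns, gaps included) is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns where both sequences carry the same residue.

    Assumes the inputs are already aligned (equal length, '-' for gaps).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def find_substitutions(aligned_a, aligned_b):
    """List (position, residue_a, residue_b) for mismatched, ungapped columns.

    Positions are 1-based alignment columns.
    """
    return [
        (pos, x, y)
        for pos, (x, y) in enumerate(zip(aligned_a, aligned_b), start=1)
        if x != y and "-" not in (x, y)
    ]
```

For example, `find_substitutions("GATTACA", "GATCACA")` reports a single T-to-C substitution at column 4.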
