State-of-the-Art GPGPU Applications in Bioinformatics

Nikitas Papangelopoulos, Dimitrios Vlachakis, Arianna Filntisi, Paraskevas Fakourelis, Louis Papageorgiou, Vasileios Megalooikonomou, Sophia Kossida
DOI: 10.4018/ijsbbt.2013100103

Abstract

The exponential growth of available biological data in recent years, coupled with their increasing complexity, has made their analysis a computationally challenging process. Traditional central processing units (CPUs) are reaching their limit in processing power and are not designed primarily for multithreaded applications. Graphics processing units (GPUs), on the other hand, are affordable, scalable computing powerhouses that, thanks to the ever-increasing demand for higher-quality graphics, have yet to reach their limit. Typically, high-end CPUs have 8-16 cores, whereas GPUs can have more than 2,500 cores. GPUs are also, by design, highly parallel, multicore and multithreaded, capable of handling thousands of threads that perform the same calculation on different subsets of a large data set. This ability is what makes them perfectly suited for biological analysis tasks. Lately, this potential has been realized by many bioinformatics researchers, and a wide variety of tools and algorithms have been ported to GPUs or designed from the ground up to maximize the usage of available cores. Here, we present a comprehensive review of available bioinformatics tools, ranging from sequence and image analysis to protein structure prediction and systems biology, that use the NVIDIA Compute Unified Device Architecture (CUDA) platform for general-purpose computing on graphics processing units (GPGPU).

Introduction

Ever since the first draft of the human genome was published in 2001, the availability of biological data has been steadily growing. With the introduction of new technologies, most notably microarrays and next-generation sequencing (NGS), this growth has become exponential. Other areas, such as molecular modelling and medical imaging, are also contributing heavily, not only to the exponential growth of the available data but also to their algorithmic complexity. At the same time, with falling genome sequencing prices and newer, more efficient instruments, this trend will continue or even accelerate for the foreseeable future (Richter & Sexton, 2009). This huge amount of data, combined with their increasingly intricate, interconnected structures, will eventually make their analysis intractable on current personal computers.

This is especially true if one considers that in the last few years the increase in computing power has started to fall behind Moore’s law, with R&D focused more on energy efficiency for mobile applications and less on increasing raw CPU power. Even the introduction of multicore processors was not enough to reverse this trend, since most applications are not multithreaded (that is, not optimized to use more than one core) and the efficient utilization of multiple cores is still a topic of active research in computer science. What is more, higher-end, server-grade CPUs are cost-prohibitive for any medium-sized research institute or biotech company.

Thus, it is becoming obvious that we are steadily moving towards a plateau in microprocessor capabilities and, consequently, a stalemate in our ability to analyze the unrelenting wave of complex biological data.

Enter general-purpose computing on graphics processing units (GPGPU), spearheaded by NVIDIA and its pivotal Compute Unified Device Architecture (CUDA) platform.

Advent of GPUs and CUDA Appearance Evolution

It all started in the mid-1990s with the appearance of the first GPUs, whose sole purpose was 3D graphics acceleration, mainly for use in computer games. NVIDIA produced its first graphics accelerator in 1999. These units contained a single processing unit, were extremely limited in both rendering capabilities and raw processing power, and were cumbersome to write code for. But soon, market demand for high-definition content and photorealistic graphics transformed these first attempts into highly parallel, multithreaded computing powerhouses with many cores and unprecedented memory bandwidth. At the same time, NVIDIA coined the term GPU, and the first attempts to utilize the raw potential of these cards for scientific applications other than graphics rendering began to appear. This was the dawn of the era of general-purpose computation on GPUs, or GPGPU.

Unfortunately, programming on these platforms was still very challenging, requiring scientists to formulate their algorithms in terms of geometrical shapes such as triangles and polygons. At the same time, the available tools offered only limited access to the full computing capabilities of the GPUs. NVIDIA was the first company to realize the potential of this technology and invested heavily in creating fully programmable GPUs and developing intuitive software and hardware tools. This effort culminated in November 2006 with the introduction of CUDA, a general-purpose parallel computing platform and programming model that comes with a software environment based on the C programming language.
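
To give a flavour of this C-based programming model, the following is a minimal sketch, not taken from the article or from any of the reviewed tools, of how a simple bioinformatics task might be expressed in CUDA: counting the G and C bases in a DNA sequence. Each thread performs the same test on a different base. The names count_gc_kernel and count_gc, the block size of 256 threads and the use of a single atomic counter are illustrative assumptions; a production implementation would more likely use a block-level reduction.

```cuda
#include <cuda_runtime.h>

// Kernel: every thread inspects exactly one base of the sequence, i.e. the
// same calculation applied to a different element of the data set.
__global__ void count_gc_kernel(const char *seq, int n, int *count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n && (seq[i] == 'G' || seq[i] == 'C')) {
        atomicAdd(count, 1);  // safe concurrent update of the shared counter
    }
}

// Hypothetical host-side helper (an assumption of this sketch): copies the
// sequence to the GPU, launches the kernel and returns the number of G/C
// bases, or -1 if any CUDA call fails.
int count_gc(const char *seq, int n)
{
    if (n <= 0) return 0;

    char *d_seq = nullptr;
    int *d_count = nullptr;
    int h_count = 0;

    if (cudaMalloc((void **)&d_seq, n) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_count, sizeof(int)) != cudaSuccess) {
        cudaFree(d_seq);
        return -1;
    }
    cudaMemcpy(d_seq, seq, n, cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));

    // Launch enough 256-thread blocks to cover all n bases.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    count_gc_kernel<<<blocks, threads>>>(d_seq, n, d_count);
    cudaError_t err = cudaGetLastError();  // reports launch failures

    // The blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) {
        err = cudaMemcpy(&h_count, d_count, sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_seq);
    cudaFree(d_count);
    return err == cudaSuccess ? h_count : -1;
}
```

Under these assumptions, calling count_gc("GATTACA", 7) on a machine with a CUDA-capable GPU should return 2. For a sequence of millions of bases the same kernel is simply launched with more blocks, which is the kind of scaling across cores that the data-parallel model is meant to provide.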

GPU computing momentum is growing faster than ever before. With over 375 million CUDA-enabled GPUs installed in a wide range of computers, today CUDA is the dominant platform for GPGPU. It is the preferred method for affordable High Performance Computing (HPC) and is used by thousands of researchers around the world in computationally intensive projects that wouldn’t be possible using conventional CPUs.
