PheGee@Home: A Grid-Based Tool for Comparative Genomics

Bertil Schmidt, Chen Chen, Weiguo Liu, Wayne P. Mitchell
DOI: 10.4018/978-1-4666-0879-5.ch809

Abstract

In this chapter we present PheGee@Home, a grid-based comparative genomics tool that nominates candidate genes responsible for a given phenotype. A phenotype is the physical manifestation of the interplay of genetic, epigenetic, and environmental factors. Our tool is designed to facilitate the discovery and prioritization of candidate genes that control or contribute to the genetically determined portion of a specified phenotype. However, reliable nomination of candidate genes from sequence data requires several genome-size sequence datasets. On traditional computer architectures this leads to prohibitively long runtimes, which makes the approach impractical. We therefore use a computational architecture based on a desktop grid environment and commodity graphics hardware to significantly accelerate PheGee. We validate this approach by deploying and evaluating it on a grid testbed for the comparison of microbial genomes.

Introduction

High-throughput techniques for DNA sequencing have led to an enormous growth in the amount of publicly available genomic data. As of February 2008, 716 complete genome sequences are available and another 2,756 genome-sequencing projects are in progress (www.genomesonline.org). As the sequences of more and more genomes become available, we have reached a critical mass where, instead of focusing on a subset of sequences, we can use entire genome data sets to derive global inferences and metadata. Comparative genomics refers to the study of relationships between the genomes of different species or strains. Its current uses include ortholog detection (Itoh, Goto, Akutsu & Kanehisa, 2005), bacterial pharmacogenomics (Fraser et al., 2000), and clustering of similar protein sequences (Itoh, Akutsu & Kanehisa, 2004). Unfortunately, comparative genomics applications are highly computationally intensive because of the large sequence data sets involved, and they typically take a few months to complete. These runtime requirements are likely to become even more severe as genomic databases continue to grow rapidly.

The objectives of this chapter are therefore two-fold:

  1. The presentation of a new comparative genomics tool called PheGee (Phenotype Genotype Explorer). PheGee nominates candidate genes responsible for a certain phenotype π given genomic sequence datasets of phenotype positive (π+) and phenotype negative (π−) species.

  2. The proposition of a hybrid computational grid platform to accelerate PheGee.
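
To make the first objective concrete, the input/output relationship can be illustrated with a toy example. This preview does not describe how PheGee actually scores candidates, so the following is only a minimal sketch under a simplifying assumption: a gene family is a better candidate the more often it occurs in π+ genomes and the less often it occurs in π− genomes. The function name `nominate_candidates`, the presence/absence input, and the scoring rule are all hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Set


def nominate_candidates(
    gene_families: Dict[str, Set[str]],
    positive: Set[str],
    negative: Set[str],
) -> List[str]:
    """Rank gene families by how well their presence tracks a phenotype.

    gene_families maps a (hypothetical) family id to the set of genomes
    that contain at least one member of that family.
    positive / negative are the phenotype-positive and phenotype-negative
    genome ids. The score below is an illustrative assumption, not the
    PheGee method.
    """
    if not positive or not negative:
        raise ValueError("need at least one positive and one negative genome")

    scored = []
    for family, genomes in gene_families.items():
        pos_frac = len(genomes & positive) / len(positive)
        neg_frac = len(genomes & negative) / len(negative)
        score = pos_frac - neg_frac
        if score > 0:  # keep only families enriched in positive genomes
            scored.append((score, family))

    # Highest score first; ties broken by family id for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [family for _, family in scored]


# Toy data: three phenotype-positive and two phenotype-negative genomes.
example_positive = {"A", "B", "C"}
example_negative = {"D", "E"}
example_families = {
    "fam1": {"A", "B", "C"},            # only in positive genomes
    "fam2": {"A", "B", "C", "D", "E"},  # present everywhere
    "fam3": {"A", "B", "D"},            # weakly enriched in positives
    "fam4": {"D"},                      # only in a negative genome
}
example_ranking = nominate_candidates(
    example_families, example_positive, example_negative
)
```

In this toy data, the family found only in π+ genomes ranks first, and the family present in every genome is dropped because it carries no information about the phenotype. Building the presence/absence table in the first place requires comparing whole genomes against each other, which is the expensive step that motivates the second objective.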

The proposed hybrid grid architecture efficiently combines desktop grid computing with GPGPU (General-Purpose computation on Graphics Processing Units). The motivation behind this architecture is its price/performance ratio. Using desktop grids, as in the volunteer computing approach, is currently one of the simplest and most efficient ways to obtain supercomputer-level power at a reasonable price. Adding massively parallel processor boards, such as modern graphics cards, to each desktop can improve the price/performance ratio significantly further. We show how this architecture can be used to accelerate PheGee efficiently. Moreover, the proposed grid approach is flexible and is therefore applicable to a variety of compute-intensive genomics applications.
