A Graphical Workflow Modeler for Docking Process in Drug Discovery

Qiang Wang, Yunming Ye, Kunqian Yu, Joshua Zhexue Huang
DOI: 10.4018/978-1-60566-374-6.ch015

Abstract

A drug discovery process aims to find, from a large set of molecules, the candidate leads that interact strongly with the target proteins. The process is characterized by complexity in both data and computation. Domain scientists therefore need a tool that simplifies the handling of intensive data and complex algorithms, so that they can build proper drug discovery procedures, carry out the data-intensive computation tasks, and produce useful results. This chapter presents a graphical workflow modeler that lets domain scientists perform drug discovery tasks on high-performance grid computing platforms. A client/server system is described as the platform for implementing the graphical workflow modeler. A case study on drug discovery for the avian influenza virus demonstrates the use of this tool in drug discovery research.

Introduction

The way scientific research is conducted has changed. Computational science, which over the last few decades focused on simulating complex phenomena, is giving way to e-science, or data-centric science, in which discoveries often result from data-intensive computation and analysis. Grid computing plays a key role here because it allows geographically distributed scientists to work collaboratively in a networked environment and to share resources and expertise in order to solve difficult scientific and engineering problems at large scale. Typical research areas include gravitational-wave science (Deelman, Kesselman, Mehta, Pearlman, Blackburn, Ehrens, Lazzarini, Williams, & Koranda, 2002), neuroscience (Lathers, Su, Kulungowski, Lin, Mehta, Peltier, Deelman, & Ellisman, 2006), astronomy (Katz, Anagnostou, Berriman, Deelman, Good, Jacob, Kesselman, Laity, Prince, Singh, Su, & Williams, 2006), and high energy physics (Deelman, Blythe, Gil, & Kesselman, 2004). All are characterized by complex research problems that involve huge amounts of data and require substantial computing resources to analyze.

Drug discovery is another area in which grid computing is used to carry out various computational processes, here for identifying compounds that are active against a given target from the large number of chemical compounds in molecule databases (Richards, 2002; Ren, Zhang, Wan, Huang, Xie, & Yang, 2006). In modern drug discovery, molecular docking is an important process that aims to determine, through a series of complex computations, the candidate leads that interact most strongly with the target. The docking process involves multiple steps, including 3D molecule modeling and representation, search of molecule databases for candidate molecules that best match the receptor structure, evaluation of candidate molecules with scoring functions, conformation determination, hit identification, and lead optimization. For each step, multiple techniques and algorithms have been developed, and different techniques are implemented in different systems for drug discovery research and applications. Using these techniques and algorithms, however, involves setting parameters, integrating data sources, and performing data analysis tasks. It is time-consuming and costly for domain scientists to learn and master multiple techniques and systems in order to solve drug discovery problems. Drug discovery practitioners therefore need an easy-to-use collaborative workflow modeler as part of their problem solving environment.

In this chapter, we present a software platform and a graphical workflow modeler for drug discovery. The goal of our work is to develop an effective and efficient system that helps drug discovery practitioners build docking processes for virtual high throughput screening (VHTS) (Yoon, 2005) of potential inhibitors, using drag-and-drop and connecting operations with a mouse in a graphical interface. In this graphical working environment, a docking process is represented as a workflow consisting of a set of function nodes connected in a directed acyclic graph (DAG). Each function node is an implementation of a particular algorithm that performs one docking function, such as search or scoring. Once a workflow is formed and all parameters of each function node are set, the workflow is executed by a workflow engine that schedules the execution of each function on the underlying computing infrastructure, e.g., a grid computing infrastructure (Foster & Kesselman, 1998; Berman, Hey, & Fox, 2003). Such a system can provide an efficient and easy way to model and manage the scientific processes of experimental investigation, evidence accumulation, and result validation. The processes themselves can then be modified, reused, and shared through collaboration among interdisciplinary scientists.
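
The idea of a workflow as a DAG of parameterized function nodes, executed in dependency order, can be illustrated with a minimal Python sketch. This is not the chapter's implementation: the `Workflow` class, the four node functions, and the scores are hypothetical and exist only for illustration, and the sketch runs nodes sequentially in one process rather than scheduling them on a grid.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """A toy workflow: function nodes connected in a directed acyclic graph."""

    def __init__(self):
        self.nodes = {}   # node name -> (callable, parameters)
        self.inputs = {}  # node name -> list of upstream node names

    def add_node(self, name, func, **params):
        """Register a function node together with its parameter settings."""
        self.nodes[name] = (func, params)
        self.inputs.setdefault(name, [])

    def connect(self, src, dst):
        """Add a directed link: the output of src becomes an input of dst."""
        self.inputs[dst].append(src)

    def run(self):
        """Run each node only after all of its upstream nodes have finished."""
        results = {}
        # static_order() raises graphlib.CycleError if the graph is not a DAG.
        for name in TopologicalSorter(self.inputs).static_order():
            func, params = self.nodes[name]
            upstream = [results[src] for src in self.inputs[name]]
            results[name] = func(*upstream, **params)
        return results


# Hypothetical function nodes. The scores are made-up numbers, not docking results.
def load_molecules():
    return {"mol_a": -7.2, "mol_b": -5.1, "mol_c": -9.4}


def search(molecules, max_candidates):
    # Stand-in for a database search: keep the first max_candidates entries.
    return dict(list(molecules.items())[:max_candidates])


def score(candidates):
    # In this toy example a lower score is treated as a stronger interaction.
    return sorted(candidates.items(), key=lambda item: item[1])


def pick_hits(ranked, top_n):
    return [name for name, _ in ranked[:top_n]]


wf = Workflow()
wf.add_node("load", load_molecules)
wf.add_node("search", search, max_candidates=3)
wf.add_node("score", score)
wf.add_node("hits", pick_hits, top_n=2)
wf.connect("load", "search")
wf.connect("search", "score")
wf.connect("score", "hits")

results = wf.run()
print(results["hits"])
```

Keeping the graph as a mapping from each node to its upstream nodes means the execution order is derived from the links rather than from the order in which nodes were added, which is roughly the separation between modeler and engine described above. A real engine would additionally dispatch independent nodes to different computing resources.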

Key Terms in this Chapter

Workflow Engine: A workflow engine is a software application that manages and executes a workflow process. The workflow engine interprets the tasks submitted to it, creates run-time instances of these tasks, and acts on them according to the defined application processes. In general, a workflow engine facilitates the flow of data, tasks, and control in a workflow application.

Workflow Modeler: Workflow modeler is a graphical tool that provides function nodes and connections for constructing the graphical model of a specific application process, i.e., the workflow.

Grid Computing: The term grid computing describes a distributed computing platform that integrates distributed computing resources, such as CPUs and data, to support computationally intensive and/or data-intensive scientific tasks. In general, grid computing is divided into two subtypes: the data grid and the computational grid. A data grid provides controlled sharing and management of large amounts of distributed data, while a computational grid acts as a “virtual supercomputer” composed of a network of loosely coupled computers acting in concert to perform very large tasks. What distinguishes grid computing from typical cluster computing systems is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Grids are also often constructed with the aid of general-purpose grid software libraries and middleware.

Workflow: A workflow is a model, or virtual representation, of real work that depicts a sequence of operations specific to a certain kind of processing. In an actual implementation, a workflow is a graphical representation of the specific application process, in which a node represents a function and a directed link from one node to another represents the data flow or control flow between the two nodes.

Drug Discovery: From a computer science perspective, drug discovery is a complex computational process by which drugs are discovered and/or designed based on knowledge of how disease and infection are controlled at the molecular and physiological level. The process of drug discovery involves the identification of candidates, synthesis, characterization, screening, and assays for therapeutic efficacy, most of which must deal with large amounts of data and computation. Once a compound has shown its value in these tests, it is regarded as a drug candidate for drug development.

E-Science: E-Science (or eScience) describes computation-intensive or data-intensive science that is carried out in highly distributed network environments, especially grid computing environments. The term was first presented in 1999 by John Taylor, the Director General of the United Kingdom’s Office of Science and Technology, to describe “the global collaboration in key areas of science, and the next generation of infrastructure that will enable it.”

Molecular Docking: Molecular docking is a method frequently used to predict the binding orientation of small-molecule drug candidates to their protein targets, which in turn is used to predict the affinity and activity of the small molecule.
