Dis2PPI: A Workflow Designed to Integrate Proteomic and Genetic Disease Data

Daniel Luis Notari (Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil), Samuel Brando Oldra (Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil), Mauricio Adami Mariani (Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil), Cristian Reolon (Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil) and Diego Bonatto (Centro de Biotecnologia da UFRGS, Departamento de Biologia Molecular e Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil)
Copyright: © 2012 | Pages: 19
DOI: 10.4018/jkdb.2012070104


Experiments in bioinformatics are based on protocols that employ different steps for data mining and data integration; such protocols are collectively known as computational workflows. Given the widespread use of databases in the biomedical sciences, software that is able to query multiple databases is desirable. Systems biology, which encompasses the design of interactomic networks to understand complex biological processes, can benefit from computational workflows. Unfortunately, the use of computational workflows in systems biology is still very limited, especially for applications associated with the study of disease. To address this limitation, we designed Dis2PPI, a workflow that integrates information retrieved from genetic disease databases and interactomes. Dis2PPI extracts protein names from a disease report and uses this information to mine protein-protein interaction (PPI) networks. The data gathered from this mining can be used in systems biology analyses. To demonstrate the functionality of Dis2PPI for such analyses, we mined information about xeroderma pigmentosum and Cockayne syndrome, two monogenic diseases that lead to neurodegeneration and, when patients are exposed to sunlight, to skin cancer.
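
The two Dis2PPI steps named in the abstract (extracting protein names from a disease report, then using them to mine a PPI network) can be pictured with the minimal Python sketch below. It is not the authors' implementation: it assumes a hypothetical in-memory symbol dictionary (KNOWN_SYMBOLS), a toy edge list whose partner names and scores are made up (TOY_EDGES), and simple whole-token matching in place of whatever extraction method Dis2PPI actually uses.

```python
import re

# Hypothetical symbol dictionary; the entries are gene symbols associated with
# xeroderma pigmentosum and Cockayne syndrome, used here only for illustration.
KNOWN_SYMBOLS = {"XPA", "XPC", "ERCC2", "ERCC3", "ERCC6", "ERCC8", "DDB2", "POLH"}

# Toy interaction list (protein_a, protein_b, confidence); partners and scores are made up.
TOY_EDGES = [
    ("XPA", "PARTNER_1", 0.95),
    ("XPC", "PARTNER_2", 0.40),
    ("ERCC6", "PARTNER_3", 0.80),
    ("PARTNER_4", "PARTNER_5", 0.99),
]


def extract_protein_names(report: str, known: set[str]) -> set[str]:
    """Return the dictionary symbols that occur as whole tokens in the report."""
    tokens = re.findall(r"[A-Za-z0-9]+", report)
    return {tok.upper() for tok in tokens if tok.upper() in known}


def mine_ppi(seeds: set[str], edges, min_score: float):
    """Keep interactions that touch at least one seed protein and pass the score cutoff."""
    return [
        (a, b, score)
        for a, b, score in edges
        if score >= min_score and (a in seeds or b in seeds)
    ]


report = "Patients carry mutations in XPA or ERCC6; XPC is also implicated."
seeds = extract_protein_names(report, KNOWN_SYMBOLS)
network = mine_ppi(seeds, TOY_EDGES, min_score=0.5)
print(sorted(seeds))  # ['ERCC6', 'XPA', 'XPC']
print(network)        # [('XPA', 'PARTNER_1', 0.95), ('ERCC6', 'PARTNER_3', 0.8)]
```

Matching whole tokens against a dictionary avoids false hits on ordinary words such as "in" or "is"; a production extractor would also need to resolve synonyms and aliases.
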
Article Preview


Less than two percent of the human genome encodes proteins, and more than fifty percent consists of repeated sequences (transposons, or movable DNA sequences) (Lehninger et al., 2005). Much research therefore remains to be done to characterize DNA sequences and proteins that are still undefined. Biological databases are important tools for bioinformatics analyses such as these. All biological databases are built taking into consideration the amount of information generated, its logical structure, and the diversity of tools used to access and analyze the stored information (Lesk, 2008). Thus, any bioinformatics application must be able to share and/or integrate data from more than one biological database. To perform this task, the data retrieved from the first database must be read and processed, the output must be usable as input for a second database, and all results must be collected. This process is repeated for each additional database, resulting in time-consuming data mining and analysis whose results are not always satisfactory.
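
The chained retrieval described above, in which the output of one database becomes the input for the next and every intermediate result is kept, can be sketched as follows. This is a minimal illustration assuming each database is represented by a hypothetical in-memory lookup table (db_a, db_b, db_c); in practice each stage would be a remote query with its own input and output format.

```python
from typing import Callable

# A stage takes a list of identifiers and returns the identifiers they map to.
Stage = Callable[[list[str]], list[str]]


def lookup(table: dict[str, list[str]]) -> Stage:
    """Wrap a dictionary as a stage that maps each key to its values, without duplicates."""
    def stage(keys: list[str]) -> list[str]:
        out: list[str] = []
        for key in keys:
            for value in table.get(key, []):
                if value not in out:
                    out.append(value)
        return out
    return stage


def chain_queries(seed: list[str], stages: list[tuple[str, Stage]]) -> dict[str, list[str]]:
    """Feed each stage's output into the next and collect every intermediate result."""
    collected = {"input": list(seed)}
    current = list(seed)
    for name, stage in stages:
        current = stage(current)
        collected[name] = current
    return collected


# Hypothetical tables standing in for three different databases.
db_a = {"disease_X": ["gene_1", "gene_2"]}
db_b = {"gene_1": ["protein_1"], "gene_2": ["protein_2"]}
db_c = {"protein_1": ["protein_3"], "protein_2": ["protein_3", "protein_4"]}

results = chain_queries(
    ["disease_X"],
    [("db_a", lookup(db_a)), ("db_b", lookup(db_b)), ("db_c", lookup(db_c))],
)
print(results["db_c"])  # ['protein_3', 'protein_4']
```

Adding a further database means appending one more (name, stage) pair, so the manual repetition described in the text becomes a loop.
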

However, the diversity of available tools and the unique logical structure of each database make the simultaneous retrieval of information from multiple sources a difficult task for the user. Two important biological databases containing useful information are (i) Online Mendelian Inheritance in Man (OMIM), a catalog of human genes and genetic disorders, and (ii) the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; http://string-db.org/), a database of known and predicted protein-protein interactions from a large number of organisms, including direct (physical) and indirect (functional) associations (Jensen et al., 2009). Unfortunately, the crosstalk between these two databases is very limited, although the information they contain is related. Integrating such heterogeneous sources is a multistep task characteristic of a workflow process.
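
The mismatch in logical structure can be made concrete with a small sketch that joins a disease record, which stores gene symbols in one text field, with an interaction export keyed by protein identifiers. Everything here is hypothetical: the record layout, the tab-separated columns (protein1, protein2, score), and the alias table are illustrative stand-ins and do not reproduce the actual OMIM or STRING formats.

```python
import csv
import io

# Hypothetical disease record: gene symbols stored as one comma-separated field.
disease_record = {"title": "Example disorder", "genes": "gene1, GENE2"}

# Hypothetical interaction export: tab-separated protein identifiers and a score.
interaction_tsv = (
    "protein1\tprotein2\tscore\n"
    "P001\tP002\t900\n"
    "P001\tP003\t400\n"
    "P004\tP005\t700\n"
)

# Hypothetical alias table linking gene symbols to the interaction source's identifiers.
alias_to_id = {"GENE1": "P001", "GENE2": "P002", "GENE3": "P003"}


def genes_from_record(record: dict) -> list[str]:
    """Split the free-text gene field into normalized (upper-case) symbols."""
    return [g.strip().upper() for g in record["genes"].split(",") if g.strip()]


def edges_from_tsv(text: str) -> list[tuple[str, str, int]]:
    """Parse the tab-separated export into (id_a, id_b, score) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(row["protein1"], row["protein2"], int(row["score"])) for row in reader]


def integrate(record: dict, tsv_text: str, aliases: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return interactions touching the record's genes, reported with gene symbols."""
    id_to_symbol = {pid: sym for sym, pid in aliases.items()}
    wanted = {aliases[g] for g in genes_from_record(record) if g in aliases}
    merged = []
    for a, b, score in edges_from_tsv(tsv_text):
        if a in wanted or b in wanted:
            merged.append((id_to_symbol.get(a, a), id_to_symbol.get(b, b), score))
    return merged


print(integrate(disease_record, interaction_tsv, alias_to_id))
# [('GENE1', 'GENE2', 900), ('GENE1', 'GENE3', 400)]
```

The alias table is the piece that supplies the missing crosstalk: without it, the symbols in the disease record and the identifiers in the interaction export cannot be joined.
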

In many bioinformatics experiments, workflows are used to formalize a protocol: they establish the steps that must be taken and the order in which the user must carry them out. Automating the organization of these steps is therefore desirable in many research areas within bioinformatics and the biological sciences. Peng et al. (2009) and Downing et al. (2009) have used computational workflows to define the steps of biological experiments that use databanks; the results of these experiments were then used to query databases. For example, Peng et al. (2009) developed workflow steps that examine post-transcriptional gene regulation at the miRNA level in complex human diseases, focusing on liver tissue biopsies with or without hepatitis C virus infection. Workflows can also be applied to population health analyses, as demonstrated by Downing et al. (2009). Other examples include the creation of databases that query heterogeneous data sources and extract useful disease information in dynamic neuroscience (Cheung et al., 2009) and genomics (Nuzzo and Riva, 2009), as well as for computerized clinical guidelines (Damiani et al., 2010). Workflows have also been used to analyze microscopy images, enabling multiple classifications and the elucidation of complex phenotypes (Misselwitz et al., 2010).
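
The idea of a workflow as a set of steps with a required execution order can be sketched with Python's standard graphlib module, whose TopologicalSorter orders steps so that each runs only after its prerequisites. The step names (fetch, expand, count) and their toy functions are hypothetical; they illustrate the ordering only and do not model any of the cited systems.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Each step receives the results of earlier steps and returns its own result.
Step = Callable[[dict[str, Any]], Any]


def run_workflow(steps: dict[str, Step], depends_on: dict[str, set[str]]):
    """Run steps in an order that respects dependencies; raises graphlib.CycleError on cycles."""
    order = list(TopologicalSorter(depends_on).static_order())
    results: dict[str, Any] = {}
    for name in order:
        results[name] = steps[name](results)
    return order, results


# Hypothetical three-step protocol: retrieve identifiers, expand them, summarize.
steps = {
    "fetch": lambda r: ["A", "B"],
    "expand": lambda r: r["fetch"] + ["C"],
    "count": lambda r: len(r["expand"]),
}
depends_on = {"fetch": set(), "expand": {"fetch"}, "count": {"expand"}}

order, results = run_workflow(steps, depends_on)
print(order)             # ['fetch', 'expand', 'count']
print(results["count"])  # 3
```

Declaring dependencies rather than a fixed sequence lets the engine reject circular protocols automatically and reorder independent steps without changing the step definitions.
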

Many applications that use web services to extract and integrate data from biological databases (e.g., genomic or proteomic databases) could also use workflows to define their main computational processing steps. Silva et al. (2006) defined a data-source-independent workflow management engine based on a set of scientific web services, a set of template data web services, and a database manager. Bisognin et al. (2009) developed workflows for data management (downloading, importing, extracting, integrating, and normalizing), data and metadata annotation assignment, queries, and custom and meta-analyses of gene expression datasets.
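
A data-source-independent engine, loosely in the spirit of the design attributed to Silva et al. (2006), can be approximated by coding the engine against a small interface that every source implements. The sketch below is an illustration under stated assumptions, not their architecture: DataSource, InMemorySource, and WorkflowEngine are hypothetical names, and in-memory records stand in for the web services they describe.

```python
from typing import Protocol


class DataSource(Protocol):
    """Any object with a name and a fetch(query) method can act as a source."""
    name: str

    def fetch(self, query: str) -> list[dict]: ...


class InMemorySource:
    """Hypothetical stand-in for a web service: filters local records by keyword."""

    def __init__(self, name: str, records: list[dict]):
        self.name = name
        self._records = records

    def fetch(self, query: str) -> list[dict]:
        return [r for r in self._records if query in r.get("keywords", [])]


class WorkflowEngine:
    """Runs the same query against every registered source and tags each result."""

    def __init__(self) -> None:
        self._sources: list[DataSource] = []

    def register(self, source: DataSource) -> None:
        self._sources.append(source)

    def run(self, query: str) -> list[dict]:
        merged = []
        for source in self._sources:
            for record in source.fetch(query):
                merged.append({"source": source.name, **record})
        return merged


engine = WorkflowEngine()
engine.register(InMemorySource("genomic", [{"id": "g1", "keywords": ["dna repair"]}]))
engine.register(InMemorySource("proteomic", [{"id": "p1", "keywords": ["dna repair"]},
                                             {"id": "p2", "keywords": ["transport"]}]))
for row in engine.run("dna repair"):
    print(row["source"], row["id"])
```

Supporting a new repository only requires another class with the same two members; the engine itself does not change, which is the sense in which it is independent of the data source.
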
