Parallel Object Compositions for the Search of Sequences DNA Strings in the Construction of Gnomes

Mario Rossainz López (Benemérita Universidad Autónoma de Puebla, Puebla, Mexico), Ivo H. Pineda-Torres (Benemérita Universidad Autónoma de Puebla, Puebla, Mexico), Ivan Olmos Pineda (Benemérita Universidad Autónoma de Puebla, Puebla, Mexico) and José Arturo Olvera López (Benemérita Universidad Autónoma de Puebla, Puebla, Mexico)
DOI: 10.4018/IJPHIM.2019010102

Abstract

Within an environment of parallel objects, this article combines structured parallel programming with the object-oriented paradigm into a programming method based on High-Level Parallel Compositions (HLPCs). The method is applied to two combinatorial optimization problems: grouping fragments of DNA sequences, and the parallel exhaustive search (PES) of RNA strings, both of which support the sequencing and assembly of DNA. The pipeline and farm models are presented as HLPCs under the object-oriented paradigm, and a new HLPC is proposed that combines and reuses them to solve the cited problems. Each proposed HLPC contains a set of predefined synchronization constraints between processes and uses synchronous, asynchronous and asynchronous-future modes of communication. The article presents the algorithms that solve the problems, their design and implementation as HLPCs, and the performance metrics of their parallel execution on multicore processors and a video accelerator card.

1. Introduction

Countless applications try to obtain the maximum performance from a single-processor machine when solving a problem. When such a system cannot provide the required performance (Capel and Troya, 1994), one possible solution is to opt for parallel or concurrent applications, architectures and processing structures. Parallel processing is therefore an alternative to sequential processing once the performance limit of a system has been reached. In sequential computation a processor carries out only one operation at a time, whereas in parallel computation several processors can cooperate to solve a given problem, which reduces the computation time because several operations are carried out simultaneously. From a practical point of view, research in parallel processing and related areas (concurrency, distributed systems, real-time systems, etc.) is well justified today, since it is made possible by recent advances in massively parallel systems, high-bandwidth communications, fast signal processors, etc. An important part of that research concerns the parallel algorithms, methodologies and parallel programming models currently under development. The present research uses structured parallel programming through a POSIX thread library as a methodological programming proposal based on the pattern of High-Level Parallel Compositions, or HLPCs (Corradi and Leonardi, 1991), (Danelutto and Orlando, 1995), which relies on the object-oriented paradigm to solve parallelizable problems using a class of concurrent active objects.
This work supplies a library of classes that provides the programmer with the communication/interaction patterns most commonly used in parallel programming, in particular the pipeline pattern and the pattern known as the farm. With them, problems such as the assembly of DNA strings and the Parallel Exhaustive Search (PES) of RNAi strings proposed in this paper can be solved. Solving these types of problems has become indispensable in biological research and in many fields such as medical diagnosis, biotechnology, forensic biology, virology, applied biology and bioinformatics, among others. The first problem is a combinatorial optimization problem for which diverse heuristics and metaheuristics have been proposed to assemble sequences of DNA strings and to provide essential information for understanding species and their mechanisms of life, including the human species. This work shows the implementation, as an HLPC, of a grouping algorithm that evaluates a set of DNA sequence fragments. The HLPC represents a Farm whose worker processes are themselves Pipeline HLPCs. The algorithm determines subgroups of fragments according to the DNA sequence matches found; fragments in the same subgroup have a high probability of being aligned in an assembly task. Each worker process of the HLPC Farm runs in parallel with the other worker processes and receives a group of DNA sequence fragments, which is built internally as a graph represented through the HLPC Pipeline. A depth-first search over this graph generates the new groups of DNA sequences, which must then be processed by some assembly technique to form the contigs of a genome that has been sequenced covering most of its structure but is still missing a fragment. Finally, the design of an experiment is shown that uses the newly generated HLPC, called HLPC GraphADN, with genomes of viruses and bacteria available on the web.
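The grouping scheme described above (a farm whose workers each run a pipeline that builds a graph and explores it depth-first) can be illustrated with a minimal Python sketch. It is not the authors' HLPC library: every function name is hypothetical, the match criterion (two fragments sharing at least one substring of length k) is an assumption, the pipeline stages run one after another inside each worker rather than concurrently, and Python threads stand in for the POSIX threads mentioned in the text.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def kmer_index(fragments, k):
    """Stage 1: map every substring of length k to the fragments containing it."""
    index = defaultdict(set)
    for frag_id, seq in enumerate(fragments):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return fragments, index


def build_graph(stage_input):
    """Stage 2: link two fragments when they share at least one k-mer (assumed criterion)."""
    fragments, index = stage_input
    graph = {frag_id: set() for frag_id in range(len(fragments))}
    for ids in index.values():
        for a in ids:
            graph[a].update(ids - {a})
    return fragments, graph


def depth_first_groups(stage_input):
    """Stage 3: collect connected components with an iterative depth-first search."""
    fragments, graph = stage_input
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(fragments[node])
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


def pipeline_worker(fragments, k):
    """One farm worker: the three stages applied in order to its batch."""
    return depth_first_groups(build_graph(kmer_index(fragments, k)))


def farm_group(batches, k, workers=4):
    """Farm: hand each batch to a worker and gather the results in batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda batch: pipeline_worker(batch, k), batches))
```

For example, `farm_group([["ACGTAC", "GTACGG", "TTTTTT"], ["CCCAAA", "AAAGGG"]], k=3)` places "ACGTAC" and "GTACGG" in one group, leaves "TTTTTT" alone, and groups "AAAGGG" with "CCCAAA". The sketch only shows the structure of the composition; because of Python's global interpreter lock, threads here do not give the speedups reported for the real implementation.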
The pseudo-random synthetic reads created to form contigs are shown, and the execution performance of this proposal is reported for eight genomes on an Intel Core i8 processor and a video accelerator card with 1664 CUDA cores and a clock frequency of 1178 MHz. For the second problem, a Parallel Exhaustive Search (PES) of RNAi strings is implemented through an HLPC using the interprocess communication pattern called Farm. The authors propose a new HLPC model, called HLPC ARNi, based on the HLPC Farm, which is part of a parallel object library whose details can be consulted in (Rossainz and Capel, 2008), (Rossainz and Capel, 2012). Through the controller object of its Farm, the HLPC ARNi pre-processes the input data by joining all the input RNAi strings into a single string, including the string that contains the characteristics of the organism being analyzed. The controller then distributes the constructed string to the farm worker objects within the HLPC ARNi so that they, in parallel, perform the Parallel Exhaustive Search (Blelloch, 1996), (Kumar, 1994), find the matches based on a pre-established substring length, and return the search results. An experiment was designed using the HLPC ARNi with the RNAi strings database on the PomBase site for the yeast species S. pombe, with different substring lengths for the search. Execution times and performance analysis results were obtained using 3 to 8 cores of a dual-core Intel server machine.
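The controller/worker division of the exhaustive search can be sketched as follows. This is an illustration under stated assumptions, not the HLPC ARNi itself: the function names, the "#" separator used when joining the strings, the round-robin split of window positions among workers, and the (substring, position) result format are all hypothetical choices, and Python threads stand in for the farm worker objects.

```python
from concurrent.futures import ThreadPoolExecutor

SEPARATOR = "#"  # assumption: a symbol that never occurs in the RNA alphabet


def controller_join(organism, rnai_strings):
    """Controller pre-processing: one string holding the organism and all RNAi strings."""
    return SEPARATOR.join([organism, *rnai_strings])


def worker_search(joined, organism_len, starts, length):
    """Worker: compare each assigned window against every organism position."""
    matches = []
    for start in starts:
        window = joined[start:start + length]
        if len(window) < length or SEPARATOR in window:
            continue  # window runs off the end or crosses a string boundary
        for pos in range(organism_len - length + 1):
            if joined[pos:pos + length] == window:
                matches.append((window, pos))
    return matches


def farm_search(organism, rnai_strings, length, workers=4):
    """Farm: the controller joins the input, splits the work and merges the results."""
    joined = controller_join(organism, rnai_strings)
    first = len(organism) + 1  # first index after the organism and its separator
    starts = range(first, len(joined))
    # Round-robin split of the window start positions among the workers.
    chunks = [starts[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda chunk: worker_search(joined, len(organism), chunk, length),
            chunks,
        )
        # Deduplicate identical windows coming from different RNAi strings.
        return sorted({match for part in parts for match in part})
```

With the made-up input `farm_search("AUGGCUAUGG", ["GGCU", "UAUG"], 3)`, the result lists each RNAi substring of length 3 together with the organism positions where it occurs, for instance "AUG" at positions 0 and 6. Varying `length` corresponds to the different substring lengths used in the experiment, and varying `workers` to the number of cores.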
