Application of Uncertainty Models in Bioinformatics

B.K. Tripathy (VIT University, India), R.K. Mohanty (VIT University, India) and Sooraj T. R. (VIT University, India)
Copyright: © 2019 | Pages: 15
DOI: 10.4018/978-1-5225-8903-7.ch006

Abstract

This chapter reviews research in the life sciences and biomedical informatics that has been advanced by uncertainty models. Its main emphasis is to present the timeline of the different uncertainty models used to handle uncertain information, together with their applications in various fields of biology. Many mathematical models exist for handling vague data and uncertain information, such as probability theory, fuzzy set theory, rough set theory and soft set theory. Literature from the life sciences and bioinformatics is reviewed, and experimental and theoretical results are presented to illustrate the applications of uncertainty models in bioinformatics.

Introduction

Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied to gene-based drug discovery and development. The need for Bioinformatics capabilities has been precipitated by the explosion of publicly available genomic information resulting from the Human Genome Project.

The goal of this project, determination of the sequence of the entire human genome (approximately three billion base pairs), was reached in 2003. The science of Bioinformatics, which is the melding of molecular biology with computer science, is essential to the use of genomic information in understanding human diseases and in the identification of new molecular targets for drug discovery.

Biotechnology and bioinformatics are often treated as synonymous, but they are distinct. Bioinformatics combines molecular biology, computer science, mathematics, statistics and engineering to store, maintain, organize, process and analyze biological and chemical data in order to advance medicine and healthcare. Biotechnology, by contrast, brings together the biological sciences and engineering technologies to manipulate living organisms and biological systems, yielding products that advance healthcare, medicine, agriculture, food, pharmaceuticals and environmental control.

Figure 1. The cost of sequencing fell from $100,000,000 per genome in 2001 to $10,000 per genome in 2011. It is estimated to fall to $2,000 per genome within the next few years. (Courtesy of the National Human Genome Research Institute)

This analytical branch of genomic research mines large sets of data to answer new research questions and throw light on older ones. Bioinformatics analysis will support the next revolution in genomic science to address fundamental areas of natural history research (Schuh, 2005) including:

  • Basic investigations of the phylogenetic relatedness of all life.

  • Tracing the geographic distribution of biodiversity across varied environments and regions.

  • Unravelling the developmental history of organisms from initial embryonic cellular stages of life to the functional complexity of mature multi-cellular individuals.

  • Defining population-level processes of natural selection.

  • Initiating landscape-based environmental comparisons of varied habitats around the world.

  • Speeding up species discovery.

Bioinformatics involves the manipulation, searching and data mining of DNA sequence data. The development of techniques to store and search DNA sequences (Moein et al., 2008) has led to widely applied advances in computer science, especially in string searching algorithms, machine learning and database theory. In other applications, such as text editors, even simple string searching algorithms usually suffice, but DNA sequences cause these algorithms to exhibit near-worst-case behaviour because of their small number of distinct characters. Data sets representing entire genomes' worth of DNA sequences, such as those produced by the Human Genome Project (Park et al., 2008), are difficult to use without annotations, which label the locations of genes and regulatory elements on each chromosome. Regions of DNA sequence that have the characteristic patterns associated with protein-coding or RNA-coding genes can be identified by gene finding algorithms (Samatsu et al., 2008), which allow researchers to predict the presence of particular gene products in an organism even before they have been isolated experimentally.
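As a rough illustration of the remark about small alphabets, the following Python sketch counts the character comparisons made by a naive left-to-right string search. The function name naive_search and both example inputs are assumptions introduced here for illustration; they are not taken from the cited works, and the repetitive DNA-like string is deliberately extreme rather than a real sequence.

```python
def naive_search(text, pattern):
    """Naive string search.

    Returns a tuple (positions, comparisons): the start indices of every
    match and the total number of character comparisons performed.
    """
    positions = []
    comparisons = 0
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        matched = 0
        # Compare the pattern against the text at this alignment,
        # stopping at the first mismatch.
        while matched < m:
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break
            matched += 1
        if matched == m:
            positions.append(start)
    return positions, comparisons


# Hypothetical, highly repetitive DNA-like text (an assumption, not real data).
dna_text = "A" * 40 + "C"
dna_pattern = "AAAAC"

# Ordinary English text with a larger alphabet, for contrast.
english_text = "the quick brown fox jumps over the lazy dog"
english_pattern = "lazy"

dna_positions, dna_comparisons = naive_search(dna_text, dna_pattern)
eng_positions, eng_comparisons = naive_search(english_text, english_pattern)

print("DNA-like text:", dna_positions, dna_comparisons, "comparisons")
print("English text: ", eng_positions, eng_comparisons, "comparisons")
```

On the repetitive sequence, nearly every alignment is compared over the full length of the pattern before it fails, whereas in the English sentence most alignments are rejected after a single comparison. This is the kind of behaviour that motivates more sophisticated string searching algorithms for sequence data.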

Figure 2. DNA barcodes
