Application of Uncertainty Models in Bioinformatics

B.K. Tripathy (VIT University, India), R.K. Mohanty (VIT University, India) and Sooraj T.R. (VIT University, India)
DOI: 10.4018/978-1-5225-0427-6.ch009


This chapter surveys research in the life sciences and biomedical informatics that has been advanced by the use of uncertainty models. Its main emphasis is to present the general ideas behind, and a timeline of, the different uncertainty models used to handle uncertain information, together with their applications in various fields of biology. Many mathematical models exist for handling vague data and uncertain information, such as probability theory, fuzzy set theory, rough set theory and soft set theory. The chapter reviews the literature from the life sciences and bioinformatics and presents experimental and theoretical results that illustrate how uncertainty models are applied in bioinformatics.
Chapter Preview


Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied to gene-based drug discovery and development. The need for Bioinformatics capabilities has been precipitated by the explosion of publicly available genomic information resulting from the Human Genome Project.

The goal of this project, determining the sequence of the entire human genome (approximately three billion base pairs), was essentially achieved in 2003. The science of bioinformatics, which melds molecular biology with computer science, is essential for using genomic information to understand human diseases and to identify new molecular targets for drug discovery.

Biotechnology and bioinformatics are often treated as synonyms, but they are distinct. Bioinformatics combines molecular biology, computer science, mathematics, statistics and engineering to store, maintain, organize, process and analyze biological and chemical data in order to advance medicine and healthcare. Biotechnology, in contrast, brings together the biological sciences and engineering technologies to manipulate living organisms and biological systems to produce products that advance healthcare, medicine, agriculture, food, pharmaceuticals and environmental control.

Figure 1.

The cost of sequencing has fallen from $100,000,000 per genome in 2001 to $10,000 per genome in 2011, and it is estimated to fall to $2,000 per genome within the next few years.

Courtesy of the National Human Genome Research Institute
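The figures in Figure 1 amount to a 10,000-fold drop in cost over a decade. As a rough sketch, assuming purely for illustration that the decline followed a smooth exponential with a constant yearly rate, the average yearly reduction and the implied halving time can be worked out as follows:

```python
import math

# Cost figures from the Figure 1 caption (US$ per genome).
cost_2001 = 100_000_000
cost_2011 = 10_000
years = 2011 - 2001

# Total reduction factor over the decade.
fold_drop = cost_2001 / cost_2011

# Assumption: a constant yearly rate of decline (smooth exponential).
annual_factor = fold_drop ** (1 / years)
halving_time_years = years * math.log(2) / math.log(fold_drop)

print(f"{fold_drop:,.0f}-fold cheaper over {years} years")
print(f"about {annual_factor:.2f}x cheaper each year on average")
print(f"cost halves roughly every {halving_time_years * 12:.1f} months")
```

Under that simplifying assumption the cost falls by a factor of about 2.5 per year, i.e. it halves roughly every nine months.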

This analytical branch of genomic research mines large data sets to answer new research questions and shed light on older ones. Bioinformatics analysis will support the next revolution in genomic science by addressing fundamental areas of natural history research (Schuh, 2005), including:

  • Basic investigations of the phylogenetic relatedness of all life.

  • Tracing the geographic distribution of biodiversity across varied environments and regions.

  • Unravelling the developmental history of organisms from initial embryonic cellular stages of life to the functional complexity of mature multi-cellular individuals.

  • Defining population-level processes of natural selection.

  • Initiating landscape-based environmental comparisons of varied habitats around the world.

  • Speeding up species discovery.

Bioinformatics involves the manipulation, searching and data mining of DNA sequence data. The development of techniques to store and search DNA sequences (Moein, 2008) has led to widely applied advances in computer science, especially in string searching algorithms, machine learning and database theory. In other applications, such as text editors, even simple string searching algorithms usually suffice, but DNA sequences cause these algorithms to exhibit near-worst-case behaviour because DNA has so few distinct characters (only the four nucleotides A, C, G and T). Data sets representing entire genomes’ worth of DNA sequence, such as those produced by the Human Genome Project (Park, 2008), are difficult to use without annotations, which label the locations of genes and regulatory elements on each chromosome. Regions of DNA sequence that have the characteristic patterns associated with protein- or RNA-coding genes can be identified by gene finding algorithms (Samatsu, 2008), which allow researchers to predict the presence of particular gene products in an organism even before they have been isolated experimentally.
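To make the near-worst-case claim concrete, here is a minimal sketch using the simplest (naive) string search algorithm, instrumented to count character comparisons. The helper name `naive_search`, the random sequences and the patterns are illustrative assumptions; practical sequence-search tools rely on far more sophisticated indexing. With only four letters, partial matches occur far more often than in 26-letter text, so each alignment costs more comparisons, and on highly repetitive DNA the algorithm approaches its worst case of roughly (n − m + 1) · m comparisons.

```python
import random


def naive_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Return every start position of `pattern` in `text` and the number of
    character comparisons the naive algorithm performed."""
    positions: list[int] = []
    comparisons = 0
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):          # try every alignment
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break                    # mismatch: shift the pattern by one
            j += 1
        if j == m:
            positions.append(i)
    return positions, comparisons


random.seed(0)  # synthetic, reproducible data
dna = "".join(random.choices("ACGT", k=100_000))
latin = "".join(random.choices("abcdefghijklmnopqrstuvwxyz", k=100_000))

_, dna_cmp = naive_search(dna, "ACGTACGTAC")
_, latin_cmp = naive_search(latin, "qwertyuiop")
alignments = 100_000 - 10 + 1
print(f"DNA:   {dna_cmp / alignments:.2f} comparisons per alignment")
print(f"Latin: {latin_cmp / alignments:.2f} comparisons per alignment")

# Repetitive sequence: nine matches, then a mismatch, at every alignment.
hits, worst = naive_search("A" * 1000, "A" * 9 + "C")
print(hits, worst)  # [] 9910
```

For uniformly random text the expected cost per alignment is about 1.33 comparisons with four letters versus about 1.04 with 26, and the repetitive case pays the full pattern length at every position.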

DNA barcodes (Figure 2) consist of a standardized short DNA sequence (400-800 bp) that, in principle, can be easily generated and characterized for every species on the planet. In Figure 2, the barcode of the Blue and Yellow Macaw (Ara ararauna) is rendered with greens, reds and blues representing the nucleotide bases. Image of Ara ararauna, the Blue and Yellow Macaw, by Luc Viatour, courtesy of EOL. Image of the Blue and Yellow Macaw barcode courtesy of CBOL.
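Species identification with barcodes typically compares a query barcode against a library of reference barcodes. The sketch below assumes the sequences are already aligned and uses the simplest measure, the uncorrected p-distance (the fraction of differing sites), with nearest-neighbour assignment. The sequences and species labels are hypothetical and far shorter than real 400-800 bp barcodes, and practical studies often use model-corrected distances such as Kimura's two-parameter distance instead.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def assign_species(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return the reference species with the smallest p-distance to the query."""
    best = min(references, key=lambda sp: p_distance(query, references[sp]))
    return best, p_distance(query, references[best])


# Hypothetical toy-length reference barcodes.
refs = {
    "species_A": "ACGTTGCAACGT",
    "species_B": "ACGTAGCTACGA",
    "species_C": "TCGTTGCAACCT",
}
query = "ACGTTGCAACGA"
print(assign_species(query, refs))
```

Here the query differs from species_A at one of twelve sites, from species_B at two and from species_C at three, so it is assigned to species_A.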

Key Terms in this Chapter

Logic: It is a systematic approach to the art of reasoning. The Greek philosopher Aristotle is regarded as the father of logic.

Fuzzy Set: It is one of the most popular models of uncertainty, introduced by Zadeh in 1965, in which each element has a grade of membership in the set instead of the dichotomous (in or out) membership of crisp sets.
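As an illustration of graded membership, the sketch below defines a fuzzy set "highly expressed" over a few hypothetical genes using a piecewise-linear membership function (the thresholds 2.0 and 8.0, the expression values and the second fuzzy set are all assumptions), and combines it with a second fuzzy set using Zadeh's standard operators: maximum for union, minimum for intersection and 1 − μ for complement. A crisp counterpart with a hypothetical cut-off of 5.0 is included for contrast.

```python
# Hypothetical expression levels (arbitrary units) for four hypothetical genes.
expression = {"g1": 0.5, "g2": 3.0, "g3": 6.0, "g4": 9.5}

# Hypothetical fuzzy set "stress-responsive", given directly as membership grades.
stress = {"g1": 0.9, "g2": 0.4, "g3": 0.7, "g4": 0.2}


def mu_high(x: float, low: float = 2.0, high: float = 8.0) -> float:
    """Piecewise-linear membership grade in 'highly expressed' (assumed shape)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


high = {g: mu_high(x) for g, x in expression.items()}

# Zadeh's standard fuzzy set operations, applied element by element.
union = {g: max(high[g], stress[g]) for g in high}
intersection = {g: min(high[g], stress[g]) for g in high}
complement = {g: 1.0 - high[g] for g in high}

# A crisp set forces each gene to be fully in (1) or fully out (0).
crisp_high = {g: 1 if x >= 5.0 else 0 for g, x in expression.items()}

print(high)
print(crisp_high)
```

The fuzzy set grades g3 at about 0.67 rather than forcing a yes/no decision at an arbitrary cut-off.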

Bioinformatics: It is the application of computer technology to the management of biological information. Computers are used to gather, store, analyse and integrate biological and genetic information, which can then be applied to gene-based drug discovery and development.

Intuitionistic Fuzzy Set: It is an extension of the concept of fuzzy set, introduced by Atanassov in 1986, and is more general than a fuzzy set. In a fuzzy set, the non-membership of an element is the one's complement of its membership (one minus the membership grade). In many real-life situations this does not hold because of a hesitation component. To model this, intuitionistic fuzzy sets do not require the membership and non-membership values of an element to sum to one; the sum is only required to be at most one, and the remainder is the degree of hesitation.
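A minimal sketch of Atanassov's definition for a single element: a membership degree μ and a non-membership degree ν with μ + ν ≤ 1, and a hesitation degree π = 1 − μ − ν. The union, intersection and complement below follow the standard max/min definitions for intuitionistic fuzzy sets; the class name `IFValue` and the evidence values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFValue:
    """Degree to which one element belongs to an intuitionistic fuzzy set."""
    mu: float  # membership degree
    nu: float  # non-membership degree

    def __post_init__(self) -> None:
        # Atanassov's constraint: both degrees in [0, 1] and mu + nu <= 1.
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("degrees must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitation(self) -> float:
        """pi = 1 - mu - nu, the part left undecided."""
        return 1.0 - self.mu - self.nu


def if_union(a: IFValue, b: IFValue) -> IFValue:
    return IFValue(max(a.mu, b.mu), min(a.nu, b.nu))


def if_intersection(a: IFValue, b: IFValue) -> IFValue:
    return IFValue(min(a.mu, b.mu), max(a.nu, b.nu))


def if_complement(a: IFValue) -> IFValue:
    return IFValue(a.nu, a.mu)  # membership and non-membership swap roles


# Hypothetical evidence that a gene is linked to a disease, from two sources.
source_1 = IFValue(mu=0.6, nu=0.3)  # hesitation about 0.1
source_2 = IFValue(mu=0.4, nu=0.2)  # hesitation about 0.4

# An ordinary fuzzy grade is the special case nu = 1 - mu (no hesitation).
fuzzy_like = IFValue(mu=0.7, nu=0.3)

print(if_union(source_1, source_2), if_intersection(source_1, source_2))
```

The second source is less committed (larger hesitation), which a plain fuzzy grade could not express.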

Fuzzy Logic: There is no single precise definition of this logic. It can be regarded as the embedding of fuzzy sets in an infinite-valued logic. In another sense, according to Zadeh, it is equivalent to computing with words.

Rough Set: This is another model of uncertainty, introduced by Pawlak in 1982, which follows Frege's idea of modelling uncertainty through a boundary region. Here, a set is approximated by a pair of crisp sets called its lower and upper approximations.
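Pawlak's construction can be sketched in a few lines. Objects that share the same attribute values are indiscernible and form equivalence classes; the lower approximation collects the classes lying entirely inside the target set, the upper approximation collects the classes that overlap it, and their difference is the boundary region. The decision table, attribute values and "tumour" labels below are hypothetical.

```python
from collections import defaultdict

# Hypothetical decision table: samples described by two discretised
# expression attributes (gene_a level, gene_b level).
samples = {
    "s1": ("high", "low"),
    "s2": ("high", "low"),
    "s3": ("low", "low"),
    "s4": ("low", "high"),
    "s5": ("low", "high"),
    "s6": ("high", "high"),
}
tumour = {"s1", "s3", "s4", "s6"}  # target set X (hypothetical labels)


def equivalence_classes(table: dict[str, tuple[str, ...]]) -> list[set[str]]:
    """Group objects that are indiscernible (identical attribute values)."""
    classes: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for obj, attrs in table.items():
        classes[attrs].add(obj)
    return list(classes.values())


def approximations(table: dict[str, tuple[str, ...]], target: set[str]) -> tuple[set[str], set[str]]:
    """Return the (lower, upper) approximation of `target`."""
    lower: set[str] = set()
    upper: set[str] = set()
    for block in equivalence_classes(table):
        if block <= target:  # class certainly inside X
            lower |= block
        if block & target:   # class possibly inside X
            upper |= block
    return lower, upper


lower, upper = approximations(samples, tumour)
boundary = upper - lower
accuracy = len(lower) / len(upper)  # Pawlak's accuracy of approximation
print(sorted(lower), sorted(upper), sorted(boundary), accuracy)
```

Because s1 and s2 (and likewise s4 and s5) cannot be told apart by the attributes yet carry different labels, they fall into the boundary region, and the target set is rough.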

Soft Set: This notion was introduced by Molodtsov in 1999 to make up for the lack of a parameterization tool in fuzzy set and rough set theory. A soft set over a universe is a parameterized family of subsets of that universe. Molodtsov observed that every fuzzy set can be viewed as a soft set.
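Following Molodtsov's definition, a soft set over a universe U can be stored as a mapping from each parameter to the subset of U it selects. The sketch below uses hypothetical genes and parameters, and also shows the level-set construction by which a fuzzy set with membership function μ can be viewed as a soft set whose parameters are levels α, each mapped to {x : μ(x) ≥ α}. Only a finite sample of α values is used here, whereas the construction uses every α in [0, 1]; the helper names are hypothetical.

```python
# A soft set (F, E) over U: each parameter e in E maps to a subset F(e) of U.
SoftSet = dict[str, frozenset[str]]

universe = frozenset({"g1", "g2", "g3", "g4"})  # hypothetical genes

# Hypothetical parameters and the genes each one selects.
gene_properties: SoftSet = {
    "upregulated": frozenset({"g2", "g3"}),
    "membrane_bound": frozenset({"g1", "g3", "g4"}),
    "conserved": frozenset({"g3", "g4"}),
}


def selected_by_all(soft: SoftSet, params: list[str], u: frozenset[str]) -> frozenset[str]:
    """Elements of u chosen by every requested parameter."""
    result = u
    for p in params:
        result &= soft[p]  # frozensets are immutable, so this rebinds result
    return result


def fuzzy_as_soft(mu: dict[str, float], levels: list[float]) -> dict[float, frozenset[str]]:
    """View a fuzzy set as a soft set: parameter alpha -> {x : mu(x) >= alpha}."""
    return {a: frozenset(x for x, m in mu.items() if m >= a) for a in levels}


print(selected_by_all(gene_properties, ["upregulated", "conserved"], universe))

grades = {"g1": 0.0, "g2": 0.2, "g3": 0.7, "g4": 1.0}  # hypothetical fuzzy set
print(fuzzy_as_soft(grades, [0.25, 0.5, 1.0]))
```

The parameters act as the "attributes" along which the universe is described, which is the parameterization that fuzzy and rough sets lack.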
