Biological Big Data Analysis and Visualization: A Survey

Vignesh U (VIT University, India) and Parvathi R (VIT University, India)
Copyright: © 2019 | Pages: 13
DOI: 10.4018/978-1-5225-8903-7.ch026


This chapter deals with big data in biology. Maintaining very large collections of biological data has opened the way for big data analytics and big data mining, because conventional database management systems handle noisy, voluminous data inefficiently. Domains such as bioinformatics, image informatics, clinical informatics, and public health informatics can use big data analytics to achieve higher efficiency and accuracy in clustering, classification, and association mining. The complexity of health care data has led to EHR (Evidence-based HealthcaRe) technology for its maintenance. EHR poses major challenges, including patient details in structured and unstructured formats, medical image data mining, genome analysis, and analysis of patient communications through sensors, biomarkers, etc. Big biological data is complicated to manage and maintain, especially since the arrival of the latest genome sequencing technology, next-generation sequencing, which produces data on the zettabyte scale.
Chapter Preview


This chapter was motivated by the need for more efficient methodologies that analyze big data faster. This deficiency led us to investigate the problems in existing technology and to frame a feasible model for big data analysis. There is also considerable interest in developing new techniques that use dynamic programming algorithms to speed up bioinformatics methods. High-throughput sequencing workflow systems offer an easy, lower-cost approach to genome sequencing, with timely detection of functions and fast, accurate solutions for big data in bioinformatics. Table 1 gives a detailed view of the workflow systems that support high-throughput sequencing technologies and incorporate big data for analysis.
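The chapter does not name a specific dynamic programming algorithm, so the following is only an illustrative sketch of the general idea, using global sequence alignment scoring in the style of Needleman-Wunsch. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example, not values taken from the chapter.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Illustrative scoring scheme (assumed, not from the chapter):
    +1 for a match, -1 for a mismatch, -2 for each gap position.
    """
    # prev[j] is the best score for aligning the processed prefix of a
    # with b[:j]; the first row aligns the empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # The first column aligns a[:i] against gaps only.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align a[i-1] with b[j-1]
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Only two rows of the score matrix are kept at a time, so memory grows with the length of one sequence rather than with the product of both lengths. Recovering the alignment itself, and not just its score, would need the full matrix or a divide-and-conquer variant.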

Bioinformatics is an interdisciplinary area that combines biology, computer science, and statistics. It covers the major aspects of genomics and proteomics, including genome sequencing, where the data are highly sensitive: in DNA sequencing, each individual letter represents a single nucleotide. Biological databases have been digitized since 1970, and their sensitivity and efficiency have been well maintained, but the rapidly growing volume of data has made maintenance and the extraction of information from gene expression very complex. Big data techniques address these problems more accurately. Big data analysis involves the following major characteristics:

  • Scale of Data: the very large size of the data

  • Streaming Data: the velocity that must be sustained during extraction

  • Various Data Forms: the variety of data formats in a database, which can still be analyzed easily

  • Uncertainty of Data: poor and inaccurate data, which can be identified

Applying these characteristics to biological data lets big data techniques deliver information efficiently, accurately, and quickly, saving an enormous amount of time.
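As a rough illustration of how the four characteristics might be measured on a batch of sequence records, the sketch below computes one simple indicator for each. The record layout (identifier, format, sequence), the function name `profile_batch`, and the rule that any character outside A, C, G, T marks a record as uncertain are all assumptions made for this example and do not come from the chapter.

```python
from collections import Counter

# Assumption: only unambiguous DNA bases count as clean data.
VALID_BASES = set("ACGT")


def profile_batch(records, elapsed_seconds):
    """Summarize a batch of (record_id, data_format, sequence) tuples.

    Returns one simple indicator per characteristic:
    scale, streaming rate, variety of forms, and uncertainty.
    """
    total_bases = sum(len(seq) for _, _, seq in records)
    formats = Counter(fmt for _, fmt, _ in records)
    # A record is flagged when it contains any symbol outside A, C, G, T.
    uncertain = [rid for rid, _, seq in records
                 if set(seq.upper()) - VALID_BASES]
    rate = len(records) / elapsed_seconds if elapsed_seconds > 0 else 0.0
    return {
        "scale_total_bases": total_bases,
        "streaming_records_per_second": rate,
        "variety_format_counts": dict(formats),
        "uncertain_record_ids": uncertain,
    }


if __name__ == "__main__":
    batch = [
        ("r1", "fasta", "ACGT"),
        ("r2", "fastq", "ACNT"),  # 'N' marks an ambiguous base call
        ("r3", "fasta", "acgt"),
    ]
    print(profile_batch(batch, elapsed_seconds=2.0))
```

A production pipeline would compute such indicators incrementally over a stream rather than over an in-memory list. The list form is used here only to keep the sketch short.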

Table 1. High Throughput Sequencing Workflow Systems

Ergatis | yes | yes | Linux, Mac OS X, Windows | yes | no | yes | yes
Galaxy | yes | yes | Linux, Mac OS X | yes | no | yes | yes
Genboree Workbench | yes | yes | Linux, Mac OS X, Windows | yes | no | yes | yes
GenePattern | yes | yes | Linux, Mac OS X, Windows | yes | no | yes | no
GeneProf | yes | yes | Linux (not yet tested on others) | yes | no | yes | no
Kepler (bioKepler) | yes | yes | Linux, Mac OS X, Windows; > 1 GB RAM, 2 GHz CPU | yes | no | no | no
KNIME | yes | - | Linux, Mac OS X, Windows | yes | yes | no | yes
LONI Pipeline | yes | yes | Linux, Mac OS X, Windows | yes | yes | no | no
Moa | yes | yes | Linux | yes | yes | no | no
Tavaxy | yes | yes | Linux | yes | no | yes | yes
Taverna | yes | yes | Linux, Mac OS X, Windows | yes | yes | no | yes
Yabi | - | - | Linux | yes | yes | yes | yes
