Bioinformatics Clouds for High-Throughput Technologies

Claudia Cava, Francesca Gallivanone, Christian Salvatore, Pasquale Anthony Della Rosa, Isabella Castiglioni

Source Title: Handbook of Research on Cloud Infrastructures for Big Data Analytics

DOI: 10.4018/978-1-4666-5864-6.ch020

Abstract

Bioinformatics traditionally deals with computational approaches to the analysis of big data from high-throughput technologies such as genomics, proteomics, and sequencing. Bioinformatics analysis allows the extraction of new information from big data that might help to better assess biological details at the molecular and cellular level. The scale and high dimensionality of Bioinformatics data have led to an increasing need for high-performance computing and data repositories. In this chapter, the authors demonstrate the advantages of cloud computing in Bioinformatics research for high-throughput technologies.

Introduction

High-throughput technologies produce an enormous amount of data, generated by gene expression microarrays (Schena et al., 1995; Lipshutz et al., 1995), proteomics (Mann et al., 1999), and DNA sequencing (Lander et al., 2001; Venter et al., 2001).

Laboratories submit and archive their data in large archival databases such as GenBank at the National Center for Biotechnology Information (NCBI) (Benson et al., 2005), the EMBL database at the European Bioinformatics Institute (Brooksbank et al., 2010), the DNA Data Bank of Japan (DDBJ) (Sugawara et al., 2010), the Short Read Archive (SRA) (Shumway et al., 2010), the Gene Expression Omnibus (GEO) (Barrett et al., 2009), and the microarray database ArrayExpress (Kapushesky et al., 2010). These databases maintain, organize, and distribute big data to the scientific community for Bioinformatics analysis. For instance, the public data repository GEO contains hundreds of thousands of microarray samples and supports many billions of analyses. In current practice, Bioinformatics researchers download data from these databases and run analyses on in-house computing resources.

With significant advances in high-throughput technologies and the consequent exponential growth of biological data, Bioinformatics encounters difficulties in storing and analyzing these immense volumes of data. In particular, the gap between high-throughput experimental technologies and the capacity of computers to handle such big data is widening.

At present, a promising way to obtain the necessary computational power and scale is cloud computing, which harnesses multiple computers and delivers analysis and storage as dynamically allocated virtual resources via the Internet.

The present chapter deals with cloud-based services and presents their advantages (and in some cases disadvantages) for big data storage and analysis in Bioinformatics, with respect to data sharing, applications, and time-critical calculations:

• Data Sharing and Security: Public datasets change frequently and dynamically, which makes it difficult to archive and share data over long periods. Data repositories often disappear from the public domain (e.g. due to cancelation policies for limited space), leaving users able to perform only partial analyses. Cloud computing can provide permanent resources where big data sets can be archived and easily accessed without necessarily copying them to other computing resources.
• Bioinformatics Applications: Public datasets may be analyzed with standard tools for Bioinformatics, such as Significance Analysis of Microarrays (SAM) (Tusher et al., 2001), TM4 Multiple Expression Viewer (Saeed et al., 2006), GenePattern (Reich et al., 2006), and Bioconductor (Gentleman et al., 2004). In many cases these tools require local installation, which brings maintenance and update problems. Cloud computing avoids this burden.
• Time-Critical Calculations and Scalability: Complex tasks that require data management are critical on clouds. Two frameworks, MapReduce and the Hadoop Distributed File System (HDFS) (Taylor et al., 2010), are capable of performing time-critical calculations using parallelized analysis.
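The MapReduce pattern mentioned in the last item can be illustrated with a minimal single-process sketch. It is not Hadoop code and does not come from the chapter: the k-mer counting task, the function names, and the toy reads are all assumptions chosen for illustration. On a real cluster, the map and reduce steps would run in parallel on different nodes, with the input reads stored in a distributed file system such as HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k=3):
    """Run map, shuffle, and reduce sequentially over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical toy reads, used only to exercise the sketch.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, k=3)
print(counts)
```

Because each read is mapped independently and each k-mer is reduced independently, both steps can be distributed across machines without changing the logic, which is the property that makes the pattern suitable for time-critical analysis of large datasets.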

In particular, cloud computing services in Bioinformatics belong to four major categories:

Key Terms in this Chapter

Microarray: A technique based on the hybridization of a nucleic acid sample (target) to a very large set of oligonucleotide probes, which are attached to a solid support. It is used to determine sequences, to detect variations in a gene sequence, or to measure the expression levels of large numbers of genes simultaneously.

Sequence Alignment: A process of arranging the sequences of DNA, RNA, or protein to discover regions of similarity that may be an effect of functional, structural, or evolutionary relationships between the sequences.

Data Sharing: The method of making data used for your research available to others through a variety of mechanisms.

Genome-Wide Association (GWA): An approach that involves rapidly scanning markers across the complete sets of DNA of many people to find genetic variations that occur more frequently in people with a particular disease.

Basic Local Alignment Search Tool (BLAST): An algorithm to find regions of local similarity between sequences. The algorithm compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.

Next Generation Sequencing (NGS): Also known as high-throughput sequencing, a set of technologies that allow DNA and RNA to be sequenced much more quickly than previous sequencing methods.

Single-Nucleotide Polymorphism (SNP): A DNA sequence variation occurring when a single nucleotide in the genome differs between members of a biological species or disease.

High-Throughput Technologies: A generic name for technologies that allow the exact and simultaneous examination of thousands of genes, proteins, and metabolites.

Protein Folding: The process by which a protein structure assumes its functional shape or conformation. To carry out their functions, proteins must fold into a complex three-dimensional structure.
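To make the Sequence Alignment and BLAST entries above more concrete, the following sketch computes a local alignment score with the Smith-Waterman dynamic programming recurrence. This is not the BLAST algorithm, which is a faster heuristic; it is the exact method for the same problem of finding the best-matching local region between two sequences. The scoring values (match, mismatch, gap) and the example sequences are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    The default scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical example sequences sharing the local region "ACG".
score = smith_waterman_score("TTACGTTT", "GGACGGG")
print(score)
```

The running time grows with the product of the two sequence lengths, which is one reason heuristic tools and parallel, cloud-based execution are attractive when a query must be compared against a large sequence database.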
