Concerns and Challenges of Cloud Platforms for Bioinformatics

Nicoletta Dessì, Barbara Pes

Source Title: Encyclopedia of Information Science and Technology, Fourth Edition

DOI: 10.4018/978-1-5225-2255-3.ch040

Abstract

Bioinformatics traditionally concerns the application of computational approaches to the management and exploitation of large volumes of biomedical data, which continue to expand in size and distribution. Although the application of cloud computing in biomedical areas is still preliminary, an increasing number of biomedical applications rely on the Cloud for processing large datasets. This chapter investigates the extent to which cloud technology offers a viable platform for developing and deploying applications that support users in searching and integrating information offered by bioinformatics resources. The chapter outlines the basic features that such applications should exhibit and the challenging issues they must deal with. The architecture and functionality of cloud-based environments are presented to stress how cloud platforms can offer added-value service components and flexibility that make their adoption attractive for bioinformatics.

Introduction

In recent years, advances in computing have played an important role in promoting scientific research in biological areas such as genomics, proteomics and other “-omic” subfields, which rely heavily on suitable computational infrastructures for managing large-scale data. In particular, the flood of data from genome sequences has given rise to “bioinformatics”, an interdisciplinary research domain that employs a wide range of computational techniques derived from scientific disciplines (such as statistics, machine learning, applied mathematics, etc.) for managing biological data. To understand the current application fields of bioinformatics, it is necessary to consider the following aspects.

A first aspect is the massive production and spread of biological data across the web. Generated within a short period of time and stored in a growing number of web resources, this increasing amount of biological data has introduced new challenges for its management and exploitation.

For example, thanks to next-generation sequencing instruments and ICT advances, areas of the life sciences that were previously distant from each other (in ideology, analysis practices, toolkits, etc.) are now able to share and analyze data in a transparent and reproducible fashion. This interdisciplinary research task calls for the integration of information at multiple levels of granularity from several web resources that often represent information and data in different ways.

In this respect, bioinformatics increasingly deals with providing technical approaches to support interdisciplinary scientific knowledge, which draws on concepts from different, constantly evolving areas and requires, more and more, experimental techniques, scientific approaches and collaborative management of data (Bosin, Dessì, & Pes, 2007).

A second aspect is the set of recent advances in computer science that significantly influence the development of computational tools in bioinformatics. Specifically, the service-oriented paradigm has provided a new way of thinking about biological resources in terms of computational infrastructures by positioning services as the primary functional elements for data integration. Several biomedical organizations (such as the National Center for Biotechnology Information (NCBI) and the National Center for Biomedical Ontology (NCBO)) provide web portals that expose Web services for searching data. Existing techniques for web content classification, search, and visualization appear inadequate to satisfy biologists’ needs, because accessing these heterogeneous systems over the Internet is not straightforward without standard, common interfaces.

In this respect, bioinformatics research is devoted to finding explicit and automatic ways of joining information to improve the usability of web resources.

Finally, the rapid development of the Internet has provided an opportunity to investigate the use of state-of-the-art technology for the construction of a new generation of tools that integrate plain data sources, public programmable APIs and any other available services. Usually referred to as Web 2.0 applications, these tools rely on open APIs or reusable services. The availability of biomedical ontologies dramatically increases the range of benefits and uses derived from these applications, which often support a deeper analysis of data by taking semantic information into account (Dessì, Pascariello, & Pes, 2014).

Considering the aforementioned aspects, it is clear that bioinformatics research addresses three main challenges, namely:

1. Storing and analyzing large amounts of heterogeneous data.
2. Enabling knowledge extraction from several web resources and collaboration through user-friendly interfaces.
3. Promoting solutions for offering different categories of services to end-users.

Nowadays, the cloud computing paradigm represents a primary solution to these challenges as it extends the role of the Internet to enable a new form of distributed system for large-scale data processing.

Key Terms in this Chapter

Biomedical Ontology: A structured representation of knowledge by means of formal naming and definition of the types, properties, and interrelationships of the entities that exist in a domain of biological research.

NoSQL Database: A modern class of databases conceived for web applications. Unlike a relational database, a NoSQL database does not store data and relationships in tables. Instead, NoSQL databases are schema-free, distributed and horizontally scalable across clusters of machines: the database is partitioned over a cluster of distributed database servers (each maintaining its own data and a self-contained schema), which makes it easy to add or remove a single database server, that is, to scale.
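The partitioning idea in this definition can be illustrated with a minimal sketch. It is a toy, in-memory model and not the design of any particular NoSQL product: the class name `ShardedStore`, the node names, and the use of rendezvous hashing to assign keys to servers are all assumptions made for illustration.

```python
import hashlib


class ShardedStore:
    """Toy schema-free store whose documents are partitioned across nodes.

    Hypothetical illustration only: each "node" is a plain dict standing in
    for a separate database server.
    """

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}

    def _node_for(self, key):
        # Rendezvous hashing: every node gets a score for the key and the
        # highest score wins, so adding a node only moves the keys it wins.
        def score(name):
            return hashlib.sha256(f"{name}:{key}".encode()).hexdigest()

        return max(self.nodes, key=score)

    def put(self, key, document):
        # Documents are arbitrary dicts; no table schema is enforced.
        self.nodes[self._node_for(key)][key] = document

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

    def add_node(self, name):
        # Scale out: register the new node, then move over the keys it owns.
        self.nodes[name] = {}
        for old_name, docs in self.nodes.items():
            if old_name == name:
                continue
            for key in [k for k in docs if self._node_for(k) == name]:
                self.nodes[name][key] = docs.pop(key)

    def remove_node(self, name):
        # Scale in: redistribute the departing node's documents.
        orphaned = self.nodes.pop(name)
        for key, document in orphaned.items():
            self.put(key, document)
```

Documents with different fields can be stored side by side, and every document remains retrievable after `add_node` or `remove_node`. A real system would also handle replication, failures and concurrent access, which this sketch leaves out.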

Application Programming Interface (API): An API specifies how software components interact in terms of their operations, inputs, outputs, and underlying types, in a way that is independent of their respective implementations. APIs allow software developers to build programs as a set of building blocks whose functionality is provided by their corresponding APIs.
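As a rough illustration of an interface that is independent of its implementations, the sketch below defines a hypothetical `SequenceSource` API with a single `fetch` operation and two interchangeable implementations. All names here (`SequenceSource`, `DictSource`, `FastaTextSource`, `gc_content`) are invented for the example and do not come from any existing library.

```python
from abc import ABC, abstractmethod


class SequenceSource(ABC):
    """Hypothetical API: one operation, with defined input and output types."""

    @abstractmethod
    def fetch(self, identifier: str) -> str:
        """Return the sequence stored under the given identifier."""


class DictSource(SequenceSource):
    """Implementation backed by an in-memory dictionary."""

    def __init__(self, records):
        self._records = dict(records)

    def fetch(self, identifier: str) -> str:
        return self._records[identifier]


class FastaTextSource(SequenceSource):
    """Implementation that parses FASTA-formatted text."""

    def __init__(self, fasta_text: str):
        self._records = {}
        current = None
        for line in fasta_text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The identifier is the first word after '>'.
                header = line[1:].split()
                current = header[0] if header else ""
                self._records[current] = ""
            elif current is not None:
                self._records[current] += line

    def fetch(self, identifier: str) -> str:
        return self._records[identifier]


def gc_content(source: SequenceSource, identifier: str) -> float:
    """Client code: depends only on the API, not on any implementation."""
    sequence = source.fetch(identifier).upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)
```

Because `gc_content` uses only `fetch`, either source can be swapped in without changing the client, which is the building-block property the definition describes.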

Computer Platform: The computer hardware and the operating system, which conform to a set of standards enabling software developers to deploy software applications for the platform.

RESTful Services: In a distributed computing environment, the Representational State Transfer (REST) architecture allows clients and servers to interoperate through a standardized interface and protocol built on a uniform set of simple and well-defined operations. A RESTful web service exposes a set of resources identified by URIs and manipulated using a fixed set of four operations (i.e. create, read, update, delete).
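The four operations can be sketched with a small in-memory dispatcher. This is only an illustration and involves no network: the class `ResourceCollection` and the `/genes` URIs are hypothetical, and the mapping of create, read, update and delete onto the HTTP methods POST, GET, PUT and DELETE is the common convention, assumed here rather than stated in the definition above.

```python
class ResourceCollection:
    """Toy collection of resources, each identified by its own URI."""

    def __init__(self, base_uri):
        self.base_uri = base_uri.rstrip("/")
        self._items = {}
        self._next_id = 1

    def handle(self, method, uri, body=None):
        """Dispatch one request and return a (status_code, payload) pair."""
        # Create: POST to the collection URI mints a new resource URI.
        if method == "POST" and uri == self.base_uri:
            item_uri = f"{self.base_uri}/{self._next_id}"
            self._next_id += 1
            self._items[item_uri] = body
            return 201, {"uri": item_uri}
        # The remaining operations address an existing resource by its URI.
        if uri not in self._items:
            return 404, None
        if method == "GET":      # read
            return 200, self._items[uri]
        if method == "PUT":      # update (replace the stored representation)
            self._items[uri] = body
            return 200, body
        if method == "DELETE":   # delete
            del self._items[uri]
            return 204, None
        return 405, None         # any other method is not allowed
```

A client would call, for example, `handle("POST", "/genes", {...})` to create a resource and then use the returned URI with GET, PUT or DELETE. The same four operations apply to every resource, which is what makes the interface uniform.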

Web 2.0: A vision of the Web that enhances collaboration and interaction among users. Unlike traditional Web sites, where users had a passive role in viewing information, Web 2.0 environments (such as social network sites, blogs, wikis, etc.) are grounded on user-generated content and encourage people to engage in dialogue through social media.

Web Service: A Web Service allows two resources to communicate with each other over a network such as the Internet, transferring machine-readable formats such as XML and JSON in a manner prescribed by its interface.
