Architecture for Big Data Storage in Different Cloud Deployment Models

Chandu Thota, Gunasekaran Manogaran, Daphne Lopez, Revathi Sundarasekar
DOI: 10.4018/978-1-5225-3142-5.ch008


Cloud Computing is a computing model that distributes computation across a shared pool of resources. As data on the web grows, the need for scalable databases that can expand to accommodate that growth has increased. Familiar Cloud Computing vendors such as Amazon Web Services, Microsoft, Google, IBM and Rackspace offer cloud-based Hadoop and NoSQL database platforms to process Big Data applications. A variety of services run on top of these cloud platforms, freeing users from the need to deploy their own systems. Nowadays, integrating Big Data with the various cloud deployment models is a major concern for Internet companies, especially software and data services vendors that are just getting started themselves. This chapter proposes an efficient architecture for this integration, with comprehensive capabilities including real-time and bulk data movement, bi-directional replication, metadata management, high-performance transformation, data services, and data quality for customer and product domains.
Chapter Preview

Cloud Computing

Cloud Computing is the practice of using a network of remote servers hosted on the Internet to store, manage and process data, rather than a personal computer or local server. In other words, Cloud Computing is a form of computing used to deliver hosted services over the Internet and to manage real-time applications (Manogaran, Thota & Kumar, 2016).

Big Data Solutions for Cloud Applications

Cloud Computing and Big Data are combined to accomplish many tasks. Big Data provides techniques and technologies to process distributed queries across multiple datasets and return the results in a timely manner.
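To make the idea of processing distributed queries across multiple datasets concrete, the following is a minimal sketch of the map/shuffle/reduce pattern that frameworks such as Hadoop apply to partitioned data. The partitions and word-count query here are illustrative assumptions, not part of the chapter's architecture:

```python
# Minimal map/shuffle/reduce sketch over partitioned data (assumed sample data).
from collections import defaultdict

def map_phase(partition):
    # Emit (word, 1) pairs from every record in one data partition.
    return [(word, 1) for record in partition for word in record.split()]

def shuffle(mapped):
    # Group emitted values by key, as the framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

# Two partitions standing in for datasets stored on different cloud nodes.
partitions = [["big data cloud", "cloud storage"], ["big data"]]
mapped = [pair for p in partitions for pair in map_phase(p)]
counts = reduce_phase(shuffle(mapped))
print(counts["big"])  # 2
```

In a real deployment the map and reduce phases run in parallel on the nodes holding each partition, which is what lets such queries finish in a timely manner as the data grows.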


Big Data

Big Data is high-volume, high-variety and high-velocity information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, process automation and decision making. In other words, “Big Data” originally referred to data whose Volume, Velocity and Variety make it difficult to process using traditional data processing platforms and techniques. Data generation sources such as sensor networks, telescopes, streaming machines and high-throughput instruments have increased rapidly, and these environments produce huge amounts of data. Nowadays, Big Data plays an essential role in various domains such as business organizations, healthcare (Lopez & Manogaran, 2017; Manogaran & Manogaran, 2017), industry, scientific research, social networking, natural resource management and public administration. Big Data was recently ranked in both ‘‘Top 10 Strategic Technology Trends For 2013’’ and ‘‘Top 10 Critical Tech Trends for the Next Five Years’’ (Savitz, 2012; Savitz, 2013).

Key Terms in this Chapter

Cloud Computing: Cloud Computing is used to connect computing resources and hardware, and to access IT managed services with a previously unknown level of ease.

Hybrid Cloud: Hybrid Cloud is a type of Cloud Computing model which uses a mix of on-premises, Private Cloud and third party Public Cloud services.

Scalability: Scalability is the ability of cloud storage to expand as Big Data grows, and is one of the most important benefits of the cloud environment.

Elasticity: Elasticity, the ability to provision and release resources on demand, is a salient feature of storing Big Data on the cloud.

Access Control: Access Control Management provides access requirements for end users and System Administrators (privileged users) who access system, network and application resources.

Data Service Integrator: Data Service Integrator is used to provide Data Virtualization capabilities to rapidly develop and manage federated data services, giving access to unified views of disparate information.

Graphics Processing Unit (GPU): A GPU is used as a frame buffer to process images and compute results. In general, GPUs are widely used for graphical processing tasks such as image and video processing, analysis, editing and computation of results.

Infrastructure as a Service (IaaS): Infrastructure as a Service (IaaS) is used to deliver the computer infrastructure on an outsourced basis to support enterprise applications.

Bridge Gate: Bridge Gate provides real time data integration, online data comparison and transactional data replication across heterogeneous systems.

Platform as a Service (PaaS): Platform as a service (PaaS) is a category of Cloud Computing model that provides a platform and environment to allow developers to create software applications using tools supplied by the provider.

Virtualization: Virtualization provides an effective way to process Big Data in cloud. Though Virtualization is not required to process the data in cloud, few software frameworks work based on Virtualization technologies.

Public Cloud: A Public Cloud is a type of Cloud Computing model in which a service provider makes resources, such as Storage, computing resources and applications available to all users or general public over the Internet.

Private Cloud: A Private Cloud is a type of Cloud Computing model that involves a distinct and secure cloud based environment in which only the authorized users can operate.

Big Data: Big Data is high-volume, high-variety and high-velocity information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, process automation and decision making.

Software as a Service (SaaS): Software as a Service (or SaaS) is a software distribution model used to deliver applications over the Internet as a service.

Grid Computing: Grid Computing refers to interconnected computers that share resources with one another. Grid Computing is used to increase computational power and reduce the processing time of each job.
