Performance Evaluation of Data Intensive Computing In the Cloud

Sanjay P. Ahuja, Bhagavathi Kaza

Source Title: International Journal of Cloud Applications and Computing (IJCAC) 4(2)

DOI: 10.4018/ijcac.2014040103

Abstract

Big data is a topic of active research in the cloud community. With increasing demand for data storage in the cloud, the study of data-intensive applications is becoming a primary focus. Data-intensive applications involve high CPU usage for processing large volumes of data on the scale of terabytes or petabytes. While some research exists on the performance of data-intensive applications in the cloud, none of it compares the Amazon Elastic Compute Cloud (Amazon EC2) and Google Compute Engine (GCE) clouds using multiple benchmarks. This study evaluates the Amazon EC2 and GCE clouds using the TeraSort, MalStone and CreditStone benchmarks on the Hadoop and Sector data layers. Data collected for the two clouds measure performance as the number of nodes is varied. The study shows that GCE is more efficient than Amazon EC2 for data-intensive applications.

1. Introduction

Cloud computing has become a viable solution for researchers and organizations with ever-growing computing needs. As the amount of data increases exponentially across fields such as IT, social networking, science and engineering, dependency on the cloud is increasing. Researchers therefore need to evaluate the performance of the cloud and study the metrics that affect it. The present work evaluates the performance of two public clouds, Amazon EC2 and GCE, both of which belong to the IaaS layer of the cloud. Three data-intensive benchmarks, TeraSort, MalStone and CreditStone, were used to benchmark the clouds. High-CPU instances were chosen because data-intensive applications need more computing power than memory. Performance is studied with data sizes of 1 GB, 10 GB, 100 GB and 1 TB on clusters of 1 through 8 nodes. Response time is considered the primary metric in evaluating performance for big data applications.

The cloud offers the hardware and software necessary to support an application while providing storage, performance, security and maintenance. Clouds are classified as public, private or hybrid according to the deployment model, and as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS) according to the service model.

Amazon EC2 is an IaaS cloud service that provides resizable computing capacity. EC2 supports various operating systems and instance types. Amazon defines its minimum processing unit, the EC2 Compute Unit (ECU), as the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor (AWS13, 2013).

Google Compute Engine (GCE) is an IaaS cloud service and a suitable alternative to Amazon EC2. GCE defines its minimum processing unit, the Google Compute Engine Unit (GCEU), as the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron processor. GCE uses 2.75 GCEUs to represent the minimum processing power of one logical core.

Big data refers to collections of large, complex data sets, structured or unstructured, that are difficult to process using traditional relational database management tools. The volumes involved can reach terabytes, petabytes or even exabytes. Apache Hadoop and Sector are open source frameworks used to process big data and produce useful information.

Apache Hadoop is a well-known open source framework for data-intensive applications. Hadoop uses a master-slave architecture in which a single master node stores and manages the metadata, and multiple slave (worker) nodes store and process the data. Hadoop uses the Hadoop Distributed File System (HDFS), a block-based distributed file system, to distribute an application's data across the nodes in a cluster. To prevent data loss in the event of a system failure, Hadoop stores each piece of data on three different nodes by default; the number of copies kept for fault tolerance (referred to as the Replication Factor) is configurable.
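The block-and-replica idea can be illustrated with a small sketch. This is not HDFS code: the function `place_blocks`, the node names, the sizes and the round-robin placement are all assumptions made for illustration, and real HDFS placement follows its own rack-aware policy. The sketch only shows that a file is cut into fixed-size blocks and that each block is kept on as many distinct nodes as the replication factor requires.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes (hypothetical round-robin placement)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes starting at an offset that depends on the block,
        # so the replicas of one block never share a node.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


# Illustrative values only: a 300-unit file, 128-unit blocks, four workers.
layout = place_blocks(file_size=300, block_size=128,
                      nodes=["n1", "n2", "n3", "n4"])
for block, replicas in layout.items():
    print(block, replicas)
```

With the default of three replicas, any single node can fail and every block still has two surviving copies. Lowering `replication` saves storage at the cost of that margin.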

MapReduce is a programming model used to process large data sets across a distributed collection of nodes in a cluster. It consists of two functions: Map() works on a set of inputs to generate key-value pairs, and Reduce() works on the output produced by Map(), which is sorted and grouped by key, to produce a single output.
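A minimal single-process sketch of the model, using word counting as the example job, may make the two phases concrete. It does not use Hadoop's API. The names `map_fn`, `shuffle`, `reduce_fn` and `run_job` are hypothetical, and the grouping step that a real framework performs across the network is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_fn(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an in-memory input."""
    pairs = [pair for line in lines for pair in map_fn(line)]
    groups = shuffle(pairs)
    # Keys are visited in sorted order, mirroring the sort before Reduce.
    return dict(reduce_fn(key, groups[key]) for key in sorted(groups))


counts = run_job(["big data in the cloud", "data in Hadoop"])
print(counts)
```

In a cluster, each call to `map_fn` could run on the node that holds its input block, and each key group could be reduced on a different node. The sketch keeps everything in one process to show only the data flow.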

Sector, a valid alternative to Hadoop for data-intensive applications, uses the Sphere processing framework. Sector also uses a master-slave architecture and ensures fault tolerance. Sector is widely used over WANs because it relies on the User Datagram Protocol (UDP), which is considered to be faster than TCP across wide area networks.

The remainder of the paper is organized as follows: section II discusses related work, section III describes our experiments, section IV discusses the results, and section V presents the conclusions.
