
Big Data Processing on Cloud Computing Using Hadoop Mapreduce and Apache Spark

Yassir Samadi, Mostapha Zbakh, Amine Haouari

Source Title: Cloud Computing Technologies for Green Enterprises

DOI: 10.4018/978-1-5225-3038-1.ch009


Abstract

The size of the data used by enterprises has been growing at exponential rates over the last few years, and handling such huge data from various sources is a challenge for businesses. In addition, Big Data has become one of the major areas of research for Cloud Service providers because of the large amount of data produced every day and the inability of traditional algorithms and technologies to handle it efficiently. To resolve these problems and to meet the increasing demand for high-speed, data-intensive computing, researchers and developers have produced several solutions. These include Cloud Computing tools such as Hadoop MapReduce and Apache Spark, which work on the principles of parallel computing. This chapter focuses on how big data processing challenges can be handled by using Cloud Computing frameworks, and on the importance of Cloud Computing for businesses.

Chapter Preview


Introduction

Cloud Computing and Big Data are driving a major transformation in the way companies in all economic sectors use digital technology. They stimulate activity and job creation among digital-sector players and enable user companies to achieve competitive gains. Nowadays, enterprises and organizations produce and store data on a large scale every day, and the rate of production is dynamic by nature, particularly in web and online social network applications such as Facebook, Twitter, and YouTube. This quantitative explosion of digital data has forced researchers and developers to find new ways of seeing and analyzing the world, because the acquisition, searching, sharing, storage, analysis, and presentation of data now involve new orders of magnitude. The main concern of Big Data (Gandomi & Haider, 2015) is storing a tremendous amount of digital information that is difficult to process with conventional database management tools. Big Data is not just data; it is also a set of technologies, architectures, tools, and procedures that allow an organization to quickly capture, process, and analyze large quantities of heterogeneous data and to extract relevant information at an affordable cost. The main challenge of data-intensive computing is to analyze and process exponentially growing data volumes for different purposes with minimal delay. New algorithms that can scale to search and process massive amounts of data must also be developed. Several solutions are available to meet the requirements of Big Data, among them Cloud Computing tools such as Hadoop MapReduce and Apache Spark.

Hadoop MapReduce is a framework that has mainly been used to store and analyze large amounts of data. Hadoop was designed for batch processing, providing scalability and fault tolerance but not fast performance (Apache Hadoop, 2017). It enables applications to run on thousands of nodes over petabytes of data. Hadoop MapReduce handles large amounts of data by splitting the data into blocks and assigning each block to a node of the cluster for analysis. It follows a similar strategy for computation, breaking a job into smaller tasks that are executed on the nodes of the cluster. However, Hadoop's performance is not suitable for real-time applications (SAP Business By Design, 2017) because it reads and writes data from and to an external storage system, e.g., a distributed file system. This generates additional overhead from data replication and from input/output operations on physical disks, which can increase the application's execution time. To solve this problem, Matei Zaharia proposed a new framework called Spark (Zaharia, Chowdhury, Michael, & Shenker, 2010). Spark minimizes these data transfers to and from disk by making effective use of main memory and performing in-memory computations. Spark is also designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming.
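The two ideas in the paragraph above, splitting a job into parallel map and reduce tasks, and keeping intermediate results in memory instead of writing them to storage between steps, can be illustrated with a minimal single-machine sketch. It is written in plain Python and does not use the Hadoop or Spark APIs: threads stand in for cluster nodes, a temporary JSON file stands in for the distributed file system, and all function names (`split_input`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`, `iterate_via_disk`, `iterate_in_memory`) are hypothetical.

```python
import json
import os
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def split_input(lines: List[str], num_splits: int) -> List[List[str]]:
    """Split the input into chunks, one per simulated node."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(chunk: Iterable[str]) -> List[Tuple[str, int]]:
    """Map task: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group the intermediate values of all map tasks by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce task: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines: List[str], num_splits: int = 3) -> Dict[str, int]:
    chunks = split_input(lines, num_splits)
    # Map tasks run concurrently; threads are a stand-in for cluster nodes.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


def step(x: float) -> float:
    """One iteration of a toy iterative algorithm (hypothetical update rule)."""
    return 0.5 * x + 1.0


def iterate_via_disk(values: List[float], iterations: int) -> List[float]:
    """Disk-based style: each iteration reads its input from storage and writes its output back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "state.json")
        with open(path, "w") as f:
            json.dump(values, f)
        for _ in range(iterations):
            with open(path) as f:
                state = json.load(f)
            state = [step(x) for x in state]
            with open(path, "w") as f:
                json.dump(state, f)
        with open(path) as f:
            return json.load(f)


def iterate_in_memory(values: List[float], iterations: int) -> List[float]:
    """In-memory style: the intermediate state never leaves main memory."""
    state = list(values)
    for _ in range(iterations):
        state = [step(x) for x in state]
    return state


if __name__ == "__main__":
    docs = ["big data on the cloud", "hadoop and spark", "spark keeps data in memory"]
    print(word_count(docs))
    print(iterate_in_memory([1.0, 2.0], 5) == iterate_via_disk([1.0, 2.0], 5))
```

Both iteration styles compute the same result; the difference is that `iterate_via_disk` pays a read and a write to storage on every iteration, which is the kind of overhead the paragraph attributes to Hadoop and which Spark avoids by performing in-memory computations.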

Cloud Computing offers the ability to scale computing resources as needed, without a large upfront investment in infrastructure and at an affordable cost. It therefore facilitates the move toward Big Data, which is driven by the need for greater computing and storage capacity to handle the data flows produced by the increasing use of new digital technologies. Consequently, companies must manage an exponentially increasing volume of generated data (structured, semi-structured, or unstructured) and analyze it as quickly as possible to extract value. Cloud Computing and Big Data together form a rapidly developing field that provides many opportunities for value creation.
