A Hierarchical Hadoop Framework to Handle Big Data in Geo-Distributed Computing Environments

Orazio Tomarchio, Giuseppe Di Modica, Marco Cavallo, Carmelo Polito

Source Title: International Journal of Information Technologies and Systems Approach (IJITSA) 11(1)

DOI: 10.4018/IJITSA.2018010102

Abstract

Advances in communication technologies, along with the rise of new communication paradigms that leverage the power of social interaction, have fostered the production of huge amounts of data. Traditional computing paradigms are unfit to handle the volume of data produced daily by countless, worldwide distributed sources of information. So far, MapReduce has kept its promise of speeding up computation over Big Data within a cluster. This article focuses on scenarios of worldwide distributed Big Data. After pointing out the poor performance of the Hadoop framework when deployed in such scenarios, it proposes a Hierarchical Hadoop Framework (H2F) to cope with the issues that arise when Big Data are scattered over geographically distant data centers. The article highlights the novelty of H2F with respect to other hierarchical approaches. Tests run on a software prototype are also reported to show the performance gain that H2F achieves over a plain Hadoop approach in geographically distributed scenarios.

1. Introduction

Technologies for big data analysis have emerged in the last few years as one of the hottest trends in the ICT landscape. Several programming paradigms and distributed computing frameworks (Dean & Ghemawat, 2004) have appeared to address the specific issues of big data systems.

Application parallelization and divide-and-conquer strategies are, indeed, natural computing paradigms for approaching big data problems, addressing scalability and high performance.

Furthermore, the availability of grid and cloud computing technologies, which have lowered the price of on-demand computing power, has spread the use of parallel paradigms such as MapReduce (Dean & Ghemawat, 2004) for big data processing.

However, Hadoop, the best-known open-source implementation of the MapReduce paradigm, was mainly designed to work on clusters of homogeneous computing nodes belonging to the same local area network. Nowadays, data are more and more frequently generated and stored in a geographically distributed manner, which makes existing frameworks such as Hadoop poorly suited to processing such data effectively (Heintz, Chandra, Sitaraman, & Weissman, 2014).

The critical choice for every system that has to deal with this scenario is either to move the computation close to the data or, vice versa, to move the data to where the computation has to be done. These choices, of course, are the two extremes of a range of intermediate options. Moving the data from different sites to a central one adds transfer latency to the overall processing time; moreover, the cost of transferring huge amounts of data may make this option infeasible. On the other hand, moving the computation close to the sites where the data reside is not always possible, depending on the characteristics of the processing. Data may happen to be stored in sites with very different computing capacities. Having large data processed locally by very low-power computing facilities turns out to be highly inefficient; conversely, using a very powerful data center to process only limited amounts of data is an unacceptable waste.
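As a rough illustration of the two extremes discussed above, the following Python sketch compares processing every dataset where it resides against shipping all data to one central site. The `Site` fields, the throughput figures, and the cost model (parallel transfers, no link contention, no intermediate options) are assumptions made for this example; they are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical description of one data center (all fields are assumptions)."""
    name: str
    data_gb: float            # amount of data stored at the site
    compute_gb_per_s: float   # processing throughput of the site
    uplink_gb_per_s: float    # bandwidth from this site towards the central site


def time_process_in_place(sites):
    """Every site processes its own data in parallel; the slowest site decides."""
    return max(s.data_gb / s.compute_gb_per_s for s in sites)


def time_centralize(sites, central):
    """Remote sites ship their data to `central` in parallel, then it processes everything."""
    transfer = max(
        (s.data_gb / s.uplink_gb_per_s for s in sites if s.name != central.name),
        default=0.0,
    )
    total_gb = sum(s.data_gb for s in sites)
    return transfer + total_gb / central.compute_gb_per_s


# Example: a powerful site holding most of the data and a weak site holding a little.
sites = [
    Site("a", data_gb=100.0, compute_gb_per_s=10.0, uplink_gb_per_s=1.0),
    Site("b", data_gb=10.0, compute_gb_per_s=0.25, uplink_gb_per_s=2.0),
]
in_place = time_process_in_place(sites)
centralized = time_centralize(sites, central=sites[0])
```

With these made-up figures, processing in place takes 40 s because the weak site is the bottleneck, while centralizing on site `a` takes 16 s. Different figures can reverse the outcome, which is why neither extreme is a safe default.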

In this work, we propose a Hierarchical Hadoop Framework (H2F) that overcomes the limits shown by the original Hadoop job scheduling algorithm by taking into account the actual heterogeneity of nodes, network links and data distribution among geographically distant sites (Cavallo, Di Modica, Polito, & Tomarchio, 2016). Our approach follows a hierarchical scheme, where a top-level entity takes care of serving a submitted job. The job is split into a number of bottom-level, independent MapReduce sub-jobs that are efficiently scheduled to run on the sites where the data reside.
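The two-level idea can be pictured with a minimal Python sketch, using word count as the job. It is not the H2F implementation: `run_local_subjob` merely stands in for a MapReduce sub-job executed inside one site, and the function names and the `data_by_site` layout are hypothetical.

```python
from collections import Counter


def run_local_subjob(lines):
    """Bottom level: stand-in for a word-count MapReduce sub-job run inside one site."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_hierarchical_job(data_by_site):
    """Top level: launch one independent sub-job per site, then merge the partial results."""
    partials = {site: run_local_subjob(lines) for site, lines in data_by_site.items()}
    total = Counter()
    for partial in partials.values():
        total.update(partial)  # global reduce over the per-site outputs
    return total


# Example input: each site only ever reads the data it already stores.
data_by_site = {
    "site_a": ["big data big"],
    "site_b": ["data hadoop"],
}
result = run_hierarchical_job(data_by_site)
```

Only the per-site partial counts cross site boundaries in this sketch, which is the property that makes a hierarchical split attractive when the raw data are large.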

We believe a hierarchical computing model may help since it decouples the job/task scheduling from the actual computation: this way, the full potential of Hadoop is exploited at the bottom level while the job scheduling is delegated to the top level. In our work, we introduce a novel job scheduling algorithm which accounts for the discussed heterogeneity to optimize the job makespan. Unlike previous works, our job scheduling algorithm aims to exploit fresh information continuously sensed from the distributed computing context to estimate each job’s optimum execution flow.
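To make the notion of makespan-driven scheduling concrete, here is a small Python sketch that enumerates every assignment of data blocks to processing sites and keeps the one with the lowest estimated makespan. It is an illustrative brute-force search, not the scheduling algorithm proposed in the article; the data layout (`blocks`, `compute`, `bandwidth`) and the cost model, in which a site handles its blocks one after another, are assumptions.

```python
from itertools import product


def makespan(plan, blocks, compute, bandwidth):
    """Estimated completion time of a plan that maps each block to a processing site.

    blocks:    list of (origin_site, size_gb) tuples
    compute:   {site: processing throughput in GB/s}
    bandwidth: {(origin_site, target_site): link throughput in GB/s}
    """
    finish = {}
    for (origin, size_gb), target in zip(blocks, plan):
        transfer = 0.0 if origin == target else size_gb / bandwidth[(origin, target)]
        # Simplification: a site transfers and processes its blocks sequentially.
        finish[target] = finish.get(target, 0.0) + transfer + size_gb / compute[target]
    return max(finish.values())


def best_plan(blocks, compute, bandwidth):
    """Exhaustively try every block-to-site assignment (only viable for tiny inputs)."""
    sites = sorted(compute)
    return min(
        product(sites, repeat=len(blocks)),
        key=lambda plan: makespan(plan, blocks, compute, bandwidth),
    )


# Example values standing in for measurements sensed from the computing context.
blocks = [("a", 8.0), ("b", 8.0)]
compute = {"a": 4.0, "b": 0.5}
bandwidth = {("a", "b"): 1.0, ("b", "a"): 2.0}
plan = best_plan(blocks, compute, bandwidth)
```

With these example values the search selects `("a", "a")`: moving the block stored at the slow site `b` to the fast site `a` gives an estimated makespan of 8 s, against 16 s when each block is processed where it resides. Re-running the search with refreshed `compute` and `bandwidth` values mirrors the idea of scheduling on freshly sensed information.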

Another enhancement we propose with respect to similar works in the literature is a novel approach to the study of the job’s application profile, an important characteristic of the computing context that may strongly affect job performance.

A prototype of the H2F system has been developed and deployed in a testbed environment: the experiments carried out showed that the H2F system outperforms Hadoop in some scenarios where resources (computing capacity, data distribution, network links) are heterogeneous.

The remainder of the paper is organized as follows. Section 2 provides the motivation for the work and also discusses some related work. In Section 3 we briefly introduce the system design and describe its basic behavior. Section 4 describes the proposed job scheduling algorithm, while in Section 5 the strategy for the application profiling is presented. Section 6 provides the details of the H2F architecture and the role of its components. Section 7 presents the results of the experiments run on the system’s software prototype. Section 8 concludes the work.
