PaaS Optimization of Apache Applications Using System Parameter Tuning of Big Data Platforms in Distributed Computing

Tanuja Pattanshetti, Vahida Attar

Source Title: International Journal of Distributed Systems and Technologies (IJDST) 11(4)

DOI: 10.4018/IJDST.2020100102

Abstract

Widely used data processing platforms rely on distributed systems to process very large data sets efficiently. The aim of this article is to optimize the platform services by tuning only the relevant, tunable system parameters and to identify the relation between the software quality metrics. The system parameters of data platforms can be defined and customized based on the service level agreements. In the first stage, the most significant parameters are identified and shortlisted using various feature selection approaches. In the second stage, applications are run iteratively to tune these shortlisted parameters, to identify their optimal values, and to understand the impact of individual input parameters on the system output parameter. The empirical results indicate a significant improvement in performance, suggesting that the proposed work can be used to optimize the services offered by these data platforms.

1. Introduction

Data characterized by high velocity, volume and variety is termed “Big Data”. The expansion of the digital world is further accelerating the rate at which this data is generated. Gantz and Reinsel (2012) have mentioned that, as per the IDC prognosis, the digital universe will reach a stupendous size in another year, 300 times its size around a decade ago. IBM Big Data and Analytics Hub has presented a fourth V, “Veracity”, which concerns the uncertainty of data (IBM Big Data & Analytics Hub, “The Four V's of Big Data,” n.d.). With the rise of such data, conventional methods fail to meet the demands of real-world applications. Cloud computing, which makes use of a distributed architecture, has thus emerged as one of the prime innovations fulfilling the needs of execution, effectiveness and accessibility. Apache-based data processing platforms like Hadoop (Apache Hadoop, n.d.), Spark (Apache Spark, n.d.) and Storm (Apache Storm, n.d.) are nowadays widely used for processing big data.

The Hadoop data platform performs batch processing by making use of HDFS (Hadoop Distributed File System) and the MapReduce (Map and Reduce functions) framework. In the distributed environment of large clustered systems, HDFS is used for data storage, and the data is processed using the MapReduce framework (Apache Hadoop, n.d.). The Spark data platform is an open source cluster-based framework which provides faster data processing than Hadoop due to its resilient distributed dataset architecture (Apache Spark, n.d.). Storm is a real-time distributed computation system used as a stream processing solution in large clusters (Apache Storm, n.d.).

These widely used data platforms have hundreds of configurable system parameters, which are normally left at their default values. The purpose of the research work carried out here is to assess the role and impact of these system parameters, and of the values to which they are tuned, in completing a given job with improved efficiency. To get the most out of each system, it is essential to tune these parameters to the best possible values. In the current scenario these values are set according to the instinct and experience of the service provider, which might not always lead to the ideal setup for offering services, especially in a pay-as-you-go model.

Earlier work by different researchers suggests two methods currently adopted to find the optimal configuration for a system offering platform-as-a-service (PaaS). The first method is an exhaustive trial-and-error methodology in which several attempts are made to identify the “best value” for each parameter. This method is intrinsically infeasible, as the following example shows. Suppose a service administrator needs to try 10 values for each parameter in a set of 100 parameters. Covering every combination requires on the order of 10¹⁰⁰ empirical observations, which makes an exhaustive search practically impossible. The second approach uses machine learning techniques to find tailor-made parameter values for a system setup (Wang, Xu, & He, 2016; Trotter, Liu, & Wood, 2017). It is robust and flexible, although it too involves vast computation.
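The arithmetic behind the exhaustive-search example can be checked with a short Python sketch. The function name `grid_size` and the shortlist of 5 parameters are illustrative assumptions, not values taken from the article; the sketch only shows how quickly a full grid grows and why shrinking the parameter set matters.

```python
def grid_size(values_per_parameter: int, parameter_count: int) -> int:
    """Number of configurations in a full grid over all parameters."""
    return values_per_parameter ** parameter_count


# The example from the text: 10 candidate values for each of 100 parameters.
full_grid = grid_size(10, 100)

# Hypothetical shortlist of 5 parameters (an assumed size, for contrast only).
shortlisted_grid = grid_size(10, 5)

# Report the full grid by its number of decimal digits rather than printing it.
print(f"full grid has {len(str(full_grid))} digits")
print(f"shortlisted grid has {shortlisted_grid} configurations")
```

Python integers have arbitrary precision, so the value 10¹⁰⁰ is computed exactly rather than approximated as a float.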

After identifying the research gaps, this paper proposes heuristic optimal values for the configurable parameters of the Apache framework data platforms. The parameters tuned here are identified using filter and embedded approaches to feature selection (Pattanshetti and Attar, in press). Applying feature selection helped in identifying a reduced feature space for all three data platforms and in eliminating less relevant and redundant features. The features identified in common by the various filter and embedded algorithms made it into the final feature space, producing the optimal feature set. This optimal feature set is used for tuning, to empirically assess the impact of every input parameter on the output parameter of the respective data platform. The results show significantly improved performance when these input parameters are tuned to the heuristic optimal values, compared with the default values.
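The step of keeping only the features that several selection algorithms agree on can be sketched as a simple vote count. This is a minimal illustration under stated assumptions, not the authors' implementation: the selector labels (`filter_a`, `filter_b`, `embedded_a`), the placeholder parameter names (`p1` to `p5`) and the helper `common_features` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def common_features(
    selections: Dict[str, Iterable[str]],
    min_votes: Optional[int] = None,
) -> List[str]:
    """Return features chosen by at least `min_votes` selectors.

    By default a feature must be chosen by every selector (a strict
    intersection). Lowering `min_votes` relaxes this to a majority vote.
    """
    if min_votes is None:
        min_votes = len(selections)
    # Each selector votes at most once per feature.
    votes = Counter(
        feature for chosen in selections.values() for feature in set(chosen)
    )
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical shortlists produced by three feature selection algorithms.
shortlists = {
    "filter_a": ["p1", "p2", "p3", "p4"],
    "filter_b": ["p2", "p3", "p5"],
    "embedded_a": ["p1", "p2", "p3"],
}

print(common_features(shortlists))               # chosen by all three
print(common_features(shortlists, min_votes=2))  # chosen by at least two
```

A strict intersection gives the smallest set to tune; a lower `min_votes` threshold trades a larger tuning effort for less risk of discarding a parameter that one algorithm happened to miss.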
