InfoScipedia
A Free Service of IGI Global Publishing House
Below please find a list of definitions for the term that
you selected from multiple scholarly research resources.

What is Apache Spark

Encyclopedia of Data Science and Machine Learning
An open-source analytics engine for big data processing that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Published in Chapter:
Sustainable Big Data Analytics Process Pipeline Using Apache Ecosystem
Jane Cheng (UBS, USA) and Peng Zhao (INTELLIGENTRABBIT LLC, USA)
Copyright: © 2023 | Pages: 13
DOI: 10.4018/978-1-7998-9220-5.ch073
Abstract
This article surveys the big data workflow technologies that are widely applied in industrial applications, covering current big data processing methods and tools including Hadoop, Hive, MapReduce, Sqoop, Hue, Spark, Cloudera, Airflow, and GitLab. An industrial data workflow pipeline is proposed and its system architecture is examined. The pipeline is designed to meet the needs of data-driven industrial big data analytics applications that concentrate on large-scale data processing. It differs from traditional data pipelines and workflows in its support for ETL and analytical portals. The proposed data workflow can improve industrial analytics applications across multiple tasks. This article also gives big data researchers and professionals an understanding of the challenges facing big data analytics in real-world environments and informs interdisciplinary studies in this field.
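The phrase "implicit data parallelism" in the definition above can be illustrated with a minimal sketch. It uses only the Python standard library and is not Spark's API: the helper names `split_into_partitions` and `parallel_map_reduce` are hypothetical, threads stand in for cluster workers, and fault tolerance is not modelled. The point is only that the caller supplies a map function and a reduce function, while partitioning and scheduling happen behind the scenes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(data, num_partitions):
    """Split a list into contiguous chunks of roughly equal size."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_map_reduce(data, map_fn, reduce_fn, num_partitions=4):
    """Apply map_fn to every element, one partition per worker, then combine.

    Hypothetical helper for illustration; threads stand in for cluster nodes.
    """
    if not data:
        raise ValueError("data must not be empty")
    partitions = split_into_partitions(data, num_partitions)

    def process(partition):
        # Each worker reduces its own partition to a single partial result.
        return reduce(reduce_fn, (map_fn(x) for x in partition))

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(process, partitions))

    # Combine the per-partition results into the final answer.
    return reduce(reduce_fn, partials)


if __name__ == "__main__":
    # Sum of squares of 1..100, computed partition by partition.
    total = parallel_map_reduce(
        list(range(1, 101)),
        map_fn=lambda x: x * x,
        reduce_fn=lambda a, b: a + b,
    )
    print(total)  # 338350
```

The caller never refers to partitions or workers, which is the sense in which the parallelism is implicit. In Spark the same division of labour holds, with the engine distributing partitions across a cluster rather than across local threads.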
More Results
Open Source Software (OSS) for Big Data
Open-source software that forms the aggregation and analysis layer of the SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka).
Building a Chatbot for Libraries
An open-source unified analytics engine for large-scale data processing.
Data Gathering, Processing, and Visualization for COVID-19
An open-source analytics engine for big data processing that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.