InfoScipedia
A Free Service of IGI Global Publishing House
Below please find a list of definitions for the term that
you selected from multiple scholarly research resources.

What is Cluster Pool

Encyclopedia of Data Science and Machine Learning
A cluster pool is a set of idle, ready-to-use instances. Because these instances are already running and waiting to be attached as cluster nodes, drawing from the pool reduces cluster start time.
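The idea in this definition can be illustrated with a small simulation. The sketch below is a toy model, not the API of any data lake product: the `ClusterPool` class, its method names, and the `boot_seconds` and `attach_seconds` values are all hypothetical, and it assumes that the nodes of a cluster start in parallel, so a single cold boot is enough to delay the whole cluster.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: int
    state: str = "idle"  # "idle" while waiting in the pool, "in_use" once attached


class ClusterPool:
    """Toy model of a pool of idle, ready-to-use instances (hypothetical API)."""

    def __init__(self, idle_count, boot_seconds=120.0, attach_seconds=5.0):
        # Both timings are illustrative assumptions, not measured values.
        self.boot_seconds = boot_seconds      # time to provision a fresh instance
        self.attach_seconds = attach_seconds  # time to attach a running instance
        self._next_id = 0
        self._idle = deque(self._new_instance() for _ in range(idle_count))

    def _new_instance(self):
        instance = Instance(self._next_id)
        self._next_id += 1
        return instance

    def idle_count(self):
        return len(self._idle)

    def acquire(self, node_count):
        """Return (nodes, estimated_start_seconds) for a new cluster."""
        nodes = []
        cold_boots = 0
        for _ in range(node_count):
            if self._idle:
                instance = self._idle.popleft()   # warm: already running
            else:
                instance = self._new_instance()   # cold: must be provisioned
                cold_boots += 1
            instance.state = "in_use"
            nodes.append(instance)
        # Assumption: nodes start in parallel, so any cold boot adds one
        # boot_seconds delay to the whole cluster.
        start_seconds = self.attach_seconds
        if cold_boots:
            start_seconds += self.boot_seconds
        return nodes, start_seconds

    def release(self, nodes):
        """Return the nodes of a terminated cluster to the idle pool."""
        for instance in nodes:
            instance.state = "idle"
            self._idle.append(instance)


if __name__ == "__main__":
    pool = ClusterPool(idle_count=3)
    _, warm_start = pool.acquire(2)   # served entirely from idle instances
    _, cold_start = pool.acquire(3)   # only one idle instance left, so two cold boots
    print(f"warm start: {warm_start}s, cold start: {cold_start}s")
```

In this model a cluster that fits within the idle instances pays only the attach time, while a request that exceeds the pool falls back to the slower provisioning path.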
Published in Chapter:
Data Lakes
Anjani Kumar (University of Nebraska at Omaha, USA) and Parvathi Chundi (University of Nebraska at Omaha, USA)
Copyright: © 2023 | Pages: 15
DOI: 10.4018/978-1-7998-9220-5.ch025
Abstract
Data lake (DL) technology is popular for its flexibility in handling different raw data formats, both at ingestion time and at retrieval time. A data lake typically includes the following five layers: data ingestion, staging, processed data, storage and visualization, and analytics. Together, these five layers provide access to seemingly infinite computation and storage resources, democratizing data access and supporting a wide variety of analytics tasks in an enterprise. This chapter explains a four-step approach to performing an analysis task and describes the three pillars for building a DL. It then gives a brief history of the evolution from Excel spreadsheets to the DL and explains each of the five layers. Finally, it briefly introduces three DL systems (Snowflake, Databricks, and Redshift) and compares them on nine important metrics.