Data Lake Architecture: A New Repository for Data Engineer

Arvind Panwar, Vishal Bhatnagar

Source Title: International Journal of Organizational and Collective Intelligence (IJOCI) 10(1)

DOI: 10.4018/IJOCI.2020010104

Abstract

Data is the biggest asset after people for businesses, and it is a new driver of the world economy. The volume of data that enterprises gather every day is growing rapidly. This rapid growth of data in volume, variety, and velocity is known as Big Data. Big Data is a challenge for enterprises, and the biggest challenge is how to store it. In the past, and in some organizations still today, data warehouses have been used to store Big Data. Enterprise data warehouses work on the concept of schema-on-write, but Big Data analytics requires data storage that works on the schema-on-read concept. To meet this market demand, researchers are working on a new data repository system for Big Data storage known as a data lake. The data lake is defined as a data landing area for raw data from many sources. There is some confusion about data lakes, and there are questions that must be answered. The objective of this article is to reduce the confusion and address some of the questions about data lakes with the help of architecture.

1. Introduction

Data is the biggest asset after people for business, and it is a new driver of economic and social change in today’s world. The volume of data that enterprises gather every day is growing rapidly (Bala, Boussaid, & Alimazighi, 2017; Hefer, 2007). Every organization has its own data warehouse to store huge amounts of business data. A data warehouse is designed to capture and store business data from other enterprise systems, for example inventory, supply chain management, and customer relationship management systems. A data warehouse system allows business users and data analysts to derive value from data and make important decisions to grow their business.

The world is changing rapidly, and new technologies have come to market for data storage, data processing, and data analysis. New technologies, including streaming data, data from connected devices on the Internet of Things, cloud computing, social media, and the high-tech power grid, are driving a much greater volume of data (CITO research, 2014; Hortonworks, 2014). This greater volume of data is driving higher user expectations and the globalization of economies. Data generated from these sources is not only huge in volume but is also generated at high velocity and in great variety: structured, unstructured, and semi-structured. Data of this kind is known as Big Data. The traditional data warehouse is not suitable for processing and analyzing Big Data. Organizations now understand that traditional data warehouse technologies cannot meet the business needs of competing in an ever-growing market.

As a result, organizations are turning toward Apache Hadoop to store Big Data and gain insights from it. Hadoop is open-source software for the distributed storage and distributed processing of huge data sets on clusters of commodity hardware. Apache Hadoop provides many services, including data storage, data processing, data access, data governance, data security, data visualization, and operations. Adoption of Hadoop in organizations is growing exponentially: according to a Gartner survey in mid-2015, 26% of enterprises were already deploying and piloting Hadoop as a next-generation data storage and processing framework. According to the same survey, 12% were planning to deploy very soon and 7 to 10 percent to deploy within a year.

Many organizations have experienced success and business growth from these early mainstream Hadoop deployments in the healthcare, retail, financial, and e-commerce sectors. Initially, Hadoop was used as a tactical tool rather than a strategic one, because many were opposed to replacing the data warehouse. They had questions and doubts about whether Hadoop could match their enterprise requirements for scalability, security, performance, and availability. But organizations know that they cannot continue with the data warehouse alone, because of challenges that come with advances in technology.

With advances in technology, the enterprise data warehouse is no longer suitable as data storage for current market demand. The enterprise data warehouse works on a schema-on-write architecture: to get data into the warehouse, an extraction, transformation, and loading (ETL) process is required (Cha, Park, Kim, Pan, & Shin, 2018; Khine & Wang, 2018). With this architecture, an organization designs a data model and prepares an analytic plan before loading data. In other words, the organization must know at the outset, before loading data, how it plans to use that data, and this is very limiting. Big Data analytics requires data storage that works on the schema-on-read concept, in which data is stored in raw format as it is generated. In other words, there is no need to prepare an analytic plan before loading data, and no need to know ahead of time how the data will be used.
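
The contrast between the two concepts can be illustrated with a minimal Python sketch. It is not taken from the article: the schema, the record fields, and the function names (`load_schema_on_write`, `land_raw`, `read_with_schema`) are hypothetical, and in-memory lists stand in for a real warehouse table and a real data lake.

```python
import json

# Hypothetical fixed schema that a warehouse would declare before loading.
WAREHOUSE_SCHEMA = {"order_id": int, "amount": float}


def load_schema_on_write(records, schema):
    """Schema-on-write: cast every record to the schema at load time (ETL).

    Records that do not fit the predefined model are rejected up front.
    """
    table, rejected = [], []
    for rec in records:
        try:
            row = {col: cast(rec[col]) for col, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            rejected.append(rec)
            continue
        table.append(row)
    return table, rejected


def land_raw(records):
    """Schema-on-read, step 1: land each record as-is, as a raw JSON line."""
    return [json.dumps(rec) for rec in records]


def read_with_schema(raw_lines, schema):
    """Schema-on-read, step 2: apply a schema only when the data is read.

    Lines that lack the requested fields are skipped for this query only;
    they remain in the raw store for other uses.
    """
    for line in raw_lines:
        rec = json.loads(line)
        if all(col in rec for col in schema):
            yield {col: cast(rec[col]) for col, cast in schema.items()}


# Illustrative input: two order records and one sensor reading.
incoming = [
    {"order_id": 1, "amount": "19.99"},
    {"order_id": 2, "amount": "5.00", "channel": "web"},
    {"device": "sensor-7", "temp_c": 21.5},
]

# The warehouse keeps only what matches the plan made before loading.
table, rejected = load_schema_on_write(incoming, WAREHOUSE_SCHEMA)

# The lake keeps everything; a schema chosen later can still use the sensor data.
lake = land_raw(incoming)
sensor_rows = list(read_with_schema(lake, {"device": str, "temp_c": float}))
```

In this sketch the sensor reading is rejected at load time by the schema-on-write path, because no one planned for it, while the schema-on-read path retains it and makes it queryable once a suitable schema is supplied at read time.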
