1. Introduction
Data is the biggest asset for a business after its people, and it is a new driver of economic and social change in today’s world. The volume of data that enterprises gather every day is growing rapidly (Bala, Boussaid, & Alimazighi, 2017; Hefer, 2007). Many organizations maintain their own data warehouse to store large amounts of business data. A data warehouse is designed to capture and store business data from other enterprise systems, for example inventory, supply chain management, and customer relationship management systems. A data warehouse system allows business users and data analysts to derive value from data and make important decisions to grow their business.
The world is changing rapidly, and new technologies for data storage, data processing, and data analysis have come to market. New sources of data, including streaming data, data from connected devices on the Internet of Things, cloud computing, social media, and high-tech power grids, are driving a much greater volume of data (CITO Research, 2014; Hortonworks, 2014). This greater volume of data is driving higher user expectations and the globalization of economies. Data generated from these sources is not only huge in volume but is also generated at high velocity and in great variety: structured, semi-structured, and unstructured. Data of this kind is known as Big Data. The traditional data warehouse is not suitable for processing and analyzing Big Data. Organizations now understand that traditional data warehouse technologies cannot meet the business needs they face in an ever-growing market.
As a result, many organizations are turning toward Apache Hadoop to store Big Data and gain insights from it. Hadoop is open-source software for the distributed storage and distributed processing of very large data sets on clusters of commodity hardware. Apache Hadoop provides many services, including data storage, data processing, data access, data governance, data security, data visualization, and operations. Adoption of Hadoop in organizations is growing rapidly: according to a Gartner survey in mid-2015, 26% of enterprises were already deploying or piloting Hadoop as a next-generation data storage and processing framework. According to the same survey, 12% were planning to deploy very soon, and 7 to 10 percent planned to deploy within a year.
Many organizations have experienced success and business growth from these early mainstream Hadoop deployments in the healthcare, retail, financial, and e-commerce sectors. Initially, Hadoop was used as a tactical tool rather than a strategic one, because many were opposed to replacing the data warehouse. They had questions and doubts about whether Hadoop could match their enterprise requirements for scalability, security, performance, and availability. But organizations know that they cannot continue with the data warehouse because of challenges that come with advances in technology.
As technology advances, the enterprise data warehouse is no longer suitable as data storage for current market demands. The enterprise data warehouse is built on a schema-on-write architecture: to get data into the warehouse, an extraction, transformation, and loading (ETL) process is required (Cha, Park, Kim, Pan, & Shin, 2018; Khine & Wang, 2018). With this architecture, an organization designs a data model and prepares an analytic plan before loading data. In other words, the organization must know at the outset, before loading any data, how it plans to use that data, and this is very limiting. Big Data analytics needs data storage that works on the schema-on-read concept, in which data is stored in raw format as it is generated. In other words, there is no need to prepare an analytic plan before loading data, and no need to know ahead of time how the data will be used.
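The difference between the two concepts can be illustrated with a minimal Python sketch. It is not taken from any particular warehouse or Hadoop product; the sample events, the `WAREHOUSE_SCHEMA` table definition, and the helper functions `etl_load`, `raw_store`, and `read_with_schema` are all hypothetical names introduced only for illustration.

```python
import json

# Hypothetical raw events, as they might arrive from different source systems.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5.00", "device": {"os": "android"}}',
    '{"user": "c3", "clicks": 7}',
]

# Schema-on-write: the target schema is fixed before any data is loaded.
WAREHOUSE_SCHEMA = {"user": str, "amount": float}  # assumed target table


def etl_load(raw_lines):
    """Extract, transform, and load only records that fit the predefined schema."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        if not all(col in record for col in WAREHOUSE_SCHEMA):
            continue  # record does not fit the model, so it is dropped at load time
        table.append({col: cast(record[col]) for col, cast in WAREHOUSE_SCHEMA.items()})
    return table


# Schema-on-read: store the raw data and choose a structure at query time.
def raw_store(raw_lines):
    """Keep every record exactly as generated; no data model is needed up front."""
    return list(raw_lines)


def read_with_schema(store, schema):
    """Apply a schema chosen at analysis time; fields a record lacks become None."""
    rows = []
    for line in store:
        record = json.loads(line)
        rows.append({col: cast(record[col]) if col in record else None
                     for col, cast in schema.items()})
    return rows


warehouse = etl_load(RAW_EVENTS)   # the third event is lost: it has no "amount"
lake = raw_store(RAW_EVENTS)       # all three events are kept as-is
clicks_view = read_with_schema(lake, {"user": str, "clicks": int})
```

In this sketch, the schema-on-write path discards the click event because nobody planned for it when the table was designed, whereas the schema-on-read path keeps it and lets an analyst ask a question about clicks later, which is the flexibility described above.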