Big Data Architecture Components

Copyright: © 2019 | Pages: 22
DOI: 10.4018/978-1-5225-3790-8.ch002

Abstract

The previous chapter gave an overview of big data, including its types, sources, analytic techniques, and applications. This chapter briefly discusses the architecture components that deal with huge volumes of data. The complexity of big data types defines a logical architecture with layers and high-level components for a big data solution, including data sources and their relation to atomic patterns. The dimensions of the approach include volume, variety, velocity, veracity, and governance. The layers of the architecture are big data sources, the data massaging and store layer, the analysis layer, and the consumption layer. Big data sources are data collected from various sources for data scientists to analyze. Data can come from internal and external sources. Internal sources comprise transactional data, device sensors, business documents, internal files, etc. External sources include social network profiles, geographical data, data stores, etc. Data massaging is the process of preparing data through preprocessing steps such as removal of missing values, dimensionality reduction, and noise removal, so that it reaches a useful format for storage. The analysis layer provides insight using the preferred analytics techniques and tools. The analytics methods, issues to be considered, requirements, and tools are discussed. The consumption layer delivers the resulting business insight to consumers such as retail marketing, the public sector, financial bodies, and media. Finally, a case study of architectural drivers is applied to a retail industry application, and its challenges and use cases are discussed.

Background

Big Data can be stored, retrieved, processed and analysed in various ways. This involves many dimensions and requires a high-performance computation model with security and governance. Choosing such an architecture pattern is challenging because of the many factors involved. The complexity of Big Data types defines a logical architecture with layers and high-level components to obtain a Big Data solution. The logical architecture includes a set of data sources and is related to atomic patterns, each focusing on one aspect of a Big Data solution.

With the advent of Big Data technologies, organizations started asking, “What kinds of insight are possible for business and governance if Big Data technologies are adopted?” A structured approach based on a set of dimensions is used to assess the feasibility of a Big Data solution. The dimensions in this approach may include:

  • Volume of the data

  • Variety of data sources, types, and formats

  • Velocity, i.e. the speed at which the data is generated

  • Veracity, i.e. the uncertainty or trustworthiness of the data

  • Business value from analyzing the data

  • Governance for the new sources of data and its usage

Figure 1. Dimensions of Big Data viability
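
The dimensions listed above can be recorded as a simple checklist when assessing a candidate solution. The following is a minimal sketch only: the 0 to 3 rating scale, the threshold, and the names ViabilityAssessment, missing, and drivers are illustrative assumptions and are not taken from the chapter.

```python
from dataclasses import dataclass

# The six dimensions named in the text.
DIMENSIONS = ("volume", "variety", "velocity", "veracity",
              "business_value", "governance")


@dataclass
class ViabilityAssessment:
    """Hypothetical record of ratings per dimension (assumed scale: 0 to 3)."""
    ratings: dict

    def missing(self):
        # Dimensions that have not been assessed yet.
        return [d for d in DIMENSIONS if d not in self.ratings]

    def drivers(self, threshold=2):
        # Dimensions rated at or above the (assumed) threshold,
        # i.e. the ones that most strongly motivate a Big Data solution.
        return [d for d in DIMENSIONS if self.ratings.get(d, 0) >= threshold]


assessment = ViabilityAssessment({"volume": 3, "variety": 1, "velocity": 2})
print(assessment.missing())
print(assessment.drivers())
```

A checklist like this does not decide feasibility by itself; it only makes explicit which dimensions have been considered and which ones dominate.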

Big Data Architecture

Big Data architecture is concerned with developing reliable, scalable, completely automated data pipelines (Azarmi, 2016). The architecture needs to define several layers in the stack, comprising data sources, storage, functional and non-functional business requirements, analytics engine cluster design, etc., as a Big Data solution for any business case (Mysore, Khupat, & Jain, 2013). This set of layers forms the critical components for defining the process from data acquisition to analytics via business/human insight.

The layers define an approach to organizing the components by function. The layering is purely logical: grouping related functions into a layer does not mean that each layer runs as a separate process. The layers are:

  • Big Data sources

  • Data massaging and store layer

  • Analysis layer

  • Consumption layer
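
To illustrate how the four layers fit together, here is a minimal sketch in which each layer is a plain function and all of them run in one process, in line with the point that the layering is logical. The sample records, the field names store and sales, and every function name are hypothetical and chosen only for illustration.

```python
from statistics import mean


def read_sources():
    """Big Data sources: hypothetical internal and external records."""
    internal = [{"store": "A", "sales": 120.0}, {"store": "B", "sales": None}]
    external = [{"store": "A", "sales": 80.0}]
    return internal + external


def massage_and_store(records):
    """Data massaging and store layer: drop records with missing values."""
    return [r for r in records if r["sales"] is not None]


def analyze(records):
    """Analysis layer: compute average sales per store."""
    by_store = {}
    for r in records:
        by_store.setdefault(r["store"], []).append(r["sales"])
    return {store: mean(values) for store, values in by_store.items()}


def consume(insight):
    """Consumption layer: format the insight for a report."""
    return [f"{store}: {avg:.1f}" for store, avg in sorted(insight.items())]


def run_pipeline():
    # Each layer hands its output to the next one.
    return consume(analyze(massage_and_store(read_sources())))


print(run_pipeline())
```

In a real deployment each function would be replaced by the corresponding storage, processing, or reporting component; the sketch only shows the direction of data flow between the layers.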

Big Data Sources

Data is ubiquitous, but it is hard to discover the data that is required. Data can be collected from all channels for analysis. Many organizations collect data as required, and data scientists analyse it for further analytics. The data can vary in format, origin, etc. This defines:
