Big Data Computing and the Reference Architecture

M. Baby Nirmala, Pethuru Raj
DOI: 10.4018/978-1-4666-5864-6.ch002


Earlier, transactional and operational data were maintained in tables and stored in relational databases, with formal structures and schemas. However, the recent production and flow of multi-structured data has prompted a search for new ways of capturing, collecting, and storing it. E-mails, PDF files, social blogs, musings, tweets, still photographs, videos, office documents, phone call records, sensor readings, real-time chats, and data from medical electronics, smart grids, and avionics play a growing role in delivering accurate, actionable, and timely insights to executives and decision-makers. The chapter provides an insight into the big data phenomenon, its usability and utility for businesses, the latest developments in this impactful concept, and the reference architecture.
Chapter Preview

Big Data Analytics

What is Big Data Analytics?

Big Data Analytics is the process of examining large amounts of data of various types (Big Data) to uncover hidden patterns, unknown correlations, and other useful information. In other words, Big Data Analytics is the use of advanced analytical techniques against very large, diverse data sets that include different types, such as structured/unstructured and streaming/batch, and different sizes, from terabytes to zettabytes.

Figure 1 shows how Big Data processing is done. By enabling data scientists and other users to analyze huge volumes of transactional data, as well as data from other sources left untapped by conventional Business Intelligence (BI) programs, Big Data analytics helps organizations make better business decisions.

Figure 1. Big data processing


These other data sources may include Web server logs and Internet clickstream data, social media activity reports, mobile-phone call detail records, and information captured by sensors.

Some people associate Big Data and Big Data Analytics exclusively with unstructured data. However, consulting firms such as Gartner Inc. and Forrester Research Inc. consider transactions and other structured data to be valid forms of Big Data as well.

Big Data Analytics can be done with the software tools commonly used in advanced analytics disciplines such as predictive analytics and data mining.

Three Key Technologies for Extracting Business Value from Big Data

  • Information Management: Manage data as a strategic, core asset, with ongoing process control for Big data analytics.

  • High-Performance Analytics: Gain rapid insights from Big Data and the ability to solve increasingly complex problems using more data.

  • Flexible Deployment Options: Choose between on-premises and hosted, Software-as-a-Service (SaaS) approaches for Big Data and Big Data analytics.

Key Terms in this Chapter

Big Data Analytics: The process of inspecting huge amounts of varied data to uncover hidden patterns and unknown correlations, and to extract valuable information, using advanced analytic techniques and business intelligence tools.

Hadoop: An open-source Apache framework for processing large data sets, made up of two main components, HDFS and MapReduce.

HDFS: The Hadoop Distributed File System (HDFS) is an important part of Hadoop; it stores data in files on a cluster of servers.

MapReduce: MapReduce is an important part of Hadoop; it is a programming framework for building parallel applications that process data stored in HDFS.

BDRA: Big Data Reference Architecture.

Big Data: A general term used to describe the voluminous amount of unstructured and semi-structured data a company creates, data that would take too much time and money to load into a relational database for analysis.

Cloud Analytics: “Any analytics initiative in which one or more of the following elements is implemented in the cloud” qualifies as Cloud Analytics: Data Sources, Data Models, Processing Applications, Computing Power, Analytic Models, and Sharing or Storage of Results.

NoSQL: Also read as "Not Only SQL". NoSQL is the name given to a broad set of databases whose only common thread is that they don't require SQL to process data, although some support both SQL and non-SQL forms of data processing.
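
To make the MapReduce entry in the key terms above more concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is an illustration only and does not use Hadoop or HDFS. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for this sketch; a real Hadoop job would run the same steps in parallel across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data analytics", "big data processing"]
    print(word_count(sample))
```

Because each call to map_phase depends only on its own input line, and each call to reduce_phase depends only on the values for its own key, both steps can be distributed over many machines. That independence is the property that makes the model suitable for parallel processing.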
