Big Data Analytics Demystified

Pethuru Raj (IBM India Pvt Ltd, India)
DOI: 10.4018/978-1-4666-5864-6.ch003

This chapter is crafted mainly to give a business-centric view of big data analytics. Readers will find the major application domains and use cases of big data analytics, along with the compelling needs and reasons for wholeheartedly embracing this new paradigm. Emerging use cases include using real-time data, such as sensor data, to detect abnormalities in plant and machinery, and batch processing of sensor data collected over a period to conduct failure analysis of plant and machinery. The author describes the short-term as well as the long-term benefits and aims to dispel the doubts and misgivings about this new idea, which has been pervading every tangible domain. The ultimate goal is to demystify this cutting-edge technology so that its acceptance and adoption levels rise significantly in the days to come.
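The two sensor-data use cases above differ mainly in when the data is analysed: as it arrives, or after it has been collected. The following is a minimal sketch, not the chapter's method, assuming readings are plain floats and that a reading more than three standard deviations from the running mean counts as an "abnormality". The class `StreamingAnomalyDetector` and the function `batch_failure_report` are hypothetical names introduced for illustration.

```python
import math
from typing import Dict, Iterable, Tuple


class StreamingAnomalyDetector:
    """Real-time path (hypothetical helper): flag readings far from the running mean.

    Uses Welford's online algorithm, so each reading costs O(1) time and memory
    and no history has to be stored.
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 10) -> None:
        self.threshold = threshold      # assumed z-score cut-off for an abnormality
        self.warmup = max(2, warmup)    # readings to observe before flagging (>= 2 avoids dividing by zero)
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Return True if x is anomalous relative to the readings seen so far."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Fold the new reading into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


def batch_failure_report(readings: Iterable[Tuple[str, float]], limit: float) -> Dict[str, int]:
    """Batch path (hypothetical helper): count stored readings above a limit, per machine."""
    counts: Dict[str, int] = {}
    for machine_id, value in readings:
        if value > limit:
            counts[machine_id] = counts.get(machine_id, 0) + 1
    return counts


detector = StreamingAnomalyDetector(threshold=3.0, warmup=10)
stream = [20.0, 20.5, 19.8, 20.2, 20.1, 19.9, 20.3, 20.0, 19.7, 20.4, 35.0]
flags = [detector.update(v) for v in stream]  # only the final spike is flagged

history = [("pump-1", 95.0), ("pump-2", 70.0), ("pump-1", 101.0)]
report = batch_failure_report(history, limit=90.0)  # {"pump-1": 2}
```

The streaming detector keeps only three numbers per sensor, which is what makes it suitable for high-velocity data; the batch report, by contrast, can afford to scan the entire stored history at once.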
Chapter Preview

The Unwrapping Of Big Data Computing

We have discussed the fundamental and sweeping changes happening in the IT and business domains. Service-enablement of applications, platforms, infrastructures and even everyday devices, together with varied and versatile connectivity methods, has laid strong and stimulating foundations for both human- and machine-generated data. The tremendous rise in data collection, with all its complications, has prompted business and IT leaders to act on this huge, impending, data-driven opportunity for growing corporations. This is the beginning of the much-discussed discipline of big data computing. The paradigm is being formalized through deeper and decisive collaboration among product vendors, service organizations, independent software vendors, system integrators, innovators and research organizations. Having understood its strategic significance, these different and distributed stakeholders have come together to create and sustain simplifying and streamlining techniques, platforms and infrastructures, integrated processes, best practices, design patterns and key metrics that make this new discipline pervasive and persuasive. Today the acceptance and adoption of big data computing are steadily climbing. Big data computing is bound to raise a number of critical challenges; at the same time, if it is taken seriously, it can be highly impactful and insightful, helping business organizations confidently choose the right route. The continuous emergence of integrated processes, platforms, patterns, practices and products is a good indication of bright days ahead for the big data phenomenon.

Key Terms in this Chapter

Value: The available data can create substantial value for organizations, societies and consumers. Big data can mean big business, and most industries stand to benefit from it.

Apache Hadoop: Apache Hadoop is an open source framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.

HBase: The mainstream Apache Hadoop database; an open source, non-relational, distributed database running in conjunction with Hadoop.

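NewSQL: A class of relational database systems that keep the SQL interface and transactional (ACID) guarantees of traditional databases while aiming for the horizontal scalability associated with NoSQL systems. It emerged more recently than NoSQL.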

Visualization: With the right visualizations, raw data can be put to use. Such visualizations go beyond ordinary graphs or pie charts: they are often complex graphs that combine many variables of data while still remaining understandable and readable.

Volume: The amount of data, ranging from megabytes to brontobytes.

Veracity: Organizations need to ensure that both the data and the analyses performed on it are correct. Veracity refers to the correctness of the data.

Big Data: Big data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates, data that would take too much time and cost too much to load into a relational database for analysis.

Variety: Data today comes in many different formats: structured data, semi-structured data, unstructured data and even complex structured data.

MapReduce: MapReduce is a software framework for processing vast amounts of data using a divide-and-conquer approach.
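The divide-and-conquer idea behind MapReduce can be illustrated with the classic word-count example. The following is a minimal, single-process sketch: it only imitates the map, shuffle and reduce phases in plain Python and is not Hadoop's actual API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each input string stands in for an input split that a real cluster would map in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # In a cluster, map tasks run in parallel, one per split; here they run sequentially.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big value", "data velocity"])
# counts == {"big": 2, "data": 2, "value": 1, "velocity": 1}
```

Because each map call only sees its own split and each reduce call only sees the values for one key, both phases can be distributed across machines; the shuffle step is where the framework moves data between them.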

Appliance: Appliances (hardware and virtual) are being prescribed as a viable and value-adding approach for scores of business-critical application infrastructure solutions such as service integration middleware, messaging brokers, security gateways, load balancing, etc.

NoSQL: Sometimes expanded as ‘Not only SQL’, it refers to databases that do not adhere to traditional relational database structures. Many NoSQL systems relax strict consistency (for example, offering eventual consistency) in order to achieve higher availability and horizontal scaling.

Big Data Analytics: Big Data Analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other useful information using advanced analytic techniques.

Hadoop: An open-source framework built to enable the processing and storage of big data across a distributed file system.

Velocity: The speed at which the data is created, stored, analysed and visualized.