First of All, Understand Data Analytics Context and Changes

Copyright: © 2019 | Pages: 33
DOI: 10.4018/978-1-5225-7609-9.ch004

Abstract

Big data marks a major turning point in the use of data and is a powerful vehicle for growth and profitability. A comprehensive understanding of a company's data and of its potential can become a new vector for performance. It must be recognized that, without adequate analysis, data are just an unusable raw material. In this context, traditional data processing tools cannot support such an explosion of volume; they cannot respond to new needs in a timely manner and at a reasonable cost. Big data is a broad term generally referring to very large data collections that challenge the analytics tools used to harness and manage them. This chapter details what big data analysis is, presents the development of its applications, and examines the important changes that have affected the analytics context.
Chapter Preview

Introduction

It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.

Sherlock Holmes. (A Scandal in Bohemia)

Recognizing the big data universe, its opportunities and challenges, the different types of data, their significance and where to look for them, understanding the importance of big data, and realizing why so much attention has been paid to the 'data revolution' were the mission of the previous chapter.

But, faced with the volume and diversity of data available today, it is essential to develop techniques that make the best use of all of these stocks in order to extract the maximum amount of information. Indeed, a shift in thinking is also expected; this concerns not only the data infrastructure but also business intelligence and analytics.

Applying big data analytics is not only about knowing R or Python, or mastering big data technology; it is mainly about knowing why and how to apply the different technical tools. The increase in data produced by companies, individuals, scientists, and public officials, coupled with the development of IT tools, offers new analytical perspectives. Analysis of big data requires an investment in computing architecture to store, manage, analyze, and visualize an enormous amount of data.

Today, companies are no longer wondering what a big data strategy can bring them. The question is now how to orchestrate and integrate the technological building blocks. In short, the conversation has also moved up a level from a technical point of view.

But that is not all, because the emergence of the big data age is related not only to the many opportunities to investigate areas that were previously hard to examine, but also to its challenges and to the way this phenomenon is changing business opportunities. So, follow along with this chapter to enrich your understanding of the big data context, its development, and its changes, from descriptive to predictive to prescriptive and advanced analytics, and the promises that it holds.

Before breaking down the process of data analytics in chapter five, and in order to understand big data analytics, it is necessary to look at what it is and under which circumstances it falls. That is what will be illustrated in this chapter.


Big Data Analytics: From Descriptive To Predictive And Prescriptive Analysis

In order to understand big data analytics, it is necessary to look at what it is and where it falls in the literature. Many terms in the business literature are often related to one another: 'analytics', 'business analytics', and 'business intelligence' (BI). Davenport and Harris (2007) define analytics as:

The extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.

An analytics team often uses their expertise in statistics, data mining, machine learning, and visualization to answer questions and solve problems that management points out.

Analytics can be defined also as (Schniederjans et al, 2014):

A process that involves the use of statistical techniques (measures of central tendency, graphs, and so on), information system software (data mining, sorting routines), and operations research methodologies (linear programming) to explore, visualize, discover and communicate patterns or trends in data.
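As a minimal illustration of the statistical side of this definition, the short Python sketch below computes basic measures of central tendency for a small, invented sample; the figures are purely hypothetical.

```python
import statistics

# Hypothetical monthly sales figures (illustrative data only)
sales = [120, 135, 150, 150, 160, 175, 210]

print("Mean:  ", statistics.mean(sales))    # average value
print("Median:", statistics.median(sales))  # middle value
print("Mode:  ", statistics.mode(sales))    # most frequent value
```

The same exploratory logic scales up: graphs, data mining routines, and optimization methods are layered on top of such descriptive summaries.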

Business analytics begins with a data set or, more commonly, with a database. As databases grow, they need to be stored somewhere. Technologies such as computing and data warehousing store this data. Database storage areas have become so large that a new term was devised to describe them (Sedkaoui, 2018a).

Stubbs (2011) believes that Business Analytics goes beyond plain analytics, requiring a clear relevance to business, a resulting insight that will be implementable, and performance and value measurement to ensure a successful business result.

Business analytics traditionally covers the technologies and applications that companies use to collect mostly structured data from their internal legacy systems. This data is then analyzed and mined using statistical methods and well-established techniques classed as data mining and data warehousing (Chen et al., 2012). This type of analytics allows businesses to perform two main types of analysis (Delen & Demirkan, 2013):

Key Terms in this Chapter

Unsupervised Learning: Unsupervised learning identifies hidden patterns or intrinsic structures in the data. It is used to draw conclusions from datasets consisting of input data without labeled responses.
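A minimal sketch of this idea, assuming scikit-learn is available and using made-up two-dimensional points: k-means is asked to find two groups in data that carries no labels.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

# Unlabeled, invented two-dimensional points (illustrative only)
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
              [8.0, 8.5], [8.2, 7.9], [7.8, 8.1]])

# k-means searches for intrinsic structure (here, two clusters) without any labels
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)  # cluster assignment discovered for each point
```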

Business Intelligence (BI): Is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.

Scalability: The measure of a system’s ability to increase or decrease in performance and cost in response to changes in application and system processing demands. Enterprises that are growing rapidly should pay special attention to scalability when evaluating hardware and software.

Supervised Learning: A supervised learning algorithm applies a known set of input data and drives a model to produce reasonable predictions for responses to new data. Supervised learning develops predictive models using classification and regression techniques.
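By contrast, a supervised sketch (again assuming scikit-learn, with hypothetical figures) trains on known input–response pairs and then predicts a response for unseen input.

```python
from sklearn.linear_model import LinearRegression  # assumes scikit-learn is installed

# Known inputs (hours of use) and known responses (maintenance cost) - illustrative data
X = [[10], [20], [30], [40], [50]]
y = [15.0, 24.0, 36.0, 44.0, 55.0]

# The model learns from labeled examples, then predicts a response for new data
model = LinearRegression().fit(X, y)
print(model.predict([[60]]))  # predicted cost for an unseen input
```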

Open Source: A designation for a computer program in which underlying source code is freely available for redistribution and modification.

Big Data: A generic term that designates the massive volume of data that is generated by the increasing use of digital tools and information systems. The term big data is used when the amount of data that an organization has to manage reaches a critical volume that requires new technological approaches in terms of storage, processing, and usage. Volume, velocity, and variety are usually the three criteria used to qualify a database as “big data.”

Exploratory Data Analysis (EDA): In statistics, EDA is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.
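For instance, a first EDA pass in Python (assuming pandas is installed, on an invented dataset) summarizes each variable and their relationships before any modeling is attempted.

```python
import pandas as pd  # assumes pandas is available

# Small, invented dataset used purely for illustration
df = pd.DataFrame({
    "age":    [23, 35, 31, 52, 46, 28],
    "income": [28000, 45000, 39000, 61000, 58000, 33000],
})

print(df.describe())  # summary statistics: count, mean, std, quartiles
print(df.corr())      # pairwise correlations between variables
# Visual methods (e.g. df.hist(), scatter plots) would typically follow,
# provided a plotting library such as matplotlib is installed.
```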

Return on Investment (ROI): Is a performance measure, used to evaluate the efficiency of an investment or compare the efficiency of a number of different investments. ROI measures the amount of return on an investment, relative to the investment’s cost. To calculate ROI, the benefit (or return) of an investment is divided by the cost of the investment. The result is expressed as a percentage or a ratio.
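A tiny worked example of this formula, with hypothetical figures, where the net benefit of an investment is divided by its cost and expressed as a percentage:

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment: net benefit divided by cost, as a percentage."""
    return (gain - cost) / cost * 100

# Hypothetical figures: a $40,000 analytics project returning $52,000 of benefit
print(f"ROI = {roi(52_000, 40_000):.1f}%")  # -> ROI = 30.0%
```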

Knowledge: It is a type of know-how that makes it possible to transform information into instructions. Knowledge can either be obtained through transmission from those who possess it or by extraction from experience.

Data Analysis: This is a class of statistical methods that make it possible to process a very large volume of data and identify the most interesting aspects of its structure. Some methods help to extract relations between different sets of data, and thus, draw statistical information that makes it possible to describe the most important information contained in the data in the most succinct manner possible. Other techniques make it possible to group data in order to identify its common denominators clearly, and thereby understand them better.

Analytics: Has emerged as a catch-all term for a variety of different business intelligence (BI) and application-related initiatives. For some, it is the process of analyzing information from a particular domain, such as website analytics. For others, it is applying the breadth of BI capabilities to a specific content area (for example, sales, service, supply chain and so on). In particular, BI vendors use the “analytics” moniker to differentiate their products from the competition. Increasingly, “analytics” is used to describe statistical and mathematical data analysis that clusters, segments, scores and predicts what scenarios are most likely to happen. Whatever the use cases, “analytics” has moved deeper into the business vernacular. Analytics has garnered a burgeoning interest from business and IT professionals looking to exploit huge mounds of internally generated and externally available data.

Data Lake: Is a collection of storage instances of various data assets added to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format. The purpose of a data lake is to present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).

Text Mining: Equivalent to text analytics, text mining is the process of deriving information from text. It usually involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output.
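A toy sketch of those three steps on a made-up sentence: the raw text is structured into tokens, and a simple pattern (term frequency) is derived from the result.

```python
import re
from collections import Counter

# A tiny, invented corpus; real text mining starts from documents at scale
text = ("Big data analytics turns raw data into insight. "
        "Data without analysis is just raw material.")

# Structure the unstructured input (tokenize), then derive a simple pattern: term frequency
tokens = re.findall(r"[a-z]+", text.lower())
print(Counter(tokens).most_common(3))  # the most frequent terms
```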

Data Mining: This practice consists of extracting information from data with the objective of drawing knowledge from large quantities of data through automatic or semi-automatic methods. Data mining uses algorithms drawn from disciplines as diverse as statistics, artificial intelligence, and computer science in order to develop models from data; that is, in order to find interesting structures or recurrent themes according to criteria determined beforehand and to extract the largest possible amount of knowledge useful to companies. It groups together all technologies capable of analyzing database information in order to find useful information and possible significant and useful relationships within the data.

Key Performance Indicator (KPI): Is a high-level measure of system output, traffic, or other usage, simplified for gathering and review on a weekly, monthly, or quarterly basis. Typical examples are bandwidth availability, transactions per second, and calls per user. KPIs are often combined with cost measures (e.g., cost per transaction or cost per user) to build key system operating metrics.

Computer Science: Computer science is the study of how to manipulate, manage, transform, and encode information.

Statistical Inference: Is the process of deducing properties of an underlying distribution by analysis of data. Inferential statistical analysis infers properties about a population: this includes testing hypotheses and deriving estimates. The population is assumed to be larger than the observed data set; in other words, the observed data is assumed to be sampled from a larger population.
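A minimal sketch of inferring a population property from a sample, assuming SciPy is available and using hypothetical measurements: a one-sample t-test of whether the population mean equals a stated value.

```python
import numpy as np
from scipy import stats  # assumes SciPy is available

# Hypothetical sample drawn from a larger population of transaction times (seconds)
sample = np.array([4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.4, 5.2])

# Test the hypothesis that the population mean is 5.0 seconds
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(t_stat, p_value)  # a large p-value gives no evidence against the hypothesis
```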

Machine Learning: A method of designing a sequence of actions to solve a problem that optimizes automatically through experience and with limited or no human intervention.

Artificial Intelligence: The theory and development of computer systems able to perform tasks that traditionally have required human intelligence.

NoSQL: Is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. NoSQL, which stands for “not only SQL,” is an alternative to traditional relational databases in which data is placed in tables and data schema is carefully designed before the database is built. NoSQL databases are especially useful for working with large sets of distributed data.
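To illustrate the schema flexibility behind document-oriented NoSQL stores, the sketch below uses plain Python dictionaries rather than a real database client; the records and field names are invented.

```python
# A minimal sketch of the document (schema-less) model behind many NoSQL stores,
# modeled in memory with Python dictionaries instead of an actual database.
customers = {}

# Two "documents" in the same collection can carry different fields
customers["c1"] = {"name": "Alice", "city": "Oran", "orders": [101, 102]}
customers["c2"] = {"name": "Bob", "segment": "enterprise"}  # no fixed schema required

# Key-based lookup, the typical access pattern of a key-value / document store
print(customers["c1"]["orders"])
```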
