Plan and Rules for Data Analysis Success: A Roadmap

Copyright: © 2019 | Pages: 32
DOI: 10.4018/978-1-5225-7609-9.ch008


Integrating complex big data into your projects can become one of your strengths. This mission is not limited to using sophisticated tools to solve your problems; you must also align the requirements of your activities with a data lake or data warehouse through clear and sound strategies, keeping your business goals in view. Doing so supports your company at every stage of a project: from defining and gathering requirements to going into production and subsequent maintenance. Ultimately, it will help you create sustainable and stable competitive advantages.
Chapter Preview


My freedom thus consists in moving about within the narrow frame that I have assigned myself for each one of my undertakings. . . . Whatever diminishes constraint diminishes strength. The more constraints one imposes, the more one frees one’s self of the chains that shackle the spirit. Stravinsky (1942, p. 65)

Today we are witnessing strong enthusiasm around big data. Publications of all kinds and demonstrations are multiplying, and so are the promises, yet without really defining the outline of the phenomenon so that it can be approached as a real project and not as a fuzzy and ephemeral technological fad.

Big data is an extraordinary opportunity for a company, a sector, or even a country. It makes useful and necessary knowledge available at the right time to better manage the growing complexity of operations. To take full advantage of this large amount of data, the first step is to define the process by which data should be collected, processed, and analyzed. Then, we must identify the most appropriate business domain in which to launch a pilot or a proof of concept. Subsequently, we must validate the choice of appropriate tools and technologies, and finally build an organization and governance to sustain and enhance the big data initiatives.

While some companies are already engaged in big data initiatives, the difficulty for others, especially small businesses and entrepreneurs, is knowing how and where to start. Here are the important keys to apply when starting a big data project.

The points we put forward will allow future entrepreneurs to better understand, as a whole, the experience of value creation based on big data analytics. They will shed light on the conditions of success for entrepreneurship in the big data universe and on the different actions to be implemented.


Data Analytics Workflow

Data analytics, big data, and machine learning are very popular terms in today’s business world. However, the scopes covered by these terms overlap, and each can mean different things. From a data point of view, big data is characterized by several Vs, in addition to the three famous Vs (volume, velocity, and variety), which highlight the limits of traditional tools in processing and analyzing the available data (collection, storage, analysis, integration, etc.).

In this chapter, I will start by describing the workflow that can be adopted to better explore the data. In particular, I will explain how the data analytics process can be applied when working with big data in general.

Let’s go!

It should be noted, however, that because of its experimental nature, data analytics runs this workflow empirically. As a result, this experimental model of work is not linear but iterative: the analyst or entrepreneur who wants to work with data will define a hypothesis, implement it, and then refine it. Usually, a big data analytics process is represented in the following form.

Figure 1. Data analytics process


If you decide to work with data and launch your own big data project, you need a clear idea of the implementation process, because there are several steps to follow. From asking good questions and defining goals, through preparing the data (collection, cleaning …) and exploring it, to critically analyzing the results, here is the overall data analytics workflow:

Key Terms in this Chapter

Business Model: A business model is a company's plan for how it will generate revenues and make a profit. It explains what products or services the business plans to manufacture and market, and how it plans to do so, including what expenses it will incur.

Missing Values: Occur when no data value is stored for the variable in an observation.

Data Mining: This practice consists of extracting information from data with the objective of drawing knowledge from large quantities of data through automatic or semi-automatic methods. Data mining uses algorithms drawn from disciplines as diverse as statistics, artificial intelligence, and computer science in order to develop models from data; that is, in order to find interesting structures or recurrent themes according to criteria determined beforehand and to extract the largest possible amount of knowledge useful to companies. It groups together all technologies capable of analyzing database information in order to find useful information and possible significant relationships within the data.

Data Lake: A collection of storage instances of various data assets, additional to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format. The purpose of a data lake is to present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).

Data Analysis: This is a class of statistical methods that makes it possible to process a very large volume of data and identify the most interesting aspects of its structure. Some methods help to extract relations between different sets of data and thus draw statistical information that describes the most important information contained in the data as succinctly as possible. Other techniques make it possible to group data in order to identify its common denominators clearly, and thereby understand them better.

Outliers: An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal. Before abnormal observations can be singled out, it is necessary to characterize normal observations.

Natural Language Processing (NLP): An interdisciplinary field of computer science, artificial intelligence, and computational linguistics that focuses on programming computers and algorithms to parse, process, and understand human language.
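
Two of the terms above, missing values and outliers, can be made concrete with a short sketch. The example below is illustrative only and is not taken from the chapter: the record layout, the helper names (`is_missing`, `count_missing`, `iqr_outliers`), and the sample data are all assumptions. For outliers it uses the interquartile-range rule with a multiplier of 1.5, which is one common convention among several; as the definition notes, what counts as abnormal is ultimately the analyst's decision, so the multiplier is left as a parameter.

```python
import math
import statistics
from typing import Any, Dict, List, Sequence


def is_missing(value: Any) -> bool:
    """Treat None, empty strings, and float NaN as missing (an assumed convention)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def count_missing(rows: Sequence[Dict[str, Any]], columns: Sequence[str]) -> Dict[str, int]:
    """Count, for each column, the observations in which no value is stored."""
    return {c: sum(is_missing(row.get(c)) for row in rows) for c in columns}


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical observations: one customer name and one basket value are missing.
observations = [
    {"customer": "A", "basket": 10.0},
    {"customer": "B", "basket": 12.0},
    {"customer": "C", "basket": 11.0},
    {"customer": "D", "basket": None},
    {"customer": "E", "basket": 13.0},
    {"customer": "", "basket": 12.0},
    {"customer": "G", "basket": 11.0},
    {"customer": "H", "basket": 95.0},
]

missing_counts = count_missing(observations, ["customer", "basket"])

# Characterize "normal" observations first, using only the values actually present.
observed = [row["basket"] for row in observations if not is_missing(row["basket"])]
outliers = iqr_outliers(observed)

print(missing_counts)
print(outliers)
```

Missing values are removed before the quartiles are computed, since the outlier rule only makes sense on observed values. A stricter or looser notion of "abnormal" can be tried by changing `k`.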
