What the 3Vs Acronym Didn't Put Into Perspective?

Copyright © 2019 | Pages: 33
DOI: 10.4018/978-1-5225-7609-9.ch002

Abstract

Data sizes have been growing exponentially within many companies. This data is often meta-tagged piecemeal, produced in real time, and arrives in continuous streams from multiple sources, which makes analyzing it to spot patterns and extract useful information harder still. The challenges include the ever-changing landscape of data and its associated characteristics, evolving data-analysis paradigms, computational infrastructure, data quality, complexity, and protection, in addition to data sharing and access, and, crucially, our ability to integrate data sets and their analyses toward an improved understanding. In this context, this second chapter covers the issues and challenges hiding behind the 3Vs phenomenon. It builds on the first chapter and proceeds to the different big data issues and challenges and how to tackle them in dynamic processes.
Chapter Preview

Introduction

Never trust anything that can think for itself if you cannot see where it keeps its brain.

J.K. Rowling. (Harry Potter and the Chamber of Secrets)

Big data refers to the use of technologies and methods to analyze the varied and voluminous data available. It is about identifying and making exploitable certain market trends, consumer behavior, and so on. It is particularly used by business professionals to refine their targeting and analyze every facet of their consumers' behavior. The data resulting from purchases on the internet or in store, preferences on social networks, and browsing history (cookies) thus serve as references for apprehending a global behavior.

A business that collects data about its customers knows more than one that does not; that much is logical. Yet the benefit goes far beyond merely 'knowing more.' Data collection and analysis allow the business to update its models and to see trends, problems, and possible solutions. The result? More loyal customers and a competitive advantage over the long term.

Moving toward a data culture, to improve continuously and explore new use cases, especially under the impetus of machine learning and artificial intelligence, is now a reality. But the immaterial nature of data formats and the volumes involved make big data a phenomenon that feeds fantasy and sustains public confusion and mistrust.

Data volume, data variety, and data velocity are the three criteria commonly used to define the big data phenomenon. This is the famous "3Vs rule". This short description suggests that the challenge around the data is only technological. However, it is not. The techniques used to analyze big data are just an extension of business analytics and business intelligence (see chapter four), two disciplines that have existed in business for a long time.

What lies behind the expression big data is a general awareness, begun in 2012, of the strategic importance of data and the major changes it will bring to companies. Until now, data management was at the service of the business: it was a simple support function, which managers used to inform their thinking and guide strategic choices. Today, things are reversing: data is becoming a strategic resource. The result is an upheaval in business models, in which the data function and its extraordinary possibilities will drive business activity and create value.

As a real gold mine, big data allows companies to improve their processes, identify the needs of their customers, and even anticipate their future consumption. However, to take advantage of this gold mine and exploit it properly, companies will have to follow a roadmap and stay attentive to the major issues surrounding the notion of big data.

To engage in big data, it is still necessary to take data quality into account. Given the vast amount of information available, the relevant data must be identified and cleansed. Indeed, databases contain their share of bad data: incomplete records, or data likely to be misinterpreted, which must be rectified. A data quality audit should therefore be the first priority of any big data project. In this respect, the technological solutions are multiple: automatic correction tools exist, for example, and make it possible to ensure the relevance of the information collected and analyzed.
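Such an automatic correction pass can be sketched in a few lines. The following is a minimal illustration only, using hypothetical customer records; the field names (`email`, `age`) and validity rules are assumptions, not a prescription from the chapter.

```python
# A minimal sketch of a data-quality pass: normalize fields, reject
# incomplete or implausible records, and de-duplicate. Field names and
# thresholds are illustrative assumptions.

def cleanse(records):
    """Return only complete, plausible, de-duplicated records."""
    seen = set()
    clean = []
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        age = rec.get("age")
        if not email or "@" not in email:      # incomplete entry
            continue
        if age is None or not (0 < age < 120):  # implausible value
            continue
        if email in seen:                       # duplicate after normalization
            continue
        seen.add(email)
        clean.append({"email": email, "age": age})
    return clean

raw = [
    {"email": "Ann@Example.com ", "age": 34},
    {"email": "ann@example.com", "age": 34},   # duplicate once normalized
    {"email": "", "age": 28},                  # missing email
    {"email": "bob@example.com", "age": 460},  # implausible age
]
print(cleanse(raw))  # only the first record survives
```

In a real project this logic would live in a dedicated data-quality tool or ETL stage rather than ad hoc code, but the principle (normalize, validate, de-duplicate) is the same.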

Also, one of the current challenges of big data is the development of complex tools to process, visualize, analyze, and protect huge data flows. These data arrive in bulk and in various formats. The company must therefore invest in data integration solutions and in the implementation of scalable infrastructures. Storage in innovative tools (cloud computing, etc.) is preponderant in this respect. It must be coupled with software whose sophisticated algorithms allow these large volumes of digital data to be analyzed in real time.

In addition, most of the data collected by companies to define their strategy come from the private domain. Coming straight from the user accounts, this information affects the relationship between the company and its customers. The question of security around this data is therefore crucial because it engages the responsibility and reputation of the company.

Key Terms in this Chapter

Garbage In, Garbage Out (GIGO): A principle in computer science and information and communications technology referring to the fact that computers, since they operate by logical processes, will unquestioningly process unintended, even nonsensical, input data ("garbage in") and produce undesired, often nonsensical, output ("garbage out"). The principle applies to other fields as well.

Machine-to-Machine (M2M): Communication used for automated data transmission and measurement between mechanical or electronic devices. The key components of an M2M system are field-deployed wireless devices with embedded sensors or RFID, and wireless communication networks with complementary wireline access, including but not limited to cellular communication, Wi-Fi, ZigBee, WiMAX, wireless LAN (WLAN), generic DSL (xDSL), and fiber to the x (FTTx).

Analytics: Has emerged as a catch-all term for a variety of different business intelligence (BI) and application-related initiatives. For some, it is the process of analyzing information from a particular domain, such as website analytics. For others, it is applying the breadth of BI capabilities to a specific content area (for example, sales, service, supply chain and so on). In particular, BI vendors use the “analytics” moniker to differentiate their products from the competition. Increasingly, “analytics” is used to describe statistical and mathematical data analysis that clusters, segments, scores and predicts what scenarios are most likely to happen. Whatever the use cases, “analytics” has moved deeper into the business vernacular. Analytics has garnered a burgeoning interest from business and IT professionals looking to exploit huge mounds of internally generated and externally available data.

Algorithm: A set of computational rules to be followed to solve a mathematical problem. More recently, the term has been adopted to refer to a process to be followed, often by a computer.

Web 2.0: This term designates the set of techniques, functions, and uses of the world wide web that has followed the original format of the web. It concerns, in particular, interfaces that allow users with little technical training to appropriate new web functions. Internet users can contribute to information exchanges and interact (share, exchange, etc.) in a simple manner.

Machine Learning: A method of designing a sequence of actions to solve a problem that optimizes automatically through experience and with limited or no human intervention.
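The idea of "optimizing automatically through experience" can be made concrete with the smallest possible example: fitting a single parameter to a handful of data points by gradient descent. The data points and learning rate below are illustrative assumptions, not drawn from the chapter.

```python
# A minimal sketch of learning from experience: fit y = w * x by
# gradient descent on a few example points (data are illustrative).

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # roughly y = 2x

w = 0.0                  # initial guess for the parameter
lr = 0.01                # learning rate
for _ in range(2000):    # each pass refines w using the data
    grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 2))       # converges close to 2.0, with no human tuning of w
```

Real machine learning systems fit millions of parameters with far more elaborate models, but the loop above (predict, measure error, adjust) is the core of the "limited or no human intervention" in the definition.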

Hadoop: Big data software infrastructure that includes a storage system and a distributed processing tool.

Scalability: The measure of a system’s ability to increase or decrease in performance and cost in response to changes in application and system processing demands. Enterprises that are growing rapidly should pay special attention to scalability when evaluating hardware and software.

Cybersecurity: Also known as computer security or IT security, the protection of computer systems from theft of or damage to their hardware, software, or information, as well as from disruption or misdirection of the services they provide.

Return on Investment (ROI): Is a performance measure, used to evaluate the efficiency of an investment or compare the efficiency of a number of different investments. ROI measures the amount of return on an investment, relative to the investment’s cost. To calculate ROI, the benefit (or return) of an investment is divided by the cost of the investment. The result is expressed as a percentage or a ratio.
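The ROI calculation described above is simple enough to state directly in code. This is a sketch of the formula as given (net benefit divided by cost, expressed as a percentage); the figures are illustrative.

```python
# ROI as defined above: the net benefit (return minus cost) of an
# investment divided by its cost, expressed as a percentage.
# The sample figures are illustrative.

def roi(total_return, cost):
    """Return on investment as a percentage of cost."""
    return (total_return - cost) / cost * 100

print(roi(12_000, 10_000))  # a 2,000 net gain on a 10,000 outlay: 20% ROI
```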

Smart Data: The flood of data encountered by ordinary users and economic actors will bring about changes in behavior, as well as the development of new services and value creation. This data must be processed and developed in order to become “smart data.” Smart data is the result of analysis and interpretation of raw data, which makes it possible to effectively draw value from it. It is, therefore, important to know how to work with the existing data in order to create value.

Business Intelligence (BI): An umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.

Internet of Things (IoT): The inter-networking of physical devices, vehicles, buildings, and other items embedded with electronics, software, sensors, actuators, and network connectivity that enable these objects to collect and exchange data and send, receive, and execute commands. According to the Gartner group, IoT is the network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment.

Big Data: A generic term that designates the massive volume of data that is generated by the increasing use of digital tools and information systems. The term big data is used when the amount of data that an organization has to manage reaches a critical volume that requires new technological approaches in terms of storage, processing, and usage. Volume, velocity, and variety are usually the three criteria used to qualify a database as “big data.”

Data Mining: This practice consists of extracting information from data with the objective of drawing knowledge from large quantities of data through automatic or semi-automatic methods. Data mining uses algorithms drawn from disciplines as diverse as statistics, artificial intelligence, and computer science in order to develop models from data; that is, in order to find interesting structures or recurrent themes according to criteria determined beforehand and to extract the largest possible amount of knowledge useful to companies. It groups together all technologies capable of analyzing database information in order to find useful information and possible significant and useful relationships within the data.
