A Conceptual Framework for Big Data Analysis


Fernando Almeida (University of Porto, Portugal) and Mário Santos (University of Aveiro, Portugal)
DOI: 10.4018/978-1-4666-4526-4.ch011


Big data is a term that has risen to prominence to describe data that exceeds the processing capacity of conventional database systems. It is a disruptive force that will affect organizations across industries, sectors, and economies. Hidden in the immense volume, variety, and velocity of the data produced today are new information, facts, relationships, indicators, and pointers that either could not practically be discovered in the past or simply did not exist before. Effectively captured, managed, and analyzed, this new information has the power to profoundly enhance the effectiveness of government. This chapter examines the main challenges and issues that must be addressed to capture the full potential of big data. Additionally, the authors present a conceptual framework for big data analysis structured in three layers: (a) data capture and preprocessing, (b) data processing and interaction, and (c) auxiliary tools. Each layer plays a different role in capturing, processing, accessing, and analyzing big data.
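The three layers of the framework can be pictured as stages of a single pipeline. The following is a minimal sketch in Python, not the authors' implementation: the function names, the cleaning rule, the word-count analysis, and the audit log standing in for "auxiliary tools" are all hypothetical choices made only for illustration.

```python
from collections import Counter
from typing import Iterable


# Layer (a): data capture and preprocessing.
# Hypothetical cleaning rule: trim whitespace, lowercase, drop empty records.
def capture_and_preprocess(raw_records: Iterable[str]) -> list[str]:
    cleaned = (record.strip().lower() for record in raw_records)
    return [record for record in cleaned if record]


# Layer (b): data processing and interaction.
# Illustrative analysis only: count term frequencies across all records.
def process_and_interact(records: list[str]) -> Counter:
    counts: Counter = Counter()
    for record in records:
        counts.update(record.split())
    return counts


# Layer (c): auxiliary tools, modeled here (hypothetically) as an audit log
# that notes how many items each stage produced.
class AuditLog:
    def __init__(self) -> None:
        self.entries: list[tuple[str, int]] = []

    def record(self, stage: str, size: int) -> None:
        self.entries.append((stage, size))


def run_pipeline(raw_records: Iterable[str], log: AuditLog) -> Counter:
    records = capture_and_preprocess(raw_records)
    log.record("capture_and_preprocess", len(records))
    result = process_and_interact(records)
    log.record("process_and_interact", len(result))  # number of distinct terms
    return result


if __name__ == "__main__":
    audit = AuditLog()
    raw = ["Sensor reading OK", "  ", "sensor FAULT", "Social post about sensor"]
    print(run_pipeline(raw, audit).most_common(2))
    print(audit.entries)
```

Keeping each layer behind its own function means the preprocessing step could be swapped (for example, for a streaming ingester) without touching the analysis layer; whether the chapter's framework intends exactly this separation is an assumption.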
Chapter Preview


The term “big data” has recently grown in prominence as a way of describing the growth in data volume, complexity, and disparity. Its definition is not fully agreed upon in the literature, and there may be some confusion about what it really means. Big data is not just an environment in which accumulated data has reached very large proportions; the word “big” does not refer only to size. If it were just a capacity issue, the solution would be relatively simple. Instead, big data refers to an environment in which data sets have grown too large to be handled, managed, stored, and retrieved in an acceptable timeframe (Slack, 2012).

Big Data can often be characterized by three fundamental factors: volume, velocity, and variety (Table 1 adds a fourth, veracity). According to Wilson and Kerber (2011), only about 15% of today's information is structured, that is, easily stored in relational databases or spreadsheets with their ordinary columns and rows. Unstructured information, such as email, video, blogs, call center conversations, and social media, makes up about 85% of the data generated today, and deriving meaning from it with conventional business intelligence tools is challenging. Information-producing devices such as sensors, tablets, and mobile phones continue to multiply, and social networking is growing at an accelerated pace as the world becomes more connected. These information-sharing options represent a fundamental shift in the way people, governments, and businesses interact with each other.
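The structured/unstructured distinction can be made concrete with a toy classifier. The sketch below is a hypothetical heuristic, not Wilson and Kerber's measurement method: flat records of scalar fields count as structured, JSON-like nested content as semi-structured, and free text or binary payloads as unstructured. The names `classify`, `structure_shares`, and `LABELS` are invented for this example.

```python
import json
from collections import Counter
from typing import Any

LABELS = ("structured", "semi-structured", "unstructured")


def classify(item: Any) -> str:
    """Assign one of LABELS using a hypothetical, illustration-only heuristic."""
    if isinstance(item, dict):
        # A flat mapping of scalar fields fits ordinary columns and rows.
        if all(v is None or isinstance(v, (str, int, float, bool)) for v in item.values()):
            return "structured"
        return "semi-structured"  # nested fields, as in a JSON document
    if isinstance(item, str):
        try:
            parsed = json.loads(item)
        except json.JSONDecodeError:
            return "unstructured"  # free text such as an email or blog post
        # Text that parses to a JSON object or array carries its own keys and tags.
        return "semi-structured" if isinstance(parsed, (dict, list)) else "unstructured"
    return "unstructured"  # anything else, e.g. raw bytes from a video stream


def structure_shares(items: list[Any]) -> dict[str, float]:
    """Return the fraction of items falling under each label."""
    if not items:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(item) for item in items)
    return {label: counts[label] / len(items) for label in LABELS}
```

A production system would rely on schemas and file metadata rather than guessing from content; the heuristic only shows how a share such as the 15%/85% split could be computed over a labeled sample.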

The characteristics of Big Data will shape the way government organizations ingest, analyze, manage, store, and distribute data across the enterprise and across the ecosystem. Table 1 summarizes these characteristics and highlights how Big Data differs from the historical perspective of “normal” data.

Table 1.
Characteristics of big data (Wilson & Kerber, 2011)

| Characteristic | Description | Attribute | Driver |
|---|---|---|---|
| Volume | The sheer amount of data generated, or the data intensity, that must be ingested, analyzed, and managed to make decisions based on complete data analysis. | The digital universe is generating a high volume of data, which is expected to keep growing exponentially. | More data sources; higher-resolution sensors. |
| Velocity | How fast data is produced and changed, and the speed with which it must be received, understood, and processed. | Metrics can be defined in terms of accessibility, applicability, and time value. | More data sources, improved throughput connectivity, and greater computing power in data-generating devices. |
| Variety | The rise of information coming from new sources both inside and outside the walls of the enterprise or organization creates integration, management, governance, and architectural pressures on IT. | The data can be divided into the following segments: structured, unstructured, semi-structured, and complexity. | Mobile, social media, videos, chat, genomics, and sensors. |
| Veracity | The quality and provenance of received data. | The quality of Big Data may be good, bad, or undefined due to data inconsistency, incompleteness, ambiguities, latency, deception, and model approximations. | Data-based decisions require traceability and justification. |
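The veracity row of Table 1 names concrete failure modes (incompleteness, inconsistency, latency) and asks for traceability. A minimal sketch of how such checks might be applied to incoming records follows; the schema in `REQUIRED_FIELDS`, the one-hour `MAX_AGE` tolerance, the `assess` function, and the mapping of issues onto the table's good/bad/undefined labels are all hypothetical assumptions for illustration. Each issue is kept as a readable reason so that a downstream decision can be traced back to it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("id", "source", "value", "timestamp")  # hypothetical schema
MAX_AGE = timedelta(hours=1)  # hypothetical latency tolerance


@dataclass
class VeracityReport:
    record_id: str
    issues: list[str] = field(default_factory=list)  # traceable reasons

    @property
    def quality(self) -> str:
        # Hypothetical mapping onto Table 1's good / bad / undefined labels:
        # missing fields make quality undefined; any other issue makes it bad.
        if any(issue.startswith("missing") for issue in self.issues):
            return "undefined"
        return "bad" if self.issues else "good"


def assess(records: list[dict], now: datetime) -> list[VeracityReport]:
    reports = []
    seen: dict[tuple, object] = {}  # (source, timestamp) -> first value observed
    for rec in records:
        report = VeracityReport(str(rec.get("id", "?")))
        # Incompleteness: a required field is absent or None.
        missing = [name for name in REQUIRED_FIELDS if rec.get(name) is None]
        report.issues += [f"missing:{name}" for name in missing]
        if not missing:
            # Inconsistency: the same source reports different values for one instant.
            key = (rec["source"], rec["timestamp"])
            if key in seen and seen[key] != rec["value"]:
                report.issues.append("inconsistent:conflicting value for same source and time")
            seen.setdefault(key, rec["value"])
            # Latency: the record is older than the tolerated age.
            if now - rec["timestamp"] > MAX_AGE:
                report.issues.append("latency:record older than tolerance")
        reports.append(report)
    return reports
```

Checks for deception or model approximation, the other causes the table lists, depend heavily on the domain and are not attempted here.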
