Security and Privacy Issues of Big Data

José Moura, Carlos Serrão
DOI: 10.4018/978-1-4666-8505-5.ch002


This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to fulfill the most notable security requirements of Big Data applications. One of them is privacy. It is a pertinent issue because users share more and more personal data and content, through their devices and computers, with social networks and public clouds. A secure framework for social networks is therefore a very active research topic, and it is addressed in one of the two case-study sections of the chapter. In addition, traditional security mechanisms such as firewalls and demilitarized zones are not suitable for the computing systems that support Big Data. Software-Defined Networking (SDN) is an emergent management solution that could become a convenient mechanism to implement security in Big Data systems, as we show through a second case study at the end of the chapter. The chapter also discusses relevant current work and identifies open issues.
Chapter Preview


Big Data is an emerging area concerned with datasets whose size is beyond the ability of commonly used software tools to capture, manage, and analyze in a timely manner. The quantity of data to be analyzed is expected to double every two years (IDC, 2012). These data are very often unstructured and come from various sources such as social media, sensors, scientific applications, surveillance, video and image archives, Internet search indexing, medical records, business transactions, and system logs. Big Data is gaining more and more attention because the number of devices connected to the so-called “Internet of Things” (IoT) keeps increasing to unforeseen levels, producing large amounts of data that need to be transformed into valuable information. Additionally, it is very popular to buy additional on-demand computing power and storage from public cloud providers to perform intensive data-parallel processing. As a result, security and privacy issues can be amplified by the volume and variety of the data and by the wide-area deployment of the infrastructure that supports Big Data applications.

As Big Data expands with the help of public clouds, traditional security solutions such as firewalls and demilitarized zones (DMZs), which are tailored to private computing infrastructures confined to a well-defined security perimeter, are no longer effective. With Big Data, security functions are required to work over a heterogeneous composition of diverse hardware, operating systems, and network domains. In this puzzle-like computing environment, the abstraction capability of Software-Defined Networking (SDN) seems a very important characteristic that can enable the efficient deployment of secure Big Data services on top of the heterogeneous infrastructure. SDN introduces abstraction because it separates the (higher) control plane from the underlying system infrastructure being supervised and controlled. Separating a network's control logic from the underlying physical routers and switches that forward traffic allows system administrators to write high-level control programs that specify the behavior of an entire network. In conventional networks, by contrast, administrators (if the device manufacturers allow it at all) must express functionality in terms of low-level device configuration. Using SDN, the intelligent management of security functions can be implemented in a logically centralized controller, which simplifies the enforcement of security policies, system (re)configuration, and system evolution. The robustness drawback of a centralized SDN solution can be mitigated by using a hierarchy of controllers and/or redundant controllers, at least for the most important system functions to be controlled.
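The separation of control and data planes described above can be illustrated with a small sketch. It is not taken from the chapter and does not use any real SDN controller API; the `FlowRule`, `Switch`, and `Controller` classes, the host names, and the default-deny behavior are all hypothetical choices made for illustration. The controller holds one network-wide policy and compiles it into low-level rules, while the switches only match packets against whatever rules they were given.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowRule:
    """A low-level match/action entry installed on a switch (hypothetical format)."""
    src: str
    dst: str
    action: str  # "forward" or "drop"


@dataclass
class Switch:
    """Data plane: forwards or drops packets using only its installed rules."""
    name: str
    rules: list = field(default_factory=list)

    def handle(self, src: str, dst: str) -> str:
        for rule in self.rules:
            if rule.src == src and rule.dst == dst:
                return rule.action
        return "drop"  # assumed default-deny when no rule matches


class Controller:
    """Control plane: holds the network-wide policy and programs every switch."""

    def __init__(self, switches):
        self.switches = list(switches)
        self.allowed = set()  # high-level policy: permitted (src, dst) pairs

    def allow(self, src: str, dst: str) -> None:
        self.allowed.add((src, dst))
        self._sync()

    def revoke(self, src: str, dst: str) -> None:
        self.allowed.discard((src, dst))
        self._sync()

    def _sync(self) -> None:
        # Recompile the policy into flow rules and push them to all switches.
        rules = [FlowRule(s, d, "forward") for s, d in sorted(self.allowed)]
        for sw in self.switches:
            sw.rules = list(rules)


# Illustrative host names; a single policy change reaches every switch.
edge, core = Switch("edge"), Switch("core")
ctl = Controller([edge, core])
ctl.allow("analytics", "storage")
print(edge.handle("analytics", "storage"))  # forward
print(core.handle("guest", "storage"))      # drop
ctl.revoke("analytics", "storage")
print(edge.handle("analytics", "storage"))  # drop
```

In this sketch a policy change is a single call on the controller, which is the simplification of policy enforcement and reconfiguration that the paragraph attributes to SDN. A redundant or hierarchical arrangement would replace the single `Controller` instance, which is otherwise a single point of failure.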

Key Terms in this Chapter

IPS: Intrusion Prevention Systems are a subset of IDS that, besides detecting malicious activity, can also block that activity from occurring.

Big Data: The term for data sets that are too large to handle with traditional methods. Big Data represents information of such high volume, velocity, variety, variability, veracity, and complexity that specific mechanisms are required to produce value from it.

IDS: An Intrusion Detection System actively monitors networks or other systems for security policy violations or unusual activities.

SDN: Software-Defined Networking allows network administrators to manage network services by decoupling the system that makes traffic forwarding decisions (the control plane) from the underlying systems that forward the traffic (the data plane). Some advantages of SDN are lower maintenance costs and easier innovation in the networking infrastructure.

SCADA: Supervisory Control and Data Acquisition refers to systems that are used to control infrastructure processes (for instance, electrical power supply), facility-based processes (for instance, airports) or industrial processes (for instance, production).

JSON: Although it originated in JavaScript, the JavaScript Object Notation is a language-independent, open data format for transmitting human-readable, text-based object information across domains, using an attribute-value pair notation.

BYOD: Bring Your Own Device is the policy that allows employees to bring their personal mobile devices to the workplace and use them to access company information and applications.

DMZ: A Demilitarized Zone, also known as a perimeter network, is used to create a physical or logical separation between an organization's internal services and the services it exposes to a public network such as the Internet. An outside network device can only access the services in the organization's DMZ.

IoT: The Internet of Things is a network of devices that are integrated with and operate in the surrounding environment, communicating with other systems or with each other to improve the value offered to customers.
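As an illustration of the attribute-value pair notation mentioned in the JSON entry above, the following minimal sketch uses Python's standard `json` module to turn an object into JSON text and back. The record and its field names are invented for the example and do not come from the chapter.

```python
import json

# Hypothetical sensor reading expressed as attribute-value pairs.
record = {"device": "sensor-01", "type": "temperature", "value": 21.5, "online": True}

# Serialize to human-readable text that any JSON-aware language can parse.
text = json.dumps(record, sort_keys=True)
print(text)

# Parse the text back into a native object on the receiving side.
restored = json.loads(text)
print(restored == record)
```

The serialized text contains only names and values, so a receiver written in another language can read it without knowing anything about the sender's code.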
