IoT Big Data Architectures, Approaches, and Challenges: A Fog-Cloud Approach

David Sarabia-Jácome (Universitat Politècnica de València (UPV), Spain), Regel Gonzalez-Usach (Universitat Politècnica de València (UPV), Spain) and Carlos E. Palau (Universitat Politècnica de València (UPV), Spain)
Copyright: © 2019 | Pages: 24
DOI: 10.4018/978-1-5225-7432-3.ch008

The Internet of Things (IoT) generates large amounts of data that are sent to the cloud to be stored, processed, and analyzed to extract useful information. However, the cloud-based big data analytics approach is not fully appropriate for analyzing IoT data sources and presents some issues and limitations, such as inherent delay, late response, and high bandwidth occupancy. Fog computing emerges as a possible solution to these cloud limitations by extending cloud computing capabilities to the network edge (i.e., gateways, switches), close to the IoT devices. This chapter presents a comprehensive overview of IoT big data analytics architectures, approaches, and solutions. In particular, the fog-cloud reference architecture is proposed as the best approach for performing big data analytics in IoT ecosystems. Moreover, the benefits of the fog-cloud approach are analyzed in two IoT application case studies. Finally, open research challenges for the fog-cloud approach are described, providing guidelines that help researchers and application developers address its limitations.
Chapter Preview


The accelerated growth of Internet of Things (IoT) ecosystems will generate considerably larger amounts of information in the near future. Analyzing IoT data sources brings enormous benefits across the wide range of application areas that IoT covers (e.g., transport and logistics, health, intelligent environments, industry, and smart cities, among many others). IoT thus benefits our well-being, our industry, and our economy. To obtain these benefits, however, IoT data need to be stored, processed, and analyzed. Because of the specific characteristics of the data generated by IoT systems, these steps pose a difficult challenge: the data must be processed and analyzed to find patterns, trends, or other valuable information. IoT data sources are highly varied: sound sensors, video cameras, motion sensors, temperature gauges, and other types of sensing devices all send data with different formats and structures.

This heterogeneity is an important challenge for analysis because it complicates data integration and the extraction of useful information. In addition, the rapid, real-time generation of these data flows exceeds the maximum storage and processing capacity of many kinds of applications. For example, audio and video applications produce a very high volume of data per user. If those applications are also deployed on a large scale, conventional systems cannot support the very large volume of data generated. Other aspects of IoT data also require special management and processing functionality that traditional systems do not support. For these reasons, handling IoT data requires specific technologies suited to big data analysis (Chen, Mao, Zhang, & Leung, 2014).

Cloud computing can handle IoT data storage and processing appropriately because it is well suited to big data management. This powerful technology offers Internet-hosted resources for complex and large-scale computing. Those online resources are provided on demand and offer secure and easy access. Moreover, cloud computing has technical advantages such as energy efficiency, optimized resource utilization, elasticity, and flexibility. All these benefits are possible thanks to infrastructure virtualization and distributed computing, which enable flexibility, scalability, high availability, and security.

Cloud computing is a fully mature technology that meets the requirements for big data storage and analysis. However, it presents some issues when handling IoT big data. Because resources are hosted on the Internet, there is an additional delay caused by the propagation time of data across the network. Many IoT systems are very time-sensitive and require real-time responses, so this latency can be excessive. IoT big data may also overload the network, indirectly degrading the cloud analysis service and increasing delay. For this reason, Cisco introduced the fog computing concept in 2012 to address the limitations of interconnecting IoT and cloud computing. Fog computing is a paradigm that provides distributed computing, storage, and network services between devices and the cloud, using both virtualization and non-virtualization technologies (Bonomi, Milito, Zhu, & Addepalli, 2012). To do this, fog computing places part of these capabilities closer to where the data is generated. In this way, it overcomes the disadvantages of using a centralized infrastructure that is generally distant from the IoT system. Because it is a highly distributed paradigm, fog computing allows the creation of geographically distributed services (Aazam & Huh, 2014). Furthermore, fog computing can manage IoT data streams efficiently. It also reduces latency and bandwidth occupation and enables efficient data storage.
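To make the bandwidth and latency argument concrete, the following is a minimal Python sketch of what a fog node might do with a stream of sensor readings: summarize each window locally, take the time-sensitive decision at the edge, and forward only the summary to the cloud. It is not the architecture proposed in the chapter. The function names (`summarize_window`, `fog_node`), the window size, the alert threshold, and the temperature values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def summarize_window(readings, alert_threshold):
    """Reduce a window of raw sensor readings to one compact summary.

    A hypothetical fog node would run this locally and forward only the
    summary to the cloud instead of every raw reading.
    """
    if not readings:
        raise ValueError("window must contain at least one reading")
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
        # Time-sensitive decision taken at the edge, with no cloud round trip.
        "alert": max(readings) > alert_threshold,
    }


def fog_node(stream, window_size, alert_threshold):
    """Yield one summary per window; the last window may be partial."""
    window = []
    for value in stream:
        window.append(value)
        if len(window) == window_size:
            yield summarize_window(window, alert_threshold)
            window = []
    if window:  # flush the final, partial window
        yield summarize_window(window, alert_threshold)


# Ten raw temperature readings (invented values) become two uplink messages.
raw = [21.0, 21.5, 22.0, 21.5, 22.0, 35.0, 22.0, 21.0, 21.5, 22.5]
summaries = list(fog_node(raw, window_size=5, alert_threshold=30.0))
```

In this sketch the cloud receives two summaries rather than ten readings, and the out-of-range value in the second window is flagged at the edge. A real deployment would also need to decide which raw data, if any, must still reach the cloud for long-term analysis.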

This chapter provides an overview of big data architectures, approaches, and solutions for IoT applications.

Key Terms in this Chapter

NoSQL (Not Only SQL): A group of database systems that are not based on the relational model and do not rely on Structured Query Language (SQL) as their primary query interface.

End-to-End Latency: In the context of cloud services, the time it takes to receive the response of a service across the network once the service is triggered. In the context of IoT and cloud services, it includes the network delay of transmitting the data to the cloud, the processing time of the cloud service, and the network delay of transmitting the resulting information back to the IoT system.

ETL (Extract, Transform, and Load): A process that extracts data from one database, transforms it, and loads it into another database.

MQTT (Message Queuing Telemetry Transport): A lightweight communication protocol that is typically used by sensors to send telemetry information.

Fog Computing: Highly virtualized paradigm that provides computing and storage services between end devices (typically IoT smart objects) and cloud data centers, at the edge of the network. In this way, part of the processing performed by the cloud services is done within the system that generates big data, sparing the network end-to-end communication delay.

Cloud Computing: Paradigm that offers resources hosted on the Internet for complex and large-scale computing. Those online resources are provided on demand and offer secure and easy access.

SaaS (Software as a Service): A cloud computing service model that provides applications to end users.

CEP (Complex Event Processor): A framework that analyzes a stream of events to find patterns.

IaaS (Infrastructure as a Service): A cloud computing service model that provides online infrastructure resources through virtual machines.

API (Application Programming Interface): External interface of a software component that exposes a group of well-defined methods to be used by other software.

Fog Node: Component of the fog computing infrastructure. Fog nodes are edge network devices (e.g., gateways) that, in addition to simple network functions, have storage and processing capabilities. As a result, a fog node can perform data preprocessing and part of the cloud computing processing close to the devices within the IoT system, thus enabling a faster response to events than the cloud paradigm.

CoAP (Constrained Application Protocol): A lightweight web transfer protocol suited to constrained devices (e.g., sensors) that send telemetry information.

PaaS (Platform as a Service): A cloud computing service model that provides online platform-layer resources, supporting operating systems and software development frameworks.

BDaaS (Big Data Platforms as a Service): A cloud computing service model that provides online users with big data platforms for storing, processing, and analyzing data.
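The End-to-End Latency entry above describes a sum of three components: the uplink network delay, the service processing time, and the downlink network delay. The short Python sketch below writes that sum out and compares two hypothetical placements of the same service. All figures are invented for illustration and are not measurements from the chapter. The assumption that a fog node processes more slowly than a cloud data center but sits much closer to the devices is also illustrative.

```python
def end_to_end_latency_ms(uplink_ms, processing_ms, downlink_ms):
    """Sum the three latency components named in the glossary entry."""
    return uplink_ms + processing_ms + downlink_ms


# Hypothetical figures: a distant cloud data center with fast processing.
cloud_ms = end_to_end_latency_ms(uplink_ms=40.0, processing_ms=5.0, downlink_ms=40.0)

# Hypothetical figures: a nearby fog node with slower processing.
fog_ms = end_to_end_latency_ms(uplink_ms=2.0, processing_ms=15.0, downlink_ms=2.0)

print(f"cloud: {cloud_ms} ms, fog: {fog_ms} ms")
```

Under these assumed figures the network delay dominates the cloud path, which is the situation the fog-cloud approach targets. With different figures, for example a compute-heavy analysis on a weak edge device, the comparison could go the other way.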
