Big Data Processing: Concepts, Architectures, Technologies, and Techniques

Can Eyupoglu
DOI: 10.4018/978-1-7998-2142-7.ch005


Big data has attracted significant and increasing attention in recent years and has become a hot topic in the IT industry, finance, business, academia, and scientific research. In the digital world, the amount of generated data has grown rapidly. According to research by the International Data Corporation (IDC), 33 zettabytes of data were created in 2018, and the amount of data is estimated to grow more than fivefold between 2018 and 2025. In addition, the advertising sector, the healthcare industry, biomedical companies, private firms, and governmental agencies have made substantial investments in the collection, aggregation, and sharing of enormous amounts of data. Processing data at this scale requires specialized techniques rather than conventional methodologies. This chapter deals with the concepts, architectures, technologies, and techniques used to process big data.
Chapter Preview


Today, data usage is changing the way people live, work, and play. Enterprises utilize the data they generate to enhance customer experience, become more agile, develop new business models, and create new revenue sources. The digital world now underpins the vast majority of people's daily lives, from reaching goods and services to communicating with friends and having fun. The current world economy is substantially dependent on created and stored data, and this dependence will only deepen as technology develops and spreads. Companies collect large amounts of customer data to provide greater personalization, which in turn leads consumers to use social media, cloud, gaming, and personalized services even more. Because of this rising dependence on data, the size of the global datasphere will continue to increase. The International Data Corporation (IDC) estimates that the global datasphere will reach 175 zettabytes by 2025, as shown in Figure 1 (Reinsel, Gantz & Rydning, 2018).

In the big data age, the defining features of the data being produced, which often have complicated structures, are enormous volume and high velocity. These data are created via sensors, social networks, and online and offline transactions. In the fields of government, business, management, medicine, and healthcare, intelligent and well-informed decision making becomes possible through the efficient processing of big data (Wang, Xu & Pedrycz, 2017).

Figure 1.

Annual size of the global datasphere


To cope with and process such huge amounts of data effectively, new techniques and technologies are emerging day by day (Eyupoglu, Aydin, Sertbas, Zaim & Ones, 2017; Eyupoglu, Aydin, Zaim & Sertbas, 2018). This chapter describes what is required to handle big data. Before discussing big data processing, it is necessary to address the big data life cycle. For this purpose, the chapter first describes the big data life cycle. It then explains big data analytics, which is essential for designing efficient systems to process big data. Afterwards, big data processing with Apache Hadoop, including HDFS and MapReduce, is clarified in detail. The main aim and contribution of this chapter are to provide general information to researchers who will work in this field.

The rest of this chapter is organized as follows. The second section explains the big data life cycle. The third section describes big data analytics. Big data processing with Apache Hadoop is clarified in the fourth section. Finally, the fifth section concludes the chapter.


Big Data Life Cycle

To cope with the different aspects of big data, namely volume, velocity, and variety, it is necessary to design effective and efficient systems for processing large-scale data that arrives from various sources at extremely high speeds. Today, data are distributed, so researchers have focused on developing new techniques to process and store huge amounts of data. Cloud computing based technologies such as Hadoop MapReduce have been investigated for this purpose (Mehmood, Natgunanathan, Xiang, Hua & Guo, 2016). The life cycle of big data consists of three main stages, shown in Figure 2. This section describes these stages in detail.

Figure 2.

Life cycle of big data
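The MapReduce model mentioned above can be illustrated with a minimal, framework-free sketch. The following Python word-count example mimics the three conceptual phases (map, shuffle, reduce) in plain code; the function names and toy data are illustrative only and do not represent Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input split
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all intermediate values by their key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data processing", "big data analytics"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'processing': 1, 'analytics': 1}
```

In a real Hadoop deployment, the map and reduce functions run in parallel across cluster nodes and the shuffle is handled by the framework over HDFS; this sketch only conveys the data flow of the programming model.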

