Big Data and Clustering Techniques

Jayashree K. (Rajalakshmi Engineering College, India) and Chithambaramani R. (TJS Engineering College, India)
Copyright: © 2020 | Pages: 9
DOI: 10.4018/978-1-7998-0106-1.ch001

Abstract

Big data has become a major driver of innovation across academia, government, and industry. It comprises massive volumes of sensor data, raw and semi-structured log data from the IT industry, and the rapidly growing quantity of data from social media. Big data requires large-scale storage, and its sheer volume makes analysis, processing, and retrieval difficult and time consuming. One way to address these problems is to cluster big data into a compact format. This chapter therefore discusses the background of big data and clustering, along with the applications of big data in detail. It also addresses related work, the research challenges of big data, and future directions.

Background

Big Data

Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are diverse, complex, and of a massive scale (Hashem, 2015). It is commonly characterized by four attributes, often called the four Vs:

  1. Volume refers to the amount of all types of data generated from different sources, which continues to expand. The benefit of gathering large amounts of data is the discovery of hidden information and patterns through data analysis.

  2. Variety refers to the different types of data collected through sensors, smartphones, or social networks. Such data types include video, image, text, audio, and data logs, in either structured or unstructured format.

  3. Velocity refers to the speed of data transfer. The contents of data constantly change because of the absorption of complementary data collections, the introduction of previously archived data or legacy collections, and streamed data arriving from multiple sources (Berman, 2013).

  4. Value refers to the process of discovering huge hidden values from large, rapidly generated datasets of various types (Chen, 2014).
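
As a rough illustration of what clustering data into a compact format can mean in practice, the sketch below uses plain k-means (Lloyd's algorithm). The chapter preview does not name a specific algorithm, so k-means is an assumption chosen for illustration, as are the function names `nearest` and `kmeans`, the sample points, and the deterministic choice of starting centroids. The idea shown is that a dataset can be summarized by a few centroids plus one small label per record.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples.

    Assumption for illustration: centroids start at k evenly spaced
    input points, which keeps the result deterministic.
    """
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    # Final labels computed against the final centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical sample data: two visibly separate groups of 2-D readings.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Here six records reduce to two centroids and six integer labels. On a real dataset the saving grows with the number of records per cluster, at the cost of losing the detail of individual points.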
