Since the turn of the century, with the rapid growth of digital and network technologies, the amount of data generated by human society has grown exponentially, doubling approximately every two years. Global data volume was expected to reach 35 ZB by 2020, ushering in the era of big data.
2.1 Concepts and Characteristics of Big Data
Big data refers to massive data with high growth rates and diverse structures that requires more efficient processing (Zhu et al., 2016). Multiple definitions of big data exist; the following two are representative:
The research institution Gartner defines big data as high-volume, high-growth, and diversified information assets that require new processing modes to enable stronger decision-making, insight and discovery, and process optimization.
The McKinsey Global Institute defines big data as a data set whose size exceeds the capability of traditional database software to collect, acquire, store, manage, and analyze.
Big data shows the characteristics of “4V + 1C”, in which “4V” refers to Volume, Variety, Velocity, and Value, and “1C” refers to Complexity (Osman, 2019).
Large volume is the most prominent feature distinguishing big data from traditional data. According to IDC, the volume of global data nearly doubles biennially; over the last two years, people have produced as much data as in the entire previous history of the human race. While general relational databases process data at the terabyte scale, big data processing usually operates above the petabyte scale.
Big data comprises many data types, including structured, semi-structured, and unstructured data. Compared with structured data, unstructured data (mainly text, logs, audio, video, and images) imposes additional requirements on data processing capacity.
Fast processing speed is another significant feature that sets big data apart from traditional data processing. Only real-time analysis of massive data can realize the value of big data. IDC's “Digital Universe” report predicts that the amount of data stored in electronic form worldwide will reach 35.2 ZB by 2020, and in the face of such a huge volume, processing efficiency will be the key measure of technological capability.
Big data tends to have low value density: its value density is inversely proportional to the volume of data. For example, in one hour of surveillance footage, the potentially useful content may amount to only one second. In the context of big data, therefore, the core problem is how to extract useful data quickly and explore its potential value through powerful computer algorithms.
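The low-value-density problem above can be illustrated with a toy sketch: scanning a long stream of monitoring readings and keeping only the rare frames that stand out. The function name, the deviation rule, and the threshold are all illustrative assumptions, not a method prescribed by the text.

```python
def extract_useful_frames(readings, threshold=3.0):
    """Return (index, value) pairs that deviate from the mean by more
    than `threshold` standard deviations (a simple stand-in for
    'useful data' in a mostly uneventful stream)."""
    n = len(readings)
    mean = sum(readings) / n
    var = sum((x - mean) ** 2 for x in readings) / n
    std = var ** 0.5 or 1.0  # guard against an all-constant stream
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - mean) > threshold * std]

# A "one-hour" stream of mostly flat readings with one anomalous second:
stream = [0.0] * 3600
stream[1800] = 50.0
print(extract_useful_frames(stream))  # -> [(1800, 50.0)]
```

In this sketch, 3,600 readings reduce to a single useful frame, mirroring the one-second-in-one-hour ratio described above; real systems apply far more sophisticated filtering and mining algorithms to achieve the same kind of reduction at scale.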
Because big data has huge volume, multiple sources, and high complexity, its processing and analysis tend to be correspondingly more difficult.