The widespread adoption of big data, together with the new techniques for processing and analyzing large databases, is revolutionizing both scientific work and corporate management. Applications such as Amazon's personalized recommendations have significantly improved the shopping experience for consumers. In this work we analyze how big data can improve the services that companies offer and the customer experience, and how it can increase the efficiency of these companies. The work also examines some issues associated with the use of big data, such as data privacy and compliance with regulations on the use of information.
2. Background
At present, many lines of research in the computational sciences, and many new businesses, involve the use of big data. Big data is difficult to define because many of the techniques that accompany these massive databases were previously known by other names, such as data mining. Big data normally refers to the construction, organization and use of enormous amounts of data to extract relationships or to create new forms of value in markets, organizations, public services, etc. This definition needs to be qualified in order to understand the importance of big data, because these techniques are not defined only by the size of the databases they use. First, big data implies an enormous volume of information.
Second, a big data application involves aggregating information from various sources, which makes data management and merging particularly important. The data can come from sensors, the GPS receivers of millions of phones, clicks, server logs, emails, etc. Therefore, the data are not simply numbers arranged in a standard format (for example, in tables); they are very heterogeneous and can include images, texts, sounds, etc. Because of this heterogeneity, companies usually avoid storing the data in fixed structures such as the classical relational database. Instead, the data are managed with NoSQL ("not only SQL") systems rather than with the traditional SQL query language. These systems are essential when companies work with many gigabytes of data or millions of observations in heterogeneous formats, whose structure may change over time and which must scale easily. Some of the tools used to manipulate big data, such as Hadoop, MapReduce, Pig, OpenRefine, Hive, HBase, Mahout, ZooKeeper or Impala, are becoming industry standards. Most of these tools aim to enable the parallel processing required when working with huge databases.
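The parallel processing that tools such as Hadoop and MapReduce provide can be illustrated with a small, single-machine sketch of the MapReduce pattern applied to server logs. This is not Hadoop's API: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and a Python thread pool stands in for the separate machines of a cluster. The point it shows is that each input chunk is mapped independently, so map tasks can run in parallel, after which the intermediate results are grouped by key and reduced.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def map_words(chunk: str) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: Iterable[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle step: group the values emitted by all mappers by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: combine the grouped values for each key into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks: list[str], workers: int = 4) -> dict[str, int]:
    # Chunks are mapped independently, so the map tasks can run concurrently.
    # Threads are only a stand-in (assumption) for the nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    logs = ["GET /home GET /cart", "POST /cart", "GET /home"]
    print(word_count(logs))  # {'get': 3, '/home': 2, '/cart': 2, 'post': 1}
```

In a real deployment the chunks would be blocks of a distributed file system and the shuffle would move data between nodes over the network; the sketch only shows how the work divides into the three phases.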
Third, the signal-to-noise ratio of the information used varies widely, although in general the data are much noisier than in typical applications that use administrative data, surveys or the internal information systems of business organizations (Grossman & Siegel, 2014). Fourth, the goal of big data techniques is generally not to discover causal relationships but to produce predictive models. In contrast to the view taught in traditional statistics and econometrics courses, in big data only correlations matter, while causality is treated as irrelevant. Finally, information tends to be used and analyzed at very high speed (Kruschwitz, 2011).
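As a sketch of the predictive, correlation-driven approach described above, the following item co-occurrence recommender (in the spirit of "customers who bought X also bought Y") predicts likely co-purchases from how often items appeared together in past baskets. It is an illustrative assumption, not Amazon's actual method; the names `build_cooccurrence` and `recommend` and the sample baskets are hypothetical. Note that the model never asks why items are bought together: only the observed association is used for prediction.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets: list[set[str]]) -> dict[str, Counter]:
    """Count, for every item, how often each other item shares a basket with it."""
    co: dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        # sorted() gives a deterministic order for the pairs of a set
        for a, b in combinations(sorted(basket), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co: dict[str, Counter], item: str, k: int = 2) -> list[str]:
    """Return up to k items most often bought together with `item`.

    Purely associative: it predicts co-purchases from observed
    correlations and says nothing about what causes them.
    """
    if item not in co:
        return []
    return [other for other, _ in co[item].most_common(k)]


if __name__ == "__main__":
    baskets = [
        {"camera", "memory card", "tripod"},
        {"camera", "memory card"},
        {"camera", "bag"},
    ]
    co = build_cooccurrence(baskets)
    print(recommend(co, "camera", k=1))  # ['memory card']
```

Production recommenders normalize such counts (for example, so that very popular items do not dominate every list), but even this raw version shows prediction without any causal model.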