Chapter Preview
Background
Given the large volume of data, applications that work on big data need to distribute data across a cluster of processors, and processing has to be carried out in parallel for computation to complete in a reasonable amount of time (Dean & Ghemawat, 2010). However, building and debugging distributed software remains extremely difficult. Distributed applications require a developer to orchestrate concurrent computation and communication across machines in a manner that is robust to delays and failures (Alvaro, Condie, Conway, Elmeleegy, Hellerstein, & Sears, 2010).
Key Terms in this Chapter
Functional Programming: A style of programming in which programs are modeled as the evaluation of expressions rather than as sequences of state-changing commands.
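The definition above can be illustrated with a minimal sketch in Python (the data values are made up for illustration): each step is an expression that yields a new value, and no existing variable is mutated.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# Each step evaluates an expression to produce a new value;
# the original list is never modified in place.
squares = list(map(lambda x: x * x, numbers))        # squares of each element
evens = list(filter(lambda x: x % 2 == 0, squares))  # keep even squares
total = reduce(lambda acc, x: acc + x, evens, 0)     # fold into a sum

print(total)
```

The same pipeline written imperatively would use a loop and an accumulator variable; the functional form expresses it purely as nested expression evaluation.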
Big Data: Data that is so large and complex that it cannot be processed using traditional data processing tools or applications.
Map Reduce: A distributed parallel programming model for developing large-scale, data-centric applications designed to run on a cluster of shared-nothing commodity computers.
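The model can be sketched in miniature with the classic word-count example. This is a single-process illustration only (function names and sample documents are invented here): on a real cluster the framework runs the map and reduce tasks on different machines and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit an intermediate (word, 1) pair for every word in the split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate pairs by key, as the framework
    # would do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for each word.
    return key, sum(values)

documents = ["big data needs parallel processing", "big data big clusters"]
intermediate = list(chain.from_iterable(map_phase(d) for d in documents))
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["big"])  # 3
```

Because map calls on different documents share no state, they can run on separate shared-nothing nodes; only the shuffle moves data between machines.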
Data Mining: The extraction of interesting, non-trivial, implicit, previously unknown, and potentially useful patterns from large datasets.
Distributed Software: Software in which components are spread across computers on a network and the components coordinate their actions by sharing memory or by passing messages.
Directed Acyclic Graph: A collection of vertices and directed edges connecting the vertices to one another in such a way that there is no path that starts at a vertex and loops back to the same vertex.
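The "no path loops back to its start" property can be checked with a topological sort. Below is a small sketch (the graphs are hypothetical examples) using Kahn's algorithm: a directed graph is acyclic exactly when every vertex can be ordered so that all edges point forward.

```python
from collections import deque

def is_dag(vertices, edges):
    # Build adjacency lists and count incoming edges per vertex.
    indegree = {v: 0 for v in vertices}
    adjacency = {v: [] for v in vertices}
    for src, dst in edges:
        adjacency[src].append(dst)
        indegree[dst] += 1
    # Repeatedly remove vertices that have no remaining incoming edges.
    queue = deque(v for v in vertices if indegree[v] == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for w in adjacency[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # If every vertex was removed, no cycle blocked the ordering.
    return visited == len(vertices)

print(is_dag("abcd", [("a", "b"), ("b", "c"), ("a", "d")]))  # True
print(is_dag("ab", [("a", "b"), ("b", "a")]))                # False
```

DAGs of this kind underpin the execution plans of many big-data engines, which schedule a stage only after all stages it depends on have completed.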
Shared Nothing: An architecture for connecting computers in a network such that each node is independent and self-sufficient; the nodes share no memory or disk storage.