Distributed Programming Models for Big Data Analytics

Rakhi Saxena (Deshbandhu College, University of Delhi, India)
Copyright: © 2014 |Pages: 12
DOI: 10.4018/978-1-4666-5202-6.ch071

Chapter Preview

Given the large volume of data, applications that work on big data need to distribute data across a cluster of processors, and processing must be carried out in parallel for computation to complete in a reasonable amount of time (Dean & Ghemawat, 2010). However, building and debugging distributed software remains extremely difficult: distributed applications require a developer to orchestrate concurrent computation and communication across machines in a manner that is robust to delays and failures (Alvaro, Condie, Conway, Elmeleegy, Hellerstein, & Sears, 2010).

Key Terms in this Chapter

Functional Programming: A style of programming in which programs are modeled as the evaluation of expressions rather than as sequences of state-changing commands.
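A minimal sketch of this style in Python: the result is obtained by composing pure functions over values, with no mutable state. The particular functions and data here are illustrative, not from the chapter.

```python
from functools import reduce

# Express the computation as nested expressions: square each number,
# then fold the squares into a sum. No variable is ever reassigned.
squares = list(map(lambda x: x * x, range(1, 5)))  # [1, 4, 9, 16]
total = reduce(lambda a, b: a + b, squares)        # 30
```

Because each step is a side-effect-free expression, the same style transfers naturally to distributed settings, where independent expression evaluations can run in parallel.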

Big Data: Data that is so large and complex that it cannot be processed using traditional data processing tools or applications.

MapReduce: A distributed parallel programming model for developing large-scale, data-centric applications, designed to run on a cluster of shared-nothing commodity computers.
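The model can be sketched in a few lines of single-machine Python: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. In a real framework such as Hadoop these phases run in parallel across the cluster; the function names below are illustrative, not a framework API.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate pairs by key, as the framework would
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate the counts for one word.
    return key, sum(values)

documents = ["big data big analytics", "data mining"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
# counts == {'big': 2, 'data': 2, 'analytics': 1, 'mining': 1}
```

Because map tasks touch independent input splits and reduce tasks touch independent keys, both phases parallelize across shared-nothing nodes without coordination beyond the shuffle.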

Data Mining: The extraction of interesting, non-trivial, implicit, previously unknown, and potentially useful patterns from large datasets.

Distributed Software: Software whose components are spread across computers on a network and coordinate their actions by sharing memory or by passing messages.

Directed Acyclic Graph: A collection of vertices and directed edges connecting the vertices to one another such that no path starts at a vertex and loops back to that same vertex.
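The acyclicity property can be checked directly: a directed graph is a DAG exactly when a topological order covers every vertex. A sketch using Kahn's algorithm (the graph representation here is an assumption for illustration):

```python
from collections import deque

def is_dag(vertices, edges):
    # Kahn's algorithm: repeatedly remove vertices with no incoming
    # edges; the graph is acyclic iff every vertex gets removed.
    indegree = {v: 0 for v in vertices}
    adjacency = {v: [] for v in vertices}
    for src, dst in edges:
        adjacency[src].append(dst)
        indegree[dst] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for w in adjacency[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return visited == len(vertices)
```

Dataflow programming models represent a computation as such a graph, so that every task can be scheduled after its predecessors without ever deadlocking on a cycle.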

Shared Nothing: An architecture for connecting computers in a network in which each node is independent and self-sufficient; the nodes share no memory or disk storage.
