Hadoop MapReduce Programming

Copyright: © 2019 | Pages: 22
DOI: 10.4018/978-1-5225-3790-8.ch007

Abstract

MapReduce is the second major component of Hadoop and is the software framework of the Hadoop environment. It consists of a single resource manager, one node manager per node, and one application manager per application. These managers allocate the necessary resources and execute the jobs submitted by clients. This chapter explains the architecture of the MapReduce framework and narrates the entire process of executing a job. Execution is implemented through two major operations, map and reduce, which are demonstrated with an example. The syntax of the available user interfaces is shown, and the code required for MapReduce programming is given in Java. The entire cycle of job execution is also shown. After reading this chapter, the reader will be able to write and execute MapReduce programs. At the end of the chapter, some research issues in MapReduce programming are outlined.

Working of MapReduce

The MapReduce framework consists of a single master ResourceManager, one slave NodeManager per cluster node, and one AppMaster per application. The user or application submits the work to be executed as a job. The input and output of the job are stored in a file system. The framework takes care of splitting the job into a number of smaller tasks, scheduling the tasks across different nodes, and monitoring them. If a task fails, the framework re-executes that task automatically without user intervention. Tasks are normally scheduled on the nodes where the data is already present, so little data has to move across the network and bandwidth is conserved.
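
The splitting and re-execution behaviour described above can be illustrated with a minimal plain-Java sketch. It does not use the Hadoop API and runs in a single process; the class `MiniMapReduce`, the `MapTask` interface, the retry limit, and the simulated failure are all hypothetical and chosen only for illustration. Each input split becomes one map task, a failed task is retried on its own, and the partial results are then merged in a reduce step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MiniMapReduce {
    // Illustrative retry limit (an assumption, not taken from the chapter).
    public static final int MAX_ATTEMPTS = 3;

    /** One map task: turns a single input split into partial word counts. */
    public interface MapTask {
        Map<String, Integer> run(String split) throws Exception;
    }

    /** Map step for word count: counts the words of one split. */
    public static Map<String, Integer> countWords(String split) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Runs one task and re-executes only that task if it fails. */
    public static Map<String, Integer> runWithRetry(MapTask task, String split) {
        Exception last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return task.run(split);
            } catch (Exception e) {
                last = e; // remember the failure and try the same split again
            }
        }
        throw new IllegalStateException("task failed after " + MAX_ATTEMPTS + " attempts", last);
    }

    /** Runs one map task per split, then merges the partial counts (reduce step). */
    public static Map<String, Integer> runJob(List<String> splits, MapTask task) {
        Map<String, Integer> result = new TreeMap<>();
        for (String split : splits) {
            Map<String, Integer> partial = runWithRetry(task, split);
            partial.forEach((word, n) -> result.merge(word, n, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("deer bear river", "car car river", "deer car bear");
        int[] calls = {0};
        // Simulated fault: the first attempt on the second split fails.
        MapTask flaky = split -> {
            calls[0]++;
            if (split.startsWith("car") && calls[0] == 2) {
                throw new Exception("simulated node failure");
            }
            return countWords(split);
        };
        System.out.println(runJob(splits, flaky)); // {bear=2, car=3, deer=2, river=2}
    }
}
```

In a real cluster the tasks run in parallel on different nodes; the sequential loop here only keeps the sketch short.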

Applications specify the input and output locations and other job parameters in a “job configuration”. The client then submits the jar or executable file of the job, along with its configuration, to the ResourceManager. The ResourceManager then:

  • Distributes software/configuration to the slaves

  • Schedules the tasks

  • Monitors the tasks

  • Provides status and diagnostic information to the client
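
As a rough illustration of what a job configuration carries, the sketch below collects the input location, output location, and a few job parameters as key-value pairs using only `java.util.Properties`. The key names follow Hadoop's property naming as far as we are aware, but they should be checked against the documentation of the Hadoop version in use; the class `JobConfigSketch`, the method `buildConfig`, and the paths are hypothetical.

```java
import java.util.Properties;

public class JobConfigSketch {
    /** Collects the basic parameters of a job as key-value pairs. */
    public static Properties buildConfig(String jobName, String inputDir,
                                         String outputDir, int reducers) {
        if (reducers < 0) {
            throw new IllegalArgumentException("number of reducers must not be negative");
        }
        Properties conf = new Properties();
        // Key names are assumed to match Hadoop's configuration properties.
        conf.setProperty("mapreduce.job.name", jobName);
        conf.setProperty("mapreduce.input.fileinputformat.inputdir", inputDir);
        conf.setProperty("mapreduce.output.fileoutputformat.outputdir", outputDir);
        conf.setProperty("mapreduce.job.reduces", Integer.toString(reducers));
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical job name and HDFS paths, for illustration only.
        Properties conf = buildConfig("word count", "/user/demo/input", "/user/demo/output", 2);
        conf.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In an actual Hadoop program these values are normally set through the framework's own job and configuration classes rather than a bare `Properties` object; the sketch only shows the kind of information that travels with the jar to the ResourceManager.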

Execution of Job

The various steps in the execution of a MapReduce job are shown in Figure 1. The job is executed as follows:

Figure 1. Execution of job in MapReduce
  1. The application process submits the job and initializes a “Job” object with the help of the Java Virtual Machine (JVM).

  2. The ResourceManager in the master node checks whether a new application has arrived.

  3. The details of the new application, such as the application id, are submitted to the ResourceManager.

  4. The client application copies all needed resources, i.e., files, into HDFS.

  5. The ResourceManager then requests containers from the slave nodes.

  6. The NodeManager in the slave node creates a container.

  7. The AppMaster in the slave node requests the needed files from HDFS.

  8. HDFS supplies the requested files.

  9. The AppMaster requests resources from the ResourceManager.

  10. The ResourceManager identifies the slave node with the most available resources.

  11. The AppMaster requests the NodeManager in the selected slave node to run the task.

  12. The NodeManager then creates a YARN child process to execute the task.

  13. The YARN child process executes the task in its container.
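
Step 10, in which the ResourceManager points the AppMaster to a slave node with enough resources, can be sketched as a simple selection over the nodes' free capacity. This is a deliberate simplification under stated assumptions: only memory is tracked, the node with the most free memory is chosen, and the names `NodeSelectionSketch`, `Node`, and `allocate` are hypothetical. Real YARN schedulers apply more elaborate policies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class NodeSelectionSketch {
    /** A slave node with its currently free memory (simplified resource model). */
    public static final class Node {
        public final String name;
        public int freeMemoryMb;

        public Node(String name, int freeMemoryMb) {
            this.name = name;
            this.freeMemoryMb = freeMemoryMb;
        }
    }

    /**
     * Picks the node with the most free memory that can satisfy the request
     * and reserves the memory on it. Returns empty if no node has enough.
     */
    public static Optional<Node> allocate(List<Node> nodes, int requestMb) {
        Node best = null;
        for (Node node : nodes) {
            boolean fits = node.freeMemoryMb >= requestMb;
            if (fits && (best == null || node.freeMemoryMb > best.freeMemoryMb)) {
                best = node;
            }
        }
        if (best != null) {
            best.freeMemoryMb -= requestMb; // reserve the container's memory
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("slave1", 2048), new Node("slave2", 4096), new Node("slave3", 1024));
        // Request a 1024 MB container for a task.
        Optional<Node> chosen = allocate(nodes, 1024);
        System.out.println(chosen.map(n -> n.name).orElse("no node available"));
    }
}
```

With the sample nodes above, the first request goes to the node holding the most free memory; a request larger than any node's free memory yields an empty result, which a caller would treat as "wait and retry later".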
