MapReduce and YARN API

Copyright: © 2019 |Pages: 22
DOI: 10.4018/978-1-5225-3790-8.ch008

Abstract

Apache Hadoop provides Java APIs for operations on an HDFS file system, such as creating, renaming, and deleting files and setting read-write permissions on directories. These operations can be performed on a single system or on a cluster of systems. In addition, REST (REpresentational State Transfer) APIs are a collection of web services that provide interoperability between a single system and an interconnected distributed network. REST is chosen for its speed, scalability, simplicity, and reliability. The YARN REST and MapReduce REST APIs are briefly discussed in this chapter. The YARN web service REST API includes URI resources through which cluster, node, and application information can be accessed. YARN comprises the Resource Manager, Node Manager, and Timeline REST APIs. An application sends an HTTP request for a resource, and the response can be in XML or JSON. The request URI, response status, header, and body are shown in their actual format. Similarly, the MapReduce REST API exposes details about running jobs, such as the number of tasks, counters, and attempts. Hence, the REST APIs on YARN and the Resource Manager return small modules of information in response to a resource request. An outline of the research and growth of REST APIs is included in this chapter.

Background

REpresentational State Transfer (REST) is a web service architecture that provides interoperability between a single system and an interconnected distributed network. It allows the requesting system to access and manipulate representations of web resources using a uniform set of stateless operations. In a REST API, a request takes the form of a resource URI and may elicit a response in XML, JSON, HTML, or another format. REST is chosen for its fast performance, scalability, visibility, simplicity, reliability, reuse of components, and the ability to update parts of a system without affecting the system as a whole. Services that adhere to the architectural constraints and properties of REST are called RESTful. One such case is Hadoop, whose services expose RESTful APIs for YARN and MapReduce.

REST uses the HTTP protocol for communication on the web. Each resource of a RESTful service is identified and addressed by a URI. The HTTP methods supported are:

  1. GET: Read a resource (read only).

  2. PUT: Create a new resource, or replace an existing one, at the given URI.

  3. POST: Submit data to a resource, typically to create a new resource or update an existing one.

  4. DELETE: Remove a resource.

  5. OPTIONS: Get the operations supported on the resource.
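The verbs listed above map directly onto request objects in most HTTP client libraries. The following is a minimal sketch using Python's standard `urllib.request`; the resource URI `http://example.org/items/42` and the helper name `build_request` are assumptions made for illustration and do not come from the chapter. The requests are only constructed, never sent.

```python
from urllib.request import Request

# Hypothetical resource URI, used only for illustration.
RESOURCE = "http://example.org/items/42"


def build_request(method, uri=RESOURCE, body=None):
    """Return an unsent HTTP request object for the given verb."""
    return Request(uri, data=body, method=method)


# One request object per verb discussed in the text.
requests_by_verb = {
    verb: build_request(verb)
    for verb in ("GET", "PUT", "POST", "DELETE", "OPTIONS")
}

for verb, req in requests_by_verb.items():
    # get_method() reports the verb that would be used on the wire.
    print(verb, req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would actually perform the call; that step is left out so the sketch runs without a server.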

YARN REST APIs

The Hadoop YARN web service REST APIs include a set of URI resources through which cluster, node, and application information can be accessed. The resources are grouped by the type of information they return. Some URI resources return a collection, while others return a single item.

The URI of a REST-based web service has the form:

                         http://{http address of service}/ws/{version}/{resourcepath}

where,

  • {http address of service}: The HTTP address of the service to query. It can be the ResourceManager, a NodeManager, the MapReduce application master, or the history server.

  • {version}: The version of the APIs.

  • {resourcepath}: A path that identifies a single resource or a collection of resources.
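The three placeholders can be filled in mechanically. A minimal sketch, assuming a hypothetical helper named `build_ws_uri`; the host `rmhost.domain:8088` and version `v1` are taken from the chapter's own example request, and `cluster/info` is the ResourceManager's cluster information resource:

```python
def build_ws_uri(http_address, resource_path, version="v1"):
    """Compose http://{http address of service}/ws/{version}/{resourcepath}."""
    # Strip a leading slash so callers may pass "/cluster/info" or "cluster/info".
    return "http://{}/ws/{}/{}".format(
        http_address, version, resource_path.lstrip("/")
    )


print(build_ws_uri("rmhost.domain:8088", "cluster/info"))
# prints: http://rmhost.domain:8088/ws/v1/cluster/info
```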

To invoke a REST API, the application performs an HTTP operation on the URI associated with the resource. GET retrieves information about the specified resource. The HTTP headers used are Accept and Accept-Encoding: Accept selects XML or JSON as the response format, whereas Accept-Encoding supports the gzip compressed format.
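The two headers can be set on a GET request as sketched below with Python's standard library. The helper names `build_get` and `decode_body` are hypothetical, and the gzip body is simulated locally rather than fetched, so the sketch assumes nothing about a live cluster beyond the header names given in the text.

```python
import gzip
from urllib.request import Request


def build_get(uri, response_format="json", compressed=True):
    """Build an unsent GET request asking for XML or JSON, optionally gzipped."""
    headers = {"Accept": "application/" + response_format}
    if compressed:
        headers["Accept-Encoding"] = "gzip"
    return Request(uri, headers=headers, method="GET")


def decode_body(raw, content_encoding=None):
    """Decompress the body if the server marked it as gzip, then decode it."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


request = build_get("http://rmhost.domain:8088/ws/v1/cluster/info")
# urllib normalizes header names, so Accept-Encoding is stored as "Accept-encoding".
print(request.header_items())

# Simulated compressed payload standing in for a real response body.
simulated = gzip.compress(b'{"ok": true}')
print(decode_body(simulated, "gzip"))
```

With a real response object, the value passed as `content_encoding` would come from the response's Content-Encoding header.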

A JSON request and response for a single resource can look as follows:

HTTP Request:

GET http://rmhost.domain:8088/ws/v1/cluster/apps/application_1324325439346_0001

Response Status Line:

HTTP/1.1 200 OK

Response Header:

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26)

Response Body:

{
  "app": {
    "id": "application_1324057493980_0001",
    "user": "user1",
    "name": "",
    "queue": "default",
    "state": "ACCEPTED",
    "finalStatus": "UNDEFINED",
    "progress": 0,
    "trackingUI": "UNASSIGNED",
    "diagnostics": "",
    "clusterId": 1324325439346,
    "startedTime": 1324057495921,
    "finishedTime": 0,
    "elapsedTime": 2063,
    "amContainerLogs": "http:\/\/amNM:2\/node\/containerlogs\/container_1324057493980_0001_01_000001",
    "amHostHttpAddress": "amNM:2"
  }
}
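A client would typically parse such a body and pick out the fields it needs. The sketch below uses a shortened copy of the response above as `SAMPLE_BODY`; the helper name `summarize_app` and the choice of fields are assumptions for illustration. The timestamps are treated as milliseconds since the Unix epoch, which is how the values in the example read.

```python
import json
from datetime import datetime, timezone

# Shortened copy of the example response body, with field values as printed above.
SAMPLE_BODY = """
{
  "app": {
    "id": "application_1324057493980_0001",
    "user": "user1",
    "queue": "default",
    "state": "ACCEPTED",
    "finalStatus": "UNDEFINED",
    "progress": 0,
    "startedTime": 1324057495921,
    "finishedTime": 0,
    "elapsedTime": 2063
  }
}
"""


def summarize_app(body):
    """Extract a few fields from an application resource returned as JSON."""
    app = json.loads(body)["app"]
    started = datetime.fromtimestamp(app["startedTime"] / 1000, tz=timezone.utc)
    return {
        "id": app["id"],
        "user": app["user"],
        "state": app["state"],
        "started_utc": started.isoformat(),
        "elapsed_s": app["elapsedTime"] / 1000,
    }


summary = summarize_app(SAMPLE_BODY)
for key, value in summary.items():
    print(key, "=", value)
```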

Resource Manager REST API

The Resource Manager REST APIs let the user retrieve information about a Hadoop cluster: the status of the cluster, scheduler information, node information, and information about the applications on the cluster. Cross-origin support can be enabled for the Resource Manager using two configurations:
