Data-Aware Distributed Computing

Esma Yildirim, Mehmet Balman, Tevfik Kosar

Source Title: Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management

ISBN13: 9781615209712 | ISBN10: 1615209719 | EISBN13: 9781615209729

DOI: 10.4018/978-1-61520-971-2.ch001

MLA

Yildirim, Esma, et al. "Data-Aware Distributed Computing." Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management, edited by Tevfik Kosar, IGI Global, 2012, pp. 1-27. https://doi.org/10.4018/978-1-61520-971-2.ch001

APA

Yildirim, E., Balman, M., & Kosar, T. (2012). Data-Aware Distributed Computing. In T. Kosar (Ed.), Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management (pp. 1-27). IGI Global. https://doi.org/10.4018/978-1-61520-971-2.ch001

Chicago

Yildirim, Esma, Mehmet Balman, and Tevfik Kosar. "Data-Aware Distributed Computing." In Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management, edited by Tevfik Kosar, 1-27. Hershey, PA: IGI Global, 2012. https://doi.org/10.4018/978-1-61520-971-2.ch001

Abstract

With the continuous increase in the data requirements of scientific and commercial applications, access to remote and distributed data has become a major bottleneck for end-to-end application performance. Traditional distributed computing systems closely couple data access and computation, and generally treat data access as a side effect of computation. The limitations of these systems, and of CPU-oriented scheduling and workflow management tools, in handling complex data movement have motivated a new paradigm: data-aware distributed computing. In this chapter, the authors elaborate on how the most crucial distributed computing components, such as scheduling, workflow management, and end-to-end throughput optimization, can become “data-aware.” In this paradigm, data placement activities are represented as full-featured jobs in the end-to-end workflow, and they are queued, managed, scheduled, and optimized via a specialized data-aware scheduler. The authors present a set of tools for mitigating the data bottleneck in distributed computing systems, consisting of three main components: a data-aware scheduler, which provides capabilities such as planning, scheduling, resource reservation, job execution, and error recovery for data movement tasks; integration of these capabilities into the other layers of distributed computing, such as workflow planning; and further optimization of data movement tasks via dynamic tuning of underlying protocol transfer parameters.
