Data-Aware Distributed Computing

Esma Yildirim, Mehmet Balman, Tevfik Kosar
ISBN13: 9781615209712|ISBN10: 1615209719|EISBN13: 9781615209729
DOI: 10.4018/978-1-61520-971-2.ch001
Cite Chapter

MLA

Yildirim, Esma, et al. "Data-Aware Distributed Computing." Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management, edited by Tevfik Kosar, IGI Global, 2012, pp. 1-27. https://doi.org/10.4018/978-1-61520-971-2.ch001

APA

Yildirim, E., Balman, M., & Kosar, T. (2012). Data-Aware Distributed Computing. In T. Kosar (Ed.), Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management (pp. 1-27). IGI Global. https://doi.org/10.4018/978-1-61520-971-2.ch001

Chicago

Yildirim, Esma, Mehmet Balman, and Tevfik Kosar. "Data-Aware Distributed Computing." In Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management, edited by Tevfik Kosar, 1-27. Hershey, PA: IGI Global, 2012. https://doi.org/10.4018/978-1-61520-971-2.ch001

Abstract

With the continuous increase in the data requirements of scientific and commercial applications, access to remote and distributed data has become a major bottleneck for end-to-end application performance. Traditional distributed computing systems closely couple data access and computation, and data access is generally treated as a side effect of computation. The limitations of these systems, and of CPU-oriented scheduling and workflow management tools, in managing complex data handling have motivated a new paradigm: data-aware distributed computing. In this chapter, the authors elaborate on how the most crucial distributed computing components, such as scheduling, workflow management, and end-to-end throughput optimization, can become “data-aware.” In this paradigm, data placement activities are represented as full-featured jobs in the end-to-end workflow, and they are queued, managed, scheduled, and optimized via a specialized data-aware scheduler. The authors also present a set of tools for mitigating the data bottleneck in distributed computing systems, consisting of three main components: a data-aware scheduler, which provides capabilities such as planning, scheduling, resource reservation, job execution, and error recovery for data movement tasks; integration of these capabilities with the other layers of distributed computing, such as workflow planning; and further optimization of data movement tasks via dynamic tuning of the underlying protocol's transfer parameters.
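
To make the idea of data placement as a full-featured job more concrete, the following is a minimal Python sketch, not the authors' scheduler. It assumes a strictly sequential workflow and a simple retry budget as a stand-in for error recovery. The names `Job`, `run_workflow`, and `make_flaky_transfer` are hypothetical and introduced only for illustration; the transfer failures are simulated.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    name: str
    kind: str                      # "data" (placement) or "compute"
    action: Callable[[], None]
    max_retries: int = 0           # hypothetical error-recovery budget
    attempts: int = 0


def run_workflow(jobs: List[Job]) -> List[str]:
    """Run jobs in FIFO order; a failed job is retried until its budget runs out."""
    queue = deque(jobs)
    log: List[str] = []
    while queue:
        job = queue.popleft()
        job.attempts += 1
        try:
            job.action()
            log.append(f"{job.kind}:{job.name}:ok")
        except OSError:
            if job.attempts <= job.max_retries:
                log.append(f"{job.kind}:{job.name}:retry")
                queue.appendleft(job)   # retry before any dependent job runs
            else:
                log.append(f"{job.kind}:{job.name}:failed")
                break                   # later jobs depend on this one
    return log


def make_flaky_transfer(failures: int) -> Callable[[], None]:
    """Return a simulated transfer that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def transfer() -> None:
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("simulated transfer error")

    return transfer


# Stage-in and stage-out are queued as jobs alongside the computation.
workflow = [
    Job("stage_in", "data", make_flaky_transfer(1), max_retries=2),
    Job("simulate", "compute", lambda: None),
    Job("stage_out", "data", make_flaky_transfer(0), max_retries=2),
]
log = run_workflow(workflow)
print(log)
```

In this sketch the data movement steps are visible to the scheduler, so a failed stage-in is retried on its own instead of causing the computation to fail as a side effect. A real data-aware scheduler would also handle the planning, resource reservation, and transfer parameter tuning mentioned in the abstract, which this toy example leaves out.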
