Grid Data Handling

Alexandru Costan (University Politehnica of Bucharest, Romania)
Copyright: © 2012 | Pages: 28
DOI: 10.4018/978-1-61350-113-9.ch005

Abstract

Large-scale distributed systems require scalable data storage and management strategies that allow applications to cope efficiently with continuously growing, highly distributed data. This chapter addresses the key issues of data handling in grid environments, focusing on storing, accessing, managing, and processing data. We start by providing the background for the data storage problem in grid environments. We then outline the main challenges addressed by distributed storage systems: high availability (which translates into high resilience and consistency), handling of corruption caused by arbitrary faults, fault tolerance, asynchrony, fairness, access control, and transparency. The core part of the chapter presents how existing solutions cope with these demanding requirements. The most important research results are organized along several themes: grid data storage, distributed file systems, data transfer and retrieval, and data management. Important characteristics such as performance, efficient use of resources, fault tolerance, and security are strongly determined by the adopted system architectures and the technologies behind them. For each topic, we briefly present previous work, describe the most recent achievements, highlight their advantages and limitations, and indicate future research trends in distributed data storage and management.
Chapter Preview

Background

Data-intensive environments often deal with applications that produce, store, and process data ranging from hundreds of megabytes to petabytes and beyond. The data may be structured or unstructured and is organized as collections or datasets, typically stored on mass storage systems (also called repositories) such as tape libraries or disk arrays. These storage resources are geographically dispersed and usually span different administrative domains. The datasets are maintained independently of the underlying storage systems, and new sites can be included without major effort. Users access the data collections from different locations. They may create local copies, or replicas, of the datasets to reduce the latency of wide-area data transfers, which improves application performance and helps tolerate possible failures.
