XtreemFS: A File System for the Cloud

Jan Stender (Zuse Institute Berlin, Germany), Michael Berlin (Zuse Institute Berlin, Germany) and Alexander Reinefeld (Zuse Institute Berlin, Germany)
Copyright: © 2013 | Pages: 19
DOI: 10.4018/978-1-4666-3934-8.ch016

Abstract

Cloud computing poses new challenges to data storage. While cloud providers use shared distributed hardware, which is inherently unreliable and insecure, cloud users expect their data to be safely and securely stored, available at any time, and accessible in the same way as their locally stored data. In this chapter, the authors present XtreemFS, a file system for the cloud. XtreemFS reconciles cloud providers' need for cheap scale-out storage with cloud users' need for reliable, secure, and easy data access. The main contributions of the chapter are: a description of the internal architecture of XtreemFS, which presents an approach to building large-scale distributed POSIX-compliant file systems on top of cheap, off-the-shelf hardware; a description of the XtreemFS security infrastructure, which guarantees the isolation of individual users despite shared and insecure storage and network resources; a comprehensive overview of the replication mechanisms in XtreemFS, which guarantee consistency, availability, and durability of data in the face of component failures; and an overview of the snapshot infrastructure of XtreemFS, which makes it possible to capture and freeze momentary states of the file system in a scalable and fault-tolerant fashion. The authors also compare XtreemFS with existing solutions and argue for its practicability and potential in the cloud storage market.

Introduction

Cloud computing is emerging as a pioneering paradigm for service hosting and on-demand computing. By providing computation and storage as a utility without requiring an up-front commitment by users, clouds offer a flexible alternative to traditional solutions that are built upon dedicated hardware (Armbrust et al., 2009).

Especially with respect to data storage, cloud computing presents various new challenges. From a user’s point of view, a storage cloud is expected to behave like a reliable and exclusively owned storage resource with unlimited capacity. Storage clouds convey this impression by addressing issues like elasticity, isolation, availability, and robustness:

  • Elasticity: Providers of cloud computing systems aim to give users the impression that unlimited resources are available on demand. Accordingly, they must be able to dynamically adjust the scale of the underlying hardware installation according to the demand of their users. In the context of cloud computing, such on-demand scalability is referred to as elasticity.

An elastic storage cloud must be able to extend or shrink its capacity by adding or removing resources. In doing so, it must be possible to integrate new resources seamlessly into an existing storage installation. As a consequence, cloud storage systems require a distributed system architecture that comprises a network of loosely coupled, independent storage components.

  • Isolation: To handle accesses from many users in a cost-efficient manner, users of a cloud share the same pool of physical resources. Accordingly, a storage cloud implements a many-to-many relationship between users and storage devices, regardless of any trust relationships between users. Storage devices may further reside in an untrusted environment, where data is exposed to the threat of unauthorized access, which may affect privacy and integrity. To address the security concerns of cloud users, storage clouds have to meet high standards of security and data isolation. This requires a comprehensive security infrastructure that supports secure authentication and authorization of users as well as the encryption of stored data.

  • Availability and Robustness: To encourage users to submit their data to a storage cloud, the cloud provider has to guarantee that all data is safely stored and available at any time. Data access must be impervious to network partitions and downtimes as well as permanent failures of system components, which may occur as a consequence of misconfiguration, power cuts, hardware failures, or disasters. Such outages have to be handled internally by the storage system and hidden from users to the greatest possible extent. Robustness especially matters in connection with elasticity, as failures on the underlying hardware layer have proven to be the norm rather than the exception in large-scale storage installations (Ghemawat, Gobioff, & Leung, 2003).

The advent of cloud computing as a new paradigm for utility computing, and the new challenges that come with it, have given rise to a variety of novel cloud storage systems (e.g., Amazon S3, Windows Azure, or Google Cloud Storage). Most of them offer a proprietary, typically HTTP-based interface with vendor-specific semantics, which are incompatible with those of traditional file systems (e.g., POSIX), thereby requiring users to adapt their applications to the underlying cloud storage infrastructure. This can be a tedious and time-consuming task for cloud users, not only because it takes effort to become acquainted with new data management schemes and APIs, but also because adapting existing applications to a specific data management framework involves potentially large additional programming and maintenance effort.
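The semantic gap described above can be illustrated with a minimal sketch. It assumes a toy in-memory object store that only supports whole-object put and get, which is a common trait of HTTP-based storage interfaces; the class ToyObjectStore and both helper functions are hypothetical names introduced for illustration and are not part of XtreemFS or of any vendor API.

```python
import os
import tempfile


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a store with whole-object put/get."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # A put always replaces the complete object.
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


def patch_file_in_place(path, offset, payload):
    """File-system-style update: overwrite only the bytes at the given offset."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)


def patch_object(store, key, offset, payload):
    """Object-style update: fetch, modify locally, and store everything again."""
    data = bytearray(store.get(key))
    data[offset:offset + len(payload)] = payload
    store.put(key, data)
    return len(data)  # number of bytes that had to be written back


def demo():
    """Apply the same 5-byte change through both interfaces."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.bin")
        with open(path, "wb") as f:
            f.write(b"hello world")
        patch_file_in_place(path, 6, b"WORLD")
        with open(path, "rb") as f:
            file_result = f.read()

    store = ToyObjectStore()
    store.put("report.bin", b"hello world")
    rewritten = patch_object(store, "report.bin", 6, b"WORLD")
    return file_result, store.get("report.bin"), rewritten
```

Both paths end with the same content, but an application written against the file interface changes five bytes in place, whereas the object-style path has to read and rewrite the whole object. Application code built around seek and write calls therefore needs restructuring before it can use such an interface.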
