Data storage requirements have consistently increased over time. According to the latest WinterCorp survey (http://www/WinterCorp.com), “The size of the world’s largest databases has tripled every two years since 2001.” With database sizes in excess of 1 terabyte, there is a clear need for storage systems that are both cost effective and highly reliable. Historically, large databases were implemented on mainframe systems, which are large and expensive to purchase and maintain. In recent years, large data warehouse applications have been deployed on Linux and Windows hosts as replacements for existing mainframe systems. These hosts are significantly less expensive to purchase and require fewer resources to run and maintain.
With large databases it is less feasible, and less cost effective, to use tapes for backup and restore. The time required to copy terabytes of data from a database to a serial medium (streaming tape) is measured in hours, which would significantly degrade performance and decrease availability. Alternatives to serial backup include local replication, mirroring, and geoplexing of data. The increasing demands of larger databases must therefore be met by less expensive disk storage systems that are nevertheless highly reliable and less susceptible to data loss.
This article is organized into five sections. The first section provides background information that introduces the concepts of disk arrays. The following three sections detail the concepts used to build complex storage systems: (i) Redundant Arrays of Independent Disks (RAID); (ii) multilevel RAID (MRAID); (iii) concurrency control and storage transactions. The conclusion contains a brief survey of modular storage prototypes.
Magnetic disk drive technology, now fifty years old, remains a viable storage medium because disk drives can hold in excess of 500 gigabytes (GB), are nonvolatile and inexpensive, have an acceptable random access time (10 milliseconds), and exhibit a Mean Time to Failure (MTTF) exceeding 10^6 hours. Despite these benefits, disk failures occur frequently in large data centers because so many disks are in service.
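
The following back-of-the-envelope sketch illustrates why failures are frequent at scale even when each disk is very reliable. It assumes an MTTF of 10^6 hours per disk, a constant failure rate of 1/MTTF, and independent failures; the function name and the disk counts are illustrative choices, not figures from the cited sources.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def expected_failures_per_year(num_disks: int, mttf_hours: float) -> float:
    """Expected disk failures per year for a population of disks.

    Assumes each disk fails at a constant rate of 1/MTTF and that
    failures are independent (a simplification).
    """
    if num_disks < 0 or mttf_hours <= 0:
        raise ValueError("num_disks must be >= 0 and mttf_hours must be > 0")
    return num_disks * HOURS_PER_YEAR / mttf_hours


if __name__ == "__main__":
    # Hypothetical installation sizes, chosen only for illustration.
    for n in (100, 10_000, 100_000):
        rate = expected_failures_per_year(n, 1e6)
        print(f"{n:>7} disks: about {rate:.1f} failures per year")
```

Under these assumptions a single disk is expected to run for over a century between failures, yet an installation with 10,000 disks would see a failure every few days on average.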
RAID serves to mitigate disk failures in large installations (Chen et al. 1994). RAID level 5 (RAID5) masks the failure of a single disk by reconstructing requested blocks of a failed disk, on demand. Additionally, it automatically reconstructs the contents of the failed disk on a spare disk.
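
On-demand reconstruction in RAID5 rests on XOR parity: the parity block of a stripe is the bytewise XOR of its data blocks, so any single missing block equals the XOR of all surviving blocks in the same stripe. The sketch below shows only that arithmetic for one stripe; it is a minimal illustration with hypothetical function names, and it leaves out parity rotation across disks, striping layout, and all I/O.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Return the bytewise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_blocks: Sequence[bytes]) -> bytes:
    """Rebuild the single missing block of a stripe.

    surviving_blocks must hold every other block of the stripe,
    including the parity block if it survived.
    """
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks of one stripe
    parity = xor_blocks(data)            # written to the parity disk
    # Suppose the disk holding data[1] fails.
    rebuilt = reconstruct_missing([data[0], data[2], parity])
    assert rebuilt == data[1]
```

The same computation serves both cases described above: it is applied to individual blocks when a read targets the failed disk, and to every stripe in turn when the contents are rebuilt onto a spare.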
The RAID paradigm is inadequate for Very Large Disk Arrays (VLDAs) used in data warehousing applications, because the non-disk components may be less reliable than the physical disks. Hierarchical RAID (HRAID) achieves high reliability by using multiple levels of RAID controllers (Baek et al. 2001). The higher and lower RAID levels of HRAID are specified as RAIDX(M)/Y(N), where X and Y are the RAID levels at the higher and lower levels, M denotes the number of virtual disks or storage nodes (SNs) at the higher level, and N denotes the number of physical disks at the lower level. Multilevel RAID (MRAID) was proposed as an alternative to HRAID, from which it differs in two key respects: (i) disks are organized into SNs (or bricks) at the lower level, and each SN constitutes the smallest replaceable unit (SRU); (ii) the association among SNs is logical and dynamic rather than hardwired (Thomasian 2006).
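
To make the RAIDX(M)/Y(N) notation concrete, the following sketch parses such a specification into its four components. The class name, the function name, and the example string are hypothetical, and the total disk count assumes that N is the number of physical disks in each of the M storage nodes, which is one reading of the notation rather than something the cited papers are quoted as stating.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HraidSpec:
    upper_level: str      # X: RAID level across storage nodes
    num_nodes: int        # M: virtual disks / storage nodes
    lower_level: str      # Y: RAID level inside each node
    disks_per_node: int   # N: physical disks (assumed to be per node)

    @property
    def total_disks(self) -> int:
        # Assumption: every node holds the same number of disks.
        return self.num_nodes * self.disks_per_node


_SPEC = re.compile(r"^RAID(\w+)\((\d+)\)/(\w+)\((\d+)\)$")


def parse_hraid(spec: str) -> HraidSpec:
    """Parse a string of the form RAIDX(M)/Y(N)."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"not a RAIDX(M)/Y(N) specification: {spec!r}")
    x, m, y, n = match.groups()
    return HraidSpec(x, int(m), y, int(n))


if __name__ == "__main__":
    # Illustrative configuration: mirroring across 3 nodes,
    # RAID5 over 4 disks inside each node.
    cfg = parse_hraid("RAID1(3)/5(4)")
    print(cfg, cfg.total_disks)
```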
Each SN consists of an array of disks, an array controller, a partially nonvolatile cache, and the capability to interconnect. SN costs are kept low, in some designs, by using mirroring to protect data on each SN.
The internal structure of a brick in IBM’s Intelligent Brick prototype is illustrated in Figure 2 of Wilcke et al. (2006). Bricks are cube shaped and communicate via capacitive coupling between insulated flat metal plates on each of their six surfaces. Higher capacities are attained by stacking bricks on top of each other. Gigabit Ethernet is used to provide connectivity to external cubes. A fail-in-place or deferred maintenance system is also postulated.
We first review RAID systems, before discussing multilevel RAID. We next describe storage transactions, which are required for the correct operation of the system.