The Red Storm Architecture and Early Experiences with Multi-Core Processors

James L. Tomkins, Ron Brightwell, William J. Camp, Sudip Dosanjh, Suzanne M. Kelly, Paul T. Lin, Courtenay T. Vaughan, John Levesque, Vinod Tipparaju
DOI: 10.4018/978-1-4666-0906-8.ch012

Abstract

The Red Storm architecture, which was conceived by Sandia National Laboratories and implemented by Cray, Inc., has become the basis for the most successful line of commercial supercomputers in history. The success of the Red Storm architecture is due largely to its ability to solve a wide range of science and engineering problems effectively and efficiently. The Cray XT series of machines that embody the Red Storm architecture have allowed for unprecedented scaling and performance of parallel applications spanning many areas of scientific computing. This paper describes the fundamental characteristics of the architecture and its implementation that have enabled this success, even through successive generations of hardware and software.

Introduction

In 2001, the U.S. Department of Energy’s National Nuclear Security Administration (NNSA) commissioned Sandia National Laboratories (Sandia) to obtain new computational capability to address mission needs for very high-end computation. After a Request For Information (RFI) failed to yield proposed architectures that met the application scalability and cost requirements for the new system, Sandia issued a Request For Proposals (RFP) that essentially prescribed in detail the architecture for a new massively parallel computer, dubbed Red Storm. Sandia received proposals from two potential suppliers, but neither proposal met the requirements laid out in the Statement Of Work (SOW). However, one of the proposers, Cray, Inc., indicated a willingness to engineer a system to Sandia’s architectural specifications and within the cost envelope. Sandia subsequently awarded the development contract to Cray, and the two organizations jointly produced the Red Storm supercomputer system, going from architectural specification to first hardware deployment in approximately 30 months. This extremely short development time was largely enabled by the simple design, aimed at both scalability and scalable manufacturability, that Sandia promulgated in the architectural specifications.

As part of the contract, Cray was required to develop a commercial product based on the Red Storm architecture. In 2005, Cray introduced the XT3 supercomputing system. Subsequent versions (XT4 and XT5) have been widely deployed in the high-performance computing market, and in 2008 the Cray XT product line became the most successful supercomputer product line in history, with over one thousand cabinets sold. Although national security applications were a key target, the Red Storm architecture has proven to be effective at solving a wide range of science and engineering problems. These applications include climate change, fusion, materials science, structural response, nanomaterials, biology, catalysis, combustion, and astrophysics. This paper describes the fundamental characteristics of the Red Storm architecture and its implementation that have enabled this success, even through successive generations of hardware and software.

We previously described our approach to the Red Storm architecture prior to its development (Brightwell et al., 2005). In this paper, we summarize the key points of our approach and provide a retrospective now that the architecture has been widely deployed. The rest of this paper is organized as follows. In the next section, we discuss the history of massively parallel processing (MPP) systems that influenced the development of the Red Storm architecture and enumerate the key characteristics instrumental in its success. In the following section, we describe the hardware components of the architecture and the evolution of the Red Storm machine at Sandia. Following that, the software environment is presented, with a focus on the important factors that enabled scalability and performance across successive generations of hardware. We continue with a discussion of the Cray XT product line, and then provide several examples of application performance on the Sandia Red Storm system and Cray XT systems at Oak Ridge National Laboratory. The final section summarizes the major contributions of this paper.
