Simultaneous MultiThreading Microarchitecture

Chen Liu (Florida International University, USA), Xiaobin Li (Intel Corporation, USA), Shaoshan Liu (University of California at Irvine, USA) and Jean-Luc Gaudiot (University of California at Irvine, USA)
Copyright: © 2010 |Pages: 31
DOI: 10.4018/978-1-60566-661-7.ch024


Due to the conventional sequential programming model, the Instruction-Level Parallelism (ILP) that modern superscalar processors can exploit is inherently limited. Hence, multithreading architectures have been proposed to exploit Thread-Level Parallelism (TLP) in addition to conventional ILP. By issuing and executing instructions from multiple threads at each clock cycle, Simultaneous MultiThreading (SMT) achieves some of the best possible system resource utilization and, accordingly, higher instruction throughput. In this chapter, the authors describe the origin of the SMT microarchitecture, comparing it with other multithreading microarchitectures. They identify several key aspects of high-performance SMT design: fetch policy, handling of long-latency instructions, resource sharing control, synchronization, and communication. They also describe some additional benefits of the SMT microarchitecture: SMT for fault tolerance and SMT for secure communications. Given the need to support sequential legacy code and the emergence of new parallel programming models, the authors believe the SMT microarchitecture will play a vital role as we enter the multi-thread, multi/many-core processor design era.
Chapter Preview


Ever since the first integrated circuits (ICs) were independently invented by Jack Kilby (Nobel Prize Laureate in Physics in 2000) of Texas Instruments and Robert Noyce (co-founder of Intel®) around 50 years ago, we have witnessed exponential growth across the whole semiconductor industry.

Moore’s Law and Memory Wall

The semiconductor industry has been driven by Moore’s law (Moore, 1965) for about 40 years, with continuing advancements in VLSI technology. Moore’s law states that the number of transistors on a single chip doubles every two years, as shown in Figure 1, which is based on data from both Intel® and AMD. A corollary of Moore’s law states that the feature size of chip manufacturing technology halves approximately every five years (equivalently, it shrinks by about a quarter every two years), based on our observation shown in Figure 2.
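As a quick sanity check, the two rates quoted above can be verified with a few lines of arithmetic. This is an illustrative sketch, not part of the chapter; the rates (doubling every two years, halving every five) are taken directly from the text and assumed constant.

```python
# Checking that the two statements of Moore's law are mutually consistent.
# Assumed rates from the text: transistor count doubles every 2 years;
# feature size halves every 5 years.
years = 10
transistors = 2 ** (years / 2)        # transistor growth factor over 10 years
feature = 0.5 ** (years / 5)          # feature-size scaling factor over 10 years
per_2yr_shrink = 1 - 0.5 ** (2 / 5)   # fractional shrink every 2 years

print(transistors)                # 32.0 -- five doublings
print(feature)                    # 0.25 -- two halvings
print(round(per_2yr_shrink, 2))   # 0.24 -- i.e. about a quarter every 2 years
```

So a halving every five years is indeed equivalent to shrinking by roughly a quarter every two years, as the corollary states.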

Figure 1.

Moore’s Law: Transistor count increase

Figure 2.

Moore’s Law: Feature size decrease

As the number of transistors on a chip grows exponentially, we have reached the point where we can place more than one billion transistors on a single chip. For example, the Dual-Core Itanium® 2 from Intel® integrates more than 1.7 billion transistors (Intel, 2006). How to efficiently utilize this huge amount of transistor real estate is a challenging task that has recently preoccupied many researchers and system architects in both academia and industry.

Processor and memory integration technologies both follow Moore’s law. Memory latency, however, is increasing drastically relative to processor speed. This is often referred to as the “Memory Wall” problem (Hennessy, 2006). Indeed, Figure 3 shows that CPU performance increases at an average rate of 55% per year, while memory performance increases at a much lower average rate of 7% per year. There is no sign that this gap will be remedied in the near future. Even though processor speed keeps increasing, and processors can handle ever more instructions in one clock cycle, we will continue to experience considerable performance degradation each time we need to access memory. Pipeline stalls occur when data does not arrive soon enough after it has been requested from memory.

Figure 3.

Memory wall
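The severity of the memory wall follows directly from compounding the two growth rates quoted above. The sketch below is illustrative only; it assumes the 55%/year CPU and 7%/year memory figures from the text hold constant, with both curves normalized to 1.0 in a base year.

```python
# Compounding the processor-memory performance gap.
# Assumed rates from the text: CPU +55%/year, memory +7%/year.
cpu_rate, mem_rate = 1.55, 1.07

for year in (5, 10, 20):
    gap = (cpu_rate / mem_rate) ** year   # relative gap after `year` years
    print(f"after {year:2d} years the gap is {gap:,.0f}x")
```

At these rates the gap multiplies by roughly 1.45x every year, so it reaches about 40x within a decade, which is why each memory access becomes proportionally ever more expensive in lost processor cycles.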

Key Terms in this Chapter

Resource Sharing Control: A mechanism which allows the distribution of various resources in the pipeline among multiple threads.

Cache Coherence: The integrity of the data stored in local caches of a shared resource.

Thread-Level Parallelism: A measure of how many operations across multiple threads can be performed simultaneously.

Instruction-Level Parallelism: A measure of how many operations in a computer program can be performed simultaneously.

Fetch Policy: A mechanism which allows the determination of which thread(s) to fetch instructions from, when executing multiple threads.

Secure Communication: Means by which information is shared such that third parties cannot learn its content.

Synchronization: The coordination of events in time so that a system operates in unison.

Fault Tolerance: The property that enables a system (often computer-based) to continue operating properly in the event of the failure of (or one or more faults within) some of its components.

Simultaneous Multithreading: A technique to improve overall efficiency by executing instructions from multiple threads simultaneously, so as to better utilize the resources provided by modern processor architectures.

Microarchitecture: A description of the electrical circuits of a processor that is sufficient to completely describe the operation of the hardware.
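To make the “Fetch Policy” term above concrete, the sketch below illustrates an ICOUNT-style policy from the SMT literature: each cycle, instructions are fetched from the thread with the fewest instructions already in the front end of the pipeline, preventing any one thread from monopolizing shared resources. This is an illustrative example, not taken from the chapter; the thread names and counts are hypothetical.

```python
# Illustrative ICOUNT-style fetch policy for an SMT front end.
# Each cycle, prefer the thread with the fewest in-flight instructions,
# so that shared pipeline resources are spread across threads.
def pick_fetch_thread(inflight):
    """inflight: dict mapping thread id -> instructions currently in the pipeline."""
    return min(inflight, key=inflight.get)

# Hypothetical snapshot of three hardware thread contexts:
inflight = {"T0": 12, "T1": 3, "T2": 7}
print(pick_fetch_thread(inflight))  # T1 -- it has the fewest in-flight instructions
```

A round-robin policy would instead cycle through threads regardless of occupancy; ICOUNT-style selection adapts to pipeline pressure, which is one reason fetch policy is listed as a key aspect of high-performance SMT design.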
