Design Space Exploration Using Cycle Accurate Simulator

Arsalan Shahid (HITEC University, Pakistan), Bilal Khalid (HITEC University, Pakistan), Muhammad Yasir Qadri (University of Essex, UK), Nadia N. Qadri (COMSATS Institute of Information Technology, Pakistan) and Jameel Ahmed (HITEC University, Pakistan)
DOI: 10.4018/978-1-5225-0287-6.ch004


Multi-Processor System on Chip (MPSoC) architectures have become a mainstream technology for obtaining performance improvements in computing platforms. With the increase in the number of cores, the role of cache memory has become pivotal. An ideal memory would be both fast and large; in practice, the processor architect must strike a balance between the size and the access time of the memory hierarchy. Design space exploration is used for performance analysis of systems and helps to find the optimal solution for the desired objectives. In this chapter, we explore two design space parameters, i.e., cache size and number of cores, for obtaining the desired energy consumption. Moreover, previously presented energy models for multilevel cache are evaluated by using a cycle-accurate full-system simulator. Our results show that as cache sizes increase, the number of cycles required for application execution decreases, and as the number of cores increases, the throughput improves.
Chapter Preview


With the evolution of multicore processor architectures and the trend shifting towards parallelism, the role of cache memory has become pivotal (Wei, Shao, & Huang, 2016; Geer, 2005; Jacob, Ng, & Wang, 2010). Cache memory plays a vital role in modern processor architectures as it reduces the speed gap between main memory and the processor (González, Aliagas, & Valero, 2014). It acts as a buffer between the Central Processing Unit (CPU) and main memory, and so reduces the average time taken by each memory access. A multi-level cache hierarchy is beneficial because it bridges the processor-memory gap efficiently (Whitham, Audsley, & Davis, 2014). Moreover, this memory hierarchy can reduce up to 50% of the total energy spent by the microprocessor (Segars, 2001). Therefore, cache size has become a critical parameter for a processor architect to choose (Huang, Yeluri, Frailong, & Libby, 2014). This fact has urged researchers to explore cache hierarchy design in terms of energy optimization.
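The effect of a two-level hierarchy on average access time can be illustrated with the standard average memory access time (AMAT) formula. The sketch below is not taken from the chapter; the function name, parameter names, and all numeric values are illustrative assumptions.

```python
def amat(l1_hit_time, l1_miss_rate, l2_hit_time, l2_miss_rate, mem_latency):
    """Average memory access time, in cycles, for an L1 + L2 hierarchy.

    Miss rates are local: l2_miss_rate is the fraction of L2 accesses
    (that is, L1 misses) that also miss in L2.
    """
    # An L1 miss costs one L2 lookup, plus a main-memory access if L2 misses too.
    l1_miss_penalty = l2_hit_time + l2_miss_rate * mem_latency
    return l1_hit_time + l1_miss_rate * l1_miss_penalty


# Illustrative numbers only, not measurements from the chapter.
with_cache = amat(l1_hit_time=2, l1_miss_rate=0.05,
                  l2_hit_time=12, l2_miss_rate=0.2, mem_latency=200)
print(f"AMAT with L1+L2: {with_cache:.2f} cycles (main memory alone: 200 cycles)")
```

With these assumed values the average access costs about 4.6 cycles, against 200 cycles for every access going to main memory.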

Energy consumption has always been a key concern and a desired objective in multicore processor systems (Hennessy & Patterson, 2011). For the processor architect, choosing components that minimize energy consumption is a critical decision. Most existing techniques rely on a verification methodology based on Transaction Level Modeling (TLM) (Ferro, Pierre, Amor, Lachaize, & Lefftz, 2011) or on virtualized platforms (Magnusson, Christensson, Eskilson, et al., 2002), and analyze a proposed configuration several times. However, a single configuration can take several hours to evaluate completely. Moreover, the results these techniques produce when searching for the optimum design are not accurate, because the simulators are not cycle accurate. Therefore, an efficient technique for proposing search/design space parameters using a cycle-accurate simulator is required. The problem of exploring configurable parameters to minimize energy consumption is known as Design Space Exploration (DSE) (Silvano, Fornaciari, & Villar, 2014). DSE is used for system optimization and integration, and to explore several design parameters. In this chapter, we have explored two such parameters:

  1. The optimum sizes of cache at different levels of memory hierarchy.

  2. The number of cores for best performance with respect to energy consumption.
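An exploration over these two parameters can be sketched as an exhaustive sweep of a small grid. This is a minimal illustration, not the chapter's method: `explore` and `toy_energy` are hypothetical names, and `toy_energy` is an invented stand-in for what would in practice be one run of a cycle-accurate simulator per configuration.

```python
from itertools import product


def explore(l1_sizes_kb, l2_sizes_kb, core_counts, evaluate):
    """Evaluate every (L1 size, L2 size, cores) point; return (config, cost).

    Returns None if any of the parameter lists is empty.
    """
    best = None
    for cfg in product(l1_sizes_kb, l2_sizes_kb, core_counts):
        cost = evaluate(*cfg)
        if best is None or cost < best[1]:
            best = (cfg, cost)
    return best


def toy_energy(l1_kb, l2_kb, cores):
    # Hypothetical cost function, NOT a model from the chapter:
    # larger caches and more cores cut cycles but raise power.
    cycles = 1e6 / cores * (1 + 32 / l1_kb + 256 / l2_kb)
    power = cores * (1.0 + 0.01 * l1_kb) + 0.002 * l2_kb
    return cycles * power


config, energy = explore([16, 32, 64], [256, 512, 1024], [1, 2, 4], toy_energy)
print("lowest-energy configuration (L1 KB, L2 KB, cores):", config)
```

An exhaustive sweep is only practical for small grids. When each point costs hours of simulation, the number of evaluated configurations becomes the main constraint.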

We focused on the design space parameters for two-level cache memory hierarchies (L1 and L2 cache) and improved the cache energy models presented by Qadri and McDonald-Maier (2010). Moreover, we evaluated the mathematical cache energy models using different standard benchmarks. These models require significantly fewer parameters, which were estimated using the state-of-the-art cycle-accurate Micro Architectural and System Simulator (MARSS) for multicore processors (Patel, Afram, & Ghose, 2011). MARSS gives the exact number of cycles required to execute an instruction. The energy per access of the tag array for the L1 and L2 caches was obtained from HP Labs’ CACTI, an integrated cache timing, power, and area model tool (Tarjan & Jouppi, 2006).
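To show how simulator statistics and per-access energies combine, the sketch below multiplies an access count by an energy per access for each cache level and sums the results. It is a deliberately simplified dynamic-energy estimate, not the model of Qadri and McDonald-Maier (2010); the class, the function, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLevel:
    name: str
    accesses: int                # access count, e.g. from simulator statistics
    energy_per_access_nj: float  # nanojoules, e.g. from a CACTI report


def dynamic_energy_nj(levels):
    """Sum of accesses x energy-per-access over all cache levels."""
    return sum(lv.accesses * lv.energy_per_access_nj for lv in levels)


# Illustrative values only, not results from the chapter.
levels = [
    CacheLevel("L1", accesses=1_000_000, energy_per_access_nj=0.5),
    CacheLevel("L2", accesses=50_000, energy_per_access_nj=2.0),
]
total = dynamic_energy_nj(levels)
print(f"estimated dynamic cache energy: {total:.0f} nJ")
```

A complete model would also account for terms this sketch leaves out, such as leakage and the energy of misses that reach main memory.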
