Exploring Vectorization and Prefetching Techniques on Scientific Kernels and Inferring the Cache Performance Metrics

J. Saira Banu, M. Rajasekhara Babu

Source Title: International Journal of Grid and High Performance Computing (IJGHPC) 7(2)

DOI: 10.4018/IJGHPC.2015040102

Abstract

Performance improvement in modern processors has stagnated because of the power wall and memory wall problems. In general, the power wall problem is addressed by various vectorization design techniques, and the memory wall problem is mitigated through prefetching. In this paper, vectorization is achieved through the Single Instruction Multiple Data (SIMD) registers of current processors. It provides architectural optimization by reducing the number of instructions in the pipeline and by minimizing the utilization of the multi-level memory hierarchy. These registers provide an economical computing platform compared to a Graphics Processing Unit (GPU) for compute-intensive applications. This paper explores software prefetching via Streaming SIMD Extensions (SSE) instructions to mitigate the memory wall problem. This work quantifies the effect of vectorization and prefetching on the Matrix Vector Multiplication (MVM) kernel with dense and sparse structures. Both prefetching and vectorization reduce data and instruction cache pressure, thereby improving cache performance. The Intel VTune Amplifier is used to show the cache performance improvements in the kernel. Finally, experimental results demonstrate promising performance of the matrix kernel on Intel's Haswell processor. However, effective utilization of SIMD registers remains a programming challenge for developers.

Introduction

Uni-processor performance improvement has flattened because of three wall problems: the Instruction Level Parallelism (ILP) wall, the memory wall and the power wall. The power problem is addressed via improved resource utilization. Currently, general-purpose commercial microprocessors are provided with SIMD vector extensions to minimize power. The SIMD approach uses a small amount of extra hardware in the execution units of a core, thereby reducing overall power consumption (Welch et al., 2012). It is a cost-effective method compared to GPU computing (Livesey et al., 2012). Cebrian et al. (2012) quantify the effect of parallelization, vectorization, specialization and heterogeneity on the energy efficiency of new-generation processors. They state that software developers should prefer vectorization to parallelization, since it gives better energy efficiency. Liu et al. (2013) describe how SIMD computing addresses cooling challenges and provides high-performance computing at a minimum clock speed. To program SIMD units effectively, SSE instructions are now available from high-level programming languages. Mitra et al. (2013) performed a comparative study of the NEON SIMD instruction set of ARM processors and the SSE2 instruction set provided for Intel platforms. They also compared the performance of auto-vectorization and hand-tuned vectorization for five different benchmarks on ten different hardware processors, and showed that hand-tuned vectorization outperforms auto-vectorization on both ARM and Intel processors. In this paper, we use SIMD vectorization with hand-tuned SSE instructions to address the power problem, as has been done in the literature.
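To make "hand-tuned SSE vectorization" concrete, the following is a minimal sketch of a dense matrix-vector product written with SSE intrinsics. It is an illustration only, not the authors' kernel: the function name dmvm_sse, the row-major layout, single-precision data and unaligned loads are all assumptions. A scalar fallback is compiled when SSE is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ... */
#endif

/* Hypothetical helper: y = A * x for a row-major rows x cols float matrix. */
void dmvm_sse(const float *A, const float *x, float *y,
              size_t rows, size_t cols)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < rows; ++i) {
        const float *row = A + i * cols;
        float sum = 0.0f;
        size_t j = 0;
#if defined(__SSE__)
        /* Process four floats per iteration in one 128-bit register. */
        __m128 acc = _mm_setzero_ps();
        for (; j + 4 <= cols; j += 4) {
            __m128 a = _mm_loadu_ps(row + j); /* 4 matrix elements */
            __m128 v = _mm_loadu_ps(x + j);   /* 4 vector elements */
            acc = _mm_add_ps(acc, _mm_mul_ps(a, v));
        }
        /* Horizontal sum of the four partial sums. */
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
        /* Scalar loop for leftover columns (or all columns without SSE). */
        for (; j < cols; ++j)
            sum += row[j] * x[j];
        y[i] = sum;
    }
}
```

Each pass of the inner SIMD loop replaces four scalar multiply-add pairs with one vector multiply and one vector add, which is the instruction-count reduction the text attributes to vectorization. Because the vector path sums in a different order than a scalar loop, results can differ slightly in the last floating-point bits for general inputs.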

The memory wall problem is addressed by techniques such as speculative execution, out-of-order execution, multithreading and data prefetching. Prefetching can be performed in either software or hardware (Liu et al., 2014), (Karakasis et al., 2009), (Byna et al., 2008). Current processors support hardware prefetchers, which are preferable for applications such as dense matrix-vector multiplication (DMVM) that exhibit regular access patterns. As specified in (Intel Architectures optimization reference manual, 2013), the hardware prefetchers of Haswell processors are used to prefetch data into the L2 cache. Several studies have examined prefetch strategies for scientific and commercial applications. Daniel F. Zucker et al. (2000) examined hardware and software cache prefetching techniques for MPEG benchmarks. Software prefetching is used to hide the memory latency of applications such as sparse matrix-vector multiplication (SpMV) and graph algorithms, which show irregular memory access patterns (Ammenouche and Guojing Cong, 2011). Prefetching by hardware and software means is preferred in the literature because it improves cache performance by reducing the cache miss rate, thereby addressing the memory wall problem.
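As an illustration of software prefetching on an irregular access pattern, the sketch below adds an SSE prefetch hint to a sparse matrix-vector product. It is not taken from the paper: the CSR storage format, the function name spmv_csr_prefetch and the prefetch distance parameter dist are assumptions made for the example, and a useful value of dist would have to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_T0 */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p)) /* no-op when SSE is unavailable */
#endif

/* Hypothetical helper: y = A * x with A in CSR form.
 * row_ptr has rows + 1 entries; col_idx and val have row_ptr[rows] entries.
 * dist is the (assumed, tunable) number of nonzeros to prefetch ahead. */
void spmv_csr_prefetch(int rows, const int *row_ptr, const int *col_idx,
                       const float *val, const float *x, float *y, int dist)
{
    assert(rows >= 0 && dist >= 0);
    const int nnz = row_ptr[rows];
    for (int i = 0; i < rows; ++i) {
        float sum = 0.0f;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            /* x is indexed through col_idx, so its accesses are irregular.
             * Request the element needed dist nonzeros from now; the bound
             * check keeps the col_idx read inside the array. */
            if (k + dist < nnz)
                PREFETCH_T0(&x[col_idx[k + dist]]);
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

The prefetch is only a hint and does not change the computed result. The arrays val and col_idx are read sequentially, so this sketch prefetches only the gathered elements of x and leaves the sequential streams to the hardware prefetcher.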

Performance tools such as GNU prof, ATOM, the PIN tool and VTune Amplifier are used in the literature to gather cache-related metrics. Khamparia and SairaBanu (2013) made an extensive study of performance monitoring tools and used the PIN tool to measure the cache metrics of the DMVM kernel. In another work, Sairabanu et al. (2013) used the PIN tool to gather the cache miss rate of the SpMV kernel. This binary instrumentation tool is not capable of gathering kernel-level statistics. Routines written for one tool are incompatible with others, and gathering the results takes significant time (Thiel 2006). To collect data on a specific line of a function and to collect more cache-related metrics, the Intel VTune Amplifier is preferred in the literature. Prakash and Peng (2008) describe the Intel VTune Performance Analyzer as a fast and practical tool for characterizing emerging workloads. Kimball et al. (2014) enumerate the effects of matrix structure on SpMV performance. They analysed the cache memory performance metrics of the SpMV kernel with R-MAT matrices and Finite Difference (FD) matrices using VTune Amplifier. Their paper does not address SIMD vectorization or prefetching.
