Adaptive Virtual Machine Management in the Cloud: A Performance-Counter-Driven Approach

Gildo Torres, Chen Liu
DOI: 10.4018/ijssoe.2014040103

Abstract

The success of cloud computing technologies heavily depends on both the underlying hardware and the system software support for virtualization. In this study, we propose to elevate the capability of the hypervisor to monitor and manage co-running virtual machines (VMs) by capturing their dynamic behavior at runtime, and to adaptively schedule and migrate VMs across cores to minimize contention on system resources and hence maximize system throughput. Implemented at the hypervisor level, our proposed scheme requires no changes or adjustments to the VMs themselves or to the applications running inside them, and only minimal changes to the host OS. It also requires no changes to existing hardware structures. These facts reduce the complexity of our approach and improve its portability at the same time. The main intuition behind our approach is that because the host OS schedules entire virtual machines, it loses sight of the processes and threads that are running within the VMs; it only sees the averaged resource demands from the past time slice. In our design, we sought to recreate some of this low-level information by using performance counters and simple virtual machine introspection techniques. We implemented an initial prototype on the Kernel Virtual Machine (KVM), and our experimental results show that the presented approach has great potential to improve overall system throughput in the cloud environment.

Introduction

Nowadays, cloud computing has become pervasive. Common services provided by the cloud include infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS), among others. Cloud computing has begun to transform the way enterprises deploy and manage their infrastructures. It provides the foundation for a truly agile enterprise, so that IT can deliver an infrastructure that is flexible, scalable, and most importantly, economical through efficient resource utilization (Red Hat, 2009).

One of the key enabling technologies for cloud computing is virtualization, which offers users the illusion that their remote machine is running the operating system (OS) of their choice on its own dedicated hardware. Underneath, however, the reality is quite different: OS images (virtual machines) from different users may be running simultaneously on the same physical server.

The success of cloud computing technologies heavily depends on both the underlying hardware support for virtualization, such as Intel VT and AMD-V, and the increase in the number of cores contained in modern multi-core multi-threading microprocessors (MMMP). Because a single Virtual Machine (VM) does not normally use all the hardware resources available on an MMMP, multiple VMs can run simultaneously on the same processor so that resource utilization on the cloud side can be improved. In aggregate, this means increasing the system throughput in terms of the total number of VMs supported by the cloud, and even reducing the energy cost of the cloud infrastructure by consolidating the VMs and turning off the resources (processors) not in use.

Running VMs together, however, introduces contention for shared resources in an MMMP. VMs may compete for computation resources if they share the same core; even if they run on separate cores, they may compete for the Last Level Cache (LLC) and memory bandwidth if they share the same die. If not managed carefully, this contention can significantly degrade the performance of the VMs, which defeats the original motivation for running them together and may violate the service level agreement (SLA) between the customer and the cloud service provider.
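
Whether two VMs contend for the LLC depends on which cores share it, and Linux exposes this cache topology through sysfs. The short C sketch below is illustrative only (it assumes a typical x86 layout where cache index3 is the L3) and prints, for each logical CPU, the set of CPUs with which it shares its last-level cache:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* One sysfs entry per logical CPU currently online. */
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    char path[128], line[256];

    for (long cpu = 0; cpu < ncpu; cpu++) {
        /* index3 is the L3 (LLC) on typical x86 parts; this is an
         * assumption and may differ on other machines. */
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%ld/cache/index3/shared_cpu_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;   /* no L3 reported for this CPU */
        if (fgets(line, sizeof(line), f) != NULL)
            printf("cpu%ld shares its LLC with CPUs: %s", cpu, line);
        fclose(f);
    }
    return 0;
}

Cores that appear in each other's lists can interfere through the LLC and memory bandwidth even when they never share a core.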

Traditionally, load balancing on MMMPs has been under the purview of the OS scheduler. This is still the case in cloud environments that use hosted virtualization such as the Kernel Virtual Machine (KVM). In the case of bare-metal virtualization, the scheduler is implemented as part of the Virtual Machine Monitor (VMM, a.k.a. hypervisor). Regardless of where it resides, the scheduler tries to evenly balance the workload among the existing cores. Normally, these workloads are processes and threads, but in a cloud environment they also include entire virtual machines. (Note that this is the host scheduler; each virtual machine runs its own guest OS, with its own guest scheduler that manages the guest's processes and threads.) The hypervisor is unaware of potential contention on processor resources among the concurrent VMs it is managing. On top of that, the VMs (and the processes/threads within them) exhibit different behaviors at different times during their lifetimes: sometimes computation-intensive, sometimes memory-intensive, sometimes I/O-intensive, and at other times a mix of these.

The fundamental challenge is the semantic gap, i.e., the hypervisor does not know when and which guest processes and threads are running. To face this challenge, we propose to elevate the capability of the hypervisor to monitor and manage the co-running VMs by capturing their dynamic behavior at runtime using hardware performance counters. Once a performance profile and model have been obtained and computational phases determined, we adaptively schedule and migrate VMs across cores according to the predicted phases (as opposed to process and thread boundaries) to minimize the contention on system resources and hence maximize the system throughput.
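
Our prototype implements this monitoring at the hypervisor level; the sketch below is only a rough user-space illustration of the same idea, not the actual implementation. It attaches two Linux perf_event counters (instructions retired and LLC misses) to a KVM vCPU thread, computes misses per kilo-instruction (MPKI) over one-second intervals as a crude phase classifier, and pins the thread to one of two cores accordingly. The thread id, core numbers, and the 5.0 MPKI threshold are placeholders, and running it requires sufficient privileges to monitor and re-pin another process.

#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sched.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Open one hardware counter attached to thread 'tid', following it
 * across cores (cpu = -1).  There is no glibc wrapper for
 * perf_event_open, so we issue the raw syscall. */
static int open_counter(pid_t tid, uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.disabled = 1;
    return (int)syscall(SYS_perf_event_open, &attr, tid, -1, -1, 0);
}

/* Restrict thread 'tid' to a single core. */
static void pin_to_core(pid_t tid, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    sched_setaffinity(tid, sizeof(set), &set);
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <vcpu-thread-id>\n", argv[0]);
        return 1;
    }
    pid_t vcpu_tid = (pid_t)atoi(argv[1]);
    const int cpu_core = 0, mem_core = 2;   /* hypothetical assignment */

    int fd_ins = open_counter(vcpu_tid, PERF_COUNT_HW_INSTRUCTIONS);
    int fd_llc = open_counter(vcpu_tid, PERF_COUNT_HW_CACHE_MISSES);
    if (fd_ins < 0 || fd_llc < 0) {
        perror("perf_event_open");
        return 1;
    }
    ioctl(fd_ins, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(fd_llc, PERF_EVENT_IOC_ENABLE, 0);

    for (;;) {
        ioctl(fd_ins, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_llc, PERF_EVENT_IOC_RESET, 0);
        sleep(1);                           /* one sampling interval */

        uint64_t ins = 0, llc = 0;
        read(fd_ins, &ins, sizeof(ins));
        read(fd_llc, &llc, sizeof(llc));

        /* Misses per kilo-instruction as a crude phase classifier;
         * the 5.0 threshold is an assumption, not a tuned value. */
        double mpki = ins ? 1000.0 * (double)llc / (double)ins : 0.0;
        int core = (mpki > 5.0) ? mem_core : cpu_core;
        pin_to_core(vcpu_tid, core);
        printf("MPKI %.2f -> core %d\n", mpki, core);
    }
}

A hypervisor-level implementation would, as described above, act on predicted phase boundaries rather than a fixed timer, and would weigh the cost of a migration against the expected reduction in contention.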

The rest of this paper is organized as follows: a review of related published work is presented in the Related Work section. The Proposed Scheme section describes the default Virtual Machine Monitor architecture, introduces the hardware performance counters and some events of interest, and presents our proposed architecture. Next, the Experiment Setup section introduces the hardware and software setups, benchmarks, and workloads created to test the proposed scheme. The experiments conducted and their respective results are presented in the Experimental Results section. Finally, conclusions and future work are drawn in the Conclusion section.
