Improving the Performance of Hypervisor-Based Fault Tolerance



Improving the Performance of Hypervisor-Based Fault Tolerance Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen Xiao, Xiaoming Li School of Electronics Engineering and Computer Science Peking University, China Tuesday - 20 April 2010

Background System failures are inevitable, e.g. power loss or OS halting. Masking failures transparently is expensive: it requires redundant software or redundant hardware components, e.g. the HP NonStop server. Building fault tolerance into applications takes laborious effort: applications must be built on fault-tolerant protocols, e.g. Google Chubby based on Paxos.

Hypervisor-Based Fault Tolerance [Figure: the primary host and the backup host each run a hypervisor with Domain0 (Apps) and a DomainU guest OS; a checkpoint is sent from the primary VM to the backup VM.] The primary VM and the backup VM reside in different physical hosts. The execution of the primary VM is divided into epochs (e.g. 20 msec). The hypervisor records the modified pages in each epoch. (Expensive) Modified pages are mapped into Domain0's address space. (Expensive) The checkpoint is sent to the backup VM.
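The per-epoch checkpoint cycle described above can be summarized in a short control-loop sketch. This is an illustrative outline only: every helper (enable_dirty_tracking, collect_dirty_pages, send_checkpoint, etc.) is a hypothetical placeholder, not a real Xen API.

/* Illustrative sketch of one epoch of hypervisor-based fault tolerance.
 * All helpers below are hypothetical placeholders, not real Xen calls. */
struct page_list;                                      /* opaque set of dirty pages */

extern void enable_dirty_tracking(void);
extern void run_primary_vm_for(int msec);
extern void pause_primary_vm(void);
extern struct page_list *collect_dirty_pages(void);    /* expensive step 1 */
extern void map_into_domain0(struct page_list *pages); /* expensive step 2 */
extern void send_checkpoint(struct page_list *pages);
extern void wait_for_backup_ack(void);
extern void resume_primary_vm(void);

#define EPOCH_MS 20

static void fault_tolerance_loop(void)
{
    for (;;) {
        enable_dirty_tracking();       /* record pages modified in this epoch */
        run_primary_vm_for(EPOCH_MS);  /* let the primary VM execute          */

        pause_primary_vm();            /* stop at the epoch boundary          */
        struct page_list *dirty = collect_dirty_pages();
        map_into_domain0(dirty);       /* map dirty pages into Domain0        */

        send_checkpoint(dirty);        /* ship the checkpoint to the backup   */
        wait_for_backup_ack();
        resume_primary_vm();
    }
}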

Objective Efficient memory tracking mechanism: read fault reduction and write fault prediction. Efficient memory mapping mechanism: software superpage.

Outline 1 Introduction 2 Memory Tracking (Current Approach, Read Fault Reduction, Write Fault Prediction) 3 Memory Mapping 4 Evaluation

Shadow Page Table (SPT) [Figure: address translation. The guest page table (GPT) maps virtual addresses to pseudo-physical addresses, and the hypervisor's P2M table maps pseudo-physical addresses to machine addresses; the hypervisor's SPT maps virtual addresses directly to machine addresses, as a traditional OS page table does.] Virtualization uses virtual, pseudo-physical and machine address spaces; a traditional OS uses only virtual and machine address spaces. The SPT is created on demand according to the guest page table (GPT) and the pseudo-physical-to-machine table (P2M).
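As a concrete illustration, a shadow entry is the composition of the two guest-visible translations. The sketch below shows this; the lookup helpers are hypothetical simplifications, not the actual Xen shadow code.

/* Hypothetical sketch: a shadow entry composes the GPT and P2M translations. */
typedef unsigned long vaddr_t;   /* guest virtual address        */
typedef unsigned long gfn_t;     /* pseudo-physical frame number */
typedef unsigned long mfn_t;     /* machine frame number         */

extern gfn_t gpt_lookup(vaddr_t va);   /* walk the guest page table (GPT) */
extern mfn_t p2m_lookup(gfn_t gfn);    /* hypervisor P2M table lookup     */

/* Fill a shadow entry on demand: virtual -> machine in a single step. */
static mfn_t shadow_translate(vaddr_t va)
{
    gfn_t gfn = gpt_lookup(va);   /* virtual -> pseudo-physical */
    return p2m_lookup(gfn);       /* pseudo-physical -> machine */
}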

Current Memory Tracking Approach Developed for live migration, it is not suitable for the frequent checkpointing of hypervisor-based fault tolerance. At the beginning of each epoch, all SPTs are destroyed. During the epoch, any memory access induces a page fault, from which write accesses can be identified.
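For contrast, the existing log-dirty style tracking could be sketched roughly as below; destroy_all_spts and the fault-handler outline are illustrative assumptions, not the actual Xen shadow code.

/* Rough sketch of the existing log-dirty style tracking (illustrative only). */
extern void destroy_all_spts(void);            /* drop every shadow page table */
extern void rebuild_spt_entry(unsigned long va);
extern void mark_page_dirty(unsigned long va);
extern int  is_write_access(unsigned long error_code);

static void epoch_start(void)
{
    destroy_all_spts();    /* every access in this epoch will now fault */
}

/* Every memory access faults at least once; only writes matter for the checkpoint. */
static void shadow_page_fault(unsigned long va, unsigned long error_code)
{
    if (is_write_access(error_code))
        mark_page_dirty(va);       /* record the page for the checkpoint */
    rebuild_spt_entry(va);         /* refill the shadow entry on demand  */
}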

Observation: Shadow Entry Reuse [Figure: shadow entry accesses (%) versus unique shadow entries (%) for CINT, CFP, SPECjbb and SPECweb.] Shadow Entry Reuse: if a shadow entry is accessed in an epoch, it will likely be accessed in future epochs. Reuse Degree: the percentage of unique shadow entries required to account for a given percentage of page accesses.

Our Approach: Scanning Shadow Entries [Figure: a marker bitmap over the shadow page table indicates which parts must be scanned.] Our approach: preserve the SPTs and revoke their write permission. During the epoch, the marker records which parts of the SPTs contain dirty pages. At the end of the epoch, shadow entries are selectively checked based on the marker. During checking, we record the dirty pages and revoke write permission again for the next epoch.
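A minimal sketch of the end-of-epoch scan follows, assuming a marker bitmap with one bit per shadow page table region; the entry layout, bit positions and record_dirty helper are illustrative assumptions, not the actual Xen shadow code.

/* Sketch of selectively scanning a marked shadow page table at epoch end. */
#include <stdint.h>
#include <stdbool.h>

#define ENTRIES_PER_SPT 512
#define MARKER_BITS     4                           /* SPT split into regions */
#define REGION_SIZE     (ENTRIES_PER_SPT / MARKER_BITS)
#define PTE_RW          (1UL << 1)                  /* assumed writable bit   */

struct spt {
    uint64_t entry[ENTRIES_PER_SPT];
    bool     marker[MARKER_BITS];    /* set when a write fault hits a region */
};

extern void record_dirty(uint64_t entry);   /* add the page to the checkpoint */

static void scan_spt(struct spt *spt)
{
    for (int r = 0; r < MARKER_BITS; r++) {
        if (!spt->marker[r])                        /* skip clean regions     */
            continue;
        for (int j = r * REGION_SIZE; j < (r + 1) * REGION_SIZE; j++) {
            uint64_t *e = &spt->entry[j];
            if (*e & PTE_RW) {             /* rw was granted on a write fault */
                record_dirty(*e);
                *e &= ~PTE_RW;             /* revoke write for the next epoch */
            }
        }
        spt->marker[r] = false;                     /* reset for the next epoch */
    }
}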

Observation: Spatial Locality [Figure: distribution of stride lengths, bucketed as [1,1], [2,2], [3,4], [5,16], [17,256], [257,512], for CINT, CFP, SPECjbb and SPECweb.] Stride: consecutive shadow entries with rw permission in the same epoch. Ave_stride: average length of the strides in each SPT. Spatial Locality: the shadow entries with rw set are inclined to cluster together.

Observation: History Similarity [Figure: cumulative fraction (%) versus delta stride for CINT, CFP, SPECjbb and SPECweb.] Stride: consecutive shadow entries with rw set in the same epoch. Ave_stride: average length of the strides in each SPT. History Similarity: for a particular SPT, the spatial locality behavior is inclined to be similar across adjacent epochs.
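Following the definitions above, the per-SPT average stride could be computed as in this illustrative sketch (the entry layout and PTE_RW bit are assumptions carried over from the earlier sketch).

/* Illustrative computation of ave_stride for one SPT: a stride is a run of
 * consecutive entries with rw set in the same epoch. */
#include <stdint.h>

#define ENTRIES_PER_SPT 512
#define PTE_RW          (1UL << 1)   /* assumed writable bit position */

static unsigned ave_stride(const uint64_t entry[ENTRIES_PER_SPT])
{
    unsigned strides = 0, rw_entries = 0, run = 0;

    for (int i = 0; i < ENTRIES_PER_SPT; i++) {
        if (entry[i] & PTE_RW) {
            if (run == 0)
                strides++;           /* a new stride begins     */
            run++;
            rw_entries++;
        } else {
            run = 0;                 /* the current stride ends */
        }
    }
    return strides ? rw_entries / strides : 0;   /* average stride length */
}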

Our Approach: Predicting Write Accesses When a write fault occurs, the adjacent shadow entries will likely be referenced by write accesses. (Spatial Locality) How many shadow entries are predicted is decided by the historical average stride (History Similarity): his_stride = his_stride × α + ave_stride × (1 − α). Heuristically, 1/3 × his_stride shadow entries backwards and his_stride shadow entries forwards are predicted. When a shadow entry is predicted, we set rw in advance with its Prediction bit tagged. At the end of each epoch, we rectify wrong predictions by checking the Prediction and Dirty bits.
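The heuristic could be sketched as below; the value of α, the struct layout and the set_rw_predicted helper are illustrative assumptions following the slide, not the paper's exact code.

/* Sketch of write-fault prediction around the faulting shadow entry 'index'. */
#define ALPHA_NUM 7                 /* assumed smoothing factor alpha = 7/8 */
#define ALPHA_DEN 8

struct spt_history {
    unsigned his_stride;            /* smoothed stride history for this SPT */
    unsigned ave_stride;            /* average stride observed last epoch   */
};

extern void set_rw_predicted(int index);   /* grant rw, tag the Prediction bit */

static void predict_writes(struct spt_history *h, int index, int num_entries)
{
    /* his_stride = his_stride * alpha + ave_stride * (1 - alpha) */
    h->his_stride = (h->his_stride * ALPHA_NUM +
                     h->ave_stride * (ALPHA_DEN - ALPHA_NUM)) / ALPHA_DEN;

    int back    = h->his_stride / 3;   /* 1/3 * his_stride entries backwards */
    int forward = h->his_stride;       /* his_stride entries forwards        */

    for (int i = index - back; i <= index + forward; i++) {
        if (i < 0 || i >= num_entries || i == index)
            continue;
        set_rw_predicted(i);           /* speculative rw, rectified at epoch end */
    }
}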

Outline 1 Introduction 2 Memory Tracking (Current Approach, Read Fault Reduction, Write Fault Prediction) 3 Memory Mapping 4 Evaluation

Observation: Memory Mapping is Expensive At the end of each epoch, modified pages are mapped into Domain0's address space. Memory mapping and unmapping are expensive, stalling the primary VM for too long. Observation: because of locality, the mappings can be reused, avoiding frequent mapping and unmapping.

Our Approach: Software Superpage [Figure: Domain0's virtual address space, with a small set of L2 page table entries persistently mapping L1 page tables that point to the primary VM's pages.] L1 page tables are allocated to point to all of the primary VM's memory pages, but only a limited number of L2 page table entries are used. L1 page tables are installed into these limited L2 entries on demand, and an LRU algorithm decides which L1 page tables are currently installed.
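A minimal sketch of the on-demand L2 slot management with LRU eviction follows; the slot count, structures and install_l1 helper are hypothetical stand-ins for the real Domain0 mapping code.

/* Sketch of LRU management of a small window of L2 entries in Domain0. */
#define NUM_L2_SLOTS 32               /* assumed size of the persistent window */

struct l2_slot {
    int           l1_index;           /* installed L1 page table (-1 = free)   */
    unsigned long last_use;           /* logical timestamp for LRU             */
};

static struct l2_slot slots[NUM_L2_SLOTS];
static unsigned long clock_tick;

extern void install_l1(int slot, int l1_index);  /* point the L2 entry at an L1 table */

static void init_slots(void)
{
    for (int i = 0; i < NUM_L2_SLOTS; i++)
        slots[i].l1_index = -1;       /* mark every slot as free */
}

/* Return the slot through which pages covered by 'l1_index' are mapped. */
static int map_via_slot(int l1_index)
{
    int victim = 0;
    for (int i = 0; i < NUM_L2_SLOTS; i++) {
        if (slots[i].l1_index == l1_index) {      /* hit: the mapping is reused */
            slots[i].last_use = ++clock_tick;
            return i;
        }
        if (slots[i].last_use < slots[victim].last_use)
            victim = i;                           /* track least recently used  */
    }
    install_l1(victim, l1_index);                 /* miss: evict the LRU slot   */
    slots[victim].l1_index = l1_index;
    slots[victim].last_use = ++clock_tick;
    return victim;
}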

Outline 1 Introduction 2 Memory Tracking (Current Approach, Read Fault Reduction, Write Fault Prediction) 3 Memory Mapping 4 Evaluation

Evaluation Setup Two HP ProLiant DL180 servers with 12 GB of memory and two quad-core CPUs, connected via switched Gigabit Ethernet. The primary VM: 2 GB of memory and one vCPU. Workloads: SPECint, SPECfp, SPECjbb and SPECweb. Epoch length: 20 msec by default. We improve two sources of overhead: the memory tracking mechanism (read fault reduction and write fault prediction) and the memory mapping mechanism (software superpage).

Memory Tracking: Performance Improvement [Figure: normalized performance (%) of OriginXen, RFRXen and WFPXen on CINT, CFP, SPECjbb and SPECweb.]

Memory Tracking: Page Faults Reduction [Figure: count of shadow page faults (x100), split into read faults and write faults, for CINT, CFP, SPECjbb and SPECweb; bars from left to right show the current approach, our read fault reduction, and write fault prediction.]

Software Superpage: Mapping Hit Ratio. We allocate a fixed 64 MB virtual address space in Dom0 to map all the memory pages of the primary VM (2 GB). Table: software superpage mapping hit ratio.
Workload:  CINT    CFP     SPECjbb  SPECweb
Hit Ratio: 97.27%  97.25%  97.80%   79.45%

Overall Improvement [Figure: normalized performance (%) on CINT, CFP, SPECjbb and SPECweb for three configurations: Not Optimized, LogDirty Optimized, and LogDirty Map Optimized.]

Conclusion Memory tracking and memory mapping are two sources of overhead in hypervisor-based fault tolerance. Read fault reduction eliminates unnecessary page faults caused by read accesses. Write fault prediction reduces page faults by predicting the pages that will likely be modified. Software superpage improves memory mapping within a limited virtual address space.

Future Work We will evaluate our approach with different epoch lengths and different numbers of vCPUs, port it to nested page tables (another memory virtualization mechanism), and improve our source code and contribute it to the Xen open source project.

Q & A Thank You!