COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

Similar documents
COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service. Eddie Dong, Tao Hong, Xiaowei Yang

The Transition to PCI Express* for Client SSDs

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Accelerating High-Speed Networking with Intel I/O Acceleration Technology

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Intel Desktop Board DG41BI

Intel X38 Express Chipset Memory Technology and Configuration Guide

Intel SSD 520 Series Specification Update

Intel Media SDK Library Distribution and Dispatching Process

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC

Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms

Virtual Machine Synchronization for High Availability Clusters

COSBench: A benchmark Tool for Cloud Object Storage Services. Jiangang.Duan@intel.com

Intel Desktop Board DG41WV

Hybrid Virtualization The Next Generation of XenLinux

Intel Desktop Board DG31PR

Intel Virtualization Technology (VT) in Converged Application Platforms

Intel Desktop Board D945GCPE Specification Update

Intel Desktop Board DP55WB

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Intel Desktop Board D945GCPE

Intel Desktop Board DG43RK

How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1

Intel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms

Virtual Switching Without a Hypervisor for a More Secure Cloud

Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1.

Cloud based Holdfast Electronic Sports Game Platform

Intel Core i5 processor 520E CPU Embedded Application Power Guideline Addendum January 2011

Intel Network Builders: Lanner and Intel Building the Best Network Security Platforms

Intel Platform and Big Data: Making big data work for you.

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide

Intel Service Assurance Administrator. Product Overview

Running Windows 8 on top of Android with KVM. 21 October Zhi Wang, Jun Nakajima, Jack Ren

1000Mbps Ethernet Performance Test Report

Intel Desktop Board DG41TY

Accelerating Business Intelligence with Large-Scale System Memory

Intel 965 Express Chipset Family Memory Technology and Configuration Guide

Intel Desktop Board DQ965GF

Achieving Real-Time Performance on a Virtualized Industrial Control Platform

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Creating Overlay Networks Using Intel Ethernet Converged Network Adapters

VMWARE WHITE PAPER 1

A Superior Hardware Platform for Server Virtualization

Intel Desktop Board DQ43AP

Evaluating Intel Virtualization Technology FlexMigration with Multi-generation Intel Multi-core and Intel Dual-core Xeon Processors.

Intel Desktop Board DP43BF

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Intel Desktop Board D945GCZ

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0*

Leveraging NIC Technology to Improve Network Performance in VMware vsphere

Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations

Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms

Intel Identity Protection Technology (IPT)

Quest vworkspace Virtual Desktop Extensions for Linux

Solution Recipe: Improve PC Security and Reliability with Intel Virtualization Technology

Intel Cyber Security Briefing: Trends, Solutions, and Opportunities. Matthew Rosenquist, Cyber Security Strategist, Intel Corp

Vendor Update Intel 49 th IDC HPC User Forum. Mike Lafferty HPC Marketing Intel Americas Corp.

Intel Desktop Board DG31GL

I3: Maximizing Packet Capture Performance. Andrew Brown

Accelerating Business Intelligence with Large-Scale System Memory

Intel Solid-State Drive Pro 2500 Series Opal* Compatibility Guide

Remus: : High Availability via Asynchronous Virtual Machine Replication

Different NFV/SDN Solutions for Telecoms and Enterprise Cloud

Intel Desktop Board D945GCL

Enabling Technologies for Distributed Computing

Comparing Multi-Core Processors for Server Virtualization

Leading Virtualization 2.0

VIDEO SURVEILLANCE WITH SURVEILLUS VMS AND EMC ISILON STORAGE ARRAYS

Intel Identity Protection Technology Enabling improved user-friendly strong authentication in VASCO's latest generation solutions

Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux

Windows Server 2008 R2 Hyper-V Live Migration

New!! - Higher performance for Windows and UNIX environments

VNF & Performance: A practical approach

RAID and Storage Options Available on Intel Server Boards and Systems

Intel Trusted Platforms Overview

Bandwidth Calculations for SA-1100 Processor LCD Displays

Intel Desktop Board DG965RY

Intel Desktop Board D945GNT

Hetero Streams Library 1.0

Intel Desktop Board DG33TL

Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage

Technical Brief. DualNet with Teaming Advanced Networking. October 2006 TB _v02

PCI Express* Ethernet Networking

Accomplish Optimal I/O Performance on SAS 9.3 with

Big Data Technologies for Ultra-High-Speed Data Transfer and Processing

Intel Core TM i3 Processor Series Embedded Application Power Guideline Addendum

Intel Desktop Board DQ35JO

Intel Extreme Memory Profile (Intel XMP) DDR3 Technology

Linux NIC and iscsi Performance over 40GbE

Introduction to PCI Express Positioning Information

Streaming and Virtual Hosted Desktop Study

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

Enabling Technologies for Distributed and Cloud Computing

Muse Server Sizing. 18 June Document Version Muse

SUSE Linux Enterprise 10 SP2: Virtualization Technology Support

Intel Ethernet Switch Converged Enhanced Ethernet (CEE) and Datacenter Bridging (DCB) Using Intel Ethernet Switch Family Switches

Transcription:

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Yunhong Jiang 1

Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright 2012 Intel Corporation.

Agenda Background COarse-grain LOck-stepping Performance Optimization Evaluation Summary 3

Non-Stop Service with VM Replication Typical Non-stop Service Requires Expensive hardware for redundancy Extensive software customization VM Replication: Cheap Application-agnostic Solution 4

Existing VM Replication Approaches Replication Per Instruction: Lock-stepping Execute in parallel for deterministic instructions Lock and step for un-deterministic instructions Replication Per Epoch: Continuous Checkpoint Secondary VM is synchronized with Primary VM per epoch Output is buffered within an epoch 5

Lock-stepping Problems Excessive replication overhead memory access in an MP-guest is un-deterministic Continuous Checkpoint Extra network latency Excessive VM checkpoint overhead 6

Agenda Background COarse-grain LOck-stepping Performance Optimization Evaluation Summary 7

Why COarse-grain LOck-stepping (COLO) VM Replication is an overly strong condition Why we care about the VM state? The client care about response only Can the control failover without precise VM state replication? Coarse-grain lock-stepping VMs Secondary VM is a replica, as if it can generate same response with primary so far Be able to failover without service stop Non-stop service focus on server response, not internal machine state! 8

How COLO Works Response Model for C/S System & are the request and the execution result of an un-deterministic instruction Each response packet from the equation is a semantics response Successfully failover at k th packet if (C is the packet series the client received) 9

Architecture of COLO COarse-grain LOck-stepping Virtual Machine for Non-stop Service 10

Why Better Comparing with Continuous VM checkpoint No buffering-introduced latency Less checkpoint frequency On demand vs. periodic Comparing with lock-stepping Eliminate excessive overhead of un-deterministic instruction execution due to MP-guest memory access 11

Agenda Background COarse-grain LOck-stepping Performance Optimization Evaluation Summary 12

Performance Challenges Frequency of Checkpoint Highly dependent on the Output Similarity, or Response Similarity Key Focus is TCP packet! Cost of Checkpoint Xen/Remus uses passive-checkpoint Secondary VM is not resumed until failover Slow path COLO implements active-checkpoint Secondary VM resumes frequently 13

Improving Response Similarity Minor Modification to Guest TCP/IP Stack Coarse Grain Time Stamp Highly-deterministic ACK mechanism Coarse Grain Notification Window Size Per-Connection Comparison 14

Number of Packets Time (ms) Packets # Time (ms) Similarities after Optimization Web Server FTP Server Packets # Duration Packets # Duration 16000 600 4000 400 500 12000 400 3000 300 8000 300 2000 200 200 4000 100 1000 100 0 0 1 2 4 8 16 32 0 Number of Threads PUT GET *Run Web Bench in Client For more complete information about performance and benchmark results, visit Performance Test Disclosure 0 15

Reducing the Cost of Active-checkpoint Lazy Device State Update Lazy network interface up/down Lazy event channel up/down Fast Path Communication 16

Spent Time (ms) Checkpoint Cost with Optimizations Leave Net UP 1800 1500 1200 900 Suspend Resume Netif-Up Netif-Down Mem Xmit Leave EventChannel up Replace XenStore Access with Eventchannel 600 300 0 Baseline Lazy Netif Lazy Event Channel Efficient Comm. Final cost: 74ms/checkpoint: (1/3 on page transmission, 2/3 on suspend/resume) For more complete information about performance and benchmark results, visit Performance Test Disclosure 17

Agenda Background COarse-grain LOck-stepping Performance Optimization Evaluation Summary 18

Hardware Configurations Intel Core i7 platform, a 2.8 GHz quad-core processor 2GB RAM Intel 82576 1Gbps NIC * 2 (internal & external) Software Xen 4.1 Domain 0: RHEL5U5 Guest: 32-bit BusyBox 1.20.0, Linux kernel 2.6.32 256MB RAM and uses a ramdisk for storage 19

Bandwidth (Mb/s) Bandwidth (Mb/s) Bandwidth of NetPerf Native Remus-20ms Remus-40ms COLO Native Remus-20ms Remus-40ms COLO 1000 1000 800 800 600 600 400 400 200 200 0 54 1500 64K 0 54 1500 64K Message Size Message Size TCP UDP For more complete information about performance and benchmark results, visit Performance Test Disclosure 20

Time (s) FTP Server 128 64 32 16 8 4 Remus-20ms Remus-40ms COLO Native 2 1 0.5 PUT GET For more complete information about performance and benchmark results, visit Performance Test Disclosure 21

Throughput (Mbps) Web Server - Concurrency Native Remus-20ms Remus-40ms COLO 1000 100 10 1 1 2 4 8 16 32 Threads Run Web Bench in Client For more complete information about performance and benchmark results, visit Performance Test Disclosure 22

Throughput (response/s) Web Server - Throughput Native Remus-20ms Remus-40ms COLO 1200 1000 800 600 400 200 0 100 500 1000 1500 Request / second Run httperf in Client For more complete information about performance and benchmark results, visit Performance Test Disclosure 23

Average Latency (ms) Latency in Netperf/Ping 40 30 20 Native Remus-20ms Remus-40ms COLO 10 0 0.28 0.40 0.28 0.4 0.38 0.55 Netperf-TCP-RR Netperf-UDP-RR Ping For more complete information about performance and benchmark results, visit Performance Test Disclosure 24

Response Latency (ms) Web Server - Latency 1200 Native Remus-20ms Remus-40ms COLO 1000 800 600 400 200 0 100 500 1000 1500 Request/second For more complete information about performance and benchmark results, visit Performance Test Disclosure Run httperf in Client 25

Agenda Background COarse-grain LOck-stepping Performance Optimization Evaluation Summary 26

Summary COLO is an ideal Application-agnostic Solution for Non-stop service Web server: 67% of native performance CPU, memory and netperf: near-native performance Next steps: Merge into Xen More optimizations 27

28