Benchmarking Database Performance in a Virtual Environment




Benchmarking Database Performance in a Virtual Environment
Sharada Bose, HP (sharada_bose@hp.com)
Priti Mishra, Priya Sethuraman, Reza Taheri, VMware, Inc. ({pmishra, psethuraman, rtaheri}@vmware.com)

Agenda/Topics
- Introduction to virtualization
- Performance experiments with a benchmark derived from TPC-C
- Performance experiments with a benchmark derived from TPC-E
- The case for a new TPC benchmark for virtual environments

Variety of virtualization technologies
- IBM: z/VM on System z and PowerVM on Power Systems
- Sun: xVM and Zones
- HP: HP VM
On the x86 processors:
- Xen and XenServer
- Microsoft Hyper-V
- KVM
- VMware ESX: the oldest (2001) and largest market share; where I work, so the focus of this talk

Why virtualize?
- Server consolidation: the vast majority of servers are grossly underutilized; consolidation reduces both CapEx and OpEx
- Migration of VMs (both storage and CPU/memory): enables live load balancing and facilitates maintenance
- High availability: allows a small number of generic servers to back up all servers
- Fault tolerance: lock-step execution of two VMs
- Cloud computing! Utility computing was finally enabled by the ability to consolidate many VMs on a server and to live-migrate VMs in reaction to workload changes

How busy are typical servers?
Results of our experiment:
- 8.8K DBMS transactions/second
- 60K disk IOPS
Typical Oracle 4-core installation:
- 100 transactions/second
- 1,200 IOPS
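
A back-of-the-envelope comparison of the two data points above (simple arithmetic on the slide's numbers, nothing more):

```python
# Rough headroom comparison: our measured benchmark load vs. the
# "typical" lightly loaded 4-core Oracle installation quoted above.
measured_tps, measured_iops = 8_800, 60_000
typical_tps, typical_iops = 100, 1_200

print(f"transaction headroom: ~{measured_tps / typical_tps:.0f}x")   # ~88x
print(f"disk I/O headroom:    ~{measured_iops / typical_iops:.0f}x")  # ~50x
```

One server running at benchmark intensity can absorb the load of dozens of typical installations, which is the consolidation argument in a nutshell.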

Hypervisor Architectures
Xen and Hyper-V (Xen/Viridian partition model):
- Very small hypervisor
- General-purpose OS (Linux Dom0 or Windows parent VM) in the parent partition for I/O and management
- All I/O driver traffic goes through the parent OS
VMware ESX Server:
- Small hypervisor (< 24 MB): specialized virtualization kernel
- Direct driver model
- Management VMs: Remote CLI, CIM, VI API

Binary Translation of Guest Code
- Translate guest kernel code, replacing privileged instructions with safe equivalent instruction sequences
- No need for traps
- BT is an extremely powerful technology: it permits any unmodified x86 OS to run in a VM and can virtualize any instruction set

BT Mechanics
Guest input basic block -> translator -> translated basic block -> translation cache
Each translator invocation:
- Consumes one input basic block (guest code)
- Produces one output basic block
- Stores the output in the translation cache for future reuse, amortizing translation costs
Guest-transparent: no patching in place
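
A minimal sketch of that translate-and-cache loop (all names are ours and the translator/executor are stubs; this illustrates the caching idea, not VMware's implementation):

```python
# Illustrative BT dispatch loop with a translation cache.
translation_cache = {}   # guest PC -> translated basic block

def translate_block(pc):
    # Stub: consume one guest basic block, produce one translated block
    # (a real translator rewrites privileged instructions into safe
    # equivalent sequences).
    return f"translated-block@{pc:#x}"

def execute(block):
    # Stub: a real VMM jumps into the translated code and returns the
    # next guest PC; here we stop after one block.
    print("executing", block)
    return None

def run_guest(pc):
    while pc is not None:
        block = translation_cache.get(pc)
        if block is None:                  # miss: translate once...
            block = translate_block(pc)
            translation_cache[pc] = block  # ...and reuse it later,
        pc = execute(block)                # amortizing translation cost

run_guest(0x1000)
```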

Virtualization Hardware Assist
- More recent CPUs have features to reduce some of the overhead at the monitor level; examples are Intel VT and AMD-V
- Hardware assist doesn't remove all virtualization overheads: scheduling, memory management, and I/O are still virtualized with a software layer
- The binary translation monitor is faster than hardware assist for many workloads
- VMware ESX takes advantage of these features
(Diagram: a guest above the monitor's virtual NIC, virtual switch, virtual SCSI, and file system, which run on the VMkernel's scheduler, memory allocator, and NIC/I/O drivers over the physical hardware.)

Performance of a VT-x/AMD-V Based VMM
- The VMM only intervenes to handle exits
- Same performance equation as classical trap-and-emulate: overhead = exit frequency * average exit cost
- VMCB/VMCS can avoid simple exits (e.g., enable/disable interrupts), but many exits remain: page table updates, context switches, in/out, interrupts
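
Plugging made-up numbers into that equation shows how exit frequency dominates (illustrative values only, not measurements):

```python
# overhead = exit frequency * average exit cost (equation above).
exits_per_second = 50_000
cycles_per_exit = 2_000
cpu_hz = 2_500_000_000          # a 2.5 GHz core

overhead = exits_per_second * cycles_per_exit / cpu_hz
print(f"CPU overhead: {overhead:.1%}")   # 4.0% of one core
```

Halving either the exit rate (fewer exits) or the exit cost (lower VMexit latency) halves the overhead, which is why both are attacked in hardware and software.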

Qualitative Comparison of BT and VT-x/AMD-V
BT loses on: system calls; translator overheads; path lengthening; indirect control flow
BT wins on: page table updates (adaptation); memory-mapped I/O (adaptation); IN/OUT instructions; no traps for privileged instructions
VT-x/AMD-V loses on: exits (costlier than callouts); no adaptation (cannot eliminate exits); page table updates; memory-mapped I/O; IN/OUT instructions
VT-x/AMD-V wins on: system calls; almost all code runs directly

VMexit latencies are getting lower
- VMexit performance is critical to hardware-assist-based virtualization
- In addition to generational performance improvements, Intel is improving VMexit latencies

Virtual Memory (continued)
- Applications see a contiguous virtual address space, not physical memory
- The OS defines the VA -> PA mapping, usually at 4 KB granularity; the mappings are stored in page tables
- The hardware memory management unit (MMU) provides a page table walker and a TLB (translation look-aside buffer): %cr3 points to the page tables, and the TLB fill hardware walks them on a miss to cache VA -> PA translations
(Diagram: two processes, each with a 0-4 GB virtual address space, mapped onto physical memory.)

Virtualizing Virtual Memory: Shadow Page Tables
- The VMM builds shadow page tables to accelerate the mappings: the shadow directly maps VA -> MA, avoiding two levels of translation on every access
- The TLB caches the VA -> MA mapping, and the hardware walker is leveraged for TLB fills (walking the shadows)
- When the guest changes VA -> PA, the VMM updates the shadow page tables
(Diagram: two VMs, each running two processes, translate virtual memory (VA) through physical memory (PA) to machine memory (MA).)
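
A toy illustration of the composition a shadow page table caches (the page-frame numbers and names are made up):

```python
# Shadow page tables collapse two mappings into one: VA -> PA (guest
# page table) composed with PA -> MA (the VMM's map) becomes VA -> MA.
guest_pt = {0x10: 0x2A, 0x11: 0x2B}   # VA page -> guest-physical page
vmm_map  = {0x2A: 0x90, 0x2B: 0x91}   # guest-physical -> machine page

# The VMM composes the two when (re)building the shadow:
shadow_pt = {va: vmm_map[pa] for va, pa in guest_pt.items()}
assert shadow_pt[0x10] == 0x90        # one translation instead of two

# When the guest writes guest_pt, the VMM must refresh shadow_pt --
# the "trace faults" eliminated by nested page tables (next slides).
```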

2nd Generation Hardware Assist: Nested/Extended Page Tables
(Diagram: the TLB caches the VA -> MA mapping; on a miss, the TLB fill hardware walks both the guest's page tables (guest PT pointer, VA -> PA mapping) and the VMM's nested page tables (nested PT pointer, PA -> MA mapping).)

Analysis of NPT
- The MMU composes the VA -> PA and PA -> MA mappings on the fly at TLB fill time
Benefits:
- Significant reduction in exit frequency: no trace faults (primary page table modifications are as fast as native), and page faults and context switches require no exits
- No shadow page table memory overhead
- Better scalability to wider vSMP; aligns with multi-core: performance through parallelism
Costs:
- More expensive TLB misses: O(n^2) cost for the page table walk, where n is the depth of the page table tree
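
To make the O(n^2) walk cost concrete, a rough reference count (a common approximation; exact counts vary by CPU generation):

```python
# With nested paging, each of the n guest page-table levels must itself
# be translated through the m-level host tables, so a TLB miss costs
# roughly n*m + n + m memory references instead of n.
def native_refs(n):
    return n

def nested_refs(n, m):
    return n * m + n + m

print(native_refs(4))      # 4 references for a native 4-level walk
print(nested_refs(4, 4))   # 24 references for a 4-over-4 nested walk
```

Hence the reliance on large pages and bigger TLBs when running with EPT/NPT.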

CPU and Memory Paravirtualization
- Paravirtualization extends the guest to allow direct interaction with the underlying hypervisor
- Paravirtualization reduces the monitor cost, including memory and system call operations
- Gains from paravirtualization are workload specific
- Hardware virtualization mitigates the need for some of the paravirtualization calls
- VMware approach: VMI and paravirt-ops
(Diagram: guest TCP/IP stack and file system above the monitor, which sits on the VMkernel's scheduler, memory allocator, virtual NIC/switch, virtual SCSI/file system, and NIC/I/O drivers on the physical hardware.)

Device Paravirtualization
- Device paravirtualization places a high-performance, virtualization-aware device driver (vmxnet, pvscsi) into the guest
- Paravirtualized drivers are more CPU efficient (less CPU overhead for virtualization)
- Paravirtualized drivers can also take advantage of hardware features, like partial offload (checksum, large segment)
- VMware ESX uses paravirtualized network and storage drivers
(Diagram: guest vmxnet and pvscsi drivers pair with matching monitor devices above the VMkernel's virtual switch, file system, and NIC/I/O drivers on the physical hardware.)

Paravirtualization
- For performance: almost everyone uses a paravirt driver for mouse/keyboard/screen and networking; for high-throughput devices, it makes a big difference in performance
- As an enabler: without binary translation, it was the only choice on old processors (Xen with Linux guests)
- Not needed with newer processors (Xen with Windows guests)

Today's virtualization benchmarks
- VMmark: developed by VMware in 2007; the de facto industry standard, with 84 results from 11 vendors
- SPECvirt: still in development; will likely become the virtualization benchmark, but not a DBMS/back-end server benchmark
- vConsolidate: developed by IBM and Intel in 2007
- vApus Mark I, from the Sizing Server Lab
- vServCon: developed for internal use by Fujitsu Siemens Computers

VMmark
- Aimed at the server consolidation market
- A mix of workloads; a tile is a collection of VMs executing a set of diverse workloads

Workload        | Application       | Virtual machine platform
Mail server     | Exchange 2003     | Windows 2003, 2 CPU, 1 GB RAM, 24 GB disk
Java server     | SPECjbb2005-based | Windows 2003, 2 CPU, 1 GB RAM, 8 GB disk
Standby server  | None              | Windows 2003, 1 CPU, 256 MB RAM, 4 GB disk
Web server      | SPECweb2005-based | SLES 10, 2 CPU, 512 MB RAM, 8 GB disk
Database server | MySQL             | SLES 10, 2 CPU, 2 GB RAM, 10 GB disk
File server     | dbench            | SLES 10, 1 CPU, 256 MB RAM, 8 GB disk
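
For capacity math, the tile above can be written down as data (field names are ours; the values come from the table):

```python
# One VMmark tile: (role, application, vCPUs, RAM in MB, disk in GB).
tile = [
    ("mail",     "Exchange 2003", 2, 1024, 24),
    ("java",     "SPECjbb2005",   2, 1024,  8),
    ("standby",  None,            1,  256,  4),
    ("web",      "SPECweb2005",   2,  512,  8),
    ("database", "MySQL",         2, 2048, 10),
    ("file",     "dbench",        1,  256,  8),
]

vcpus = sum(vm[2] for vm in tile)
ram_gb = sum(vm[3] for vm in tile) / 1024
print(f"per tile: {vcpus} vCPUs, {ram_gb:g} GB RAM")  # 10 vCPUs, 5 GB
```

A host's score grows by adding tiles until throughput stops increasing, so these per-tile totals determine how many tiles a server can hold.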

VMmark client workload drivers
(Diagram: three client machines (Client 0, 1, and 2) drive the mail, file server, web, OLTP database, and Java order entry workloads of three tiles, 18 VMs in all, running on a single ESX host.)

VMmark is the de facto virtualization benchmark
(Chart: cumulative number of VMmark submissions per quarter, Q3 2007 through Q3 2009 (as of 8/4), rising from 0 to roughly 80; early submissions ran VI 3.5.x, later ones vSphere 4.)

So why do we need a new benchmark?
- Most virtual benchmarks today cover consolidation of diverse workloads
- None are aimed at transaction processing or decision support applications, the traditional areas addressed by TPC benchmarks
- The new frontier is virtualization of resource-intensive workloads, including those distributed across multiple physical servers
- None of the existing virtual benchmarks measure the database-centric properties that have made TPC benchmarks the industry standard they are today

But is virtualization ready for a TPC benchmark?
- The accepted industry lore has been that databases are not good candidates for virtualization
- In the following slides, we will show that benchmarks derived from TPC workloads run extremely well in virtual machines
- We will show that there is a natural extension of existing TPC benchmarks into new virtual versions of the benchmarks

Why use VMs for databases?
- Virtualization at the hypervisor level provides the best abstraction: each DBA gets their own hardened, isolated, managed sandbox
- Strong isolation: security, performance/resources, configuration, fault isolation
- Scalable performance: low-overhead virtual database performance; efficiently stack databases per host

First benchmarking experiment
Pick a workload that is:
- A database workload
- OLTP
- Heavy duty
- A workload that everybody knows and understands
So we decided on a benchmark that is a fair-use implementation of the TPC-C business model:
- Not compliant TPC-C results; results cannot be compared to official TPC-C publications

Configuration, hardware
(Diagram: an 8-way Intel server, driven by a 4-way Intel client over a 1 Gigabit network switch, connects through a 4 Gb/s Fibre Channel switch to two EMC CX3-80 arrays (240 drives) and an EMC CX3-40 (30 drives).)

Configuration, benchmark
- The workload is borrowed from the TPC-C benchmark; let us call it the Order Entry benchmark
- A batch benchmark; up to 625 DBMS client processes running on a separate client computer generated the load
- 7,500 warehouses and a 28 GB SGA
- We were limited by the memory available to us, hence a database size smaller than the size required for our throughput; with denser DIMMs, we would have used a larger SGA and a larger database
- Our database size/SGA size combination puts the same load on the system as ~17,000 warehouses on a 72 GB system
- A reasonable database size for the performance levels we are seeing

Disclaimers: ACHTUNG!!!
- All data is based on in-lab results with a development version of ESX
- Our benchmarks were fair-use implementations of the TPC-C and TPC-E business models; our results are not TPC-C or TPC-E compliant and are not comparable to official TPC-C or TPC-E results. TPC Benchmark is a trademark of the TPC.
- Our throughput is not meant to indicate the absolute performance of Oracle and MS SQL Server, or to compare their performance to other DBMSs; Oracle and MS SQL Server were simply used to analyze a virtual environment under a DBMS workload
- Our goal was to show the relative-to-native performance of VMs and their ability to handle a heavy database workload, not to measure the absolute performance of the hardware and software components used in the study

Results: peak
- VM throughput was 85% of native throughput, impressive in light of the heavy kernel-mode content of the benchmark
Results summary for the 8-vCPU VM:

Configuration                                 | Native                        | VM
Throughput (business transactions per minute) | 293K                          | 250K
Disk IOPS                                     | 71K                           | 60K
Disk megabytes/second                         | 305 MB/s                      | 258 MB/s
Network packets/second                        | 12K/s receive, 19K/s send     | 10K/s receive, 17K/s send
Network bandwidth/second                      | 25 Mb/s receive, 66 Mb/s send | 21 Mb/s receive, 56 Mb/s send
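
A quick check of the relative-to-native ratios implied by the table:

```python
# Ratios computed from the peak-results table above.
native = {"tpm": 293_000, "iops": 71_000, "disk_mbps": 305}
vm     = {"tpm": 250_000, "iops": 60_000, "disk_mbps": 258}

for key in native:
    print(f"{key}: VM at {vm[key] / native[key]:.0%} of native")
# tpm ~85%, iops ~85%, disk_mbps ~85%: I/O tracks throughput
```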

Results: ESX 4.0 vs. native scaling
- VM configured with 1, 2, 4, and 8 vCPUs; in each case, ESX was configured to use the same number of pCPUs
- Each doubling of vCPUs results in a ~1.9x increase in throughput
(Chart: throughput relative to the 2-processor ESX throughput.)
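
Compounding a ~1.9x gain per doubling gives the scaling picture (illustrative arithmetic on the slide's figure, not measured data):

```python
throughput = 1.0
for vcpus in (2, 4, 8):
    throughput *= 1.9                  # ~1.9x per doubling of vCPUs
    efficiency = throughput / vcpus    # vs. perfect linear scaling
    print(f"{vcpus} vCPUs: {throughput:.2f}x, {efficiency:.0%} efficient")
# 8 vCPUs: ~6.86x of an ideal 8x, i.e. ~86% scaling efficiency
```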

SQL Server performance characteristics
- A non-comparable implementation of TPC-E
- Models a brokerage house
- Complex mix of heavyweight transactions

Metric                  | 4-vCPU VM
Database size           | 500 GB
Disk IOPS               | 10,500
SQL Server buffer cache | 52 GB
Network packets/sec     | 7,500
Network throughput      | 50 Mb/s

Hardware configuration for tests on vSphere 4.0
(Diagram: an 8-way AMD server, driven by 4-way and 8-way Intel clients over 1 Gb direct-attach networking, connects through a 4 Gb/s Fibre Channel switch to an EMC CX3-40 with 180 drives.)

Resource-intensive nature of the 8-vCPU VM

Metric                                  | Physical Machine | Virtual Machine
Throughput (transactions per second)    | 3,557            | 3,060
Average response time, all transactions | 255 ms           | 234 ms
Disk I/O throughput (IOPS)              | 29K              | 25.5K
Disk I/O latencies                      | 9 ms             | 8 ms
Network packet rate, receive            | 10K/s            | 8.5K/s
Network packet rate, send               | 16K/s            | 8K/s
Network bandwidth, receive              | 11.8 Mb/s        | 10 Mb/s
Network bandwidth, send                 | 123 Mb/s         | 105 Mb/s

SQL Server scale-up performance relative to native
- At 1 and 2 vCPUs, ESX delivers 92% of native performance: the hypervisor can effectively offload certain tasks to idle cores and has flexibility in making virtual CPU scheduling decisions
- At 4 vCPUs, 88%, and at 8 vCPUs, 86% of native performance

SQL Server scale-out experiments
- Throughput increases linearly as we add up to 8 vCPUs in four VMs
- Overcommitted, going from 4 to 6 VMs (1.5x), performance rises 1.4x
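
The overcommitted data point works out to the following efficiency (simple arithmetic on the slide's numbers):

```python
# 4 -> 6 VMs is a 1.5x increase in offered load; throughput rose 1.4x.
vm_growth, throughput_growth = 6 / 4, 1.4
print(f"overcommit scaling efficiency: {throughput_growth / vm_growth:.0%}")
# ~93%: throughput still scales well past full CPU commitment
```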

Scale-out overcommitment fairness
- Fair distribution of resources across all eight VMs

Benchmarking databases in virtual environments
- We have shown databases are good candidates for virtualization
- But there is no formal benchmark
- One can benchmark a single VM on a server (IBM's Power series TPC disclosures)
- We need a TPC benchmark to cover the multi-VM case: it is what users are demanding!

Proposal 1: comprehensive database virtualization benchmark
Virtual machine configuration:
- The system should contain a mix of at least two multi-way CPU configurations; for example, an 8-way server result might contain 2x2-vCPU and 1x4-vCPU VMs
- Measure the CPU overcommitment capabilities of hypervisors by providing an overcommitted result along with a fully committed result; both results should report the throughput of individual VMs
Workloads used:
- Each VM runs a homogeneous or heterogeneous workload drawn from a mix of database benchmarks, e.g., TPC-C, TPC-H, and TPC-E
- Consider running a mix of operating systems and databases

Proposal 1
Advantages:
- A comprehensive database consolidation benchmark
Disadvantages:
- Complex benchmark rules; may be too feature-rich for an industry-standard workload

Proposal 2: virtualization extension of an existing database benchmark
Virtual machine configuration:
- The system contains a set of homogeneous VMs; for example, an 8-way server might contain 4x2-vCPU VMs
- The number of vCPUs in a VM would be based on the total number of cores and the cores/socket of a given host; e.g., an 8-core host has to run 4 2-vCPU VMs, and a 64-core host 8 8-vCPU VMs (see the sketch below)
- The benchmark specification would prescribe the number of VMs and the number of vCPUs per VM for a given number of cores
Workloads used:
- A homogeneous database workload, e.g., TPC-E, in each VM
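
One way such a prescription could be encoded, anchored only to the two examples above; this rule is our own illustration, and an actual specification would define the real table:

```python
import math

def vm_layout(total_cores):
    """Hypothetical sizing rule: return (number of VMs, vCPUs per VM)
    that fully commits the host, splitting cores between VM count and
    VM width with the VM count rounded up to a power of two."""
    n_vms = 2 ** math.ceil(math.log2(math.sqrt(total_cores)))
    return n_vms, total_cores // n_vms

print(vm_layout(8))    # (4, 2) -> 4 x 2-vCPU VMs, as on the slide
print(vm_layout(64))   # (8, 8) -> 8 x 8-vCPU VMs, as on the slide
```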

Proposal 2
Advantages:
- The simple approach provides users with a wealth of information about virtualized environments that they do not have today
- The simplicity of the extension makes it possible to develop the new benchmark quickly, which is critical if the benchmark is to gain acceptance
Disadvantages:
- Unlike Proposal 1, this approach does not emulate consolidation of diverse workloads
- Features of virtual environments such as overcommitment are not part of the benchmark definition

Proposal 3: benchmarking multi-tier/multi-phase applications
- Map each step in a workflow (or each tier in a multi-tier application) to a VM; for large-scale implementations, the mapping may instead be to a set of identical/homogeneous VMs
- From a benchmark design perspective, a challenging exercise with a number of open questions, e.g.:
  - Does the benchmark specify strict boundaries between the tiers?
  - Are the size and number of VMs in each layer part of the benchmark spec?
  - Does the entire application have to be virtualized, or would benchmark sponsors have freedom in choosing which components are virtualized? This question arises because support and licensing restrictions often lead to parts of an application not being virtualized.

Recommendation
- TPC benchmarks are great, but take a long time to develop; usually well worth the wait, but in this case timing is everything
- So go for something simple: an extension of an existing benchmark
- Proposal #2 fits the bill: not esoteric, it is what most users want; it can be developed quickly; and it is based on a proven benchmark
- Yes, it is really that simple!

Conclusions
- Virtualization is a mature technology in heavy use by customers
- Databases were the last frontier; we have shown it has been conquered
- The benchmarking community is behind the curve, badly in need of a TPC benchmark
- A simple extension of TPC-E is a natural fit, easy to produce, timely, and great price/performance!