Red Hat Enterprise Virtualization Performance
Mark Wagner, Senior Principal Engineer, Red Hat
June 13, 2013
Agenda: Overview, Features that help with Performance, Tuning, RHEV + RHS, Migration to RHEV, Wrap Up, Questions
ENTERPRISE VIRTUALIZATION FROM THE PEOPLE WHO BROUGHT YOU RED HAT ENTERPRISE LINUX
A complete datacenter virtualization solution:
- Leading performance: top virtualization benchmarks for performance and scalability
- Affordable: lower TCO and higher ROI than competitive platforms
- Enterprise-ready: powerful mix of enterprise features and a rich set of partners
- Open: offers choice and interoperability with no proprietary lock-in
- Cross-platform: optimized for Microsoft Windows and Linux guests
RED HAT ENTERPRISE VIRTUALIZATION ARCHITECTURE
RED HAT ENTERPRISE VIRTUALIZATION HYPERVISOR/KVM OVERVIEW
Small-form-factor, scalable, high-performance hypervisor based on Red Hat Enterprise Linux:
- Inherits the performance, scalability, security, and supportability of Red Hat Enterprise Linux
- Shares the Red Hat Enterprise Linux hardware and software ecosystem
- Host: 160 logical CPUs (4,096 theoretical max), 2 TB RAM (64 TB theoretical max)
- Guest: 160 vCPUs, 2 TB RAM
- Supports the latest silicon virtualization technology
- Microsoft certified for Windows guests
INDUSTRY LEADERSHIP: THE ONLY END-TO-END OPEN VIRTUALIZATION INFRASTRUCTURE INDUSTRY LEADERS IN INFRASTRUCTURE, NETWORKING, AND STORAGE ARE BACKING RED HAT ENTERPRISE VIRTUALIZATION
SPECvirt_sc2010: RHEL 6 KVM Posts Industry-Leading Results
[Diagram: SPECvirt_sc2010 harness, showing the virtualization layer and hardware of the system under test (SUT) driven by client hardware, with disk I/O and network I/O paths highlighted]
- Greater than 1 SPECvirt tile per core
- Key enablers: SR-IOV, Huge Pages, NUMA node binding
http://www.spec.org/virt_sc2010/results/
SPECvirt_sc2010: Red Hat Owns Industry-Leading Results
Best SPECvirt_sc2010 scores by CPU core count (as of May 30, 2013):

  Score  Hypervisor       System           Cores  VMs
  1,221  VMware ESX 4.1   HP DL380 G7       12     78
  1,367  RHEL 6 (KVM)     IBM HS22V         12     84
  1,570  VMware ESXi 5.0  HP DL385 G7       16    102
  1,878  VMware ESXi 4.1  HP BL620c G7      20    120
  2,144  RHEL 6 (KVM)     IBM HX5 w/ MAX5   20    132
  2,442  RHEV 3.1         HP DL380p Gen8    16    150
  2,742  VMware ESXi 4.1  HP DL380 G7       12    168
  3,824  VMware ESXi 4.1  IBM x3850 X5      40    234
  4,682  RHEL 6 (KVM)     HP DL580 G7       40    288
  5,467  RHEL 6 (KVM)     IBM x3850 X5      64    336
  8,956  RHEL 6 (KVM)     HP DL980 G7       80    552

Comparison based on the best-performing Red Hat and VMware solutions by CPU core count published at www.spec.org as of May 17, 2013. SPEC and the benchmark name SPECvirt_sc2010 are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPECvirt_sc2010, see www.spec.org/virt_sc2010/.
Agenda: Overview, Features that help with Performance, Tuning, RHEV + RHS, Migration to RHEV, Wrap Up, Questions
Features that help with Performance
Use these features to help improve guest performance:
- Host CPU
- CPU pinning
- Hooks
- Direct LUN
- Huge Pages
- Migration
- numad
- MTU
Features that help with Performance: Use Host CPU
Pros:
- Allows the guest to use the hardware features of the CPU
- Can provide good performance gains
Cons:
- Prevents migration
A quick way to confirm the setting took effect is shown below.
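A minimal sketch of verifying the setting from the host, assuming the guest was restarted after Use Host CPU was enabled; the grep patterns are illustrative, and host CPU passthrough generally surfaces as qemu-kvm's "-cpu host" option:

  # look for host CPU passthrough on the qemu-kvm command line
  ps -ef | grep [q]emu-kvm | grep -o -- '-cpu [^ ]*'

  # compare the CPU flags the guest sees with the host's; with passthrough
  # they should largely match (run on the host and inside the guest)
  grep -m1 flags /proc/cpuinfo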
RHEV CPU Pinning
RHEL 6.4 Single Large Guest (Parallel OpenMP Benchmark)
[Chart: Linpack Gflops vs. thread count (1, 2, 4, 8, 16) for a 2-node KVM guest on Intel Sandy Bridge (8 core / 16 CPU), Intel nxn @ 20000; series: kvm6.4, kvm6.4 + cpu type, 16-cpu bare metal]
Features that help with Performance: CPU Pin
- Helps keep data cache lines hot
- Keeps the host scheduler from moving guests around
- Improves NUMA locality
- If you pin correctly... (see the sketch below)
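A minimal sketch of inspecting and setting vCPU affinity with libvirt's virsh on the host; RHEV exposes the same pinning through the VM's host settings, and the guest name and CPU numbers below are placeholders:

  numactl --hardware             # check the NUMA topology first, so pinned
                                 # CPUs and guest memory share a node
  virsh vcpupin rhel6-guest      # show the current vCPU-to-pCPU affinity
  virsh vcpupin rhel6-guest 0 2  # pin vCPU 0 to physical CPU 2
  virsh vcpupin rhel6-guest 1 3  # pin vCPU 1 to physical CPU 3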
RHEV CPU Pinning: 4 Guests, 2 Hosts
[Chart: total transactions per minute at 20U, 60U, and 100U user-set scaling, out of the box vs. manually pinned]
Features that help with Performance: A Few Others
Hooks:
- The hook mechanism has been around for a long time
- Some items move from hook to feature (e.g., Direct LUN)
- SR-IOV is currently one of the more important hooks
A trivial hook is sketched below.
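A minimal sketch of a vdsm hook, assuming the standard hook directories under /usr/libexec/vdsm/hooks/; the script name, log path, and the vmId environment variable are illustrative. Installed as /usr/libexec/vdsm/hooks/before_vm_start/50_log_start and made executable (chmod +x), it only logs each VM start; real hooks, such as the SR-IOV hook, rewrite the domain XML that vdsm hands to libvirt:

  #!/bin/bash
  # vdsm runs every executable in this directory before a VM starts and
  # exposes VM metadata (assumed here: vmId) through the environment
  echo "$(date) starting VM ${vmId:-unknown}" >> /var/log/vdsm/hook.log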
Features that help with Performance: A Few Others
Direct LUN:
- Allows you to use directly attached storage
- Typically higher performance
Features that help with Performance: HugePages
Standard HugePages (2 MB):
- Reserve/free via /proc/sys/vm/nr_hugepages, or per NUMA node via /sys/devices/system/node/*/hugepages/*/nr_hugepages
- Used via hugetlbfs
1 GB HugePages:
- Reserved at boot time; cannot be freed later
- Used via hugetlbfs
Transparent HugePages (2 MB):
- On by default; controlled via boot arguments or /sys
- Used for anonymous memory
[Diagram: TLB (128 data / 128 instruction entries) mapping the virtual address space onto physical memory]
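A minimal sketch of reserving 2 MB huge pages on a RHEL 6 host and checking transparent huge pages; the page count and mount point are illustrative:

  sysctl -w vm.nr_hugepages=2048               # reserve 2048 x 2 MB = 4 GB
  grep Huge /proc/meminfo                      # verify HugePages_Total/Free

  mkdir -p /dev/hugepages
  mount -t hugetlbfs hugetlbfs /dev/hugepages  # hugetlbfs mount for consumers

  # RHEL 6 exposes transparent huge page control here:
  cat /sys/kernel/mm/redhat_transparent_hugepage/enabled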
RHEV Huge Pages in Guest
Impact of huge pages in the guest: ~10-15% improvement.
[Chart: total transactions per minute at 20U, 60U, and 100U user-set scaling, regular pages vs. huge pages]
Features that help with Performance: Migration Support
- Configured under the Cluster -> Policy settings
- Can set duration and CPU load thresholds
- Moves a VM when the limits are hit
Useful for:
- Maintenance
- Power savings
- Load balancing
Migration for Power Savings
Migration for Performance
Tuning for Migration
Live migration without tuning: note that, due to the high load, the migration did not finish.
[Chart: transactions per minute (TPM) over time; series: TPM-RR, TPM LM 32 (live migration at the default 32 MiBps bandwidth cap)]
Tuning for Migration
Check the vdsm defaults in /usr/share/doc/vdsm-4.10.2/vdsm.conf.sample:
  # Maximum bandwidth for migration, in MiBps, 0 means libvirt's
  # default, since 0.10.x default in libvirt is unlimited
  # migration_max_bandwidth = 32
Edit /etc/vdsm/vdsm.conf:
- Verify the parameters are in the correct section
- Restart the vdsm daemon for the changes to take effect:
  service vdsmd restart
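A minimal sketch of the change, assuming migration_max_bandwidth belongs in the [vars] section of /etc/vdsm/vdsm.conf; the value 0 follows the sample file's note that 0 defers to libvirt's default, which is unlimited since libvirt 0.10.x:

  grep -n migration_max_bandwidth /etc/vdsm/vdsm.conf  # check the current value

  # make sure the line lands in the [vars] section when editing
  echo 'migration_max_bandwidth = 0' >> /etc/vdsm/vdsm.conf

  service vdsmd restart                                # apply the change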
Tuning for Migration
Impact of tuning on live migration: the migration completes in approximately 1 minute.
[Chart: transactions per minute (TPM) over time; series: TPM-RR, TPM LM 32, TPM UL (unlimited migration bandwidth)]
RHEV Migration for Even Distribution
The host policy was set to 51%. Guest migration started automatically, resulting in overall higher performance as both hosts were utilized. The single guest migration completed in approximately one minute.
[Chart: transactions per minute for Guests 1-4 and the aggregate of all four guests, auto migration vs. without migration]
Four NUMA Node System, Fully Connected Topology
[Diagram: four nodes (0-3), each with four cores, a shared L3 cache, and local RAM, plus QPI links for inter-node traffic and I/O; every node connects directly to every other node]
Sample Remote Access Latencies
- 4 socket / 4 node: 1.5x
- 4 socket / 8 node: 2.7x
- 8 socket / 8 node: 2.8x
- 32 node system: 5.5x (30/32 inter-node latencies >= 4x)
Distance distribution on the 32-node system (1,024 node pairs): 10 (32/1024: 3.1%), 13 (32/1024: 3.1%), 40 (64/1024: 6.2%), 48 (448/1024: 43.8%), 55 (448/1024: 43.8%)
So, What's the NUMA Problem?
- The Linux system scheduler is very good at maintaining responsiveness and optimizing for CPU utilization
- It tries to use idle CPUs, regardless of where process memory is located... and using remote memory degrades performance!
- Red Hat is working with the upstream community to increase the NUMA awareness of the scheduler and to implement automatic NUMA balancing
- Remote memory latency matters most for long-running, significant processes, e.g., HPTC, VMs, etc.
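A minimal sketch of spotting remote memory placement on a RHEL 6.4 host; numastat's per-process mode is used here, with qemu-kvm as the process pattern of interest:

  numactl --hardware    # node sizes and inter-node distances
  numastat -c qemu-kvm  # per-node memory footprint of each qemu-kvm process;
                        # memory on nodes where the guest's vCPUs do not run
                        # means remote accesses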
numad Can Help Improve NUMA Performance
- A new RHEL 6.4 user-level daemon that automatically improves out-of-the-box NUMA system performance and balances NUMA usage in dynamic workload environments
- Was a tech preview in RHEL 6.3
- Not enabled by default (see how to enable it below)
- See numad(8)
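A minimal sketch of turning numad on, since it is not enabled by default:

  service numad start      # start the daemon now
  chkconfig numad on       # start it at boot
  tail /var/log/numad.log  # numad logs its placement decisions here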
numad Aligns Process Memory and CPU Threads Within Nodes
[Diagram: before numad, processes 19, 29, 37, and 61 have memory and threads spread across nodes 0-3; after numad, each process is consolidated onto a single node]
RHEV Hand Tuning vs. numad
numad gives the same performance improvements as manual pinning, and also still allows migration.
[Chart: total transactions per minute at 20U, 60U, and 100U user-set scaling; series: untuned, manual pin, numad]
Agenda: Overview, Features that help with Performance, Tuning, RHEV + RHS, Migration to RHEV, Wrap Up, Questions
Tuning
Tuning applies to both the hypervisor and the guest. vdsm was already covered; this section covers:
- tuned
- Kernel
- MTU
tuned Profile Comparison Matrix

  Tunable                             default   enterprise-storage  virtual-host  virtual-guest  latency-performance
  kernel.sched_min_granularity_ns     4ms       10ms                10ms          10ms           10ms
  kernel.sched_wakeup_granularity_ns  4ms       15ms                15ms          15ms           15ms
  vm.dirty_ratio                      20% RAM   40%                 10%           40%            40%
  vm.dirty_background_ratio           10% RAM   -                   5%            -              -
  vm.swappiness                       60        10                  10            30             -
  I/O scheduler (elevator)            CFQ       deadline            deadline      deadline       deadline
  Filesystem barriers                 On        Off                 Off           -              -
  CPU governor                        ondemand  performance         performance   -              performance
  Disk read-ahead                     -         -                   4x            -              -
  Disable THP                         -         -                   -             -              Yes
  Disable C-states                    -         -                   -             -              Yes

(RHEL 6.4 also ships a throughput-performance profile.)
https://access.redhat.com/site/solutions/369093
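A minimal sketch of applying the profiles above with tuned-adm:

  tuned-adm list                   # show the available profiles
  tuned-adm profile virtual-host   # on the RHEV/KVM hypervisor
  tuned-adm profile virtual-guest  # inside a RHEL guest
  tuned-adm active                 # confirm which profile is active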
Load Balancing
- The RHEL scheduler tries to keep all CPUs busy by moving tasks from overloaded CPUs to idle CPUs
- Detect it with perf stat: look for excessive migrations (see the sketch below)
- Issues arise on larger systems where the scheduler is a bit too active
- Tuning sched_migration_cost can help calm the scheduler down; this is especially effective on multi-socket systems
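A minimal sketch of the perf check; the 10-second window is arbitrary:

  # count task migrations and context switches system-wide for 10 seconds
  perf stat -a -e cpu-migrations,context-switches sleep 10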
Load Balancing: /proc/sys/kernel/sched_migration_cost
- The amount of time after the last execution that a task is considered cache hot in migration decisions; a hot task is less likely to be migrated, so increasing this variable reduces task migrations
- The default value is 500000 (ns)
- If CPU idle time is higher than expected while there are runnable processes, try reducing this value; if tasks bounce between CPUs or nodes too often, try increasing it
- Rule of thumb: increase by 2-10x to reduce load balancing
- Increase by 10x on large systems when many cgroups are actively used (e.g., RHEV/KVM/RHOS)
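A minimal sketch of applying the rule of thumb; the 10x value shown is for a large host:

  cat /proc/sys/kernel/sched_migration_cost             # default: 500000 ns
  echo 5000000 > /proc/sys/kernel/sched_migration_cost  # 10x the default

  # persist the setting across reboots
  echo 'kernel.sched_migration_cost = 5000000' >> /etc/sysctl.conf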
sched_migration_cost, RHEL 6.3
[Chart: effect of sched_migration_cost on fork/exit microbenchmarks (exit_10 through fork_1000) on an Intel Westmere EP (24 CPU / 12 core, 24 GB memory); usec/call at the 500us default vs. tuned to 4ms, plus percent improvement]
MTU
- An improved interface allows setting the MTU
- On faster networks this can be a big win; of course, it depends on the data patterns
- Assumes the switch is configured correctly (see the sketch below)
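A minimal sketch of enabling jumbo frames on a host interface and verifying the path; the interface name and target address are placeholders:

  ip link set dev eth2 mtu 9000
  ip link show eth2 | grep mtu

  # verify end to end: 9000 bytes minus 28 for the IP and ICMP headers,
  # with fragmentation disallowed
  ping -M do -s 8972 -c 3 192.168.1.100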
Agenda: Overview, Features that help with Performance, Tuning, RHEV + RHS, Migration to RHEV, Wrap Up, Questions
RHEV + RHS
Integration has been underway. Scale testing with a single volume over 8 RHS servers (not necessarily what we would recommend):
- 1024 guests with RHEV; 2048 guests with KVM
- Another internal group ran 2250 guests with RHEV
- 512 guests all driving I/O
- Sum of guest memory sized to fit in host memory: no swapping
Software Layers in Virtual Block Storage
Guest (VM): ext4 on /mnt/test -> LVM volume /dev/vg_guest/test -> virtio-block device /dev/vda
Host (hypervisor): qemu-kvm process -> guest image at /mnt/your-gluster-volume/guest-image-pathname -> kernel FUSE -> glusterfs client -> network ...
Scaling RHEV / KVM / RHS
128 VMs all performed I/O simultaneously.
[Diagram: RHEV hosts running RHEL 6.3z guests on RHEV 3.1, backed by RHS 2.0 U4 servers (RHEL 6.2z, Gluster 3.3)]
RHEV / KVM / RHS Tuning
- gluster volume set <volume> group virt
- RHS server: tuned-adm profile rhs-virtualization
- KVM host: tuned-adm profile virtual-host
- Ideally, use separate gluster volumes for application files and disk images
- For better response time, shrink the guest block device queue: /sys/block/vda/queue/nr_requests (8)
- For best sequential read throughput, raise VM readahead: /sys/block/vda/queue/read_ahead_kb (2048)
Both guest-side settings are sketched below.
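A minimal sketch of the two guest-side settings, run inside the guest against the virtio disk vda:

  echo 8 > /sys/block/vda/queue/nr_requests       # shorter queue: better
                                                  # response time
  echo 2048 > /sys/block/vda/queue/read_ahead_kb  # deeper readahead: better
                                                  # sequential read throughput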
Impact of Tuning Gluster and Kernel Alone
[Chart: effect of tuning on large-file virtio-block I/O (2 replicas, 8 servers, 16 hosts, 128 VMs, 32 GB per server, 64 KB record size); untuned vs. tuned throughput in MB/s for random write, random read, sequential write, and sequential read]
For Sequential I/O, the RHEV Host Utilizes the 10 GbE Network
Only 1 RHEV host, 8 RHS servers, 2-replica volume, 1 thread per VM, 16 GB files, 4 KB transfer size.
[Charts: VM sequential write and sequential read transfer rates (MB/s) at 1, 2, 4, and 8 KVM guests per host; read throughput shown for 128 KB vs. 2048 KB guest readahead, with the Red Hat recommendation marked]
RHEV/RHS Scales as Hardware is Added
[Charts: (left) sequential I/O of 128 VMs, read and 2-replica write throughput in MB/s vs. 1-8 RHS servers, one RHEV host per gluster server, 64 KB transfer size, one thread per guest; (right) random read and random write IOPS with 128 guests vs. 1-8 RHS servers]
For More Information
- RHEV/RHS with 128 guests: https://access.redhat.com/site/articles/393123
- RHEV/RHS single-host performance: https://access.redhat.com/site/articles/313973
Agenda: Overview, Features that help with Performance, Tuning, RHEV + RHS, Migration to RHEV, Wrap Up, Questions
Migrating to RHEV
- Several detailed reference architecture papers cover this
- Red Hat customer portal: https://access.redhat.com (requires a user account)
- Scripts and configuration files provided
Agenda: Overview, Features that help with Performance, Tuning, RHEV + RHS, Migration to RHEV, Wrap Up, Questions
Reference Architectures
Two places to get Red Hat reference architectures:
- Red Hat resource library: www.redhat.com (free)
- Red Hat customer portal: https://access.redhat.com (requires a user account; scripts and configuration files provided)
Examples:
- RHEV/RHS with 128 guests: https://access.redhat.com/site/articles/393123
- RHEV/RHS single-host performance: https://access.redhat.com/site/articles/313973
06/12 Sessions
  10:40 AM - 11:40 AM  Introduction to Red Hat OpenStack
  2:30 PM - 3:30 PM    Introduction & Overview of OpenStack for IaaS Clouds
  3:40 PM - 4:40 PM    Red Hat IaaS Overview & Roadmap
  3:40 PM - 4:40 PM    Integration of Storage, OpenStack & Virtualization
06/13 Sessions
  10:40 AM - 11:40 AM  KVM Hypervisor Roadmap & Technology Update
  2:30 PM - 3:30 PM    War Stories from the Cloud: Lessons from US Defense Agencies
  3:40 PM - 4:40 PM    Migrating 1,000 VMs from VMware to Red Hat Enterprise Virtualization: A Case Study
  4:50 PM - 5:50 PM    Red Hat Virtualization Deep Dive
  4:50 PM - 5:50 PM    Red Hat Enterprise Virtualization Performance
  4:50 PM - 5:50 PM    Real world perspectives: Gaining Competitive Advantages with Red Hat Solutions
06/14 Sessions
  11:00 AM - 12:00 PM  Network Virtualization & Software-defined Networking
  9:45 AM - 10:45 AM   Hypervisor Technology Comparison & Migration
Agenda: Overview, Features that help with Performance, Tuning, RHEV + RHS, Migration to RHEV, Wrap Up, Questions