HP Technology Forum & Expo 2009
Session 2972: Linux Options and Best Practices for Scale-up Virtualization
Thomas Sjolshagen, Linux Product Planner
June 17th, 2009
© 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Virtualization: The Basics
Platform virtualization
- A virtual representation of the computer: provides a virtual hardware platform
- Needs an operating system plus virtualization software
- Guest OS is completely independent of the host OS
- Hypervisor manages resources for the guests/domUs
- Management environment: the host/dom0
- High degree of fault isolation
- Cooperative or non-cooperative guest instances
Virtualization Basics: Platform Virtualization Types
- Full virtualization (or HVM), e.g. Xen, KVM, HP Integrity Virtual Machines
  - Multiple, unmodified guests; emulation or hardware assist
  - Technically: any OS
- Paravirtualization / cooperative virtualization, e.g. Xen, PV I/O drivers
  - Guest OS is ported to a special architecture
Virtualization: The Basics
Operating system virtualization
- Virtualization at the OS layer: each guest appears to be a discrete OS instance
- Guest and host OS cannot differ, though different minor versions of the same release are supported
- Referred to as containers or zones
- Strong fault isolation is not possible (and normally not the goal)
- To applications, containers appear to be discrete OS instances
Platform Virtualization: Linux Hypervisor Options
Linux Virtualization: A Xen-Based Architecture
[Diagram: the Xen hypervisor runs directly on the hardware platform. Guest 0 (dom0) holds the device drivers and backend drivers; PV guests (domUs) run frontend drivers, while hardware virtual machines (HVM domUs) use PV I/O drivers. Each guest may be UP or SMP. User applications run inside each guest.]
Oracle VM
New server virtualization software and support
- Free product download from the web
- Based on Xen 3.x open source software
- Runs both Linux and Windows domains
- Supports paravirtualization on all hardware, and hardware virtualization (VT-i, VT-x, VT-d, or AMD-V) on the latest x86 hardware
- 64-bit and 32-bit guests
- Up to 64-way SMP; up to 32 virtual processors per guest
- Includes live migration at no additional cost
- Integrated, browser-based management console
- Free downloadable VM images
- Not available for HP Integrity servers
Red Hat Virtualization
- Bundled in RHEL 5.x; based on Xen 3.3
- Runs both Linux and Windows domains (guests)
- Supports paravirtualization on all hardware, and hardware virtualization
- 64-bit and 32-bit guests
- Includes live migration at no additional cost
- X- and web-based management tools
- Integrated with Red Hat Cluster
Novell SLES/Xen Virtualization
- Bundled with SLES 10 and 11; based on Xen 3.3
- SLES/Xen runs both Linux and Windows domains (guests)
- Supports paravirtualization on all hardware, and hardware virtualization (VT-i, VT-x, VT-d, or AMD-V) on the latest x86 hardware
- 64-bit and 32-bit guests; up to 64-way SMP
- Integration with YaST2 management tools
- High Availability software integration
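On both the RHEL and SLES Xen stacks above, a paravirtualized guest can be created from the command line with virt-install from the libvirt tool set. A minimal sketch; the guest name, disk path, sizes, and install-tree URL are illustrative assumptions, not values from these slides:

```shell
# Create a 2-vCPU, 1 GB paravirtualized guest from a network install tree.
# All names and paths below are examples; adjust for your environment.
virt-install \
  --paravirt \
  --name demo-guest \
  --ram 1024 \
  --vcpus 2 \
  --file /var/lib/xen/images/demo-guest.img \
  --file-size 8 \
  --nographics \
  --location http://mirror.example.com/distro/os/x86_64
```

The resulting domain definition is then visible to the same virt-* utilities on either distribution.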
Virtualization Technologies: Linux Kernel Virtual Machine (KVM)
- Uses the standard Linux kernel, with a kernel module for virtualization
- Kernel Shared Memory (KSM): simply put, shares identical memory pages between guests; >10% savings without performance impact
- Virtual I/O drivers, back-ported by the distros: Tech Preview in SLES 11, announced for RHEL 5.4
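On kernels that ship KSM, the feature is driven through sysfs; a minimal sketch, assuming a KSM-enabled kernel and root access (the sysfs paths are the mainline ones, not taken from these slides):

```shell
# Turn on KSM scanning, then see how many pages are being shared
# across guests (each shared page saves one or more duplicates).
echo 1 > /sys/kernel/mm/ksm/run
cat /sys/kernel/mm/ksm/pages_sharing
```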
Linux Virtualization: KVM Architecture
[Diagram: the KVM module sits inside a regular Linux kernel. Each guest runs inside a QEMU process (in VMX guest mode) side by side with ordinary Linux applications; the kernel's own device drivers talk to the hardware platform.]
Platform Virtualization: Limits

Category             | Xen (OSS)      | KVM               | VMware
pCPUs                | 126            | 4096              | 64
vCPUs                | 32             | 16 [1]            | 8
Memory (host/guest)  | 1 TB / 80 GB   | 4 TB / 1.4 TB [2] | 1 TB / 255 GB
Memory over-commit   | Balloon driver | KSM               | Yes
NPT/EPT              | Yes            | Yes               | Yes
PCI pass-through     | Yes            | Yes [3]           | Yes [3]
Accelerated I/O      | G, D           | D                 | D

G = paravirtualized guest; D = paravirtualized I/O drivers
[1] Current limits; tests done at 256 vCPUs
[2] Booted with 2 TB; guest detected 1.4 TB
[3] Assumes IOMMU (or VT-d)
RHEL 5.3 Xen (PV guests): AIO read, 4 guests, 4 vCPUs, 1 vdisk, 30 GB vRAM
[Charts: AIOD read bandwidth (MB/s, 0-400) versus number of guests (1-4), comparing a full-NUMA and a hybrid-NUMA configuration.]
Kernel Virtual Machine: AIO read, 4 guests, 4 vCPUs, 1 vdisk, 30 GB vRAM
Kernel Virtual Machine: AIO read, 8 guests, 4 vCPUs, 1 vdisk, 30 GB vRAM
OS Virtualization: Linux Container Options
Parallels Virtuozzo Containers
Advanced containers for Linux: Parallels Virtuozzo Containers sits on top of a standard Linux distribution and virtualizes the OS kernel.
[Diagram: user applications run over common system software and a single core kernel with hardware drivers, directly on the hardware.]
Each Virtual Private Server:
- Has its own processes, users, and files, and provides full root or administrator access
- Owns IP addresses, port numbers, filtering and routing rules
- Can have its own versions of system libraries or different patch levels
- Can delete, add, or modify any file and install its own application or system software in its exclusive area
- Runs the same OS the host is running
Linux Kernel cpusets
- Soft partitions in the Linux kernel: CPUs and memory, exclusive or shared
- Simple and powerful to use
- Management tools: standard file system and OS tools
- Available in SLES 9, SLES 10, SLES 11
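Because cpusets are exposed as a pseudo-filesystem, a soft partition can be carved out with nothing but mkdir and echo, which is what "standard file system tools" means above. A minimal sketch, assuming root access and a cpuset-enabled kernel; the mount point, CPU range, and node number are illustrative:

```shell
# Mount the cpuset filesystem and create a partition on CPUs 0-3, memory node 0.
mkdir -p /dev/cpuset
mount -t cpuset cpuset /dev/cpuset
mkdir /dev/cpuset/partition1
echo 0-3 > /dev/cpuset/partition1/cpus
echo 0   > /dev/cpuset/partition1/mems
# Move the current shell (and its future children) into the partition.
echo $$  > /dev/cpuset/partition1/tasks
```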
Virtualization Technologies: LinuX Containers (LXC)
- Showing up as cgroup (control group) extensions; some form of cgroups/cpusets has been present since early 2.6 releases
- A mainline kernel capability; a lot happened between 2.6.27 and 2.6.30 (and beyond)
- A tech preview in SLES 11; status unknown for future RHEL release(s)
- Evolving functionality; there appears to be interaction between the OpenVZ and LXC developers
LinuX Containers (LXC)
- Native containers, an emerging technology
- Expands existing cgroup capability: aggregates CPU, memory, and I/O (network and storage) resources for one or more tasks and their children
- Includes resource-capping capabilities for I/O, memory, and CPU
- Managed using libvirt (or cset)
- Updated in recent upstream kernels; the mainline kernel is the base for RHEL 6 and SLES 11
- Technology preview in SLES 11; RHEL 6 status is unknown
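The resource-capping side of cgroups follows the same filesystem idiom as cpusets. A minimal sketch using the cpu and memory controllers; the mount point, group name, and limit values are illustrative assumptions, and it requires root on a kernel (2.6.27 or later) with those controllers enabled:

```shell
# Mount the cpu and memory controllers and cap a group of tasks.
mkdir -p /cgroup
mount -t cgroup -o cpu,memory cgroup /cgroup
mkdir /cgroup/capped
echo 512       > /cgroup/capped/cpu.shares            # relative CPU weight
echo 268435456 > /cgroup/capped/memory.limit_in_bytes # 256 MiB memory cap
echo $$        > /cgroup/capped/tasks                 # move this shell in
```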
The Future: Kernel Work That May Benefit Your Host or Guest Environment(s)
Split LRU for Improved VM Scalability
- Large systems can perform poorly under high memory demand
- Page replacement requires spinlocked LRU scans, and each core will typically scan/reclaim under pressure; 128 GB is roughly 32 million x86-64 pages
- Inadequate swap worsens the behavior: there is no way to free memory
- The LRU is split into anonymous and file-backed lists, allowing tailored reclaim policies and improved locking
- Non-reclaimable pages (mlocked, tmpfs, etc.) are removed from the LRUs
- Results in much better scanning efficiency; virtualized servers also benefit
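The scale of the scanning problem is simple arithmetic: with 4 KiB x86-64 base pages, 128 GiB of RAM is about 33.5 million pages (the slide rounds to 32 million), every one a candidate for an LRU scan under memory pressure:

```shell
# Number of 4 KiB pages in 128 GiB of RAM.
mem_bytes=$((128 * 1024 * 1024 * 1024))
page_bytes=4096
pages=$((mem_bytes / page_bytes))
echo "$pages"   # 33554432, i.e. roughly 33.5 million pages
```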
Virtualization Future
Management tools: the new battlefront. Lots of options.
libguestfs:
- Batch configuration changes: modify file system structure, run commands in guest context
- Scriptable from the host environment
- Works on dormant guests, running guests, even Windows guests
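The libguestfs toolchain ships a shell, guestfish, that makes this kind of host-side scripting concrete; a hedged sketch of inspecting a dormant guest, where the image path and partition name are made-up examples:

```shell
# Read a file out of a dormant guest image without booting the guest.
guestfish --ro -a /var/lib/libvirt/images/guest.img <<'EOF'
run
mount-ro /dev/sda1 /
cat /etc/fstab
EOF
```

The same session run with --rw can write files or edit configuration, which is the batch-change use case above.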
Beneficial Upstream Work
Significant benefits for KVM:
- Big Kernel Lock pushdown/elimination
- Lockless page cache
- VFS cleanup, including global inode-lock scope reduction and other performance improvements
- ext3 fsync performance
- NUMA-node hugepage allocation
- Improved page reclamation
- More scalable and capable filesystems
Best Practices
Best Practices: Maximize Portability
Data management:
- File-backed storage for the OS: standard file systems, cluster file systems, networked file systems
- Data on shared LVM or raw storage: Fibre Channel, iSCSI
Guest management:
- Leverage libvirt portability (Xen, KVM, LXC, OpenVZ, etc.) and the virt-* utilities
- For RHEL/SLES: always install both the bare-metal and Xen kernels in new guests
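The libvirt portability above is concrete at the domain-XML layer: the same definition moves between hosts with virsh. A minimal sketch, where the guest name and host names are illustrative assumptions:

```shell
# Export a guest's hypervisor-neutral libvirt definition, copy it to
# another host, and register it there (storage must be reachable too).
virsh dumpxml demo-guest > demo-guest.xml
scp demo-guest.xml otherhost:/tmp/demo-guest.xml
ssh otherhost virsh define /tmp/demo-guest.xml
```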
Best Practices: Maximize Portability (cont.)
Think in terms of appliances:
- Start with .vmdk-based images for simplified deployment
- Always use PV I/O drivers, assuming they're available (Windows too)
OS configuration:
- Xen: use "poor man's NUMA": pin dom0 CPUs on the nodes/cells with I/O attached, and interleave memory
- KVM: use hugepages; get out of the kernel's way!
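Both OS-configuration tips can be sketched from the host's command line; the vCPU-to-pCPU mapping and page count below are illustrative, not recommendations from the slides, and both steps assume root:

```shell
# Xen: pin dom0's vCPUs to physical CPUs on the node with I/O attached.
xm vcpu-pin Domain-0 0 0
xm vcpu-pin Domain-0 1 1

# KVM: reserve 2 MiB hugepages and mount hugetlbfs for qemu to back
# guest memory, keeping page management out of the kernel's way.
echo 1024 > /proc/sys/vm/nr_hugepages
mkdir -p /dev/hugepages
mount -t hugetlbfs hugetlbfs /dev/hugepages
```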
In Summary
- An ever-evolving virtualization landscape, though it is starting to stabilize; Oracle is a bit of a wild card
- New focus for vendors: management
- Make your environment as flexible as possible
- Depending on the technology: watch the system's interleaved-memory behavior, and pin dom0 to the I/O nodes for Xen