Real-time KVM from the ground up

KVM Forum 2015
Rik van Riel, Red Hat

Real-time KVM
- What is real time?
- Hardware pitfalls
- Realtime preempt Linux kernel patch set
- KVM & qemu pitfalls
- KVM configuration
- Scheduling latency performance numbers
- Conclusions

What is real time?
- Real time is about determinism, not speed
- Maximum latency matters most
  - Minimum / average / maximum
- Used for workloads where missing deadlines is bad
  - Telco switching (voice breaking up)
  - Stock trading (financial liability?)
  - Vehicle control / avionics (exploding rocket!)
- Applications may have thousands of deadlines a second
- Acceptable max response times vary
  - For telco & stock cases, a few dozen microseconds
  - Very large fraction of responses must happen within that time frame (e.g. 99.99%)
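
The point that the maximum matters more than the average can be illustrated with a small sketch. The sample values, the 30 microsecond deadline, and the function name below are hypothetical and chosen only for illustration; the 99.99% target is the example figure from the slide.

```python
def summarize_latencies(samples_us, deadline_us, required_fraction=0.9999):
    """Summarize latency samples (in microseconds) against a deadline."""
    if not samples_us:
        raise ValueError("need at least one latency sample")
    within = sum(1 for s in samples_us if s <= deadline_us)
    fraction = within / len(samples_us)
    return {
        "min": min(samples_us),
        "avg": sum(samples_us) / len(samples_us),
        "max": max(samples_us),
        "fraction_within_deadline": fraction,
        "meets_target": fraction >= required_fraction,
    }


# Hypothetical data: 9998 fast responses and two slow outliers.
samples = [5.0] * 9998 + [120.0, 150.0]
report = summarize_latencies(samples, deadline_us=30.0)
print(report)
```

With these made-up numbers the average stays close to the minimum, yet the run misses the 99.99% target because of two outliers, which is why a real-time evaluation looks at the tail rather than the mean.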

[Figure: RHEL7.x real-time scheduler latency jitter plot]

Hardware pitfalls
- Biggest problems: BIOS, BIOS, and BIOS
- System Management Mode (SMM) & Interrupt (SMI)
  - Used to emulate or manage things, e.g.:
    - USB mouse PS/2 emulation
    - System management console
- SMM runs below the operating system
  - SMI traps to SMM, runs firmware code
- SMIs can take milliseconds to run in extreme cases
  - OS and real time applications interrupted by SMI
- Realtime may require BIOS settings changes
  - Some systems not fixable
  - Buy real time capable hardware
- Test with hwlatdetect & monitor SMI count MSR
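
Monitoring the SMI count MSR could look roughly like the sketch below. It assumes an Intel CPU that implements the SMI counter at MSR address 0x34, a Linux host with the msr driver loaded, and root privileges; none of these details come from the slides, and the helper names are hypothetical.

```python
import os
import struct

MSR_SMI_COUNT = 0x34  # assumption: Intel SMI counter MSR; not present on every CPU


def decode_smi_count(raw):
    """Decode the 8 bytes returned by the msr device; the count is the low 32 bits."""
    (value,) = struct.unpack("<Q", raw)
    return value & 0xFFFFFFFF


def read_smi_count(cpu=0):
    """Read the SMI counter of one CPU via /dev/cpu/N/msr (Linux msr driver, root only)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the MSR; each read returns exactly 8 bytes.
        return decode_smi_count(os.pread(fd, 8, MSR_SMI_COUNT))
    finally:
        os.close(fd)


def smis_during(before, after):
    """SMIs that occurred between two readings, tolerating 32-bit wraparound."""
    return (after - before) & 0xFFFFFFFF
```

A test run would read the counter before and after a latency measurement; a non-zero difference suggests that firmware interrupted the system during the run.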

Realtime preempt Linux kernel
- Normal Linux has latency issues similar to BIOS SMIs
  - Non-preemptible critical sections: interrupts, spinlocks, etc.
  - Higher priority program can only be scheduled after the critical section is over
- Real time kernel code has existed for years
  - Some of it got merged upstream
    - CONFIG_PREEMPT
  - Some patches in a separate tree
    - CONFIG_PREEMPT_RT
- https://rt.wiki.kernel.org/
- https://osadl.org/rt/

Realtime kernel overview
- Realtime project created a LOT of kernel changes
  - Too many to keep in separate patches
- Already merged upstream
  - Deterministic real time scheduler
  - Kernel preemption support
  - Priority inheritance mutexes
  - High-resolution timer
  - Preemptive Read-Copy Update
  - IRQ threads
  - Raw spinlock annotation
  - NO_HZ_FULL mode
- Not yet upstream
  - Full realtime preemption

PREEMPT_RT kernel changes
- Goal: make every part of the Linux kernel preemptible
  - or very short duration
- Highest priority task gets to preempt everything else
  - Lower priority tasks
  - Kernel code holding spinlocks
  - Interrupts
- How does it do that?

PREEMPT_RT internals
- Most spinlocks turned into priority inherited mutexes
  - Spinlock sections can be preempted
  - Much higher locking overhead
- Very little code runs with raw spinlocks
- Priority inheritance
  - Task A (prio 0), task B (prio 1), task C (prio 2)
  - Task A holds lock, task B running
  - Task C wakes up, wants lock
  - Task A inherits task C's priority, until lock is released
- IRQ threads
  - Each interrupt runs in a thread, schedulable
- RCU tracks tasks in grace periods, not CPUs
- Much, much more...
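
The priority inheritance example with tasks A, B, and C can be traced with a toy model. This is a minimal sketch and not how the kernel's rt-mutex code is written: it models one lock, no nested locks or inheritance chains, and treats a larger number as a higher priority, as the slide's example implies. The class and function names are hypothetical.

```python
class Task:
    def __init__(self, name, prio):
        self.name = name
        self.base_prio = prio  # larger number = higher priority, as in the slide's example
        self.prio = prio       # effective priority, may be boosted temporarily


class PiMutex:
    """Toy priority-inheritance mutex: one lock, no nesting, no chains."""

    def __init__(self):
        self.owner = None
        self.waiters = []

    def lock(self, task):
        """Return True if the task got the lock, False if it has to wait."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # The owner inherits the waiter's priority if that is higher.
        self.owner.prio = max(self.owner.prio, task.prio)
        return False

    def unlock(self):
        self.owner.prio = self.owner.base_prio  # the boost ends with the lock
        self.owner = None
        if self.waiters:
            # Hand the lock to the highest-priority waiter.
            self.owner = max(self.waiters, key=lambda t: t.prio)
            self.waiters.remove(self.owner)


def pick_next(runnable):
    """A strict priority scheduler: run the highest effective priority."""
    return max(runnable, key=lambda t: t.prio)


a, b, c = Task("A", 0), Task("B", 1), Task("C", 2)
lock = PiMutex()
order = []

lock.lock(a)                             # task A holds the lock
order.append(pick_next([a, b]).name)     # B runs, A is preempted
lock.lock(c)                             # C wakes up, wants the lock, and blocks
order.append(pick_next([a, b]).name)     # A now runs with C's priority
lock.unlock()                            # A releases the lock, C becomes the owner
order.append(pick_next([a, b, c]).name)  # C runs
print(order)  # ['B', 'A', 'C']
```

Without the boost in `lock()`, task B would keep running while A holds the lock, and C would wait behind a lower priority task for an unbounded time.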

KVM & qemu pitfalls
- Real time is hard
- Real time virtualization is much harder
- Priorities of tasks inside a VM are not visible to the host
  - The host cannot identify the VCPU with the highest priority program
- Host kernel housekeeping tasks extra expensive
  - Guest exit & re-entry
  - Timers, RCU, workqueues, ...
- Lock holders inside a guest not visible to the host
  - No priority inheritance possible
- Tasks on VCPU not always preemptible due to emulation in qemu

Real time KVM kernel changes
- Extended RCU quiescent state in guest mode
- Add parameter to disable periodic kvmclock sync
  - Applying host ntp adjustments into guest causes latency
  - Guest can run ntpd and keep its own adjustment
- Disable scheduler tick when running a SCHED_FIFO task
  - Not rescheduling? Don't run the scheduler tick
- Add parameter to advance tscdeadline hrtimer
  - Makes timer interrupt happen early to compensate for virt overhead
- Various isolcpus= and workqueue enhancements
  - Keep more housekeeping tasks away from RT CPUs

Priority inversion & starvation
- Host & guest separated by clean(ish) abstraction layer
- VCPU thread needs a high real time priority on the host
  - Guarantee that real time app runs when it wants
- VCPU thread has same high real time host priority when running unimportant things...
- Guest could be run with idle=poll
  - VCPU uses 100% host CPU time, even when idle
- Higher priority things on the same CPU on the host are generally unacceptable
  - Could interfere with real time task
- Lower priority things on the same CPU on the host could starve forever
  - Could lead to system deadlock

KVM real time virtualization host partitioning
- Avoid host/guest starvation
  - Run VCPU threads on dedicated CPUs
  - No host housekeeping on those CPUs, except ksoftirqd for IPI & VCPU IRQ delivery
- Boot host with isolcpus and nohz_full arguments
- Run KVM guest VCPUs on isolated CPUs
- Run host housekeeping tasks on other CPUs

KVM real time virtualization host partitioning
- Run VCPUs on dedicated host CPUs
- Keep everything else out of the way
  - Even host kernel tasks

[Diagram: host CPU layout]
- System CPUs (CPUs 0-3): system tasks
- Isolated CPUs (CPUs 4-15, isolcpus=4-15 nohz_full=4-15): RT guest #1 VCPUs, RT guest #2 VCPUs
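
The layout in the diagram can be checked with a small planning sketch. It assumes the 16-CPU host from the diagram with CPUs 4-15 isolated; the guest names and the six VCPUs per guest are hypothetical, and the parser only handles simple lists such as `4-15` or `0,2-3`, not every form the kernel accepts.

```python
def parse_cpu_list(spec):
    """Parse a simple CPU list such as '4-15' or '0,2-3' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def partition_host(total_cpus, isolated_spec):
    """Split host CPUs into housekeeping CPUs and isolated (real time) CPUs."""
    all_cpus = set(range(total_cpus))
    isolated = parse_cpu_list(isolated_spec)
    if not isolated <= all_cpus:
        raise ValueError("isolated list names CPUs the host does not have")
    housekeeping = all_cpus - isolated
    if not housekeeping:
        raise ValueError("at least one CPU must remain for host housekeeping")
    return sorted(housekeeping), sorted(isolated)


def assign_vcpus(isolated, guests):
    """Give every guest VCPU its own isolated host CPU (no overcommit)."""
    if sum(guests.values()) > len(isolated):
        raise ValueError("not enough isolated CPUs: overcommit is not an option")
    free = list(isolated)
    return {name: [free.pop(0) for _ in range(count)]
            for name, count in guests.items()}


housekeeping, isolated = partition_host(16, "4-15")
plan = assign_vcpus(isolated, {"rt-guest-1": 6, "rt-guest-2": 6})
print(housekeeping)  # [0, 1, 2, 3]
print(plan)
```

The plan only computes which host CPU each VCPU thread should be pinned to; applying it is left to whatever tool manages the guests.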

KVM real time virtualization guest partitioning
- Partitioning the host is not enough
- Tasks on guest can do things that require emulation
  - Worst case: emulation by qemu userspace on host
  - Poking I/O ports
  - Block I/O
  - Video card access
  - ...
- Emulation can take hundreds of microseconds
  - Context switch to other qemu thread
  - Potentially wait for qemu lock
  - Guest blocked from switching to higher priority task
- Guest needs partitioning, too!

KVM real time virtualization guest partitioning
- Guest booted with isolcpus
- Real time tasks run on isolated CPUs
- Everything else runs on system CPUs

[Diagram: guest VCPU layout]
- System VCPUs (VCPUs 0-1): system tasks
- Isolated VCPUs (VCPUs 2-7, isolcpus=2-7): real time tasks
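
Inside the guest, a real time task still has to be placed on the isolated VCPUs explicitly, because the scheduler does not move tasks onto isolated CPUs by itself. A minimal sketch using the Linux-only scheduling calls in Python's `os` module is shown below; the helper names and the priority value are assumptions, and setting `SCHED_FIFO` normally needs root or `CAP_SYS_NICE`.

```python
import os


def pin_current_process(wanted_cpus):
    """Restrict the calling process to the requested CPUs that it may actually use."""
    allowed = os.sched_getaffinity(0)
    target = set(wanted_cpus) & allowed
    if not target:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


def make_realtime(priority=1):
    """Switch the calling process to SCHED_FIFO; return False if not permitted."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False
```

In the guest from the diagram, a real time program would call `pin_current_process(range(2, 8))` followed by `make_realtime()` early in its startup, before it enters its time-critical loop.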

Real time KVM performance numbers
- Dedicated resources are ok
  - Modern CPUs have many cores
  - People often disable hyperthreading
- Scheduling latencies with cyclictest
  - Real time test tool
- Measured scheduling latencies inside KVM guest
  - Minimum: 5us
  - Average: 6us
  - Maximum: 14us
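
The idea behind a cyclictest-style measurement can be sketched in a few lines: sleep until a planned wakeup time and record how late the wakeup actually was. This is only an illustration of the principle, not a replacement for cyclictest; Python's interpreter overhead makes its numbers far worse than the figures above, and the interval and loop count are arbitrary choices.

```python
import time


def measure_wakeup_latency(interval_us=1000, loops=200):
    """Return how late each periodic wakeup was, in microseconds."""
    interval_ns = interval_us * 1000
    latencies_us = []
    next_wakeup = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = next_wakeup - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        woke_at = time.monotonic_ns()
        # Latency = actual wakeup time minus planned wakeup time.
        latencies_us.append((woke_at - next_wakeup) / 1000)
        next_wakeup += interval_ns
    return latencies_us


latencies = measure_wakeup_latency()
average = sum(latencies) / len(latencies)
print(f"min {min(latencies):.1f}us  avg {average:.1f}us  max {max(latencies):.1f}us")
```

As with the real tool, the interesting number is the maximum over a long run, ideally while the rest of the system is under load.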

[Figure: RHEL7.x scheduler latency (cyclictest), Intel Ivy Bridge 2.4 GHz, 128 GB memory. Two bar charts of cyclictest latency in microseconds: the first shows min, mean, 99.9%, stddev, and max; the second removes the maxes to zoom in on min, mean, 99.9%, and stddev.]

Doctor, it hurts when I...
- All kinds of system operations can cause high latencies
- CPU frequency change
- CPU hotplug
- Loading & unloading kernel modules
- Task migration between isolated and system CPUs
  - TLB flush IPI may get queued behind a slow op
  - Keep real time and system tasks separated
- Host clocksource change from TSC to !TSC
  - Use hardware with stable TSC
- Page faults or swapping
  - Run with enough memory
- Use of slow devices (e.g. disk, video, or sound)
  - Only use fast devices from realtime programs
  - Slow devices can be used from helper programs

Cache Allocation Technology
- Single CPU can have many CPU cores, sharing L3 cache
- Cannot load lots of things from RAM in 14us
  - ~60ns for a single DRAM access
  - Uncached context switch + TLB loads + more could add up to >50us
- Low latencies depend on things being in CPU cache
- Latest Intel CPUs have Cache Allocation Technology
  - CPU cache quotas
  - Per application group, cgroups interface
  - Available on some Haswell CPUs
- Prevents one workload from evicting another workload from the cache
- Helps improve the guarantee of really low latencies
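
The "cannot load lots of things from RAM in 14us" remark follows from simple arithmetic, worked through below. The sketch assumes the accesses are dependent and happen one after another (no prefetching or overlapping misses), and the 64-byte cache line size is an assumption that does not come from the slides.

```python
LATENCY_BUDGET_NS = 14_000  # 14us maximum latency from the cyclictest numbers
DRAM_ACCESS_NS = 60         # ~60ns per DRAM access, from the slide
CACHE_LINE_BYTES = 64       # assumption: typical x86 cache line size

# How many serialized DRAM accesses fit in the whole latency budget?
max_dram_accesses = LATENCY_BUDGET_NS // DRAM_ACCESS_NS
print(max_dram_accesses)                     # 233

# How much data is that, at one cache line per access?
print(max_dram_accesses * CACHE_LINE_BYTES)  # 14912 bytes
```

Under these assumptions, a couple of hundred cache misses would consume the entire budget, so the real time task's working set has to stay in cache.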

Conclusions
- Real time KVM is actually possible
- Achieved largely through system partitioning
  - Overcommit is not an option
- Latencies low enough for various real time applications
  - 14 microseconds max latency with cyclictest
- Real time apps must avoid high latency operations
- Virtualization helps with isolation, manageability, hardware compatibility, ...
- Requires very careful configuration
  - Can be automated with libvirt, openstack, etc.
  - Jan Kiszka's presentation explains how