Dive Into VM Live Migration
OpenStack Liberty Summit 2015, Vancouver
Michał Dulko, Michał Jastrzębski, Paweł Koniszewski
Why Bother?
Use Cases
o Imminent host failure
o Maintenance mode
o Optimal resource placement
Imminent Host Failure
o Cooling issues
o Storage problems
o Networking problems
o Your datacenter was struck by a flood
Maintenance Mode
o Firmware upgrades
o Hardware upgrades
o Kernel upgrades
Optimal Resource Placement
o Reduce costs
o Move VMs closer to their storage to lessen network latency
o Stack more VMs on hosts to save power
o Increase resiliency
o Noisy neighbour separation
o Spread VMs across more hosts
General Flow
Assumptions
o Live
o Consistent
o Transparent
o Minimal service disruption
Migrations in OpenStack
o Non-live migration (cold migration)
  nova migrate <server>
o True live migration (shared storage or volume-based)
  nova live-migration <server> [<host>]
o Block live migration
  nova live-migration --block-migrate <server> [<host>]
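Which live-migration command applies depends on where the instance's disks live. A minimal sketch that only assembles the `nova` CLI arguments listed above; the `storage` labels ("shared", "volume", "local") are hypothetical inputs you would determine yourself, and nothing here calls an OpenStack API:

```python
from typing import List, Optional


def live_migration_command(server: str, storage: str,
                           host: Optional[str] = None) -> List[str]:
    """Build the nova CLI call for a live migration.

    storage is a hypothetical label for where the instance's disks live:
    "shared" (shared storage), "volume" (volume-based) or "local".
    """
    if storage not in ("shared", "volume", "local"):
        raise ValueError(f"unknown storage type: {storage!r}")
    cmd = ["nova", "live-migration"]
    if storage == "local":
        # Local disks must be copied along with memory: block live migration.
        cmd.append("--block-migrate")
    cmd.append(server)
    if host is not None:
        # Optional target host; if omitted, the scheduler picks one.
        cmd.append(host)
    return cmd


print(" ".join(live_migration_command("vm1", "local", "compute-b")))
```

Here `vm1` and `compute-b` are placeholder names; the call prints `nova live-migration --block-migrate vm1 compute-b`.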
Compatibility
[Table: migration type (block LM, true LM, block LM with read-only devices, true LM with read-only devices) versus instance storage (local storage, volumes, shared storage)]
Live Migration Process
o Pre-migration
o Reservation
o Iterative pre-copy
o Stop and copy
o Commitment
Pre-migration
[Diagram: VM A active on compute node A; compute node B is the target]
VM A is active on physical host A; host B is selected by the scheduler or preselected.
Reservation
[Diagram: VM A active on compute node A; VM A reserved on compute node B]
Confirm that resources are available on host B and reserve a new VM there.
Iterative pre-copy
[Diagram: VM A active on compute node A while its memory is copied to compute node B, where VM A is paused]
Memory is transferred from A to B; pages dirtied in the meantime are then copied again, iteratively.
Stop and copy
[Diagram: VM A paused on both compute node A and compute node B]
Suspend the VM and copy the remaining pages and the CPU state.
Commitment
[Diagram: VM A paused on compute node A; VM A active on compute node B]
Host B becomes the primary host for VM A.
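The phases above can be condensed into a toy model of pre-copy. This is a sketch under simplifying assumptions (constant bandwidth, constant dirty rate, sizes in MiB), not QEMU's algorithm: round one sends all guest memory, each later round resends what was dirtied during the previous round, and the VM is paused for the final stop-and-copy once the remainder fits within the allowed downtime. It also shows why a guest that dirties memory faster than the link can carry never converges.

```python
def simulate_precopy(memory_mib, bandwidth_mibps, dirty_mibps,
                     max_downtime_s, max_rounds=30):
    """Toy pre-copy model.

    Returns whether it converged, the number of rounds attempted (the
    final stop-and-copy counts as a round), total data sent and downtime.
    """
    remaining = float(memory_mib)  # round 1 must send all guest memory
    sent = 0.0
    for rounds in range(1, max_rounds + 1):
        if remaining / bandwidth_mibps <= max_downtime_s:
            # Stop and copy: pause the VM and send the last dirty pages
            # (CPU state is ignored here) within the downtime budget.
            return {"converged": True, "rounds": rounds,
                    "sent_mib": sent + remaining,
                    "downtime_s": remaining / bandwidth_mibps}
        # Iterative pre-copy: send the current dirty set while the VM runs.
        round_time_s = remaining / bandwidth_mibps
        sent += remaining
        # Memory dirtied meanwhile must be resent; it cannot exceed guest RAM.
        remaining = min(float(memory_mib), dirty_mibps * round_time_s)
    return {"converged": False, "rounds": max_rounds, "sent_mib": sent,
            "downtime_s": remaining / bandwidth_mibps}


print(simulate_precopy(4096, 1024, 256, 0.3))
print(simulate_precopy(4096, 1024, 2048, 0.3)["converged"])
```

With 4096 MiB of RAM, a 1024 MiB/s link, 256 MiB/s of dirtying and a 0.3 s budget, the model finishes in three rounds (two pre-copy, one stop-and-copy) after sending 5376 MiB; at 2048 MiB/s of dirtying it never converges.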
Performance & reliability
Pitfalls
o OpenStack does not allow triggering any operation on a VM during LM
o VMs with memory-intensive workloads are hard to migrate
o LM puts a heavy load on the network
o Migrations between compute nodes with different CPUs
o Memory oversubscription
Interacting With Live Migration
o OpenStack disallows any operation on a VM while its LM is in progress
o You can use virsh to interact with it instead
Diagnosis
o Information about an ongoing LM
  virsh domjobinfo <domain>
  Time elapsed:      1918595 ms
  Data processed:    410.137 GiB
  Data remaining:    4.600 GiB
  Data total:        16.008 GiB
  Constant pages:    144658
  Normal pages:      107307605
  Normal data:       409.346 GiB
  Expected downtime: 1023 ms
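The sample output is telling: after about 32 minutes (1918595 ms) the job has processed 410 GiB against a 16 GiB total, i.e. roughly 25 full copies of guest memory, so it is not converging. A small parser for that output, assuming virsh prints one `Label: value unit` pair per line; the field names it produces (`data_processed`, `expected_downtime`, ...) are simply lower-cased labels chosen for this sketch:

```python
_UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_domjobinfo(text):
    """Parse `virsh domjobinfo` output into a dict.

    Sizes are converted to bytes, times stay in the unit virsh prints (ms),
    and non-numeric values (e.g. the job type) are kept as strings.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        name = key.strip().lower().replace(" ", "_")
        try:
            value = float(fields[0])
        except ValueError:
            info[name] = rest.strip()
            continue
        unit = fields[1] if len(fields) > 1 else ""
        info[name] = value * _UNITS.get(unit, 1)
    return info


SAMPLE = """\
Time elapsed:      1918595 ms
Data processed:    410.137 GiB
Data remaining:    4.600 GiB
Data total:        16.008 GiB
Constant pages:    144658
Normal pages:      107307605
Normal data:       409.346 GiB
Expected downtime: 1023 ms
"""

info = parse_domjobinfo(SAMPLE)
# How many times the job's total data has been sent so far.
resend_ratio = info["data_processed"] / info["data_total"]
print(f"sent {resend_ratio:.1f}x the total, "
      f"expected downtime {info['expected_downtime']:.0f} ms")
```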
Forcing Migration Finish
o Cancel an ongoing LM
  virsh domjobabort <domain>
o Pause the VM during LM (a paused guest stops dirtying memory, so the remaining pages can be copied)
  virsh suspend <domain>
Tuning Maximum Downtime
o QEMU
  virsh qemu-monitor-command --hmp <domain> migrate_set_downtime <time (sec)>
o libvirt
  virsh migrate-setmaxdowntime <domain> <time (ms)>
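Because OpenStack blocks operations on a migrating VM, the commands on the last two slides are the operator's levers. A sketch of an escalation policy around them using the libvirt Python bindings, whose `virDomain.migrateSetMaxDowntime`, `suspend` and `abortJob` methods correspond to `virsh migrate-setmaxdowntime`, `virsh suspend` and `virsh domjobabort`. The thresholds and the policy itself are assumptions for illustration, not something Nova or libvirt does for you:

```python
# Hypothetical escalation thresholds, in seconds since the migration started.
RAISE_DOWNTIME_AFTER_S = 300
FORCE_AFTER_S = 900


def choose_action(elapsed_s, expected_downtime_ms, max_downtime_ms,
                  prefer_abort=False):
    """Decide how to push a stuck live migration towards completion."""
    if expected_downtime_ms <= max_downtime_ms or elapsed_s < RAISE_DOWNTIME_AFTER_S:
        return "wait"
    if elapsed_s < FORCE_AFTER_S:
        # Accept the downtime currently expected so the switchover can happen.
        return "raise_downtime"
    # Last resort: cancel the LM, or pause the guest so it stops dirtying memory.
    return "abort" if prefer_abort else "suspend"


def apply_action(dom, action, expected_downtime_ms):
    """Apply an action to a libvirt.virDomain (or an object with the same methods)."""
    if action == "raise_downtime":
        # Equivalent of `virsh migrate-setmaxdowntime`; the value is in milliseconds.
        dom.migrateSetMaxDowntime(int(expected_downtime_ms), 0)
    elif action == "suspend":
        dom.suspend()   # equivalent of `virsh suspend`
    elif action == "abort":
        dom.abortJob()  # equivalent of `virsh domjobabort`
```

On a real host, `dom` would come from something like `libvirt.open("qemu:///system").lookupByName("<domain>")`, and the expected downtime from `virsh domjobinfo <domain>`.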
Auto Converge
o nova.conf setting
  live_migration_flag += VIR_MIGRATE_AUTO_CONVERGE
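Auto-converge tackles the dirty-rate side of pre-copy: QEMU throttles the guest's vCPUs so that memory is dirtied more slowly than it can be sent. A toy model, assuming constant bandwidth, a dirty rate proportional to the CPU time the guest gets, and a throttle that grows by a fixed, hypothetical `throttle_step` after each unsuccessful round (QEMU's real throttling schedule differs):

```python
def rounds_to_converge(memory_mib, bandwidth_mibps, dirty_mibps, max_downtime_s,
                       throttle_step=0.0, max_rounds=50):
    """Return the round in which stop-and-copy becomes possible, or None."""
    remaining = float(memory_mib)  # the first round sends all guest memory
    throttle = 0.0                 # fraction of guest CPU time taken away
    for rnd in range(1, max_rounds + 1):
        if remaining / bandwidth_mibps <= max_downtime_s:
            return rnd  # remainder fits in the downtime budget: pause and copy
        round_time_s = remaining / bandwidth_mibps
        # Assumption: a throttled guest dirties proportionally less memory.
        dirtied = dirty_mibps * (1.0 - throttle) * round_time_s
        remaining = min(float(memory_mib), dirtied)
        # Modelled auto-converge: throttle harder after each failed round.
        throttle = min(0.99, throttle + throttle_step)
    return None


print(rounds_to_converge(4096, 1024, 2048, 0.3))
print(rounds_to_converge(4096, 1024, 2048, 0.3, throttle_step=0.2))
```

Without throttling, a guest dirtying 2048 MiB/s over a 1024 MiB/s link never fits a 0.3 s downtime budget (`None`); with `throttle_step=0.2` the model converges on round 7.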
Tunneled Migration
o nova.conf setting
  live_migration_flag += VIR_MIGRATE_TUNNELLED
[Diagram: source and destination hosts, each running libvirt and a hypervisor; migration data is tunnelled through the libvirt-to-libvirt connection]
Tunneled Migration
o nova.conf setting
  live_migration_flag -= VIR_MIGRATE_TUNNELLED
[Diagram: the same two hosts; without the flag, migration data flows directly between the hypervisors]
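The `+=` / `-=` notation on these slides is shorthand, not nova.conf syntax: `live_migration_flag` is a single comma-separated string in the `[libvirt]` section, so adding or removing a flag means rewriting the whole list. A small helper, assuming the Liberty default value shown in the code (verify it against your release's configuration reference):

```python
# Assumed Liberty default for [libvirt] live_migration_flag; verify for your release.
DEFAULT_FLAGS = ("VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, "
                 "VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED")


def edit_flags(value, add=(), remove=()):
    """Return a new live_migration_flag string with flags added/removed."""
    flags = [f.strip() for f in value.split(",") if f.strip()]
    flags = [f for f in flags if f not in remove]  # the "-=" part
    for flag in add:                               # the "+=" part
        if flag not in flags:
            flags.append(flag)
    return ", ".join(flags)


# Direct (non-tunnelled) migration with auto-converge enabled.
result = edit_flags(DEFAULT_FLAGS,
                    add=["VIR_MIGRATE_AUTO_CONVERGE"],
                    remove=["VIR_MIGRATE_TUNNELLED"])
print("live_migration_flag = " + result)
```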
Tuning Bandwidth
o libvirt
  virsh migrate-setspeed <domain> <speed (MiB/s)>
o nova.conf setting
  live_migration_bandwidth = <speed (MiB/s)>
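Both settings take MiB/s, while network links are usually rated in Gbit/s. A small conversion helper; the share of the link you give to migrations is your own choice (the 50% below is only an example):

```python
def gbit_to_mibps(gbit_per_s):
    """Convert a link speed in Gbit/s (decimal) to MiB/s (binary)."""
    return gbit_per_s * 1e9 / 8 / 2**20


# Example: cap migrations at half of a 10 Gbit/s link.
cap = int(gbit_to_mibps(10) * 0.5)
print(f"virsh migrate-setspeed <domain> {cap}")
print(f"live_migration_bandwidth = {cap}")
```

A 10 Gbit/s link carries about 1192 MiB/s, so the example cap is 596 MiB/s.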
XBZRLE Compression
o nova.conf setting
  live_migration_flag += VIR_MIGRATE_COMPRESSED
[Diagram: on the source host, an updated page is compared with its copy in the sent-page cache and only the compressed delta is sent; the destination host applies the delta to its received page to obtain the updated page]
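XBZRLE stands for XOR-based zero run-length encoding: the source keeps a cache of pages it has already sent, and for a page that changed it transmits only a delta against the cached copy, which the destination applies to its old copy. A simplified sketch of that idea; it stores (offset, XOR bytes) runs rather than QEMU's actual wire format:

```python
def xor_delta_encode(old: bytes, new: bytes):
    """Return (offset, xor_bytes) runs covering every byte where new != old."""
    if len(old) != len(new):
        raise ValueError("pages must have the same size")
    runs, i, n = [], 0, len(new)
    while i < n:
        if old[i] == new[i]:
            i += 1  # unchanged byte: XOR is zero, nothing to send
            continue
        start = i
        while i < n and old[i] != new[i]:
            i += 1
        # Send the XOR of the changed run against the cached page.
        runs.append((start, bytes(a ^ b for a, b in zip(old[start:i], new[start:i]))))
    return runs


def xor_delta_apply(old: bytes, runs):
    """Rebuild the updated page on the destination from its old copy."""
    page = bytearray(old)
    for start, xored in runs:
        for k, x in enumerate(xored):
            page[start + k] ^= x
    return bytes(page)


old_page = bytes(4096)
new_page = bytearray(old_page)
new_page[100:104] = b"abcd"
delta = xor_delta_encode(old_page, bytes(new_page))
print(delta)  # one 4-byte run instead of a full 4 KiB page
```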
LM On Dedicated Network
o nova.conf
  o live_migration_uri = qemu+tcp://%s/system
[Diagram: VM A (active) on compute node A is migrated to compute node B (paused) over the management network]
LM On Dedicated Network
o nova.conf
  o live_migration_uri = qemu+tcp://%s-lm/system
  o Set up your DNS to resolve host names with the -lm suffix to IPs in your dedicated network.
[Diagram: compute nodes A and B connected by the management network and a separate LM network; VM A (active on A, paused on B) is migrated over the LM network]
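Nova fills the destination host name into the `%s` of `live_migration_uri`, so the `-lm` suffix only works if every such name resolves to the dedicated network. A sketch of the substitution plus a resolution check you could run on a compute node; `compute-b` is a hypothetical host name:

```python
import socket


def migration_uri(template, dest_host):
    """Expand a live_migration_uri template with %-formatting."""
    return template % dest_host


def lm_address(dest_host, suffix="-lm"):
    """Resolve the migration-network name of a host (raises socket.gaierror if DNS lacks it)."""
    return socket.gethostbyname(dest_host + suffix)


print(migration_uri("qemu+tcp://%s-lm/system", "compute-b"))
```

Checking that `lm_address(...)` returns an IP from the LM subnet for every compute node is a cheap sanity test before switching the URI.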
Different CPUs Between Compute Nodes
o The CPU instruction set of the source node needs to be a subset of the CPU instruction set of the destination node
[Diagram: compute node A (MMX, AVX) and compute node B (MMX, SSE2, AVX); live migration from A to B passed, from B to A failed]
Different CPUs Between Compute Nodes
o This can be avoided by explicitly setting the guest CPU model in nova.conf:
  o cpu_mode = custom
  o virt_type = kvm or virt_type = qemu
  o Then set cpu_model
  o The list of supported named CPU models is in libvirt/cpu_map.xml
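The rule on this slide is a subset test. A sketch using the feature sets from the diagram; real compatibility checks are done by libvirt against the full CPU model, not against hand-written sets like these:

```python
def missing_features(src_features, dst_features):
    """CPU features the guest may use on the source but the destination lacks."""
    return set(src_features) - set(dst_features)


def can_live_migrate(src_features, dst_features):
    # Subset rule: every source feature must also exist on the destination.
    return not missing_features(src_features, dst_features)


node_a = {"MMX", "AVX"}
node_b = {"MMX", "SSE2", "AVX"}
print("A -> B:", can_live_migrate(node_a, node_b))  # True
print("B -> A:", can_live_migrate(node_b, node_a), missing_features(node_b, node_a))
```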
Memory Oversubscription
o LM to a specific host does not take memory oversubscription into account
  o ram_allocation_ratio is applied by nova-scheduler only
[Diagram: compute node A with 2 GB RAM reports 2 GB (reported RAM = available - reserved); nova-conductor sees 2 GB, while nova-scheduler with ram_allocation_ratio = 2.0 sees 4 GB]
Memory Oversubscription
o Work around it by setting
  o reserved_host_memory_mb = -2048
[Diagram: compute node A with 2 GB RAM now reports 4 GB; nova-conductor sees 4 GB, and nova-scheduler with ram_allocation_ratio = 1.0 also sees 4 GB]
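The arithmetic behind the two diagrams, following the slides' simplified model (reported RAM = available - reserved; only nova-scheduler applies `ram_allocation_ratio`). Real Nova accounting also subtracts memory used by running instances, which this sketch leaves out:

```python
def views_mb(physical_mb, reserved_host_memory_mb, ram_allocation_ratio):
    """What each service believes is available, per the slides' model."""
    reported = physical_mb - reserved_host_memory_mb  # available - reserved
    return {
        "nova-scheduler": reported * ram_allocation_ratio,  # oversubscription applied
        "nova-conductor": reported,  # LM to a specific host: no oversubscription
    }


print(views_mb(2048, 0, 2.0))      # default: scheduler and conductor disagree
print(views_mb(2048, -2048, 1.0))  # workaround: both see 4 GB
```

The negative reservation moves the oversubscription into the reported value itself, which is why the ratio has to drop to 1.0 at the same time; otherwise the scheduler would count it twice.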
Secure Live Migration
Why Does Security Matter?
o Everything can be sniffed!
o Migrated machines can contain sensitive data
o Legal issues with unencrypted data transfer
Encryption
o Hypervisor native encryption
  o QEMU doesn't support it
o libvirt tunneled transport
  o live_migration_uri = qemu+ssh://%s/system
  o live_migration_flag += VIR_MIGRATE_TUNNELLED
  o Uses only one core
o IPSec tunnel between hosts
Memory Access Is Critical
[Chart: transfer rate (GBps, 0 to 3) of QEMU+SSH and QEMU+TCP migration on Intel(R) Xeon(R) CPU E5-2690 v2 and Intel(R) Xeon(R) CPU E5-2660 v3]
Future Of Live Migration
Multithreaded Compression
o Compress every page sent during LM
o zlib used for compression
o Configurable:
  o Number of threads
  o Compression ratio
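Per-page zlib compression spread over several threads, with the thread count and compression level as the two knobs the slide lists. A Python sketch of the idea only (QEMU implements this in C); the `threads` and `level` parameter names are illustrative:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096


def compress_pages(pages, threads=4, level=1):
    """Compress each page independently with zlib on a pool of threads.

    CPython's zlib releases the GIL while compressing, so the threads can
    run in parallel; results keep the order of the input pages.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda page: zlib.compress(page, level), pages))


def decompress_pages(blobs):
    """Destination side: restore the original pages."""
    return [zlib.decompress(blob) for blob in blobs]


pages = [bytes([i % 256]) * PAGE_SIZE for i in range(16)]
blobs = compress_pages(pages, threads=4, level=6)
print(sum(len(b) for b in blobs), "bytes instead of", len(pages) * PAGE_SIZE)
```

Compressing pages independently is what makes the work parallel, at the cost of a lower ratio than compressing the whole stream at once.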
Post-copy Live Migration
o Move the workload immediately to the destination host
[Diagram: VM A active on compute node B while its memory is still being copied from compute node A, where VM A is paused]
Post-copy Live Migration
o Cheap way to finish a live migration in finite time
o VM needs to be rebooted in case of failure
o Heavy performance impact
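Post-copy inverts the pre-copy loop: the VM resumes on the destination at once and pulls missing pages from the source when it touches them, while the rest is pushed in the background. A toy model of that demand paging, with pages as plain byte strings and no real fault handling. It also makes the failure mode visible: until every page has arrived, the VM's memory is split across both hosts, so losing either one loses the VM.

```python
class PostCopyDestination:
    """Toy post-copy model: the VM already runs on the destination, and any
    page it touches that has not arrived yet is fetched from the source."""

    def __init__(self, source_pages):
        self.source = source_pages  # page number -> contents, still on host A
        self.local = {}             # pages already present on host B
        self.remote_faults = 0      # accesses that had to wait for the network

    def read(self, page_no):
        if page_no not in self.local:
            # Page fault: fetch this page from the source right now (slow path).
            self.remote_faults += 1
            self.local[page_no] = self.source[page_no]
        return self.local[page_no]

    def push_next(self):
        """Background transfer of one missing page; False when all have arrived."""
        for page_no, data in self.source.items():
            if page_no not in self.local:
                self.local[page_no] = data
                return True
        return False


dst = PostCopyDestination({n: bytes([n]) * 8 for n in range(4)})
dst.read(2)              # first touch: remote fault
while dst.push_next():   # background copy of the remaining pages
    pass
print(dst.remote_faults)  # 1
```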
Active LM Monitoring In OpenStack
o Track memory transfer progress
o Detect possible problems and take actions
Actions On Ongoing Live Migration
o Pause VM
o Abort LM
o See progress
o Change configuration on the fly:
  o Maximum tolerable VM downtime
  o Transfer bandwidth
Your voice matters!
o Mailing lists:
  o openstack-dev@lists.openstack.org
  o openstack-operators@lists.openstack.org
o Win The Enterprise group:
  o pawel.koniszewski@intel.com (IRC: pkoniszewski)
  o michal.jastrzebski@intel.com (IRC: inc0)
  o michal.dulko@intel.com (IRC: dulek)
Q&A (& disclaimers)
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. © 2015 Intel Corporation.