Xen and XenServer Storage Performance

Similar documents

Full and Para Virtualization

Performance tuning Xen

HPC performance applications on Virtual Clusters

Datacenter Operating Systems

A client side persistent block cache for the data center. Vault Boston Luis Pabón - Red Hat

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

Best Practices for Virtualised SharePoint

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

CON9577 Performance Optimizations for Cloud Infrastructure as a Service

COS 318: Operating Systems. Virtual Machine Monitors

The Data Placement Challenge

Masters Project Proposal

Increasing XenServer s VM density

Intel s Virtualization Extensions (VT-x) So you want to build a hypervisor?

Virtual Machine Synchronization for High Availability Clusters

Analysis of VDI Storage Performance During Bootstorm

COS 318: Operating Systems

Virtual Machines. COMP 3361: Operating Systems I Winter

Chapter 5 Cloud Resource Virtualization

Virtual SAN Design and Deployment Guide

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

Deep Dive: Maximizing EC2 & EBS Performance

DELL RAID PRIMER DELL PERC RAID CONTROLLERS. Joe H. Trickey III. Dell Storage RAID Product Marketing. John Seward. Dell Storage RAID Engineering

Cloud Computing CS

Architecture of the Kernel-based Virtual Machine (KVM)

Virtualizare sub Linux: avantaje si pericole. Dragos Manac

NEXENTA S VDI SOLUTIONS BRAD STONE GENERAL MANAGER NEXENTA GREATERCHINA

The Price of Safety: Evaluating IOMMU Performance Preliminary Results

Xen and the Art of. Virtualization. Ian Pratt

KVM & Memory Management Updates

Q & A From Hitachi Data Systems WebTech Presentation:

Hadoop Size does Hadoop Summit 2013

How To Make A Minecraft Iommus Work On A Linux Kernel (Virtual) With A Virtual Machine (Virtual Machine) And A Powerpoint (Virtual Powerpoint) (Virtual Memory) (Iommu) (Vm) (

Comparison of Hybrid Flash Storage System Performance

9/26/2011. What is Virtualization? What are the different types of virtualization.

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

HRG Assessment: Stratus everrun Enterprise

Virtualization and Performance NSRC

Virtual desktops made easy

Decentralized Deduplication in SAN Cluster File Systems

MySQL performance in a cloud. Mark Callaghan

Virtualization Performance on SGI UV 2000 using Red Hat Enterprise Linux 6.3 KVM

High Performance Computing Specialists. ZFS Storage as a Solution for Big Data and Flexibility

Virtual Machine Security

VDI Optimization Real World Learnings. Russ Fellows, Evaluator Group

HP Smart Array Controllers and basic RAID performance factors

An Overview of Flash Storage for Databases

Knut Omang Ifi/Oracle 19 Oct, 2015

Virtuoso and Database Scalability

Outline. Outline. Why virtualization? Why not virtualize? Today s data center. Cloud computing. Virtual resource pool

I3: Maximizing Packet Capture Performance. Andrew Brown

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

Technical Paper. Moving SAS Applications from a Physical to a Virtual VMware Environment

Enabling Technologies for Distributed Computing

Benchmarking Hadoop & HBase on Violin

Enabling Technologies for Distributed and Cloud Computing

2972 Linux Options and Best Practices for Scaleup Virtualization

Intel DPDK Boosts Server Appliance Performance White Paper

Peter Senna Tschudin. Performance Overhead and Comparative Performance of 4 Virtualization Solutions. Version 1.29

Distribution One Server Requirements

The Xen of Virtualization

Scaling Analysis Services in the Cloud

Xen Project 4.4: Features and Futures. Russell Pavlicek Xen Project Evangelist Citrix Systems

Storage benchmarking cookbook

Best Practices for Deploying Citrix XenDesktop on NexentaStor Open Storage

The IntelliMagic White Paper: Storage Performance Analysis for an IBM Storwize V7000

Virtualization Technology. Zhiming Shen

Taking Linux File and Storage Systems into the Future. Ric Wheeler Director Kernel File and Storage Team Red Hat, Incorporated

COS 318: Operating Systems. Virtual Machine Monitors

Basics in Energy Information (& Communication) Systems Virtualization / Virtual Machines

Virtual Machine Monitors. Dr. Marc E. Fiuczynski Research Scholar Princeton University

Technology Insight Series

Computing in High- Energy-Physics: How Virtualization meets the Grid

Cloud Computing through Virtualization and HPC technologies

Toward a practical HPC Cloud : Performance tuning of a virtualized HPC cluster

Servervirualisierung mit Citrix XenServer

Memory Channel Storage ( M C S ) Demystified. Jerome McFarland

Xen Live Migration. Networks and Distributed Systems Seminar, 24 April Matúš Harvan Xen Live Migration 1

FlashSoft Software from SanDisk : Accelerating Virtual Infrastructures

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

KVM PERFORMANCE IMPROVEMENTS AND OPTIMIZATIONS. Mark Wagner Principal SW Engineer, Red Hat August 14, 2011

Solid State Storage in Massive Data Environments Erik Eyberg

Using Linux as Hypervisor with KVM

matasano Hardware Virtualization Rootkits Dino A. Dai Zovi

Cloud Sure - Virtual Machines

StACC: St Andrews Cloud Computing Co laboratory. A Performance Comparison of Clouds. Amazon EC2 and Ubuntu Enterprise Cloud

WHITE PAPER Optimizing Virtual Platform Disk Performance

Regional SEE-GRID-SCI Training for Site Administrators Institute of Physics Belgrade March 5-6, 2009

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014

Database Virtualization

Exploiting The Latest KVM Features For Optimized Virtualized Enterprise Storage Performance

A Virtual Storage Environment for SSDs and HDDs in Xen Hypervisor

Real-time KVM from the ground up

Cloud Operating Systems for Servers

Maximizing SQL Server Virtualization Performance

Transcription:

Xen and XenServer Storage Performance Low Latency Virtualisation Challenges Dr Felipe Franciosi XenServer Engineering Performance Team e-mail: felipe.franciosi@citrix.com freenode: felipef #xen-api twitter: @franciozzy

Agenda Where do Xen and XenServer stand? When is the virtualisation overhead most noticeable? Current implementation: blkfront, blkback, blktap2+tapdisk, blktap3, qemu-qdisk Measurements over different back ends Throughput and latency analysis Latency breakdown: where are we losing time when virtualising? Proposals for improving 2

Where do Xen and XenServer Stand? When is the virtualisation overhead noticeable?

Where do Xen and XenServer stand? What kind of throughput can I get from dom0 to my device? Using 1 MiB reads, this host reports 118 MB/s from dom0 4

Where do Xen and XenServer stand? What kind of throughput can I get from domu to my device? Using 1 MiB reads, this host reports 117 MB/s from a VM IMPERCEPTIBLE virtualisation overhead 5

Where do Xen and XenServer stand? That s not always the case... Same test on different hardware (from dom0) my disks can do 700 MB/s!!!! 6 Confidential - Do Not Distribute

Where do Xen and XenServer stand? That s not always the case... Same test on different hardware (from domu) VISIBLE virtualisation overhead why is my VM only doing 300 MB/s??? 7 Confidential - Do Not Distribute

Current Implementation How we virtualise storage with Xen

Current Implementation (bare metal) How does that compare to storage performance again? There are different ways a user application can do storage I/O We will use simple read() and write() libc wrappers as examples BD device driver block layer 1. char buf[4096]; HW Interrupt on completion 2. int fd = open( /dev/sda, O_RDONLY O_DIRECT); sys_read(fd, buf, 4096) libc vfs_read() f_op->read()** kernel space user space 3. read(fd, buf, 4096); 4. buf now has the data! user process buf fd 9

Current Implementation (Xen) The Upstream Xen use case The virtual device in the guest is implemented by blkfront Blkfront connects to blkback, which handles the I/O in dom0 dom0 domu BD device driver block layer blkback VDI device blkfront driver block layer xen s blkif protocol syscall / etc() kernel space user space libc kernel space user space user process buf fd 10

Current Implementation (Xen) The XenServer 6.2.0 use case XenServer provides thin provisioning, snapshot, clones, etc. hello VHD This is easily implemented in user space. hello TAPDISK dom0 domu BD device driver block layer tap blktap2 block layer blkback VDI blkfront block layer data stored in VHD aio syscalls xen s blkif protocol syscall / etc() kernel space user space libaio libc kernel space user space tapdisk2 user process buf fd 11

Current Implementation (Xen) The blktap3 and qemu-qdisk use case Have the entire back end in user space dom0 domu BD device driver block layer VDI blkfront block layer data stored in VHD kernel space user space aio syscalls libaio tapdisk3 / tapdisk2 qemu-qdisk gntdev evtchn dev xen s blkif protocol syscall / etc() libc user process kernel space user space buf fd 12

Measurements Over Different Back Ends Throughput and latency analysis

Measurement Over Different Back Ends Same host, different RAID0 logical volumes on a PERC H700 14 All have 64 KiB stripes, adaptive read-ahead and write-back cache enabled

dom0 had: 4 vcpus pinned 4 GB of RAM It becomes visible that certain back ends cope much better with larger block sizes. This controller supports up to 128 KiB per request. Above that, the Linux block layer splits the requests. 15

Seagate ST (SAS) blkback is slower, but it catches up with big enough requests. 16

Seagate ST (SAS) User space back ends are so slow they never catch up, even with bigger requests. This is not always true: if the disks were slower, they would catch up. 17

Intel DC S3700 (SSD) When the disks are really fast, none of the technologies catch up. 18

Measurement Over Different Back Ends There is another way to look at the data: 1 Throughput (data/time) = Latency (time/data) 19

Intel DC S3700 (SSD) The question now is: where is time being spent? Compare time spent: dom0 blkback qdisk 20

Measurement Over Different Back Ends Inserted trace points using RDTSC TSC is consistent across cores and domains 6 dom0 domu 3 BD device driver block layer blkback VDI device blkfront driver block layer kernel space user space 1. Just before issuing read() 2. On SyS_read() 3. On blkfront s do_blkif_request() 4. Just before notify_remote_via_irq() 5. On blkback s xen_blkif_be_int() 6. On blkback s xen_blkif_schedule() 7. Just before blk_finish_plug() 8 7 5 9 8. On end_block_io_op() 9. Just before notify_remove_via_irq() 10. On blkif_interrupt() 11. Just before blk_end_request_all() 12. Just after returning from read() 10 4 1 12 11 syscall / etc() libc user process kernel space user space buf fd 2 21

Measurement Over Different Back Ends Initial tests showed there is a warm up time Simply inserting printk()s affect the hot path Used a trace buffer instead In the kernel, trace_printk() In user space, hacked up buffer and a signal handler to dump its contents Run 100 read requests One immediately after the other (requests were sequential) Used IO Depth = 1 (only one in flight request at a time) Sorted the times, removed the 10 (10%) fastest and slowest runs Repeated the experiment 10 times and averaged the results 22

Measurements on 3.11.0 without Persistent Grants Time spent on device Actual overhead of mapping and unmapping and transferring data back to user space at the end 23

Measurements on 3.11.0 with Persistent Grants Time spent on device Time spent copying data out of persistently granted memory 24

Measurement Over Different Back Ends The penalty of copying can be worth taking depending on other factors: Number of dom0 vcpus (TLB flushes) Number of concurrent VMs performing IO Ideally, blkfront should support both data paths (contention on grant tables) Administrators can profile their workloads and decide what to provide 25

Proposals for Improving What else can we do to minimise the overhead?

New Ideas How can we reduce the processing required to virtualise I/O? 27

New Ideas Persistent Grants Issue: grant mapping (and unmapping) is expensive Concept: back end keeps grants, front end copies I/O data to those pages Status: currently implemented in 3.11.0 and already supported in qemu-qdisk Indirect I/O Issue: blkif protocol limit requests to 11 segs of 4 KiB per I/O ring Concept: use segments as indirect mapping of other segments Status: currently implemented in 3.11.0 28

New Ideas Avoid TLB flushes altogether (Malcolm Crossley) Issue: When unmapping, we need to flush the TLB on all cores But in the storage data path, the back end doesn t really access the data Concept: check whether granted pages have been accessed Status: early prototypes already developed 29 Only grant map pages on the fault handler (David Vrabel) Issue: Mapping/unmapping is expensive But in the storage data path, the back end doesn t really access the data Concept: Have a fault handler for the mapping, triggered only when needed Status: idea yet being conceived

New Ideas Split Rings Issue: blkif protocol supports 32 slots per ring for requests or responses this limits the total amount of outstanding requests (unless multi-page rings are used) this makes inefficient use of CPU caching (both domains writing to the same page) Concept: use one ring for requests and another for responses Status: early prototypes already developed Multi-queue Approach Issue: Linux s block layer does not scale well across NUMA nodes Concept: allow device drivers to register a request queue per core Status: scheduled for Kernel 3.12 30

e-mail: felipe.franciosi@citrix.com freenode: felipef #xen-api twitter: @franciozzy