LOW-LATENCY, HIGH-BANDWIDTH USE CASES FOR NAHANNI / IVSHMEM



Similar documents
Datacenter Operating Systems

Cloud Operating Systems for Servers

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

Cloud Computing through Virtualization and HPC technologies

Enabling High performance Big Data platform with RDMA

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

Web Technologies: RAMCloud and Fiz. John Ousterhout Stanford University

Enabling Technologies for Distributed and Cloud Computing

Kashif Iqbal - PhD Kashif.iqbal@ichec.ie

KVM Virtualized I/O Performance

Toward a practical HPC Cloud : Performance tuning of a virtualized HPC cluster

Xen Live Migration. Networks and Distributed Systems Seminar, 24 April Matúš Harvan Xen Live Migration 1

Virtual Switching Without a Hypervisor for a More Secure Cloud

PostgreSQL Performance Characteristics on Joyent and Amazon EC2

Performance Analysis of Cloud Computing under the Impact of Botnet Attack

Enabling Technologies for Distributed Computing

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE

Data Centers and Cloud Computing

Marvell DragonFly Virtual Storage Accelerator Performance Benchmarks

SR-IOV: Performance Benefits for Virtualized Interconnects!

Advanced Computer Networks. Network I/O Virtualization

8Gb Fibre Channel Adapter of Choice in Microsoft Hyper-V Environments

Performance Management for Cloud-based Applications STC 2012

Efficient shared memory message passing for inter-vm communications

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Enhancing Hypervisor and Cloud Solutions Using Embedded Linux Iisko Lappalainen MontaVista

Cloud Computing using MapReduce, Hadoop, Spark

StACC: St Andrews Cloud Computing Co laboratory. A Performance Comparison of Clouds. Amazon EC2 and Ubuntu Enterprise Cloud

Azure VM Performance Considerations Running SQL Server

Network performance in virtual infrastructures

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage

Creating Overlay Networks Using Intel Ethernet Converged Network Adapters

Data Centers and Cloud Computing. Data Centers

ECLIPSE Performance Benchmarks and Profiling. January 2009

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing

Virtualization System Vulnerability Discovery Framework. Speaker: Qinghao Tang Title:360 Marvel Team Leader

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

How Linux kernel enables MidoNet s overlay networks for virtualized environments. LinuxTag Berlin, May 2014

How SSDs Fit in Different Data Center Applications

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

Client-aware Cloud Storage

Performance Management for Cloudbased STC 2012

Run-Time Deep Virtual Machine Introspection & Its Applications

Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012

LS DYNA Performance Benchmarks and Profiling. January 2009

Beyond the Hypervisor

Emerging Technology for the Next Decade

A hypervisor approach with real-time support to the MIPS M5150 processor

Benchmarking Cassandra on Violin

ELI: Bare-Metal Performance for I/O Virtualization

Characterizing the Performance of Parallel Applications on Multi-Socket Virtual Machines

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

Can High-Performance Interconnects Benefit Memcached and Hadoop?

High-performance vnic framework for hypervisor-based NFV with userspace vswitch Yoshihiro Nakajima, Hitoshi Masutani, Hirokazu Takahashi NTT Labs.

The Flash-Transformed Financial Data Center. Jean S. Bozman Enterprise Solutions Manager, Enterprise Storage Solutions Corporation August 6, 2014

CS5460: Operating Systems. Lecture: Virtualization 2. Anton Burtsev March, 2013

IOmark- VDI. Nimbus Data Gemini Test Report: VDI a Test Report Date: 6, September

Yahoo! Cloud Serving Benchmark

KVM & Memory Management Updates

KVM Architecture Overview

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

When Does Colocation Become Competitive With The Public Cloud? WHITE PAPER SEPTEMBER 2014

Datacenters and Cloud Computing. Jia Rao Assistant Professor in CS

High Performance Computing OpenStack Options. September 22, 2015

Virtualization and Cloud Computing. The Threat of Covert Channels. Related Work. Zhenyu Wu, Zhang Xu, and Haining Wang 1

When Does Colocation Become Competitive With The Public Cloud?

Praktijkexamen met Project VRC. Virtual Reality Check

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Amazon EC2 XenApp Scalability Analysis

Exploiting The Latest KVM Features For Optimized Virtualized Enterprise Storage Performance

Data Centers and Cloud Computing. Data Centers. MGHPCC Data Center. Inside a Data Center

ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009

Cloud Computing. Adam Barker

VIRTUALIZATION, The next step for online services

Storage I/O Control: Proportional Allocation of Shared Storage Resources

Virtualization Performance on SGI UV 2000 using Red Hat Enterprise Linux 6.3 KVM

RDMA Performance in Virtual Machines using QDR InfiniBand on VMware vsphere 5 R E S E A R C H N O T E

Switching Architectures for Cloud Network Designs

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

SMB Direct for SQL Server and Private Cloud

Rackspace Cloud Databases and Container-based Virtualization

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Transcription:

1 LOW-LATENCY, HIGH-BANDWIDTH USE CASES FOR NAHANNI / IVSHMEM Cam Macdonell, Xiaodi Ke, Adam Wolfe Gordon, Paul Lu University of Alberta paullu@cs.ualberta.ca KVM Forum 2011 August 16, 2011

2 Contents 1. For the QEMU/KVM developer Nahanni/ivshmem included as of QEMU 0.13.0, August 2010 2. VMs for Web services Memcached: up to 29% lower inter-vm latencies on workloads 3. VMs for computational science Order-of-magnitude lower latency and higher bandwidth on MPI microbenchmarks Up to 30% faster on MPI application benchmarks (GAMESS, SPEC MPI2007)

3 Part 1: What is Nahanni / ivshmem? User-level library for data movement OS bypass: No guest or host OS involvement Memcached, DDI: pointer-based structured data MPI: message passing, stream data Nahanni device looks like a graphics card Guest OS driver (for initialization only) No impact on guests that do not load the driver New Nahanni virtual PCI device in QEMU -device ivshmem,shm=<name>,size=1024! Creates POSIX shared memory on host Cam Macdonell s Ph.D.

Host-Guest, Inter-VM Shared Memory 4

5 Host-Guest Use Case Copying a file from host into a guest VM Nahanni is faster than Netcat, SCP-HPN, 9P

6 Part 2: Web Services in the Cloud Many companies use the cloud for their Web servers Reddit and FourSquare are on Amazon EC2 Memcached is a key-value cache for databases, etc. Used by Facebook, Twitter, others Conclusion: Nahanni reduces look-up latency by 29% on a read-mostly Yahoo Cloud Serving Benchmark (YCSB) workload, for co-located VMs M.Sc. thesis and NetDB 11 paper by Adam Wolfe Gordon

7 Nahanni Memcached + YCSB 1 million 1 KB records Each VM ran 12 million operations; 36 million in total Each VM ran 2,000 operations per second

8 Nahanni Memcached + YCSB VM 3 can do look-up using pointers and synchronization 1 million 1 KB records Each VM ran 12 million operations; 36 million in total Each VM ran 2,000 operations per second

9 Nahanni vs. virtio/bridging (br) vs. virtio/vhost (vh) Average Total Read Latency (ms) 0.0 0.5 1.0 1.5 2.0 2.5 1.93 1.93 1.4 0.21 0.6 1.02 0.59 0.64 0.23 1.87 0.12 0.15 1.87 No cache (br) memcached (br) Nahanni (br) No cache (vh) memcached (vh) Nahanni (vh) Cache Type CASSANDRA_READ CACHE_READ CACHE_WRITE 0.95 0.41 0.67 0.13 0.11 0.43 0.42

10 Nahanni vs. virtio/bridging (br) vs. virtio/vhost (vh) Average Total Read Latency (ms) 0.0 0.5 1.0 1.5 2.0 2.5 1.93 1.93 1.4 0.21 0.6 1.02 0.59 0.64 0.23 1.87 0.12 0.15 1.87 No cache (br) memcached (br) Nahanni (br) No cache (vh) memcached (vh) Nahanni (vh) Cache Type CASSANDRA_READ CACHE_READ CACHE_WRITE 0.95 0.41 0.67 0.13 0.11 0.43 0.42 Workload: 29% lower latency

11 Nahanni vs. virtio/bridging (br) vs. virtio/vhost (vh) Average Total Read Latency (ms) 0.0 0.5 1.0 1.5 2.0 2.5 1.93 1.93 1.4 0.21 0.6 1.02 0.59 0.64 0.23 1.87 0.12 0.15 1.87 No cache (br) memcached (br) Nahanni (br) No cache (vh) memcached (vh) Nahanni (vh) Cache Type CASSANDRA_READ CACHE_READ CACHE_WRITE 0.95 0.41 0.67 0.13 0.11 0.43 0.42 Workload: 29% lower latency Read Ops: 73% lower latency

12 Part 3: Computational Science in VMs Your HPC application is likely to have an Message- Passing Interface (MPI) version We developed the MPI-Nahanni user-level library Port of MPICH2-Nemesis For co-located VM instances Conclusion: MPI-Nahanni has order-of-magnitude lower latency and higher bandwidth; applications are up to 30% faster. M.Sc. thesis of Xiaodi Ke, Ph.D. thesis of Cam Macdonell

13 NetPIPE 2-sided Bandwidth (Nahanni vs. vhost) ($!!! (#!!! *+,-./0/112 *+,-30456 *+,-./0/112 *+,-30456 (!!!! =/1@A2@60<*=B57C? )!!! &!!! $!!! #!!!!! $ "# #%& #' (&' (#)' *755/879:2;7<=>675? (* )* &$* %(#*

14 NetPIPE 2-sided Bandwidth (Nahanni vs. vhost)!"((!((( +,-./010223 +,-.41567 +,-.41567 )+!"#' >02AB3A71=+>C68D@ $(( %(( #(( "(( (!!"#$ ()(" " %&"'( ()(# # #&"(! ()($ $ '%"#) ()!% *&"#!!% ()&! %'!"'( ()%( &" +866098:;3<8=>?786@ #)'"#!!%'"!+ %#!"$!)"! ")&* "'% #)$(

15 NetPIPE 2-sided Bandwidth (full results) D/19G2960C*DH5;<F ($!!! (#!!! (!!!! )!!! &!!! $!!! #!!! *+,-./0/112 *+,-30456 *+,-7829:; *+,-54<=;6 *+,-30456.4>*-*+,-.;?;525.4>*-*+,-54<=;6 *+,-7829:; *+,-54<=;6.4>*-*+,-.;?;525.4>*-*+,-54<=;6!! $ "# #%& #' (&' (#)' *;55/:;@A2B;CDE6;5F (* )* &$* %(#*

16 NetPIPE 2-sided Latency (Nahanni vs. vhost) (&! (!! )*+,-./.001 )*+,2/345 )*+,2/345 $&! $!!?.560@=;A46@> "&! "!! %&! %!! &!!! " # $" %"# &%" )644.76891:6;<=564> "' #' $"'

17 NetPIPE 2-sided Latency (full results) F.5:0;DBG4:;E (&! (!! $&! $!! "&! "!! %&! %!! &! )*+,-./.001 )*+,2/345 )*+,67189: )*+,43;<:5-3=),)*+,-:>:414-3=),)*+,43;<:5 )*+,-./.001 )*+,2/345 )*+,67189: )*+,43;<:5-3=),)*+,-:>:414-3=),)*+,43;<:5!! " # $" %"# &%" ):44.9:?@1A:BCD5:4E "' #' $"'

18 SPEC MPI2007 SPEC MPI2007 is an industry benchmark for MPI We ran 9 of 13 applications from the medium input set Part of Cam Macdonell s Ph.D. thesis Conclusions: Performance benefit of MPI-Nahanni grows as the number of processes grows Improvement is proportional to time spent in MPI functions

SPEC MPI2007 (8x8) 19

% MPI time (y) vs. % Improvement (x) 20

% MPI time (y) vs. % Improvement (x) 21

% MPI time (y) vs. % Improvement (x) 22

23 GAMESS: Non-MPI Communications GAMESS Quantum Chemistry Not MPI; DDI ported to Nahanni

24 Concluding Remarks Nahanni / ivshmem is an alternative to other mechanisms (e.g., virtual network, virtio+vhost) for inter-vm, intra-host IPC OS bypass: does not modify or use data movement paths in host or guest VM Supports pointers, non-stream data too Web: low latency for client-server, structured data 29% lower on a YCSB + Nahanni memcached workload Computational science: low latency and high bandwidth for message-passing applications No changes to MPI code. Up to 30% faster on full applications. paullu@cs.ualberta.ca