A general-purpose virtualization service for HPC on cloud computing: an application to GPUs



Similar documents
The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

CUDA in the Cloud Enabling HPC Workloads in OpenStack With special thanks to Andrew Younge (Indiana Univ.) and Massimo Bernaschi (IAC-CNR)

Full and Para Virtualization

Cloud Computing through Virtualization and HPC technologies

GPU Accelerated Signal Processing in OpenStack. John Paul Walters. Computer Scien5st, USC Informa5on Sciences Ins5tute

12. Introduction to Virtual Machines

Enabling Technologies for Distributed and Cloud Computing

Enabling Technologies for Distributed Computing

Virtual Computing and VMWare. Module 4

High Performance Computing in CST STUDIO SUITE

COS 318: Operating Systems. Virtual Machine Monitors

COM 444 Cloud Computing

Virtualization for Cloud Computing

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

HPC performance applications on Virtual Clusters

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE

International Journal of Computer & Organization Trends Volume20 Number1 May 2015

Manjrasoft Market Oriented Cloud Computing Platform

NVIDIA GRID OVERVIEW SERVER POWERED BY NVIDIA GRID. WHY GPUs FOR VIRTUAL DESKTOPS AND APPLICATIONS? WHAT IS A VIRTUAL DESKTOP?

Virtualization. Types of Interfaces

RCL: Software Prototype

Virtual Switching Without a Hypervisor for a More Secure Cloud

StACC: St Andrews Cloud Computing Co laboratory. A Performance Comparison of Clouds. Amazon EC2 and Ubuntu Enterprise Cloud

PLANNING FOR DENSITY AND PERFORMANCE IN VDI WITH NVIDIA GRID JASON SOUTHERN SENIOR SOLUTIONS ARCHITECT FOR NVIDIA GRID

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE

Enhancing Hypervisor and Cloud Solutions Using Embedded Linux Iisko Lappalainen MontaVista

KVM, OpenStack, and the Open Cloud

Scientific Computing Data Management Visions

Virtual Machine Management with OpenNebula in the RESERVOIR project

Virtual Machine Monitors. Dr. Marc E. Fiuczynski Research Scholar Princeton University

Manjrasoft Market Oriented Cloud Computing Platform

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

Next Generation Operating Systems

REMOTE HIGH FIDELITY VISUALIZATION. May 2015 Jeremy Main, Sr. Solution Architect GRID

Trends in High-Performance Computing for Power Grid Applications

Determining Overhead, Variance & Isola>on Metrics in Virtualiza>on for IaaS Cloud

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

Intel Graphics Virtualization Technology Update. Zhi Wang,

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

KVM, OpenStack, and the Open Cloud

Module I-7410 Advanced Linux FS-11 Part1: Virtualization with KVM

Week Overview. Installing Linux Linux on your Desktop Virtualization Basic Linux system administration

An Introduction to Virtualization and Cloud Technologies to Support Grid Computing

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform

Virtualization: Know your options on Ubuntu. Nick Barcet. Ubuntu Server Product Manager

Uses for Virtual Machines. Virtual Machines. There are several uses for virtual machines:

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

ARM TrustZone and KVM Coexistence with RTOS For Automotive

Virtual Machines.

Cloud Computing CS

2) Xen Hypervisor 3) UEC

Virtualization with Windows

Virtualization of ArcGIS Pro. An Esri White Paper December 2015

Parallel Programming Survey

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

GRID SDK FAQ. PG January Frequently Asked Questions

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014

Computing Service Provision in P2P Clouds

IOS110. Virtualization 5/27/2014 1

SR-IOV In High Performance Computing

Virtualization. Pradipta De

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Networking for Caribbean Development

Dheeraj K. Rathore 1, Dr. Vibhakar Pathak 2

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

Stream Processing on GPUs Using Distributed Multimedia Middleware

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

S Hands on Tutorial: Deploying GRID in Citrix and VMware Virtual Desktop Environments

Basics of Virtualisation

Intro to Virtualization

Cloud and Virtualization to Support Grid Infrastructures

Chapter 5 Cloud Resource Virtualization

Distributed and Cloud Computing

GUEST OPERATING SYSTEM BASED PERFORMANCE COMPARISON OF VMWARE AND XEN HYPERVISOR

Introduction to Virtualization & KVM

RPM Brotherhood: KVM VIRTUALIZATION TECHNOLOGY

SUSE Linux Enterprise 10 SP2: Virtualization Technology Support

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

Vocera Voice 4.3 and 4.4 Server Sizing Matrix

2972 Linux Options and Best Practices for Scaleup Virtualization

High-performance vnic framework for hypervisor-based NFV with userspace vswitch Yoshihiro Nakajima, Hitoshi Masutani, Hirokazu Takahashi NTT Labs.

Parallel Firewalls on General-Purpose Graphics Processing Units

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

EXPLORING LINUX KERNEL: THE EASY WAY!

Rackspace Cloud Databases and Container-based Virtualization

Cloud Operating Systems for Servers

Achieving Performance Isolation with Lightweight Co-Kernels

Introduction to GPU hardware and to CUDA

KVM: A Hypervisor for All Seasons. Avi Kivity avi@qumranet.com

Virtualization. Dr. Yingwu Zhu

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

PNY Professional Solutions NVIDIA GRID - GPU Acceleration for the Cloud

Transcription:

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs R.Montella, G.Coviello, G.Giunta* G. Laccetti #, F. Isaila, J. Garcia Blas *Department of Applied Science University of Napoli Parthenope Department of Mathematics and Applications University of Napoli Federico II Department of Computer Science University of Madrid Carlos III

Outline Introduction and contextualization GVirtuS: Generic Virtualization Service GPU virtualization High performance cloud computing Who uses GVirtuS Conclusions and ongoing projects

Introduction and contextualization High Performance Computing Grid computing Many core technology GPGPUs Virtualization Cloud computing

High Performance CloudComputing Hardware: High performance computing cluster Multicore / Multi processor computing nodes GPGPUs Software: Linux Virtualization hypervisor Private cloud management software +Special ingredients

GVirtuS Generic Virtualization Service Framework for split-driver based abstraction components Plug-in architecture Independent form Hypervisor Communication Target of virtualization High performance: Enabiling transparent virtualization Wth overall performances not too far from un-virtualized machines

Split-Driver approach Split-Driver Hardware access by priviledged domain. Unpriviledged domains access the device using a frontend/backhend approach Unpriviledged Domain Appliction Wrap library Frontend driver Communicator Frontend (FE): Guest-side software component. Stub: redirect requests to the backend. Backend (BE): Mange device requests. Device multiplexing. Priviledged Domain Backend driver Interface library Device driver 6 Device Requests

GVirtuS approach GVirtuS Frontend Dyinamic loadable library Same application binary interface Run on guest user space Unpriviledged Domain Appliction Wrap library Frontend driver Communicator GVirtuS Backend Server application Run in host user space Concurrent requests Priviledged Domain Backend driver Interface library Device driver 7 Device Requests

The Communicator Provides a high performance communication between virtual machines and their hosts. The choice of the hypervisor deeply affects the efficiency of the communication. Hypervisor FE/BE comm Notes No hypervisor Unix Sockets Used for testing purposes Generic TCP/IP Used for communication testing purposes, but interesting Xen XenLoop runs directly on the top of the hardware through a custom Linux kernel provides a communication library between guest and host machines implements low latency and wide bandwidth TCP/IP and UDP connections application transparent and offers an automatic discovery of thesupported VMS VMware Virtual Machine Communication Interface (VMCI) commercial hypervisor running at the application level. provides a datagram API to exchange small messages a shared memory API to share data, an access control API to control which resources a virtual machine can access and a discovery service for publishing and retrieving resources. KVM/QEMU VMchannel Linux loadable kernel module now embedded as a standard component supplies a high performance guest/host communication based on a shared memory approach.

An application: Virtualizing GPUs GPUs Hypervisor independent Communicator independent GPU independent

Currently full threadsafe support to CUDA drivers CUDA runtime OpenCL Partially supporting (it is needed more work) OpenGL integration GVirtuS -CUDA

GVirtuS libcudard.so Virtual Machine #include <stdio.h> #include <cuda.h> int main(void) { int n; cudagetdevicecount(&n); printf("number of CUDA GPU(s): %d\n", n); return 0; } GVirtuS Frontend Real Machine GVirtuS Backend Process Handler cudaerror_t cudagetdevicecount(int *count) { Frontend *f = Frontend::GetFrontend(); f->addhostpointerforarguments(count); f->execute("cudagetdevicecount"); if(f->success()) *count = *(f->getoutputhostpointer<int>()); return f->getexitcode(); } Result *handlegetdevicecount( CudaRtHandler * pthis, Buffer *input_buffer) { int *count = input_buffer->assign<int>(); cudaerror_t exit_code; exit_code = cudagetdevicecount(count); Buffer *out = new Buffer(); out->add(count); return new Result(exit_code, out); } 11

Choices and Motivations vmsocket We focused on VMware and KVM hypervisors. vmsocketis the component we have designed to obtain a high performance communicator vmsocketexposes Unix Sockets on virtual machine instances thanks to a QEMU device connected to the virtual PCI bus.

vmsocket: virtual PCI device Programming interface: Unix Socket Communication between guest and host: Virtual PCI interface QEMU has been modified FE/BE GPU interaction based high efficiency: performance computing applications usually require massive data transfer between host (CPU) memory and device (GPU) memory there is no mapping between guest memory and device memory the memory device pointers are never de-referenced on the host side CUDA kernels are executed on the BE where the pointers are fully consistent.

Performance Evaluation CUDA Workstation Genesis GE-i940 Tesla i7-940 2,93 133 GHz fsb, Quad Core hyperthreaded 8 Mb cache CPU and 12Gb RAM. 1 nvidia Quadro FX5800 4Gb RAM video card 2 nvidia Tesla C1060 4 Gb RAM The testing system: Ubuntu 10.04 Linux nvidia CUDA Driver, and the SDK/Toolkit version 4.0. VMware vs. KVM/QEMU (using different communicators).

GVirtuS-CUDA runtime performances Results: 0: No virtualization, no accleration (blank) 1: Acceleration without virtualization (target) 2,3: Virtualization with no acceleration 4...6: GPU acceleraion, Tcp/Ip communication Similar performances due to communication overhead 7: GPU acceleration using GVirtuS, Unix Socket based communication 8,9: GVirtuS virtualization Good performances, no so far from the target 4..6 better performances than 0 Evaluation: CUDA SDK benchmarks # Hypervisor Comm. Histogram matrixmul scalarprod 0 Host CPU 100.00% 100.00% 100.00% 1 Host GPU 9.50% 9.24% 8.37% 2 Kvm CPU 105.57% 99.48% 106.75% 3 VM-Ware CPU 103.63% 105.34% 106.58% 4 Host Tcp/Ip 67.07% 52.73% 40.87% 5 Kvm Tcp/Ip 67.54% 50.43% 42.95% 6 VM-Ware Tcp/Ip 67.73% 50.37% 41.54% 7 Host AfUnix 11.72% 16.73% 9.09% 8 Kvm vmsocket 15.23% 31.21% 10.33% 9 VM-Ware vmcl 28.38% 42.63% 18.03% Computing times as Host-CPU rate

Distributed GPUs Hilights: VMSocket Using thetcp/ip Communicator FE/BE could be on different machines. Real machines can access remote GPUs. Applications: CUDA Application Frontend Guest VM Linux Backend CUDA Runtime Hypervisor CUDA Application Frontend Guest VM Linux Host OS Linux CUDA Application Frontend UNIX Socket Inter node load balanced Inter node among VMs Inter node tcp/ip? Security? Compression? Driver GPU for embedded systems as network machines node02 node01 High Performance Cloud Computing Backend CUDA Runtime Host OS Linux Driver

High Performance Cloud Computing Ad hockperformance test for benchmarking Virtual cluster on local computing cloud matrixmul MPI+GPU Benchmark: Matrix-matrix multiplication 2 parallelims levels: distributed memory and GPU Results: Virtual nodes with just CPUs Better performances with virtual nodes GPUs equipped 2 nodes with GPUs perform better than 8 nodes without virtual acceleration.

GVirtuS in the world GPU support to OpenStack cloud software Heterogeneous cloud computing John Paul Walters et Al. University of Southern California / Information Sciences Institute http://wiki.openstack.org/heterogeneousgpuacceleratorsupport HPC at NEC Labs of America in Princeton, University of Missouri Columbia, Ohio State University Columbus Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework Awarded as the best paper at HPDC2011

Download, taste, contribute! http://osl.uniparthenope.it/projects/gvirtus GPL/LGPL License

Conclusions The GVirtuS generic virtualization and sharing system enables thin Linux based virtual machines to use hosted devices as nvidia GPUs. The GVirtuS-CUDA stack permits to accelerate virtual machines with a small impact on overall performance respect to a pure host/gpu setup. GVirtuS can be easily extended to other enabled devices as high performance network devices

Ongoing projects Elastic Virtual Parallel File System MPI wrap for High Performance Cloud Computing XEN Support (is a big challenge!) Download Try & Contribute!