SR-IOV In High Performance Computing


Hoot Thompson & Dan Duffy
NASA Goddard Space Flight Center, Greenbelt, MD 20771
hoot@ptpnow.com | daniel.q.duffy@nasa.gov
www.nccs.nasa.gov

Focus is on the research side of climate study (versus NOAA's operational position). Simulations span multiple time scales:
- Days for weather prediction
- Seasons to years for short-term climate prediction
- Centuries for climate change projection

Examples:
- High-fidelity 3.5 km global simulations of cloud and hurricane predictions
- Comprehensive reanalysis of the last thirty years of weather/climate (MERRA)
- Multi-millennium analysis for the Intergovernmental Panel on Climate Change

Integrated set of supercomputing, visualization, and data management technologies:
- Discover computational cluster: 30K traditional Intel cores plus 64 GPUs, roughly 400 TFlops
- DDR/QDR InfiniBand (IB) backbone
- 1 GbE and 10 GbE management infrastructure
- ~4 PBytes of RAID-based shared parallel file system (GPFS)
- Tape archive of over 20 PBytes

Discover IB/GPFS Architecture

[Architecture diagram] Metadata file systems: IBM/Engenio DS4700 behind a Brocade 48000. Data file systems: Data Direct Networks S2A9500, S2A9550, and S2A9900 units, with 24 DDR IB uplinks to each unit. Two groups of 20 GPFS I/O nodes, each comprising 16 NSD (data) and 4 MDS (metadata) servers. Compute units: Base Unit, 512 Dempsey cores (3.2 GHz); SCU1 and SCU2, 1,024 Woodcrest cores (2.66 GHz) each; SCU3 and SCU4, 3,096 Westmere cores (2.8 GHz) each; SCU5 and SCU6, 4,096 Nehalem cores (2.8 GHz) each; SCU7, 14,400 Westmere cores (2.8 GHz); plus data analysis nodes. In the diagram, each circle represents a 288-port DDR IB switch and the triangle represents a 2-to-1 QDR IB switch fabric.

Nebula - NASA's Cloud

- Open-source (OpenStack) cloud computing project and service
  - Alternative to costly construction of additional data centers
  - Sharing portal for NASA scientists and researchers: large, complex data sets; external partners and the public
- Nebula comprises two components/containers:
  - Nebula West at NASA Ames
  - Nebula East at NASA GSFC
- NCCS team is evaluating Nebula as an adjunct to Discover-hosted science processing
  - Key question: can clouds match the HPC level of capability needed for climate research?
  - Potential obstacle: clouds primarily exist in virtualized space
    - Overhead or loss due to running in a virtual machine (VM) versus bare metal
    - Node-to-node communication is critical: high speed, low latency, RDMA

http://nebula.nasa.gov/

Background And Proposition

Background:
- Discover's performance is tied to its DDR/QDR IB fabric
- Nebula, and clouds in general, are 10 GbE based

Question: can clouds deliver an HPC level of performance?
- Can 10 GbE compete with high-speed, low-latency IB?
- What network performance is lost due to virtualization?
- What computational performance is lost due to virtualization?

Proposition (typical NCCS model):
- Build a test bed to investigate the virtualization technologies
- Work with vendors to answer questions and address issues

Methodology and Objectives

- Compare bare metal against a virtualized NIC:
  - Full software virtualization (SW Virt): device emulation
  - Virtio: split-driver para-virtualization
  - Single Root I/O Virtualization (SR-IOV): direct assignment of a mapped Virtual Function (VF); see the sketch after this list
- Determine the overhead of executing within the VM construct (VM-to-VM communication):
  - Base network
  - Message-passing environment (MVAPICH2)
  - Application: single-node multi-core and multi-node multi-core
- Draw conclusions and comparisons with Discover and Nebula

http://www.intel.com/content/www/us/en/pci-express/pci-sig-sr-iov-primer-sr-iov-technology-paper.html
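As background for the SR-IOV mode above, here is a minimal sketch of how Virtual Functions are carved out of a physical NIC on the host. It assumes a Linux kernel new enough to expose the sriov_numvfs sysfs attribute (3.8+); on the 2.6.38 kernel used in this study, VFs were instead created through the ixgbe driver's max_vfs module parameter. The interface name eth2 is a placeholder.

```python
#!/usr/bin/env python3
"""Enable SR-IOV Virtual Functions (VFs) on an Intel 82599-class NIC.

Illustrative sketch only: assumes root privileges and a kernel that
exposes the sriov_numvfs sysfs attribute.
"""
from pathlib import Path

PF = "eth2"      # placeholder name of the physical function's netdev
NUM_VFS = 4      # how many VFs to carve out of the PF

dev = Path(f"/sys/class/net/{PF}/device")

total = int((dev / "sriov_totalvfs").read_text())
print(f"{PF} supports up to {total} VFs")

# Reset to 0 first: the kernel rejects changing a nonzero VF count directly.
(dev / "sriov_numvfs").write_text("0")
(dev / "sriov_numvfs").write_text(str(min(NUM_VFS, total)))

# Each VF now appears as its own PCI function (virtfn0, virtfn1, ...)
# that can be handed to a guest via device assignment.
for vf in sorted(dev.glob("virtfn*")):
    print(vf.name, "->", vf.resolve().name)
```

Each VF presents itself to a guest as a distinct PCI network device, so the guest driver talks to the hardware directly and the hypervisor stays out of the data path.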

Benchmarks

| Benchmark | Version | Description | Download |
| Nuttcp | nuttcp-7.1.5.c, gcc compiler | Measures raw network bandwidth, similar to netperf | http://lcp.nrl.navy.mil/nuttcp |
| OSU MPI Benchmarks | MVAPICH2 1.7rc1, Intel compiler | Tests latencies and bandwidths of the most common MPI functions | http://mvapich.cse.ohio-state.edu/ |
| Linpack | 10.2.6, Intel compiler | Intel version of Linpack | http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/ |
| NAS PB | 3.3.1, Intel compiler | NAS Parallel Benchmarks; CFD kernel benchmarks | http://www.nas.nasa.gov/resources/software/npb.html |

Testing started from the basic benchmarks to analyze system performance and built up toward the application layer; a scripted nuttcp run is sketched below.
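A minimal sketch of how the repeated nuttcp trials reported later could be scripted and parsed. The receiver address is a placeholder, a nuttcp server (nuttcp -S) is assumed to be listening on the far end, and the parsing assumes nuttcp's usual one-line summary containing the achieved Mbps and, on Linux, a retransmission count.

```python
#!/usr/bin/env python3
"""Run repeated nuttcp trials and collect throughput/retransmissions.

Sketch under assumptions: `nuttcp -S` is already running on RECEIVER,
and each run ends with a summary line such as
'... = 9414.7318 Mbps 12 %TX 30 %RX 0 retrans ...'.
"""
import re
import subprocess

RECEIVER = "10.10.20.2"   # placeholder address of the nuttcp server
TRIALS = 10

results = []
for _ in range(TRIALS):
    out = subprocess.run(["nuttcp", "-T", "10", RECEIVER],
                         capture_output=True, text=True, check=True).stdout
    mbps = float(re.search(r"([\d.]+)\s+Mbps", out).group(1))
    m = re.search(r"(\d+)\s+retrans", out)
    retrans = int(m.group(1)) if m else 0
    results.append((mbps, retrans))
    print(f"{mbps:10.4f} Mbps   {retrans} retrans")

print(f"mean: {sum(r[0] for r in results) / TRIALS:.1f} Mbps")
```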

System Configurations

| Configuration | Bare1 | Bare2 | VM1 | VM2 |
| Processor Type | Intel Nehalem | Intel Nehalem | Intel Nehalem | Intel Nehalem |
| Processor Number | E5520 | E5520 | E5520 | E5520 |
| Processor Speed | 2.27 GHz | 2.27 GHz | 2.27 GHz | 2.27 GHz |
| Cores per Socket | 4 | 4 | 4 | 4 |
| Number of Sockets | 2 | 2 | 2 | 2 |
| Cores per Node | 8 | 8 | 8 | 8 |
| Theoretical Peak | 72.64 GF | 72.64 GF | 72.64 GF | 72.64 GF |
| Main Memory | 48 GB | 48 GB | 16 GB | 16 GB |
| Operating System | Ubuntu 11.04 | Ubuntu 11.04 | Ubuntu 11.04 | Ubuntu 11.04 |
| Kernel | 2.6.38-10.server | 2.6.38-10.server | 2.6.38-10.server | 2.6.38-10.server |
| Hypervisor | KVM | KVM | N/A | N/A |
| Hyperthreading | Off | Off | Off | Off |
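The three NIC modes from the methodology map onto different qemu-kvm device options for guests of this shape (8 vCPUs, 16 GB). A rough sketch follows; the disk image path and the VF's PCI address are placeholders, and where current QEMU uses vfio-pci for device assignment, the 2011-era qemu-kvm used pci-assign.

```python
#!/usr/bin/env python3
"""Illustrate the three guest NIC attachment modes compared in this study
as qemu-kvm command lines. Paths and PCI addresses are placeholders.
"""
BASE = ["qemu-system-x86_64", "-enable-kvm", "-m", "16384", "-smp", "8",
        "-drive", "file=/path/to/guest.img,if=virtio"]

# 1) Full software virtualization: QEMU emulates an e1000 NIC entirely
#    in software; every packet traps into the hypervisor.
sw_virt = BASE + ["-netdev", "tap,id=n0",
                  "-device", "e1000,netdev=n0"]

# 2) Virtio: para-virtualized split driver; the guest cooperates with the
#    host instead of pretending real hardware exists.
virtio = BASE + ["-netdev", "tap,id=n0",
                 "-device", "virtio-net-pci,netdev=n0"]

# 3) SR-IOV: a VF of the 82599EB is assigned directly to the guest,
#    bypassing the host network stack.
sr_iov = BASE + ["-device", "vfio-pci,host=0000:07:10.0"]

for name, argv in [("SW Virt", sw_virt), ("Virtio", virtio), ("SR-IOV", sr_iov)]:
    print(f"{name}:\n  {' '.join(argv)}\n")
```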

Test Configuration

[Test-bed diagram] Two Dell R710 servers on an R&D network, each with dual E5520 (2.27 GHz) processors and 48 GB of memory. Each server carries Intel 1000 (GbE) interfaces (10.10.10.1 / 10.10.10.2) and an Intel 82599EB 10 GbE adapter (10.10.20.1 / 10.10.20.2), with an additional address pair 10.0.10.1 / 10.0.10.2.

Nuttcp Results

Throughput in Mbps; TCP retransmissions for each trial in parentheses.

| Trial | Bare to Bare | VM to VM SW Virt | VM to VM Virtio | VM to VM SR-IOV |
| 1 | 4418.8401 (0) | 137.3301 (0) | 5864.0557 (212) | 9151.5769 (0) |
| 2 | 8028.6459 (0) | 145.6024 (0) | 5678.0625 (0) | 9408.0323 (0) |
| 3 | 9392.7072 (0) | 145.7500 (0) | 5973.2256 (0) | 8714.4063 (34) |
| 4 | 9415.2675 (0) | 138.5963 (0) | 6309.8478 (0) | 9313.8894 (7) |
| 5 | 9341.4362 (733) | 141.8702 (0) | 6223.4034 (7) | 9251.8453 (0) |
| 6 | 9354.0999 (208) | 146.1092 (0) | 6311.3896 (0) | 9193.1103 (0) |
| 7 | 9414.7318 (0) | 146.3042 (0) | 6316.7924 (0) | 9348.2984 (0) |
| 8 | 9414.8207 (0) | 146.4449 (0) | 5955.8176 (0) | 9101.7356 (73) |
| 9 | 9414.9368 (0) | 146.2758 (0) | 5746.2926 (0) | 8958.5032 (16) |
| 10 | 9415.1618 (0) | 146.1043 (0) | 5692.8146 (0) | 9228.5370 (0) |
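To summarize the table, a short script over the throughput numbers above (retransmission counts omitted) compares each mode's mean against the steady-state bare-metal rate, taking trials 3-10 as the baseline once TCP slow start has passed: SR-IOV lands at roughly 98% of that baseline, virtio near 64%, and full software emulation around 1.5%.

```python
#!/usr/bin/env python3
"""Summarize the nuttcp throughput table above (Mbps per trial)."""
from statistics import mean

runs = {
    "Bare to Bare": [4418.8401, 8028.6459, 9392.7072, 9415.2675, 9341.4362,
                     9354.0999, 9414.7318, 9414.8207, 9414.9368, 9415.1618],
    "SW Virt": [137.3301, 145.6024, 145.7500, 138.5963, 141.8702,
                146.1092, 146.3042, 146.4449, 146.2758, 146.1043],
    "Virtio": [5864.0557, 5678.0625, 5973.2256, 6309.8478, 6223.4034,
               6311.3896, 6316.7924, 5955.8176, 5746.2926, 5692.8146],
    "SR-IOV": [9151.5769, 9408.0323, 8714.4063, 9313.8894, 9251.8453,
               9193.1103, 9348.2984, 9101.7356, 8958.5032, 9228.5370],
}

# Trials 3-10 of the bare-metal runs, after TCP slow start, set the baseline.
baseline = mean(runs["Bare to Bare"][2:])
for mode, mbps in runs.items():
    print(f"{mode:>12}: mean {mean(mbps):8.1f} Mbps "
          f"({mean(mbps) / baseline:6.1%} of steady-state bare metal)")
```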

OSU Benchmarks Results - Bandwidth

[Chart] Throughput (MBytes/sec, 0-1000) versus message size (1 byte to 10,000,000 bytes, log scale) for Bare to Bare, VM to VM SR-IOV, and VM to VM Virtio; higher is better.

OSU Benchmarks Results - Latency

[Chart] Latency (microseconds, 0-12,000) versus message size (0-5,000 MBytes) for Bare to Bare, VM to VM SR-IOV, and VM to VM Virtio; lower is better.

OSU Benchmarks Results - Latency (Small)

[Chart] Latency (microseconds, 0-200) versus message size (0-10,000 bytes) for Bare to Bare, VM to VM SR-IOV, and VM to VM Virtio; lower is better.

Linpack Benchmarks Results

[Chart] Percent of peak performance (0-90%) versus problem size N (0-60,000) for Ubuntu1 & Ubuntu2 bare metal, VM to VM SR-IOV, and VM to VM Virtio; higher is better.

Going Forward

Conclusions to date:
- Clear advantages to SR-IOV technology
- Cloud-based HPC is feasible
- Data requires further analysis to understand the Nebula implications

Issues/concerns:
- TCP slow start, variability, and retransmission impact on HPC processing

Additional testing to close the gap:
- More application testing: NAS Parallel and HPCC benchmarks
- Jumbo frames (9000 MTU)
- Bare metal-to-bare metal and VM-to-VM IB
- Different hypervisor: Xen
- Other VM guest types: Red Hat, SUSE
- Multiple VMs running, bandwidth sharing
- Add cloud infrastructure to the test setup: OpenStack, Eucalyptus

Intel Developer Forum, 2/13/2012