Accelerating From Cluster to Cloud: Overview of RDMA on Windows HPC. Wenhao Wu Program Manager Windows HPC team




Agenda: Microsoft's Commitments to HPC; RDMA for HPC Server; RDMA for Storage in Windows 8; Microsoft Forum in HPC China.

Microsoft's Commitments to HPC: 2004, beginning of the HPC journey; 2006, V1; 2008, HPC Server 2008; 2010, HPC Server 2008 R2; October 2010, SP1 and SP2.

Performance and Scale: Windows Matches Linux Performance on LSTC LS-DYNA. [Chart: wall time (lower is better) versus number of CPU cores, from 8 to 512, for Windows and Linux.] Reference: the dataset is car2car, a public benchmark; LSTC LS-DYNA data to be posted at http://www.topcrunch.org. LS-DYNA version mpp971.r3.2.1. A similar hardware configuration was used for both the Windows HPC Server 2008 and Linux runs. Windows HPC: 2.66 GHz Intel Xeon X5550, 24 GB DDR3 memory per node, QDR InfiniBand. Linux: 2.8 GHz Intel Xeon X5560, 18 GB DDR3 memory per node, QDR InfiniBand. Courtesy of LSTC.

Windows HPC captures the #3 and #5 spots on the Little Green500. The Little Green500 strives to raise awareness of energy efficiency in supercomputing. The smaller TSUBAME 2.0 running Windows improves its efficiency from 958.35 MFLOPS/Watt on the Green500 to 1,031.92 on the Little Green500, and CASPUR running Windows becomes the first European system on the Little Green500.

Little Green500 List - November 2010*
Rank | System Description | Vendor | OS | MFLOPS/Watt | Total (kW)
1 | NNSA/SC Blue Gene/Q Prototype | IBM | Linux | 1684.20 | 39
2 | GRAPE-DR accelerator Cluster | NAOJ | Linux | 1448.03 | 25
3 | TSUBAME 2.0 - HP ProLiant GP/GPU | NEC/HP | Windows | 1031.92 | 26
4 | EcoG | NCSA | Linux | 1031.92 | 37
5 | CASPUR-Jazz Cluster GP/GPU | Clustervision/HP | Windows | 933.06 | 26

* Source: http://www.green500.org/lists/2010/11/little/list.php

Parallel application patterns on HPC Server:
- Embarrassingly parallel: Parametric Sweep jobs. CLI**: job submit /parametric:1-1000:5 MyApp.exe /MyDataset=* (search "hpc job scheduler" at http://www.microsoft.com/downloads)
- Message passing: MPI jobs. CLI**: job submit /numcores:64 mpiexec MyApp.exe (search "hpc mpi"; a minimal example program follows this list)
- Interactive applications: Service-Oriented Architecture (SOA) jobs, a .NET call to a Windows Communication Foundation (WCF) endpoint (search "hpc soa")
- Big data: LINQ to HPC (L2H) jobs, where the HPC application calls the L2H .NET APIs (search "hpc dryad")
** CLI = HPC Server 2008 command-line interface
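For the message-passing pattern, the executable handed to mpiexec is an ordinary MPI program. The following is a minimal sketch using only standard MPI calls, so it builds against MS-MPI; the name MyApp.exe in the CLI line above is just a placeholder for a binary like this one.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */
    MPI_Get_processor_name(host, &len);      /* compute-node name */

    printf("rank %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}

Compiled to MyApp.exe and staged where the compute nodes can reach it, this program is launched across 64 cores exactly as in the CLI example above.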

Kernel bypass with NetworkDirect: a verbs-based design for native, high-performance networking interfaces. Socket-based applications go through Windows Sockets (Winsock + WSD) and the kernel TCP/IP stack over an NDIS miniport, which is also where embarrassingly parallel workloads on TCP/Ethernet run; MPI applications use MS-MPI, which talks RDMA through a user-mode NetworkDirect provider and hardware driver, matching hardware-optimized MPI stacks. NetworkDirect drivers exist for the key high-performance fabrics: InfiniBand and 10 Gigabit Ethernet (iWARP-enabled). [Diagram: user-mode and kernel-mode layers, with ISV applications, HPCS 2008 components, OS components and IHV components called out.]
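Because MS-MPI sits directly on the user-mode NetworkDirect provider, the kernel-bypass benefit shows up most clearly in small-message latency. Below is a minimal sketch of an MPI ping-pong between ranks 0 and 1; the 8-byte message size and the iteration count are arbitrary illustrative choices, not values from the slides.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int iters = 1000;
    char buf[8] = {0};                      /* small message to expose latency */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("average one-way latency: %.2f us\n",
               (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}

Submitted once on an InfiniBand or iWARP fabric and once over plain TCP/Ethernet (for example, job submit /numcores:2 mpiexec PingPong.exe, where PingPong.exe is a placeholder name), the two runs make the difference between the paths visible.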

Scaling the network traffic: what is left to solve? CPU utilization and throughput. With large I/Os, throughput is already good but CPU utilization at high data rates is a concern; with small I/Os, both CPU utilization and network throughput suffer at high data rates. The solution is Remote Direct Memory Access (RDMA): a secure way to let a DMA engine transfer buffers directly between application memory and the network adapter.
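What makes RDMA "a secure way to enable a DMA engine to transfer buffers" is explicit memory registration: the application pins a buffer with the adapter and receives keys that every RDMA operation must present. The slides show no NetworkDirect code, so the sketch below uses the OpenFabrics libibverbs API purely to illustrate that registration step; on Windows the equivalent operations go through the NetworkDirect provider.

/* Minimal sketch: register a buffer for RDMA access (libibverbs, illustrative). */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA device found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* first adapter */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                /* protection domain */

    size_t len = 1 << 20;                                  /* 1 MiB example buffer */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* The keys are what make the access secure: the adapter only honors
       DMA operations that present a valid key for a registered region. */
    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}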

Windows 8 RDMA architecture and kernel-bypass data paths. In user mode, managed applications use System.Net sockets; socket-based applications use Windows Sockets (WinSock Direct is deprecated); high-performance applications use the Registered I/O (RIO) Winsock extension APIs for a lightweight registered-I/O path; MPI applications use MS-MPI; and high-performance RDMA applications use the user-mode NetworkDirect provider. In kernel mode, AFD.sys and TCPIP.sys sit above NDIS.sys and the miniport hardware driver; SMB Direct and a kernel-mode RDMA access layer reach the adapter through RDMA NDIS extensions, with a kernel-mode proxy driver supporting NDKPI at its top edge. [Diagram legend: ISV applications, HPCS 2008 components, existing and new OS components, existing and new IHV components, and kernel-mode RDMA components.]
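Of the new user-mode paths, Registered I/O (RIO) is the one an ordinary Winsock application can adopt directly: buffers and queues are registered up front so that individual sends and receives avoid most per-call kernel overhead. The sketch below shows the setup sequence on a UDP socket, assuming Windows 8 / Server 2012 and the documented RIO extension functions; the queue sizes and port number are arbitrary example values.

#include <winsock2.h>
#include <ws2tcpip.h>
#include <mswsock.h>
#include <stdio.h>

#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    /* The socket must be created with the registered-I/O flag. */
    SOCKET s = WSASocket(AF_INET, SOCK_DGRAM, IPPROTO_UDP,
                         NULL, 0, WSA_FLAG_REGISTERED_IO);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5150);                 /* arbitrary example port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    /* Fetch the RIO extension function table. */
    RIO_EXTENSION_FUNCTION_TABLE rio = {0};
    GUID id = WSAID_MULTIPLE_RIO;
    DWORD bytes = 0;
    WSAIoctl(s, SIO_GET_MULTIPLE_EXTENSION_FUNCTION_POINTERS,
             &id, sizeof(id), &rio, sizeof(rio), &bytes, NULL, NULL);

    /* Register a buffer once; later operations need no per-call locking. */
    static char data[64 * 1024];
    RIO_BUFFERID bufId = rio.RIORegisterBuffer(data, sizeof(data));

    /* Completion queue polled by the application (no notification object),
       and a request queue bound to the socket that shares it. */
    RIO_CQ cq = rio.RIOCreateCompletionQueue(128, NULL);
    RIO_RQ rq = rio.RIOCreateRequestQueue(s, 16, 1, 16, 1, cq, cq, NULL);

    /* Post one receive described by (buffer id, offset, length). */
    RIO_BUF slice = { bufId, 0, sizeof(data) };
    rio.RIOReceive(rq, &slice, 1, 0, NULL);

    /* Poll for completions; each RIORESULT reports status and byte count. */
    RIORESULT results[4];
    ULONG n = rio.RIODequeueCompletion(cq, results, 4);
    printf("dequeued %lu completion(s)\n", n);

    rio.RIODeregisterBuffer(bufId);
    rio.RIOCloseCompletionQueue(cq);
    closesocket(s);
    WSACleanup();
    return 0;
}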

SMB 2.2 over RDMA is used by the File Server and Cluster Shared Volumes. It gives applications scalable, fast and efficient storage access, with minimal CPU utilization for I/O and high throughput at low latency. It requires a network with RDMA support: InfiniBand, iWARP (RDMA on top of TCP over 10G Ethernet), or RoCE (RDMA over Ethernet). [Diagram: an application over the SMB 2.2 client in user and kernel mode, connected through R-NICs across the RDMA-capable network to the SMB 2.2 server, which is backed by NTFS, SCSI and disk.]

SMB Direct (SMB 2.2) over RDMA reaches 3100 MBytes/sec. [Chart: bandwidth and CPU utilization.]

Welcome to the Microsoft Forum, October 28th, afternoon session.

Time | Topic | Presenter | Title
2-2:45 pm | HPC Excel Service: a practical guide to high-performance computing with Excel workbooks | 王辉 | Test Manager
3-3:45 pm | The Windows Azure cloud computing platform and HPC application migration | 敖翔 | Program Manager
4-4:45 pm | Big data solutions based on LINQ to HPC | 苏骏 | Senior Development Manager
5-5:45 pm | An automotive virtual-design HPC platform built on a Windows HPC Server cluster architecture | 张鲲鹏 博士 | R&D Manager, Shanghai Automotive Industry Corporation