Performance of RDMA-Capable Storage Protocols on Wide-Area Network

Transcription:

Performance of RDMA-Capable Storage Protocols on Wide-Area Network
Weikuan Yu, Nageswara S.V. Rao, Pete Wyckoff*, Jeffrey S. Vetter
*Ohio Supercomputer Center
Managed by UT-Battelle

InfiniBand Clusters around the World
SGI (US), Ranger (US), CEA (France), Tsubame (Japan), Dawning (China), EKA (India)

The Problem of Computing Islands
- Islands of InfiniBand (IB) clusters
  - More IB clusters are being deployed
  - Some are already connected, e.g. through TeraGrid, but only via TCP/IP protocols
- Data transfer across these islands
  - Needs ever-greater data movement capabilities: GridFTP, BBCP, or other special storage configurations
- TCP performance over long distances can be low
  - With 10GigE on UltraScience Net (no tuning): 9.2 Gbps at 0.2 miles, 8.2 Gbps at 1400 miles, 2.3-2.5 Gbps at 6600+ miles

RDMA (IB) in Clusters and Local Area Networks
- Sub-microsecond latency
- Superb bandwidth (32 Gbps with IB QDR)
- Heavily used for clustering
- Getting popular in storage environments:
  - NFS over RDMA (NFSoRDMA)
  - SCSI RDMA Protocol (SRP)
  - iSCSI over RDMA (iSER)
[Figure: two hosts, each with the stack Applications / MPI, NFS, iSER, SRP / Verbs / InfiniBand HCA, connected at ~1 µs latency]

Sample Performance of RDMA-based Storage
- RDMA enables good iSCSI bandwidth within the LAN
- Nearly doubled the performance of iSCSI

Feasibility of RDMA (IB) on WAN
- Long-range extensions for InfiniBand are available
  - Network Equipment Technologies (NET): NX5010
  - Obsidian Research: Longbow
- Long latency (10^4-10^5 µs), yet high bandwidth is feasible
- Good distance scalability and tolerance to interfering traffic
- Good network throughput and MPI-level performance
- Can RDMA provide a good transport protocol for storage on the WAN?
[Figure: the same two-host stack, Applications / MPI, NFS, iSER, SRP / Verbs / InfiniBand HCA, now separated by a WAN with 10^4-10^5 µs latency]

Experimental Environment
- Hardware
  - Long-range IB extension devices from NET (Network Equipment Technologies, Inc.)
  - Mellanox PCI-Express 4x DDR HCAs (InfiniHost III and ConnectX)
- Software packages
  - OFED-1.3 from openfabrics.org
  - Linux 2.6.25 with NFSoRDMA and iSER support
- Measured: performance of RDMA-based storage protocols on the WAN
  - NFS over RDMA
  - iSCSI over RDMA

UltraScience Net at ORNL
- Experimental WAN network: Oak Ridge, Atlanta, Chicago, Seattle, and Sunnyvale
- OC-192 backbone connections
- 4300 miles one way, 8600 miles loop-back (see the estimate below)
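
Those distances alone account for most of the 10^4-10^5 µs latency quoted on the previous slides. A minimal back-of-the-envelope sketch, assuming signals propagate at roughly two-thirds of c in fiber (the constants are illustrative, not measurements from the testbed):

    /* Rough propagation-delay estimate for the UltraScience Net path. */
    #include <stdio.h>

    int main(void)
    {
        double miles_one_way = 4300.0;                 /* Oak Ridge to Sunnyvale        */
        double km_one_way    = miles_one_way * 1.609;  /* ~6900 km                      */
        double v_fiber_km_s  = 200000.0;               /* assumed ~2/3 c in glass       */

        double one_way_ms = km_one_way / v_fiber_km_s * 1e3;
        /* Prints roughly 34.6 ms one way, 69.2 ms loop-back: consistent with
         * the 10^4-10^5 us latencies of the long-range IB extensions. */
        printf("one-way: %.1f ms, loop-back: %.1f ms\n", one_way_ms, 2.0 * one_way_ms);
        return 0;
    }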

RDMA-based Transport
- Requests and replies become pure control messages, and have to travel the long distance over the WAN
- RDMA read (a round-trip operation) is used for clients to write data (see the sketch after this slide)
- Possible additional control messages in NFSoRDMA for long arguments
- Further fragmentation due to the use of page-based operations
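
To make that round trip concrete, below is a minimal verbs sketch of the server-side RDMA READ that pulls client data during a file write. It assumes an already established RC queue pair, a registered local buffer, and an address/rkey advertised by the client; it illustrates the operation only, and is not code from the NFSoRDMA or iSER implementations.

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    /* Post one RDMA READ that pulls 'len' bytes from the client's advertised
     * buffer into a locally registered buffer.  The completion arrives only
     * after the data has crossed the WAN and returned: one 10^4-10^5 us
     * round trip per outstanding READ. */
    int post_rdma_read(struct ibv_qp *qp, struct ibv_mr *mr, void *buf,
                       uint32_t len, uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,   /* local destination buffer */
            .length = len,
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr, *bad_wr = NULL;

        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_READ;  /* server pulls the client's data */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;
        wr.wr.rdma.remote_addr = remote_addr;       /* advertised by the client */
        wr.wr.rdma.rkey        = rkey;

        return ibv_post_send(qp, &wr, &bad_wr);
    }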

RDMA on WAN
- RDMA has good network-level performance within a short-distance WAN
- High bandwidth at long distance is only possible for large messages
- Low RDMA read performance for page-based messages (4 KB), even at 0.2 miles when using InfiniHost III HCAs (see the estimate below)
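
The 4 KB result follows from the bandwidth-delay product. A minimal sketch, using illustrative numbers (the 8 Gbps usable rate and 100 ms loop-back RTT are assumptions, not measurements from this study):

    #include <stdio.h>

    int main(void)
    {
        double link_gbps = 8.0;     /* assumed usable data rate over the OC-192 path */
        double rtt_s     = 0.1;     /* assumed loop-back round-trip time (~10^5 us)  */
        double msg_bytes = 4096.0;  /* one page-sized RDMA READ                      */

        double bdp_bytes = link_gbps * 1e9 / 8.0 * rtt_s;  /* bandwidth-delay product   */
        double in_flight = bdp_bytes / msg_bytes;          /* READs needed to fill pipe */

        printf("BDP: %.0f MB -> ~%.0f concurrent 4 KB READs to fill the pipe\n",
               bdp_bytes / 1e6, in_flight);
        return 0;
    }

With only a handful of outstanding page-sized reads, the pipe stays nearly empty, which is why large messages, or many more concurrent reads as argued on the Perspectives slide, are needed at distance.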

NFS over RDMA
- NFS over RDMA achieves good bandwidth at short distances
- But significant optimizations are needed for long distances

NFS - Large Block Size
- NFS over IPoIB-CM benefits from a large block size
- NFS over RDMA needs to support large block sizes to better fit the long-distance WAN

NFS over RDMA - Using ConnectX
- Better RDMA read performance in ConnectX improves file-write performance for NFS over RDMA
- Performance at long distance is yet to be determined

iSCSI over RDMA (iSER)
- RDMA enables high-performance iSCSI at short distances
- RDMA shows good promise over long distances, as demonstrated with large messages

Perspectives
- Long-range InfiniBand
  - InfiniBand over SONET is promising
  - Storage protocols are not yet exploiting the bandwidth potential of RDMA at long distance
- RDMA-based storage on the WAN
  - Need to enable large block sizes
  - Need to avoid page-based RDMA operations in NFS; utilize IB FRMR support to avoid small RDMA operations
  - Need to allow more concurrent RDMA read operations

Acknowledgment
- Network Equipment Technologies, Inc.: Andrew DiSilvestre, Rich Erikson, Brad Chalker
- ORNL: Susan Hicks, Philip Roth
- Mellanox