Multi-Path RDMA. Elad Raz Mellanox Technologies





Agenda
- Motivation
- Introducing Multi-Path RDMA
- Design
- Status and initial results
- Next steps
- Conclusions
March 15-18, 2015 #OFADevWorkshop

MP-RDMA Motivation (1)
- Failovers and high-availability support
- Bandwidth aggregation
- L3 datacenter support
[Figure: Host A and Host B, each with Dev A and Dev B, attached to TOR A and TOR B on subnets 10.2.0.0/16 and 20.1.0.0/16; an X marks one of the paths.]

MP-RDMA Motivation (2)
- Transparent migration
[Figure labels: VM, MP-RDMA, SoftRoCE, IB VF driver, Eth paravirt, Hypervisor (x2), IB HCA, NIC (x2).]

What is Multi-Path RDMA?
[Figure: two hosts joined by a session-level connection through Router A and Router B. On each host the stack is labeled Application, HAL and services, MP-RDMA Verbs, underlying Verbs (ib_dev1, roce_dev2), HW 1, HW 2.]

MP-RDMA Operation
Participants: App, rdmacm, mp-dev, phys-dev1, phys-dev2
Primary flow:
- Connect
- Open (virt) resources
- CM protocol (primary flow) + MP-capability exchange
- Open (phys) resources
- Data path (virt) operations
- Get MP info
- Data path (phys) operations
- Traffic
Sub-flow:
- Connect
- Open (phys) resources
- CM protocol (sub-flow)
- Data path (virt) operations
- Data path (phys) operations
- Traffic

Connection Migration
[Figure: a Send Queue and a Receive Queue, each annotated with three pointers: virtual consumer, physical producer, virtual producer.]
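One possible reading of the three queue pointers on this slide is sketched below. It is a toy model, not the mp-rdma driver: the class `VirtualWorkQueue`, its method names, and the idea that migration replays everything between the virtual consumer and the physical producer are all assumptions made for illustration. A physical QP is modeled as a plain Python list.

```python
class VirtualWorkQueue:
    """Hypothetical virtual send queue; a physical QP is modeled as a plain list."""

    def __init__(self):
        self.wqes = []           # work requests in the order the application posted them
        self.virt_consumer = 0   # index of the oldest WQE not yet completed
        self.phys_producer = 0   # index of the next WQE to hand to the physical QP

    @property
    def virt_producer(self):
        # Index one past the newest WQE the application has posted.
        return len(self.wqes)

    def post(self, wr):
        self.wqes.append(wr)

    def flush_to(self, phys_qp):
        # Hand every not-yet-posted WQE to the physical QP.
        while self.phys_producer < self.virt_producer:
            phys_qp.append(self.wqes[self.phys_producer])
            self.phys_producer += 1

    def complete(self, count=1):
        # Completions can only cover WQEs that reached the hardware.
        self.virt_consumer = min(self.virt_consumer + count, self.phys_producer)

    def migrate(self, new_phys_qp):
        # Assumed behavior: rewind to the oldest uncompleted WQE and replay from there.
        self.phys_producer = self.virt_consumer
        self.flush_to(new_phys_qp)


vwq = VirtualWorkQueue()
for wr in ("send-a", "send-b", "send-c"):
    vwq.post(wr)
old_qp, new_qp = [], []
vwq.flush_to(old_qp)
vwq.complete(1)          # only "send-a" completed before the failure
vwq.migrate(new_qp)
print(new_qp)            # ['send-b', 'send-c']
```

In this model a replayed request may already have reached the peer before the failure, so a real implementation would also need some way to detect duplicates; the sketch does not attempt that.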

MP-RDMA Comparison
Technologies compared: Automatic Path Migration, RoCE-LAG, MP-RDMA
Criteria:
- Multi-port failover
- Bandwidth aggregation
- Application agnostic
- L3 session
- Multi-device failover
- Migration support

MP-RDMA and MP-TCP
- MP messages. MP-TCP: TCP options. MP-RDMA: CMA MADs (private data).
- Address management. MP-TCP: add/remove addresses. MP-RDMA: add/remove CMA address.
- Flow management. MP-TCP: add/remove TCP sub-flows. MP-RDMA: establish/migrate/teardown QPs.
- Communication endpoint. MP-TCP: TCP socket. MP-RDMA: MP device.
- Data sequencing. MP-TCP: byte stream divided between sub-flows. MP-RDMA: flow + session based sequencing; QP and actual HW WQEs (performance).
- Sub-flow address combinations. MP-TCP: any IP interface to any peer IP interface, subject to middleboxes (e.g., firewalls, NAT). MP-RDMA: any addressing to any peer addressing, subject to the same technology (IB, RoCE, iWARP).

MP-RDMA Design
- User/kernel mp-rdma driver
  - Device instance hosting MP-capable resources
  - Implements resource virtualization and connection failover
  - Uses underlying physical devices transparently
- CM/CMA support
  - MP capability negotiation
[Figure labels: Application, Verbs, Kernel ULP, mp_dev, CM MP mgr, CMA MP mgr, cma_dev1, cma_dev2, cm_dev2 (x2), ib_core, ib_dev1, ib_dev2, HW1, HW2.]

Policies
- Active backup
- Load balancing
- Efficiency
[Figure: example placements of QPs, an SRQ, an MR and a CQ on Dev 0 and Dev 1 under each policy.]
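The slide names the policies without defining their selection rules. As a rough illustration only, the sketch below shows how an active-backup choice and a load-balancing choice could differ. The function names, the `is_up` and `qp_count` dictionaries, and the least-loaded rule are assumptions, not details from the talk.

```python
def active_backup(devices, is_up):
    """Return the first device that is up; later devices act only as backups."""
    for dev in devices:
        if is_up[dev]:
            return dev
    raise RuntimeError("no device available")


def load_balance(devices, is_up, qp_count):
    """Return the up device that currently hosts the fewest QPs (assumed rule)."""
    candidates = [dev for dev in devices if is_up[dev]]
    if not candidates:
        raise RuntimeError("no device available")
    # min() keeps the earliest device on ties, so the choice is deterministic.
    return min(candidates, key=lambda dev: qp_count[dev])


devices = ["dev0", "dev1"]
is_up = {"dev0": True, "dev1": True}
qp_count = {"dev0": 2, "dev1": 1}

print(active_backup(devices, is_up))           # dev0
print(load_balance(devices, is_up, qp_count))  # dev1

is_up["dev0"] = False                          # simulate a failure of the primary
print(active_backup(devices, is_up))           # dev1
```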

Virtual Namespaces
[Figure: Application 1 uses MP-RDMA instance (1), which exposes virtual resources vpd1, vmr1, vcq1, vcq2, vcq3, vqp1, vqp2, vqp3, vqp4 and vsrq1. These map onto physical resources on Dev 0 and Dev 1: QP82, SRQ13, QP61, MR16, CQ42, MR13, QP33, QP98, CQ22, CQ23.]

Resource Creation
- Virtual resources: vqp1
- Policy selection: Dev 0, port 1
- Resource creation: PD12, QP42, CQ2
- Post-send (MR): MR 2
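The flow on this slide (a virtual handle, a policy decision, then a physical resource) can be pictured as a small mapping table. The sketch below is a minimal model under stated assumptions: `PhysDevice`, `VirtualNamespace`, the handle naming scheme, and the `select_device` callback are hypothetical and are not taken from the mp-rdma driver.

```python
import itertools
from collections import defaultdict


class PhysDevice:
    """Stand-in for a physical RDMA device that hands out numbered handles."""

    def __init__(self, name):
        self.name = name
        self._ids = itertools.count(1)

    def create(self, kind):
        # e.g. "QP" -> "QP1"; numbering is shared across kinds for simplicity.
        return f"{kind}{next(self._ids)}"


class VirtualNamespace:
    """Hypothetical per-application table of virtual -> physical resources."""

    def __init__(self, devices, select_device):
        self.devices = devices
        self.select_device = select_device          # policy hook: devices -> one device
        self._vids = defaultdict(lambda: itertools.count(1))
        self.mapping = {}                            # virtual handle -> (device, handle)

    def create(self, kind):
        vhandle = f"v{kind.lower()}{next(self._vids[kind])}"
        dev = self.select_device(self.devices)       # policy selection
        self.mapping[vhandle] = (dev.name, dev.create(kind))  # resource creation
        return vhandle

    def resolve(self, vhandle):
        return self.mapping[vhandle]


ns = VirtualNamespace([PhysDevice("dev0"), PhysDevice("dev1")],
                      select_device=lambda devs: devs[0])
vpd = ns.create("PD")
vqp = ns.create("QP")
print(vpd, ns.resolve(vpd))   # vpd1 ('dev0', 'PD1')
print(vqp, ns.resolve(vqp))   # vqp1 ('dev0', 'QP2')
```

The application only ever sees the virtual handles, so the entry behind a handle can be re-pointed at another device without the application noticing, which is the property failover and migration rely on.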

Data Path
- Translate MRs: ibv_post_send, ibv_post_recv
- Translate QPs: ibv_poll_cq
- Monitor WQs: PSN, Completed
[Figure labels: Application 1, MP-RDMA (1), vqp1, vcq1, vpd1, vwq (tx), vwq (rx), QP1, CQ1, Dev 0.]
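The two translation steps named on this slide can be illustrated with a pair of lookup tables: virtual-to-physical on the post path, physical-to-virtual on the poll path. This is a pure-Python sketch that does not call libibverbs; `DataPathTranslator`, its method names, and the tuple and dictionary layouts are assumptions chosen for the example.

```python
class DataPathTranslator:
    """Hypothetical translation tables for one MP-RDMA device instance."""

    def __init__(self):
        self.qp_v2p = {}   # virtual QP  -> physical QP number
        self.qp_p2v = {}   # physical QP number -> virtual QP
        self.mr_v2p = {}   # virtual lkey -> physical lkey

    def bind_qp(self, vqp, pqp):
        self.qp_v2p[vqp] = pqp
        self.qp_p2v[pqp] = vqp

    def bind_mr(self, vlkey, plkey):
        self.mr_v2p[vlkey] = plkey

    def post_send(self, vqp, sges):
        """Translate (addr, length, virtual lkey) entries before posting.

        Returns the physical QP number and the scatter/gather list rewritten
        with physical lkeys; a real driver would pass these to the hardware.
        """
        phys_sges = [(addr, length, self.mr_v2p[vlkey])
                     for addr, length, vlkey in sges]
        return self.qp_v2p[vqp], phys_sges

    def poll_cq(self, phys_completions):
        """Rewrite the QP number in each completion back to the virtual QP."""
        return [{**wc, "qp_num": self.qp_p2v[wc["qp_num"]]}
                for wc in phys_completions]


xlat = DataPathTranslator()
xlat.bind_qp("vqp1", 42)
xlat.bind_mr("vmr1", 0x2000)

print(xlat.post_send("vqp1", [(0x1000, 64, "vmr1")]))
# (42, [(4096, 64, 8192)])
print(xlat.poll_cq([{"wr_id": 7, "qp_num": 42}]))
# [{'wr_id': 7, 'qp_num': 'vqp1'}]
```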

Status and Initial Results
- User-space driver progressing nicely
  - Resource management
  - Connection management
  - Failover
  - Data path for RC send-receive operations
- Encouraging initial results
[Figure: two plots comparing Native and MP-RDMA: latency vs. message size (usec, lower is better) and bandwidth vs. message size (higher is better).]

Next Steps
- Kernel MP-RDMA driver and connection model
- Dynamic device removal notifications
- RDMA and Atomic support
- Datagram and Multicast support
- Consider future standardization
  - IBTA CM extensions
  - RFC
- Open-source the code

Conclusions
- MP-RDMA solves multiple requirements
  - Multi-device failover
  - Transparent BW aggregation
  - Transparent migration
  - Multi-homed hosts
- Modeled over MP-TCP
- Promising initial results

Thank You #OFADevWorkshop