OFA Training Programs




OFA Training Programs: Programming and Fabric Admin
Author: Rupert Dance
Date: 03/26/2012
www.openfabrics.org

Agenda
- OFA Training Programs
- Writing Applications for RDMA using OFA Software
- Program goals and instructors
- UNH-IOL facilities and cluster equipment
- Programming course format
- Course requirements and syllabus
- RDMA benefits
- Programming course examples
- Fabric Administration
- Future courses
- Course availability

OFA Training Program: Overall Goals
- Provide application developers with classroom instruction and hands-on experience writing, compiling and executing an application using the OFED verbs API
- Illustrate how RDMA programming differs from sockets programming and provide the rationale for using RDMA
- Focus on the OFED API, RDMA concepts and common design patterns
- Offer the opportunity to develop applications on the OFA cluster at the University of New Hampshire, which includes the latest hardware from Chelsio, DDN, Intel, Mellanox, NetApp and QLogic

Instructors
- Dr. Robert D. Russell: Professor in the CS Department at UNH
  Dr. Russell has been an esteemed member of the University of New Hampshire faculty for more than 30 years and has worked with the InterOperability Laboratory's iSCSI consortium, iWARP consortium and the OpenFabrics Interoperability Logo Program.
- Paul Grun: Chief Scientist for System Fabric Works
  Paul has worked for more than 30 years on server I/O architecture and design, ranging from large-scale disk storage subsystems to high-performance networks. He served as chair of the IBTA's Technical Working Group, contributed to many IBTA specifications and chaired the working group responsible for creating the RoCE specification.
- Rupert Dance: Co-Chair of the OFA Interoperability Working Group
  Rupert helped to form and has led both the IBTA Compliance and Interoperability and OFA Interoperability programs since their inception. His company, Software Forge, worked with the OFA to create and provide the OFA Training Program.

UNH Interoperability Lab Resources
- 32,000 square feet
- 100+ staff and students
- IB, iWARP and Windows clusters

Programming Course Format
- Part One: Introduction to RDMA
  - I/O architecture and RDMA architecture
  - Address translation and network operations
  - Verbs introduction and the OFED stack
  - Introduction to wire protocols
  - 138 slides
- Part Two: Programming with RDMA
  - Hardware resources: HCAs, RNICs, etc.
  - Protection Domains and Memory Registration keys
  - Connection management
  - Explicit queue manipulation and event handling
  - Explicit asynchronous operation
  - Explicit manipulation of system data structures
  - 425 slides and 20 complete client/server program examples

Programming Course Requirements
- Requirements
  - Knowledge of C programming, including concepts such as structures, memory management, pointers, threads and asynchronous programming
  - Knowledge of Linux, since this course does not include Windows programming
- Helpful
  - Knowledge of event handlers
  - Knowledge of sockets or network programming
  - Familiarity with storage protocols

Programming Course Syllabus
- Introduction to OFA architecture
  - Verbs and the verbs API
  - A network perspective
  - RDMA operations: SEND/RECEIVE, RDMA READ and WRITE
  - RDMA services
  - Isolation and protection mechanisms
  - A brief overview of InfiniBand management
  - A quick introduction to the OFED stack
  - Host perspective
  - Asynchronous processing
  - Channel vs. RDMA semantics
- Basic data structures
  - Connection Manager IDs
  - Connection Manager Events
  - Queue Pairs
  - Completion Queues
  - Completion Channels
  - Protection Domains
  - Memory Registration Keys
  - Work Requests
  - Work Completions
- Connection management basics
  - Establishing connections using RDMACM
  - RDMACM API
- Basic RDMA programming
  - Memory registration
  - Object creation
  - Posting requests
  - Polling
  - Waiting for completions using events
  - Common practices for implementing blocking wait
- Design patterns
  - Send-receive
  - RDMA cyclic buffers
  - Rendezvous
- Advanced topics
  - Work Request chaining
  - Multicast
  - Unsignaled completions
- RDMA ecosystems
  - Native InfiniBand
  - iWARP
  - RoCE

Programming Course Sample
- OFA software benefits
- Conventional I/O versus RDMA I/O
- Address translation and transport independence
- Description of the verbs and data structures
- Preparation for posting a send operation
  - Create the work request
  - Gathering data from memory
  - Putting gathered elements on the wire
- Multicast concept
- The Big Picture

Programming Course: The Big Picture

OFA Software Benefits
- Remote Direct Memory Access provides
  - Low latency: stack bypass and copy avoidance
  - Kernel bypass: reduces CPU utilization
  - Reduced memory bandwidth bottlenecks
  - High bandwidth utilization
- Cross-platform support
  - InfiniBand
  - iWARP
  - RoCE

Conventional I/O versus RDMA I/O
- Conventional I/O
  - OS involved in all operations
- RDMA I/O
  - Channel interface runs in user space
  - No need to access the kernel

Address Translation
- Sockets based
- Channel based

Many apps, one interface, three wires

OFED: the whole picture
[Diagram of the OpenFabrics stack: apps and access methods for using the OF stack, with kernel bypass paths from user space to the hardware. Color key: Common, InfiniBand, iWARP.]
- Application level: IP based app access, sockets based access, various MPIs, block storage access, clustered DB access, access to file systems, diag tools, OpenSM
- User APIs (user space): user level MAD API, UDAPL, SDP lib, OpenFabrics user level verbs / API
- Upper layer protocols (kernel space): IPoIB, SDP, SRP, iSER, RDS, VNIC, NFS-RDMA RPC, cluster file systems
- Mid-layer: SA client, MAD, SMA, connection manager, Connection Manager Abstraction (CMA), OpenFabrics kernel level verbs / API
- Provider: hardware specific drivers
- Hardware: InfiniBand HCA, iWARP R-NIC
Legend:
  SA     Subnet Administrator
  MAD    Management Datagram
  SMA    Subnet Manager Agent
  PMA    Performance Manager Agent
  IPoIB  IP over InfiniBand
  SDP    Sockets Direct Protocol
  SRP    SCSI RDMA Protocol (Initiator)
  iSER   iSCSI RDMA Protocol (Initiator)
  RDS    Reliable Datagram Service
  VNIC   Virtual NIC
  UDAPL  User Direct Access Programming Lib
  HCA    Host Channel Adapter
  R-NIC  RDMA NIC
2011 OpenFabrics Alliance, Inc. (08/23/2011)

Programming Course: OFED Verbs

Programming Course: Data Structures

Bottom-up client setup phase
- rdma_create_id() - create struct rdma_cm_id identifier
- rdma_resolve_addr() - bind struct rdma_cm_id to local device
- rdma_resolve_route() - resolve route to remote server
- ibv_alloc_pd() - create struct ibv_pd protection domain
- ibv_create_cq() - create struct ibv_cq completion queue
- rdma_create_qp() - create struct ibv_qp queue pair
- ibv_reg_mr() - create struct ibv_mr memory region
- rdma_connect() - create connection to remote server

Creating Scatter Gather Elements

Protection Domains and Memory Regions

Gather during ibv_post_send()

Send Work Request (SWR)
- Purpose: tell the network adapter what data to send
- Data structure: struct ibv_send_wr
- Fields visible to programmer:
    next        pointer to next SWR in linked list
    wr_id       user-defined identification of this SWR
    sg_list     array of scatter-gather elements (SGE)
    opcode      IBV_WR_SEND
    num_sge     number of elements in sg_list array
    send_flags  IBV_SEND_SIGNALED
- Programmer must fill in these fields before calling ibv_post_send()
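The fields listed above can be sketched in C as follows. This is a minimal illustration, not libibverbs code: the `sketch_` types are local stand-ins that mirror only the listed fields of struct ibv_send_wr and struct ibv_sge, so the example compiles without the OFED headers. The gather function is a simplified model of what the adapter does with sg_list during a send; it is an assumption for illustration, not the real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct ibv_sge (hypothetical local type, not libibverbs). */
struct sketch_sge {
    uint64_t addr;    /* start address of one registered buffer segment */
    uint32_t length;  /* number of bytes in the segment */
    uint32_t lkey;    /* local key of the memory region covering it */
};

enum sketch_opcode { SKETCH_WR_SEND = 1 };  /* stands in for IBV_WR_SEND */
enum { SKETCH_SEND_SIGNALED = 1 << 0 };     /* stands in for IBV_SEND_SIGNALED */

/* Stand-in for the programmer-visible fields of struct ibv_send_wr. */
struct sketch_send_wr {
    struct sketch_send_wr *next;   /* next SWR in linked list, or NULL */
    uint64_t wr_id;                /* user-defined identification */
    struct sketch_sge *sg_list;    /* array of scatter-gather elements */
    int num_sge;                   /* number of elements in sg_list */
    enum sketch_opcode opcode;
    unsigned int send_flags;
};

/* Fill in every listed field for a single signaled SEND. */
void sketch_fill_send_wr(struct sketch_send_wr *wr, uint64_t wr_id,
                         struct sketch_sge *sg_list, int num_sge)
{
    assert(wr != NULL && sg_list != NULL && num_sge > 0);
    memset(wr, 0, sizeof(*wr));
    wr->next = NULL;
    wr->wr_id = wr_id;
    wr->sg_list = sg_list;
    wr->num_sge = num_sge;
    wr->opcode = SKETCH_WR_SEND;
    wr->send_flags = SKETCH_SEND_SIGNALED;
}

/* Model of the gather step: copy each segment, in sg_list order, into one
 * contiguous message buffer. Returns the message length in bytes, or 0 if
 * the gathered data would not fit in `cap` bytes. */
size_t sketch_gather(const struct sketch_send_wr *wr,
                     unsigned char *out, size_t cap)
{
    size_t used = 0;
    for (int i = 0; i < wr->num_sge; i++) {
        const struct sketch_sge *sge = &wr->sg_list[i];
        if (sge->length > cap - used)
            return 0;
        memcpy(out + used, (const void *)(uintptr_t)sge->addr, sge->length);
        used += sge->length;
    }
    return used;
}
```

With two elements pointing at a 4-byte header and a 7-byte payload, the gather model produces one 11-byte message, which is the point of a scatter-gather list: the application does not have to copy its pieces into a single buffer first.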

Posting to send data
- Verb: ibv_post_send()
- Parameters:
  - Queue Pair (QP)
  - Pointer to linked list of Send Work Requests (SWR)
  - Pointer to bad SWR in list in case of error
- Return value:
  - == 0: all SWRs successfully added to send queue (SQ)
  - != 0: error code
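The calling convention above (a linked list in, a "bad SWR" pointer out, zero or an error code returned) can be modeled with a small self-contained sketch. Everything prefixed `sketch_` is hypothetical and stands in for the real verbs types; the fixed-depth send queue and the choice of ENOMEM for a full queue are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types, not libibverbs: just enough to model the convention. */
struct sketch_send_wr {
    struct sketch_send_wr *next;  /* next SWR in the linked list, or NULL */
    uint64_t wr_id;               /* user-defined identification */
};

struct sketch_qp {
    int sq_depth;  /* hypothetical send queue capacity */
    int sq_used;   /* entries currently on the send queue */
};

/* Model of the post-send contract: walk the list and add each SWR to the
 * send queue. On the first failure, stop, point *bad_wr at the SWR that
 * could not be queued, and return a nonzero error code. */
int sketch_post_send(struct sketch_qp *qp, struct sketch_send_wr *wr,
                     struct sketch_send_wr **bad_wr)
{
    assert(qp != NULL && bad_wr != NULL);
    *bad_wr = NULL;
    for (; wr != NULL; wr = wr->next) {
        if (qp->sq_used >= qp->sq_depth) {
            *bad_wr = wr;   /* this SWR and all after it were not queued */
            return ENOMEM;
        }
        qp->sq_used++;
    }
    return 0;
}

/* Caller-side pattern: count how many SWRs of a chain were accepted,
 * that is, those that come before bad_wr in the list. */
int sketch_count_posted(const struct sketch_send_wr *head,
                        const struct sketch_send_wr *bad_wr)
{
    int n = 0;
    for (; head != NULL && head != bad_wr; head = head->next)
        n++;
    return n;
}
```

The design point is that a nonzero return does not mean nothing was posted: SWRs ahead of the bad one are already on the send queue, so the caller should resume or clean up from the bad SWR onward rather than reposting the whole chain.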

Bottom-up client break-down phase
- rdma_disconnect() - destroy connection to remote server
- ibv_dereg_mr() - destroy struct ibv_mr memory region
- rdma_destroy_qp() - destroy struct ibv_qp queue pair
- ibv_destroy_cq() - destroy struct ibv_cq completion queue
- ibv_dealloc_pd() - deallocate struct ibv_pd protection domain
- rdma_destroy_id() - destroy struct rdma_cm_id identifier
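The break-down list is the setup list in reverse, skipping the two resolve steps, which create nothing that needs destroying. A minimal sketch of that pairing follows. It records only the function names from the slides as strings and calls none of them; the table and helper are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One setup step and the call that undoes it (NULL if nothing to undo). */
struct sketch_step {
    const char *setup;
    const char *teardown;
};

/* Client setup steps in the order given on the setup slide. */
static const struct sketch_step sketch_client_steps[] = {
    { "rdma_create_id",     "rdma_destroy_id" },
    { "rdma_resolve_addr",  NULL },
    { "rdma_resolve_route", NULL },
    { "ibv_alloc_pd",       "ibv_dealloc_pd" },
    { "ibv_create_cq",      "ibv_destroy_cq" },
    { "rdma_create_qp",     "rdma_destroy_qp" },
    { "ibv_reg_mr",         "ibv_dereg_mr" },
    { "rdma_connect",       "rdma_disconnect" },
};

#define SKETCH_NUM_STEPS \
    (sizeof(sketch_client_steps) / sizeof(sketch_client_steps[0]))

/* Write the teardown calls needed after the first `completed` setup steps
 * succeeded, most recently created object first. Returns how many names
 * were written to `out` (at most `cap`). */
size_t sketch_teardown_order(size_t completed, const char **out, size_t cap)
{
    assert(completed <= SKETCH_NUM_STEPS && out != NULL);
    size_t n = 0;
    for (size_t i = completed; i > 0 && n < cap; i--) {
        const char *undo = sketch_client_steps[i - 1].teardown;
        if (undo != NULL)
            out[n++] = undo;
    }
    return n;
}
```

Passing the full step count reproduces the break-down list above. Passing a smaller count gives the unwind sequence for an error part-way through setup, for example only ibv_dealloc_pd and rdma_destroy_id if ibv_create_cq fails.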

Multicast concept
2011 OpenFabrics Alliance, Inc. (6/13/2011)

Multicast
- Optional to implement in IB CAs and switches
- Uses Unreliable Datagram (UD) mode
  - Only Send/Recv operations allowed
  - Both sides must actively participate in data transfers
- Receiver must have RECV posted for next SEND
- Receiver must process each RECV completion
- Only possible with IB, not iWARP

OFA Training Programs: RDMA Fabric Administration

IB, iWARP and RoCE
- IB, RoCE and iWARP all present the same APIs
  - Above RDMA-CM there is little or no difference
- Below RDMA-CM, two distinct management paradigms
  - The IB management paradigm
  - The Ethernet/IP management paradigm (iWARP, RoCE)
  - Below RDMA-CM there is no commonality
- Recommendation: prepare separate modules
  - IB Fabric Admin
  - iWARP Fabric Admin
  - RoCE
- Benefit is the ability to deliver targeted information to the end user

Delivery Venue
- This course is intended to be delivered in a dedicated facility (e.g. UNH-IOL)
  - It can also be delivered in a customer classroom, or remotely
  - Face-to-face delivery is preferred
- For lab purposes:
  - Access to a dedicated (non-production) cluster is required
  - Students can SSH into a remote cluster (e.g. UNH-IOL)
  - There is no specific requirement for an on-site cluster to support the class
- Course scheduling should be coordinated with UNH-IOL activities

IB Fabric Administration using the OFED Mid-Layer Syllabus

A System Admin's Guide to IB (4 hrs)
1. Introduction to InfiniBand for System Administrators (1 hour)
   - IB goals and concepts (leverage material from RDMA Programming Course)
   - IB as a layer 2 fabric
   - RoCE fabrics, iWARP
   - The role of IB management in the overall RDMA architecture
   - Introduction to OFED architecture and ULPs (e.g. IPoIB)
2. IB physical components (30 min)
   - HCAs/TCAs, switches, routers, gateways, cables
3. IB Management Concepts (2.5 hours)
   - Centralized management architecture
   - Manager/agent communication
   - Fabric management concept
   - Connection Management concept
   - Traffic engineering features (partitioning, VLs, QoS)
   - IB multicast compared to Ethernet

Planning and Installation (4 hrs)
1. Planning an IB Installation (1 hr)
   - Introduction to topologies
   - Deadlocks and deadlock avoidance
   - Traffic engineering considerations: SLs, VLs, partitioning, QoS
2. Installing IB (3 hrs)
   - Combination lab / lecture
   - OFED installation: Linux and Windows
     - OFED compile tips and tricks
     - Use pre-compiled binaries
   - Hardware installation, cabling, BKMs
     - Optical versus copper cables, connector types, distances
   - Verifying basic connectivity and operation, BKMs
     - Initial power-on and debugging

IB Management Architecture (4 hrs)
OFED mid-layer topics
1. Fabric Management (1.5 hrs)
   - SMPs, GMPs
   - End-to-end path concepts: MTU, PMTU
   - Fabric discovery and configuration (directed routed, routed discovery, VL15)
   - Fabric partitioning, P_Keys, Q_Keys
   - QoS considerations
2. Identifiers, Addresses and Address Resolution (1.5 hrs)
   - Global addresses: GUIDs, GIDs, IP addresses
   - Local addresses: LIDs, MLIDs
   - Address resolution
3. Managers and agents (30 min)
   - Introduction to SM, SMA, SA: discuss OpenSM, introduce 3rd party managers
   - Interacting with IB managers: SA query, Path Records, PM etc.
4. ULPs and connection management (30 min)
   - IPoIB, RDMA-CM, IB-CM
5. Multicast management (30 min)
   - Establishing/joining/leaving multicast groups
   - IB multicast and the IB SM

Operating an IB Fabric (4 hrs)
1. Configuring the fabric using IB management methods (30 min)
   - Working with SMs, SAs
   - Switch-based versus host-based SMs
   - Discovering SMs, resolving conflicts
2. OFED Tools (30 min)
   - List of IB tools
   - Fabric discovery tools, connectivity tools, debugging tools
   - The Top 10 tools: practical advice on using the tools
   - Getting more information
     - Man pages
     - IB specifications
3. Interacting with the system, system tools (3 hrs)
   - Verifying correct installation and basic operation
   - Interacting with the system via OFED software
   - OFED tools (ib_rdma_bw etc.)

Advanced Topics (not included in course)
1. Performance analysis
   - Tools
   - Metrics
2. Traffic engineering
   - TC
   - SL
   - VL
3. Advanced topologies
4. Storage applications and management
   - SRP, iSER, NFS/RDMA, etc.
5. Bridging technologies
   - EoIB, FCoIB
6. IB over the WAN
7. MPI

Managing an IP Fabric for iWARP Syllabus

Fabric Admin's Guide to iWARP
1. Introduction to iWARP for Network Admins
   - Contrasting and comparing iWARP with IB and RoCE fabrics
   - Introduction to OFED architecture and ULPs
2. iWARP physical components
   - Ethernet switches and routers
   - Ethernet cables
3. Software
   - RNIC Verbs
   - MPI APIs
     - Intel MPI, HP-MPI, Scali MPI, MVAPICH2
   - Non-MPI APIs
     - AMQP
     - Voltaire VMA, NYSE Data Fabric
     - uDAPL v1.2, v1.3

iWARP Planning and Installation
1. Planning an iWARP Installation
   - iWARP considerations in an Ethernet networking environment
   - Ethernet network topologies
   - Ethernet switch features and configuration
2. Installing iWARP (3 hrs)
   - Combination lab / lecture
   - BIOS and system settings
   - OFED installation: Linux and Windows
     - OFED compile tips and tricks
     - Use pre-compiled binaries
   - Hardware installation, cabling, BKMs
     - Optical vs. copper cables, connector types, distances
     - Initial power-on and debugging
   - Verifying basic connectivity and operation, BKMs
3. Switch Management
   - Enable Flow Control, Transmit and Receive
   - Enable DCBx and Priority Flow Control (RoCE)
   - Configure default VLAN 0

Future OFA Software Training Courses
- Advanced programming topics
- Kernel level programming
- ULP training
  - MPI
  - RDS
  - SRP
- Other

OFA Training Course Availability
- Writing Applications for RDMA using OFA Software
  - Next course: May 22-23, 2012, OFA Programming Course
  - The course can be presented at your company location
    - Europe and Asia supported in addition to USA
    - Minimum of 8 attendees required
  - The course is available via webinar
    - Four half-day course available for developers in Asia to avoid travel expense
    - Minimum of 8 attendees required
- Fabric Admin
  - Projected availability: Q3 2012
  - Available quarterly at the UNH-IOL
- For more information contact: rsdance@soft-forge.com