Advanced Computer Networks. High Performance Networking I

1 Advanced Computer Networks High Performance Networking I Patrick Stuedi Spring Semester Oriana Riva, Department of Computer Science ETH Zürich

2 Outline. Last week: Wireless TCP. Today: High Performance Networking, Part I

3 Appendix: Mobile IPv6. Uses IPv6 routing header

4 Course Overview. Wireless networking technologies: first half of this course. Datacenter networking: second half of this course. Figure annotations: "We are now here"; "covered in basic ETH Operating Systems and Networks course".

5 Overview. High-Performance Computing (HPC): supercomputers; systems designed from scratch; densely packed, with short rack-to-rack cables; expensive; built from custom high-end components; mostly run a single program at a time, e.g., message passing (MPI) applications. Cloud Computing: datacenters are often built from commodity off-the-shelf hardware; may run multiple jobs at the same time; are often multi-tenant (different jobs running in the datacenter have been developed or deployed by different people); often use virtualization (hardware is multiplexed, e.g., multiple virtual machines per host); run cloud workloads, i.e., Internet-based applications such as social networks, maps, and search, and analytics such as MapReduce/Hadoop, Pregel, NoSQL, NewSQL, etc.

6 IBM Blue Gene/P supercomputer

7 Blue Gene: Cabling

8 Blue Gene Supercomputer Overview. Blue Gene is a family of supercomputers from IBM: BlueGene/L (2004), BlueGene/P (2006), BlueGene/Q (2010). Blue Gene/L is a 64K-node, highly integrated supercomputer; many of the components (processor, network, router) are on the same chip. Blue Gene/L: #1 supercomputer as ranked by the Top500 list from November 2004 to June

9 BlueGene: Dense Packaging

10 Design Motivation: Processor Clock Frequency Scaling Ends. Three decades of exponential growth in clock rate (and electrical power!) have ended, yet Moore's Law continues in transistor count. What do we do with all those transistors to keep performance increasing to meet demand? Industry response: multi-core, i.e., double the number of cores every 18 months instead of the clock frequency (and power!). But added transistors can also be used for other functions such as memory/storage controllers, embedded networks, etc. Source: The Landscape of Computer Architecture, John Shalf, NERSC/LBNL, presented at ISC07, Dresden, June 25, 2007.

11 Blue Gene/P System-on-a-Chip Compute Node. Network logic is a fraction of compute ASIC complexity/area. BG/P Node Card

12 Network Topology: Basics. Topologies can be classified into direct networks, where processing nodes are directly attached to the switching fabric, and indirect networks, which separate processing nodes and switching elements. In direct networks, nodes often have very few ports (2, 3, 4, ...); low-port networks are also called low-radix networks. Elements (e.g., switches) in indirect networks often have higher port counts (16, 32, 64, 128, ...); high-port networks are also called high-radix networks.

13 Criteria for choosing a particular network topology. Path length between two nodes: the more hops, the higher the latency and the more congestion in the network. Cost. Bisection bandwidth: the rate at which communication can take place between one half of a cluster and the other; the split typically refers to the worst-case segmentation. Path redundancy: multiple paths between source and destination nodes; affects reliability, bandwidth, etc.

14 Direct Networks: Mesh, Torus, Hypercube. Notation: <k>-ary-<n>-mesh or <k>-ary-<n>-torus, where k is the radix, i.e., the number of elements in each dimension (a different meaning here than in the term "high-radix network"), and n is the number of dimensions. The radix k does not have to be the same in each dimension. Examples: a) 10-ary 1-torus, b) 5-ary 2-torus, c) 3-ary 3-torus.
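
As a small illustration of the notation, the sketch below computes how many nodes each of the three example networks contains. The helper names `k_ary_n` and `node_count` are made up for this example; the only assumption is that a k-ary n-mesh or torus has k nodes along each of its n dimensions.

```python
from math import prod

def k_ary_n(k, n):
    """Per-dimension radices of a k-ary n-mesh or n-torus (same radix in every dimension)."""
    return [k] * n

def node_count(radices):
    """Total number of nodes: the product of the radices of all dimensions."""
    return prod(radices)

examples = {
    "10-ary 1-torus": k_ary_n(10, 1),
    "5-ary 2-torus": k_ary_n(5, 2),
    "3-ary 3-torus": k_ary_n(3, 3),
}
for name, radices in examples.items():
    print(name, radices, node_count(radices))
```

Because the radix may differ per dimension, `node_count` takes a list of radices rather than a single k, so a network such as 32x32x64 can be passed as `[32, 32, 64]`.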

15 Direct Networks: Mesh, Torus, Hypercube (2). Cost effective at scale: allows for very dense packaging (single node card with compute element and switching element). Great performance for applications with locality: computation depends on the results of computations on neighboring nodes (many MPI applications have this property). Simple expansion for future growth: just append nodes along one of the dimensions. Good path redundancy.

16 Example: Bisection Bandwidth. Bisection bandwidth: the minimal number of arcs that must be removed to partition the network into two equal halves. 4-ary-2-mesh: bisection bandwidth 4. 2-ary-3-mesh: bisection bandwidth 4.
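
The two mesh values can be checked with a short sketch. It is not a general bisection solver: it only counts the links crossing a straight cut through the middle of each even-sized dimension and takes the smallest count, which is an assumption that happens to give the minimum for regular meshes like these. The function names are hypothetical.

```python
from itertools import product

def mesh_links(radices):
    """All links of a mesh: two nodes are linked if they differ by 1 in exactly one coordinate."""
    links = []
    for node in product(*(range(k) for k in radices)):
        for dim, k in enumerate(radices):
            if node[dim] + 1 < k:
                neighbor = node[:dim] + (node[dim] + 1,) + node[dim + 1:]
                links.append((node, neighbor))
    return links

def axis_cut_width(radices):
    """Fewest links crossing a cut through the middle of any even-sized dimension."""
    links = mesh_links(radices)
    best = None
    for dim, k in enumerate(radices):
        if k % 2:  # an odd dimension cannot be halved by a straight cut
            continue
        half = k // 2
        crossing = sum(1 for a, b in links if (a[dim] < half) != (b[dim] < half))
        best = crossing if best is None else min(best, crossing)
    return best

print(axis_cut_width([4, 4]))     # 4-ary 2-mesh
print(axis_cut_width([2, 2, 2]))  # 2-ary 3-mesh
```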

17 Example: IBM Blue Gene 3D Torus Network. Interconnects compute nodes; communication backbone for computation. In BlueGene/L: 32x32x64 connectivity. Worst-case diameter: 16 + 16 + 32 = 64 hops. Different consecutive packets can follow different routes.
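
The 64-hop figure is consistent with the usual reasoning for a torus: because of the wrap-around links, the farthest node along a dimension of size k is k/2 hops away, and the worst case adds this up over all dimensions. A minimal sketch (function names are illustrative only), with the mesh diameter shown for comparison:

```python
def torus_diameter(radices):
    """Worst-case hop count in a torus: at most floor(k/2) hops per dimension thanks to wrap-around."""
    return sum(k // 2 for k in radices)

def mesh_diameter(radices):
    """Worst-case hop count in a mesh: corner to opposite corner, k-1 hops per dimension."""
    return sum(k - 1 for k in radices)

blue_gene_l = [32, 32, 64]
print(torus_diameter(blue_gene_l))  # 16 + 16 + 32
print(mesh_diameter(blue_gene_l))   # 31 + 31 + 63
```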

18 High Performance Networking: Layer-2 / Interconnect technologies. Supercomputer interconnect technologies through the ages. Ten years ago (2002): many different interconnect technologies; Myrinet takes about 30%. In 2010: Gigabit Ethernet takes 50%, Infiniband 41%. Datacenter interconnects: almost entirely Ethernet.

19 Infiniband vs Ethernet. Infiniband (IB): low latency (~1 us for two directly connected boxes); high bandwidth (data rates: SDR 10 Gbit/s, DDR 20 Gbit/s, QDR 40 Gbit/s); supports an RDMA interface (RDMA: Remote Direct Memory Access, i.e., no OS involvement during transmission and reception of packets). Ethernet: 10GbE has 5-6 times the latency of Infiniband; 40GbE and 100GbE are in the pipeline. Both IB and Ethernet can be operated with a switched fabric topology.

20 Network latencies in data centers. Factors that contribute to latency in TCP datacenters. Delay: cost of a single traversal of the component. RTT: total cost in a round trip traversing 5 switches in each direction. OS overhead per packet exchanged between two hosts attached to the same switch: (2*15)/(2*2.5+2*15+10) = 66% (!!)
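
The arithmetic behind the 66% figure can be reproduced in a few lines. The slide's table did not survive transcription, so the reading of the terms is an assumption: 15 us is taken as the OS overhead on each host (as the slide states), while 2.5 us per NIC and 10 us for the single switch are guesses at what the other two terms stand for. The function name is hypothetical.

```python
def os_share(nic_us, os_us, switch_us):
    """Fraction of one-way latency spent in the OS when both hosts share one switch.

    Assumed breakdown: two NIC traversals, two OS stack traversals, one switch traversal.
    """
    total = 2 * nic_us + 2 * os_us + switch_us
    return (2 * os_us) / total

share = os_share(nic_us=2.5, os_us=15, switch_us=10)
print(f"OS share of latency: {share:.1%}")  # 30 / 45
```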

21 Packet Processing Overhead. Figure labels: user space, kernel space, DMA-accessible memory area. Sending side: data is copied from the application buffer into a socket buffer; data is DMA-copied into the NIC buffer. Receiver side: data is DMA-copied from the NIC buffer into a socket buffer; data is copied into the application buffer; the application is scheduled (context switching).

22 Throughput and CPU load at 1 Gbit/s and 10 Gbit/s. Throughput is limited by high CPU load. The RX side is typically more CPU intensive because it is highly asynchronous.

23 TCP Offloading. What is TCP offloading? Moving IP and TCP processing to the Network Interface (NIC). Main justifications for TCP offloading: reduction of host CPU cycles for protocol header processing and checksumming; fewer CPU interrupts; fewer bytes copied over the memory bus; potential to offload expensive features such as encryption.

24 TCP Offload Engines (TOEs)

25 Problems of TCP offloading. Moore's Law worked against smart NICs. CPUs used to be fast enough. Now there are many cores, and cores don't get faster. Network processing is hard to parallelize. TCP/IP headers don't take many CPU cycles. TOEs impose complex interfaces: the protocol between TOE and CPU can be worse than TCP. Connection management overhead: for short connections, it overwhelms any savings.

26 Where TCP offload helps. The sweet spot for TCP offload might be apps with: very high bandwidth; relatively low end-to-end latency network paths; long connection durations; relatively few connections. Typical examples of these might be storage-server access and cluster interconnects.

27 User-level networking: Remove OS from the data path. Transport offloading is not enough! System call overhead, context switches, and memory copying remain. U-Net (von Eicken, Basu, Buch, Vogels, Cornell University, 1995): a virtual network interface that allows applications to send and receive messages without operating system intervention. Move all buffer management and packet processing to user space (zero-copy).

28 U-Net Overview. a) Traditional networking architecture: the kernel controls the network; all communication goes via the kernel. b) U-Net architecture: applications access the network directly via a MUX; the kernel is involved only in connection setup.

29 U-Net Building Blocks. Endpoints: the application's handle into the network. Buffer area: holds message data for sending or buffer space for receiving. Message queues: hold descriptors pointing to the buffer area.

30 U-Net communication. Figure labels: applications, endpoints, U-Net NI. Initialization: create one or more endpoints; register a communication segment with the endpoint and associate them with a tag. Sending: compose the data in the communication segment; push a descriptor for the message onto the send queue; the NIC transmits the message after marking it with the appropriate message tag. Receiving: incoming messages are de-multiplexed based on the message tag; the NIC places the data within the target buffer of the application and pushes a message descriptor onto the receive queue.
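
The initialization, send, and receive steps can be walked through with a toy in-memory model. This is only a sketch of the flow described on the slide, not U-Net's actual interface: the class names `Endpoint` and `ToyNI`, the descriptor layout `(destination tag, offset, length)`, and the simple receive-offset bookkeeping are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    """Hypothetical endpoint: a tag, a communication segment, and two descriptor queues."""
    tag: int
    segment: bytearray
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)
    recv_offset: int = 0  # where the toy NI places the next inbound message

class ToyNI:
    """Stand-in for the network interface; it moves data without any 'kernel' step."""
    def __init__(self):
        self.endpoints = {}

    def register(self, endpoint):
        # Initialization: associate the endpoint's segment with its tag.
        self.endpoints[endpoint.tag] = endpoint

    def poll(self, src):
        # Sending: take each descriptor off the send queue and transmit the message.
        while src.send_queue:
            dst_tag, offset, length = src.send_queue.popleft()
            payload = bytes(src.segment[offset:offset + length])
            # Receiving: demultiplex on the tag and place the data in the target segment.
            dst = self.endpoints[dst_tag]
            start = dst.recv_offset
            if start + length > len(dst.segment):
                raise ValueError("receive segment is full")
            dst.segment[start:start + length] = payload
            dst.recv_offset += length
            dst.recv_queue.append((start, length))  # descriptor for the application

ni = ToyNI()
a = Endpoint(tag=1, segment=bytearray(64))
b = Endpoint(tag=2, segment=bytearray(64))
ni.register(a)
ni.register(b)

msg = b"hello"
a.segment[0:len(msg)] = msg              # compose the data in the communication segment
a.send_queue.append((2, 0, len(msg)))    # push a descriptor onto the send queue
ni.poll(a)

offset, length = b.recv_queue.popleft()  # application picks up the receive descriptor
received = bytes(b.segment[offset:offset + length])
print(received)
```

The point of the model is that the application only touches its own segment and queues; everything between the two endpoints is done by the NI object.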

31 History of User-Level Networking. U-Net was one of the first systems (if not the first) to propose OS bypassing. Other early works: SHRIMP: Virtual Memory Mapped Interfaces, IEEE Micro, 1995; Separating Data and Control Transfer in Distributed Operating Systems, Thekkath et al., ASPLOS'94. The efforts of U-Net eventually resulted in the Virtual Interface Architecture (VIA) specification, jointly proposed by Compaq, Intel and Microsoft in 1997. The VIA architecture has led to the implementation of various high performance networking stacks (Infiniband, iWARP, RoCE), commonly referred to as RDMA network stacks. RDMA = Remote Direct Memory Access.

32 RDMA Architecture. Figure labels: RDMA verbs interface, application, RDMA verbs user library, user space, socket layer, TCP, UDP, traditional socket interface, kernel, IP, kernel module, Ethernet NIC driver, RDMA-enabled NIC, I/O space. The traditional socket interface involves the kernel. The RDMA interface involves the kernel only on the control path, but accesses the RDMA-capable NIC (RNIC) directly from user space on the data path. A dedicated verbs interface is used for RDMA instead of the traditional socket interface.

33 RDMA Queue Pairs (QPs). Applications use the 'verbs' interface to: register memory (the operating system will make sure the memory is pinned and accessible by DMA); create a queue pair (QP), i.e., a send and a recv queue; create a completion queue (CQ) (the RNIC puts a new completion-queue element into the CQ after an operation has completed); send/receive data (place a work-request element (WQE) into the send or recv queue; the WQE points to a user buffer and defines the type of the operation, e.g., send, recv, ...).

34 RDMA Queue Pairs (QPs), continued. Same steps as on the previous slide (register memory, create a QP, create a CQ, post WQEs to send or receive data), with the annotation: this is much like U-Net.

35 RDMA operations. Send/Receive: a two-sided operation; data exchange naturally involves both ends of the communication channel. Each send operation must have a matching receive operation. The send work request (WR) specifies where the data should be taken from; the receive WR on the remote machine specifies where the inbound data is to be placed. RDMA (Remote Direct Memory Access): two independent operations, RDMA Read and RDMA Write. Only the application issuing the operation is actively involved in the data transfer. An RDMA Write specifies not only where the data should be taken from, but also where it is to be placed (remotely). An RDMA Read requires a buffer advertisement prior to data exchange.
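
To make the one-sided semantics concrete, here is a toy in-memory model, assuming that a "buffer advertisement" amounts to the remote side registering a buffer and handing out a key for it. `ToyHost`, `rdma_write`, and `rdma_read` are hypothetical names and do not correspond to the real verbs API; the `app_calls` counter only exists to show that the remote application does nothing during the transfer.

```python
class ToyHost:
    """Hypothetical host whose registered buffers are reachable through a key."""
    def __init__(self):
        self.regions = {}   # key -> registered buffer
        self.app_calls = 0  # how often this host's application took part in a transfer

    def advertise(self, key, size):
        """Register a buffer and return the key the peer needs to address it."""
        self.regions[key] = bytearray(size)
        return key

def rdma_write(local_data, remote, key, offset):
    """One-sided write: the initiator names both the source data and the remote placement."""
    region = remote.regions[key]
    if offset + len(local_data) > len(region):
        raise ValueError("write exceeds the advertised buffer")
    region[offset:offset + len(local_data)] = local_data

def rdma_read(remote, key, offset, length):
    """One-sided read: fetch bytes from the advertised remote buffer."""
    return bytes(remote.regions[key][offset:offset + length])

server = ToyHost()
key = server.advertise(key=7, size=16)  # advertisement happens before any data moves

rdma_write(b"abc", server, key, offset=4)
echoed = rdma_read(server, key, offset=4, length=3)
print(echoed, server.app_calls)
```

In contrast to Send/Receive, nothing has to be posted on the server side per message; the initiator alone decides where the data lands.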

36 Example: RDMA Send/Recv (1). Sender and receiver have created their QPs and CQs. The sender has registered a buffer for sending. The receiver has registered a buffer for receiving.

37 Example: RDMA Send/Recv (2). The receiver places a WQE into its receive queue. The sender places a WQE into its send queue.

38 Example: RDMA Send/Recv (3). Data is transferred between the hosts. This involves two DMA transfers, one at the sender and one at the receiver.

39 Example: RDMA Send/Recv (4). After the operation has finished, a CQE is placed into the completion queue of the sender.
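
The four steps of the Send/Recv example can be replayed with a toy model of queue pairs and completion queues. It is a simulation under simplifying assumptions, not the verbs API: `ToyQP`, `post_send`, `post_recv`, and `toy_nic_transfer` are invented names, a WQE is represented by the buffer itself, and the model also posts a completion on the receiver side, which goes slightly beyond what the slide shows.

```python
from collections import deque

class ToyQP:
    """Hypothetical queue pair with an attached completion queue."""
    def __init__(self):
        self.send_queue = deque()
        self.recv_queue = deque()
        self.cq = deque()

def post_recv(qp, buffer):
    """Step 2 (receiver): place a WQE pointing at the receive buffer."""
    qp.recv_queue.append(buffer)

def post_send(qp, buffer):
    """Step 2 (sender): place a WQE pointing at the data to send."""
    qp.send_queue.append(buffer)

def toy_nic_transfer(sender, receiver):
    """Steps 3 and 4: match one send WQE with one recv WQE, copy, then post completions."""
    if not sender.send_queue:
        raise RuntimeError("nothing to send")
    if not receiver.recv_queue:
        raise RuntimeError("no matching receive posted")
    src = sender.send_queue.popleft()
    dst = receiver.recv_queue.popleft()
    n = len(src)
    if n > len(dst):
        raise ValueError("receive buffer too small")
    dst[:n] = src                       # stands in for the two DMA transfers
    sender.cq.append(("send", n))       # CQE on the sender side
    receiver.cq.append(("recv", n))     # CQE on the receiver side (assumption, see lead-in)

# Step 1: both sides have their QP/CQ and a registered buffer.
sender_qp, receiver_qp = ToyQP(), ToyQP()
recv_buffer = bytearray(8)
send_buffer = bytearray(b"ping")

post_recv(receiver_qp, recv_buffer)
post_send(sender_qp, send_buffer)
toy_nic_transfer(sender_qp, receiver_qp)

print(bytes(recv_buffer[:4]), list(sender_qp.cq), list(receiver_qp.cq))
```

Posting the receive before the send mirrors the ordering on the slides: a send without a matching receive WQE has nowhere to place its data.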

40 RDMA implementations. Infiniband: Compaq, HP, IBM, Intel, Microsoft and Sun Microsystems; provides RDMA semantics; first spec released in 2000; based on a point-to-point switched fabric; designed from the ground up (has its own physical layer, switches, NICs, etc.). iWARP (Internet Wide Area RDMA Protocol): RDMA semantics implemented over offloaded TCP/IP; requires custom NICs, but uses Ethernet. RoCE: RDMA semantics implemented directly over Ethernet. All of these implementations can be programmed through the verbs interface.

41 Typical CPU loads for three network stack implementations

42 Performance. Mellanox ConnectX-2. Figure panels: bare-metal; from within a virtual machine. RDMA read latency (one-sided operation): 2-3 us

43 References. High Performance Datacenter Networks: Architecture, Algorithms, and Opportunities, Synthesis Lectures on Computer Architecture, Morgan & Claypool, 2010. An Overview of the BlueGene/L Supercomputer, The BlueGene/L Team, 2002. U-Net: A User-Level Network Interface for Parallel and Distributed Computing, SOSP


More information

Latency Considerations for 10GBase-T PHYs

Latency Considerations for 10GBase-T PHYs Latency Considerations for PHYs Shimon Muller Sun Microsystems, Inc. March 16, 2004 Orlando, FL Outline Introduction Issues and non-issues PHY Latency in The Big Picture Observations Summary and Recommendations

More information

Design and Implementation of the iwarp Protocol in Software. Dennis Dalessandro, Ananth Devulapalli, Pete Wyckoff Ohio Supercomputer Center

Design and Implementation of the iwarp Protocol in Software. Dennis Dalessandro, Ananth Devulapalli, Pete Wyckoff Ohio Supercomputer Center Design and Implementation of the iwarp Protocol in Software Dennis Dalessandro, Ananth Devulapalli, Pete Wyckoff Ohio Supercomputer Center What is iwarp? RDMA over Ethernet. Provides Zero-Copy mechanism

More information

Cloud Computing and the Internet. Conferenza GARR 2010

Cloud Computing and the Internet. Conferenza GARR 2010 Cloud Computing and the Internet Conferenza GARR 2010 Cloud Computing The current buzzword ;-) Your computing is in the cloud! Provide computing as a utility Similar to Electricity, Water, Phone service,

More information

OFA Training Program. Writing Application Programs for RDMA using OFA Software. Author: Rupert Dance Date: 11/15/2011. www.openfabrics.

OFA Training Program. Writing Application Programs for RDMA using OFA Software. Author: Rupert Dance Date: 11/15/2011. www.openfabrics. OFA Training Program Writing Application Programs for RDMA using OFA Software Author: Rupert Dance Date: 11/15/2011 www.openfabrics.org 1 Agenda OFA Training Program Program Goals Instructors Programming

More information

Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL710

Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL710 COMPETITIVE BRIEF April 5 Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL7 Introduction: How to Choose a Network Interface Card... Comparison: Mellanox ConnectX

More information

Mellanox Academy Online Training (E-learning)

Mellanox Academy Online Training (E-learning) Mellanox Academy Online Training (E-learning) 2013-2014 30 P age Mellanox offers a variety of training methods and learning solutions for instructor-led training classes and remote online learning (e-learning),

More information

InfiniBand in the Enterprise Data Center

InfiniBand in the Enterprise Data Center InfiniBand in the Enterprise Data Center InfiniBand offers a compelling value proposition to IT managers who value data center agility and lowest total cost of ownership Mellanox Technologies Inc. 2900

More information

Gigabit Ethernet Design

Gigabit Ethernet Design Gigabit Ethernet Design Laura Jeanne Knapp Network Consultant 1-919-254-8801 laura@lauraknapp.com www.lauraknapp.com Tom Hadley Network Consultant 1-919-301-3052 tmhadley@us.ibm.com HSEdes_ 010 ed and

More information

Fibre Channel Over and Under

Fibre Channel Over and Under Fibre Channel over : A necessary infrastructure convergence By Deni Connor, principal analyst April 2008 Introduction Consolidation of IT datacenter infrastructure is happening in all forms. IT administrators

More information

VMWARE WHITE PAPER 1

VMWARE WHITE PAPER 1 1 VMWARE WHITE PAPER Introduction This paper outlines the considerations that affect network throughput. The paper examines the applications deployed on top of a virtual infrastructure and discusses the

More information

The proliferation of the raw processing

The proliferation of the raw processing TECHNOLOGY CONNECTED Advances with System Area Network Speeds Data Transfer between Servers with A new network switch technology is targeted to answer the phenomenal demands on intercommunication transfer

More information

Lustre Networking BY PETER J. BRAAM

Lustre Networking BY PETER J. BRAAM Lustre Networking BY PETER J. BRAAM A WHITE PAPER FROM CLUSTER FILE SYSTEMS, INC. APRIL 2007 Audience Architects of HPC clusters Abstract This paper provides architects of HPC clusters with information

More information

Computer Systems Structure Input/Output

Computer Systems Structure Input/Output Computer Systems Structure Input/Output Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Examples of I/O Devices

More information

Data Center Architecture Overview

Data Center Architecture Overview 1 CHAPTER Note Important Updated content: The Cisco Virtualized Multi-tenant Data Center CVD (http://www.cisco.com/go/vmdc) provides updated design guidance including the Cisco Nexus Switch and Unified

More information

PCI Express and Storage. Ron Emerick, Sun Microsystems

PCI Express and Storage. Ron Emerick, Sun Microsystems Ron Emerick, Sun Microsystems SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature

More information

Architecture and Performance of the Internet

Architecture and Performance of the Internet SC250 Computer Networking I Architecture and Performance of the Internet Prof. Matthias Grossglauser School of Computer and Communication Sciences EPFL http://lcawww.epfl.ch 1 Today's Objectives Understanding

More information

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Performance Study Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Introduction With more and more mission critical networking intensive workloads being virtualized

More information

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family White Paper June, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL

More information

D1.2 Network Load Balancing

D1.2 Network Load Balancing D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June ronald.vanderpol@sara.nl,freek.dijkstra@sara.nl,

More information

Windows 8 SMB 2.2 File Sharing Performance

Windows 8 SMB 2.2 File Sharing Performance Windows 8 SMB 2.2 File Sharing Performance Abstract This paper provides a preliminary analysis of the performance capabilities of the Server Message Block (SMB) 2.2 file sharing protocol with 10 gigabit

More information

Performance Evaluation of InfiniBand with PCI Express

Performance Evaluation of InfiniBand with PCI Express Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 jl@us.ibm.com Amith Mamidala, Abhinav Vishnu, and Dhabaleswar

More information

Unified Fabric: Cisco's Innovation for Data Center Networks

Unified Fabric: Cisco's Innovation for Data Center Networks . White Paper Unified Fabric: Cisco's Innovation for Data Center Networks What You Will Learn Unified Fabric supports new concepts such as IEEE Data Center Bridging enhancements that improve the robustness

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

RDMA Performance in Virtual Machines using QDR InfiniBand on VMware vsphere 5 R E S E A R C H N O T E

RDMA Performance in Virtual Machines using QDR InfiniBand on VMware vsphere 5 R E S E A R C H N O T E RDMA Performance in Virtual Machines using QDR InfiniBand on VMware vsphere 5 R E S E A R C H N O T E RDMA Performance in Virtual Machines using QDR InfiniBand on vsphere 5 Table of Contents Introduction...

More information

High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand

High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand Hari Subramoni *, Ping Lai *, Raj Kettimuthu **, Dhabaleswar. K. (DK) Panda * * Computer Science and Engineering Department

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

Local-Area Network -LAN

Local-Area Network -LAN Computer Networks A group of two or more computer systems linked together. There are many [types] of computer networks: Peer To Peer (workgroups) The computers are connected by a network, however, there

More information

Block based, file-based, combination. Component based, solution based

Block based, file-based, combination. Component based, solution based The Wide Spread Role of 10-Gigabit Ethernet in Storage This paper provides an overview of SAN and NAS storage solutions, highlights the ubiquitous role of 10 Gigabit Ethernet in these solutions, and illustrates

More information

Advanced Computer Networks. Layer-7-Switching and Loadbalancing

Advanced Computer Networks. Layer-7-Switching and Loadbalancing Oriana Riva, Department of Computer Science ETH Zürich Advanced Computer Networks 263-3501-00 Layer-7-Switching and Loadbalancing Patrick Stuedi, Qin Yin and Timothy Roscoe Spring Semester 2015 Outline

More information

Routing Heterogeneous CCI Subnets

Routing Heterogeneous CCI Subnets Routing Heterogeneous CCI Subnets Scott Atchley Technology Integration Group, NCCS Oak Ridge National Laboratory Oak Ridge, TN, USA atchleyes@ornl.gov Abstract This electronic document is a live template

More information

CS 78 Computer Networks. Internet Protocol (IP) our focus. The Network Layer. Interplay between routing and forwarding

CS 78 Computer Networks. Internet Protocol (IP) our focus. The Network Layer. Interplay between routing and forwarding CS 78 Computer Networks Internet Protocol (IP) Andrew T. Campbell campbell@cs.dartmouth.edu our focus What we will lean What s inside a router IP forwarding Internet Control Message Protocol (ICMP) IP

More information

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

High-Performance Networking for Optimized Hadoop Deployments

High-Performance Networking for Optimized Hadoop Deployments High-Performance Networking for Optimized Hadoop Deployments Chelsio Terminator 4 (T4) Unified Wire adapters deliver a range of performance gains for Hadoop by bringing the Hadoop cluster networking into

More information

Network Virtualization Technologies and their Effect on Performance

Network Virtualization Technologies and their Effect on Performance Network Virtualization Technologies and their Effect on Performance Dror Goldenberg VP Software Architecture TCE NFV Winter School 2015 Cloud Computing and NFV Cloud - scalable computing resources (CPU,

More information

Interconnection Networks. B649 Parallel Computing Seung-Hee Bae Hyungro Lee

Interconnection Networks. B649 Parallel Computing Seung-Hee Bae Hyungro Lee Interconnection Networks B649 Parallel Computing Seung-Hee Bae Hyungro Lee Outline Introduction Interconnecting Two Devices Connecting More than Two Devices Network Topology Network Routing, Arbitration,

More information

Bioscience. Introduction. The Importance of the Network. Network Switching Requirements. Arista Technical Guide www.aristanetworks.

Bioscience. Introduction. The Importance of the Network. Network Switching Requirements. Arista Technical Guide www.aristanetworks. Introduction Over the past several years there has been in a shift within the biosciences research community, regarding the types of computer applications and infrastructures that are deployed to sequence,

More information

I/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology

I/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology I/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology Reduce I/O cost and power by 40 50% Reduce I/O real estate needs in blade servers through consolidation Maintain

More information

Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability

Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability White Paper Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability The new TCP Chimney Offload Architecture from Microsoft enables offload of the TCP protocol

More information

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere! Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel

More information

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E) Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E) 1 Topologies Internet topologies are not very regular they grew incrementally Supercomputers

More information

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Intel Data Direct I/O Technology (Intel DDIO): A Primer > Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology 3. The Lagopus SDN Software Switch Here we explain the capabilities of the new Lagopus software switch in detail, starting with the basics of SDN and OpenFlow. 3.1 SDN and OpenFlow Those engaged in network-related

More information

Interconnect Analysis: 10GigE and InfiniBand in High Performance Computing

Interconnect Analysis: 10GigE and InfiniBand in High Performance Computing Interconnect Analysis: 10GigE and InfiniBand in High Performance Computing WHITE PAPER Highlights: There is a large number of HPC applications that need the lowest possible latency for best performance

More information

Ethernet: THE Converged Network Ethernet Alliance Demonstration as SC 09

Ethernet: THE Converged Network Ethernet Alliance Demonstration as SC 09 Ethernet: THE Converged Network Ethernet Alliance Demonstration as SC 09 Authors: Amphenol, Cisco, Dell, Fulcrum Microsystems, Intel, Ixia, JDSU, Mellanox, NetApp, Panduit, QLogic, Spirent, Tyco Electronics,

More information

Sockets vs RDMA Interface over 10-Gigabit Networks: An In-depth analysis of the Memory Traffic Bottleneck

Sockets vs RDMA Interface over 10-Gigabit Networks: An In-depth analysis of the Memory Traffic Bottleneck Sockets vs RDMA Interface over -Gigabit Networks: An In-depth analysis of the Memory Traffic Bottleneck Pavan Balaji Computer Science and Engg., The Ohio State University, Columbus, OH 3, balaji@cse.ohio-state.edu

More information

An Oracle Technical White Paper November 2011. Oracle Solaris 11 Network Virtualization and Network Resource Management

An Oracle Technical White Paper November 2011. Oracle Solaris 11 Network Virtualization and Network Resource Management An Oracle Technical White Paper November 2011 Oracle Solaris 11 Network Virtualization and Network Resource Management Executive Overview... 2 Introduction... 2 Network Virtualization... 2 Network Resource

More information