Running SMB3.x over RDMA on Linux platforms


Running SMB3.x over RDMA on Linux platforms
Mark Rabinovich, Visuality Systems
John Kim, Mellanox Technologies

Prehistory

NQ CIFS is an implementation of an SMB client/server for the embedded world:
- Consumer devices: printers, MFPs, routers, STBs, etc.
- Mobile devices: smartphones, tablets
- Industrial Automation, Medical, Aerospace and Defense
- Anything else that is neither PC nor Mac nor Samba.

The latest version, NQE, supports basic SMB3.0, mostly for data encryption. SMB over RDMA is currently only supported on Windows platforms. We are now developing a storage-level version of the SMB Server; this presentation is about that venture.

Plans and Goals
- Enterprise-level performance with low latencies
- At least 10,000 connections
- Pluggable transports
- Pluggable VFS

Architecture

We will mostly talk about the SMBD transport.

Architecture remarks
- The socket-based transport uses epoll() or similar.
- A pluggable transport may be: fast zero-copy sockets, a proprietary transport protocol, etc.
- A pluggable transport conforms to the Transport API (a sketch follows below).
- The Core translates SMBs into NTFS semantics.
- The Embedded VFS converts NTFS semantics into POSIX semantics (or whatever the local FS is).
- A pluggable VFS translates NTFS semantics into storage primitives and conforms to the VFS API.
- The SMBD transport is of particular interest for this session.
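To illustrate the pluggable-transport idea, here is a minimal sketch of what such a Transport API could look like as a table of callbacks. The struct and function names are hypothetical, not NQ's actual API.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct nq_transport nq_transport_t;   /* opaque transport instance */

    /* Hypothetical Transport API: a transport plugs in by filling this table. */
    typedef struct nq_transport_ops {
        int  (*listen)(nq_transport_t *t, uint16_t port);
        int  (*send)(nq_transport_t *t, void *conn, const void *buf, size_t len);
        void (*disconnect)(nq_transport_t *t, void *conn);
        /* Callbacks into the core, invoked by the transport: */
        void (*on_connect)(void *conn);
        void (*on_receive)(void *conn, void *buf, size_t len);
    } nq_transport_ops_t;

With such a table, the socket transport and the SMBD transport become interchangeable behind the same core.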

Dataflow in layers

An example of the layers involved, as applicable to the subject of this session.

Dataflow explained
- Abstract APIs allow replaceable modules.
- SMB Direct is one of the possible transports.
- RDMA actually means two services: packet delivery, and payload delivery, which applies to Reads/Writes and assumes RDMA.
- Our RDMA layer is also replaceable.

SMBD: a drill-down view

SMBD remarks

Two threads run:
- The new-connection listener registers a callback with the RDMA API. On callback it: handles the connection request; handles the connection acknowledgement; submits several request buffers to the RDMA receive queue; handles disconnects.
- The data-event listener registers a callback with the RDMA API. On callback: on Send complete, it releases the buffer; on Receive complete, it accepts the incoming packet and delegates it to SMB (sometimes assembling an SMB); on Read complete and Write complete, it releases a 1 MB buffer.

A dispatch sketch follows below.
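A minimal sketch of that data-event dispatch, keyed off libibverbs completion opcodes; the smbd_* helpers are hypothetical placeholders for the actions listed above.

    #include <stdint.h>
    #include <infiniband/verbs.h>

    /* Hypothetical helpers standing in for the actions described above. */
    void smbd_release_buffer(void *buf);
    void smbd_release_big_buffer(void *buf);
    void smbd_deliver_to_smb(void *buf, uint32_t len);

    void smbd_on_data_event(struct ibv_wc *wc)
    {
        void *buf = (void *)(uintptr_t)wc->wr_id;  /* buffer tagged at post time */
        switch (wc->opcode) {
        case IBV_WC_SEND:          /* Send complete: release the buffer */
            smbd_release_buffer(buf);
            break;
        case IBV_WC_RECV:          /* Receive complete: delegate to SMB */
            smbd_deliver_to_smb(buf, wc->byte_len);
            break;
        case IBV_WC_RDMA_READ:     /* Read/Write complete: release the  */
        case IBV_WC_RDMA_WRITE:    /* 1 MB payload buffer               */
            smbd_release_big_buffer(buf);
            break;
        default:
            break;
        }
    }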

Buffer sizes

Each protocol negotiates its own buffer sizes.

SMB:
- Max Transaction Size
- Max Read Size
- Max Write Size

SMBD:
- Preferred Send Size
- Max Read/Write Size
- Max Receive Size
- Max Fragmented Size

Buffer sizes (cont.)

                                  Windows 2012    NQ
  SMB    Max Transaction Size     1048576         1048576
         Max Read Size            1048576         1048576
         Max Write Size           1048576         1048576
  SMBD   Preferred Send Size      1364            8192
         Max Read/Write Size      1048576         1048576
         Max Receive Size         8192            8192
         Max Fragmented Size      1048576         1048576
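The SMBD rows correspond to the parameters exchanged during the SMB Direct negotiate. A sketch of how they might be held per connection; the struct and field names are illustrative, not NQ's actual code.

    #include <stdint.h>

    /* Illustrative container for the negotiated SMB Direct parameters. */
    struct smbd_negotiated_params {
        uint32_t preferred_send_size;  /* 1364 on Windows 2012, 8192 on NQ     */
        uint32_t max_receive_size;     /* 8192: drives the small-buffer size   */
        uint32_t max_read_write_size;  /* 1048576: payload moved via RDMA      */
        uint32_t max_fragmented_size;  /* 1048576: largest reassembled message */
    };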

Buffer vs. iovec

- iovec: grants zero-copy I/O, but requires more complicated code and causes an (apparently minor) performance decrease.
- Buffer: proper buffer management (see below) will not require a data copy either.

SMBD over RDMA grants packet delivery in one piece if the packet fits into the receive buffer, and most SMBs fit into one buffer and are delivered in one piece.

Conclusions:
- The majority of SMBs fit into one receive buffer.
- SMBD assembly is rarely needed.
- Buffers are sufficient; there is no need for iovec.

Buffer management

- A small buffer is 8192 bytes long (Max Receive Size!). It covers: most SMB requests (except, maybe, some odd IOCTLs); most SMB responses (except Query Directory); Reads and Writes, which are normally small since RDMA is enforced (some exceptions happen, see below). Almost all receive buffers are small.
- A medium buffer is 64 KB long: Query Directory responses; some RPC responses (e.g., NetrShareEnum); odd requests/responses (Query Quota, IOCTL).
- A big buffer is 1 MB long: RDMA reads and writes as SMB Read or Write payload; non-RDMA Reads and Writes (rarely happens).

A pool sketch follows below.
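A minimal sketch of a three-class buffer pool matching the sizes above. All names are illustrative, and the single-threaded free lists stand in for whatever locking a real server would need.

    #include <stdlib.h>
    #include <stddef.h>

    enum buf_class { BUF_SMALL, BUF_MEDIUM, BUF_BIG };

    static const size_t buf_sizes[] = {
        [BUF_SMALL]  = 8192,          /* Max Receive Size            */
        [BUF_MEDIUM] = 64 * 1024,     /* Query Directory, RPC, IOCTL */
        [BUF_BIG]    = 1024 * 1024,   /* RDMA read/write payloads    */
    };

    struct buf {
        struct buf *next;             /* free-list link         */
        enum buf_class cls;
        char data[];                  /* flexible array member  */
    };

    static struct buf *free_list[3];

    struct buf *buf_get(enum buf_class cls)
    {
        struct buf *b = free_list[cls];
        if (b) { free_list[cls] = b->next; return b; }
        b = malloc(sizeof(*b) + buf_sizes[cls]);   /* falls back to malloc */
        if (b) b->cls = cls;
        return b;
    }

    void buf_put(struct buf *b)
    {
        b->next = free_list[b->cls];               /* recycle, never free */
        free_list[b->cls] = b;
    }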

SMBD Fragmentation and Assembly

- Fragmentation may happen when transmitting a long response (e.g., Query Directory). Fragmentation is achieved by reusing the same buffer and sending its contents at shifting offsets. This works because the RDMA layer can submit the same buffer with different offsets. Fragmentation does not cause a data copy and adds virtually zero overhead (see the sketch below).
- Assembly may happen upon receiving a long request. This almost never happens, since the vast majority of requests are small, including RDMA reads and writes. On a (rare) message assembly, a data copy happens: the long request is assembled in a medium or big buffer.
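A sketch of fragmentation by resubmitting the same registered buffer at shifting offsets. It assumes the buffer was registered once with ibv_reg_mr(); a real SMBD sender would also prepend a Data Transfer header to each fragment, which is omitted here, and error handling is minimal.

    #include <rdma/rdma_cma.h>
    #include <rdma/rdma_verbs.h>

    int smbd_send_fragmented(struct rdma_cm_id *id, struct ibv_mr *mr,
                             char *msg, size_t total, size_t max_send)
    {
        size_t off = 0;
        while (off < total) {
            size_t chunk = total - off;
            if (chunk > max_send)
                chunk = max_send;
            /* Same buffer, different offset: no data copy takes place. */
            if (rdma_post_send(id, msg + off /* context */, msg + off,
                               chunk, mr, IBV_SEND_SIGNALED))
                return -1;
            off += chunk;
        }
        return 0;
    }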

Buffer pre-allocation

On a 10 Gb/s network, memory allocation may become a bottleneck. NQ uses an abstract API for memory (pre)allocation which relies on the underlying memory management mechanism. This is not enough, so NQ pre-allocates buffers in:
- the SMB Direct layer
- the SMB layer

A pre-allocation sketch follows below.
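A sketch of pre-allocating and registering receive buffers at connection setup, so that neither allocation nor memory registration happens on the data path. The names and counts are illustrative; it assumes a protection domain pd already exists.

    #include <stdlib.h>
    #include <infiniband/verbs.h>

    #define RECV_BUFS 16          /* also the initial credit count, see below */
    #define RECV_SIZE 8192        /* Max Receive Size                         */

    struct recv_buf {
        struct ibv_mr *mr;
        char          *data;
    };

    int smbd_prealloc(struct ibv_pd *pd, struct recv_buf bufs[RECV_BUFS])
    {
        for (int i = 0; i < RECV_BUFS; i++) {
            bufs[i].data = malloc(RECV_SIZE);
            if (!bufs[i].data)
                return -1;
            /* Registration pins the memory so the HCA can DMA into it. */
            bufs[i].mr = ibv_reg_mr(pd, bufs[i].data, RECV_SIZE,
                                    IBV_ACCESS_LOCAL_WRITE);
            if (!bufs[i].mr)
                return -1;
        }
        return 0;
    }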

SMBD Crediting

SMBD requests and responses imply a simple crediting algorithm: credits == the number of receive buffers per connection. A sketch follows below.
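A minimal sketch of that crediting rule: each posted receive buffer grants the peer one send credit. The struct and field names are illustrative.

    #include <stdint.h>

    struct smbd_conn {
        uint16_t send_credits;      /* credits we may spend sending     */
        uint16_t recv_posted;       /* receive buffers currently posted */
    };

    /* Before sending: consume one credit; the caller queues if none remain. */
    static int smbd_take_credit(struct smbd_conn *c)
    {
        if (c->send_credits == 0)
            return -1;              /* wait until the peer grants more */
        c->send_credits--;
        return 0;
    }

    /* After reposting consumed receive buffers: grant that many credits to
       the peer in the credit field of the next outgoing packet. */
    static uint16_t smbd_grant_credits(struct smbd_conn *c, uint16_t reposted)
    {
        c->recv_posted += reposted;
        return reposted;
    }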

RDMA Remarks

- Listens for connection events using rdma_get_cm_event(): a new connection request, a new connection acknowledgement, a disconnect.
- Listens for data events using ibv_get_cq_event() and ibv_poll_cq(): an incoming message was placed into a receive buffer; an outgoing packet was sent; an RDMA transfer completed.

Any of the above causes a callback into the SMBD layer. An event-loop sketch follows below.
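A sketch of the two event loops using the librdmacm/libibverbs calls named above; the smbd_on_* callbacks are hypothetical hooks into the SMBD layer (smbd_on_data_event is the dispatch sketch shown earlier).

    #include <rdma/rdma_cma.h>
    #include <infiniband/verbs.h>

    /* Hypothetical callbacks into the SMBD layer. */
    void smbd_on_connect_request(struct rdma_cm_id *id);
    void smbd_on_connect_ack(struct rdma_cm_id *id);
    void smbd_on_disconnect(struct rdma_cm_id *id);
    void smbd_on_data_event(struct ibv_wc *wc);

    void cm_event_loop(struct rdma_event_channel *ch)
    {
        struct rdma_cm_event *ev;
        while (rdma_get_cm_event(ch, &ev) == 0) {
            switch (ev->event) {
            case RDMA_CM_EVENT_CONNECT_REQUEST:
                smbd_on_connect_request(ev->id);
                break;
            case RDMA_CM_EVENT_ESTABLISHED:
                smbd_on_connect_ack(ev->id);
                break;
            case RDMA_CM_EVENT_DISCONNECTED:
                smbd_on_disconnect(ev->id);
                break;
            default:
                break;
            }
            rdma_ack_cm_event(ev);
        }
    }

    void cq_event_loop(struct ibv_comp_channel *ch)
    {
        struct ibv_cq *cq;
        void *ctx;
        struct ibv_wc wc;
        while (ibv_get_cq_event(ch, &cq, &ctx) == 0) {
            ibv_ack_cq_events(cq, 1);
            ibv_req_notify_cq(cq, 0);        /* re-arm for the next event */
            while (ibv_poll_cq(cq, 1, &wc) > 0)
                smbd_on_data_event(&wc);     /* dispatch sketch above */
        }
    }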

RDMA Remarks (cont.)

- Queues a buffer for receiving through rdma_post_recv(). A receive completion on this buffer will generate an event.
- Queues a buffer for sending through rdma_post_send(). A send completion will generate an event.
- Queues a buffer for an RDMA read/write through rdma_post_read(). A completion on this buffer will generate an event.

A posting sketch follows below.
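Thin wrappers showing how each of the three calls might be posted. The rdma_post_* signatures are the real librdmacm ones; buffer and MR setup is assumed done (see the pre-allocation sketch), and the wrapper names are illustrative.

    #include <stdint.h>
    #include <rdma/rdma_cma.h>
    #include <rdma/rdma_verbs.h>

    #define RECV_SIZE 8192   /* Max Receive Size, as in the pre-allocation sketch */

    /* A receive completion on this buffer will generate an event. */
    static int post_recv(struct rdma_cm_id *id, char *buf, struct ibv_mr *mr)
    {
        return rdma_post_recv(id, buf /* context */, buf, RECV_SIZE, mr);
    }

    /* A send completion will generate an event. */
    static int post_send(struct rdma_cm_id *id, char *buf, size_t len,
                         struct ibv_mr *mr)
    {
        return rdma_post_send(id, buf, buf, len, mr, IBV_SEND_SIGNALED);
    }

    /* RDMA read of an SMB Write payload from the peer's advertised buffer;
       a completion on this buffer will generate an event. */
    static int post_read(struct rdma_cm_id *id, char *buf, size_t len,
                         struct ibv_mr *mr, uint64_t remote_addr, uint32_t rkey)
    {
        return rdma_post_read(id, buf, buf, len, mr,
                              IBV_SEND_SIGNALED, remote_addr, rkey);
    }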

Test Environment

The Windows client side:
- HP ProLiant ML350p server
- Runs Windows 2012
- Mellanox ConnectX-3 network adapter

The Linux server side:
- HP ProLiant ML350p server
- Runs Red Hat Linux (kernel 2.6.32)
- Runs NQ CIFS Server 8.0 (prototype)
- Mellanox ConnectX-3 network adapter

The Mellanox card was tested in both InfiniBand mode and Ethernet mode.

The Testbench

The aim was to validate SMB over RDMA. Currently out of scope:
- Multiple connections
- Random operations
- Small-chunk operations

We were only uploading and downloading a 10 GB file. We plugged the Embedded VFS with a POSIX backend; at 10 Gb/s this is definitely a bottleneck. Then we measured with a dummy backend.

Performance

- Embedded VFS with a POSIX backend.
- Embedded VFS with a dummy backend.

Numbers

We used the Embedded VFS with a dummy backend; the network is InfiniBand.
- Download: 940 MByte/sec
- Upload: 1,400 MByte/sec

These results are similar to Windows-to-Windows numbers.

Thank you