SMB Advanced Networking for Fault Tolerance and Performance
Jose Barreto, Principal Program Manager, Microsoft Corporation
Agenda SMB Remote File Storage for Server Apps SMB Direct (SMB over RDMA) SMB Multichannel SMB Scale-Out
Remote File Storage for Server Apps
What is it? Server applications storing their data files on SMB file shares (UNC paths)
Examples: Hyper-V (virtual hard disks (VHD) and configuration files), SQL Server (database and log files)
What is the value?
Easier provisioning: shares instead of LUNs
Easier management: shares instead of LUNs
Flexibility: dynamic server relocation
Leverage network investments: no need for specialized storage networking infrastructure or knowledge
Lower cost: acquisition and operation cost
First-class storage: item by item, a storage solution that can match the capabilities of traditional block solutions
[Diagram: SQL Server, IIS, Hyper-V and VDI desktop workloads accessing file servers backed by shared storage]
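To make the provisioning story concrete, here is a minimal PowerShell sketch of placing a Hyper-V virtual machine's files on an SMB share instead of a LUN. It assumes the Windows Server 2012-era SMB and Hyper-V cmdlets; the server, share, folder and account names are hypothetical, and the Hyper-V host needs appropriate permissions on the share.

    # On the file server: create a folder and expose it as an SMB share
    New-Item -Path C:\Shares\VMs -ItemType Directory
    New-SmbShare -Name VMs -Path C:\Shares\VMs -FullAccess "CONTOSO\HyperVHosts"

    # On the Hyper-V host: point the new VM's configuration and VHD at the UNC path
    New-VM -Name "VM1" -MemoryStartupBytes 1GB `
           -Path \\FileServer1\VMs `
           -NewVHDPath \\FileServer1\VMs\VM1\Disk0.vhdx -NewVHDSizeBytes 40GB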
SMB Direct (SMB over RDMA)
SMB Direct (SMB over RDMA)
New class of SMB file storage for the Enterprise:
Minimal CPU utilization for file storage processing
Low latency and the ability to leverage high-speed NICs
Fibre Channel-equivalent solution at a lower cost
Traditional advantages of SMB file storage:
Easy to provision, manage and migrate
Leverages converged network
No application change or administrator configuration
Required hardware: RDMA-capable network interface (R-NIC); support for iWARP, InfiniBand and RoCE (see the PowerShell sketch below)
[Diagram: file client (application, SMB client) and file server (SMB server, NTFS, SCSI, disk) connected through R-NICs over networks with RDMA support]
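As a quick check for the required hardware, a sketch using the NetAdapter cmdlets that ship with Windows Server 2012-era builds ("RDMA1" is a hypothetical adapter name):

    # List adapters that expose RDMA capability and whether it is enabled
    Get-NetAdapterRdma | Format-Table Name, Enabled

    # Enable RDMA on a specific adapter if it is currently disabled
    Enable-NetAdapterRdma -Name "RDMA1"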
What is RDMA? Remote Direct Memory Access Protocol
Accelerated I/O delivery model that works by allowing application software to bypass most layers of software and communicate directly with the hardware
RDMA benefits: low latency, high throughput, zero-copy capability, OS/stack bypass
RDMA hardware technologies: InfiniBand; iWARP (RDMA over TCP/IP); RoCE (RDMA over Converged Ethernet)
[Diagram: client and file server memory exchanged directly via SMB Direct over NDKPI and RDMA hardware (Ethernet or InfiniBand)]
SMB over TCP and RDMA
1. The application (Hyper-V, SQL Server) does not need to change; the API is unchanged.
2. The SMB client makes the decision to use SMB Direct at run time (see the sketch below for verifying this).
3. NDKPI provides a much thinner layer than TCP/IP.
4. Remote Direct Memory Access is performed by the network interfaces.
[Diagram: application and SMB client on the client, SMB server on the file server, connected either via TCP/IP or via SMB Direct/NDKPI/RDMA over Ethernet and/or InfiniBand]
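Because the SMB client chooses SMB Direct on its own at run time, the easiest verification is to look at the connections it built. A sketch using the Windows Server 2012-era SMB cmdlets; the exact output columns can vary between builds:

    # List SMB connections and the interfaces Multichannel selected for them;
    # the RDMA-capable columns indicate whether SMB Direct can be used on each path
    Get-SmbMultichannelConnection

    # Dump every property if the default table hides what you need
    Get-SmbMultichannelConnection | Format-List *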
Comparing RDMA Technologies
Non-RDMA Ethernet (wide variety of NICs*)
Pros: TCP/IP-based protocol; works with any Ethernet switch; wide variety of vendors and models; support for in-box NIC teaming (LBFO)
Cons: currently limited to 10Gbps per port; higher CPU utilization under load; higher latency
The RDMA options below offer lower CPU utilization under load and lower latency.
iWARP (Intel NE020*)
Pros: TCP/IP-based protocol; works with any 10GbE switch; RDMA traffic routable
Cons: currently limited to 10Gbps per port*
RoCE (Mellanox ConnectX-2, Mellanox ConnectX-3*)
Pros: Ethernet-based protocol; works with high-end 10GbE/40GbE switches; offers up to 40Gbps per port today*
Cons: RDMA not routable via existing IP infrastructure; requires a DCB switch with Priority Flow Control (PFC)
InfiniBand (Mellanox ConnectX-2, Mellanox ConnectX-3*)
Pros: offers up to 54Gbps per port today*; switches typically less expensive per port than 10GbE switches*; switches offer 10GbE or 40GbE uplinks; commonly used in HPC environments
Cons: not an Ethernet-based protocol; RDMA not routable via existing IP infrastructure; requires InfiniBand switches; requires a subnet manager (on the switch or the host)
* This is current as of the release of Windows Server 8 beta. Information on this slide is subject to change as technologies evolve and new cards become available.
SMB Direct Performance
Large IOs, high throughput (SQL Server DW): 512 KB IOs, 4,210 IOPS, 2.21 GB/s, 4.41 ms latency
Typical application server (SQL Server OLTP): 8 KB IOs, 214,000 IOPS, 1.75 GB/s, 870 µs latency
Small IOs, high IOPS (not typical, benchmark only): 1 KB IOs, 294,000 IOPS, 0.30 GB/s, 305 µs latency
Preliminary results based on Windows Server 8 beta
SMB Multichannel
SMB Multichannel
Full throughput: bandwidth aggregation with multiple NICs; multiple CPU cores engaged when using Receive Side Scaling (RSS)
Automatic failover: SMB Multichannel implements end-to-end failure detection; leverages NIC teaming (LBFO) if present, but does not require it
Automatic configuration: SMB detects and uses multiple network paths (see the sketch below)
Sample configurations: single RSS-capable NIC; multiple NICs; multiple NICs in an LBFO team (10GbE); multiple RDMA NICs (10GbE/IB)
[Diagram: SMB client and SMB server connected in each of the sample configurations]
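SMB Multichannel is enabled by default and needs no administrator configuration; for testing or troubleshooting it can be inspected and toggled on the client, as in this sketch (assuming the Windows Server 2012-era SMB client cmdlets):

    # Check whether Multichannel is enabled on this SMB client
    Get-SmbClientConfiguration | Select-Object EnableMultiChannel

    # Temporarily disable it, then re-enable it (troubleshooting only)
    Set-SmbClientConfiguration -EnableMultiChannel $false -Force
    Set-SmbClientConfiguration -EnableMultiChannel $true -Force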
SMB Multichannel: Single NIC
1 session, without Multichannel: can't use the full 10Gbps; only one TCP/IP connection; only one CPU core engaged
1 session, with Multichannel: full 10Gbps available; multiple TCP/IP connections; Receive Side Scaling (RSS) helps distribute the load across CPU cores
[Diagram: SMB client and SMB server with per-core CPU utilization, cores 1-4]
SMB Multichannel: Multiple NICs
1 session, without Multichannel: no automatic failover; can't use the full bandwidth; only one NIC engaged; only one CPU core engaged
1 session, with Multichannel: automatic failover; combined bandwidth available; multiple NICs engaged; multiple CPU cores engaged (see the constraint sketch below for pinning traffic to specific NICs)
[Diagram: SMB clients 1 and 2 connected to SMB servers 1 and 2 over multiple NICs]
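When several NICs are available, SMB normally picks the paths itself; if traffic to a given file server must be pinned to specific interfaces, a constraint can be declared. A sketch with hypothetical server and interface names, assuming the Windows Server 2012-era SMB cmdlets:

    # Restrict SMB connections to FileServer1 to two specific local NICs
    New-SmbMultichannelConstraint -ServerName FileServer1 `
                                  -InterfaceAlias "Ethernet 3","Ethernet 4"

    # Review the constraints, or remove them all to return to automatic selection
    Get-SmbMultichannelConstraint
    Get-SmbMultichannelConstraint | Remove-SmbMultichannelConstraint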
SMB Multichannel Performance: Interface Scaling (Throughput)
Preliminary results using four NICs simultaneously; linear bandwidth scaling:
1 NIC: 1150 MB/sec
2 NICs: 2330 MB/sec
3 NICs: 3320 MB/sec
4 NICs: 4300 MB/sec
Leverages NIC support for RSS (Receive Side Scaling) to engage multiple CPU cores per NIC
Bandwidth for small IOs is bottlenecked on CPU
[Chart: throughput (MB/sec) vs. I/O size for 1x, 2x, 3x and 4x NICs]
Preliminary results based on Windows Server 8 Developer Preview
http://go.microsoft.com/fwlink/p/?linkid=227841
SMB Multichannel + LBFO
1 session, with LBFO, without Multichannel: automatic failover; can't use the full bandwidth; only one NIC engaged; only one CPU core engaged
1 session, with LBFO and Multichannel: automatic failover (faster with LBFO); combined bandwidth available; multiple NICs engaged; multiple CPU cores engaged
[Diagram: SMB clients 1 and 2 and SMB servers 1 and 2, each with NICs in an LBFO team]
SMB Direct and SMB Multichannel
1 session, without Multichannel: no automatic failover; can't use the full bandwidth; only one NIC engaged; RDMA capability not used
1 session, with Multichannel: automatic failover; combined bandwidth available; multiple NICs engaged; multiple RDMA connections
[Diagram: SMB clients 1 and 2 connected to SMB servers 1 and 2 through R-NICs over 32Gb InfiniBand]
Troubleshooting SMB Multichannel
PowerShell: Get-NetAdapter, Get-SmbServerNetworkInterface, Get-SmbClientNetworkInterface, Get-SmbMultichannelConnection (annotated in the sketch below)
Event Log: Applications and Services Logs > Microsoft > Windows > SMB Client
Performance Counters: SMB2 Client Shares
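A brief annotated version of the PowerShell list above (a sketch; all four cmdlets ship with the in-box SMB and NetAdapter modules on Windows Server 2012-era builds):

    # Adapters the OS sees, with link speed and status
    Get-NetAdapter

    # Interfaces the file server offers to SMB clients (run on the server);
    # shows speed plus RSS and RDMA capability
    Get-SmbServerNetworkInterface

    # Interfaces the SMB client considers usable for Multichannel (run on the client)
    Get-SmbClientNetworkInterface

    # The paths SMB Multichannel actually selected for current connections
    Get-SmbMultichannelConnection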
SMB Scale-Out
Historical: Windows Server 2008 R2
Active-Passive Single File Server: 1 logical file server; 1 virtual IP address; single name; simple and easy to manage
Active-Passive Multiple File Servers: 2+ logical file servers; 2+ virtual IP addresses; access to disparate shares through different nodes; better leverages the hardware investment; more complex to manage (multiple names)
[Diagram: clients accessing \\FSA\Share1, \\FSA\Share2 and \\FSB\Share1; one cluster with FSA=10.1.1.3 in an active/passive pair, and another cluster with one node active for FSA=10.1.1.3 and the other active for FSB=10.1.1.4]
File Server for Scale-Out Application Data
New in Windows Server 8; targeted for server app storage (examples: Hyper-V and SQL Server)
Increase available bandwidth by adding cluster nodes
Key capabilities:
Active/Active file shares
Fault tolerance with zero downtime
Fast failure recovery
CHKDSK with zero downtime
Support for app-consistent snapshots
Support for RDMA-enabled networks
Optimization for server apps
Simple management
[Diagram: Hyper-V cluster (up to 64 nodes) connecting over the data center network (Ethernet, InfiniBand or a combination) to a single logical file server (\\FS\Share) with a single file system namespace, Cluster Shared Volumes and a file server cluster (up to 4 nodes)]
New File Server Type: File Server for scale-out application data (Scale-Out File Server)
Manage all nodes as a single file share service
Leverages:
Cluster Shared Volumes (CSV): single file system namespace (no drive letters); CSV volumes are online on all cluster nodes
Distributed Network Name (DNN): manages DNS registration and deregistration of node IP addresses; round-robin DNS distributes clients across nodes
Requirements:
Windows Failover Cluster with CSV
Both the server application and the file server cluster must be running SMB 2.2; SMB1 and earlier clients cannot connect to scale-out file shares
(See the PowerShell sketch below for a hedged setup example.)
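A minimal sketch of standing up a scale-out file share with the failover clustering and SMB cmdlets in Windows Server 2012-era builds. The cluster, role, share and account names are hypothetical, and an existing failover cluster with a Cluster Shared Volume is assumed:

    # Add the Scale-Out File Server role with its Distributed Network Name (DNN)
    Add-ClusterScaleOutFileServerRole -Name SOFS -Cluster Cluster1

    # Create a continuously available application share on a CSV path
    New-SmbShare -Name VMs -Path C:\ClusterStorage\Volume1\VMs `
                 -ContinuouslyAvailable $true -FullAccess "CONTOSO\HyperVHosts"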
Putting it all together
Putting it all together
1. SMB Direct: high throughput with low CPU utilization and low latency
2. SMB Multichannel: load balancing and failover with multiple interfaces
3. SMB Transparent Failover: zero downtime for planned/unplanned events
4. SMB Scale-Out: active/active file shares across cluster nodes
5. Cluster Shared Volumes (CSV): SMB used for inter-node traffic
6. SMB PowerShell: management of file shares; enabling and disabling SMB features
7. SMB Performance Counters: provide insight into storage performance, equivalent to disk counters
8. SMB Eventing
[Diagram: administrator managing Hyper-V parents (config and VHD files for child VMs) that store data on file server shares backed by CSV disks on shared SAS storage, with the numbered features called out across the stack]
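For items 6 and 7, a hedged sketch of day-to-day monitoring and share management from PowerShell. The counter set and counter names shown match Windows Server 2012-era builds but may differ on other versions, so discover them first with Get-Counter -ListSet:

    # Discover the SMB client counter sets present on this machine
    Get-Counter -ListSet "SMB*" | Select-Object CounterSetName

    # Sample per-share latency and throughput, roughly equivalent to disk counters
    # (counter names may vary by build)
    Get-Counter "\SMB Client Shares(*)\Avg. sec/Data Request",
                "\SMB Client Shares(*)\Data Bytes/sec"

    # Inventory shares and sessions with the SMB cmdlets
    Get-SmbShare
    Get-SmbSession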
Thank you!