QoS & Traffic Management



Advanced Features for Managing Application Performance and Achieving End-to-End Quality of Service in Data Center and Cloud Computing Environments Using Chelsio T4 Adapters

Chelsio Terminator 4 (T4) based network adapters provide sophisticated features for both Traffic Management and Quality of Service (QoS). These features can be used to achieve required levels of performance by controlling how specific application I/O is handled within the Chelsio T4 NIC, allowing you to control end-to-end QoS for host applications that share a 10GbE network resource. Any latency-sensitive application can be assigned a high priority, which allows its traffic to bypass application traffic that is not latency sensitive. Two examples of high-priority data are inter-node communication by distributed lock managers in clustered database systems and streaming data traffic from stock market feed servers to high-frequency trading systems.

Figure 1: Quality of Service in Chelsio T4 (diagram "End-to-End Application Quality of Service (QoS)": host applications on two servers exchange tagged traffic through T4 NICs, 10GbE switches and the Internet cloud, over a dedicated high-priority channel and a lower-priority channel)

End-to-end QoS for application I/O is achieved by the Chelsio T4 NIC using dedicated and independent high-priority hardware channels for receive and transmit. The desired QoS can be extended through the network by using VLAN tags to designate high-priority packets, such as those needed for inter-node communication in clustered databases. Chelsio's T4 NIC diverts high-priority packets to a separate hardware channel, allowing them to bypass lower-priority packets. Tagged packets pass through the intervening switches and are treated as high-priority packets by the T4 NIC on the host at the other end, which again diverts the high-priority tagged packets to a dedicated channel to the host. This ensures end-to-end quality of service for latency-sensitive traffic, even in the presence of high-bandwidth traffic. A generic tagging sketch follows below.
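To illustrate how latency-sensitive traffic can be marked with a VLAN priority so that switches and NICs along the path can treat it preferentially, here is a minimal sketch using standard Linux iproute2/tc commands rather than any Chelsio-specific tooling. The interface name eth0, VLAN ID 100, priority value 5 and UDP port 4000 are hypothetical placeholders; how tagged traffic is steered onto a T4 hardware channel is outside the scope of this sketch.

# Create a VLAN interface and map Linux skb priority 5 to VLAN PCP 5 (802.1p)
ip link add link eth0 name eth0.100 type vlan id 100 egress-qos-map 5:5
ip addr add 192.168.100.1/24 dev eth0.100
ip link set eth0.100 up

# Mark latency-sensitive traffic (here, UDP destination port 4000) with skb
# priority 5 so it leaves the host carrying the high-priority VLAN tag;
# all other traffic keeps the default priority
tc qdisc add dev eth0.100 clsact
tc filter add dev eth0.100 egress protocol ip u32 \
    match ip dport 4000 0xffff action skbedit priority 5

Switches between the two hosts can then be configured to honor the 802.1p priority bits, so the tagged packets receive preferential queuing end to end.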

Advanced Traffic Management

The Traffic Management capabilities of the Chelsio T4 NIC can shape transmit data traffic through sophisticated queuing and scheduling algorithms built into the T4 ASIC hardware, providing fine-grained control over packet rate and byte rate. These features can be used in a variety of data center application environments to solve traffic management problems. Examples include configuring video servers to deliver predictable bandwidth and jitter, and provisioning bandwidth for virtual interfaces carved out of a shared network resource in a virtualized host.

Traffic flows from the various applications are rate controlled, then prioritized and provisioned using a weighted round robin scheduling algorithm. For example, in Figure 2, App3 could be a cloud-based SaaS application with specified SLAs (service level agreements) that demand certain levels of performance and QoS. Traffic from multiple flows can be load balanced across host CPUs, with programmable bandwidth and latency parameters, to deliver predictable latency and bandwidth and therefore predictable host application performance (see Figures 8 and 10 for Chelsio T4 Traffic Management performance test results).

In summary, the Traffic Management features in Chelsio's T4 adapters allow you to control three main functions (a configuration sketch follows Figure 2 below):

- Guarantee low latency in the presence of high-bandwidth (data mover) traffic
- Control the maximum bandwidth that a connection, flow or class (group of flows) can use
- Allocate available bandwidth to several connections or flows based on desired levels of performance

Figure 2: Traffic Management in Chelsio T4 (flows from streaming, data mover and transactional applications (App1-App3) map onto 1000+ Tx and Rx queues, with multiple connections per queue and up to 16 classes of connections per channel, passing through programmable rate limiters and programmable weighted round robin queue prioritization in the T4 traffic manager)
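The T4 implements these rate limiters and weighted round robin schedulers in the NIC hardware. As a rough software analogy only, and not a description of the Chelsio driver interface, the sketch below expresses the same three functions with the standard Linux tc HTB qdisc; the interface name eth0, the port numbers and the rates are hypothetical placeholders.

# Root HTB qdisc with a parent class representing the 10GbE link
tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 9800mbit

# Class 1:10 - transactional, latency-sensitive traffic: guaranteed rate, highest priority
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 1gbit ceil 2gbit prio 0
# Class 1:20 - streaming traffic: capped at a fixed rate (maximum-bandwidth control)
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 500mbit ceil 500mbit prio 1
# Class 1:30 - data mover traffic: may borrow any leftover bandwidth, lowest priority
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 4gbit ceil 9800mbit prio 2

# Classify flows into classes by TCP destination port (placeholder ports)
tc filter add dev eth0 parent 1: protocol ip u32 match ip dport 1521 0xffff flowid 1:10
tc filter add dev eth0 parent 1: protocol ip u32 match ip dport 8080 0xffff flowid 1:20

The hardware implementation differs in that it scales to 1000+ queues and operates at line rate without consuming host CPU cycles, but the division into rate-limited, prioritized classes is the same idea.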

Applications for Traffic Management Features in Chelsio T4

Listed below are some of the key applications for Chelsio's Traffic Management features.

Messaging Traffic, Cluster Interconnects: Oracle Real Application Clusters (RAC) Shared-DB Scale-Out Architecture

Oracle Real Application Clusters (RAC) is an N+1 scale-out, shared-data database architecture. A typical 4-node RAC cluster is shown below. The RAC nodes are clustered using standard 1Gb or 10Gb Ethernet. Inter-node traffic (red in Figure 3), consisting of cluster synchronization and Distributed Lock Manager (DLM) traffic, is prioritized over the NAS or SAN storage data mover traffic (blue), allowing the inter-node traffic to be delivered with the lowest possible latency without affecting the bandwidth of the data mover traffic.

Figure 3: Traffic Management with Oracle Real Application Cluster Interconnects (a 4-node Oracle RAC with Traffic Management enabled Chelsio T4 10GbE Unified Wire Adapters; RAC instances 1-4 connect through a 10GbE switch to shared database storage on NAS or iSAN (iSCSI), with inter-node and lock manager traffic prioritized over data mover traffic serving the RAC database clients)

Landmark Seismic Simulation: High Performance Computing (HPC) Scale-Out Clusters

High Performance Computing (HPC) clusters use an N+1 scale-out computing architecture for numerical processing workloads in vertical industries such as energy, biotech, media and national labs. The HPC nodes are clustered using standard 1Gb or 10Gb Ethernet. Messaging traffic (red in Figure 4), consisting of HPC inter-node communication and cluster synchronization traffic, is prioritized over the NAS data mover traffic (blue), allowing the messaging traffic to be delivered with the lowest possible latency without affecting the bandwidth of the data mover traffic. A classification sketch follows Figure 4.

Figure 4: Traffic Management with Landmark Seismic Processing HPC Cluster Interconnects (an N-node HPC cluster running Landmark seismic simulation workloads with Traffic Management enabled Chelsio T4 10GbE Unified Wire Adapters; cluster nodes connect through a 10GbE switch to shared NAS storage, with messaging traffic prioritized over data mover traffic serving the application clients)
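For illustration, the classification described above (messaging traffic ahead of storage traffic) can be approximated on a plain Linux host with a two-band strict-priority qdisc; this is a generic sketch, not the Chelsio configuration method, and the interface name and the interconnect port number are hypothetical (NFS is shown on its standard TCP port 2049).

# Two-band strict priority: band 0 for cluster messaging, band 1 for everything else
tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

# Cluster interconnect / lock manager traffic (hypothetical UDP port 42424) -> band 0
tc filter add dev eth0 parent 1: protocol ip u32 \
    match ip protocol 17 0xff match ip dport 42424 0xffff flowid 1:1

# NAS data mover traffic (NFS over TCP, port 2049) -> band 1
tc filter add dev eth0 parent 1: protocol ip u32 \
    match ip dport 2049 0xffff flowid 1:2

In the T4 case this separation happens on independent hardware channels, so the messaging traffic is never queued behind large data mover frames in the first place.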

Storage QoS

Implementing storage QoS is an excellent use of Chelsio T4's advanced Traffic Management functionality. In this example (Figure 5), all client traffic to LUN1 (shown in red) is prioritized over traffic to all other LUNs and volumes in the unified storage server shown. The LUN1 client is running a demanding, latency-sensitive application that requires its traffic to be delivered with minimal latency and acceptable bandwidth. Carrying all LUN1 traffic on a dedicated channel, bypassing other traffic, allows the LUN1 client to meet the desired level of storage I/O QoS, measured in terms of IOPS and latency.

Figure 5: Traffic Management in Storage Servers (a unified storage server with Traffic Management enabled Chelsio T4 10GbE adapters presents iSCSI/SAN LUNs and NFS/NAS volumes through a 10GbE switch; LUN1 client traffic from its iSCSI initiator is prioritized over traffic from the LUN2 iSCSI initiator and the Vol1 and Vol2 NFS clients)

Streaming Data: Video

Video streaming presents a very compelling application for Traffic Management. In this example (Figure 6), the T4 traffic-manages tens of thousands of IPTV and Web/HTTP video streams at typical MPEG-2/4 and AVC/H.264 rates, e.g. 200 kbps, 500 kbps, 1 Mbps, 2.5 Mbps, 4 Mbps, 12 Mbps and 18 Mbps. The traffic manager can handle multiple media rates simultaneously and manage each flow with low jitter. The media rate for a particular flow can be adjusted dynamically by moving the flow from one traffic class to another, e.g. from the traffic class for AVC/H.264 at 4 Mbps to the class at 12 Mbps. A per-rate class sketch follows Figure 6.

Figure 6: Traffic Management in Video Servers (a video streaming server with Traffic Management enabled Chelsio T4 10GbE adapters delivers two classes of video streams, thousands of MPEG4 IPTV streams and thousands of AVC/H.264 web streams, across the Internet cloud to IPTV set-top box clients and web browser video clients)
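As a simple way to picture the per-rate traffic classes described above, the sketch below defines two media-rate classes with the standard Linux tc HTB qdisc and moves one stream between them; it is a generic illustration rather than the T4 mechanism, and the interface name, client address 203.0.113.25 and port 47000 are hypothetical placeholders.

# One class per media rate (rates taken from the example above)
tc qdisc add dev eth0 root handle 1: htb
tc class add dev eth0 parent 1: classid 1:1 htb rate 9800mbit
tc class add dev eth0 parent 1:1 classid 1:40 htb rate 4mbit ceil 4mbit     # AVC/H.264, 4 Mbps
tc class add dev eth0 parent 1:1 classid 1:120 htb rate 12mbit ceil 12mbit  # AVC/H.264, 12 Mbps

# Pin one stream (by client IP and port) to the 4 Mbps class
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dst 203.0.113.25/32 match ip dport 47000 0xffff flowid 1:40

# "Moving the flow to another class": remove the old filter, point the stream at 12 Mbps
tc filter del dev eth0 parent 1: prio 1
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dst 203.0.113.25/32 match ip dport 47000 0xffff flowid 1:120

Note that an HTB class shapes the aggregate of all flows assigned to it, whereas the document describes the T4 traffic manager as able to rate-control individual connections, flows or classes across thousands of queues.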

Cloud Computing: Multi-Tenant Software as a Service (SaaS) Applications

In SaaS-based cloud computing environments, computing resources are carved up based on customer SLAs for their applications. These SLAs specify, among other things, availability and response-time requirements (for example, query response time). Based on these SLAs and the type of hosted SaaS application, the customers sharing the SaaS application and hardware infrastructure can be assigned different classes of network resources, such as bandwidth and latency.

A typical cloud-based SaaS application configuration (see Figure 7) uses server, storage and network virtualization technologies to partition hardware resources shared among a set of SaaS customers. In Figure 7, Customer4 has a higher-level SLA (shown in red); for example, this customer may require more bandwidth, lower latency, or both, relative to the other customers (Customers 1-3). Each customer's hosted application environment is siloed within a virtual machine (VM), with well-defined ACL security and SLA policies associated with each customer/VM container. Configuring Chelsio T4 Traffic Management to allocate shared network resources on a per-customer basis is an extremely cost-effective use of this powerful feature to meet or exceed customer SLAs. A per-tenant sketch follows Figure 7.

Figure 7: Traffic Management in Cloud Computing (a multi-tenant application running in a fully virtualized infrastructure with Traffic Management enabled Chelsio T4 10GbE adapters; applications for Customers 1-4 share computing and storage resources allocated according to SaaS client SLAs, with Customer4 receiving a higher class of service across the Internet cloud)
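A per-tenant allocation of this kind can be pictured with ordinary Linux rate limiting applied to each customer VM's virtual interface; this is a generic sketch under assumed names and rates (tap-cust1 through tap-cust4 are hypothetical tap devices), not the Chelsio or hypervisor-specific configuration path.

# Customer4 (premium SLA): higher guaranteed rate and ceiling on its VM interface
tc qdisc add dev tap-cust4 root handle 1: htb default 10
tc class add dev tap-cust4 parent 1: classid 1:10 htb rate 4gbit ceil 6gbit

# Customers 1-3 (standard SLA): lower guaranteed rate, tighter ceiling
for t in tap-cust1 tap-cust2 tap-cust3; do
    tc qdisc add dev $t root handle 1: htb default 10
    tc class add dev $t parent 1: classid 1:10 htb rate 1gbit ceil 2gbit
done

With the T4, the equivalent provisioning is performed by the adapter's traffic manager on the shared 10GbE port rather than per-VM in host software.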

T4 Performance Snapshot: QoS and Traffic Management

The following graphs illustrate the benefits of the traffic prioritization provided by Chelsio's T4-based NICs, showing that high-priority traffic can be delivered with low latency even in the presence of high-bandwidth traffic (large TCP sends; see details in the following section). In particular, Figure 8 shows that traffic management keeps latency at near-baseline numbers, whereas a regular NIC would show an order-of-magnitude deterioration in latency performance.

Figure 8: Application QoS - Latency with Traffic Management is Shielded from Data Mover Traffic (chart "Application Latency Performance (Traffic Management Low Jitter)": min, average and max latency in µs for the Baseline, No Traffic Mgmt and With Traffic Mgmt cases)

Figure 9: Application QoS - Aggregate Throughput is Preserved with Traffic Management (chart "Application Throughput Performance (Traffic Management Bandwidth Unaffected)": throughput in Mbps for Baseline (netperf standalone), No Traffic Mgmt (netperf + UDP verbs simultaneously) and With Traffic Mgmt (netperf + UDP verbs simultaneously))

Figure 10: Application Performance is Predictable and Consistent with Traffic Management (without traffic management, application performance over time shows unpredictable 10X swings; with traffic management, performance is predictable and consistent)

T4 Performance Test Configuration: Traffic Management and QoS

Performance Test Configuration and Hardware Specs

Servers r9 and r10 are direct-connected on Port 0 with Traffic Management enabled on that port: a dedicated priority channel carries the RR UDP (latency-sensitive) traffic and a lower-priority channel carries the netperf (throughput) traffic.

Server r9 configuration:
Hardware: CPU: 2 x quad-core Xeon X5482 (8 cores total); RAM: 8GB; NIC: Chelsio T440-CR
Software: RHEL 6.0, kernel 2.6.32.71; latest Chelsio T4 drivers

Server r10 configuration:
Hardware: CPU: 2 x quad-core Xeon X5482 (8 cores total); RAM: 8GB; NIC: Chelsio T440-CR
Software: RHEL 6.0, kernel 2.6.32.71; latest Chelsio T4 drivers

Figure 11: Traffic Management Performance Test Configuration (servers r9 and r10, each with Port 0 and Port 1, direct-connected on Port 0)

First, the UDP low-latency test and the netperf throughput test are run separately to baseline performance:

[root@r9 ~]# udp_test -t rr -T 5 -I 192.168.42.111 192.168.42.112
Average RTT: 6.551 usec, RTT/2: 3.276 usec
RTT/2: calls 763228 samples 3072 Ave 3.30113 µs, Min 3.21038 µs, Max 6.13572 µs

[root@r9 ~]# netperf -H 192.168.42.112
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.42.112 (192.168.42.112) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

 87380  65536   65536   10.00      9478.53

Next, the RR UDP and netperf tests are run simultaneously without Traffic Management (single channel):

[root@r9 ~]# netperf -H 192.168.42.112 & udp_test -t rr -T 5 -I 192.168.42.111 192.168.42.112
[1] 6419
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.42.112 (192.168.42.112) port 0 AF_INET
Average RTT: 258.428 usec, RTT/2: 129.214 usec
RTT/2: calls 19348 samples 3072 Ave 36.5201 µs, Min 3.24538 µs, Max 83.806 µs
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

 87380  65536   65536   10.00      9482.35

Average latency increases roughly tenfold, from 3.3 µs to 36.5 µs, without traffic management.

Finally, the RR UDP and netperf tests are run simultaneously with Traffic Management (two channels):

[root@r9 linux_t4_build]# netperf -H 192.168.42.112 & udp_test -t rr -T 5 -I 192.168.42.111 192.168.42.112
[1] 7979
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.42.112 (192.168.42.112) port 0 AF_INET
Average RTT: 8.508 usec, RTT/2: 4.254 usec
RTT/2: calls 587716 samples 3072 Ave 4.20943 µs, Min 3.24626 µs, Max 9.08127 µs
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

 87380  65536   65536   10.00      9454.00

When the RR UDP traffic is mapped to an independent channel, the average RR latency drops from 36.5 µs to 4.2 µs, which is near baseline. Note that because the PCI bus and Ethernet MAC port are shared non-preemptively, the maximum latency increases slightly while high-priority packets wait for in-flight frame transmissions to complete.

Copyright 2011 Chelsio Communications, Inc. HQ: 370 San Aleso Ave, Sunnyvale, CA 94085  T: 408.962.3600  E: sales@chelsio.com  W: www.chelsio.com  Twitter: @Chelsio1