Performance of Software Switching



Similar documents
Intel DPDK Boosts Server Appliance Performance White Paper

OpenFlow with Intel Voravit Tanyingyong, Markus Hidell, Peter Sjödin

Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan

RouteBricks: A Fast, Software- Based, Distributed IP Router

Network Function Virtualization: Virtualized BRAS with Linux* and Intel Architecture

Cisco Datacenter 3.0. Datacenter Trends. David Gonzalez Consulting Systems Engineer Cisco

ADVANCED NETWORK CONFIGURATION GUIDE

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Cisco Integrated Services Routers Performance Overview

Unified Fabric: Cisco's Innovation for Data Center Networks

Autonomous NetFlow Probe

Evaluating the Suitability of Server Network Cards for Software Routers

Evaluation of 40 Gigabit Ethernet technology for data servers

CS 91: Cloud Systems & Datacenter Networks Networks Background

How To Improve Performance On A Linux Based Router

RFC 2544 Performance Evaluation for a Linux Based Open Router

High-performance vswitch of the user, by the user, for the user

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

MIDeA: A Multi-Parallel Intrusion Detection Architecture

Tyche: An efficient Ethernet-based protocol for converged networked storage

Where IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine

A Performance Analysis of Gateway-to-Gateway VPN on the Linux Platform

An Oracle Technical White Paper November Oracle Solaris 11 Network Virtualization and Network Resource Management

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring

Challenges in high speed packet processing

STATE OF THE ART OF DATA CENTRE NETWORK TECHNOLOGIES CASE: COMPARISON BETWEEN ETHERNET FABRIC SOLUTIONS

Collecting Packet Traces at High Speed

Gigabit Ethernet Design

SummitStack in the Data Center

Open-source routing at 10Gb/s

Jun (Jim) Xu Principal Engineer, Futurewei Technologies, Inc.

AERONAUTICAL COMMUNICATIONS PANEL (ACP) ATN and IP

Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308

SUN DUAL PORT 10GBase-T ETHERNET NETWORKING CARDS

UEFI PXE Boot Performance Analysis

MPLS-TP. Future Ready. Today. Introduction. Connection Oriented Transport

PC-based Software Routers: High Performance and Application Service Support

Performance Evaluation of Linux Bridge

Networking Driver Performance and Measurement - e1000 A Case Study

Network Virtualization for Large-Scale Data Centers

Achieving Low-Latency Security

Brocade Solution for EMC VSPEX Server Virtualization

Introduction to MPIO, MCS, Trunking, and LACP

A way towards Lower Latency and Jitter

White Paper Abstract Disclaimer

Scaling Networking Applications to Multiple Cores

ON THE IMPLEMENTATION OF ADAPTIVE FLOW MEASUREMENT IN THE SDN-ENABLED NETWORK: A PROTOTYPE

TCP Offload Engines. As network interconnect speeds advance to Gigabit. Introduction to

Router Architectures

The proliferation of the raw processing

Novel Systems. Extensible Networks

High Speed I/O Server Computing with InfiniBand

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Networking Virtualization Using FPGAs

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

Virtual Software Routers: A Performance and Migration Study

Operating Systems Design 16. Networking: Sockets

High-Density Network Flow Monitoring

Using Intel AES New Instructions and PCLMULQDQ to Significantly Improve IPSec Performance on Linux*

Quantifying the Performance Degradation of IPv6 for TCP in Windows and Linux Networking

How To Test A Microsoft Vxworks Vx Works (Vxworks) And Vxwork (Vkworks) (Powerpc) (Vzworks)

Design considerations for efficient network applications with Intel multi-core processor-based systems on Linux*

EVALUATING THE NETWORKING PERFORMANCE OF LINUX-BASED HOME ROUTER PLATFORMS FOR MULTIMEDIA SERVICES. Ingo Kofler, Robert Kuschnig, Hermann Hellwagner

PCI Express High Speed Networks. Complete Solution for High Speed Networking

- An Essential Building Block for Stable and Reliable Compute Clusters

Wave Relay System and General Project Details

Data and Control Plane Interconnect solutions for SDN & NFV Networks Raghu Kondapalli August 2014

A Packet Forwarding Method for the ISCSI Virtualization Switch

Presentation of Diagnosing performance overheads in the Xen virtual machine environment

Architecture of distributed network processors: specifics of application in information security systems

Lustre Networking BY PETER J. BRAAM

RDMA over Ethernet - A Preliminary Study

Leveraging NIC Technology to Improve Network Performance in VMware vsphere

UPPER LAYER SWITCHING

Computer Organization & Architecture Lecture #19

QoS Switching. Two Related Areas to Cover (1) Switched IP Forwarding (2) 802.1Q (Virtual LANs) and 802.1p (GARP/Priorities)

Packet-based Network Traffic Monitoring and Analysis with GPUs

Control and forwarding plane separation on an open source router. Linux Kongress in Nürnberg

VMWARE WHITE PAPER 1

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak

Introduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution

Monitoring high-speed networks using ntop. Luca Deri

CHAPTER 3 STATIC ROUTING

On Multi Gigabit Packet Capturing With Multi Core Commodity Hardware

How Linux kernel enables MidoNet s overlay networks for virtualized environments. LinuxTag Berlin, May 2014

Rohde & Schwarz R&S SITLine ETH VLAN Encryption Device Functionality & Performance Tests

Steve Worrall Systems Engineer.

D1.2 Network Load Balancing

TE in action. Some problems that TE tries to solve. Concept of Traffic Engineering (TE)

Bioscience. Introduction. The Importance of the Network. Network Switching Requirements. Arista Technical Guide

DPDK Summit 2014 DPDK in a Virtual World

10/100/1000Mbps Ethernet MAC with Protocol Acceleration MAC-NET Core with Avalon Interface

Boosting Data Transfer with TCP Offload Engine Technology

Open Flow Controller and Switch Datasheet

Network Performance Evaluation of Latest Windows Operating Systems

Analyzing and Optimizing the Linux Networking Stack

CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content

1-Gigabit TCP Offload Engine

Transcription:

Performance of Software Switching Based on papers in IEEE HPSR 2011 and IFIP/ACM Performance 2011 Nuutti Varis, Jukka Manner Department of Communications and Networking (COMNET)

Agenda Motivation Performance Aspects Software Switching Evaluation Results Conclusion

Motivation Commodity hardware is emerging as an option for specialized networking devices A cheap and flexible solution Acceptable performance up to ~40Gbps and higher Commercial products are available Excellent platform for network protocol research Implement and experiment with new protocols in the real world Debugging in user space is easy Performance with kernel module or specialized I/O engine

Motivation While moving from 1Gbps ports to 10Gbps ports, we wanted to also test the new Intel Sandy Bridge microarchitecture More specifically, we were interested in three questions 1. How do the improvements of the Sandy Bridge microarchitecture affect throughput? 2. What is the effect of specialized packet processing software on throughput? 3. What kind of throughput can be expected from a high-end single CPU setup with current hardware and software?

Performance Aspects Both hardware and software of the platform have key roles in the total device performance Hardware features Direct Memory Access Multi-core processors with on-die caches Modern (point to point) I/O buses (QPI, PCI-E, ) Multi-queue network interface cards Software-driven features Interrupt scheme (NAPI, softirqs, polling, ) Batching Memory management (huge buffers, buffer pools, ) Thread/process affinity Data locality

Software Switching Linux implements 802.3d as a kernel module Network I/O using normal Linux network stack PacketShader has a Packet-IO concept that seeks to minimize memory copy when forwarding packets Click modular router is a framework for building network processing nodes Functions in user space and as a (Linux) kernel module Click routers are composed of Elements Network I/O, Processing, Monitoring, Classifying, We have implemented TRILL as a set of Click Modular Router Elements TRILL is a next generation bridging protocol from IETF Tunnels packets through the network Brings features from layer three protocols to layer two, f.ex. hop counts and ECMP Implementation emphasis on easy extendability and performance

Evaluation Setup Compare different software implementations Linux Bridge Click raw I/O PacketShader Click with our TRILL implementation Quad core processors with either 4 single queue 1Gbps ports 4 10 Gbps ports Different topologies with 2 and 4 port of aggregated traffic TRILL I/O performance evaluation requires both Edge- and Transit nodes Maximum lossless throughput in frames per second with various Ethernet payloads (RFC 2544)

Measurement Results: One Bidirectional Flow Throughput (Nehalem, 1 Gbps)

Measurement Results: Two Bidirectional Flows Throughput

Evaluation Results: Raw I/O Throughput Nehalem Click SNB Click Nehalem PacketShader SNB PacketShader 40 35 71% -1% -15% Gigabits per Second 30 25 20 15 10 40% 100% 2% 5 0 78 imix (~430B) 1518 Frame Size in Bytes

Evaluation Results: Processed I/O Throughput Nehalem Click IP SNB Click IP Nehalem Linux IP 40 35 SNB Linux IP Nehalem Linux Bridge SNB Linux Bridge 66% 2% 15% 15% Gigabits per Second 30 25 20 15 10 5 40% 7% 6% 8% 4% 0 78 imix (~430B) 1518 Frame Size in Bytes

Conclusion Thoughput does not scale linearly with small frame sizes in all techs Single queue network interface cards do not fully benefit from modern multi-core processors The Sandy Bridge platform excels especially with small frames when used with a specialized packet processing engine Linux network stack overhead eats most of the performance benefits of Sandy Bridge Our TRILL implementation performs similarly with Linux bridging component and Click raw I/O and benefits the most from parallelization The long term goal is to reach a point in performance and scaling, where the hardware limits are the definite bottleneck (but where?)