Network Virtualization Technologies and their Effect on Performance
Dror Goldenberg, VP Software Architecture
TCE NFV Winter School 2015
Cloud Computing and NFV
Cloud - scalable computing resources (CPU, memory, storage), services, and applications on demand over the network, with a pay-as-you-grow model
Cloud networking is virtualized - the network has to scale, isolate, perform, gauge, and provide services
NFV, enabled by cloud computing, is focused on packet handling (TX/RX operations, DPI, encryption, etc.)
[Diagram: VMs composed of virtualized resources - vCPUs, vMem, vNet, vStor]
In this talk we concentrate on the network components required to realize an NFV instance and the ways to accelerate its network performance
2015 Mellanox Technologies
Agenda
Virtual interfaces overview
Virtual switching overview
User-space networking overview
User-space middleware
High-performance networking solutions
Virtual Interfaces - Device Emulation
Host emulates a complete HW device, e.g., an Intel e1000 NIC
Guest runs an unmodified driver
Pros
- No need to install special drivers in guests
- Transparent migration
- Unlimited virtual interfaces
Cons
- Slow
- Emulation exists only for very simple devices
- High overhead
[Diagram: guest e1000 driver -> QEMU e1000 emulator -> macvtap netdev -> SW switch -> physical netdev -> NIC]
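A minimal sketch of how such a guest could be launched with QEMU (disk image, tap device, and MAC address are illustrative):

```shell
# Launch a guest whose NIC is a fully emulated Intel e1000.
# Every device register access traps into QEMU's e1000 emulator,
# which is why this path is slow.
qemu-system-x86_64 \
  -machine accel=kvm -m 2048 \
  -drive file=guest.img,format=qcow2 \
  -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
  -device e1000,netdev=net0,mac=52:54:00:12:34:56
```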
Virtual Interfaces - Para-Virtualization
Host exposes a virtual, SW-friendly device, e.g., virtio-net
Guest runs a special device driver
Host emulates the device back-end
Pros
- Decent performance
- Transparent migration
- Unlimited virtual interfaces
Cons
- Simple devices only
[Diagram: guest virtio-net driver -> QEMU virtio emulator -> macvtap netdev -> SW switch -> physical netdev -> NIC]
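For comparison, the same guest launched with a para-virtual NIC instead of an emulated one could look like this (names and paths are illustrative):

```shell
# The guest runs the virtio-net driver; QEMU implements the
# SW-friendly back-end, avoiding full device emulation.
qemu-system-x86_64 \
  -machine accel=kvm -m 2048 \
  -drive file=guest.img,format=qcow2 \
  -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
  -device virtio-net-pci,netdev=net0
```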
Virtual Interfaces - Accelerated Para-Virtualization
Same para-virtual control interface
Fast path offloaded to the host kernel (vhost_net)
[Diagram: guest virtio-net driver; virtio control stays in the QEMU process, while the data path goes through the kernel's vhost-net module -> macvtap netdev -> SW switch -> physical netdev -> NIC]
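A sketch of the accelerated variant: the only change from the plain virtio-net invocation is `vhost=on` on the tap netdev (names remain illustrative):

```shell
# vhost=on moves the virtio data path from the QEMU process into
# the kernel's vhost-net module; QEMU keeps control operations only.
qemu-system-x86_64 \
  -machine accel=kvm -m 2048 \
  -drive file=guest.img,format=qcow2 \
  -netdev tap,id=net0,ifname=tap0,vhost=on,script=no,downscript=no \
  -device virtio-net-pci,netdev=net0
```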
Virtual Interfaces - Physical Device Pass-Through
Host grants the guest direct access to a physical device
Security and isolation are still maintained
- PCI configuration space is virtualized
- IOMMU governs DMA access
Guest runs a standard device driver
Pros
- Near-native performance
- Guests can use any device that is passed to them
Cons
- No transparent migration
- Very limited scalability (physical devices are not shared)
[Diagram: the guest's HW driver talks directly to NIC 2; the host's HW driver keeps NIC 1; both NICs connect to the switch]
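As an illustration of pass-through setup on a Linux host, assuming the vfio-pci driver and a placeholder PCI address:

```shell
# Detach the NIC from its host driver and hand it to vfio-pci
# (0000:03:00.0 is a placeholder for your device's PCI address).
modprobe vfio-pci
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo vfio-pci > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

# Pass the whole physical NIC into the guest; the IOMMU
# confines the guest's DMA to its own memory.
qemu-system-x86_64 -machine accel=kvm -m 2048 \
  -drive file=guest.img,format=qcow2 \
  -device vfio-pci,host=0000:03:00.0
```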
Virtual Interfaces - Virtual Device Pass-Through (SR-IOV)
Single Root I/O Virtualization (SR-IOV)
Host grants the guest direct access to a virtual device
Security and isolation are still maintained
- PCI configuration space is virtualized
- IOMMU governs DMA access
Guest runs the device driver for the virtual function
Pros
- Near-native performance
- High scalability (128-256 VFs)
Cons
- No transparent migration
[Diagram: the host runs the PF driver for the NIC's physical function; each guest runs a VF driver for its virtual function; the NIC's embedded switch connects PFs and VFs to the external switch]
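A sketch of VF provisioning on a Linux host (interface name, VF count, MAC, and VLAN are placeholders):

```shell
# Create 4 virtual functions on the physical function.
echo 4 > /sys/class/net/eth0/device/sriov_numvfs

# Assign a MAC and VLAN to VF 0 through the PF driver;
# the embedded switch enforces them on the VF's traffic.
ip link set eth0 vf 0 mac 52:54:00:aa:bb:cc vlan 100

# Each VF appears as its own PCI device and can be passed to a
# guest with vfio-pci, exactly like a physical device.
```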
Virtual Switching - Overview
Required for VM-to-VM communication (on the same host)
Virtual switch
- Linux bridge
- OVS
- NIC-based hardware embedded switch (eswitch)
- Accelerated OVS over DPDK
External switch (802.1Qbg)
- A classical switch does not switch traffic back to the source port (a change was thus required - "hairpin" mode)
- Unnecessary BW consumption on the NIC-switch link
All of the above also holds for containers
Linux Bridge
Compliant with 802.1D
Emulates a legacy Layer 2 switch in software - FDB, unknown-destination flooding (FDB miss), spanning tree, etc.
Unlike a regular host interface, the bridge places the NIC HW in promiscuous mode
- Further offloads available (e.g., MAC/VLAN demux)
The bridge switches traffic between local VMs, and to the adjacent switch through the host NIC
[Diagram: VM eth0 interfaces attach to the Linux bridge, which connects through the NIC driver to the physical NIC]
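A minimal bridge setup with the iproute2 tools (interface names are illustrative):

```shell
# Create a bridge and attach the physical uplink plus two VM taps.
ip link add br0 type bridge
ip link set eth0 master br0
ip link set tap0 master br0
ip link set tap1 master br0
ip link set br0 up

# Inspect the forwarding database (FDB) the bridge has learned.
bridge fdb show br br0
```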
OVS - Open vSwitch
L2, L3, L4, and more; OpenFlow based
Fast path vs. slow path
Fast path - flow cache (openvswitch kernel module)
- Consulted on every received packet (multi-threaded)
- Cache miss - the packet is handed over to user space
Slow path - ovs-vswitchd, a user-space daemon
- The OpenFlow pipeline is executed on the packet
- Installs a new entry in the cache for subsequent packets
- The original packet is handed back to the fast-path module
[Diagram: ovsdb and ovs-vswitchd in user space (slow path); openvswitch.kmod in the kernel (fast path), between the VM eth0 interfaces and the NIC driver/NIC]
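A small illustration using the standard OVS tools (bridge name, ports, and the flow match are examples, not taken from the slides):

```shell
# Create an OVS bridge, add the uplink and a VM port.
ovs-vsctl add-br br0
ovs-vsctl add-port br0 eth0
ovs-vsctl add-port br0 tap0

# Install an OpenFlow rule in the slow path: send TCP port 80
# traffic out of OpenFlow port 2.
ovs-ofctl add-flow br0 "tcp,tp_dst=80,actions=output:2"

# Inspect the kernel fast-path flow cache.
ovs-dpctl dump-flows
```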
OVS Cont'd
Overall state is kept in a local DB in user space - L2 FDB, L3 forwarding state, QoS policy, etc.
Encap/decap for tunneling
Overwrite capabilities (change packet fields)
Performance boost
- HW offload, e.g., tunneling, packet processor, etc.
- User space with DPDK support
OVS VXLAN tunneling support
[Diagram: VM vtap ports attach to OVS bridges BR0/BR1, mapped to VNI 100 and VNI 300; the VXLAN overlay (tenant) networks run over UDP/IP on the underlay network (Layer 2 or Layer 3)]
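For illustration, a VXLAN tunnel port can be added to an OVS bridge like this (the remote VTEP address and VNI are placeholders):

```shell
# Traffic leaving through vxlan0 is VXLAN-encapsulated toward the
# remote VTEP; key=100 sets the VXLAN Network Identifier (VNI).
ovs-vsctl add-port br0 vxlan0 -- \
  set interface vxlan0 type=vxlan \
  options:remote_ip=192.168.1.2 options:key=100
```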
NIC Hardware Embedded Switch - OVS Control Path
Enables switching between SR-IOV virtual functions
Utilizes the OVS control plane
Fast path runs in the HW embedded switch - acting like an L0 cache
User-Mode OVS and DPDK
Moves the OVS data path from the kernel to user space
The user-space process uses the DPDK libraries to efficiently process packets
VMs connect from their virtio devices through vhost (in the kernel) to the virtio PMDs of the DPDK process
[Diagram: guests' virtio drivers connect via vhost to the user-mode OVS-DPDK process (virtio PMDs plus a DPDK PMD), which drives the physical NIC directly from user space]
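A hedged sketch of wiring this up, using a later OVS-DPDK command syntax (PCI address and port names are illustrative; releases contemporary with this talk used slightly different options):

```shell
# Enable the DPDK datapath in OVS.
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

# Bind a physical port through its DPDK poll-mode driver.
ovs-vsctl add-port br0 dpdk-p0 -- \
  set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:03:00.0

# Create a vhost-user port for a VM's virtio device to attach to.
ovs-vsctl add-port br0 vhost-user0 -- \
  set Interface vhost-user0 type=dpdkvhostuser
```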
User-Space Networking - Overview
HW managed by the kernel; applications create resources via system calls, e.g.
- Queues (descriptor rings)
- Registered memory buffers
Data path bypasses the kernel
- Queues shared between the application and the HW
- HW accesses registered buffers directly
- Direct signaling mechanism ("doorbells")
- Direct completion detection: in-memory polling; can also register for an event (interrupt)
Multiple HW resources - no need for locking if each resource is accessed by a single thread
Efficient: asynchronous progress, zero copy
[Diagram: application/middleware threads drive send, receive, and completion queues through an access library and a HW user-space driver; registered memory is shared with the NIC; the HW kernel driver is used only for setup]
User-Space Networking - Packet Interfaces
Verbs Raw Ethernet Queues as an example
Basic data-path operations
- Post packet buffers to be sent out
- Post buffers to receive packets
- Poll for completions
Checksum offloads for TCP/UDP
- Insert checksum on TX
- Validate checksum on RX
VLAN insertion/stripping
Receive-side scaling (RSS)
- Distributes incoming packets into multiple queues
- Distribution is semi-random (hash based)
Flow steering
- Deterministic steering of specific flows to specific RQs
Delivers a very high packet rate to the application, e.g., 25 Mpps for 64B packets
User-Space Networking - RDMA Interfaces
Pass messages instead of packets - up to 2GB in size
Semantics
- Channel (message passing): the requestor provides the source buffer, the responder provides the receive buffer (uses send and receive queues)
- Remote Direct Memory Access (RDMA): the requestor provides both source and target buffers; both RDMA read and RDMA write are supported (send queue only)
Advanced RDMA operations
- Atomics: compare & swap, fetch & add, multi-field
- Data integrity
Extreme performance
- 700ns one-way latency between applications
- 40GE BW at negligible CPU utilization
- Packet rate > 35 Mpps
User-Space Middleware - Data Plane Development Kit (DPDK)
A set of data-plane libraries and network interface controller drivers for fast packet processing
Functionality
- Memory/queue/buffer managers (memory pools, pre-allocation of buffers, lock-less queues)
- Poll Mode Driver (PMD) - polls for packets instead of using asynchronous interrupt-based signaling, for speedup
- Packet flow classification: L3 (LPM, IPv4/IPv6), 5-tuple (via hash), L2 (FDB via hash)
- Encryption (via extensions) - Intel Advanced Encryption Standard instructions, IPsec
Minimal overhead - leverages user-space packet interfaces
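As a quick illustration of the runtime environment DPDK expects, hugepages are reserved up front for its memory pools, and the bundled testpmd application exercises the poll-mode drivers (page counts and core mask are illustrative):

```shell
# Reserve 1024 x 2MB hugepages and mount hugetlbfs for DPDK pools.
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge

# Run testpmd on cores 0-1: forward packets between ports using
# poll-mode drivers, with no kernel involvement on the data path.
testpmd -c 0x3 -n 4 -- --forward-mode=io --auto-start
```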
User-Space Middleware - User-Space TCP/IP Stack
Complete implementation of the TCP/IP stack in user space
Built over the Raw Ethernet Queues interfaces
Exposes the standard socket interface to applications
- Existing applications need not be modified
- LD_PRELOAD is used to hijack socket system calls
Provides extreme TCP performance to applications
- 1.6us latency
- Saturates a 40GE link
[Diagram: unmodified application -> user-space TCP/IP stack -> access library -> HW user-space driver -> NIC; the HW kernel driver is used only for setup]
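The LD_PRELOAD mechanism can be sketched as follows (the library path and application are hypothetical; Mellanox's libvma is used here only as an example of such a stack):

```shell
# The preloaded library interposes on socket(), send(), recv(),
# etc., so the unmodified application's TCP traffic is served by
# the user-space stack instead of the kernel.
LD_PRELOAD=/usr/lib/libvma.so ./my_server --port 8080
```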
User-Space Middleware - Accelio, an RDMA RPC Library
High-performance RPC library
- Request-response
- One-way messages
Key properties
- Optimized for high performance and multi-threading
- Proactor (event-based) programming model
- Reliable message delivery
- Automated buffer management
Connections and threads
- Unique HW connection between each pair of communicating threads
- No locking within the library
Asynchronous event loop
- The library controls CPU affinity, buffer management, and event monitoring
- Application callbacks are invoked as needed
[Diagram: each client thread holds a dedicated connection to a server thread]
High-Performance Networking Solutions for the VNFs
Packet-processing VMs
- Direct pass-through using SR-IOV
- DPDK / Raw Ethernet Queues over a VF within the VM
Accelerated virtual switching
- User-space OVS implementation over DPDK
- HW switching (eswitch)
High-throughput servers
- Direct pass-through using SR-IOV
- Run a user-space TCP/IP stack within the VM
- Leverage RDMA offloading technologies: efficient synchronization, data structure access, storage
Enjoy great performance while maintaining the same management
Thank You