June 22, 2010
Software Datapath Acceleration for Stateless Packet Processing
FTF-NET-F0817
Ravi Malhotra, Software Architect

Reg. U.S. Pat. & Tm. Off. BeeKit, BeeStack, CoreNet, the Energy Efficient Solutions logo, Flexis, MXC, Platform in a Package, Processor Expert, QorIQ, QUICC Engine, SMAROS, TurboLink

Agenda
- What can be accelerated
  - Stateless and stateful
  - Various applications
  - Sample TCP offload
- Soft Data Path Engine
  - Architecture
  - Feature set
  - Packet flow
  - DPE API
  - Performance
- Soft DPE advantage
  - Leverage key hardware offloads

Stateful Path and Stateless Packet Processing
- Most network packet processing protocols can be broken down into two paths:
  - Stateless path, also known as the data path, requires quick and efficient switching/routing of packets
    - Can be broken down into packet identification (classification) and forwarding
  - Stateful path, also known as the control path, requires more processing and has more inherent latency than the data path
- Stateful control path requires 90% of the code and is used 10% of the time.
- Stateless data path requires just 10% of the code and is used 90% of the time.
- This session focuses on how to accelerate the 10% of the code in the stateless path to increase packet processing performance.

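The split between a stateless data path (classification plus forwarding) and a stateful control path can be sketched as a flow table with a slow-path fallback. This is a minimal illustration, not the Soft DPE implementation: the class name DataPath, the flow-key layout, and the stand-in control_path function are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

# Flow key: (src IP, dst IP, src port, dst port, protocol). Hypothetical layout.
FlowKey = Tuple[str, str, int, int, int]


class DataPath:
    """Stateless fast path: classify a packet, then forward it."""

    def __init__(self, control_path: Callable[[FlowKey], str]) -> None:
        self.flow_table: Dict[FlowKey, str] = {}  # flow key -> egress port
        self.control_path = control_path
        self.hits = 0
        self.misses = 0

    def receive(self, key: FlowKey) -> str:
        egress = self.flow_table.get(key)  # classification step
        if egress is not None:
            self.hits += 1
            return egress  # forwarding step, no stateful work needed
        # Miss: hand the packet to the stateful path, then cache its decision.
        self.misses += 1
        egress = self.control_path(key)
        self.flow_table[key] = egress
        return egress


def control_path(key: FlowKey) -> str:
    """Stand-in for the stateful path (routing, policy and so on)."""
    return "eth1" if key[1].startswith("10.") else "eth0"
```

With one flow sending ten packets, only the first packet reaches the control path; the remaining nine are served from the flow table, which is the behavior the 90%/10% observation relies on.
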
Stateless Data Path for Different Applications

Application       | Data Path                                       | Control Path
Layer 2 bridging  | FDB lookup, VLAN add/delete, Learning           | Aging, STP
IPv4 forwarding   | Dest-cache lookup, L2 modify                    | LPM route-table lookup, ARP, IP Options
NAPT              | 5-tuple lookup, IP/Port/L2 modify               | Connection setup/destroy, policy, ALG
Firewall          | Access control list, pin-holes                  | Stateful packet inspection, ALG
IPSec             | 5-tuple lookup, encap/decap + crypto            | SA setup, security policy
QoS               | Enforcement: sched, police, congestion, shaper  | Policy, provisioning

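For the IPv4 forwarding case, the table places the destination-cache lookup in the data path and the LPM route-table lookup in the control path. A rough sketch of that division follows; the class names RouteTable and DestCache, the linear LPM scan, and the next-hop strings are assumptions for illustration only, and a production route table would use a trie or similar structure.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple


class RouteTable:
    """Control path: longest-prefix-match (LPM) over configured routes."""

    def __init__(self) -> None:
        self.routes: List[Tuple[ipaddress.IPv4Network, str]] = []

    def add(self, prefix: str, next_hop: str) -> None:
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lpm(self, dst: str) -> Optional[str]:
        addr = ipaddress.ip_address(dst)
        best = None
        for net, next_hop in self.routes:
            # Keep the matching route with the longest prefix.
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, next_hop)
        return best[1] if best else None


class DestCache:
    """Data path: exact-match cache keyed by destination address."""

    def __init__(self, route_table: RouteTable) -> None:
        self.route_table = route_table
        self.cache: Dict[str, str] = {}

    def next_hop(self, dst: str) -> Optional[str]:
        hop = self.cache.get(dst)  # cheap exact-match lookup
        if hop is None:
            hop = self.route_table.lpm(dst)  # miss: ask the control path
            if hop is not None:
                self.cache[dst] = hop
        return hop
```

The design point is that the expensive prefix search runs once per destination, while every later packet to that destination pays only for a single dictionary lookup.
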
Dynamic Connection Offloading with Soft DPA (L4 TCP NAPT flow, no QoS)
[Diagram. Legend: Pkt Flow, Ctrl Flow.
 Control Path: Networking Stack with NetFilter Hooks and Netfilter Connection Tracking; DPA Control Module containing Event Handlers, Connection Aging Subsystem and Rule/Stream Tables; Asynchronous Offload Mechanism.
 Data Path: Stateless Data Path Engine with Classifier/Action Table, reached through the Asynchronous Low Level API.
 Labeled events: Connection established/assured event, New Connection, Create Rule (Ack), Create Stream (Ack), Connection Offload Success; Run Aging, Probe Status, Aging status, Ageout, Conn destroy/ageout, Connection Deleted request, Connection destroyed, Delete rule, Delete Stream; Lkup Hit / Miss, FIN/ACK, terminate pkt.]

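The offload life cycle named in this diagram (a connection-established event creates a rule, an aging pass removes idle rules, and FIN/ACK handling involves the control path) can be sketched as follows. Everything here is a hypothetical model: the class and method names are invented, the real mechanism is asynchronous while this sketch is synchronous, and treating a TCP FIN as a lookup miss is one reading of the diagram, not a confirmed behavior.

```python
from typing import Dict, List, Optional, Tuple

FlowKey = Tuple[str, str, int, int, int]  # hypothetical 5-tuple layout


class SoftDpe:
    """Data path: classifier/action table with a last-seen timestamp per rule."""

    def __init__(self) -> None:
        self.rules: Dict[FlowKey, dict] = {}

    def create_rule(self, key: FlowKey, action: str, now: float) -> bool:
        self.rules[key] = {"action": action, "last_seen": now}
        return True  # models the "Ack" returned to the control module

    def delete_rule(self, key: FlowKey) -> bool:
        return self.rules.pop(key, None) is not None

    def lookup(self, key: FlowKey, tcp_fin: bool, now: float) -> Optional[str]:
        entry = self.rules.get(key)
        # Assumption: FIN packets are handed to the control path so that
        # connection tracking can observe the teardown.
        if entry is None or tcp_fin:
            return None
        entry["last_seen"] = now
        return entry["action"]


class ControlModule:
    """Control path: reacts to connection events and runs aging."""

    def __init__(self, dpe: SoftDpe, idle_timeout: float) -> None:
        self.dpe = dpe
        self.idle_timeout = idle_timeout

    def on_established(self, key: FlowKey, action: str, now: float) -> bool:
        return self.dpe.create_rule(key, action, now)

    def on_destroyed(self, key: FlowKey) -> bool:
        return self.dpe.delete_rule(key)

    def run_aging(self, now: float) -> List[FlowKey]:
        expired = [k for k, e in self.dpe.rules.items()
                   if now - e["last_seen"] > self.idle_timeout]
        for k in expired:
            self.dpe.delete_rule(k)
        return expired
```

Time is passed in explicitly rather than read from a clock so that the aging behavior is deterministic and easy to test.
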
Current Linux Forwarding

Platform    | SoC
P1, P2      | P1010, P1020, P2020, 85xx, 83xx
P3, P4, P5  | P4080, P3040, P5020

[Diagram. Legend: Data Packet flow, Control Packet flow, Configuration flow.
 e500 Cores, Linux User-space: Control Plane Applications (DHCP/DNS/IGMP etc).
 e500 Cores, Linux Kernel: Linux Network Stack, SEC/QM Driver, etsec/qm Driver (one per interface).
 Hardware: SEC (QM); etsec or FM-QM-BM on each network side.]

Stateless Data Path Processing in QE

Platform  | SoC
P1, P2    | 8323, 8360, 8569

[Diagram. Legend: Data Packet flow, Control Packet flow, Configuration flow.
 e500 Core, Linux User-space: Control Plane Applications (DHCP/DNS/IGMP etc).
 e500 Core, Linux Kernel: Linux Network Stack, UCC Network Driver, Control Logic, DPE API.
 QE RISC Cores, RISC Microcode: Interworking microcode.
 Hardware: UCC, SEC, two Network Interfaces.
 Annotations: "Completely re-used from Linux"; "Existing Solution for 8360 and 8323".]

Stateless Data Path Processing in Software

Platform    | SoC
P1, P2      | P1010, P1020, P2020, 85xx
P3, P4, P5  | N/A

[Diagram. Legend: Data Packet flow, Control Packet flow, Configuration flow.
 e500 Cores, Linux User-space: Control Plane Apps (DHCP/DNS/IGMP/IKE etc), VortiQa CP + NMS.
 e500 Cores, Linux Kernel: Linux Network Stack, VortiQa Network Stack, Control Logic, DPE API, Soft Data Path Engine, VeTSEC Driver (one per interface), SEC Driver.
 Hardware: SEC, VeTSEC (one per interface).
 Annotation: "Completely re-used from QE based Platforms".]

Soft Data Path Engine Feature List
- Stateless packet processing (all stateful processing, including ALG, SPI firewall, ARP, routing, learning etc., is done by the control path)
- Offloads the following stateless processing:
  - IPv4 forwarding
  - NAPT/firewall (ACL) processing
  - Layer 2 switching with VLAN
  - IPSec forwarding
  - Quality of service
- Support for the following interfaces:
  - Ethernet
  - VLAN
  - PPPoE
  - WLAN

Soft Data Path Engine Feature List (cont.)
- Maintenance
  - Per-flow statistics and aging
- Platform support
  - Multicore support over VeTSEC
  - Provides a standard configuration across platforms
  - Integrates seamlessly with Linux networking stack and applications using SWANG package
  - Integrates seamlessly with VortiQa networking stack and customer network stacks
  - Leverages hardware acceleration (hashing, scheduling, classification, security) where available

Soft Data Path Engine Functional Model
[Diagram. Between two Network Interfaces, the Data Path Engine contains Rx Processing, Packet Parsing, Classification/Lookup, Policer, Ingress HM-ops, Egress HM ops, Scheduler/Shaper and Tx Processing, with a Recycle path. It connects to the Control Plane(s), Application Offload (Crypto, PME etc.), Backplane Processing / Inter-plane/processor communication, and to another Data Path Engine.]

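The functional model includes a Policer stage but does not say how it is implemented. One common approach is a single-rate token bucket, sketched below as an assumption rather than as the Soft DPE's actual policer; the class name, parameters, and explicit timestamps are all illustrative.

```python
class TokenBucketPolicer:
    """Single-rate token bucket: admits a packet only if enough tokens remain."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last = 0.0                   # time of the previous update

    def conforms(self, pkt_len: int, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if pkt_len <= self.tokens:
            self.tokens -= pkt_len
            return True
        return False  # non-conforming: a caller would drop or remark it
```

A policer of this kind keeps no per-packet history, only a token count and a timestamp per flow, which is why it fits in a data path that otherwise avoids stateful protocol work.
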
Data Path Engine API Architecture Overview
[Diagram. Two logical ports (Logical Port 1, Logical Port 2), each shown with ETH, MAC and PHY, a Shaper/Scheduler and Tx queues (Tx Q 1, Tx Q 2, Tx Q 3), plus Buffer Manager 1 and 2 and Bandwidth Manager 1 and 2.
 Receive side: Rx Queues (Rx Q 1, Rx Q 2), Lookup, Classification (Rule_1, Rule_2, Rule_3) and Match.
 Streams and header manipulation: Stream 1 to Stream 5, HdrMan 1 to HdrMan 3.
 Also labeled: error traffic, Control Path, Send(data, stream3).]

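The objects named in this diagram (rules, streams, header manipulation blocks, Tx queues, and a Send(data, stream3) call) suggest an object model along the following lines. This is a guess at the shape of such an API for illustration only: the class names, method names, and packet representation are hypothetical and are not the actual DPE API.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

Packet = Dict[str, object]  # hypothetical packet representation


class Stream:
    """A stream ties a header manipulation (HdrMan) step to a Tx queue."""

    def __init__(self, name: str, hdr_man: Callable[[Packet], Packet],
                 tx_queue: Deque[Packet]) -> None:
        self.name = name
        self.hdr_man = hdr_man
        self.tx_queue = tx_queue


class DataPathEngine:
    def __init__(self) -> None:
        self.rules: List[Tuple[Callable[[Packet], bool], Stream]] = []
        self.to_control_path: List[Packet] = []

    def add_rule(self, match: Callable[[Packet], bool], stream: Stream) -> None:
        self.rules.append((match, stream))

    def send(self, data: Packet, stream: Stream) -> None:
        # Apply the stream's header manipulation, then enqueue for transmit.
        stream.tx_queue.append(stream.hdr_man(data))

    def receive(self, data: Packet) -> Optional[str]:
        for match, stream in self.rules:  # classification: first match wins
            if match(data):
                self.send(data, stream)
                return stream.name
        self.to_control_path.append(data)  # no rule matched
        return None
```

In this model the control path can also call send directly on a stream it owns, which mirrors the Send(data, stream3) label in the diagram.
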
Soft Data Path Engine Performance Advantage
Results on P2020 RDB - 1200/600/400: 2-core SMP Linux
[Charts: three panels (IPv4, NAPT, IPSec). Each plots throughput in Kpps for Linux and for Soft DPA, plus a "% Diff" series against a "% Increase" axis, at packet sizes 64, 390 and 1500 bytes (64, 390 and 1456 bytes for IPSec).]
Significant (2x to 5x) performance improvement over native Linux

Soft Data Path Engine Multicore Scaling
Results on P2020 RDB - 1200/600/400: 1-core non-SMP vs. 2-core SMP Linux
[Charts: three panels (IPv4, NAPT, IPSec). Annotation: "Scaling Limited by SEC HW".]
Scaling factor of > 1.8x when migrating from 1-core to 2-core

Soft Data Path Engine Flow Scaling
Results on P2020 RDB - 1200/600/400: 2-core SMP Linux, 64 byte traffic
[Charts: three panels (IPv4, NAPT, IPSec).]
Low performance degradation for handling multiple flows

Data Path Hardware Acceleration
[Diagram. Core(s) run the Network Stack (SMP optimized) over Autonomous aware Drivers/API. Offload blocks shown: Ingress Offload, Egress Offload, Generic Offload, Look-Aside Offload and Autonomous Processing.]

Hardware Acceleration Support

Offload     | Feature                       | Advantage
Ingress     | Hash calculation              | Packet distribution to multiple cores, flow-pinning, table lookup
Ingress     | Coarse classification         | Offload stateless ACL processing
Ingress     | Packet parsing                | Avoid software overhead
Generic     | Hardware buffer management    | No buffer alloc/free operations in software
Generic     | Hardware queue management     | Simpler packet Rx/Tx, efficient stashing (to L1/L2), leaves room in cache for other data
Egress      | Hardware QoS                  | Avoid software overhead, mitigate DoS attacks, prioritize CPU cycles
Core        | Backside L2 cache             | Faster access for multiple flow tables
Look-Aside  | Protocol-aware cryptography   | Offload protocol encapsulation/decapsulation, sequence tracking etc.

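The first row of this table ties an ingress hash to packet distribution across cores and flow-pinning. The idea can be sketched in software as below, with the caveat that the hash function used here (CRC-32 from Python's zlib) and the key encoding are stand-ins chosen for the example; the hash computed by the hardware is not described in this deck.

```python
import zlib


def flow_hash(src_ip: str, dst_ip: str, src_port: int,
              dst_port: int, proto: int) -> int:
    """Hash a 5-tuple. CRC-32 is an illustrative choice, not the hardware's."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key)


def pick_core(hash_value: int, num_cores: int) -> int:
    """Flow-pinning: the same hash always selects the same core."""
    return hash_value % num_cores
```

Because every packet of a flow produces the same hash, the flow is always handled by the same core, so its packets stay in order and its flow-table entry tends to stay warm in that core's cache. When the hardware delivers the hash with the packet, software can also reuse it as the flow-table index instead of recomputing it.
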
Hardware Acceleration Advantage
[Chart: Cycles, absolute throughput in kpps, and relative speedup (Tput %) for successive configurations, labeled in order: Baseline IPv4 + QoS, Shaping + WFQ, WRED, Policing, Hash results in FD, Parse results in FD, HM ops in HW, HW Buffer Mgmt, HW Queue Mgmt, Stash on Dequeue, BS L2 cache.]
Hardware acceleration provides up to 2.5x improvement

Summary
- Software data path engine
  - Optimized packet processing path
  - Consistent interface across platforms
  - Easy integration with network stacks
  - Single solution across QorIQ LE/ULE platforms
- Performance advantage
  - Flexibility to leverage hardware acceleration
  - Optimized for multicore scaling