FlexPath Network Processor




FlexPath Network Processor
Rainer Ohlendorf, Thomas Wild, Andreas Herkersdorf
Prof. Dr. Andreas Herkersdorf
Arcisstraße 21, 80290 München
http://www.lis.ei.tum.de

Agenda
- FlexPath Introduction
- Work Packages (2nd Phase)
  - Multi-Processor Extensions
  - Load Balancing Techniques in FlexPath
    - Dedicated Assignment
    - Packet Spraying
  - FlexPath Implementation / Demonstrator
- Project Highlights and Outlook
FlexPath NP - 2

Basic Ideas of FlexPath NP
- Flexible packet processing paths on-chip
- Hardware decision on packet path depending on networking application / protocol
  - Per-packet analysis in real time
- Run-time reconfigurable rule base
  - Supports varying traffic patterns over time
- Hardware offload relieves programmable resources
[Figure: Ingress Hardware Processing Pipeline, Egress Hardware Processing Pipeline, AutoRoute Path, CPU Path]
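The per-packet path decision against a run-time reconfigurable rule base can be pictured with a small software model. This is only a sketch under stated assumptions: the slides do not describe the rule format, so the `Rule` fields (IP protocol number, DSCP), the first-match policy, and the default of sending unmatched packets to the CPU path are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

AUTOROUTE = "AutoRoute"   # hardware-only path
CPU_PATH = "CPU"          # path through a programmable processor


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: None in a field acts as a wildcard."""
    protocol: Optional[int]   # IP protocol number (6 = TCP, 17 = UDP, 50 = ESP)
    dscp: Optional[int]       # DSCP code point
    path: str


class RuleBase:
    """First-match rule table that can be changed while traffic is flowing."""

    def __init__(self, default_path: str = CPU_PATH):
        self.rules = []
        self.default_path = default_path

    def install(self, rule: Rule) -> None:
        self.rules.append(rule)

    def remove(self, rule: Rule) -> None:
        self.rules.remove(rule)

    def decide(self, protocol: int, dscp: int) -> str:
        # Per-packet decision: the first matching rule wins,
        # unmatched packets fall back to the default path.
        for rule in self.rules:
            if rule.protocol in (None, protocol) and rule.dscp in (None, dscp):
                return rule.path
        return self.default_path


rule_base = RuleBase()
rule_base.install(Rule(protocol=50, dscp=None, path=CPU_PATH))   # IPsec ESP needs software
rule_base.install(Rule(protocol=6, dscp=None, path=AUTOROUTE))   # plain TCP forwarding
print(rule_base.decide(protocol=6, dscp=0))
print(rule_base.decide(protocol=17, dscp=0))
```

Installing another rule later, for example one that sends UDP to AutoRoute, changes the decision for subsequent packets without touching the rest of the model, which is the run-time reconfigurability the slide refers to.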

Work Packages of Second Phase
- MPSoC Enhancements
  - How to notify Processing Elements (CPU, HW Acc.)?
  - Multi-Processor Interrupt Controller / Packet Distributor
  - Software / Drivers
- Load Balancing Techniques
  - How to get the optimal resource utilization?
  - Evaluation of different balancing concepts
  - Mapping on FlexPath NP
- FPGA Demonstrator
  - Fully working Network Processor on FPGA Demonstrator Platform
  - Demonstration tomorrow!

Multi-Processor Extensions
- Fully Multi-Processor ready Interrupt Controller
  - Based on Xilinx Single Processor IntC
  - Extended by several configuration registers (e.g. to configure queue-CPU assignment)
  - Atomic register read-out (instead of read/modify/write-back)
- Packet Distributor
  - Priority based
  - Supports Dedicated Assignment and Spraying
[Figure: Path Dispatcher, Packet Distributor, CPU 0 to CPU 3 (IDLE / BUSY)]

Load Balancing Techniques
- Load balancing is actually not a new problem: cluster load balancing in HPC, but also for NPUs
- So what's in the NP literature?
  - Simple Hashing & Packet Spraying for "aggressive flows" (Dittmann, 2002)
  - Adaptive HRW Hashing (Kencl, 2003)
  - Adaptive Burst Shifting (Shi, 2005)
  - HABS (Kencl/Shi, 2006)
- All known techniques are applied uniformly to all packets
  - homogeneous balancing
  - no QoS consideration
- Important side effect: flow reassignments may lead to packet reordering
  - Packet reordering significantly reduces the efficiency of TCP (>90% of all network packets)
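Why a flow reassignment can reorder packets is easy to see in a toy queueing model. The sketch below is an illustration only, not taken from the slides: each CPU is modelled as a FIFO server with a fixed service time, and the function name, the backlog of 5 time units on CPU "A", and the arrival times are all made-up values.

```python
def departure_order(packets, busy_until, service_time=1.0):
    """Return packet sequence numbers in the order they leave the CPUs.

    packets:    list of (seq, arrival_time, cpu) in arrival order
    busy_until: dict cpu -> time until which that CPU is still working
                on older traffic (its backlog)
    """
    busy = dict(busy_until)
    finished = []
    for seq, arrival, cpu in packets:
        start = max(arrival, busy.get(cpu, 0.0))   # wait behind the backlog
        busy[cpu] = start + service_time
        finished.append((busy[cpu], seq))
    return [seq for _, seq in sorted(finished)]


backlog = {"A": 5.0}   # CPU A is overloaded, CPU B is idle

# The whole flow stays on CPU A: slow, but in order.
stay = [(i, float(i), "A") for i in range(5)]
print(departure_order(stay, backlog))

# The balancer moves the flow to idle CPU B after packet 2:
# packets 3 and 4 overtake packets 0 to 2 that are still queued on A.
moved = [(i, float(i), "A" if i < 3 else "B") for i in range(5)]
print(departure_order(moved, backlog))
```

In the second case the later packets leave first, which a TCP receiver sees as out-of-order delivery.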

Load Balancing - Networking Applications
- IP Router:
  - 90% of traffic uses TCP
  - 50% of TCP packets are ACKs
  - <10% of packets are DSCP marked
  - Plain routing is "stateless": each packet is independent of other packets
  - QoS behavior (DSCP): preferential treatment
  - hot candidate for AutoRoute
- Security / Crypto Gateways:
  - Either requires high processing per packet (N x 1000 instr / pkt), or invocation of dedicated HW accelerators / coprocessors
  - IPSec is a stateful protocol
    - connection parameters (keys, etc.)
    - sequence numbers
    - processing on same instance optimal
  - may require HW/SW partitioning and use of crypto cores
[Figure: IPsec tunnel across the Internet between the VPN gateways of Corporate NW 1 and Corporate NW 2]

Load Balancing - Stateless Networking Applications
- no shared connection state is maintained
  - any processor can work on any packet
- dedicated assignment of flows to processors may be inefficient
  - "bursty" traffic patterns lead to temporary overloads
  - high balancing overhead needed to compensate for overloads
- packet spraying is a "self-organizing" solution:
  - assignment of packets to a single queue (difference to Dittmann)
  - every idle CPU fetches packets from this queue
  - QoS may be implemented by providing additional "high priority" queues
- Packet reordering problem addressed in FlexPath by Path Control!
[Figure: Dittmann (2002) vs. FlexPath (2009): Path Ctrl., Path Disp., Scheduler]
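The spraying scheme on this slide (one shared queue, idle CPUs fetch from it, extra high-priority queues for QoS) can be sketched in a few lines. This is a simplified software model under assumptions: the class name `SprayQueues`, the strict-priority policy, and the packet labels are hypothetical, and the hardware Packet Distributor is not described at this level in the slides.

```python
from collections import deque


class SprayQueues:
    """Shared packet queues, one per priority level (index 0 = highest)."""

    def __init__(self, levels=2):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, packet, priority=1):
        # Every packet of a given priority lands in the same shared queue,
        # regardless of which flow it belongs to.
        self.queues[priority].append(packet)

    def fetch(self):
        # Called by whichever CPU becomes idle: strict priority,
        # FIFO order within one priority level.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None   # nothing to do


spray = SprayQueues(levels=2)
spray.enqueue("be-1")
spray.enqueue("be-2")
spray.enqueue("qos-1", priority=0)
spray.enqueue("be-3")

# Two CPUs alternate in becoming idle; no per-flow assignment is needed.
for cpu in ["CPU0", "CPU1", "CPU0", "CPU1"]:
    print(cpu, "processes", spray.fetch())
```

Because no CPU owns a flow, a burst is absorbed by whichever processors happen to be free, at the cost of possible reordering within a flow, which the slide says is handled by Path Control.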

Load Balancing - Stateful Networking Applications
- Hashing-based assignment of flows to processors is well suited for stateful processing
  - without run-time adaptation: problem of biased hash bundle sizes (Dittmann)
  - run-time adaptation (HRW, Kencl) relatively complex to implement
- HLU: simple, adaptive flow assignment
  - hashing-based, high flow mapping persistence, low implementation complexity
[Figure: three successive assignments of flows Fl. 1 to Fl. 7 to CPU1 and CPU2, with ρ=65% / 35%, ρ=40% / 60%, ρ=45% / 55%]
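For orientation, the two hashing styles named above can be sketched as follows. This is not the HLU scheme, whose details the slides do not give; it shows plain static hashing and generic highest-random-weight (HRW, also called rendezvous) hashing without the adaptive weights of Kencl's variant. The helper names, the 5-tuple layout, and the use of SHA-256 as the hash are assumptions made for the example.

```python
import hashlib


def _hash64(*parts):
    """Deterministic 64-bit hash of the given fields (SHA-256 is an arbitrary choice)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def assign_static(flow, n_cpus):
    """Static hashing: flow 5-tuple -> CPU index. No adaptation, so bundle sizes may be biased."""
    return _hash64(*flow) % n_cpus


def assign_hrw(flow, cpus):
    """HRW hashing: the CPU with the highest per-(CPU, flow) weight wins."""
    return max(cpus, key=lambda cpu: _hash64(cpu, *flow))


# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port)
flow = ("10.0.0.1", "10.0.1.9", 50, 0, 0)
cpus = ["CPU0", "CPU1", "CPU2", "CPU3"]

print(assign_static(flow, len(cpus)))
print(assign_hrw(flow, cpus))
```

Both functions always map the same flow to the same CPU, which is what stateful processing such as IPsec needs. With HRW, removing a CPU only remaps the flows that were on that CPU, whereas changing `n_cpus` in the modulo scheme can remap almost every flow.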

Load Balancing - System Simulation
[Figure: loss rate (%), log scale from 0.0001% to 100%, vs. number of PEs in processor cluster (1 to 16), for HABS, HLU, spray and AHH]
- FlexPath: assignment for stateful traffic slightly better than Kencl's adaptive scheme
- FlexPath: assignment for stateless traffic achieves lossless operation with fewest CPUs!
- all dedicated assignment schemes reach a minimum packet loss floor!

Technische Universität München
Load Balancing - System Simulation
[Figure: average packet latency (µs), log scale from 10 to 10,000, vs. number of PEs in processor cluster (5 to 16); curves: HABS QoS, HABS IPSec, HABS BE, S&H QoS, S&H IPSec and S&H BE latency]
- FlexPath: spraying with 2 priorities: QoS guarantee
- Separating IPsec and forwarding in FlexPath: better forwarding latency

FlexPath Demonstrator - Floorplan
- Xilinx ML410 (Virtex-4 FX60)
- Slices: 18,549 (73%)
- BRAMs: 119 (51%)
- PPCs: 2 (100%)
- EMACs: 2 (100%)
- Critical path: 9.965 ns => f_max = 100.351 MHz
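The maximum clock frequency quoted above is simply the reciprocal of the critical path delay. A short check of the arithmetic, using only the 9.965 ns figure from the slide (the variable names are ours):

```python
critical_path_ns = 9.965

# f_max = 1 / t_critical; with t in nanoseconds, 1000 / t gives MHz.
f_max_mhz = 1e3 / critical_path_ns

print(f"f_max = {f_max_mhz:.3f} MHz")
```

This reproduces the 100.351 MHz on the slide, which sits just above the 100 MHz clock mentioned later for the 32-bit datapath.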

FPGA Prototype - Measurement Results (1)
[Figure: two panels (blocks: Data Plane, Path Dispatcher), each plotting CPU load (%, 0 to 100; CPU 1) and packet rate (kpps, 0 to 40; IPSec and Fwd) vs. IPSec traffic share (0 to 4,000 kbit/s)]
- 70% CPU load with IP forwarding on single CPU core (latency: ~20 µs)
- small IPSec data rate dramatically increases CPU load (latency: >1 ms)
- IP forwarding performance declines before CPU gets saturated
- classification in Path Dispatcher reduces CPU load to 37% (latency: ~10 µs)
- IPSec packets are lost when more than 2.7 Mbps are injected

FPGA Prototype - Measurement Results (2)
[Figure: two panels (blocks: Data Plane, Path Dispatcher), each plotting CPU load (%, 0 to 100; CPU 1, CPU 2) and packet rate (kpps, 0 to 40; IPSec and Fwd) vs. IPSec traffic share (0 to 4,000 kbit/s)]
- forwarding traffic is either processed exclusively by second CPU or traffic is sprayed among both CPUs
  - no more forwarding traffic is lost
- instead of using second CPU, forwarding traffic may also be AutoRouted...
  - throughput chart identical to 2 CPU case
  - AutoRoute latency ~0.5 µs vs. 10 µs

Demonstrator - NPU HW Enablement Overheads
- Parsing / Context builder: 2660 slices, 5 BRAMs
- Classification / Balancing: 1446 slices, 14 BRAMs
- Manipulation: 1615 slices, 3 BRAMs
- MicroBlaze CPU: 1533 slices, 4 BRAMs
- Forwarding performance: up to 3.2 Gbps (32 bit @ 100 MHz), independent of packet size
- Forwarding performance: ~95 kpps, 366 Mbps*
* assuming average packet size of 481 Byte (IMIX)
[Figure: floorplan with three MicroBlaze CPU blocks]
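Both forwarding figures on this slide follow from simple arithmetic on the numbers given there. The snippet below only re-derives them; the variable names are ours, and decimal prefixes (1 Gbps = 10^9 bit/s) are assumed.

```python
# Datapath limit: one 32-bit word per clock cycle at 100 MHz.
bus_width_bits = 32
clock_hz = 100e6
datapath_gbps = bus_width_bits * clock_hz / 1e9

# Packet-rate limit: ~95 kpps at the stated IMIX average of 481 bytes per packet.
packet_rate_pps = 95e3
avg_packet_bytes = 481
packet_mbps = packet_rate_pps * avg_packet_bytes * 8 / 1e6

print(f"datapath limit:    {datapath_gbps:.1f} Gbps")
print(f"packet-rate limit: {packet_mbps:.0f} Mbps")
```

The first figure does not depend on packet size because it is a pure word-rate limit, while the second scales linearly with the assumed average packet size.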

Project Highlights
- New Architectural Approach for Network Processors
  - Hardware offload possibilities
  - Run-time reconfigurable processing path assignment
  - System adapts itself to various application requirements
- Application-aware Load Balancing Strategy
- Concept Evaluation by System-level simulations (SystemC)
- Prototype Implementation on FPGA (Proof of concept)
- Publications
  - 1 International Workshop Paper
  - 8 International Conference Papers
  - 2 Journal Papers
- Last Phase Publications:
  - "A Packet Classification Technique for On-Chip Processing Path Decision", WASP 2007, Salzburg
  - "A HW Packet Resequencer Unit for Network Processors", ARCS 2008, Dresden
  - "A Processing Path Dispatcher in Network Processor MPSoCs", IEEE Transactions on VLSI, 10/2008
  - "FlexPath NP - A Network Processor Architecture with Flexible Processing Paths", SoC 2008, Tampere
  - "An Efficient HW Architecture for Packet Re-sequencing in NP MPSoCs", DSD 2009, Patras
  - "An Application-aware Load Balancing Strategy for NPs", HiPEAC 2010, Pisa


Outlook
- Lawrence G. Roberts: "A Radical New Router" ...
- ... might be a FlexPath Router!
  - Path Dispatcher determines processing path
  - GP-Processing for unknown packets
  - AutoRoute for known packet streams

Thank you!