DynaCORE: Dynamically Reconfigurable Coprocessor for Network Processors
Carsten Albrecht, Roman Koch, Christoph Osterloh, Thilo Pionteck, Erik Maehle
Universität zu Lübeck, ITI (Head: Prof. Dr.-Ing. Erik Maehle)
DFG-SPP 1148 Final Colloquium, Karlsruhe, September 24th, 2009

Overview
- Introduction
- System Architecture
  - Key Components
- Internal Interconnect
  - Runtime-Adaptive Network-on-Chip Architecture
  - Buffer Sizes
- Fault Tolerance
  - Fault Scenarios
  - Stepwise Procedure
- Modelling DynaCORE
  - Principles
  - DynaCORE Model
  - Simulation
- Runtime Reconfiguration
  - Point of Reconfiguration
  - Technical Aspects
- Evaluation and Demonstrator
- Publications
- Summary

Introduction (1/2): Situation
In-transit packet processing in edge routers
Processing tasks
- Header processing
  - Routing
  - Quality-of-Service
  - Accounting
- Payload processing
  - Encryption/decryption
  - Compression
  - Intrusion detection

Introduction (2/2): DynaCORE Approach
DynaCORE = Dynamically adaptable COprocessor based on Reconfiguration
- Reconfigurable hardware accelerator for payload processing
- Allows flexible adaptation to changes in the network traffic profile
- Dynamic partial reconfiguration of the FPGA
Combination of
- Network processor (e.g. FlexPath NP): header processing
- DynaCORE (in Xilinx Virtex-4 FX): payload processing
Loose coupling
- Gigabit Ethernet
- Suitable for various network processors

System Architecture (1/3): Overview
[Figure: block diagram of the system architecture. Static partition: Receive Unit, Dispatcher, Transmit Unit, Reconfiguration Manager (HW + SW), reconfiguration logic with ICAP, external memory. Dynamic partition: Hardware Assists 1 to 4, each with an interface. Further labels: application-specific types (0, S, H, V).]

System Architecture (2/3): Components in the Static Partition
Transmit Unit
- Sends processed packets back to the NP
Receive Unit/Dispatcher
- Recognises the requested type of processing
- Assigns packets to suitable hardware assists
- Reports to the reconfiguration manager in case of unassignable packets
Reconfiguration Manager
- Implemented as software running on an embedded PowerPC
- Collects utilisation information from the hardware assists and decides when and how to reconfigure
- Controls the actual process of reconfiguration, i.e. sends configuration data to the reconfiguration logic
Reconfiguration Control Logic
- Writes configuration data to the FPGA-internal configuration access port (ICAP)
Software-based Hardware Assist
- Backup processing unit
- Utilises additional hard-wired PowerPC cores (UltraController II)
[Figure label: I/O Interface]

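The dispatcher behaviour listed above (recognise the processing type, assign the packet to a suitable hardware assist, report unassignable packets) can be sketched in a few lines of Python. This is only an illustration and not the DynaCORE hardware implementation: the `Dispatcher` class, the dictionary-based packet, the type names, and the least-loaded selection rule are all assumptions made for the example.

```python
from collections import defaultdict


class Dispatcher:
    """Hypothetical software model of the Receive Unit/Dispatcher."""

    def __init__(self):
        # processing type -> ids of hardware assists (HAs) offering it
        self.assists = defaultdict(list)
        # HA id -> packets assigned so far (assumed load metric)
        self.load = {}
        # processing types that had to be reported to the reconfiguration manager
        self.unassignable = []

    def register(self, ha_id, proc_type):
        """Announce a configured HA and the processing type it implements."""
        self.assists[proc_type].append(ha_id)
        self.load[ha_id] = 0

    def dispatch(self, packet):
        """Return the id of the chosen HA, or None if no HA fits."""
        proc_type = packet["type"]
        candidates = self.assists.get(proc_type, [])
        if not candidates:
            # No suitable HA configured: report instead of assigning.
            self.unassignable.append(proc_type)
            return None
        # Assumed policy: pick the least-loaded suitable HA.
        ha_id = min(candidates, key=lambda ha: self.load[ha])
        self.load[ha_id] += 1
        return ha_id
```

In this sketch the `unassignable` list stands in for the report to the reconfiguration manager, which could use it as one input when deciding whether to load a different hardware assist.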
System Architecture (3/3): Components in the Dynamic Partition
Hardware Assists
- Actual payload processing modules
- Equipped with a universal, algorithm-independent interface
- Embedded off-the-shelf IP cores
Switches
- Forward packets from the static partition to the HAs and back

Runtime-Adaptive Network-on-Chip (1/2): CoNoChi
CoNoChi = Configurable Network on Chip
- Hardware NoC architecture for runtime reconfigurable FPGAs
- Virtual cut-through switches with four equal full-duplex links (16 bit)
- Low hardware overhead compared to other NoCs
- Switches not needed for a certain setting of processing units can be removed from the network -> low latency
- Support for QoS
Physical and logical addresses
- Physical addresses: refer to specific switches at specific locations within the NoC topology
- Logical addresses: refer to processing entities inside hardware modules
[Figure: switches with interface and hardware assist; packet header fields "phy add" (physical address) and "log add" (logical address)]

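The split between logical and physical addresses can be illustrated with a small lookup table. The sketch below is an assumption about how such a mapping could be modelled in software; the `AddressTable` class, the field names `log_add` and `phy_add`, and the coordinate-style physical addresses are hypothetical and are not taken from the CoNoChi design.

```python
class AddressTable:
    """Hypothetical mapping from logical to physical NoC addresses."""

    def __init__(self):
        # logical address (processing entity) -> physical address (switch location)
        self._physical_of = {}

    def bind(self, logical, physical):
        """Attach a processing entity to the switch at `physical`."""
        self._physical_of[logical] = physical

    def resolve(self, logical):
        """Look up the switch that currently serves the entity."""
        return self._physical_of[logical]

    def route(self, packet):
        """Return a copy of the packet with the physical address filled in."""
        return {**packet, "phy_add": self.resolve(packet["log_add"])}


table = AddressTable()
table.bind(logical=3, physical=(1, 0))
packet = table.route({"log_add": 3, "payload": b"data"})
# If the module is moved during reconfiguration, only the binding changes;
# senders keep using logical address 3.
table.bind(logical=3, physical=(2, 1))
```

The point of the sketch is that senders address a processing entity, while the binding to a switch location can change when the topology is adapted.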
Runtime-Adaptive Network-on-Chip (2/2): Topology Adaptation
- Network topology can be adapted at runtime
- Coarse-grained tiles
- Merging/separation of neighbouring tiles
- Provides space for modules of varying complexity
[Figure: tile grid with interfaces and hardware assists, e.g. HA 5 and HA 6]

DynaCORE Fault Tolerance (1/3)
Fault scenarios
- User data
  - Non-permanent fault
  - Huge hardware effort to detect and correct
  - Tolerated by application area
- Processing units and infrastructure
  - Device degradation: fault in hardware structure
  - Single-Event Functional Interrupts (SEFIs): bitflip in configuration data
  - Permanent faults need to be corrected
Approach: combination of
- Configuration readback
  - Slow (33 ms for one tile)
  - Does not detect hardware faults
- Test packets
  - Do not cover all faults
- Alive messages
  - Missing alive message indicates a problem

DynaCORE Fault Tolerance (2/3)
Fault detection
- Alive messages
- Test packets
- Periodic configuration readback
Fault localization and correction
- Stepwise procedure using test packets
- Test against different assumptions
  - SEU in control registers -> tile reset
  - SEFI -> rewriting of the configuration data (reconfiguration)
  - Permanent hardware fault -> reorganization

DynaCORE Fault Tolerance (3/3)
Example: no alive message from switch 1
1. Identification of faulty segment
- Identify the path under test (known by the reconfiguration manager)
- Send test packets to all switches along the path under test
- If a test packet does not return correctly, the faulty segment has been identified

DynaCORE Fault Tolerance (3/3, continued)
Example: no alive message from switch 1
2. Assumption: SEU in control registers of switches or routing tables
- Reset switches in the affected section
- Send new routing tables
- Repeat test

DynaCORE Fault Tolerance (3/3, continued)
Example: no alive message from switch 1
3. Assumption: SEFI
- Read back configuration data for each tile and compare with the reference
- In case of a mismatch, reconfigure the tile
- If the tile contains a switch, send new routing tables
- Repeat test
If the test still fails: permanent hardware error -> reorganize system
Procedure takes time and does not cover all fault scenarios, yet is hardware efficient

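The three steps of the example can be condensed into a short Python sketch. It is a simplified model under stated assumptions, not the DynaCORE implementation: the `Switch` class, its `fault` field with the values "none", "seu", "sefi" and "permanent", and the returned action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    name: str
    fault: str = "none"  # assumed fault model: "none", "seu", "sefi", "permanent"

    def test_packet_ok(self):
        # A test packet only returns correctly from a fault-free switch.
        return self.fault == "none"

    def reset_and_reload_routing(self):
        # Reset plus new routing tables clears an SEU in control registers.
        if self.fault == "seu":
            self.fault = "none"

    def readback_matches_reference(self):
        # Readback exposes corrupted configuration data, not hardware faults.
        return self.fault != "sefi"

    def reconfigure(self):
        # Rewriting the configuration data repairs a SEFI.
        if self.fault == "sefi":
            self.fault = "none"


def handle_missing_alive(path):
    """Return (name of faulty switch, action taken) for the path under test."""
    # Step 1: send test packets along the path to find the faulty segment.
    faulty = next((s for s in path if not s.test_packet_ok()), None)
    if faulty is None:
        return None, "no fault found"
    # Step 2: assume an SEU; reset, send routing tables, repeat the test.
    faulty.reset_and_reload_routing()
    if faulty.test_packet_ok():
        return faulty.name, "reset"
    # Step 3: assume a SEFI; compare readback with the reference.
    if not faulty.readback_matches_reference():
        faulty.reconfigure()
        faulty.reset_and_reload_routing()  # the tile contains a switch
        if faulty.test_packet_ok():
            return faulty.name, "reconfigured"
    # All assumptions failed: treat as permanent hardware error.
    return faulty.name, "reorganise"
```

The ordering mirrors the slides: the cheapest repair (reset) is tried first, the slow configuration readback only afterwards, and reorganisation of the system is the last resort.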
Modelling DynaCORE (1/4): Abstract DynaCORE Model
Dynamically Structured Discrete Event-Based System Network (DSDEVN)
- Extends the discrete-event based system (DEVS) formalism
- States of the controller $\chi$ can again be models
- A simple DEVS simulator is sufficient for simulation of a DSDEVN
DynaCORE model: $\mathrm{DSDEVN}_{\mathrm{DynaCORE}} = \langle X, Y, \chi, M_\chi \rangle$ (the index identifies DynaCORE)
- $X$, valid inputs of the system, and $Y$, outputs of the system: messages received from and sent to the NP
- $\chi$: DynaCORE-specific controller
- $M_\chi$: model description of the controller (as DEVS)

Modelling DynaCORE (2/4): Abstract DynaCORE Model
Controller description as DEVS: $M_\chi = \langle X_\chi, S_\chi, Y_\chi, \delta_{\mathrm{int},\chi}, \delta_{\mathrm{ext},\chi}, \lambda_\chi, \tau_\chi \rangle$
- $X_\chi$: set of valid controller inputs
- $S_\chi$: controller state space
- $Y_\chi$: set of valid controller outputs
- $\delta_{\mathrm{int},\chi}$: state transition function for internal events including timeouts
- $\delta_{\mathrm{ext},\chi}$: state transition function for external events
- $\lambda_\chi$: output function
- $\tau_\chi$: timeout function (assigns a timeout value to states from $S_\chi$)
Controller states
- Include information on the system configuration, i.e. configured HAs
- Contain, in turn, models of the system components active in the respective state

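To make the roles of the transition, output and timeout functions concrete, here is a minimal DEVS-style controller in Python. It is a sketch under assumptions and not the DynaCORE SystemC model: the state layout (configured HAs, HA type being loaded, remaining time), the constant `RECONF_TIME`, the replace-first-HA policy and the `run` loop are all hypothetical, and the state does not nest component models as a DSDEVN would.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class DEVS:
    """M = <X, S, Y, delta_int, delta_ext, lambda, tau>; X, S, Y are implicit."""
    delta_int: Callable[[Any], Any]
    delta_ext: Callable[[Any, float, Any], Any]
    output: Callable[[Any], Any]      # lambda
    timeout: Callable[[Any], float]   # tau


RECONF_TIME = 5.0  # assumed duration of one reconfiguration, arbitrary units


def delta_ext(state, elapsed, requested_type):
    has, pending, remaining = state
    if pending is not None:
        return (has, pending, remaining - elapsed)  # busy: keep the timer running
    if requested_type in has:
        return state                                # already configured
    return (has, requested_type, RECONF_TIME)       # start reconfiguration


def delta_int(state):
    has, pending, _ = state
    # Reconfiguration finished; assumed policy: replace the first HA.
    return ((pending,) + has[1:], None, math.inf)


def output(state):
    return ("configured", state[1])


def timeout(state):
    return state[2]


def run(model, state, events):
    """Process time-sorted (time, input) events; return (state, outputs)."""
    now, outputs = 0.0, []
    for t, x in events + [(math.inf, None)]:
        # Fire internal events that are due before the next external event.
        while model.timeout(state) != math.inf and now + model.timeout(state) <= t:
            now += model.timeout(state)
            outputs.append((now, model.output(state)))
            state = model.delta_int(state)
        if x is not None:
            state = model.delta_ext(state, t - now, x)
            now = t
    return state, outputs


controller = DEVS(delta_int, delta_ext, output, timeout)
initial = (("A", "B", "C"), None, math.inf)
final, outs = run(controller, initial, [(1.0, "A"), (2.0, "D"), (4.0, "A")])
```

With these inputs the request for type "D" at time 2.0 starts a reconfiguration that completes at time 7.0, so `outs` holds a single output event and `final` lists "D" among the configured HAs.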
Modelling DynaCORE (3/4): Simulation
Structure of SystemC Simulation Model
[Figure: structure of the SystemC simulation model]
Simulation Stimulus and Output
[Figure: bandwidth [Mbit/s] (0 to 1400) over time (0.0005 to 0.0645); series: input data rate, output data rate, reconfiguration; annotations: input burst, reconfiguration]
- Aggregated traffic composed of four b-model packet streams
- No packet loss (sufficient buffer sizes)

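As an illustration of the stimulus, the following Python sketch generates bursty streams and aggregates four of them. It assumes that "b-model" refers to the b-model for bursty traffic, in which a traffic volume is split recursively into fractions b and 1 - b over the two halves of a time interval. The bias values, volumes, number of levels and the function names are assumptions; the slides do not give the parameters used in the DynaCORE simulation.

```python
import random


def b_model(total, levels, b, rng):
    """Distribute `total` over 2**levels time slots with bias `b` (0.5 <= b < 1)."""
    slots = [total]
    for _ in range(levels):
        refined = []
        for volume in slots:
            # A randomly chosen half receives the fraction b, the other 1 - b.
            left = volume * (b if rng.random() < 0.5 else 1.0 - b)
            refined.extend((left, volume - left))
        slots = refined
    return slots


def aggregate(streams):
    """Sum several equally long streams slot by slot."""
    return [sum(column) for column in zip(*streams)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Four streams with assumed volumes and bias values.
streams = [b_model(total=1000.0, levels=6, b=bias, rng=rng)
           for bias in (0.6, 0.7, 0.7, 0.8)]
traffic = aggregate(streams)
```

A bias of 0.5 yields a uniform stream, while values closer to 1 concentrate the volume in few slots and therefore produce stronger bursts.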
Modelling DynaCORE (4/4): Simulation, Influence of Buffer Sizes
[Figure: latency [ms] (8.0 to 13.0) over switch buffer size and NoC-interface buffer size, each in packets (2 to 128)]
[Figure: ratio (0.00 to 1.20) of data rate, packet loss and latency over buffer size [#packets] (4 to 128); second axis: time [ms] (0.00 to 12.00)]
- Low impact of buffer sizes between NoC and HA
- Large switch buffers
  - Only little advantage for latency
  - Increased packet loss in case of reconfiguration

Runtime Reconfiguration (1/3): Determining the Point of Reconfiguration
Configuration state space
- Three modules
- Three types of HA
[Figure: graph of the configurations {A A A}, {A A B}, {A A C}, {A B B}, {A B C}, {A C C}, {B B B}, {B B C}, {B C C}, {C C C}]
- Edges: possible transitions between configurations
- Edge labels: transition costs (number of HAs to be reconfigured, 1 to 3)

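The state space for three modules and three HA types is small enough to enumerate directly. The Python sketch below assumes that the modules are interchangeable, so a configuration is a multiset of HA types, and that the transition cost is the number of modules whose type has to change; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

HA_TYPES = "ABC"  # three types of HA
MODULES = 3       # three modules


def configurations():
    """All multisets of MODULES hardware assists drawn from HA_TYPES."""
    return list(combinations_with_replacement(HA_TYPES, MODULES))


def transition_cost(src, dst):
    """Number of HAs that must be reconfigured to get from src to dst."""
    kept = sum((Counter(src) & Counter(dst)).values())  # HAs that can stay
    return MODULES - kept


configs = configurations()
costs = {(s, d): transition_cost(s, d) for s in configs for d in configs if s != d}
```

Under these assumptions `configs` has ten entries and every cost lies between 1 and 3, which matches the edge labels in the figure.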
Runtime Reconfiguration (2/3): Determining the Point of Reconfiguration
Reduced configuration state space
- Transition cost limited
[Figure: reduced state graph over the configurations of A, B and C with numbered transitions]
Reconfiguration trigger
- Configurable per-HA utilisation threshold
- Fires when the threshold is exceeded multiple times in sequence
[Figure: monitoring datum over time T with threshold; controller states Sχu and Sχv]

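The trigger rule (threshold exceeded multiple times in sequence) amounts to a counter that is reset whenever a sample falls back below the threshold. The sketch below is a minimal Python model; the class name, the parameter names `threshold` and `repeats`, and the sample values are assumptions for illustration.

```python
class ReconfigurationTrigger:
    """Fires after `repeats` consecutive samples above `threshold`."""

    def __init__(self, threshold, repeats):
        self.threshold = threshold  # per-HA utilisation threshold
        self.repeats = repeats      # required number of consecutive hits
        self.count = 0

    def update(self, utilisation):
        """Feed one monitoring sample; return True if reconfiguration is due."""
        if utilisation > self.threshold:
            self.count += 1
        else:
            self.count = 0          # sequence interrupted
        if self.count >= self.repeats:
            self.count = 0          # start over after firing
            return True
        return False


trigger = ReconfigurationTrigger(threshold=0.8, repeats=3)
samples = [0.9, 0.9, 0.5, 0.9, 0.9, 0.9, 0.9]
fired = [trigger.update(u) for u in samples]
# fired == [False, False, False, False, False, True, False]
```

Requiring several consecutive hits keeps short bursts from causing a reconfiguration, at the price of a slower reaction to lasting changes in the traffic profile.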
Runtime Reconfiguration (3/3): Technical Aspects
Merging and Separating Tiles
- Changes number and shapes of partially reconfigurable regions
- Different sets of bus macros
- Static elements in original design as part of hard macro
[Figure: Scenario 1 and Scenario 2 with bus macros and a removed bus macro]
Reconfiguration Speed
- Achievable maximum speed dependent on
  - Memory bandwidth
  - Allowable clock ratios between system components
- Fraction of theoretically possible speed

Evaluation/Demonstrator (1/2): Demonstrator Structure
[Figure: demonstrator structure]

Evaluation/Demonstrator (2/2): FlexPath and DynaCORE Demonstrator
- FlexPath NP: NP with reconfigurable data path (Virtex-4 FX 60)
- DynaCORE: reconfigurable processing modules (HAs) (Virtex-4 FX 60)
- Stimulus
- Analysis, visualisation

Publications
[PKA09] Pionteck, T.; Koch, R.; Albrecht, C.; Maehle, E.: A Design Technique for Adapting Number and Boundaries of Reconfigurable Modules at Runtime. International Journal of Reconfigurable Computing, vol. 2009, Article ID 942930, Hindawi Publishing Corporation, New York 2009
[PAK08a] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: Adaptive Communication Architectures for Runtime Reconfigurable System-on-Chips. Parallel Processing Letters, 2008
[AFK09] Albrecht, C.; Foag, J.; Koch, R.; Maehle, E.; Pionteck, T.: DynaCORE Dynamically Reconfigurable Coprocessor for Network Processors. To appear: Dynamically Reconfigurable Systems Architectures: Design Methods and Applications, Springer, 2009
[AKP09] Albrecht, C.; Koch, R.; Pionteck, T.; Glösekötter, P.: Towards a Flexible Fault-Tolerant System-on-Chip. 22nd International Conference on Architecture of Computing Systems - Workshop Proceedings, 83-90, VDE Verlag GmbH, Berlin 2009
[KAP09] Koch, R.; Albrecht, C.; Pionteck, T.: Adaptive Health Monitoring in a Reconfigurable Network-on-Chip. Workshop on Diagnostic Services in Network-on-Chips (DSNOC), Nice 2009
[AOP08] Albrecht, C.; Osterloh, Ch.; Pionteck, T.; Koch, R.; Maehle, E.: An Application-Oriented Synthetic Network Traffic Generator. European Conference on Modelling and Simulation 2008, 299-305, ECMS, Nicosia, Cyprus 2008
[ARK08] Albrecht, C.; Roß, P.; Koch, R.; Pionteck, T.; Maehle, E.: Performance Analysis of Bus-Based Interconnects for a Run-Time Reconfigurable Co-Processor Platform. PDP 08, 200-205, IEEE Computer Society, Toulouse, France 2008
[AWP08] Albrecht, C.; Werner, M.; Pionteck, T.; Fuchsen, R.; Koch, R.; Maehle, E.: WCET Determination Tool for Embedded Systems Software. SIMUTools08 Proceedings, 1, ICST, Marseille, France 2008
[PAK08] Pionteck, T.; Albrecht, C.; Koch, R.; Brix, T.; Maehle, E.: Design and Simulation of Runtime Reconfigurable Systems. IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS 2008), 2008
[PAK08b] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: Performance and Reliability Monitoring in Network-on-Chips. To appear: Workshop on Diagnostic Services in Network-on-Chips (DSNOC), 2008
[PAK08c] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: On the Design Parameters of Runtime Reconfigurable Systems. Accepted for: International Conference on Field Programmable Logic and Applications (FPL 2008), Heidelberg, Germany 2008
[AKP07] Albrecht, C.; Koch, R.; Pionteck, T.; Maehle, E.: Simulation System for Run-Time Reconfigurable Networks-on-Chip. Proceedings of the 6th EUROSIM Congress on Modelling and Simulation, ARGESIM - ARGE Simulation News, Wiedner Hauptstrasse 8-10, 1040 Vienna 2007
[APK07] Albrecht, C.; Pionteck, T.; Koch, R.; Maehle, E.: Modelling Tile-Based Run-Time Reconfigurable Systems Using SystemC. European Conference on Modelling and Simulation 2007, Prague, Czech Republic 2007

Summary
DynaCORE-specific aspects
- Interconnect performance analysis
  - Bus versus NoC based on a formally derived simulation model
  - Synthetic traffic generator
- Performance enhancement compared to software-based systems
- Proof of concept by means of a demonstrator
  - In cooperation with FlexPath / TU Munich
Universal aspects
- SystemC simulation methodology for runtime reconfigurable systems
  - SystemC kernel does not need to be adapted
- Reconfiguration management
  - Determining the point of reconfiguration
- NoC for runtime adaptable systems
- Tile-based design methodology for runtime reconfigurable designs
  - Merging/separating reconfigurable regions