NEST COBRA CA4
A Mixed Time-Criticality SDRAM Controller
MeAOW 3-9-23
Sven Goossens, Benny Akesson, Kees Goossens
Mixed Time-Criticality
- Embedded multi-core systems are getting more complex:
  - Integrating more applications
  - Applications get more complex
  - (Functionality/Energy) ratio increases
  - Driven by power, area and cost constraints
- Results in a mix of applications of different time-criticalities sharing hardware resources
  - Firm real-time + Soft real-time = Mixed real-time
- The hardware can no longer be tailored for a specific time-criticality class
SDRAM Controllers
- DRAM: the most commonly used off-chip memory resource
  - Shared across firm real-time (FRT) and soft real-time (SRT) applications
  - Performance metrics: bandwidth (throughput) and latency (response time)
  - Performance is difficult to bound because it depends on locality
- Firm Real-Time Controllers
  - Maximize worst-case performance
  - Simple / analyzable command scheduler
  - No attention to average-case performance
  - Do not exploit locality across requests
- Soft Real-Time Controllers
  - Maximize average-case performance
  - Complex high-performance command scheduler
  - Guaranteeable performance is usually low
  - Exploit locality as much as possible
- Mixed Real-Time (MRT) Controllers: requirements
  - For FRT: guarantee enough worst-case performance to satisfy the requirements
  - For SRT: maximize the average-case performance
- How can locality be exploited by an MRT controller?
Outline
- Introduction
- Firm Real-Time Performance
- Conservative Open-Page Policy
- Reconfigurable Controller Architecture
- Conclusions
Firm Real-Time Performance
- Our approach: do not schedule individual commands at run time. Instead, use command sequences computed at design time, called patterns, and schedule those.
- Select the right memory map / configuration for the mix of applications.
[Figure: an example read pattern for a DDR3-800 in configuration (BI 2, BC 4): a sequence of ACT, RD and NOP commands spread over the interleaved banks, annotated with the pattern length.]
- Banks Interleaved (BI): the number of banks interleaved in the access
- Burst Count (BC): the number of bursts per bank. Each read command results in a burst data transfer.
- Burst Length (BL): the number of words per read command. They are transferred in BL/2 clock cycles.
- Interface Width (IW): the width of the data interface
- Access Granularity (AG): the number of bytes read/written in a pattern: AG = BI × BC × BL × IW
- (Gross) efficiency: the fraction of time that the data bus is occupied in the worst case
- The designer can choose the bank interleaving and burst count. Each configuration results in a different trade-off between bandwidth, latency and power.
Goossens, Kouters, Akesson, Goossens, Memory Map Selection for Firm Real-time SDRAM Controllers, Proc. DATE 2012
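The access-granularity relation AG = BI × BC × BL × IW and the idea of gross efficiency can be illustrated with a small Python sketch. It is not taken from the cited paper: the function names, the 16-bit interface (IW = 2 bytes), BL = 8 and the 40-cycle pattern length are assumptions chosen for illustration, and the efficiency is computed for a single pattern rather than for the worst-case mix of patterns.

```python
def access_granularity(bi: int, bc: int, bl: int, iw_bytes: int) -> int:
    """Bytes transferred by one pattern: AG = BI * BC * BL * IW."""
    return bi * bc * bl * iw_bytes


def data_cycles(bi: int, bc: int, bl: int) -> int:
    """Cycles the data bus is busy: each burst of BL words takes BL/2 cycles."""
    return bi * bc * bl // 2


def gross_efficiency(bi: int, bc: int, bl: int, pattern_length: int) -> float:
    """Fraction of the pattern during which the data bus carries data."""
    return data_cycles(bi, bc, bl) / pattern_length


# Assumed values: BL = 8 words, IW = 2 bytes, hypothetical 40-cycle pattern.
ag = access_granularity(bi=2, bc=4, bl=8, iw_bytes=2)
eff = gross_efficiency(bi=2, bc=4, bl=8, pattern_length=40)
print(f"AG = {ag} bytes, gross efficiency = {eff:.2f}")
```

Under these assumptions a larger BI or BC raises the access granularity and the number of data cycles per pattern, which is the lever behind the bandwidth/latency/power trade-off mentioned above.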
Firm Real-Time Performance
[Figure: worst-case net bandwidth b_net (GB/s) and access granularity AG versus power (W) for DDR2-800, DDR3-800 and LPDDR2-800-S4 memories, with one point per configuration.]
- Labels: BI,BC (BI is omitted)
- All memories in this graph run at 400 MHz
- Pareto-optimal points are connected
- Isolines denote energy efficiency in GB/J
- The peak bandwidth is marked
- Select the configuration based on the real-time requirements of the requestors, and their request sizes.
Open vs. Close-Page Policy
[Figure: command timelines (A = activate, Read, P = precharge) under the close-page policy and the open-page policy, for request arrivals 1 to 4. Color indicates locality (and request origin).]
- For the blue requestor, the open-page policy:
  - Increases the worst-case execution time
  - Reduces the average-case execution time
- We would like to improve average-case performance for SRT applications, without hurting the FRT guarantees
Conservative Open-Page Policy
- Key idea:
  - Do not precharge if the next request is known to target the open row
  - Precharge if the next address is not known in time, or in case of a miss
[Figure: command timelines (A = activate, Read, P = precharge) under the close-page, conservative open-page and open-page policies, for request arrivals 1 to 4.]
Goossens, Akesson, Goossens, Conservative Open-Page Policy for Mixed Time-Criticality Memory Controllers, Proc. DATE 2013
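The key idea can be sketched as a small command-level model in Python. This is an illustrative simplification, not the controller's implementation: it ignores timing and patterns, and the `Request` type, its `known_in_time` flag (standing in for "the next address arrived inside the hit window") and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    bank: int
    row: int
    # Hypothetical flag: was this request's address known early enough
    # for the controller to skip the precharge of the previous request?
    known_in_time: bool = True


def close_page(requests: List[Request]) -> List[str]:
    """Every request activates, reads and precharges."""
    commands: List[str] = []
    for _ in requests:
        commands += ["ACT", "RD", "PRE"]
    return commands


def conservative_open_page(requests: List[Request]) -> List[str]:
    """Skip PRE (and the next ACT) only when the next request is a known hit."""
    commands: List[str] = []
    row_is_open = False
    for i, req in enumerate(requests):
        if not row_is_open:
            commands.append("ACT")
        commands.append("RD")
        nxt = requests[i + 1] if i + 1 < len(requests) else None
        known_hit = (
            nxt is not None
            and nxt.known_in_time
            and (nxt.bank, nxt.row) == (req.bank, req.row)
        )
        if known_hit:
            row_is_open = True   # bypass the precharge and the activate
        else:
            commands.append("PRE")  # miss, or next address unknown: close the row
            row_is_open = False
    return commands


trace = [
    Request(0, 5),
    Request(0, 5),                       # known hit: row stays open
    Request(0, 5, known_in_time=False),  # hit, but known too late
    Request(0, 7),                       # miss
]
print(len(close_page(trace)), len(conservative_open_page(trace)))
```

In this model a request never finds a row open that it does not target, so no request pays more than the close-page cost of activate, read and precharge; only known hits become cheaper.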
- The starting point is a predictable memory pattern set, with a bypass in case of a page hit
- Use explicit precharges instead of auto-precharge flags
- The conservative open-page policy postpones the precharge as long as possible, to increase the hit window in which we can decide to bypass the precharge and activate
[Figure: two command timelines (Cmd and Bank rows) for a DDR3-1600 with an ACT-to-ACT constraint of 38 cycles, each marking the arrival of the next request. The first timeline is annotated "Hit window (4 cc)". The second, with explicit precharge (P) commands and the PRE-to-ACT constraint marked, is annotated "Hit window (28 cc)".]
- The conservative open-page policy can be used in an MRT controller:
  - Worst-case guarantees are equal to those of a close-page policy
  - Average-case performance is better, leading to lower execution times and lower average-case latencies
  - SRT applications can even benefit indirectly from the quicker service to FRT requests!
  - The execution time reduction depends on the memory load of the application
Reconfigurable Back-end
[Figure: block diagram of the SDRAM back-end. Blocks: address generator, pattern selector, refresh timer, pattern LUT, command player, pattern memory. Signals: logical address, request type, offset, row/col and bank, pattern base address and length, commands (RAS, CAS, WE, etc.), address masks and shift amounts, and configuration data on an internal configuration bus.]
- Patterns are reprogrammable at run time
- Can support all devices supported by the PHY (all DDR3 devices)
- A different pattern gives a different trade-off between worst-case bandwidth, latency and power
- Allows a different trade-off per use case
Goossens, Kuijsten, Akesson, Goossens, A Reconfigurable Real-Time SDRAM Controller for Mixed Time-Criticality Systems, Proc. CODES+ISSS 2013
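How an address generator might use configurable address masks and shift amounts can be sketched in Python as follows. The bit layout (10 column bits, 3 bank bits, 14 row bits), the `Field` and `MemoryMap` types and the example map are assumptions for illustration only; the actual memory maps are selected per configuration and are not specified here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    """One reconfigurable (mask, shift) pair, as written over a config bus."""
    mask: int
    shift: int

    def extract(self, logical_address: int) -> int:
        return (logical_address & self.mask) >> self.shift


@dataclass(frozen=True)
class MemoryMap:
    bank: Field
    row: Field
    column: Field

    def decode(self, logical_address: int) -> Tuple[int, int, int]:
        """Translate a logical address into (bank, row, column)."""
        return (
            self.bank.extract(logical_address),
            self.row.extract(logical_address),
            self.column.extract(logical_address),
        )


# Hypothetical layout: column = bits 9..0, bank = bits 12..10, row = bits 26..13.
EXAMPLE_MAP = MemoryMap(
    bank=Field(mask=0x7 << 10, shift=10),
    row=Field(mask=0x3FFF << 13, shift=13),
    column=Field(mask=0x3FF, shift=0),
)

address = (3 << 13) | (5 << 10) | 64
print(EXAMPLE_MAP.decode(address))  # (bank, row, column)
```

Changing the memory map at run time then amounts to writing a new set of masks and shift amounts, without touching the decoding logic itself.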
Reconfigurable Controller Architecture
[Figure: block diagram of the controller. Components: two memory clients, each with an atomizer, a width converter and a request/response queue; a resource front-end with a TDM arbiter; the resource bus; the SDRAM back-end; the SDRAM PHY; and a configuration bus carrying configuration data.]
- Run-time reconfiguration infrastructure (memory mapped)
- Reconfigurable TDM arbiter (predictable and composable during reconfiguration)
- Reconfigurable back-end
- Implemented in SystemC, and on an ML605 Virtex-6 development board from Xilinx
  - 2-port instance: 3754 registers, 9543 LUTs and BRAM
  - 4-port instance: 2265 registers, 46 LUTs and BRAM
  - (Most registers are used in the req./resp. queues, which contain 256 bytes / port)
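A time-division multiplexed (TDM) arbiter can be sketched in Python as a slot table that is walked cyclically. This is a minimal sketch under stated assumptions, not the implemented arbiter: the class and function names are hypothetical, unused slots are assumed to stay idle (non-work-conserving), and the safe run-time reconfiguration of the table is not modeled.

```python
from typing import List, Optional, Set


class TdmArbiter:
    """Cyclic slot table; each slot is owned by exactly one client."""

    def __init__(self, slot_table: List[int]) -> None:
        if not slot_table:
            raise ValueError("slot table must not be empty")
        self.slot_table = list(slot_table)
        self.slot = 0

    def grant(self, pending: Set[int]) -> Optional[int]:
        """Advance one slot; grant its owner if it has a pending request."""
        owner = self.slot_table[self.slot]
        self.slot = (self.slot + 1) % len(self.slot_table)
        # Assumption: an unused slot is left idle rather than given away,
        # so one client's behavior cannot affect another client's timing.
        return owner if owner in pending else None


def max_slot_gap(slot_table: List[int], client: int) -> int:
    """Largest distance, in slots, between consecutive slots owned by client."""
    positions = [i for i, owner in enumerate(slot_table) if owner == client]
    if not positions:
        raise ValueError("client owns no slot")
    size = len(slot_table)
    return max(
        (positions[(k + 1) % len(positions)] - positions[k] - 1) % size + 1
        for k in range(len(positions))
    )


table = [0, 1, 0, 2]
arbiter = TdmArbiter(table)
print([arbiter.grant({1}) for _ in range(4)])
print(max_slot_gap(table, 0), max_slot_gap(table, 1))
```

The slot gap depends only on the table, which is what makes the wait of each client boundable at design time in this simplified model.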
Conclusions
- Mixed time-criticality controllers should focus on:
  - For FRT: guaranteeing enough worst-case performance to satisfy the requirements
  - For SRT: maximizing the average-case performance
- Choosing the right memory map / pattern configuration for the mix of applications:
  - Trade-offs exist between worst-case bandwidth, latency and power
  - Select the configuration that satisfies the firm real-time requirements
- Using a conservative open-page policy, some of the locality across requests can be exploited:
  - Decreases the gap between worst-case performance and average-case performance
  - Reduces average-case latency, and thus average-case execution time, for soft real-time applications
- A reconfigurable architecture allows changing the memory map / configuration at run time:
  - Select the right trade-off per use case
  - Leads to other interesting challenges (see the CODES+ISSS 2013 paper on predictable reconfiguration)
For further information / a broader perspective:
- 5-tile compsoc platform
- Sven Goossens <s.l.m.goossens@tue.nl>
- Benny Akesson <kessoben@fel.cvut.cz>
- Kees Goossens <k.g.w.goossens@tue.nl>
- Referred papers: www.svengoossens.nl
Electronic Systems Group, Electrical Engineering Faculty, Eindhoven University of Technology