FF-Bond: Multi-bit Flip-flop Bonding at Placement CHANG-CHENG TSAI YIYU SHI GUOJIE LUO IRIS HUI-RU JIANG. IRIS Lab NCTU MST PKU

Transcription

1 FF-Bond: Multi-bit Flip-flop Bonding at Placement CHANG-CHENG TSAI YIYU SHI GUOJIE LUO IRIS HUI-RU JIANG IRIS Lab NCTU MST PKU ISPD-13

2 Outline 2 Introduction Preliminaries Problem formulation Algorithm - FF-Bond Experimental results Conclusion

3 Multi-Bit Flip-Flops (MBFFs) 3 Clock power is critical for modern IC designs Dynamic power CV 2 f MBFFs present a smaller load on the clock network Replace FFs with MBFFs Effectively reduces both clock network power and MBFF power Prefer large FFs (high bit number), avoid orphans (single-bit FF) Avoid impacting timing critical paths D Master Slave 1 Q 1 latch latch clk D 2 Master latch Slave latch Q 2 Bit number Normalized power per bit Power efficient Normalized area per bit

4 Prior Work 4 Relocating flip-flops benefits clock network synthesis [Cheon+, DAC05], [Papa+, Micro11], [Lee/Markov, TCAD12] Replacing flip-flops with MBFFs saves clock power [Yan/Chen, ICGCS10], [Chang+, ICCAD10], [Wang+, ISPD11] [Jiang+, ISPD11], [Liu+, DATE12] Focus on post-placement MBFF clustering Pre-placement Lack physical information Post-placement Cells are immovable Limited clustering flexibility and quality MBFF bonding at-placement

5 One Possible Solution 5 Directly integrate placement & post-placement MBFF clustering Netlist Timing-driven placement MBFF clustering End The movement of flip-flops Is constrained by the placement at the current iteration May oscillate among iterations

6 Ionic Bonding and Flip-flop Bonding 6 Goal: Guide flip-flops towards merging friendly locations e Na F Ionic bonding Na F Flip-flop Na + F NaF Flip-flop bonding Mergeable flip-flop sets Example: MBFF library: 1-bit, 2-bit, 4-bit

7 Post-Placement vs. At-Placement - s Post-placement clustering At-placement bonding # MBFFs 4-/2-/1-bit 35/252/237 SBFF MBFF # MBFFs 4-/2-/1-bit 159/105/35

9 Post-Placement MBFF Clustering 9 Given A placed design MBFF library Timing slacks of flip-flops Replace FFs with MBFFs Minimize flip-flop power Satisfy timing constraints SBFF MBFF MBFF Clustering

10 Intersection Graph 10 Define the feasible region of a flip-flop according to its slack Model the overlap of feasible regions by an intersection graph A proper-sized clique corresponds to an MBFF Feasible region f o (i) Fanin slack f i (i) i Fanout slack y Intersection graph x

11 INTEGRA (INTErval GRAph) 11 Perform coordinate transformation Sort starting (s) and ending (e) points of projection in the x and y axes y x FF1 FF2 FF3 FF4 FF5 FF6 FF7 FF8 TYPE s s s s e e s s s e s e e e e e FF# Jiang et al. INTEGRA: Fast multibit flip-flop clustering for clock power saving, TCAD12, ISPD11.

12 INTEGRA 12 Find decision points se in x axis Retrieve maximal cliques at decision points y Check x and y axes {1, 2, 4} or {1, 3, 4} Form MBFFs of proper sizes e.g., {1, 2} FF1 FF2 FF3 FF4 FF5 FF6 FF7 FF8 x FF1 FF3 FF4 FF2 FF6 FF5 FF7 FF8 e 1 e 3 e 4 s 3 e 2 s 4 s 1 s 2 T Y P E F F # TYPE s s s s e e s s s e s e e e e e FF# Decision points

13 Example: INTEGRA 13 Example: MBFF library: 1-bit, 2-bit, 4-bit FF1 FF3 FF4 FF6 FF5 FF8 2 7 y FF2 FF7 x Guide 2 dual-bit four-bit FFs flip-flops towards merging 1 four-bit friendly flip-flop locations

15 The MBFF Bonding at Placement Problem 15 Given Gate-level netlist MBFF library Timing constraints Find a placement and replace FFs with MBFFs Minimize flip-flop power Satisfy timing constraints

17 The Overview of FF-Bond 17 Guide flip-flops towards merging friendly locations at the global placement stage without sacrificing timing Netlist Global placement FF-Bond FF-Bond Objective function construction with timing-driven net weighting Signoff timer Legalization Gradient-based optimization solver Detailed placement Clock tree synthesis N Sparse enough? < d 2 Y Flip-flop bonding N Evenly distributed? < d 1 Y Routing Pseudo-net generation MBFF clustering End : overlap index = overlapped_area total_cell_area

18 Example: s Netlist Global placement Objective function construction with timing-driven net weighting FF-Bond Signoff timer Gradient-based optimization solver N Sparse enough? < d 2 Y N Evenly distributed? < d 1 Y Flip-flop bonding Pseudo-net generation MBFF clustering : overlap index = overlapped_area total_cell_area

19 Example: s Spread cells until sparse enough Netlist Global placement Objective function construction with timing-driven net weighting FF-Bond Signoff timer Gradient-based optimization solver N Sparse enough? < d 2 Y N Evenly distributed? < d 1 Y Flip-flop bonding Pseudo-net generation MBFF clustering : overlap index = overlapped_area total_cell_area

20 Example: s Apply flip-flop bonding Netlist Global placement Objective function construction with timing-driven net weighting FF-Bond Signoff timer Gradient-based optimization solver N Sparse enough? < d 2 Y N Evenly distributed? < d 1 Y Flip-flop bonding Pseudo-net generation MBFF clustering : overlap index = overlapped_area total_cell_area

21 Timing-Driven Placement 21 Pure wirelength-driven placement + slack-based net-weighting Pure wirelength-driven analytical placement: mpl min W x, y = e E max x i x j + max y i y j v i,v j e,i<j v i,v j e,i<j s. t. D ij = K, 1 i m, 1 j n Smooth the objective function and the constraints Log-sum-exp approximation Inverse Laplace transformation Slack-based net weighting W x, y = η e E log v k e exp x k η + log v k e exp x k η min W x, y s. t. ψ ij = K, 1 i m, 1 j n net weight = 1 slack α, α > 1 T clk + log slack = 0 for the 1 st iteration (pure wirelength-driven placement) v k e exp y k η + log v k e exp y k η T. Chen et al. Multilevel generalized force-directed method for circuit placement, ISPD05.

22 Flip-Flop Bonding 22 Bond flip-flops into perfect-sized cliques Perfect: most power efficient Example: Power efficient Bit number Normalized power per bit Normalized area per bit Oversized Perfect Undersized

23 Flip-Flop Bonding 23 Bond flip-flops into perfect-sized cliques Priority of processing maximal cliques: Perfect > undersize > oversize Perfect size: preserved Undersize/oversize: try to form a target-sized clique by selecting the nearest flip-flops in a specified search region The target size: the flip-flop configuration that is larger than, nearest to, and more power efficient than the investigated clique size Adjacency inside the search region Physical & timing: x c x i + y c y i ε i s fi i + s fo i

24 Example: Flip-Flop Bonding (1/3) 24 Extract maximal cliques Flip-flop Bit number Normalized power per bit Power efficient Normalized area per bit

25 Example: Flip-Flop Bonding (2/3) 25 Bonding strategy Choose an undersized clique with priority 3>2>1 Select nearest flip-flops to form target-sized cliques Choose an oversized clique Select nearest flip-flops to form target-sized cliques Even 4X Odd even

26 Example: Flip-Flop Bonding (3/3) 26 Anchor Direct flip-flops to each other Set a flip-flop bonding anchor

27 Overlapping Maximal Cliques Bonding strategy Priority: undersized > oversized Undersized 3 > 2 > Oversized Even 4X Odd even Independency (#shared flip-flops)/cliquesize

28 Pseudo Net Generation 28 Set an anchor Pseudo-net f o (i) i Overlapped feasible region between i and j i Pseudo net f o (j) f i (i) j Anchor j Anchor High pseudo-net weight f i (j)

30 Experimental Setting 30 Programming language: C++ Platform: Intel Xeon 2.4GHz CPU, 16GB memory, Linux OS Technology: UMC 55nm Faraday MBFF library Bit number Signoff timer: PrimeTime Normalized power per bit Normalized area per bit

31 Comparison with Two Representative Flows 31 PMC IMC Post-placement MBFF clustering INTEGRA Interleaving placement & post-placement MBFF clustering Timing-driven mpl + INTEGRA Netlist Netlist Timing-driven placement Timing-driven placement MBFF clustering MBFF clustering End End

32 32 Flip-flop Power Comparison Power ratio FF-Bond < IMC < PMC Lower bound of power ratio: 0.78 (All are 4-bit MBFFs) Main constituents of formed FFs FF-Bond: 4-bit FFs, PMC: 1-bit FFs, IMC: 2-bit FFs Wirelength PMC < IMC < FF-Bond Timing is satisfied, the induced power increment < 1% PMC IMC FF-Bond Circuit #Flipflops 4-/2-/1-bit ratio 4-/2-/1-bit ratio 4-/2-/1-bit ratio #MBFFs FF power #MBFFs FF power #MBFFs FF power HPWL HPWL Name HPWL s /57/ E+06 23/51/ E+06 35/31/ E+06 s /29/ E+06 18/23/ E+06 23/15/ E+06 s /252/ E /179/ E /105/ E+07 s /291/ E+07 96/282/ E /116/ E+07 b /264/ E /201/ E /124/ E+08 b /886/ E /742/ E /425/ E+08 Avg. ratio /0.91/ /2.05/ /3.33/ d 1 = 0.1, d 2 = 0.5, = 1.2, the search region is bounded by 20% of chip dimension the net weight of a pseudo net is 5X the default value for a positive slack net.

33 33 Experimental Results s38417 s38417 PMC IMC FF-Bond Global placement result (before MBFF merging) MBFF merging result (before legalization) #MBFFs (4-/2-/1-bit) 35/252/ /179/ /105/35

34 Experimental Results s PMC: Post-placement MBFF clustering Global placement SBFF

35 Experimental Results s PMC: Post-placement MBFF clustering MBFF merging SBFF MBFF 4-/2-/1-bit MBFFs 35/252/237

36 Experimental Results s IMC: Interleaving placement and MBFF clustering Global placement SBFF

37 Experimental Results s IMC: Interleaving placement and MBFF clustering MBFF merging SBFF MBFF 4-/2-/1-bit MBFFs 105/179/103

38 Experimental Results s FF-Bond Global placement SBFF

39 Experimental Results s FF-Bond MBFF merging SBFF MBFF 4-/2-/1-bit MBFFs 159/105/35

40 Clock Power Comparison 40 Clock power (flip-flops + clock network) FF-Bond < IMC < PMC < Without MBFF FF-Bond totally saves 27% clock power on average Compared with PMC, FF-Bond further reduces 14% clock power Circuit Name Without MBFF clustering PMC IMC FF-Bond Total Total Total Total Cap. Sinks Buffer Wire Cap. Sinks Buffer Wire Cap. Sinks Buffer Wire Cap. (pf) (pf) (pf) (pf) Sinks Buffer Wire s % 36.4% 15.1% % 39.7% 13.5% % 38.8% 11.7% % 40.2% 10.0% s % 47.1% 9.6% % 50.6% 8.5% % 52.6% 7.8% % 53.1% 7.4% s % 31.6% 15.2% % 26.8% 15.3% % 27.4% 14.5% % 28.6% 12.9% s % 28.9% 17.6% % 29.9% 15.8% % 24.1% 15.3% % 32.8% 13.3% b % 28.1% 20.0% % 28.7% 19.5% % 30.5% 17.6% % 26.3% 15.5% b % 26.8% 21.0% % 28.1% 19.5% % 23.9% 18.7% % 24.5% 16.8% Avg. Ratio

41 Slack Comparison: PMC vs. FF-Bond 41 The slacks of the two flows are quite similar PMC FF-Bond Circuit Name Clock period Worst slack Average slack Worst slack Average slack (ns) (ns) (ns) (ns) (ns) s s s s b b Worst slack represents the worst timing slack over all endpoints Average slack means the average

42 FF-Bond: d 2 Analysis 42 d 2 : the sparsity criterion to apply flip-flop bonding Smaller d 2 : flip-flop bonding starts at later iterations but does not guide FFs well because of the low flexibility of moving FFs Power, WL Larger d 2 : flip-flop bonding starts at earlier iterations but incurs longer wirelength because distant flip-flops are attracted Power (slightly), WL d 2 =0.3 d 2 =0.5 d 2 =0.7 Circuit #Flipflops 4-/2-/1-bit ratio 4-/2-/1-bit ratio 4-/2-/1-bit ratio #MBFFs FF power #MBFFs FF power #MBFFs FF power HPWL HPWL Name HPWL s /41/ E+06 35/31/ E+06 38/25/ E+06 s /20/ E+06 23/15/ E+06 22/16/ E+06 s /85/ E /105/ E /89/ E+07 s /135/ E /116/ E /134/ E+07 b /135/ E /124/ E /116/ E+08 b /427/ E /425/ E /431/ E+08 Avg. ratio /3.44/ /3.33/ /3.04/

43 Wirelength FF power ratio 43 Search Region Analysis E E E E E E E E E+07 Search region, power ratio, wirelength Search Region vs. FF Power Ratio Unit: chip_width+chip_height Search Region vs. Wirelength Unit: chip_width+chip_height s38417 Search region FF power (Unit: chip_width+chip_height) ratio HPWL Worst slack (ns) E E E E E E E E

45 Conclusions 45 MBFFs reduce clock load effectively Prior work: post-placement MBFF clustering Proposed MBFF bonding at-placement FF-Bond Directed flip-flops towards merging friendly locations FF-Bond can save 27% clock power on average Compared with post-placement MBFF clustering, FF-Bond can further reduce 14% clock power Future work: MBFF bonding with routability consideration

46 46 Thank You! Contact info: Iris Hui-Ru Jiang