FF-Bond: Multi-bit Flip-flop Bonding at Placement CHANG-CHENG TSAI YIYU SHI GUOJIE LUO IRIS HUI-RU JIANG. IRIS Lab NCTU MST PKU


FF-Bond: Multi-bit Flip-flop Bonding at Placement
Chang-Cheng Tsai, Yiyu Shi, Guojie Luo, Iris Hui-Ru Jiang
IRIS Lab NCTU, MST, PKU
ISPD-13

Outline
- Introduction
- Preliminaries
- Problem formulation
- Algorithm: FF-Bond
- Experimental results
- Conclusion

Multi-Bit Flip-Flops (MBFFs)
- Clock power is critical for modern IC designs
  - Dynamic power: CV^2 f
- MBFFs present a smaller load on the clock network
- Replace FFs with MBFFs
  - Effectively reduces both clock network power and MBFF power
  - Prefer large FFs (high bit number), avoid orphans (single-bit FFs)
  - Avoid impacting timing-critical paths

[Figure: a 2-bit MBFF; two master latch / slave latch pairs (D1 to Q1, D2 to Q2) share one clk]

Bit number   Normalized power per bit   Normalized area per bit
1            1.00                       1.00
2            0.86                       0.96
4            0.78                       0.71
(Larger bit numbers are more power efficient.)
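The per-bit figures in the library table can be turned into a flip-flop power ratio for a whole design. The following is a minimal Python sketch, assuming the ratio is the bit-weighted average of the normalized power per bit; the function name `ff_power_ratio` and the dictionary layout are illustrative and do not come from the slides.

```python
# Normalized power per bit for each MBFF size, copied from the library table.
POWER_PER_BIT = {1: 1.00, 2: 0.86, 4: 0.78}

def ff_power_ratio(counts):
    """counts maps MBFF bit number -> number of such cells.

    Returns total flip-flop power normalized to an all-single-bit design.
    Assumption: power scales linearly with bit count at the per-bit rate.
    """
    total_bits = sum(bits * n for bits, n in counts.items())
    power = sum(bits * n * POWER_PER_BIT[bits] for bits, n in counts.items())
    return power / total_bits

# s38417 MBFF counts quoted in the slides (4-/2-/1-bit).
post_placement = ff_power_ratio({4: 35, 2: 252, 1: 237})
at_placement = ff_power_ratio({4: 159, 2: 105, 1: 35})
print(f"{post_placement:.3f} {at_placement:.3f}")
```

Under this assumption the two s38417 configurations evaluate to about 0.885 and 0.808, which agrees with the FF power ratios reported for that circuit in the experimental results.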

Prior Work
- Relocating flip-flops benefits clock network synthesis
  [Cheon+, DAC05], [Papa+, Micro11], [Lee/Markov, TCAD12]
- Replacing flip-flops with MBFFs saves clock power
  [Yan/Chen, ICGCS10], [Chang+, ICCAD10], [Wang+, ISPD11], [Jiang+, ISPD11], [Liu+, DATE12]
  - Focus on post-placement MBFF clustering
- Pre-placement: lacks physical information
- Post-placement: cells are immovable; limited clustering flexibility and quality
- This work: MBFF bonding at placement

One Possible Solution
- Directly integrate placement and post-placement MBFF clustering
  [Flow: Netlist, Timing-driven placement, MBFF clustering, End]
- The movement of flip-flops
  - Is constrained by the placement at the current iteration
  - May oscillate among iterations

Ionic Bonding and Flip-flop Bonding
- Goal: guide flip-flops towards merging-friendly locations
- Ionic bonding: Na and F become Na+ and F-, which form NaF (http://en.wikipedia.org/wiki/ionic_bond)
- Flip-flop bonding: flip-flops form mergeable flip-flop sets
- Example: MBFF library: 1-bit, 2-bit, 4-bit

Post-Placement vs. At-Placement: s38417

Flow                        #MBFFs (4-/2-/1-bit)
Post-placement clustering   35/252/237
At-placement bonding        159/105/35

[Figure: placement plots of both flows; legend: SBFF, MBFF]


Post-Placement MBFF Clustering
- Given
  - A placed design
  - MBFF library
  - Timing slacks of flip-flops
- Replace FFs with MBFFs
  - Minimize flip-flop power
  - Satisfy timing constraints

[Figure: SBFFs before and MBFFs after MBFF clustering]

Intersection Graph
- Define the feasible region of a flip-flop according to its slack
- Model the overlap of feasible regions by an intersection graph
- A proper-sized clique corresponds to an MBFF

[Figure: feasible region of flip-flop i, determined by the fanin slack to f_i(i) and the fanout slack to f_o(i); intersection graph of flip-flops 1 to 8 in the x-y plane]

INTEGRA (INTErval GRAph)
- Perform coordinate transformation
- Sort starting (s) and ending (e) points of the projections in the x and y axes

Sorted sequence for FF1 to FF8:
TYPE  s s s s e e s s s e s e e e e e
FF#   1 2 3 4 1 2 5 6 7 3 8 4 5 7 6 8

Jiang et al., INTEGRA: Fast multibit flip-flop clustering for clock power saving, TCAD12, ISPD11.
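The coordinate transformation can be illustrated with the standard 45-degree rotation, which turns a Manhattan-distance diamond into an axis-aligned rectangle so that overlaps reduce to interval tests on each axis. The sketch below is a minimal Python illustration under two assumptions that the slides do not spell out: the rotation is u = x + y, v = y - x, and each slack has already been converted to a Manhattan distance budget around the fanin or fanout pin. All function names are hypothetical.

```python
def diamond_to_rect(cx, cy, r):
    """Map the diamond |x - cx| + |y - cy| <= r to the rectangle
    (u_lo, u_hi, v_lo, v_hi) in rotated coordinates u = x + y, v = y - x."""
    u, v = cx + cy, cy - cx
    return (u - r, u + r, v - r, v + r)

def intersect(a, b):
    """Intersection of two rectangles in (u, v) space, or None if empty."""
    u_lo, u_hi = max(a[0], b[0]), min(a[1], b[1])
    v_lo, v_hi = max(a[2], b[2]), min(a[3], b[3])
    if u_lo > u_hi or v_lo > v_hi:
        return None
    return (u_lo, u_hi, v_lo, v_hi)

def feasible_region(fanin_xy, fanin_budget, fanout_xy, fanout_budget):
    """Feasible region of one flip-flop as a rectangle in (u, v) space.

    Assumption: the budgets are slacks already converted to distances.
    """
    return intersect(diamond_to_rect(*fanin_xy, fanin_budget),
                     diamond_to_rect(*fanout_xy, fanout_budget))

def to_xy(u, v):
    """Inverse rotation back to chip coordinates."""
    return ((u - v) / 2, (u + v) / 2)

region = feasible_region((0, 0), 2, (3, 0), 2)
print(region)            # rectangle in rotated coordinates
print(to_xy(1.5, -1.5))  # a point inside it, back in chip coordinates
```

Once every feasible region is a rectangle, two flip-flops are adjacent in the intersection graph exactly when their u-intervals and v-intervals both overlap, which is what makes sorting interval endpoints sufficient.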

INTEGRA
- Find decision points (an s immediately followed by an e) in the x axis
- Retrieve maximal cliques at decision points
  - Check x and y axes: {1, 2, 4} or {1, 3, 4}
- Form MBFFs of proper sizes, e.g., {1, 2}

x axis, sorted sequence:
TYPE  s s s s e e s s s e s e e e e e
FF#   1 2 3 4 1 2 5 6 7 3 8 4 5 7 6 8

y axis, FF1 to FF4 (as listed in the figure): e1 e3 e4 s3 e2 s4 s1 s2

[Figure: feasible regions of FF1 to FF8 with decision points marked]
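The decision-point idea can be sketched as a sweep over sorted interval endpoints: whenever a start is immediately followed by an end, the currently active set is a maximal clique. The Python sketch below is an illustration rather than the INTEGRA implementation; the interval coordinates are invented so that their sorted order matches the TYPE/FF# sequence on the slide, and the y-axis order for FF1 to FF4 is read as s2, s1, s4, e2, s3, e4, e3, e1 in increasing y, which is an assumption about the figure.

```python
def maximal_cliques_1d(intervals):
    """intervals: dict ff_id -> (start, end), closed intervals on one axis.

    Returns the active set at every decision point (a start immediately
    followed by an end), i.e. the maximal cliques of the interval graph.
    """
    events = []
    for ff, (start, end) in intervals.items():
        events.append((start, 0, ff))  # kind 0 sorts starts before ends on ties
        events.append((end, 1, ff))
    events.sort()
    active, cliques, prev_was_start = set(), [], False
    for _, kind, ff in events:
        if kind == 0:
            active.add(ff)
            prev_was_start = True
        else:
            if prev_was_start:
                cliques.append(set(active))
            active.remove(ff)
            prev_was_start = False
    return cliques

def cliques_2d(x_intervals, y_intervals):
    """Refine every x-axis clique by a second sweep along the y axis."""
    result = []
    for x_clique in maximal_cliques_1d(x_intervals):
        members = {ff: y_intervals[ff] for ff in x_clique}
        for clique in maximal_cliques_1d(members):
            if clique not in result:
                result.append(clique)
    return result

# Hypothetical coordinates whose sorted order reproduces the slide's sequence.
X = {1: (0, 4), 2: (1, 5), 3: (2, 9), 4: (3, 11),
     5: (6, 12), 6: (7, 14), 7: (8, 13), 8: (10, 15)}
Y_FIRST_FOUR = {2: (0, 3), 1: (1, 7), 4: (2, 5), 3: (4, 6)}

print(maximal_cliques_1d(X))
print(cliques_2d({ff: X[ff] for ff in Y_FIRST_FOUR}, Y_FIRST_FOUR))
```

With these inputs the x-axis sweep yields the cliques {1, 2, 3, 4}, {3, 4, 5, 6, 7} and {4, 5, 6, 7, 8}, and the y-axis check on the first one splits it into {1, 2, 4} and {1, 3, 4}, matching the candidates named on the slide.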

Example: INTEGRA
- Example: MBFF library: 1-bit, 2-bit, 4-bit
- INTEGRA result: 2 dual-bit FFs and 1 four-bit flip-flop
- Guiding flip-flops towards merging-friendly locations: 2 four-bit FFs

[Figure: feasible regions of FF1 to FF8 in the x-y plane and their intersection graph]


The MBFF Bonding at Placement Problem
- Given
  - Gate-level netlist
  - MBFF library
  - Timing constraints
- Find a placement and replace FFs with MBFFs
  - Minimize flip-flop power
  - Satisfy timing constraints


The Overview of FF-Bond
- Guide flip-flops towards merging-friendly locations at the global placement stage without sacrificing timing

[Flowchart]
Main flow: Netlist, Global placement (FF-Bond), Legalization, Detailed placement, Clock tree synthesis, Routing, End
FF-Bond steps inside global placement:
- Objective function construction with timing-driven net weighting (slacks from the signoff timer)
- Gradient-based optimization solver
- Decision: Sparse enough? (overlap index < d2), Y/N
- Flip-flop bonding
- Pseudo-net generation
- Decision: Evenly distributed? (overlap index < d1), Y/N
- MBFF clustering

overlap index = overlapped_area / total_cell_area
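The two threshold tests in the flowchart can be read as a three-way phase decision driven by the overlap index. The sketch below is one possible reading in Python, not the authors' control logic: it assumes that the overlap index shrinks as cells spread, that bonding is active once the index drops below d2, and that global placement ends once it drops below d1. The defaults d1 = 0.1 and d2 = 0.5 are the values listed with the experimental results; the function names and phase labels are hypothetical.

```python
def overlap_index(overlapped_area, total_cell_area):
    """overlap index = overlapped_area / total_cell_area (from the slide)."""
    return overlapped_area / total_cell_area

def placement_phase(index, d1=0.1, d2=0.5):
    """Classify one global-placement iteration by its overlap index.

    Assumed reading of the flowchart:
      index >= d2       -> cells still too dense, keep spreading
      d1 <= index < d2  -> sparse enough, apply flip-flop bonding
      index < d1        -> evenly distributed, leave global placement
    """
    if index < d1:
        return "done"
    if index < d2:
        return "bond"
    return "spread"

# Hypothetical overlap indices from successive iterations.
for idx in (0.8, 0.3, 0.05):
    print(idx, placement_phase(idx))
```

This reading is consistent with the later d2 analysis, where a smaller d2 makes bonding start at later iterations.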

Example: s38584
[Figure: placement snapshot shown next to the FF-Bond flowchart]

Example: s38584
- Spread cells until sparse enough
[Figure: placement snapshot shown next to the FF-Bond flowchart]

Example: s38584
- Apply flip-flop bonding
[Figure: placement snapshot shown next to the FF-Bond flowchart]

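Timing-Driven Placement
- Pure wirelength-driven placement + slack-based net weighting
- Pure wirelength-driven analytical placement (mPL):

\[
\min\; W(x, y) = \sum_{e \in E} \Bigl( \max_{v_i, v_j \in e,\ i<j} |x_i - x_j| + \max_{v_i, v_j \in e,\ i<j} |y_i - y_j| \Bigr)
\quad \text{s.t.}\quad D_{ij} = K,\ 1 \le i \le m,\ 1 \le j \le n
\]

- Smooth the objective function and the constraints
  - Log-sum-exp approximation
  - Inverse Laplace transformation

\[
W(x, y) = \eta \sum_{e \in E} \Bigl( \log \sum_{v_k \in e} \exp\frac{x_k}{\eta} + \log \sum_{v_k \in e} \exp\frac{-x_k}{\eta} + \log \sum_{v_k \in e} \exp\frac{y_k}{\eta} + \log \sum_{v_k \in e} \exp\frac{-y_k}{\eta} \Bigr)
\]
\[
\min\; W(x, y) \quad \text{s.t.}\quad \psi_{ij} = K,\ 1 \le i \le m,\ 1 \le j \le n
\]

- Slack-based net weighting:

\[
\text{net weight} = \Bigl( 1 - \frac{\text{slack}}{T_{clk}} \Bigr)^{\alpha}, \quad \alpha > 1
\]

  - slack = 0 for the 1st iteration (pure wirelength-driven placement)

T. Chan et al., Multilevel generalized force-directed method for circuit placement, ISPD05.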

Flip-Flop Bonding
- Bond flip-flops into perfect-sized cliques
  - Perfect: most power efficient
- Example MBFF library (bit number: normalized power per bit / normalized area per bit):
  - 1: 1.00 / 1.00
  - 2: 0.86 / 0.96
  - 4: 0.78 / 0.71

[Figure: oversized, perfect, and undersized cliques]

Flip-Flop Bonding
- Bond flip-flops into perfect-sized cliques
- Priority of processing maximal cliques: perfect > undersize > oversize
  - Perfect size: preserved
  - Undersize/oversize: try to form a target-sized clique by selecting the nearest flip-flops in a specified search region
- The target size: the flip-flop configuration that is larger than, nearest to, and more power efficient than the investigated clique size
- Adjacency inside the search region
  - Physical & timing: |x_c - x_i| + |y_c - y_i|, ε_i, s_fi(i) + s_fo(i)
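The target-size rule can be made concrete with the example library. The Python sketch below assumes that the power efficiency of a clique of n flip-flops is the lowest achievable power per bit when n bits are packed into library cells, and it searches upward for the nearest larger size that is strictly more efficient. This is one interpretation of the rule, and the function names are hypothetical.

```python
# Normalized power per bit from the example MBFF library on the slides.
POWER_PER_BIT = {1: 1.00, 2: 0.86, 4: 0.78}

def best_power_per_bit(n, library=POWER_PER_BIT):
    """Lowest power per bit when n flip-flops are packed into library cells."""
    cost = [0.0] + [float("inf")] * n
    for bits in range(1, n + 1):
        for size, ppb in library.items():
            if size <= bits:
                cost[bits] = min(cost[bits], cost[bits - size] + size * ppb)
    return cost[n] / n

def target_size(clique_size, library=POWER_PER_BIT):
    """Nearest larger size that is strictly more power efficient, else None."""
    current = best_power_per_bit(clique_size, library)
    upper = clique_size + max(library)
    for candidate in range(clique_size + 1, upper + 1):
        if best_power_per_bit(candidate, library) < current - 1e-9:
            return candidate
    return None  # already a perfect size

for size in range(1, 8):
    print(size, target_size(size))
```

For this library the sketch maps sizes 1, 2, 3 to targets 2, 4, 4, maps odd oversized cliques to the next even size, and maps even oversized cliques to the next multiple of 4, while a clique of exactly 4 has no better target and is preserved.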

Example: Flip-Flop Bonding (1/3)
- Extract maximal cliques
- MBFF library (bit number: normalized power per bit / normalized area per bit):
  - 1: 1.00 / 1.00
  - 2: 0.86 / 0.96
  - 4: 0.78 / 0.71

[Figure: flip-flops and their maximal cliques]

Example: Flip-Flop Bonding (2/3)
- Bonding strategy
  - Choose an undersized clique with priority 3 > 2 > 1
    - Select nearest flip-flops to form target-sized cliques: 3 to 4, 2 to 4, 1 to 2
  - Choose an oversized clique
    - Select nearest flip-flops to form target-sized cliques: even to 4X, odd to even

Example: Flip-Flop Bonding (3/3)
- Direct flip-flops to each other
- Set a flip-flop bonding anchor

[Figure: flip-flops directed towards an anchor]

Overlapping Maximal Cliques
- Bonding strategy
  - Priority: undersized > oversized
  - Undersized: 3 > 2 > 1 (3 to 4, 2 to 4, 1 to 2)
  - Oversized: even to 4X, odd to even
  - Independency: (#shared flip-flops) / clique size

[Figure: three overlapping maximal cliques, labeled 1 to 3]

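Pseudo Net Generation
- Set an anchor
- Add pseudo nets with a high pseudo-net weight

[Figure: flip-flops i and j with fanins f_i(i), f_i(j) and fanouts f_o(i), f_o(j); the anchor lies in the overlapped feasible region between i and j, and a pseudo net connects each flip-flop to the anchor]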


Experimental Setting
- Programming language: C++
- Platform: Intel Xeon 2.4GHz CPU, 16GB memory, Linux OS
- Technology: UMC 55nm
- Signoff timer: PrimeTime
- Faraday MBFF library:

Bit number   Normalized power per bit   Normalized area per bit
1            1.00                       1.00
2            0.86                       0.96
4            0.78                       0.71

Comparison with Two Representative Flows
- PMC: post-placement MBFF clustering
  - INTEGRA
  - [Flow: Netlist, Timing-driven placement, MBFF clustering, End]
- IMC: interleaving placement & post-placement MBFF clustering
  - Timing-driven mPL + INTEGRA
  - [Flow: Netlist, Timing-driven placement, MBFF clustering, End]

Flip-flop Power Comparison
- Power ratio: FF-Bond < IMC < PMC
  - Lower bound of power ratio: 0.78 (all are 4-bit MBFFs)
  - Main constituents of formed FFs: FF-Bond: 4-bit FFs, PMC: 1-bit FFs, IMC: 2-bit FFs
- Wirelength: PMC < IMC < FF-Bond
  - Timing is satisfied; the induced power increment is < 1%

PMC
Circuit      #Flip-flops   #MBFFs (4-/2-/1-bit)   FF power ratio   HPWL
s13207       212           8/57/66                0.892            4.569E+06
s15850       128           10/29/30               0.868            2.117E+06
s38417       881           35/252/237             0.885            4.599E+07
s38584       1069          46/291/303             0.886            5.992E+07
b17          1068          53/264/328             0.887            1.346E+08
b19          4384          378/886/1100           0.868            7.187E+08
Avg. ratio   -             0.21/0.91/1.00         0.881            0.85

IMC
Circuit      #MBFFs (4-/2-/1-bit)   FF power ratio   HPWL
s13207       23/51/18               0.837            4.975E+06
s15850       18/23/10               0.826            2.869E+06
s38417       105/179/103            0.838            4.789E+07
s38584       96/282/121             0.847            6.213E+07
b17          137/201/118            0.834            1.363E+08
b19          593/742/528            0.834            7.267E+08
Avg. ratio   1.20/2.05/1.00         0.836            0.92

FF-Bond
Circuit      #MBFFs (4-/2-/1-bit)   FF power ratio   HPWL
s13207       35/31/10               0.814            5.344E+06
s15850       23/15/6                0.809            2.903E+06
s38417       159/105/35             0.808            5.406E+07
s38584       203/116/25             0.803            6.926E+07
b17          196/124/36             0.806            1.470E+08
b19          851/425/130            0.802            8.023E+08
Avg. ratio   5.33/3.33/1.00         0.807            1.00

Settings: d1 = 0.1, d2 = 0.5, = 1.2; the search region is bounded by 20% of the chip dimension; the net weight of a pseudo net is 5X the default value for a positive-slack net.

Experimental Results: s38417

                       PMC          IMC           FF-Bond
#MBFFs (4-/2-/1-bit)   35/252/237   105/179/103   159/105/35

[Figure: for each flow, the global placement result (before MBFF merging) and the MBFF merging result (before legalization)]

Experimental Results: s38417, PMC (post-placement MBFF clustering)
[Figure: global placement; legend: SBFF]

Experimental Results: s38417, PMC (post-placement MBFF clustering)
[Figure: MBFF merging; legend: SBFF, MBFF]
4-/2-/1-bit MBFFs: 35/252/237

Experimental Results: s38417, IMC (interleaving placement and MBFF clustering)
[Figure: global placement; legend: SBFF]

Experimental Results: s38417, IMC (interleaving placement and MBFF clustering)
[Figure: MBFF merging; legend: SBFF, MBFF]
4-/2-/1-bit MBFFs: 105/179/103

Experimental Results: s38417, FF-Bond
[Figure: global placement; legend: SBFF]

Experimental Results: s38417, FF-Bond
[Figure: MBFF merging; legend: SBFF, MBFF]
4-/2-/1-bit MBFFs: 159/105/35

Clock Power Comparison
- Clock power (flip-flops + clock network): FF-Bond < IMC < PMC < without MBFF
- In total, FF-Bond saves 27% clock power on average
- Compared with PMC, FF-Bond reduces clock power by a further 14%

Each entry: total capacitance in pF (sinks % / buffer % / wire %)

Circuit      Without MBFF clustering   PMC                      IMC                      FF-Bond
s13207       1.333 (48.5/36.4/15.1)    1.223 (46.8/39.7/13.5)   1.094 (49.6/38.8/11.7)   1.056 (49.8/40.2/10.0)
s15850       0.901 (43.3/47.1/9.6)     0.837 (40.9/50.6/8.5)    0.806 (39.6/52.6/7.8)    0.799 (39.5/53.1/7.4)
s38417       5.051 (53.2/31.6/15.2)    4.113 (57.9/26.8/15.3)   3.884 (58.2/27.4/14.5)   3.711 (58.5/28.6/12.9)
s38584       6.100 (53.5/28.9/17.6)    5.352 (54.3/29.9/15.8)   4.576 (60.6/24.1/15.3)   4.870 (53.9/32.8/13.3)
b17          6.273 (51.9/28.1/20.0)    5.574 (51.8/28.7/19.5)   5.241 (51.9/30.5/17.6)   4.513 (58.2/26.3/15.5)
b19          25.611 (52.2/26.8/21.0)   22.081 (52.4/28.1/19.5)  19.410 (57.4/23.9/18.7)  18.277 (58.7/24.5/16.8)
Avg. ratio   1.00                      0.87                     0.77                     0.73

Slack Comparison: PMC vs. FF-Bond
- The slacks of the two flows are quite similar

             Clock         PMC                             FF-Bond
Circuit      period (ns)   Worst slack (ns)  Avg slack (ns)  Worst slack (ns)  Avg slack (ns)
s13207       1.5           0.042             0.580           0.041             0.579
s15850       1.8           0.158             0.336           0.154             0.334
s38417       2.0           0.164             1.049           0.163             1.047
s38584       2.0           0.217             0.871           0.209             0.869
b17          3.0           0.122             1.272           0.128             1.269
b19          2.7           0.112             1.278           0.109             1.273

Worst slack represents the worst timing slack over all endpoints. Average slack means the average

FF-Bond: d2 Analysis
- d2: the sparsity criterion to apply flip-flop bonding
- Smaller d2: flip-flop bonding starts at later iterations but does not guide FFs well because of the low flexibility of moving FFs
  - Affects power and WL
- Larger d2: flip-flop bonding starts at earlier iterations but incurs longer wirelength because distant flip-flops are attracted
  - Affects power (slightly) and WL

d2 = 0.3
Circuit      #Flip-flops   #MBFFs (4-/2-/1-bit)   FF power ratio   HPWL
s13207       212           27/41/22               0.834            5.243E+06
s15850       128           20/20/8                0.819            2.620E+06
s38417       881           171/85/27              0.802            5.258E+07
s38584       1069          194/135/23             0.805            6.861E+07
b17          1068          186/135/23             0.809            1.460E+08
b19          4384          847/427/142            0.803            7.927E+08
Avg. ratio   -             4.99/3.44/1.00         0.812            0.97

d2 = 0.5
Circuit      #MBFFs (4-/2-/1-bit)   FF power ratio   HPWL
s13207       35/31/10               0.814            5.344E+06
s15850       23/15/6                0.809            2.903E+06
s38417       159/105/35             0.808            5.406E+07
s38584       203/116/25             0.803            6.926E+07
b17          196/124/36             0.806            1.470E+08
b19          851/425/130            0.802            8.023E+08
Avg. ratio   5.33/3.33/1.00         0.807            1.00

d2 = 0.7
Circuit      #MBFFs (4-/2-/1-bit)   FF power ratio   HPWL
s13207       38/25/10               0.809            5.518E+06
s15850       22/16/8                0.814            2.895E+06
s38417       164/89/47              0.808            5.569E+07
s38584       192/134/33             0.807            6.944E+07
b17          202/116/28             0.803            1.485E+08
b19          851/431/118            0.802            8.037E+08
Avg. ratio   5.04/3.04/1.00         0.807            1.01

Search Region Analysis
- Search region, power ratio, wirelength

[Figure: two plots for s38417, "Search Region vs. FF Power Ratio" and "Search Region vs. Wirelength"; x axis: search region, unit: chip_width+chip_height]

s38417
Search region (unit: chip_width+chip_height)   FF power ratio   HPWL       Worst slack (ns)
0.05                                           0.817            4.89E+07   0.163
0.08                                           0.819            4.98E+07   0.163
0.10                                           0.814            5.07E+07   0.164
0.15                                           0.812            5.23E+07   0.163
0.18                                           0.811            5.34E+07   0.164
0.20                                           0.808            5.41E+07   0.163
0.25                                           0.807            5.41E+07   0.163
0.28                                           0.810            5.41E+07   0.163


Conclusions
- MBFFs reduce clock load effectively
- Prior work: post-placement MBFF clustering
- Proposed MBFF bonding at placement: FF-Bond
  - Directs flip-flops towards merging-friendly locations
  - FF-Bond can save 27% clock power on average
  - Compared with post-placement MBFF clustering, FF-Bond reduces clock power by a further 14%
- Future work: MBFF bonding with routability consideration

Thank You!
Contact info: Iris Hui-Ru Jiang, huiru.jiang@gmail.com