Getting the Most Out of Synthesis



Similar documents
Modeling Sequential Elements with Verilog. Prof. Chien-Nan Liu TEL: ext: Sequential Circuit

ECE232: Hardware Organization and Design. Part 3: Verilog Tutorial. Basic Verilog

VHDL GUIDELINES FOR SYNTHESIS

A New Paradigm for Synchronous State Machine Design in Verilog

Lecture 5: Gate Logic Logic Optimization

Lecture 7: Clocking of VLSI Systems

Modeling Latches and Flip-flops

Introduction to CMOS VLSI Design (E158) Lecture 8: Clocking of VLSI Systems

LAB #4 Sequential Logic, Latches, Flip-Flops, Shift Registers, and Counters

Latch Timing Parameters. Flip-flop Timing Parameters. Typical Clock System. Clocking Overhead

University of Texas at Dallas. Department of Electrical Engineering. EEDG Application Specific Integrated Circuit Design

High Fanout Without High Stress: Synthesis and Optimization of High-fanout Nets Using Design Compiler

Counters and Decoders

Digital Logic Design. Basics Combinational Circuits Sequential Circuits. Pu-Jen Cheng

Digital Design Verification

ECE410 Design Project Spring 2008 Design and Characterization of a CMOS 8-bit Microprocessor Data Path

ETEC 2301 Programmable Logic Devices. Chapter 10 Counters. Shawnee State University Department of Industrial and Engineering Technologies

How To Design A Chip Layout

WEEK 8.1 Registers and Counters. ECE124 Digital Circuits and Systems Page 1

Finite State Machine Design and VHDL Coding Techniques

Set-Reset (SR) Latch

EXPERIMENT 8. Flip-Flops and Sequential Circuits

CS311 Lecture: Sequential Circuits

IMPLEMENTATION OF BACKEND SYNTHESIS AND STATIC TIMING ANALYSIS OF PROCESSOR LOCAL BUS(PLB) PERFORMANCE MONITOR

Chapter 7: Advanced Modeling Techniques

Digital Electronics Detailed Outline

VHDL Test Bench Tutorial

Sequential Logic. (Materials taken from: Principles of Computer Hardware by Alan Clements )

Chapter 13: Verification

CS 61C: Great Ideas in Computer Architecture Finite State Machines. Machine Interpreta4on

CHAPTER 3 Boolean Algebra and Digital Logic

More Verilog. 8-bit Register with Synchronous Reset. Shift Register Example. N-bit Register with Asynchronous Reset.

CSE140: Components and Design Techniques for Digital Systems

Multiplexers Two Types + Verilog

ECE 3401 Lecture 7. Concurrent Statements & Sequential Statements (Process)

Cascaded Counters. Page 1 BYU

IE1204 Digital Design F12: Asynchronous Sequential Circuits (Part 1)

CHAPTER 11 LATCHES AND FLIP-FLOPS

Aims and Objectives. E 3.05 Digital System Design. Course Syllabus. Course Syllabus (1) Programmable Logic

Module-I Lecture-I Introduction to Digital VLSI Design Flow

A Utility for Leakage Power Recovery within PrimeTime 1 SI

Life Cycle of a Memory Request. Ring Example: 2 requests for lock 17

Combinational Logic Design Process

Asynchronous counters, except for the first block, work independently from a system clock.

design Synopsys and LANcity

EE 42/100 Lecture 24: Latches and Flip Flops. Rev B 4/21/2010 (2:04 PM) Prof. Ali M. Niknejad

Lecture 11: Sequential Circuit Design

Lecture 10: Sequential Circuits

路 論 Chapter 15 System-Level Physical Design

ESP-CV Custom Design Formal Equivalence Checking Based on Symbolic Simulation

Asynchronous & Synchronous Reset Design Techniques - Part Deux

Lab #5: Design Example: Keypad Scanner and Encoder - Part 1 (120 pts)

Timing Methodologies (cont d) Registers. Typical timing specifications. Synchronous System Model. Short Paths. System Clock Frequency

Take-Home Exercise. z y x. Erik Jonsson School of Engineering and Computer Science. The University of Texas at Dallas

Implementation Details

Register File, Finite State Machines & Hardware Control Language

PROGETTO DI SISTEMI ELETTRONICI DIGITALI. Digital Systems Design. Digital Circuits Advanced Topics

Having read this workbook you should be able to: recognise the arrangement of NAND gates used to form an S-R flip-flop.

Counters are sequential circuits which "count" through a specific state sequence.

STA - Static Timing Analysis. STA Lecturer: Gil Rahav Semester B, EE Dept. BGU. Freescale Semiconductors Israel

Modeling Registers and Counters

CHAPTER 11: Flip Flops

Experiment # 9. Clock generator circuits & Counters. Eng. Waleed Y. Mousa

Computer organization

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

Registers & Counters

Wiki Lab Book. This week is practice for wiki usage during the project.

Sequential Circuits. Combinational Circuits Outputs depend on the current inputs

To debug an embedded system,

High-Level Synthesis for FPGA Designs

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 16 Timing and Clock Issues

Testing & Verification of Digital Circuits ECE/CS 5745/6745. Hardware Verification using Symbolic Computation

Chapter 7. Registers & Register Transfers. J.J. Shann. J. J. Shann

Sequential Logic: Clocks, Registers, etc.

A Verilog HDL Test Bench Primer Application Note

Sequential 4-bit Adder Design Report

E158 Intro to CMOS VLSI Design. Alarm Clock

United States Naval Academy Electrical and Computer Engineering Department. EC262 Exam 1

Coding Guidelines for Datapath Synthesis

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

Lecture 10 Sequential Circuit Design Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

CDA 3200 Digital Systems. Instructor: Dr. Janusz Zalewski Developed by: Dr. Dahai Guo Spring 2012

Prototyping ARM Cortex -A Processors using FPGA platforms

ECE124 Digital Circuits and Systems Page 1

Topics of Chapter 5 Sequential Machines. Memory elements. Memory element terminology. Clock terminology

DIGITAL COUNTERS. Q B Q A = 00 initially. Q B Q A = 01 after the first clock pulse.

ASYNCHRONOUS COUNTERS

Understanding Verilog Blocking and Non-blocking Assignments

Chapter 5. Sequential Logic

FINITE STATE MACHINE: PRINCIPLE AND PRACTICE

Digital Systems Design! Lecture 1 - Introduction!!

Optimal Technology Mapping and Cell Merger for Asynchronous Threshold Networks

9/14/ :38

Introduction to Digital System Design

Memory Elements. Combinational logic cannot remember

Traffic Light Controller. Digital Systems Design. Dr. Ted Shaneyfelt

A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

To design digital counter circuits using JK-Flip-Flop. To implement counter using 74LS193 IC.

DEPARTMENT OF INFORMATION TECHNLOGY

Master/Slave Flip Flops

Transcription:

Outline Getting the Most Out of Synthesis Dr. Paul D. Franzon 1. Timing Optimization Approaches 2. Area Optimization Approaches 3. Design Partitioning References 1. Smith and Franzon, Chapter 11 2. D.Smith, Chapters 4, 6, 8. 3. Kurup, Chapters 2, 4, 5. 4. Synopsys Documentation, Applications Notes, Methodology Notes, and Synthesis Documentation -> HDL Compiler for Verilog Manual 1

Optimizing For Speed General HDL Coding Style Techniques: Specify don t cares on outputs whenever possible. e.g. Consider the following Karnaugh Map with `x and with `0 instead of x ab: 00 01 10 11 cd: 0 1 x 0 0 1 1 1 0 0 Be careful: Does increase potential for mismatch between behavioral simulation and post-synthesis simulation. Specify multiplexors if possible (see earlier example). Using `+ and `- automatically gives you a ripple carry adder unless you specify otherwise (hand design or use Design Ware) y = a/b; and z=a%b; will NOT give you efficient circuits as Design Ware does not contain designs for these Design the circuits in detail instead 2

...Optimizing for Speed Use Design Knowledge: Use parantheses to group items if you have special design knowledge. e.g. If you know inputs c and d will arrive after a and b: A output = (((a + b) + c) + d); B + C + Don t use () unless you have extra knowledge D + Find parallelism whenever possible: see `Writing Successful RTL Descriptions in Verilog Use bit-slicing to make the Synthesis more optimal Synopsys can optimize smaller designs easier than larger ones see `Writing Successful RTL Descriptions in Verilog Keep the critical path in one module It is difficult to optimize across modules Try to keep this module as small as possible Use structuring to improve the design see `Writing Successful RTL Descriptions in Verilog 3

Once within Synopsys:... Optimizing For Speed Make sure the constraints are correctly set: set_input_delay set_output_delay set_driving_cell set_load create_clock Initially, the first four of these are estimated. Later those estimates can be refined once the actual fan out, wire loading, etc. is known. Other constraints might have to be set, depending on your library. Use worst case library for set-up timing and best case for hold timing Nets that don t go through a clocked element will need a max delay constraint. (This is usually poor technique though) e.g. max_delay 8ns -from in1 -to out1 Some inputs never change (e.g. configuration pins) and some outputs are asynchronous - use set_false_path on these. 4

Within Synopsys:...Optimizing For Speed When you have multiple clocks, Synopsys needs to know what signals are going between the different clocks. e.g. create_clock period 20 clock20 waveform {0 10} create_clock period 10 clock10 waveform {0 5} set_multicycle_path 2 -setup -from ff1/cp -to ff2/d // ff1 has a 20 MHz clock while ff1 has a 10 MHz clock set_multicycle_path is also often required for Mealy outputs of FSMs see Methodology Notes -> State Machine Design Techniques for details How Synopsys Works: When the design is read in it is `mapped onto a logic library but not optimized You then give Synopsys your design constraints and goals `compile does logic minimization while trying to meet timing constraints `compile -incremental can be performed (after changing various aspects of the constraints, etc.) to try to improve the design 5

Timing Optimization Within Synopsys Scenario 1: You are trying to work out the fastest possible clock speed: Start with a clock speed that should be achievable but only just so and do a compile Find out the results with report_constraints -all_violators -verbose report_timing -max_paths 5 // report 5 worst case timings Tighten the clock with create_clock and execute a compile -incremental Never start with no clock specification or one that is way too tight or loose -- the initial optimization within Synopsys is signicant in determining the final design Iterate on the create_clock/compile/report loop, saving the.db file whenever you get the best result If only a few paths are creating setup violations, then you have a good chance of getting a better design. 6

Timing Optimization Within Synopsys Scenario 2. You are having difficulty meeting required or desired timing After the first compile use report_constraints -all_violators -verbose report_timing -max_paths 5 // report 5 worst case timings to analyze your results Normally Synopsys concentrates on one critical path at a time, improving it until it is no longer critical, and then moving to another. If instead you tell it to concentrate on several paths simultaneously, it might produce a better results: group_path -name critical -critical_range 5 -to clock20 // all paths within 5ns of the most critical path are // given a higher weight of 1.0 Modify the constraints to be more realistic Identify false paths and multicycle paths If you are close (usually 2ns) then throw the computer at it: compile incremental -map_effort high Re-assess your HDL code and design partitioning If using latches, consider cycle borrowing 7

HDL Coding Style Techniques: Optimizing For Area Automatic resource sharing e.g. always@(a or b or c or d or i) begin z = 0; y = 0; if (i) z= a + b + c; else y = a + b + d; end Synopsys will build 2 adders and a mux (on c/d), rather than 4 adders and bigger output muxes as long as resource sharing is turned on (the default) and the sharable code is in the same procedural block. If the performance constraint is too tight, you might get 4 adders Not all resources can be shared... see `HDL compiler for Verilog manual if sharing does not seem to be working. Minimize the number of flip-flops Everything on the LHS after a always@posedge(clock) becomes a flip-flop. 8

Synthesis Strategies:... Optimizing for Area After synthesis, execute check_design to make sure there is no unused logic. Use report_resources and refer to HDL to see if all possible resource sharing has taken place. Check timing constraints as you might be over-constraining the design. set_structure -boolean true, before a compile might lead to a smaller design by turning on certain boolean minimizations. A carefully tuned synthesis strategy leads to a smaller design Separately optimizing control (esp. FSMs) and datapath. For Area or Speed: Design by hand (using GTECH cell library directly) and place a set_dont_touch on it. 9

Partitioning a Design ie. Deciding what to put in each module. Keep the critical path within one module Critical path = slowest path between flip-flops Keep related combinational logic and sharable rsources within one module e.g. combinational logic blocks that relate to each other e.g. adders and other datapath units that can be potentially shared put sharable resources in same procedural block! Separate designs with different compile strategies into separate modules e.g Separate speed-critical modules from area-critical modules e.g. Separate random logic that you plan to use `set_flatten on from ALU resources, such as adders `Chiplets with different clocks or that use different edges should be in different modules Removes complexity of having to deal with timing constraints that go between different clocks and edges 10

... Design Partitioning Register all outputs, especially when interfacing with someone else s design Makes `set_driving_cell and `set_input_delay easy for others to determine Try to register all module outputs even within your own design, or at least keep the logic depths at the outputs small Separate FSMs into different modules if planning to optimize Have registers in each synthesized module or module group Synthesize the smallest module or group of modules of consistent with the above guidelines Designs larger than 250 to 2000 gates are difficult for Synopsys to optimize Write modules that meet these guidelines OR use `current_design and `group and `ungrouup commands to create suitably sized synthesis targets However, Synopsys can not break up modules for you Best strategy is to `right-size the modules in design Use `characterize in a hierarchical synthesis strategy 11

Partitioning Example Notes: - circles = combinational logic - bar instanced twice as U2 and U3 12

...Partitioning Example Write top level Verilog module (ignoring details of inputs and outputs): module top (); endmodule; Synthesis Script Extract: (instead of current compile):... // on worst_case cells/conditions: characterize -constraints {U1} current_design = foo compile current_design = top group {U4 U5} -design_name pets -cell_name U10 characterize -constraints {U10} current_design = pets compile current_design = top uniquify -cell U3 -new_name bar2 characterize -constraints {U2} current_design = bar compile current_design = top characterize -constraints {U3} current_design = bar2 compile current_design = top report_timing 13

Comments: Function of `characterize...partitioning Example Determines all constraints (set_input_delay,etc.) of cell being characterized based on current design mapping How are the critical paths handled? By bringing them together into one module. Would this strategy lead to the smallest design? No, as U1 can be optimized now that U2 is compiled What could you do to reduce the design size? Compile U1 again at end How could you avoid having to do this step at the RTL coding stage? Designing with this end partition in mind. Why did we `uniquify U3? Has different input constraints than U2 If you do not uniquify all instances of bar will have identical mappings 14

What is wrong with this partitioning? Partitioning Critical path is shared between two modules Wont be timing or area optimized Solution: bar 15

Design Ware Synopsys, and others, provide libraries of carefully optimized design blocks for you to use -- called `Design Ware Libraries include: Arithmetic, Advanced Math, DSP, Control, Sequential, and FAult Tolerant For +,-,*, >=, <=, >, and <, design ware is automatically used More complex cells must be inferred via a procedure call. e.g. cosine: module trigger (angle, cos_out); parameter wordlength1 = 8, wordlength2 = 8; input [wordlength-1:0] angle; output [wordlength-1:0] cos_out; // passes the widths to the cos function parameter angle_width = wordlength1, cos_width = wordlength2; include $SYNOPSYS/dw/dw02/src_ver/DW02_cos_function.inc wire [wordlength2-1:0] cos_out; // infer DW02_cos assign cos_out = cos(angle); endmodule 16

Summary Combinational Logic implied by always@(x...) or assign statements Try to build unprioritized logic in a case statement when possible Use structural Verilog (assign) to build muxes for smallest fastest design Finite State Machines Describe and optimize separately from datapath and random logic Consider style of implementation for best design Fastest speed and density always starts with good knowledge about the design Design capture in the HDL has a major impact Synthesis strategy can help improve a design but it IS NOT A MAGIC BULLET. Use DesignWare for reuse of carefully optimized designs done by others You can add to your own design ware library 17

Partitioning Remember: Partition the design into the smallest modules that Entirely contain the critical paths Have FFs for all outputs (as much as practical) Contain sharable logic Make sense from a design perspective 18