EEC 118 Lecture #17: Implementation Strategies, Manufacturability, and Testing Stanley Hsu 6/5/2012 Slides courtesy of Rajeevan Amirtharajah and Bevan Baas, UC Davis, Zhiyi Yu, Fudan University, and Jeff Parkhurst, Intel Corp.
Permissions to Use Conditions & Acknowledgment Permission is granted to copy and distribute this slide set for educational purposes only, provided that the complete bibliographic citation and following credit line is included: "Copyright 2002 J. Rabaey et al." Permission is granted to alter and distribute this material provided that the following credit line is included: "Adapted from (complete bibliographic citation). Copyright 2002 J. Rabaey et al." This material may not be copied or distributed for commercial purposes without express written permission of the copyright holders. Hsu/Amirtharajah, EEC 118 Spring 2012 2
Announcements Homework 8 due this Friday Last lab this week. Lab 6: 8T SRAM Course evaluations at end of next lecture Final exam: Monday 6/11, 1-3PM in Chem 166. Final review next time Hsu/Amirtharajah, EEC 118 Spring 2012 3
Implementation Strategies Hsu/Amirtharajah, EEC 118 Spring 2012 4
Abstraction of Design Complexity Design complexity Tens of transistors (i.e. logic gate, or analog circuits) Each xtr. hand crafted along with placement and wiring Hundreds of transistors Can be hand crafted (i.e. ALU, register file, filters, and _) Thousands to 100s of thousands of transistors Must find regularity in structure and exploit it (re-use cells) Need CAD tool (i.e. standard cell design, memory) Millions to billions of transistors Must find high-level regularity in structure and exploit it (reuse modules and subsystems) Need CAD tool! i.e. Microprocessor, System on Chip (SOC RF, BB, GPU, MPU, Mem., and etc.), and Hsu/Amirtharajah, EEC 118 Spring 2012 5
Design Abstraction Levels SYSTEM + FUNCTIONAL UNIT, or MODULE GATE CIRCUIT S n+ G DEVICE n+ D Hsu/Amirtharajah, EEC 118 Spring 2012 6
Hierarchical Abstraction Circuit design is really about constructing models While designing at the gate level, we do not consider what is inside each gate, but must select the most appropriate model of the gate. A model of the gate/block is used. Functional/logical Electrical Physical Timing Power Signal Integrity A C A B B C AND AND AND OR OR Carry- Out Hsu/Amirtharajah, EEC 118 Spring 2012 7
Why Learn About Circuits and Layout Then? Best designers can: Construct and apply appropriate model Understand limitations of models multi-disciplinary view required limited attainable performance and energyefficiency Wire or interconnect performance Must update model. i.e. technology scaling Troubleshoot and debug their own designs Hsu/Amirtharajah, EEC 118 Spring 2012 8
Design Aspects that Defy Hierarchy Clock distribution Timing skew Power distribution Sufficient current handling (IR drop) Adequate noise suppression (coupling from clock) Must come up with models to capture these effects. Hsu/Amirtharajah, EEC 118 Spring 2012 9
Design Styles Full Custom Standard Cell Mixed Full Custom and Standard Cell Gate Array Field Programmable Gate Array (FPGA) Examples: Heterogeneous Platforms System On Chip (SOC) AsAP Hsu/Amirtharajah, EEC 118 Spring 2012 10
All transistors and interconnect drawn by hand Full control over sizing and layout Highest area density and higher performance Full Custom Longest time to design maturity [figure from S. Hauck] Hsu/Amirtharajah, EEC 118 Spring 2012 11
Full Custom Design Example Multiplier Chip Hsu/Amirtharajah, EEC 118 Spring 2012 12
Standard Cell Constantheight cells Regular pin locations Cells represent gates, latches, flipflops Placed and routed by software [figure from S. Hauck] Hsu/Amirtharajah, EEC 118 Spring 2012 13
Standard Cell Channels for routing only in older technologies (not necessary with modern processes with many levels of interconnect) [figure from S. Hauck] Hsu/Amirtharajah, EEC 118 Spring 2012 14
Application- Specific Integrated Circuit (ASIC) Hardwired combination of standard cells implements a fixed logic function or FSM (e.g., video codec) Standard Cell Example [Brodersen92] Hsu/Amirtharajah, EEC 118 Spring 2012 15
Combination Standard Cell and Full Custom [figure from S. Hauck] Hsu/Amirtharajah, EEC 118 Spring 2012 16
Soft MacroModules Synopsys DesignCompiler Source: Digital Integrated Circuits, 2nd Hsu/Amirtharajah, EEC 118 Spring 2012 17
Gate Array Polysilicon and diffusion are fixed for all designs Metal layers are customized for particular chips PMOS transistor p-type diffusion polysilicon n-type diffusion NMOS transistor Hsu/Amirtharajah, EEC 118 Spring 2012 18
Unprogrammed Gate Array Isolation Provided by Spacing Hsu/Amirtharajah, EEC 118 Spring 2012 19
Metal connections made to create particular function What logic gate is this? Programmed Gate Array Hsu/Amirtharajah, EEC 118 Spring 2012 20
Gate Array Sea-of-Gates Uncommited Cell polysilicon V DD rows of uncommitted cells GND metal possible contact Committed Cell (4-input NOR) In1 In2 In3 In4 routing channel Hsu/Amirtharajah, EEC 118 Spring 2012 Source: Digital Integrated Circuits, 2nd 21 Out
Unprogrammed Sea-of-Gates Array Hsu/Amirtharajah, EEC 118 Spring 2012 22
Programmed Sea-of-Gates Array Isolation Provided by Cutoff Bias Hsu/Amirtharajah, EEC 118 Spring 2012 23
Gate Array Polysilicon and diffusion the same for all designs 0.125 um example [figure from LETI] Hsu/Amirtharajah, EEC 118 Spring 2012 24
Field Programmable Gate Array (FPGA) Metal layers now programmable with RAM instead of hardwired during manufacture as with a gate array Cells contain general programmable logic and registers [figure from S. Hauck] Hsu/Amirtharajah, EEC 118 Spring 2012 25
Field Programmable Gate Array (FPGA) Chips can now be designed with software User pays for up-front chip design costs All costs: full-custom, standard cell Half: gate array Shared: FPGA User writes code (e.g., Verilog), compiles it, and downloads into the chip Can be used to prototype standard cell (ASIC) design Hsu/Amirtharajah, EEC 118 Spring 2012 26
Heterogeneous Programmable Platforms FPGA Fabric Embedded PowerPC Embedded memories Hardwired multipliers Xilinx Vertex-II Pro High-speed I/O Courtesy Xilinx Hsu/Amirtharajah, EEC 118 Spring 2012 27
Design at a Crossroad: System-on-a-Chip Multi- Spectral Imager 500 k Gates FPGA RAM + 1 Gbit DRAM Preprocessing 64 SIMD Processor Array + SRAM Image Conditioning 100 GOPS Analog µc system +2 Gbit DRAM Recognition Often used in embedded applications where cost, performance, and energy are big issues! DSP and control Mixed-mode Combines programmable and application-specific modules Software plays crucial role Hsu/Amirtharajah, EEC 118 Spring 2012 28
A SoC Example: High Definition TV Chip Courtesy: Philips Hsu/Amirtharajah, EEC 118 Spring 2012 29
Standard Cell Based VLSI Design Flow Front end (logic/rtl design) System specification and architecture HDL coding & behavioral simulation Synthesis & gate level simulation Back end (physical design) Placement and routing DRC, LVS Dynamic simulation and static analysis Hsu/Amirtharajah, EEC 118 Spring 2012 30
Front-End Design Flow Standard Cell Library (from foundary) System Specification (FSM design) RTL Coding (sim.) Synthesis Gate Level (Structural Code) Ex: c =!a & b INV (.in (a),.out (a_inv)); AND (.in1 (a_inv),.in2 (b),. out (c)); Hsu/Amirtharajah, EEC 118 Spring 2012 31
Back-End Design Flow Gate Level (Structural Code) Place & Route Final layout (sent for fabrication) Gate level Verilog DRC LVS Design rule check Layout vs. Schematic check Timing information If timing not met... iterate! Gate level dynamic and/or static timing analysis (STA) Hsu/Amirtharajah, EEC 118 Spring 2012 32
Example: Asynchronous Array of Simple Processors (AsAP) AsAP Project by Prof. Baas VLSI Computation Lab A processing chip containing multiple (6x6) uniform simple processor elements Each processor has its local clock generator and can communicate with its neighbor processors using dual-clock first-in first-out buffers (FIFOs) Source: http://www.ece.ucdavis.edu/vcl/asap/ Hsu/Amirtharajah, EEC 118 Spring 2012 33
Diagram of a 3x3 AsAP More information: http://www.ece.ucdavis.edu/vcl/asap/ Hsu/Amirtharajah, EEC 118 Spring 2012 34
Front-End Design of AsAP MEMORY - Instruction and Data Oscillator FIFO Processor design Communication protocal Tools: HDL Simulators, MATLAB, high-level programming languages, and others. Hsu/Amirtharajah, EEC 118 Spring 2012 35
Back-End Design of AsAP Technology: CMOS 0.18 um Standard cell library: Artisan Tools Placement & Route: Cadence Encounter Final layout edit: icfb (old virtuoso) DRC & LVS: Mentor Graphics Calibre Static timing analysis: Synopsys Primetime Hsu/Amirtharajah, EEC 118 Spring 2012 36
Flow of Placement and Routing 1. Import needed files 2. Floorplan 3. Placement & in-place optimization 4. Clock tree generation 5. Routing 6. Verification Hsu/Amirtharajah, EEC 118 Spring 2012 37
1) Import Needed Files Gate level verilog (.v) Geometry information (.lef) Timing information (.lib) INV (.in (a),.out (a_inv)); AND (.in1 (a_inv),.in2 (b),.out (c)); INV: 1um width AND: 2 um width a INV b AND C INV: 1ns delay; AND: 2 ns delay Delay (a->c): 1ns + 2ns = 3ns Hsu/Amirtharajah, EEC 118 Spring 2012 38
2) Floorplan Size of chip Location of pins Location of main blocks Power supply: deliver enough power for each gate Power Supply (1.8V) Current 1.75V 1.7V VDD (Metal) Gate 1 Gate 2 Gate 3 Gate 4 VSS 1.65V (need another power line) Voltage drop equation: V2 = V1 I * R Hsu/Amirtharajah, EEC 118 Spring 2012 39
2) Single Processor Floorplan Hsu/Amirtharajah, EEC 118 Spring 2012 40
3) Placement and In-Placement Optimization Placement: place the gates on floorplan In-placement optimization Why: timing information difference between synthesis and layout (wire delay) How: change gate size, insert buffers Should not change the circuit function!! Hsu/Amirtharajah, EEC 118 Spring 2012 41
3) Single Processor: Gate Placement Hsu/Amirtharajah, EEC 118 Spring 2012 42
4) Clock Tree Generation Main parameters: skew, delay, transition time Hsu/Amirtharajah, EEC 118 Spring 2012 43
4) Single Processor Clock Tree Hsu/Amirtharajah, EEC 118 Spring 2012 44
5) Routing Connect the gates using wires - Difficult theoretical problem! In practice: Not a click-and-wait process! Steps: Specify routing and timing constraints Connect global signals Connect local signals Issues: Congestion and (clock) coupling Hsu/Amirtharajah, EEC 118 Spring 2012 45
5) Single Processor Layout Area: 0.8mm x 0.8mm Estimated clock frequency: 450 MHz Hsu/Amirtharajah, EEC 118 Spring 2012 46
5) First Generation 6x6 AsAP Layout Area: 30mm 2 36 processors 114 I/O pads Single Processor Hsu/Amirtharajah, EEC 118 Spring 2012 47
6) Verifications After Layout DRC (Design Rule Check) LVS (Layout vs. Schematic) GDS vs. (Verilog + Spectre/Spice module) Gate level Verilog dynamic simulation Mainly check the function Different with synthesis result: clock, OPT Gate level static timing analysis Check all the paths Hsu/Amirtharajah, EEC 118 Spring 2012 48
Useful Design and Simulation Tools Dynamic HDL Simulation Modelsim (Mentor), NC-verilog (Cadence), Active- HDL Dynamic Analog Simulation Spectre (Cadence), Hspice (Synopsys) Synthesis RTL Compiler (Cadence) Design-Compiler, Design-Analyzer (Synopsys) Placement & Routing Encounter & Virtuoso (Cadence) Astro (Synopsys) Hsu/Amirtharajah, EEC 118 Spring 2012 49
Useful Verification Tools DRC & LVS Calibre (Mentor) Diva (Cadence used in EEC 116/119AB) Dracula (Cadence) Static Analysis Primetime (Synsopsys) Hsu/Amirtharajah, EEC 118 Spring 2012 50
Standard cell flow: Design by re-using the same set of logic cells (standard cell library) Many EDA tools to place, route, verify, and automate the design flow Shorter design time Summary Custom flow: - Designed by individual engineers using manual process - Still require EDA tools to help with design verification, rule checks, and simulations. - Time consuming - Requres careful planning! - Closer to target performance Hsu/Amirtharajah, EEC 118 Spring 2012 51
Manufacturability Hsu/Amirtharajah, EEC 118 Spring 2012 52
Design for Manufacturability For class projects or university research, goal is a single working circuit or small number of prototypes Similar scale for industrial research projects Production goal is usually thousands to 100s of millions of working (or at least marketable) parts Must evaluate circuit designs over a range of parameter variations to ensure correct functionality, performance Design for Manufacturability or Statistical Circuit Design encompasses a variety of techniques Yield estimation and maximization, worst-case analysis, etc. Hsu/Amirtharajah, EEC 118 Spring 2012 53
Circuit Parameter Variations All circuit parameters vary some amount due to variations in process, lithography, or environment Geometric parameters: transistor W and L Device parameters: V T, t ox, µ Interconnect parameters: R, C Operating conditions: V DD, T Variations occur both spatially and temporally Circuit-to-circuit on same die (spatial) Die-to-die on same wafer (spatial) Wafer-to-wafer in same fab (temporal) Example: transistor width W = W 0 + W Designer controls Increasing variation Random Hsu/Amirtharajah, EEC 118 Spring 2012 54
CMOS Inverter Example For both NMOS and PMOS: W = W 0 + W L = L 0 + L V T = V T0 + V T k = k 0 + k For capacitor: C load = C 0 + C For entire circuit: T = T 0 + T V DD = V DD0 + V DD Vin Vout C load All these parameters affect circuit performance! Hsu/Amirtharajah, EEC 118 Spring 2012 55
Performance Variation Example 25 20 Inverter Delay Histogram Q: What kind of simulation is this? Count 15 10 5 0 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90- Propagation Delay (ps) 100 Delay variations with parameters, loading, V DD, and T Hsu/Amirtharajah, EEC 118 Spring 2012 56
Yield Estimation and Maximization Parametric Yield: ratio of total acceptable circuits to total manufactured circuits Design for manufacturability aims to maximize yield (and $$) Yield statistics are usually complicated since circuit performance is complex function of parameters Numerous methods for estimating and maximizing yield Response surface models (RSM): compact analytical model fit to circuit simulations using Design of Experiments Direct Monte Carlo circuit simulations or the RSM can be used to estimate yields Designer controlled parameters then adjusted to maximize yield estimates Hsu/Amirtharajah, EEC 118 Spring 2012 57
Worst-Case Design 1 Given: range of process, voltage, and temperature (PVT) variations Identify: worst (best) cases for performance parameter of interest Corner Analysis: worse case analysis. All max-min combinations of PVT variations. Labeled by NMOS-PMOS pairs, e.g. Typical NMOS-Typical PMOS (TT) Additional corners: Fast NMOS-Fast PMOS (FF), Slow NMOS- Slow PMOS (SS), Fast NMOS-Slow PMOS (FS), Slow NMOS- Fast PMOS (SF) Voltage corners: Nominal V DD +10% and Nominal V DD -10% Can include threshold voltage, too Temperature corners: 0, 27, and 100 o C Hsu/Amirtharajah, EEC 118 Spring 2012 58
Worst-Case Design 2 Identify worst (best) cases for performance parameter of interest Typical Speed Corner (TT) Nominal VDD, room temperature 27 o C, typ. Vth Slow Speed Corner (SS) 0.9 x VDD, maximum temperature 100 o C, max. Vth Fast Speed Corner (FF) 1.1 x VDD, minimum temperature 0 o C, min. Vth Hsu/Amirtharajah, EEC 118 Spring 2012 59
Process Corners 25 20 Inverter Delay Histogram TT corner Count 15 10 FF corner SS corner 5 0 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90- Propagation Delay (ps) 100 Delay variations with parameters, loading, V DD, and T Hsu/Amirtharajah, EEC 118 Spring 2012 60
Summary Design for manufacturability converts a prototype into a real design for large-scale production Statistical models of process, device and interconnect parameters, and operating conditions used to estimate and maximize yield (and profits) Analysis is difficult because of complexity, usually numerical models and many simulations required Variability trend is worsening as processes shrink For example, locations of individual dopant atoms can affect transistor performance Statistical circuit design is becoming as important as performance and power! Hsu/Amirtharajah, EEC 118 Spring 2012 61
Testing Design for test (DFT): constructing designs easy to test. IC testing cost. Automated test pattern generators and checkers As more functionality is packed into a single chip, testing time and complexity increases. Million of chips each need to be tested. i.e. 32 logic inputs -> 2^32=4.2 billion input combinations! Check 1 million combinations in 1 second -> 1.2 hour/chip!! Goal: Maximize testing coverage (minimizes failure probability) in shortest time. Hsu/Amirtharajah, EEC 118 Spring 2012 62
Next Topics: Low power design principles and circuit techniques Future directions in CMOS digital circuits Alternative logic technologies to CMOS Final exam review Hsu/Amirtharajah, EEC 118 Spring 2012 63