IC Technologies Programmable technologies ASIC technologies
Programmable technologies Classification Programming Connections Cells
Programmable technologies
Hardware devices providing logic components (gates, flip-flops, buffers) and connection lines. Such devices are said to be programmable since the ready-made components can be connected according to the design needs.
There are several classes of programmable devices: PAL, PLA, ROM, GAL; CPLD; FPGA.
Programmable technologies
Classification can be made based on:
Programming: one-time programmable (OTP): fuse, antifuse; reprogrammable: E2PROM, SRAM; reconfigurable: SRAM.
Connections: global, local, hierarchical, programmable-matrix based.
Cells: simple, complex.
Programming
Programming technologies influence:
Area: minimum for fuse/antifuse, maximum for SRAM.
Programming time: minimum for SRAM, maximum for fuse/antifuse.
Cost: minimum for SRAM/Flash, maximum for fuse/antifuse.
Fuse (OTP)
Lines are initially fully connected.
Programming consists in burning (blowing) some of the connection points and leaving connected only the necessary ones; it is performed by means of a voltage higher than the operating voltage.
[Figure: fuse between LINE1 and LINE2, before and after programming]
Antifuse (OTP)
Lines are initially fully disconnected.
Programming consists in creating only the necessary connections; it is performed by means of a voltage higher than the operating voltage.
[Figure: antifuse between LINE1 and LINE2, before and after programming; cross-section with the antifuse between METAL1 and METAL2, embedded in SiO2]
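The difference between the two OTP technologies can be sketched as an array of programmable crosspoints between lines. The Python model below is hypothetical (class and method names are illustrative): a fuse array starts fully connected and programming can only remove connections, an antifuse array starts fully disconnected and programming can only add them, and in both cases a programmed element cannot be changed again.

```python
class OTPArray:
    """Hypothetical model of a one-time programmable crosspoint array.

    Each (row, col) crosspoint between two lines is either a fuse or an antifuse.
    """

    def __init__(self, rows: int, cols: int, antifuse: bool):
        self.antifuse = antifuse
        # Fuses start closed (lines connected); antifuses start open (disconnected).
        self.connected = [[not antifuse] * cols for _ in range(rows)]
        self.programmed = [[False] * cols for _ in range(rows)]

    def program(self, row: int, col: int) -> None:
        """Apply the programming voltage (above the operating voltage) to one crosspoint."""
        if self.programmed[row][col]:
            raise RuntimeError("OTP element already programmed: it cannot be changed again")
        self.programmed[row][col] = True
        # A blown fuse opens the connection; a programmed antifuse closes it.
        self.connected[row][col] = self.antifuse

    def configure(self, wanted: set) -> None:
        """Program only the crosspoints needed so that exactly `wanted` stay connected."""
        for r, line in enumerate(self.connected):
            for c, is_on in enumerate(line):
                if is_on != ((r, c) in wanted):
                    self.program(r, c)

    def operations(self) -> int:
        """Number of programming operations performed so far."""
        return sum(sum(line) for line in self.programmed)


# Same target configuration, two technologies.
target = {(0, 0), (1, 2)}
fuse = OTPArray(3, 3, antifuse=False)
fuse.configure(target)
anti = OTPArray(3, 3, antifuse=True)
anti.configure(target)
print(fuse.operations(), anti.operations())   # 7 2
```

In this model both arrays end up with the same connections, but the fuse array needs one programming operation for every unwanted crosspoint, while the antifuse array needs one only for every wanted crosspoint.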
E2PROM (reprogrammable)
Lines are initially fully disconnected.
Programming consists in depositing a charge on the floating gate so that the transistor maintains a conducting channel between LINE1 and LINE2.
[Figure: floating-gate transistor with program gate, source on LINE1 and drain on LINE2, before and after charge is deposited on the floating gate]
SRAM (reprogrammable)
Lines are initially fully disconnected.
Programming consists in storing a logical value (0 or 1) in a static RAM cell that controls the connection between LINE1 and LINE2.
[Figure: SRAM cell with R/W access controlling the connection between LINE1 and LINE2; storing 1 connects the lines]
SRAM cell
[Figure: SRAM cell circuit with word line, bit lines B and B', Vdd and GND, controlling the connection on LINE1]
FLASH (reprogrammable)
Lines are initially fully disconnected.
Programming consists in storing a logical value (0 or 1) in a FLASH cell that controls the connection between LINE1 and LINE2.
[Figure: FLASH cell with R/W access controlling the connection between LINE1 and LINE2; storing 1 connects the lines]
FLASH cell
[Figure: FLASH cell circuit: floating-gate transistor with word line and bit lines B and B', connecting LINE1 and LINE2]
Connections
Connections influence:
Area: global connections require more area.
Delays: local connections are very efficient for neighboring cells; global connections are very efficient for distant cells.
Routability: local connections provide more flexibility and thus better routability; the complexity of routing algorithms increases as the locality of connections increases.
Global connections
Connections spanning a large portion of the die.
Advantages: distant cells are connected easily; propagation delay is easily predictable; propagation delay is relatively low for distant cells; simpler architecture of the device.
Disadvantages: propagation delay is high for neighboring cells; each line is shared among several cells but can be driven by a single cell only, which limits flexibility.
Global connections
The capacitance is that of the entire line, so the delay is almost constant.
The capacitance is acceptable for long wirings but unacceptable for short wirings.
No active elements; the line resides on a single metal layer (no vias).
[Figure: a long connection between cells A and a short connection between cells B, both on global lines]
Local connections
Connections much shorter than the die size.
Advantages: very low delays for short connections; neighboring elements are easily connected; each line is shared among few cells; high flexibility.
Disadvantages: propagation delay is hard to predict; propagation delay between distant cells is higher than with global connections, due to vias and active elements; more complex architecture of the device.
Local connections: neighboring cells
The capacitance depends on distance and is thus limited.
No active elements; the line lies on two metal layers at most, with few or no vias.
Only the necessary routing resources are used: no waste and higher routability.
[Figure: neighboring cells A and B connected through local lines]
Local connections: distant cells
The capacitance depends on distance and is thus high.
No active elements; the line lies on two metal layers at most, but several vias are needed to span long distances, so the parasitic capacitance is higher.
Several local wiring resources are needed.
[Figure: two distant cells A connected through a chain of local lines]
Hierarchical connections
Combines the advantages of local and global schemes.
Advantages: uses fast local lines for neighboring cells; uses efficient global lines for distant cells; good flexibility.
Disadvantages: global resources are limited due to size constraints; the complexity of routing algorithms is higher; propagation delays are harder to predict.
Hierarchical connections: neighboring cells
The capacitance depends on distance and is thus limited.
No active elements; the line lies on two metal layers at most, with few or no vias.
Only the necessary routing resources are used.
[Figure: neighboring cells A connected through local lines]
Hierarchical connections: distant cells
The capacitance does not depend on distance and is thus limited and predictable.
No active elements; the line lies on two metal layers at most, with few or no vias.
No local resources are wasted.
[Figure: distant cells A and B connected through a long line of the hierarchy]
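The delay trends of the three schemes can be illustrated with a deliberately simplified lumped model in which delay is taken as proportional to the load capacitance seen by the driver. All parameters (capacitance per unit length, die length, local segment length, extra load per via or switch, length of a hierarchical long line) are hypothetical normalised numbers chosen only to reproduce the qualitative behaviour described above; they are not data for any real device.

```python
import math

# Hypothetical, normalised parameters chosen only to show the trends.
C_PER_UNIT = 1.0   # wire capacitance per unit length
DIE_LENGTH = 100   # a global line always spans the whole die
LOCAL_REACH = 4    # length of one local segment
C_VIA = 3.0        # extra load of each via/switch joining two segments
HIER_LONG = 25     # length of a long line in the hierarchical scheme


def chained_load(distance: int, seg_len: int) -> float:
    """Load of a path built by chaining segments of length seg_len."""
    segments = max(1, math.ceil(distance / seg_len))
    return C_PER_UNIT * segments * seg_len + C_VIA * (segments - 1)


def global_load(distance: int) -> float:
    # The driver sees the capacitance of the entire line, whatever the distance.
    return C_PER_UNIT * DIE_LENGTH


def local_load(distance: int) -> float:
    # Capacitance and number of vias both grow with distance.
    return chained_load(distance, LOCAL_REACH)


def hierarchical_load(distance: int) -> float:
    # Short connections use local lines, longer ones fixed-length long lines.
    if distance <= LOCAL_REACH:
        return local_load(distance)
    return chained_load(distance, HIER_LONG)


for d in (2, 20, 90):
    print(d, global_load(d), local_load(d), hierarchical_load(d))
```

With these numbers a connection of length 2 costs 100 (global), 4 (local) and 4 (hierarchical); length 20 costs 100, 32 and 25; length 90 costs 100, 158 and 109. The global line is constant, the local chain is best for neighbors but grows quickly with distance, and the hierarchical scheme stays close to the better of the two.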
Programmable Switch Matrix
PSMs have input/output ports connected to the wiring resources of the device.
They allow connecting an input port to several (possibly all) output ports in a selective and programmable way.
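A PSM can be modelled as a matrix of programmable switches, one per (input port, output port) pair, each controlled by a configuration bit (for instance an SRAM cell). The Python class below is a hypothetical sketch, not the structure of any specific commercial switch matrix; the rule that an output may be driven by one input only is an assumption made to avoid contention.

```python
from typing import Optional


class SwitchMatrix:
    """Hypothetical PSM: one configuration bit per (input port, output port) pair."""

    def __init__(self, n_in: int, n_out: int):
        self.n_in, self.n_out = n_in, n_out
        self.bits = [[False] * n_out for _ in range(n_in)]

    def connect(self, inp: int, *outs: int) -> None:
        """Close the switches from input `inp` to every output port in `outs`."""
        for o in outs:
            # Assumption: an output port may be driven by a single input only.
            if any(self.bits[i][o] for i in range(self.n_in) if i != inp):
                raise ValueError(f"output {o} is already driven by another input")
            self.bits[inp][o] = True

    def route(self, values: list[int]) -> list[Optional[int]]:
        """Propagate the input values; unconnected outputs are left floating (None)."""
        outputs: list[Optional[int]] = [None] * self.n_out
        for i, v in enumerate(values):
            for o in range(self.n_out):
                if self.bits[i][o]:
                    outputs[o] = v
        return outputs


psm = SwitchMatrix(4, 4)
psm.connect(0, 1, 2, 3)            # one input fanned out to three outputs
psm.connect(2, 0)
print(psm.route([1, 0, 0, 1]))     # [0, 1, 1, 1]
```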
Programmable Switch Matrix
Several connection schemes can be implemented using PSMs; the local scheme is the simplest.
Programmable Switch Matrix
A hierarchical scheme offers higher efficiency: PSMs are connected through lines of different lengths. 1-length lines connect adjacent PSMs; N-length lines connect PSMs at a distance equal to N.
[Figure: PSMs connected by 1-length, 2-length and 4-length lines]
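With lines of length 1, 2 and 4, a connection between two PSMs at distance d along a row can be built by chaining lines, and every junction between two lines passes through an intermediate PSM switch. The sketch below assumes that any number of lines of each length is available and that lines can be chained freely at PSMs; it greedily uses the longest line that fits, which is optimal for lengths 1, 2, 4 because each length divides the next. The function names are illustrative.

```python
def segment_plan(distance: int, lengths: tuple[int, ...] = (4, 2, 1)) -> list[int]:
    """Greedy choice of line lengths covering `distance` PSM-to-PSM steps."""
    plan: list[int] = []
    remaining = distance
    for length in sorted(lengths, reverse=True):
        count, remaining = divmod(remaining, length)
        plan += [length] * count
    if remaining:
        raise ValueError("distance cannot be covered with the given lengths")
    return plan


def switches_on_path(plan: list[int]) -> int:
    # Each junction between two consecutive lines goes through one PSM switch.
    return max(len(plan) - 1, 0)


print(segment_plan(7), switches_on_path(segment_plan(7)))   # [4, 2, 1] 2
print(switches_on_path(segment_plan(7, lengths=(1,))))      # 6
```

Under these assumptions, reaching a PSM 7 positions away costs 2 intermediate switches with the hierarchical lines, against 6 with 1-length lines only.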
Simple cells
Limited number of I/Os; sequential and combinational cells are distinct; complex functions require several cells.
Easier technology mapping algorithms; complex and less efficient routing; better optimization opportunities.
[Figure: array of small logic cells (LC)]
Complex cells
High number of I/Os; sequential and combinational elements in the same cell; complex functions require a single cell (or few cells).
Complex technology mapping algorithms; simpler and more efficient routing; fewer optimization opportunities.
[Figure: a few large logic cells (LC)]
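The trade-off between simple and complex cells can be quantified for a wide associative function such as an n-input AND. A tree of k-input cells removes k - 1 signals per cell, so ceil((n - 1)/(k - 1)) cells are needed, and the number of levels (each one a cell-to-cell routing hop) grows as ceil(log_k n). The cell sizes k used below are illustrative assumptions, not those of a particular device.

```python
def cells_needed(n: int, k: int) -> int:
    """k-input cells needed to combine n signals with an associative gate (AND, OR, XOR)."""
    if k < 2:
        raise ValueError("cells need at least two inputs")
    if n <= 1:
        return 0
    # Each cell merges k signals into one, removing k - 1 of them.
    return -(-(n - 1) // (k - 1))   # ceiling division


def tree_depth(n: int, k: int) -> int:
    """Levels of cells from the inputs to the output."""
    if k < 2:
        raise ValueError("cells need at least two inputs")
    depth, signals = 0, n
    while signals > 1:
        signals = -(-signals // k)   # signals left after one more level
        depth += 1
    return depth


for k in (2, 4, 8):   # illustrative cell sizes
    print(k, cells_needed(16, k), tree_depth(16, k))
```

For a 16-input AND this gives 15 cells on 4 levels with 2-input cells, 5 cells on 2 levels with 4-input cells, and 3 cells on 2 levels with 8-input cells.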
Programmable devices
Families: PAL, PLA, GAL; PLD, CPLD; FPGA.
2-level programmable devices
Used for functions in two-level SoP/PoS form. They provide:
a fixed number of I/Os;
an AND plane, to construct implicants;
an OR plane, to sum implicants into functions;
for electrical reasons, input and output buffers.
[Figure: block diagram: Input, Input Buffers, AND Plane, OR Plane, Output Buffers, Output]
2-level programmable devices
Three major families:
Programmable Logic Array (PLA): both AND and OR planes are programmable; only the necessary implicants are built.
Programmable Array Logic (PAL): only the AND plane is programmable; only the necessary implicants are built.
Read-Only Memory (ROM): the AND plane is pre-programmed as a decoder; all minterms are available.
Programmable Logic Array
[Figure: PLA with inputs a, b, c; input buffers and inverters producing I1-I6; programmable AND plane producing product terms P1-P4; programmable OR plane producing O1, O2; output buffers driving f1, f2]
Programmable Array Logic
[Figure: PAL with inputs a, b, c; input buffers and inverters producing I1-I6; programmable AND plane producing product terms P1-P4; fixed OR plane producing O1, O2; output buffers driving f1, f2]
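The two-level structure can be simulated directly: the AND plane is a list of product terms, each stating which literals (true or complemented inputs) it uses, and the OR plane states which product terms feed each output. In a PLA both tables are programmable; a PAL would be evaluated the same way, but with the OR-plane table fixed by the manufacturer. The cube encoding ('1' true literal, '0' complemented literal, '-' input unused) is borrowed from common cube notation, and the functions f1 and f2 are a made-up example.

```python
from itertools import product

# AND plane: one cube per product term over the inputs (a, b, c).
AND_PLANE = {
    "P1": "10-",   # a b'
    "P2": "-11",   # b c
    "P3": "0-1",   # a' c
}
# OR plane: product terms summed by each output (P2 is shared, as a PLA allows).
OR_PLANE = {
    "f1": ["P1", "P2"],
    "f2": ["P2", "P3"],
}


def product_term(cube: str, inputs: tuple[int, ...]) -> int:
    """AND of the literals selected by the cube."""
    return int(all(c == "-" or int(c) == x for c, x in zip(cube, inputs)))


def evaluate(inputs: tuple[int, ...]) -> dict[str, int]:
    """Evaluate the AND plane, then OR the selected product terms for each output."""
    terms = {name: product_term(cube, inputs) for name, cube in AND_PLANE.items()}
    return {out: int(any(terms[p] for p in ps)) for out, ps in OR_PLANE.items()}


for abc in product((0, 1), repeat=3):
    print(abc, evaluate(abc))
```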
Read-Only Memory
A ROM associates a word (output) with each address (input).
Usually several functions are needed: $f_i = f_i(x_1, x_2, \ldots, x_n)$, $i = 1, 2, \ldots, t$.
Using a different notation: $(x_1, x_2, \ldots, x_n) \Rightarrow (f_1, f_2, \ldots, f_t)$.
This form shows a transformation from an n-tuple of inputs $x_i$ to a t-tuple of outputs $f_j$.
The address decoder generates all $2^n$ minterms of the n input variables $x_i$.
Read-Only Memory
Address decoder: the variables $x_i$ are the inputs; the outputs are all the minterms built on the input variables $x_1 x_2 x_3$:
$000 = \bar{x}_1 \bar{x}_2 \bar{x}_3$, $001 = \bar{x}_1 \bar{x}_2 x_3$, ..., $111 = x_1 x_2 x_3$
Read-Only Memory
[Figure: ROM with inputs x1, x2, x3; the address decoder produces the minterms 000, 001, 010, ..., 111; the ROM array and the output buffers produce f1, f2, f3, f4]
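The ROM organisation can be sketched in the same style: the address decoder is fixed and produces all $2^n$ minterms as a one-hot vector, and the programmable part decides which minterms drive each output. The Python model and its contents (f1 = odd parity and f2 = majority of x1 x2 x3, with x1 as the most significant address bit) are illustrative assumptions.

```python
def decoder(address: tuple[int, ...]) -> list[int]:
    """Fixed address decoder: one-hot vector of the 2**n minterms (x1 is the MSB)."""
    index = 0
    for bit in address:
        index = (index << 1) | bit
    return [int(m == index) for m in range(2 ** len(address))]


# Programmable array: for each output, the minterms (addresses) that set it to 1.
ROM_CONTENT = {
    "f1": {0b001, 0b010, 0b100, 0b111},   # odd parity of x1 x2 x3
    "f2": {0b011, 0b101, 0b110, 0b111},   # majority of x1 x2 x3
}


def rom_read(address: tuple[int, ...]) -> dict[str, int]:
    """Each output is the OR of the selected decoder outputs."""
    minterms = decoder(address)
    return {f: int(any(minterms[m] for m in ones)) for f, ones in ROM_CONTENT.items()}


print(decoder((0, 0, 1)))    # [0, 1, 0, 0, 0, 0, 0, 0]
print(rom_read((0, 1, 1)))   # {'f1': 0, 'f2': 1}
```

Unlike the PLA, no minimisation is involved: every minterm exists in the decoder, and programming only chooses which ones are summed.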
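Programmable Logic Devices
An extension of PLA/PAL devices: provides an internal feedback network and sequential elements.
[Figure: PLD output cell: primary inputs and feedback inputs feed the array; the function f_i drives a D flip-flop clocked by Clock; an output select chooses between f_i and Q, an output enable controls the output, and the output is fed back]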
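The feedback network and the flip-flops turn a PLD into a synchronous state machine: the registered outputs are fed back into the AND plane, so the next state is a two-level function of the inputs and of the current state. A minimal sketch, assuming two registered outputs (q1, q0) programmed as a 2-bit up counter with an enable input; the sum-of-products equations are a textbook example, not taken from a specific device.

```python
def inv(x: int) -> int:
    """Complemented literal, as produced by the input inverters."""
    return 1 - x


def next_state(q1: int, q0: int, en: int) -> tuple[int, int]:
    """D inputs computed by the AND/OR planes (sum-of-products form)."""
    d0 = (q0 & inv(en)) | (inv(q0) & en)
    d1 = (q1 & inv(q0)) | (q1 & inv(en)) | (inv(q1) & q0 & en)
    return d1, d0


def run(enables: list[int]) -> list[int]:
    """Apply one clock edge per enable value and return the counter after each edge."""
    q1 = q0 = 0                          # flip-flops start cleared
    trace = []
    for en in enables:
        # The Q outputs are fed back, so the next state depends on the current one.
        q1, q0 = next_state(q1, q0, en)  # rising clock edge: Q takes the value of D
        trace.append(2 * q1 + q0)
    return trace


print(run([1, 1, 1, 1, 1]))   # [1, 2, 3, 0, 1]
```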
Generic Array Logic
A further extension of PAL, PLA and PLD: possibly provides several AND/OR planes, and provides complex cells for I/O and feedback routing.
[Figure: GAL: I/CLK and input pins feed the AND-OR planes, whose outputs pass through OLMCs to the I/O pins]
Generic Array Logic
OLMCs (Output Logic Macro Cells) make the architecture significantly more flexible.
[Figure: OLMC configured as simple output (O) and as input (I)]
Generic Array Logic
[Figure: OLMC configured as output with internal feedback (O) and as output with external feedback (O, I/O)]
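The OLMC configurations (simple output, input, output with internal feedback, output with external feedback) differ in what drives the pin and what is fed back into the AND-OR planes. The sketch captures only that routing decision; the mode names mirror the slide captions, while the function, its arguments and the tri-state behaviour are hypothetical simplifications that ignore output polarity and the flip-flop of real GAL macrocells.

```python
from enum import Enum
from typing import Optional


class OLMCMode(Enum):
    SIMPLE_OUTPUT = "simple output"
    INPUT = "input"
    INTERNAL_FEEDBACK = "output with internal feedback"
    EXTERNAL_FEEDBACK = "output with external feedback"


def olmc(mode: OLMCMode, logic_out: int, oe: int = 1,
         external: Optional[int] = None) -> tuple[Optional[int], Optional[int]]:
    """Return (pin level, value fed back to the array) for one hypothetical OLMC.

    logic_out: value computed by the AND-OR planes for this cell.
    oe:        output enable; when 0 the pin is tri-stated.
    external:  level imposed on the pin from outside, if any.
    """
    driving = mode is not OLMCMode.INPUT and oe == 1
    pin = logic_out if driving else external
    if mode is OLMCMode.SIMPLE_OUTPUT:
        feedback = None          # no path back into the array
    elif mode is OLMCMode.INTERNAL_FEEDBACK:
        feedback = logic_out     # taken inside the cell, before the output buffer
    else:                        # INPUT or EXTERNAL_FEEDBACK
        feedback = pin           # taken from the pin itself
    return pin, feedback


# With the output disabled, only external feedback sees what is on the pin.
print(olmc(OLMCMode.INTERNAL_FEEDBACK, 1, oe=0, external=0))   # (0, 1)
print(olmc(OLMCMode.EXTERNAL_FEEDBACK, 1, oe=0, external=0))   # (0, 0)
```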
Complex PLD
Evolution of PLDs and GALs, characterized by global connections and lumped logic.
With respect to PLDs and GALs, CPLDs are much larger (up to ~1M equivalent gates) and their cells are much more complex.
Advantages: high density, high speed, regular and easily programmable structure.
Complex PLD
[Figure: CPLD block diagram: Input, Interconnect, Logic, Flip-Flop, Output]
Field Programmable Gate Array
Field Programmable Gate Arrays (FPGAs) are the most complex and powerful programmable devices currently available, characterized by distributed connections and distributed logic.
With respect to PALs, PLAs, GALs and CPLDs, they are much larger (up to ~10-20M equivalent gates), their cells have different complexity, and they are extremely flexible.
For some years they have also provided hard-wired components such as multipliers, memories, microprocessor cores, ...
Field Programmable Gate Array
[Figure: FPGA as an array of 4 x 5 logic cells (LC) surrounded by a ring of I/O blocks]
Field Programmable Gate Array
Structured according to different philosophies:
Simple cells: better exploitation of the logic resources; higher complexity of the interconnection structure; routability problems.
Complex cells: complex functions use one or few cells; poorer exploitation of the logic resources; simpler interconnection structure; improved routability.
Programmable devices
Commercial devices: Altera, Actel, Xilinx.
Altera MAX3000A: Device
Altera MAX3000A: Macrocell
Altera EP312: Device
Altera EP312: Macrocell
Altera EP910: Device
Altera EP910: Macrocell
Altera Flex10K: Device
Altera Flex10K: EAB
Altera Flex10K: LAB
Altera Flex10K: LE
Altera Apex20K: Device
Altera Apex20K: MegaLAB
Altera Apex20K: LAB
Altera Apex20K: Device
Altera Apex20K: Product Term
Altera Apex20K: ESB Logic
Altera Apex20K: ESB Memory
Actel ACT3: Device
Actel ACT3: I/O & Clock Modules
Actel ACT3: C & S Modules
Actel 40MX: Device
Actel 40MX: D & I/O Module
Actel 40MX: S Modules
Actel 40MX: Memory Module
Actel ProASIC: Device
Actel ProASIC: Cell
Actel ProASIC: Local Routing
Actel ProASIC: Long Routing
Actel ProASIC: Very Long Routing
Xilinx XC3000: Connections
Xilinx XC3000: Switch Matrix
Xilinx XC3000: IOB
Xilinx XC3000: CLB
Xilinx XC5200: Device
Xilinx XC5200: Connections
Xilinx XC5200: VersaBlock
Xilinx XC5200: LC & IOB
Xilinx XC9500: Device
Xilinx XC9500: Macrocell
Xilinx Spartan: Device
Xilinx Spartan: Connections
Xilinx Spartan: CLB
Xilinx Spartan: IOB
Xilinx Virtex-II: Device
Xilinx Virtex-II: CLB
Xilinx Virtex-II: Slice
Xilinx Virtex-II: IOB
ASIC Standard Cell Layout Cells Placement & Routing Technology
Standard Cell
The most used ASIC technology.
Cells: more flexible than a gate array; simpler than full custom; a huge number (several hundred) of commonly used, precharacterized cells are available.
Layout: regular structure organized into rows; cells have a fixed height.
Masks: a design requires a full set of masks.
Layout
[Figure: standard-cell layout: I/O cells around the die, rows of core cells separated by routing channels]
Power supply
Power supply and ground lines (VDD rail, VSS rail) run adjacent to the upper and lower sides of cell rows.
Cells provide axial symmetry that allows packing rows and sharing VDD/VSS rails.
[Figure: two mirrored rows sharing a common VSS rail, with VDD rails on the outer sides]
Cell structure
Five areas:
VDD rail: power supply;
p-tub: pmos, pull-up;
local wiring: I/O pins, local connections;
n-tub: nmos, pull-down;
VSS rail: ground.
[Figure: cell areas from top to bottom: VDD rail, p-tub, local wiring, n-tub, VSS rail]
Cell geometry
Fixed height: all rows have the same height, so the two-dimensional layout process is decomposed into several almost one-dimensional placement problems.
Variable width: depending on the cell's complexity and MOS size.
Symmetry: cells are often symmetric w.r.t. both x and y axes.
[Figure: the same cell with no flip, flip y, flip x and flip xy]
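Because of these symmetries a placer may mirror a cell, which moves its pins. A minimal sketch, assuming a cell of width w and height h with pin coordinates measured from its lower-left corner; here "flip x" is taken to mirror the x coordinate (x becomes w - x) and "flip y" to mirror the y coordinate (y becomes h - y), a naming convention that varies between tools.

```python
# Assumed convention: "flip x" mirrors the x coordinate, "flip y" the y coordinate.
FLIPS = {
    "no flip": (False, False),
    "flip x": (True, False),
    "flip y": (False, True),
    "flip xy": (True, True),
}


def orient_pin(x: float, y: float, w: float, h: float, orientation: str) -> tuple[float, float]:
    """Position of a pin at (x, y) in a w-by-h cell after the given flip."""
    fx, fy = FLIPS[orientation]
    return (w - x if fx else x, h - y if fy else y)


# A point on the top rail (y = h) moves to the bottom edge after "flip y": this is
# what lets two adjacent rows share a single rail.
print(orient_pin(1, 8, 4, 8, "flip y"))   # (1, 0)
```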
NAND3 gate
[Figure: NAND3 layout showing the VDD rail, p-tub, local wiring with pins, n-tub and VSS rail]
Placement
Each row is (logically) divided into adjacent sites; a cell always occupies an integral number of sites.
Special cells called fillers create empty spaces in the rows without breaking the VDD/VSS rails.
[Figure: a row of sites occupied by cells, with filler cells in the gaps]
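Placement on sites can be sketched as follows: each cell width is converted into a whole number of sites, cells are placed at their requested site positions, and every gap is filled with filler cells so the rails stay continuous. The row length, cell names and filler widths (1, 2 and 4 sites) are hypothetical; the site width of 1.80 is the one used by xlite_core_site in the LEF example of the standard cell libraries.

```python
import math

SITE_WIDTH = 1.80   # width of one core site


def sites_for(width: float) -> int:
    """Number of sites a cell of the given width occupies (must be integral)."""
    n = round(width / SITE_WIDTH)
    if not math.isclose(n * SITE_WIDTH, width):
        raise ValueError(f"width {width} is not a multiple of the site width")
    return n


def _fillers(start: int, gap: int, sizes: tuple[int, ...]) -> list[tuple[str, int, int]]:
    """Cover `gap` sites starting at `start` with hypothetical FILLn cells."""
    out = []
    for size in sorted(sizes, reverse=True):
        while gap >= size:
            out.append((f"FILL{size}", start, size))
            start += size
            gap -= size
    return out


def fill_row(row_sites: int, cells: list[tuple[str, int, float]],
             filler_sizes: tuple[int, ...] = (4, 2, 1)) -> list[tuple[str, int, int]]:
    """Place (name, first_site, width) cells and fill the gaps.

    Returns (name, first_site, n_sites) entries covering the whole row.
    """
    row, cursor = [], 0
    for name, start, width in sorted(cells, key=lambda c: c[1]):
        n = sites_for(width)
        if start < cursor or start + n > row_sites:
            raise ValueError(f"{name} overlaps another cell or exceeds the row")
        row += _fillers(cursor, start - cursor, filler_sizes)
        row.append((name, start, n))
        cursor = start + n
    row += _fillers(cursor, row_sites - cursor, filler_sizes)
    return row


for entry in fill_row(10, [("AND2X1", 1, 3.60), ("INV", 5, 1.80)]):
    print(entry)
```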
Routing
Routing is constrained to predefined areas: local wiring, routing channels, feedthroughs, over-the-cell routing.
[Figure: cell rows with local wiring, routing channels between the rows and feedthroughs across them]
Routing grid
The routing area is logically organized as a grid: a horizontal grid (corresponding to one layer) and a vertical grid (corresponding to a different layer).
The grid defines the positions of the nets and of the cell pins.
Routing layer
Routing involves several dedicated metal layers: horizontal nets, vertical nets, power supply and ground, clock trees, ...
This constraint significantly simplifies routing algorithms and allows obtaining a satisfactory routing density, but it requires one via at each corner of a wire made of several segments.
Routing layer
Vertically aligned pins: 1 layer, 0 vias.
Horizontally aligned pins: 2 layers, 2 vias.
Unaligned pins: 2 layers, 2 vias.
Unaligned constrained pins: 2 layers, 4 vias.
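Under the one-direction-per-layer rule, a wire is a sequence of horizontal (H) and vertical (V) segments, and every change of direction is a change of layer, hence a via. Counting layers and vias from the segment sequence reproduces the four cases above, assuming the pins are reached on the vertical layer; the function name and the string encoding of a route are illustrative.

```python
def layers_and_vias(route: str) -> tuple[int, int]:
    """route: directions of consecutive segments, e.g. "VHV".

    Horizontal and vertical segments use different layers, so each
    direction change costs one via.
    """
    if not route or set(route) - {"H", "V"}:
        raise ValueError("route must be a non-empty string of 'H' and 'V'")
    # Merge consecutive collinear segments: they stay on the same layer.
    segments = [d for i, d in enumerate(route) if i == 0 or d != route[i - 1]]
    return len(set(segments)), len(segments) - 1


cases = {
    "vertically aligned pins": "V",
    "horizontally aligned pins": "VHV",
    "unaligned pins": "VHV",
    "unaligned constrained pins": "VHVHV",
}
for name, route in cases.items():
    print(name, layers_and_vias(route))
```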
Clock tree
Distributes the clock(s) to the flip-flops; the distance between flip-flops introduces a skew.
Clock tree generation has two goals: distributing the clock signal to all flip-flops, and keeping the skew below a given threshold (a few ps).
In the simple case it is a two-step process:
generation of a (sub)optimal geometry of the tree (H-tree, Steiner tree);
introduction of non-inverting buffers where necessary, to add delay and to satisfy the constraints on maximum load and fan-out.
Clock tree
Unbalanced clock tree (maximum skew 20 ps): trunk segments of 10 and branches of 5 give arrival delays of 15, 25 and 35 at the three flip-flops, with a skew of 10 between neighbors.
Balanced clock tree (maximum skew 0 ps, ideal): the segment delays are equalized so that every flip-flop sees a delay of 35, with zero skew.
[Figure: unbalanced and balanced clock trees driving three flip-flops]
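The numbers of the example can be reproduced by summing the segment delays along each path from the clock source; the skew is the difference between arrival times, and balancing adds delay (for instance non-inverting buffers) on the early paths until they match the latest one. A minimal sketch, assuming each flip-flop taps the trunk after 1, 2 or 3 segments of 10 ps and adds a 5 ps branch, which is one reading of the figure that yields 15, 25 and 35 ps; the function names are illustrative.

```python
from itertools import accumulate


def arrival_times(trunk: list[float], branches: list[float]) -> list[float]:
    """Clock arrival at each flip-flop of a linear (unbalanced) tree.

    Flip-flop i taps the trunk after segment i and adds its own branch delay.
    """
    return [t + b for t, b in zip(accumulate(trunk), branches)]


def max_skew(arrivals: list[float]) -> float:
    return max(arrivals) - min(arrivals)


def balance(arrivals: list[float]) -> list[float]:
    """Extra delay to insert on each path so that all arrive with the latest one."""
    latest = max(arrivals)
    return [latest - a for a in arrivals]


arr = arrival_times(trunk=[10, 10, 10], branches=[5, 5, 5])   # ps
pad = balance(arr)
print(arr, max_skew(arr))                                       # [15, 25, 35] 20
print(pad, max_skew([a + p for a, p in zip(arr, pad)]))         # [20, 10, 0] 0
```

Padding every path to the latest arrival gives zero skew but also makes every flip-flop see the worst-case insertion delay of 35 ps, as in the balanced tree of the example.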
Standard cell libraries
A library is composed of three main files:
LEF (Library Exchange Format): technology (routing: layer description) and cells (size, type, placement, symmetries, pins);
CTLF (Compiled Timing Library Format): cell types and pins, timing characterization of single cells;
GCF (General Constraint Format): system-level (chip) timing constraints.
Technology: metal layer
LAYER METAL1
  TYPE ROUTING;
  DIRECTION HORIZONTAL;
  WIDTH 0.80;
  SPACING 0.60;
  SPACING 3.20 RANGE 20 200;
  PITCH 1.80;
  OFFSET 0.0;
  HEIGHT 1.195;
  THICKNESS 0.480;
  RESISTANCE RPERSQ 0.07000000;
  CAPACITANCE CPERSQDIST 0.0000289;
  EDGECAPACITANCE 0.0000039000;
END METAL1
[Figure: METAL1 wires and a square via, annotated with WIDTH, SPACING, PITCH and closest separation]
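The electrical parameters of the layer allow a first-order estimate of a wire's resistance and capacitance. A sketch, assuming the usual LEF units (ohms per square for RPERSQ, picofarads per square micron for CPERSQDIST, picofarads per micron for EDGECAPACITANCE, lengths in microns) and approximating the edge term with the two long sides of a straight wire; real extraction tools use much more detailed models.

```python
# METAL1 parameters from the LAYER statement above (assumed LEF units).
WIDTH = 0.80            # um
RPERSQ = 0.07           # ohm per square
CPERSQDIST = 0.0000289  # pF per um^2 (area term)
EDGECAP = 0.0000039     # pF per um (edge term)


def wire_rc(length_um: float, width_um: float = WIDTH) -> tuple[float, float, float]:
    """Return (R in ohm, C in pF, R*C in ps) for a straight METAL1 wire."""
    squares = length_um / width_um
    r = RPERSQ * squares
    # Area capacitance plus edge capacitance along both long sides (approximation).
    c = CPERSQDIST * length_um * width_um + EDGECAP * 2 * length_um
    return r, c, r * c   # ohm * pF = ps


r, c, rc = wire_rc(100.0)
print(f"R = {r:.2f} ohm, C = {c:.6f} pF, RC = {rc:.4f} ps")
```

For a 100 um wire this gives 125 squares, about 8.75 ohm and 0.0031 pF, hence an RC product of roughly 0.027 ps.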
Technology: sites
SITE xlite_core_site
  SYMMETRY y;
  CLASS core;
  SIZE 1.80 BY 7.20;
END xlite_core_site
[Figure: pad and core sites; the core site is 1.80 wide and 7.20 high, with Y symmetry]
Technology: cells
MACRO AND2X1_TAX0
  CLASS core;
  ORIGIN 0.00 3.60 ;
  SIZE 3.60 BY 7.20 ;
  SYMMETRY x y ;
  SITE xlite_core_site ;
  PIN A
    DIRECTION INPUT;
    PORT
      LAYER METAL1;
      RECT 2.10 1.50 2.70 -0.90;
    END
  END A
  ...
  PIN B
  ...
  END B
  ...
END AND2X1_TAX0
[Figure: AND2X1_TAX0 outline, 3.60 wide (two 1.80 sites) and 7.20 high, with its origin and the pin A rectangle]
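Since a core cell must occupy a whole number of sites, a library check can compare each MACRO's SIZE with the SIZE of the SITE it references. A minimal sketch using a regular expression on the two statements shown above; it is not a LEF parser (real flows rely on a full LEF reader), and the helper names are hypothetical.

```python
import re

LEF = """
SITE xlite_core_site SYMMETRY y; CLASS core; SIZE 1.80 BY 7.20; END xlite_core_site
MACRO AND2X1_TAX0 CLASS core; ORIGIN 0.00 3.60 ; SIZE 3.60 BY 7.20 ;
  SYMMETRY x y ; SITE xlite_core_site ;
END AND2X1_TAX0
"""

# Name of a SITE or MACRO followed (lazily) by its first SIZE w BY h statement.
SIZE_RE = re.compile(r"\b(SITE|MACRO)\s+(\w+).*?\bSIZE\s+([\d.]+)\s+BY\s+([\d.]+)", re.S)


def sizes(lef: str) -> dict[str, tuple[float, float]]:
    """Map each SITE/MACRO name to its (width, height)."""
    return {m.group(2): (float(m.group(3)), float(m.group(4))) for m in SIZE_RE.finditer(lef)}


def sites_occupied(macro: str, site: str, lef: str = LEF) -> int:
    """Number of sites covered by `macro`; fails if heights differ or widths do not divide."""
    table = sizes(lef)
    (mw, mh), (sw, sh) = table[macro], table[site]
    n = round(mw / sw)
    if abs(mh - sh) > 1e-9 or abs(n * sw - mw) > 1e-9:
        raise ValueError(f"{macro} does not fit an integral number of {site} sites")
    return n


print(sizes(LEF))
print(sites_occupied("AND2X1_TAX0", "xlite_core_site"))   # 2
```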
Place & Route
ASIC Gate Array Layout Cells Routing
Gate Array
A progressively less used technology: OTP devices offer a competitive alternative.
Cells: transistors, transistor pairs or NAND gates, already fabricated on the partly manufactured die.
Layout: cells have fixed positions; routing is extremely complex.
Masks: a design requires only the masks for wiring.
Layout: channeled (gate array)
[Figure: I/O cells around rows of logic cells (transistors) separated by routing channels]
Layout: channelless (sea of gates)
[Figure: I/O cells around a continuous array of logic cells (transistors); dedicated routing space only for I/O]
Layout
Channeled: routing is constrained to channels only (easy); limited cell/net density; standard cells cannot be exploited.
Channelless: over-the-cell routing (complex); high cell/net density; supports standard cells and RAM/ROM arrays.
Today's trend is to use almost exclusively channelless gate arrays.
Cells
[Figure: gate-array base cell between the VDD and VSS rails: small and large pmos and nmos transistors, pins and vertical tracks]
Over-the-cell routing
[Figure: vertical and horizontal routing grids laid over the cells]
ASIC Full custom Layout Floorplanning Macrocell
Full Custom
Used only for very critical applications.
Cells: no predefined cells; the physical design describes the complete geometry. Extremely complex and costly, but offers maximum flexibility and performance.
Layout: free; placement and routing are extremely complex.
Masks: a design requires a full set of masks.
Layout
[Figure: full-custom layout: I/O cells, macrocells and routing area]
Floorplanning
The floorplan defines the position of macro blocks.
Goals: area minimization; routing simplification (global and detailed phases); minimization of the average length of critical nets.
Each macro block contains complex custom logic and dedicated areas for routing, whose size is either estimated (the logic area is expanded by a factor depending on the connection density) or exact (the logic elements are already placed).
Floorplanning
A block is defined by its dimensions $(x, y)$, its area $A = xy$ and its aspect ratio $r = x/y$.
Floorplanning can exploit rotation (by multiples of 90°) and reshaping (at fixed area, with constraints on the aspect ratio).
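For a reshapeable block of fixed area A, the admissible shapes follow from $A = xy$ and $r = x/y$, which give $x = \sqrt{A r}$ and $y = \sqrt{A / r}$. The sketch enumerates a few candidate shapes within aspect-ratio bounds; a 90° rotation simply swaps x and y, turning r into 1/r. The area, bounds and number of candidates are illustrative assumptions.

```python
import math


def shapes(area: float, r_min: float, r_max: float, steps: int = 5) -> list[tuple[float, float]]:
    """Candidate (x, y) with x * y == area and r_min <= x / y <= r_max."""
    if steps < 2 or not 0 < r_min <= r_max:
        raise ValueError("need steps >= 2 and 0 < r_min <= r_max")
    out = []
    for i in range(steps):
        # Geometric sampling treats r and 1/r symmetrically.
        r = r_min * (r_max / r_min) ** (i / (steps - 1))
        out.append((math.sqrt(area * r), math.sqrt(area / r)))
    return out


def rotate(shape: tuple[float, float]) -> tuple[float, float]:
    """A 90-degree rotation swaps x and y."""
    x, y = shape
    return y, x


for x, y in shapes(100.0, 0.25, 4.0):
    print(f"x = {x:5.2f}  y = {y:5.2f}  r = {x / y:4.2f}")
```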
Macrocell
Combines the flexibility and performance of full custom with the simplicity of standard cells.
Cells: standard cells and full-custom macrocells.
Layout: floorplan; standard-cell areas are organized into rows.
Masks: a design requires a full set of masks.
Macrocell
[Figure: macrocell layout: I/O cells, full-custom macrocells, standard-cell area and routing channels]
Conclusions
Technologies compared
Technologies compared
Technology | Density (Mgates/cm²) | Frequency (MHz) | Development (months/Mgate) | Cell size | Cell type | Cell position
Full custom | 25 | 4000 | 6-9 | Variable | Variable | Variable
Standard cell | 10 | 1500 | 2-4 | Fixed height | Variable | Fixed rows
Gate array | 5 | 1000 | 3-6 | Fixed | Fixed | Fixed
OTP PLD | 1 | 750 | 0.5 | Fixed | Programmable | Fixed
FPGA | 0.4 | 500 | 0.5 | Fixed | Programmable | Fixed
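The density and development columns can be combined into a rough first estimate for a given design size. A sketch, assuming a hypothetical 2-Mgate design and using the indicative figures of the table (midpoints where a range is given); the results only illustrate relative orders of magnitude.

```python
# (density in Mgates/cm^2, development range in months/Mgate), from the table.
TECHNOLOGIES = {
    "Full custom":   (25.0, (6, 9)),
    "Standard cell": (10.0, (2, 4)),
    "Gate array":    (5.0, (3, 6)),
    "OTP PLD":       (1.0, (0.5, 0.5)),
    "FPGA":          (0.4, (0.5, 0.5)),
}


def estimate(mgates: float) -> dict[str, tuple[float, float]]:
    """Rough (silicon area in cm^2, development time in months) per technology."""
    result = {}
    for name, (density, (dev_lo, dev_hi)) in TECHNOLOGIES.items():
        area = mgates / density
        months = mgates * (dev_lo + dev_hi) / 2   # midpoint of the range
        result[name] = (area, months)
    return result


for name, (area, months) in estimate(2.0).items():   # hypothetical 2-Mgate design
    print(f"{name:14s} {area:6.2f} cm^2 {months:5.1f} months")
```

For 2 Mgates this gives 0.08 cm² and 15 months for full custom, against 5 cm² and 1 month for an FPGA, which is the area-versus-time trade-off the table summarises.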
Technologies compared
[Figure: design time (1 day to 6 years) versus complexity (1 to 100M gates) for SSI, CPLD, FPGA, standard cell and full custom; reference points: 1971 Intel 4004 (2.2 Kgates), 2000 Intel Pentium 4 (42 Mgates)]
[Figure: die photographs of the Intel 4004 (2.2 Kgates) and of the Intel Pentium 4 (42 Mgates)]