ECE 5745 Complex Digital ASIC Design Course Overview



ECE 5745 Complex Digital ASIC Design
Course Overview
Christopher Batten
School of Electrical and Computer Engineering, Cornell University
http://www.csl.cornell.edu/courses/ece5745

ECE 5745 Course Overview

- Course goal, structure, motivation
    - What is the goal of the course?
    - How is the course structured?
    - Why should students want to take this course?
- Activity 1: Evaluation of Integer Multiplier
- Case Study: Scalar vs. Vector Processors
    - Example design-space exploration
    - Example real ASIC chips
- Activity 2: Brainstorming for Sorting Accelerator
- Course Logistics

The Computer Engineering Stack

- Application
- Algorithm
- Programming Language
- Operating System
- Instruction Set Architecture
- Microarchitecture
- Register-Transfer Level
- Gate Level
- Circuits
- Devices
- Technology

The gap between application and technology is too large to bridge in one step (but there are exceptions, e.g., a magnetic compass).

In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using available manufacturing technologies.

What is Computer Architecture?

Application requirements:
- Suggest how to improve architecture
- Provide revenue to fund development

Technology constraints:
- Restrict what can be done efficiently
- New technologies make new architectures possible

Architecture provides feedback to guide application and technology research directions.

Key Metrics in Computer Architecture

Primary metrics:
- Execution time (cycles/task)
- Energy (Joules/task)
- Cycle time (ns/cycle)
- Area (µm^2)

Secondary metrics:
- Performance (ns/task)
- Average power (Watts)
- Peak power (Watts)
- Cost ($)
- Design complexity
- Reliability
- Flexibility

[Figure: processors (P) with instruction caches (I$) and data caches (D$) connected by networks]

Discuss qualitative first-order analysis from ECE 4750 on board.
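To first order, two of the secondary metrics follow directly from the primary ones: time per task is cycles/task times ns/cycle, and average power is energy per task divided by time per task. The sketch below illustrates that arithmetic; the class name, field names, and the numbers are illustrative assumptions, not values from the course.

```python
from dataclasses import dataclass


@dataclass
class PrimaryMetrics:
    """Primary metrics for one design (hypothetical container, names are illustrative)."""
    cycles_per_task: float    # execution time in cycles/task
    energy_per_task_j: float  # energy in Joules/task
    cycle_time_ns: float      # cycle time in ns/cycle
    area_um2: float           # area in square micrometers


def time_per_task_ns(m: PrimaryMetrics) -> float:
    """Performance in ns/task = (cycles/task) * (ns/cycle)."""
    return m.cycles_per_task * m.cycle_time_ns


def average_power_w(m: PrimaryMetrics) -> float:
    """Average power in Watts = (Joules/task) / (seconds/task)."""
    return m.energy_per_task_j / (time_per_task_ns(m) * 1e-9)


# Hypothetical design point used only to exercise the formulas.
design = PrimaryMetrics(
    cycles_per_task=1000,
    energy_per_task_j=4e-6,
    cycle_time_ns=2.0,
    area_um2=250_000,
)
print("ns/task:", time_per_task_ns(design))
print("average power (W):", average_power_w(design))
```

Peak power, cost, design complexity, reliability, and flexibility are not captured by this arithmetic and need separate analysis.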

Unanswered Questions from ECE 4750

- How can we quantitatively evaluate area, cycle time, and energy?
- How do we actually implement processors, memories, and networks in a real chip?
- How should we analyze application-specific accelerators?
    - Very loosely coupled memory-mapped accelerators
    - More tightly coupled co-processor accelerators
    - Specialized instructions and functional units

[Figure: processors (P) with caches (I$, D$), accelerators (Xcel), and accelerated instructions, connected by networks]

Application-Specific Accelerators

[Figure: energy (Joules per task) vs. performance (tasks per second) under a processor power constraint. Design points labeled A through F include a simple processor, a superscalar with deeper pipelines, an out-of-order superscalar superpipelined processor, a multicore, and specialized accelerators.]

Application-Specific Accelerators

[Figure: energy efficiency (tasks per Joule) vs. performance (tasks per second), illustrating flexibility vs. specialization. Regions: embedded architectures and high-performance architectures, bounded by a design power constraint and a design performance constraint. Design points: simple processor, more flexible accelerator, less flexible accelerator, custom ASIC.]

Goal for ECE 5745 is to answer these questions!

- How can we quantitatively evaluate area, cycle time, and energy?
- How do we actually implement processors, memories, and networks in a real chip?
- How should we analyze application-specific accelerators?
    - Very loosely coupled memory-mapped accelerators
    - More tightly coupled co-processor accelerators
    - Specialized instructions and functional units

Full-Custom Design vs. Standard-Cell Design

[Figure: piece of a full-custom multiplier array; full-custom layout in 1.0 µm with 2 metal layers]

Full-custom design:
- Requires complete customization of all layers of the wafer
- Designer is free to do anything, anywhere, though each design team usually imposes some design discipline
- Most time-consuming design style; reserved for very high performance or very high volume chips (Intel microprocessors, RF power amps for cellphones)

Standard-cell design:
- Fixed library of standard cells and SRAM memory generators
- Register-transfer-level description is automatically mapped to this library of standard cells, then these cells are placed and routed automatically

Standard-Cell Design Methodology

- Also called cell-based ICs (CBICs)
- Fixed library of cells plus memory generators
- Cells can be synthesized from HDL, or entered in schematics
- Cells are placed and routed automatically
- Requires a complete set of custom masks for each design
- Currently the most popular hard-wired ASIC type (6.884 will use this)
- Cells are arranged in rows; cells have standard height but vary in width
- Cells are designed to connect power, ground, and wells by abutment
- Memory arrays (Mem 1, Mem 2) are generated

[Figure: standard-cell layout with labels for clock rail (not typical), power rails in M1, VDD rail, GND rail, well contact under power rail, cell I/O on M2, NAND2, and flip-flop; ripple-carry adder with carry chain highlighted]

Source of embedded slides: 6.884 Spring 2005, 2 Feb 2005, L01 Introduction, slides 24 and 25.

Standard-Cell Design Methodology

[Figure: tool flow using the educational Synopsys 90nm process and standard cells]
- Verilog RTL -> Verilog simulator -> execution time
- Verilog RTL -> synthesis -> place & route -> layout, area & cycle time
- Gate-level model -> Verilog simulator -> switching activity -> power analysis -> energy

Example Standard-Cell Chip Plot

[Figure: chip plot of a single-lane vector-thread unit with 256 registers]

What is Complex Digital ASIC Design?

Complex digital ASIC design is the process of quantitatively exploring the area, cycle time, execution time, and energy trade-offs of various general-purpose and application-specific designs using automated standard-cell CAD tools, and then transforming the most promising design into layout ready for fabrication.

Course Structure

- Prerequisite: Computer Architecture
- Part 1: ASIC Design Overview
- Part 2: Digital CMOS Circuits
- Part 3: CAD Algorithms

Part 1: ASIC Design Overview

- Topic 1: Hardware Description Languages
- Topic 2: CMOS Devices
- Topic 3: CMOS Circuits
- Topic 4: Full-Custom Design Methodology
- Topic 5: Automated Design Methodologies
- Topic 6: Closing the Gap
- Topic 7: Clocking, Power Distribution, Packaging, and I/O
- Topic 8: Testing and Verification

Part 2: Digital CMOS Circuits

- Topic 9: Combinational Logic
- Topic 10: Sequential State
- Topic 11: Interconnect

Part 3: CAD Algorithms

- Topic 12: Synthesis Algorithms
    - RTL to logic synthesis
    - Technology-independent synthesis
    - Technology-dependent synthesis
- Topic 13: Physical Design Automation
    - Placement
    - Global routing
    - Detailed routing

[Figure: logic simplification example]
    x = a'bc + a'bc'       ->  x = a'b
    y = b'c' + ab' + ac    ->  y = b'c' + ac
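The two simplifications on this slide can be checked by exhaustive truth-table comparison: `x` follows from factoring out `c + c'`, and the `ab'` term in `y` is redundant because `ab'c` is covered by `ac` and `ab'c'` by `b'c'`. The sketch below is a minimal check, not a synthesis algorithm; the function names are illustrative.

```python
from itertools import product


def x_before(a, b, c):
    return (not a and b and c) or (not a and b and not c)  # a'bc + a'bc'


def x_after(a, b, c):
    return (not a) and b  # a'b


def y_before(a, b, c):
    return (not b and not c) or (a and not b) or (a and c)  # b'c' + ab' + ac


def y_after(a, b, c):
    return (not b and not c) or (a and c)  # b'c' + ac


def equivalent(f, g, num_inputs=3):
    """Return True if f and g agree on every input combination."""
    return all(
        bool(f(*bits)) == bool(g(*bits))
        for bits in product([False, True], repeat=num_inputs)
    )


print("x equivalent:", equivalent(x_before, x_after))
print("y equivalent:", equivalent(y_before, y_after))
```

Exhaustive comparison only scales to a handful of inputs, since the table doubles with each added variable.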

Five-Week Design Project

[Figure: energy efficiency (tasks per Joule) vs. performance (tasks per second), illustrating flexibility vs. specialization. Regions: embedded architectures and high-performance architectures, bounded by a design power constraint and a design performance constraint. Design points: simple processor, more flexible accelerator, less flexible accelerator, custom ASIC.]

Course Motivation: Comp Arch Research Perspective

[Figure: log-scale trends from 1975 to 2015 for transistors (thousands), frequency (MHz), typical power (Watts), number of cores, sequential processor performance, and parallel processor performance. Labeled chips: MIPS R2K, Intel Pentium 4, DEC Alpha 21264, AMD 4-Core Opteron, Intel 48-Core Prototype. Data partially collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond.]

Cross-Layer Interaction is Critical

Need to quantitatively understand area, cycle time, and energy trade-offs to be able to create revolutionary new architectures that tackle the most challenging physical design issues.

Course Motivation: Circuits Research Perspective

[Figure: "Your Digital Circuit Here" and "Your Analog Circuit Here" shown inside larger systems]
- Fighter airplane: ~100,000 parts
- Intel Sandy Bridge E: 2.27 billion transistors

Cross-Layer Interaction is Critical

Digital circuit researchers need to appreciate the system-level context for their subsystems; cross-layer interaction can generate some of the most exciting research ideas.

Course Motivation: Industry Development Perspective

- Ever-increasing chip size and complexity
    - Microprocessors: 100M gates growing to 1000M gates
    - ASICs: 5M to 10M gates growing to 50M to 100M gates
- Ever-increasing costs and design team sizes
    - More than $30M for a 10M gate ASIC
    - More than $1M for a re-spin in case of error (not including redesign costs)
- 18 months to design, only an 8-month selling opportunity in the market
    - Fewer new chip-starts every year
    - Looking for alternatives (e.g., FPGAs)
- For chip-starts that do happen
    - Conservative design
    - No time for exploration
    - Educated guess and then implement

Cross-Layer Interaction is Critical

The most significant impact on area, cycle time, execution time, and energy is usually at the architectural and microarchitectural levels.

Scalar Processors with Multithreading

[Figure: programmer's logical view with threads T0, T1, T2, T3, T4, ..., Ti and a memory. Legend: control instructions, memory instructions, execution resources, architectural registers.]

Scalar Processors with Multithreading: Example Loops

Vector-vector add:

    loop:
      load   a, a_ptr
      load   b, b_ptr
      add    c, a, b
      store  c, c_ptr
      a_ptr++
      b_ptr++
      c_ptr++
      branch

Masked filter:

    loop:
      load   a, a_ptr
      a_ptr++
      branch a = 0
      op0
      op1
      ...
      branch
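The scalar loops can be modeled in software to get a rough tally of dynamic pseudo-instructions. This is a sketch under stated assumptions: the function names are hypothetical, the counts follow the pseudo-code on the slide rather than a real ISA, the conditional branch in the masked filter is assumed to jump to the final loop branch, and because the exact comparison is unclear in the transcript, the selection condition is passed in as a `predicate` argument.

```python
def scalar_vvadd(a, b):
    """Element-wise add, one element per loop iteration."""
    assert len(a) == len(b)
    c = [0] * len(a)
    instrs = 0
    for i in range(len(a)):
        c[i] = a[i] + b[i]
        # load, load, add, store, three pointer increments, branch
        instrs += 8
    return c, instrs


def scalar_masked_filter(a, ops, predicate):
    """Apply ops only to elements selected by predicate (assumed semantics)."""
    out = list(a)
    instrs = 0
    for i, x in enumerate(a):
        instrs += 3  # load, a_ptr++, conditional branch
        if predicate(x):
            for op in ops:
                x = op(x)
                instrs += 1  # one pseudo-instruction per op
            out[i] = x
        instrs += 1  # loop branch
    return out, instrs


c, n_add = scalar_vvadd([1, 2, 3, 4], [10, 20, 30, 40])
f, n_filt = scalar_masked_filter(
    [0, 5, 0, 7],
    ops=[lambda v: v + 1, lambda v: v * 2],
    predicate=lambda v: v != 0,
)
print(c, n_add)
print(f, n_filt)
```

In this model the instruction count of the masked filter depends on the data, since skipped elements do not pay for the operations.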

Scalar Processors with Multithreading: Typical Core Microarchitecture

[Figure: multithreaded cores (threads T0 to T3) with instruction memory and data memory; energy vs. performance plot showing the MIMD design point]

Vector-SIMD Processors

[Figure: programmer's logical view with control threads CT0, ..., CTj, each with vector elements elm 0, elm 1, elm 2, elm 3, ..., elm i, and a memory. Legend: vector-SIMD arithmetic instructions, vector-SIMD memory instructions, architectural vector register with 4 elements.]

Vector-SIMD Processors: Example Loops

Vector-vector add:

    loop:
      vload  A, a_ptr
      vload  B, b_ptr
      vadd   C, A, B
      vstore C, c_ptr
      a_ptr++
      b_ptr++
      c_ptr++
      branch

Masked filter:

    loop:
      vload  A, a_ptr
      vset   F, A = 0
      vop0 under flag F
      vop1 under flag F
      ...
      a_ptr++
      branch
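The vector-SIMD loops can be modeled the same way, processing one vector register of elements per iteration. This is a sketch under stated assumptions: `VLEN = 4` mirrors the 4-element architectural vector register on the earlier slide, the function names are hypothetical, the array length is assumed to be a multiple of `VLEN`, and the flag condition is passed in as a `predicate` because the exact comparison is unclear in the transcript.

```python
VLEN = 4  # elements per architectural vector register (assumed from the slide)


def vector_vvadd(a, b):
    """Element-wise add, VLEN elements per loop iteration."""
    assert len(a) == len(b) and len(a) % VLEN == 0
    c = []
    instrs = 0
    for base in range(0, len(a), VLEN):
        va = a[base:base + VLEN]               # vload A, a_ptr
        vb = b[base:base + VLEN]               # vload B, b_ptr
        vc = [x + y for x, y in zip(va, vb)]   # vadd C, A, B
        c.extend(vc)                           # vstore C, c_ptr
        instrs += 8  # 4 vector instructions, 3 pointer increments, branch
    return c, instrs


def vector_masked_filter(a, ops, predicate):
    """Apply each op under a per-element flag computed once per vector."""
    assert len(a) % VLEN == 0
    out = []
    instrs = 0
    for base in range(0, len(a), VLEN):
        va = a[base:base + VLEN]               # vload A, a_ptr
        flag = [predicate(x) for x in va]      # vset F
        instrs += 2
        for op in ops:
            # vop under flag F: inactive elements pass through unchanged
            va = [op(x) if f else x for x, f in zip(va, flag)]
            instrs += 1
        out.extend(va)
        instrs += 2  # a_ptr++, branch
    return out, instrs


c, n_add = vector_vvadd(list(range(1, 9)), [10 * v for v in range(1, 9)])
f, n_filt = vector_masked_filter(
    [0, 5, 0, 7],
    ops=[lambda v: v + 1, lambda v: v * 2],
    predicate=lambda v: v != 0,
)
print(c, n_add)
print(f, n_filt)
```

In this model every vector operation is issued regardless of how many flags are set, so the instruction count no longer depends on the data, while the loop overhead is amortized over `VLEN` elements.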

Vector-SIMD Processors: Typical Core Microarchitecture

[Figure: control processor (CP), VIU, vector lanes 0 to 3, VMU, instruction memory, and data memory; energy vs. performance plot showing the MIMD and vector-SIMD design points]

Quantitative Area Evaluation

[Figure: chip plot of a single-lane vector-thread unit with 256 registers]

Quantitative Area Evaluation

[Figure: stacked bar chart of normalized area (0.00 to 1.75) for a quad-core with vertical multithreading (1, 2, 4, and 8 threads) and for vector-SIMD (1 core w/ 4 lanes, 4 cores w/ 1 lane each). Components: control logic, register file, memory units, floating-point functional units, integer functional units, control processor, instruction cache, data cache.]

- Single-core multi-lane design reduces area by 15%
- Multi-core single-lane design increases area by 20%

Quantitative Performance and Energy Evaluation

[Figure, first panel: normalized energy/task (0.4 to 1.6) vs. normalized tasks/second (0.8 to 1.6) for single-lane vector (increasing vlen) and multithreaded multicore (increasing number of threads per core)]

[Figure, second panel: energy/task (µJ, 0 to 25) breakdown for a quad-core with vertical multithreading (1, 2, 4, and 8 threads) and for vector-SIMD with 4 cores w/ 1 lane (1, 2, 4, and 8 elm/lane). Components: control logic, register file, memory units, int func units, control proc, inst cache, data cache, leakage.]

Simple RISC Processor ASIC

Figure 2: STC1 chip plot and die photo. On the chip plot, the Exec Unit is colored blue, and the Fetch Unit is colored green and purple.

[Figure labels: SP Control, SP Datapath, SP Regfile, AHIP Controller, RAM Interface, VCO, four RAM Subbanks (2KB each)]

Simple RISC Processor ASIC

- RISC processor w/ 8 KB SRAM
- TSMC 0.18 µm process
- 1.7 × 2.1 mm
- 610K transistors
- 450 MHz at 1.8 V

Figure 5: Shmoo plot for a wide voltage range. The horizontal axis plots supply voltage in mV, and the vertical axis plots frequency in MHz. A * indicates correct operation at that operating point and a dot indicates failure. The shmoo plot shows combined results from two different chips.

[Plot: grid of operating points for frequencies from 180 to 750 MHz; point data omitted]

Scale Vector-Thread Processor ASIC

- TSMC 0.18 µm process
- 7.14 million transistors
- 16.6 mm^2 core area

[Figure: chip plot with labels for cache tag CAMs, cache control, cache data RAM banks, XBAR, CP, VMU, VIU, and vector lanes 0 to 3]

Energy-Efficiency of the Scale VT Processor

[Figure: Scale energy vs. performance results]

Brainstorming for Sorting Accelerator

Setup:
- Sort array of integers stored in memory
- Proc sends Xcel base address and size
- Proc waits for Xcel to finish sorting

Software baseline:
- Insertion sort, O(n^2)
- 20% of dynamic instructions are lw/sw

Hardware accelerator:
- What is the ideal speedup assuming we still use insertion sort?
- What if we use a different algorithm?
- Sketch a hardware implementation of a sorting accelerator
- Where will the accelerator be in the energy/performance space?

[Figure: processor (P) with I$ and accelerator (Xcel) sharing a D$ through an arbiter; energy efficiency (tasks per Joule) vs. performance (tasks per second) plot showing the software baseline]
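As a starting point for the brainstorming, the software baseline can be modeled with counters for array loads, stores, and comparisons. This is a minimal sketch, not code from the course: the function name and counter names are illustrative, and the counts are algorithm-level array accesses rather than actual `lw`/`sw` instruction counts from a compiled program.

```python
def insertion_sort_counted(data):
    """Insertion sort that also counts array loads, stores, and comparisons."""
    arr = list(data)
    loads = stores = compares = 0
    for i in range(1, len(arr)):
        key = arr[i]
        loads += 1
        j = i - 1
        while j >= 0:
            loads += 1      # read arr[j]
            compares += 1   # compare arr[j] with key
            if arr[j] <= key:
                break
            arr[j + 1] = arr[j]  # shift the larger element right
            stores += 1
            j -= 1
        arr[j + 1] = key
        stores += 1
    return arr, {"loads": loads, "stores": stores, "compares": compares}


for name, values in [("sorted", [1, 2, 3, 4]), ("reversed", [4, 3, 2, 1])]:
    result, counts = insertion_sort_counted(values)
    print(name, result, counts)
```

Comparing the counts for sorted and reversed inputs shows how strongly the baseline depends on input order, which is one consideration when choosing between accelerating insertion sort and switching to a different algorithm.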

Take-Away Points

- Complex digital ASIC design is the process of quantitatively exploring the area, cycle time, execution time, and energy trade-offs of general-purpose and application-specific designs using automated standard-cell CAD tools, and then transforming the most promising design into layout ready for fabrication.
- The course can provide useful experience with cross-layer interaction for students interested in pursuing research in computer architecture or circuits, or for students interested in pursuing a career in industry development of ASICs.