Architectures and Platforms

Hardware/Software Codesign

Slide 1: Architectures and Platforms

1. Architecture Selection: The Basic Trade-Offs
2. General Purpose vs. Application-Specific Processors
3. Processor Specialisation
4. ASIP Design Flow
5. Specialisation of a VLIW ASIP
6. Tool Support for Processor Specialisation
7. Application Specific Platforms
8. IP-Based Design (Design Reuse)
9. Reconfigurable Systems

Slide 2: Remember the Design Flow

[Figure: overall design flow. Informal specification and constraints lead through modelling to a system model and through architecture selection to a system architecture; estimation, mapping and scheduling produce a mapped and scheduled model, checked by functional simulation and formal verification (OK / not OK loops); software generation and hardware synthesis turn the software and hardware models into software and hardware blocks, which are simulated; then prototype, testing (OK / not OK) and fabrication.]

Slide 3: Architecture Selection and Mapping

- Select the underlying hardware structure on which to run the modelled system.
- Map the functionality captured by the system over the components of the selected architecture.
- Functionality includes processing and communication.

Slide 4: Architecture Selection

General Purpose vs. Application Specific
- Use a general purpose, existing platform and map the application on it.
- Or something in-between.
- Build a customised architecture strictly optimised for the particular application.

Software vs. Hardware
- Use programmable processors running software.
- Or both.
- Use dedicated electronics (fixed or reconfigurable).

Mono vs. Multiprocessor, Single vs. Multichip
- From monoprocessor, single chip to multiprocessor, multi chip.

Slide 5: Architecture Selection (cont'd)

The trade-offs:

Performance (high speed, low power consumption), from high to low:
- Application specific, then general purpose.
- Hardware, then reconfigurable hardware, then software.

Flexibility (how easy it is to upgrade or modify), from high to low:
- General purpose, then application specific.
- Software, then reconfigurable hardware, then hardware.

Slide 6: Architecture Selection (cont'd)

[Figure: energy consumed (low, med., high) plotted against flexibility (low, med., high) for ASIC, FPGA, ASIP and GP processor; the steps on the energy axis are annotated "order of magnitude".]

Slide 7: General Purpose vs. Application Specific Processors

Both GP processors and ASIPs (application specific instruction set processors) can be RISCs, CISCs, DSPs, microcontrollers, etc.
- One could look at DSPs and microcontrollers as being specific for DSP and simple control applications respectively.
- An application specific DSP or microcontroller is, however, more specialised than just for DSP or control applications.

GP processors
- Neither the instruction set nor the microarchitecture or memory system is customised for a particular application or family of applications.

ASIPs
- Instruction set, microarchitecture and/or memory system are customised for an application or family of applications.
- What results is better performance and reduced power consumption.

Slide 8: What Makes an ASIP Specific?

What can we specialise in a processor?

Instruction set (IS) specialisation

Exclude instructions which are not used:
- reduces instruction word length (fewer bits needed for encoding);
- keeps controller and data path simple.

Introduce instructions, even exotic ones, which are specific to the application: combinations of arithmetic instructions (multiply-accumulate), small algorithms (encoding/decoding, filter), vector operations, string manipulation or string matching, pixel operations, etc.:
- reduces code size, and with it memory size, memory bandwidth, power consumption, and execution time.
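The first point on slide 8 (fewer instructions, shorter instruction word) can be illustrated with a minimal sketch. It assumes a fixed-width opcode field in plain binary encoding; the function name `opcode_bits` and the instruction counts are made up for illustration and do not come from the slides.

```python
def opcode_bits(num_instructions: int) -> int:
    """Minimum width of a fixed-width binary opcode field.

    Assumption: every instruction gets its own opcode and all opcodes
    have the same length (no variable-length encoding).
    """
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2; keep at least 1 bit.
    return max(1, (num_instructions - 1).bit_length())


# Hypothetical numbers: a full instruction set versus a pruned one.
full_set = opcode_bits(200)
pruned_set = opcode_bits(40)
print(full_set, pruned_set)
```

Under these assumptions, pruning the instruction set from 200 to 40 instructions saves two opcode bits in every instruction word.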

Slide 9: What Makes an ASIP Specific?

Function unit and data path specialisation

Once an application specific IS is defined, this IS can be implemented using a more or less specific data path and more or less specific function units.
- Adaptation of word length.
- Adaptation of register number.
- Adaptation of functional units: highly specialised functional units can be introduced for string matching and manipulation, pixel operations, arithmetic, and even complex units to perform certain sequences of computations (co-processors).

Slide 10: What Makes an ASIP Specific?

Memory specialisation
- Number and size of memory banks.
- Number and size of access ports.
  - They both influence the degree of parallelism in memory access.
  - Having several smaller memory blocks (instead of one big one) increases parallelism and speed, and reduces power consumption.
  - Sophisticated memory structures can increase cost and bandwidth requirement.
- Cache configuration:
  - separate instruction/data?
  - associativity
  - cache size
  - line size

The cache configuration depends very much on the characteristics of the application and, in particular, on the properties related to locality. It has a very large impact on performance and power consumption.
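To make the dependence on locality concrete, here is a minimal sketch of how one cache parameter (line size) could be explored against an address trace. It assumes a direct-mapped cache and a synthetic, purely sequential trace; the function `hit_rate` and all sizes are illustrative assumptions, not part of the lecture.

```python
def hit_rate(trace, cache_bytes, line_bytes):
    """Hit rate of a direct-mapped cache for a list of byte addresses."""
    num_lines = cache_bytes // line_bytes
    stored_block = [None] * num_lines  # memory block currently held by each cache line
    hits = 0
    for addr in trace:
        block = addr // line_bytes      # which memory block the address belongs to
        index = block % num_lines       # direct mapping: one possible cache line
        if stored_block[index] == block:
            hits += 1
        else:
            stored_block[index] = block  # miss: load the block into the line
    return hits / len(trace)


# Synthetic trace: sequential 4-byte word reads over 1 KiB (high spatial locality).
sequential = list(range(0, 1024, 4))
for line_bytes in (16, 32, 64):
    print(line_bytes, hit_rate(sequential, cache_bytes=256, line_bytes=line_bytes))
```

For this streaming trace, longer lines raise the hit rate; a trace with a large stride would not benefit in the same way, which is why the configuration has to follow the application's access pattern.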

Slide 11: What Makes an ASIP Specific?

Interconnect specialisation
- Interconnect of functional modules and registers.
- Interconnect to memory and cache.
  - How many internal buses?
  - What kind of protocol?
  - Additional connections increase the potential of parallelism.

Control specialisation
- Centralised control or distributed (globally asynchronous)?
- Pipelining?
- Out of order execution?
- Hardwired or microprogrammed?

Slide 12: ASIP Design Flow

(It can be seen as a part of the big design flow, slide 2.)

[Figure: ASIP design flow. A processor architecture and the algorithm(s) are fed to a compiler; the compiled code runs on a simulator, which produces performance numbers.]
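The loop on slide 12 (architecture plus algorithm, then compiler, then simulator, then performance numbers) can be sketched as a small exploration routine. Everything here is a toy assumption: the `Arch` fields, the cycle counts, the "compiler" that only knows a dot product, and the "simulator" that simply adds up cycles for sequential execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arch:
    """Hypothetical candidate architecture."""
    name: str
    has_mac: bool        # is a multiply-accumulate instruction available?
    cycles_mul: int
    cycles_add: int
    cycles_mac: int = 1


def compile_dot_product(arch, n):
    """Toy 'compiler': instruction mix for an n-element dot product."""
    if arch.has_mac:
        return {"mac": n}
    return {"mul": n, "add": n}


def simulate(arch, mix):
    """Toy 'simulator': cycle count, assuming purely sequential execution."""
    cost = {"mul": arch.cycles_mul, "add": arch.cycles_add, "mac": arch.cycles_mac}
    return sum(cost[op] * count for op, count in mix.items())


def explore(candidates, n):
    """Run compile + simulate for each candidate and pick the fastest."""
    results = {a.name: simulate(a, compile_dot_product(a, n)) for a in candidates}
    best = min(results, key=results.get)
    return best, results


candidates = [
    Arch("baseline", has_mac=False, cycles_mul=2, cycles_add=1),
    Arch("with_mac", has_mac=True, cycles_mul=2, cycles_add=1, cycles_mac=2),
]
best, results = explore(candidates, n=64)
print(best, results)
```

In a real flow the performance numbers would also cover power and area, and the designer would modify the architecture and repeat the loop.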

Slide 13: A SOC for Multimedia Applications

[Figure: SOC block diagram with glue logic, A/D and D/A, µcontroller (ASIP), VLIW processor (ASIP), on-chip memory, and DSP (GP).]

- This is a typical application specific platform. Its structure has been adapted for a family of applications.
- Besides GP processor cores, the platform also consists of ASIP cores which themselves are specialised.
- The application specific µcontroller performs master control of the system and memory access control.
- The off-the-shelf (GP) DSP performs less computation intensive modem and sound codec functions.
- The VLIW ASIP performs computation intensive functions: discrete cosine and inverse discrete cosine transforms, motion estimation, etc.

Slide 14: Specialisation of a VLIW ASIP

[Figure: VLIW ASIP block diagram. Instruction fetch & decode (from the memory system) drives a datapath organised as three clusters (Cluster 1, Cluster 2, Cluster 3), each with its own register file (Register File 1 to 3) and functional units: ALUs A1 to A5, multipliers M1 to M5, and a MAC unit MA1. A crossbar/bus provides internal storage & interconnect and the connection to the memory system.]

Slide 15: Specialisation of a VLIW ASIP (cont'd)

This is what an instruction word looks like:

[Figure: instruction word with eleven operation slots, op 1 to op 11, grouped by Cluster 1, Cluster 2 and Cluster 3.]

Slide 16: Specialisation of a VLIW ASIP (cont'd)

Traditionally the datapath is organised as a single register file shared by all functional units.

Problem: such a centralised structure does not scale!
- We increase the number of functional units in order to increase parallelism.
- As a consequence, we have to increase the number of registers in the register file.
- As a consequence, internal storage and communication between functional units and registers becomes dominant in terms of area, delay, and power.

High performance VLIW processors are limited not by arithmetic capacity but by internal bandwidth.
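The scaling problem on slide 16 can be made tangible with a back-of-the-envelope model. The sketch below assumes that every functional unit needs two read ports and one write port, and uses a common first-order approximation in which register file area grows with the number of registers times the square of the number of ports. The register counts and the 4/4/3 split of the eleven functional units over three clusters are hypothetical; the slides do not give them.

```python
READ_PORTS_PER_FU = 2   # assumption: two source operands per operation
WRITE_PORTS_PER_FU = 1  # assumption: one result per operation


def ports(num_fus):
    """Register file ports needed to serve num_fus functional units."""
    return num_fus * (READ_PORTS_PER_FU + WRITE_PORTS_PER_FU)


def relative_area(num_regs, num_fus):
    """First-order model: area ~ registers * ports^2 (arbitrary units)."""
    return num_regs * ports(num_fus) ** 2


# One central register file serving all 11 functional units (hypothetical 96 registers).
central = relative_area(num_regs=96, num_fus=11)

# Three local register files of 32 registers each, serving 4, 4 and 3 units.
clustered = sum(relative_area(num_regs=32, num_fus=k) for k in (4, 4, 3))

print(central, clustered, central / clustered)
```

Under these assumptions the central register file is several times larger than the three local ones combined, even though the total number of registers is the same.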

Slide 17: Specialisation of a VLIW ASIP (cont'd)

A solution: clustering.
- Restrict the connectivity between functional units and registers, so that each functional unit can read/write from/to a subset of registers.
- Organise the datapath as clusters of functional units and local register files.

Nothing is for free! Moving data between registers belonging to different clusters takes much time and power. You have to drastically minimise the number of such moves by:
- carefully adapting the structure of clusters to the application;
- using very clever compilers.

Slide 18: Specialisation of a VLIW ASIP (cont'd)

Instruction set specialisation: nothing special.

Function unit and data path specialisation:
- Determine the number of clusters.
- For each cluster determine
  - the number and type of functional units;
  - the dimension of the register file.

Memory specialisation is extremely important because we need to stream large amounts of data to the clusters at a high rate; one has to adapt the memory structure to the access characteristics of the application.
- Determine the number and size of memory banks.
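The cost of clustering described on slide 17 comes from inter-cluster moves, and a compiler (or a designer comparing cluster structures) needs a way to count them. A minimal sketch, assuming the computation is given as a dataflow graph of producer/consumer edges and that a value is copied at most once to each remote cluster; the graph and both cluster assignments are invented for illustration.

```python
def intercluster_moves(edges, cluster_of):
    """Number of register-to-register copies between clusters.

    edges: (producer, consumer) pairs of a dataflow graph.
    cluster_of: maps each operation to the cluster it executes on.
    A value needed by several consumers in the same remote cluster
    is counted once.
    """
    moves = {
        (src, cluster_of[dst])
        for src, dst in edges
        if cluster_of[src] != cluster_of[dst]
    }
    return len(moves)


# Dataflow graph of y = (a*b + c*d) * (e + f):
# m1 = a*b, m2 = c*d, s1 = m1 + m2, s2 = e + f, m3 = s1 * s2
edges = [("m1", "s1"), ("m2", "s1"), ("s1", "m3"), ("s2", "m3")]

good = {"m1": 0, "m2": 0, "s1": 0, "s2": 1, "m3": 1}
poor = {"m1": 0, "m2": 1, "s1": 2, "s2": 0, "m3": 1}

print(intercluster_moves(edges, good), intercluster_moves(edges, poor))
```

A clustering-aware compiler would search for an assignment like `good`, which keeps dependent operations together, while still using the functional units of all clusters in parallel.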

Slide 19: Specialisation of a VLIW ASIP (cont'd)

Interconnect specialisation
- Determine the interconnect structure between clusters and from clusters to memory:
  - one or several buses,
  - crossbar interconnection,
  - etc.

Control specialisation: that's more or less done, as we have decided on a VLIW processor.

Slide 20: Tool Support for Processor Specialisation

Look at the design flow on slide 12! In order to be able to generate a specialised architecture you need:
- a retargetable compiler;
- a configurable simulator.

Slide 21: Retargetable Compiler

[Figure: the retargetable compiler takes the processor architecture and the algorithm as inputs and produces object code.]

Slide 22: Retargetable Compiler (cont'd)

An automatically retargetable compiler can be used for a range of different target architectures. The actual code optimisation and code generation is done by the compiler, based on a description of the target processor architecture. This description is formulated in a so-called architecture description language.

Having a good compiler is not only important for the processor specialisation process! Once you have got your specialised ASIP you need a good compiler in order to make efficient use of it.
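The idea of a code generator driven by an architecture description can be sketched in a few lines. This is only a toy under stated assumptions: the "architecture description" is reduced to a set of available opcodes (a real architecture description language captures far more), expressions are nested tuples, and the function `compile_expr` and the instruction tuple format are invented for this example.

```python
def compile_expr(expr, instructions):
    """Translate an expression tree into a list of instruction tuples.

    expr: a variable name (str) or a tuple (op, left, right), op in {"add", "mul"}.
    instructions: toy architecture description, the set of opcodes the target offers.
    """
    code = []

    def emit(node):
        if isinstance(node, str):
            return node  # operand already lives in a named register/variable
        op, left, right = node
        # Pattern add(x, mul(a, b)) maps to a single MAC if the target has one.
        if (op == "add" and "mac" in instructions
                and isinstance(right, tuple) and right[0] == "mul"):
            acc, a, b = emit(left), emit(right[1]), emit(right[2])
            dest = f"t{len(code)}"
            code.append(("mac", dest, acc, a, b))
            return dest
        if op not in instructions:
            raise ValueError(f"target has no instruction for {op!r}")
        lhs, rhs = emit(left), emit(right)
        dest = f"t{len(code)}"
        code.append((op, dest, lhs, rhs))
        return dest

    emit(expr)
    return code


# Same algorithm, two target descriptions.
step = ("add", "acc", ("mul", "x", "h"))
generic = compile_expr(step, {"add", "mul"})
specialised = compile_expr(step, {"add", "mul", "mac"})
print(generic)
print(specialised)
```

The source expression is unchanged between the two runs; only the target description differs, which is what allows the same compiler to evaluate many candidate ASIPs.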

Slide 23: Configurable Simulator

[Figure: the simulator takes the processor architecture and the object code as inputs and produces performance numbers.]

Such a simulator can be configured for a particular architecture (based on an architecture description).

In this context, the most important output produced by the simulator is performance numbers:
- throughput
- delay
- power/energy consumption

Slide 24: Application Specific Platforms

Not only processors but also hardware platforms can be specialised for classes of applications.

The platform will define a certain communication infrastructure (buses and protocols), certain processor cores, peripherals and accelerators commonly used in the particular application area, and a basic memory structure.

Slide 25: Application Specific Platforms (cont'd)

[Figure: example platform with three µprocessor cores (Core1 to Core3), cache, DMA, memory, a system bus connected through a bridge to a peripheral bus, peripherals, and reconfigurable logic.]

Slide 26: Application Specific Platforms (cont'd)

Design space exploration for platform definition:

[Figure: the platform architecture and the applications are fed to mapping/compiling; the result runs on a simulator, which produces performance numbers.]

Slide 27: Instantiating a Platform

Once we have an application, the chip on which to implement it will not be designed as a collection of independently developed blocks, but will be an instance of an application specific platform.

The hardware platform will be refined by
- determining memory and cache size;
- identifying the particular cores and peripherals to be used;
- adding specific ASICs and accelerators;
- determining the amount of reconfigurable logic (if needed).

Slide 28: Instantiating a Platform (cont'd)

[Figure: the platform architecture is refined into a platform instance; the platform instance and the application are fed to mapping/compiling; the result runs on a simulator, which produces performance numbers.]

Slide 29: System Platforms

What we have discussed so far (see previous slides) are so-called hardware platforms.

The hardware platform is delivered together with a software layer: hardware platform + software layer = system platform.

Software layer:
- real-time operating system
- device drivers
- network protocol stack
- compilers

The software layer creates an abstraction of the hardware platform (an application program interface) to be seen by the application programs.

Slide 30: IP-Based Design (Design Reuse)

The key concept for increasing designers' productivity is reuse.

In order to manage the complexity of current large designs we do not start from scratch but reuse as much as possible from previous designs, or use commercially available pre-designed IP blocks. IP: intellectual property.

Some people call this IP-based design, core-based design, reuse techniques, etc. Core-based design is the process of composing a new system design by reusing existing components.

Slide 31: IP-Based Design (cont'd)

What are the blocks (cores) we reuse?
- interfaces, encoders/decoders, filters, memories, timers, microcontroller-cores, DSP-cores, RISC-cores, GP processor-cores.

Possible(!) definition: a core is a design block which is larger than a typical RTL component.

Of course, we also reuse software components!

Slide 32: IP-Based Design (cont'd)

[Figure: cores taken from the libraries of Vendor A, Vendor B and Vendor C (Core 1 to Core 4, a µprocessor, an interface, I/O) are attached through glue logic to an interconnection bus/switch.]

What we have designed here can be:
- an application specific SOC;
- a platform to be further instantiated for a particular application.

Slide 33: Types of Cores

Hard cores: fully designed, placed, and routed by the supplier.
- A completely validated layout with definite timing.
- Rapid integration.
- Low flexibility.

Firm cores: technology-mapped gate-level netlists.
- Less predictability.
- Flexibility during place and route.

Slide 34: Types of Cores (cont'd)

Soft cores: synthesizable RTL or behavioural descriptions.
- Much work with integration and verification.
- Maximal flexibility.

Flexibility can provide opportunities such as adding application specific instructions to a processor core by modifying the behavioural description.

Slide 35: Reconfigurable Systems

Programmable hardware circuits:
- They implement arbitrary combinational or sequential circuits and can be configured by loading a local memory that determines the interconnection among logic blocks.
- Reconfiguration can be applied an unlimited number of times.

Main applications:
- Software acceleration
- Prototyping

Slide 36: Reconfigurable Systems (cont'd)

Dynamic reconfiguration: spatial and temporal partitioning.

[Figure: a µprocessor and memory connected to an FPGA; the accelerator on the FPGA is temporally partitioned, with different configurations loaded at times t1, t2, t3 and t4.]
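The principle that loading a configuration memory determines the circuit's function can be shown with a small software model. The slides talk about the memory that sets the interconnection among logic blocks; in many FPGAs the logic blocks themselves are configured the same way, as lookup tables, and that is what this sketch models. The class `LUT` and its methods are illustrative names only.

```python
class LUT:
    """Software model of a k-input lookup table."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.bits = [0] * (2 ** num_inputs)  # the configuration memory

    def configure(self, truth_table):
        """Load a new configuration: one output bit per input combination."""
        if len(truth_table) != len(self.bits):
            raise ValueError("truth table size does not match number of inputs")
        self.bits = list(truth_table)

    def evaluate(self, *inputs):
        """Compute the output by indexing the configuration memory."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:  # first input is the most significant index bit
            index = (index << 1) | (bit & 1)
        return self.bits[index]


lut = LUT(2)

lut.configure([0, 1, 1, 0])          # configuration at time t1: XOR
xor_out = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]

lut.configure([0, 0, 0, 1])          # reconfigured at time t2: AND
and_out = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]

print(xor_out, and_out)
```

The same hardware resource implements different functions at different times, which is the essence of the temporal partitioning shown on slide 36.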

Slide 37: Reconfigurable Systems (cont'd)

System on Chip with dynamically reconfigurable datapath

[Figure: a chip with on-chip memory, CPU and reconfigurable datapath, together with its design flow: C code goes through profiling & kernel extraction; the kernels go through Hw/Sw partitioning, which leads to datapath synthesis for the reconfigurable datapath and C code for the CPU.]

Slide 38: Summary

- Architecture selection is about making trade-offs along the dimensions of speed, cost, flexibility, and power consumption.
- ASIPs are programmable processors, specialised for a particular application or for a family of applications.
- Specialisation of an ASIP concerns instruction set, function units and data path, memory system, interconnect, and control.
- Two design tools are of great importance in order to perform processor specialisation: retargetable compiler and configurable simulator.
- Not only processors can be specialised but also platforms. A platform is specialised to execute a certain family of applications.
- The particular hardware to be used for a given application is a specialised instantiation of the platform.

Slide 39: Summary (cont'd)

- Reuse is a key technique in order to achieve high design productivity.
- Cores to be reused range from interfaces and decoders to filters and processors.
- The three types of cores (hard, firm, and soft) differ in their flexibility, predictability, and the effort needed for integration.
- Reconfigurable systems can provide good flexibility and, at the same time, many of the advantages of classical hardware implementation. They are mainly used for software acceleration and prototyping.