SOC architecture and design

- system-on-chip (SOC) processors: become components in a system
- SOC covers many topics
  - processor: pipelined, superscalar, VLIW, array, vector
  - storage: cache, embedded and external memory
  - interconnect: buses, network-on-chip
  - impact: time, area, power, reliability, configurability
  - customisability: specialized processors, reconfiguration
  - productivity/tools: model, explore, re-use, synthesise, verify
  - examples: crypto, graphics, media, network, comm, security
  - future: autonomous SOC, self-optimising/verifying design
- our focus: overview, processor, memory

wl 2015 10.1

iPhone SOC

[Figure: iPhone SOC, with labelled processor (1 GHz ARM Cortex A8), memory, and I/O blocks. Source: UC Berkeley]

Basic system-on-chip model

[Figure: basic system-on-chip model]

AMD's Barcelona Multicore Processor

- 4 out-of-order cores (Core 1 to Core 4)
- 1.9 GHz clock rate
- 65nm technology
- 3 levels of caches, including 512KB L2 per core and 2MB shared L3
- integrated Northbridge

Source: http://www.techwarelabs.com/reviews/processors/barcelona/

SOC vs processors on chip

with lots of transistors, designs move in 2 ways:
- complete system on a chip
- multi-core processors with lots of cache

                 System on chip                     Processors on chip
processor        multiple, simple, heterogeneous    few, complex, homogeneous
cache            one level, small                   2-3 levels, extensive
memory           embedded, on chip                  very large, off chip
functionality    special purpose                    general purpose
interconnect     wide, high bandwidth               often through cache
power, cost      both low                           both high
operation        largely stand-alone                need other chips

Processor types: overview

Processor type   Architecture / Implementation approach
SIMD             Single instruction applied to multiple functional units
Vector           Single instruction applied to multiple pipelined registers
VLIW             Multiple instructions issued each cycle under compiler control
Superscalar      Multiple instructions issued each cycle under hardware control

Processors for SOCs

SOC                                  Basic ISA     Processor description
Freescale e600: signal processing    PowerPC       Superscalar with vector extension
ClearSpeed CSX600: general           Proprietary   Array processor with 96 processing elements
PlayStation 2: gaming                MIPS          Pipelined with 2 vector coprocessors
ARM VFP11: general                   ARM           Configurable vector coprocessor

Sequential and parallel machines

- basic single stream processors
  - pipelined: overlap operations in basic sequential
  - superscalar: transparent concurrency
  - VLIW: compiler-generated concurrency
- multiple streams, multiple functional units
  - array processors
  - vector processors
  - multiprocessors

Pipelined processor

Instruction #1   IF ID AG DF EX WB
Instruction #2      IF ID AG DF EX WB
Instruction #3         IF ID AG DF EX WB
Instruction #4            IF ID AG DF EX WB
                 Time --->
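The overlap shown on this slide can be reproduced with a short sketch. It assumes an ideal pipeline in which one instruction enters per cycle and nothing stalls (the slide does not discuss hazards), so n instructions through the six stages need 6 + n - 1 cycles. The function names `pipeline_schedule` and `render` are hypothetical.

```python
STAGES = ["IF", "ID", "AG", "DF", "EX", "WB"]  # stage names taken from the slide


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied at cycle c.

    Assumption: ideal pipeline, one instruction issued per cycle, no stalls.
    """
    total_cycles = len(stages) + n_instructions - 1
    schedule = []
    for i in range(n_instructions):
        row = [None] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s at cycle i + s
        schedule.append(row)
    return schedule


def render(schedule):
    """Format the schedule as a staggered text diagram."""
    lines = []
    for i, row in enumerate(schedule, start=1):
        cells = " ".join((cell or "").ljust(2) for cell in row)
        lines.append(f"Instruction #{i}  {cells}".rstrip())
    return "\n".join(lines)


print(render(pipeline_schedule(4)))
```

With four instructions the schedule spans nine cycles, compared with 24 cycles if each instruction had to finish before the next one started.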

Superscalar and VLIW processors

[Figure: timing diagram of instructions #1 to #6, each passing through the stages IF, ID, AG, DF, EX, WB, plotted against time]

Superscalar and VLIW

[Figure: superscalar and VLIW organisations; label: hardware for parallelism control]

Array processors

- n PEs, each with memory; neighbour communications
- one instruction issued to all PEs
- instruction fields: mask, op, dest, sr1, sr2
  - perform op if condition = mask
  - operand can come from neighbour
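A minimal sketch of the array-processor behaviour described on this slide: one instruction is broadcast to all PEs, and each PE executes it only if its condition matches the mask. The class `PE`, the function `broadcast`, the register names, and the choice of a wrap-around left neighbour are all assumptions made for illustration; the slide does not specify them.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


class PE:
    """One processing element with its own local memory (a small register dict)."""

    def __init__(self, regs, condition):
        self.regs = dict(regs)
        self.condition = condition  # per-PE condition value compared with the mask


def broadcast(pes, mask, op, dest, sr1, sr2, sr2_from_left=False):
    """Issue one instruction to all PEs; a PE executes only if condition == mask.

    If sr2_from_left is set, the second operand is read from the left
    neighbour (wrapping around, an assumption), modelling neighbour communication.
    """
    # Read every operand first so all PEs see the state before the instruction.
    operands = []
    for i, pe in enumerate(pes):
        source = pes[i - 1] if sr2_from_left else pe
        operands.append((pe.regs[sr1], source.regs[sr2]))
    for pe, (a, b) in zip(pes, operands):
        if pe.condition == mask:
            pe.regs[dest] = OPS[op](a, b)


pes = [PE({"r0": 0, "r1": i, "r2": 10 * i}, condition=i % 2) for i in range(4)]
broadcast(pes, mask=1, op="add", dest="r0", sr1="r1", sr2="r2")
print([pe.regs["r0"] for pe in pes])  # [0, 11, 0, 33]
```

Only the PEs whose condition equals the mask (PE 1 and PE 3 here) update their destination register; the others ignore the instruction.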

Vector processors

- vector registers, eg 8 sets x 64 elements x 64 bits
- vector instructions: VR3 = VR2 VOP VR1
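The vector instruction VR3 = VR2 VOP VR1 can be sketched as one operation applied element by element across whole registers. The sizes follow the slide's example (8 registers x 64 elements x 64 bits); the class `VectorUnit`, its method `vop`, and the choice to wrap results as unsigned 64-bit values are assumptions for illustration.

```python
import operator

NUM_VREGS, VLEN, WIDTH = 8, 64, 64   # 8 sets x 64 elements x 64 bits, as on the slide
MASK = (1 << WIDTH) - 1              # assumed: results wrap as unsigned 64-bit values


class VectorUnit:
    """A toy vector register file with a single element-wise instruction."""

    def __init__(self):
        self.vr = [[0] * VLEN for _ in range(NUM_VREGS)]

    def vop(self, op, dst, src2, src1):
        """VR[dst] = VR[src2] op VR[src1], applied to all 64 elements."""
        self.vr[dst] = [
            op(a, b) & MASK for a, b in zip(self.vr[src2], self.vr[src1])
        ]


vu = VectorUnit()
vu.vr[1] = list(range(VLEN))
vu.vr[2] = [100] * VLEN
vu.vop(operator.add, 3, 2, 1)        # VR3 = VR2 + VR1
print(vu.vr[3][:4])                  # [100, 101, 102, 103]
```

A single `vop` call stands in for what would otherwise be a 64-iteration scalar loop, which is the point of issuing one instruction over a whole vector register.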

Memory addressing: three levels

[Figure: three levels of memory addressing; each segment contains pages for a program/process]

User view of memory: addressing

- a program: process address (offset + base + index)
  - virtual address: from page address and process/user id
- segment table: process base and bound (for each process)
  - system address: process base + page address
- pages: active localities in main/real memory
  - virtual address: page table lookup to physical address
  - page miss: virtual pages not in page table
- TLB (translation look-aside buffer): recent translations
  - TLB entry: corresponding real and (virtual, id) address
  - a few hashed virtual address bits address TLB entries
  - if virtual, id = TLB (virtual, id) then use translation

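TLB and Paging: Address translation

[Figure: a Virtual Address is first checked against the TLB (recent translations); otherwise the segment table (find process) supplies the process base to form the System Address, and the page table (find page) yields the Physical Address]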
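The translation path (TLB first, then segment table and page table) can be sketched as follows. This is a simplified model under stated assumptions: 4 KB pages, a 16-entry direct-mapped TLB, a slot hash that XORs the page number with the process id, and addresses counted in whole pages. None of these values, nor the names `MMU`, `PageMiss` and `translate`, come from the slides.

```python
PAGE_BITS = 12   # assumed 4 KB pages
TLB_SLOTS = 16   # assumed direct-mapped TLB size


class PageMiss(Exception):
    """Raised when the system page is not in the page table."""


class MMU:
    def __init__(self, segment_table, page_table):
        self.segment_table = segment_table  # pid -> (base page, bound in pages)
        self.page_table = page_table        # system page -> physical page
        self.tlb = [None] * TLB_SLOTS       # slot -> ((vpage, pid), physical page)
        self.tlb_hits = 0

    def translate(self, pid, vaddr):
        vpage = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        slot = (vpage ^ pid) % TLB_SLOTS    # a few hashed bits select the TLB entry
        entry = self.tlb[slot]
        if entry is not None and entry[0] == (vpage, pid):
            self.tlb_hits += 1              # (virtual, id) matches: use translation
            ppage = entry[1]
        else:
            base, bound = self.segment_table[pid]      # find process
            if vpage >= bound:
                raise ValueError("address outside process bound")
            spage = base + vpage            # system address = process base + page address
            if spage not in self.page_table:           # find page
                raise PageMiss(spage)
            ppage = self.page_table[spage]
            self.tlb[slot] = ((vpage, pid), ppage)     # remember recent translation
        return (ppage << PAGE_BITS) | offset


mmu = MMU(segment_table={1: (100, 4), 2: (200, 2)},
          page_table={100: 7, 101: 3, 200: 9})
print(hex(mmu.translate(1, 0x1234)))  # 0x3234, via segment and page tables
print(hex(mmu.translate(1, 0x1234)))  # 0x3234 again, this time from the TLB
```

The second lookup skips both table walks because the (virtual page, process id) pair matches the stored TLB entry.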

SOC interconnect

- interconnecting multiple active agents requires
  - bandwidth: capacity to transmit information (bps)
  - protocol: logic for non-interfering message transmission
- bus
  - AMBA (Adv. Microcontroller Bus Architecture) from ARM, widely used for SOC
  - bus performance: can determine system performance
- network on chip
  - array of switches
  - statically switched: eg mesh
  - dynamically switched: eg crossbar

Design cost: product economics

- increasingly product cost determined by
  - design costs, including verification
  - not marginal cost to produce
- manage complexity in die technology by
  - engineering effort
  - engineering cleverness
- design effort often dictated by product volume

[Figure: design time and effort against basic physical tradeoffs; balance point depends on n, number of units]

Design complexity

[Figure: design complexity; label: processors]

Cost: product program vs engineering

[Figure: breakdown of product cost. Labels: Product cost; Fixed costs; Variable costs; Engineering; Engineering costs; Chip design; Verify & test; Software; Labor costs; CAD support; CAD programs; Capital equipment; Mask costs; Fixed project costs; Manufacturing costs; Marketing, sales, administration]

Example: two scenarios

- fixed costs K_f, support costs 0.1 x function(n), and variable costs K_v x n, so
    total cost = K_f + 0.1 x function(n) + K_v x n
- design gets more complex, while production costs decrease
  - K_f increases while K_v decreases
  - if same price, requires higher volumes to break even
- when compared with 1995, in 2015
  - K_f increased by 10 times
  - K_v decreased by the same factor
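The break-even argument can be checked numerically. The sketch below assumes a unit price of 20, a support function equal to n, and 1995 values of K_f = 1,000,000 and K_v = 10; all of these numbers are hypothetical, chosen only to illustrate the tenfold change in each constant. The function names are also hypothetical.

```python
def total_cost(n, k_f, k_v, support_fn):
    """Fixed costs + support costs 0.1 x function(n) + variable costs K_v x n."""
    return k_f + 0.1 * support_fn(n) + k_v * n


def break_even_volume(price, k_f, k_v, support_fn=lambda n: n, limit=10**12):
    """Smallest n with price * n >= total_cost(n).

    Assumption: once revenue covers cost it keeps doing so for larger n,
    so doubling followed by bisection finds the crossover.
    """
    def profitable(n):
        return price * n >= total_cost(n, k_f, k_v, support_fn)

    hi = 1
    while not profitable(hi):
        hi *= 2
        if hi > limit:
            raise ValueError("no break-even point below limit")
    lo = hi // 2  # known not to be profitable (or zero)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if profitable(mid):
            hi = mid
        else:
            lo = mid
    return hi


PRICE = 20  # hypothetical unit price, the same in both scenarios
n_1995 = break_even_volume(PRICE, k_f=1_000_000, k_v=10)
n_2015 = break_even_volume(PRICE, k_f=10_000_000, k_v=1)  # K_f x 10, K_v / 10
print(n_1995, n_2015)
```

Under these assumed numbers the 2015 scenario needs roughly five times the volume of the 1995 scenario to break even at the same price, which matches the slide's qualitative claim.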

More recent: higher NRE

[Figure: comparison of 1995 and 2015]

IP: Intellectual Property

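Answers to Unassessed Coursework 5

1. $rdl_{1}\, R = snd\, [-]^{-1} \,;\, R$
   $rdl_{n+1}\, R = snd\, apr_{n}^{-1} \,;\, rsh \,;\, fst\, (rdl_{n}\, R) \,;\, R$
2. $P0 = rdl_{n}\, Pcell \,;\, \pi_{1}$
   where $\langle\langle s,x\rangle, a\rangle \; Pcell \; \langle sx+a,\, x\rangle$
3. $rdl_{n}\, R = row_{n}\, (R\ i \,;\, \pi_{2}^{-1}) \,;\, \pi_{2}$
   $P1 = loop\, (row_{n}\, Pcell1 \,;\, fst\, map_{n}\, D) \,;\, \pi_{1}$
   where $\langle\langle s,x\rangle, a\rangle \; Pcell1 \; \langle a, \langle sx+a,\, x\rangle\rangle$
4. $loop\, (row_{n}\, R) = (loop\, R)^{n}$
   Proof: induction on n (see www.doc.ic.ac.uk/~wl/papers/scp90.pdf)
   $P1 = P2 \,;\, [D,D]^{-n}$
   $P2 = (loop\, (Pcell1 \,;\, [D,[D,D]]))^{n}$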