PULPino: A small single-core RISC-V SoC



Similar documents
Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Software based Finite State Machine (FSM) with general purpose processors

How To Design A Single Chip System Bus (Amba) For A Single Threaded Microprocessor (Mma) (I386) (Mmb) (Microprocessor) (Ai) (Bower) (Dmi) (Dual

Designing a System-on-Chip (SoC) with an ARM Cortex -M Processor

The new 32-bit MSP432 MCU platform from Texas

High Performance or Cycle Accuracy?

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

Chapter 13. PIC Family Microcontroller

ARM Processors and the Internet of Things. Joseph Yiu Senior Embedded Technology Specialist, ARM

Software-Programmable FPGA IoT Platform. Kam Chuen Mak (Lattice Semiconductor) Andrew Canis (LegUp Computing) July 13, 2016

What is a System on a Chip?

Migrating Application Code from ARM Cortex-M4 to Cortex-M7 Processors

7a. System-on-chip design and prototyping platforms

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

USB 3.0 Connectivity using the Cypress EZ-USB FX3 Controller

Pre-tested System-on-Chip Design. Accelerates PLD Development

Going Linux on Massive Multicore

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

STM32 F-2 series High-performance Cortex-M3 MCUs

Am186ER/Am188ER AMD Continues 16-bit Innovation

The ARM Cortex-A9 Processors

OpenSPARC T1 Processor

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

ARM Ltd 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.

Architectures and Platforms

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Developing Embedded Applications with ARM Cortex TM -M1 Processors in Actel IGLOO and Fusion FPGAs. White Paper

UG103.8 APPLICATION DEVELOPMENT FUNDAMENTALS: TOOLS

Qsys and IP Core Integration

ZigBee Technology Overview

System on Chip Platform Based on OpenCores for Telecommunication Applications

1. Computer System Structure and Components

The ARM Architecture. With a focus on v7a and Cortex-A8

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

The Advanced JTAG Bridge. Nathan Yawn 05/12/09

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

ARM Webinar series. ARM Based SoC. Abey Thomas

System Design Issues in Embedded Processing

Wireless Microcontrollers for Environment Management, Asset Tracking and Consumer. October 2009

AMD Opteron Quad-Core

Computer and Set of Robots

ARM Microprocessor and ARM-Based Microcontrollers

SmartFusion csoc: Basic Bootloader and Field Upgrade envm Through IAP Interface

Embedded System Hardware - Processing (Part II)

PROBLEMS #20,R0,R1 #$3A,R2,R4

Thingsquare Technology

Which ARM Cortex Core Is Right for Your Application: A, R or M?

Overview of the Cortex-M3

Using a Generic Plug and Play Performance Monitor for SoC Verification

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

SOC architecture and design

AN4664 Application note

CHAPTER 4 MARIE: An Introduction to a Simple Computer

Implementation Details

Building Blocks for PRU Development

Hardware accelerated Virtualization in the ARM Cortex Processors

ECE 3803: Microprocessor System Design D Term 2011 Course Syllabus Department of Electrical and Computer Engineering Worcester Polytechnic Institute

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015

Concept Engineering Adds JavaScript-based Web Capabilities to Nlview at DAC 2016

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Five Families of ARM Processor IP

AppliedMicro Trusted Management Module

Computer Organization and Components

Embedded Development Tools

CSE2102 Digital Design II - Topics CSE Digital Design II

The functions of system LSI become more and more complicated

CoreSight SoC enabling efficient design of custom debug and trace subsystems for complex SoCs

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

Atmel Norway XMEGA Introduction

21152 PCI-to-PCI Bridge

OpenSoC Fabric: On-Chip Network Generator

World-wide University Program

ARM Cortex -A8 SBC with MIPI CSI Camera and Spartan -6 FPGA SBC1654

Keep your tentacles off my bus: Introducing Die Datenkrake. REcon 2013, Montréal Dmitry Nedospasov, Thorsten Schröder

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Attention. restricted to Avnet s X-Fest program and Avnet employees. Any use

Avoiding pitfalls in PROFINET RT and IRT Node Implementation

Multiprocessor System-on-Chip

Architectures, Processors, and Devices

SPARC64 VIIIfx: CPU for the K computer

Microprocessor and Microcontroller Architecture

The Central Processing Unit:

SBC6245 Single Board Computer

ICRI-CI Retreat Architecture track

Design of Digital Circuits (SS16)

How To Use Nuc123 (Nuc123) For A Week

Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 1 Computer System Overview

MPC603/MPC604 Evaluation System

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

USER GUIDE EDBG. Description

Bootloader with AES Encryption

Reconfigurable Computing. Reconfigurable Architectures. Chapter 3.2

CB-OLP425 DEVELOPMENT KIT GETTING STARTED

An Introduction to the ARM 7 Architecture

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

CPU Session 1. Praktikum Parallele Rechnerarchtitekturen. Praktikum Parallele Rechnerarchitekturen / Johannes Hofmann April 14,

Transcription:

PULPino: A small single-core RISC-V SoC Andreas Traber, Florian Zaruba, Sven Stucki, Antonio Pullini, Germain Haugou, Eric Flamand, Frank K. Gürkaynak, Luca Benini

Our group: Prof. Luca Benini ETH Zurich, Integrated Systems Lab (IIS) University of Bologna, Micrel Lab Around 40 people Most involved in the PULP project Close Collaborations Politecnico di Milano CEA-LETI EPFL Industrial Support from STMicroelectronics Silicon access to 28nm FDSOI 06.01.2016 2

Computing for the Internet of Things Sense Analyze and Classify Transmit MEMS IMU µcontroller Short range, medium BW MEMS Microphone ULP Imager e.g. CortexM IOs Low rate (periodic) data EMG/ECG/EIT 1 25 MOPS 1 10 mw SW update, commands Long range, low BW 100 µw 2 mw Battery + Harvesting powered a few mw power envelope Idle: ~1µW 06.01.2016 3 Active: ~ 50mW

Computing for the Internet of Things Sense Analyze and Classify Transmit MEMS IMU MEMS Microphone ULP Imager µcontroller L2 Memory e.g. CortexM IOs Short range, medium BW Low rate (periodic) data EMG/ECG/EIT 11 2000 25 MOPS 1 10 mw SW update, commands Long range, low BW 100 µw 2 mw Battery + Harvesting powered a few mw power envelope Idle: ~1µW 06.01.2016 4 Active: ~ 50mW

Cluster Bus PULP: Parallel Ultra-Low-Power Processor Exploit parallelism Multiple small cores organized in clusters Share memory within the cluster Multiple clusters per chip Simple but efficient processor cores Based on OpenRISC/RISC-V Custom ISA extensions Dedicated accelerators Multiple Technologies Near-threshold operation L2 RAM Cluster IF DMA HW Acc. Bus Adapter SCM SRAM TCDM Logarithmic Interconnect Core #1 SCM SRAM SCM SRAM Core #2 SCM SRAM Shared Instruction Cache Instruction Bus HW Synch Core #N L0 L0 L0 SCM SRAM 06.01.2016 5

PULP related chips Main PULP chips (ST 28nm FDSOI) PULPv1 PULPv2 PULPv3 (in production) PULPv4 (in progress) PULP development (UMC 65nm) Artemis - IEEE 754 FPU Hecate - Shared FPU Selene - Logarithic Number System FPU Diana - Approximate FPU Mia Wallace full system Imperio - PULPino chip (Jan 2016) Fulmine Secure cluster (Jan 2016) RISC-V based systems (GF 28nm) Honey Bunny Early building blocks (UMC180) Sir10us Or10n Mixed-signal systems (SMIC 130nm) VivoSoC EdgeSoC (in planning) IcySoC chips approx. computing platforms (ALP 180nm) Diego Manny Sid 130nm 180nm 180nm 180nm 180nm 180nm 28nm 28nm 28nm 65nm 28nm 65nm 65nm 65nm 65nm 06.01.2016 6

PULPino: A single-core RISC-V SoC Motivation We want to publish our work as open-source But PULP is huge Energy efficient many-core SoC Large number of IPs Software frameworks (OpenMP, OpenVX, ) Custom toolset (Virtual Platform, profiling tools, toolchain, ) PULPino as a first step Focused on simplicity Minimal system Single-core 06.01.2016 7

PULPino: A single-core RISC-V SoC Overview Microcontroller-style platform Focused on core No caches, no memory hierarchy, no DMA Re-uses IPs from the PULP project Same peripherals, same cores Available for FPGA First ASIC tapeout end of January Student project using UMC 65nm 06.01.2016 8

Core directly connected to instruction and data RAM Single-cycle access, no waiting for memories Single-port RAMs Multi-banked RAMs for ASIC Block RAMs for FPGA 06.01.2016 9

Central AXI interconnect Core connects via bridge RAM multiplexed between AXI port and core port 06.01.2016 10

APB for peripherals Fine grained clock gating for peripherals Easy to add new peripherals 06.01.2016 11

Internal event unit Clock gates core when inactive (sleep) Waits for events and wakes up core 06.01.2016 12

SPI slave acts as master on AXI Provides access to whole memory map of the SoC from external SPI slave supports standard SPI and QPI 06.01.2016 13

Debug support via adv. debug unit Provides JTAG port Access to whole memory map via JTAG 06.01.2016 14

Boot ROM Bootloader loads program from SPI flash 06.01.2016 15

PULPino on FPGA ZedBoard Program PULPino via SPI or JTAG Interface fully compatible with final ASIC 06.01.2016 16

PULPino Build & Run Flow Managed via CMake 06.01.2016 17

RISC-V & OpenRISC Evaluating RISC-V in comparisons to OpenRISC Our interest in RISC-V More modern than OpenRISC No set flag instructions No delay slot Compressed instructions Easily extendable 06.01.2016 18

Extending RISC-V Our goals Energy-efficient signal processing Tune the core micro-architecture and ISA towards this Extensions should have low overhead in area & power Adding non-standard extensions for Hardware loops Post-incrementing load and store instructions Multiply-Accumulate ALU extensions (min, max, absolute value, ) 06.01.2016 19

RI5CY Extension: Hardware Loops Simple vector addition for (i = 0; i < 100; i++) { d[i] = a[i] + 1; } Baseline mv x4, 0 mv x5, 100 Lstart: lw x2, 0(x10) addi x10, x10, 4 addi x2, x2, 1 sw x2, 0(x11) addi x11, x11, 4 addi x4, x4, 1 bne x4, x5, Lstart with hardware loops lp.setupi 100, Lend lw x2, 0(x10) addi x10, x10, 4 addi x2, x2, 1 sw x2, 0(x11) Lend: addi x11, x11, 4 7 instr. + branch cost 5 instr. 06.01.2016 20

RI5CY Extension: Post-incrementing LD/ST Simple vector addition for (i = 0; i < 100; i++) { d[i] = a[i] + 1; } with hardware loops with hardware loops & post-incr. lp.setupi 100, Lend lw x2, 0(x10) addi x10, x10, 4 addi x2, x2, 1 sw x2, 0(x11) Lend: addi x11, x11, 4 5 instr. lp.setupi 100, Lend lw x2, 4(x10!) addi x2, x3, 1 Lend: sw x2, 4(x11!) 3 instr. 4 instructions + branch cost saved 06.01.2016 21

RI5CY Core ISA Support Full support for RV32I Partial support for RV32M Basically just the mul instruction Single-cycle multiplication Support for compressed instructions Support for our own custom instruction set extensions 06.01.2016 22

RI5CY Core Features Interrupts Vectorized on 32 lines Events Allows the core to sleep and wait for an event Exceptions Debug Software breakpoints Access to all registers Performance Counters 06.01.2016 23

RI5CY Core Simple 32-bit 4-stage in-order RISC-V core 06.01.2016 24

RI5CY Interfaces Data Interface Supports variable-latency systems Request-Grant protocol Instruction Interface Identical to Data Interface, but read-only Debug Interface Aligned with Adv. Debug Bridge 06.01.2016 25

RI5CY Performance Compiler used ARM: Official GCC ARM Embedded Toolchain (GCC 4.9) OR1k: Custom GCC Toolchain (GCC 5.2) RISC-V: Custom GCC Toolchain (GCC 5.2) 06.01.2016 26

RI5CY Performance Compiler used ARM: Official GCC ARM Embedded Toolchain (GCC 4.9) OR1k: Custom GCC Toolchain (GCC 5.2) RISC-V: Custom GCC Toolchain (GCC 5.2) 06.01.2016 27

RI5CY in Comparison RI5CY ARM Cortex M4 2 Technology 65 nm 90nm 65 nm Conditions 25 C, 1.2V 25 C, 1.2V 25 C, 1.2V Dynamic Power [μw/mhz] 17.5 32.82 23.2 1 Area [mm 2 ] 0.050 0.119 0.062 1 Critical paths in RI5CY Delay Delay [#ND2 Equiv.] Internal 1.15 ns 29 Memory Request 0.6 ns + mem. delay 15 + mem. delay Memory Response mem. delay + 0.4 ns mem. delay + 10 1 scaled from 90nm to 65nm 2 ARM Cortex M4, http://www.arm.com/products/processors/cortex-m/cortex-m4-processor.php 06.01.2016 28

Area Breakdown PULPino Component Area [GE] RI5CY Core 34 500 6.9% Peripherals 50 000 10% Instruction RAM (32kB) 190 000 38% Data RAM (32kB) 190 000 38% AXI Interconnect 10 000 2% Adv. Dbg Unit 6 000 1.2% Total 500 000 100% RI5CY Component Area [GE] Prefetch Buffer 3 500 10.1% Decoder 420 1.2% Compressed Decoder 400 1.1% Register File 10 600 30.7% Multiplier 4 300 12.5% ALU 2 700 7.8% LSU 1 500 4.3% CSR 2 160 6.3% Total 34 500 100% 06.01.2016 29

Open Source Release What we release Complete PULPino RTL source Including RI5CY core Including all IPs FPGA build flow Simulation/build infrastructure for software FreeRTOS port Liberal license: SolderPad When? Awaiting final approval 06.01.2016 30

Summary PULP: Energy-efficient many-core SoC PULPino: Single-core SoC A complete system that can be easily employed in various projects Easily extendable Ready to use system for educational purposes RI5CY: Small and optimized RISC-V core Check our website: pulp.ethz.ch 06.01.2016 31

Outlook First tape out of PULPino end of this month IP-XACT port of PULPino RI5CY features Floating-point support Already available for our OpenRISC core Branch prediction Evaluate further non-standard ISA extensions 06.01.2016 32

Questions? http://pulp.ethz.ch