Building Blocks for PRU Development



Similar documents
BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

Chapter 13. PIC Family Microcontroller

Microtronics technologies Mobile:

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Am186ER/Am188ER AMD Continues 16-bit Innovation

System Design Issues in Embedded Processing

Computer Organization and Components

DS1104 R&D Controller Board

SBC8600B Single Board Computer

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview

STM32 F-2 series High-performance Cortex-M3 MCUs

Lab Experiment 1: The LPC 2148 Education Board

Embedded Systems on ARM Cortex-M3 (4weeks/45hrs)

A DIY Hardware Packet Sniffer

Serial Communications

Develop a Dallas 1-Wire Master Using the Z8F1680 Series of MCUs

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

System Considerations

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

Hello, and welcome to this presentation of the STM32L4 reset and clock controller.

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory

CSE2102 Digital Design II - Topics CSE Digital Design II

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

SPI I2C LIN Ethernet. u Today: Wired embedded networks. u Next lecture: CAN bus u Then: wireless embedded network

Using a Generic Plug and Play Performance Monitor for SoC Verification

8051 MICROCONTROLLER COURSE

Freescale Semiconductor, Inc. Product Brief Integrated Portable System Processor DragonBall ΤΜ

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC Microprocessor & Microcontroller Year/Sem : II/IV

Computer Systems Structure Input/Output

AN LPC1700 timer triggered memory to GPIO data transfer. Document information. LPC1700, GPIO, DMA, Timer0, Sleep Mode

8-Bit Flash Microcontroller for Smart Cards. AT89SCXXXXA Summary. Features. Description. Complete datasheet available under NDA

Microcontrollers in Practice

Computer Architecture

Serial port interface for microcontroller embedded into integrated power meter

ARM Cortex -A8 SBC with MIPI CSI Camera and Spartan -6 FPGA SBC1654

Network connectivity controllers

ARM Cortex STM series

ARM Thumb Microcontrollers. Application Note. Software ISO 7816 I/O Line Implementation. Features. Introduction

The I2C Bus. NXP Semiconductors: UM10204 I2C-bus specification and user manual HAW - Arduino 1

AND8336. Design Examples of On Board Dual Supply Voltage Logic Translators. Prepared by: Jim Lepkowski ON Semiconductor.

7a. System-on-chip design and prototyping platforms

AXI Performance Monitor v5.0

Hybrid Platform Application in Software Debug

Pre-tested System-on-Chip Design. Accelerates PLD Development

OpenSPARC T1 Processor

How to design and implement firmware for embedded systems

SoC-Based Microcontroller Bus Design In High Bandwidth Embedded Applications

Design of a High Speed Communications Link Using Field Programmable Gate Arrays

Reconfigurable System-on-Chip Design

Computer Organization & Architecture Lecture #19

Software based Finite State Machine (FSM) with general purpose processors

Allows the user to protect against inadvertent write operations. Device select and address bytes are Acknowledged Data Bytes are not Acknowledged

System-on-a-Chip with Security Modules for Network Home Electric Appliances

Computer-System Architecture

MicroBlaze Debug Module (MDM) v3.2

PAC52XX Clock Control Firmware Design

High Performance or Cycle Accuracy?

TURBO PROGRAMMER USB, MMC, SIM DEVELOPMENT KIT

Implementing SPI Master and Slave Functionality Using the Z8 Encore! F083A

Data Cables. Schmitt TTL LABORATORY ELECTRONICS II

Central Processing Unit (CPU)

LSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

What is a System on a Chip?

Architectures and Platforms

COMPUTERS ORGANIZATION 2ND YEAR COMPUTE SCIENCE MANAGEMENT ENGINEERING UNIT 5 INPUT/OUTPUT UNIT JOSÉ GARCÍA RODRÍGUEZ JOSÉ ANTONIO SERRA PÉREZ

Microcontroller Based Low Cost Portable PC Mouse and Keyboard Tester

Switch Fabric Implementation Using Shared Memory

Software engineering for real-time systems

ZigBee Technology Overview

MICROPROCESSOR AND MICROCOMPUTER BASICS

Memory unit. 2 k words. n bits per word

1. Memory technology & Hierarchy

Open Architecture Design for GPS Applications Yves Théroux, BAE Systems Canada

Application Note 132. Introduction. Voice Video and Data Communications using a 2-Port Switch and Generic Bus Interface KSZ MQL/MVL

How To Design A Single Chip System Bus (Amba) For A Single Threaded Microprocessor (Mma) (I386) (Mmb) (Microprocessor) (Ai) (Bower) (Dmi) (Dual

Building A RISC Microcontroller in an FPGA

CB-OLP425 DEVELOPMENT KIT GETTING STARTED

2.0 Command and Data Handling Subsystem

Interfacing Of PIC 18F252 Microcontroller with Real Time Clock via I2C Protocol

System Performance Analysis of an All Programmable SoC

Motor Control using NXP s LPC2900

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Automating with STEP7 in LAD and FBD

Modeling Sequential Elements with Verilog. Prof. Chien-Nan Liu TEL: ext: Sequential Circuit

USB 3.0 Connectivity using the Cypress EZ-USB FX3 Controller

Serial Communications

Architectures, Processors, and Devices

Location-Aware and Safer Cards: Enhancing RFID Security and Privacy

MicroMag3 3-Axis Magnetic Sensor Module

Implementing a Digital Answering Machine with a High-Speed 8-Bit Microcontroller

Raspberry Pi. Hans- Petter Halvorsen, M.Sc.

Cut Network Security Cost in Half Using the Intel EP80579 Integrated Processor for entry-to mid-level VPN

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

ELEC 5260/6260/6266 Embedded Computing Systems

M85 OpenCPU Solution Presentation

Chapter 11 I/O Management and Disk Scheduling

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

Transcription:

Building Blocks for PRU Development Module 1 PRU Hardware Overview This session covers a hardware overview of the PRU-ICSS Subsystem. Author: Texas Instruments, Sitara ARM Processors Oct 2014

2

ARM SoC Architecture ARM Subsystem Cortex-A L1 Instruction Cache L1 Data Cache L1 D/I caches: Single cycle access Shared Memory L2 Data Cache On-chip SRAM L3 Interconnect Peripherals L2 cache: Min latency of 8 cycles Access to on-chip SRAM: 20 cycles Access to shared memory over L3 Interconnect: 40 cycles L4 Interconnect Peripherals GP I/O 3

ARM + PRU SoC Architecture ARM Subsystem L1 Instruction Cache Cortex-A L1 Data Cache L2 Data Cache Programmable Real-Time Unit (PRU) Subsystem Shared RAM PRU0 (200MHz) Inst. RAM Data RAM Interconnect PRU1 (200MHz) Inst. RAM Data RAM PRU0 I/O PRU1 I/O On-chip SRAM INTC Peripherals L3 Interconnect Shared Memory Peripherals L4 Interconnect Access Times: Instruction RAM = 1 cycle DRAM = 3 cycles Shared DRAM = 3 cycles Peripherals GP I/O 4

Programmable Real-Time Unit (PRU) Subsystem Programmable Real-Time Unit (PRU) is a low-latency microcontroller subsystem Two independent PRU execution units 32-Bit RISC architecture 200MHz 5ns per instruction Single cycle execution - No pipeline Dedicated instruction and data RAM per core Shared RAM Includes Interrupt Controller for system event handling Fast I/O interface Up to 30 inputs and 32 outputs on external pins per PRU unit 32 GPO 30 GPI 32 GPO 30 GPI Events to ARM INTC Events from Peripherals + PRUs Industrial Ethernet Industrial Ethernet PRU Subsystem Block Diagram MII0 RX/TX PRU0 Core (IRAM0) Scratchpad PRU1 Core (IRAM1) MII1 RX/TX MDIO UART Interrupt Controller (INTC) 32-bit Interconnect bus Data RAM0 Data RAM1 Shared RAM Master I/F (to SoC interconnect) Slave I/F (from SoC interconnect) IEP (Timer) ecap MPY/MAC 5

Features & Benefits Feature Each PRU has dedicated instruction and data memory and can operate independently or in coordination with the ARM or the other PRU core Access all SoC resources (peripherals, memory, etc.) Interrupt controller for monitoring and generating system events Dedicated, fast input and output pins Small, deterministic instruction set with multiple bit-manipulation instructions Benefit Use each PRU for a different task; use PRUs in tandem for more advanced tasks Direct access to buffer data; leverage system peripherals for various implementations Communication with higher level software running on ARM; detection of peripheral events Input/output interface implementation; detect and react to I/O event within two PRU cycles Easy to use; fast learning curve 6

Now let s go a little deeper 7

PRU Functional Block Diagram General Purpose Registers All instructions are performed on registers and complete in a single cycle Register file appears as linear block for all register to memory operations Special Registers (R30 and R31) R30 Write: 32 GPO R31 Read: 30 GPI + 2 Host Int status Write: Generate INTC Event 32 GPO 30 GPI INTC PRU Execution unit CONST R0 TABLE R1 R2 EXECUTION UNIT R29 R30 R31 Instruction RAM Constant Table Ease SW development by providing freq used constants Peripheral base addresses Few entries programmable Execution Unit Logical, arithmetic, and flow control instructions Scalar, no Pipeline, Little Endian Register-to-register data flow Addressing modes: Ld Immediate & Ld/St to Mem Instruction RAM Typical size is a multiple of 4KB (or 1K Instructions) Can be updated with PRU reset 8

Fast I/O Interface Cortex A8 L3F L3S L4 PER Peripherals GPIO1 GPIO2 GPIO3... GPIO 3.19 Pinmux Device pin 9

Fast I/O Interface Reduced latency through direct access to pins Read or toggle I/O within a single PRU cycle Detect and react to I/O event within two PRU cycles Independent general purpose inputs (GPIs) and general purpose outputs (GPOs) PRU R31 directly reads from up to 30 GPI pins PRU R30 directly writes up to 32 PRU GPOs Configurable I/O modes per PRU core GP input modes Direct connect 16-bit parallel capture 28-bit shift GP output modes Direct connect Shift out Cortex A8 L3F L3S L4 PER Peripherals GPIO1 GPIO2 GPIO3... GPIO 3.19 PRU Subsystem PRU output 5 Pinmux Device pin 10

GPIO Toggle: Bench measurements ARM GPIO Toggle PRU IO Toggle: ~200ns ~5ns = ~40x Faster 11

Integrated Peripherals Provide reduced PRU read/write access latency compared to external peripherals Local peripherals don t need to go through external L3 or L4 interconnects Can be used by PRU or by the ARM as additional hardware peripherals on the device Integrated peripherals: PRU UART PRU ecap PRU IEP (Timer) Programmable Real-Time Unit (PRU) Subsystem PRU0 PRU1 (200MHz) (200MHz) Shared RAM Inst. RAM Data RAM Inst. RAM Data RAM Interconnect INTC UART ecap IEP (Timer) 12

PRU Interrupts The PRU does not support asynchronous interrupts. However, specialized h/w and instructions facilitate efficient polling of system events. The PRU-ICSS can also generate interrupts for the ARM, other PRU-ICSS, and sync events for EDMA. From UofT CSC469 lecture notes, Polling is like picking up your phone every few seconds to see if you have a call. Interrupts are like waiting for the phone to ring. Interrupts win if processor has other work to do and event response time is not critical Polling can be better if processor has to respond to an event ASAP Asynchronous interrupts can introduce jitter in execution time and generally reduce determinism. The PRU is optimized for highly deterministic operation. 13

PRU Memory Map PRU local memory map PRU global memory map SoC memory map 14

PRU Read Latencies: Local vs Global Memory Map The PRU directly accessing internal MMRs (Local MMR Access) is faster than going through the L3 interconnects (Global MMR Access) Local MMR Access Global MMR Access ( PRU cycles @ 200MHz ) ( PRU cycles @ 200MHz ) PRU R31 (GPI) 1 N/A PRU CTRL 4 36 PRU CFG 3 35 PRU INTC 3 35 PRU DRAM 3 35 PRU Shared DRAM 3 35 PRU ECAP 4 36 PRU UART 14 46 PRU IEP 12 44 Note: Latency values listed are best-case values. 15

PRU Memory Access FAQ Q: Why does my PRU firmware hang when reading or writing to an address external to the PRU Subsystem? A: The OCP master port is in standby and needs to be enabled in the PRU-ICSS CFG register space, SYSCFG[STANDBY_INIT]. 16

Sitara Device Comparison Features AM18x AM335x AM437x PRUSS PRU-ICSS1 PRU-ICSS1 PRU-ICSS0 Number of PRU cores 2 2 2 2 Max Frequency CPU freq / 2 200 MHz 200 MHz 200 MHz IRAM size (per PRU core) 4 KB 8 KB 12 KB 4 KB DRAM size (per PRU core) 512 B 8 KB 8 KB 4 KB Shared DRAM size 0 KB 12 KB 32 KB 0 KB General Purpose Input (per PRU core) General Purpose Output (per PRU core) GPI Pins (PRU0, PRU1) GPO Pins (PRU0, PRU1) Direct Direct; or 16-bit parallel capture; or 28-bit shift Direct; or 16-bit parallel capture; or 28-bit shift Direct; or 16-bit parallel capture; or 28-bit shift Direct Direct; or Shift out Direct; or Shift out Direct; or Shift out 30, 30 17, 17 13, 0 20, 20 32, 32 16, 16 12, 0 20, 20 MPY/MAC N Y Y Y Scratchpad N Y (3 banks) Y (3 banks) N INTC 1 1 1 1 Peripherals n/a Y Y Y UART 0 1 1 1 ecap 0 1 1 not pinned out IEP 0 1 1 not pinned out MII_RT 0 2 2 not pinned out MDIO 0 1 1 not pinned out 17

Examples of how people have used the PRU 18

Use Cases Examples Industrial Protocols ASRC 10/100 Switch Smart Card DSP-like functions Filtering FSK Modulation LCD I/F Camera I/F RS-485 UART SPI Monitor Sensors I2C Bit banging Custom/Complex PWM Stepper motor control Not all use cases are feasible on PRU - Development complexity - Technical constraints (i.e. running Linux on PRU) Development Complexity 19

Replicape 3D Printer Replicate 3D Printer uses AM335x on BeagleBone Cortex-A8 runs Linux, networking, HMI, model processing Host apps written in Python PRU controls step and direction of 5 stepper motors App written in PRU assembly A8 calculates data, PRU communicates with motors Shared region of DDR reserved for A8/PRU communication Data consist of pin/delay timing tuples (8 bytes each) Sequence: 1. GPIO pins are set one or more of the 32-bit GPIO banks set with a predefined mask 2. Delay is applied (# of 200MHz instructions) 3. After sequence completes, PRU sends a signal to the host indicating that the segment is finished 4. Host updates its memory usage for the PRU More info @ hipstercircuits.com 20

Thank you! For more information about the PRU, visit: Presentation Home www.ti.com/sitarabootcamp PRU-ICSS Wiki http://processors.wiki.ti.com/index.php/pru-icss PRU Evaluation Hardware http://www.ti.com/tool/prucape Support http://e2e.ti.com 21

Backup Slides 22

PRU Event/Status Register (R31) Writes: Generate output events to the INTC. Write the event number (0 through 15) to PRU_R31_VEC[3:0] (R31 bits 3:0) and simultaneously set PRU_R31_VEC_VALID (R31 bit 5) to create a pulse to INTC. Outputs from both PRUs are ORed together to form single output. Output events 0 through 15 are connected to system events 16 through 31 on INTC. Reads: Return Host 1 & 0 interrupt status from INTC and general purpose input pin status. PRU<n> R30 GPO Content i PR1_PRU<n>_PRU_R30[ i:0 ] R31(R) INTC status (bit 31) INTC status (bit 30) GPI Content (bits 29:0) j PR1_PRU<n>_PRU_R31[ j:0 ] R31(W) INTC Interrupt Generation 23

PRU-ICSS Enhanced GPIO Signals GPI Signals Function Signal Name PRU Reg Mapping Direct Input Mode Data input PRU<n>_GPI pru<n>_r31 [29:0] Parallel Capture Mode Data input PRU<n>_DATAIN pru<n>_r31 [15:0] Clock PRU<n>_CLOCK pru<n>_r31 [16] Shift In Mode Function Signal Name PRU Reg Mapping Direct Output Mode Data output PRU<n>_GPO pru<n>_r30 [31:0] Shift Out Mode Data output PRU<n>_DATAOUT pru<n>_r30 [0] Clock PRU<n>_CLOCK pru<n>_r30 [1] Load gpo_sh0 GPO Signals PRU<n>_LOAD_GPO _SH0 pru<n>_r30 [29] Data input PRU<n>_DATAIN pru<n>_r31 [0] Shift counter PRU<n>_CNT_16 pru<n>_r31 [28] Start bit detection PRU<n>_GPI_SB pru<n>_r31 [29] Load gpo_sh1 Enable shift PRU<n>_LOAD_GPO _SH1 pru<n>_r30 [30] PRU<n>_ENABLE_S HIFT pru<n>_r30 [31] 24

Direct Input / Output Modes Direct Input PRU<n> R31[16:0] feed directly into the PRU Direct Output PRU<n> R30[15:0] feed directly out of the PRU 25 25

Shift In Mode PRU<n> R31[0] is sampled and shifted into a 28-bit shift register. Shift Counter (Cnt_16) feature uses pru<n>_r31_status [28] Start Bit detection (SB) feature uses pru<n>_r31_status [29] Shift rate controlled by effective divisor of two cascaded dividers applied to the 200MHz clock. Each cascaded dividers is configurable through the PRU-ICSS CFG to a value of {1,1.5,, 16}. 26 26

Shift Output Mode PRU<n> R30[0] is shifted out on every rising edge of the internal PRU<n>_CLOCK (pru<n>r30 [1]). Shift rate is controlled by the effective divisor of two cascaded dividers applied to the 200MHz clock. See Shift Input Mode. 27 27

Parallel Capture Mode PRU<n>_R31 [15:0] is captured by posedge or negedge of PRU<n>_CLOCK (pru<n>_r31_status [16]). 28 28