BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA



Similar documents
Architecture and Implementation of the ARM Cortex -A8 Microprocessor

Building Blocks for PRU Development

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM


ARM Microprocessor and ARM-Based Microcontrollers

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

Which ARM Cortex Core Is Right for Your Application: A, R or M?

<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing

SBC8600B Single Board Computer

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Industry First X86-based Single Board Computer JaguarBoard Released

Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor. Travis Lanier Senior Product Manager

SABRE Lite Development Kit

Standardization with ARM on COM Qseven. Zeljko Loncaric, Marketing engineer congatec

ARM Processors for Computer-On-Modules. Christian Eder Marketing Manager congatec AG

Architectures, Processors, and Devices

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Introduction to the Latest Tensilica Baseband Solutions

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

VLIW Processors. VLIW Processors

The ARM Architecture. With a focus on v7a and Cortex-A8

Next Generation GPU Architecture Code-named Fermi

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

ARM Cortex -A8 SBC with MIPI CSI Camera and Spartan -6 FPGA SBC1654

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr Teruzzi Roberto matr IBM CELL. Politecnico di Milano Como Campus

Computer Architecture TDTS10

4/2/2014 Linux Dev-Boards. Linux Dev Boards. Tagung Forth Gesellschaft e.v. Maerz file:///home/cas/talk/linux-boards/html/linux-boards.

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers

Cortex -A15. Technical Reference Manual. Revision: r2p0. Copyright 2011 ARM. All rights reserved. ARM DDI 0438C (ID102211)

Atmel SMART ARM Core-based Embedded Microprocessors

System Design Issues in Embedded Processing

High-Performance, Highly Secure Networking for Industrial and IoT Applications

Thread level parallelism

Application Performance Analysis of the Cortex-A9 MPCore

The ARM Cortex-A9 Processors

Pipelining Review and Its Limitations

ARM Processors and the Internet of Things. Joseph Yiu Senior Embedded Technology Specialist, ARM

Five Families of ARM Processor IP

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory

Introduction to Cloud Computing

OC By Arsene Fansi T. POLIMI

Lessons from the ARM Architecture. Richard Grisenthwaite Lead Architect and Fellow ARM

SBC8100 Single Board Computer

SPARC64 VIIIfx: CPU for the K computer

Reconfigurable System-on-Chip Design

NVIDIA GeForce GTX 580 GPU Datasheet

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Introduction to Microprocessors

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

Network connectivity controllers

Performance of Host Identity Protocol on Nokia Internet Tablet

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Embedded System Hardware - Processing (Part II)

Video/Cameras, High Bandwidth Data Handling on imx6 Cortex-A9 Single Board Computer

Embedded Linux development with Buildroot training 3-day session

Intel Pentium 4 Processor on 90nm Technology

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Migrating Application Code from ARM Cortex-M4 to Cortex-M7 Processors

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Designing Feature-Rich User Interfaces for Home and Industrial Controllers

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Innovative development and deployment of Intuitive Human Machine Interface for embedded applications

element14 BeagleBone Black

GPU Architecture. Michael Doggett ATI

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

A case study of mobile SoC architecture design based on transaction-level modeling

Cisco MCS 7816-I3 Unified Communications Manager Appliance

IA-64 Application Developer s Architecture Guide

social networking and use of media and portable device to manage digital operating system and offer a series of applications for web browsing,

Computer Graphics Hardware An Overview

Embedded Display Module EDM6070

STM32 F-2 series High-performance Cortex-M3 MCUs

Product Specifications. Digital Signage Player DSA2LS. Feature Highlights. ARM based Digital Signage Player.

"JAGUAR AMD s Next Generation Low Power x86 Core. Jeff Rupley, AMD Fellow Chief Architect / Jaguar Core August 28, 2012

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

Product Specifications. Digital Signage Player DSA2LS. Feature Highlights. ARM based Digital Signage Player.

Chapter 13. PIC Family Microcontroller

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

picojava TM : A Hardware Implementation of the Java Virtual Machine

Product Brief. 2.0 microtoled. Intelligent GOLDELOX Display Module. µtoled-20-g2. Rev 1.0

OpenSPARC T1 Processor

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Qt on Raspberry Pi. Jeff Tranter Integrated Computer Solutions (ICS) Qt Developer Days

Details of a New Cortex Processor Revealed. Cortex-A9. ARM Developers Conference October 2007

Week 1 out-of-class notes, discussions and sample problems

PROBLEMS #20,R0,R1 #$3A,R2,R4

Radeon HD 2900 and Geometry Generation. Michael Doggett

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

CHAPTER 7: The CPU and Memory

Applications Development on the ARM Cortex -M0+ Free On-line Development Tools Presented by William Antunes

Am186ER/Am188ER AMD Continues 16-bit Innovation

Embedded Development Tools

7a. System-on-chip design and prototyping platforms

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

IMAGE SIGNAL PROCESSING PERFORMANCE ON 2 ND GENERATION INTEL CORE MICROARCHITECTURE PRESENTATION PETER CARLSTON, EMBEDDED & COMMUNICATIONS GROUP

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

FPGA-Accelerated Heterogeneous Hyperscale Server Architecture for Next-Generation Compute Clusters

LAPTOP PROCESSOR GUIDE That is why laptop processor guide 2012 guides are far superior than the pdf guides.

Transcription:

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK

Single Board Computer Low Cost Small Size (Credit Card) Community Supported Open Sourced (No Licenses Involved with BeagleBone materials) Cortex-A8 ARM Processor from Texas Instruments Simple to setup your own board (Linux comes installed) Website: http://beagleboard.org/black Wiki: http://elinux.org/beagleboard:beagleboneblack

http://beagleboard.org/static/images/black_hardware_details.png

http://cdn.arstechnica.net/wp-content/uploads/2013/04/beaglebone-specs.png

SPECS AM335x 1GHz ARM Cortex-A8 Processor 512MB DDR3 RAM 4GB 8-bit emmc on-board flash storage 3D graphics accelerator NEON floating-point accelerator 2x PRU 32-bit microcontrollers Latest: Rev C

TECHNOLOGIES Cortex-A8 Technologies TrustZone Security Jazelle RCT Acceleration / Thumb 2EE Instruction Set Thumb-2 Instruction Set NEON Advanced SIMD(+VFPv3) Superscalar ARMv7 Core Description Device Integrity / Secure Transactions Fast & Responsive Java Applications Greater Performance With Less Code Size Enhanced Multimedia Experience Highest-performance mobile processor http://processors.wiki.ti.com/index.php/cortex-a8_features

NEON SIMD (Single Instruction Multiple Data) accelerator processor 64/128-bit Hybrid SIMD architecture single instruction performs the same operation on multiple elements that are packed within registers Parallel architecture Two Integer 64-bit ALUs Fully pipelined http://processors.wiki.ti.com/index.php/cortex-a8_features

NEON Two Integer 64-bit ALUs operating in parallel Can perform 128-bit length equivalent ALU operation in 1 cycle Supports 128-bit data streaming from both L1D$ and L2$ Byte permute function allows for on-the-fly data shuffling Two Integer Multipliers of 32x16 Each can perform one 32x16, two 16x16 or four 8x8 operations in a single pass Support 32x32 operation in two passes http://processors.wiki.ti.com/index.php/cortex-a8_features

SMID LOAD/STORE Native support for structures (e.g. complex numbers, pixels, coordinates) Memory treated as an array of structures (AoS) Eliminates shuffling overhead Optimised memory access as single transfer Data arranged for efficient SIMD processing http://processors.wiki.ti.com/index.php/cortex-a8_features

BENEFITS OF NEON + both aligned and unaligned data access + efficient vectorization of SIMD operations + both Integer and FP + broader range of applications (compression decoding, 3D graphics) + tightly coupled to ARM core + single instruction stream & unified view of memory + single development platform target + simpler tool flow + large register file with multiple views + efficient handling of data & minimizes memory accesses + better throughput performance

CORTEX-A8

http://www.ti.com/lit/ds/symlink/am3358.pdf HIGH-LEVEL BLOCK DIAGRAM

http://www.ti.com/lit/ds/symlink/am3358.pdf

http://www.ti.com/lit/ug/spruh73l/spruh73l.pdf MPU SUBSYSTEM

ARM CORTEX-A8 MPU SUBSYSTEM ARMv7 and Thumb 2 ISA ISA Efficiency = 2.01 DMIPS/MHz dual-issue, in-order execution engine Integrated L1 and L2 caches with NEON SIMD (Single Instruction, Multiple Data) Media Processing Unit Static Scheduling with Instruction Replay on Memory Stall Fire-And-Forget Issue

CACHE Split Level 1 Caches - Instruction and Data o o o Both 16 kb 4-way Set Associative Single Cycle Load-Use Penalty Unified Level 2 Cache o o o o 256 kb 8-Way Set Associative Minimum Latency - 8 Cycles High BW Interface to L1 Cache

MEMORY: GPMC Fully pipelined Onboard flash: 4GB, 8-bit embeded MMC SDRAM Mem: 512MB DDR3L 800MHz

PIPELINE dual-issue, in-order

CONTROL 95% Accuracy in Dynamic Branch Prediction Dynamic branch predictor components 512-entry 2-way BTB 4K-entry GHB indexed by branch history and PC 8-entry return stack Branch resolution resolved in single stage maintains speculative and non-speculative versions of branch history and return stack

INSTRUCTION DECODE

INSTRUCTION DECODE 4 entry pending queue decreases fetch stalls increases pairing opportunities replay queue: keeps instructions for reissue on memory system stall scoreboard: static scheduling to predict register availability cross-checks in D3 allow issue of dependent instruction pairs

INSTRUCTION EXECUTE

2 symmetric ALU pipelines: Shift/ALU/SAT Load/store pipe used by instructions in either pipeline Multiply instructions are tied to pipe 0 All key forwarding paths supported Static scheduling allows for extensive clock gating

PIPELINE

10 STAGE NEON PIPELINE

NEON PIPELINE Instruction issue static scheduling with fire-and-forget issue 1 LS + 1 NINT/NFP can issue each cycle Execution pipelines All pipelines are 64-bit SIMD Floating-point MAC executed using both FADD and FMUL pipelines

NEON: INTERFACING

NEON Block Diagram

NEON 16-Entry Instruction queue Dual view register file 32 x 64-bit 16 x 128-bit 6 Stage execution Pipeline Integer Single precision floating point Load store permute Non-pipelined IEEE vector floating point support 12 Entry load data queue

http://www.ti.com/lit/ds/symlink/am3358.pdf

PRU-ICSS Programmable Real Time Unit Subsystem and Industrial Communication Subsystem Dual PRUs Three 120-byte register banks accessible by each Supports e.g. EtherCAT, PROFIBUS, PROFINET, EtherNet/IP 12KB of Shared RAM With Single-Error Detection (Parity) UART port ecap module Dual MII Ethernet Ports Single NDIO Port http://www.ti.com/lit/ug/spruh73l/spruh73l.pdf

POWER TPS65217C PMIC regulator + LDO

BEAGLEBONE BLACK VS RASPBERRY PI Base Cost $45 $35 Processor Speed 1 GHz 700 MHz GPIO 65 pins 8 pins Power Consumption 210-460 ma @ 5V 260-350 ma @ 5V Onboard Storage 4 GB, SD SD card Cache L1-32 kb L2-256 kb L1-16 kb L2-128 kb ISA Efficiency 2.01 DMIPS/MHz 1.25 DMIPS/MHz RAM 512 MB DDR3L 512 MB SDRAM Video Connections Micro HDMI HDMI, Composite Audio Connections HDMI HDMI, 3.5 mm Jack

WHEN TO USE THE BEAGLEBONE BLACK... o o o o o o Processing Speed is important Constrained Size Requirements Projects with Many Hardware Connections Projects that may be commercialized o Open Sourced Non-Media Heavy Projects o Raspberry Pi is a slightly better option here Simple Startup o Linux Distro already installed

REFERENCE AM335x Sitara Processors Technical Reference Manual (Rev. L) http://www.ti.com/lit/ug/spruh73l/spruh73l.pdf AM335x Sitara Processors (Rev. H) http://www.ti.com/lit/ds/symlink/am3358.pdf NEON & VFP http://processors.wiki.ti.com/index.php/cortex-a8 Cortex A8 Arch http://processors.wiki.ti.com/index.php/cortex-a8_architecture A8 NEON Arch http://processors.wiki.ti.com/index.php/cortex-a8_neon_architecture logo (w/ raspi) from http://www.itclips.net/wp-content/plugins/rssposter/cache/ef657_bbvrpi-e1374615365680.png logo (larger, w/ wifi) from https://fleshandmachines.files.wordpress.com/2012/07/beagle.png

QUESTIONS?