Xilinx Zynq-7000 EPP An Extensible Processing Platform Family

Similar documents
All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

FPGA-Accelerated Heterogeneous Hyperscale Server Architecture for Next-Generation Compute Clusters

7 Series FPGA Overview

7a. System-on-chip design and prototyping platforms

System Design Issues in Embedded Processing

System Performance Analysis of an All Programmable SoC

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

ARM Webinar series. ARM Based SoC. Abey Thomas

Model-based system-on-chip design on Altera and Xilinx platforms

Reconfigurable System-on-Chip Design

Hybrid Platform Application in Software Debug

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Extending the Power of FPGAs. Salil Raje, Xilinx

ARM Cortex-A9 MPCore Multicore Processor Hierarchical Implementation with IC Compiler

What is a System on a Chip?

Attention. restricted to Avnet s X-Fest program and Avnet employees. Any use

Qsys and IP Core Integration

Bare-Metal, RTOS, or Linux? Optimize Real-Time Performance with Altera SoCs

High Performance or Cycle Accuracy?

Accelerate Cloud Computing with the Xilinx Zynq SoC

WiSER: Dynamic Spectrum Access Platform and Infrastructure

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

High-Level Synthesis for FPGA Designs

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

Building Blocks for PRU Development

ZigBee Technology Overview

Building an Embedded Processor System on a Xilinx Zync FPGA (Profiling): A Tutorial

FPGA Design From Scratch It all started more than 40 years ago

Getting Started with Embedded System Development using MicroBlaze processor & Spartan-3A FPGAs. MicroBlaze

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

SABRE Lite Development Kit

Architectures and Platforms

Easy H.264 video streaming with Freescale's i.mx27 and Linux

Eli Levi Eli Levi holds B.Sc.EE from the Technion.Working as field application engineer for Systematics, Specializing in HDL design with MATLAB and

Introduction to AMBA 4 ACE and big.little Processing Technology

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

ARM Processors and the Internet of Things. Joseph Yiu Senior Embedded Technology Specialist, ARM

ARM Cortex -A8 SBC with MIPI CSI Camera and Spartan -6 FPGA SBC1654

OpenSoC Fabric: On-Chip Network Generator

Am186ER/Am188ER AMD Continues 16-bit Innovation

Pre-tested System-on-Chip Design. Accelerates PLD Development

LogiCORE IP AXI Performance Monitor v2.00.a

Zynq SATA Storage Extension (Zynq SSE) - NAS. Technical Brief from Missing Link Electronics:

Designing a System-on-Chip (SoC) with an ARM Cortex -M Processor

Kirchhoff Institute for Physics Heidelberg

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Concept Engineering Adds JavaScript-based Web Capabilities to Nlview at DAC 2016

FPGAs in Next Generation Wireless Networks

XA Zynq-7000 All Programmable SoC Overview

Hardware accelerated Virtualization in the ARM Cortex Processors

Data Center and Cloud Computing Market Landscape and Challenges

Simplifying Embedded Hardware and Software Development with Targeted Reference Designs

ARM Processors for Computer-On-Modules. Christian Eder Marketing Manager congatec AG

Introduction to Digital System Design

AppliedMicro Trusted Management Module

Network connectivity controllers

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Atmel SMART ARM Core-based Embedded Microprocessors

Which ARM Cortex Core Is Right for Your Application: A, R or M?

EEM870 Embedded System and Experiment Lecture 1: SoC Design Overview

System Considerations

Memory Architecture and Management in a NoC Platform

Open Flow Controller and Switch Datasheet

Use-Case Power Management Optimization: Identifying & Tracking Key Power Indicators ELC- E Edimburgh, Patrick Ti:ano

Going Linux on Massive Multicore

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

Standardization with ARM on COM Qseven. Zeljko Loncaric, Marketing engineer congatec

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

Architectures, Processors, and Devices

STM32 F-2 series High-performance Cortex-M3 MCUs

NORTHEASTERN UNIVERSITY Graduate School of Engineering. Thesis Title: CRASH: Cognitive Radio Accelerated with Software and Hardware

ARM Cortex STM series

The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links. Filippo Costa on behalf of the ALICE DAQ group

Virtualized Execution and Management of Hardware Tasks on a Hybrid ARM-FPGA Platform

LSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID

VPX Implementation Serves Shipboard Search and Track Needs

Networking Virtualization Using FPGAs

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Embedded Linux RADAR device

Eureka Technology. Understanding SD, SDIO and MMC Interface. by Eureka Technology Inc. May 26th, Copyright (C) All Rights Reserved

How to Use Interrupts on the Zynq SoC

Debug and Trace for Multicore SoCs How to build an efficient and effective debug and trace system for complex, multicore SoCs

High-Performance, Highly Secure Networking for Industrial and IoT Applications

Using a Generic Plug and Play Performance Monitor for SoC Verification

Nutaq. PicoDigitizer 125-Series 16 or 32 Channels, 125 MSPS, FPGA-Based DAQ Solution PRODUCT SHEET. nutaq.com MONTREAL QUEBEC

Video/Cameras, High Bandwidth Data Handling on imx6 Cortex-A9 Single Board Computer

DDR subsystem: Enhancing System Reliability and Yield

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Full Power Domain SLCR (FPD_SLCR)

Chapter 1 Computer System Overview

Chapter 13. PIC Family Microcontroller

1. Memory technology & Hierarchy

Scaling from Datacenter to Client

FPGA Music Project. Matthew R. Guthaus. Department of Computer Engineering, University of California Santa Cruz

Low Power AMD Athlon 64 and AMD Opteron Processors

PCI Express and Storage. Ron Emerick, Sun Microsystems

9/14/ :38

PCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation

Transcription:

Xilinx Zynq-7000 EPP An Extensible Processing Platform Family Vidya Rajagopalan, Vamsi Boppana, Sandeep Dutta, Brad Taylor, Ralph Wittig August 18, 2011

Agenda Xilinx Series 7 Highlights Zynq-7000 EPP Architecture & Silicon Zynq-7000 Software & Applications Summary Page 2

Xilinx 7 Series Highlights 7 Series silicon devices 28 nm Technology, TSMC HPL process 50% reduction in power over 40 nm devices 3 FPGA Fabrics Artix = Low cost, low power FPGA ( 1W FPGA ) Design Green by Xilinx Kintex = Density & performance FPGA ( Market Sweet spot ) Virtex = Highest density and performance FPGA ( More than Moore ) More than Moore density increase Up to 2M logic cells Using Stacked Silicon Interconnect Technology (SSIT) Improved GT bandwidth GT bandwidth increased to 28 GHz Zynq Embedded Processing Platform (EPP) Page 3

More Than Moore Xilinx Stacked Silicon Interconnect Technology Microbumps Access to power / ground / IOs Access to logic regions Leverages ubiquitous image sensor Through-silicon micro-bump technology Vias (TSV) Only bridge power / ground / IOs to C4 bumps Coarse pitch, low density aids manufacturability Etch process (not laser drilled) Passive Silicon Interposer (65nm Generation) 4 conventional metal layers connect micro bumps & TSVs No transistors means low risk and no TSV induced performance degradation Side-by-Side Die Layout Minimal heat flux issues Minimal design tool flow impact 28nm FPGA Slice 28nm FPGA Slice 28nm FPGA Slice 28nm FPGA Slice Microbumps Silicon Interposer Through Silicon Vias Package Substrate C4 Bumps BGA Balls Page 4

Agenda Xilinx Series 7 Highlights Zynq-7000 EPP Architecture & Silicon Zynq-7000 Software & Applications Summary Page 5

Zynq-7020 Device Processor System (PS) ARM Cortex-A9 MPcore Standard Peripherals 32-bit DDR3 / LPDDR2 controller 54 Multi-Use IOs 73 DDR IOs Programmable Logic (PL) 85 K Logic Cells 106K FFs 140 32-Kb Block RAM 220 DSP Blocks Dual 12-bit ADC Secure configuration engine 4 Clock Management Tiles 200 Select IO (1.2-3.3V) Secure Configuration Processor System Processor System Programmable Logic Select IOs DSP BRAM XADC CMT Page 6

Zynq-7000 Device Family Zynq-7000 EPP Devices Z-7010 Z-7020 Z-7030 Z-7040 Processing System Processor Core Processor Extensions Max Frequency Memory External Memory Support Peripherals Dual ARM Cortex -A9 MPCore NEON & Single / Double Precision Floating Point 800MHz L1 Cache 32KB I / D, L2 Cache 512KB, on-chip Memory 256KB DDR3, DDR2, LPDDR2, 2x QSPI, NAND, NOR 2x USB 2.0 (OTG), 2x Tri-mode Gigabit Ethernet, 2x SD/SDIO, 2x UART, 2x CAN 2.0B, 2x I2C, 2x SPI, 4x 32b GPIO Programmable Logic I/O Approximate ASIC Gates ~430K (30k LC) ~1.3M (85k LC) ~1.9M (125k LC) ~3.5M (235k LC) Extensible Block RAM 240KB 560KB 1,060KB 1,860KB Peak DSP Performance (Symmetric FIR) 58 GMACS 158 GMACS 480 GMACS 912 GMACS PCI Express (Root Complex or Endpoint) - Gen2 x4 Gen2 x8 Agile Mixed Signal (XADC) 2x 12bit 1Msps A/D Converter Processor System IO 130 Multi Standards 3.3V IO 100 200 100 200 Multi Standards High Performance 1.8V IO - - 150 150 Multi Gigabit Transceivers - - 4 12 Page 7

Zynq-7000 Processor System (PS) Dual Core Cortex ARM A9 NEON, 512 KB L2 cache 256 KB On-Chip-Memory (OCM) DDR Interface DDR3 Performance High BW utilization Config & Legacy Memory I/F Quad-SPI, NOR, NAND Standard Peripherals GigE Available to PS IO or to Programmable Logic System Level Peripherals Clock generation, Counter Timers 8 Channel DMA controller Coresight Debugging PS Peripherals can be multiplexed onto 54 external Multi-Use-IOs (MIO) MIO[53:0] GigE (2) USB (2) SPI (2) I2C (2) UART (2) CAN (2) GPIO PLL(3) Coresight Config Config PS Peripherals can also be routed through Security the Programmable XADC Logic Zynq 7000 EPP Processing System Cortex A9 NEON 32KB I$/D$ SCU Cortex A9 NEON 32KB I$/D$ Timers, AWDT, GIC, ACP L2 Cache 512KB OCM 256 KB DMA TTC SWDT AMBA AXI Interconnect Programmable Logic Quad SPI CTRL Parallel CTRL NAND CTRL DDR CTRL SDIO (2) General Purpose ACP High Performance Select IO GTs PCIe LPDDR2 DDR2 DDR3 32-bit NOR NAND Quad-SPI Page 8

Zynq-7000 Programmable Logic (PL) Programmable Logic Resources 30K 235 K Logic Cells Dedicated 36 K-bit BRAMs, DSP, CMT XADC dual channel 12-bit ADC Up to 12 GTs with PCIe hard core Up to 300 Select IOs Programmable Logic AXI Interfaces Multiple 32/64 bit AXI interfaces to PL Accelerator Coherency Port (ACP) with access to caches Programmable Logic System Interfaces Interrupts, DMA control Debug High Performance PL Configuration Security Decryption Engine Under 200 ms configuration time from flash Debugging interfaces GigE (2) USB (2) SPI (2) I2C (2) UART (2) CAN (2) GPIO PLL(3) Coresight Config Config Security XADC Zynq 7000 EPP Processing System Cortex A9 NEON 32KB I$/D$ SCU Cortex A9 NEON 32KB I$/D$ Timers, AWDT, GIC, ACP L2 Cache 512KB OCM 256 KB DMA TTC SWDT AMBA AXI Interconnect Programmable Logic Quad SPI CTRL Parallel CTRL NAND CTRL DDR CTRL SDIO (2) General Purpose ACP High Performance Select IO GTs PCIe DDR3 32-bit 4-8 GB/sec Page 9

Customizing Zynq Tools for the Programmable Logic System Builder Clocking Flexible clock sources (PS or PL) Simple clock interfaces Memory and Peripheral access PL access to all memory: Caches, OCM, DDR 2 dedicated DDR ports ensure bandwidth PL access to all peripherals in PS Interconnect AXI Interconnect IP available from Xilinx Optimized for FPGA implementation Debug and Misc. Bidirectional cross-triggers (Coresight and Chipscope) 16 general purpose interrupts from PL to PS Simple Clock Interfaces Relative Latency 3.0 2.5 2.0 1.5 1.0 0.5 D Q Q D To PL Zynq-7000 DDR Utilization vs Latency 0.0 0% 20% 40% 60% 80% 100% DDR Utilization PL clock From PL Page 10

SW managed Programmable Logic (PL) Linux based, remote controlled, programmable logic SW user experience: SoC with integrated PL Configure PL (full and partial) Start/stop & single step clocks Setup & update HW triggers Monitor HW performance counters Observe & sync to PL hardware events PS ARM Coresight H/W Events Cross Trigger PL Config clocks Page 11

Zynq-7000 Power Saving Features Low power 1.0V HPL 28 nm process silicon technology Programmable Logic can be powered off and on as needed 40-90% reduction in static power depending on device Very fast configuration times when loaded from DRAM Low power ARM Cortex-A9 MP Incorporates clock gating and power-down modes Support for LPDDR2 devices Ultra low power self refresh Peripherals shutdown Design Green by Xilinx Page 12

Engineering Insights Process Selection Criteria Power Frequency Process Considerations: 28 HP: Highest Performance HKMG Process (but must be able to afford power; e.g. GPU) 28 HPL: Low power HKMG process (shifts down HP power / performance range) 28 LP: No HKMG low power process (cheaper than HPL, but less performance) Xilinx Reasons for Selecting HPL: Higher performance than LP (at same power level) Higher performance vs HP at FPGA TDP (or lower power at same performance) Power Frequency Page 13

Engineering Insights Finding The Frequency Sweet Spot (within the HPL Process) 2.00 1.50 1.00 Normalized Power vs. Frequency 0.50 450 550 650 750 Worst Setup -40c Hold Timing Histograms Vt usage Typical High Vt Med Vt Low Vt Page 14 Delay Variation 2 1 0 Normalized Path Delay Max Nom Min

Engineering Insights Configuring Interconnect CPU: CPU 800 MHz FPGA 200 Mhz Switch OCM 400 MHz FPGA: OCM: 1 1 1 1 1 1 1 2 3 -stalled- 3 1 1 1 1 1 1 1 1 2 2 2 CPU: CPU 800 MHz FPGA 200 Mhz Switch Threshold OCM 400 MHz FPGA: OCM: 1 1 1 1 1 release 1 2 1 3 1 1 1 1 2 2 2 2 1 1 1 1 3 15 Copyright Copyright 2011 2011 Xilinx Xilinx

Agenda Xilinx Series 7 Highlights Zynq-7000 EPP Architecture & Silicon Zynq-7000 Software & Applications Summary Page 16

Zynq-7000 Use Cases #1 #2 #3 Embedded Control Fabric Datapath SW Acceleration ARM CPU ARM CPU ARM CPU Memory Peripheral Peripheral Datapath Accelerator Ex: Motor Control Ex: Static Video Stream Ex: Interactive Image Processing Use Case #1 Access peripheral configuration registers Use Case #2 Access datapath configuration registers Access datapath memory (coefficient tables) Use Case #3 Low latency/high bandwidth shared work spaces Move data between SW and HW domains Page 17

Application Programming Using Only C Application C/C++ Device Information SW-Centric Design Environment High-Level Synthesis via AutoESL Binary for CPU Bitstream for PL fabric CPU Memory Data Movement Interconnect FPGA Fabric Video Codec Encryption LTE Modem

AutoESL Generated Accelerators C-Based, High-Level Synthesis Tools at Xilinx Application Example: Back Projection Algorithm (recreate CT scan images from samples) gprof Locate SW hot spot function(s) on ARM AutoESL Synthesize hot spot function(s) to HW/PL 52 Floating Point Operators @ 200Mhz Fits in lowest cost Zynq 7010 device 3X Performance vs SW only Page 19

Engineering Insights Data Movement Gotchas Processor Host Program Engine Matrix Mult Memory 1 Memory 2 Memory 3 Using CPU Programmed IO IO dominates accelerator compute time Page 20

Engineering Insights Data Movement Gotchas Processor Engine Memory 1 Memory 2 Memory 3 Using DMA DMA setup time dominates (white space between green bars) Page 21

Xilinx Evolution Towards Multicore Legacy Logic BRAM DSP Glue MicroBlaze ub ub acc1 acc2 Soft Multi-Core + Accelerators ZYNQ Dual A9 acc1 acc2 GPP + Accelerators Future Zynq Next Gen GPP MC Array PE PE PE PE acc1 acc2 GPP + Multicore + Soft Processing Engine + Accelerators Multicore programming models being ported to Zynq Page 22

Agenda Xilinx Series 7 Highlights Zynq 7000 EPP Architecture & Silicon Zynq 7000 Software & Applications Summary Page 23

Summary Zynq SoC Device Family with Integrated Programmable Logic $15 Price Point* / 28nm Fab Process Microcontroller and Accelerator Use Models Industry Standard Tools (ARM Ecosystem, Android, ISE) Emerging Tools (AutoESL, Multicore) Emulation platforms in use for prototyping Available 1H 2012 Android on Zynq emulation board Source: iveia LLC Page 24 * High volume price for smallest device and package, slowest speed grade