ARM Cortex-A9 MPCore Multicore Processor Hierarchical Implementation with IC Compiler



Transcription:

ARM Cortex-A9 MPCore Multicore Processor Hierarchical Implementation with IC Compiler
DAC 2008
Philip Watson, Implementation Environment Program Manager, ARM Ltd

Background - Who Are We?
Processor Division, Cores Implementation, ARM-India.
This team is actively involved in processor development benchmarking.
The team has been working alongside the development of the microarchitecture of the ARM Cortex-A9 processor since early development and test.
The outcome of this effort is to showcase:
  Power consumption
  Performance
  Area
The effort is focused on making the Cortex-A9 processor core a deployable embedded solution.

Partnership Through the Design Chain
[Diagram: Processors, Reference Methodology, Fabric & EDA Tools, Physical IP, Mutual Customers]
The RM ties all this together, piloting the route from RTL to silicon.
The CPU is at the heart of the system-on-chip.
We work with major EDA companies to ensure our IP works seamlessly.
We partner with silicon foundries to provide diversity of SoC implementation and manufacturing choice.
EDA tools provide the environment to exploit this IP.
SoCs require high-performance fabric and quality physical IP.

Cortex-A9 MPCore Multicore Solutions
The relative performance and power range of an ARM processor enabled by its ARM Physical IP.
[Figure: frequency (MHz) versus power (mW) for three platforms]
  Mainstream Platform
  Performance Platform: 15% CPU performance boost
  Density Optimized Platform: 15% lower power, higher density

Challenges with Cortex-A9 MPCore
Implementation run time with all EDA tools is a key challenge for design closure, particularly with scalable-performance processor designs.
Iteration time increases as the design size increases.
The iterations influence our ability to turn around floorplan changes, tailor optimizations, and debug constraints and design feedback; this is key to converging results.
[Figure: gate count and run time for A9 MP 1x with Neon, A9 MP 2x with Neon, and A9 MP 4x with Neon]

Challenges with Cortex-A9 MPCore
Implementation of 1 CPU vs 4 CPU Cortex-A9 with flat flow

Configuration: 1CPU, 1 Neon, 32K D$, 32K I$, 32 interrupts | 4CPU, 4 Neon, 32K D$, 32K I$, 32 interrupts
Process Technology: TSMC CLN65LP | TSMC CLN65LP
Standard Cell Library: 12Track Nominal VT | 12Track Nominal VT
Memory Library: Optimized fast cache instances | Optimized fast cache instances

The 4 CPU solution gives:
  A significant increase in run time
  Potentially some drop in performance (frequency) compared to a 1 CPU implementation

Hierarchical Implementation with IC Compiler
For faster TTR
[Diagram: Cortex-A9 MPCore split into four CPU blocks plus the top level]
  Cortex-A9 cpu0: Placement (X Hrs), CTS (Y Hrs), Routing (Z Hrs)
  Cortex-A9 cpu1: Placement (X Hrs), CTS (Y Hrs), Routing (Z Hrs)
  Cortex-A9 cpu2: Placement (X Hrs), CTS (Y Hrs), Routing (Z Hrs)
  Cortex-A9 cpu3: Placement (X Hrs), CTS (Y Hrs), Routing (Z Hrs)
  Cortex-A9 top only: Placement (A Hrs), CTS (B Hrs), Routing (C Hrs)
Total Run Time = X + Y + Z + C Hrs
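The run-time total on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the ARM or Synopsys flow. It assumes the four CPU blocks are implemented in parallel with identical stage times, which is one reading of why the slide counts X, Y, and Z only once and adds only the top-level routing time C. The serial variant, the function names, and the hour values are hypothetical and serve only as a contrast.

```python
def hierarchical_run_time(x, y, z, c):
    """Total from the slide: X + Y + Z + C hours.

    Assumption: the four CPU blocks run in parallel, so block
    placement (X), CTS (Y) and routing (Z) are counted once, and
    only top-level routing (C) is added on top.
    """
    return x + y + z + c


def serial_run_time(x, y, z, a, b, c, n_blocks=4):
    """Hypothetical contrast: every block and the top level run back to back."""
    return n_blocks * (x + y + z) + a + b + c


# Made-up hour values, for illustration only.
X, Y, Z = 3.0, 1.0, 2.0   # per-CPU placement, CTS, routing
A, B, C = 1.0, 0.5, 1.5   # top-level placement, CTS, routing

parallel_hours = hierarchical_run_time(X, Y, Z, C)
serial_hours = serial_run_time(X, Y, Z, A, B, C)
print(parallel_hours, serial_hours)
```

With these made-up inputs the parallel total is far below the serial one, which is the turnaround benefit the slide is pointing at.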

Hierarchical Implementation with IC Compiler
Steps involved
  SDC & ScanDef
  Floorplanning
  Create Physical Partition
  Partition Aware Place
  Power Network Synthesis
  Power Network Analysis
  In-Place Optimization
  Clock Planning
  Pin Assignment
  Budgeting
  Commit Blocks
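The steps listed on this slide can be treated as an ordered checklist. The Python sketch below is only an illustration of that idea: the step names are copied from the slide in the order they appear in the transcript (which may not match the exact tool flow), while `run_flow` and the handler dictionary are hypothetical helpers, not IC Compiler commands.

```python
# Step names as transcribed from the slide; the order is assumed to follow the slide.
HIER_STEPS = [
    "SDC & ScanDef",
    "Floorplanning",
    "Create Physical Partition",
    "Partition Aware Place",
    "Power Network Synthesis",
    "Power Network Analysis",
    "In-Place Optimization",
    "Clock Planning",
    "Pin Assignment",
    "Budgeting",
    "Commit Blocks",
]


def run_flow(steps, handlers):
    """Run steps in order and stop at the first one whose handler returns False.

    Steps without a handler are treated as passing. Returns the list of
    steps that completed.
    """
    completed = []
    for step in steps:
        handler = handlers.get(step, lambda: True)
        if not handler():
            break
        completed.append(step)
    return completed


# Hypothetical scenario: pin assignment fails, so later steps never run.
handlers = {"Pin Assignment": lambda: False}
done = run_flow(HIER_STEPS, handlers)
print(done[-1])
```

Stopping at the first failed step mirrors how a block would not be committed until the earlier planning steps have succeeded.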

Cortex-A9 MPCore Multicore Solutions
The relative performance and power range of an ARM processor enabled by its Artisan physical IP.
Cortex-A9 Hierarchical Flow (with IC Compiler)
[Figure: frequency (MHz) versus power (mW) for three platforms]
  Mainstream Platform
  Performance Platform: 15% CPU performance boost
  Density Optimized Platform: 15% lower power, higher density

Hierarchical Implementation with IC Compiler
Results
Implementation of 1 CPU Cortex-A9 flat vs 4 CPU Cortex-A9 hierarchical flow

Configuration: 4CPU, 4 Neon, 32K D$, 32K I$, 32 interrupts | 4CPU, 4 Neon, 32K D$, 32K I$, 32 interrupts
Process Technology: TSMC CLN65LP | TSMC CLN65LP
Standard Cell Library: 12Track Nominal VT | 12Track Nominal VT
Memory Library: Optimized fast cache instances | Optimized fast cache instances
Implementation flow: Flat | Hierarchical

[Figure: gate count and hierarchical run time for A9 MP 1x with Neon, A9 MP 2x with Neon, and A9 MP 4x with Neon]

The 4 CPU implemented with a hierarchical flow gives:
  Comparable QoR in performance (frequency)
  25% additional run time when compared to a 1CPU flat implementation
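The 25% figure is easy to turn into concrete numbers. The short Python sketch below applies the stated overhead to a baseline; the 8-hour baseline and the function name are hypothetical, since the slides do not give absolute run times.

```python
def hierarchical_4cpu_hours(flat_1cpu_hours, overhead=0.25):
    """Run time of the 4-CPU hierarchical flow, given the 1-CPU flat run time.

    The default overhead of 0.25 is the 25% additional run time
    reported on the slide.
    """
    return flat_1cpu_hours * (1.0 + overhead)


baseline = 8.0  # hypothetical 1-CPU flat run time in hours
result = hierarchical_4cpu_hours(baseline)
print(result)
```

Under this assumed baseline, a quad-core hierarchical run would take 10 hours against 8 hours for the single-core flat run.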

Next Steps
Efficient handling of Multiply Instantiated Modules (MIM) for symmetric cores

Summary
Hierarchical flow delivers much faster iteration time with no loss of QoR
Simple and effective strategy to implement a multicore processor
Reduction in high-memory cluster requirements
Lends itself very well to low power partitioning
  Advanced low power management such as State Retention Power Gating
  Leakage mitigation by power shutdown if the hardware is not being utilized
Easily deployable for the partner base (estimated by end of 2008)
  In an ARM-Synopsys iRM (implementation Reference Methodology) with:
    Floorplan
    Tcl scripts (complete flow from RTL to GDSII)
    Physical IP libraries
    ARM documentation: Core Signoff Guide
  providing an out-of-box solution from ARM