High Performance or Cycle Accuracy?

Similar documents
Embedded Development Tools

Hybrid Platform Application in Software Debug

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015

Hardware accelerated Virtualization in the ARM Cortex Processors

ARM Processors and the Internet of Things. Joseph Yiu Senior Embedded Technology Specialist, ARM

Hardware Virtualization for Pre-Silicon Software Development in Automotive Electronics

BY STEVE BROWN, CADENCE DESIGN SYSTEMS AND MICHEL GENARD, VIRTUTECH

Multi-/Many-core Modeling at Freescale

big.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Which ARM Cortex Core Is Right for Your Application: A, R or M?

Applied Micro development platform. ZT Systems (ST based) HP Redstone platform. Mitac Dell Copper platform. ARM in Servers

FPGA Prototyping Primer

Codesign: The World Of Practice

Attention. restricted to Avnet s X-Fest program and Avnet employees. Any use

Visualizing gem5 via ARM DS-5 Streamline. Dam Sunwoo ARM R&D December 2012

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Architectures and Platforms

MPSoC Designs: Driving Memory and Storage Management IP to Critical Importance

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

7a. System-on-chip design and prototyping platforms

MPSoC Virtual Platforms

ARM Webinar series. ARM Based SoC. Abey Thomas

big.little Technology: The Future of Mobile Making very high performance available in a mobile envelope without sacrificing energy efficiency

Virtual Platforms Addressing challenges in telecom product development

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

ARM Architecture for the Computing World: From Devices to Cloud. Roy Chen Director, Client Computing ARM

H MICRO CASE STUDY. Device API + IPC mechanism. Electrical and Functional characterization of HMicro s ECG patch

Application Performance Analysis of the Cortex-A9 MPCore

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

World-wide University Program

OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE. Guillène Ribière, CEO, System Architect

Early Hardware/Software Integration Using SystemC 2.0

Model-based system-on-chip design on Altera and Xilinx platforms

White Paper. S2C Inc Technology Drive, Suite 620 San Jose, CA 95110, USA Tel: Fax:

Electronic system-level development: Finding the right mix of solutions for the right mix of engineers.

Comprehensive Security for Internet-of-Things Devices With ARM TrustZone

System Software Integration: An Expansive View. Overview

Update on big.little scheduling experiments. Morten Rasmussen Technology Researcher

TLM-2.0 in Action: An Example-based Approach to Transaction-level Modeling and the New World of Model Interoperability

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

Network connectivity controllers

Introduction to AMBA 4 ACE and big.little Processing Technology

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Data Center and Cloud Computing Market Landscape and Challenges

Xeon+FPGA Platform for the Data Center

CoreSight SoC enabling efficient design of custom debug and trace subsystems for complex SoCs

9/26/2011. What is Virtualization? What are the different types of virtualization.

Sierraware Overview. Simply Secure

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

Productivity, Predictability, and Use-Model Versatility: The Three Key Care-Abouts of Choosing Hardware- Assisted Verification

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

Embedded Software Development: Spottbillige Hardware + OSS = Zum Spielen zu Schade!

ARM Cortex-A9 MPCore Multicore Processor Hierarchical Implementation with IC Compiler

Building Blocks for PRU Development

Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab

Introduction to the NI Real-Time Hypervisor

Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs. Jim Ready September 18, 2012

Accelerating the Data Plane With the TILE-Mx Manycore Processor

New Methodologies in Smart Card Security Design. Y.GRESSUS Methodology and Secure ASIC development manager, Bull CP8

Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014)

Next Generation Operating Systems

An Easier Way for Cross-Platform Data Acquisition Application Development

1. Survey on the Embedded Windows Сompact 7 for System-

Customer Experience. Silicon. Support & Professional Eng. Services. Freescale Provided SW & Solutions

Embedded Linux Platform Developer

Android Virtualization from Sierraware. Simply Secure

Android Development: a System Perspective. Javier Orensanz

PikeOS: Multi-Core RTOS for IMA. Dr. Sergey Tverdyshev SYSGO AG , Moscow

Embedded Systems: map to FPGA, GPU, CPU?

Data and Control Plane Interconnect solutions for SDN & NFV Networks Raghu Kondapalli August 2014

System Level Virtual Prototyping becomes a reality with OVP donation from Imperas.

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE

Efficient Software Development Platforms for Multimedia Applications at Different Abstraction Levels

2972 Linux Options and Best Practices for Scaleup Virtualization

IO Visor: Programmable and Flexible Data Plane for Datacenter s I/O

System Design Issues in Embedded Processing

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

GETTING STARTED WITH ANDROID DEVELOPMENT FOR EMBEDDED SYSTEMS

Informal methods A personal search for practical alternatives to moral improvement through suffering in systems research

Cortex-A9 MPCore Software Development

Parallels Virtuozzo Containers

Virtualization. Jukka K. Nurminen

The Future of IoT. Zach Shelby VP Marketing, IoT Feb 3 rd, 2015

Developing reliable Multi-Core Embedded-Systems with NI Linux Real-Time

Extending the Power of FPGAs. Salil Raje, Xilinx

Development of Type-2 Hypervisor for MIPS64 Based Systems

Fastboot Techniques for x86 Architectures. Marcus Bortel Field Application Engineer QNX Software Systems

The ARM Cortex-A9 Processors

Derek Chiou, UT and IAA Workshop

IO Visor Project Overview

How to Run the MQX RTOS on Various RAM Memories for i.mx 6SoloX

Università Degli Studi di Parma. Distributed Systems Group. Android Development. Lecture 1 Android SDK & Development Environment. Marco Picone

Data Centric Systems (DCS)

VMware and CPU Virtualization Technology. Jack Lo Sr. Director, R&D

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

Transcription:

CHIP DESIGN High Performance or Cycle Accuracy? You can have both! Bill Neifert, Carbon Design Systems Rob Kaye, ARM ATC-100

AGENDA Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing the two worlds together Results Summary

CHALLENGES IN MOBILE SOC DESIGN A typical verification challenge for today... big.little processing heterogeneous multicore Increasing complexity APPS APPS Increasing SW content OS Hypervisor OS Multiple OSs Secure services Diversified development Interrupt Control Cortex -A15 MPCore Cortex-A7 MPCore CPU 1 CPU 2 CPU 1 CPU 1 Mali GPU Base Periph. System Control On-chip Mem community L2 Cache L2 Cache DDR Cntrl Higher performance and lower power Cache Coherent Interconnect I/O & Peripheral Interfaces Security Are we building the thing right? Are we building the right thing?

ONE SIZE DOES NOT FIT ALL DIFFERENT TASKS DIFFERENT TOOLS

SOFTWARE CHALLENGES AND MODELING High level modeling and virtual prototypes exploited to improve development productivity and reduce time to market. Models required early in product development lifecycles Models are utilized for: Architecture & device modeling Early software development Early HW / SW co-validation Device implementation compliance Device validation support Models are used Within ARM Supplied to silicon partners Supplied to Ecosystem partners Accelerating SoC Time to Market... High performance models suitable for SW community Available early for SW development Accurate to HW for consistency Easy to debug enabling improved productivity Easy to deploy to facilitate the developer community

LOOSELY TIMED MODELING (A.K.A. PROGRAMMER S VIEW) ` Architecture Envelope Model (AEM) Core Fast Models Architecture Envelope Model (AEM) Architecture Envelope Model (AEM): Executable version of the ARM Architecture Reference Manual ( ARMARM ) Catch architectural defects early Core LT Model e.g. ARM Cortex -A15, Cortex-A7 processors Early software development platform Device implementation compliance: execute same validation suites as RTL

Code Translation Optimized translation of ARM instructions to X86 equivalents Direct Memory Access Translated code is cached for fast memory access LOOSELY TIMED MODEL HOW PERFORMANCE IS ACHIEVED

Hardware Android Port Software Stack Virtual Platform AN EXAMPLE: BIG.LITTLE SYSTEM BRING-UP TIMELINE 1st VP to ARM SW Dev Lead Partner Delivery Bi-monthly incremental deliveries General Release Lead Partner Delivery APM Demo Task Migration S/W Development on Virtual Platform Linaro Delivery big.little Android Gingerbread Ice Cream Sandwich Delivered Software Stack Running Performance Tuning 2011 2012

LT MODELS PERFORMANCE LINARO ANDROID BOOT 2.2GHz Laptop Cortex-A15x1 75 seconds (90 mips) Cortex-A15x4 + Cortex-A7x4 140 seconds 14 days continuous, >100T instructions Recommendation: High end workstation, 64-bit OS, large physical memory

SOC VERIFICATION LANDSCAPE Complex designs need multiple verification and validation platforms Virtual platforms allow early HW / SW co-development and analysis Requirements Analysis Virtual Platforms RTL (Sim / Accel) Emulation FPGA Evaluation Board System Validation System Design System Verification Architecture Design Integration Verification IP/Module Design IP / Module Verification Implementation

ARM FAST MODELS Fast, accurate Programmer s View (LT) Models aligned with the ARM IP and tools roadmaps Enables development at all levels of the software stack prior to silicon availability Models for newly available ARM processors Cortex-A15, Cortex-A7, Cortex-R5, Cortex-R7, big.little reference platforms Models of ARMv8 cores available to lead partners ARMv7 and ARMv8 architecture models for validation of processor IP Export to SystemC and TLM-2.0 Comprehensive debug, trace and simulation control Debug with DS-5 toolchain

SYSTEM ANALYSIS AND ARCHITECTURE DESIGN MODELLING NEEDS Highly accurate interconnect (& System IP) models > 95% accuracy Use with traffic generators for interconnect-centric exploration Accurate CPU models > 80% accurate: deliver representative traffic to interconnect > 95% accurate: Architecture validation & SW performance optimization Execution speed ~ 100 s KIPS Execute significant workloads

Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing the two worlds together Results Summary

CYCLE ACCURATE MODELS WHAT ARE THEY REALLY? Not defined in SystemC TLM-2.0 Requires protocol knowledge Source: Google.com Can be impossible to fully validate if handcreated

CYCLE ACCURATE MODELS WHY ARE THEY IMPORTANT? Ideally, all models are implementationaccurate 100% confidence in developed system Accurate hardware and software design decisions Multiple reasons why they re not always used RTL not available yet Execution speed Model creation/validation time

CARBONIZED ARM MODELS 100% accurate model of ARM IP compiled from ARM RTL Instrumented for interactive debug, memory, pipeline and cache analysis Mappings included to enable automatic interchange with ARM Fast Models Models for nearly all available ARM processors Cortex-A15, Cortex-A7, big.little, Cortex-A9, Cortex-A8 Cortex-R5, Cortex-R4, Cortex-M4, Cortex-M3, Cortex-M0 ARM11 MPCore, ARM1176, ARM1136, ARM926, ARM946, ARM968, ARM7TDMI-S Early availability programs on new processors Reference platforms and software packages included Comprehensive debug, trace and simulation control Debug with DS-5 toolchain

Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing the two worlds together Results Summary

WHY NOT APPROXIMATELY TIMED? THE DANGER OF THE MIDDLE AT models are great as part of HLS process Not as well-suited for other design tasks Can t achieve the speeds needed for software Doesn t have the accuracy needed for architecture Require large design/validation effort

SYSTEM ANALYSIS AND ARCHITECTURE DESIGN MODELLING NEEDS PV Models alone: insufficient detail for accurate analysis CA Models alone: unable to run large software workloads Option 1: dedicated models Option 2: re-use existing models PV for performance CA for detailed timing information

BRINGING THE WORLDS TOGETHER PT 1 GOOD FENCES MAKE GOOD NEIGHBORS Interrupt Control Base Periph. Cortex-A15 MPCore CPU 1 CPU 2 Cortex-A7 MPCore CPU 1 CPU 1 Mali GPU System Control On-chip Mem L2 Cache L2 Cache DDR Cntrl Cache Coherent Interconnect I/O & Peripheral Interfaces Security Best models for CA Mixing PV and CA models can be very effective Maintain core in PV Use CA for model(s) under investigation CA when needed, dormant when not Any model can be CA but avoid processor/memory datapath if maximum performance is desired Good for firmware, not for architecture

BRINGING WORLDS TOGETHER PT 2 THE BEST OF BOTH WORLDS Utilize existing LT and CA models Use LT models to quickly execute to point of interest Software breakpoint Arbitrary point in time (requires sync step) Swap to CA models to continue execution Debug hardware/software interaction Gather performance analysis data

MAPPING FROM LT TO CA 5 POUNDS OF DATA INTO A 10 POUND BAG LT Virtual Prototype Interrupt Control Cortex-A15 MPCore Cortex-A7 MPCore CPU 1 CPU 2 CPU 1 CPU 1 L2 Cache L2 Cache Cache Coherent Interconnect Base Periph. System Control On-chip Mem DDR Cntrl Security Checkpoints are used to create swap points ALL models need to support checkpointing Additional Initialization Data I/O & Peripheral Interfaces CA Virtual Prototype Interrupt Control Base Periph. Cortex-A15 MPCore Cortex-A7 MPCore System Control Checkpoint Data via CADI including side-effects CPU 1 CPU 2 CPU 1 CPU 1 L2 Cache L2 Cache Cache Coherent Interconnect I/O & Peripheral Interfaces On-chip Mem DDR Cntrl Security

CUSTOM MODEL SWAP Fast Model Fast Models C++ Models SystemC/TLM-2.0 Models ARM ESL APIs Accurate Model Carbonized Models C++ Models SystemC/TLM 2.0 Models Models presenting a compliant ARM ESL API interface may participate in swap Target model needs to support writing to necessary registers via ARM ESL API (including side-effects!) Only state elements are supported, not thread context Need system management of sockets, debug connections, etc

Compare Results IF IT S NOT TESTED, IT S BROKEN LT Virtual Prototype Program Flow CP1 CP2 CA Virtual Prototype Extract checkpoints from LT run Compare end results to validate execution Intermediate comparisons may be inconsistent due to different execution model in actual processor CPn

Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing the two worlds together Results Summary

CUSTOMER EXAMPLE: PARTITIONED SPEED AND ACCURACY Interrupt Control Cortex-A9 MPCore CPU 1 CPU 2 Base Periph. System Control On-chip Mem Mali GPU Hardware Prototype Linux Boot Virtual Prototype F L2 Cache DDR Cntrl Frame Frame Frame Frame Frame Frame Frame Frame Frame Frame LT Model CA Model 5 Min 10 Min 15 Min Majority of system executes in LT domain, GPU in CA GPU only active when being initialized or processing frames Boots Linux in <20 seconds, processes frames in <90 seconds Display 9 frames of graphics in virtual prototype while hardware prototype is still booting Linux

LEVERAGING SPEED AND ACCURACY DRIVER DEVELOPMENT LT Virtual Prototype Boot Loader OS Boot UART Driver LCD Driver Mali Driver CP 1 CP 2 CP 3 CA Virtual Prototype Create checkpoint at each point of interest, each engineer debugs at correct time with 100% accuracy Able to boot OS and run to relevant checkpoint OS in 10-15 seconds leveraging speed of ARM Fast Models Each checkpoint is debugged independently with 100% accuracy

LEVERAGING SPEED AND ACCURACY PERFORMANCE ANALYSIS LT Virtual Prototype Boot Loader OS Boot Browser Startup CP CP CP CP CP CP CP CP CP CP CP CP CP CA Virtual Prototype Break long run into multiple run to execute in parallel and gather accurate data for cycle count, power analysis, etc. Results can be aggregated to deliver accurate results for much longer runs than normally possible with accurate models

DEPLOYMENT RESULTS Ability to swap Fast Model->Carbonized model is in active use Execution time is unchanged No effect on Fast Model runtime speed CA models execute at regular speed after restore Checkpoint creation/restore is <10 seconds Currently supported cores Cortex-A15, Cortex-A7, Cortex-A9

AGENDA Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing the two worlds together Results Summary

SUMMARY Swapping LT and CA models enables a single virtual prototype to be used by all design groups 50-200 MIPS performance for software development 100% accuracy for firmware, architecture and debug