Using a Generic Plug and Play Performance Monitor for SoC Verification




Using a Generic Plug and Play Performance Monitor for SoC Verification
Dr. Ambar Sarkar, Kaushal Modi, Janak Patel, Bhavin Patel, Ajay Tiwari
Accellera Systems Initiative

Agenda
- Introduction
- Challenges
- Why Performance Monitor
- Features
- Architecture and Configuration
- Measurements Supported
- Performance Result Report
- Steps for Monitor Usage

Introduction
Two parameters of performance verification:
- Latency: the time taken by data to travel from one point to another
- Bandwidth: the amount of data transferred in a given time
These parameters have to be measured and verified against expected values for all paths of interest; this process is known as performance verification.
Functional verification alone cannot guarantee that performance criteria are met.
As SoC complexity increases, performance verification takes on greater importance in the verification cycle.
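As a rough illustration of the two metrics, here is a minimal Python sketch computing latency and bandwidth from timestamped transaction records; the function names and the (start_ns, end_ns, bytes) record format are hypothetical, not the monitor's actual interface.

```python
# Illustrative sketch only: latency and bandwidth from a list of
# (start_ns, end_ns, data_bytes) transaction records.

def latency_ns(txn):
    """Per-transaction latency: time from start to end of the transfer."""
    start_ns, end_ns, _ = txn
    return end_ns - start_ns

def bandwidth_mbps(txns):
    """Bandwidth over a window: total bytes moved / elapsed time."""
    total_bytes = sum(b for _, _, b in txns)
    window_ns = max(e for _, e, _ in txns) - min(s for s, _, _ in txns)
    return total_bytes * 8 * 1e3 / window_ns  # bits/ns -> Mbps

txns = [(0, 100, 128), (100, 180, 128), (180, 260, 128)]
print(latency_ns(txns[0]))                 # 100 (ns)
print(round(bandwidth_mbps(txns), 1))      # 11815.4 (Mbps)
```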

Complex SoC: Diagram
[Block diagram of a complex SoC: multiple DPHY/CSI-RX camera interfaces feeding image processors; banks of dual-core scalar and floating-point (vector) processors on OCP ports behind firewall slaves; a central interconnect; PCIe EP with DMA; generic DMA; JTAG; interrupt and error controller; timers and watchdog; I2C, SPI, UART; IOMUX; clock and reset generator; boot ROM; system RAM controller (banks 0-2); memory scheduler; DDR memory controller and DRAM PHY; chip-specific IPs; debug and trace controller; sensor.]

Complex SoC: Performance Requirement

Path                                 Traffic Type
Image Processor 0 -> DDR             128B WR, 75% sequential, 25% random
Image Processor 1,2 -> DDR           128B WR, 75% sequential, 25% random
Image Processor 3 to M -> DDR        128B WR, 75% sequential, 25% random
Floating Processor 0,1 -> DDR        [1] Sequential 128B RD [2] Random 32B RD [3] Random 32B WR
Floating Processor 2,3 -> DDR        [1] Random 32B RD [2] Random 32B WR
Floating Processor N -> System RAM   Sequential 128B RD
Other Floating Processors -> DDR     [1] Random 32B RD [2] Random 32B WR
PCIe -> DDR                          [1] Sequential 128B RD [2] Sequential 128B WR

Why Performance Monitor: Challenges in Performance Verification
- Multiple interfaces with different protocols
- A large number of data paths to measure
- Each path can have specific measurement requirements
- Paths can carry interleaved traffic patterns
- Different traffic patterns may need different measurement windows
- Selective traffic measurement on a particular interface
- Performance requirements may change during the course of the project
- New paths may be added in a derivative SoC

Why Performance Monitor: Challenges in Performance Verification (contd.)
Approaches used so far:
- Calculation in tests or callbacks: testcases become too complex and lengthy
- Reporting using post-processing: the testcase developer has to print unique, appropriate messages, and an extra script must be developed, adding overhead in time and effort
- No standard method: the approach becomes dependent on the individual engineer
Solution: a Performance Monitor

Features
Highly configurable:
- Multiple paths and multiple traffic patterns supported
- Configurable units for latency (ns, us, ms) and bandwidth (KBps, MBps, GBps, Kbps, Mbps, Gbps)
- Convenient, uncomplicated configuration methods
- Configuration compliance check
Reporting:
- Measurement report at a user-defined verbosity level
- Error reporting in case of an incorrect result
- Run-time reporting of measurements
- Trace-file reporting for all measurements
Types of measurements:
- Per-transaction and average latency
- Bandwidth
- Alternate bandwidth and latency window
- Cumulative bandwidth measurement across multiple traffic patterns
- Uniformity comparison between interfaces
- Root-mean-square (RMS) calculation for latency, bandwidth, and error values
Miscellaneous:
- Callback support to collect functional coverage
- API support to get performance statistics on the fly
- Measurement enable/disable support
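The RMS feature listed above is the standard root-mean-square; a one-function Python sketch (illustrative only, not the monitor's code) makes the calculation concrete.

```python
import math

def rms(values):
    """Root-mean-square of a list of measurements, e.g. per-window
    latencies, bandwidths, or error values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

print(round(rms([3.0, 4.0]), 4))  # sqrt((9 + 16) / 2) = 3.5355
```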

Architecture: Use Model
[Block diagram: the top environment (configured by top_env_cfg) instantiates the performance environment (configured by perf_mon_env_cfg). An adapter agent converts DUT traffic into generic performance transactions. The performance environment contains one performance monitor per interface (perf_mon_0 .. perf_mon_n, each with its own perf_mon_cfg[]); inside each, a de-multiplexer uses leaf_mon_sel to route each transaction to the matching leaf monitor (leaf_mon_0 .. leaf_mon_n, each with its own leaf_mon_cfg[]).]

Architecture: Hierarchy of Monitor
Measurement over different interfaces and traffic patterns is handled by the hierarchical structure of the performance monitor. Each level has its own configuration parameters:
- Top-level environment (Perf Top Env): instantiated inside the testbench
- Performance monitor (Perf Mon): one instantiated per interface of measurement
- Leaf monitor (Leaf Mon): the basic entity that performs calculations; one instance per traffic pattern, per interface
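The routing implied by this hierarchy can be sketched in a few lines of Python; this is a hypothetical model of the de-multiplexer, not the monitor's actual (testbench-language) implementation.

```python
# Hypothetical model: each performance monitor demultiplexes incoming
# generic transactions to the leaf monitor whose name matches the
# transaction's id (one leaf per traffic pattern).

class LeafMonitor:
    def __init__(self, name):
        self.name = name
        self.txns = []          # transactions routed to this pattern

    def write(self, txn):
        self.txns.append(txn)

class PerfMonitor:
    """One instance per interface; write() acts as the de-multiplexer."""
    def __init__(self, leaves):
        self.leaves = {leaf.name: leaf for leaf in leaves}

    def write(self, txn):
        self.leaves[txn["id"]].write(txn)

mon = PerfMonitor([LeafMonitor("READ"), LeafMonitor("WRITE")])
mon.write({"id": "READ", "data_bytes": 128})
mon.write({"id": "WRITE", "data_bytes": 32})
print(len(mon.leaves["READ"].txns))   # 1
```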

Architecture: Performance Transaction
Since the monitor can be used with different interfaces, a generic transaction type is needed: the Performance Transaction. Its fields are:

Field               Description
id                  Selects the leaf monitor this transaction is intended for
req_lat_start_time  Stores the start time for latency calculation
req_lat_end_time    Stores the end time for latency calculation
bw_start_time       Stores the start time for bandwidth calculation
bw_end_time         Stores the end time for bandwidth calculation
data_bytes          Stores the total number of bytes transferred in the transaction

The user updates these fields on the occurrence of the required event. This can be done inside any testbench component where the required data is available. The transaction is then passed to the performance monitor through a connecting port.
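For illustration, the generic transaction above maps naturally onto a small data class; this Python model mirrors the slide's field names but is otherwise an assumption, not the monitor's real type.

```python
# Illustrative Python model of the generic Performance Transaction.
from dataclasses import dataclass

@dataclass
class PerfTransaction:
    id: str                  # selects the target leaf monitor
    req_lat_start_time: int  # latency window start (e.g. ns)
    req_lat_end_time: int    # latency window end
    bw_start_time: int       # bandwidth window start
    bw_end_time: int         # bandwidth window end
    data_bytes: int          # total bytes moved by this transaction

    def latency(self):
        return self.req_lat_end_time - self.req_lat_start_time

t = PerfTransaction("READ", 100, 148, 100, 164, 128)
print(t.latency())  # 48
```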

Configuration
Two approaches:
- Configuration class based approach: an instance of the configuration class is created inside the testbench
- CSV file based approach: a pre-defined CSV format configures the three levels of hierarchy

Example CSV content (sequence ocp_axi_read_write):

LEVEL, NAME / NUM, LEAF MON ID, MEASUREMENT TYPE, EXPECTED LATENCY, EXPECTED PER_TRANS LATENCY, LATENCY UNIT, EXPECTED BANDWIDTH, BANDWIDTH UNIT
L1, 2 perf monitors
L2, OCP (2 trans types)
L3, , READ,  AVG_LATENCY + BANDWIDTH, 48, 48, ns, 100, MBps
L3, , WRITE, AVG_LATENCY + BANDWIDTH, 48, 48, ns, 100, MBps
L2, AXI (2 trans types)
L3, , READ,  AVG_LATENCY + BANDWIDTH, 48, 48, ns, 100, MBps
L3, , WRITE, AVG_LATENCY + BANDWIDTH, 48, 48, ns, 100, MBps
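A CSV of this shape is straightforward to consume programmatically; the sketch below shows one hypothetical way to read the leaf-monitor (L3) rows, with column names mirroring the slide's table rather than any published format.

```python
# Hypothetical example of parsing leaf-monitor rows from a config CSV.
import csv
import io

csv_text = """LEAF MON ID,MEASUREMENT TYPE,EXPECTED LATENCY,LATENCY UNIT,EXPECTED BANDWIDTH,BANDWIDTH UNIT
READ,AVG_LATENCY + BANDWIDTH,48,ns,100,MBps
WRITE,AVG_LATENCY + BANDWIDTH,48,ns,100,MBps
"""

# Each row becomes a dict keyed by column name.
leaf_cfgs = list(csv.DictReader(io.StringIO(csv_text)))
print(leaf_cfgs[0]["LEAF MON ID"], leaf_cfgs[0]["EXPECTED BANDWIDTH"])  # READ 100
```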

Measurements Supported
- Per-transaction latency measurement: used when the traffic requirement is for guaranteed service
- Measurement over a number of transactions: gives bandwidth and latency over a user-defined window of transactions; setup and hold transaction counts can be configured
- Measurement between two events: used when the measurement window is defined by the total bytes to be transferred, instead of the total number of transactions
- Measurement over an alternate window: gives latency and bandwidth over a window other than the configured one, which helps check the consistency of results
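The windowed measurement with configurable setup and hold transactions can be sketched as below; this is an illustrative interpretation (the first `setup` and last `hold` transactions are excluded from the measured window), with hypothetical names.

```python
# Sketch of "measurement over a number of transactions" with setup/hold:
# setup/hold transactions bracket the window and are not measured.

def window_bandwidth_mbps(txns, setup=0, hold=0):
    """txns: list of (bw_start_ns, bw_end_ns, data_bytes)."""
    measured = txns[setup:len(txns) - hold if hold else None]
    total_bytes = sum(b for _, _, b in measured)
    elapsed_ns = measured[-1][1] - measured[0][0]
    return total_bytes * 8 * 1e3 / elapsed_ns  # bits/ns -> Mbps

txns = [(0, 50, 64), (50, 100, 128), (100, 150, 128), (150, 200, 64)]
# Drop one setup and one hold transaction; measure over the middle two.
print(window_bandwidth_mbps(txns, setup=1, hold=1))  # 20480.0
```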

Measurements Supported (contd.)
- Cumulative bandwidth measurement: helps when different traffic patterns must collectively meet a bandwidth requirement
- Uniformity comparison: used when the measured value for one interface has to be compared against another interface
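The two checks above reduce to simple arithmetic, sketched here in Python; the per-pattern byte counts, the 5% tolerance, and all names are assumptions for illustration.

```python
# Sketch of cumulative bandwidth and uniformity comparison.

def cumulative_bandwidth_mbps(patterns, window_ns):
    """patterns: {pattern_name: bytes moved inside the shared window}."""
    total_bytes = sum(patterns.values())
    return total_bytes * 8 * 1e3 / window_ns

def uniform(value_a, value_b, tol_pct=5.0):
    """True if two interfaces' measurements agree within tol_pct percent."""
    return abs(value_a - value_b) <= tol_pct / 100.0 * max(value_a, value_b)

bw = cumulative_bandwidth_mbps({"seq_128B_wr": 4096, "rand_32B_rd": 1024}, 1000)
print(bw)                     # 40960.0 Mbps, two patterns combined
print(uniform(bw, bw * 0.97)) # True: within the 5% tolerance
```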

Result Reporting
Two types of reporting, log file and trace file, with configurable report verbosity:
- Default reporting: a summary of results reported in report_phase, including total windows, number of matching windows, and average latency over all windows, among other details
- Reporting level 1: run-time reporting of results for each window, in addition to default reporting
- Reporting level 2: run-time reporting of results for each transaction, in addition to default and level 1 reporting
- Debug reporting: debug messages for all relevant information, apart from the above details

Steps for Usage
Integration steps:
1. Define the requirement: what to measure and where to measure it
2. Apply configuration: use this information to configure each level of the monitor hierarchy, via a config class instance or the CSV-based approach
3. Instantiate in the testbench: include the package, create an instance of the performance top environment, and connect the Perf Mon ports to the testbench
4. Generate Perf Trans: create performance transactions inside the testbench (using the adapter class if required) and pass them to the Perf Mon through the connecting ports
5. Run simulation: run simulations as usual; tabular reporting helps analysis, and the monitor name in the report identifies the erring traffic pattern, if any

SoC Performance Requirement
[Block diagram: PCIe EP, USB device, Ethernet RGMII, CPU, DMA, and system RAM connected over an interconnect.]
Performance requirements:
- USB: bulk-in and bulk-out to system RAM
- PCIe: descriptor memory read/write; TX packet from PCIe and RX packet from PCIe, each with an average latency target; PCIe read/write
- Ethernet: TX Eth packet memory write with an average latency target; RX Eth packet memory write at 980 Mbps with 3 us average latency; TX/RX packet from RAM
- CPU: RAM random read/write; RAM sequential read/write
- DMA: TX/RX descriptor read/write; RX packet read/write; PCIe <-> Ethernet
Measurement scenarios:
- Scenario 1: start event = start of transaction 1; end event = end of transaction 512
- Scenario 2 (DMA-enabled traffic): start event = DMA GO; end event = DMA DONE

Summary
The monitor has been used in three different projects for different clients so far.
With ad-hoc methods, for a complex SoC with 20+ masters and 80+ slaves, building the setup for performance verification took as long as a month; another month was spent achieving closure through testcases and debugging.
A derivative of the same SoC was verified using the Performance Monitor, and complete performance verification closure was achieved in three weeks, with fewer resources.
This demonstrates a clear saving in man-hours to closure.

Questions