Multi-/Many-core Modeling at Freescale
David Murrell, Jim Holt, Michele Reese
Perspective: Virtual Platform ROI

Schedules: SW and HW available on day 1 is becoming an industry expectation
- FSL P4080 Linux kernel boots on 1st silicon, within 2 days of receiving DS boards
- MOT P4080 software development enabled 1 year before silicon
- XBOX360 and PS3 OS kernels boot on 1st silicon within 2 weeks of receiving first eval boards (IBM's Mambo software VP)
- Growing demand from customers
- Evidenced by recent trends in industry strategic acquisitions

Quality: VP is the proving ground for complex HW/SW interactions
- Both HW and SW are improved while still in formative stages (it's not just that SW has a running start): functionality, usability and performance
- Gain insight into pathological effects, and intercept them before solutions become costly
- Typically 20-30 studies conducted per critical path IP block
- Verified bit and cycle-level accuracy is essential
Virtual Platform Consumers

[Diagram labels: Virtual Platforms; Ecosystem development; Debuggers, Tools/Partners; Verification & Validation; Pre-Si test bringup; Reference; Architectural Design/Exploration; μ-arch analysis; Marketing/FAEs; NPI; Next Generation Definition; Customer SW bringup; Demos; Performance assurance; Bake-offs; Tradeoff analysis; Primary Use Case; Proof/bench development; Performance Analysis; PRL assessment; Functional; Performance; Hybrid; Pre-Si BSP/SDK bringup; μcode development; Virtual DS; Software Enablement]
Concept thru NPI

[Diagram labels, in extracted order: Execution, ALE, ALF, Planning, TO, Production, Func α, Functional β, Functional, Cycle, CornerStone, Exploration, α Performance, β Performance, Plat, Trace, Linux, Cornerstone Platform, Bare Metal Platform, Linux System Platform]
Technology Improvement Strategy

Scale UP
- Continue to optimize single-thread performance
- Continue to leverage JIT (DBT) technologies
- OK for partial core cluster configurations

Scale OUT
- Migrate to distributed simulation platforms: multi-cores and multi-machine clusters
- Explore transition to COREMU (or similar)
- Assess the potential of the MIT Graphite platform
- Possibilities to facilitate native execution for dedicated sim farms
  - FSL's VortiQa U
  - Inexpensive (low $100s per compute node)
  - 32-bit e500v2 (with SPE) configuration (P2020U)
  - 64-bit e5500 configuration (P5020U)
  - Combine with Graphite for multi-u simulation cluster
  - Port PIN or DynamoRIO to support Power ISA
What we've done with Graphite so far
- We've been using it in the context of the Angstrom project
- Ported Graphite to Red Hat Linux
- Learned the overall system architecture and how to add syscalls
- Lesson learned (syscalls): when you bring new application code into Graphite, it will sometimes include unsupported syscalls
  - If you see the runtime message from Graphite "Unhandled syscall number ###", you can typically identify the offending syscall by looking up the reported number here: http://asm.sourceforge.net/syscall.html
- Currently working on a cycle- and power-accurate Angstrom tile model using Freescale e200 cores; the goal is to integrate it with Graphite for Angstrom-related research
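The lookup described above can be partly automated. The following is a minimal sketch, not part of Graphite: the helper names parse_unhandled_syscall and i386_syscall_name are hypothetical, and the table holds only a handful of Linux i386 syscall numbers for illustration. Numbering differs on other ABIs (x86-64, for example), so check which table applies to your target before relying on it.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper: extracts N from a log line containing
// "Unhandled syscall number N". Returns false if the marker or digits are missing.
bool parse_unhandled_syscall(const std::string& line, long& number_out) {
    const std::string marker = "Unhandled syscall number ";
    const std::string::size_type pos = line.find(marker);
    if (pos == std::string::npos) {
        return false;
    }
    const char* start = line.c_str() + pos + marker.size();
    char* end = nullptr;
    const long value = std::strtol(start, &end, 10);
    assert(end != nullptr);  // strtol always sets end when endptr is non-null
    if (end == start) {
        return false;  // no digits followed the marker
    }
    number_out = value;
    return true;
}

// Hypothetical helper: maps a few Linux i386 syscall numbers to names.
// Assumption: the target uses i386 numbering; this is a small excerpt only.
std::string i386_syscall_name(long number) {
    static const std::map<long, std::string> table = {
        {1, "exit"}, {3, "read"}, {4, "write"},
        {5, "open"}, {6, "close"}, {20, "getpid"},
    };
    const std::map<long, std::string>::const_iterator it = table.find(number);
    return it == table.end() ? std::string("unknown") : it->second;
}
```

Feeding each simulator log line through parse_unhandled_syscall and then i386_syscall_name gives a quick first guess; numbers reported as "unknown" still need the full table.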
Porting to Red Hat Linux
- In common/makefile.common, change KERNEL to ETCH
- Remove the -Werror flag from all Makefiles (there are several warnings about classes with virtual functions that do not have virtual destructors)
- There are a couple of instances of "invalid use of sizeof operator" in pin/handle_syscalls.cc; these are easy to fix
- In common/misc/moving_average.h, the call of overloaded 'pow' is ambiguous
  - line 120: changed UInt32 curr_window_size to Int curr_window_sizexxx
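For context on the 'pow' ambiguity: on older (pre-C++11) compilers, passing an unsigned integer to pow can match several overloads equally well, which the compiler reports as an ambiguous call. The slide fixes this by changing the variable's type. An alternative is an explicit cast, sketched below. This is an illustration only: scaled_by_window is a hypothetical function and is not the expression at line 120 of moving_average.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for an expression that passes an unsigned window size
// to pow. The explicit cast selects the pow(double, double) overload on every
// compiler, so no overload ambiguity can arise.
inline double scaled_by_window(double value, std::uint32_t curr_window_size) {
    assert(curr_window_size > 0);
    return std::pow(value, static_cast<double>(curr_window_size));
}
```

The cast keeps the unsigned type of the variable intact, which may matter if other code depends on it; changing the declaration, as the slide does, is the smaller edit.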
BACKUP
Customer Expectations: Functional Virtual Platform

High speed functional execution (programmer's view of the system):
- Model the behavior to arrive at the correct result
- 10s of MIPS per core at a minimum

Evaluation board replacement:
- Functional fidelity: firmware, OS, drivers, and applications run without modification
- Linux console

Enhanced debug environment:
- Source-level: CodeWarrior, GDB, etc.
- Low-level: core/device registers, TLBs, disassembly, memory
- Rich breakpoint/watchpoint support
- Event callbacks (triggers)
- Full system checkpoint/restore
- Python command shell

Linux/Windows-32/64 hosts
Customer Expectations: Performance Virtual Platform

System-level cycle accurate models
- Model the behavior and the number of cycles consumed by the operation
- Cores, caches, memory controllers, interconnects, accelerators, data path, devices
- Executes unmodified binaries and traces
- Fast-forward to points of interest via checkpoints and/or gear-shifting

Verified cycle-timing fidelity
- Accurate to within 10% of hardware logic
- Models micro-architectural structures and policies

Comprehensive set of system performance metric data

Rich data visualization
- Source-code traceability (multicore and hypervisor aware)
- Custom plots
- System activity heat maps
- Live display and data replay

Flexible instrumentation and control points
- Event breakpoints
- Event callbacks (triggers)
- Application control of the simulator via specially encoded instructions
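As a worked illustration of the "within 10%" fidelity target, the sketch below compares a model's cycle count against a hardware measurement. How the error is actually defined and measured is not stated on the slide; relative error against the hardware count is an assumption, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed metric: |model - hardware| / hardware.
double relative_cycle_error(double model_cycles, double hw_cycles) {
    assert(hw_cycles > 0.0);
    return std::fabs(model_cycles - hw_cycles) / hw_cycles;
}

// True when the model's cycle count is within the given fraction of hardware.
// The default of 0.10 mirrors the 10% figure on the slide.
bool within_tolerance(double model_cycles, double hw_cycles,
                      double tolerance = 0.10) {
    return relative_cycle_error(model_cycles, hw_cycles) <= tolerance;
}
```

For example, a model reporting 1080 cycles against 1000 measured cycles has an 8% error and passes; 1200 cycles is a 20% error and fails.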
Performance Analysis and Design Exploration Flow

[Flow diagram labels, in extracted order: Perf Model Traces; Perf Model; Pre-ALF Measurements; Experimental Design Alternatives; Cornerstone, Bare Metal and Linux Platforms; PRL; Linux; Linux; EGFE; Analysis Plan; Select Workloads; Func Model; Port, Test, and Debug; Emulator; Post-ALF Measurements; Analyze Data; Perf Model; Investigate Anomalies; Design Improvements; Bare Metal and Linux Platforms; Eval Boards; Post-Si Measurements; Summary Analysis Report; S&A]

S&A: SoC Systems, S&A + PcP: Core+MSS
Verification Flow

[Flow diagram labels, in extracted order: RTL Simulator; Execute Test Cases; Specs; Verification Plan; Develop Test Cases; Func Model; Debug Test Cases; Test Suites; Emulator; Execute Test Cases; Compare Function; Verify Bit Accuracy; Timing Calibration; Bare Metal Platform; Eval Boards; Execute Test Cases; Model Fixes; Bare Metal Platform; HW Verification Flow; Verif T; IP Model Verification Addendum; S&A]
Software Development Flow

[Flow diagram labels, in extracted order: Linux SW; PRL; Develop; Func Model; Debug and Test Function; Debug and Test Timing; Race Conditions & Optimize; Function; Debug and Test Timing; Race Conditions & Optimize; FM Microcode; Bare Metal Platform; Linux Platform; Hypervisor; Drivers; USDPAA; Linux; VortiQa; IP SW; IP: Drivers & Microcode]
Embedded Reference Flow

[Flow diagram labels, in extracted order: Petra; Petra; Stampede; Func Model; Test and Debug; Func Model Reference; Petra Platform; Bare Metal Platform; Eval Board]