Worst Case Execution Time Prediction

Hard Real-Time Systems
Controllers in planes, cars, and plants are expected to finish their tasks within reliable time bounds.

Timing Validation
Schedulability analysis has to show that all timing requirements will be met. It takes into account:
- System design (event based, time triggered, ...)
- Outside world (maximal arrival rates, minimal interval between events, ...)
- Scheduling policy (round robin, RMA, time triggered, RTOS, ...)
- ...
All results from scheduling theory require the Worst-Case Execution Time (WCET) of the tasks to be known.
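To illustrate why scheduling results depend on the WCET, here is a minimal sketch of the classic response-time test for fixed-priority preemptive scheduling (as used with RMA). The task set and the function name `response_time` are assumptions made for this example and do not come from the slides.

```python
def response_time(tasks, i):
    """Worst-case response time of task i, or None if it misses its deadline.

    tasks: list of (wcet, period) pairs, highest priority first.
    Assumption for this sketch: each deadline equals the period.
    """
    wcet_i, period_i = tasks[i]
    r = wcet_i
    while True:
        # Each higher-priority task j preempts ceil(r / period_j) times.
        interference = sum(-(-r // period_j) * wcet_j
                           for wcet_j, period_j in tasks[:i])
        r_next = wcet_i + interference
        if r_next > period_i:
            return None          # deadline miss
        if r_next == r:
            return r             # fixed point reached
        r = r_next


# Hypothetical task set: (WCET, period) in ms, rate-monotonic priority order.
tasks = [(1, 4), (2, 6), (3, 12)]
print([response_time(tasks, i) for i in range(len(tasks))])

# If the WCET of the last task were really 6 ms instead of 3 ms,
# the same test reports a deadline miss.
print(response_time([(1, 4), (2, 6), (6, 12)], 2))
```

The second call shows the point of the slide: a verdict of "schedulable" is only as trustworthy as the WCET values fed into the test.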

Certification
- Certificates
- Stand-alone tool, e.g. for TÜVs
- To prove timings in order to obtain certificates from TÜVs
- Certificate: Program terminates in 82 ms on MicroS

Support during SW-Development
[Figure: analysis results shown alongside the code. Recoverable labels: Loop L31: IC=191, 1910 h, 382 m; 56: I-miss; 57: I-hit; _main: WCET 166.2 ms, BCET 157.0 ms.]

Modern Hardware
- Multiple memories, caches, pipelines, branch prediction
- Performance depends on execution history. This makes the prediction difficult.
- No information means: assume the worst.
- Switching off caching reduces performance by a factor of 30 (EADS study).
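The dependence on execution history can be made concrete with a toy cache simulation. This is only a sketch: the 2-line fully associative LRU cache and the hit/miss latencies are assumptions chosen for illustration, not figures from the slides.

```python
from collections import OrderedDict

HIT_CYCLES = 1     # assumed latency of a cache hit
MISS_CYCLES = 10   # assumed latency of a cache miss


def run_trace(trace, initial_contents, capacity=2):
    """Cycles needed for a memory trace on a tiny fully associative LRU cache.

    initial_contents lists the cached blocks, least recently used first,
    and must not hold more than `capacity` blocks.
    """
    cache = OrderedDict((block, None) for block in initial_contents)
    cycles = 0
    for block in trace:
        if block in cache:
            cycles += HIT_CYCLES
            cache.move_to_end(block)        # block becomes most recently used
        else:
            cycles += MISS_CYCLES
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = None
    return cycles


trace = ["a", "b", "a", "b"]
print(run_trace(trace, ["a", "b"]))   # history left a and b in the cache
print(run_trace(trace, ["x", "y"]))   # history left unrelated blocks
```

The same four accesses take 4 cycles in one case and 22 in the other, purely because of what ran before.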

General Problems with State-Based Processor Features
- Problem: modern hardware <=> predictability of execution time
- Software monitoring, dual-loop benchmarks, direct measurement with a logic analyzer, and hardware simulation are no longer generally applicable.
- Choosing the fastest available processor, praying, or crossing fingers is not a true alternative.

A Traditional Approach
- Partition the application into code snippets.
- Determine their worst-case inputs.
- Measure their runtime with these inputs.
- Combine these results to find the worst-case path and its runtime.
Error-prone and expensive!

Some Architectural Challenges
- The empty cache is not necessarily the worst-case cache.
- The global round-robin counter/PLRU state bits can be changed by interrupt routines.
- Unified cache => instructions and data interfere => measurements for all possibilities of interference are necessary.
- A cache miss is not necessarily the worst case.

Solution: Static WCET Analysis
- The WCET analyzer computes safe upper bounds of the execution time of the tasks in a program for all inputs.
- Static program analysis based on Abstract Interpretation
- The analysis design is proven to be correct.
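As an illustration of what an abstract-interpretation-based analysis can look like, the following is a minimal sketch of a "must" cache analysis for a single LRU cache set, in the style described in the cache-analysis literature. It is not taken from the slides and is not the analyzer's actual implementation; the function names and the two-path example are assumptions.

```python
def update(state, block, assoc):
    """Abstract effect of accessing `block` on a must-cache state.

    state maps each block that is guaranteed to be cached to an upper
    bound on its LRU age (0 = youngest). assoc is the associativity.
    """
    old_age = state.get(block, assoc)   # assoc means "not guaranteed cached"
    new_state = {}
    for other, age in state.items():
        if other == block:
            continue
        # Only blocks younger than the accessed one grow older.
        new_age = age + 1 if age < old_age else age
        if new_age < assoc:             # otherwise the block may be evicted
            new_state[other] = new_age
    new_state[block] = 0
    return new_state


def join(s1, s2):
    """Merge states at a control-flow join: keep only blocks guaranteed on
    both paths, with the larger (more pessimistic) age bound."""
    return {b: max(age, s2[b]) for b, age in s1.items() if b in s2}


ASSOC = 2
after_then = update(update({}, "a", ASSOC), "b", ASSOC)   # path 1: a, then b
after_else = update({}, "a", ASSOC)                       # path 2: a only
merged = join(after_then, after_else)
print(merged)                       # a is guaranteed to be cached, b is not
print(update(merged, "c", ASSOC))   # after accessing c, a may have been evicted
```

An access to a block that is present in the must state can be classified as "always hit" for every execution reaching that point, which is the kind of guarantee a safe upper bound needs.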

WCET Analyzer
- Input: an executable program, starting points, loop iteration counts, call targets of indirect function calls, and a description of bus and memory speeds
- Computes Worst-Case Execution Time bounds of tasks

Scope
- The WCET analyzer assumes no interference from the outside.
- Effects of interrupts, I/O, timers, and other (co-)processors are not reflected in the predicted runtime and have to be considered separately.

Input
- An executable (e.g. in ELF or COFF format)
- User annotations
  - Call targets for all indirect function calls
  - Upper bounds on the iteration counts of all loops (and recursions)
- A description of the (external) memories and buses (i.e. a list of memory areas with minimal and maximal access times)
  - To be provided once for a new board
- A task
  - Can be arbitrarily selected by a start address or a function name
  - A task denotes a sequentially executed piece of code (no threads, no parallelism, no waiting for external events)
  - The code of a task is compiled by a C compiler from a restricted subset of ANSI C (no dynamic data structures, no setjmp/longjmp)
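The memory description mentioned above can be pictured as a small table of address ranges with access-time bounds. The sketch below is purely illustrative: the `MemoryArea` type, the addresses, and the cycle counts are hypothetical and do not reflect the analyzer's actual input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryArea:
    name: str
    start: int        # first address of the area
    end: int          # last address of the area (inclusive)
    min_cycles: int   # minimal access time
    max_cycles: int   # maximal access time


# Hypothetical board description.
AREAS = [
    MemoryArea("flash", 0x0000_0000, 0x0007_FFFF, min_cycles=2, max_cycles=4),
    MemoryArea("sram", 0x2000_0000, 0x2000_FFFF, min_cycles=1, max_cycles=1),
]


def access_bounds(areas, address):
    """(min, max) access time for an exactly known address."""
    for area in areas:
        if area.start <= address <= area.end:
            return area.min_cycles, area.max_cycles
    raise ValueError(f"address {address:#x} is not in any memory area")


def access_bounds_interval(areas, low, high):
    """(min, max) access time when the address is only known to lie in
    [low, high]: every area the interval overlaps has to be considered."""
    hits = [a for a in areas if a.start <= high and low <= a.end]
    if not hits:
        raise ValueError("interval overlaps no memory area")
    return min(a.min_cycles for a in hits), max(a.max_cycles for a in hits)


print(access_bounds(AREAS, 0x100))
print(access_bounds_interval(AREAS, 0x0, 0x2000_0010))
```

The interval variant hints at why precise address information matters: the less that is known about an address, the more areas contribute to the worst case.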

Call Graph
[Figure: call graph; calls contributing to the WCET are red.]

Control Flow Graph
[Figure: control flow graph; the worst-case path is red.]

Cycle-Wise Evolution of Cache/Pipeline States
[Figure]

Cache/Pipeline State
[Figure]

Overall Structure
[Figure: tool-chain diagram. Recoverable labels, in order: Executable program, CFG Builder, Loop Trafo, CRL File, Static Analyses, Path Analysis, Value Analyzer, Cache/Pipeline Analyzer, AIP File, PER File, ILP-Generator, LP-Solver, Evaluation, Loop bounds, WCET Visualization.]
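The ILP-Generator and LP-Solver stages suggest a path analysis phrased as an integer linear program. The following sketch shows the idea on a tiny control flow graph with one loop. The block costs, the loop bound, and the graph itself are assumptions, and the program is solved by plain enumeration instead of an LP solver so that the example stays self-contained.

```python
# Hypothetical per-block worst-case costs in cycles. In a real tool chain
# these would come from the cache/pipeline analysis.
COST = {"entry": 5, "header": 2, "then": 10, "else": 4, "exit": 3}
LOOP_BOUND = 10   # assumed annotation: the loop body runs at most 10 times


def wcet_bound():
    """Maximize sum(cost * count) over all execution counts that satisfy
    the flow constraints of the CFG and the loop bound."""
    best, best_counts = 0, None
    for x_then in range(LOOP_BOUND + 1):
        for x_else in range(LOOP_BOUND + 1 - x_then):   # x_then + x_else <= bound
            counts = {
                "entry": 1,
                # The header runs once on loop entry and once per back edge.
                "header": 1 + x_then + x_else,
                "then": x_then,
                "else": x_else,
                "exit": 1,
            }
            total = sum(COST[b] * n for b, n in counts.items())
            if total > best:
                best, best_counts = total, counts
    return best, best_counts


bound, counts = wcet_bound()
print(bound, counts)
```

The maximizing execution counts describe the worst-case path: here the expensive branch is taken in every iteration. A production analysis would hand the same constraints to an LP solver, because enumeration does not scale beyond toy graphs.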

The ColdFire Pipeline
- Fetch pipeline of 4 stages
  - Instruction Address Generation (IAG)
  - Instruction Fetch Cycle 1 (IC1)
  - Instruction Fetch Cycle 2 (IC2)
  - Instruction Early Decode (IED)
- Instruction Buffer (IB) for 8 instructions
- Execution pipeline of 2 stages
  - Decoding and register operand fetching (1 cycle)
  - Memory access and execution (1 to many cycles)

Pipeline Model
[Figure]

PPC755 Pipeline
- Complex branch prediction
- Superscalar: up to two instructions per cycle dispatched and retired
- Out-of-order execution
  - Integer instructions, floating-point instructions, and loads may be reordered
- Speculative execution
  - After predicted branches, instructions are executed speculatively
  - Speculative loads may occur

Pipeline of the PPC755
[Figure]

Pipeline Model
[Figure]

Analysis of Loops
- Loops are analyzed like procedures
- This allows for
  - Virtual inlining
  - Virtual unrolling
  - Better address resolution
  - Burst accesses
  - Selectable precision
  - Optional user constraints
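A small arithmetic sketch may help show why virtual unrolling pays off. It assumes that the first loop iteration is slow (cold cache) and later iterations are fast. The cycle counts and function names are made up for this example.

```python
def bound_single_context(first_iter, later_iter, max_iterations):
    """One analysis context for all iterations: every iteration has to be
    charged with the most expensive behavior seen in any iteration."""
    return max_iterations * max(first_iter, later_iter)


def bound_first_iteration_unrolled(first_iter, later_iter, max_iterations):
    """First iteration analyzed separately from the remaining ones."""
    if max_iterations == 0:
        return 0
    return first_iter + (max_iterations - 1) * later_iter


# Assumed costs: 50 cycles with a cold cache, 12 cycles once it is warm.
print(bound_single_context(50, 12, 100))
print(bound_first_iteration_unrolled(50, 12, 100))
```

Both results are safe upper bounds under the stated assumptions, but the second (1238 instead of 5000 cycles) is far tighter, which is the precision gain that distinguishing the first iteration is meant to deliver.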

Limitations
- Well-behaved code (ABI)
- No exceptions
- Only aligned accesses
- No VM settings
- Some instructions not to be used

Advantages
- The WCET analyzer allows you to inspect the timing behavior of (timing-critical parts of) your code.
- The analysis results
  - are determined without the need to change the code
  - hold for all executions (for the intrinsic cache and pipeline behavior)
- The results are precise.
- The computation is fast.
- The WCET analyzer is easy to use.
- The WCET analyzer works on optimized code.
- The WCET analyzer saves development time by avoiding the tedious and time-consuming (instrument and) execute (or emulate) and measure cycle for a set of inputs over and over again.

Planned Extensions
- Support for code generators, e.g. recognition of generated patterns to avoid the need for user annotations on generated code
- Automatic detection of the number of loop iterations in the executable
- Further targets

Support for a New Processor Model
- Front end for executables
- Modeling of the pipeline according to the documentation
- (Modeling of the cache)
- Verification of the pipeline model on the real hardware or a reliable emulator

email: info@absint.com http://www.absint.com