HOTPATH VM. An Effective JIT Compiler for Resource-constrained Devices



Similar documents
Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

LaTTe: An Open-Source Java VM Just-in Compiler

02 B The Java Virtual Machine

CSC 8505 Handout : JVM & Jasmin

Armed E-Bunny: A Selective Dynamic Compiler for Embedded Java Virtual Machine Targeting ARM Processors

Virtual Machine Learning: Thinking Like a Computer Architect

picojava TM : A Hardware Implementation of the Java Virtual Machine

Introduction. Application Security. Reasons For Reverse Engineering. This lecture. Java Byte Code

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

The Java Virtual Machine and Mobile Devices. John Buford, Ph.D. Oct 2003 Presented to Gordon College CS 311

Chapter 7D The Java Virtual Machine

Inside the Java Virtual Machine

Wiggins/Redstone: An On-line Program Specializer

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

- Applet java appaiono di frequente nelle pagine web - Come funziona l'interprete contenuto in ogni browser di un certo livello? - Per approfondire

Under the Hood: The Java Virtual Machine. Lecture 24 CS 2110 Fall 2011

Programming the Android Platform. Logistics

Habanero Extreme Scale Software Research Project

Introduction to Programming

Jonathan Worthington Scarborough Linux User Group

The Java Virtual Machine (JVM) Pat Morin COMP 3002

Trace-based Just-in-Time Type Specialization for Dynamic Languages

University of Twente. A simulation of the Java Virtual Machine using graph grammars

General Introduction

Virtual Machine Showdown: Stack Versus Registers

HVM TP : A Time Predictable and Portable Java Virtual Machine for Hard Real-Time Embedded Systems JTRES 2014

A JIT Compiler for Android s Dalvik VM. Ben Cheng, Bill Buzbee May 2010

Precise and Efficient Garbage Collection in VMKit with MMTk

Instruction Folding in a Hardware-Translation Based Java Virtual Machine

Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

Java Programming. Binnur Kurt Istanbul Technical University Computer Engineering Department. Java Programming. Version 0.0.

Static Analysis of Virtualization- Obfuscated Binaries

Write Barrier Removal by Static Analysis

Lecture 32: The Java Virtual Machine. The Java Virtual Machine

Types of Separated Java Bytecode

What is Software Watermarking? Software Watermarking Through Register Allocation: Implementation, Analysis, and Attacks

Java Virtual-Machine Support for Portable Worst-Case Execution-Time Analysis

Java Program Coding Standards Programming for Information Technology

Introduction to Java

Instruction Set Architecture (ISA) Design. Classification Categories

Prototyping Faithful Execution in a Java Virtual Machine

Linear Scan Register Allocation on SSA Form

Eclipse Visualization and Performance Monitoring

Tachyon: a Meta-circular Optimizing JavaScript Virtual Machine

Checking Access to Protected Members in the Java Virtual Machine

Built-in Concurrency Primitives in Java Programming Language. by Yourii Martiak and Mahir Atmis

Introduction to programming

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

Effective Java Programming. efficient software development

Can You Trust Your JVM Diagnostic Tools?

Reducing Transfer Delay Using Java Class File Splitting and Prefetching

VLIW Processors. VLIW Processors

Compiling Object Oriented Languages. What is an Object-Oriented Programming Language? Implementation: Dynamic Binding

Optimizing compilers. CS Modern Compilers: Theory and Practise. Optimization. Compiler structure. Overview of different optimizations

Programming Language Pragmatics

How To Trace

Compilers and Tools for Software Stack Optimisation

Cloud Computing. Up until now

Using inter-procedural side-effect information in JIT optimizations

Using jvmstat and visualgc to Solve Memory Management Problems

CIS570 Modern Programming Language Implementation. Office hours: TDB 605 Levine

Performance Tools for Parallel Java Environments

EMSCRIPTEN - COMPILING LLVM BITCODE TO JAVASCRIPT (?!)

Replication on Virtual Machines

Chapter 4 Lecture 5 The Microarchitecture Level Integer JAVA Virtual Machine

CSCI E 98: Managed Environments for the Execution of Programs

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

Briki: a Flexible Java Compiler

How accurately do Java profilers predict runtime performance bottlenecks?

1 The Java Virtual Machine

DISK: A Distributed Java Virtual Machine For DSP Architectures

C# and Other Languages

Instruction Set Architecture (ISA)

On Demand Loading of Code in MMUless Embedded System

A Scalable Technique for Characterizing the Usage of Temporaries in Framework-intensive Java Applications

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment

An Exception Monitoring System for Java

Transcription:

HOTPATH VM An Effective JIT Compiler for Resource-constrained Devices based on the paper by Andreas Gal, Christian W. Probst and Michael Franz, VEE 06

INTRODUCTION

INTRODUCTION Most important factor: speed

INTRODUCTION Most important factor: speed Overlooked: power consumption, memory footprint

INTRODUCTION Most important factor: speed Overlooked: power consumption, memory footprint Embedded JIT compilers (simpler algorithms)

JAMVM

JAMVM GNU classpath-based

JAMVM GNU classpath-based Own intermediate representation

JAMVM GNU classpath-based Own intermediate representation HotpathVM as add-on

JAMVM GNU classpath-based Own intermediate representation HotpathVM as add-on Integrated w/ 20 lines of code

HOTPATHVM

HOTPATHVM 150 kb memory footprint

HOTPATHVM 150 kb memory footprint Searching for hot code by tracing

HOTPATHVM 150 kb memory footprint Searching for hot code by tracing Cold code for interpreter

HOTPATHVM 150 kb memory footprint Searching for hot code by tracing Cold code for interpreter Optimizing and compiling traces

TRACE SELECTION

TRACE SELECTION Problem: bytecode is not purely sequential

TRACE SELECTION Problem: bytecode is not purely sequential Solution: Software Trace Scheduling

TRACE SELECTION Problem: bytecode is not purely sequential Solution: Software Trace Scheduling Limited to loops

IDENTIFYING LOOP HEADERS

IDENTIFYING LOOP HEADERS Backward jump record destination

IDENTIFYING LOOP HEADERS Backward jump record destination Assumption: bytecode is forward

IDENTIFYING LOOP HEADERS Backward jump record destination Assumption: bytecode is forward Counter for backward jumps

IDENTIFYING LOOP HEADERS Backward jump record destination Assumption: bytecode is forward Counter for backward jumps Saved in intermediate representation

IDENTIFYING LOOP HEADERS Backward jump record destination Assumption: bytecode is forward Counter for backward jumps Saved in intermediate representation Record after threshold is exceeded

TRACE EXAMPLE

TRACE EXAMPLE public static void main(string args[]) { int i, k = 0; for (i = 0; i < 1000, ++i) ++k; System.out.println(k); }

TRACE EXAMPLE public static void main(string args[]) { int i, k = 0; for (i = 0; i < 1000, ++i) ++k; System.out.println(k); } A: B: iconst_0 istore_2 iconst_0 istore_1 iload_1 sipush 1000 if_icmpge B iinc 2,1 iinc 1,1 goto A getstatic System.out iload_2 invokevirtual println(int) return

TRACE EXAMPLE public static void main(string args[]) { int i, k = 0; for (i = 0; i < 1000, ++i) ++k; hotspot System.out.println(k); } A: B: iconst_0 istore_2 iconst_0 istore_1 iload_1 sipush 1000 if_icmpge B iinc 2,1 iinc 1,1 goto A getstatic System.out iload_2 invokevirtual println(int) return

TRACE EXAMPLE public static void main(string args[]) { int i, k = 0; for (i = 0; i < 1000, ++i) ++k; hotspot System.out.println(k); } A: B: iconst_0 istore_2 iconst_0 istore_1 iload_1 sipush 1000 if_icmpge B iinc 2,1 hotpath iinc 1,1 goto A getstatic System.out iload_2 invokevirtual println(int) return

RECORDING TRACES

RECORDING TRACES Meta block after each instruction

RECORDING TRACES Meta block after each instruction Second stop-loss threshold

RECORDING TRACES Meta block after each instruction Second stop-loss threshold No overhead without tracer

RECORDING TRACES Meta block after each instruction Second stop-loss threshold No overhead without tracer Guards for branches

GUARDS

GUARDS Ensures following the hotpath

GUARDS Ensures following the hotpath Returns control to VM on failure

GUARDS Ensures following the hotpath Returns control to VM on failure Reset values from intermediate representation

BRANCHES

BRANCHES Multiple hot paths in loop

BRANCHES Multiple hot paths in loop Lazy recording

BRANCHES Multiple hot paths in loop Lazy recording Secondary traces (like Dynamo)

BRANCHES Multiple hot paths in loop Lazy recording Secondary traces (like Dynamo) Updates guard instruction

BRANCHES Multiple hot paths in loop Lazy recording Secondary traces (like Dynamo) Updates guard instruction

STOP CONDITIONS

STOP CONDITIONS Successful recording

STOP CONDITIONS Successful recording Aborted recording on rare events

STOP CONDITIONS Successful recording Aborted recording on rare events Exceptions

STOP CONDITIONS Successful recording Aborted recording on rare events Exceptions Native method invocation

COMPILING TRACES

COMPILING TRACES Compiling complete traces only

COMPILING TRACES Compiling complete traces only Assumption: trace is executed repeatedly

COMPILING TRACES Compiling complete traces only Assumption: trace is executed repeatedly

COMPILING TRACES Compiling complete traces only Assumption: trace is executed repeatedly Stack Deconstruction

COMPILING TRACES Compiling complete traces only Assumption: trace is executed repeatedly Stack Deconstruction Code Analysis

COMPILING TRACES Compiling complete traces only Assumption: trace is executed repeatedly Stack Deconstruction Code Analysis Code Generation

STACK DECONSTRUCTION

STACK DECONSTRUCTION Generates SSA-based IR

STACK DECONSTRUCTION Generates SSA-based IR φ pseudo instruction

STACK DECONSTRUCTION x = 5 Generates SSA-based IR if x > 5 φ pseudo instruction x = 5 x = 7 x = 7

STACK DECONSTRUCTION x1 x = 5 Generates SSA-based IR if x1 x > 5 φ pseudo instruction x2 x = 5 x3 x = 7 x4 = x φ(x2,x3) = 7

STACK DECONSTRUCTION x1 x = 5 Generates SSA-based IR if x1 x > 5 φ pseudo instruction Flags loop invariant code x2 x = 5 x3 x = 7 x4 = x φ(x2,x3) = 7

STACK DECONSTRUCTION x1 x = 5 Generates SSA-based IR if x1 x > 5 φ pseudo instruction Flags loop invariant code x2 x = 5 x3 x = 7 x4 = x φ(x2,x3) = 7

CODE ANALYSIS

CODE ANALYSIS Constant Propagation

CODE ANALYSIS Constant Propagation Loop Peeling

CODE ANALYSIS Constant Propagation Loop Peeling Common subexpression elimination

CODE ANALYSIS Constant Propagation Loop Peeling Common subexpression elimination Invariant code motion

CODE GENERATION

CODE GENERATION Generating code backwards

CODE GENERATION Generating code backwards Enables to emit special machine code

CODE GENERATION Generating code backwards Enables to emit special machine code Constant Folding

SIDE EXITS

SIDE EXITS Leaving the hot path

SIDE EXITS Leaving the hot path Generator generates compensation code

SIDE EXITS Leaving the hot path Generator generates compensation code Special case: inline methods

SIDE EXITS Leaving the hot path Generator generates compensation code Special case: inline methods Writing back current state

WALKTHROUGH

WALKTHROUGH Java Code

WALKTHROUGH compilation Java Code

WALKTHROUGH compilation Java Code Bytecode

WALKTHROUGH Bytecode

WALKTHROUGH stack deconstruction Bytecode

WALKTHROUGH stack deconstruction Bytecode SSA

WALKTHROUGH stack deconstruction Bytecode SSA Flag invariant code

WALKTHROUGH SSA

WALKTHROUGH analysis optimization SSA

WALKTHROUGH analysis optimization SSA SSA

WALKTHROUGH analysis optimization SSA SSA Constant Propagation Loop Peeling Common subexpression elimination Invariant code motion

WALKTHROUGH SSA

WALKTHROUGH generation SSA

WALKTHROUGH generation SSA Machine code

WALKTHROUGH generation SSA Machine code Constant Folding

BENCHMARKS

BENCHMARKS 250 200 execution time in s 150 100 50 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS 250 200 execution time in s 150 100 50 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS 250 200 execution time in s 150 100 50 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS 250 200 execution time in s 150 100 50 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS 250 200 execution time in s 150 100 50 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS

BENCHMARKS speedup factor 20 18 16 14 12 10 8 6 4 2 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS speedup factor 20 18 16 14 12 10 8 6 4 2 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS speedup factor 20 18 16 14 12 10 8 6 4 2 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS speedup factor 20 18 16 14 12 10 8 6 4 2 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

BENCHMARKS speedup factor 20 18 16 14 12 10 8 6 4 2 0 LU SOR FFT SCR Linpack NumSort MonteCarlo Java VM JamVM HotpathVM Hotspot VM

OUTLOOK

OUTLOOK Trace Merging

OUTLOOK Trace Merging Support additional architectures (ARM)

OUTLOOK Trace Merging Support additional architectures (ARM) Use outside embedded environment

HOTPATH VM An Effective JIT Compiler for Resource-constrained Devices