A JIT Compiler for Android's Dalvik VM. Ben Cheng, Bill Buzbee, May 2010




Overview
View live session notes and ask questions on Google Wave: http://bit.ly/bizjnf
- Dalvik Environment
- Trace vs. Method Granularity JITs
- Dalvik JIT 1.0
- Future directions for the JIT
- Performance Case Studies
- Profiling JIT'd code
- Built-in Self-Verification Mode

Dalvik Execution Environment
- Virtual Machine for Android Apps
  - See 2008 Google IO talk http://www.youtube.com/watch?v=ptjedozexpm
- Very compact representation
- Emphasis on code/data sharing to reduce memory usage
- Process container sandboxes for security

Dalvik Interpreter
- Dalvik programs consist of byte code, processed by a host-specific interpreter
  - Highly-tuned, very fast interpreter (about 2x as fast as similar interpreters)
  - Typically less than 1/3rd of time spent in the interpreter
  - OS and performance-critical library code natively compiled
  - Good enough for most applications
- Performance is a problem for compute-intensive applications
  - Partial solution was the release of the Android Native Development Kit, which allows Dalvik applications to call out to statically-compiled methods
- Other part of the solution is a Just-In-Time Compiler
  - Translates byte code to optimized native code at run time

A JIT for Dalvik - but what flavor of JIT?
- Surprisingly wide variety of JIT styles
  - When to compile: install time, launch time, method invoke time, instruction fetch time
  - What to compile: whole program, shared library, page, method, trace, single instruction
- Each combination has strengths & weaknesses - key for us was to meet the needs of a mobile, battery-powered Android device
  - Minimal additional memory usage
  - Coexist with Dalvik's container-based security model
  - Quick delivery of performance boost
  - Smooth transition between interpretation & compiled code

Method vs. Trace Granularity: Method-granularity JIT
- Most common model for server JITs
- Interprets with profiling to detect hot methods
- Compile & optimize method-sized chunks
- Strengths
  - Larger optimization window
  - Machine state sync with interpreter only at method call boundaries
- Weaknesses
  - Cold code within hot methods gets compiled
  - Much higher memory usage during compilation & optimization
  - Longer delay between the point at which a method goes hot and the point that a compiled and optimized method delivers benefits

Method vs. Trace Granularity: Trace-granularity JIT
- Most common model for low-level code migration systems
- Interprets with profiling to identify hot execution paths
- Compiled fragments chained together in translation cache
- Strengths
  - Only hottest of hot code is compiled, minimizing memory usage
  - Tight integration with interpreter allows focus on common cases
  - Very rapid return of performance boost once hotness detected
- Weaknesses
  - Smaller optimization window limits peak gain
  - More frequent state synchronization with interpreter
  - Difficult to share translation cache across processes

Hot vs. Cold Code: system_server example
- Full Program: 4,695,780 bytes
- Hot Methods: 396,230 bytes (8% of Program)
- Hot Traces: 103,966 bytes (26% of Hot Methods, 2% of Program)
- Method JIT: Best optimization window
- Trace JIT: Best speed/space tradeoff
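The percentages on this slide follow directly from the three byte counts. A minimal check of the arithmetic, using only the figures quoted for system_server (the variable and function names are illustrative):

```python
# Byte counts quoted on the slide for system_server.
full_program = 4_695_780
hot_methods = 396_230
hot_traces = 103_966


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


hot_methods_of_program = percent(hot_methods, full_program)
hot_traces_of_methods = percent(hot_traces, hot_methods)
hot_traces_of_program = percent(hot_traces, full_program)

print(f"hot methods / program:     {hot_methods_of_program:.1f}%")
print(f"hot traces  / hot methods: {hot_traces_of_methods:.1f}%")
print(f"hot traces  / program:     {hot_traces_of_program:.1f}%")
```

Rounded to whole percentages these give the 8%, 26% and 2% shown on the slide.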

The Decision: Start with a Trace JIT
- Minimizing memory usage critical for mobile devices
- Important to deliver performance boost quickly
  - User might give up on new app if we wait too long to JIT
- Leave open the possibility of supplementing with method-based JIT
  - The two styles can co-exist
  - A mobile device looks more like a server when it's plugged in
  - Best of both worlds
    - Trace JIT when running on battery
    - Method JIT in background while charging

Dalvik Trace JIT Flow
[Flow diagram]
1. Start: interpret until next potential trace head.
2. Update profile count for this location.
3. Threshold? No: keep interpreting. Yes: check the translation cache.
4. Xlation exists? Yes: run the translation from the translation cache; each translation leaves through its exits (Exit 0, Exit 1).
5. Xlation exists? No: interpret/build trace request, then submit compilation request.
6. Compiler thread: install new translation in the translation cache.
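The decision logic in this flow can be sketched in a few lines. This is only an illustration of the control flow, not Dalvik's implementation: the class name, the method names, the threshold value of 3 and the string stand-in for native code are all assumptions made for the example.

```python
from collections import defaultdict

HOT_THRESHOLD = 3  # hypothetical value; the slides do not state the real threshold


class TraceJitSketch:
    """Toy model of the profile / threshold / translation-cache decisions."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.profile_counts = defaultdict(int)  # trace head -> times reached
        self.translation_cache = {}             # trace head -> stand-in for native code
        self.pending_requests = []              # queue consumed by the compiler thread

    def reach_trace_head(self, head):
        """Return the path taken when the interpreter reaches a potential trace head."""
        self.profile_counts[head] += 1
        if self.profile_counts[head] < self.threshold:
            return "interpret"                  # not hot yet
        if head in self.translation_cache:
            return "run translation"            # hot and already compiled
        if head in self.pending_requests:
            return "interpret"                  # hot, compilation still pending
        self.pending_requests.append(head)      # hot, first time: request compilation
        return "build trace request"

    def compiler_thread_step(self):
        """Stand-in for the compiler thread: install one pending translation."""
        if self.pending_requests:
            head = self.pending_requests.pop(0)
            self.translation_cache[head] = f"native code for {head}"


jit = TraceJitSketch()
paths = [jit.reach_trace_head("loop_head") for _ in range(3)]
jit.compiler_thread_step()
paths.append(jit.reach_trace_head("loop_head"))
print(paths)
```

The interpreter keeps running while a request is pending, which mirrors the point that compilation happens on a separate thread.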

Dalvik JIT v1.0 Overview
- Tight integration with interpreter
  - Useful to think of the JIT as an extension of the interpreter
- Interpreter profiles and triggers trace selection mode when a potential trace head goes hot
- Trace request is built during interpretation
  - Allows access to actual run-time values
  - Ensures that trace only includes byte codes that have successfully executed at least once (useful for some optimizations)
- Trace requests handed off to compiler thread, which compiles and optimizes into native code
- Compiled traces chained together in translation cache

Dalvik JIT v1.0 Features
- Per-process translation caches (sharing only within security sandboxes)
- Simple traces - generally 1 to 2 basic blocks long
- Local optimizations
  - Register promotion
  - Load/store elimination
  - Redundant null-check elimination
  - Heuristic scheduling
- Loop optimizations
  - Simple loop detection
  - Invariant code motion
  - Induction variable optimization
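To make one of the local optimizations concrete, here is a sketch of redundant null-check elimination over a straight-line trace. The slides only name the optimization; the tuple-based trace format, the operation names and the function below are hypothetical and are not the Dalvik compiler's internal representation.

```python
def eliminate_redundant_null_checks(trace):
    """Drop null checks on registers already proven non-null earlier in the trace.

    `trace` is a list of (operation, register) pairs. A register stays
    "known non-null" until something writes to it.
    """
    known_non_null = set()
    optimized = []
    for op, reg in trace:
        if op == "null_check":
            if reg in known_non_null:
                continue                     # already checked, skip the repeat
            known_non_null.add(reg)
        elif op == "write":
            known_non_null.discard(reg)      # new value, must be re-checked
        optimized.append((op, reg))
    return optimized


trace = [
    ("null_check", "v1"),
    ("load_field", "v1"),
    ("null_check", "v1"),   # redundant: v1 unchanged since the first check
    ("load_field", "v1"),
    ("write", "v1"),
    ("null_check", "v1"),   # needed again: v1 was overwritten
]
optimized_trace = eliminate_redundant_null_checks(trace)
print(len(trace), "->", len(optimized_trace), "operations")
```

Because a trace is short and has no internal joins, a single forward pass with a set is enough here.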

CPU-Intensive Benchmark Results
[Chart: Speedup relative to Dalvik Interpreter on Nexus One (axis 0 to 6) for Linpack, BenchmarkPi, CMark, SciMark3, Checkers]
[Chart: JIT Total Memory Usage in kbytes (axis 0 to 300) for Linpack, BenchmarkPi, CMark, SciMark3, Checkers]
- Linpack, BenchmarkPi, CaffeineMark & Checkers from the Android Market
- SciMark 3 run from command-line shell
- Measurements taken on Nexus One running pre-release Froyo build in airplane mode

Future Directions
- Method in-lining
- Trace extension
- Persistent profile information
- Off-line trace coalescing
- Off-line method translation
- Tuning, tuning and more tuning

Solving Performance and Correctness Issues
- How much boost will an app get from the JIT?
  - JIT can only remove cycles from the interpreter
  - OProfile can provide the insight to break down the workload
- How resource-friendly/optimizing is the JIT?
  - Again, OProfile can provide some high-level information
  - Use a special Dalvik build to analyze code quality
- How to debug the JIT?
  - Code generation vs. optimization bugs
  - Self-verification against the interpreter
Google Confidential

Case Study: RoboDefense
- Lots of actions

Case Study: RoboDefense
Performance gain from Dalvik capped at 4.34%
Samples  %      Module
15965    73.98  libskia.so
2662     12.33  no-vmlinux
1038     4.81   libcutils.so
937      4.34   libdvm.so
308      1.42   libc.so
297      1.37   libglesv2_adreno200.so
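The cap follows from the profile: if only the 4.34% of samples in libdvm.so can be sped up, the rest of the run time is untouched. A small worked example using Amdahl's law, assuming (as the slide does) that all libdvm.so time is the part the JIT can remove; the 5x figure is a hypothetical interpreter speedup chosen for illustration.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of time runs `local_speedup`x faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


dalvik_fraction = 4.34 / 100  # libdvm.so share of samples from the OProfile table

# Upper bound: the JIT removes every cycle spent in libdvm.so.
upper_bound = 1.0 / (1.0 - dalvik_fraction)

# Hypothetical case: JIT'd code runs 5x faster than the interpreter.
with_5x_jit = overall_speedup(dalvik_fraction, 5.0)

print(f"upper bound on overall speedup: {upper_bound:.3f}x")
print(f"overall speedup with a 5x JIT:  {with_5x_jit:.3f}x")
```

Even a perfect JIT shortens this workload by at most 4.34%, because about 74% of the samples are in libskia.so.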

Case Study: Checkers
JIT <3 Brain and Puzzle
965022 -> 5231208 (5.4x Speedup)

Case Study: Checkers
Use OProfile to explain the speedup
Samples  %      Module
975      93.57  dalvik-jit-code-cache
30       2.88   libdvm.so
28       2.69   no-vmlinux
4        0.38   libc.so
3        0.09   libglesv2_adreno200.so
dalvik-jit-code-cache and libdvm.so together account for 96.45% of samples, split 97% code cache / 3% libdvm.so.

Solving Performance and Correctness Issues Part 2/3
- How much boost will an app get from the JIT?
- How resource-friendly/optimizing is the JIT?
- How to debug the JIT?

Peek into the Code Cache Land
kill -12 <pid>
Example from system_server (20 minutes after boot)
- 9898 compilations using 796264 bytes
  - 80 bytes / compilation
- Code size stats: 103966/396230 (trace/method Dalvik)
  - 796264 / 103966 = 7.7x code bloat from Dalvik to native
- Total compilation time: 6024 ms
  - Average unit compilation time: 609 µs
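The derived figures in this dump can be reproduced from the raw counters. A short check of the arithmetic, using only the numbers quoted above (variable names are illustrative):

```python
# Raw counters quoted in the system_server code cache dump.
compilations = 9898
native_bytes = 796_264
dalvik_trace_bytes = 103_966
total_compile_ms = 6024

bytes_per_compilation = native_bytes / compilations
code_bloat = native_bytes / dalvik_trace_bytes
avg_compile_us = total_compile_ms * 1000 / compilations

print(f"bytes per compilation:    {bytes_per_compilation:.1f}")
print(f"Dalvik-to-native bloat:   {code_bloat:.1f}x")
print(f"average compilation time: {avg_compile_us:.0f} us")
```

The results round to the 80 bytes, 7.7x and 609 µs shown on the slide.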

JIT Profiling
Set dalvik.vm.jit.profile = true in /data/local.prop
count  %     offset (# insn), line  method signature
15368  1.15  0x0(+2), 283           Ljava/util/HashMap;size;()I
13259  1.00  0x18(+2), 858          Lcom/android/internal/os/BatteryStatsImpl;readKernelWakelockStats;()Ljava/util/Map;
13259  1.00  0x22(+2), 857          Lcom/android/internal/os/BatteryStatsImpl;readKernelWakelockStats;()Ljava/util/Map;
11842  0.89  0x5(+2), 183           Ljava/util/HashSet;size;()I
11827  0.89  0x0(+2), 183           Ljava/util/HashSet;size;()I
11605  0.87  0x30(+3), 892          Lcom/android/internal/os/BatteryStatsImpl;parseProcWakelocks;([BI)Ljava/util/Map;

Solving Performance and Correctness Issues Part 3/3
- How much boost will an app get from the JIT?
- How resource-friendly/optimizing is the JIT?
- How to debug the JIT?

Guess What's Wrong Here
A codegen bug is deliberately injected into the JIT
E/AndroidRuntime( 84): *** FATAL EXCEPTION IN SYSTEM PROCESS: android.server.ServerThread
E/AndroidRuntime( 84): java.lang.RuntimeException: Binary XML file line #28: You must supply a layout_width attribute.
E/AndroidRuntime( 84): at android.content.res.TypedArray.getLayoutDimension(TypedArray.java:491)
E/AndroidRuntime( 84): at android.view.ViewGroup$LayoutParams.setBaseAttributes(ViewGroup.java:3592)
:
E/AndroidRuntime( 187): *** FATAL EXCEPTION IN SYSTEM PROCESS: WindowManager
E/AndroidRuntime( 187): java.lang.ArrayIndexOutOfBoundsException
E/AndroidRuntime( 187): at java.util.GregorianCalendar.computeFields(GregorianCalendar.java:661)
E/AndroidRuntime( 187): at java.util.Calendar.complete(Calendar.java:807)
:
E/AndroidRuntime( 435): *** FATAL EXCEPTION IN SYSTEM PROCESS: android.server.ServerThread
E/AndroidRuntime( 435): java.lang.StackOverflowError
E/AndroidRuntime( 435): at java.util.Hashtable.get(Hashtable.java:267)
E/AndroidRuntime( 435): at java.util.PropertyResourceBundle.handleGetObject(PropertyResourceBundle.java:120)
:

Debugging and Verification Tools
- Byte code binary search
- Call graph filtering
- Self-verification w/ the interpreter
  - Code generation
  - Optimization

Bugs == Incorrect Machine States
- Heap, stack, and control-flow
[Diagram]
- Shadow Heap (Addr, Data) == Heap
- Shadow Stack == Stack
- PC == PC

Step-by-Step Debugging under Self-Verification
Divergence detected
~~~ DbgIntp(8): REGISTERS DIVERGENCE!
********** SHADOW STATE DUMP **********
CurrentPC: 0x42062d24, Offset: 0x0012
Class: Ljava/lang/Character;
Method: toUpperCase
Dalvik PC: 0x42062d1c endpc: 0x42062d24
Interp FP: 0x41866a3c endfp: 0x41866a3c
Shadow FP: 0x22c330 endfp: 0x22c330
Frame1 Bytes: 8
Frame2 Local: 0 Bytes: 0
Trace length: 2 State: 0

Step-by-Step Debugging under Self-Verification
Divergence details
********** SHADOW TRACE DUMP **********
0x42062d1c: (0x000e) const/16
0x42062d20: (0x0010) if-ge
*** Interp Registers:
(v0) 0x b5 X
(v1) 0x 55
*** Shadow Registers:
(v0) 0x b6 X
(v1) 0x 55
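The comparison behind such a divergence report can be sketched as follows. This is an illustration of the idea only: the dictionary layout and the function name are hypothetical, the upper digits of the register values are not legible in the dump above and are assumed to be zero, and the heap entry is an invented placeholder.

```python
def find_divergence(interp_state, shadow_state):
    """Compare interpreter state with the shadow state left by JIT'd code.

    Returns a list of (component, key, interp_value, shadow_value) mismatches.
    """
    mismatches = []
    if interp_state["pc"] != shadow_state["pc"]:
        mismatches.append(("pc", None, interp_state["pc"], shadow_state["pc"]))
    for component in ("registers", "heap"):
        expected = interp_state[component]
        actual = shadow_state[component]
        for key in sorted(set(expected) | set(actual)):
            if expected.get(key) != actual.get(key):
                mismatches.append((component, key, expected.get(key), actual.get(key)))
    return mismatches


# Values modeled on the dump: v0 differs (0xb5 vs 0xb6), v1 matches (0x55).
interp = {"pc": 0x42062D24, "registers": {"v0": 0xB5, "v1": 0x55}, "heap": {0x1000: 7}}
shadow = {"pc": 0x42062D24, "registers": {"v0": 0xB6, "v1": 0x55}, "heap": {0x1000: 7}}

divergence = find_divergence(interp, shadow)
for component, key, want, got in divergence:
    print(f"{component} {key}: interpreter {want:#x}, shadow {got:#x}")
```

Only v0 is reported, matching the register marked with X in the dump.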

Step-by-Step Debugging under Self-Verification
Replay the compilation with verbose dump
Compiler: Building trace for toUpperCase, offset 0xe
0x42062d1c: 0x0013 const/16 v0, #181
0x42062d20: 0x0035 if-ge v1, v0, #4
TRACEINFO (141): 0x42062d00 Ljava/lang/Character;toUpperCase
-------- dalvik offset: 0x000e @ const/16 v0, #181
0x2 (0002): ldr r1, [r5, #4]
0x4 (0004): mov r0, #182
-------- dalvik offset: 0x0010 @ if-ge v1, v0, #4
0x6 (0006): cmp r1, r0
0x8 (0008): str r0, [r5, #0]
0xa (000a): bge 0x00000014

Summary
- A resource-friendly JIT for Dalvik
  - Small memory footprint
- Significant speedup delivered
  - 2x ~ 5x performance gain for computation-intensive workloads
- More optimizations waiting in the pipeline
  - Enable more computation-intensive apps
- Verification bot
  - Dynamic code review by the interpreter

Q&A
http://bit.ly/bizjnf