Android Renderscript. Stephen Hines, Shih-wei Liao, Jason Sams, Alex Sakhartchouk srhines@google.com November 18, 2011





Outline
- Goals/Design of Renderscript
- Components
  - Offline Compiler
  - Online JIT Compiler
  - Renderscript Runtime
- Java Reflection + rsForEach
- HelloCompute Example
- LLVM Challenges
  - Source Differences
- Adventures in AST Annotation (Reference Counting)
- Conclusions & Future Work

Renderscript Goals
- Develop a solution that provides a solid 60 fps for our application frameworks
  - Has to work well with Dalvik
  - Has to work with existing hardware
  - Has to scale to future hardware without requiring developers to rewrite their applications
- Developed applications must be portable across a variety of hardware
  - ARM v5, v7, NEON, x86, SSE, GPUs, DSPs
  - Good performance across all devices instead of peak performance for one device at the expense of others
- Build a forward-looking 3D graphics and compute API

Renderscript Design Principles
- Portability
  - Applications need to be able to run on a wide range of hardware without changes
  - Applications should be able to take advantage of new instructions and processors without requiring a recompile or code changes
- Performance
  - Should be able to utilize advanced vector operations
  - Should be able to utilize non-CPU processors
- Usability
  - Within the above constraints, what can we do to help developers?

Portability
- To achieve portability, some code must be compiled on the device
  - This may happen at install time, at runtime, or both
- Leverage LLVM bitcode as our on-device portable format
  - Generated by a Clang-based tool from C99 scripts, running as part of the SDK build process
- LLVM bitcode is not 100% portable, so we fix the following:
  - Endianness (we use little endian)
  - Alignment (use the size of the type, i.e. an 8-byte aligned double)
  - sizeof(long) == 8, to match Java
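The layout rules on this slide can be illustrated in plain C. This is only a sketch under stated assumptions: `script_params`, `host_is_little_endian`, `store_le32` and `layout_matches_assumptions` are hypothetical names, and `int64_t` stands in for the 64-bit `long` that Renderscript scripts use.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a script-visible struct. int64_t plays the
   role of the script's 64-bit "long" so its size is the same on every host. */
typedef struct {
    int32_t count;
    alignas(8) double scale;  /* size-of-type alignment for an 8-byte double */
    int64_t stamp;            /* fixed 64-bit integer, like Java's long */
} script_params;

/* Returns 1 when the host stores the low-order byte first. */
static int host_is_little_endian(void) {
    const uint32_t probe = 1u;
    return *(const unsigned char *)&probe == 1;
}

/* Writes a 32-bit value as little-endian bytes regardless of host byte order. */
static void store_le32(uint32_t v, unsigned char out[4]) {
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = (unsigned char)(v >> (8 * i));
    }
}

/* Returns 1 when the struct has the layout the three rules imply. */
static int layout_matches_assumptions(void) {
    return sizeof(int64_t) == 8
        && offsetof(script_params, scale) % 8 == 0
        && offsetof(script_params, stamp) % 8 == 0
        && sizeof(script_params) == 24;
}
```

Pinning the sizes and alignment in the source is what lets the same struct layout be assumed on every target, which appears to be the intent of the three rules listed above.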

Performance
- Started with C99 because it compiles readily to a variety of HW targets and achieves good performance
- SMP primitives (foreach) built into the API
- Memory model designed to allow multiple memory spaces
  - Explicit sync points between memory spaces
  - Only script space is directly visible to the user
- Runtime is asynchronous with the application's Dalvik code
  - All communication goes through FIFOs
- Runtime is designed so the developer is not aware of which processor a given script is running on
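To make the FIFO point concrete, here is a minimal single-threaded sketch of a command queue between a control side and a runtime side. It is not the Renderscript implementation: `command`, `command_fifo`, `FIFO_CAPACITY` and the function names are hypothetical, and a real asynchronous runtime would add synchronization between producer and consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_CAPACITY 8u  /* hypothetical size; must be a power of two */

/* Hypothetical command record sent from the control side to the runtime. */
typedef struct {
    uint32_t opcode;
    uint32_t arg;
} command;

typedef struct {
    command slots[FIFO_CAPACITY];
    size_t head;  /* index of the next command to read */
    size_t tail;  /* index of the next free slot to write */
} command_fifo;

static void fifo_init(command_fifo *f) {
    assert(f != NULL);
    f->head = 0;
    f->tail = 0;
}

/* Number of commands currently queued. */
static size_t fifo_size(const command_fifo *f) {
    return f->tail - f->head;
}

/* Control side: enqueue a command; returns false when the queue is full. */
static bool fifo_push(command_fifo *f, command c) {
    assert(f != NULL);
    if (fifo_size(f) == FIFO_CAPACITY) {
        return false;
    }
    f->slots[f->tail % FIFO_CAPACITY] = c;
    f->tail++;
    return true;
}

/* Runtime side: dequeue the oldest command; returns false when empty. */
static bool fifo_pop(command_fifo *f, command *out) {
    assert(f != NULL && out != NULL);
    if (fifo_size(f) == 0) {
        return false;
    }
    *out = f->slots[f->head % FIFO_CAPACITY];
    f->head++;
    return true;
}
```

Because the control side only enqueues and never waits for a result, the caller can return immediately, which is one way to read "asynchronous with the application's Dalvik code".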

Usability
- Our goal is to make this as easy for developers to use as possible, within the portability and performance constraints
- Complicated HW-specific features such as local memory and thread launch types are deliberately not exposed
- We don't allow developers to control which processors their apps run on
  - Doing so would tie apps to specific HW
  - It would also create forward-portability problems
- Feature set is CPU-like, not "mobile GPU"-like
  - Recursion, function pointers, full IEEE 754 fp32, fp64, ...

Renderscript Components
- Offline compiler (llvm-rs-cc)
  - Converts script files into portable bitcode and a reflected Java layer
- Online JIT compiler (libbcc)
  - Translates portable bitcode to appropriate machine code (CPU/GPU/DSP/...)
- Runtime library support (librs)
  - Manages scripts from the Dalvik layer
  - Also provides basic support libraries (graphics drawing functions, etc.)

Offline Compiler: llvm-rs-cc
- Leverages the Clang abstract syntax tree (AST) to reflect information and functionality back to the Java layer
- Embeds metadata within bitcode (type, ...)
- Performs aggressive machine-independent optimizations on the host before emitting portable bitcode
- All bitcode is supplied as a resource within the .apk container

[Figure: Offline Compiler Flow]

Online JIT Compiler: libbcc
- Based on LLVM
- Currently supports ARM and x86
  - Future targets include GPU/DSP
- Performs target-specific optimizations and code generation
- Provides hooks for the runtime to access embedded metadata
- Links dynamically against runtime library functions (graphics, math)
- Caches JITted scripts to improve startup time
- Uses MC CodeGen to generate a .o file
  - librsloader is used to load the .o file
- libbcinfo provides the bitcode translator + metadata extraction

[Figure: Online JIT Compiler Flow]

Renderscript Runtime: librs
- Manages Scripts and other Renderscript objects
- Provides the implementation of runtime libraries (math, time, drawing, ref-counting, ...)
- Allows allocation and binding of objects to Script globals
- Message passing between control and runtime

Java Reflection
- Non-static global variables and functions are automatically reflected for use by the Java app (control side)
- Functions like set_globalname()
  - Set a runtime global to a new value
  - Not reflected for const variables
- Corresponding get_globalname()
  - Returns the most recently set control-side value
  - Starts out as the initialized script-side value (zero-init is on by default because of C99)
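The get/set behaviour described here can be modelled in a few lines of C. This is a sketch of the semantics only, not the reflected Java code: `reflected_global`, the global name `gain`, and every function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one reflected script global named "gain". */
typedef struct {
    int32_t script_value;    /* value the script sees and may modify */
    int32_t control_shadow;  /* last value the control side knows about */
} reflected_global;

/* Both copies start from the script-side initializer; a C99 global
   without an initializer starts at zero. */
static void global_init(reflected_global *g, int32_t script_init) {
    assert(g != NULL);
    g->script_value = script_init;
    g->control_shadow = script_init;
}

/* Models set_gain(): record the value control-side and push it to the script. */
static void set_gain(reflected_global *g, int32_t v) {
    assert(g != NULL);
    g->control_shadow = v;
    g->script_value = v;
}

/* Models get_gain(): returns the shadow copy without reading script memory. */
static int32_t get_gain(const reflected_global *g) {
    assert(g != NULL);
    return g->control_shadow;
}

/* Models the script assigning to its own global while it runs. */
static void script_writes_gain(reflected_global *g, int32_t v) {
    assert(g != NULL);
    g->script_value = v;
}
```

In this model a write made by the script is not observed by the getter, which matches the slide's wording that the getter returns the most recently set control-side value.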

Java Reflection (continued)
- Functions like bind_globalptr()
  - Allow the control side to bind Allocations to runtime global pointers
  - Implement call-by-reference (as opposed to set()'s call-by-value)
  - Pointer types also reflect get() functions appropriately (these return the most recently bound Allocation)
- Functions like invoke_funcname()
  - Allow the control side to execute a particular runtime function (complete with parameter passing)

rsForEach
- Explicit parallelism (based on rs_allocation dimensions)
- Invokes the root() function of the script on each cell in the allocation

    void root(const void *ain, void *aout, const void *usrData, uint32_t x, uint32_t y)

- ICS: reflects a Java helper to do the parallel launch

    void forEach_root(Allocation ain, Allocation aout)

  - Needs at least one input or output allocation (determines launch dimensions)
- ICS: improved type and dimensionality verification in the reflected Java helper

rsForEach (continued)
- Also available in rs_core.rsh to the C99-based scripts:

    extern void __attribute__((overloadable))
    rsForEach(rs_script script, rs_allocation input, rs_allocation output,
              const void *usrData, size_t usrDataLen);

    extern void __attribute__((overloadable))
    rsForEach(rs_script script, rs_allocation input, rs_allocation output,
              const void *usrData, size_t usrDataLen,
              const rs_script_call_t *);
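As a rough illustration of what a launch does, the following plain-C sketch calls a root-style function once per cell of a 2D grid. It runs sequentially and requires both an input and an output, which is narrower than the real API. `grid`, `root_fn`, `for_each_cell` and `scale_root` are hypothetical names, not part of Renderscript.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 2D allocation of 32-bit cells. */
typedef struct {
    uint32_t *cells;
    uint32_t width;
    uint32_t height;
} grid;

/* Same parameter list as the root() signature quoted on the slides. */
typedef void (*root_fn)(const void *ain, void *aout, const void *usrData,
                        uint32_t x, uint32_t y);

/* Sequential model of the launch: one root call per cell. A real runtime
   may run these calls in any order and spread them across processors. */
static void for_each_cell(root_fn root, const grid *in, grid *out,
                          const void *usrData) {
    assert(in != NULL && out != NULL);
    assert(in->width == out->width && in->height == out->height);
    for (uint32_t y = 0; y < in->height; y++) {
        for (uint32_t x = 0; x < in->width; x++) {
            const size_t i = (size_t)y * in->width + x;
            root(&in->cells[i], &out->cells[i], usrData, x, y);
        }
    }
}

/* Example kernel: out = in * factor + x + y, with factor passed via usrData. */
static void scale_root(const void *ain, void *aout, const void *usrData,
                       uint32_t x, uint32_t y) {
    const uint32_t factor = *(const uint32_t *)usrData;
    *(uint32_t *)aout = *(const uint32_t *)ain * factor + x + y;
}
```

Each call touches only its own cell, so the calls are independent of one another; that independence is what allows the work to be split by allocation dimensions.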

HelloCompute Example
- Available in the Android SDK samples
- Converts a bitmap image to grayscale
- Exploits parallelism by using rsForEach on every pixel of an allocation (simple dot product of RGB values)
- mono.rs -> mono.bc + reflected to ScriptC_mono.java

Sample Script (mono.rs)

    #pragma version(1)
    #pragma rs java_package_name(com.example.android.rs.hellocompute)

    const static float3 gMonoMult = {0.299f, 0.587f, 0.114f};

    void root(const uchar4 *v_in, uchar4 *v_out, uint32_t x, uint32_t y) {
        float4 f4 = rsUnpackColor8888(*v_in);
        float3 mono = dot(f4.rgb, gMonoMult);
        *v_out = rsPackColorTo8888(mono);
    }
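For readers without the Renderscript toolchain, the per-pixel arithmetic of the script can be approximated in standard C. This is a sketch rather than a port: `rgba8`, `to_u8` and `mono_pixel` are hypothetical names, and the rounding and the opaque alpha written to the output are assumptions about how the pack step behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for uchar4. */
typedef struct { uint8_t r, g, b, a; } rgba8;

/* Luma weights taken from the sample script. */
static const float MONO_MULT[3] = {0.299f, 0.587f, 0.114f};

/* Converts a float in [0, 1] to 0..255, clamping and rounding to nearest
   (the rounding mode is an assumption of this sketch). */
static uint8_t to_u8(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return (uint8_t)(v * 255.0f + 0.5f);
}

/* Per-pixel kernel: dot product of RGB with the weights, replicated
   into all three colour channels of the output. */
static void mono_pixel(const rgba8 *in, rgba8 *out) {
    assert(in != NULL && out != NULL);
    const float r = in->r / 255.0f;
    const float g = in->g / 255.0f;
    const float b = in->b / 255.0f;
    const float mono = r * MONO_MULT[0] + g * MONO_MULT[1] + b * MONO_MULT[2];
    const uint8_t m = to_u8(mono);
    out->r = m;
    out->g = m;
    out->b = m;
    out->a = 255;  /* assumption: the result is written as opaque */
}
```

The three weights sum to 1.0, so a white pixel stays white and a black pixel stays black.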

ScriptC_mono.java (generated pt. 1)

    package com.example.android.rs.hellocompute;

    import android.renderscript.*;
    import android.content.res.Resources;

    public class ScriptC_mono extends ScriptC {
        // Constructor
        public ScriptC_mono(RenderScript rs, Resources resources, int id) {
            super(rs, resources, id);
            __U8_4 = Element.U8_4(rs);
        }

        private Element __U8_4;
        private final static int mExportForEachIdx_root = 0;
        public void forEach_root(Allocation ain, Allocation aout) {
            // check ain
            if (!ain.getType().getElement().isCompatible(__U8_4)) {
                throw new RSRuntimeException("Type mismatch with U8_4!");
            }
            ...

ScriptC_mono.java (generated pt. 2)

            ...
            // check aout
            if (!aout.getType().getElement().isCompatible(__U8_4)) {
                throw new RSRuntimeException("Type mismatch with U8_4!");
            }
            // Verify dimensions
            Type tin = ain.getType();
            Type tout = aout.getType();
            if ((tin.getCount() != tout.getCount()) ||
                (tin.getX() != tout.getX()) ||
                (tin.getY() != tout.getY()) ||
                (tin.getZ() != tout.getZ()) ||
                (tin.hasFaces() != tout.hasFaces()) ||
                (tin.hasMipmaps() != tout.hasMipmaps())) {
                throw new RSRuntimeException("Dimension mismatch between input and output parameters!");
            }
            forEach(mExportForEachIdx_root, ain, aout, null);
        }
    }

HelloCompute.java (partial source)

    public class HelloCompute extends Activity {
        ...
        private void createScript() {
            mRS = RenderScript.create(this);
            mInAllocation = Allocation.createFromBitmap(mRS, mBitmapIn,
                    Allocation.MipmapControl.MIPMAP_NONE,
                    Allocation.USAGE_SCRIPT);
            mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());
            mScript = new ScriptC_mono(mRS, getResources(), R.raw.mono);
            mScript.forEach_root(mInAllocation, mOutAllocation);
            mOutAllocation.copyTo(mBitmapOut);
        }
        ...
    }

LLVM Challenges
- Bitcode versioning
  - Honeycomb
    - Only supported the legacy JIT path
    - LLVM 2.7-2.9 bitcode, depending on MR version
  - Ice Cream Sandwich
    - MC CodeGen
    - LLVM 3.0 bitcode - this one will last a while, right? ;)
  - Solution: provide a translator to go between versions
    - Partners only need to support the latest bitcode format
    - llvm-rs-cc generates older bitcode for older target APIs
- Metadata extraction
  - Don't force partners to rewrite this for every target backend + RS driver implementation

Source Differences
- android/external/clang
  - 32-line diff from Clang upstream!
    - Support for RGBA vector selection (28 lines)
    - Support for a flag to force "long" to 64-bit (4 lines)
  - Ready to upstream now
- android/external/llvm
  - 368-line diff from LLVM upstream!
    - Mostly patches for the legacy JIT path (no longer used; essentially dead code)
  - Stripped down a bit to fit on current tablets/smartphones
  - No debugging support currently (since we don't emit DWARF from llvm-rs-cc today)

Adventures in AST Annotation
- The Renderscript runtime manages a bunch of types
  - Allocations in the sample script (plus other things too)
- How do we know when they can be cleaned up?
  - Java side???
  - Script side???
- Reference counting
  - rsSetObject(), rsClearObject()
- Developers do not want to micro-manage opaque blobs
- Solution: dynamically annotate script code to use these functions in the appropriate spots

Annotating the AST
- Update the AST in place before we emit bitcode
- Need to do a few types of conversions on variables with an RS object type (rs_* types, not including rs_matrix*)
  - Assignments -> rsSetObject(&lhs, rhs)
  - Insert destructor calls as rsClearObject(&local) for locals
  - Global variables get cleaned up by the runtime after the script object is destroyed
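The two calls can be pictured as a retain-on-assign and release-on-clear pair. The sketch below models that idea in standard C; it is not the runtime's implementation. `rs_object`, `handle`, `set_object` and `clear_object` are hypothetical stand-ins for the opaque RS types and for rsSetObject() and rsClearObject().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runtime object carrying a reference count. */
typedef struct { int refs; } rs_object;

/* Hypothetical handle standing in for rs_font and the other rs_* types. */
typedef struct { rs_object *p; } handle;

/* Models rsClearObject(): drop the reference held by *dst, if any. */
static void clear_object(handle *dst) {
    assert(dst != NULL);
    if (dst->p != NULL) {
        dst->p->refs--;
        dst->p = NULL;
    }
}

/* Models rsSetObject(): retain the new value first, then release the old
   one, so assigning a handle to itself never drops the count to zero. */
static void set_object(handle *dst, handle src) {
    assert(dst != NULL);
    if (src.p != NULL) {
        src.p->refs++;
    }
    clear_object(dst);
    dst->p = src.p;
}
```

With these semantics, rewriting every assignment into a set call and every scope exit into a clear call keeps each count equal to the number of live handles, with no manual bookkeeping in the script source.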

Reference Counting Example

    rs_font globalin[10], globalout;

    void foo(int j) {
        rs_font localuninit;
        localuninit = globalin[0];
        for (int i = 0; i < j; i++) {
            rs_font fornest = globalin[i];
            switch (i) {
                case 3:
                    return;
                case 7:
                    continue;
                default:
                    break;
            }
            localuninit = fornest;
        }
        globalout = localuninit;
        return;
    }

RS Object Local Variables
- In the example above, two locals have an RS object type (rs_font): localuninit, scoped to the function body, and fornest, scoped to the for-loop body
- These are the variables whose assignments and scope exits need annotation

Assignment -> rsSetObject()

    rs_font globalin[10], globalout;

    void foo(int j) {
        rs_font localuninit;
        rsSetObject(&localuninit, globalin[0]);  // Simple translation to call-expr
        for (int i = 0; i < j; i++) {
            rs_font fornest;
            rsSetObject(&fornest, globalin[i]);  // Initializers must be split before conversion
            switch (i) {
                case 3:
                    return;
                case 7:
                    continue;
                default:
                    break;
            }
            rsSetObject(&localuninit, fornest);
        }
        rsSetObject(&globalout, localuninit);
        return;
    }

Insert Destructor Calls

    rs_font globalin[10], globalout;

    void foo(int j) {
        rs_font localuninit;
        rsSetObject(&localuninit, globalin[0]);
        for (int i = 0; i < j; i++) {
            rs_font fornest;
            rsSetObject(&fornest, globalin[i]);
            switch (i) {
                case 3:
                    // Return statements always require that you destroy
                    // any in-scope local objects (inside-out).
                    rsClearObject(&localuninit);
                    rsClearObject(&fornest);
                    return;
                case 7:
                    rsClearObject(&fornest);  // continue scopes to for-loop, so destroy fornest
                    continue;
                default:
                    break;  // break scopes to switch-stmt, so do nothing
            }
            rsSetObject(&localuninit, fornest);
            rsClearObject(&fornest);  // End of for-loop scope, so destroy fornest
        }
        rsSetObject(&globalout, localuninit);
        rsClearObject(&localuninit);  // End outer scope (before return)
        return;
    }
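One way to check that the inserted calls balance is to run the annotated function against a mock runtime and count references. The sketch below does that in standard C. `rs_object`, `font_handle`, `set_object`, `clear_object`, `reset_globals` and `total_refs` are hypothetical, and the locals are explicitly emptied because the mock needs a defined starting value.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int refs; } rs_object;        /* hypothetical runtime object */
typedef struct { rs_object *p; } font_handle;  /* stand-in for rs_font */

/* Models rsClearObject(). */
static void clear_object(font_handle *h) {
    assert(h != NULL);
    if (h->p != NULL) { h->p->refs--; h->p = NULL; }
}

/* Models rsSetObject(): retain the new value, then release the old one. */
static void set_object(font_handle *dst, font_handle src) {
    assert(dst != NULL);
    if (src.p != NULL) { src.p->refs++; }
    clear_object(dst);
    dst->p = src.p;
}

static rs_object fonts[10];
static font_handle globalin[10], globalout;

/* Gives each globalin entry one owned reference and empties globalout. */
static void reset_globals(void) {
    for (int i = 0; i < 10; i++) {
        fonts[i].refs = 1;
        globalin[i].p = &fonts[i];
    }
    globalout.p = NULL;
}

/* The annotated function, with mock calls in place of the RS ones. */
static void foo(int j) {
    font_handle localuninit = {NULL};  /* assumption: locals start empty */
    set_object(&localuninit, globalin[0]);
    for (int i = 0; i < j; i++) {
        font_handle fornest = {NULL};
        set_object(&fornest, globalin[i]);
        switch (i) {
            case 3:
                clear_object(&localuninit);
                clear_object(&fornest);
                return;
            case 7:
                clear_object(&fornest);
                continue;
            default:
                break;
        }
        set_object(&localuninit, fornest);
        clear_object(&fornest);
    }
    set_object(&globalout, localuninit);
    clear_object(&localuninit);
    return;
}

/* Sum of all counts; any leaked local reference would raise this total. */
static int total_refs(void) {
    int sum = 0;
    for (int i = 0; i < 10; i++) { sum += fonts[i].refs; }
    return sum;
}
```

After any call, the total should equal the ten references owned by globalin plus one if globalout was assigned; the early return at i == 3 leaves globalout untouched, so the total stays at ten in that case.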

Conclusions
- Renderscript
  - Portable, high-performance, and developer-friendly
  - 3D graphics + compute acceleration path
- Hide complexity through compiler + runtime
  - C99-based + foreach
  - Ample use of reflection
  - Library functions
  - Opaque managed types + reference counting

Future Work
- Debugging and profiling support
- Improved use of vector intrinsics
- See http://developer.android.com/ for the latest info