LLVM for OpenGL and other stuff. Chris Lattner Apple Computer clattner@apple.com



Similar documents
EMSCRIPTEN - COMPILING LLVM BITCODE TO JAVASCRIPT (?!)

Tachyon: a Meta-circular Optimizing JavaScript Virtual Machine

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

Android Renderscript. Stephen Hines, Shih-wei Liao, Jason Sams, Alex Sakhartchouk November 18, 2011

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

CSE 373: Data Structure & Algorithms Lecture 25: Programming Languages. Nicki Dell Spring 2014

Wiggins/Redstone: An On-line Program Specializer

Vertex and fragment programs

Chapter 3.2 C++, Java, and Scripting Languages. The major programming languages used in game development.

Introduction to GPGPU. Tiziano Diamanti

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

How To Trace

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

GPU Profiling with AMD CodeXL

Parrot in a Nutshell. Dan Sugalski dan@sidhe.org. Parrot in a nutshell 1

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games Mathias Schott, Evan Hart NVIDIA

Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27

General Introduction

CA Compiler Construction

Developer Tools. Tim Purcell NVIDIA

The Design of the Inferno Virtual Machine. Introduction

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

Habanero Extreme Scale Software Research Project

Obfuscation: know your enemy

VLIW Processors. VLIW Processors

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

OpenGL Shading Language Course. Chapter 5 Appendix. By Jacobo Rodriguez Villar jacobo.rodriguez@typhoonlabs.com

Compilers and Tools for Software Stack Optimisation

Jonathan Worthington Scarborough Linux User Group

picojava TM : A Hardware Implementation of the Java Virtual Machine

Virtual Machine Learning: Thinking Like a Computer Architect

02 B The Java Virtual Machine

Lesson 06: Basics of Software Development (W02D2

The P3 Compiler: compiling Python to C++ to remove overhead

Chapter 1. Dr. Chris Irwin Davis Phone: (972) Office: ECSS CS-4337 Organization of Programming Languages

C# and Other Languages

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Characteristics of Java (Optional) Y. Daniel Liang Supplement for Introduction to Java Programming

The C Programming Language course syllabus associate level

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

QCD as a Video Game?

CS418 GLSL Shader Programming Tutorial (I) Shader Introduction. Presented by : Wei-Wen Feng 3/12/2008

Instruction Set Architecture (ISA)

The Java Virtual Machine and Mobile Devices. John Buford, Ph.D. Oct 2003 Presented to Gordon College CS 311

1. Overview of the Java Language

System Structures. Services Interface Structure

Creating OpenGL applications that use GLUT

Software Pipelining. for (i=1, i<100, i++) { x := A[i]; x := x+1; A[i] := x

Effective Java Programming. efficient software development

<Insert Picture Here> Java, the language for the future

Programming and Software Development CTAG Alignments

Chapter 13: Program Development and Programming Languages

Compiling Object Oriented Languages. What is an Object-Oriented Programming Language? Implementation: Dynamic Binding

Replication on Virtual Machines

OAMulator. Online One Address Machine emulator and OAMPL compiler.

Performance Optimization and Debug Tools for mobile games with PlayCanvas

High level code and machine code

Finger Paint: Cross-platform Augmented Reality

PC Requirements and Technical Help. Q1. How do I clear the browser s cache?

CMSC 427 Computer Graphics Programming Assignment 1: Introduction to OpenGL and C++ Due Date: 11:59 PM, Sept. 15, 2015

Visual Basic Programming. An Introduction

KITES TECHNOLOGY COURSE MODULE (C, C++, DS)

Virtual Machines. Virtual Machines

Computer Graphics on Mobile Devices VL SS ECTS

The programming language C. sws1 1

Web Based 3D Visualization for COMSOL Multiphysics

Validating Java for Safety-Critical Applications

ELEC 377. Operating Systems. Week 1 Class 3

Computer Science 217

Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert

Towards OpenMP Support in LLVM

Computer Programming. Course Details An Introduction to Computational Tools. Prof. Mauro Gaspari:

Language Based Virtual Machines... or why speed matters. by Lars Bak, Google Inc

OKLAHOMA SUBJECT AREA TESTS (OSAT )

VHDL GUIDELINES FOR SYNTHESIS

CS420: Operating Systems OS Services & System Calls

An Easier Way for Cross-Platform Data Acquisition Application Development

1/20/2016 INTRODUCTION

Fast JavaScript in V8. The V8 JavaScript Engine. ...well almost all boats. Never use with. Never use with part 2.

Sources: On the Web: Slides will be available on:

Parallel and Distributed Computing Programming Assignment 1

NVFX : A NEW SCENE AND MATERIAL EFFECT FRAMEWORK FOR OPENGL AND DIRECTX. TRISTAN LORACH Senior Devtech Engineer SIGGRAPH 2013

qwertyuiopasdfghjklzxcvbnmqwerty uiopasdfghjklzxcvbnmqwertyuiopasd fghjklzxcvbnmqwertyuiopasdfghjklzx cvbnmqwertyuiopasdfghjklzxcvbnmq

Smart Cards a(s) Safety Critical Systems

1001ICT Introduction To Programming Lecture Notes

HP LoadRunner. Software Version: Ajax TruClient Tips & Tricks

Intel Media SDK Library Distribution and Dispatching Process

Magento & Zend Benchmarks Version 1.2, 1.3 (with & without Flat Catalogs)

Developing Embedded Software in Java Part 1: Technology and Architecture

High-Level Synthesis for FPGA Designs

Transcription:

LLVM for OpenGL and other stuff Chris Lattner Apple Computer clattner@apple.com

OpenGL JIT OpenGL Vertex/Pixel Shaders

OpenGL Pixel/Vertex Shaders Small program, provided at run-time, to be run on each vertex/pixel: Written in one of a few high-level graphics languages (e.g. GLSL) Executed millions of times, extremely performance sensitive Ideally, these are executed on the graphics card: What if hardware doesn t support some feature? (e.g. laptop gfx) Interpret or JIT on main CPU void main() { vec3 ecposition = vec3(gl_modelviewmatrix * gl_vertex); vec3 tnorm = normalize(gl_normalmatrix * gl_normal); vec3 lightvec = normalize(lightposition - ecposition); vec3 reflectvec = reflect(-lightvec, tnorm); vec3 viewvec = normalize(-ecposition); float diffuse = max(dot(lightvec, tnorm), 0.0); float spec = 0.0; if (diffuse > 0.0) { spec = max(dot(reflectvec, viewvec), 0.0); spec = pow(spec, 16.0); } LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec; MCposition = gl_vertex.xy; gl_position = ftransform(); } GLSL Vertex Shader

MacOS OpenGL Before LLVM Custom JIT for X86-32 and PPC-32: Very simple codegen: Glued chunks of Altivec or SSE code Little optimization across operations (e.g. scheduling) Very fragile, hard to understand and change (hex opcodes) OpenGL Interpreter: JIT didn t support all OpenGL features: fallback to interpreter Interpreter was very slow, 100x or worse than JIT GLSL Text OpenGL AST

OpenGL JIT built with LLVM Components GLSL Text OpenGL AST LLVM IR LLVM IR At runtime, build LLVM IR for program, optimize, JIT: Result supports any target LLVM supports (+ PPC64, X86-64 in MacOS 10.5) Generated code is as good as an optimizing static compiler Other LLVM improvements to optimizer/codegen improves OpenGL Key question: How does the OpenGL to LLVM stage work?

Structure of an Interpreter Simple opcode-based dispatch loop: while (...) {... switch (cur_opcode) { case dotproduct: result = opengl_dot(lhs, rhs); break; case texturelookup: result = opengl_texlookup(lhs, rhs); break; case... One function per operation, written in C: double opengl_dot(vec3 LHS, vec3 RHS) { #ifdef ALTIVEC... altivec intrinsics... #elif SSE... sse intrinsics... #else Easy to understand and debug,... generic c code... easy to write each operation (each #endif operation is just C code) } In a high-level language like GLSL, each op can be hundreds of LOC

OpenGL to LLVM Implementation C Code Opcode Functions llvm-gcc Bytecode OpenGL Build Time Bytecode GLSL OpenGL AST LLVM IR LLVM IR At OpenGL build time, compile each opcode to LLVM bytecode: Same code used by the interpreter: easy to understand/change/optimize

OpenGL to LLVM: At runtime 1.Translate OpenGL AST into LLVM call instructions: one per operation 2.Use the LLVM inliner to inline opcodes from precompiled bytecode 3.Optimize/codegen as before GLSL OpenGL AST... vec3 viewvec = normalize(-ecposition); float diffuse = max(dot(lightvec, tnorm), 0.0);... OpenGL to LLVM... %tmp1 = call opengl_negate(ecposition) %viewvec = call opengl_normalize(%tmp1); %tmp2 = call opengl_dot(%lightvec, %tnorm) %diffuse = call opengl_max(%tmp2, 0.0);... Optimize, Codegen LLVM IR LLVM Inliner LLVM IR... %tmp1 = sub <4 x float> <0,0,0,0>, %ecposition %tmp3 = shuffle <4 x float> %tmp1,...; %tmp4 = mul <4 x float> %tmp3, %tmp3...

Benefits of this approach Key features of this approach: Each opcode is written/debugged for a simple interpreter, in standard C Retains all advantages of an interpreter: debugability, understandability, etc Easy to make algorithmic changes to opcodes OpenGL runtime is independent of opcode implementation Primary contributions to Mac OS: Support for PPC64/X86-64 Much better performance: optimizations, regalloc, scheduling, etc No fallback to interpreter needed! OpenGL group doesn t maintain their own JIT!

Another Example: Colorspace Conversion Code to convert from one coordinate system to another:! e.g. BGRA 444R -> RGBA 8888! Hundreds of combinations, importance depends on input Run Time Specialize Compiler optimizes shifts and masking

LLVM + Dynamic Languages

LLVM and Dynamic Languages Dynamic languages are very different than C: Extremely polymorphic, reflective, dynamically extensible Standard compiler optzns don t help much if + is a dynamic method call Observation: in many important cases, dynamism is eliminable Solution: Use dataflow and static analysis to infer types: i starts as an integer var i; for (i = 0; i < 10; ++i)... A[i]... ++ on integer returns integer i isn t modified anywhere else We proved i is always an integer: change its type to integer instead of object Operations on i are now not dynamic Faster, can be optimized by LLVM (e.g. loop unrolling)

Advantages and Limitations of Static Analysis Works on unmodified programs in scripting languages: No need for user annotations, no need for sub-languages Many approaches for doing the analysis (with cost/benefit tradeoffs) Most of the analyses could work with many scripting languages: Parameterize the model with info about the language operations Limitation: cannot find all types in general! That s ok though, the more we can prove, the faster it goes

Scripting Language Performance Ahead-of-Time Compilation provides: Reduced memory footprint (no ASTs in memory) Eliminate (if no eval ) or reduce use of interpreter at runtime (save code size) Much better performance if type inference is successful JIT compilation provides: Full support for optimizing eval d code (e.g. json objects in javascript) Runtime type profiling for speculative optimizations LLVM provides: Both of the above, with one language -> llvm translator Install-time codegen Continuously improving set of optimizations and targets Ability to inline & optimize code from different languages inline your runtime library into the client code?