Interpreters and virtual machines




Interpreters and virtual machines
Michel Schinz, 2007-03-23

Interpreters

An interpreter is a program that executes another program, represented as some kind of data structure. Common program representations include:
- raw text (source code),
- trees (the AST of the program),
- linear sequences of instructions.

Why interpreters? Interpreters enable the execution of a program without requiring its compilation to native code. They simplify the implementation of programming languages and, on modern hardware, are efficient enough for most tasks.

Text-based interpreters

Text-based interpreters directly interpret the textual source of the program. They are very seldom used, except for trivial languages where every expression is evaluated at most once, i.e. languages without loops or functions. Plausible example: a calculator program, which evaluates arithmetic expressions while parsing them.

Tree-based interpreters

Tree-based interpreters walk over the abstract syntax tree of the program to interpret it. Their advantage over text-based interpreters is that parsing and, if applicable, name/type analysis are done only once. Plausible example: a graphing program, which has to repeatedly evaluate a user-supplied function in order to plot it.

Virtual machines

Virtual machines behave like real machines (i.e. CPUs), but are implemented in software. They accept as input a program composed of a sequence of instructions. Virtual machines often provide more than the interpretation of programs: they also manage memory, threads, and sometimes I/O.

Virtual machines history

Perhaps surprisingly, virtual machines are a very old concept, dating back to about 1950. They have been, and still are, used in the implementation of many important languages, like Smalltalk, Lisp, Forth, Pascal, and more recently Java and C#.

Why virtual machines?

Since the compiler has to generate code for some machine, why prefer a virtual one over a real one?
- for simplicity: a VM is usually higher-level than a real machine, which simplifies the task of the compiler,
- for portability: compiled VM code can be run on many actual machines,
- to ease debugging and execution monitoring.

Virtual machines drawback

The only drawback of virtual machines compared to real machines is that the former are slower than the latter. This is due to the overhead associated with interpretation: fetching and decoding instructions, executing them, etc. Moreover, the high number of indirect jumps in interpreters causes pipeline stalls in modern processors.

Kinds of virtual machines

There are two kinds of virtual machines:
1. stack-based VMs, which use a stack to store intermediate results, variables, etc.,
2. register-based VMs, which use a limited set of registers for that purpose, like a real CPU.
There is some controversy as to which kind is better, but most VMs today are stack-based. For a compiler writer, it is usually easier to target a stack-based VM than a register-based one, as the complex task of register allocation can be avoided.

Virtual machines input

Virtual machines take as input a program expressed as a sequence of instructions. Each instruction is identified by its opcode (operation code), a simple number. Often, opcodes occupy one byte, hence the name byte code. Some instructions have additional arguments, which appear after the opcode in the instruction stream.

VM implementation

Virtual machines are implemented in much the same way as a real processor:
- the next instruction to execute is fetched from memory and decoded,
- the operands are fetched, the result computed, and the state updated,
- the process is repeated.

Implementing a VM in C

Many VMs today are written in C or C++, because these languages are at the right abstraction level for the task, fast and relatively portable. As we will see later, the GNU C compiler (gcc) has an extension that makes it possible to use labels as normal values. This extension can be used to write very efficient VMs, and for that reason several VMs are written specifically for gcc.

  typedef enum { add /* ... */ } instruction_t;

  void interpret() {
    static instruction_t program[] = { add /* ... */ };
    instruction_t* pc = program;
    int* sp = ...; /* stack pointer */
    for (;;) {
      switch (*pc++) {
      case add:
        sp[1] += sp[0];
        sp++;
        break;
      /* ... other instructions */
      }
    }
  }

Optimising VMs

The basic, switch-based implementation of a virtual machine just presented can be made faster using several techniques: threaded code, top-of-stack caching, super-instructions, and JIT compilation.

Threaded code

Threaded code vs. switch

In a switch-based interpreter, executing an instruction requires two jumps:
1. one indirect jump to the branch handling the current instruction,
2. one direct jump from there back to the main loop.
It would be better to avoid the second one, by jumping directly to the code handling the next instruction. This is called threaded code.

(Figure: for the program "add sub mul", the switch-based version returns to the main loop after every instruction, while the threaded version jumps directly from the code of add to that of sub, and from sub to mul.)

Implementing threaded code

To implement threaded code, there are two main techniques:
- with indirect threading, instructions index an array containing pointers to the code handling them,
- with direct threading, instructions are pointers to the code handling them.
Direct threading is the more efficient of the two, and the one most often used in practice. For these reasons, we will not look at indirect threading.

Threaded code in C

To implement threaded code, it must be possible to manipulate code pointers. How can this be achieved in C? In ANSI C, the only way to do this is to use function pointers. gcc also allows the manipulation of labels as values, which is much more efficient!

Direct threading in ANSI C

Implementing direct threading in ANSI C is easy, but unfortunately very inefficient! The idea is to define one function per VM instruction. The program can then simply be represented as an array of function pointers. Some code is inserted at the end of every function, to call the function handling the next VM instruction.

  typedef void (*instruction_t)();

  static instruction_t* pc;
  static int* sp = ...;

  static void add() {
    sp[1] += sp[0];
    ++sp;
    (*++pc)(); /* handle next instruction */
  }
  /* ... other instructions */

  static instruction_t program[] = { add, /* ... */ };

  void interpret() {
    sp = ...;
    pc = program;
    (*pc)(); /* handle first instruction */
  }

Direct threading in ANSI C

This implementation of direct threading in ANSI C has a major problem: it leads to stack overflow very quickly, unless the compiler implements an optimisation called tail call elimination (TCE). Briefly, the idea of tail call elimination is to replace a function call that appears as the last statement of a function by a simple jump to the called function. In our interpreter, the function call appearing at the end of add, and of all other functions implementing VM instructions, can be optimised that way. Unfortunately, few C compilers implement tail call elimination in all cases. However, gcc 4.0.1 is able to avoid stack overflows for the interpreter just presented.

Trampolines

It is possible to avoid stack overflows in a direct-threaded interpreter written in ANSI C, even if the compiler does not perform tail call elimination. The idea is that the functions implementing VM instructions simply return to the main function, which takes care of calling the function handling the next VM instruction. While this technique, known as a trampoline, avoids stack overflows, it leads to interpreters that are extremely slow. Its interest is mostly academic.

  typedef void (*instruction_t)();

  static int* sp = ...;
  static instruction_t* pc;

  static void add() {
    sp[1] += sp[0];
    ++sp;
    ++pc;
  }
  /* ... other instructions */

  static instruction_t program[] = { add, /* ... */ };

  void interpret() {
    sp = ...;
    pc = program;
    for (;;) (*pc)(); /* the trampoline */
  }

Direct threading with gcc

The GNU C compiler (gcc) offers an extension that is very useful to implement direct threading: labels can be treated as values, and a goto can jump to a computed label. With this extension, the program can be represented as an array of labels, and jumping to the next instruction is achieved by a goto to the label currently referred to by the program counter.

  void interpret() {
    void* program[] = { &&l_add, /* ... */ }; /* label as value */
    int* sp = ...;
    void** pc = program;
    goto **pc; /* jump to first instruction */

  l_add:
    sp[1] += sp[0];
    ++sp;
    goto **(++pc); /* computed goto: jump to next instruction */

    /* ... other instructions */
  }

Threading benchmark

To see how the different techniques perform, several versions of a small interpreter were written and measured while interpreting 100 000 000 iterations of a simple loop. All interpreters were compiled with gcc 4.0.1 with maximum optimisations, and run on a PowerPC G4. The normalised times below show that only direct threading using gcc's labels-as-values performs better than a switch-based interpreter:

  switch                   1.00
  ANSI C, without TCE      1.45
  ANSI C, with TCE         1.80
  gcc's labels-as-values   0.61

Top-of-stack caching

In a stack-based VM, the stack is typically represented as an array in memory. Since almost all instructions access the stack, it can be interesting to store some of its topmost elements in registers. However, keeping a fixed number of stack elements in registers is usually a bad idea: a pop followed by a push then forces values to move back and forth between the top-of-stack register and the stack array, even when they are not used.

(Figure: with exactly one element always cached in a register, popping t moves x from the array into the register, and the following push of u moves x back into the array; x moves around unnecessarily.)

Since caching a fixed number of stack elements in registers seems like a bad idea, it is natural to try to cache a variable number of them. For example, when caching at most one stack element in a register, popping t simply empties the register, and pushing u simply refills it: no more unnecessary movement!

Caching a variable number of stack elements in registers complicates the implementation of instructions, however. There must be one implementation of each VM instruction per cache state, defined as the number of stack elements currently cached in registers. For example, when caching at most one stack element, the add instruction needs the following two implementations:

  /* State 0: no elements in register */
  add_0: tos = sp[0] + sp[1]; sp += 2; /* go to state 1 */

  /* State 1: top of stack in register */
  add_1: tos += sp[0]; sp += 1; /* stay in state 1 */

Benchmark

To see how top-of-stack caching performs, two versions of a small interpreter were written and measured while interpreting a program summing the first 100 000 000 integers. Both interpreters were compiled with gcc 4.0.1 with maximum optimisations, and run on a PowerPC G4. The normalised times below show that top-of-stack caching brought a 13% improvement to the interpreter:

  no caching                  1.00
  caching of topmost element  0.87

Static super-instructions

Since instruction dispatch is expensive in a VM, one way to reduce its cost is simply to dispatch less! This can be done by grouping several instructions that often appear in sequence into a super-instruction. For example, if the mul instruction is often followed by the add instruction, the two can be combined in a single madd (multiply and add) super-instruction. Profiling is typically used to determine which sequences should be transformed into super-instructions, and the instruction set of the VM is then modified accordingly.

Dynamic super-instructions

It is also possible to generate super-instructions at run time, to adapt them to the program being run. This is the idea behind dynamic super-instructions. The technique can be pushed to its limits by generating one super-instruction for every basic block of the program, effectively turning every basic block into a single (super-)instruction!

Just-in-time compilation

Virtual machines can be sped up through the use of just-in-time (JIT), or dynamic, compilation. The basic idea is relatively simple: instead of interpreting a piece of code, first compile it to native code at run time, and then execute the compiled code. In practice, care must be taken to ensure that the cost of compilation followed by execution of the compiled code is not greater than the cost of interpretation!

JIT: how to compile?

JIT compilers have one constraint that off-line compilers do not have: they must be fast, fast enough to make sure that the time lost compiling the code is regained during its execution. For that reason, JIT compilers usually do not use costly optimisation techniques, at least not for the whole program.

JIT: what to compile?

Some code is executed only once over the whole run of a program. It is usually faster to interpret that code than to go through JIT compilation. Therefore, it is better to start by interpreting all code, and to monitor execution to see which parts of the code are executed often, the so-called hot spots. Once the hot spots are identified, they can be compiled to native code, while the rest of the code continues to be interpreted.

Virtual machine generators

Several tools have been written to automate the creation of virtual machines from a high-level description. vmgen is such a tool, which we will briefly examine. Based on a single description of the VM, vmgen can produce:
- an efficient interpreter, with optional tracing,
- a disassembler,
- a profiler.
The generated interpreters include all the optimisations we have seen (threaded code, super-instructions, top-of-stack caching) and more.

Example of instruction description:

  sub ( i1 i2 -- i )   <- name and stack effect
  i = i1 - i2;         <- body: pure C code

Real-world example: the Java Virtual Machine

Along with Microsoft's Common Language Runtime (CLR), the Java Virtual Machine (JVM) is certainly the best known and most used virtual machine. Its main characteristics are:
- it is stack-based,
- it includes most of the high-level concepts of Java 1.0: classes, interfaces, methods, exceptions, monitors, etc.,
- it was designed to enable verification of the code before execution.
Notice that the JVM has remained the same since Java 1.0: all recent improvements to the Java language were implemented by changing the compiler.

The JVM model

The JVM is composed of:
- a stack, storing intermediate values,
- a set of local variables, private to the method being executed, which include the method's arguments,
- a heap, from which objects are allocated; deallocation is performed automatically by the garbage collector.
It accepts class files as input, each of which contains the definition of a single class or interface. These class files are loaded on demand as execution proceeds, starting with the class file containing the main method of the program.

The language of the JVM

The JVM has 201 instructions to perform various tasks like loading values on the stack, computing arithmetic expressions, jumping to different locations, etc. One interesting feature of the JVM is that all instructions are typed. This feature is used to support verification. Example instructions:
- iadd: add the two integers on top of the stack, and push back the result,
- invokevirtual: invoke a method, using the values on top of the stack as arguments, and push back the result,
- etc.

The factorial on the JVM

  static int fact(int x) {
    return x == 0 ? 1 : x * fact(x - 1);
  }

Byte code, with the stack contents after each instruction:

   0: iload_0            [int]
   1: ifne 8             []
   4: iconst_1           [int]
   5: goto 16            [int]
   8: iload_0            [int]
   9: iload_0            [int,int]
  10: iconst_1           [int,int,int]
  11: isub               [int,int]
  12: invokestatic fact  [int,int]
  15: imul               [int]
  16: ireturn            []

Byte code verification

A novel feature of the JVM is that it verifies programs before executing them, to make sure that they satisfy some safety requirements. To enable this, all instructions are typed and several restrictions are put on programs, for example:
- it must be possible to compute statically the type of all data on the stack at any point in a method,
- jumps must target statically known locations: indirect jumps are forbidden.

Sun's HotSpot JVM

HotSpot is Sun's implementation of the JVM. It is a quite sophisticated VM, featuring:
- an interpreter including all the optimisations we have seen,
- the automatic detection of hot spots in the code, which are then JIT compiled,
- two separate JIT compilers:
  1. a client compiler, fast but non-optimising,
  2. a server compiler, slower but optimising (based on SSA).

Summary

Interpreters enable the execution of a program without having to compile it to native code, thereby simplifying the implementation of programming languages. Virtual machines are the most common kind of interpreter, and are a good compromise between ease of implementation and speed. Several techniques exist to make VMs fast: threaded code, top-of-stack caching, super-instructions, JIT compilation, etc.