Write your own JVM compiler


Write your own JVM compiler
OSCON Java 2011
Ian Dees
@undees
Hi, I'm Ian. I'm here to show that there's never been a better time than now to write your own compiler.

the three paths
[Figure: compilers placed on a chart of Abstraction versus Time]
Why would someone want to do this? That depends on the direction you're coming from.

[Figure: the Abstraction versus Time chart, with a path from hardware and assembly (xor eax, eax) to compilers]
If, like me, you came from a hardware background, compilers are the next logical step up the ladder of abstraction from programming.

[Figure: the same chart, adding a path through logic, λx.x, and grammars to compilers]
If you come from a computer science background, compilers and parsers are practical tools to get things done (such as reading crufty ad hoc data formats at work).

[Figure: the same Abstraction versus Time chart]
If you're following the self-made path, compilers are one of many computer science topics you may come to naturally on your travels.

where we're headed
By the end of this talk, we'll have seen the basics of how to use JRuby to create a compiler for a fictional programming language.

Thnad: a fictional programming language
Since all the good letters for programming languages are taken (C, D, K, R, etc.), let's use a fictional letter. Thnad, a letter that comes to us from Dr. Seuss, will do nicely.

    function factorial(n) {
      if(eq(n, 1)) {
        1
      } else {
        times(n, factorial(minus(n, 1)))
      }
    }

    print(factorial(4))

Here's a simple Thnad program. As you can see, it has only integers, functions, and conditionals. No variables, type definitions, or anything else. But this bare minimum will keep us plenty busy for the next half hour.
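
For readers more at home in Ruby than in Thnad, here is a rough plain-Ruby rendering of what that program computes. The helper methods eq, times, and minus mirror the names the Thnad program calls; defining them as ordinary Ruby methods is an assumption made purely for illustration.

```ruby
# Plain-Ruby stand-ins for the functions the Thnad program calls.
# These definitions are assumptions for illustration, not part of Thnad.
def eq(a, b)
  a == b
end

def times(a, b)
  a * b
end

def minus(a, b)
  a - b
end

# Same shape as the Thnad function: one conditional, one recursive call.
def factorial(n)
  if eq(n, 1)
    1
  else
    times(n, factorial(minus(n, 1)))
  end
end

puts factorial(4)
```

Running it prints 24, which is the value the Thnad program's print call is meant to show.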

hand-waving before we see the code
Before we jump into the code, let's look at the general view of where we're headed.

We need to take a piece of program text like minus(n, 1) and break it down into something like a sentence's parts of speech: this is a function call, this is a parameter, and so on. The code in the middle is how we might represent this breakdown in Ruby, but really we should be thinking about it graphically, like the tree at the bottom.

    minus(n, 1)

    { :funcall =>
      { :name => 'minus',
        :args => [ { :arg => { :name   => 'n' } },
                   { :arg => { :number => '1' } } ] } }

    :funcall
    ├── :name
    │   └── minus
    └── :args
        ├── :arg
        │   └── :name
        │       └── n
        └── :arg
            └── :number
                └── 1
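
To make the "think of it as a tree" point concrete, here is a small sketch that walks the nested hash for minus(n, 1) and lists one indented line per node. The outline method is a hypothetical helper written for this illustration; it is not part of the Thnad project or of Parslet.

```ruby
# The generic parse result for minus(n, 1), as nested hashes and arrays.
TREE = {
  :funcall => {
    :name => 'minus',
    :args => [
      { :arg => { :name => 'n' } },
      { :arg => { :number => '1' } }
    ]
  }
}

# Walk the structure depth-first, returning one indented line per node.
# (outline is a hypothetical helper, not part of the Thnad project.)
def outline(node, depth = 0)
  indent = '  ' * depth
  case node
  when Hash
    node.flat_map { |key, child| ["#{indent}:#{key}"] + outline(child, depth + 1) }
  when Array
    node.flat_map { |child| outline(child, depth) }
  else
    ["#{indent}#{node}"]
  end
end

puts outline(TREE)
```

Each level of indentation in the output corresponds to one level of the tree, with hash keys as interior nodes and strings as leaves.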

The representation on the previous slide used plain arrays, strings, and so on. It will be helpful to transform these into custom Ruby objects that know how to emit JVM bytecode. Here's the kind of thing we'll be aiming for. Notice that the tree looks really similar to the one we just saw; the only differences are the names in the nodes.

    minus(n, 1)

    Funcall.new 'minus', Usage.new('n'), Number.new(1)

    Funcall
    ├── @name
    │   └── minus
    └── @args
        ├── Usage
        │   └── @name
        │       └── n
        └── Number
            └── @value
                └── 1
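
As a rough sketch of what such custom objects could look like, the classes below are named after the nodes in the diagram and carry the fields it shows (@name, @args, @value). Building them with Struct, and collecting the arguments in an array, are assumptions made for illustration; the project's real classes may be organized differently.

```ruby
# Node classes named after the diagram. Using Struct with these exact
# fields is an assumption for illustration; the real classes may differ.
Funcall = Struct.new(:name, :args)
Usage   = Struct.new(:name)
Number  = Struct.new(:value)

# minus(n, 1) as a tree of custom objects; the arguments are gathered
# into an array here (an assumption about how @args is stored).
call = Funcall.new('minus', [Usage.new('n'), Number.new(1)])

puts call.name                           # the function being called
call.args.each { |arg| puts arg.class }  # the node type of each argument
```

The shape matches the generic tree: one call node, a name, and a list of argument nodes, each of which could later learn to emit its own bytecode.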

1. parse
2. transform
3. emit
Each stage of our compiler will transform one of our program representations (original program text, generic Ruby objects, custom Ruby objects, JVM bytecode) into the next.

1. parse
First, let's look at parsing the original program text into generic Ruby objects.

    describe Thnad::Parser do
      before do
        @parser = Thnad::Parser.new
      end

      it 'reads a number' do
        expected = { :number => '42' }
        @parser.number.parse('42').must_equal expected
      end
    end

Here's a unit test written with minitest, a unit testing framework that ships with Ruby 1.9 (and therefore JRuby). We want the text 42 to be parsed into the Ruby hash shown here.

tool #1: Parslet
http://kschiess.github.com/parslet
How are we going to parse our programming language into that internal representation? By using Parslet, a parsing tool written in Ruby. For the curious, Parslet is a PEG (Parsing Expression Grammar) parser.

    require 'parslet'

    module Thnad
      class Parser < Parslet::Parser
        # gotta see it all
        rule(:space)  { match('\s').repeat(1) }
        rule(:space?) { space.maybe }

        # numeric values
        rule(:number) { match('[0-9]').repeat(1).as(:number) >> space? }
      end
    end

Here's a set of Parslet rules that will get our unit test to pass. Notice that the rules read a little bit like a mathematical notation: a number is a sequence of one or more digits, possibly followed by trailing whitespace.
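
To see what the :number rule is doing without Parslet in the picture, here is a hand-rolled sketch using StringScanner from Ruby's standard library. It is not how Parslet works internally; parse_number is a hypothetical helper that only mimics the rule's behavior on this one kind of input.

```ruby
require 'strscan'

# Hand-rolled equivalent of the :number rule, for illustration only.
# Returns the same shape of hash the unit test expects, or nil on no match.
def parse_number(text)
  scanner = StringScanner.new(text)
  digits = scanner.scan(/[0-9]+/)   # one or more digits, like match('[0-9]').repeat(1)
  return nil unless digits
  scanner.skip(/\s+/)               # optional trailing whitespace, like space?
  return nil unless scanner.eos?    # this sketch requires the whole input to be consumed
  { :number => digits }
end

p parse_number('42')
p parse_number('42  ')
p parse_number('forty-two')
```

The first two calls produce the same hash, since the trailing whitespace is consumed but not captured; the third returns nil because the input does not start with a digit.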

2. transform
The next step is to transform the Ruby arrays and hashes into more sophisticated objects that will (eventually) be able to generate bytecode.

    input    = { :number => '42' }
    expected = Thnad::Number.new(42)

    @transform.apply(input).must_equal expected

Here's a unit test for that behavior. We start with the result of the previous step (a Ruby hash) and expect a custom Number object.

tool #2: Parslet (again)
How are we going to transform the data? We'll use Parslet again.

    module Thnad
      class Transform < Parslet::Transform
        rule(:number => simple(:value)) { Number.new(value.to_i) }
      end
    end

This rule takes any simple Ruby hash with a :number key and transforms it into a Number object.

    require 'parslet'

    module Thnad
      class Number < Struct.new(:value)
        # more to come...
      end
    end

Of course, we'll need to define the Number class. Once we've done so, the unit tests will pass.
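
One reason a Struct-based Number suits these tests is that Struct compares instances by their field values, so a freshly built Number can be checked against an expected one. The sketch below shows that, along with a hand-written stand-in for the single transform rule; the transform method is hypothetical and is not Parslet's implementation.

```ruby
# Struct gives Number value-based equality, which is what lets a test
# compare a freshly built Number against the expected one.
Number = Struct.new(:value)

# A hand-written stand-in for the single Transform rule (not Parslet):
# a hash with exactly one :number key becomes a Number; anything else
# is returned unchanged.
def transform(node)
  if node.is_a?(Hash) && node.keys == [:number]
    Number.new(node[:number].to_i)
  else
    node
  end
end

result = transform({ :number => '42' })
puts result == Number.new(42)       # compares field values
puts result.equal?(Number.new(42))  # compares object identity
```

The first comparison is true and the second is false: the two objects are distinct, but they hold the same value, and value equality is all the unit test needs.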

3. emit
Finally, we need to emit JVM bytecode.

bytecode

    ldc 42

Here's the bytecode we want to generate when our compiler encounters a Number object with 42 as the value. It just pushes the value right onto the stack.

    describe 'Emit' do
      before do
        @builder = mock
        @context = Hash.new
      end

      it 'emits a number' do
        input = Thnad::Number.new 42
        @builder.expects(:ldc).with(42)
        input.eval @context, @builder
      end
    end

And here's the unit test that specifies this behavior. We're using mock objects to say, in effect, "Imagine a Ruby object that can generate bytecode; our compiler should call it like this."

    require 'parslet'

    module Thnad
      class Number < Struct.new(:value)
        def eval(context, builder)
          builder.ldc value
        end
      end
    end

Our Number class will now need an eval method that takes a context (useful for carrying around information such as parameter names) and a bytecode builder (which we'll get to on the next slide).
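
To watch eval at work without a mocking library or a real bytecode library, one option is a tiny builder that simply records what it is asked to emit. RecordingBuilder below is a hypothetical stand-in invented for this sketch; it only understands ldc.

```ruby
# Number with the eval method from the slide.
Number = Struct.new(:value) do
  def eval(context, builder)
    builder.ldc value
  end
end

# A hypothetical stand-in for the bytecode builder: it just records
# each instruction it is asked to emit.
class RecordingBuilder
  attr_reader :instructions

  def initialize
    @instructions = []
  end

  def ldc(value)
    @instructions << [:ldc, value]
  end
end

builder = RecordingBuilder.new
Number.new(42).eval({}, builder)
p builder.instructions
```

The recorded list contains a single ldc instruction carrying 42, which is the same call the mock in the unit test expects.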

tool #3: BiteScript
https://github.com/headius/bitescript
A moment ago, we saw that we'll need a Ruby object that knows how to emit JVM bytecode. The BiteScript library for JRuby will give us just such an object.

    iload 0
    ldc 1
    invokestatic example, 'minus', int, int, int
    ireturn

This, for example, is a chunk of BiteScript that writes a Java .class file containing a function call (in this case, the equivalent of the minus(n, 1) Thnad code we saw earlier).

    $ javap -c example
    Compiled from "example.bs"
    public class example extends java.lang.Object{
    public static int dosomething(int);
      Code:
       0: iload_0
       1: iconst_1
       2: invokestatic #11; //Method minus:(II)I
       5: ireturn

If you run that through the BiteScript compiler and then use the standard Java tools to view the resulting bytecode, you can see it's essentially the same program.
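
To get a feel for how those four instructions cooperate, here is a toy stack machine in plain Ruby. It models only the instructions in the listing and is in no way how the JVM is implemented; run and FUNCTIONS are hypothetical names made up for this sketch. (The compiled listing shows iconst_1 where the source says ldc 1; both push the integer constant 1.)

```ruby
# A toy stack machine, for illustration only. It models just the four
# instructions in the listing and is not how the JVM is implemented.
FUNCTIONS = {
  :minus => ->(a, b) { a - b }   # stand-in for the static method minus(int, int)
}

def run(instructions, locals)
  stack = []
  instructions.each do |op, arg|
    case op
    when :iload
      stack.push(locals[arg])                   # push a local variable (the parameter)
    when :ldc
      stack.push(arg)                           # push a constant
    when :invokestatic
      b = stack.pop                             # arguments come off in reverse order
      a = stack.pop
      stack.push(FUNCTIONS.fetch(arg).call(a, b))  # push the call's result
    when :ireturn
      return stack.pop                          # hand back the top of the stack
    end
  end
end

program = [[:iload, 0], [:ldc, 1], [:invokestatic, :minus], [:ireturn]]
puts run(program, [5])
```

With 5 as the only local variable, the machine pushes 5, pushes 1, replaces both with the result of minus, and returns 4.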

enough hand-waving: let's see the code!
Armed with this background information, we can now jump into the code.

a copy of our home game
https://github.com/undees/thnad
At this point in the talk, I fired up TextMate and navigated through the project, choosing the direction based on whim and audience questions. You can follow along at home by looking at the various commits made to this project on GitHub.

more on BiteScript
http://www.slideshare.net/charlesnutter/redev-2010-jvm-bytecode-for-dummies
There are two resources I found helpful during this project. The first was Charles Nutter's primer on Java bytecode, which was the only JVM reference material necessary to write this compiler.

see also
http://createyourproglang.com
Marc-André Cournoyer
The second helpful resource was Marc-André Cournoyer's e-book How to Create Your Own Freaking Awesome Programming Language. It's geared more toward interpreters than compilers, but was a great way to check my progress after I'd completed the major stages of this project.

hopes
My hope for us as a group today is that, after our time together, you feel it might be worth giving one of these tools a shot, either for a fun side project or to solve some protocol parsing need you may encounter one day in your work.