Java Under The Hood and Future

Similar documents
language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

CS193j, Stanford Handout #4. Java 2

02 B The Java Virtual Machine

Jonathan Worthington Scarborough Linux User Group

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

1. Overview of the Java Language

General Introduction

picojava TM : A Hardware Implementation of the Java Virtual Machine

C# and Other Languages

How Java Software Solutions Outperform Hardware Accelerators

CSCI E 98: Managed Environments for the Execution of Programs

A Comparison of Programming Languages for Graphical User Interface Programming

Characteristics of Java (Optional) Y. Daniel Liang Supplement for Introduction to Java Programming

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

A deeper look at Inline functions

Example of Standard API

Chapter 3: Operating-System Structures. System Components Operating System Services System Calls System Programs System Structure Virtual Machines

Eclipse installation, configuration and operation

Chapter 3.2 C++, Java, and Scripting Languages. The major programming languages used in game development.

Stack Allocation. Run-Time Data Structures. Static Structures

Chapter 3: Operating-System Structures. Common System Components

The Java Virtual Machine and Mobile Devices. John Buford, Ph.D. Oct 2003 Presented to Gordon College CS 311

1/20/2016 INTRODUCTION

Effective Java Programming. efficient software development

Chapter 1. Dr. Chris Irwin Davis Phone: (972) Office: ECSS CS-4337 Organization of Programming Languages

Java Interview Questions and Answers

Crash Course in Java

Use of profilers for studying Java dynamic optimizations

Python, C++ and SWIG

Java Programming. Binnur Kurt Istanbul Technical University Computer Engineering Department. Java Programming. Version 0.0.

Large-Scale Web Applications

Objectives. Chapter 2: Operating-System Structures. Operating System Services (Cont.) Operating System Services. Operating System Services (Cont.

Cache Database: Introduction to a New Generation Database

Cloud Computing. Up until now

farmerswife Contents Hourline Display Lists 1.1 Server Application 1.2 Client Application farmerswife.com

CSE 373: Data Structure & Algorithms Lecture 25: Programming Languages. Nicki Dell Spring 2014

Lesson 06: Basics of Software Development (W02D2

Habanero Extreme Scale Software Research Project

Chapter 7D The Java Virtual Machine

Designing with Exceptions. CSE219, Computer Science III Stony Brook University

CSC 551: Web Programming. Spring 2004

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

CS193j, Stanford Handout #10 OOP 3

Choosing a Computer for Running SLX, P3D, and P5

Operating System Components

Lecture 1 Introduction to Android

Software Engineering Techniques

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

Multi-core Programming System Overview

Section 1.4. Java s Magic: Bytecode, Java Virtual Machine, JIT,

Chapter 3 Operating-System Structures

Topics. Introduction. Java History CS 146. Introduction to Programming and Algorithms Module 1. Module Objectives

University of Twente. A simulation of the Java Virtual Machine using graph grammars

14/05/2013 Ed Merks EDL V1.0 1

Introduction to Virtual Machines

Evaluation of Open Source Data Cleaning Tools: Open Refine and Data Wrangler

Can You Trust Your JVM Diagnostic Tools?

Programming Languages

The Java Series. Java Essentials I What is Java? Basic Language Constructs. Java Essentials I. What is Java?. Basic Language Constructs Slide 1

Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert

Mobile Application Languages XML, Java, J2ME and JavaCard Lesson 04 Java

Hardware/Software Co-Design of a Java Virtual Machine

qwertyuiopasdfghjklzxcvbnmqwerty uiopasdfghjklzxcvbnmqwertyuiopasd fghjklzxcvbnmqwertyuiopasdfghjklzx cvbnmqwertyuiopasdfghjklzxcvbnmq

9/11/15. What is Programming? CSCI 209: Software Development. Discussion: What Is Good Software? Characteristics of Good Software?

Inside the Java Virtual Machine

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

Eclipse Visualization and Performance Monitoring

Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages

Java (Micro) Performance Measuring

How do Users and Processes interact with the Operating System? Services for Processes. OS Structure with Services. Services for the OS Itself

recursion, O(n), linked lists 6/14

CSE 130 Programming Language Principles & Paradigms

Parrot in a Nutshell. Dan Sugalski dan@sidhe.org. Parrot in a nutshell 1

Keil C51 Cross Compiler

Java Garbage Collection Basics

DevOps with Containers. for Microservices

Mobile Application Development Android

The Design of the Inferno Virtual Machine. Introduction

MAGENTO HOSTING Progressive Server Performance Improvements

CS420: Operating Systems OS Services & System Calls

4D as a Web Application Platform

Validating Java for Safety-Critical Applications

Installing Java. Table of contents

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Performance Measurement of Dynamically Compiled Java Executions

Software: Systems and Application Software

Chapter 13: Program Development and Programming Languages

System Architecture V3.2. Last Update: August 2015

Spark: Cluster Computing with Working Sets

What is a programming language?

Test Automation Architectures: Planning for Test Automation

CS3600 SYSTEMS AND NETWORKS

Lab Experience 17. Programming Language Translation

Evolution of the Major Programming Languages

Building Applications Using Micro Focus COBOL

GTask Developing asynchronous applications for multi-core efficiency

Test Run Analysis Interpretation (AI) Made Easy with OpenLoad

Get an Easy Performance Boost Even with Unthreaded Apps. with Intel Parallel Studio XE for Windows*

Transcription:

CS108, Stanford Handout #38 Winter, 200607 Nick Parlante Java Under The Hood and Future Java 7 Ideas Some sort of closure system syntactic show way to make little bits of code to pass around SwingUtilities.invokeLater( Runnable() {..code.. ); Maybe get rid of "erasure" generic system generic types work more fully (at the cost of compatibility with old JVMs, while old source code will compile fine). Java Compiler Structure Compile classes in.java files produce bytecode in.class files Bytecode A compiled class is stored in a.class files or a.jar file Represent a computation in a portable way as PDF is to an image. Java Virtual Machine The Java Virtual Machine (JVM) is an abstract "stack machine" the bytecode is written to run on the JVM, manipulating values by pushing and popping them on a virtual stack (examples below). The JVM is also the name of the (written in C/C++) program that loads and runs the bytecode. The JVM interprets the bytecode to "run" the program. The JVM runs the code with the various robustness/safety checks in place robustness vs. performance tradeoff. Java aims to have portability to different hardware and safety. Portability and Competition Some sort of JVM/bytecode structure seems like a good architecture moving forward to achieve portability Microsoft is using it with.net, but.net is targeted only at Windows while Java targets all OS's. Java is an especially safe domain for code, now that it is open source. Portability can mean the ability to use a different CPU in the future. Think about x86 architecture carrying lots of baggage to support code compiled for an old chip. Intel thinks that having software portable across OS's is great, so long as it only runs on Intel chips Microsoft thinks that portable software is great if that means it runs on different chips, just so long as it only works with Windows. Customers sincerely love portability the opportunity to switch among software and hardware vendors creates competition > avoiding lockin, yielding better prices and features. There is a linkage between portability and competition. That's why consumers benefit and vendors are, at times, wary. There is a theory that the Internet has been such a huge success because standards, like HTTP and HTML, have led to rapid competition, yielding great price/performance. Verifier The JVM also has a "verifier" that checks that the bytecode is wellformed (e.g. an int is never directly used as a pointer, a variable is never used without being initialized). This is a step in making Java robust/virusproof. A bad guy can try to feed a "bad" program to the JVM, but the verifier will catch in places where the program tries to get around the runtime type, array, etc. checks.

2 Usually, you don't see verifier errors, since the compiler will only generate "correct" bytecode, but the verifier is still needed, in case a bad guy hand crafts some bad bytecode. In Java 6, a new verifier architecture is being put in which does more of the verification at compile time making the load/run process a little quicker. ByteCode Example You can use the "javap" command to print out the bytecode for a class. Normally it prints a summary. The c switch causes it to print the actual bytecode. The "javap" command looks for.class files in the current directory, just like the "java" command. Bytecode Strategy The bytecode is just a description of the computation moving values around, computing things. JIT/HotSpot can translate the bytecode into real register machine code for the particular architecture. Point: the bytecode describes the computation, but it does not actually need to run exactly that way. The bytecode can be translated into anything, so long as it produces exactly the same output. The Bytecode format is deliberately structured to enable the verifier to check the necessary aspects of the code at loadtime. Bytecode Primer The byte code executes against a stack machine adding 1 + 2 like this iload 1; // push a 1 onto the stack iload 2; // push a 2 onto the stack add; // add the two numbers on the stack // leaving the answer on the stack "load" means push a value onto the stack aload_0 = push address of slot 0 slot 0 is the "this" pointer, later slots are the parameters iload_1 = push an int from slot 1 (a parameter) getfield using the pointer on the stack, load an ivar putfield as above, but store to the ivar Why a Stack? Why does bytecode use a stack to represent the computation? It this way, the bytecode description does not depend on a particular number of registers It describes the computation in a clean way the stack bytecode describes the computation, but it does not need to be implemented that way. The JIT/Hotspot layer (below) can translate from the stack storage paradigm to use the actual number of registers on a particular machine. At runtime, the JVM does not need to actually use exactly this stack to run the code the bytecode stack is just a description. Student Bytecode The Student class has a "units" instance variable, and getunits(), setunits(), getstress() methods nick% javap c Student Compiled from Student.java public class Student extends java.lang.object { protected int units; public static final int MAX_UNITS; public static final int DEFAULT_UNITS; public Student(int); public Student(); public int getunits(); public void setunits(int); public int getstress(); public boolean dropclass(int); public static void main(java.lang.string[]);

3 <snip> Method int getunits() 0 aload_0 1 getfield #20 <Field int units> 4 ireturn Method void setunits(int) 0 iload_1 1 iflt 10 // if negative, go to step 10 4 iload_1 // if <= 20, goto 20 5 bipush 20 7 if_icmple 11 10 return 11 aload_0 12 iload_1 13 putfield #20 <Field int units> 16 return Method int getstress() 0 aload_0 1 getfield #20 <Field int units> 4 bipush 10 6 imul 7 ireturn JITs and Hotspot Just In Time compiler modern JVMs compile the bytecode to "native" code for the actual CPU at hand at runtime (with the robustness checks still in). This is one reason why java programs have slow startup times and use more memory than you might guess. This happens to your code under the hood, without your code knowing. The native version just gets "swapped in" as your code runs. Did you ever notice your code seeming so speed up dramatically after a few seconds? The "Hotspot" project (now open source) within the Sun JVM tries to do a sophisticated job of which parts of the program to compile and when don't just do the whole thing at once. In some cases, hotspot can do a better job of optimization than a C++ compiler, since Hotpsot is playing with the code at runtime and so has more information. IMHO, the most interesting optimizations now happen at runtime with a Hotspot type system vs. the old view that optimization is done by the compiler. What I love about Hotspot is that every year it gets 15% smarter. Code I wrote years ago automatically runs a little faster each time. Bytecode Future Maybe cache the compiled version, to speed class loading Think of bytecode as a distribution format, while at runtime something more native is happening. The native code is created from the bytecode when the app is installed, or perhaps created when the program is first run and then cached. This creates great potential freedom for the code to work on unheard of hardware and operating systems in the future, making bytecode a very safe format if you just want the computation to work in the future. Security implication of keeping the compiled native version around needs to be stored in a secure way, since it gets past the verifier. This is true of the JVM binary too. Java Performance Trick: Locals Faster Than ivars Local (stack) variables are faster than member/instance variables of any object (the receiver or some other object). Locals are also easier for the optimizer to work with for a variety of optimizations.

4 Access to an ivar in an object, eg foo.width or this.width, is typically slower than access to a local stack variable. Inside loops, pull needed values into local variables (int i, temp;). Suppose we are in a for loop: for (int i=0; i<piece.getwidth(); i++) { 1. Slow message send...i < piece.getwidth() 2. Medium instance variable with a good JIT, this case and (1) above are essentially the same....i < piece.width or...i < this.width (suppose the code is executing against the receiver) 3. Fast pull the state into a local (stack) variable, and then use it. This makes it easier for the JIT to pull the value into a native register. If the value is in an ivar, the runtime may need to retrieve it from memory every time it is used. It's hard for the runtime to deduce that.width is not being changed, so it reloads it from memory every iteration. In contrast, it's easy for it to deduce that localwidth is not being changed in the loop, so it can just put it in a register and use that value the whole time. int localwidth = piece.getwidth();... i < localwidth... or // Use "final" to make it even more clear for the JIT // (Supposedly modern JVMs are smart enough to figure // this out without needing the "final") final int localwidth = piece.getwidth(); The code does not actually need to read and write values to some "localwidth" memory location the JIT can rewrite the code to use a register to hold that value, and it's fine so long it updates all the code to use the register. That's easy with local variables, since the code that touches the variable is well constrained. Theme: the JIT can do aggressive rewrites of the code. Theme: good to avoid creating memory traffic Hotspot: Inlining Methods/Classes JVM optimizers, and Hotspot in particular, make aggressive use of inlining pasting called code into the caller code. Inlining enables many other optimizations. The "final" keyword for a class means it will not be subclassed, and a "final" method will not be overridden. These assumptions can help HotSpot figure out messagemethod mappings before they happen. As Hotspot gets smarter, it can figure things out even without "final". Not Inlined A() { > B() { > C() {

5 Inlined A() { Inline Advantages Data Flow Values in A() are passed to parameters in B(), passed to C(), where they are used. Now, the flow of that value through the whole A/B/C sequence can be analyzed the value can just live in one variable/register for the whole computation This saves on memory traffic, which is just what we need Propagation of analysis Suppose A() is running a for(i=0; i<array.len; i++) loop, and calls B() and C(), passing in the i value. Down in the C() code, a statement like array[i] would need to be checked for i<0 and i>=len normally. But now Hotspot can see the value of i from start to finish, show that it's always in range, and so remove the cost of the arraybounds check. Similar optimizations work for example: checking if pointers are null, checking instanceof on a pointer. Java Bytecode Ecosystem Java Ecosystem JVM/bytecode is a high quality, stable runtime environment especially now that Java is open source Other languages can be compiled to bytecode run in the JVM e.g. Jython project compiles Python to bytecode, JRuby runs ruby in the JVM BeanShell Example BeanShell is a very simple scripting language http://www.beanshell.org/ BeanShell code is compiled on the fly to bytecode which interacts with full Java class and data structures Beanshell code understands Java objects, methods etc., integrating nicely with all that infrastructure Can have a Beanshell text area, the end user types some code in and hits run... it compiles to bytecode and runs on the fly within the JVM. (e.g. the text area at the lower left below)

6 In this example the "dots" ArrayList has been exported to the Beanshell context at the lower left. The user can write Beanshell code there, hit the Run button, and the code is compiled and run against the JVM and all its objects. The Future of Programming Languages 1. Java Structured, "high road" "Programmer efficient" GC for memory Great libs of offtheshelf code Robust type system, antivirus, antibug safety, checking of things at runtime (makes it a bit slower) Typed has a real type system to structure the code (makes it a bit verbose) Most attractive for large and team projects Attractive for APIs to be used by others (i.e. Eclipse auto complete) Speed is very good in some ways, but weak in some ways and startup time is not great. HotSpot paradigm of speedup via JVM optimization looks very good going forward. Memory use is high Now that Java is open source, Java is an extremely safe, open infrastructure technology if you just want things just to work. An excellent default choice unless there is a mismatch between the problem and java Open source projects tend to attract good community iteration and improvement (e.g. Hotspot research) we'll see how that plays out for Java. C# is like Java but locked in to Microsoft. Use it like Visual Basic to implement things that only need to work on Microsoft OS. Java Niche / Future Large, complex project play to Java's strengths in safety, structured type system Use very often for the serverside of Web applications Benefits from programmer efficiency (as Moore's law runs along, programmer efficiency looks better and better. Hardware gets cheaper, programmer time to create correct software does not!)

7 Java Weaknesses Typing can be annoying Runs a little slower, users more memory Portable, but hard to access OSspecific features GUI can look ok, but hard to make it look really great 2. C/C++ "old road" / "fast road" Can run fast, use the minimum memory Not portable in the way that Java is, but can access system specific features The standard lib is tiny, but on the other hand, there are totally free, open source versions IMHO C/C++ is best for things that are not that complicated, and possibly where performance really matters. e.g. disk driver, JPEG decompression having a nice, lowlevel C version to build on is great. I wonder if the optimal system of the future will have some lowlevel parts in C, but with the upper, "system" part with a lot of custom coding in Java. From the Java side, you may not be aware what "native" C parts there are under the hood. As a system, gets large, complex, and has a lot of people... it's tempting to build it in a safe system such as Java Legacy many things nowadays are in C/C++ because they were started 10 years ago and still work today (e.g. Acrobat, MS Word), or things where performance really matters. If Microsoft could snap its fingers and have MS Word in Python or C#, they would do it in a second. It's in C++ now because it was that way 10 years ago. 3. Python / Ruby / Javascript Scripting, "low road" Short little programs many projects have some part where this is a good fit (e.g. unit tests) Lacks type system in the code just declare variables/parameters and go very neat in its way Makes the code short/dense, but hard to scale up to large projects A language without declared types is great for 2 page programs. IMHO, typing is very useful for a 100,000 line project worked on by many people. Contrast to Java large/team project Neat projects: "bean shell" and "Groovy" Integrate scripting languages closely with Java run in JVM, access same classes objects scripting languages integrated to work closely within Java use it for parts unit tests, allowing enduser scripting.