Performance Potential of Optimization Phase Selection During Dynamic JIT Compilation


1/20 Performance Potential of Optimization Phase Selection During Dynamic JIT Compilation
Michael R. Jantz and Prasad A. Kulkarni
Electrical Engineering and Computer Science, University of Kansas
{mjantz,kulkarni}@ittc.ku.edu
March 17, 2013

2/20 Introduction: Phase Selection in Dynamic Compilers
Phase selection: customizing the set of optimizations applied to each method / program to generate the best-quality code
Static phase-selection solutions do not necessarily apply to JITs
Existing heuristics improve startup performance
Question: is it possible for phase customization to improve peak throughput (i.e., steady-state performance)?

3/20 Introduction: Our Goals
Understand optimization behavior relevant to phase selection
Quantify the performance potential of customizing phase selections in online JIT compilers
Determine whether current state-of-the-art heuristics achieve ideal performance

4/20 Our Experimental Framework
Uses the server compiler in Sun/Oracle's HotSpot™ JVM, which applies a fixed optimization set to each compiled method and imposes a strict optimization ordering
Modified to optionally enable / disable 28 optimizations
Two benchmark suites, each with two inputs: SPECjvm98 (input sizes 10 and 100) and DaCapo (small and default)
Phase selection is applied to one focus method at a time
A profiling run determines the hottest methods; we select methods that comprise at least 10% of total program runtime (53 focus methods across all benchmarks)

5/20 Experimental Framework: Performance Measurement
All experiments measure steady-state performance: hot methods are pre-compiled, and no compilation occurs during the steady-state iterations
Run on a cluster of server machines
CPU: four 2.8 GHz Intel Xeon processors
Memory: 6 GB DDR2 SDRAM, 4 MB L2 cache
OS: Red Hat Enterprise Linux 5.1
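The steady-state measurement discipline described on this slide can be sketched as a small timing harness. This is a hedged illustration, not the paper's actual harness: the `workload` callable and the iteration counts are stand-ins.

```python
import time

def measure_steady_state(workload, warmup_iters=5, measured_iters=10):
    """Time a workload's steady state: run untimed warm-up iterations first
    (in the talk's setup, hot methods are pre-compiled so no JIT compilation
    happens during the timed iterations), then report the best timed run."""
    for _ in range(warmup_iters):
        workload()                        # untimed warm-up
    times = []
    for _ in range(measured_iters):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return min(times)                     # best steady-state iteration

# Trivial stand-in workload, purely for illustration.
elapsed = measure_steady_state(lambda: sum(range(10_000)))
print(f"steady-state time: {elapsed:.6f}s")
```

Reporting the minimum over several timed iterations is one common way to filter out interference noise; the paper's own reporting methodology may differ.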

6/20 Analyzing Behavior of Compiler Optimizations
In some situations, applying an optimization may have detrimental effects; optimization phases also interact with each other
Experiments explore the effect of each optimization phase on code quality, using the relative impact of removing a single phase x from the default set:

    impact(x) = ( T(OPT<defopt \ x>) − T(OPT<defopt>) ) / T(OPT<defopt>)    (1)
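Equation (1) reads as the fractional change in runtime when one phase x is disabled from the default set: a positive value means the method slowed down without x (x is beneficial), a negative value means the method runs faster without x. A minimal sketch, using made-up timings:

```python
def relative_impact(t_without_x, t_default):
    """Relative impact of optimization x on a method, per Equation (1):
    (T(OPT<defopt\\x>) - T(OPT<defopt>)) / T(OPT<defopt>).
    Positive -> disabling x slowed the method (x is beneficial);
    negative -> the method runs faster with x disabled."""
    return (t_without_x - t_default) / t_default

# Hypothetical timings (seconds) for one focus method.
t_default = 2.00           # all default optimizations enabled
t_without_inline = 2.50    # e.g. method inlining disabled
t_without_other = 1.90     # a phase whose removal happens to help here

print(relative_impact(t_without_inline, t_default))  # 0.25 -> beneficial phase
print(relative_impact(t_without_other, t_default))   # ~-0.05 -> degrading phase
```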

7/20 Impact of each Optimization over Focus Methods
Figure 1: Left Y-axis: accumulated positive and negative impact of each HotSpot optimization over our focus methods (non-scaled). Right Y-axis: number of focus methods that are positively or negatively impacted by each HotSpot optimization.
Optimizations are not always beneficial to program performance
Most optimizations have occasional negative effects

8/20 Impact of each Optimization over Focus Methods (Figure 1, continued)
Some optimizations show a degrading impact more often than others

9/20 Impact of each Optimization over Focus Methods (Figure 1, continued)
Most optimizations have only a marginal impact on performance
The most beneficial is method inlining, followed by register allocation

10/20 Impact of Optimizations for each Focus Method
Figure 2: Left Y-axis: accumulated positive and negative impact of the 25 HotSpot optimizations for each focus method (non-scaled). Right Y-axis: number of optimizations that positively or negatively impact each focus method. The rightmost bar displays the average.
Not many optimizations degrade performance for a given method (2.2 on average)
Only a few optimizations benefit performance for a given method (4.4 on average)

11/20 Limits of Optimization Phase Selection
Exhaustive iterative search is infeasible due to the number of optimizations
Instead, employ long-running genetic algorithms (GAs) to find near-optimal phase selections
Evaluate a group of phase selections in each GA generation; random mutation and crossover produce the next generation's set of phase selections
20 phase selections per generation, over 100 generations
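The GA loop on this slide can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the fitness function here scores a synthetic objective so the sketch is runnable, whereas the real study compiles the focus method with each ON/OFF selection and times a steady-state run.

```python
import random

N_OPTS = 28          # optimizations that can be toggled, as in the talk
POP_SIZE = 20        # phase selections per generation
GENERATIONS = 100

def evaluate(selection):
    """Stand-in fitness: counts agreement with an arbitrary target vector.
    In the real study this would be (inverse) measured runtime of the
    focus method compiled with the given phase selection."""
    target = [i % 2 == 0 for i in range(N_OPTS)]
    return sum(a == b for a, b in zip(selection, target))

def crossover(a, b):
    # Single-point crossover of two ON/OFF selections.
    point = random.randrange(1, N_OPTS)
    return a[:point] + b[point:]

def mutate(sel, rate=0.05):
    # Flip each ON/OFF gene with a small probability.
    return [not g if random.random() < rate else g for g in sel]

def genetic_search():
    pop = [[random.random() < 0.5 for _ in range(N_OPTS)]
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=evaluate, reverse=True)
        survivors = pop[:POP_SIZE // 2]          # keep the better half
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(POP_SIZE - len(survivors))]
        pop = survivors + children
    return max(pop, key=evaluate)

best = genetic_search()
print(evaluate(best))
```

Population size, generation count, mutation rate, and the elitist survivor policy are all tunable; only the 20-per-generation and 100-generation figures come from the talk.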

12/20 GA Results for Focus Methods
Figure 3: Performance of method-specific optimization selection after 100 GA generations (best method-specific GA time / time with default compiler). All results are scaled by the fraction of total program time spent in the focus method to show the runtime improvement for each individual method. The rightmost bar displays the average.
Customizing optimization phase selections achieves significant gains: a maximum improvement of 44% and an average improvement of 6.2%

13/20 Effectiveness of Feature-Vector Based Heuristics
Iterative searches are not practical for JITs
Previous work proposed using feature-vector based heuristics during online compilation [1, 2]
Our GA results enable the first evaluation of these heuristics against (near-)ideal phase selections

14/20 Feature-Vector Based Heuristics: Overview of Approach
Training stage: evaluate program performance for different phase selections, construct a feature set to characterize methods, and correlate good phase selections with method features
Deployment stage: install the learned statistical model into the compiler, extract each method's feature set at runtime, and predict a customized phase selection for each method

15/20 Our Experimental Setup
Table 1: List of method features used in our experiments
Scalar features
  Counters: bytecodes, arguments, temporaries, nodes
  Attributes: constructor, final, protected, public, static, synchronized, exceptions, loops
Distribution features
  Types: byte, char, int, double, short, long, float, object, address
  ALU operations: add, sub, mul, div, rem, neg, shift, or, and, xor, inc, compare
  Casting: to byte, to char, to short, to int, to long, to float, to double, to address, to object
  Memory operations: load, load const, store, new, new array / multiarray
  Control flow: branch, call, jsr, switch
  Miscellaneous: cast check, instance of, throw, array ops, field ops, synchronization
We select method features relevant to HotSpot and use logistic regression to train our predictive model
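The logistic-regression step can be sketched without any ML library. Everything below is hypothetical for illustration: the tiny feature vectors and labels are made up, and the real model uses the Table 1 features and makes one ON/OFF prediction per optimization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Plain per-sample gradient-descent logistic regression, standing in
    for the per-optimization predictors described on this slide."""
    n = len(features[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    # ON/OFF decision for one optimization, given a method's feature vector.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5

# Hypothetical training data: tiny scaled feature vectors (imagine, say,
# [loops, calls, memory ops]) paired with whether enabling some
# optimization helped that method.
X = [[0.1, 0.9, 0.2], [0.8, 0.1, 0.7], [0.2, 0.8, 0.3], [0.9, 0.2, 0.6]]
y = [1, 0, 1, 0]
w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])
```

In deployment, `predict` would be invoked once per optimization at compile time, assembling the customized phase selection for the method being compiled.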

16/20 Feature-Vector Based Heuristic Algorithm Results
Figure 4: Effectiveness of the method-specific feature-vector based heuristic (feature-vector model time / best method-specific GA time). Training data for each method consists of all the other focus methods.
On average, 22% worse than ideal and 14% worse than the default compiler
Feature-vector based heuristics cannot achieve the improvements found by the GA, but this result is consistent with findings in previous research

17/20 Feature-Vector Based Heuristic with Additional Analysis
Figure 5: Experiments that use the observations from our analysis of optimization behavior to improve the performance of feature-vector based heuristic algorithms for online phase selection (feature-vector model time / best method-specific GA time).
Only predict the ON/OFF setting for optimizations that show a negative performance impact for at least 10% of methods
This still does not achieve ideal code quality, but is only 8.4% worse than ideal

18/20 Conclusions
Built a framework for optimization selection research in JITs
Observations: most optimizations have a modest performance impact; few optimizations are active for most methods; most optimizations do not negatively affect performance
Modest potential for optimization phase selection: GA-based search yields a 6.2% average improvement
Feature-vector based heuristics do not attain ideal performance; the suggested directions may improve phase selection heuristics

19/20 Questions
Thank you for your time. Questions?

20/20 References
[1] John Cavazos and Michael F. P. O'Boyle. Method-specific dynamic compilation using logistic regression. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), pages 229–240, 2006.
[2] R. N. Sanchez, J. N. Amaral, D. Szafron, M. Pirvu, and M. Stoodley. Using machines to learn method-specific compilation strategies. In Code Generation and Optimization (CGO), pages 257–266, April 2011.