
An Empirical Analysis of the Java Collections Framework Versus the C++ Standard Template Library

Sean Wentworth: sean.wentworth@minolta-qms.com
David Langan: langan@cis.usouthal.edu
Thomas Hain: hain@cis.usouthal.edu

School of Computer and Information Sciences
University of South Alabama
Mobile, AL 36688

Abstract

The choice of a programming language involves a tradeoff between run-time performance and factors that are more important during implementation and maintenance. In the case of two specific object-oriented languages, Java and C++, performance has favored C++. On the other hand, design-time factors such as platform independence, flexibility, safety, and security are characteristics of Java that have made it increasingly popular. This paper presents an empirical analysis of the operation- and application-level performance of two libraries supporting these languages, the Java Collections Framework and the C++ Standard Template Library. The performance of the libraries is also taken to be indicative of the performance of their respective languages, and a goal of this study is to aid in the evaluation of the above-mentioned tradeoff. The results found are consistent with other Java and C++ performance comparisons found in the literature. However, compiling Java to native machine code narrows the performance gap between C++ and Java, signaling that this may be Java's best hope for performance gains.

Keywords: Performance benchmarks, STL performance, CF performance, Java versus C++.

1 Introduction

The developer's choice of a programming language, after other constraints have narrowed the field, involves a tradeoff between run-time performance and design-time factors that impact implementation and maintenance. In the case of two specific object-oriented languages, Java and C++, performance has favored C++. On the other hand, design-time factors such as platform independence, flexibility, safety, and security are characteristics of Java that have made it increasingly popular over the last seven years.

Sun Microsystems' "write once, run anywhere" marketing line for Java quickly caught developers' attention. Java's early evolution coincided with the rapid growth of the Internet, which was the first proving ground for Java's claims of platform independence. Although platform independence may have been the initial hook that lured programmers to Java, it does not account for the sustained interest. Java's widespread acceptance comes from its flexibility, simplicity, and safety relative to other object-oriented programming languages [10]. As Tyma states, "[Java] takes the best ideas developed over the past 10 years and incorporates them into one powerful and highly supported language" [24]. Java's rich set of packages has enticed many programmers to select the language for its handling of run-time exceptions, its support for writing network code, and its support for creating multithreaded applications.

Not only does Java simplify the design process, but it also ensures a high degree of safety that is not found in languages like C++. Optional thread synchronization, for example, is guaranteed and managed by the language, and array bounds checking is done automatically at run-time. While borrowing much from C++, Java deliberately omits many of the error-prone features that arise from explicit memory management in C++ [9]. Java's memory management model provides both programmer convenience and run-time safety. The Java Runtime Environment (JRE) includes an asynchronous garbage collector that is responsible for deallocating heap memory, thus relieving the programmer of concerns about memory leaks [12] and dangling references. It has been shown that Java is a highly productive programming environment, generating less than half the bugs per line of code compared with programming in C++ [16].

However, the features that make this possible in Java come at the cost of performance. For example, the run-time performance cost of worry-free garbage collection can be substantial, especially for applications with a high rate of object creation [8]. Initially, performance degradation was not a big concern, since Java was found suitable for Internet computing tasks such as writing applets, where bandwidth and platform independence are more important than program performance. More recently, there has been increased use of Java for performance-sensitive applications, such as server applications and scientific computing, which have typically been written in languages like C++ [11]. The October 2001 issue of the Communications of the ACM documents several ongoing projects aimed at the use of Java in the domain of high-performance computing.

This paper presents an empirical analysis of the operation- and application-level performance of two libraries supporting these languages, the Java Collections Framework and the Standard Template Library. The performance of the libraries is also taken to be indicative of the performance of their respective languages, and a goal of this study is to aid in the evaluation of the tradeoff between run-time performance and design-time factors. Section 2 outlines the technological history of Java performance enhancements and the characteristics of the libraries used in this study. Section 3 reviews other performance comparison studies that have been conducted. Section 4 presents the approach for performance measurement and the results of these tests. Conclusions are drawn in Section 5.

2 Background

Java is a young language and its rapid evolution is motivated by its growing popularity. Its previous performance disadvantage has been documented by numerous studies comparing the performance of Java against C/C++; however, these suffer from aiming at a moving target. Section 2.1 outlines this moving target. Section 2.2 characterizes the libraries that are used in this study.

2.1 The Technological Evolution of Java

Java programs are compiled into class files composed of Java bytecode. The Java virtual machine (JVM) is a process running on a particular physical machine that interprets and executes the bytecode instructions. Initially, JVMs were purely interpreted. Later, the interpreted model was improved with just-in-time (JIT) compilers that compile Java bytecode to native machine code at method load time. Depending on the application, a JIT-enabled JVM can boost execution performance by a factor of 2 to 20 [10][12][26]. Building on the JIT concept, Sun improved Java's performance further in its HotSpot Performance Engine, which incorporated several performance-enhancing technologies [23]. In both JIT and HotSpot, optimizations are performed during program execution. To avoid this run-time overhead, native-code compilers such as Excelsior's JET [5] were developed for Java. The Marmot optimizing compiler for Java has yielded "application performance approaching that of C++" [6].

As the performance gap between Java and natively compiled languages has narrowed, many companies are promoting Java as a general-purpose, high-level computing language [1]. There is an emerging trend to use Java in place of C++ to implement large-scale applications [3]. In 1997, it was estimated that as many as 60% of C++ developers were learning Java [7]. In addition, many computer science curricula have replaced C++ with Java as the language used to teach introductory programming concepts [24]. This growing language shift has created the need for a better understanding of Java's performance relative to more traditional object-oriented languages, particularly C++. In the literature, Java is most often compared to C++ [12]. This is probably due to their syntactic and semantic similarities, and because C++ has become the de facto standard among general-purpose, object-oriented languages.

2.2 The Libraries

What follows is a brief overview of the libraries that were compared: the Java Collections Framework (CF) and the C++ Standard Template Library (STL).

2.2.1 The Standard Template Library (STL)

The STL became part of the ANSI/ISO standard definition of the C++ language in 1994. It is organized around three fundamental components: containers, algorithms, and iterators. Supporting these fundamental components are three additional components: allocators, adaptors, and function objects. As the name implies, containers are objects that hold other objects. The seven containers provided in the STL are grouped into sequence containers (vector, list, deque) and associative containers (set, multiset, map, multimap). Interestingly, containers are not accessed via references (or pointers) as might be expected in C++, but rather using copy semantics, thereby providing both efficiency and type safety [15]. In addition, containers free the programmer from allocation and deallocation concerns by generally controlling these aspects using default allocators [21].

Algorithms perform the manipulation of elements in containers. However, algorithms are decoupled from container implementations in a completely orthogonal way by the use of iterators, thus providing genericity. Algorithm templates are loosely grouped into seven functional categories: non-mutating sequence algorithms, mutating sequence algorithms, sorting and searching algorithms, set operations, heap operations, numeric algorithms, and miscellaneous/other functions [15].

2.2.2 The Java Collections Framework (CF)

Sun's initial releases of the SDK included very limited support for data collections and algorithms (only the vector, stack, hashtable, and bitset classes). It was not until the release of SDK 1.2 in 1998 that Sun introduced an extended generic container library called the Collections Framework (CF). The primary conceptual difference between the CF and the STL is that the former is focused on containers rather than on the combination of containers and algorithms [25]. The STL defines both containers and algorithms as relatively equal entities that can be mixed and matched in useful ways. The CF, on the other hand, defines algorithms as methods of either a collection interface or class. Because of this singular focus, the CF does not fully support generic programming as defined by Musser and Stepanov [14], since the algorithms are tightly coupled with the containers on which they operate. The CF defines only a small number of algorithms that perform basic operations such as sorting, searching, filling, copying, and removing, while the STL offers these and many additional ones such as heap operations, numeric algorithms, find operations, and transformations.

The CF containers are defined in a hierarchy of Java interfaces. Interestingly, the interfaces do not derive from a single "super-interface," since the CF does not treat maps as true collections of objects as does the STL. The CF designers decided to differentiate between a Collection, as a group of related objects, and a Map, as a mapping of key/value pairs. However, the map interfaces have "collection view" operations, or wrappers, that allow them to be manipulated as collections. That is, a Map can return a Set of its keys, a Collection of its values, or a Set of its pairs [4]. The CF also supplies six concrete classes based upon these interfaces (shown in parentheses with the associated interface in Figure 1). In addition, an abstract class definition of each of the interfaces is provided, giving programmers a partial implementation from which custom implementations can be built.

    Collection
        List (ArrayList & LinkedList)
        Set (HashSet)
            SortedSet (TreeSet)
    Map (HashMap)
        SortedMap (TreeMap)

Figure 1: Collections Framework interface hierarchy with concrete implementations in parentheses.

3 Previous Performance Comparisons

Previous performance comparisons between Java and C or C++ fall into three general categories: the languages are compared using full-scale applications, smaller application kernels, and low-level microbenchmarks. Although they provide mostly anecdotal data, the full-scale application comparisons offer valuable insight into the language differences on real-world applications. Application kernels test a language's ability to perform certain general types of operations, such as the calculation of linear equations. Microbenchmarks are used to analyze specific low-level language operations, such as method calling, variable assignment, or integer addition. Interestingly, the majority of these studies show similar results: Java code running on a JIT-enabled JVM requires about 3 times the run-time of corresponding C or C++ code.

3.1 Full-Scale Applications

Jain et al. designed an image processing application called MedJava and compared it to xv, an equivalent image processing application written in C [10]. Eight image filtering operations, each applied to images of various sizes, were timed with each application. Although MedJava met or exceeded the performance of xv in two of the filters, for the remaining six filters xv executed from 1.5 to 5 times faster than MedJava. This rather large range can be attributed to the designs of the eight image filtering operations. For example, the authors point out that the Java version of the sharpen filter performs better than the xv version because this filter is dominated by floating-point to integer conversions, which are less efficient in C/C++. A second comparison was performed using transport interface applications over a high-speed ATM network to transfer image files. This comparison showed C++ outperforming Java by only 15-20% over a range of sending-buffer sizes.

Another large-scale comparison between Java and C++ analyzed a parallel multi-pass image rendering system in both languages [26]. Based on the same algorithm, the two versions were tested on three images using machines with a varying number of CPUs. The C++ version executed 3 to 5 times faster than the Java version. The I/O capabilities of the two versions were also examined by measuring the elapsed time for each to read the modeling data into memory. The C++ version read the data about 5 times faster than the Java version for the three images.

Prechelt compared Java to C and C++ by having 38 different programmers write 40 implementations (24 written in Java, 11 in C++, and 5 in C) of a program that converts a telephone number into a word string based upon a dictionary file [17]. Similar to the previous two studies, his results showed that the median execution times of the C and C++ programs were just over 3 times faster than those of the Java implementations.

3.2 Application Kernels

Bernardin compared Java to both C and C++ using application kernels that multiply univariate polynomials of varying degrees [1]. In the Java to C comparison, the polynomials were represented as arrays of 32-bit integers. The C version was compiled two ways: with standard compilation options, and with powerful, machine-specific optimizations. The heavily optimized C code executed 2.3 times faster than the Java code. However, against the standard C compilation, the Java version performed better, averaging 21% faster. The Java to C++ comparison represented the polynomials as arrays of pointers to generic coefficient objects. Using this design, the C++ version executed an average of 2.6 times faster than the Java version.

The Linpack benchmark, a standard application kernel benchmark, was performed in both Java and C using a problem size of 1000 × 1000 [3]. This benchmark solves a system of linear equations and is used to measure floating-point numeric performance. The results, measured in floating-point operations per second, showed C running about 2.3 times faster than Java.

3.3 Microbenchmarks

In Roulo's analysis, several microbenchmarks were performed comparing Java to both C and C++ [18]. He showed that across methods accepting from 1 to 11 parameters, Java's method calls are approximately one clock cycle longer than the same C++ function calls. He also showed that explicit object creation in Java is roughly equivalent to using the malloc operation in C or creating objects with the new operator in C++. This is due to the fact that in all three of these instances the memory allocation occurs on the heap. However, such user-defined objects are not the only objects that Java and C++ must create and manage. Many common operations require the creation of implicit, temporary objects. Java uses the heap for all object creation, whereas C++ creates temporary objects on the system stack, which is almost always in the on-board Level 1 cache [18]. Therefore, temporary object creation in C++ occurs about 10 to 12 times faster than it does in Java. Furthermore, the use of JIT seems to have little impact on the Java performance penalty for object creation [18].

Mangione also used microbenchmarks to compare Java and C++ [12]. Several fundamental computational tasks, such as method calls, integer division, and float division, were performed in loops that executed 10 million times. In six of the loops, the Java versions executed as fast as the C++ versions. At first glance, these results seem hard to believe, given the other timings presented so far. There is good reason for skepticism here: such simple looping microbenchmarks can yield widely varying results between runs due to the behavior of cache memory and the size of internal write buffers [2]. Even when such factors are taken into account, conclusions drawn solely from microbenchmark results should be carefully qualified.

4 Performance Comparisons

This paper presents a comparison of the support offered to Java and C++ for data collections. More specifically, it compares the performance of the CF and the STL. The performance tests are done at two levels: a microbenchmark compares equivalent algorithm/method operations on the various containers, and a combination benchmark compares overall performance given various execution profiles. This two-tiered approach to performance testing is modeled after the methodology used in several other empirical performance comparisons [10][19][20].

4.1 Microbenchmarks

4.1.1 Comparable Library Features

As described in Section 2.2, the approaches used by the STL and the CF differ in several significant ways. In order to compare operations on containers of each library, it is necessary to identify the overlap in their functionality, since this provides the basis for the microbenchmark tests conducted. For each library, the functionality of the implemented container classes was compared. This analysis yielded six functionally comparable container classes (listed in Table 1). Since the CF does not contain a queue container, and because queues are nonetheless commonly used data structures, a queue class was created by extending the LinkedList class and adding enqueue and dequeue operations. For the set and map container types, the CF provides two possible container classes: an "ordered" and a "hashed" version. Currently, the STL only provides an ordered set implementation. Because its functionality more closely matches that of the STL, the ordered set of the CF (TreeSet) was chosen for comparison against the STL's set container. Non-standard hashed set templates are available for C++ in libraries such as STLport [22]. It is expected that the next version of the STL will contain a hashed set implementation [13].
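For illustration, the queue class described above can be as simple as the following sketch (names here are illustrative rather than the benchmark's actual source; the CF of that era predates both generics and the java.util.Queue interface):

    import java.util.LinkedList;

    // A queue built by extending LinkedList, as described in Section 4.1.1.
    public class BenchmarkQueue extends LinkedList {
        // Add an element at the tail of the queue.
        public void enqueue(Object item) {
            addLast(item);
        }

        // Remove and return the element at the head of the queue.
        public Object dequeue() {
            return removeFirst();
        }
    }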

Table 1: Microbenchmark containers and timed operations.

    Dynamic Array:  Create & Fill; Copy; Sequential Search; Sort; Binary Search; Shuffle, reverse, equal, min/max; Increment values of each element
    List:           Create & Fill; Sequential Search; Sort; Binary Search; Reverse, equal, min/max; Increment values of each element
    Ordered Set:    Create & Fill; Equal, min/max; Increment values of each element; Subset relationship; Set Union; Set Intersection & Set Difference
    Ordered Map:    Create & Fill; Equal, min/max; Increment values of each element
    Stack:          Equal; Push & Pop
    Queue:          Create & Fill; Equal; Enqueue & Dequeue

The initial analysis of algorithms/methods available for these six containers yielded a set of comparable algorithms/methods across both libraries that represented approximately 47% of the total. However, not all of the comparable methods were included in the final benchmark design. Some were deemed computationally insignificant; for example, the method to obtain a container's size simply returns the value of the container's privately stored size counter. Some methods were not included because they were not functionally equivalent in both libraries. For example, the CF ArrayList constructor, ArrayList(int n), creates an empty ArrayList container and pre-allocates room for n elements. Although space has been allocated, the container is empty after the constructor is called (the size() method returns 0). The syntactically comparable constructor in the STL, however, fills the container with n constructed elements in addition to allocating space (so size() returns n). Although the constructors appear to be functionally equivalent, they are not. Adding an element to the front of the CF's ArrayList after this constructor is called is much faster than in the STL version, because the STL must shift all n elements currently in the vector before adding the new element. After removing such methods from the initial list, the set of usable methods represented about 35% of the total available methods.
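The difference is easy to observe directly. The following minimal sketch (not part of the benchmark code) shows the CF behavior, with the contrasting STL behavior noted in comments:

    import java.util.ArrayList;

    public class ConstructorSemantics {
        public static void main(String[] args) {
            // CF: ArrayList(int n) only pre-allocates capacity; the list is empty.
            ArrayList list = new ArrayList(200000);
            System.out.println(list.size());   // prints 0

            // The syntactically similar STL constructor, vector<T> v(n), instead
            // creates a vector already holding n default-constructed elements,
            // so v.size() == n. Adding at the front of this (still empty) CF list
            // is cheap, whereas the STL vector must shift all n existing elements.
            list.add(0, new Object());
            System.out.println(list.size());   // prints 1
        }
    }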

If both a member method and a generic algorithm were available for a particular operation, the member method was chosen for the benchmark, since member methods are generally more efficient than generic algorithms because they have access to a container's internal structure.

The tested containers were filled with objects consisting of an integer and two floating-point numbers. These data elements were initialized with random values and were subsequently used in operations done to each item in a container. Each test begins by creating and filling either four or five containers with 200,000 elements each and recording the execution time for this process. Additional tests that vary by container are performed, and the execution time for each test is recorded. Table 1 identifies the containers and timed tests that make up the microbenchmark portion of the suite.

There are two advantages to designing the microbenchmarks in the suite around common library operations. First, using such a high level of granularity helps to ensure that the suite is robust. The difficulty with many microbenchmarks comes from the fact that they typically measure performance on the order of source-code-level instructions (e.g., arithmetic operations or assignment). Therefore, results can be highly susceptible to variation due to architectural factors such as cache performance and read buffers [2], as well as to compiler optimizations. Any of these factors may result in microbenchmark performance that does not correlate with performance in actual applications. Building microbenchmarks from high-level library operations avoids such threats to the validity of the results. Second, the microbenchmark results serve a dual purpose. In addition to being interpretive tools for the larger scale testing, the microbenchmark results provide valuable data in their own right. Being aware of the performance ratios for specific library operations is helpful to potential users of the libraries, especially if one is considering porting code from one language to the other.

Measures were taken to minimize variability among the Java microbenchmarks. The System.gc() method was called before each timed test to attempt to force garbage collection to take place outside of the timings. This minimized the possibility that garbage collection would occur in the middle of a test, yielding misleading results. Since garbage collection is an inherent part of Java, one might argue that attempting to exclude it from the microbenchmark timings invalidates the comparison between the CF and the STL. This is not the case. The attempt to control garbage collection is only done in the microbenchmark part of the suite; no such adjustments were made in the combination benchmark. Also, microbenchmarks are intended to measure the performance of specific, limited operations. Garbage collection occurs "as needed" at unpredictable times during the execution of a Java program. If a garbage collection cycle occurred during a microbenchmark test, it would likely be reclaiming space from some previous test in the microbenchmark. This would result in the garbage collection overhead being attributed to the wrong test. Thus, attempting to keep garbage collection out of the timings helps to ensure the accuracy of the microbenchmark results.
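The following sketch shows the general shape of one such timed test; the element class and timing details are illustrative assumptions, not the actual benchmark source:

    import java.util.ArrayList;
    import java.util.Random;

    public class CreateAndFillTest {
        // Element type as described above: one integer and two floating-point
        // values, initialized randomly.
        static class Element {
            int id;
            double a, b;
            Element(Random r) {
                id = r.nextInt();
                a = r.nextDouble();
                b = r.nextDouble();
            }
        }

        public static void main(String[] args) {
            Random random = new Random();

            // Request a garbage collection before the timed region so that, as
            // far as possible, collection work left over from earlier tests is
            // not charged to this test.
            System.gc();

            long start = System.currentTimeMillis();
            ArrayList container = new ArrayList();
            for (int i = 0; i < 200000; i++) {
                container.add(new Element(random));
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("Dynamic array create & fill: " + elapsed + " ms");
        }
    }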

4.1.2 Microbenchmark Results

Figure 2 shows the microbenchmark performance ratios for the CF with HotSpot enabled and the compiled CF versus the STL (normalized to 1). The majority (22) of the CF with HotSpot to STL ratios fall in the range from 1.0 to 7.7. Three of the ratios, list binary search, set union, and set intersection & difference, fall far outside this range due to the results of these STL microbenchmark operations (see Table 2 for a complete listing of microbenchmark ratios). The STL set union test and set intersection & difference test ran exceedingly long (nearly 40 times longer than the Java versions). The reason for this lies in the differences between how these operations are performed in the CF and the STL. The CF performs these operations by merging the results into one of the original sets. For example, performing set union on set1 and set2 results in set2 being unchanged and set1 now being the union of set1 and set2. The STL, on the other hand, performs these operations by creating a third container to receive the result of the operation. Since several of these operations are performed in the microbenchmark, many new sets are created. The memory requirements for these containers quickly exceed the 128 MB of physical memory on the test machine, which results in a good deal of memory swapping to disk (noticed during the execution of these tests). Interestingly, compiling this "problematic" STL code with Borland's compiler yielded results that were 25 times faster than Microsoft's for set union and 15 times faster for set intersection & difference.
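In CF terms, these merging operations are the destructive bulk methods of the Collection interface, whereas the STL algorithms set_union, set_intersection, and set_difference write their results through an output iterator into a separate destination container. A minimal sketch of the CF side, assuming the benchmark used these bulk methods:

    import java.util.TreeSet;

    public class SetOperationsSketch {
        public static void main(String[] args) {
            TreeSet set1 = new TreeSet();
            TreeSet set2 = new TreeSet();
            set1.add("a"); set1.add("b");
            set2.add("b"); set2.add("c");

            // CF-style set union: the result is merged into set1; set2 is unchanged.
            set1.addAll(set2);                 // set1 is now {a, b, c}
            System.out.println(set1);

            // Intersection and difference are likewise in-place bulk operations:
            //   set1.retainAll(set2)  -> intersection, stored in set1
            //   set1.removeAll(set2)  -> difference, stored in set1
        }
    }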

Figure 2: Graph of microbenchmark results for the CF with HotSpot enabled, the compiled CF, and the STL (relative ratios, STL normalized to 1). [Bar graph omitted; the numbered tests on its horizontal axis correspond to the container operations listed in Table 2.]

Table 2: Microbenchmark performance ratios.

                                         Java to Java ratios           Java to C++ ratios
    Containers and Operations            Interpreted   HotSpot vs     HotSpot   Compiled
                                         vs HotSpot    Compiled       vs STL    vs STL

    Dynamic Array
     1  Create, fill                         3.25          1.80          1.71      0.95
     2  Copy                                 4.97          1.20          2.04      1.69
     3  Sequential Search                    6.12          0.96          1.09      1.14
     4  Sort                                 4.04          1.38          2.52      1.83
     5  Binary Search                        5.03          1.04          5.29      5.10
     6  Shuffle, reverse, equal, min/max     4.45          1.13          3.78      3.35
     7  Reset all values                     6.13          1.34          1.40      1.05
    List
     8  Create, fill, copy                   1.48          2.93          2.41      0.82
     9  Sequential Search                    4.20          1.04          1.03      0.98
    10  Sort                                 4.25          1.42          2.07      1.46
    11  Shuffle, reverse, equal, min/max     4.34          1.43          1.85      1.29
    12  Reset all values                     4.75          1.34          1.94      1.44
    Set
    13  Create, fill                         1.68          2.57          3.09      1.20
    14  Equal, min/max                       6.03          1.43          7.68      5.36
    15  Reset all values                     4.49          1.53          2.01      1.32
    16  Subset Relationship                  6.82          1.45         18.47     12.77
    17  Set Union                            6.55          1.25          0.03      0.03
    18  Set Intersection & Difference        6.51          1.60          0.03      0.02
    Map
    19  Create, fill                         1.63          1.94          2.81      1.45
    20  Equal, min/max                       5.13          1.02          4.14      4.06
    21  Find, reset, remove                  6.22          0.89          5.49      6.14
    Stack
    22  Equal                                5.25          0.77          2.98      3.84
    23  Push & pop                           2.86          1.79          0.15      0.08
    Queue
    24  Create, fill                         1.27          7.80          4.04      0.52
    25  Equal                                4.58          1.56          2.81      1.80
    26  Enqueue & Dequeue                    1.70          3.40          1.04      0.31

    Ratio averages                           4.21          1.70          3.15      2.31
    (These averages exclude tests 17 & 18.)

In order to get a composite understanding of the microbenchmark results, the performance ratios were given equal weight and averaged together. It would have been better to weight the operations based upon their relative frequency of use, but no data was found to support an analysis of this type. Before calculating the average ratio, it is important to exclude the three tests described above as outliers. Including them results in an average ratio that is representative of neither the outlier ratios nor the more consistently grouped ratios. The average of the remaining 24 CF with HotSpot to STL microbenchmark ratios is 3.15. This ratio is consistent with the performance comparisons between Java and C++ found in the literature.

Compiling the CF to native machine code improved the performance of the CF microbenchmarks by an average factor of 1.70. Thus the compiled CF performed better against the STL than the CF with HotSpot: the average of the 24 compiled CF to STL microbenchmark ratios is 2.31. This is slightly better than what was seen in the literature. It is interesting to note that the CF with HotSpot improved the performance of the microbenchmarks over the purely interpreted CF by an average factor of 4.21. It is no surprise that some form of JIT technology like HotSpot has become the norm for Java execution.

4.2 Combination Benchmark

The combination benchmark is the second part of the suite. A tool was designed to facilitate the creation of these tests. The tool provides the ability to mix and match the containers and algorithms tested in the microbenchmark part of the suite to create a single combination test profile. The resulting profile can then be tested with each library. (The combination benchmark was designed to operate on one container at a time, so binary operations found in the microbenchmarks, such as equality, are not present in the combination benchmark.) The combination benchmark provides a more realistic simulation of container-specific algorithmic functionality than the microbenchmarks, because these tests are performed on a larger scale, analogous to application kernels, and normal program overhead, such as Java's garbage collector, is present.

For testing purposes, four combination benchmark profiles were selected that represented a range of possible profiles. At one extreme, a profile was created that performs every available operation for each of the six containers an equal number of times. At the other extreme, a profile was created that performs none of the container operations; it simply reads the data files, fills a dynamic array container with elements, and then moves these elements from one container to the next until each of the six container types has been filled and emptied. The remaining two profiles represent two points within the spectrum bounded by these first two profiles. They were defined by dividing the available operations into categories borrowed from the STL specification of mutating and non-mutating operations [19]. Mutating operations are operations that modify the elements of the container in some way, either by changing the order of the elements or by changing the values of the elements. Conversely, non-mutating operations do not modify the elements of the container. The profile of mutating operations includes all six containers. The profile of non-mutating operations includes only four containers, because there are no non-mutating operations available for the Stack and Queue containers. The binary search operation was left out of the non-mutating profile, because a mutating operation (sort) is required before a binary search can be performed. Across the four profiles, container sizes were held constant and each selected operation was performed an equal number of times. Also, the order in which the containers were selected for a profile remained constant for each of the four profiles.

4.2.1 Combination Benchmark Results

Figure 4 shows the combination benchmark performance ratios for the CF with HotSpot enabled and the compiled CF versus the STL (normalized to 1). The results generally reflect the overall results of the microbenchmarks. The average of the four HotSpot combination ratios is 3.15. As was the case with the microbenchmarks, this ratio is consistent with the performance comparisons between Java and C++ found in the literature. The compiled CF combination benchmark was faster than the CF with HotSpot by an average factor of 1.54. The average of the compiled CF to STL combination benchmark ratios is 2.04. Again, compiling Java narrows the performance gap between Java and C++.

Figure 4: Graph of the combination benchmark performance ratios for the CF with HotSpot enabled and the compiled CF (the STL is normalized to 1). [Bar graph omitted; the four profiles on the horizontal axis are All Operations, Mutating Operations, Non-mutating Operations, and No Operations, with the relative run-times of the CF with HotSpot ranging from about 2.45 to 3.95 and those of the compiled CF from about 1.62 to 2.44.]

4.2.2 Caveats

A few caveats need to be addressed at this point regarding the interpretation of the results reported here. An effort was made to make all tests as equal as possible, but there are some inherent differences in the languages that affect the flexibility and efficient use of the containers described here. For example, the decision to use objects as the type of data stored in the benchmark containers was made to ensure an "apples to apples" comparison of performance. The STL container classes can be used to hold either objects or primitive data types. The CF classes are designed only to hold objects; thus, if primitive data type values need to be stored in a container, the programmer is required to perform type conversions and to use wrapper classes, incurring additional overhead. Thus the STL template approach offers some flexibility not available in the Java approach.

The C++ code used in the benchmarking was written to be as efficient as possible, not just to mimic Java's behavior. For example, in the C++ code many of the objects created were on the data stack, as opposed to being created dynamically on the heap as is done in Java. While it would have been possible to force the C++ solution to mimic the Java behavior, doing so would have artificially degraded the C++ performance and would not have represented best practice in using C++.

Programmers generally do not need to know any of the internal details of the STL or CF implementations to use those libraries. However, to use those libraries at maximum efficiency, such knowledge may be needed in some cases. The designers of these libraries made tradeoff decisions in the process of selecting their representation techniques. Often their choices may improve the efficiency of one operation at the expense of another. Thus, in both languages, an awareness of best-practice rules is needed to achieve optimal results.

5 Summary and Conclusions

Previously reported performance comparisons showed that C++ programs tend to be faster than equivalent JIT-enabled Java programs by a factor of about three. The results presented here showed that the performance of the Collections Framework compared to the STL, based on using HotSpot, had a similar ratio. The degree of variation seen between tests in the microbenchmarks is likely to be due to differences in algorithmic implementations of the library operations and data representations.

Although the results of running Java with HotSpot matched the ratios found in other comparisons, the tests revealed that compiled Java code yielded ratios relative to C++ averaging between 2.0 and 2.5. These results are promising for those who do not require code portability, but who do desire the other advantages offered by Java. Furthermore, compared to C++ compilers, the development of static Java compilers is relatively immature; currently, only a few static compilers are available for Java. As the concept of statically compiling Java applications becomes more popular, there are likely to be significant improvements in static Java compilation technology.

This study extends the body of knowledge regarding the performance of JIT-enabled Java compared to C++ by providing a methodical comparison of the performance of their respective collection classes. It also provides a data point regarding the performance of compiled Java code relative to C++ code.

References

[1] Bernardin, L., Char, B. & Kaltofen, E. (1999). Symbolic Computation in Java: An Appraisement. Proceedings of the 1999 International Symposium on Symbolic and Algebraic Computation, pp. 237-244.

[2] Bershad, B. K., Draves, R. P. & Forin, A. (1992). Using Microbenchmarks to Evaluate System Performance. Proceedings of the Third Workshop on Workstation Operating Systems, IEEE Computer Society Press, Los Alamitos, CA, pp. 148-153.

[3] Bull, J. M., Smith, L. A., Westhead, M. D., Henty, D. S. & Davey, R. A. (1999). A Methodology for Benchmarking Java Grande Applications. Proceedings of the ACM 1999 Conference on Java Grande, June, pp. 81-88.

[4] Eckel, B. (2000). Thinking in Java, 2nd ed., rev. 9. Upper Saddle River, NJ: Prentice-Hall, Inc.

[5] Excelsior, LLC (2001). http://www.excelsior-usa.com/home.html.

[6] Fitzgerald, R., Knoblock, T., Ruf, E., Steensgaard, B. & Tarditi, D. (1999). Marmot: An Optimizing Compiler for Java. Microsoft Research, Technical Report MSR-TR-99-33, June 16.

[7] Gaudin, S. (1997). Microsoft tries to reel in Java jumpers. Computerworld, July 7, vol. 31, no. 27, p. 20.

[8] Heydon, A. & Najork, M. (1999). Performance Limitations of the Java Core Libraries. Proceedings of the ACM 1999 Conference on Java Grande, June, pp. 35-41.

[9] Jain, P. & Schmidt, D. C. (1997). Experiences Converting a C++ Communication Software Framework to Java. C++ Report, January, vol. 9, no. 1, pp. 51-66.

[10] Jain, P., Widoff, S. & Schmidt, D. C. (1998). The Design and Performance of MedJava. Proceedings of the 4th USENIX Conference on Object-Oriented Technologies and Systems (COOTS), April, pp. 1-16.

[11] Klemm, R. (1999). Practical Guidelines for Boosting Java Server Performance. Proceedings of the ACM 1999 Conference on Java Grande, June, pp. 25-34.

[12] Mangione, C. (1998). Performance Tests Show Java as Fast as C++. JavaWorld, February, vol. 3, no. 2. Available at http://www.javaworld.com/javaworld/jw-02-1998/jw-02-jperf_p.html.

[13] Meyers, S. (2001). Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library. Boston, MA: Addison-Wesley.

[14] Musser, D. R. & Stepanov, A. (1989). Generic Programming. Invited paper, in P. Gianni, Ed., ISSAC '88 Symbolic and Algebraic Computation Proceedings, Lecture Notes in Computer Science, Springer-Verlag, vol. 358, pp. 13-25.

[15] Nelson, M. (1995). C++ Programmer's Guide to the Standard Template Library. Foster City, CA: IDG Books Worldwide, Inc.

[16] Phipps, G. (1999). Comparing Observed Bug and Productivity Rates for Java and C++ (abstract only). Software Practice and Experience, vol. 29, no. 4, pp. 345-348.

[17] Prechelt, L. (1999). Comparing Java vs. C/C++ Efficiency Differences to Interpersonal Differences. Communications of the ACM, October, vol. 42, no. 10.

[18] Roulo, M. (1998). Accelerate your Java apps! JavaWorld, September, vol. 3, no. 9. Available at http://www.javaworld.com/javaworld/jw-09-1998/jw-09-speed.html.

[19] Saavedra, R. H. & Smith, A. J. (1996). Analysis of Benchmark Characteristics and Benchmark Performance Prediction. ACM Transactions on Computer Systems, November, vol. 14, no. 4, pp. 344-384.

[20] Seltzer, M., Krinsky, D., Smith, K. & Zhang, X. (1999). The Case for Application-Specific Benchmarking. Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, IEEE Computer Society Press, Los Alamitos, CA, pp. 102-107.

[21] Stepanov, A. & Lee, M. (1994). The Standard Template Library. Technical Report HPL-94-34, Hewlett-Packard Laboratories.

[22] STLport. (2002). http://www.stlport.org.

[23] Sun Microsystems. (1999). The Java HotSpot Performance Engine Architecture. Whitepaper from Sun Microsystems, Inc. Available at http://java.sun.com/hotspot/whitepaper.html.

[24] Tyma, P. (1998). Why are we using Java again? Communications of the ACM, June, vol. 41, no. 6, pp. 38-42.

[25] Vanhelsuwé, L. (1999). The battle of the container frameworks: which should you use? JavaWorld, January, vol. 4, no. 1. Available at http://www.javaworld.com/javaworld/jw-01-1999/jw-01-jglvscoll.html.

[26] Yamauchi, H., Maeda, A. & Kobayashi, H. (2000). Developing a Practical Parallel Multi-pass Renderer in Java and C++. Proceedings of the ACM 2000 Conference on Java Grande, June, pp. 126-133.