IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs

Similar documents
Novel Client Booking System in KLCC Twin Tower Bridge

ibalance-abf: a Smartphone-Based Audio-Biofeedback Balance System

Mobility management and vertical handover decision making in heterogeneous wireless networks

A graph based framework for the definition of tools dealing with sparse and irregular distributed data-structures

Faut-il des cyberarchivistes, et quel doit être leur profil professionnel?

A usage coverage based approach for assessing product family design

An Automatic Reversible Transformation from Composite to Visitor in Java

Territorial Intelligence and Innovation for the Socio-Ecological Transition

A model driven approach for bridging ILOG Rule Language and RIF

QASM: a Q&A Social Media System Based on Social Semantics

Cobi: Communitysourcing Large-Scale Conference Scheduling

FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data

Sources: On the Web: Slides will be available on:

The truck scheduling problem at cross-docking terminals

Performance Evaluation of Encryption Algorithms Key Length Size on Web Browsers

Flauncher and DVMS Deploying and Scheduling Thousands of Virtual Machines on Hundreds of Nodes Distributed Geographically

VR4D: An Immersive and Collaborative Experience to Improve the Interior Design Process

The C Programming Language course syllabus associate level

Online vehicle routing and scheduling with continuous vehicle tracking

Advantages and disadvantages of e-learning at the technical university

Aligning subjective tests using a low cost common set

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

Expanding Renewable Energy by Implementing Demand Response

Additional mechanisms for rewriting on-the-fly SPARQL queries proxy

Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.

Towards Unified Tag Data Translation for the Internet of Things

Managing Risks at Runtime in VoIP Networks and Services

Study on Cloud Service Mode of Agricultural Information Institutions

Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs

Use of tabletop exercise in industrial training disaster.

Minkowski Sum of Polytopes Defined by Their Vertices

ANIMATED PHASE PORTRAITS OF NONLINEAR AND CHAOTIC DYNAMICAL SYSTEMS

ControVol: A Framework for Controlled Schema Evolution in NoSQL Application Development

Comparing the Effectiveness of Penetration Testing and Static Code Analysis

Introduction to Automated Testing

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

AP Computer Science Java Subset

Contribution of Multiresolution Description for Archive Document Structure Recognition

Overview of model-building strategies in population PK/PD analyses: literature survey.

Global Identity Management of Virtual Machines Based on Remote Secure Elements

An update on acoustics designs for HVAC (Engineering)

Information Technology Education in the Sri Lankan School System: Challenges and Perspectives

Automatic Generation of Correlation Rules to Detect Complex Attack Scenarios

On the beam deflection method applied to ultrasound absorption measurements

Reusable Connectors in Component-Based Software Architecture

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

Moving from CS 61A Scheme to CS 61B Java

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Donatella Corti, Alberto Portioli-Staudacher. To cite this version: HAL Id: hal

The P versus NP Solution

Block-o-Matic: a Web Page Segmentation Tool and its Evaluation

Programming Languages CIS 443

Two Flavors in Automated Software Repair: Rigid Repair and Plastic Repair

Comparative optical study and 2 m laser performance of the Tm3+ doped oxyde crystals : Y3Al5O12, YAlO3, Gd3Ga5O12, Y2SiO5, SrY4(SiO4)3O

Habanero Extreme Scale Software Research Project

Introduction to Java

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Pemrograman Dasar. Basic Elements Of Java

CS 2112 Spring Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions

The Cross Flow Turbine Behavior towards the Turbine Rotation Quality, Efficiency, and Generated Power

GDS Resource Record: Generalization of the Delegation Signer Model

Lecture 5: Java Fundamentals III

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire 25th

Chapter 1 Java Program Design and Development

ANALYSIS OF SNOEK-KOSTER (H) RELAXATION IN IRON

Web Applications Testing

Java 7 Recipes. Freddy Guime. vk» (,\['«** g!p#« Carl Dea. Josh Juneau. John O'Conner

Experimental Comparison of Concolic and Random Testing for Java Card Applets

SELECTIVELY ABSORBING COATINGS

Crash Course in Java

Virtual plants in high school informatics L-systems

Transcription:

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs Thomas Durieux, Martin Monperrus To cite this version: Thomas Durieux, Martin Monperrus. IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs. [Research Report] Universite Lille 1. 2016. <hal-01272126> HAL Id: hal-01272126 https://hal.archives-ouvertes.fr/hal-01272126 Submitted on 11 Feb 2016 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs Thomas Durieux, Martin Monperrus February 11, 2016 To refer to this document: Thomas Durieux and Martin Monperrus. IntroClassJava: A Benchmark of 297 Small Java Programs for Automatic Repair, Technical Report #hal-01272126, University of Lille. 2016. Abstract Reproducible and comparative research requires well-designed and publicly available benchmarks. We present IntroClassJava, a benchmark of 297 small Java programs, specified by Junit test cases, and usable by any fault localization or repair system for Java. The dataset is based on the IntroClass benchmark and is publicly available on Github. 1 Introduction Context Reproducible and comparative research requires well-designed and publicly available benchmarks. In the context of research in fault localization and automatic repair, this means benchmarks of buggy programs [5], where each buggy program comes with a specification of the bug, usually as test cases. Le Goues and colleagues [3] have published IntroClass a benchmark of small C programs written by students at the University of California Davis. The programs are 5-20 lines, usually in a single method, and are specified by a set of input-output pairs. Objective We aim at providing a Java version of IntroClass, called Intro- ClassJava. IntroClassJava would allow to run and compare repair systems built for Java, such as Nopol [2]. Method We use source to source transformation for building IntroClass- Java. There are two main transformations: the first one transforms the program itself, and esp. takes care of correctly handling pointers and input parameters. The second one transforms the input-output specification as standard JUnit test cases. We discard the programs that cannot be transformed. The resulting programs are standard Maven projects. 1

Result We obtain a benchmark of 297 Java programs usable by any fault localization or repair systems for Java. The dataset is publicly available on Github [1]. 2 Methodology We present how we build the IntroClassJava benchmark. 2.1 Overview We take the IntroClass benchmark. We remove the duplicate programs, we transform the C programs into Java using an ad hoc transformation. We check that each resulting Java program has the same behavior as the original C program, by executing them using the inputs provided with IntroClass. Also, we transform the input output test cases of the C program into standard JUnit test cases. When the transformed program does not compile, or has an observable behavior that is different from the original one (according to the provided inputoutput specification), we discard it. That is the reason for which IntroClassJava contains less programs than IntroClass. 2.2 Translating Pointers in Java Many programs of IntroClass use pointer manipulation on primitive types, such as *i or &i if i is an integer. We use the following transformation to emulate this behavior in Java. The key idea of the transformation is to encapsulate every primitive value into a class. For instance, for integer this results in: class IntObj { int value ; IntObj IntObj ( int i ) { value = i ;... Then all variable declarations, assignments and expressions are transformed as follows. C code int i = 0 k = i; int* j = &i; Java code IntObj i = IntObj(0) k.value = i.value IntObj j = i We use this schema to handle C types int, float, long, double and char (using respectively IntObj, FloatObj, LongObj, DoublObj, and CharObj). 2

Listing 1: Handling of standard input and output c l a s s Median { Scanner scanner ; // f o r handling inputs both in t e s t and command l i n e S t r i n g output = ; // f o r handling t h e output s t a t i c void main ( S t r i n g [ ] a r g s ) throws Exception { Median mainclass = new Median ( ) ; S t r i n g output ; i f ( a r g s. l e n g t h > 0) { mainclass. s c a n n e r = new j a v a. u t i l. Scanner ( a r g s [ 0 ] ) ; e l s e { // standard i n put mainclass. s c a n n e r = new j a v a. u t i l. Scanner ( System. i n ) ; mainclass. exec ( ) ; System. out. p r i n t l n ( mainclass. output ) ; void exec ( ) throws Exception { // t h e program code 2.3 Testability In IntroClass, the program are tested using standard command line arguments. For instance, to test a program computing the median of three numbers, one calls $ median 4 0 17. However, we aim at achieving standard Java testability using JUnit test cases. Consequently, we transforms the programs and test cases as follows. 2.3.1 Backward-compatible Main Method We aim at supporting both command line arguments and JUnit invocations. This is achieved as follows. The main method of the Java programs takes either one argument or no argument. In the former case, it is meant to be used in a test case, in the latter case, it reads the arguments from the standard input as a Java program. We introduce an exec method for each program that is responsible of doing the actual computation. The exec method is used in test cases directly and in the main method when invoked in command line. Listing 2 shows an example main method. 2.3.2 Test Cases In IntroClass, the specification of the expected behavior is based on input-output pairs, where each input and each output is in a separate file. For instance, file 1.in contains the input of the first input/output pair and fail 1.out the expected output. We write a test case generator that reads those files and create Junit test classes accordingly. Consequently, there is one Junit test method per inputoutput pair. Each test case ends with the assertion corresponding to checking the actual output against the expected one. An example test case is shown in 3

Listing 2: Example of generated test case c l a s s WhiteBoxTest { @Test ( timeout = 1000) // corresponds to f i l e whitebox /1. in of IntroClass public void t e s t 1 ( ) throws Exception { Smallest mainclass = new Smallest ( ) ; S t r i n g expected = P l e a s e e n t e r 4 numbers s e p a r a t e d by s p a c e s > 0 i s the s m a l l e s t ; mainclass. s c a n n e r = new j a v a. u t i l. Scanner ( 0 0 0 0 ) ; mainclass. exec ( ) ; S t r i n g out = mainclass. output. r e p l a c e ( \n, ). trim ( ) ; a s s e r t E q u a l s ( expected. r e p l a c e (, ), out. r e p l a c e (, ) ) ; //.. o t h e r t e s t Listing 2. Note that the original output specification of IntroClass contains the command line invitations, they are kept as is in the generated test cases. IntroClass contains two kinds of tests: blackbox and whitebox. Blackbox are manually written and whitebox ones are generated ones with symbolic execution with the golden implementation as oracle. Consequently, IntroClassJava contains two test classes: WhiteBoxTest and BlackBoxTest. 2.4 Unique Naming In order to be able to analyze and repair all programs of the benchmark in the same virtual machine, we give each program and each test class an unique name based on the original identifier of IntroClass. It follows the convention subject, user, revision. For instance, class smallest 5a568359 004 is a program for problem smallest, written by user 5a568359, for which it was the fourth revision. All programs are in package introclassjava. 2.5 Implementation The transformations are implemented in Python. To analyze C code, we transform it into XML suing srcml [4]. srcml encodes the abstract syntax trees of programs as XML. We then load the resulting XML using a DOM library and generate the Java file using standard string manipulation. The printf formatting instructions are transformed to their Java counterpart which are mostly compatible (e.g. String.format ( %d is the smallest, a.value) ). All programs of IntroClassJava are regular Maven projects. 3 Results Final benchmark After applying the transformation described in Section 2, we obtain a dataset of 297 Java programs, all specified with JUnit test cases, with at least one failing test cases. Table 1 gives the main statistics of this dataset. 4

Group checksum digits grade median smallest syllables Total only black box failing tests only white box failing tests both white box and black box Total # of buggy programs 11 programs 75 programs 89 programs 57 programs 52 programs 13 programs 297 programs 33 programs 39 programs 225 programs 297 programs Table 1: Main descriptive statistics of the IntroClassJava benchmark Project # wb ok # wb ko # bb ok # bb ko # both ko checksum 0 11 4 7 7 digits 15 60 24 51 36 grade 1 88 0 89 88 median 9 48 6 51 42 smallest 7 45 5 47 40 syllables 1 12 0 13 12 Total 33 264 39 258 225 Table 2: Full breakdown by project It first indicates the breakdown according to the subject program. For instance, in checksum, there are 11 programs that are failing. It then indicates the breakdown according to the failure type. For instance, there are 33 programs that which only white tests are failing. Table 2 gives the full breakdown by project where wb means whitebox, bb means blackbox, ko means failing and ok means passing. Comparison with IntroClass In IntroClass, there are 450 failing programs, it means that the automatic transformation has failed for 450 297 = 153 programs. Either the resulting program was not syntactically correct Java and consequently does not compile, or the transformation has changed the execution behavior. Interestingly, for some programs, the transformation itself repaired the bug (the C program fails on some inputs, but the Java version is correct). This is mostly due to the different behavior of the non-initialized variable in C versus Java. In C, the non-initialized local variables have the value of the allocated memory space and in Java, all primitive types have a default value (e.g. 0 for numbers). 5

4 Conclusion We have presented how we have set up IntroClassJava, a benchmark of 297 buggy Java programs. It results from the automated transformation from C to Java of the IntroClass benchmark. This dataset can be used for future research on fault localization and automatic repair on Java programs. The benchmark is publicly available on Github [1]. References [1] The github repository of introclassjava. https://github.com/ Spirals-Team/IntroClassJava, 2015. [2] F. DeMarco, J. Xuan, D. L. Berre, and M. Monperrus. Automatic repair of buggy if conditions and missing preconditions with smt. In Proceedings of the 6th International Workshop on Constraints in Software Testing, Verification, and Analysis (CSTVA 2014), 2014. [3] C. Le Goues, N. Holtschulte, E. K. Smith, Y. Brun, P. Devanbu, S. Forrest, and W. Weimer. The manybugs and introclass benchmarks for automated repair of c programs. IEEE Transactions on Software Engineering (TSE), in press, 2015. [4] J. I. Maletic, M. L. Collard, and A. Marcus. Source code files as structured documents. In Proceedings of the 10th International Workshop on Program Comprehension (IWPC), 2002. [5] M. Monperrus. A critical review of automatic patch generation learned from human-written patches : Essay on the problem statement and the evaluation of automatic software repair. In Proceedings of the International Conference on Software Engineering, pages 234 242, 2014. 6