FIELD FAILURE REPRODUCTION USING SYMBOLIC EXECUTION AND GENETIC PROGRAMMING

Similar documents
Coverage Criteria for Search Based Automatic Unit Testing of Java Programs

SMock A Test Platform for the Evaluation of Monitoring Tools

Outline. 1 Denitions. 2 Principles. 4 Implementation and Evaluation. 5 Debugging. 6 References

A Cross-browser Web Application Testing Tool 1

CHAPTER 5 INTELLIGENT TECHNIQUES TO PREVENT SQL INJECTION ATTACKS

Introduction to Java and Eclipse

Evolutionary Testing of PHP Web Applications with WETT

Automatic Patch Generation Learned from Human-Written Patches

CS 2302 Data Structures Spring 2015

1. What are Data Structures? Introduction to Data Structures. 2. What will we Study? CITS2200 Data Structures and Algorithms

Do Code Clones Matter?

Towards a Framework for Generating Tests to Satisfy Complex Code Coverage in Java Pathfinder

Applications of Machine Learning in Software Testing. Lionel C. Briand Simula Research Laboratory and University of Oslo

Search-based Data-flow Test Generation

A Platform Independent Testing Tool for Automated Testing of Web Applications

A GUI Crawling-based technique for Android Mobile Application Testing

Anomaly-Based Bug Prediction, Isolation, and Validation: An Automated Approach for Software Debugging

AUTOMATED TEST CASE GENERATION TO VALIDATE NON-FUNCTIONAL SOFTWARE REQUIREMENTS. Pingyu Zhang A DISSERTATION. Presented to the Faculty of

Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

Module 10. Coding and Testing. Version 2 CSE IIT, Kharagpur

Integrating Artificial Intelligence. Software Testing

Module 10. Coding and Testing. Version 2 CSE IIT, Kharagpur

Enterprise Service Bus

Obfuscation: know your enemy

Docovery: Toward Generic Automatic Document Recovery

Open Source Fault Location in the Real World

Peach Fuzzer Platform

Formal Modeling and Reasoning about the Android Security Framework

Documentum Developer Program

Data Networks Project 2: Design Document for Intra-Domain Routing Protocols

ASSURE: Automatic Software Self- Healing Using REscue points. Automatically Patching Errors in Deployed Software (ClearView)

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

WebMate: Generating Test Cases for Web 2.0

PLG: a Framework for the Generation of Business Process Models and their Execution Logs

Techniques and Tools for Rich Internet Applications Testing

TECHNOLOGY Computer Programming II Grade: 9-12 Standard 2: Technology and Society Interaction

OFBiz Addons goals, howto use, howto manage. Nicolas Malin, Nov. 2012

Code Change Impact Analysis for Testing Configurable Software Systems. Mithun Acharya ABB Corporate Research Raleigh NC USA

Computing Concepts with Java Essentials

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

How To Test A Web Based Application Automatically

Debugging. Common Semantic Errors ESE112. Java Library. It is highly unlikely that you will write code that will work on the first go

Performance Monitoring API for Java Enterprise Applications

A Practical Method to Diagnose Memory Leaks in Java Application Alan Yu

Zend Server 4.0 Beta 2 Release Announcement What s new in Zend Server 4.0 Beta 2 Updates and Improvements Resolved Issues Installation Issues

Evotec: Evolving the Best Testing Strategy for Contract-Equipped Programs

Towards a Big Data Curated Benchmark of Inter-Project Code Clones

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina

find model parameters, to validate models, and to develop inputs for models. c 1994 Raj Jain 7.1

<Insert Picture Here> Java Application Diagnostic Expert

GUI Test Automation How-To Tips

Software testing. Objectives


CHAPTER 1 INTRODUCTION

CSCI E 98: Managed Environments for the Execution of Programs

GDB Tutorial. A Walkthrough with Examples. CMSC Spring Last modified March 22, GDB Tutorial

Empirically Identifying the Best Genetic Algorithm for Covering Array Generation

Effective Bug Tracking Systems: Theories and Implementation

CHAPTER 5 STAFFING LEVEL AND COST ANALYSES FOR SOFTWARE DEBUGGING ACTIVITIES THROUGH RATE- BASED SIMULATION APPROACHES

Chapter 5. Recursion. Data Structures and Algorithms in Java

BY STEVE BROWN, CADENCE DESIGN SYSTEMS AND MICHEL GENARD, VIRTUTECH

Scalable Dynamic Information Flow Tracking and its Applications

Cloud Computing. Up until now

Supply Chain Management Use Case Model

Dsc+Mock: A Test Case + Mock Class Generator in Support of Coding Against Interfaces

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

Basic Programming and PC Skills: Basic Programming and PC Skills:

How To Write A Test Generation Program

Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic

HeapStats: Your Dependable Helper for Java Applications, from Development to Operation

Oracle Insurance Policy Administration. Version

Regression Verification: Status Report

Survey Reproduction of Defect Reporting in Industrial Software Development

Unit title: Computer Games: Programming (SCQF level 6)

Simplifying Failure-Inducing Input

QUIZ-II QUIZ-II. Chapter 5: Control Structures II (Repetition) Objectives. Objectives (cont d.) 20/11/2015. EEE 117 Computer Programming Fall

where HCI and software engineering meet Andy Ko Information School * why you should work with Andy

A similarity-based approach for test case prioritization using historical failure data

Alpha Cut based Novel Selection for Genetic Algorithm

Predicting Software Complexity by Means of Evolutionary Testing

EventBreak: Analyzing the Responsiveness of User Interfaces through Performance-Guided Test Generation

Global Constraints in Software Testing Applications

Recent Issues in Software Testing: Part B

Talend Open Studio for ESB. Release Notes 5.2.1

Frysk The Systems Monitoring and Debugging Tool. Andrew Cagney

Comparing the effectiveness of automated test generation tools EVOSUITE and Tpalus

DYNAMIC test generation tools, such as DART [17], Cute

Load Balancing in Distributed System. Prof. Ananthanarayana V.S. Dept. Of Information Technology N.I.T.K., Surathkal

Measuring the Effect of Code Complexity on Static Analysis Results

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

CS 241 Data Organization Coding Standards

Evolutionary Approach to Coverage Testing of IEC Function Block Applications

Optimizing Generation of Object Graphs in Java PathFinder

Transparent Monitoring of a Process Self in a Virtual Environment

Oracle Solaris Studio Code Analyzer

Guide to Performance and Tuning: Query Performance and Sampled Selectivity

Gadget: A Tool for Extracting the Dynamic Structure of Java Programs

Introduction to Programming System Design. CSCI 455x (4 Units)

Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Improving Knowledge-Based System Performance by Reordering Rule Sequences

Transcription:

FIELD FAILURE REPRODUCTION USING SYMBOLIC EXECUTION AND GENETIC PROGRAMMING Alessandro (Alex) Orso School of Computer Science College of Computing Georgia Institute of Technology Partially supported by: NSF, IBM, and MSR

DSE SBST FIELD FAILURE REPRODUCTION USING SYMBOLIC EXECUTION AND GENETIC PROGRAMMING Alessandro (Alex) Orso School of Computer Science College of Computing Georgia Institute of Technology Partially supported by: NSF, IBM, and MSR

DSE SBST FIELD FAILURE REPRODUCTION USING SYMBOLIC EXECUTION AND GENETIC PROGRAMMING Alessandro (Alex) Orso School of Computer Science College of Computing Georgia Institute of Technology Partially supported by: NSF, IBM, and MSR

DSE SBST FIELD FAILURE REPRODUCTION USING SYMBOLIC EXECUTION AND GENETIC PROGRAMMING Field failures are Alessandro (Alex) Orso unavoidable! School of Computer Science College of Computing Georgia Institute of Technology Partially supported by: NSF, IBM, and MSR

DSE SBST FIELD FAILURE REPRODUCTION USING SYMBOLIC EXECUTION AND GENETIC PROGRAMMING Field failures are Alessandro (Alex) Orso unavoidable! School of Computer Science College of Computing Georgia Institute of Technology Partially supported by: NSF, IBM, and MSR

TYPICAL DEBUGGING PROCESS Bug Repository Very hard to (1) reproduce (2) debug

TYPICAL DEBUGGING PROCESS Recent survey of Apache, Eclipse, and Mozilla developers: Information on how to reproduce field failures is the most valuable, and difficult to obtain, piece of information for investigating such failures. [Zimmermann10] Bug Repository Very hard to (1) reproduce (2) debug

TYPICAL DEBUGGING PROCESS Recent survey of Apache, Eclipse, and Mozilla developers: Information on how to reproduce field failures is the most valuable, and difficult to obtain, piece of information for investigating such failures. [Zimmermann10] Bug Repository OVERARCHING GOAL: help developers (1) investigate field failures, (2) understand their causes, and (3) eliminate such causes. Very hard to (1) reproduce (2) debug

OUR WORK SO FAR Recording and replaying executions [icsm 2007, icse 2007] Input minimization [woda 2006, icse 2007] Input anonymization [icse 2011] Mimicking field failures [icse 2012, icst 2014] Explaining field failures [issta 2013, TR]

MIMICKING FIELD FAILURES User run (R) Mimicked run (R ) F is analogous to F R is an actual execution in the field F F in house

MIMICKING FIELD FAILURES User run (R) Relevant events (breadcrumbs) Mimicked run (R )

In house OVERALL VISION In the field Software developer Application Instrumentation sed.c:8958 -> sed.c: 8958 sed.c:8993 -> sed.c: 9011 sed.c:8785 -> sed.c: 8786 sed.c:8786 -> sed.c: 8786 sed.c:990 -> sed.c: 990 Likely faults Field Failure Debugging Synthesized Executions Field Failure Reproduction Crash report (execution data)

DSE BUGREDUX/SBFR SBST Synthesized Executions Field Failure Reproduction Crash report (execution data)

Joint work with Wei Jin BUGREDUX Synthesized Executions Crash report (execution data)

Joint work with Wei Jin BUGREDUX Test Input Crash report (execution data)

Joint work with Wei Jin BUGREDUX Candidate input Test Input Oracle Input generator Crash report (execution data) Execution data Point of failure (POF) Failure call stack Call sequence Complete trace Input generation technique Guided symbolic execution

ALGORITHM (SIMPLIFIED) Input icfg for P goals (list of code locations) Output If (candidate input) Main algorithm init; currgoal = first(goals) repeat currstate = SelNextState() if (!currstate) backtrack or fail if (currstate.cl == currgoal) if (currgoal == last(goals)) return solve(currstate.pc) else currgoal = next(goals) currstate.goal = currgoal symbolicallyexec(currstate) statesset= {<cl, pc, ss, goal>} SelNextState mindis = retstate = null foreach state in statesset if (state.goal = currgoal) if (state.cl can reach currgoal) d = shortest path state.cl, currgoal if d < mindis mindis = d retstate = state return retstate

ALGORITHM (SIMPLIFIED) Input icfg for P goals (list of code locations) Output If (candidate input) statesset= {<cl, pc, ss, goal>} Main algorithm init; currgoal = first(goals) repeat currstate = SelNextState() if (!currstate) backtrack or fail if (currstate.cl == currgoal) if (currgoal == last(goals)) return solve(currstate.pc) else currgoal = next(goals) currstate.goal = currgoal symbolicallyexec(currstate) SelNextState mindis = retstate = null Optimizations/Heuristics Dynamic tainting to reduce the symbolic input space Program analysis information to prune the search space Some randomness in the shortest path computation foreach state in statesset if (state.goal = currgoal) if (state.cl can reach currgoal) d = shortest path state.cl, currgoal if d < mindis mindis = d retstate = state return retstate

BUGREDUX EVALUATION FAILURES CONSIDERED Name Repository Size(KLOC) # Faults sed SIR 14 2 grep SIR 10 1 gzip SIR 5 2 ncompress BugBench 2 1 polymorph BugBench 1 1 aeon exploit-db 3 1 glftpd exploit-db 6 1 htget exploit-db 3 1 socat exploit-db 35 1 tipxd exploit-db 7 1 aspell exploit-db 0.5 1 exim exploit-db 241 1 rsync exploit-db 67 1 xmail exploit-db 1 1

BUGREDUX EVALUATION FAILURES CONSIDERED Name Repository Size(KLOC) # Faults sed SIR 14 2 grep SIR 10 1 gzip SIR 5 2 ncompress BugBench 2 1 polymorph BugBench 1 1 aeon exploit-db 3 1 glftpd exploit-db 6 1 htget exploit-db 3 1 socat exploit-db 35 1 tipxd exploit-db 7 1 aspell exploit-db 0.5 1 exim exploit-db 241 1 rsync exploit-db 67 1 xmail exploit-db 1 1 None of these faults can be discovered by a vanilla KLEE with a timeout of 72 hours

BUGREDUX EVALUATION RESULTS Name POF Call Stack Call Seq. Compl. Trace sed #1 sed #2 grep gzip #1 gzip #2 ncompress polymorph aeon rsync glftpd htget socat tipxd aspell xmail exim One of three outcomes: : fail : synthesize : (synthesize and) mimic

BUGREDUX Synth.: 9/16 EVALUATION Synth.: 10/16 Synth.: 16/16 RESULTS Synth.: 2/16 Mimic: 6/16 Mimic: 6/16 Mimic: 16/16 Mimic: 2/16 Name POF Call Stack Call Seq. Compl. Trace sed #1 sed #2 grep gzip #1 gzip #2 ncompress polymorph aeon rsync glftpd htget socat tipxd aspell xmail exim

BUGREDUX Synth.: 9/16 EVALUATION Synth.: 10/16 Synth.: 16/16 RESULTS Synth.: 2/16 Mimic: 6/16 Name POF Call Stack Call Seq. Compl. Trace sed #1 sed #2 grep gzip #1 gzip #2 ncompress polymorph aeon rsync glftpd htget socat tipxd aspell xmail exim Observations: Faults can be distant from the failure points: => POFs and call stacks unlikely to help More information is not always better Symbolic execution can be a limiting factor Mimic: 6/16 Mimic: 16/16 Mimic: 2/16

BUGREDUX Synth.: 9/16 EVALUATION Synth.: 10/16 Synth.: 16/16 RESULTS Synth.: 2/16 Mimic: 6/16 Name POF Call Stack Call Seq. Compl. Trace sed #1 sed #2 grep gzip #1 gzip #2 ncompress polymorph aeon rsync glftpd htget socat tipxd aspell xmail exim Observations: Symbolic execution can be ineffective for Faults can be distant from the failure points: => POFs and call stacks unlikely to help programs with highly structured inputs programs that interact with external libraries large complex programs in general More information is not always better Symbolic execution can be a limiting factor Mimic: 6/16 Mimic: 16/16 Mimic: 2/16

Joint work with Kifetew, Jin, Tiella, Tonella SBFR Crash report (execution data) Execution data Call sequence Test Input Input generation technique Genetic Programming

Joint work with Kifetew, Jin, Tiella, Tonella SBFR <a> ::= <b> λ Grammar Crash report (execution data) Test Input

Joint work with Kifetew, Jin, Tiella, Tonella SBFR <a> ::= <b> λ Grammar Derivation Tree Crash report (execution data) Genetic Programming Sentence derivation from the grammar: Random application of grammar rules Uniform 80/20 Stochastic (from a corpus) Test Input

Joint work with Kifetew, Jin, Tiella, Tonella SBFR <a> ::= <b> λ Grammar Derivation Tree Crash report (execution data) Genetic Programming Evolution: Fitness function: Sentence derivation from Distance the grammar: b/w execution traces Random application of grammar (candidate actual rules failure) Uniform 80/20 Stochastic (from a corpus) Test Input

Joint work with Kifetew, Jin, Tiella, Tonella SBFR <a> ::= <b> λ Grammar Derivation Tree Crash report (execution data) Genetic Programming Stopping criterion: Success Ic reaches the point of failure The program fails in the same way Search budget exhausted Test Input

SBFR EVALUATION FAILURES CONSIDERED Name Language Size(KLOC) # Productions # Faults calc Java 2 38 2 bc C 12 80 1 MSDL Java 13 140 5 PicoC C 11 194 1 Lua C 17 106 2

SBFR EVALUATION FAILURES CONSIDERED Name Language Size(KLOC) # Productions # Faults calc Java 2 38 2 bc C 12 80 1 BugRedux was unable to reproduce any of these failures with a timeout of 72 hours MSDL Java 13 140 5 PicoC C 11 194 1 Lua C 17 106 2

SBFR EVALUATION RESULTS Name FRP (SBFR) FRP (Random) calc bug 1 0.0 calc bug 2 0.0 Parameters: bc Population: 0.0 500 MSDL bug 1 Budget: 0.010,000 unique MSDL bug 2 fitness 0.0 evaluations MSDL bug 3 Performed 10 runs 1.0 Measured failure MSDL bug 4 reproduction 0.0 probability MSDL bug 5 Used both 0.0 80/20 and PicoC stochastic 0.1derivations Lua bug 1 0.0 Lua bug 2 0.0

SBFR EVALUATION RESULTS Name FRP (SBFR) FRP (Random) calc bug 1 0.6 0.0 calc bug 2 0.8 0.0 bc 1.0 0.0 MSDL bug 1 1.0 0.0 MSDL bug 2 1.0 0.0 MSDL bug 3 1.0 1.0 MSDL bug 4 1.0 0.0 MSDL bug 5 1.0 0.0 PicoC 0.8 0.1 Lua bug 1 0.0 0.0 Lua bug 2 0.5 0.0

SBFR EVALUATION RESULTS Name FRP (SBFR) FRP (Random) calc bug 1 0.6 0.0 calc bug 2 0.8 0.0 bc 1.0 0.0 MSDL bug 1 1.0 0.0 MSDL bug 2 1.0 0.0 MSDL bug 3 1.0 1.0 MSDL bug 4 1.0 0.0 MSDL bug 5 1.0 0.0 PicoC 0.8 0.1 Lua bug 1 0.0 0.0 Lua bug 2 0.5 0.0

SBFR EVALUATION RESULTS Name FRP (SBFR) FRP (Random) calc bug 1 0.6 0.0 calc bug 2 0.8 0.0 bc 1.0 0.0 MSDL bug 1 1.0 0.0 Example: failure in bc MSDL bug 2 1.0 0.0 MSDL bug 3 1.0 1.0 segmentation fault triggered by an instruction sequence that allocates at least 32 arrays and declares a number of variables higher than the number of allocated arrays MSDL bug 4 1.0 0.0 MSDL bug 5 1.0 0.0 PicoC 0.8 0.1 Lua bug 1 0.0 0.0 Lua bug 2 0.5 0.0

SBFR EVALUATION RESULTS Name FRP (SBFR) FRP (Random) calc bug 1 0.6 0.0 calc bug 2 0.8 0.0 bc 1.0 0.0 Observations: MSDL bug 1 1.0 0.0 Example: failure in bc MSDL bug 2 1.0 0.0 Search-based approaches can be effective in cases that symbolic execution cannot handle segmentation fault triggered by an instruction sequence that allocates at least 32 arrays and declares => SBST a number and DSE of variables are complementary, higher than the rather number than of allocated alternative arrays techniques MSDL bug 3 1.0 1.0 Stochastic grammars are effective SBST more scalable, but less directed MSDL bug 4 1.0 0.0 MSDL bug 5 1.0 0.0 PicoC 0.8 0.1 Lua bug 1 0.0 0.0 Lua bug 2 0.5 0.0

FUTURE WORK / FOOD FOR THOUGHTS Relevant execution data identification Which types? Which specific ones? Failure explanation Reproduction is not enough Can DSE and SBST help? Use of different input generation techniques Grammar-based symbolic execution Backward symbolic analysis? Other SBST approaches? SBST targeted at different kinds of programs? Combination of techniques