Challenges of the Dynamic Detection of Functionally Similar Code Fragments



Similar documents
Do Code Clones Matter?

Why and How to Control Cloning in Software Artifacts. Elmar Juergens

Industrial Application of Clone Change Management System


Cross Platform Mobile. -Vinod Doshi

Incremental Origin Analysis of Source Code Files

Model Clone Detection in Practice

Fast Analytics on Big Data with H20

Optimizing the User Experience of a Social Content Management Software for Casual Users

Host-based Intrusion Prevention on Windows and UNIX. Dr. Rich Murphey White Oak Labs

SourceMeter SonarQube plug-in

Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic

General Introduction

MAQAO Performance Analysis and Optimization Tool

Pattern Insight Clone Detection

Software Testing. Definition: Testing is a process of executing a program with data, with the sole intention of finding errors in the program.

Introduction to Android Development. Jeff Avery CS349, Mar 2013

Lecture 1 Introduction to Android

The Agile Movement An introduction to agile software development


Frysk The Systems Monitoring and Debugging Tool. Andrew Cagney

Masters programmes in Computer Science and Information Systems. Object-Oriented Design and Programming. Sample module entry test xxth December 2013

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Web Development in Java

Open Source SCADA. A Framework for the Connected World. Remote Monitoring and Control 2014 SCADA Technology Summit. Presented by:

How To Test A Web Based Application Automatically

Developing multi-platform mobile applications: doing it right. Mihail Ivanchev

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

neonex.in Head Office: Mastiska Technology Private Limited, E407, Ivory Towers, Sector 70, SAS Nagar, Mohali

Programming and Software Development CTAG Alignments

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. project.org. University of California, Berkeley UC BERKELEY

Software Documentation Guidelines

Random vs. Structure-Based Testing of Answer-Set Programs: An Experimental Comparison

Unified Big Data Processing with Apache Spark. Matei

Application Portfolio Management

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

Android Application Development

Intro to Docker and Containers

CS314: Course Summary

GUI Test Automation How-To Tips

Online Construction of Dynamic Object Process Graphs

RESCO MOBILE CRM QUICK GUIDE. for MS Dynamics CRM. ios (ipad & iphone) Android phones & tablets

Advanced Software Testing

Web Application Development for the SOA Age Thinking in XML

Mobile Application Security

An ACO/VNS Hybrid Approach for a Large-Scale Energy Management Problem

Chapter 6 Experiment Process

Graduate presentation for CSCI By Janakiram Vantipalli ( Janakiram.vantipalli@colorado.edu )

Reconsidering Generic Composition

Getting started with Android and App Engine

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

Static vs. Dynamic. Lecture 10: Static Semantics Overview 1. Typical Semantic Errors: Java, C++ Typical Tasks of the Semantic Analyzer

The Rules 1. One level of indentation per method 2. Don t use the ELSE keyword 3. Wrap all primitives and Strings

» A Hardware & Software Overview. Eli M. Dow <emdow@us.ibm.com:>

Introduction to Programming System Design. CSCI 455x (4 Units)

The Visualization Pipeline

Chronon: A modern alternative to Log Files

CSE 403. Performance Profiling Marty Stepp

INTRODUCTION TO ANDROID CSCI 4448/5448: OBJECT-ORIENTED ANALYSIS & DESIGN LECTURE 11 02/15/2011

Minimizing code defects to improve software quality and lower development costs.

Violin: A Framework for Extensible Block-level Storage

Getting Started with Apple Pay on the Authorize.Net Platform

June TerraSAR-X-based Flood Mapping Service

Automatic vs. Manual Code Analysis

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Dealing with Device Fragmentation in Mobile Games Testing. Ru Cindrea - Altom Consulting

Calling R Externally : from command line, Python, Java, and C++

Testing. Chapter. A Fresh Graduate s Guide to Software Development Tools and Technologies. CHAPTER AUTHORS Michael Atmadja Zhang Shuai Richard

Runtime Verification - Monitor-oriented Programming - Monitor-based Runtime Reflection

Software Testing Capabilities in BMC BSM Copyright 2011 Vyom Labs Pvt. Ltd.

Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment

Lecture 15 - Digital Signatures

Towards Elastic Application Model for Augmenting Computing Capabilities of Mobile Platforms. Mobilware 2010

Completing the Big Data Ecosystem:

Specialized Android APP Development Program with Java (SAADPJ) Duration 2 months

Big Data & Scripting Part II Streaming Algorithms

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

An Automated Model Based Approach to Test Web Application Using Ontology

Using Library Dependencies for Clustering

Coverage Criteria for Search Based Automatic Unit Testing of Java Programs

Towards Software Configuration Management for Test-Driven Development

Transcription:

Challenges of the Dynamic Detection of Functionally Similar Code Fragments Florian Deissenboeck, Lars Heinemann, Benjamin Hummel, Stefan Wagner Technische Universität München CSMR 2012, Szeged, Hungary

Similarities in Software Negative effects on maintenance unnecessary code size increase Similarities due to copy&paste (code clones) well-researched But what about similarities of independent origin? f(x) f(x) 2

Functional Similarities Not necessarily syntactically similar simion: functionally similar code fragment regarding I/O behavior Cannot be found with clone detection [1] [1] Code Similarities Beyond Copy & Paste, E. Juergens, F. Deissenboeck, B. Hummel, CSMR 2010 3

Motivation for this Research Own experience: functionality is reimplemented in large code bases Automated detection could be as beneficial as clone detection for QA Encouraged by results of existing approach for C systems (Jiang&Su, ISSTA 2011) Goal: Transfer to object-oriented Java systems 4

Disclaimer DON T TRY THIS AT HOME! Detection results of approach unsatisfactory (few simions detected & many false-positives) But: Valuable insights regarding challenges 5

General Idea Execute candidate code fragments on random input and compare output Source Code Detection Pipeline Simions 6

Pipeline Source Code Chunking i1,i2,.. i1,i2,.. i1,i2,.. o1,o2,.. o1,o2,.. o1,o2,.. 7

Pipeline Source Code Chunking Input Generation Deterministic random values for primitive types Recursive algorithm for object types Parameter permutation 100 input vectors for each chunk 8

Pipeline Source Code Chunking Input Generation Execute each chunk with all input vectors Security Manager to prevent harmful operations Execution 9

Pipeline Source Code Chunking Input Generation Execution Compare outputs for each output variable Simion if inputs and outputs for all executions are equal Filter clones and trivial I/O behavior Comparison 10

Case Study: Study Objects System SLOC Commons Lang 17,504 Freemind 51,762 Jabref 74,586 Jetty 29,800 JHotDraw 78,902 Info1 : Collection of 109 student implementations of an e-mail address validator (each 8..55 statements) 11

Number of Chunks Each chunk has to pass through pipeline Direct impact on processing time Experiments with 3 chunking strategies void foo () { void foo () { void foo () { } } } method sliding window intent 12

Number of Chunks (2) System Method Sl. Wind. Intent Commons Lang 1,538 7,940 1,843 Freemind 1,984 80,816 20,632 Jabref 2,085 133,556 21,388 Jetty 1,457 22,006 7,713 JHotDraw 2,813 211,283 16,221 13

Technical Challenges Parameter types are interfaces, abstract classes or untyped collections (37%-94% of chunks) Call to UI, networking or I/O API functions (5%-60% of chunks) 14

Effectiveness of Approach Full detection pipeline How many chunks pass each pipeline step? How many simions are detected? 15

Effectiveness (Student programs) Pipeline Step #Chunks Chunk Extraction 240 Input Generation 134 Execution 133 Comparison 105 = #simions method-based chunking 16

Effectiveness (Jabref) Pipeline Step #Chunks Chunk Extraction 2,085 Input Generation 1,542 Execution 621 Comparison 55 = #simions method-based chunking 17

Manual Inspection of Simions [not in paper] System Detected Confirmed Commons Lang 54 33 (61%) Freemind 11 7 (64%) Jabref 55 9 (16%) Jetty 15 0 (0%) JHotDraw 18 2 (11%) Overall 153 51 (33%) method-based chunking 18

Detected Simions 19

Detected Simions (2) 20

Differences to Jiang&Su Input generation complicated for Java (e.g. interfaces) Jiang&Su replace function calls with random values We filter trivial simions, e.g. identity function (occurs often) Linux kernel code vs. Java GUI apps 21

Discussion Low detection results: no simions or flaws in detection? What is the recall? (infeasible to determine manually) Main limitation: random testing approach no input or generated input does achieve sufficient code coverage can potentially addressed with white-box methods Notion of I/O similarity may not be suitable e.g different data types or signatures 22

Thank you. Questions?