Do Code Clones Matter?

Similar documents
Industrial Application of Clone Change Management System

Challenges of the Dynamic Detection of Functionally Similar Code Fragments

Automatic vs. Manual Code Analysis

Does the Act of Refactoring Really Make Code Simpler? A Preliminary Study

Incremental Origin Analysis of Source Code Files

Finding Code Clones for Refactoring with Clone Metrics : A Case Study of Open Source Software

Why and How to Control Cloning in Software Artifacts. Elmar Juergens

Pattern Insight Clone Detection

Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic

Model Clone Detection in Practice

Index-Based Code Clone Detection: Incremental, Distributed, Scalable

Software Engineering. Introduction. Software Costs. Software is Expensive [Boehm] ... Columbus set sail for India. He ended up in the Bahamas...

Sector vs. Hadoop. A Brief Comparison Between the Two Systems

White Paper. Software Development Best Practices: Enterprise Code Portal

Apache Tomcat. Load-balancing and Clustering. Mark Thomas, 20 November Pivotal Software, Inc. All rights reserved.

Towards a Big Data Curated Benchmark of Inter-Project Code Clones

Obtaining Enterprise Cybersituational

Module 1. Introduction to Software Engineering. Version 2 CSE IIT, Kharagpur

Toward Improving Graftability on Automated Program Repair

Pipeline Orchestration for Test Automation using Extended Buildbot Architecture

Continuous Monitoring: Match Your Business Needs with the Right Technique

Testing Metrics. Introduction

IEEE P1633 Software Reliability Best Practices. Nematollah Bidokhti Working Group Member

How to Grow and Transform your Security Program into the Cloud

HP DevOps by Design. Your Readiness for Continuous Innovation Rony Van Hove/ April 2 nd, HP Software: Apps meet Ops 2015

Monitoring Best Practices for

Monitoring Best Practices for COMMERCE

Modern practices TIE-21100/

Definitions. Software Metrics. Why Measure Software? Example Metrics. Software Engineering. Determine quality of the current product or process

Introduction to Software Engineering. 8. Software Quality

Manual. Ticket Center Manual. Ticket Center 2: May 17, AdNovum Informatik AG. Released. AdNovum Informatik AG. All rights reserved.

Data Analytics for a Secure Smart Grid

CHAPTER 5 INTELLIGENT TECHNIQUES TO PREVENT SQL INJECTION ATTACKS

ClonePacker: A Tool for Clone Set Visualization

Project Management. Lecture 3. Software Engineering CUGS. Spring 2012 (slides made by David Broman) Kristian Sandahl

ITM204 Post-Copy Automation for SAP NetWeaver Business Warehouse System Landscapes. October 2013

Application Performance Monitoring of a scalable Java web-application in a cloud infrastructure

Model-based Quality Assurance of Automotive Software

Measurement Information Model

False Positives & Managing G11n in Sync with Development

Lecture 9. Semantic Analysis Scoping and Symbol Table

Recovering Traceability Links between Requirements and Source Code using the Configuration Management Log *

HDFS Users Guide. Table of contents

Software Project Audit Process

Wireless Sensor Network Performance Monitoring

How To Improve Your Software

Intrusion Detection Systems

Are Comprehensive Quality Models Necessary for Evaluating Software Quality?

Software Verification and System Assurance

BUSINESS RULES MANAGEMENT AND BPM

MASCOT Search Results Interpretation

Global Delivery Excellence Best Practices for Improving Software Process and Tools Adoption. Sunil Shah Technical Lead IBM Rational

Root Cause Analysis for Customer Reported Problems. Topics

Continuous Integration on System z

Chapter 9 Software Evolution

Redefining Oracle Database Management

Auditing a Web Application. Brad Ruppert. SANS Technology Institute GWAS Presentation 1

Client Checklist for Migrations

Technical Report. Implementation and Performance Testing of Business Rules Evaluation Systems in a Computing Grid. Brian Fletcher x

Early Software Product Improvement with Sequential Inspection Sessions: An empirical Investigation of Inspector Capability and Learning Effects

The «SQALE» Analysis Model An analysis model compliant with the representation condition for assessing the Quality of Software Source Code

Lund, November 16, Tihana Galinac Grbac University of Rijeka

! Resident of Kauai, Hawaii

Software Engineering. So(ware Evolu1on

Configuration & Build Management

Application of Next Generation Telecom Network Management Architecture to Satellite Ground Systems

Supplement to Gaming Machine Technical Standards Consultation

A Study on Software Metrics and Phase based Defect Removal Pattern Technique for Project Management

Chapter 6: Episode discovery process

A Practical Approach to Software Continuous Delivery Focused on Application Lifecycle Management

Appendix O Project Performance Management Plan Template

Software Analysis Visualization

Using Data Mining and Machine Learning in Retail

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

1. Introduction. Annex 7 Software Project Audit Process

Mining a Change-Based Software Repository

Design by Contract beyond class modelling

Supporting Document Mandatory Technical Document. Evaluation Activities for Stateful Traffic Filter Firewalls cpp. February Version 1.

IA Metrics Why And How To Measure Goodness Of Information Assurance

ReLink: Recovering Links between Bugs and Changes

Transcription:

Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner Do Code Clones Matter? May 22 nd, 2009 31st ICSE, Vancouver 1

Code Clone 2

Agenda Related Work Empirical Study Detection of inconsistent clones Conclusion 3

Related Work Indicating harmfulness [Lague97]: inconsistent evolution of clones in industrial telecom. SW. [Monden02]: higher revision number for files with clones in legacy SW. [Kim05]: substantial amount of coupled changes to code clones. [Li06], [SuChiu07] and [Aversano07], [Bakota07]: discovery of bugs through search for inconsistent clones or clone evolution analysis. Doubting harmfulness [Krinke07]: inconsistent clones hardly ever become consistent later. [Geiger06]: Failure to statistically verify impact of clones on change couplings [Lozano08]: Failure to statistically verify impact of clones on changeability. a Indication for increased maintenance effort or faults a Does not confirm increased maintenance effort or faults 4

Related Work (2) Limitations of previous studies Indirect measures (e.g. stability of cloned vs. non-cloned code) used to determine effect of cloning are inaccurate Analyzed systems are too small or omit industrial software This Work Manual inspection of inconsistent clones by system developers a No indirect measures of consequences of cloning Both industrial and open source software analyzed Quantitative data 5

Terminology Clone Sequence of normalized statements At least one other occurrence in the code Exact clone Edit distance between clones = 0 Inconsistent clone Edit distance between clones > 0 & below given threshold (Inconsistent) Clone Group Set of clones at different positions (with at least 1 inconsistent clone) Semantic relationship between clones 6

Research Questions RQ1: Are clones changed inconsistently? IC / C RQ2: Are inconsistent clones created unintentionally? UIC / IC RQ3: Can inconsistent clones be indicators for faults in real systems? Clone Groups C (exact and incons.) Inconsistent clone groups IC Unintentionally Inconsistent Clone Groups UIC Faulty clone Groups F F / IC, F / UIC 7

Study Design Clone group candidate detection Novel algorithm Tailored to target program False positive removal Manual inspection of all inconsistent and ¼ exact CCs Performed by researchers Assessment of inconsistencies All inconsistent clone groups inspected Performed by developers CC C, IC UIC, F Tool detected clone group candidates CC Clone groups C (exact and incons.) Inconsistent clone groups IC Unintentionally inconsistent clone groups UIC Faulty clone groups F 8

Detection of Inconsistent Clones Approach Token based for easy adaptation to new (incl. legacy) languages Suffix tree of normalized statements Novel edit-distance based suffix tree traversal algorithm Scalability: 500 kloc: 3m, 5.6 MLOC: 3h Implementation Detection steps implemented as pipeline Configurable for project-specific tailoring Implemented as part of ConQAT clone detection infrastructure 9

Study Objects System Organization Language Age (years) Size (kloc) A Munich Re C# 6 317 B Munich Re C# 4 454 C Munich Re C# 2 495 D LV 1871 Cobol 17 197 Sysiphus TUM Java 8 281 Munich Re International reinsurance company, 37.000 employees LV 1871 Munich-based life-insurance company, 400 employees Sysiphus: Open source collaboration environment for distributed SW development. Developed at TUM. 10

Results (1) Project A B C D Sysip. Sum CC Clone groups C Inconsistent CGs IC 286 159 160 89 326 179 352 151 303 146 1427 724 C IC Unint. Incons. UIC Faulty CGs F 51 19 29 18 66 42 15 5 42 23 203 107 UIC F 11

Discovered Faults System-Crash or Data Loss 17 Exceptions Erroneous transaction handling Unexpected user-visible behavior 44 Wrong messages Inconsistent behavior in similar dialogs/forms Unexpected non-user visible behavior 46 Resource management Exception handling / log messages 12

Results (2) RQ1: Are clones changed inconsistently? RQ2: Are inconsistent clones created unintentionally? RQ3: Can inconsistent clones be indicators for faults? Can unintentionally incons. clones be indicators? System A B C D Sysip. Mean RQ1 IC / C 56% 56% 55% 43% 48% 52% RQ2 UIC / IC 32% 33% 37% 10% 29% 28% RQ3 F / IC 12% 20% 23% 3% 16% 15% F / UIC 37% 62% 64% 33% 55% 50% 13

External Do Code Clones Matter? Internal Construct Threats to Validity Threat Mitigation Analysis of latest version instead of evolution. All inconsistencies of interest, independent of creation time. Developer review error Conservative strategy only makes positive answers harder Clone Detector Configuration Validated during pre-study System selection not random 5 different dev. organizations (impact on transferability) 3 different languages Technically different 14

Study Replication http://wwwbroy.in.tum.de/~ccsm/icse09 Version of ConQAT used for the study (includes both detection and inspection tools) Source code and all results for Sysiphus http://www.conqat.org Apache License ABAP, Ada, C#, C/C++, COBOL, Java, VB, PL/I IDE Integration, Visualizations, 15

Summary Clone detection Scalable algorithm for inconsistent clone detection. Open source implementation (ConQAT). Consequences of code cloning on program correctness Inconsistent clones constituted numerous faults in productive software. Every second unintentional inconsistency constitutes a fault. 16

Conclusion Code clones do matter. 17