Investigating Preprocessor-Based Bugs in C Program Families

Similar documents
Colligens: A Tool to Support the Development of Preprocessor-based Software Product Lines in C

Generating Unit Tests for Checking Refactoring Safety

The C Programming Language course syllabus associate level

MISRA-C:2012 Standards Model Summary for C / C++

How To Use The C Preprocessor

Chapter 13 - The Preprocessor

Pattern Insight Clone Detection

Fully Automated Static Analysis of Fedora Packages

1 Abstract Data Types Information Hiding

Using Eclipse CDT/PTP for Static Analysis

Analysis Programming

Building Embedded Systems

Discovering, Reporting, and Fixing Performance Bugs

The GenomeTools Developer s Guide

Security types to the rescue

Common Errors in C/C++ Code and Static Analysis

Static Code Analysis For Embedded Systems. Master of Science Thesis in Computer Science and Engineering MAGNUS ÅGREN

Logistics. Software Testing. Logistics. Logistics. Plan for this week. Before we begin. Project. Final exam. Questions?

Software Metrics in Static Program Analysis

8.5. <summary> Cppcheck addons Using Cppcheck addons Where to find some Cppcheck addons

Erlang Testing and Tools Survey

How To Write Portable Programs In C

FeatureIDE: An Extensible Framework for Feature-Oriented Software Development

Variable Base Interface

Feature-Oriented Software Development

mbeddr: an Extensible MPS-based Programming Language and IDE for Embedded Systems

Detecting Pattern-Match Failures in Haskell. Neil Mitchell and Colin Runciman York University

agile, open source, distributed, and on-time inside the eclipse development process

Virtuozzo Virtualization SDK

About The Tutorial. Audience. Prerequisites. Copyright & Disclaimer

Purify User s Guide. Version 4.1 support@rational.com

IPC. Semaphores were chosen for synchronisation (out of several options).

Static Code Analysis Procedures in the Development Cycle

Open-source Versus Commercial Software: A Quantitative Comparison

Dalhousie University CSCI 2132 Software Development Winter 2015 Lab 7, March 11

This presentation explains how to monitor memory consumption of DataStage processes during run time.

Applying Clang Static Analyzer to Linux Kernel

Génie Logiciel et Gestion de Projets. Evolution

Linux Kernel. Security Report

The following document contains information on Cypress products.

Software Testing & Analysis (F22ST3): Static Analysis Techniques 2. Andrew Ireland

Random Testing: The Best Coverage Technique - An Empirical Proof

ios App Performance Things to Take Care

How To Do Continuous Integration

Statically Checking API Protocol Conformance with Mined Multi-Object Specifications Companion Report

Outline. 1 Denitions. 2 Principles. 4 Implementation and Evaluation. 5 Debugging. 6 References

Oracle Solaris Studio Code Analyzer

PRI-(BASIC2) Preliminary Reference Information Mod date 3. Jun. 2015

CSC408H Lecture Notes

Otto von Guericke University Magdeburg. Faculty of Computer Science Department of Technical and Business Information Systems.

C++ Programming Language

Lecture 11 Doubly Linked Lists & Array of Linked Lists. Doubly Linked Lists

Testing, Debugging, and Verification

Refactoring (in) Eclipse

pure::variants Connector for Source Code Management Manual

Introduction to Static Analysis for Assurance

Device Management API for Windows* and Linux* Operating Systems

Networks and Protocols Course: International University Bremen Date: Dr. Jürgen Schönwälder Deadline:

TOOL EVALUATION REPORT: FORTIFY

Defining and Checking Model Smells: A Quality Assurance Task for Models based on the Eclipse Modeling Framework

How to Write a Checker in 24 Hours

CpSc212 Goddard Notes Chapter 6. Yet More on Classes. We discuss the problems of comparing, copying, passing, outputting, and destructing

Device Management API for Windows* and Linux* Operating Systems

Automating the Porting of Linux to the VirtualLogix Hypervisor using Semantic Patches

An Empirical Study of Out-dated Third-party Code in Open Source Software. Pei Xia

Quality Assurance of Software Models within Eclipse using Java and OCL

Towards practical reactive security audit using extended static checkers 1

Efficient Case Study of Viscuate Defects in the Linux Kernel

Win32 API Emulation on UNIX for Software DSM

Infer: An Automatic Program Verifier for Memory Safety of C Programs

Would Static Analysis Tools Help Developers with Code Reviews?

Virtual Servers. Virtual machines. Virtualization. Design of IBM s VM. Virtual machine systems can give everyone the OS (and hardware) that they want.

Implementing Rexx Handlers in NetRexx/Java/Rexx

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

Fortify on Demand Security Review. Fortify Open Review

AWERProcedia Information Technology & Computer Science

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

Data Migration from Magento 1 to Magento 2 Including ParadoxLabs Authorize.Net CIM Plugin Last Updated Jan 4, 2016

Smartphone Security for Android Applications

Cloud9 Parallel Symbolic Execution for Automated Real-World Software Testing

Development Testing for Agile Environments

Design and Code Complexity Metrics for OO Classes. Letha Etzkorn, Jagdish Bansiya, and Carl Davis. The University of Alabama in Huntsville

CiaoPP Demo. Pedro López-García 1, Edison Mera 2 and Teresa Trigo 1. ES PASS Barcelona Jun. 26, 2008

Learning Software Development - Origin and Collection of Data

SWAMP: Removing the Barriers to Adopting and Improving Software Assurance Capabilities

Automatic Patch Generation Learned from Human-Written Patches

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

CSE 403. Performance Profiling Marty Stepp

Using the Remote Access Library

SQLMutation: A tool to generate mutants of SQL database queries

6.s096. Introduction to C and C++

LiveWeb Core Language for Web Applications. CITI Departamento de Informática FCT/UNL

maintainer a web-dashboard for R package maintainers

Software Certification Coding, Code, and Coders

Software Metrics: Roadmap

Measuring the Effect of Code Complexity on Static Analysis Results

BCS THE CHARTERED INSTITUTE FOR IT. BCS HIGHER EDUCATION QUALIFICATIONS BCS Level 6 Professional Graduate Diploma in IT SOFTWARE ENGINEERING 2

Software Quality Exercise 2

Winning the Battle against Automated Testing. Elena Laskavaia March 2016

An Incomplete C++ Primer. University of Wyoming MA 5310

Transcription:

Investigating Preprocessor-Based Bugs in C Program Families Flávio Medeiros PhD Candidate Rohit Gheyi Márcio Ribeiro UFAL May 2014 - Dagstuhl

Dia Linux Xfig PostgreSQL 2

#endif #elif #else #ifdef #error #include CPP Despite the widespread use of the C Preprocessor (CPP), it is often criticised due to: Negative impact on code quality and maintainability Error-prone characteristics 3

Program family: Libpng 1. static void progressive_row(png_structp ppin){ 2. if (new_row!= NULL) { 3. if (y >= dp->h) 4. row = store_image_row(dp->ps, pp, 0, y); 5. #ifdef INTERLACING 6. if (dp->do_interlace){ 7. // Code Here.. 8. } else 9. png_progressive_combine_row(pp, row, new_row); 10. } else if (dp->interlace_type == PNG_INTERLACE_ADAM7) 11. png_error("missing row"); 12. #endif 13.} 4

Syntax error:!interlacing 1. static void progressive_row(png_structp ppin){ 2. if (new_row!= NULL) { 3. if (y >= dp->h) 4. row = store_image_row(dp->ps, pp, 0, y); 5. #ifdef INTERLACING 6. if (dp->do_interlace){ 7. // Code Here.. 8. } else 9. png_progressive_combine_row(pp, row, new_row); 10. } else if (dp->interlace_type == PNG_INTERLACE_ADAM7) 11. png_error("missing row"); 12. #endif 13.} 5

1. static bin_tree_t * parse_bracket_exp (){ 2. // Code Here.. 3. re_bitset_ptr_t sbcset; 4. #ifdef I18N 5. re_charset_t *mbcset; 6. // Code Here.. 7. #endif 8. // Code Here.. 9. sbcset = (re_bitset_ptr_t) calloc (sizeof (unsigned int), BITSET); 10. #ifdef I18N 11. mbcset = (re_charset_t *) calloc (sizeof (re_charset_t), 1); 12. #endif 13. #ifdef I18N 14. if (BE (sbcset == NULL mbcset == NULL, 0)) 15. #else 16. if (BE (sbcset == NULL, 0)) 17. #endif 18. { 19. *err = REG_ESPACE; 20. return NULL; 21. } 22. // Code Here.. 23. } 6

Memory leak: I18N 1. static bin_tree_t * parse_bracket_exp (){ 2. // Code Here.. 3. re_bitset_ptr_t sbcset; 4. #ifdef I18N 5. re_charset_t *mbcset; 6. // Code Here.. 7. #endif 8. // Code Here.. 9. sbcset = (re_bitset_ptr_t) calloc (sizeof (unsigned int), BITSET); 10. #ifdef I18N 11. mbcset = (re_charset_t *) calloc (sizeof (re_charset_t), 1); 12. #endif 13. #ifdef I18N 14. if (BE (sbcset == NULL mbcset == NULL, 0)) 15. #else 16. if (BE (sbcset == NULL, 0)) 17. #endif 18. { 19. *err = REG_ESPACE; 20. return NULL; 21. } 22. // Code Here.. 23. }

Studies relate the CPP usage to bugs However, only a few studies to: Investigate preprocessor-based bugs Quantify to what extent bugs occur in practice 8

Empirical Study Investigate and quantify preprocessor-based bugs in C Program Families 40 Releases 84,000 Commits 22 program families gnuplot 9

Strategies to Detect Preprocessor-Based Bugs

1. Strategy to identify syntax errors TypeChef Program Family c c c h h h TypeChef Report #include h h Xh h External Dependencies Stubs #endif #elif #else #ifdef 11

512 variants 2. Strategy to identify semantic bugs CppCheck Variant 1 Variant 1 Variant 2 Variant N C P P C H E C K Variant 2 Variant N Preprocessor-Based Bugs #elif #ifdef #endif #else 12

Bugs in Software Repositories Files Selected Commit #1 #include <stdio.h> #include <mytypes.h> X X void function #include ( ) <stdio.h> { myint #include x = 10; <mytypes.h> #ifdef ENGLISH printf("value: void function #include %d.", ( ) x); <stdio.h> { #endif myint #include x = 10; <mytypes.h> #ifdef PORTUGUESE #ifdef ENGLISH printf("valor: printf("value: void %d.", function x); %d.", ( ) x); { #endif #endif myint x = 10; } #ifdef PORTUGUESE #ifdef ENGLISH printf("valor: printf("value: %d.", x); %d.", x); #endif #endif } #ifdef PORTUGUESE printf("valor: %d.", x); #endif } X #include <stdio.h> #include <mytypes.h> X X void function #include ( ) <stdio.h> { myint #include x = 10; <mytypes.h> #ifdef ENGLISH printf("value: void function #include %d.", ( ) x); <stdio.h> { #endif myint #include x = 10; <mytypes.h> #ifdef PORTUGUESE #ifdef ENGLISH printf("valor: printf("value: void %d.", function x); %d.", ( ) x); { #endif #endif myint x = 10; } #ifdef PORTUGUESE #ifdef ENGLISH printf("valor: printf("value: %d.", x); %d.", x); #endif #endif } #ifdef PORTUGUESE printf("valor: %d.", x); #endif } X All files of the first commit Commit #2 #include <stdio.h> #include <mytypes.h> void function #include ( ) <stdio.h> { myint #include x = 10; <mytypes.h> #ifdef ENGLISH printf("value: void function #include %d.", ( ) x); <stdio.h> { #endif myint #include x = 10; <mytypes.h> #ifdef PORTUGUESE #ifdef ENGLISH printf("valor: printf("value: void %d.", function x); %d.", ( ) x); { #endif #endif myint x = 10; } #ifdef PORTUGUESE #ifdef ENGLISH printf("valor: printf("value: %d.", x); %d.", x); #endif #endif } #ifdef PORTUGUESE printf("valor: %d.", x); #endif } X #include <stdio.h> #include <mytypes.h> void function ( ) { myint x = 10; #ifdef ENGLISH printf("value: %d.", x); #endif #ifdef PORTUGUESE printf("valor: %d.", x); #endif } X Files that developers change or add.. 13

Research Questions Q1. Do program families contain preprocessor-based bugs? Q2. For how long a preprocessor-based bug remains in commits of a particular source file? Q3. How do program families developers introduce the preprocessor-based bugs? Q4. Do developers introduce more bugs in functions with preprocessor directives? 14

Q1. Do program families contain preprocessor-based bugs? Releases Commits 51 Bugs 8 Bugs 54 Bugs 121 distinct bugs 9 6 6 4 4 15

121 distinct bugs Memory Leak Resource Leak Null Deference Undefined Variable Uninitialised Variable Out of Bounds Access 4% 31% 45% 1% 13% 6% 16

Q2. For how long a preprocessor-based bug remains in commits of a particular source file? It varies from days to years No correlation with the LOC or number of developers 46 errors 17

Q3. How do program families developers introduce the preprocessor-based bugs? Syntax errors Semantic Bugs 46.67% by modifying existing code and adding directives, e.g., to support a new operating system. 44.12% by adding existing new code, e.g., a new source file, or a new function 18

Q4. Do developers introduce more bugs in functions with preprocessor directives? 40 Program Family Releases Total Number of Functions Function with Directives Functions without Directives 25 bugs 11% 89% 5.314 51 bugs 47.263 41.949 19 26 bugs

Tool Support Colligens Eclipse FeatureIDE Strategies to detect preprocessor-based bugs Eclipse plug-in based on FeatureIDE Core TypeChef Defective Evolution Detecting existing problems like bugs 20

Tool Support Colligens Eclipse FeatureIDE Core TypeChef + Refactorings to remove incomplete (undisciplined) annotations without cloning code Perfective Evolution Improve code quality to avoid problem in the future 21

Incomplete Conditions Bad Smell 1. if ( condition1 2. #ifdef expression1 3. && condition2 4. #endif 5. ){ 6. // Statements.. 7. } 1. bool test = condition1; 2. #ifdef expression1 3. test = test && condition2; 4. #endif 5. if (test){ 6. // Statements.. 7. } Precondition: original code is not using variable test in this scope No code cloning No extra lines of code No extra preprocessor-directives 22

Conclusions We find 121 preprocessor-based bugs in program family releases and commits We submit 23 patches to fix bugs, developers accept more than 74% of them Developers introduce more syntax errors when changing code and adding directives Developers introduce more semantic bugs when adding new code Our results suggest that developers introduce more bugs in functions with preprocessor directives

Thanks!