AUTOMATED UNIT TEST GENERATION DURING SOFTWARE DEVELOPMENT A Controlled Experiment and Think-aloud Observations




AUTOMATED UNIT TEST GENERATION DURING SOFTWARE DEVELOPMENT A Controlled Experiment and Think-aloud Observations ISSTA 2015 José Miguel Rojas j.rojas@sheffield.ac.uk Joint work with Gordon Fraser and Andrea Arcuri

"Testing is a widespread validation approach in industry, but it is still largely ad hoc, expensive, and unpredictably effective." (Software Testing Research: Achievements, Challenges, Dreams. A. Bertolino. Future of Software Engineering, IEEE, 2007.)

"Test case generation has a strong impact on the effectiveness and efficiency of testing [and has been] one of the most active research topics in software testing for several decades, resulting in many different approaches and tools." (An Orchestrated Survey of Methodologies for Automated Software Test Case Generation. S. Anand, E. K. Burke, T. Y. Chen, J. Clark, M. B. Cohen, W. Grieskamp, M. Harman, M. J. Harrold, P. McMinn. J. Systems and Software, Elsevier, 2013.)

BACK IN ISSTA 2013: ARE UNIT TEST GENERATION TOOLS HELPFUL TO DEVELOPERS WHILE THEY ARE CODING? (Does Automated White-box Test Generation Really Help Software Testers? G. Fraser, M. Staats, P. McMinn, A. Arcuri and F. Padberg.)

CODE COVERAGE · TIME SPENT ON TESTING · IMPLEMENTATION QUALITY

CONTROLLED EXPERIMENT: starting from a Class Template derived from a Golden Implementation, each participant produced an Implementation within 1 hour, either manually or assisted by EvoSuite (41 participants, 2 conditions, 4 classes).
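To make the setup concrete, a "class template" can be pictured as a class whose documented methods the participant must fill in, with the golden implementation serving as the oracle. The following is a hypothetical sketch (names and behaviour loosely modelled on Apache Commons' FixedOrderComparator, not the actual study material), shown with a reference body already in place:

```java
// Hypothetical sketch of a study-style "class template": javadoc and
// signatures are given; the participant implements the bodies.
import java.util.*;

public class FixedOrderComparatorTemplate implements Comparator<String> {
    private final Map<String, Integer> rank = new HashMap<>();

    /** Remembers the given items in the order they are passed in. */
    public FixedOrderComparatorTemplate(String... items) {
        for (int i = 0; i < items.length; i++) rank.put(items[i], i);
    }

    /** To be implemented by the participant; a reference ("golden") body is shown. */
    @Override
    public int compare(String a, String b) {
        return Integer.compare(rank.get(a), rank.get(b));
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("high", "low", "mid"));
        words.sort(new FixedOrderComparatorTemplate("low", "mid", "high"));
        System.out.println(words); // prints [low, mid, high]
    }
}
```

In the study design, running the golden implementation's test suite against such a participant implementation is what yields the quality measures reported later.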

RQ 1: DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO TEST SUITES WITH HIGHER CODE COVERAGE?

CODE COVERAGE: participants' test suites run on their own implementations. [Bar chart: branch coverage, Assisted vs. Manual, for FilterIterator, FixedOrderComparator, ListPopulation and PredicatedMap; values between 26% and 83%.]
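Branch coverage, the metric used throughout, is the fraction of conditional outcomes exercised by a test suite. A toy illustration with hand-rolled instrumentation (an assumption for demonstration only; not how EvoSuite or the study measured coverage):

```java
// Toy branch-coverage bookkeeping (an illustration; not the study's tooling).
public class BranchCoverageDemo {
    static boolean[] hit = new boolean[2]; // one flag per branch outcome of clamp

    static int clamp(int x) {
        if (x < 0) { hit[0] = true; return 0; }  // "then" branch
        else       { hit[1] = true; return x; }  // "else" branch
    }

    static double coveragePercent() {
        int covered = 0;
        for (boolean b : hit) if (b) covered++;
        return 100.0 * covered / hit.length;
    }

    public static void main(String[] args) {
        clamp(5);                               // exercises only the else branch
        System.out.println(coveragePercent());  // prints 50.0
        clamp(-3);                              // now both branches are exercised
        System.out.println(coveragePercent());  // prints 100.0
    }
}
```

A suite that calls `clamp` only with non-negative inputs reaches 50% branch coverage; adding a negative input brings it to 100%.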

CODE COVERAGE: how often participants checked coverage. [Bar chart: times coverage was checked, Assisted vs. Manual, per class; values between 1.9 and 9.6.]

CODE COVERAGE: participants' test suites run on their own implementations. [Line chart: branch coverage over time (0-60 min) for ListPopulation; series: Manual, EvoSuite, Assisted.]

CODE COVERAGE: participants' test suites run on golden implementations. [Bar chart: branch coverage, Assisted vs. Manual, per class; values between 21% and 50%.]

CODE COVERAGE: participants' test suites run on golden implementations, over time. [Line charts: branch coverage over time (0-60 min) for each of the four classes; series: Assisted, Manual, EvoSuite-generated.]

Coverage can be higher when using EvoSuite, depending on how the generated tests are used.

RQ 2: DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO DEVELOPERS SPENDING MORE OR LESS TIME ON TESTING?

TESTING EFFORT: number of test runs. [Bar chart: Assisted vs. Manual, per class; values between 4.4 and 13.7.]

TESTING EFFORT: minutes spent on testing. [Bar chart: Assisted vs. Manual, per class; values between 7.7 and 25 minutes.]

Using EvoSuite reduces the time spent on testing.

RQ 3: DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO SOFTWARE WITH FEWER BUGS?

IMPLEMENTATION QUALITY: golden test suites run on participants' implementations. [Bar chart: number of failures + errors, Assisted vs. Manual, per class; values between 4.2 and 15.6.]

Using EvoSuite during development did not lead to better implementations.

RQ 4: DOES SPENDING MORE TIME WITH EVOSUITE AND ITS TESTS LEAD TO BETTER IMPLEMENTATIONS?

PRODUCTIVITY: time spent with EvoSuite, correlation with number of failures plus errors. [Bar chart: correlations for "number of runs" and "time spent on tests", per class; values between -0.49 and 0.35.]

Implementation quality improves the more time developers spend with EvoSuite-generated tests.

Using automated unit test generation does impact developers' productivity. But how can we make the most out of unit test generation tools?
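One way to get value from generated tests, reflected in the Assisted condition, is to use them as regression checks that pin down currently observed behaviour. A hypothetical sketch (all names invented; the "generated" test only mimics the style of tool output):

```java
// Hypothetical "assisted" workflow: a generated-style test records the
// behaviour of the code under development, so a later breaking edit fails it.
public class AssistedWorkflowDemo {
    // Code under development: clamp x into the range [0, max].
    static int clamp(int x, int max) {
        if (x < 0) return 0;
        if (x > max) return max;
        return x;
    }

    // Regression check in the style of a generated test: its assertions
    // capture the outputs observed when the test was generated.
    static boolean generatedRegressionTest() {
        return clamp(7, 5) == 5 && clamp(-2, 5) == 0 && clamp(3, 5) == 3;
    }

    public static void main(String[] args) {
        System.out.println(generatedRegressionTest() ? "PASS" : "FAIL"); // prints PASS
    }
}
```

The developer's job then shifts from writing tests to inspecting the recorded assertions and deciding whether each one encodes intended or buggy behaviour.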

THINK ALOUD OBSERVATIONS: an Observer watches a Subject, who verbalises their thoughts while working.

K. A. Ericsson and H. A. Simon. Protocol Analysis: Verbal Reports as Data (revised edition). MIT Press, 1993.
J. Hughes and S. Parkes. Trends in the Use of Verbal Protocol Analysis in Software Engineering Research. Behaviour and Information Technology, vol. 22, no. 2, pp. 127-140, 2003.

THINK ALOUD OBSERVATIONS: same setup as the controlled experiment (Golden Implementation, Class Template, Implementation), but with 2 hours per task (5 participants, 1 observer, 4 classes).

RESULTS: per think-aloud participant (classes: FilterIterator, FixedOrderComparator, ListPopulation, PredicatedMap, PredicatedMap). [Table residue; row labels not recoverable: 3, 6, 2, 2, 5 | 15, 20, 23, 7, 10 | 94%, 92%, 97%, 72%, 94% | 95%, 100%, 94%, 100%, 83%.] Images: http://jozef89.deviantart.com/

LESSONS LEARNED
- There are different approaches to testing, and test generation tools should be adaptable to them.
- Developers' behaviour is often not driven by code coverage.
- Readability of generated unit tests is paramount.
- Integration into development environments must be improved.
- Education/best practices: developers do not know how to best use automated test generation tools!
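The readability concern can be illustrated by two tests for the same property: one in the opaque style typical of generated suites and one hand-written. Both are hypothetical; the "generated" test only imitates typical tool output rather than reproducing real EvoSuite code:

```java
// Two hypothetical tests for the same property, in contrasting styles.
import java.util.*;

public class ReadabilityDemo {
    // Generated style: opaque names and magic values obscure the intent.
    static void test0() {
        ArrayList<Integer> arrayList0 = new ArrayList<>();
        arrayList0.add(-1);
        Integer integer0 = Collections.max(arrayList0);
        if (integer0 != -1) throw new AssertionError("test0 failed");
    }

    // Hand-written style: the method name states the property being checked.
    static void maxOfSingletonListIsItsOnlyElement() {
        List<Integer> numbers = List.of(-1);
        if (Collections.max(numbers) != -1) throw new AssertionError();
    }

    public static void main(String[] args) {
        test0();
        maxOfSingletonListIsItsOnlyElement();
        System.out.println("both tests pass");
    }
}
```

Both tests exercise exactly the same behaviour and contribute the same coverage; only the second tells a maintainer what is being asserted and why, which is the gap the lesson above points at.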

"Coverage is easy to assess because it is a number, while readability is a very nontangible property. What is readable to me may not be readable to you. It is readable to me just because I spent the last hour and a half doing this." (Participant 5)

www.evosuite.org/study-2014/ j.rojas@sheffield.ac.uk