Laboratory Project for a Software Quality Class

March 29, 2015

Abstract

This document reports the experience of a laboratory course on software quality and is written primarily for instructors as a reference for similar activities. The project aims to teach students the main problems of testing and analyzing software systems, together with some solutions and tools to cope with those problems.

Contents

1 Target software
2 Tools
3 Activities and results
   3.1 Expert judgment
   3.2 Combinatorial testing
   3.3 Random testing
   3.4 Structural coverage
   3.5 Failures and faults
   3.6 Refining the test suites
   3.7 Structural coverage of the new test suites
   3.8 Mutation score
   3.9 Identifying failures and faults
4 Final considerations

Executive summary

This document reports the organisation and lessons learned of a laboratory project, and is written primarily for instructors as a reference experience for similar activities. The material and the results are available for inspection and use. The laboratory project aims to teach students the main problems of testing and analyzing software systems and some solutions and tools to cope with those problems.

Here we report the work done in a laboratory course by 9 master students attending in parallel a course on software quality based on our book. The project was scheduled over 10 weeks, with two hours of working time in class per week and 2-4 additional hours of offline group work, for a total of 50-55 hours of work. The task was to experiment with different techniques to test a software system (GraphStream) seeded with 8 realistic faults that we describe in detail in Section 1.

The laboratory was organized as follows:

1. Students were asked to derive test cases based on expert judgement. We gave the students the documentation and asked them to derive a set of test cases. This serves as a baseline: students can compare this first test suite with the ones generated with systematic approaches to understand the differences between the techniques experienced in the laboratory.

2. Students were asked to derive test cases with a combinatorial approach (category partition). This serves to start learning both the difficulty of using a combinatorial approach and the advantages with respect to an expert judgment approach. [TOOLS USED: Genpairs, Feed4JUnit]

3. Students were asked to generate test suites randomly. The goal is to understand the effectiveness of random test case generators. [TOOL USED: Randoop]

4. Students were asked to compute structural coverage for the different test suites. The goal is to understand the relation between functional and structural approaches and to evaluate the differences between the test suites from the coverage viewpoint. [TOOL USED: EclEmma]

5. Students were asked to identify failures and faults. In this phase, students are asked to identify failures and locate faults using the test suites obtained in the different ways. The goal is to understand the effectiveness of the different approaches and build awareness of the oracle problem.

6. Students were asked to refine the test suites. In this phase, we divided the students into two groups and asked one group to refine the expert judgment test suite and the other to refine the category partition suite. The test suites generated in the first iteration give the students a baseline to understand the difficulty of generating a good test suite and of applying a systematic approach, in our case category partition. [TOOLS USED: Genpairs, Feed4JUnit]

7. Students were asked to recompute structural coverage and compare the results with the previous ones. The goal is to check the improvements, in terms of structural coverage, that derive from the students' additional experience with the test case generation techniques. [TOOL USED: EclEmma]

8. Students were asked to compute the mutation score. The goal is to give students the opportunity to learn the difficulties and advantages of mutation analysis and to compare mutation analysis with structural coverage. [TOOL USED: Pitclipse]

9. Students were asked to identify failures and faults again. The goal is to learn about the relation between mutation score, structural coverage and effectiveness of the test suites, and to compare the test suites according to their ability to reveal failures and to support fault localization.

The students were asked to document both the test cases and the test suites, following the standard proposed in the book. In the next section, we present the details of the work with references to the artifacts, which include the documentation of the test cases, the original test cases in JUnit, and the reports produced by the different tools (coverage, mutation score, etc.). In Section 4 we summarize the results of the work and provide some suggestions for instructors.

1 Target software

For the project we chose the GraphStream Java library (core package, release 1.2), an open source project that provides APIs to generate, modify and analyze graphs. For detailed information about this software refer to its official website at http://graphstream-project.org. The original source code used in the project can be found here, and the library API documentation can be found here. During the project we focused on the following subset of the library:

- SingleGraph (org.graphstream.graph.implementations.SingleGraph)
- MultiGraph (org.graphstream.graph.implementations.MultiGraph)
- Path (org.graphstream.graph.Path)

In these classes we seeded 8 faults, inspired by real faults found in bug repositories. The source code of the faulty library can be found here. The following list enumerates the faults, showing the class they belong to, the original version of the code and the faulty version (FAULT).

1. Path.java
   172          d += (Double) l.getAttribute(characteristic, Number.class);
   172 FAULT    d = (Double) l.getAttribute(characteristic, Number.class);

2. AdjacencyListGraph.java
   115          if (initialNodeCapacity < DEFAULT_NODE_CAPACITY)
   116              initialNodeCapacity = DEFAULT_NODE_CAPACITY;
   117          if (initialEdgeCapacity < DEFAULT_EDGE_CAPACITY)
   118              initialEdgeCapacity = DEFAULT_EDGE_CAPACITY;
   FAULT: these lines have been deleted.

3. AdjacencyListGraph.java
   181          if (initialNodeCapacity < DEFAULT_NODE_CAPACITY)
   182              initialNodeCapacity = DEFAULT_NODE_CAPACITY;
   183          if (initialEdgeCapacity < DEFAULT_EDGE_CAPACITY)
   184              initialEdgeCapacity = DEFAULT_EDGE_CAPACITY;
   FAULT: these lines have been deleted.

4. AbstractGraph.java
   370          if (fromNode == null || toNode == null)
   370 FAULT    if (fromNode == null)

5. AbstractGraph.java
   776          if (!edgeIt.hasNext())
   777              return;
   FAULT: these lines have been deleted.

6. OneAttributeElement.java
   191          if (attributes.size() >= 1)
   191 FAULT    if (attributes.size() >= 0)

7. MultiNode.java
   71           && (type != O_EDGE || etype != I_EDGE))
   71  FAULT    || (type != O_EDGE || etype != I_EDGE))

8. GraphFactory.java
   56           completeGraphClass = "org.graphstream.graph.implementations." ...
   56  FAULT    completeGraphClass = "org.graphstream.graph.implementations" ...
   (the fault drops the trailing '.' of the package name prefix)

2 Tools

Understanding the importance of automation through the use of tools was one of the main goals of the project. During the laboratory classes the students used several tools to support the various phases. The tools used in the project are:

Genpairs: to generate test case specifications using the category partition approach
   Description: Genpairs is a combinatorial test vector generator.
   Input: A specification file in a format based loosely on category partition test specifications as originally described by Ostrand and Balcer, and as presented in the book by Pezzè & Young, chapter 11.
   Output: Either a table or a CSV file containing the generated combinations.
   Link: The tool, the input syntax, and the available options can be found here.
   Comments: The students reported some inconsistent results in the case of values with multiple properties.

   Alternative tools: A set of alternative tools, both commercial and free, can be found at http://www.pairwise.org/tools.asp

Feed4JUnit: to execute parameterized unit tests
   Description: Feed4JUnit executes parameterized unit tests automatically. The tool can be used to speed up the generation of test cases with a similar structure but different inputs (a parameterized-test sketch is shown after this list).
   Input: Either a CSV or an Excel file that contains the parameter values, and a parameterized JUnit test case.
   Link: http://databene.org/feed4junit.html
   Comments: The students reported several problems in the setup of the tool.
   Alternative tools: https://code.google.com/p/junitparams/

Randoop: to generate test cases randomly
   Description: Randoop randomly generates test inputs for Java classes. The user can set a deadline to limit the amount of time the tool runs, or a bound on the number of test cases it creates. It runs the generated test cases on all the classes.
   Input: A class or a set of classes.
   Output: A set of test cases.
   Link: http://randoop.googlecode.com
   Comments: The students reported random crashes in Eclipse when the tool was executed with a time limit greater than 60 seconds.
   Alternative tools: EvoSuite

EclEmma: to measure structural coverage
   Description: EclEmma is an Eclipse plugin that computes structural coverage.
   Input: A set of JUnit test cases.
   Output: An HTML report with both statement and branch coverage.
   Link: http://www.eclemma.org
   Comments: No particular problem or limitation reported.
   Alternative tools: http://ecobertura.johoop.de/, http://codecover.org/

Pitclipse: to perform mutation analysis
   Description: Pitclipse is an Eclipse plugin that performs mutation analysis.
   Input: A set of test cases.
   Output: An HTML report with a per-class mutation score.
   Link: http://pitest.org
   Comments: Pitclipse is not compatible with Feed4JUnit. Moreover, the tool has several major issues, such as the lack of isolation among test case executions.
   Alternative tools: Jumble, Javalanche, Major
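Since Pitclipse turned out to be incompatible with Feed4JUnit, the parameterized-test idea can also be sketched in a tool-independent way with JUnit 4's built-in Parameterized runner instead of Feed4JUnit's own annotations. The sketch below is illustrative only: the class name, the parameter rows and the tested property are ours and are not taken from the project material.

    import static org.junit.Assert.assertEquals;

    import java.util.Arrays;
    import java.util.Collection;

    import org.graphstream.graph.Graph;
    import org.graphstream.graph.implementations.SingleGraph;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.junit.runners.Parameterized;
    import org.junit.runners.Parameterized.Parameters;

    // Each row plays the role of one line of the CSV file fed to
    // Feed4JUnit: a number of nodes to create and the expected count.
    @RunWith(Parameterized.class)
    public class AddNodeParameterizedTest {

        @Parameters
        public static Collection<Object[]> data() {
            return Arrays.asList(new Object[][] { { 0, 0 }, { 1, 1 }, { 5, 5 } });
        }

        private final int nodesToAdd;
        private final int expectedCount;

        public AddNodeParameterizedTest(int nodesToAdd, int expectedCount) {
            this.nodesToAdd = nodesToAdd;
            this.expectedCount = expectedCount;
        }

        // The same test body runs once per parameter row.
        @Test
        public void addNodeIncreasesNodeCount() {
            Graph g = new SingleGraph("g");
            for (int i = 0; i < nodesToAdd; i++)
                g.addNode("n" + i);
            assertEquals(expectedCount, g.getNodeCount());
        }
    }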

3 Activities and results

In the following we report the results of the activities performed in the project. The references point to the detailed results, which are grouped in a directory doc for the documentation and a directory test for the test cases. The documentation is a set of HTML documents reporting the documentation of the test suites (TS) and of the single test cases (TC) in the format suggested in the book. The test cases are JUnit test cases.

3.1 Expert judgment

We divided the team into three groups, and we asked each group to derive test cases for the classes and methods under test. We then merged the derived test cases into a single test suite, removing duplicated test cases. The final test suite includes 28 test cases (Table 1). The results of this activity can be found here.

                              Number of generated test cases
Classes       Methods        Group 1  Group 2  Group 3  Merged
              constructor       1        0        1        1
              addNode           2        1        1        3
SingleGraph   addEdge           4        1        3        6
              degree            1        0        0        1
              removeNode        1        1        0        1
              removeEdge        1        1        0        1
              constructor       2        0        1        2
              addNode           0        1        1        2
MultiGraph    addEdge           1        2        3        4
              removeNode        1        1        0        1
              removeEdge        1        1        0        1
              constructor       1        0        1        1
              setRoot           1        0        1        1
Path          addNode           1        0        1        1
              addEdge           1        0        1        1
              removeLoops       1        0        0        1

Table 1: Test cases generated with expert judgment.

3.2 Combinatorial testing

The students worked all together to identify categories, values and constraints according to the category partition approach, and used Genpairs to automatically derive new test cases from the categories. The students learned that Genpairs does not always handle the constraints properly, so they manually removed the infeasible combinations. The students focused on the classes SingleGraph and MultiGraph, ignoring the class Path (Table 2). The results of this activity can be found here.

3.3 Random testing

The students used Randoop to randomly generate test cases for the classes under test. They generated 4527 test cases. Most of the generated test cases were not runnable because of a bug in Randoop, but the remaining ones were still too many to be inspected manually. The results of this activity can be found here.
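As an illustration of the kind of hand-written test cases counted in Table 1, here is a minimal sketch of expert-judgment JUnit tests against the GraphStream API; the test names and the chosen assertions are ours, not the students'.

    import static org.junit.Assert.assertEquals;

    import org.graphstream.graph.Graph;
    import org.graphstream.graph.implementations.MultiGraph;
    import org.graphstream.graph.implementations.SingleGraph;
    import org.junit.Test;

    public class ExpertJudgmentSketchTest {

        // addEdge on a SingleGraph: one edge between two existing nodes.
        @Test
        public void addEdgeConnectsTwoExistingNodes() {
            Graph g = new SingleGraph("g");
            g.addNode("A");
            g.addNode("B");
            g.addEdge("AB", "A", "B");
            assertEquals(1, g.getEdgeCount());
            assertEquals(1, g.getNode("A").getDegree());
        }

        // MultiGraph, unlike SingleGraph, accepts parallel edges
        // between the same pair of nodes.
        @Test
        public void multiGraphAcceptsParallelEdges() {
            Graph g = new MultiGraph("g");
            g.addNode("A");
            g.addNode("B");
            g.addEdge("e1", "A", "B");
            g.addEdge("e2", "A", "B");
            assertEquals(2, g.getEdgeCount());
            assertEquals(2, g.getNode("A").getDegree());
        }

        // removeNode also removes the incident edges.
        @Test
        public void removeNodeRemovesIncidentEdges() {
            Graph g = new SingleGraph("g");
            g.addNode("A");
            g.addNode("B");
            g.addEdge("AB", "A", "B");
            g.removeNode("A");
            assertEquals(0, g.getEdgeCount());
        }
    }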

Classes       Methods        Number of generated test cases
              addNode                      8
SingleGraph   addEdge                     17
              removeNode                   9
              addNode                      0
MultiGraph    addEdge                      9
              removeNode                   0
Path          popNode                      0
              removeLoop                   0

Table 2: Test cases generated with category partition.

3.4 Structural coverage

The students calculated the statement and branch coverage of the generated test suites with EclEmma (Table 3). The results of this activity can be found here.

                                       Coverage (%)
                         Expert        Category Partition   Random
Unit                     Stmt. Branch  Stmt. Branch         Stmt. Branch
AbstractGraph             29    26      27    34             86    80
AbstractElement            4     1       5     3             75    70
AdjacencyListNode         38    29      16    10             74    60
AbstractNode               4     0       4     0             10     0
AdjacencyListGraph        65    25      35    17            100   100
MultiNode                 51    17      30    11             61    22
AbstractEdge              42    31      38    44             45    44
SingleNode                87    59      45    23             94    86
MultiGraph               100    n.a.    79    n.a.          100    n.a.
SingleGraph              100    n.a.   100    n.a.          100    n.a.
AdjacencyListNode          0     0       0     0             89    79
SingleNodeTwoEdges       100    n.a.   100    n.a.          100    n.a.
Path                      43    53       0     0             43    53
IdAlreadyUsedException     0    n.a.    71    n.a.           71    n.a.

Table 3: Structural coverage of the different test suites.

3.5 Failures and faults

The students were asked to execute the test suites to identify failures and the corresponding faults. At this stage they did not identify any fault.

3.6 Refining the test suites

We divided the students into two groups, and we asked one group to refine the expert judgment test suite and the other to refine the category partition suite. This led to new test cases (Tables 4 and 5). The results of this activity can be found here (combinatorial testing) and here (expert judgement).

3.7 Structural coverage of the new test suites

The students computed the structural coverage of the new test suites (Table 6). The results of this activity can be found here.
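Since the tables report both statement and branch coverage, it may help to recall the difference with a small sketch modeled on the capacity guard deleted by seeded faults 2 and 3; the method and the constant below are ours, not GraphStream's.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class CoverageSketchTest {

        // Hypothetical re-implementation of the capacity guard removed
        // by seeded faults 2 and 3 (names are ours).
        static final int DEFAULT_CAPACITY = 16;

        static int initialCapacity(int requested) {
            int capacity = requested;
            if (capacity < DEFAULT_CAPACITY)
                capacity = DEFAULT_CAPACITY;
            return capacity;
        }

        // This test alone yields 100% statement coverage of
        // initialCapacity (every line runs), but only 50% branch
        // coverage: the false outcome of the 'if' is never exercised.
        // It is also the test that fails when the guard is deleted,
        // since a too-small request is then no longer rounded up.
        @Test
        public void smallRequestIsRoundedUp() {
            assertEquals(DEFAULT_CAPACITY, initialCapacity(1));
        }

        // Adding this test exercises the false branch and brings
        // branch coverage to 100%.
        @Test
        public void largeRequestIsKept() {
            assertEquals(100, initialCapacity(100));
        }
    }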

Classes       Methods           Number of generated test cases
              constructor             1
              addNode                 3
              addEdge                 6
              getDegree               1
              removeNode              1
              removeEdge              1
              getArray                2
              hasVector               2
              hasNumber               2
              hasLabel                1
              clear                   1
SingleGraph   getEdgeSet              2
              getNodeSet              2
              addAttribute            3
              removeAttribute         1
              clearAttributes         1
              setAttribute            1
              changeAttribute         1
              getEdge                 2
              getEdgeBetween          1
              getNode                 3
              getEdgeIterator         2
              getNodeIterator         2
              constructor             2
              addNode                 2
MultiGraph    addEdge                 4
              removeNode              1
              removeEdge              1
              constructor             1
              setRoot                 5
              addNode                 8
Path          addEdge                 8
              removeLoops             1
              getACopy                2
              push                    2
              pop                     2
              addNode                 7
              addEdge                16
              removeNode              8
AbstractGraph removeEdge             14
              factory                 2
              attributes             11
              step                    3

Table 4: New expert judgment test suite.

3.8 Mutation score

The students calculated the mutation score with Pitclipse (Table 7). The results of this activity can be found here.

3.9 Identifying failures and faults

The students identified failures and faults in the code using their test suites. They identified two of the eight seeded faults and 6 additional faults present in the software.
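To make concrete what revealing one of the seeded faults takes, the sketch below encodes an explicit numeric oracle for the path weight; it assumes the GraphStream 1.2 Path API (setRoot, add, getPathWeight), and the attribute name weight is our choice.

    import static org.junit.Assert.assertEquals;

    import org.graphstream.graph.Graph;
    import org.graphstream.graph.Path;
    import org.graphstream.graph.implementations.SingleGraph;
    import org.junit.Test;

    public class PathWeightOracleTest {

        // Builds the two-edge path A-B-C with weights 1.0 and 2.0 and
        // checks that getPathWeight sums them. With seeded fault 1
        // (d = instead of d +=) the method returns only the last
        // weight, 2.0, and the assertion fails: an oracle strong
        // enough to reveal the fault.
        @Test
        public void pathWeightIsTheSumOfEdgeWeights() {
            Graph g = new SingleGraph("g");
            g.addNode("A");
            g.addNode("B");
            g.addNode("C");
            g.addEdge("AB", "A", "B").setAttribute("weight", 1.0);
            g.addEdge("BC", "B", "C").setAttribute("weight", 2.0);

            Path p = new Path();
            p.setRoot(g.getNode("A"));
            p.add(g.getEdge("AB"));
            p.add(g.getEdge("BC"));

            assertEquals(3.0, p.getPathWeight("weight"), 1e-9);
        }
    }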

Classes       Methods        Number of generated test cases
              addNode                      7
              addEdge                     29
SingleGraph   removeNode                   5
              removeEdge                  36
              constructor                 17
              addNode                      0
MultiGraph    addEdge                      9
              removeNode                   9
Path          popNode                      0
              removeLoop                   0

Table 5: New category partition test suite.

                                       Coverage (%)
                         Expert        Category Partition   Randoop
Unit                     Stmt. Branch  Stmt. Branch         Stmt. Branch
AbstractGraph             87    82      58    65             86    80
AbstractElement           46    28       5     2             75    70
AdjacencyListNode         55    40      65    52             74    60
AbstractNode              11     0       4     0             10     0
AdjacencyListGraph        89    92      82    75            100   100
MultiNode                 51    17      30    11             61    22
AbstractEdge              83    38      58    50             45    44
SingleNode                92    73      90    86             94    86
MultiGraph               100    n.a.   100    n.a.          100    n.a.
SingleGraph              100    n.a.   100    n.a.          100    n.a.
AdjacencyListNode        100   100      89    79             89    79
SingleNodeTwoEdges       100    n.a.   100    n.a.          100    n.a.
Path                      91    91       0     0             43    53
IdAlreadyUsedException    71    n.a.    71    n.a.           71    n.a.

Table 6: Structural coverage of the new test suites.

Unit                      Mutation score
AbstractGraph                  53%
AbstractElement                12%
AdjacencyListNode              22%
AbstractNode                    2%
AdjacencyListGraph             51%
MultiNode                       0%
AbstractEdge                   47%
SingleNode                     39%
MultiGraph                      0%
SingleGraph                   100%
AdjacencyListNode               0%
SingleNodeTwoEdges              0%
Path                            0%
IdAlreadyUsedException          0%

Table 7: Mutation score percentage by class.

The faults the students identified, and the techniques they used to identify them (either expert opinion (EO) or category partition (CP)), are summarized in Table 8.
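The 0% rows of Table 7 are largely explained by the oracle problem discussed above: PIT counts a mutant as killed only when a test that covers it fails, so tests that execute code without checking its effects leave most mutants alive. A hypothetical sketch of the difference (class and tests are ours, not the students'):

    import static org.junit.Assert.assertEquals;

    import org.graphstream.graph.Graph;
    import org.graphstream.graph.implementations.SingleGraph;
    import org.junit.Test;

    public class MutationScoreSketchTest {

        // Executes addNode but asserts nothing: unless a mutant makes
        // the code throw, no test failure occurs, so the mutants this
        // test covers survive and contribute nothing to the score.
        @Test
        public void weakTestOnlyExecutesCode() {
            Graph g = new SingleGraph("g");
            g.addNode("A");
        }

        // The same scenario with an explicit oracle can kill mutants
        // that corrupt the observable state, e.g. one that skips
        // updating the node count.
        @Test
        public void strongTestChecksTheEffect() {
            Graph g = new SingleGraph("g");
            g.addNode("A");
            assertEquals(1, g.getNodeCount());
        }
    }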

Class          Description                                   Discovered by   Bug No.
               Modifiable edge set                           EO
               Modifiable node set                           EO
SingleGraph    Removing a node from a different graph        CP
               Removing an edge from a different graph       CP
               Default values of capacity of edges and arrays CP
               Removing an edge exception                    CP
AbstractGraph  Removing a zero degree node                   EO and CP        5
Path           Counting the weight of the path               EO               1

Table 8: Discovered faults.

4 Final considerations

The first activities (generating test cases according to expert judgment and using category partition) indicate that the students were not good testers and that using category partition is more difficult than expected. The test suites were quite small, and the identified categories and values did not cover all cases. The limits of the generated test suites were confirmed in the third and fifth steps, resulting in a suboptimal mutation score and, most importantly, in the inability to reveal any of the seeded faults. The experience with automatic random generation of test cases (activity 4) resulted in an enormous number of test cases unable to identify any of the seeded faults, with results that were difficult to analyze and that reported many uninteresting faults.

The experience with category partition increased the awareness of what generating test cases requires, and the second suite generated following expert judgment was much more thorough than the first one. Similarly, an open discussion on how to improve the use of category partition produced much better results, as indicated by the last three steps: coverage and mutation score increased, but more notably the students were able to reveal some real failures with both the expert and the category partition test suites, thanks both to the thoroughness of the test cases and to the effectiveness of the test oracles. In particular, the test suite generated using category partition was able to reveal most of the seeded faults, more than the test suite generated by experts, and both revealed more than the test suite generated randomly.

During the project the students were able to understand the difficulty of test activities and the costs involved, to appreciate a systematic approach and the limits of random testing, and to experience the oracle problem first-hand. The use of tools was important. Not all tools behaved as expected and some created small problems; nevertheless, the students could clearly see that automation is essential for thorough testing.