Confirmation Bias as a Human Aspect in Software Engineering


Confirmation Bias as a Human Aspect in Software Engineering
Gul Calikli, PhD
Data Science Laboratory, Department of Mechanical and Industrial Engineering, Ryerson University

Why Human Aspects in Software Engineering? Enhance decision making under uncertainty, so that managers can allocate resources efficiently during any phase of the software development life cycle (SDLC):
- Which parts of the software should be prioritized for testing?
- Who should test or develop the most critical parts of the software?
- Who should fix the bugs in the most problematic parts of the software?
- Who should, or should not, develop and maintain the same source files?
- Who should we hire as a developer, tester, analyst, or designer?

Why Human Aspects in Software Engineering? People's thought processes have a significant impact on software quality, since software is analyzed, designed, developed, tested, and managed by people. In daily life, people rely on heuristics to solve problems; when a heuristic fails to produce a correct judgment, the result is a cognitive bias. Heuristics employed in everyday software engineering activities may likewise lead to cognitive biases, and hence to defects. Some common cognitive bias types are confirmation bias, anchoring and adjustment, availability, and representativeness. We focus on confirmation bias!

Confirmation Bias in Software Engineering Confirmation bias is defined as the tendency of people to seek evidence that verifies a hypothesis rather than evidence that refutes it.

Confirmation Bias in Software Engineering Due to confirmation bias, developers tend to write unit tests that make their program work rather than tests that try to break their code. At all levels of software testing, we must therefore employ a testing strategy that includes adequate attempts to make the code fail, in order to reduce software defect density.

Methodology to Quantify Confirmation Bias Research Question 1: How can we identify measures of confirmation bias in relation to the software development process?

Methodology to Quantify Confirmation Bias Challenge: quantifying confirmation bias so that empirical analyses can be performed. Proposed Solution: Our methodology is an iterative process consisting mainly of the following steps: 1) preparation of the confirmation bias test; 2) formation of the confirmation bias metrics set.

Confirmation Bias Test The confirmation bias test consists of the following: an interactive test based on Wason's Rule Discovery Task, and a written test based on Wason's Selection Task.

Written Test Content:
    Question Type                          No. of Questions
    Abstract questions                     8
    Thematic questions                     6
    SW development/testing questions       8
    TOTAL                                  22

Confirmation Bias Test Wason's Rule Discovery Task. Goal: discover the correct rule. Initially, the subject is given three numbers that conform to a simple rule. Experiment protocol:

    repeat
        write down three numbers and the reasons for your choice;
        receive feedback from the tester;
        if you are sure about the rule
            announce the rule;
        end
        if you want to terminate
            break;  % terminated
        end
    until the correct rule is announced
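To make the protocol concrete, here is a minimal sketch in Python of the rule discovery loop, assuming the classic 2-4-6 version of the task in which the hidden rule is "any strictly ascending triple" and the seed triple is (2, 4, 6); the specific rule and the interaction style are illustrative assumptions, not part of the original test.

    # Minimal sketch of Wason's Rule Discovery Task (classic 2-4-6 version).
    # Assumption: the tester's hidden rule is "strictly ascending numbers".

    def hidden_rule(triple):
        """Tester's secret rule: the triple is strictly ascending."""
        a, b, c = triple
        return a < b < c

    def run_task():
        print("Seed triple (2, 4, 6) conforms to the rule.")
        while True:
            answer = input("Enter a triple 'a b c', or 'announce <rule>': ")
            if answer.startswith("announce"):
                print("Rule announced:", answer[len("announce"):].strip())
                break  # protocol ends when the subject announces the rule
            triple = tuple(int(x) for x in answer.split())
            # Feedback from the tester: does the triple conform to the rule?
            print("conforms" if hidden_rule(triple) else "does not conform")

    if __name__ == "__main__":
        run_task()

Confirmation bias shows up here as a preference for proposing triples the subject expects to conform, rather than triples chosen to eliminate candidate rules.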

Confirmation Bias Test Wason's Selection Task. Goal: find out which of the four cards should be turned over to test the validity of the statement: "If there is a D on one side of the card, then it has a 3 on its other side." The four cards show p, q, not-p, and not-q (e.g., a D, a 3, a letter other than D, and a number other than 3). The logically correct choice is to turn over the p card and the not-q card, since only those two can falsify the statement; subjects typically pick the p and q cards instead, reflecting confirmation bias.

Example: Wason's Rule Discovery Task in Relation to Unit Testing Wason's Rule Discovery Task: subjects tend to select many triples (i.e., test cases) that are consistent with their hypotheses and few that are inconsistent with them. T: the set of triples conforming to the correct rule. H: the set of triples conforming to the hypothesis in the subject's mind. Observed similarity with functional (black-box) testing: program testers may select many test cases consistent with the program specification (positive tests) and few that are inconsistent with it (negative tests).

Example: Wason's Selection Task in Relation to Unit Testing Example: Suppose you want to make sure that a program avoids dereferencing a null pointer by always checking before dereferencing; that is, the statement under test is "If a pointer is dereferenced, then it is checked for nullity." Someone tells you there are only four sections of code to be tested, and they have determined the following things about those sections:
- Section A checks whether the pointer is null. The pointer may or may not be dereferenced there.
- Section B does not check whether the pointer is null. The pointer may or may not be dereferenced there.
- Section C dereferences the pointer. The pointer may or may not have been checked for nullity.
- Section D does not dereference the pointer. The pointer may or may not have been checked for nullity.
Which sections need to be investigated further?
Stacy, W., & MacMillan, J. (1995). Cognitive bias in software engineering. Communications of the ACM, 38(6), 57-63.
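As a sketch of the underlying logic (not part of the original slides), the four sections map onto the four Wason cards for the implication "dereferenced (p) implies checked (q)"; only the p and not-q cases can hide a violation. A minimal Python illustration, with the encodings of the sections as assumptions drawn from their descriptions:

    # Sections encoded by what is KNOWN about them, for the implication
    # "if dereferenced (p) then checked for nullity (q)".
    # A violation is a spot where p is true and q is false.
    sections = {
        "A": {"q": True},    # known: checked;       unknown: dereferenced?
        "B": {"q": False},   # known: not checked;   unknown: dereferenced?
        "C": {"p": True},    # known: dereferenced;  unknown: checked?
        "D": {"p": False},   # known: not dereferenced
    }

    def may_hide_violation(known):
        """A section needs inspection if some assignment of its unknown
        facts yields p=True and q=False (a dereference without a check)."""
        candidates = [
            {"p": p, "q": q}
            for p in (True, False)
            for q in (True, False)
            if all({"p": p, "q": q}[k] == v for k, v in known.items())
        ]
        return any(c["p"] and not c["q"] for c in candidates)

    print([name for name, known in sections.items() if may_hide_violation(known)])
    # -> ['B', 'C']  (the not-q and p cases, exactly as in the card task)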

Confirmation Bias Metrics Set: interactive test metrics and written test metrics. Next step: definition of the metrics suite.

Confirmation Bias Metrics Set: Some Practical Results Interactive test outcome: hypothesis testing strategy (test severity, binned by problem solving steps). Written test outcome: Reich and Ruth's falsifier/verifier/matcher classification (falsifier, verifier, matcher, or none). Group 1*: developers at a GSM/telecommunications company (29 subjects). Group 8*: computer engineering PhD candidates with a minimum of 2 years of development experience (36 subjects). [Charts comparing the two groups are not reproduced in this transcription.]

Influence of Developers' Confirmation Bias on Software Quality, Part 1 Research Question 2: How do the confirmation biases of developers affect software quality?

Influence of Developers' Confirmation Bias on Software Quality, Part 1 Steps of the analysis:
1) Formation of developer groups.
2) Estimation of each developer group's confirmation bias metric values from the individual values.
3) Measurement of the defect rate for each developer group.
4) Analysis of the Pearson correlation between the developer groups' confirmation bias metrics and their defect rates.
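A minimal sketch of the correlation step (step 4), using hypothetical per-group metric values and defect rates, since the actual data are not in the slides:

    # Pearson correlation between a confirmation bias metric and defect
    # rates, one value per developer group (hypothetical numbers).
    from scipy.stats import pearsonr

    bias_metric = [0.62, 0.55, 0.71, 0.48, 0.66]   # e.g., group means
    defect_rate = [0.31, 0.24, 0.39, 0.18, 0.33]   # defects per group

    r, p_value = pearsonr(bias_metric, defect_rate)
    print(f"r = {r:.2f}, p = {p_value:.3f}")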

Influence of Developers' Confirmation Bias on Software Quality, Part 1 Estimation of the correlation between the developer groups' confirmation bias metrics (interactive test) and defect rates. Results for Group 1* and Group 8* (tables not reproduced here) were interpreted using the conventional effect sizes proposed by Cohen.

Influence of Developers' Confirmation Bias on Software Quality, Part 1 Estimation of the correlation between the developer groups' confirmation bias metrics (written test) and defect rates. Results for Group 1* and Group 8* (tables not reproduced here) were interpreted using the conventional effect sizes proposed by Cohen.

Influence of Developers' Confirmation Bias on Software Quality, Part 2 Research Question 3: How well do measures of confirmation bias perform in predicting the defect-prone parts of software?

Defect Prediction Models Software quality is often measured by the number of defects in the software. Testing takes roughly 50% of the overall time in the software development life cycle (SDLC), so oracles/predictors can be used to supplement testing activities and allocate testing resources effectively. Typical ingredients include metrics data (e.g., the NASA Metrics Data repository or company metrics data), metric weighting schemes (direct usage of metrics, equal weighting, or InfoGain/PCA-based weighting), and classifiers such as decision trees and Naive Bayes.

Defect Prediction Models At the intersection of AI and software engineering. How can we enhance the performance of defect prediction models?
- Data content: product/process-related metrics (design metrics, file dependency graphs, churn metrics, static code metrics, CGBR) and people-related metrics (organizational metrics, number of developers, developer experience, social interaction networks).
- Data size: under-sampling has outperformed over-sampling; micro-sampling.
- Algorithms: k-NN, Naive Bayes, Bayesian networks, neural networks, SVM, logistic regression, etc.

Influence of Developers' Confirmation Bias on Software Quality, Part 2 Construction of the prediction model (also used in the missing data problem below). Algorithm: Naive Bayes. Input data: static code, churn, and confirmation bias metrics (models are constructed for each combination of these metrics). Preprocessing: under-sampling. Validation: 10x10 cross-validation. Performance measures: (list not transcribed).
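A minimal sketch of this setup with sklearn and an imblearn-style random under-sampler; the feature layout and the use of recall (probability of detection) as the reported score are illustrative assumptions, not the authors' exact pipeline:

    # Naive Bayes defect predictor with under-sampling and 10x10
    # cross-validation. X (numpy array) holds, e.g., static code, churn,
    # and confirmation bias metrics; y marks defective modules with 1.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import RepeatedStratifiedKFold
    from imblearn.under_sampling import RandomUnderSampler

    def cross_validate(X, y):
        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10)
        recalls = []
        for train_idx, test_idx in cv.split(X, y):
            # Under-sample the majority (defect-free) class in training only.
            X_bal, y_bal = RandomUnderSampler().fit_resample(X[train_idx], y[train_idx])
            model = GaussianNB().fit(X_bal, y_bal)
            pred = model.predict(X[test_idx])
            true = y[test_idx]
            recalls.append((pred[true == 1] == 1).mean())  # pd-style score
        return np.mean(recalls)

Under-sampling is applied inside each training fold, never to the test fold, so the reported performance reflects the natural class imbalance of the data.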

Influence of Developers' Confirmation Bias on Software Quality, Part 2 Results were reported on five datasets: ERP, Telecom1, Telecom2, Telecom3, and Telecom4 (result tables not reproduced in this transcription).

Influence of Developers' Confirmation Bias on Software Quality, Part 2 Results summary: confirmation bias is a single human aspect, yet defect prediction models built using only confirmation bias metrics performed comparably to models that use static code metrics and churn metrics in predicting the defect-prone parts of software. Therefore, we should further investigate other human aspects.

Current Work: The Impact of Confirmation Bias on the Release-based Defect Prediction of Developer Groups. Problem: predicting the defect rates of developer groups for the next releases of a software product, using confirmation bias metrics from current and past releases. Motivation: a step towards task assignment; if a group's predicted defect rate is high, avoid that group, otherwise the group is OK. Solution: use Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR). Methodology: train the model on releases 1, 2, ..., i-1 and test it on the i-th release. Results were reported as residuals per developer group index for the Telecom1, ERP, and Telecom2 datasets; for the ERP dataset the residual range was [-0.08, 0.04].
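A minimal sketch of the release-based setup with sklearn, under the assumption that each release contributes one feature row per developer group (an illustrative data layout, not necessarily the authors'):

    # Release-based prediction: train on releases 1..i-1, test on release i.
    # PLSR is used directly; PCR is PCA followed by linear regression.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    def predict_release_i(X_by_release, y_by_release, i, n_components=2):
        """List index r-1 holds release r; i is the 1-based release to test."""
        X_train = np.vstack(X_by_release[:i-1])       # releases 1 .. i-1
        y_train = np.concatenate(y_by_release[:i-1])  # defect rates per group
        plsr = PLSRegression(n_components=n_components).fit(X_train, y_train)
        pcr = make_pipeline(PCA(n_components=n_components),
                            LinearRegression()).fit(X_train, y_train)
        X_test = X_by_release[i-1]
        return plsr.predict(X_test).ravel(), pcr.predict(X_test)

The residuals on the slides would then be the differences between these predictions and the observed defect rates of the groups in release i.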

Current Work: Dealing with Missing Data. Problem: collecting data (e.g., confirmation bias metrics) through interviews/tests might be challenging due to the tight schedules of developers, evaluation apprehension, lack of motivation, and staff turnover; all of these result in a missing data problem. Solution: use the Expectation Maximization (EM) algorithm to impute missing data. Methodology (experimental setup): form the 2^N - 2 different missing data configurations (N: total number of developer groups); use EM to impute the missing data; build defect prediction models using the imputed data; compare the obtained performance results with the performance of prediction models built using the complete data. A sketch of the imputation step follows.
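A minimal sketch of EM-based imputation under a multivariate Gaussian model, together with the enumeration of the 2^N - 2 missing configurations; this is an illustrative implementation assuming Gaussian metric vectors, not the authors' code:

    # EM imputation: alternate between filling missing entries with their
    # conditional expectations (E-step) and re-estimating the mean and
    # covariance from the completed data (M-step).
    import numpy as np
    from itertools import combinations

    def em_impute(X, n_iter=100, tol=1e-6):
        """Fill np.nan entries of X under a multivariate Gaussian model."""
        X = np.array(X, dtype=float)
        n, d = X.shape
        miss = np.isnan(X)
        mu = np.nanmean(X, axis=0)                     # initial mean
        sigma = np.diag(np.nanvar(X, axis=0) + 1e-9)   # initial covariance
        X_hat = np.where(miss, mu, X)
        for _ in range(n_iter):
            corr = np.zeros((d, d))  # accumulated conditional covariances
            for i in range(n):
                m, o = miss[i], ~miss[i]
                if not m.any() or not o.any():
                    continue  # nothing missing, or fully missing row
                # E-step: conditional mean of missing given observed entries
                w = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                X_hat[i, m] = mu[m] + w @ (X_hat[i, o] - mu[o])
                corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - w @ sigma[np.ix_(o, m)]
            # M-step: re-estimate mean and covariance from completed data
            mu_new = X_hat.mean(axis=0)
            diff = X_hat - mu_new
            sigma = (diff.T @ diff + corr) / n
            converged = np.linalg.norm(mu_new - mu) < tol
            mu = mu_new
            if converged:
                break
        return X_hat

    def missing_configurations(N):
        """All 2**N - 2 nonempty proper subsets of the N developer groups."""
        return [c for k in range(1, N) for c in combinations(range(N), k)]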

Current Work: Dealing with Missing Data. The dataset, the construction of the model, and the estimation of the performance criteria are as in the previous work on defect prediction. Experimental results were reported for the ERP, Telecom1, Telecom2, and Telecom3 datasets (tables not reproduced in this transcription).

Current Work: Confirmation Bias Metrics: a new metrics suite proposal to measure the thought processes of developers. We initially identified a confirmation bias metrics set. Our current goal is to complete the following to-do list:
1) Form the theoretical basis. (Done!)
2) Refine the existing metrics set, i.e., empirically demonstrate the feasibility of our metrics. (We are here!)
3) Formulate a single derived metric using the refined metrics set.
4) Analytically evaluate our metrics suite and the single derived metric against the principles of measurement theory.
5) Empirically validate the feasibility of the single derived metric.

Current Work: Refine Existing Metrics Set. Criteria for the final metrics suite: metrics should not be highly correlated with each other (check the correlation between defect rates and the values of each metric), and metrics should be able to differentiate the problematic software product from the rest. Metric name: positivecompatible (χ²: 84.9, df: 4, across the Telecom1, Telecom2, and ERP datasets). Telecom1 is currently experiencing serious post-release defects; Telecom2 and ERP are mission-critical software, as they include billing and charging modules.

Current Work: Refine Existing Metrics Set (cont'd). Criteria for the final metrics suite: metrics should be able to differentiate the problematic software product from the rest. Metric name: Ind ElimEnum (Wason's eliminative/enumerative index; χ²: 100, df: 6, across the Telecom1, Telecom2, and ERP datasets). Telecom1 is currently experiencing serious post-release defects; Telecom2 and ERP are mission-critical software, as they include billing and charging modules.
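A minimal sketch of this differentiation check with scipy, assuming a contingency table of metric-value bins per product; the counts below are hypothetical, not the slide's data:

    # Chi-square test of independence: do the distributions of a
    # confirmation bias metric differ across software products?
    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: products (Telecom1, Telecom2, ERP); columns: metric bins.
    table = np.array([[12,  5,  3],    # hypothetical counts
                      [ 4, 10,  6],
                      [ 3,  7, 11]])
    chi2, p, df, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, df = {df}, p = {p:.4f}")

A large χ² with a small p-value indicates that the metric's distribution depends on the product, i.e., the metric can separate the problematic product from the rest.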

Current Work: Formulate a Single Derived Metric. To make the interpretation of the results easier, we formulated a single derived metric to quantify the confirmation bias level. Confirmation bias level: the deviation of the confirmation bias metric values from the corresponding ideal metric values.
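The slides do not give the exact formula; one natural reading, shown here purely as an assumption, is a normalized distance between a subject's metric vector and the ideal vector:

    # Hypothetical realization of "deviation from ideal metric values":
    # a normalized Euclidean distance, so 0 means bias-free behaviour.
    import numpy as np

    def confirmation_bias_level(metrics, ideal, scale):
        """metrics, ideal, scale: 1-D arrays, one entry per metric;
        scale normalizes metrics measured on different ranges."""
        m, i, s = map(np.asarray, (metrics, ideal, scale))
        return np.linalg.norm((m - i) / s)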

Current Work: Analytically evaluate our metrics suite and the single derived metric against the principles of measurement theory. According to measurement theory, we begin with a set of objects, each of which has one or more common attributes, each of which in turn can be divided into exclusive and exhaustive equivalence classes. The objects and the relationships between them constitute an Empirical Relational System (ERS). In parallel, we construct a Numerical Relational System (NRS) comprising numbers and the relationships between them. Example: let M(x) be the value of the variable length for rod x; we assign numbers such that M(x) ≥ M(y) if and only if x ⪰ y, where ⪰ represents "not shorter than". We then establish a homomorphism from the ERS, denoted [A, ⪰], where A is the set of rods, to the NRS, denoted [R, ≥].
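As a compact restatement of the representation condition implied above (standard measurement theory, not a new result of the slides), in LaTeX:

    M : A \to \mathbb{R} \quad \text{is a valid measure iff} \quad
    \forall x, y \in A : \; x \succeq y \iff M(x) \ge M(y)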

Current Work: Analytically evaluate our metrics suite and the single derived metric against the principles of measurement theory. Question: which concepts should be inherited from measurement theory so that the following are prevented: lacking desirable measurement properties, and being insufficiently generalized? Some formulations of measurement fail for disciplines such as psychology (for example, those requiring a concatenation operation x ∘ y = z); such formulations should be identified, and appropriate ones inherited. Goals: to avoid the criticism regarding the lack of a theoretical basis in the formation of a metrics set, and to form a formal methodology for defining metrics sets for other cognitive aspects of people.

Related Publications
- G. Calikli and A. Bener, "The Impact of Confirmation Bias on the Release-based Defect Prediction of Developer Groups," 25th Conference on Software Engineering and Knowledge Engineering (SEKE 2013), Boston, USA, 2013 (submitted).
- G. Calikli and A. Bener, "Influence of Confirmation Biases of Developers on Software Quality: An Empirical Study," Software Quality Journal, 2012.
- G. Calikli, B. Caglayan, A. Tosun, and A. Bener, "Modeling Human Aspects to Enhance Software Quality Management," International Conference on Information Systems (ICIS 2012), Orlando, Florida, USA, December 2012.
- B. Caglayan, A. Tosun, G. Calikli, T. Aytac, A. Bener, and B. Turhan, "Dione: An Integrated Measurement and Defect Prediction Solution," 20th International Symposium on the Foundations of Software Engineering, Cary, North Carolina, USA, September 2012.
- G. Calikli and A. Bener, "Empirical Analyses of the Factors Affecting Confirmation Bias and the Effects of Confirmation Bias on Software Developer/Tester Performance," PROMISE 2010, Timişoara, Romania, September 12-13, 2010.
- G. Calikli and A. Bener, "Preliminary Analysis of the Effects of Confirmation Bias on Software Defect Density," ESEM 2010, Bozen, Italy, September 16-17, 2010.
- G. Calikli, B. Arslan, and A. Bener, "Confirmation Bias in Software Development and Testing: An Analysis of the Effects of Company Size, Experience and Reasoning Skills," 22nd Annual Psychology of Programming Interest Group Workshop, September 19-21, 2010.
- G. Calikli, A. Bener, and B. Arslan, "An Analysis of the Effects of Company Culture, Education and Experience on Confirmation Bias Levels of Software Developers and Testers," ICSE 2010, May 2-8, Cape Town.
- G. Calikli, A. Tosun, A. Bener, and M. Celik, "The Effect of Granularity Level on Software Defect Prediction," Proceedings of the 24th International Symposium on Computer and Information Sciences (ISCIS 2009), pp. 531-536.

THANK YOU. ANY QUESTIONS? Gül Çalıklı: gcalikli@ryerson.ca