Confirmation Bias as a Human Aspect in Software Engineering


1 Confirmation Bias as a Human Aspect in Software Engineering Gul Calikli, PhD Data Science Laboratory, Department of Mechanical and Industrial Engineering, Ryerson University

2 Why Human Aspects in Software Engineering? Goal: enhance decision making under uncertainty, so that managers can allocate resources efficiently during any phase of the SDLC. Typical questions: Which parts of the software should be prioritized for testing? Who should test or develop the most critical parts of the software? Who should fix the bugs in the most problematic parts of the software? Who should or should not develop or maintain the same source files? Whom should we hire as a developer, tester, analyst, or designer?

3 Why Human Aspects in Software Engineering? People's thought processes have a significant impact on software quality, because software is analyzed, designed, developed, tested, and managed by people. People use heuristics to solve problems in daily life. When a heuristic fails to produce a correct judgment, the result is a cognitive bias. Heuristics employed in daily software engineering activities may likewise result in cognitive biases, which can lead to defects. Some common types of cognitive bias: confirmation bias, anchoring and adjustment, availability, and representativeness. We focus on confirmation bias.

4 Confirmation Bias in Software Engineering Confirmation bias is defined as the tendency of people to seek evidence to verify a hypothesis rather than seeking evidence to refute that hypothesis.

5 Confirmation Bias in Software Engineering Due to confirmation bias, developers tend to write unit tests that show their program works rather than tests that try to break it. At every level of software testing, the testing strategy must include adequate attempts to make the code fail, in order to reduce software defect density.
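The contrast between tests that confirm and tests that try to break the code can be sketched as follows. This is only an illustration: the function `parse_age` and its validation rules are hypothetical and do not come from the slides.

```python
def parse_age(text):
    """Hypothetical function under test: parse an age between 0 and 150."""
    value = int(text)  # raises ValueError for non-integer input
    if value < 0 or value > 150:
        raise ValueError("age out of range")
    return value


def raises_value_error(text):
    """Return True if parse_age rejects the input with ValueError."""
    try:
        parse_age(text)
    except ValueError:
        return True
    return False


# Positive tests: inputs chosen to show that the code works.
positive_cases = {"0": 0, "42": 42, "150": 150}

# Negative tests: inputs chosen to try to make the code fail.
negative_cases = ["-1", "151", "abc", "", "4.2"]

positive_ok = all(parse_age(text) == expected
                  for text, expected in positive_cases.items())
negative_ok = all(raises_value_error(text) for text in negative_cases)
```

A suite made only of `positive_cases` would pass even if the range check were missing, which is the kind of gap a confirmatory testing habit tends to leave.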

6 Methodology to Quantify Confirmation Bias Research Question 1: How can we identify measures of confirmation bias in relation to the software development process?

7 Methodology to Quantify Confirmation Bias Challenge: Quantifying confirmation bias to perform empirical analyses. Proposed Solution: Our methodology is an iterative process and it mainly consists of the following steps: 1) Preparation of the confirmation bias test 2) Formation of the confirmation bias metrics set

8 Confirmation Bias Test The confirmation bias test consists of the following: an interactive test based on Wason's Rule Discovery Task, and a written test based on Wason's Selection Task.

Written test content:
Question Type                       No. of Questions
Abstract questions                  8
Thematic questions                  6
SW development/testing questions    8
TOTAL                               22

9 Confirmation Bias Test Wason's Rule Discovery Task Goal: Discover the correct rule. Initially, the subject is given three numbers that conform to a simple rule. Experiment protocol:

repeat until correct rule is announced
    write down three numbers & reasons for choice;
    receive feedback from tester;
    if you are sure about the rule
        announce the rule;
    end
    if you want to terminate
        break; % terminated
    end
end
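The protocol can be simulated in a few lines. In this sketch the hidden rule ("strictly ascending") and the subject's narrower hypothesis ("each number is 2 more than the previous one") are assumptions chosen for illustration; the slides do not state which rule or starting triple was used.

```python
def true_rule(triple):
    """Tester's hidden rule (assumed here): strictly ascending numbers."""
    a, b, c = triple
    return a < b < c


def hypothesis(triple):
    """Subject's hypothesis (assumed here): each number is the previous plus 2."""
    a, b, c = triple
    return b - a == 2 and c - b == 2


def run_session(triples):
    """Return the triples whose feedback contradicts the subject's hypothesis."""
    surprises = []
    for triple in triples:
        feedback = true_rule(triple)    # tester's yes/no answer
        predicted = hypothesis(triple)  # answer the subject expects
        if feedback != predicted:
            surprises.append(triple)
    return surprises


# Only triples that fit the hypothesis: feedback never contradicts it.
confirming_strategy = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]
# Triples that deliberately violate the hypothesis as well.
falsifying_strategy = [(2, 4, 6), (1, 2, 3), (6, 4, 2), (5, 10, 20)]

print(run_session(confirming_strategy))  # → []
print(run_session(falsifying_strategy))  # → [(1, 2, 3), (5, 10, 20)]
```

The confirming strategy produces no contradicting feedback, so the subject has no reason to revise a wrong hypothesis; the falsifying strategy exposes the mismatch after two triples.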

10 Confirmation Bias Test Wason's Selection Task Goal: To find out which of the four cards should be turned over to test the validity of the following statement (of the form "if p then q"): If there is a D on one side of the card, then it has a 3 on its other side. The four cards show p, q, not-p, and not-q.
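One way to reason about the task is to ask, for each visible face, whether any possible hidden face could violate the statement. The sketch below does this by brute force. The faces "K" and 7 are hypothetical stand-ins for not-p and not-q, since the slide does not say what those cards show.

```python
LETTERS = ["D", "K"]  # "K" is a hypothetical not-p face
NUMBERS = [3, 7]      # 7 is a hypothetical not-q face


def violates(letter, number):
    """True if a card with this letter and number breaks 'if D then 3'."""
    return letter == "D" and number != 3


def must_turn(visible):
    """A card needs turning only if some hidden face could violate the rule."""
    if visible in LETTERS:
        return any(violates(visible, number) for number in NUMBERS)
    return any(violates(letter, visible) for letter in LETTERS)


cards = {"p": "D", "not-p": "K", "q": 3, "not-q": 7}
to_turn = [label for label, face in cards.items() if must_turn(face)]
print(to_turn)  # → ['p', 'not-q']
```

Turning the q card can only confirm the statement, never refute it, which is why choosing it is usually read as a sign of confirmatory reasoning.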

11 Example: Wason's Rule Discovery Task in Relation to Unit Testing Wason's Rule Discovery Task: Subjects have a tendency to select many triples (i.e., test cases) that are consistent with their hypotheses and few that are inconsistent with them. T: set of triples conforming to the correct rule. H: set of triples conforming to the hypothesis in the subject's mind. Observed similarity with functional (black-box) testing: Program testers may select many test cases consistent with the program specifications (positive tests) and few that are inconsistent with them (negative tests).

12 Example: Wason's Selection Task in Relation to Unit Testing Example: Suppose you want to make sure that a program avoids dereferencing a null pointer by always checking before dereferencing. The rule to test is: if a pointer is dereferenced, then it is checked for nullity. Someone tells you there are only four sections of code to be tested, and they have determined the following things about those sections: Section A checks whether the pointer is null. The pointer may or may not be dereferenced there. Section B does not check whether the pointer is null. The pointer may or may not be dereferenced there. Section C dereferences the pointer. The pointer may or may not have been checked for nullity. Section D does not dereference the pointer. The pointer may or may not have been checked for nullity. Which sections need to be investigated further? Source: Stacy, W., & MacMillan, J. (1995). Cognitive bias in software engineering. Communications of the ACM, 38(6).

13 Confirmation Bias Metrics Set Interactive test metrics; written test metrics. Next step in the definition of the metrics suite.

14 Confirmation Bias Metrics Set: Some Practical Results Interactive test outcome: hypothesis testing strategy. Written test outcome: Reich and Ruth's Falsifier/Verifier/Matcher classification. [Figures: test severity; bins of problem-solving steps by class (Falsifier, Verifier, Matcher, None).] Group 1*: developers of a GSM/telecommunications company (29 subjects). Group 8*: Computer Engineering PhD candidates with a minimum of 2 years of development experience (36 subjects).

15 Influence of Developers' Confirmation Bias on Software Quality, Part 1 Research Question 2: How do the confirmation biases of developers affect software quality?

16 Influence of Developers' Confirmation Bias on Software Quality, Part 1 Dataset: Steps of the analysis: 1) Formation of developer groups 2) Estimation of developer groups' confirmation bias metric values from individual values 3) Measurement of the defect rate for each developer group 4) Analysis of the Pearson correlation between developer groups' confirmation bias metrics and defect rates
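The analysis steps can be sketched with plain Python. Two things here are assumptions rather than the study's method: group values are aggregated with the arithmetic mean (the slide does not preserve the aggregation formula), and all numbers are made-up placeholder data.

```python
import math


def group_metric(individual_values):
    """Assumed aggregation: arithmetic mean of the members' metric values."""
    return sum(individual_values) / len(individual_values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: members' metric values and defect rate per developer group.
members = {"G1": [0.2, 0.4], "G2": [0.5, 0.7], "G3": [0.8, 1.0]}
defect_rate = {"G1": 0.1, "G2": 0.2, "G3": 0.3}

groups = sorted(members)
r = pearson([group_metric(members[g]) for g in groups],
            [defect_rate[g] for g in groups])
```

With real data, `r` would then be compared against Cohen's conventional effect size thresholds.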

17 Influence of Developers' Confirmation Bias on Software Quality, Part 1 Estimation of the correlation between developer groups' confirmation bias metrics (interactive test) and defect rates. Results for Group 1* and Group 8*: [tables not preserved]. Conventional effect sizes as offered by Cohen.

18 Influence of Developers' Confirmation Bias on Software Quality, Part 1 Estimation of the correlation between developer groups' confirmation bias metrics (written test) and defect rates. Results for Group 1* and Group 8*: [tables not preserved]. Conventional effect sizes as offered by Cohen.

19 Influence of Developers' Confirmation Bias on Software Quality, Part 2 Research Question 3: How do measures of confirmation bias perform in predicting defect-prone parts of software?

20 Defect Prediction Models Software quality is often measured by the number of defects in the software. Testing takes ~50% of the overall time in the Software Development Lifecycle (SDLC). Oracles/predictors can be used to supplement testing activities for effective allocation of testing resources. [Diagram labels: NASA Metrics Data; Company Metrics Data; Direct Usage of Metrics; Equal Weighting Metrics; InfoGain/PCA Weighted Metrics; Decision Tree; Naïve Bayes; Classification.]

21 Defect Prediction Models At the intersection of AI and SWE. How can we enhance the performance of defect prediction models? Data content: product/process-related metrics (design metrics, file dependency graphs, churn metrics, static code metrics, CGBR) and people-related metrics (organizational metrics, number of developers, developer experience, social interaction networks). Data size: under-sampling outperformed over-sampling; micro-sampling. Algorithms: k-NN, Naïve Bayes, Bayesian networks, neural networks, SVM, logistic regression, ...

22 Influence of Developers' Confirmation Bias on Software Quality, Part 2 Construction of the prediction model (also used in the missing data problem). Algorithm: Naïve Bayes. Input data: static code, churn, and confirmation bias metrics (models are constructed for each combination of these metrics). Preprocessing: under-sampling. Validation: 10x10 cross-validation. Performance measures:
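The evaluation setup can be sketched without any machine learning library. The sketch reads "10x10 cross-validation" as 10 repetitions of 10-fold cross-validation and applies under-sampling to the training portion only; both readings are assumptions, as is the placeholder data. The Naïve Bayes classifier itself is left out.

```python
import random


def undersample(indices, labels, rng):
    """Randomly drop majority-class items so both classes are equally frequent."""
    defective = [i for i in indices if labels[i] == 1]
    clean = [i for i in indices if labels[i] == 0]
    minority, majority = sorted([defective, clean], key=len)
    return sorted(minority + rng.sample(majority, len(minority)))


def repeated_folds(n_items, rng, repeats=10, folds=10):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    for _ in range(repeats):
        order = list(range(n_items))
        rng.shuffle(order)  # new random partition for every repetition
        for k in range(folds):
            test = order[k::folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


rng = random.Random(0)
labels = [1] * 20 + [0] * 80  # hypothetical module labels, 1 = defective
splits = list(repeated_folds(len(labels), rng))
train, test = splits[0]
balanced_train = undersample(train, labels, rng)
```

A classifier would be fitted on `balanced_train` and scored on `test` for each of the 100 splits, and the scores averaged.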

23 Influence of Developers' Confirmation Bias on Software Quality, Part 2 Results for datasets ERP, Telecom1, Telecom2, Telecom3, and Telecom4: [result tables not preserved]

24 Influence of Developers' Confirmation Bias on Software Quality, Part 2 Results summary: Confirmation bias is a single human aspect, yet defect prediction models built using only confirmation bias metrics perform comparably to models that use static code metrics and churn metrics when predicting defect-prone parts of software. Therefore, we should further investigate other human aspects.

25 Current Work: The Impact of Confirmation Bias on the Release-based Defect Prediction of Developer Groups Problem: Predicting the defect rates of developer groups for the next releases of a software product. Motivation: Towards task assignment. Decision based on a group's defect rate in current and past releases: if it is unknown, predicting the defect rates of developer groups using confirmation bias metrics is required; if it is high, avoid that group; if it is low, the group is OK. Solution: Use Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR). Methodology: Train the model with releases 1, 2, ..., i-1 and test it on the i-th release. Results (datasets Telecom1, ERP, Telecom2): [plots of residuals against developer group indices; residual range: [ ]]
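The release-based train/test scheme and the residual computation can be sketched as below. PLSR and PCR are not implemented here; a simple mean-of-past-releases baseline is swapped in purely to make the sketch runnable, and the defect rates are made-up placeholder data.

```python
# Hypothetical data: release -> {developer group: defect rate}.
defect_rates = {
    "r1": {"g1": 0.10, "g2": 0.30},
    "r2": {"g1": 0.20, "g2": 0.30},
    "r3": {"g1": 0.30, "g2": 0.60},
}


def release_splits(release_ids):
    """Yield (training releases, test release): train on 1..i-1, test on i."""
    for i in range(1, len(release_ids)):
        yield release_ids[:i], release_ids[i]


def baseline_predict(train_ids, group):
    """Stand-in predictor (not PLSR/PCR): mean defect rate over past releases."""
    rates = [defect_rates[release][group] for release in train_ids]
    return sum(rates) / len(rates)


# Residual = actual defect rate minus predicted defect rate.
residuals = {}
for train_ids, test_id in release_splits(list(defect_rates)):
    for group, actual in defect_rates[test_id].items():
        residuals[(test_id, group)] = actual - baseline_predict(train_ids, group)
```

Replacing `baseline_predict` with a regression model that takes confirmation bias metrics as input would leave the splitting and residual logic unchanged.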

26 Current Work: Dealing with Missing Data Problem: Collecting data (e.g., confirmation bias metrics) through interviews and tests can be challenging because of developers' tight schedules, evaluation apprehension, lack of motivation, and staff turnover. All of these result in a missing data problem. Solution: Use the Expectation Maximization (EM) algorithm to impute missing data. Methodology (experimental setup): 1) Form 2^N - 2 different missing data configurations (N: total number of developer groups). 2) Use EM to impute the missing data. 3) Build defect prediction models using the imputed data. 4) Compare the obtained performance results with the performance of prediction models built using complete data.
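The count of 2^N - 2 configurations matches the number of ways to pick a non-empty, proper subset of the N developer groups as "missing", excluding the cases where no group or every group is missing. That reading is an assumption, since the slide does not spell it out. The sketch below only enumerates the configurations and does not implement EM imputation.

```python
from itertools import combinations


def missing_configurations(groups):
    """All non-empty proper subsets of groups whose data is treated as missing."""
    configs = []
    for size in range(1, len(groups)):  # sizes 1 .. N-1
        configs.extend(combinations(groups, size))
    return configs


groups = ["g1", "g2", "g3", "g4"]  # hypothetical developer group names
configs = missing_configurations(groups)
print(len(configs))  # → 14, i.e. 2**4 - 2
```

Each configuration would then be imputed and evaluated separately, so the number of experiments grows exponentially with the number of groups.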

27 Current Work: Dealing with Missing Data Refer to the previous work on defect prediction for the dataset, the construction of the model, and the estimation of performance criteria. Experimental results for datasets ERP, Telecom1, Telecom2, and Telecom3: [result tables not preserved]

28 Current Work: Confirmation Bias Metrics: A new metrics suite proposal to measure the thought processes of developers. We initially identified a confirmation bias metrics set. Our current goal is to complete the following to-do list: 1) Form a theoretical basis (done). 2) Refine the existing metrics set, to empirically demonstrate the feasibility of our metrics (we are here). 3) Formulate a single derived metric using the refined metrics set. 4) Analytically evaluate our metrics suite and the single derived metric against the principles of measurement theory. 5) Empirically validate the feasibility of the single derived metric.

29 Current Work: Refine Existing Metrics Set Criteria for the final metrics suite: 1) Metrics should not be highly correlated with each other. 2) Check for the correlation between defect rates and the values of each metric. 3) Metrics should be able to differentiate a problematic software product from the rest. Metric name: positivecompatible; χ2: 84.9, df: 4. Telecom1 is currently experiencing serious post-release defects. Telecom2 and ERP are mission-critical software, as they include billing and charging modules. [Charts: Telecom1, Telecom2, ERP]
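One common way to obtain a χ2 value with its degrees of freedom when comparing products is a chi-square test of independence on a contingency table of products against metric categories. Whether the study used exactly this test is an assumption; the slide gives only the statistic and df. The counts below are made-up placeholders.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are products, columns are three metric categories.
counts = [
    [30, 10, 5],   # Telecom1
    [12, 20, 13],  # Telecom2
    [10, 15, 20],  # ERP
]
statistic, df = chi_square(counts)
```

With three products and three categories, df is (3 - 1) * (3 - 1) = 4.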

30 Current Work: Refine Existing Metrics Set (cont'd) Criteria for the final metrics suite: Metrics should be able to differentiate a problematic software product from the rest. Metric name: Ind ElimEnum (Wason's Eliminative/Enumerative Index); χ2: 100, df: 6. Telecom1 is currently experiencing serious post-release defects. Telecom2 and ERP are mission-critical software, as they include billing and charging modules. [Charts: Telecom1, Telecom2, ERP]

31 Current Work: Formulate a single derived metric To make the results easier to interpret, we formulated a single derived metric that quantifies the confirmation bias level. Confirmation bias level: the deviation of the confirmation bias metric values from the corresponding ideal metric values.
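The slide does not give the formula for the deviation. As one possible reading, the sketch below uses the Euclidean distance between a subject's metric vector and the ideal vector. The distance choice, the metric names, and all values are hypothetical and should not be taken as the authors' definition.

```python
import math


def confirmation_bias_level(values, ideal):
    """Assumed deviation measure: Euclidean distance from the ideal values."""
    if values.keys() != ideal.keys():
        raise ValueError("metric names must match")
    return math.sqrt(sum((values[name] - ideal[name]) ** 2 for name in ideal))


# Hypothetical metric names and values, each scaled to the range [0, 1].
ideal = {"metric_a": 1.0, "metric_b": 0.0}
subject = {"metric_a": 0.4, "metric_b": 0.8}

level = confirmation_bias_level(subject, ideal)
```

A level of 0 would mean the subject's metrics coincide with the ideal values; larger values mean a larger deviation.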

32 Current Work: To analytically evaluate our metrics suite and single derived metric against the principles of measurement theory. According to measurement theory: We begin with a set of objects, each of which has one or more common attributes, each of which in turn can be divided into exclusive and exhaustive equivalence classes. The objects and the relationships between them constitute an Empirical Relational System (ERS). In parallel, we construct a Numerical Relational System (NRS) comprising numbers and the relationships between them. Example: Let $M(x)$ be the value of the variable length for rod $x$. We assign numbers such that $M(x) \geq M(y)$ if and only if $x \succeq y$, where $\succeq$ represents "not shorter than". Establish a homomorphism from the ERS, denoted by $[A, \succeq]$, where $A$ represents the set of rods, to the NRS, denoted by $[R, \geq]$.
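The homomorphism requirement in the rod example can be checked mechanically for a finite set of objects. In this sketch the rods, their physical lengths, and both candidate measures are made-up illustrations.

```python
from itertools import product


def is_homomorphism(objects, not_shorter_than, measure):
    """Check that M(x) >= M(y) holds exactly when x is not shorter than y."""
    return all(
        (measure[x] >= measure[y]) == not_shorter_than(x, y)
        for x, y in product(objects, repeat=2)
    )


rods = ["a", "b", "c"]
physical_length = {"a": 1.0, "b": 2.5, "c": 2.5}  # hypothetical, in metres


def not_shorter_than(x, y):
    """Empirical relation: outcome of laying rod x next to rod y."""
    return physical_length[x] >= physical_length[y]


length_in_cm = {"a": 100, "b": 250, "c": 250}  # preserves the ordering
arbitrary_ids = {"a": 3, "b": 1, "c": 2}       # ignores the ordering

print(is_homomorphism(rods, not_shorter_than, length_in_cm))   # → True
print(is_homomorphism(rods, not_shorter_than, arbitrary_ids))  # → False
```

Any assignment that preserves the empirical ordering passes the check, so the unit of measurement is free to vary, while arbitrary labels fail.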

33 Current Work: To analytically evaluate our metrics suite and single derived metric against the principles of measurement theory. Question: Which concepts should be inherited from measurement theory so that the metrics neither lack desirable measurement properties nor are insufficiently generalized? Some formulations of measurement fail for disciplines such as psychology (example: the concatenation operation x o y = z); such formulations should be identified, and only the appropriate ones should be inherited. Goals: To avoid the criticism regarding the lack of a theoretical base in the formation of a metrics set. To form a formal methodology for defining metrics sets for other cognitive aspects of people.

34 Related Publications
G. Calikli and A. Bener, "The Impact of Confirmation Bias on the Release-based Defect Prediction of Developer Groups", 25th Conference on Software Engineering and Knowledge Engineering (SEKE 2013), Boston, USA, (submitted)
G. Calikli and A. Bener, "Influence of Confirmation Biases of Developers on Software Quality: An Empirical Study", Software Quality Journal, 2012
G. Calikli, B. Caglayan, A. Tosun and A. Bener, "Modeling Human Aspects to Enhance Software Quality Management", 2012 International Conference on Information Systems (ICIS 2012), Orlando, Florida, USA, December,
B. Caglayan, A. Tosun, G. Calikli, T. Aytac, A. Bener, and B. Turhan, "Dione: An Integrated Measurement and Defect Prediction Solution", 20th International Symposium on Foundations of Software Engineering, Cary, North Carolina, USA, September,
G. Calikli and A. Bener, "Empirical Analyses of the Factors Affecting Confirmation Bias and the Effects of Confirmation Bias on Software Developer/Tester Performance", Promise 2010, Timişoara, Romania, September, 12-13,
G. Calikli and A. Bener, "Preliminary Analysis of the Effects of Confirmation Bias on Software Defect Density", ESEM 2010, Bozen, Italy, September, 16-17,
G. Calikli, B. Arslan and A. Bener, "Confirmation Bias in Software Development and Testing: An Analysis of the Effects of Company Size, Experience and Reasoning Skills", 22nd Annual Psychology of Programming Interest Group Workshop, September
G. Calikli, A. Bener, and B. Arslan, "An Analysis of the Effects of Company Culture, Education and Experience on Confirmation Bias Levels of Software Developers and Testers", ICSE 2010, May 2-8, Cape Town.
G. Calikli, A. Tosun, A. Bener, and M. Celik, "The Effect of Granularity Level on Software Defect Prediction", Proceedings of the 24th International Symposium on Computer and Information Sciences (ISCIS 2009), pp

35 Thank you. Any questions? Gül Çalıklı

Defect Prediction Leads to High Quality Product

Defect Prediction Leads to High Quality Product Journal of Software Engineering and Applications, 2011, 4, 639-645 doi:10.4236/jsea.2011.411075 Published Online November 2011 (http://www.scirp.org/journal/jsea) 639 Naheed Azeem, Shazia Usmani Department

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Software Defect Prediction Modeling

Software Defect Prediction Modeling Software Defect Prediction Modeling Burak Turhan Department of Computer Engineering, Bogazici University [email protected] Abstract Defect predictors are helpful tools for project managers and developers.

More information

PREDICTIVE TECHNIQUES IN SOFTWARE ENGINEERING : APPLICATION IN SOFTWARE TESTING

PREDICTIVE TECHNIQUES IN SOFTWARE ENGINEERING : APPLICATION IN SOFTWARE TESTING PREDICTIVE TECHNIQUES IN SOFTWARE ENGINEERING : APPLICATION IN SOFTWARE TESTING Jelber Sayyad Shirabad Lionel C. Briand, Yvan Labiche, Zaheer Bawar Presented By : Faezeh R.Sadeghi Overview Introduction

More information

Software Defect Prediction for Quality Improvement Using Hybrid Approach

Software Defect Prediction for Quality Improvement Using Hybrid Approach Software Defect Prediction for Quality Improvement Using Hybrid Approach 1 Pooja Paramshetti, 2 D. A. Phalke D.Y. Patil College of Engineering, Akurdi, Pune. Savitribai Phule Pune University ABSTRACT In

More information

Software Metrics. Alex Boughton

Software Metrics. Alex Boughton Software Metrics Alex Boughton Executive Summary What are software metrics? Why are software metrics used in industry, and how? Limitations on applying software metrics A framework to help refine and understand

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Call Graph Based Metrics To Evaluate Software Design Quality

Call Graph Based Metrics To Evaluate Software Design Quality Call Graph Based Metrics To Evaluate Software Design Quality Hesham Abandah 1 and Izzat Alsmadi 2 1 JUST University; 2 Yarmouk University [email protected], [email protected] Abstract Software defects

More information

Software Testing Lifecycle

Software Testing Lifecycle STLC-Software Testing Life Cycle SDLC Software Testing Lifecycle Software Testing Life Cycle (STLC) defines the steps/ stages/ phases in testing of software. However, there is no fixed standard STLC in

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Data Collection from Open Source Software Repositories

Data Collection from Open Source Software Repositories Data Collection from Open Source Software Repositories GORAN MAUŠA, TIHANA GALINAC GRBAC SEIP LABORATORY FACULTY OF ENGINEERING UNIVERSITY OF RIJEKA, CROATIA Software Defect Prediction (SDP) Aim: Focus

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Microsoft Azure Machine learning Algorithms

Microsoft Azure Machine learning Algorithms Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql [email protected] http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Software project cost estimation using AI techniques

Software project cost estimation using AI techniques Software project cost estimation using AI techniques Rodríguez Montequín, V.; Villanueva Balsera, J.; Alba González, C.; Martínez Huerta, G. Project Management Area University of Oviedo C/Independencia

More information

GA as a Data Optimization Tool for Predictive Analytics

GA as a Data Optimization Tool for Predictive Analytics GA as a Data Optimization Tool for Predictive Analytics Chandra.J 1, Dr.Nachamai.M 2,Dr.Anitha.S.Pillai 3 1Assistant Professor, Department of computer Science, Christ University, Bangalore,India, [email protected]

More information

Analyze It use cases in telecom & healthcare

Analyze It use cases in telecom & healthcare Analyze It use cases in telecom & healthcare Chung Min Chen, VP of Data Science The views and opinions expressed in this presentation are those of the author and do not necessarily reflect the position

More information

PROFESSIONAL SATISFACTION OF TEACHERS FROM KINDERGARTEN. PRELIMINARY STUDY

PROFESSIONAL SATISFACTION OF TEACHERS FROM KINDERGARTEN. PRELIMINARY STUDY Volume 7, Volume 4, 2014 PROFESSIONAL SATISFACTION OF TEACHERS FROM KINDERGARTEN. PRELIMINARY STUDY Valerica Anghelache Abstract. Professional development is a topic of great interest for all those who

More information

SA Tool Kit release life cycle

SA Tool Kit release life cycle Release management Release management process is a software engineering process intended to oversee the development, testing, deployment and support of software releases. A release is usually a named collection

More information

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One

More information

Regression Testing Based on Comparing Fault Detection by multi criteria before prioritization and after prioritization

Regression Testing Based on Comparing Fault Detection by multi criteria before prioritization and after prioritization Regression Testing Based on Comparing Fault Detection by multi criteria before prioritization and after prioritization KanwalpreetKaur #, Satwinder Singh * #Research Scholar, Dept of Computer Science and

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

More information

Defect Tracking Best Practices

Defect Tracking Best Practices Defect Tracking Best Practices Abstract: Whether an organization is developing a new system or maintaining an existing system, implementing best practices in the defect tracking and management processes

More information

Estimating Software Reliability In the Absence of Data

Estimating Software Reliability In the Absence of Data Estimating Software Reliability In the Absence of Data Joanne Bechta Dugan ([email protected]) Ganesh J. Pai ([email protected]) Department of ECE University of Virginia, Charlottesville, VA NASA OSMA SAS

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Understanding Characteristics of Caravan Insurance Policy Buyer

Understanding Characteristics of Caravan Insurance Policy Buyer Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Development (60 ЕCTS)

Development (60 ЕCTS) Study program Faculty Cycle Software and Application Development (60 ЕCTS) Contemporary Sciences and Technologies Postgraduate ECTS 60 Offered in Tetovo Description of the program The objectives of the

More information

A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach

A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 4 (9): 2436-2441 Science Explorer Publications A Proposed Algorithm for Spam Filtering

More information

Perspectives on Data Mining

Perspectives on Data Mining Perspectives on Data Mining Niall Adams Department of Mathematics, Imperial College London [email protected] April 2009 Objectives Give an introductory overview of data mining (DM) (or Knowledge Discovery

More information

REQUIREMENTS FOR THE MASTER THESIS IN INNOVATION AND TECHNOLOGY MANAGEMENT PROGRAM

REQUIREMENTS FOR THE MASTER THESIS IN INNOVATION AND TECHNOLOGY MANAGEMENT PROGRAM APPROVED BY Protocol No. 18-02-2016 Of 18 February 2016 of the Studies Commission meeting REQUIREMENTS FOR THE MASTER THESIS IN INNOVATION AND TECHNOLOGY MANAGEMENT PROGRAM Vilnius 2016-2017 1 P a g e

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

Adaptive information source selection during hypothesis testing

Adaptive information source selection during hypothesis testing Adaptive information source selection during hypothesis testing Andrew T. Hendrickson ([email protected]) Amy F. Perfors ([email protected]) Daniel J. Navarro ([email protected])

More information

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms Data Mining Techniques forcrm Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge

More information

On the effect of forwarding table size on SDN network utilization

On the effect of forwarding table size on SDN network utilization IBM Haifa Research Lab On the effect of forwarding table size on SDN network utilization Rami Cohen IBM Haifa Research Lab Liane Lewin Eytan Yahoo Research, Haifa Seffi Naor CS Technion, Israel Danny Raz

More information

M. Sugumaran / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (3), 2011, 1001-1006

M. Sugumaran / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (3), 2011, 1001-1006 A Design of Centralized Meeting Scheduler with Distance Metrics M. Sugumaran Department of Computer Science and Engineering,Pondicherry Engineering College, Puducherry, India. Abstract Meeting scheduling

More information

Data Mining Practical Machine Learning Tools and Techniques
