Data Collection from Open Source Software Repositories
1 Data Collection from Open Source Software Repositories
Goran Mauša, Tihana Galinac Grbac
SEIP Laboratory, Faculty of Engineering, University of Rijeka, Croatia
2 Software Defect Prediction (SDP)
Aim: Focus testing effort on software units with a higher probability of fault-proneness
Motivation:
- High testing costs (80% after release)
- Pareto principle can be applied
[Figure: Pareto diagram relating Bugs and Code, labeled 80% / 20%]
SDP approach: Classification based on parameters of size and complexity
Predictive model building:
[Figure: pipeline with the stages Data Collection, Data Preprocessing, Data Division (Training Set, Testing Set), Model Building, Evaluation]
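Whether the Pareto principle holds for a given project can be checked directly from bug counts per unit. The following is a minimal sketch, not part of the original slides; the function name and the sample counts are invented for illustration.

```python
def bug_share_of_top_units(bug_counts, top_fraction=0.2):
    """Return the share of all bugs located in the most fault-prone units.

    bug_counts maps a unit name (e.g. a file) to its number of bugs.
    top_fraction is the fraction of units to inspect (0.2 means top 20%).
    """
    counts = sorted(bug_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always inspect at least one unit, even for very small projects.
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / total


# Hypothetical data: ten files with invented bug counts.
sample_counts = {
    "A.java": 45, "B.java": 35, "C.java": 5, "D.java": 4, "E.java": 3,
    "F.java": 3, "G.java": 2, "H.java": 2, "I.java": 1, "J.java": 0,
}
share = bug_share_of_top_units(sample_counts)
print(f"Top 20% of files contain {share:.0%} of the bugs")
```

A share close to 0.8 for the top 20% of units would match the 80/20 split on the slide; real projects may deviate from it considerably.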
3 Data Collection for SDP
Motivation:
- The context of project development may influence SDP performance
- Small number of available datasets => inability to study the influence of context
Problem:
- Lack of a systematic data collection approach
- Data collection is time-consuming and not trivial
Potential sources of data:
1. Industrial (large telecom software)
   - Rarely available
2. Open repositories (PROMISE gives NASA datasets)
   - Impossible to validate (missing data collection procedure and source code)
   - Often suffer from: missing values, outliers, duplicated entries, imbalance
3. Open source projects (Eclipse, Mozilla, Apache)
   - Increasingly popular, easily validated, expandable
4 Open Source Software Repositories
Linking 2 repositories: source code management & bug tracking
- Structured and unstructured data
- Problem: there is no formal link
- Consequence: different approaches lead to data bias
Important characteristics:
- Bug status: (closed / opened)
- Bug resolution: (fixed / otherwise)
- Bug severity: (blocker - normal / +trivial / +enhancement)
- Repository search order: (start with bugs / source code changes)
- Declaration of defect-free units: (all the unlinked units / unlinked & unchanged)
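Each of the characteristics above is a choice that changes the resulting dataset. The sketch below shows one possible combination of choices; the field names, the status and severity strings, and the selected policy are assumptions made for illustration, not the configuration used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    bug_id: int
    status: str      # e.g. "closed" or "opened" (hypothetical values)
    resolution: str  # e.g. "fixed" or anything else
    severity: str    # e.g. "blocker", "normal", "trivial", "enhancement"


# Assumed policy: only closed and fixed bugs, without trivial/enhancement.
EXCLUDED_SEVERITIES = {"trivial", "enhancement"}


def is_countable_defect(bug):
    """Decide whether a bug report counts as a defect under the assumed policy."""
    return (
        bug.status == "closed"
        and bug.resolution == "fixed"
        and bug.severity not in EXCLUDED_SEVERITIES
    )


def defect_free_units(all_units, linked_units, changed_units, strict=True):
    """Declare defect-free units.

    strict=False: every unit without a bug link is defect-free.
    strict=True:  a unit must be both unlinked and unchanged.
    """
    unlinked = set(all_units) - set(linked_units)
    if strict:
        return unlinked - set(changed_units)
    return unlinked


bugs = [
    Bug(1, "closed", "fixed", "normal"),
    Bug(2, "opened", "", "blocker"),
    Bug(3, "closed", "fixed", "enhancement"),
    Bug(4, "closed", "duplicate", "normal"),
]
defects = [b.bug_id for b in bugs if is_countable_defect(b)]
print(defects)
```

Switching `strict` or widening the severity set yields a different number of faulty and defect-free units from the same repositories, which is one way the data bias mentioned on the slide can arise.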
5 Data Collection for SDP
Linking techniques:
- Simple search
- Regular expression search
- Authorship correspondence
- Time correlation
- Advanced NLP techniques (ReLink)
Bug tracking repository: Bug ID, Bug assignee, Bug closed, Release, Bug opened, Comments
Source code repository: Commit message, Commit author, Commit date, Release tag, Release date
Issues:
- Granularity level (package / file / class / method)
- Software metrics (product / development & process / usage)
- Bug-File cardinality (many to many)
- Bug-File duplicated links
- Bug ID varying length
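Authorship correspondence and time correlation use the fields listed for the two repositories: bug assignee and bug opened/closed dates on one side, commit author and commit date on the other. A minimal sketch follows, assuming dictionary records, a 7-day tolerance after the bug is closed, and a logical AND of both heuristics; none of these details come from the slides.

```python
from datetime import datetime, timedelta


def authorship_matches(bug, commit):
    """Authorship correspondence: bug assignee equals commit author."""
    return bug["assignee"].strip().lower() == commit["author"].strip().lower()


def time_correlates(bug, commit, tolerance=timedelta(days=7)):
    """Time correlation: commit falls between bug opening and closing.

    The tolerance after closing is an assumed value, not a published one.
    """
    return bug["opened"] <= commit["date"] <= bug["closed"] + tolerance


def is_candidate_link(bug, commit):
    """Combine both heuristics; requiring both is an illustrative choice."""
    return authorship_matches(bug, commit) and time_correlates(bug, commit)


# Hypothetical records for illustration only.
bug = {
    "assignee": "alice@example.org",
    "opened": datetime(2014, 3, 1),
    "closed": datetime(2014, 3, 10),
}
commit_ok = {"author": "Alice@example.org", "date": datetime(2014, 3, 9)}
commit_late = {"author": "alice@example.org", "date": datetime(2014, 5, 1)}
commit_other = {"author": "bob@example.org", "date": datetime(2014, 3, 9)}

print(is_candidate_link(bug, commit_ok))
print(is_candidate_link(bug, commit_late))
print(is_candidate_link(bug, commit_other))
```

Heuristics like these produce candidate links only; they are usually combined with a textual search of the commit message rather than used alone.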
6 Bug Code (BuCo) Analyzer Tool [SoftCOM 2014]
Tool developed through:
- Systematic literature review (36 papers from [1] + 35 / 136 / 4447)
- Exploratory study (12 students, observer triangulation, 5 projects, 4 exercises, 5 data forms, 52 tasks)
- Software product metrics tools review (iterative review 35 / 19 / 5 / 2 tools)
- Iterative development (30 students - 13 groups)
- Systematic comparison of techniques (7 techniques, 5 projects, 37 releases)
Tool properties:
- Automatic data collection
- Simple interface
- 6 bug-code linking techniques
- Calculation of 50 product metrics
- Bug counting
- Report generation
[1] Hall T, Beecham S, Bowes D, Gray D, Counsell S: A systematic literature review on fault prediction performance in software engineering, IEEE Trans Softw Eng 38(6), pp , 2012
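Bug counting has to respect the many-to-many relation between bugs and files and must not count duplicated links twice. The sketch below illustrates the idea on (bug ID, file path) pairs; it is not the BuCo implementation, and the data is invented.

```python
from collections import Counter


def count_bugs_per_file(links):
    """Count distinct bugs per file from (bug_id, file_path) pairs.

    Converting to a set removes duplicated bug-file links, so a bug fixed
    by several commits touching the same file is counted once for it.
    """
    unique_links = set(links)
    return Counter(path for _bug_id, path in unique_links)


# Hypothetical links: bug 101 touches two files and is linked twice to a.c.
links = [
    (101, "src/a.c"),
    (101, "src/a.c"),  # duplicated link, e.g. from a second fixing commit
    (101, "src/b.c"),
    (202, "src/a.c"),
]
bug_counts = count_bugs_per_file(links)
print(dict(bug_counts))
```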
7 Bug Code (BuCo) Analyzer Tool [SoftCOM 2014]
Tool offers:
- Bug download from Bugzilla of the Eclipse, Apache and Mozilla communities
- SCM download from Git
- Bug-Code linking techniques
- Automatic calculation of product metrics
- Report generation
8 Bug Code Linking Techniques [SQAMIA 2014]
Analysis 1:
- Comparison: Simple search & ReLink
- Aim: define Regex Search
- Project: Apache HTTPD
- Source: ReLink data, Git repository
Analyses 2 & 3:
- Comparison: Regex search & ReLink
- Aim: benchmark evaluation
- Projects: Apache HTTPD, OpenNLP
- Source: ReLink & Benchmark data, Git
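The difference between a simple search and a regular expression search can be illustrated on a single commit message. The pattern below is only an assumed example that guards against matching a bug ID inside a longer number; it is not the regular expression defined in the study.

```python
import re


def simple_search(bug_id, message):
    """Simple search: the bug ID appears anywhere in the commit message."""
    return str(bug_id) in message


def regex_search(bug_id, message):
    """Regex search: the bug ID must not be part of a longer digit run.

    This handles bug IDs of varying length, where a short ID such as 451
    would otherwise match inside a longer one such as 4512.
    """
    pattern = re.compile(rf"(?<!\d){bug_id}(?!\d)")
    return pattern.search(message) is not None


# Hypothetical commit message for illustration.
message = "Fix for PR 4512: add null check in request parser"

print(simple_search(451, message), regex_search(451, message))
print(simple_search(4512, message), regex_search(4512, message))
```

For bug 451 the simple search reports a link that the regular expression rejects, which is the kind of false link a stricter pattern is meant to avoid.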
9 Bug Code Linking Techniques - Results [SQAMIA 2014]
Analysis 1 results:
- Unequal input & linking output:
- Manual investigation revealed:
- Regular expression:
10 Bug Code Linking Techniques - Results [SQAMIA 2014]
Analyses 2 & 3 results:
- OpenNLP benchmark dataset (equal input), different linking output:
- Manual investigation REGEX:
- Manual investigation ReLink:
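When two techniques receive equal input, their linking output can be compared against a benchmark set of links. The sketch below uses precision and recall for that comparison; the slides do not state which measures the authors used, so the choice of measures, the function name and the data are all assumptions.

```python
def compare_links(predicted, benchmark):
    """Compare predicted (bug_id, commit_id) links with benchmark links."""
    predicted, benchmark = set(predicted), set(benchmark)
    correct = predicted & benchmark
    return {
        "precision": len(correct) / len(predicted) if predicted else 0.0,
        "recall": len(correct) / len(benchmark) if benchmark else 0.0,
        # Links to inspect manually, as done in the manual investigation.
        "missed": benchmark - predicted,
        "spurious": predicted - benchmark,
    }


# Hypothetical links for illustration only.
benchmark_links = {(1, "c1"), (2, "c2"), (4, "c4"), (5, "c5")}
technique_links = {(1, "c1"), (2, "c2"), (3, "c9")}

result = compare_links(technique_links, benchmark_links)
print(result["precision"], result["recall"])
print(sorted(result["missed"]), sorted(result["spurious"]))
```

The `missed` and `spurious` sets are the natural starting point for a manual investigation of why two techniques disagree.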
11 Bug Code Linking Techniques - Conclusion [SQAMIA 2014]
The generalization of research requires:
- Datasets from various domains
- Systematic procedure with limited bias
Bug-Code linking:
- Proven to be prone to bias
- Complex technique outperformed by regular expression search
Future research:
- Compare approaches to the whole data collection process
- Analyze the influence of the environment on bug-code linking
12 Current Research
- Developing a systematic data collection procedure for SDP
- Comparison of different linking techniques in various environments:
  - Comparison of the most popular SZZ approach [2] to our own
  - Interactions between different techniques, approaches and datasets used in our experiment
[2] SZZ: When do changes induce fixes?, SIGSOFT Softw Eng Notes, 2005
13 Thank you for your attention! Questions?
