StackOverflow and GitHub: Associations Between Software Development and Crowdsourced Knowledge




Bogdan Vasilescu (TU Eindhoven, @b_vasilescu), Vladimir Filkov (UC Davis), Alexander Serebrenik (TU Eindhoven, @aserebrenik)
Image: http://www.flickr.com/photos/jamiemanley/5278662995

Standing on the shoulders of others
Developers: reuse components and libraries; forage on the Web for information.


Writing code vs. seeking and sharing knowledge
Demand for knowledge; supply of knowledge

Is participation in SO related to productivity of developers?

Beneficial:
good technical solutions [Parnin et al. Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow, Georgia Institute of Technology, Tech. Rep., 2012]
fast answers (median 11 mins) [Mamykina et al. Design lessons from the fastest Q&A site in the west, in CHI. ACM, 2011, pp. 2857-2866]
Image: http://www.flickr.com/photos/dw212/4433157278

Detrimental:
competes for time; gamified, thus addictive [Storey et al. The impact of social media on software engineering practices and tools, FoSER. ACM, 2010, pp. 359-364] [Deterding, Gamification: designing for motivation, Interactions, vol. 19, no. 4, pp. 14-17, 2012]
context switches are expensive [Bacchelli et al. Harnessing Stack Overflow for the IDE, in RSSE. IEEE, 2012, pp. 26-30]
Image: http://www.flickr.com/photos/jamiemanley/5278662995

Is participation in SO related to productivity of developers? Asset or burden?

Dataset
GitHub: largest code host in the world; ~400k users; July 2011 - April 2012 [G. Gousios and D. Spinellis. GHTorrent: GitHub's data from a firehose, in MSR. IEEE, 2012, pp. 12-21]
StackOverflow: largest programming Q&A site in the world; ~1.3M users; July 2008 - August 2012 [Quarterly StackExchange data dump (August 2012)]


Dataset
GitHub: email address (plain text). StackOverflow: email address (MD5 hash).
~94k users in common (24% of GitHub users, 7% of StackOverflow users)

~47k users active on both GitHub and StackOverflow between July 2011 and April 2012 (12% of GitHub users, 4% of StackOverflow users)
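The slides only say that one dataset exposes plain-text email addresses and the other MD5 hashes of them. A minimal sketch of how such accounts could be linked is given below. The function names, the input dictionaries and the normalisation step (trimming and lower-casing before hashing) are assumptions made for illustration, not the authors' documented procedure.

```python
import hashlib


def md5_email(email: str) -> str:
    """Return the MD5 hex digest of an email address.

    Assumption: addresses are trimmed and lower-cased before hashing.
    """
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def link_accounts(github_emails: dict, so_email_hashes: dict) -> dict:
    """Map GitHub login -> StackOverflow user id when the email hashes match.

    github_emails:   hypothetical {login: plain-text email}
    so_email_hashes: hypothetical {user id: MD5 hex digest of email}
    """
    # Index StackOverflow users by hash for constant-time lookups.
    so_by_hash = {h.lower(): uid for uid, h in so_email_hashes.items()}
    links = {}
    for login, email in github_emails.items():
        uid = so_by_hash.get(md5_email(email))
        if uid is not None:
            links[login] = uid
    return links
```

Hashing the plain-text side and joining on the digest avoids any need to reverse the MD5 hashes; users who registered different addresses on the two sites are simply missed.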


Macro: overall activity levels
To what extent can activity (expertise) on one platform be used as a proxy for activity (expertise) on the other?
social signals (e.g., open source projects, professional social media) ~ career advancement [Capiluppi et al. Assessing technical candidates on the social web, IEEE Software, vol. 30, no. 1, pp. 45-51, 2013]

Intermediate: working rhythms
Is attention focused (bursts of commits) or divided between the two platforms?
working rhythms of developers ~ software quality [Eyolfson et al. Correlations between bugginess and time-based commit characteristics, Empirical Software Engineering, pp. 1-31, 2013]

Micro: coordination between commits and Q&A
Do StackOverflow activities accelerate or slow down GitHub commits?
[Storey et al. The impact of social media on software engineering practices and tools, FoSER. ACM, 2010, pp. 359-364]

Macro: overall activity
          #Commits  #Questions  #Answers
Dave           100           5        50
Stuart          10          75        15
Kevin           25          10        75


Macro: overall activity
Method: fix one measure (here #Commits) and sort by it; split into quartiles/deciles; compare the other measures across the groups.
          #Commits  #Questions  #Answers
Dave           100           5        50
Kevin           25          10        75
Stuart          10          75        15
Not restricted to monotonic relations!
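The fix, sort, split and compare steps can be sketched as follows. The slides do not say which statistic or test is used to compare groups, so the median used here is an assumption, as are the function names and the dictionary layout of the user records.

```python
from statistics import median


def split_into_groups(users, key, n_groups=4):
    """Sort users by `key` (ascending) and cut them into n_groups nearly equal parts."""
    ranked = sorted(users, key=lambda u: u[key])
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional user each.
        end = start + size + (1 if i < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


def compare_across_groups(users, fixed, compared, n_groups=4):
    """Median of `compared` within each group of users ranked by `fixed`.

    Assumption: the median is only an illustrative summary statistic.
    Empty groups yield None.
    """
    return [
        median(u[compared] for u in group) if group else None
        for group in split_into_groups(users, fixed, n_groups)
    ]


# Toy data from the slide.
users = [
    {"name": "Dave", "commits": 100, "questions": 5, "answers": 50},
    {"name": "Stuart", "commits": 10, "questions": 75, "answers": 15},
    {"name": "Kevin", "commits": 25, "questions": 10, "answers": 75},
]
answers_by_commit_group = compare_across_groups(users, "commits", "answers", n_groups=3)
```

Because each group is summarised on its own, the comparison can reveal patterns that rise and then fall across groups, which a single rank correlation would hide.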


Findings
Active GitHub committers are experienced developers: few StackOverflow questions, many StackOverflow answers.
GitHub activity ~ StackOverflow willingness to answer technical questions (expertise)
Top StackOverflow users are superstars rather than slackers!
[Plots: #Questions and #Answers compared across quartiles Q1-Q4 of #Commits]

Intermediate: working rhythms
Committing rhythm: series of inter-commit time intervals [Xuan et al. Measuring the effect of social communications on individual working rhythms: A case study of open source software, in Social Informatics. ASE/IEEE, 2012]
Gini (committing rhythms): focused vs. distributed attention
[Illustration: two activity timelines of commits (C), questions (Q) and answers (A), each with its Gini (committing rhythm), #Q and #A. "C A C C A C C C A": #Q = 5, #A = 50. "C Q Q Q C C C Q C": #Q = 75, #A = 15.]
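A minimal sketch of the two quantities named on this slide is given below: the series of inter-commit intervals and its Gini coefficient. The slides do not spell out the exact estimator, so the standard sorted-values formula is assumed here, and the function names are illustrative.

```python
def inter_commit_intervals(timestamps):
    """Time gaps between consecutive commits (timestamps in any numeric unit)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def gini(values):
    """Gini coefficient of non-negative values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n over the sorted
    values, with i starting at 1. Returns 0.0 for empty or all-zero input.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Evenly spaced commits vs. a burst followed by a long pause.
steady = gini(inter_commit_intervals([0, 10, 20, 30]))
bursty = gini(inter_commit_intervals([0, 1, 2, 3, 100]))
```

Under this reading, a value near 0 means commits are spread evenly over time, while a value near 1 means a few long gaps dominate, that is, commits arrive in bursts.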

Findings
Asking questions on StackOverflow influences how developers distribute their GitHub commits.
Heavy askers: bursts of intense commit activity followed by longer periods of inactivity (focused attention).

Learning from StackOverflow (by asking) and committing to GitHub


Micro: who benefits from participating in SO
Example developer: Dave
Compare actual and shuffled series:
actual < shuffled: acceleration
actual > shuffled: impediment
[Xuan et al. Measuring the effect of social communications on individual working rhythms: A case study of open source software, in Social Informatics. ASE/IEEE, 2012]
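The slide only states that an actual series is compared with shuffled ones. One possible reading is sketched below, under loudly stated assumptions: the quantity compared is the mean delay from a StackOverflow event (Q or A) to the next commit (C), and shuffling permutes the event labels while keeping the timestamps fixed. Neither choice is confirmed by the slides, and all names are hypothetical.

```python
import random


def mean_delay_to_next_commit(events):
    """Mean time from each Q/A event to the next commit.

    events: list of (timestamp, kind) sorted by time, kind in {"C", "Q", "A"}.
    Returns None when no Q/A event is followed by a commit.
    """
    delays = []
    for i, (t, kind) in enumerate(events):
        if kind == "C":
            continue
        for t_next, kind_next in events[i + 1:]:
            if kind_next == "C":
                delays.append(t_next - t)
                break
    return sum(delays) / len(delays) if delays else None


def compare_with_shuffled(events, n_shuffles=1000, seed=0):
    """Return (actual, baseline): the observed mean delay and its average over shuffles.

    Assumption: the null model shuffles event labels over the fixed timestamps.
    """
    rng = random.Random(seed)
    times = [t for t, _ in events]
    kinds = [k for _, k in events]
    actual = mean_delay_to_next_commit(events)
    shuffled_means = []
    for _ in range(n_shuffles):
        permuted = kinds[:]
        rng.shuffle(permuted)
        value = mean_delay_to_next_commit(list(zip(times, permuted)))
        if value is not None:
            shuffled_means.append(value)
    baseline = sum(shuffled_means) / len(shuffled_means) if shuffled_means else None
    return actual, baseline
```

With this sketch, an actual delay below the shuffled baseline would correspond to acceleration, and one above it to impediment, in the sense used on the slide.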


Findings
For active committers, asking and answering questions on StackOverflow catalyses committing on GitHub.
For no group is participating in StackOverflow detrimental!

Summary: Is participation in SO related to productivity of GitHub devs? Asset or burden?

Summary
Active committers are also active answerers (knowledge providers) on StackOverflow.
Different working rhythms for novices (focused attention) and experts.
Experts are experts everywhere!
Going to StackOverflow is costlier for novices.
Participating in StackOverflow reinforces commit activities on GitHub.
Asset or burden?
