Pricing Crowdsourcing-based Software Development Tasks



Pricing Crowdsourcing-based Software Development Tasks. Ke Mao (maoke at nfs.iscas.ac.cn), Institute of Software, Chinese Academy of Sciences

Overview: Background (Crowdsourcing: Micro Task vs. Complex Task; The TopCoder Platform); Motivation (New Phenomenon; The Pricing Issue); Methodology; Experiments & Insights; Conclusion

A Recent News Story... Typical work day of one star developer at Verizon: 09:00 a.m. Arrive and surf Reddit, watch cat videos; 11:30 a.m. Take lunch; 01:00 p.m. eBay time; 02:00 p.m. Facebook updates, LinkedIn; 04:30 p.m. End-of-day update e-mail to management; 05:00 p.m. Go home.

Introduction to Crowdsourcing: A proper way... Labor of the Internet; low cost; surprising deliverables; wisdom of the crowd.

What is Crowdsourcing? "Crowdsourcing" as defined by Jeff Howe: "The act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call." Crowdsourcing vs. outsourcing: the crucial prerequisite is the use of the open call format and the large network of potential laborers.

Micro Task vs. Complex Task. Credit: http://sandfishdesign.co.uk, 2012, Crowdsourcing, LLC

What is TopCoder? The world's largest competitive community for crowdsourced software development. The TopCoder Community is 600,000 strong. Membership: China, India, U.S. Credit: www.topcoder.com, 2007, TopCoder, Inc.

What is TopCoder? What kinds of projects can I do with TopCoder? Mobile Applications; Analytics and Optimization; Scientific Algorithm Development; Online Communities; Open Platforms; Digital Media; Business Systems.

Motivation - New Phenomenon: New paradigm: crowdsourced development. (1) Open call format; (2) large networked potential labor. (1) Competition; (2) global labor. Fig. 1: Illustration of the crowdsourcing-based software development process.

Motivation - New Phenomenon: New phenomena in SE activity. Two examples that challenge traditional laws: Parkinson's Law and the COCOMO model.

Motivation - New Phenomenon: Parkinson's Law ("Work expands so as to fill the time available for its completion."). Fig. 2: Correlation between the time allocated and the actual time consumed.

Motivation - New Phenomenon: Basic COCOMO model ($\mathrm{EFFORT} = a \times \mathrm{SIZE}^{b}$). Fig. 3: The effort estimated by the COCOMO model, compared to the actual effort.
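
The basic COCOMO formula on this slide can be evaluated directly. The sketch below is illustrative only: the slide does not state which coefficients were used, so the defaults here are assumed to be Boehm's published "organic mode" values (a = 2.4, b = 1.05, size in KSLOC, effort in person-months), and the function name is hypothetical.

```python
def basic_cocomo_effort(ksloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Estimate effort (person-months) as a * SIZE^b, with SIZE in KSLOC.

    The default a and b are an assumption (organic-mode basic COCOMO);
    the slides do not say which coefficients were applied.
    """
    if ksloc <= 0:
        raise ValueError("size must be positive")
    return a * ksloc ** b


# Because b > 1, estimated effort grows slightly faster than size.
for size in (1, 10, 100):
    print(f"{size:>4} KSLOC -> {basic_cocomo_effort(size):.1f} person-months")
```

Comparing such estimates against the effort actually recorded for crowdsourced tasks is the kind of contrast Fig. 3 refers to.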

Motivation - The Pricing Issue: Fig. 4: Active Component Development Contests on TopCoder.com. Inappropriate prices often lead to low capital efficiency and task starvation. How can empirical pricing models be built?

Methodology: Price drivers: Input Complexity; Quality of Input; Previous Phase Decision; Development Type. Table I: Descriptive statistics and regression coefficients of proposed factors.

Methodology: Predictive models: a multiple linear regression model, plus 9 other machine learning models: 4 decision-tree-based learners (C5.0, CART, QUEST, CHAID); 2 instance-based learners (KNN-1, KNN-k [3, 7]); 1 neural net; 1 support vector machine; 1 logistic regression.
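
As an illustration of the instance-based learners listed above, here is a minimal k-nearest-neighbour price predictor. It is a sketch under assumptions, not the configuration used in the study: the feature vectors, prices, and function name are invented for the example, distances are plain Euclidean, and no feature scaling is applied.

```python
import math


def knn_predict_price(train, query, k=1):
    """Predict a price as the mean price of the k nearest training tasks.

    train: list of (feature_vector, price) pairs; query: a feature vector.
    k=1 corresponds to KNN-1; larger k averages over more neighbours.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training tasks")
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest = by_distance[:k]
    return sum(price for _, price in nearest) / k


# Hypothetical tasks: (size in KSLOC, number of diagrams) -> price in dollars.
tasks = [
    ((1.0, 2.0), 500.0),
    ((2.0, 4.0), 700.0),
    ((3.0, 5.0), 800.0),
    ((8.0, 9.0), 1500.0),
]
print(knn_predict_price(tasks, (2.1, 4.0), k=1))
print(knn_predict_price(tasks, (2.1, 4.0), k=3))
```

In practice the price drivers would normally be rescaled first, since unscaled features with large ranges dominate the distance.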

Experiments: Aim: to answer the following research questions (RQs). RQ1, Baseline comparison: how much better? RQ2, Performance assessment: which is the best? RQ3, Actionable insights: what guidance can we offer?

Experiments: Dataset: Sep 29th, 2003 to Sep 2nd, 2012; 2,895 design and 3,015 development tasks; 490 successful software development projects from TopCoder. Validation method: leave-one-out cross-validation (LOOCV).
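
Leave-one-out cross-validation holds out each task in turn, trains on all the others, and predicts the held-out price. The sketch below shows only that loop; the `fit`/`predict` callables, the mean-price placeholder model, and the toy data are assumptions made for illustration and are not taken from the study.

```python
def loocv_predictions(samples, fit, predict):
    """Run leave-one-out cross-validation and return (actual, predicted) pairs.

    samples: list of (features, price) pairs.
    fit(train) -> model; predict(model, features) -> predicted price.
    """
    results = []
    for i, (features, actual) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        model = fit(train)
        results.append((actual, predict(model, features)))
    return results


# Placeholder model: always predict the mean price of the training set.
def fit_mean(train):
    return sum(price for _, price in train) / len(train)


def predict_mean(model, _features):
    return model


data = [((1,), 100.0), ((2,), 200.0), ((3,), 300.0)]
pairs = loocv_predictions(data, fit_mean, predict_mean)
print(pairs)
```

Any of the learners named in the methodology could be plugged in through `fit` and `predict`; the resulting pairs are what an accuracy measure would then be computed over.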

Experiments: Performance Measures:
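
The formulas from this slide are not preserved in the transcript; the only measure named elsewhere in the deck is Pred(30). As a hedged sketch, the code below uses the definitions commonly found in the estimation literature: the magnitude of relative error (MRE), its mean (MMRE, included here as an assumption), and Pred(l) as the fraction of predictions whose MRE is at most l percent. The sample values are invented.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one task: |actual - predicted| / actual."""
    return abs(actual - predicted) / actual


def mmre(pairs):
    """Mean MRE over (actual, predicted) pairs."""
    return sum(mre(a, p) for a, p in pairs) / len(pairs)


def pred(pairs, level=30):
    """Fraction of pairs whose MRE is at most `level` percent, e.g. Pred(30)."""
    hits = sum(1 for a, p in pairs if mre(a, p) <= level / 100)
    return hits / len(pairs)


# Hypothetical (actual price, predicted price) pairs in dollars.
sample = [(100, 120), (200, 150), (300, 500), (400, 390)]
print(f"MMRE     = {mmre(sample):.3f}")
print(f"Pred(30) = {pred(sample, 30):.2f}")
```

Under these definitions, a higher Pred(30) and a lower MMRE both indicate a more accurate pricing model.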

Experimental Results: Answer to RQ1: The baseline is outperformed by all 10 predictive models, according to the Pred(30) measure. Fig. 5: Performance of pricing models learned by each approach.

Experimental Results: Answer to RQ2: The decision-tree-based learners C5.0, QUEST, and CART (consistent with a former assessment in TSE). Fig. 5: Performance of pricing models learned by each approach.

Insights: Answer to RQ3: Significance analysis. Table I: Descriptive statistics and regression coefficients of proposed factors.

Insights: Answer to RQ3: Rules of thumb: ISUP => $70; COMP (4 pages) => $30; SEQU (4 diagrams) => $30; SIZE (1 KSLOC) => $30. These may not always be right, but they prompt the question "Why am I bucking the trend?"

Conclusion: Analyzed 5,910 software development tasks on TopCoder. Proposed 16 price drivers. Assessed 10 empirical pricing models. Useful prediction quality is achievable (Pred(30) > 0.8). Actionable advice can be extracted from our models to assist managers and developers on TopCoder.

Thank you!