A Crowd Method for Internet-based Software with Big Data



2014 Central South University and Intel Workshop on Transparent Computing and Big Data
A Crowd Method for Internet-based Software with Big Data
Gang Yin, Software Collaboration and Data Mining Group, National University of Defense Technology
Changsha, July 1, 2014

Contents: Motivation, Approach, Application
From Bazaar to Big Data
2014/7/14

Internet-based Software
On the Internet; in the Internet; through the Internet
The various online user communities are reshaping the development of Internet-based software.
Labels shown: software Q&A, software versions

Characteristics of Internet-based Software
Function: attractive solutions and features
Construction: rapid experience and response
Evolution: continuous evolution and improvement

Characteristics of Internet-based Software
Production oriented; innovation oriented

Open Source Miracles
Richard Stallman: launched the GNU Project, wrote the GPL
Linus Torvalds: leads the Linux kernel project
Eric Raymond

Open Source Miracles: Collaborative Development Communities
Communities shown: SourceForge, GitHub, MIUI, Baidu Crowd Test
SourceForge: 3.5 million users, 400,000 projects
GitHub: 4 million users, 6 million repositories
MIUI: 1 million users

Open Source Miracles: Knowledge Sharing Communities
Communities shown: StackOverflow, OSChina, CSDN, ZDNet, Slashdot
Audience labels: users, developers, IT practitioners
Figures shown: 2 million users; 14 million topics; average response time: 11 minutes
Open source software has strongly demonstrated the power of the Crowds.

Open Source Miracles

Other Peer-based Practices
Peering; sharing; collaboration

Crowd-based Approach
Milestones shown: High-Level Language, Software Engineering, Open Source, Crowd-based Approach (?)
Decades shown: 1960s, 1970s, 1990s
Engineering Approach; Automated Approach

Crowd-based Approach: Step I
Crowd-based Approach; Traditional Approaches; Peer-based Approaches

Big Data in Software Development
Collaborative Development Communities: project profile, source code, issue tracker, mailing list, API, software, user, tag, time
Knowledge Sharing Communities: Q&A, tags / features, forum posts, blogs / news
These data contain valuable information and knowledge for crowd-based software development.

The Power of Big Data
Crowd-based Approach; Traditional Approaches; Peer-based Approaches
Scope; Quality
Internet-based Software Communities: SourceForge, GitHub, Ohloh, Softpedia, StackOverflow

Crowd-based Approach: Step II
Crowd-based Approach with Big Data?
Fundamental Approaches
Human-Centric: Peer-based Approaches
Data-Centric: Approaches for Mining Engineering Data; Approaches for Mining Community Data
How to combine the strengths of the Crowds and the Big Data?

Trustie Project
National High-Tech Development Plan (863 Program)
National Trustworthy Software Resource Sharing and Cooperating Production Environment (Trustie, since 2007)

Contents: Motivation, Approach, Application
The secret of our approach is the meaning of trustworthiness.

Software Trustworthiness
"Given enough eyeballs, all bugs are shallow."
The history of Linux suggested a surprising theory about software engineering.
Human-Centric Vision

Software Trustworthiness
The trustworthiness of Internet-based software is hidden in the big data.
Novelty; Productivity; Quality
Open source software gives us a new sense of value for software development.
Engineering Data + Community Data
Data-Centric Vision

Data-centric Innovation Cycle
Crowd-based Creation; Crowd-based Construction; Crowd-based Evolution
Software Data

The Crowd Method
Three Key Principles: Open Sharing, Mass Collaboration, Data Analysis

Principles of the Crowd Method
The three key principles should be applied throughout every innovation cycle.

Research Issues on Software Big Data
Internet Software Communities
Mass Collaboration: How can engineers and crowds be supported in collaborating on large-scale development? How can crowd development be enabled for industrial software production?
Data Analysis: How can the contribution of developers to projects be evaluated? How can the trustworthiness of software artifacts be evaluated?
Open Resource Sharing: How can software be found more accurately across the various Internet communities? How can trustworthy software artifacts be located in Internet communities?
The Trustie team has published papers in international journals (TSE, TSC, JASE, ...) and top-level conferences (ICSE, ASE, FSE, ICSM, ...).

Results on Data Analysis
Developer productivity plateaus within 6-7 months in small and medium projects, and it takes up to 12 months in large projects.
Minghui Zhou, Audris Mockus: Developer fluency: achieving true mastery in software projects. SIGSOFT FSE 2010: 137-146.

Results on Data Analysis
The crowds can find interesting projects.
The crowds can collaborate with engineers.
Minghui Zhou, Audris Mockus: What make long term contributors: Willingness and opportunity in OSS community. ICSE 2012: 518-528.

New Results on Mass Collaboration
Sources: Android Issue Tracker (bugs) and StackOverflow (Q&A community)
Text: similarity of the texts of bugs and posts
Time: the time when the issues and Q&A are published
Co-occurring users: users who appear in both communities
Automatic Knowledge Propagation across Communities: A Case Study of Android Issue Tracker and Stack Overflow, to be submitted.
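The slide names three signals for linking an issue to a Q&A post (text, time, co-occurring users) but does not say how they are measured or combined. The sketch below is one possible reading, not the method of the cited work: Jaccard word overlap stands in for text similarity, a linear decay over a 30-day window stands in for time proximity, and a matching author name stands in for a co-occurring user. The `Item` record, the window, and the 0.6/0.2/0.2 weights are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Item:
    """Hypothetical record for an issue report or a Q&A post."""
    item_id: str
    author: str
    text: str
    published: datetime


def tokens(text):
    """Lowercase word set used for the overlap measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def link_score(issue, post, window=timedelta(days=30)):
    """Combine text, time, and user signals; the weights are assumptions."""
    text_sim = jaccard(issue.text, post.text)
    gap = abs(issue.published - post.published)
    time_sim = max(0.0, 1.0 - gap / window)  # 1.0 when simultaneous, 0.0 outside the window
    user_sim = 1.0 if issue.author == post.author else 0.0  # assumes identities are already matched
    return 0.6 * text_sim + 0.2 * time_sim + 0.2 * user_sim
```

Scoring every issue against every post and keeping the highest-scoring pairs would give candidate links for manual inspection; a real study would need identity matching across the two communities, which this sketch takes as given.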

New Results on Mass Collaboration
[Figure: reviewer prediction for a coder's pull request; a classifier produces a top-N list; scores shown: 0.42, 0.23, 0.17, 0.12, 0.06]
Who Should Review this Pull-Request: Recommending Reviewers to Expedite Crowd Collaboration, to be submitted.
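The slide shows a classifier that produces a ranked top-N list of reviewers with scores, but gives no detail on its features. As a rough illustration of top-N recommendation only, the sketch below swaps in a simple heuristic: each candidate is scored by how much the new pull request's title overlaps with titles they reviewed before. The function name `recommend_reviewers`, the `(title, reviewer)` history format, and the Jaccard measure are assumptions and are not taken from the cited work.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase word set of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap between two titles (an assumed stand-in feature)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def recommend_reviewers(new_title, history, top_n=3):
    """Rank reviewers by summed similarity to pull requests they reviewed.

    history is a list of (past_title, reviewer) pairs.
    Returns up to top_n (reviewer, score) pairs with a positive score.
    """
    scores = defaultdict(float)
    for past_title, reviewer in history:
        scores[reviewer] += similarity(new_title, past_title)
    # Sort by descending score, breaking ties by name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, score) for name, score in ranked[:top_n] if score > 0]
```

A trained classifier would replace the hand-written score, but the final step is the same: sort candidates by score and cut the list at N.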

New Results on Resource Sharing
Software communities: SourceForge, Ohloh, Freecode
Pipeline: aggregation of online descriptions, then a hierarchical classifier over SourceForge hierarchical categories
Result: fine-grained, efficient software resource classification for Crowd-generated artifacts
Tao Wang, Huaimin Wang, Gang Yin, Charles X. Ling, Xiang Li, Peng Zou: Mining Software Profile across Multiple Repositories for Hierarchical Categorization. ICSM 2013: 240-249.
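The two steps on this slide, aggregating a project's descriptions from several communities and then classifying it top-down through a category hierarchy, can be sketched as follows. This is a toy illustration under stated assumptions: the two-level `CATEGORY_TREE` and its keyword sets are invented for the example, and keyword overlap stands in for the trained classifiers used in the cited paper.

```python
import re

# Hypothetical two-level category tree with hand-picked keywords.
CATEGORY_TREE = {
    "Development": {
        "keywords": {"compiler", "debugger", "build", "library"},
        "children": {
            "Build Tools": {"keywords": {"build", "make", "dependency"}, "children": {}},
            "Debuggers": {"keywords": {"debugger", "breakpoint", "trace"}, "children": {}},
        },
    },
    "Multimedia": {
        "keywords": {"audio", "video", "player", "codec"},
        "children": {
            "Audio": {"keywords": {"audio", "mp3", "sound"}, "children": {}},
            "Video": {"keywords": {"video", "codec", "stream"}, "children": {}},
        },
    },
}


def tokens(text):
    """Lowercase word set of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def aggregate(descriptions):
    """Merge one project's descriptions from several repositories into one word set."""
    merged = set()
    for text in descriptions.values():
        merged |= tokens(text)
    return merged


def classify(words, tree):
    """Walk the tree top-down, choosing the best-matching child at each level."""
    path = []
    level = tree
    while level:
        # sorted() makes ties resolve to the alphabetically first category.
        best = max(sorted(level), key=lambda name: len(words & level[name]["keywords"]))
        if not words & level[best]["keywords"]:
            break  # nothing matches at this level; stop descending
        path.append(best)
        level = level[best]["children"]
    return path
```

Aggregating first matters because a single repository's description is often too short to match anything: in the usage below, only the combined text carries enough vocabulary to reach the second level.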

Platform and Practices
Application Practices
Application in Large Scale Software Industries: Neusoft, Careland, Wonders Group, Digital China
Application in Mission Critical Systems: space flight, electricity, flight control, defense
Common Application Modes and Platforms: Enterprise Version, Community Version, Education Version
SPL types: Component-based SPL, Service-oriented SPL, Heterogeneous SPL, Runtime-monitoring SPL, Third-party SPL
Development Environment: Trustie Collaborative Development Toolset, Trustie Software Resource Sharing Toolset, Trustie Software Data Storage and Analysis Toolset; Software Communities
Technology System: Large Scale Software Resource Sharing Technologies, Large Scale Software Collaborative Development Technologies, Crowd-based Software Development Approach, Big Data enabled Software Trustworthiness Analysis Technologies

Contents: Motivation, Approach, Application
Application: software industries, software engineering education, critical information systems
Is the Crowd Method practically efficient, or not?

Application in Internet Communities
Collaboration Community: more than 32,000 users; more than 1,500 projects; users and projects can be analyzed comprehensively
Sharing Community: various kinds of software resources (OSS, services, components); more than 60,000 evaluated resources

Application in Software Industries
Neusoft Corporation: Trustie supported the development of 8 health care information systems at Neusoft. Software reusability increased by 75%; productivity increased by 65%.
Digital China Holdings Limited: Digital China set up an industrial SPL for trustworthy taxation software development. Software reusability increased by 60%; the number of bugs decreased by 20%.
Trustie has been adopted by more than 10 software companies in China and has successfully supported 22 large-scale software projects.

Application in Software Industries

Application in Universities
[Diagram relating courses, course projects, interests, and collaboration; labels shown: MOOC, MOOP, MOOC 2.0]
Big Data for Education?

Application in Universities
Project Hosting (http://forge.trustie.net): version control, issue tracking, project profile, forum/wiki, Gantt/documents
Course Hosting (http://course.trustie.net): course management, member management, exercise monitoring, resource management, forum/message/board
Contest Hosting (http://contest.trustie.net): contest publishing, submission of works, discussion, ranking, notification
Social Network + Data Analysis

Future Work
Application of Trustie Technologies: MOOP, MOOC 2.0; software engineering education; software garden and industries
Areas: Industry, Education, Critical System
Research on the Crowd Method: data-driven collaborative development; data-driven software resource sharing; data-driven trustworthiness analysis
Fields: Software Engineering, Network Analysis, Data Mining

2014 Central South University and Intel Workshop on Transparent Computing and Big Data
Thank You! Questions?
http://forge.trustie.net
http://course.trustie.net