2014 Central South University and Intel Workshop on Transparent Computing and Big Data. A Crowd Method for Internet-based Software with Big Data. Gang Yin, Software Collaboration and Data Mining Group, National University of Defense Technology. Changsha, July 1, 2014
Contents: Motivation, Approach, Application. From Bazaar to Big Data. 2014/7/14 2
Internet-based Software: on the Internet, in the Internet, through the Internet. The various online user communities are reshaping the development of Internet-based software (figure labels: software Q&A, software versions).
Characteristics of Internet-based Software. Function: attractive solutions and features. Construction: rapid experience and response. Evolution: continuous evolution and improvement.
Characteristics of Internet-based Software: Production Oriented, Innovation Oriented.
Open Source Miracles. Richard Stallman: launched the GNU Project, wrote the GPL. Linus Torvalds: leads the Linux kernel project. Eric Raymond.
Open Source Miracles: Collaborative Development Communities (SourceForge, GitHub, MIUI, Baidu Crowd Test). SourceForge: 3.5 million users, 400,000 projects. GitHub: 4 million users, 6 million repositories. MIUI: 1 million users.
Open Source Miracles: Knowledge Sharing Communities (StackOverflow, OSChina, CSDN, ZDNet, Slashdot). Participants: users, developers, IT practitioners. Figures shown: 2 million users; 14 million topics; average response time: 11 minutes. Open source software has strongly demonstrated the power of the crowds.
Other Peer-based Practices: Peering, Sharing, Collaboration.
Crowd-based Approach. Timeline: High-Level Language (1960s), Software Engineering (1970s), Open Source (1990s). Approaches: Automated Approach, Engineering Approach, Crowd-based Approach?
Crowd-based Approach: Step I. Crowd-based Approach; Traditional Approaches; Peer-based Approaches.
Big Data in Software Development. Collaborative Development Communities: project profile, source code, issue tracker, mailing list, API, software, user, tag, time. Knowledge Sharing Communities: Q&A, tags / features, forum posts, blogs / news. These data contain valuable information and knowledge for crowd-based software development.
The Power of Big Data. Crowd-based Approach; Traditional Approaches; Peer-based Approaches. Diagram labels: Scope, Quality. Internet-based Software Communities: SourceForge, GitHub, Ohloh, Softpedia, StackOverflow.
Crowd-based Approach: Step II. Crowd-based Approach with Big Data? Human-Centric: Fundamental Approaches, Peer-based Approaches. Data-Centric: Approaches for Mining Engineering Data, Approaches for Mining Community Data. How can the strengths of the crowds and of big data be combined?
Trustie Project. National High-Tech Development Plan (863 Program): National Trustworthy Software Resource Sharing and Cooperating Production Environment (Trustie, since 2007).
Contents: Motivation, Approach, Application. The secret of our approach is the meaning of trustworthiness.
Software Trustworthiness: Human-Centric Vision. "Given enough eyeballs, all bugs are shallow." The history of Linux suggested a surprising theory about software engineering.
Software Trustworthiness: Data-Centric Vision. The trustworthiness of Internet-based software is hidden in big data (Engineering Data + Community Data). Novelty, Productivity, Quality: open source software gives us a new sense of value for software development.
Data-centric Innovation Cycle: Crowd-based Creation, Crowd-based Construction, Crowd-based Evolution, with Software Data.
The Crowd Method. Three Key Principles: Open Sharing, Mass Collaboration, Data Analysis.
Principles of the Crowd Method. The three key principles should be carried out during all innovation cycles.
Research Issues on Software Big Data (Internet Software Communities). Mass Collaboration: How can engineers and crowds be supported to collaborate in large-scale development? How can crowd development be enabled for industrial software production? Data Analysis: How should the contributions of developers to projects be evaluated? How should the trustworthiness of software artifacts be evaluated? Open Resource Sharing: How can software be found more accurately across the various Internet communities? How can trustworthy software artifacts be located in Internet communities? The Trustie team has published papers in international journals (TSE, TSC, JASE, ...) and top-level conferences (ICSE, ASE, FSE, ICSM, ...).
Results on Data Analysis. Developer productivity plateaus within 6-7 months in small and medium projects, and takes up to 12 months to plateau in large projects. Minghui Zhou, Audris Mockus: Developer fluency: achieving true mastery in software projects. SIGSOFT FSE 2010: 137-146.
Results on Data Analysis. The crowds can find interesting projects. The crowds can collaborate with engineers. Minghui Zhou, Audris Mockus: What make long term contributors: Willingness and opportunity in OSS community. ICSE 2012: 518-528
New Results on Mass Collaboration. Linking Android Issue Tracker bugs with the StackOverflow Q&A community using three signals. Text: similarity of the texts of bugs and posts. Time: the time when the issues and Q&A posts are published. Co-occurring users: users who appear in both communities. Automatic Knowledge Propagation across Communities: A Case Study of Android Issue Tracker and Stack Overflow, to be submitted.
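The three signals on this slide (text, time, co-occurring users) could be combined into a single link score between an issue and a Q&A post. The following is only a minimal sketch under stated assumptions, not the method of the cited work: the record layout, the bag-of-words cosine similarity, the 30-day time window, and the weights are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from datetime import datetime


def tokenize(text):
    # Lowercase, replace non-alphanumerics with spaces, drop very short words.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return [w for w in cleaned.split() if len(w) > 2]


def cosine(tokens_a, tokens_b):
    # Cosine similarity between two bag-of-words term-count vectors.
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_score(issue, post, max_days=30.0, weights=(0.6, 0.2, 0.2)):
    # Hypothetical combination of the three signals; weights are assumptions.
    text_sim = cosine(tokenize(issue["text"]), tokenize(post["text"]))
    gap_days = abs((issue["time"] - post["time"]).total_seconds()) / 86400.0
    time_sim = max(0.0, 1.0 - gap_days / max_days)
    users_i, users_p = set(issue["users"]), set(post["users"])
    union = users_i | users_p
    user_sim = len(users_i & users_p) / len(union) if union else 0.0
    w_text, w_time, w_user = weights
    return w_text * text_sim + w_time * time_sim + w_user * user_sim


# Made-up sample records, for illustration only.
issue = {"text": "Camera preview crashes on rotation",
         "time": datetime(2014, 3, 1), "users": ["alice", "bob"]}
related = {"text": "App crashes when camera preview rotates",
           "time": datetime(2014, 3, 3), "users": ["bob"]}
unrelated = {"text": "How to parse JSON in Python",
             "time": datetime(2013, 1, 1), "users": ["carol"]}

print(link_score(issue, related), link_score(issue, unrelated))
```

In this sketch a pair scores high only when several signals agree; a real study would need to learn or validate the weights against labeled issue/post links.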
New Results on Mass Collaboration. Reviewer prediction for pull requests (figure labels: Coder, Reviewer, Prediction, Classifier, Top-N; plotted values: 0.42, 0.23, 0.17, 0.12, 0.06). Who Should Review this Pull-Request: Recommending Reviewers to Expedite Crowd Collaboration, to be submitted.
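A top-N reviewer recommendation can be sketched as ranking past reviewers by how similar their previously reviewed pull requests are to the new one. This is a hedged illustration only: the slide does not describe the actual classifier, so the Jaccard text similarity, the `history` format, and the function names below are assumptions.

```python
from collections import defaultdict


def tokens(text):
    # Set of lowercase words longer than two characters.
    return {w for w in text.lower().split() if len(w) > 2}


def jaccard(a, b):
    # Jaccard similarity between two token sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_reviewers(new_pr_text, history, top_n=2):
    # history: list of (past pull-request text, list of its reviewers).
    # Each reviewer accumulates the similarity of every past PR they reviewed.
    new_tokens = tokens(new_pr_text)
    scores = defaultdict(float)
    for past_text, reviewers in history:
        sim = jaccard(new_tokens, tokens(past_text))
        for reviewer in reviewers:
            scores[reviewer] += sim
    # Sort by descending score, then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, score in ranked[:top_n] if score > 0]


# Made-up review history, for illustration only.
history = [
    ("fix memory leak in image cache", ["dana", "eli"]),
    ("image cache eviction policy update", ["dana"]),
    ("update build scripts for windows", ["fay"]),
]

print(recommend_reviewers("reduce memory usage of image cache", history))
```

Reviewers with no textual overlap are never recommended here, so the list may be shorter than `top_n`.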
New Results on Resource Sharing. Software Communities: SourceForge (hierarchical categories), Ohloh, Freecode. Pipeline: aggregation of online descriptions, then a hierarchical classifier. Result: fine-grained, efficient software resource classification for crowd-generated artifacts. Tao Wang, Huaimin Wang, Gang Yin, Charles X. Ling, Xiang Li, Peng Zou: Mining Software Profile across Multiple Repositories for Hierarchical Categorization. ICSM 2013: 240-249
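The two steps named on this slide, aggregating a project's descriptions from several repositories and then classifying top-down through a category hierarchy, might look roughly like the sketch below. It is an assumption-laden toy: the category tree, the keyword sets, and the overlap-count scoring are invented for illustration and stand in for the trained hierarchical classifier of the cited paper.

```python
def aggregate(descriptions):
    # Merge the descriptions of one project from several repositories
    # into a single set of lowercase words.
    words = set()
    for text in descriptions.values():
        words.update(w.strip(".,;:()").lower() for w in text.split())
    words.discard("")
    return words


# Hypothetical category tree with hand-picked keywords per node.
TREE = {
    "keywords": set(),
    "children": {
        "Development": {
            "keywords": {"compiler", "debugger", "build", "version", "control"},
            "children": {
                "Version Control": {
                    "keywords": {"version", "control", "repository", "commit"},
                    "children": {},
                },
                "Build Tools": {
                    "keywords": {"build", "make", "compile"},
                    "children": {},
                },
            },
        },
        "Multimedia": {
            "keywords": {"audio", "video", "player", "image"},
            "children": {},
        },
    },
}


def classify(words, node=TREE):
    # Walk down the tree, picking the child with the largest keyword overlap;
    # stop when no child shares any keyword with the project.
    path = []
    while node["children"]:
        name, child = max(node["children"].items(),
                          key=lambda kv: len(words & kv[1]["keywords"]))
        if not words & child["keywords"]:
            break
        path.append(name)
        node = child
    return path


# Made-up descriptions of one project in three communities.
profile = {
    "sourceforge": "A distributed version control system.",
    "ohloh": "Fast repository and commit history tool",
    "freecode": "Version control for source code",
}

print(classify(aggregate(profile)))
```

Aggregating first means a category can be reached even when no single repository's description contains enough evidence on its own.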
Platform and Practices.
Application Practices: Application in Large Scale Software Industries (Neusoft, Careland, Wonders Group, Digital China); Application in Mission Critical Systems (space flight, electricity, flight control, defense); Common Application Modes and Platforms (Enterprise Version, Community Version, Education Version).
Development Environment: Component-based SPL, Service-oriented SPL, Heterogeneous SPL, Runtime-monitoring SPL, Third-party SPL; Trustie Collaborative Development Toolset, Trustie Software Resource Sharing Toolset, Trustie Software Data Storage and Analysis Toolset; Software Communities.
Technology System: Crowd-based Software Development Approach; Large Scale Software Resource Sharing Technologies; Large Scale Software Collaborative Development Technologies; Big Data enabled Software Trustworthiness Analysis Technologies.
Contents: Motivation, Approach, Application (software industries, software engineering education, critical information systems). Is the Crowd Method efficient in practice, or not?
Application in Internet Communities. Collaboration Community: more than 32,000 users and more than 1,500 projects; users and projects can be analyzed comprehensively. Sharing Community: various kinds of software resources (OSS, services, components, ...); more than 60,000 evaluated resources.
Application in Software Industries. Neusoft Corporation: Trustie supported the development of 8 health care information systems at Neusoft. Software reusability increased 75%; productivity increased 65%. Digital China Holdings Limited: Digital China set up an industrial SPL for trustworthy taxation software development. Software reusability increased 60%; the number of bugs decreased 20%. Trustie has been adopted by more than 10 software companies in China and has successfully supported 22 large-scale software projects.
Application in Universities (diagram labels: Course, Project, Interests, Collaboration, MOOC, MOOP, MOOC 2.0). Big Data for Education?
Application in Universities: Social Network + Data Analysis.
Project Hosting (http://forge.trustie.net): version control, issue tracking, project profile, forum/wiki, Gantt/documents.
Course Hosting (http://course.trustie.net): course management, member management, exercise monitoring, resource management, forum/message/board.
Contest Hosting (http://contest.trustie.net): contest publishing, submission of works, discussion, ranking, notification.
Future Work.
Application of Trustie Technologies (Industry, Education, Critical System): MOOP, MOOC 2.0; software engineering education; software garden and industries.
Research on the Crowd Method (Software Engineering, Network Analysis, Data Mining): data-driven collaborative development; data-driven software resource sharing; data-driven trustworthiness analysis.
Thank You! Questions? http://forge.trustie.net http://course.trustie.net