A TOPOLOGICAL ANALYSIS OF THE OPEN SOURCE SOFTWARE DEVELOPMENT COMMUNITY
|
|
|
- Charles Rogers
- 9 years ago
- Views:
Transcription
1 A TOPOLOGICAL ANALYSIS OF THE OPEN SOURCE SOFTWARE DEVELOPMENT COMMUNITY Jin Xu,Yongqin Gao, Scott Christley & Gregory Madey Department of Computer Science and Engineering University of Notre Dame Notre Dame, IN HICSS 38, Hawaii January 2005 Partially Supported by NSF, CISE/IIS-Digital Science and Technology
2 INTRODUCTION FLOSS: Complex/Self-organizing System Characteristic Structural/Topological Properties Size distributions Connectivity Social Networks Small-world Models Kevin Bacon Number Erdos Numbers Collaboration Networks Quantitative Structural/Topological Study of Project Communities Most developers participate on one project Some developers participate on more than one project Extension on prior study, using newer expanded data
3 PREVIOUS RESEARCH Social Networks Wasserman, S. Faust, K., Social Network Analysis, Watts, D., Small Worlds, Newman, M.E.J. "Clustering and Preferential Attachment in Growing Networks," Barabasi, A. L., Linked, 2003 Gao Y. Q., Vincent F., Madey G., "Analysis and Modeling of the Open Source Software Community", North American Association for Computational Social and Organizational Science (NAACSOS 2003), Pittsburgh, PA, Madey G., Freeh V., and Tynan R., "Modeling the F/OSS Community: A Quantitative Investigation," in Free/Open Source Software Development, ed., Stephan Koch, Idea Publishing, 2004.
4 PREVIOUS RESEARCH (cont.) Web Data Mining Xu J., Huang Y., Madey G., "A Research Support System Framework for Web Data Mining", Workshop on Applications, Products and Services of Web-based Support Systems at the Joint International Conference on Web Intelligence (2003 IEEE/WIC) and Intelligent Agent Technology, Halifax, Canada, October Howison J. and Crowston K.", "The perils and pitfalls of mining SourceForge", Proceedings of Mining Software Repositories Workshop, International Conference on Software Engineering (ICSE 2004), Edinburgh, Scotland, 2004.
5 PREVIOUS RESEARCH (cont.) Roles Classification Nakakoji K., Yamamoto Y., Kishida K., Ye Y., "Evolution Patterns of Open-source Software Systems and Communities", Proceedings of The International Workshop on Principles of Software Evolution, Orlando Florida, May 19-20, Xu N., An Exploratory Study of Open Source Software Based on Public Project Archives, Thesis, The John Molson School of Business, Concordia University, Canada, 2003.
6 OSS COMMUNITY User Group Passive Users: no direct attributable contribution in the data (downloads, user base, word-of-mouth publicity, etc.) Active Users: bug reports, patch submissions, feature requests, help requests, etc. Developer Group Peripheral Developer: irregularly contribute Central Developer: regularly contribute Core Developer: extensively contribute, manage CVS releases and coordinate peripheral developers and central developers. Project Leader: guide the vision and direction of the project.
7 OSS DEVELOPMENT COMMUNITY Project Leaders Active Users Core Developers Co-developers
8 OSS SOCIAL NETWORK Two Entities: developer (community member) & project Project-Developer Network (bipartite graph) Nodes: project, developer Edges: developers in a project are connected to that project
9 PROJECT-DEVELOPER NETWORK
10 OSS SOCIAL NETWORK (cont.) Project Network (unipartite graph) Node: project Edge: two projects are connected if there is a developer participating on both. Developer Network (unipartite graph) Node: developer Edge: developers participating in a common project are connected to each other
11 OSS DEVELOPER NETWORK
12 DATA COLLECTION & EXTRACTION Data Source: SourceForge 2003 database dump, plus earlier web-mined data SourgeForge.net approaching: 100,000 registered projects 1,000,000 registered users Project Leaders & Core Developers Identification explicitly stored in data dump Co-developers & Active Users Identification indirectly available Forums: ask and/or answered questions Artifacts: bug reports, patch submissions, feature requests, help requests, etc.
13 SUBSET OF THE DATABASE SCHEMA
14 Analysis: SourceForge.net Level
15 MEMBER DISTRIBUTION Project Size Project Count Project Leaders Core Developers Codevelopers Active Users! % 20.6% 19.8% 11.8% <! % 5.7% 60.3% 31.7% < % 2.7% 55.8% 40.6%
16 TOPOLOGICAL PROPERTIES Degree Distribution The total number of links connected to a node Relative frequency of each index value Power law distribution Diameter The maximum longest shortest-path The average longest shortest path Cluster A social network consists of connected nodes Clustering Coefficient The ratio of the number of links to the total possible number of links among its neighbors
17 FOUR SETS Subset A = { project leaders} Subset B = {project leaders} U {core developers} Subset C = {project leaders} U {core developers} U {co-developers} Subset D = {project leaders} U {core developers} U {co-developers} U {active users} Prior work looked at Subset B only
18 DEVELOPER DEGREE DISTRIBUTION Subset A Subset B
19 DEVELOPER DEGREE DISTRIBUTION Subset C Subset D
20 REGRESSION PARAMETERS A B C D Project- Network R-squared Slope Developer -Network R-squared Slope
21 DEGREE DISTRIBUTION Prior research suggests that OSS community network is a scale free network growing by two rules: Sequential addition of new developers Preferential attachment Related research on mechanisms that could plausibly generate observed topologies All degree distributions are skewed Most community members participate on 1 project Linchpin members join multiple projects The largest number of communities a member joins increases from 29 (A) to 95 (D)
22 Diameter length Subset A N/A DIAMETER Subset B 10.2 out of members Subset C 2.7 out of members Subset D 2.7 out of members Project leader network is highly disconnected The degree of separation is significantly decreased with the participation of codevelopers and active users.
23 CLUSTERS & CLUSTERING COEFFICIENT Property A B C D Largest cluster nd largest cluster # of clusters Clustering coefficient All linked projects form a project cluster The largest cluster is much bigger than the 2 nd largest cluster The largest cluster grows from A to D High clustering coefficient on all 4 subsets because members are fully collected in each project
24 A PROJECT CLUSTER
25 DISCUSSION & FUTURE WORK Small World Phenomenon Small diameter & high clustering coefficient Scale Free Power law distribution Effect of Co-developers & Active Users Agent-based Simulation More OSS Web Sites: Apache, Bio.org, Savannah, etc.
26 THANK YOU
27
Open Source Software Developer and Project Networks
Open Source Software Developer and Project Networks Matthew Van Antwerp and Greg Madey University of Notre Dame {mvanantw,gmadey}@cse.nd.edu Abstract. This paper outlines complex network concepts and how
ModelingandSimulationofthe OpenSourceSoftware Community
ModelingandSimulationofthe OpenSourceSoftware Community Yongqin Gao, GregMadey Departmentof ComputerScience and Engineering University ofnotre Dame ygao,[email protected] Vince Freeh Department of ComputerScience
Computational Discovery in Evolving Complex Networks
Computational Discovery in Evolving Complex Networks Yongqin Gao Advisor: Greg Madey Yongqin Gao December 2006 Dissertation Defense Outline Background Methodology for Computational Discovery Problem Domain
THE OPEN SOURCE SOFTWARE DEVELOPMENT PHENOMENON: AN ANALYSIS BASED ON SOCIAL NETWORK THEORY
THE OPEN SOURCE SOFTWARE DEVELOPMENT PHENOMENON: AN ANALYSIS BASED ON SOCIAL NETWORK THEORY Greg Madey Computer Science & Engineering University of Notre Dame [email protected] Vincent Freeh Computer Science
COMPUTATIONAL DISCOVERY IN EVOLVING COMPLEX NETWORKS. A Dissertation. Submitted to the Graduate School. of the University of Notre Dame
COMPUTATIONAL DISCOVERY IN EVOLVING COMPLEX NETWORKS A Dissertation Submitted to the Graduate School of the University of Notre Dame in Partial Fulfillment of the Requirements for the Degree of Doctor
Discovering Determinants of Project Participation in an Open Source Social Network
Association for Information Systems AIS Electronic Library (AISeL) ICIS 2009 Proceedings International Conference on Information Systems (ICIS) 1-1-2009 Discovering Determinants of Project Participation
On Clusters in Open Source Ecosystems
On Clusters in Open Source Ecosystems Shaheen Syed and Slinger Jansen Department of Information and Computer Sciences, Utrecht University Princetonplein 5, 3508 TB Utrecht, the Netherlands {s.a.s.syed,
A Framework to Represent Antecedents of User Interest in. Open-Source Software Projects
542 Business Transformation through Innovation and Knowledge Management: An Academic Perspective A Framework to Represent Antecedents of User Interest in Open-Source Software Projects 1 Amir Hossein Ghapanchi,
A Social Network Approach to Free/Open Source Software Simulation
A Social Network Approach to Free/Open Source Software Simulation Patrick Wagstrom 1, Jim Herbsleb 2, Kathleen Carley 2 [email protected], [email protected], [email protected] 1 Department of
Modeling and Simulating Free/Open Source Software Development Processes
Modeling and Simulating Free/Open Source Software Development Processes Walt Scacchi Institute for Software Research School of Information and Computer Science University of California, Irvine Irvine,
Bug Fixing Process Analysis using Program Slicing Techniques
Bug Fixing Process Analysis using Program Slicing Techniques Raula Gaikovina Kula and Hajimu Iida Graduate School of Information Science, Nara Institute of Science and Technology Takayamacho 8916-5, Ikoma,
Social Network Dynamics for Open Source Software Projects
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2006 Proceedings Americas Conference on Information Systems (AMCIS) 12-31-2006 Social Network Dynamics for Open Source Software
Introduction to Networks and Business Intelligence
Introduction to Networks and Business Intelligence Prof. Dr. Daning Hu Department of Informatics University of Zurich Sep 17th, 2015 Outline Network Science A Random History Network Analysis Network Topological
Graph Mining Techniques for Social Media Analysis
Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1 What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented
Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks
Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Imre Varga Abstract In this paper I propose a novel method to model real online social networks where the growing
Mining Email Archives and Simulating the Dynamics of Open-Source Project Developer Networks
Mining Email Archives and Simulating the Dynamics of Open-Source Project Developer Networks Liguo Yu 1, Srini Ramaswamy 2, and Chuanlei Zhang 2 1 Computer Science and Informatics, Indaian University South
Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?
What is a Network? Network/Graph Theory Network = graph Informally a graph is a set of nodes joined by a set of lines or arrows. 1 1 2 3 2 3 4 5 6 4 5 6 Graph-based representations Representing a problem
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
A discussion of Statistical Mechanics of Complex Networks P. Part I
A discussion of Statistical Mechanics of Complex Networks Part I Review of Modern Physics, Vol. 74, 2002 Small Word Networks Clustering Coefficient Scale-Free Networks Erdös-Rényi model cover only parts
Scientific Collaboration Networks in China s System Engineering Subject
, pp.31-40 http://dx.doi.org/10.14257/ijunesst.2013.6.6.04 Scientific Collaboration Networks in China s System Engineering Subject Sen Wu 1, Jiaye Wang 1,*, Xiaodong Feng 1 and Dan Lu 1 1 Dongling School
Master Thesis. The influence of social network structure on the chance of success of Open Source software project communities
Master Thesis The influence of social network structure on the chance of success of Open Source software project communities Name Email Bart Vreugdenhil [email protected] University MSc Program RSM
Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context
Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context Alejandro Corbellini 1,2, Silvia Schiaffino 1,2, Daniela Godoy 1,2 1 ISISTAN Research Institute, UNICEN University,
Proposed Application of Data Mining Techniques for Clustering Software Projects
Proposed Application of Data Mining Techniques for Clustering Software Projects HENRIQUE RIBEIRO REZENDE 1 AHMED ALI ABDALLA ESMIN 2 UFLA - Federal University of Lavras DCC - Department of Computer Science
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
MINFS544: Business Network Data Analytics and Applications
MINFS544: Business Network Data Analytics and Applications March 30 th, 2015 Daning Hu, Ph.D., Department of Informatics University of Zurich F Schweitzer et al. Science 2009 Stop Contagious Failures in
How the FLOSS Research Community Uses Email Archives
How the FLOSS Research Community Uses Email Archives Megan Squire Elon University, USA ABSTRACT Artifacts of the software development process, such as source code or emails between developers, are a frequent
An Interest-Oriented Network Evolution Mechanism for Online Communities
An Interest-Oriented Network Evolution Mechanism for Online Communities Caihong Sun and Xiaoping Yang School of Information, Renmin University of China, Beijing 100872, P.R. China {chsun.vang> @ruc.edu.cn
Complex Networks Analysis: Clustering Methods
Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich [email protected] 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
Emergence of New Project Teams from Open Source Software Developer Networks: Impact of Prior Collaboration Ties
Emergence of New Project Teams from Open Source Software Developer Networks: Impact of Prior Collaboration Ties Jungpil Hahn, Jae Yun Moon, and Chen Zhang Last Revised: July 15, 2006 ABSTRACT Software
PUBLIC TRANSPORT SYSTEMS IN POLAND: FROM BIAŁYSTOK TO ZIELONA GÓRA BY BUS AND TRAM USING UNIVERSAL STATISTICS OF COMPLEX NETWORKS
Vol. 36 (2005) ACTA PHYSICA POLONICA B No 5 PUBLIC TRANSPORT SYSTEMS IN POLAND: FROM BIAŁYSTOK TO ZIELONA GÓRA BY BUS AND TRAM USING UNIVERSAL STATISTICS OF COMPLEX NETWORKS Julian Sienkiewicz and Janusz
DESIGN FOR QUALITY: THE CASE OF OPEN SOURCE SOFTWARE DEVELOPMENT
DESIGN FOR QUALITY: THE CASE OF OPEN SOURCE SOFTWARE DEVELOPMENT Caryn A. Conley Leonard N. Stern School of Business, New York University, New York, NY 10012 [email protected] WORK IN PROGRESS DO NOT
Open Source ERP for SMEs
Open Source ERP for SMEs Hyoseob Kim 1, Cornelia Boldyreff 2 1 Dongbu Information Technology Co., Ltd, 154-17 Samseong1-Dong, Kangnam-Ku, Seoul, 135-879, Korea, [email protected] 2 Dept. of Computing
Bug management in open source projects
Bug management in open source projects Thomas Basilien, Roni Kokkonen & Iikka Manninen Abstract 1. Introduction 2. Bug management in general 2.1 Bug management in proprietary projects 2.2 Project management
The Impact of Release Management and Quality Improvement in Open Source Software Project Management
Applied Mathematical Sciences, Vol. 6, 2012, no. 62, 3051-3056 The Impact of Release Management and Quality Improvement in Open Source Software Project Management N. Arulkumar 1 and S. Chandra Kumramangalam
Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003
Graph models for the Web and the Internet Elias Koutsoupias University of Athens and UCLA Crete, July 2003 Outline of the lecture Small world phenomenon The shape of the Web graph Searching and navigation
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Effects of node buffer and capacity on network traffic
Chin. Phys. B Vol. 21, No. 9 (212) 9892 Effects of node buffer and capacity on network traffic Ling Xiang( 凌 翔 ) a), Hu Mao-Bin( 胡 茂 彬 ) b), and Ding Jian-Xun( 丁 建 勋 ) a) a) School of Transportation Engineering,
Course Syllabus. BIA658 Social Network Analytics Fall, 2013
Course Syllabus BIA658 Social Network Analytics Fall, 2013 Instructor Yasuaki Sakamoto, Assistant Professor Office: Babbio 632 Office hours: By appointment [email protected] Course Description This
Social Network Mining
Social Network Mining Data Mining November 11, 2013 Frank Takes ([email protected]) LIACS, Universiteit Leiden Overview Social Network Analysis Graph Mining Online Social Networks Friendship Graph Semantics
Graph theoretic approach to analyze amino acid network
Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 31-37 (ISSN: 2347-2529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to
The mathematics of networks
The mathematics of networks M. E. J. Newman Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109 1040 In much of economic theory it is assumed that economic agents interact,
