PAID VS. VOLUNTEER WORK IN OPEN SOURCE

Size: px
Start display at page:

Download "PAID VS. VOLUNTEER WORK IN OPEN SOURCE"

Transcription

1 th Hawaii International Conference on System Science PAID VS. VOLUNTEER WORK IN OPEN SOURCE Dirk Riehle Computer Science Department Friedrich-Alexander University Erlangen-Nürnberg Martensstr. 3, Erlangen, Germany Philipp Riemer Computer Science Department Friedrich-Alexander University Erlangen-Nürnberg Martensstr. 3, Erlangen, Germany Carsten Kolassa Software Engineering RWTH Aachen University Ahornstr. 55, Aachen, Germany Michael Schmidt Mathematics Department Friedrich-Alexander University Erlangen-Nürnberg Martensstr. 3, Erlangen, Germany ABSTRACT Many open source projects have long become commercial. This paper shows just how much of open source software development is paid work and how much has remained volunteer work. Using a conservative approach, we find that about 50% of all open source software development has been paid work for many years now and that many small projects are fully paid for by companies. However, we also find that any non-trivial project balances the amount of paid developer with volunteer work, and we suggest that the ratio of volunteer to paid work can serve as an indicator for the health of open source projects and aid the management of the respective communities. Index Terms Open source software, empirical software engineering, volunteer open source, paid open source. 1. INTRODUCTION Open source software development has long become an important commercial activity. Corbet et al.'s analyses of recent Linux kernel releases show that a large part of this work is being carried out by developers using their companies' addresses when submitting code [6], implying that they are paid for the work by their employers. However, while commercial contributions to the Linux kernel have been widely acknowledged, little is known about the overall commercial contribution to open source projects in the form of paid rather than volunteer development work. In this paper we show that open source has both strong and broad commercial support by companies paying developers to perform open source software development. Also, we suggest that understanding the relationship between paid and volunteer work in open source projects will aid project leaders in steering their community. This work makes the following contributions: It shows empirically that open source has strong commercial support across a broad range of projects, it shows the possible range of healthy paid-tovolunteer work ratios to help project steering, it presents measurements of how much open source work is being performed during working time, how open source working time work has changed over the years, the percentage of open source programmers that are paid programmers, and, the distribution of volunteer vs. paid work across open source projects, using the Linux kernel specifically and a large sample of active open source projects (>5.000 projects). The paper is organized as follows: In Section 2, we describe our research approach and define key terms. In Section 3 we present the main empirical results. In Section 4 we discuss our findings as well as their limitations. In Section 5 we review related work in and Section 6 we present final conclusions /14 $ IEEE DOI /HICSS [RRK+14] D. Riehle, P. Riemer, C. Kolassa, M. Schmidt: Paid vs. Volunteer Work in Open Source th Hawaii International Conference on System Sciences, pp , th Hawaii International Conference on System Sciences,

2 2. RESEARCH APPROACH 2.1 Definitions We use the following definitions: An author of some piece of code is the creator of the code, i.e., the original developer. A commit is the process of putting some piece of code into a code repository. Code repository is used as a synonym for configuration management system. A committer is a software developer who has the necessary rights to commit to a code repository. Maintainer is a synonym for a committer (as used in Linux kernel development). A patch is a code contribution submitted to a committer for inclusion in the project. A code contribution by an author indicates when the code was written and a commit by a committer indicates when the committer integrated the code into the code base. Author and committer are roles. Typically, in a two-step process, an author submits a patch and a committer integrates the patch into the main code base. An author, who is also a committer, can do both of these steps as one. The common case is that an author is not a committer, hence we separate both roles in our analysis of the Linux kernel. Moreover, we define the following time-related terms using common governmental regulations in Western countries: Working time is the time from 9am to 5pm local time, Mondays to Fridays. Spare time is all the time that is not working time. Consequently, working time and spare time depend on the time zone of the developer. 2.2 Data Sources We use two data sources: The Linux kernel and the Ohloh projects. Our analysis of the Linux kernel development work is based on its public configuration management data found at Kernel.org [5]. Since 2005, it has been managed using Git, which in contrast to older configuration management systems lets us distinguish the authors of some code, i.e., the original developer, from the committer of the code, who integrated it into the kernel code base. For this work, we downloaded the whole configuration management history from 2005 to Our analysis of open source projects is based on a 2008 snapshot of the Ohloh open source project database [20]. Using Daffara's definition of "active projects" [7], we find that our database snapshot contains 5,117 active open source projects. Daffara estimates that there were about 18,000 active open source projects in the world by August 2007, so our sample represents about 30% of the total active project population at that time. While not wholly representative for open source at its time, it is close nevertheless. 2.3 Data Quality Since 2005, the Linux kernel configuration management data (using Git) has been providing more precise information than traditional systems (CVS, svn). We can distinguish between authors and committers, and we can assess the exact time of a commit, whether a code contribution or code integration. The 2008 Ohloh database snapshot is not quite as detailed. Collecting 8,705,118 commits from more than 9,192 projects, it does not directly provide all relevant data. Due to the diversity of configuration management systems used in open source, Ohloh cannot distinguish between an author and a committer; thus, we only have committer data at hand. Another consequence of the variety of configuration management systems is that Ohloh stores all commit timestamps using UTC, ignoring the original time zone of a developer. However, for this work, we need the local time of a commit and hence the time zone. We address this problem by using location data that Ohloh provides to determine the timezone of individual developers. By hand, we identified 580 committers (out of 45,870 distinct committer ids), constituting 1.3% of the committer population. Those identified committers performed 646,705 of the 8 million commits, totaling about 8% of the work being performed. We call the set of identified committers the known committer set or the known committers, in short. While we can argue that the original Ohloh data set is close to being representative of open source, the reduced number of known committers may not be. For one, we identified mostly committers of above average activity (1.3% of the population performing 8% of the work), so there is some bias. Thus, we need to understand whether this bias is relevant for the analysis presented in this paper. For this, we ranked committers by number of commits and then binned the resulting committer sequence into 26 different bins. The 26 bins of known committers all have close-to-equal total numbers of commits and were suggested by R, the statistical analysis tool and environment we are using. Thus, the amount of work in each bin is about the same, but was performed by very different numbers of people. 3287

3 Figure 1. Paid-work-percentage of total work for 26 equal-commit-numbers committer bins (higher bin number means more committers in bin) Figure 2. Distribution of percentage of total commits over time zones (light gray = known committers, dark gray = extended committers) Assuming paid work to be work performed Mon- Fri from 9am-5pm (see below for definition and discussion), we can calculate the percentage of total work performed that is paid work. Figure 1 shows this paidwork-percentage (of total work) by committer bin. With a null hypothesis that no trend (bias) is apparent (and alternative hypotheses that there is a bias), using a t-test and assuming a normal distribution, we cannot reject the null hypothesis at the 95% confidence level. We therefore have no reason to assume that reducing the overall committer set to the known committer set introduces a bias that impacts our analysis (but can also not exclude it). In addition to the known committers set, we define an extended committers set or extended committers, in short. The extended committers set comprises all committers in the original Ohloh database, where the committer timezone is either known or assumed. The assumed timezone is determined using the following heuristic: We first condense all commits of a committer in the database into a single week. For all committers where we do not know the time zone, we match their week on an hourly basis with the weeks of the known committers set. Using a least-squares approach, we identify the time zone that has a minimum difference to the established data. This provides us with the most probable time zone for the not-known committers so that we can determine the local time for each commit (ignoring work while traveling). The known committers set provides a sharp picture of time-zone-based weekly work activities, including paid and volunteer work, while the extended committers set provides a richer (more data), but more fuzzy picture of the weekly work activities of committers. In the following, we present both the known and extended committer data side-by-side. 2.4 Data Interpretation We would like to understand how much work in open source is paid work and how it is distributed across a wide range of open source projects. As introduced above, we assume that work performed during regular working hours (weekdays, 9am- 5pm) is work paid for by companies or paid for through self-sponsorship of the developer. One possible objection to this is that not all cultures have working weeks of Mon-Fri, 9am-5pm. For example, some cultures work on Saturdays. Figure 2 shows the distribution of commits over the different timezones of this planet. All countries, mostly Islamic countries, that work on Saturdays, show very little open source activity (an interesting fact in itself). Thus, we feel safe to proceed with our definition of weekend and weekdays. Also, one may argue that many people have working hours outside of 9am-5pm and that we are too conservative in our estimate then. For one, we'd rather estimate paid work conservatively, but we also shifted working hour definitions around, to 8am-4pm, 10am- 6pm, 8am-8pm, etc. with no significant change in the results. Thus, we decided to stick to the most common workday definition. 3. PAID WORK IN OPEN SOURCE 3.1 Total Work during Working Time First, we investigate how much time is being spent on open source during regular working hours. Figures 3-6 show the workweek on an hourly base for the years for the Linux kernel and for the years for the Ohloh data. 3288

4 Figure 3. Number of commits by authors (when code is developed) per hour counted over all weeks for the Linux Kernel Figure 4. Number of commits by committers (when code is integrated) per hour counted over all weeks for the Linux Kernel Figure 5. Number of commits by known committers per hour counted over all weeks for the Ohloh projects From perusing Figures 3-6, we can gain a number of insights already. The most obvious insights are that there is a clear difference between work days and weekend days: about twice as much work is being done on a work day as is on a weekend day; and that most work is being performed during regular waking and working hours, i.e., from 9am to 5pm, even on the weekends and that developers take lunch and dinner breaks with work picking up again for a few hours after dinner before quieting down for the night. With a working time definition of Mon-Fri, 9am- 5pm, Table 1 shows the percentages of all code contributions respectively all commits made during working time for the Linux kernel and the Ohloh projects: Figure 6. Number of commits by extended committers per hour counted over all weeks for the Ohloh projects About 50% of all work contributed to open source software projects has been provided Monday to Friday, between 9am and 5pm. 3.2 Trends in Work during Working Time Next, we look at how the working time work spent on open source has changed over the years. Figures 7-10 show how the percentage of commits made during working time changed over the years. In Figures 7-10, each data point is the percentage of working time work for the given week. The moving average is a LOESS curve, and the grayed-out space around it indicates the 95% confidence interval for a data point. The widening of that space at the boundaries of the graph is an artifact of not using additional data beyond those boundaries. 3289

5 Table 1. Percentage of work performed during working time (9am-5pm, Mon-Fri) for Linux ( ) and the Ohloh projects ( ) Linux Kernel Ohloh Projects author committer known committer extended committers Percentage of total commits made during working time 45.00% 51.36% 47.3% (min. 28,2%, max. 58,8%) 55.4% (min. 36.5%, max. 59.5%) The Linux kernel data in Figures 7-8 shows significant fluctuations, which reflect the rapid release cycle of the project. A release is performed about every 80 days [5]. The process is highly regulated with defined time periods of increasing or decreasing activity. The activity is highest during the two week merge window, after which stabilization kicks in and activity decreases rapidly. Thus, there is no apparent annual schedule. The Ohloh data, a more diverse data set, also shows some fluctuations, but much less so. The main annual dips from 2002 on onwards occur during Christmas week, where it seems naturally to have a drop in working time work relative to spare time work (cf. Section 2 for the dominance in open source activity by Western cultures). Looking at the Linux kernel data in Fig. 7-8 again, we can see a clear upward trend for both authors and committers. Thus, from around 2007 through to 2010 increasing amounts of working time work in relation to spare time work was being spent on the Linux kernel. Starting 2010, this growth largely plateaued. In contrast to the Linux kernel data, the Ohloh data set shows a straight line. Using a likelihood ratio test (F-Test), with a straight line as the null hypothesis, we have to reject any other hypothesis (at a confidence level of 98%), and conclude that no growth occurred. The percentage of working time work spent on the Ohloh projects has remained flat. However, during these time frames, the total underlying data sets have grown substantially. The Linux kernel is growing at a polynomial rate [13] [23] while the combined Ohloh project data, and presumably all of open source, is growing at a near-exponential rate [8]. Thus, for the Ohloh project data, the year 2000 data is much more sparse than the year 2007 data. Still, as we just showed, no working time growth occurred in our open source project data, and the working time percentage of total work performed stayed flat. With growing market share [25] the economic significance of the Linux kernel has only been increasing, so it is not surprising to see growth in working time work being spent on it. What is surprising is that open source in total (using the Ohloh data as a proxy), which has been growing near-exponentially, has maintained a constant ratio of working time to spare time work. Thus, for every project with increasing economic significance that received more paid development work, new projects have been started with less working time engagement, but possibly growing into it. It is too early to speculate about a stable state of open source in terms of a stable ratio of paid working time work to volunteer spare time work contributions, but open source appears to have reached at least an intermediate stable state. Thus, even with underlying near-exponential growth, we expect this ratio to remain stable for now. 3.3 Developer Classification While overall weekly working times and working time trends are interesting, we also would like to know how many developers are earning their living by performing open source software development. Thus, we now look at individual developers and how much of their code is written during working time, i.e., to what extent they are being paid for their work. Fig show the distribution of developers over the percentage of work that is paid work (working time work) for the Linux kernel and Ohloh projects, respectively. It has been counted over all years. Please note that the y-axis is log-scale and that we are talking about contributors now, not just contributions. Only the extended committer data is shown, because the known set was too small to provide meaningful data for this particular discussion. Both the Linux kernel and Ohloh projects are dominated by the extremes: Developers doing all their work during spare time and developers doing all their work during working time. Table 2 shows the dominance of these extremes. Here, we define paid developers to be those who performed 95% or more of their commits during working time, and volunteer developers to be those who performed 95% and more of their commits during spare time, outside the weekdays 9am-5pm time frame. Thus, at least 23.15% of all authors working on the Linux kernel, totaling 1,807 developers, are paid for their work % of all committers working on the Linux kernel, totaling 37 developers, are paid for their work as well. Given the economic significance of the Linux kernel, one would expect more committers (maintainers) to be paid for their work than authors. A possible explanation for not confirming this assumption is the long-term engagement of committers that 3290

6 Figure 7. Data and trend line for percentage of commits made by authors to the Linux Kernel during working time for a given week Figure 8. Data and trend line for percentage of commits made by committers to the Linux Kernel during working time for a given week Figure 9. Data and trend line for percentage of commits made to the Ohloh projects during working time for a given week, known committers Figure 10. Data and trend line for percentage of commits made to the Ohloh projects during working time for a given week, extended committers Figure 11. Number of authors and committers with a given average percentage of paid work for the years for the Linux Kernel Figure 12. Number of committers with a given average percentage of paid work for the years for the Ohloh projects, extended committers 3291

7 Table 2. Distribution of volunteer (spare time) to paid (working time) developers, binned, over all years Volunteer (Spare Time) Work Mixed Paid (Working Time) Work Working Time Work % 0% 0.01%-5% 5.01%-94.99% 95%-99.99% 100% Linux Kernel author 33.06% 0.35% 43.45% 0.17% 22.98% committer 11.59% 3.05% 74.09% 1.52% 9.76% Ohloh Projects known committers 2.41% 1.21% 95.69% 0.00% 0.69% extended committers 7.04% 0.4% 74.58% 0.41% 17.56% motivates many to keep working outside traditional working time boundaries, which makes them fall outside our conservative definition of paid work. As to the Ohloh projects, 17.97% of all extended committers, totaling 8,244 developers, are being paid for their work. Common to these paid developers is that they do not work on open source projects in their spare time, i.e., fall outside the boundaries of the traditional open source enthusiast and volunteer categories. 3.4 Project Classification Finally, not only are we interested in what percentage of developers are being paid to work on open source, we also would like to know how they are allocated to projects. It is fair to assume that some projects get more commercial attention than others. Thus, we investigate which projects receive this attention. The Linux kernel project is a single (albeit large) project, so in this Section we are looking at the Ohloh projects only. We find that there is a large number of small (1-2 developers) projects with a long-tail distribution of size that are fully paid for by companies: All developers, frequently just one, are making their contributions only during working time. The top 5 smallest projects in our sample, fully paid for, are called subtle, WebPA, ShARPE, gst-openmax, and phpesp. These smallest projects have low commit numbers (in the 100's only, sometimes less). Inspection by hand shows that many times, code is being committed in large chunks. This is uncommon in traditional open source software development, where the most frequent commit size is one line of code [16]. Thus, it is safe to assume that these small but still active projects are being developed in-house and are being provided in a snapshot-style to the public at appropriate times. The largest projects in our sample maintain a paidfor developer percentage in the 10-20% range. Five example projects of this size are GNOME, Netbeans IDE, Eclipse Platform, KDE, and KVM. These are wellknown open source projects that are being developed in an open collaborative style, and the paid developer population of these projects can serve as an indicator of healthy public open source projects. 4. DISCUSSION OF FINDINGS In this work, we are making the assumption that work performed during working time hours (Mon-Fri, 9am-5pm, in the resp. local time zone) is paid work. This time frame has been defined as working time by most Western countries and thus we feel justified in considering it paid work. Even students typically have to go to class during that time and spending it on open source development implies economic self-sponsorship as it delays graduation and hence borrows against future income. The time of a commit is not the actual time the work was performed; it is the point of time when it is committed (made public). Thus, the actual work is performed right up to that point in time. In other work we show that the median time between two commits of the same open source developer is about 100min [15]. When we ran the analyses with shifted working time frames, we found little difference to the numbers from the 9am-5pm time frame and decided to ignore this imprecision. We believe it has no effect on the results. There is a cultural bias implied by these working hours, as some countries work on other days than Monday to Friday. Our analysis of contributions by time zone (Section 2) demonstrates that open source software development is strongly dominated by Western societies, as witnessed by a sharp drop in activities around Christian holidays like Easter or Christmas. One might argue that the Ohloh data is getting old. One advantage of the Ohloh data is that it draws broadly on the total population of available open source projects. It was seeded by the original providers of the Ohloh service with the most popular open source projects (by Yahoo search engine ranking) and has since been maintained by hand by the respective providers of open source projects. Unlike other data sources, the Ohloh data it is much less biased to any 3292

8 particular subgroup of open source projects. If there is a bias, it is a bias towards active well-working projects, which happen to be those we are interested in. Still, it would be desirable to have new data. Unfortunately, there is no alternative at present: No public access to the new Ohloh data is available on the level of detail required, other newer data sources are substantially more biased towards particular subgroups of open source, and it is prohibitively expensive for a research group to build a comprehensive and representative data set for all of open source (which is why nobody has done it yet). Consequently, this is the best research we can do for now. Our definition of "paid developer" is highly conservative. It is a person who does 95% or more of their work during working time hours only. It represents a regular developer with a regular life-style and presumably no interest in open source software development beyond their work. There are important and common exceptions to this type of person: Many paid developers are open source enthusiasts and keep working outside regular working hours. The software industry is by and large not unionized and tends to ignore regular working hours. Consequently, our estimate presents a lower boundary for the number of paid developers. 5. RELATED WORK Since 2008, Corbet et al. have been providing statistics about the Linux kernel development annually [6]. They investigate topics like evolution of the release frequency as well as number of changes introduced per release. In addition, they provide a list of the most-active companies supporting the development of the Linux kernel and list the percentage of commits performed by each of them. Similar to the work presented in this paper, the reports distinguish between authors and committers. Corbet et al. consider a contribution commercial, if it is made using a company's address to identify the contributor. They also maintain a separate mapping list for regular contributors that allows tracking a person even if he or she changes the employer. They find that at least 75% of all contributions since 2005 can be assigned to company employees. A "Report on the International Status of Open Source Software 2010" finds that the U.S.A., Australia, and the West European countries lead the development and adoption of open source software [19]. This is in line with our observation that weekly work as well as holiday drops line up well with Western cultural work patterns. Godfrey and Tu studied the Linux kernel growth in 2000 [13], and Robles et al., following up on Godrey and Tu, studied the Linux kernel growth in 2005 [23]. Robles et al. provide a good summary about what analyses were made in the area of evolution research of open source software projects. They study 123 stable and 457 development releases up to April 2005 (the point in time where the data for our analysis starts) and, by also counting the number of uncommented lines of code, confirm a super-linear growth rate, that is even more significant than already shown in the preceding paper. At the same time, the authors point out that not all work in a project is programming, but that also many tasks, such as testing, are done outside of the code repository and thus are hard to measure. Independently of that paper, a study by Succi et al. about the growth in "libre" (open source) software systems, confirms this super-linearity for the Linux kernel [24]. Both the proceedings of ICSE (the international conference on software engineering) and MSR (a conference on mining software repositories) as well as other conferences and journals by now provide extensive literature on empirical analyses of open source and closed source projects. An example classic open source studies is [17] by Mockus et al. Topics of interest range from bug prediction [2] [4] [18] [26] through engineering practices [1] [21] [22], social structures and community management [3] [14], software evolution [11] [12], all the way to issues of global collaboration and distributed development [2]. Research methods itself, mostly on data quality issues, are also analyzed [9] [27]. A few papers compare open source with commercial software development [1]. However, to the best of our knowledge none of this work addresses the issue of paid vs. volunteer work as discussed in this paper. We did not find any research that analyzes the commercialization of open source software projects by investigating when what work was done. A reason might be that modern version control systems, such as Git and Mercurial, have allowed us to access commit history data in detail, including time zone information, only recently. Older systems, such as CVS or svn only store a single UTC time stamp per commit. 6. CONCLUSIONS This paper analyzes to what extent open source software development has become commercial paid-for software development. A paid contribution is defined as having been contributed during regular (Western) working hours, Mon-Fri, 9am-5pm. By studying the Linux kernel from 2005 to 2011 and the Ohloh 3293

9 projects, a large set of more than 5,000 active open source projects, from 2000 to 2007, we find that about 50% of all contributions to projects in our sample population have been paid work. Moreover, no change in this percentage has occurred for the Ohloh projects, suggesting that the ratio of paid-to-volunteer work is stable in open source for now. Going one step further, we find that 10-20% of the developers engaged in our sample projects perform development work only during working hours, suggesting that they are fully paid for their work. Unlike traditional volunteers, they perform no work on our sample projects outside this time-frame, making our estimate a conservative one. We also find that many small projects are fully paid for, and that larger projects have a healthy mixture of paid and volunteer work in the 10-20% range as well. In future work, we intend to analyze the relationship between these categories of developers, company engagement, and project success. REFERENCES [1] C. Bird, A. Gourley, P. Devanbu "Detecting Patch Submission and Acceptance in OSS Projects," in Proceedings of the Fourth International Workshop on Mining Software Repositories (MSR '07), pp. 26. [2] C. Bird, N. Nagappan, P. Devanbu, H. Gall, B. Murphy, "Does distributed development affect software quality?: An empirical case study of Windows Vista," in Communications of the ACM, vol. 52, no. 8, pp [3] C. Bird, D. Pattison, R. D'Souza, V. Filkov, P. Devanbu, "Latent social structure in open source projects," in SIGSOFT '08/FSE-16 Proceedings, 2008, pp [4] E. Capra, "An Empirical Study on the Relationship Between Software Design Quality, Development Effort and Governance in Open Source Projects," in IEEE Transactions on Software Engineering, vol. 34, no. 6 (2008), pp [5] J. Corbet, "How to participate in the Linux community," 2008, at [6] J. Corbet, G. Kroah-Hartman, and A. McPherson, "Linux Kernel Development How fast it is going, who is doing it, what they are doing, and who is sponsoring it?", 2012, from [7] C. Daffara, "Estimating the number of active and stable FLOSS projects", 2007, from (Archived at [8] A. Deshpande and D. Riehle, "The total growth of open source," in Proceedings of the fourth Conference on Open Source Systems (OSS 2008), Springer Verlag, 2008, pp [9] M. Fischer, M. Pinzger, H. Gall, "Populating a release history database from version control and bug tracking systems," in Proceedings of the International Conference on Software Maintenance (ICSM 2003), pp [10] FLOSSmole. Collaborative collection and analysis of free/libre/open source project data, 2012, from (Archived at [11] B. Fluri, M. Wursch, H. C. Gall, "Do Code and Comments Co-Evolve? On the Relation between Source Code and Comment Changes," in Proceedings of the 14th Working Conference on Reverse Engineering (WCRE 2007), pp [12] H. C. Gall, M. Lanza, "Software evolution: analysis and visualization," in Proceedings of the 28th International Conference on Software Engineering (ICSE 2006), pp [13] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM), 2000, pp [14] V. K. Gurbani, A. Garvert, J. D. Herbsleb, "A case study of open source tools and practices in a commercial setting," in Proceedings of the Fifth Workshop on Open Source Software Engineering, pp [15] C. Kolassa, D. Riehle, M.A. Salim. "The empirical commit frequency distribution of open source projects." In Proceedings of the 2013 International Symposium on Open Collaboration (WikiSym + OpenSym 2013), ACM, 2013, paper C4. [16] C. Kolassa, D. Riehle, M.A. Salim. A Model of the Commit Size Distribution of Open Source. In Proceedings of the 39th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2013), LNCS Springer Verlag, 2013, pp [17] A. Mockus, R. T. Fielding, and J. D. Herbsleb, "A case study of open source software development: The Apache server," in ICSE 2000 Proceedings, 2000, pp [18] N. Nagappan, T. Ball, "Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study," in First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp [19] National Open Source Software Observatory, "Report on the International Status of Open Source Software," 2010 (Archived at 69t93TcPq). [20] Ohloh, the open source network, 2012, online at (Archived at webcitation.org/69t9bylcw). [21] J. W. Paulson, G. Succi, A. Eberlein, "An empirical study of open-source and closed-source software products," in Transactions on Software Engineering, vol. 30, no. 4 (April 2004), pp [22] P. C. Rigby, D. M. German, M.-A. Storey, "Open source software peer review practices: a case study of the apache server," in Proceedings of the 30th 3294

10 International Conference on Software Engineering (ICSE 2008), IEEE, pp [23] G. Robles, J. J. Amor, J. M. Gonzalez-Barahona, and I. Herraiz, "Evolution and growth in Large libre software projects," in Proceedings of the eigth international workshop on Principles of Software Evolution, 2005, pp [24] G. Succi, J. Paulson, and A. Eberlein, "Preliminary results from an empirical study on the growth of open source and commercial software products," in EDSER- 3 Workshop, co-located with ICSE, [25] S. J. Vaughan-Nichols, Linux servers keep growing, Windows and Unix keep shrinking. ZDnet. (Archived at [26] T. Zimmermann, N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), pp [27] T. Zimmermann, P. Weißgerber, A. Zeller, "Mining version histories to guide software changes," in Proceedings of the 26th International Conference on Software Engineering (ICSE 2004), pp

PAID VS. VOLUNTEER WORK IN OPEN SOURCE

PAID VS. VOLUNTEER WORK IN OPEN SOURCE PAID VS. VOLUNTEER WORK IN OPEN SOURCE Dirk Riehle Philipp Riemer Computer Science Department Friedrich-Alexander University Erlangen-Nürnberg Martensstr. 3, 91058 Erlangen, Germany dirk@riehle.org Computer

More information

The Empirical Commit Frequency Distribution of Open Source Projects

The Empirical Commit Frequency Distribution of Open Source Projects The Empirical Commit Frequency Distribution of Open Source Projects Carsten Kolassa Software Engineering RWTH Aachen University, Germany Carsten@Kolassa.de Dirk Riehle Friedrich-Alexander-University Erlangen-Nürnberg,

More information

The Total Growth of Open Source

The Total Growth of Open Source Reference: In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008, to appear. The Total Growth of Open Source Amit Deshpande SAP Research, SAP Labs LLC 3475 Deer

More information

The Commit Size Distribution of Open Source Software

The Commit Size Distribution of Open Source Software The Commit Size Distribution of Open Source Software Oliver Arafat, Dirk Riehle SAP Research, SAP Labs LLC 3410 Hillview Ave, Palo Alto, CA, 94304 U.S.A. oarafat@gmail.com, dirk@riehle.org Abstract With

More information

Continuous Integration in Open Source Software Development

Continuous Integration in Open Source Software Development Reference: In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008, to appear. Continuous Integration in Open Source Software Development Amit Deshpande TechSolve,

More information

Responsiveness as a measure for assessing the health of OSS ecosystems

Responsiveness as a measure for assessing the health of OSS ecosystems Responsiveness as a measure for assessing the health of OSS ecosystems Jonas Gamalielsson, Björn Lundell and Brian Lings University of Skövde, Sweden {jonas.gamalielsson, bjorn.lundell, brian.lings}@his.se,

More information

Learning and Researching with Open Source Software

Learning and Researching with Open Source Software Learning and Researching with Open Source Software Minghui Zhou zhmh@pku.edu.cn Associate Professor Peking University Outline A snapshot of Open Source Software (OSS) Learning with OSS Research on OSS

More information

Adrian Schröter. Holger Schackmann RWTH Aachen University Aachen, Germany schackmann@swc.rwthaachen.de. schadr@uvic.ca.

Adrian Schröter. Holger Schackmann RWTH Aachen University Aachen, Germany schackmann@swc.rwthaachen.de. schadr@uvic.ca. Attracting the s Many Eyes: an Exploration of User Involvement in Issue Tracking Lars Grammel University of Victoria Victoria, BC, Canada Lars.Grammel@gmail.com Christoph Treude University of Victoria

More information

Automating the Measurement of Open Source Projects

Automating the Measurement of Open Source Projects Automating the Measurement of Open Source Projects Daniel German Department of Computer Science University of Victoria dmgerman@uvic.ca Audris Mockus Avaya Labs Department of Software Technology Research

More information

Improving Developer Activity Metrics with Issue Tracking Annotations

Improving Developer Activity Metrics with Issue Tracking Annotations Improving Developer Activity s with Issue Tracking Annotations Andrew Meneely, Mackenzie Corcoran, Laurie Williams North Carolina State University {apmeneel, mhcorcor, lawilli3}@ncsu.edu ABSTRACT Understanding

More information

Do Onboarding Programs Work?

Do Onboarding Programs Work? Do Onboarding Programs Work? Adriaan Labuschagne and Reid Holmes School of Computer Science University of Waterloo Waterloo, ON, Canada alabusch,rtholmes@cs.uwaterloo.ca Abstract Open source software systems

More information

(2) Question 2: Size of Mozilla community.

(2) Question 2: Size of Mozilla community. Open Source Software Development Case Studies CSE564 Group 17 HW1 Shihuan Shao, Tuyue Chen, Yongming Zhang 1. Introduction Open source software(oss) development is considered to be a competitive rivalry

More information

Quality Practices and Problems in Free Software Projects

Quality Practices and Problems in Free Software Projects Quality Practices and Problems in Free Software Projects Martin Michlmayr, Francis Hunt, David Probert Centre for Technology Management University of Cambridge Cambridge, CB2 1RX, UK martin@michlmayr.org

More information

Nirikshan: Process Mining Software Repositories to Identify Inefficiencies, Imperfections, and Enhance Existing Process Capabilities

Nirikshan: Process Mining Software Repositories to Identify Inefficiencies, Imperfections, and Enhance Existing Process Capabilities Nirikshan: Process Mining Software Repositories to Identify Inefficiencies, Imperfections, and Enhance Existing Process Capabilities Monika Gupta monikag@iiitd.ac.in PhD Advisor: Dr. Ashish Sureka Industry

More information

Peer Review on Open-Source Software Projects: Parameters, Statistical Models, and Theory

Peer Review on Open-Source Software Projects: Parameters, Statistical Models, and Theory Peer Review on Open-Source Software Projects: Parameters, Statistical Models, and Theory PETER C. RIGBY, Concordia University DANIEL M. GERMAN, LAURA COWEN, and MARGARET-ANNE STOREY, University of Victoria

More information

Mining Email Archives and Simulating the Dynamics of Open-Source Project Developer Networks

Mining Email Archives and Simulating the Dynamics of Open-Source Project Developer Networks Mining Email Archives and Simulating the Dynamics of Open-Source Project Developer Networks Liguo Yu 1, Srini Ramaswamy 2, and Chuanlei Zhang 2 1 Computer Science and Informatics, Indaian University South

More information

Resource Allocation in Open Source Projects: A Profile Based Approach

Resource Allocation in Open Source Projects: A Profile Based Approach Resource Allocation in Open Source Projects: A Profile Based Approach Bachelor of Science Thesis Software Engineering and Management Chayi Fan Stefan Ulvdal Department of Computer Science and Engineering

More information

Mining Peer Code Review System for Computing Effort and Contribution Metrics for Patch Reviewers

Mining Peer Code Review System for Computing Effort and Contribution Metrics for Patch Reviewers Mining Peer Code Review System for Computing Effort and Contribution Metrics for Patch Reviewers Rahul Mishra, Ashish Sureka Indraprastha Institute of Information Technology, Delhi (IIITD) New Delhi {rahulm,

More information

The Evolution of Mobile Apps: An Exploratory Study

The Evolution of Mobile Apps: An Exploratory Study The Evolution of Mobile Apps: An Exploratory Study Jack Zhang, Shikhar Sagar, and Emad Shihab Rochester Institute of Technology Department of Software Engineering Rochester, New York, USA, 14623 {jxz8072,

More information

6 A Statistical Analysis of Defects in Debian and Strategies for Improving Quality in Free Software Projects

6 A Statistical Analysis of Defects in Debian and Strategies for Improving Quality in Free Software Projects The Economics of Open Source Software Development Jürgen Bitzer and Philipp J. H. Schröder (Editors) 2006 Published by Elsevier B.V. 6 A Statistical Analysis of Defects in Debian and Strategies for Improving

More information

Got Issues? Do New Features and Code Improvements Affect Defects?

Got Issues? Do New Features and Code Improvements Affect Defects? Got Issues? Do New Features and Code Improvements Affect Defects? Daryl Posnett dpposnett@ucdavis.edu Abram Hindle ah@softwareprocess.es Prem Devanbu devanbu@ucdavis.edu Abstract There is a perception

More information

Wait For It: Determinants of Pull Request Evaluation Latency on GitHub

Wait For It: Determinants of Pull Request Evaluation Latency on GitHub Wait For It: Determinants of Pull Request Evaluation Latency on GitHub Yue Yu, Huaimin Wang, Vladimir Filkov, Premkumar Devanbu, and Bogdan Vasilescu College of Computer, National University of Defense

More information

The Importance of Social Network Structure in the Open Source Software Developer Community

The Importance of Social Network Structure in the Open Source Software Developer Community The Importance of Social Network Structure in the Open Source Software Developer Community Matthew Van Antwerp Department of Computer Science and Engineering University of Notre Dame Notre Dame, IN 46556

More information

Two case studies of Open Source Software Development: Apache and Mozilla

Two case studies of Open Source Software Development: Apache and Mozilla 1 Two case studies of Open Source Software Development: Apache and Mozilla Audris Mockus, Roy Fielding, and James D Herbsleb Presented by Jingyue Li 2 Outline Research questions Research methods Data collection

More information

Are Suggestions of Coupled File Changes Interesting?

Are Suggestions of Coupled File Changes Interesting? Are Suggestions of Coupled File Changes Interesting? Jasmin Ramadani 1, Stefan Wagner 1 1 University of Stuttgart jasmin.ramadani, stefan.wagner@informatik.uni-stuttgart.de Keywords: Abstract: Data Mining,

More information

Open Source Software Development: A Case Study of FreeBSD

Open Source Software Development: A Case Study of FreeBSD Open Source Software Development: A Case Study of FreeBSD Trung Dinh-Trong and James M. Bieman Software Assurance Laboratory Computer Science Department Colorado State University Fort Collins, CO 80523

More information

Evaluating the Evolution of Small Scale Open Source Software Systems

Evaluating the Evolution of Small Scale Open Source Software Systems Evaluating the Evolution of Small Scale Open Source Software Systems Chanchal Kumar Roy and James R. Cordy Queen s University, Kingston, Ontario, Canada K7L3N6 {croy, cordy}@cs.queensu.ca http://www.cs.queensu.ca/~cordy

More information

GITHUB and Continuous Integration in a Software World

GITHUB and Continuous Integration in a Software World Continuous integration in a social-coding world: Empirical evidence from GITHUB Bogdan Vasilescu, Stef van Schuylenburg, Jules Wulms, Alexander Serebrenik, Mark G. J. van den Brand Eindhoven University

More information

Developer identification methods for integrated data from various sources

Developer identification methods for integrated data from various sources Developer identification methods for integrated data from various sources Gregorio Robles, Jesus M. Gonzalez-Barahona {grex, jgb}@gsyc.escet.urjc.es Grupo de Sistemas y Comunicaciones Universidad Rey Juan

More information

Software Bugs and Evolution: A Visual Approach to Uncover Their Relationship

Software Bugs and Evolution: A Visual Approach to Uncover Their Relationship Software Bugs and Evolution: A Visual Approach to Uncover Their Relationship Marco D Ambros and Michele Lanza Faculty of Informatics University of Lugano, Switzerland Abstract Versioning systems such as

More information

The Impact of Release Management and Quality Improvement in Open Source Software Project Management

The Impact of Release Management and Quality Improvement in Open Source Software Project Management Applied Mathematical Sciences, Vol. 6, 2012, no. 62, 3051-3056 The Impact of Release Management and Quality Improvement in Open Source Software Project Management N. Arulkumar 1 and S. Chandra Kumramangalam

More information

Analysis of Open Source Software Development Iterations by Means of Burst Detection Techniques

Analysis of Open Source Software Development Iterations by Means of Burst Detection Techniques Analysis of Open Source Software Development Iterations by Means of Burst Detection Techniques Bruno Rossi, Barbara Russo, and Giancarlo Succi CASE Center for Applied Software Engineering Free University

More information

It s not a Bug, it s a Feature: How Misclassification Impacts Bug Prediction

It s not a Bug, it s a Feature: How Misclassification Impacts Bug Prediction It s not a Bug, it s a Feature: How Misclassification Impacts Bug Prediction Kim Herzig Saarland University Saarbrücken, Germany herzig@cs.uni-saarland.de Sascha Just Saarland University Saarbrücken, Germany

More information

What s a Typical Commit? A Characterization of Open Source Software Repositories

What s a Typical Commit? A Characterization of Open Source Software Repositories What s a Typical Commit? A Characterization of Open Source Software Repositories Abdulkareem Alali, Huzefa Kagdi, Jonathan I. Maletic Department of Computer Science Kent State University Kent Ohio 44242

More information

DAHLIA: A Visual Analyzer of Database Schema Evolution

DAHLIA: A Visual Analyzer of Database Schema Evolution DAHLIA: A Visual Analyzer of Database Schema Evolution Loup Meurice and Anthony Cleve PReCISE Research Center, University of Namur, Belgium {loup.meurice,anthony.cleve}@unamur.be Abstract In a continuously

More information

Integrating Service Oriented MSR Framework and Google Chart Tools for Visualizing Software Evolution

Integrating Service Oriented MSR Framework and Google Chart Tools for Visualizing Software Evolution 2012 Fourth International Workshop on Empirical Software Engineering in Practice Integrating Service Oriented MSR Framework and Google Chart Tools for Visualizing Software Evolution Yasutaka Sakamoto,

More information

QualitySpy: a framework for monitoring software development processes

QualitySpy: a framework for monitoring software development processes Journal of Theoretical and Applied Computer Science Vol. 6, No. 1, 2012, pp. 35-45 ISSN 2299-2634 http://www.jtacs.org QualitySpy: a framework for monitoring software development processes Marian Jureczko,

More information

Comparing Methods to Identify Defect Reports in a Change Management Database

Comparing Methods to Identify Defect Reports in a Change Management Database Comparing Methods to Identify Defect Reports in a Change Management Database Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com

More information

Modeling Practices in Open Source Software

Modeling Practices in Open Source Software Modeling Practices in Open Source Software Omar Badreddin 1, Timothy C. Lethbridge 1, Maged Elassar 2 1 University of Ottawa 800 King Edward 2 IBM Ottawa Laboratories 770 Palladium Dr. Ottawa, Ontario,

More information

The Secret Life of Patches: A Firefox Case Study

The Secret Life of Patches: A Firefox Case Study The Secret Life of Patches: A Firefox Case Study Olga Baysal, Oleksii Kononenko, Reid Holmes, and Michael W. Godfrey David R. Cheriton School of Computer Science University of Waterloo {obaysal, okononen,

More information

Comparing Software Quality between Open Source and Closed Source Software Development. Jussi Heikkilä, Joona Hartikainen & Pramod Guruprasad

Comparing Software Quality between Open Source and Closed Source Software Development. Jussi Heikkilä, Joona Hartikainen & Pramod Guruprasad Comparing Software Quality between Open Source and Closed Source Software Development Jussi Heikkilä, Joona Hartikainen & Pramod Guruprasad Abstract 1. Introduction 2. Software quality 2.1 Definition 2.2

More information

Bug Fixing Process Analysis using Program Slicing Techniques

Bug Fixing Process Analysis using Program Slicing Techniques Bug Fixing Process Analysis using Program Slicing Techniques Raula Gaikovina Kula and Hajimu Iida Graduate School of Information Science, Nara Institute of Science and Technology Takayamacho 8916-5, Ikoma,

More information

How to Manage the Patch Contribution Process in Open Source Software Projects

How to Manage the Patch Contribution Process in Open Source Software Projects Managing Open Source Contributions for Software Project Sustainability Bhuricha Deen Sethanandha 1, Bart Massey 1, William Jones 2 1 Department of Computer Science, Portland State University, Portland,

More information

Empirical study of software quality evolution in open source projects using agile practices

Empirical study of software quality evolution in open source projects using agile practices 1 Empirical study of software quality evolution in open source projects using agile practices Alessandro Murgia 1, Giulio Concas 1, Sandro Pinna 1, Roberto Tonelli 1, Ivana Turnu 1, SUMMARY. 1 Dept. Of

More information

Open Source Software Developer and Project Networks

Open Source Software Developer and Project Networks Open Source Software Developer and Project Networks Matthew Van Antwerp and Greg Madey University of Notre Dame {mvanantw,gmadey}@cse.nd.edu Abstract. This paper outlines complex network concepts and how

More information

maintainer a web-dashboard for R package maintainers

maintainer a web-dashboard for R package maintainers maintainer a web-dashboard for R package maintainers Maëlick Claes COMPLEXYS Research Institute University of Mons, Belgium RIMEL 2nd December 2015 R ecosystem Statistical environment Multiple package

More information

Coordination in Open Source versus Commercial Software Development

Coordination in Open Source versus Commercial Software Development Coordination in Open Source versus Commercial Software Development Chintan Amrit IEBIS Department University of Twente The Netherlands c.amrit@utwente.nl ABSTRACT Process patterns based on particular coordination

More information

A Visualization Approach for Bug Reports in Software Systems

A Visualization Approach for Bug Reports in Software Systems , pp. 37-46 http://dx.doi.org/10.14257/ijseia.2014.8.10.04 A Visualization Approach for Bug Reports in Software Systems Maen Hammad 1, Somia Abufakher 2 and Mustafa Hammad 3 1, 2 Department of Software

More information

Code Ownership in Open-Source Software

Code Ownership in Open-Source Software Code Ownership in Open-Source Software Matthieu Foucault University of Bordeaux LaBRI, UMR 5800 F-33400, Talence, France mfoucaul@labri.fr Jean-Rémy Falleri University of Bordeaux LaBRI, UMR 5800 F-33400,

More information

Are Change Metrics Good Predictors for an Evolving Software Product Line?

Are Change Metrics Good Predictors for an Evolving Software Product Line? Are Change Metrics Good Predictors for an Evolving Software Product Line? Sandeep Krishnan Dept. of Computer Science Iowa State University Ames, IA 50014 sandeepk@iastate.edu Chris Strasburg Iowa State

More information

A TOPOLOGICAL ANALYSIS OF THE OPEN SOURCE SOFTWARE DEVELOPMENT COMMUNITY

A TOPOLOGICAL ANALYSIS OF THE OPEN SOURCE SOFTWARE DEVELOPMENT COMMUNITY A TOPOLOGICAL ANALYSIS OF THE OPEN SOURCE SOFTWARE DEVELOPMENT COMMUNITY Jin Xu,Yongqin Gao, Scott Christley & Gregory Madey Department of Computer Science and Engineering University of Notre Dame Notre

More information

How to Measure Software Quality in Vain

How to Measure Software Quality in Vain Empirical Estimates of Software Availability in Deployed Systems Avaya Labs Audris Mockus audris@mockus.org Avaya Labs Research Basking Ridge, NJ 07920 http://www.research.avayalabs.com/user/audris Motivation

More information

Dynamical Simulation Models for the Development Process of Open Source Software Projects

Dynamical Simulation Models for the Development Process of Open Source Software Projects Dynamical Simulation Models for the Development Process of Open Source Software Proects I. P. Antoniades, I. Stamelos, L. Angelis and G.L. Bleris Department of Informatics, Aristotle University 54124 Thessaloniki,

More information

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Amir H. Moin and Günter Neumann Language Technology (LT) Lab. German Research Center for Artificial Intelligence (DFKI)

More information

Co-evolving Code-Related and Database-Related Changes in a Data-Intensive Software System

Co-evolving Code-Related and Database-Related Changes in a Data-Intensive Software System Co-evolving Code-Related and Database-Related Changes in a Data-Intensive Software System Mathieu Goeminne, Alexandre Decan, Tom Mens Département Informatique, Université de Mons 7000 Mons, Belgique firstname.lastname@umons.ac.be

More information

Understanding the popularity of reporters and assignees in the Github

Understanding the popularity of reporters and assignees in the Github Understanding the popularity of reporters and assignees in the Github Joicy Xavier, Autran Macedo, Marcelo de A. Maia Computer Science Department Federal University of Uberlândia Uberlândia, Minas Gerais,

More information

Processing and data collection of program structures in open source repositories

Processing and data collection of program structures in open source repositories 1 Processing and data collection of program structures in open source repositories JEAN PETRIĆ, TIHANA GALINAC GRBAC AND MARIO DUBRAVAC, University of Rijeka Software structure analysis with help of network

More information

Contemporary Peer Review in Action:

Contemporary Peer Review in Action: Feature: Software Reviews Contemporary Peer Review in Action: Lessons from Open Source Development Peter C. Rigby, Concordia University, Montreal, Canada Brendan Cleary, University of Victoria, Canada

More information

Learn Software Microblogging - A Review of This paper

Learn Software Microblogging - A Review of This paper 2014 4th IEEE Workshop on Mining Unstructured Data An Exploratory Study on Software Microblogger Behaviors Abstract Microblogging services are growing rapidly in the recent years. Twitter, one of the most

More information

Automated Classification of Change Messages in Open Source Projects

Automated Classification of Change Messages in Open Source Projects Automated Classification of Change Messages in Open Source Projects Ahmed E. Hassan School of Computing Queen s University Kingston, Canada ahmed@cs.queensu.ca ABSTRACT Source control systems permit developers

More information

Convergent Contemporary Software Peer Review Practices

Convergent Contemporary Software Peer Review Practices Convergent Contemporary Software Peer Review Practices Peter C. Rigby Concordia University Montreal, QC, Canada peter.rigby@concordia.ca Christian Bird Microsoft Research Redmond, WA, USA cbird@microsoft.com

More information

Agile Requirements Definition for Software Improvement and Maintenance in Open Source Software Development

Agile Requirements Definition for Software Improvement and Maintenance in Open Source Software Development Agile Requirements Definition for Software Improvement and Maintenance in Open Source Software Development Stefan Dietze Fraunhofer Institute for Software and Systems Engineering (ISST), Mollstr. 1, 10178

More information

Jeffrey Klow, Paul German, and Emily Andrulis

Jeffrey Klow, Paul German, and Emily Andrulis Jeffrey Klow, Paul German, and Emily Andrulis Network Traffic Initial Findings Last week we presented on how we first got our data from the Tims in the IT department. This past week we have been continuously

More information

Semantic Concept Based Retrieval of Software Bug Report with Feedback

Semantic Concept Based Retrieval of Software Bug Report with Feedback Semantic Concept Based Retrieval of Software Bug Report with Feedback Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin Abstract Mining software bugs provides a way to develop

More information

Process Mining Event Logs from FLOSS Data: State of the Art and Perspectives

Process Mining Event Logs from FLOSS Data: State of the Art and Perspectives Process Mining Event Logs from FLOSS Data: State of the Art and Perspectives Patrick Mukala, Antonio Cerone and Franco Turini Dipartimento di Informatica, University of Pisa, Pisa, Italy {mukala,cerone,turini}@di.unipi.it

More information

Structural Complexity Evolution in Free Software Projects: A Case Study

Structural Complexity Evolution in Free Software Projects: A Case Study Structural Complexity Evolution in Free Software Projects: A Case Study Antonio Terceiro and Christina Chavez Computer Science Department Universidade Federal da Bahia {terceiro,flach}@dcc.ufba.br Abstract.

More information

Defect Prediction using Combined Product and Project Metrics A Case Study from the Open Source Apache MyFaces Project Family

Defect Prediction using Combined Product and Project Metrics A Case Study from the Open Source Apache MyFaces Project Family Defect Prediction using Combined Product and Project Metrics A Case Study from the Open Source Apache MyFaces Project Family Dindin Wahyudin, Alexander Schatten, Dietmar Winkler, A Min Tjoa, Stefan Biffl

More information

Every Cornell student knows what it feels like to have to wait on the slow

Every Cornell student knows what it feels like to have to wait on the slow Paul German, Jeffrey Klow, and Emily Andrulis Network Traffic Final Presentation Every Cornell student knows what it feels like to have to wait on the slow internet speeds that they run into during the

More information

Practical Software Quality Prediction

Practical Software Quality Prediction Practical Software Quality Prediction Emad Shihab Department of Computer Science and Software Engineering Concordia University eshihab@cse.concordia.ca Abstract Software systems continue to play an increasingly

More information

Bug Localization Using Revision Log Analysis and Open Bug Repository Text Categorization

Bug Localization Using Revision Log Analysis and Open Bug Repository Text Categorization Bug Localization Using Revision Log Analysis and Open Bug Repository Text Categorization Amir H. Moin and Mohammad Khansari Department of IT Engineering, School of Science & Engineering, Sharif University

More information

INTRODUCING HEALTH PERSPECTIVE IN OPEN SOURCE WEB-ENGINEERING SOFTWARE PROJECTS, BASED ON PROJECT DATA ANALYSIS 1)

INTRODUCING HEALTH PERSPECTIVE IN OPEN SOURCE WEB-ENGINEERING SOFTWARE PROJECTS, BASED ON PROJECT DATA ANALYSIS 1) INTRODUCING HEALTH PERSPECTIVE IN OPEN SOURCE WEB-ENGINEERING SOFTWARE PROJECTS, BASED ON PROJECT DATA ANALYSIS 1) Dindin Wahyudin, Alexander Schatten, Khabib Mustofa, Stefan Biffl, A Min Tjoa Institute

More information

ModelingandSimulationofthe OpenSourceSoftware Community

ModelingandSimulationofthe OpenSourceSoftware Community ModelingandSimulationofthe OpenSourceSoftware Community Yongqin Gao, GregMadey Departmentof ComputerScience and Engineering University ofnotre Dame ygao,gmadey@nd.edu Vince Freeh Department of ComputerScience

More information

Vulnerability Discovery in Multi-Version Software Systems

Vulnerability Discovery in Multi-Version Software Systems 1th IEEE High Assurance Systems Engineering Symposium Vulnerability Discovery in Multi-Version Software Systems Jinyoo Kim, Yashwant K. Malaiya, Indrakshi Ray Computer Science Department Colorado State

More information

Coordination Implications of Software Coupling in Open Source Projects

Coordination Implications of Software Coupling in Open Source Projects Coordination Implications of Software Coupling in Open Source Projects Chintan Amrit and Jos van Hillegersberg IS&CM Department, University of Twente, P.O. Box 217 7500 AE Enschede, The Netherlands {c.amrit,j.vanhillegersberg}@utwente.nl

More information

Aspects of Software Quality Assurance in Open Source Software Projects: Two Case Studies from Apache Project

Aspects of Software Quality Assurance in Open Source Software Projects: Two Case Studies from Apache Project Aspects of Software Quality Assurance in Open Source Software Projects: Two Case Studies from Apache Project Dindin Wahyudin, Alexander Schatten, Dietmar Winkler, Stefan Biffl Institute of Software Technology

More information

Visual Data Mining: A Review

Visual Data Mining: A Review Visual Data Mining in Software Archives To Detect How Developers Work Together Peter Weißgerber Mathias Pohl Michael Burch Computer Science Department University of Trier 54286 Trier, Germany {weissger,

More information

When Do Software Issues and Bugs get Reported in Large Open Source Software Project?

When Do Software Issues and Bugs get Reported in Large Open Source Software Project? When Do Software Issues and Bugs get Reported in Large Open Source Software Project? Rakesh Rana and Miroslaw Staron Department of Computer Science & Engineering Chalmers/ University of Gothenburg Göteborg,

More information

On Software Engineering Repositories and Their Open Problems

On Software Engineering Repositories and Their Open Problems On Software Engineering Repositories and Their Open Problems Daniel Rodriguez University of Alcala Alcala de Henares, Spain daniel.rodriguez@uah.es Israel Herraiz Technical University of Madrid Madrid,

More information

Observability of Software Engineering Processes in Open Source Software Projects Domain

Observability of Software Engineering Processes in Open Source Software Projects Domain Observability of Software Engineering Processes in Open Source Software Projects Domain Wikan Danar Sunindyo Christian Doppler Laboratory for Software Engineering Integration for Flexible Automation Systems

More information

Getting the global picture

Getting the global picture Jesús M. González Barahona, Gregorio Robles GSyC, Universidad Rey Juan Carlos, Madrid, Spain {jgb,grex}@gsyc.escet.urjc.es Oxford Workshop on Libre Software 2004 Oxford, UK, June 25th Overview 1 Overview

More information

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Amir H. Moin and Günter Neumann Language Technology (LT) Lab. German Research Center for Artificial Intelligence (DFKI)

More information

The Improvement of Test Case Selection for the Process Software Maintenance

The Improvement of Test Case Selection for the Process Software Maintenance The Improvement of Test Case Selection for the Process Software Maintenance Adtha Lawanna* Abstract following topics in software-development life cycle (SDLC) Software maintenance is one of the critical

More information

Free/Libre and Open Source Software: Survey and Study FLOSS

Free/Libre and Open Source Software: Survey and Study FLOSS Free/Libre and Open Source Software: Survey and Study FLOSS Deliverable D18: FINAL REPORT Part V: Software Source Code Survey Rishab Aiyer Ghosh Gregorio Robles Ruediger Glott International Institute of

More information

Discovering loners and phantoms in commit and issue data

Discovering loners and phantoms in commit and issue data Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2015 Discovering loners and phantoms in commit and issue data Schermann, Gerald;

More information

Recommending change clusters to support software investigation: an empirical study

Recommending change clusters to support software investigation: an empirical study JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE J. Softw. Maint. Evol.: Res. Pract. 2010; 22:143 164 Published online 9 September 2009 in Wiley InterScience (www.interscience.wiley.com)..413

More information

Tracking the Impact of Design Changes During Software Development

Tracking the Impact of Design Changes During Software Development Tracking the Impact of Design Changes During Software Development Frank Padberg Fakultät für Informatik Universität Karlsruhe, Germany padberg@ira.uka.de Abstract Design changes occur frequently during

More information

An Empirical Investigation into the Role of API-Level Refactorings during Software Evolution

An Empirical Investigation into the Role of API-Level Refactorings during Software Evolution An Empirical Investigation into the Role of API-Level Refactorings during Software Evolution ABSTRACT Miryung Kim The University of Texas at Austin Austin TX miryung@ece.utexas.edu It is widely believed

More information

THE OPEN SOURCE DEVELOPER REPORT

THE OPEN SOURCE DEVELOPER REPORT THE OPEN SOURCE DEVELOPER REPORT 2011 ECLIPSE COMMUNITY SURVEY JUNE 20 11 EXECUTIVE SUMMARY Eclipse is a large, vibrant, well-established open source community with over 200 open source projects, close

More information

Importance of Open Source Contributions within the Educational Process

Importance of Open Source Contributions within the Educational Process Importance of Open Source Contributions within the Educational Process Victor Cărbune Computer Science ETH Zürich, Switzerland vcarbune@student.ethz.ch Laura Mihaela Vasilescu Computer Science & Engineering

More information

Content. Development Tools 2(63)

Content. Development Tools 2(63) Development Tools Content Project management and build, Maven Version control, Git Code coverage, JaCoCo Profiling, NetBeans Static Analyzer, NetBeans Continuous integration, Hudson Development Tools 2(63)

More information

Full-text Search in Intermediate Data Storage of FCART

Full-text Search in Intermediate Data Storage of FCART Full-text Search in Intermediate Data Storage of FCART Alexey Neznanov, Andrey Parinov National Research University Higher School of Economics, 20 Myasnitskaya Ulitsa, Moscow, 101000, Russia ANeznanov@hse.ru,

More information

Measurement and Modelling of Internet Traffic at Access Networks

Measurement and Modelling of Internet Traffic at Access Networks Measurement and Modelling of Internet Traffic at Access Networks Johannes Färber, Stefan Bodamer, Joachim Charzinski 2 University of Stuttgart, Institute of Communication Networks and Computer Engineering,

More information

A Crowd Method for Internet-based Software with Big Data

A Crowd Method for Internet-based Software with Big Data 2014 中 南 大 学 英 特 尔 透 明 计 算 与 大 数 据 研 讨 会 A Crowd Method for Internet-based Software with Big Data Gang Yin Software Collaboration and Data Mining Group National University of Defense Technology Changsha,

More information

Towards A Portfolio of FLOSS Project Success Measures

Towards A Portfolio of FLOSS Project Success Measures Towards A Portfolio of FLOSS Project Success Measures Kevin Crowston, Hala Annabi, James Howison and Chengetai Masango Syracuse University School of Information Studies {crowston, hpannabi, jhowison, cmasango}@syr.edu

More information

An empirical study of fine-grained software modifications

An empirical study of fine-grained software modifications An empirical study of fine-grained software modifications Daniel M. German Software Engineering Group Department of Computer Science University of Victoria Victoria, Canada dmgerman@uvic.ca Abstract Software

More information

Applying Social Network Analysis to the Information in CVS Repositories

Applying Social Network Analysis to the Information in CVS Repositories Applying Social Network Analysis to the Information in CVS Repositories Luis Lopez-Fernandez, Gregorio Robles, Jesus M. Gonzalez-Barahona GSyC, Universidad Rey Juan Carlos {llopez,grex,jgb}@gsyc.escet.urjc.es

More information

VISUALIZATION APPROACH FOR SOFTWARE PROJECTS

VISUALIZATION APPROACH FOR SOFTWARE PROJECTS Canadian Journal of Pure and Applied Sciences Vol. 9, No. 2, pp. 3431-3439, June 2015 Online ISSN: 1920-3853; Print ISSN: 1715-9997 Available online at www.cjpas.net VISUALIZATION APPROACH FOR SOFTWARE

More information

Validity of Network Analyses in Open Source Projects

Validity of Network Analyses in Open Source Projects Validity of Network Analyses in Open Source Projects Roozbeh Nia, Christian Bird, Premkumar Devanbu, Vladimir Filkov Computer Science Department University of California, Davis {nia, bird, devanbu, filkov}@cs.ucdavis.edu

More information

An Empirical Study on the Relation between Dependency Neighborhoods and Failures

An Empirical Study on the Relation between Dependency Neighborhoods and Failures An Empirical Study on the Relation between Dependency Neighborhoods and Failures Thomas Zimmermann, Nachiappan Nagappan, Kim Herzig, Rahul Premraj and Laurie Williams Microsoft Research, Redmond, WA 98052,

More information

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

More information