Behind Linus's Law: A Preliminary Analysis of Open Source Software Peer Review Practices in Mozilla and Python

Jing Wang and John M. Carroll
Center for Human-Computer Interaction and College of Information Sciences and Technology
The Pennsylvania State University, University Park, Pennsylvania, USA

ABSTRACT

Open source is an important model of collaborative knowledge work and virtual organizations. One of its work practices, peer review, is considered critical to its success, as Linus's law highlights. Understanding open source peer review, particularly effective review practices, will therefore improve our understanding of how to support collaborative work in new ways. To that end, we conducted case studies in two open source communities that are widely recognized as effective and successful, Mozilla and Python. In this paper, we present the preliminary results of our analysis of data from the bug tracking systems of those two organizations. We identify four common activities critical to open source software peer review: submission, identification, resolution, and evaluation. Differences between the communities indicate that factors such as reporter expertise, product type and structure, and organization size affect review activities. We also discuss features of open source software peer review that are distinct from traditional review, as well as reconsiderations of Linus's law.

KEYWORDS: Coordination, cooperation and collaboration; designing collaborative & virtual organizations

1. INTRODUCTION

"Given enough eyeballs, all bugs are shallow." - Linus's Law [1]

Open source software (OSS) development is an important model of computer supported collaboration. It is still evolving and becoming increasingly important. One characteristic of successful OSS development is extensive peer review, as highlighted by Linus's law. Peer review is evidently a collaborative practice, involving critical evaluation of the software products peers have created before they are integrated into the system. Despite its importance, OSS peer review has not been well understood. A few researchers have focused on coordination during bug fixing processes [2] or on expertise issues [3]. Some studies, mostly statistical reports, analyzed patch review activity, which is only part of peer review practice [4-6]. Others either described policy statements [7] or developed quantitative measurements [8-9].

To better understand OSS peer review practices, especially how they can be effective, we conducted qualitative and quantitative analyses of two established OSS communities, Mozilla and Python. We adapt activity theory to identify the shared critical activities of their peer review practices and to characterize each activity in terms of three elements: participants, objects, and the tools that mediate the activity [10]. We also contrast activity elements between the two communities. Activity theory offers more flexibility and generalizability than a rigid process perspective, such as a single project's bug lifecycle [11]. It accommodates sequential and compositional variation in review processes. It also draws attention to the interactions between activity elements, particularly between tools and their users.

This study contributes to Human-Computer Interaction (HCI) and Computer Supported Cooperative Work (CSCW) in several ways. First, it codifies the collaborative activities of OSS peer review practices, improving understanding of how specific sociotechnical affordances can enhance review effectiveness.
Second, it contrasts traditional software review with OSS peer review, urging reflection on Linus's law. Finally, it highlights the mediation of computer-supported tools in OSS peer review, providing implications for the design of applications for collaborative software review.

2. SOFTWARE PEER REVIEW

Peer review is a type of software review for discovering defects (or bugs) as early as possible in the development process, suggesting improvements, and even helping developers create better products [12]. According to the Standard for Software Reviews [13], software products that can be reviewed include requirements specifications, use cases, project plans, user interface designs and prototypes, and program documentation. In the software engineering literature, peer review varies in degree of formality and flexibility, ranging from inspections (the most systematic and rigorous) to walkthroughs (informal) [12]. Formal review usually proceeds through planning, preparation (individual review), examination (review meetings), rework/follow-up, and exit evaluation. It involves five roles: the inspection leader/moderator, the recorder, readers, the author, and inspectors [13-14]. High formality is often superior at capturing defects but is compromised by labor cost. Thus, achieving effectiveness has become a crucial issue. Research has focused on team size and the necessity of review meetings. Several studies recommend two to five members as the optimal team size [15-16]. Past investigations suggest tradeoffs between synchronous meetings and asynchronous reviews [12].

One group of studies related to OSS peer review analyzes patch review [4-6]. We consider patch review only part of OSS peer review activities because it involves evaluation of only one type of software product, patches. Our scope is concerned with evaluation of any software product a project is producing, which includes all bug reporting and fixing activities. Some other studies concentrate on participation in and the quality of bug reporting. Recent work by Ko and Chilana [3] analyzed the reports of Mozilla contributors who reported problems but were never assigned problems to fix, indicating that the primary value of open bug reporting is in recruiting talented reporters. Other studies suggest ways to improve bug reports [17-18]. Another theme of OSS research investigates bug fixing processes, coordination mechanisms in particular. For example, Crowston et al. [2] applied coordination theory to four OSS projects, concluding that task assignment is largely absent in OSS development. However, evaluation-related activities did not emerge in their study, and their perspective focused on personal interaction rather than on the role of tools and their interaction with people. Another body of OSS literature develops quantitative metrics to extract features of OSS projects, such as the work of Mockus et al. on Mozilla and Apache [9]. However, such quantitative studies are limited in depth and descriptive ability.

3. METHOD

We selected two established and well-recognized OSS projects for analysis, Mozilla and Python. Their success provides opportunities to access substantial peer review instances, particularly ones illustrating effective practices. Multiple cases also enable contrasts that highlight consistencies.

Mozilla, operating as an OSS project since 1998, creates and maintains a variety of software applications, such as the web browser Firefox and the bug tracking system Bugzilla. The Mozilla community is organized and supported by a non-profit organization, the Mozilla Foundation.
Tools supporting peer review mainly consist of Mercurial (version control system), Bugzilla (bug tracking system), Internet Relay Chat (IRC), and mailing lists (identical to discussion forums in Mozilla). Python, initiated as an OSS project in 1991, develops the computer programming language Python. Similar to Mozilla, the Python community is supported by a non-profit organization, the Python Software Foundation (PSF). Current tools for peer review include Subversion (version control system), the Roundup issue tracker (bug tracking system), mailing lists, Python Enhancement Proposals (PEPs), IRC, and Rietveld (code review tool).

We analyzed bug reports from Mozilla's and Python's bug tracking systems, which are crucial data sources on peer review practices in large OSS projects. Data sources like mailing lists are not included in this paper but will be employed as our study progresses. Bug tracking systems keep track of reported defects or deficiencies in source code, design, and documents. Since both systems record large and diverse sets of bug reports, we sampled only from Firefox and the Python language to focus our investigation. Those two products were selected not only because they are the core products of the two communities, but also because their users are likely to have different levels of technical expertise. Firefox is an end-user application, while the Python language is used by programmers. Such differences may increase the generalizability of their commonalities to other contexts. We collected bug reports created between two recent stable releases of Firefox and Python for our quantitative examination. For Mozilla, reports were filed between the

release of Firefox 3.5 final on June 30, 2009, and 3.6 final on January 21, 2010. For Python, issues were generated between the release of Python 3.0 final on December 3, 2008, and 3.1 final on June 27, 2009. This sampling strategy was intended to capture possible behavioral changes near releases, as suggested by Francalanci et al. [19]. Other data sources leveraged in our study are online documents, such as the projects' review instructions and policy statements [20-21]. We also identified core members of the two communities from their websites and wikis. For Mozilla, they include Firefox and Toolkit module owners and peers, super reviewers, security group members, and quality assurance (QA) members. For Python, committers, PSF officers, the board of directors, nominated members, and emeritus members are counted.

Our qualitative analysis was carried out in two phases. First, we explored our data set randomly, comparing the actual processes with the bug lifecycles/workflows documented by both communities. Commonalities were identified from this comparison. We further extracted critical activities and their elements from the commonalities, distinguishing each activity by its outcome. These served as our first-level codes. Second, we selected reports with diverse characteristics of activity elements, such as different statuses, resolutions, and roles of participants, to refine our understanding of each activity.

4. RESULTS: CRITICAL ACTIVITIES OF OSS PEER REVIEW

Four critical activities of peer review practices emerged from our qualitative analysis of the two cases. A complete peer review process may proceed through different sequences and combinations of the four activities. We describe each activity in the order of objects, participants, and tools. We consider only bug tracking systems as tools in our current study.

4.1. Submission

The peer review process starts with a community member experiencing a failure, discovering a product defect, or desiring a design enhancement. The goal of submitting reports about such issues is to evoke the community's awareness of a quality improvement request. Submissions report three types of issues: failures in using the software, content or design defects, and requests for new functionality.

"Cannot drag a page link to an empty bookmarks toolbar" (Bug ; Mozilla).

"[U]rllib: urlretrieve() does not close file objects on failure" (Issue 5536; Python).

"Feature request: User controlled width of containers" (Bug ; Mozilla).

Any community member can submit a report. Compared to other development activities, submission requires relatively little expertise and thereby invites wide participation. However, for the purposes of public awareness and sensemaking, participants have to describe the issue clearly and informatively. Therefore, a bug report usually entails a concise summary of the problem, the environment of the occurrence (e.g., operating system and software components), steps to reproduce the problem, and the version of the reviewed product. Large OSS projects often employ bug tracking systems to receive submissions. These systems present a semi-structured form for reporters to fill in, with both required and optional fields. Bugzilla and Roundup also assist reporters with notes explaining the meaning of some fields.
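To make the shape of such a semi-structured report concrete, the following minimal sketch models the fields named above as a simple Python record. The field set and the example values are illustrative assumptions, not Bugzilla's or Roundup's actual schema.

    # Minimal sketch of a semi-structured bug report; field names are
    # illustrative, not a tracker's actual schema.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class BugReport:
        summary: str                    # concise problem summary (required)
        product: str                    # e.g., "Firefox"
        component: str                  # affected software component
        version: str                    # version of the reviewed product
        environment: str                # operating system, hardware platform
        steps_to_reproduce: List[str]   # how to trigger the problem
        severity: Optional[str] = None  # optional field
        attachments: List[str] = field(default_factory=list)

    # Example populated from a report quoted above; the component and
    # environment values are hypothetical.
    report = BugReport(
        summary="Cannot drag a page link to an empty bookmarks toolbar",
        product="Firefox",
        component="Bookmarks",
        version="3.5",
        environment="Windows XP",
        steps_to_reproduce=["Clear the bookmarks toolbar",
                            "Drag a page link onto the empty toolbar"],
    )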
4.2. Identification

Identification is an activity in which members detect the cause of product defects or determine the specification of requested improvements. It also involves judgment and decision making on whether the bug should be fixed. Depending on the volume of submissions, an identification activity may consist of more than one step: a large number of bug reports need screening before any further analysis.

4.2.1. Preprocessing

Preprocessing examines whether a submission discloses an actual defect of the product or a suitable request for a new feature. Some projects use the term "triage" in a general sense to refer to this stage. Screening decisions are intended to be self-evident rather than controversial. Reports filtered at this stage include (1) failures that are not reproducible, (2) failures caused by third-party applications or reporters' mistakes, (3) issues that reflect intended design, (4) issues that have already been reported, (5) issues that persistently lack indispensable information, and (6) requests that contradict project objectives or raise system security risks. Requests that seem plausible but depart from a project's short-term plans are suspended. Active contributors usually mediate the screening process. They follow up with reporters to clarify ambiguity in the bug description, solicit missing information, or guide the reporter in re-testing the problem.

"Can you reproduce this problem after you change [...]?" (Bug ; Mozilla).

"Do you happen to have a [...] environment variable? If yes, I suggest to remove it" (Issue 5528; Python).

Bug tracking systems enable qualified users to set flags indicating whether an issue needs further identification. Both Bugzilla and Roundup use "works for me," "invalid," "duplicate," and "won't fix" to classify filtered issues. Modifying these flags requires members to request editing privileges, which often entails prior contributions. However, neither Mozilla nor Python is conservative in granting those privileges. Bug tracking systems also provide search features to assist preprocessing decisions. Words in report text are the most common index for retrieval. Bugzilla and Roundup allow search by bug ID, creation date, change date, status, resolution, creator, involved members, assignee, version, component, priority, and related bugs.

4.2.2. Identification

Bug reports not filtered out during preprocessing proceed to further identification, unless a defect is trivial and its cause is self-evident (e.g., a typo in a document). Further investigation may refine a bug's classification. For example, during analysis members may realize that the cause of a failure is identical to an existing bug's, the new report being just another symptom. Or they may find that the deficiency is beyond project goals. Relationships between bugs, if any exist, are also expected to be identified. Awareness of bug connections helps coordinate resolution efforts.

Participants in identification activities often encounter the challenge of reaching consensus, especially when the issue is a feature request. Under such circumstances, the group sometimes integrates multiple perspectives into a compromise. Regarding "[a]dding an option of non-zero exit status of setup.py when building of extensions has failed," some members favored the suggestion, while others argued that users might mistake the failure of compiling extensions for an incorrect implementation of Python. Finally, all agreed to "add an option where you tell exactly which modules aren't expected to be compiled successfully... If any modules besides those specified are not built, then setup.py returns [failure]" (Issue 6731; Python). Sometimes the group consults the core developers who have decision power over the specific issue (e.g., owners of the affected module and the benevolent dictator). At other times the group appeals to the rest of the community (e.g., posting to relevant mailing lists).

The reporter, report reviewers, and module owners are the three major roles in an identification activity. The reporter should be available to provide additional information when queried by report reviewers. Report reviewers need to be knowledgeable about the affected components. They offer insights for assessing the defect; they may also mediate the discussion by involving the affected module owners and relevant parties in decision making. Bug tracking systems provide each bug an individual space for discussion and display it on the same page along with the other reported information. They also keep logs of attribute changes to a bug report, for example, classification changes. The log also records who made each change and when. Additionally, the systems present links to related bugs.
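Screening for duplicates, one of the preprocessing filters described above, can be approximated by simple text similarity over report summaries. The sketch below uses Python's difflib as an illustration; it is not the matching that Bugzilla or Roundup actually performs.

    # Toy duplicate screening over report summaries; real trackers rely on
    # full-text search rather than pairwise similarity.
    from difflib import SequenceMatcher

    def possible_duplicates(new_summary, existing_summaries, threshold=0.75):
        """Return known summaries that closely resemble the new one."""
        return [s for s in existing_summaries
                if SequenceMatcher(None, new_summary.lower(),
                                   s.lower()).ratio() >= threshold]

    existing = ["Cannot drag a page link to an empty bookmarks toolbar",
                "urllib: urlretrieve() does not close file objects on failure"]
    print(possible_duplicates("Can't drag page link to empty bookmarks toolbar",
                              existing))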
4.3. Resolution

Resolution activities are primarily concerned with developing solutions to address defects and requests. Accurate and clear identification of defect causes or request specifications can facilitate the resolution process. OSS projects often implement resolutions as patches to their software products unless the repair is very trivial. Patches are sometimes created by the bug reporter at the time of submission; otherwise, other members may volunteer. Even though members can assign a specific developer to fix a bug, OSS communities encourage self-assignment: developers can choose what to work on and when to quit according to their own interests.

"I'm a bit worried about committers giving other committers priorities on how they should work; as a principle, volunteers are free to work on whatever they please, in whatever order they please... So unassign [developer 1]" (Issue 4614; Python).

Implementing a patch is well coordinated in Mozilla and Python, avoiding duplicate development effort. The coordination is often implicit: a member who understands the bug simply submits a patch shortly after the identification discussion. Sometimes it is explicitly stated, if members need some time to start the implementation. They often claim the implementation responsibility by changing the value of the "assignee" field in the bug report, or by commenting in the discussion space:

"Well, I think reformatting the about page could make some sense, so taking for now" (Bug ; Mozilla).

Core developers are intensively engaged in resolution activities, although they are not obliged to be. This is mostly due to the fact that contributing a patch usually requires

sufficient expertise and knowledge of the affected software components.

Both Bugzilla and Roundup support designating assignees and uploading patches as attachments to a bug report. Currently, both restrict the ability to edit assignees to a group of authorized members. Discussion spaces preserve all the suggested approaches and specified requirements generated during discussion.

4.4. Evaluation

Evaluation ensures that a resolution successfully repairs identified defects or satisfies specified requests. If an implementation fails evaluation, resolution activities and even identification activities will be reinitiated. Patch review is the most common type of evaluation. In Mozilla and Python, patch review examines not only whether a resolution performs as expected, but also its maintainability, security, usability, testing, and licensing.

"We've beginning to use [...] as code conventions. This'll matter a [...] down the review," a reviewer commented on a piece of patch code. The reviewer also assessed the behavior of the patch: "[t]his part is failing for me when running the testscript via command line." At the end, he complimented, "[o]therwise, this looks good. Short and sweet, the way I like it :)" (Bug ; Mozilla).

Reviewers tend to let a resolution pass review if only minor changes are needed. This lenient convention encourages novice contributors and expedites the cycle of product improvement.

"r=me with the last few changes talked about on irc" 1 (Bug ; Mozilla).

"The patch looks about right... so please remove the note on releasing md_state." (Issue 4614; Python).

At least one core member other than the resolution author has to participate in evaluation activities. Peripheral members can suggest modifications, but are not eligible to determine the evaluation result. When an implementation may affect multiple modules, or the core developer is not confident enough to make a decision, more core reviewers participate in the evaluation. For instance, Mozilla has the super review policy, which applies to "significant architectural refactoring, any change to any API or pseudo-API, and all changes that affect how code modules interact." Python requires that changes to a release candidate or a final release be reviewed by two core developers unless the patch author is a core developer [20]. Both projects indicate that the resolution author should be responsible for continuing to pursue a review when no timely response is received. Bugzilla and Roundup both support viewing patch content, notifying specific reviewers by adding their addresses, and displaying evaluation decisions. However, the two systems are designed in different ways to support those functions, which we elaborate below.

1 Note that Mozilla uses "r=" to mark patches that pass review.

5. CASE COMPARISON

In this section, we elaborate and compare the differing characteristics of each review activity in Mozilla and Python. Table 1 summarizes our quantitative findings.

Table 1. Quantitative Activity Characteristics

  Variable                                    Mozilla         Python
  Core members                                [...]           [...]
  Submissions                                 7322            1850
  Distinct submitters                         5419            829
  Filtered submissions
    (ratio among submissions)                 3418 (46.68%)   469 (25.35%)
  Open submissions
    (ratio among submissions)                 3406 (46.52%)   481 (26.00%)
  Implemented resolutions
    (ratio among submissions)                 585 (7.99%)     1114 (60.22%)
  Patches
    (ratio among implemented resolutions)     479 (81.88%)    708 (63.55%)
  Resolutions passed evaluation
    (ratio among implemented resolutions)     498 (85.13%)    832 (74.69%)
  Resolutions under evaluation
    (ratio among implemented resolutions)     61 (10.43%)     199 (17.86%)
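Note that the "(ratio among ...)" rows in Table 1 use two different denominators: submissions for the first three ratio rows, and implemented resolutions for the remaining ones. A small sketch making that computation explicit, using counts taken from the table:

    # Recomputing Table 1's ratio columns; note the two denominators.
    def ratio(part, whole):
        return f"{part} ({part / whole:.2%})"

    submissions = {"Mozilla": 7322, "Python": 1850}
    filtered    = {"Mozilla": 3418, "Python": 469}
    implemented = {"Mozilla": 585,  "Python": 1114}
    passed      = {"Mozilla": 498,  "Python": 832}

    for p in ("Mozilla", "Python"):
        print(p,
              ratio(filtered[p], submissions[p]),     # among submissions
              ratio(implemented[p], submissions[p]),  # among submissions
              ratio(passed[p], implemented[p]))       # among implemented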
5.1. Submission

In Mozilla, 35.6 Firefox bug reports on average were submitted per day over the 206-day period. Among 5419 distinct reporters, 74 were core members (1.37%; n=5419). These core members submitted 680 reports (9.29%; n=7322). Submission activities in Python occurred less intensively than in Mozilla, with a more significant portion contributed by core members. Approximately 9 issues were submitted daily over the 206-day period. Of the reporters, 218 were core members (26.30%; n=829), who filed 609 reports (32.92%; n=1850). The statistics reflect the fact

that the Python community is much smaller than Mozilla's, particularly in its set of peripheral members.

Many Firefox reporters are end-users who do not have a technical background, in accordance with Firefox's end-user-oriented base.

"I'm not a techie. Where would I find [the crash report id/stacktrace]?" (Bug ; Mozilla).

In contrast to Firefox reporters, Python users are mainly programmers, with more homogeneity. Their submissions might be less accessible to a general audience.

"[C]gi module cannot handle POST with multipart/form-data in 3.0." (Issue 4953; Python).

Given the large number of inexperienced reporters, Bugzilla specifies a report's components in great detail, including affected product, component, hardware platform, operating system, reproducibility, security, severity, build identifier, summary, details, steps to reproduce, actual results, and expected results. It provides explanations for all the items and examples for some of them. It also reminds reporters to check whether their installation is updated to the product's current build before submitting a report to the system. Additionally, it lists the top 100 frequently-reported bugs and recommends that reporters search within those bugs to examine whether their problem already exists in the repository. Roundup defines a report with fewer items and in a less structured form than Bugzilla. Type is an optional field, and priority, stage, assignee, and keywords are editable only by privileged users. It neither outlines the content of the issue description, such as steps to reproduce, nor alerts users to perform a pre-examination before submission.

5.2. Identification

A much larger percentage of submissions were filtered out during identification in Mozilla than in Python (Table 1). Duplicate bugs contributed significantly to this difference (Mozilla: 18.38%, n=7322; Python: 5.31%, n=1850). Preprocessing in Mozilla entailed much more effort than in Python. Regarding Firefox, 2680 bug reports had not completed preprocessing (36.60%; n=7322), that is, they remained unconfirmed as eligible for fixes. About 1223 bug reports (16.70%; n=7322) had not engaged any identification activity, with only one comment, from the reporter. Only 84 Python issue reports (4.54%; n=1850) had not received any identification by the time the data were collected.

Participants in Mozilla perform more refined roles in identification activities than those in Python. For Firefox bugs, QAs and Friends of the Tree (FotT) 2 often moderate the identification process, while module owners and peers contribute by locating the defect cause and discussing a request's suitability. Moderation usually does not require as much expertise as diagnosis. In Bug , a QA reacted to the reporter first: "I have no idea about what the reporter is talking. [Core developer 1] or [core developer 2] can you help out?"

2 FotT are participants who are acknowledged by the project team for their significant contributions, such as testing, submitting patches, helping users, and writing documentation.

Bugzilla assists in preprocessing and prioritizing bug reports with sophisticated classification indices and search features. It categorizes open bugs as "unconfirmed," "new," "assigned," and "reopened." Members who want to perform triage can easily find the unconfirmed bugs. Besides searching by index value, Bugzilla supports searching by bug changes; for instance, users can find bugs whose keywords changed between two dates. The Roundup issue tracker does not subdivide its open issues, but it may support prioritizing issues better than Bugzilla. It uses priority to signal the importance of an issue, a field similar to severity/importance in Bugzilla. A χ² test of the independence between priority and whether an issue is fixed suggests that they are dependent: a significant portion of issues with release blocker, critical, and high priority were fixed. However, χ² tests on severity show that higher severity did not lead to more fixes. On the contrary, normal and trivial bugs were over-fixed, while critical ones were under-confirmed. This might be attributed to the fact that anyone can edit the value of severity in Bugzilla, but not priority in Roundup.
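The χ² test of independence mentioned above can be reproduced with standard tooling once the priority-by-outcome counts are tabulated. The sketch below uses scipy.stats.chi2_contingency; the counts shown are hypothetical placeholders, not the study's data.

    # Chi-square test of independence between priority and fix outcome.
    # The contingency counts below are hypothetical, for illustration only.
    from scipy.stats import chi2_contingency

    observed = [
        [30, 10],  # release blocker/critical/high: (fixed, not fixed)
        [40, 60],  # normal
        [5,  15],  # low
    ]

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")
    # A small p-value rejects independence: priority and fix outcome
    # are dependent, as observed for Roundup's priority field.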
5.3. Resolution

As shown in Table 1, a much smaller portion of submissions to Mozilla had implemented resolutions than those to Python, counting both fixed bugs without a patch and all bugs with a patch. Among those implementations, patches appeared more dominant in Mozilla than in Python. We might have underestimated the number of patches in Python because some issues did not have the keyword "patch" configured but had patches attached. The fact that many Python issues (18.70%; n=1850) were document revisions, which do not need patches for fixes, might also result in fewer patches. Contributions of peripheral members in Mozilla seemed more significant than those in Python: 66 of 113 Firefox bug assignees were not core members, whereas only 4 of 63 Python issue assignees were peripheral members. This may be attributed to several

reasons. First, normal users are not permitted to set the assignee in Roundup. Thus, peripheral members' self-assignment did not happen as frequently as in Mozilla. Second, assignment is not always perceived as an approach to coordinating resolution implementation. At times, core developers set themselves as assignees when they are reviewing or will commit a resolution. Such an assignment conveys that the assignee is responsible for the review conclusion or the commit action. A peripheral member found functions vulnerable to denial of service and attached patches fixing the deficiency. A core developer assigned himself to the issue. Three days later, he finished the evaluation, "Thanks for the patches," and committed the patch (Issue 4859; Python).

Bugzilla supports resolution activities mainly in two ways. It automatically sends bug assignees a notification every time the bug report is modified. It also visualizes status changes of patches and depended-on bugs: if a patch is obsolete or a depended-on bug is solved, its label is crossed out. Roundup encourages participation in resolution activities with two shortcut searches, "Show Easy" and "Show Unassigned." These two options are positioned beside the regular search feature. Easy issues are retrieved by the keyword "easy," as we described in the prior section. Roundup also provides a field "stage" with a value named "needs patch," which is searchable when developers look for an issue to work on.

5.4. Evaluation

Mozilla exhibited better evaluation performance than Python, with higher rates of evaluation completion and passing. This may be due to two reasons. One is that Python has a comparatively smaller number of core members (i.e., evaluators) but a larger set of implemented resolutions to assess. The workload for each core member may exceed their capacity. The other is that Mozilla has a more effective mechanism for involving appropriate resolution reviewers than Python. Firefox clearly defines its modules as well as their owners and peers. Patch authors just have to set the flag in Bugzilla to "review?" with a reviewer's address to get the reviewer alerted. Even if the requested reviewer is not the appropriate candidate, s/he will forward the patch to the proper reviewer. In contrast, Python's source code is less structured. Therefore, the responsibilities of each core developer are not as evident as in Mozilla, and patch authors, especially peripheral members, do not often request a specific reviewer.

Mozilla developers apply unified symbols to declare evaluation results. Python, instead, has not set up such protocols, which may make accountability difficult to trace. In Bugzilla discussion spaces, "rs=" followed by a reviewer's name implies that the resolution is satisfactory. In its patch space, "review+" following a reviewer's name states acceptance, and "review-" shows rejection. Bugzilla is embedded with a relatively advanced patch viewer, which Roundup does not implement, recommending instead an external code review tool, Rietveld. Bugzilla's patch viewer allows reviewers to examine differences between patches, link to a particular section of a patch for discussion, connect to a web-based source code version control system and a cross-reference tool, and transform a patch into the unified format. In addition, Roundup highlights issues needing evaluation with one shortcut search, "Show Needs Review." This retrieval is accomplished by the keyword "needs review." The field "stage" contains the values "patch review" and "commit review," stating the evaluation progress. We found this field was not used frequently and was at times set incorrectly.
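Because Mozilla's evaluation conventions are plain-text markers ("r=", "rs=", "review+", "review-"), verdicts can be mined from comment text with a simple parser. The sketch below illustrates that idea; the regular expression is our assumption, not Bugzilla's actual grammar.

    # Toy extraction of Mozilla-style review verdicts from comment text;
    # covers only the markers quoted in this paper.
    import re

    VERDICT = re.compile(r"\b(?:rs?|sr)=(\w+)|review([+-])")

    def review_verdicts(comment):
        """Yield review verdicts found in a comment string."""
        for name, flag in VERDICT.findall(comment):
            if name:
                yield ("pass", name)   # e.g., "r=me", "rs=jsmith"
            else:
                yield ("accept" if flag == "+" else "reject", None)

    print(list(review_verdicts("r=me with the last few changes talked about on irc")))
    # -> [('pass', 'me')]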
6. DISCUSSION AND CONCLUSION

The four common peer review activities identified in Mozilla and Python (submission, identification, resolution, and evaluation) share similarities with the four steps of the traditional software review process: individual reviews, review meetings, rework, and follow-up. However, they also show distinctive features of OSS peer review practices. (1) Products under review are constantly changing, because open source code repositories are updated on a daily basis. Compared to the static context faced by a traditional inspection team, OSS peer review participants need to maintain awareness of product status, eliminating submissions about bugs that no longer exist in the current build. (2) Review is conducted by thousands of different groups, each focusing on a single issue. This scales down the product being reviewed, decreasing work complexity and the chance of errors. (3) Individual reviews are initiated by utilitarian incentives. Inspectors in traditional review examine products by reading line by line, using techniques like checklists. By contrast, OSS reviewers are bug reporters, who perform reviews whenever they experience unexpected behavior of a software application. This motivates wide participation, but may also increase other participants' difficulty in making sense of defects. (4) Rework is a shared responsibility. Traditional software review confines discussion of defect resolutions to review meetings, leaving the rework responsibility to the product author. The affected OSS module owners, the equivalent of product authors, are not obliged to implement resolutions. This may facilitate knowledge development within communities, but may also increase

the risk of no rework being done. (5) Review consensus is based on meritocracy. Moderators of traditional reviews usually make the decisions involved in review meetings and follow-up. OSS peer review values the opinions of product authors and of the reviewers who can offer resolutions.

These activity characteristics urge reflection on Linus's law. Compared to traditional software development, in which review teams often consist of fewer than ten developers, OSS development does benefit from hundreds or thousands of reporters. A large number of defects or improvement requests are reported. Each defect is analyzed by a different composition of participants, integrating diverse expertise and facilitating knowledge sharing. However, many eyeballs may also increase the cost of peer review, particularly when they belong to inexperienced users. For example, Mozilla had to spend great effort on screening. Even with thousands of eyeballs, a number of reported bugs still remained in identification activities with limited attention in both Mozilla and Python. This indicates that eyeballs cannot be enough without awareness support for appropriate experts.

This study is our first step in investigating OSS peer review practices. We will include more data sources to refine our current findings, for instance, mailing lists, commit logs, and PEPs. Those sources will also provide opportunities to look into other types of peer review, such as design review. Our findings can also be improved by interviews, which may refine our identification of core contributors and their responsibilities. Interviews can reveal implicit protocols of peer review practices that are not discoverable from our data, such as strategies of task assignment. In the longer term, we will focus on a special kind of collaboration, creative collaboration, considering that peer review can facilitate knowledge sharing and critical thinking. We hope our future work will generate more insights that inform the design of collaboration tools.

ACKNOWLEDGEMENTS

This work is supported by the US NSF. We thank our partners, the Mozilla and Python organizations, for sharing their practices.

REFERENCES

[1] E. S. Raymond, The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, O'Reilly, 1999.

[2] K. Crowston and B. Scozzi, "Coordination practices within FLOSS development teams: The bug fixing process," Computer Supported Activity Coordination, vol. 4, 2004.

[3] A. Ko and P. Chilana, "How power users help and hinder open bug reporting," CHI 2010, Atlanta, GA, 2010.

[4] J. Asundi and R. Jayant, "Patch review processes in open source software development communities: A comparative case study," HICSS'07, Hawaii, USA, 2007.

[5] P. Rigby, D. German, and M. Storey, "Open source software peer review practices: a case study of the Apache server," ICSE'08, Leipzig, Germany, 2008.

[6] M. Nurolahzade, S. Nasehi, S. Khandkar, and S. Rawal, "The role of patch review in software evolution: an analysis of the Mozilla Firefox," IWPSE-Evol'09, Amsterdam, Netherlands, 2009.

[7] J. P. Johnson, "Collaboration, peer review and open source software," Information Economics and Policy, vol. 18, 2006.

[8] F. Barcellini, F. Détienne, J. M. Burkhardt, and W. Sack, "A study of online discussions in an Open-Source Software Community," C&T 2005, 2005.

[9] A. Mockus, R. T. Fielding, and J. D. Herbsleb, "Two case studies of open source software development: Apache and Mozilla," ACM Transactions on Software Engineering and Methodology, vol. 11, 2002.

[10] B. Nardi, Context and Consciousness: Activity Theory and Human-Computer Interaction, The MIT Press, 1996.
[11] J. Aranda and G. Venolia, "The secret life of bugs: Going past the errors and omissions in software repositories," ICSE'09, Vancouver, Canada, 2009.

[12] K. Wiegers, Peer Reviews in Software: A Practical Guide, Addison-Wesley, 2002.

[13] IEEE Standard for Software Reviews, IEEE Std 1028-1997.

[14] M. Fagan, "Design and code inspections to reduce errors in program development," IBM Systems Journal, vol. 15, 1976.

[15] T. Gilb, Principles of Software Engineering Management, Addison-Wesley, 1988.

[16] A. Porter, H. Siy, C. Toman, and L. Votta, "An experiment to assess the cost-benefits of code inspections in large scale software development," ACM SIGSOFT Software Engineering Notes, vol. 20, 1995.

[17] N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann, "What makes a good bug report?," FSE 2008, Atlanta, GA, 2008.

[18] S. Breu, R. Premraj, J. Sillito, and T. Zimmermann, "Information needs in bug reports: Improving cooperation between developers and users," CSCW 2010, Savannah, GA, 2010.

[19] C. Francalanci and F. Merlo, "Empirical analysis of the bug fixing process in open source projects," Open Source Development, Communities and Quality, 2008.

[20] Python Software Foundation, Issue Work Flow, July 25. Available:

[21] Mozilla Foundation, Code Review FAQ, Aug 1. Available: