NAIST-IS-MT1451049

Master's Thesis

git-sprite: Supporting Tool for Pull-Based Software Development Model

Yusuke Saito

February 4, 2016

Department of Information Science
Graduate School of Information Science
Nara Institute of Science and Technology
A Master's thesis submitted to Graduate School of Information Science, Nara Institute of Science and Technology in partial fulfillment of the requirements for the degree of MASTER of ENGINEERING

Yusuke Saito

Thesis Committee:
Professor Hajimu Iida (Supervisor)
Professor Kenichi Matsumoto (Co-supervisor)
Associate Professor Kohei Ichikawa (Co-supervisor)
Associate Professor Hiroshi Igaki (Osaka Institute of Technology)
Associate Professor Norihiro Yoshida (Nagoya University)
Assistant Professor Yasuhiro Watashiba (Co-supervisor)
git-sprite: Supporting Tool for Pull-Based Software Development Model

Yusuke Saito

Abstract

Modern OSS projects have adopted distributed version control systems (DVCS), especially Git, to manage the versions of their source code, and GitHub to host their Git repositories. GitHub provides a characteristic feature, the pull request, and many projects adopt the pull-based development model by using it. This development model offers an opportunity to review the source code before merging it into the mainstream. To lead a project to success, every developer should strictly follow the flow of pull-based development. To follow the flow, developers need to operate a DVCS correctly. However, operating a DVCS is complicated and requires further knowledge of it. This thesis presents a tool, git-sprite, which supports beginner developers in pull-based development using Git and GitHub. First, to determine what makes pull-based development feel difficult to developers, a large-scale survey of 1,552 developers on GitHub was conducted. Our tool was developed based on the findings. The proposed tool helps the developer follow the flow easily by simplifying the operation of Git. To evaluate git-sprite, a comparative experiment was performed among the command line interface (CLI), SourceTree (a well-known GUI tool), and git-sprite. The result shows that git-sprite helps beginner developers in both efficiency and accuracy in pull-based development.

Master's Thesis, Department of Information Science, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-MT1451049, February 4, 2016.
Keywords: Software Engineering, Development Support, Distributed Version Control System, Pull Request, Pull-Based Development
Acknowledgements

I wish to thank the following people for their guidance and support. Without these people, this work would not have been successful.

First and foremost, I would like to show my greatest appreciation to Professor Hajimu Iida for supporting my research and giving me constructive comments. Without his guidance and support, I would not have completed the master's course.

Second, I would also like to thank Associate Professor Kohei Ichikawa for giving me insightful comments and suggestions. His suggestions encouraged me to work better on my research.

I received generous support from Associate Professor Hiroshi Igaki. All the discussion and advice he gave improved the quality of my research. His preparation and support for the evaluation also helped me a lot.

I would like to offer my special thanks to Associate Professor Norihiro Yoshida. He has given me many suggestions that led the research in a better direction.

To Assistant Professor Yasuhiro Watashiba, I would like to show my greatest appreciation. His comments and suggestions encouraged my work.

I would like to thank all respondents for answering the survey. Their answers gave me further knowledge of what real developers feel and think. Also, I want to thank the members of the Software Development and Analysis Lab in Osaka Institute of Technology who were the participants of the evaluation. They participated in the experiment for about four hours and solved problems which they had never experienced before. The evaluation would not have succeeded without their participation.

I would like to thank all members of the Software Design and Analysis Laboratory. Every day at NAIST was delightful and encouraged me to do better. In particular, I would like to express my gratitude to Kenji Fujiwara for supporting and advising on my work. Without him, my research would not have been possible.

Finally, and most importantly, I want to thank my family. Without them, I would not be who I am now. They have supported me in everything and gave me the opportunity to study and do research at NAIST. I would like to thank both my parents and my sister.
Contents

Abstract
Acknowledgements
1 Introduction
2 Background and Related Work
  2.1. Version Control System
  2.2. Pull-based Development
3 Investigation on Developers in Pull-based Development
  3.1. Research Questions
    3.1.1 RQ1: How do developers feel about pull-based development?
    3.1.2 RQ2: How unique practices are used in pull-based development?
    3.1.3 RQ3: How is the command usage different for Git users who use unique practice?
  3.2. Survey Target
    3.2.1 Pull-based Development Project and Its Developers
    3.2.2 Usage of Unique Practices
    3.2.3 Respondents
  3.3. Results
    3.3.1 RQ1: How do developers feel about pull-based development?
    3.3.2 RQ2: How unique practices are used in pull-based development?
    3.3.3 RQ3: How is the command usage different for Git users who use unique practice?
  3.4. Chapter Summary
4 Supporting Tool for Pull-based Development
  4.1. Requirements of Supporting Tool for Pull-based Development
  4.2. git-sprite
    4.2.1 Overview
    4.2.2 Provided Features
5 Evaluation
  5.1. Overview
  5.2. Participants
  5.3. Results
6 Discussion
7 Conclusion
List of Publications
References
Appendix
  A. Survey Used for the Investigation
  B. Detail of Evaluation
    B.1 Contents of the Application
    B.2 Problems and its Instruction
List of Figures

2.1 An example of GitHub pull request
2.2 A process of the pull-based development
3.1 Respondents' profiles
3.2 Developers' feeling with Git
3.3 Developers' feeling with GitHub and pull request
3.4 Number of developers who use unique practices
3.5 Specific Git commands used by developers according to the use of unique practice
3.6 Specific Git commands which feel difficult by developers according to the use of unique practice
4.1 Views of git-sprite
7.1 View of Doubutsu Shogi Board
List of Tables

3.1 Phrases to identify the pull request using unique practices
3.2 Percentage of unique practices usage
4.1 Comparison of unique practice operation between git-sprite and general Git commands
5.1 Participants' profiles
5.2 Questions and Tools
5.3 Average results for each tool
5.4 Results of CLI
5.5 Results of SourceTree
5.6 Results of git-sprite
Chapter 1 Introduction

As distributed version control systems (DVCS) have become widespread [1], more projects, both proprietary and especially open source, adopt DVCS for managing their source code history. The emergence of source code hosting services such as GitHub (https://github.com) and Bitbucket (https://bitbucket.org) has contributed to the wide use of DVCS. As of January 2016, more than 31 million projects have been created on GitHub (https://github.com/features), and more than 3 million developers have registered with Bitbucket. These hosting services provide features that support software development in various ways, such as pull requests and issues. In particular, the pull request, one of the features for creating contributions [2, 3], supports the pull-based software development model [4, 5].

Pull-based development has been adopted by many projects. In pull-based development, developers always create a branch and a pull request to contribute, instead of pushing changes directly to the mainstream [4]. Pull-based development ensures an opportunity for source code review and makes discussion between contributors and reviewers easy. Gousios et al. [4] examined all GitHub projects and found that 15% of the projects had adopted pull-based development as of 2013. Also, the usage of pull-based development has been increasing rapidly over the last few years. However, in pull-based development, developers need to understand the flow of development and correctly use the DVCS commands, which have been described as complicated [6]. This complexity hinders beginner developers from joining pull-based development projects. Therefore, this study proposes a supporting tool for beginner developers to relieve the difficulty of pull-based development.

To support developers in pull-based development, an investigation of actual developers is needed to understand their habits and difficulties, and such an investigation has not been performed yet. Thus, as a preliminary study, we investigated how developers feel about pull-based development and what difficulties they face. A large-scale survey of 1,552 developers on GitHub was conducted for this investigation. The survey questions were created based on prior research and our experience of pull-based development. To understand developers' habits in pull-based development, this study tries to answer the following questions.

RQ1: How do developers feel about pull-based development?
RQ2: How unique practices are used in pull-based development?
RQ3: How is the command usage different for Git users who use unique practice?

This study proposes git-sprite, a supporting tool for pull-based development. The requirements of the supporting tool were defined from the findings of the investigation. Our tool provides features that help developers work easily in pull-based development. The proposed tool also supports the uncommon DVCS operations that are used in unique practices to create an organized history. The tool was evaluated by a comparative experiment among the command line interface (CLI), SourceTree, and git-sprite. The evaluation was conducted with 6 participants who had little experience in development using DVCS. To confirm the efficiency of git-sprite, the evaluation focuses on the productivity and accuracy of development and on the usability of the tool. The evaluation result showed that git-sprite helped developers in both efficiency and accuracy in pull-based development.
The remainder of this thesis is structured as follows. Chapter 2 explains version control systems and pull-based development. Chapter 3 describes the investigation on how GitHub developers feel about pull-based development and its results. Chapter 4 explains the proposed tool and the features it provides to support beginners. Chapter 5 shows the evaluation and its results. Chapter 6 discusses the effectiveness of the proposed tool. Finally, Chapter 7 concludes this thesis.
Chapter 2 Background and Related Work

2.1. Version Control System

A Version Control System (VCS) is a system that helps developers maintain the source code history and collaborate with other developers on the same project. There are two varieties of VCS: centralized VCS (CVCS) and distributed VCS (DVCS). CVS and Subversion are typical CVCSs, and Git and Mercurial are typical DVCSs. The major difference between the two is the way the development history is maintained. A CVCS maintains the history on the server, whereas a DVCS maintains it on each local machine. Since the DVCS concept was established later than CVCS, DVCS removed limitations of CVCS by enabling lightweight branching and local VCS operations [7]. Against this historical background, software development has transitioned from CVCS to DVCS over the last ten years [1].

DVCS has been adopted by many projects and is used by many developers. However, previous research indicates that DVCS is difficult to use. Muslu et al. [1] investigated the reasons, barriers, and outcomes of transitioning from CVCS to DVCS by surveying and interviewing developers in a large company. According to their findings, the steep learning curve of DVCS was one of the barriers to the transition. Rosso et al. [6] also point out the difficulties of DVCS. They analyzed the conceptual design of Git, one of the DVCSs. Their analysis shows that Git commands and options are inconsistent, and that some features such as the stash, the working directory, and the index make Git difficult for developers.

Source code hosting services, notably GitHub and Bitbucket, have contributed to the wide use of DVCS. These services provide the server for developers, which makes it easy to use DVCS for maintaining their projects. These services can also make a software project open to the public. Public projects are downloaded and developed by developers around the world. This feature has drawn many developers into the community of open source software development. For efficient team development on source code hosting services, several software development models (https://www.atlassian.com/git/tutorials/comparing-workflows/#!pull-request) have been proposed. One of these models is pull-based development. Section 2.2 explains pull-based development in more detail.

2.2. Pull-based Development

Pull-based software development is one of the modern methodologies for software development using DVCS and a source code hosting service [4, 5]. Pull-based development relies on pull requests, a feature provided by source code hosting services. An example of a pull request is shown in Figure 2.1. As of 2013, about 15% of the repositories on GitHub were using pull-based development, and usage is increasing in absolute numbers [4].

The process of pull-based development using pull requests is described in Figure 2.2. Firstly, a contributor to the project creates a branch in the local repository for a specific purpose, such as implementing a new feature or fixing a bug. This branch is called a topic branch. Secondly, after the contributor makes changes and commits them to the topic branch, they create a pull request asking for the topic branch to be merged into a branch in the remote repository. In pull-based development, the destination branch for merging is called the release branch. Contributors can create pull requests not only when the implementation is finished but also during the implementation. A pull request created during the implementation is called a work in progress (WIP) pull request (https://github.com/blog/1943-how-to-write-the-perfect-pull-request) and is used to encourage a conversation with core team members and other contributors about the implementation. A WIP pull request carries a WIP tag (generally [WIP]) in front of the pull request title. Then, a member of the core team reviews the source code and comments if there are any problems or defects in the commits contained in the pull request [8]. After the review, the contributor revises the source code according to the review comments and pushes again to the same topic branch. This cycle of code review and revision is repeated until the pull request is accepted or rejected. Finally, when the pull request is accepted, the topic branch is merged into the release branch by a core team member. Pull-based development thus guarantees an opportunity for code review by core team members and other contributors before the commits are merged.

In recent work, several researchers have quantitatively investigated the characteristics of pull requests on GitHub and the factors in their acceptance [4, 9-11]. Gousios et al. [12] also conducted a more in-depth qualitative study by surveying and interviewing integrators on GitHub. They indicate that accepted pull requests contain changes to hot areas of the project, consist of high-quality code, and follow the project policy and coding style. Their studies suggest the necessity of supporting developers in integrating through pull requests. However, supporting tools for development in the pull-based model are yet to be developed.
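The contributor side of the process described in Section 2.2 can be sketched as a sequence of ordinary Git commands. The following is only an illustrative sketch: the helper name contributor_commands, the release branch name master, and the remote name origin are assumptions made for this example and are not taken from the thesis. The pull request itself is opened on the hosting service after the push.

```python
import shlex
from typing import List


def contributor_commands(topic: str, message: str,
                         release: str = "master",
                         remote: str = "origin") -> List[List[str]]:
    """Return the Git commands for one contributor iteration.

    Hypothetical helper: branch and remote names are assumptions.
    """
    return [
        ["git", "checkout", release],            # start from the release branch
        ["git", "pull", remote, release],        # update it from the remote repository
        ["git", "checkout", "-b", topic],        # create and switch to the topic branch
        ["git", "commit", "-a", "-m", message],  # commit changes to tracked files
        ["git", "push", "-u", remote, topic],    # publish the topic branch
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for argv in contributor_commands("fix-typo", "Fix typo in README"):
        print(shlex.join(argv))
```

During the review cycle, only the last two commands are repeated: each revision is committed and pushed to the same topic branch, which updates the existing pull request.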
Figure 2.1: An example of GitHub pull request
Figure 2.2: A process of the pull-based development
Chapter 3 Investigation on Developers in Pull-based Development

As the popularity of pull-based development has increased, support for developers working in this model is in demand. To support those developers, it is important to know the characteristics and difficulties of using DVCS and source code hosting services. However, an investigation of developers in pull-based development has not been performed yet. In this chapter, a large-scale survey of 1,552 developers is presented. The survey targets projects developed in the pull-based model using Git (DVCS) and GitHub (source code hosting service), a combination often used for pull-based development.

3.1. Research Questions

The main focus of this investigation is to clarify the habits of developers and the difficulties of development in the pull-based model. This study tries to answer the following questions.

RQ1: How do developers feel about pull-based development?
RQ2: How unique practices are used in pull-based development?
RQ3: How is the command usage different for Git users who use unique practice?

This section explains each question in detail.

3.1.1 RQ1: How do developers feel about pull-based development?

During pull-based development, developers use Git and GitHub. To clarify the difficulties in pull-based development, it is necessary to know which tool, Git or GitHub, makes it difficult in practice. To make the analysis easier, RQ1 was refined into the following sub-questions.

RQ1.1: How do developers feel about using Git?
RQ1.2: How do developers feel about using GitHub and pull request?

3.1.2 RQ2: How unique practices are used in pull-based development?

In the pull-based development model, some unique practices are performed. This section briefly describes the unique practices; they are defined in more detail in Section 3.2.2. One of the practices is commit refactoring using the git rebase -i command [13, 14]. Commit refactoring, such as reordering, compressing, and dividing commits, is performed to tidy the commits contained in a pull request. Refactored commits increase the readability and acceptability of the pull request [12] when the integrator of the project reviews the source code before merging the pull request. Another practice is the Work-in-Progress (WIP) pull request, a pull request created in the middle of implementation. A WIP pull request is used to get a review of the current source code and to have a conversation with core team members on how to implement or solve the assigned issue. These practices are not indispensable, but they affect pull request acceptance. This question clarifies whether developers use these unique practices or not.
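To make the compressing form of commit refactoring from Section 3.1.2 more concrete, the sketch below generates the kind of instruction list ("todo" list) that git rebase -i presents to the developer. The todo commands pick, squash, and fixup are real commands of Git's interactive rebase; the helper name squash_todo and the commit hashes in the usage example are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

# (abbreviated hash, subject line), ordered from oldest to newest
Commit = Tuple[str, str]


def squash_todo(commits: List[Commit], keep_messages: bool = True) -> str:
    """Build a rebase todo list that folds all commits into the oldest one.

    With keep_messages=True the "squash" command is used, which lets the
    developer edit the combined commit message; otherwise "fixup" is used,
    which discards the messages of the folded commits.
    """
    if not commits:
        return ""
    fold = "squash" if keep_messages else "fixup"
    first_sha, first_subject = commits[0]
    lines = [f"pick {first_sha} {first_subject}"]
    lines += [f"{fold} {sha} {subject}" for sha, subject in commits[1:]]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical commits on a topic branch after a review round.
    print(squash_todo([("a1b2c3d", "Add parser"),
                       ("e4f5a6b", "Fix typo found in review")]), end="")
```

Reordering commits corresponds to changing the order of the lines in the same list, which is one reason the operation is easy to get wrong when edited by hand.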
3.1.3 RQ3: How is the command usage different for Git users who use unique practice?

Performing unique practices requires combining multiple uncommon Git commands. Because of the inconsistency of Git commands and options [6], using uncommon commands is troublesome for developers with little experience in pull-based development. To understand those uncommon commands further, it is beneficial to clarify which commands are used and which ones developers struggle with. The last question examines the usage of Git commands for performing unique practices.

3.2. Survey Target

This study conducted a large-scale survey to investigate a large number of developers, ranging from beginners to experts in pull-based development. This section describes the data collection process, the characteristics of the pull requests created in each project, and the survey respondents.

3.2.1 Pull-based Development Project and Its Developers

To select pull-based development projects and their developers, the GHTorrent dataset was used. Gousios et al. provide all event data of public projects as the GHTorrent dataset. Since most of the repositories on GitHub are inactive, and not all repositories use pull requests [4], it is necessary to select an effective sample [8]. To ensure that the selected repositories are active and adopt pull-based development, we selected the 333 repositories that received at least 5 pull requests in each month of 2014. From the 333 repositories, 13,152 developers were extracted under the following conditions.

- The developer created at least one pull request in one of the selected repositories in 2014.
- The developer's email address is registered with GitHub (invalid addresses, such as a Twitter account or an address in the wrong format, were excluded).

3.2.2 Usage of Unique Practices

To understand the unique practices that occur in pull-based development better, their usage was investigated and classified. For the investigation, pull requests from several randomly chosen projects in Section 3.2.1 were manually checked. From this manual check, several phrases that identify whether a pull request used a unique practice were extracted. The extracted phrases are shown in Table 3.1. Then, for the top 10 projects that created the most pull requests in 2014, the pull requests containing each phrase were counted. The results are shown in Table 3.2. This study classified the unique practices into four practices. These results were used to create the survey questions. The classified practices are as follows.

WIP: Work-in-Progress pull request. Create a pull request with an incomplete implementation to get an early review of the source code.

Update Branch: Update the topic branch to take in the latest commit of the release branch. This practice becomes necessary when the latest commit influences the source code on the topic branch. After updating the branch, further implementation and testing are needed.

Compress Commit: Compress multiple commits into one. After the source code is fixed according to the review comments, commits with little or no information are created (e.g., commits that only modify one line, or that totally revise the past implementation). These commits should be compressed into one commit so that reviewers can understand the change more easily.

Split Branch: Split the branch and create a different pull request. If a pull request contains commits with multiple implementations that are not relevant to each other, these commits should be moved to a different branch, and another pull request should be created. Splitting the pull request not only simplifies the review but also makes it easier to eliminate commits that have defects or are no longer needed.

Table 3.1: Phrases to identify the pull request using unique practices

Unique Practice     Phrase
WIP                 WIP
Update Branch       rebase
Compress Commit     squash
Split Branch        split
Table 3.2: Percentage of unique practices usage

Repository                    WIP %   Update Branch %   Compress Commit %   Split Branch %
mozilla-b2g/gaia              0.80    7.99              0.97                1.11
mozilla/rust                  0.58    11.25             3.92                2.70
phinze/homebrew-cask          0.31    0.88              4.99                0.20
Homebrew/homebrew             0.20    4.01              4.47                0.97
dotcloud/docker               0.78    15.38             4.55                2.79
CleverRaven/Cataclysm-DDA     3.34    1.02              0.19                1.71
saltstack/salt                0.15    2.32              0.21                0.91
CocoaPods/Specs               0.02    0.19              0.07                0.10
cms-sw/cmssw                  0.03    7.40              0.25                1.20
elasticsearch/elasticsearch   0.62    5.16              2.72                1.97
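The two mechanical steps of Section 3.2, selecting active repositories and counting pull requests by phrase, can be sketched as follows. This is a minimal sketch under stated assumptions: the thesis does not specify which pull request fields were searched or how matching was done, so the case-insensitive match at the start of a word is an assumption, and the function names and input shapes (twelve monthly counts, one text per pull request) are hypothetical.

```python
import re
from typing import Dict, List

# Phrases from Table 3.1, keyed by the unique practice they identify.
PHRASES: Dict[str, str] = {
    "WIP": "WIP",
    "Update Branch": "rebase",
    "Compress Commit": "squash",
    "Split Branch": "split",
}

# Assumption: case-insensitive match at the start of a word.
PATTERNS = {practice: re.compile(r"\b" + re.escape(phrase), re.IGNORECASE)
            for practice, phrase in PHRASES.items()}


def is_active(monthly_pull_requests: List[int], minimum: int = 5) -> bool:
    """Selection rule of Section 3.2.1: at least `minimum` pull requests
    in each of the twelve months of the year."""
    return (len(monthly_pull_requests) == 12
            and all(count >= minimum for count in monthly_pull_requests))


def practice_percentages(pull_request_texts: List[str]) -> Dict[str, float]:
    """Percentage of pull requests whose text contains each phrase."""
    total = len(pull_request_texts)
    result: Dict[str, float] = {}
    for practice, pattern in PATTERNS.items():
        hits = sum(1 for text in pull_request_texts if pattern.search(text))
        result[practice] = 100.0 * hits / total if total else 0.0
    return result
```

A phrase search of this kind is crude: it counts any pull request that merely mentions a word such as "split", so the percentages in Table 3.2 are best read as rough indicators rather than exact usage rates.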
3.2.3 Respondents

A survey request was sent by email to all developers extracted in Section 3.2.1. Figure 3.1 shows an overview of the 1,552 respondents (11.8% response rate). Most of the respondents work in industry (74%) and have more than six years of development experience (82%). For VCS experience, the respondents were spread almost evenly among 1-5 years, 6-10 years, and more than ten years. On the other hand, the majority had 3-5 years of experience with Git (58%) and GitHub (61%).

Figure 3.1: Respondents' profiles

3.3. Results

In this section, the survey results are presented for each research question. The specific survey questions are shown in the appendix. For open-ended questions, answers are presented in slanted type.
3.3.1 RQ1: How do developers feel about pull-based development?

To understand how developers feel about pull-based development, developers were asked what they think of using Git, GitHub, and pull requests. RQ1 was refined into two sub-questions: how developers feel about Git, and how they feel about GitHub. The overall results for RQ1.1 and RQ1.2 are shown in Figure 3.2 and Figure 3.3.

RQ1.1: How do developers feel about using Git?

Developers were asked with a Likert scale question whether they find Git difficult. The results show that about 20% of developers find Git difficult regardless of their development experience. For those who found Git difficult, the factors behind the difficulty were investigated. To find these factors, a multiple choice question based on the findings of Rosso et al. [6] was asked. The respondents also had the opportunity to report difficulties in an open-ended question. As Figure 3.2 shows, most developers find the inconsistency of Git commands and options troublesome. Also, developers with little experience find undoing and redoing Git commands difficult more often than developers with more experience. Other than the provided choices, "Merging and resolving the conflict" and "Steep learning curve" were also reported. These findings indicate that the share of developers who find Git difficult is not related to their development experience, but that the factors behind the difficulty change with experience.

RQ1.2: How do developers feel about using GitHub and pull request?

The same question as in RQ1.1 was asked, but this time about GitHub and pull requests. Unlike the result of RQ1.1, almost none of the developers found GitHub and pull requests troublesome. In the open-ended question, "Not all Git function are available in GitHub" and "Want to create a pull request from command" were reported for GitHub, and "Rebasing and resolving the conflict" for pull requests. These results show that GitHub and pull requests are user-friendly enough for developers. To improve usability further, supporting GitHub on the command line interface or unifying Git and GitHub is conceivable.
Figure 3.2: Developers' feeling with Git
Figure 3.3: Developers' feeling with GitHub and pull request
3.3.2 RQ2: How unique practices are used in pull-based development?

The second research question examines how unique practices are used during pull-based development. Figure 3.4 shows how the use of WIP pull requests and commit refactoring differs between developers who find Git difficult and those who do not. The results show that both practices are used by 50% of the developers. The two practices show a similar pattern: developers who find Git difficult and have 1-5 years of development experience tend not to use them. In the other experience groups, developers who find Git difficult perform commit refactoring as often as those who do not, but use WIP pull requests slightly less often.

These results suggest that beginners should be encouraged to use WIP pull requests and commit refactoring through support for these unique practices. To support them, it is necessary to understand the usage and difficulty of Git in unique practices. The next research question investigates further which Git commands developers use and what causes the difficulties in unique practices.
Figure 3.4: Number of developers who use unique practices
3.3.3 RQ3: How is the command usage different for Git users who use unique practice?

To clarify which Git commands are used for unique practices in pull-based development and which ones developers struggle with, a multiple choice question and an open-ended question were asked. The choices for the multiple choice question were selected from the commands that are needed for those unique practices. The frequently used Git commands and the commands that developers struggle with are shown in Figure 3.5 and Figure 3.6.

As expected, developers who use unique practices tend to use the selected commands more than those who do not. The difference was especially large for git rebase -i, the command that rewrites the commit history. Figure 3.6 shows the number of developers who find each command troublesome. Except for the git add -p command, developers who do not use unique practices find these commands difficult, and the number of such developers decreases as development experience increases. Supporting these commands so that unique practices become easier to perform is essential to help developers work efficiently in pull-based development.
Figure 3.5: Specific Git commands used by developers according to the use of unique practice
Figure 3.6: Specific Git commands which feel difficult by developers according to the use of unique practice
3.4. Chapter Summary

In this chapter, the study examined how developers feel about pull-based development by conducting a large-scale survey of 1,552 developers on GitHub. Our main findings for each research question are as follows.

RQ1: How do developers feel about pull-based development?
Developers find GitHub and pull requests more user-friendly than Git. Further examination shows that the inconsistency of Git commands and options and the undoing and redoing of Git commands are the factors that make Git difficult for developers.

RQ2: How unique practices are used in pull-based development?
Unique practices, such as WIP pull requests and commit refactoring, were used by half of the respondents. Beginners who find Git difficult tend to use neither WIP pull requests nor commit refactoring. Support that helps these beginners use the practices is needed.

RQ3: How is the command usage different for Git users who use unique practice?
The commands selected for the multiple choice question, especially git rebase -i, were used by the developers who use unique practices. Most of the commands, except git add -p, are troublesome for developers who do not use unique practices. Supporting these commands may lead more developers to perform unique practices.

From these findings, the requirements of the supporting tool for pull-based development can be defined.
Chapter 4 Supporting Tool for Pull-based Development

Since operating Git in pull-based development is difficult for beginner developers, this thesis proposes a tool for pull-based development that mainly focuses on supporting beginner developers. This chapter explains the requirements of the supporting tool and presents the proposed tool, git-sprite.

4.1. Requirements of Supporting Tool for Pull-based Development

From the findings in Chapter 3, the requirements of a supporting tool for pull-based development were defined. The requirements are as follows.

[Requirement 1.] Unifying Git and GitHub: During pull-based development, developers check pull request comments on the hosting service to see whether any review comments have been added. Also, when switching branches to work on a different task, developers check pull request comments to see whether anything in the source code needs to be modified. Checking pull request comments in this way interrupts the development itself. By integrating Git and GitHub, pull requests can be created and their comments checked in a single tool.

[Requirement 2.] Simplify Git operations to follow the flow of pull-based development easily: Since Git has many commands and options, and these are inconsistent, operating Git correctly is troublesome for beginner developers. Simplifying each Git command helps developers follow the flow of pull-based development easily. For example, general Git commands allow a branch to be created from any commit, even from another branch. However, in pull-based development the topic branch must be created from the release branch. If branch creation is simplified so that a branch is always created from the release branch, developers can follow the flow without worrying about operating Git correctly.

[Requirement 3.] Support unique practices: Using unique practices is important in pull-based development. However, performing unique practices requires developers to use uncommon Git commands. Not only are these uncommon Git commands difficult in themselves, but developers must also combine them. Supporting these unique practices helps beginners perform them without trouble.

[Requirement 4.] Easy undo/redo of Git operations: One of the difficulties faced by developers using Git is undoing and redoing Git operations. To do so, the developer checks the history of operations that Git records and rewinds to the desired point. This recorded history is difficult for a beginner to understand. Moreover, combining multiple commands, as in unique practices, makes the history much harder to understand. It is necessary to support undoing and redoing Git operations for beginner developers.
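Requirements 2 and 4 can be illustrated with plain Git commands. The sketch below is not git-sprite's implementation, which this section does not describe at that level; it only shows one way such simplifications could be expressed. The helper names, the release branch name master, and the use of the reflog (the record Git keeps of where HEAD has pointed) for undo are assumptions made for this example. The commands git checkout -b <name> <start-point> and git reset --hard are standard Git.

```python
import re
from typing import List, Optional

RELEASE_BRANCH = "master"  # assumption: name of the release branch

# Matches the start of a `git reflog` line such as
# "5d6e7f8 HEAD@{1}: commit: Add parser" (output without decorations).
REFLOG_ENTRY = re.compile(r"^[0-9a-f]+ (HEAD@\{(\d+)\}):")


def create_topic_branch(name: str, release: str = RELEASE_BRANCH) -> List[str]:
    """Requirement 2: always start the topic branch at the release branch,
    regardless of which branch is currently checked out."""
    if name == release:
        raise ValueError("a topic branch must not reuse the release branch name")
    return ["git", "checkout", "-b", name, release]


def undo_command(reflog_output: str, steps: int = 1) -> Optional[List[str]]:
    """Requirement 4: build a command that rewinds to the state recorded
    `steps` reflog entries ago, or None if no such entry exists.

    Note: `git reset --hard` discards uncommitted changes, so a real tool
    would need to warn the developer or save those changes first.
    """
    for line in reflog_output.splitlines():
        match = REFLOG_ENTRY.match(line)
        if match and int(match.group(2)) == steps:
            return ["git", "reset", "--hard", match.group(1)]
    return None
```

Hiding the start point and the reflog syntax behind two operations removes exactly the choices that the survey respondents reported as error-prone, at the cost of flexibility that experienced developers may still want.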
Figure 4.1: Views of git-sprite

4.2. git-sprite

git-sprite is a supporting tool for pull-based development using Git and GitHub. It was developed to satisfy the requirements listed in Section 4.1. This section gives an overview of git-sprite and explains the features it provides.

4.2.1 Overview

Figure 4.1 shows a screen capture of git-sprite. It consists of five components: (a) branch view, (b) commit view, (c) index view, (d) diff view, and (e) pull request view. The branch view and the pull request view occupy the same position and can be switched by clicking the button above the two views.

The branch view (a) is for managing branches in both the local and remote repositories. In this view, all merged branches are hidden; the branches shown are unmerged, which marks them as work in progress. A developer can switch branches by clicking the button with the corresponding branch name. Branches can also be created and deleted by clicking the buttons at the top of the branch view. The tool always creates a branch from the release branch, whereas with plain Git the developer is responsible for choosing the correct start point.

When the branch is changed, the commit view (b), index view (c), and pull request view (e) show the commits, changed files, and pull request comments of the current branch. In the commit view, each commit of the topic and release branches is shown as a circle. Clicking a circle shows the differences introduced by that commit in the diff view (d). In the index view, changed files are shown as labels in either the index or the working directory, which are the Git mechanisms for managing changed files. Like commits, these labels can be clicked to show their differences, and they can be dragged to move a file between the index and the working directory. Changed files listed in the index are included in a new commit when the commit button at the bottom of the index view is clicked. When changes are committed, the tool asks the developer whether to push the commit immediately.

The last component is the pull request view (e), which shows the comments on the pull request corresponding to the current branch. The developer can read the comments without searching for the pull request on GitHub. If no pull request corresponding to the current branch exists on GitHub, the pull request view suggests creating one. A pull request can be created from the pull request view by entering its title and comment.

4.2.2 Provided Features

git-sprite provides features that support beginner developers and satisfy the requirements listed in Section 4.1. One of them is support for unique practices. git-sprite supports the unique practices described in Section 3.2.2. Table 4.1 compares how each unique practice is performed in git-sprite and with general Git commands.

The first is the WIP pull request. A WIP pull request is created by selecting the WIP checkbox when creating a pull request. When the checkbox is selected, a WIP tag is automatically added as a prefix to the pull request title. After the branch is updated and pushed again, the developer can choose to remove the WIP tag. The second is Update Branch. If the release branch has newer commits that are not in the topic branch, right-clicking the latest commit and selecting "Rebase master" updates the topic branch. The third is Compress Commit. Commits are compressed by dragging one circle onto another. Commits can also be reordered by dragging a circle between two other commits. The last is Split Branch. A branch is split by right-clicking the commit to be moved to another branch and selecting "Move commit to another branch". A list of branches then appears, from which the destination branch is selected.

Another feature is simplified undo/redo. git-sprite provides undo/redo for every operation performed in the tool. In contrast to general Git commands, a developer can rewind to the original state at any time. Because each unique practice is implemented as a single operation in git-sprite, a developer can easily rewind to the state before the unique practice was performed. This easy undo/redo is expected to encourage developers to use unique practices.
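The thesis does not describe how the undo/redo feature is implemented. As a rough illustration only, the sketch below assumes that every tool operation records the commit hash the branch pointed to before and after it ran; the class and field names are hypothetical. Undo and redo then reduce to moving entries between two stacks and returning the hash to restore.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operation:
    name: str         # e.g. "Compress Commit"
    head_before: str  # hash the branch pointed to before the operation
    head_after: str   # hash the branch points to after the operation


@dataclass
class UndoRedoHistory:
    undo_stack: List[Operation] = field(default_factory=list)
    redo_stack: List[Operation] = field(default_factory=list)

    def record(self, op: Operation) -> None:
        """Remember a finished operation; a new operation clears the redo history."""
        self.undo_stack.append(op)
        self.redo_stack.clear()

    def undo(self) -> Optional[str]:
        """Return the hash to restore, or None if there is nothing to undo."""
        if not self.undo_stack:
            return None
        op = self.undo_stack.pop()
        self.redo_stack.append(op)
        return op.head_before

    def redo(self) -> Optional[str]:
        """Return the hash to restore, or None if there is nothing to redo."""
        if not self.redo_stack:
            return None
        op = self.redo_stack.pop()
        self.undo_stack.append(op)
        return op.head_after
```

Under this assumption, a unique practice that takes several Git commands is still one `Operation`, so one undo step rewinds the whole practice. The returned hash could be restored with, for example, `git reset --hard <hash>`, which would also discard uncommitted changes.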
Table 4.1: Comparison of unique practice operations between git-sprite and general Git commands. Placeholders in <> should be replaced with a branch name or commit hash.

WIP
  git-sprite:
    1. Select the WIP checkbox when creating the pull request from the pull request view.
  General Git:
    1. Create a pull request with a [WIP] tag in the title.

Update Branch
  git-sprite:
    1. Right-click the latest commit in the release branch.
    2. Select "Rebase master".
  General Git:
    1. git checkout <release branch>
    2. git pull
    3. git checkout <topic branch>
    4. git rebase <release branch>

Compress Commit
  git-sprite:
    1. Drag the commit onto the other commit with which you want to compress it.
  General Git:
    1. git rebase -i <release branch>
    2. Change the status of the commits to be compressed in interactive mode.

Split Branch
  git-sprite:
    1. Create a branch.
    2. Right-click the commit which you want to split.
    3. Select "Move commit to another branch".
    4. Select the branch created in step 1.
  General Git:
    1. git branch <new topic branch> <release branch>
    2. git checkout <topic branch B>
    3. git cherry-pick <commit a hash>
    4. git checkout <topic branch A>
    5. git rebase -i <release branch>
    6. Remove commit a in interactive mode.
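To make the difference in effort concrete, the General Git column of Table 4.1 can be written down as data. The sketch below is illustrative: the function name is hypothetical, the rebase target of Update Branch is spelled out explicitly as the release branch (an assumption about the intended command), and the manual steps of the interactive rebase are kept as comment lines because they are not commands.

```python
from typing import Dict, List


def general_git_commands(release: str, topic: str, new_topic: str,
                         commit: str) -> Dict[str, List[str]]:
    """Return the General Git steps of Table 4.1 with placeholders filled in."""
    return {
        "Update Branch": [
            f"git checkout {release}",
            "git pull",
            f"git checkout {topic}",
            f"git rebase {release}",
        ],
        "Compress Commit": [
            f"git rebase -i {release}",
            "# mark the commits to compress in the interactive editor",
        ],
        "Split Branch": [
            f"git branch {new_topic} {release}",
            f"git checkout {new_topic}",
            f"git cherry-pick {commit}",
            f"git checkout {topic}",
            f"git rebase -i {release}",
            "# remove the moved commit in the interactive editor",
        ],
    }


if __name__ == "__main__":
    steps = general_git_commands("master", "add-mochigoma-area",
                                 "add-info-area", "abc1234")
    for practice, lines in steps.items():
        print(f"{practice}: {len(lines)} steps")
```

The branch names come from Problem 2 in the appendix; the commit hash `abc1234` is a made-up placeholder.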
Chapter 5
Evaluation

To evaluate the efficiency of git-sprite, a comparative experiment between the command line interface (CLI), SourceTree [15], and git-sprite was performed. This chapter explains the overview of the experiment and its results.

5.1. Overview

In the experiment, participants developed a small application using pull-based development with the CLI, SourceTree, and git-sprite. SourceTree was selected because it is one of the well-known GUI clients for Git. Since the main focus of the experiment is to evaluate how efficiently each tool supports pull-based development, participants were given three problems, each with brief instructions and the source code to be implemented. With the instructions and source code provided, participants could concentrate on the Git-related part of the development. Participants followed the instructions and solved each problem with the designated tool. Each problem contains two different unique practices. The process and the unique practices required for each problem are listed below. The specific instructions for each problem and the contents of the application are shown in the appendix. Steps performed by the experimenters are marked as such.

Step a. Create a topic branch.
Step b. Modify the specified source code and make a commit.
Step c. Create a pull request for the topic branch and wait for the review to finish.
Step d. (Experimenter) Add a comment to the pull request with an additional instruction that requires unique practices. The unique practices designated for each problem are as follows.
    Problem 1. Update Branch & Compress Commit
    Problem 2. Update Branch & Split Branch
    Problem 3. Compress Commit & Split Branch
Step e. Carry out the instruction written in the comment.
Step f. (Experimenter) If the pull request satisfies the additional instruction, merge the pull request and tell the participant to move on to the next problem. If not, add a comment asking the participant to correct it.

To be impartial, participants received a lecture on what pull-based development is and how to operate each tool. More detailed instructions for each tool were also handed out, and participants could consult them during the experiment. If participants had a question or problem not covered by the instructions, they were allowed to ask during the experiment.

Surveys were conducted before and after the experiment to learn the participants' characteristics and their impressions of each tool. The post-survey was designed based on the System Usability Scale (SUS) [16] to evaluate the usability of each tool. SUS is a method for evaluating system usability that consists of 10 Likert-scale questions. The overall process of the experiment is as follows.

Step 1. Receive lecture
Step 2. Pre-survey
Step 3. Solve problem 1
Step 4. Solve problem 2
Step 5. Solve problem 3
Step 6. Post-survey

5.2. Participants

Table 5.1 shows the participants' profiles. All participants knew the basics of Git and had operated Git with the CLI or GitHub Desktop. Participants B and C had experience in operating Git with the CLI and knew more about Git than the other participants. Table 5.2 shows which tool each participant used for each problem.
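The tool assignment rotates the three tools across participant groups so that each group uses every tool once and each problem is solved with every tool. The thesis does not state how the assignment was generated; the sketch below assumes a simple cyclic shift, which reproduces the assignment when the third group is taken to be participants E and F, as the per-tool result tables indicate. The function name is hypothetical.

```python
from typing import Dict, List


def rotate_tools(groups: List[str], tools: List[str]) -> Dict[str, List[str]]:
    """Assign one tool per problem to each group by cyclically shifting the tool list."""
    n = len(tools)
    schedule = {}
    for g, group in enumerate(groups):
        # Group g uses the tool list shifted right by g positions.
        schedule[group] = [tools[(p - g) % n] for p in range(n)]
    return schedule


schedule = rotate_tools(["A&B", "C&D", "E&F"], ["CLI", "SourceTree", "git-sprite"])

if __name__ == "__main__":
    for group, row in schedule.items():
        print(group, row)
```

A rotation of this kind means that differences between problems and differences between groups are spread evenly over the three tools, which matters with only six participants.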
Table 5.1: Participants' profiles

Participant  Programming Experience (yr)  Git Experience (yr)  GitHub Experience (yr)  Tools Used
A            3-5                          0-1                  0-1                     GitHub Desktop
B            6-10                         0-1                  0-1                     CLI
C            1-2                          0-1                  0-1                     CLI, GitHub Desktop, and SourceTree
D            1-2                          0-1                  0-1                     GitHub Desktop
E            1-2                          0-1                  0-1                     GitHub Desktop
F            0-1                          0-1                  0-1                     GitHub Desktop
Table 5.2: Problems and Tools

Participants  Problem 1   Problem 2   Problem 3
A & B         CLI         SourceTree  git-sprite
C & D         git-sprite  CLI         SourceTree
E & F         SourceTree  git-sprite  CLI

5.3. Results

To evaluate the tool, this study focuses on three points: the time taken to solve each problem, the accuracy of the solution, and the usability of the tool. The results of the experiment are shown in Tables 5.4, 5.5, and 5.6, and the averages for each tool are shown in Table 5.3.

The first point is the time taken to solve the problems. On average, participants using git-sprite needed 0:31 per problem, compared with 1:17 for the CLI and 0:56 for SourceTree (Table 5.3), so problems were solved roughly twice as fast as with the other two tools. Moreover, all participants solved their problems with git-sprite in times comparable to those that participants B and C, who have more experience in operating Git, needed with the CLI and SourceTree. This indicates that git-sprite can help beginner developers work quickly even when using the tool for the first time.

The second point is the accuracy of the solutions. In the experiment, the solution submitted by each participant was checked for correctness before the participant moved on to the next problem. If the solution was not correct, a comment asking the participant to correct it was added. The "# of Recorrection" column in each table is the number of times this happened in step f. Most participants using git-sprite solved the problem without any recorrection. Participants D and F did need recorrection with git-sprite, but their times were still short compared with the other tools. From these results, git-sprite helped participants develop in the pull-based model and use unique practices.

The last point is the usability of the tool. During the experiment, every question asked by the participants was counted and noted. The CLI drew the most questions and git-sprite the fewest. The questions suggest that participants had fewer difficulties with git-sprite even when relying only on the written instructions. In the post-survey based on SUS, git-sprite scored 63.3, the highest of the three tools. However, 63.3 is not a satisfactory value, since the average SUS score is reported to be 68 [17].

Overall, these results indicate that git-sprite sufficiently supports beginner developers in operating Git in pull-based development, but the SUS score shows that its usability needs further improvement.
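For reference, the standard SUS scoring rule can be sketched as follows. The thesis states only that the post-survey was designed based on SUS, so whether its scores were computed exactly this way is an assumption; the function name is hypothetical. In standard SUS, each of the ten items is answered on a 1-5 scale, odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the sum is multiplied by 2.5 to give a score from 0 to 100.

```python
from typing import Sequence

SUS_AVERAGE = 68.0  # average SUS score cited in the text [17]


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS scoring for ten answers on a 1-5 Likert scale."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for i, r in enumerate(responses):
        # Items 1, 3, 5, ... (indices 0, 2, 4, ...) are positively worded.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5


def above_average(score: float) -> bool:
    """Compare a score against the cited average of 68."""
    return score > SUS_AVERAGE
```

With this rule, answering the neutral option 3 on every item yields 50, and the reported git-sprite mean of 63.3 falls below the cited average of 68.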
Table 5.3: Average results for each tool

Tool        Time (h:mm)  # of Recorrection  # of Questions  SUS
CLI         1:17         1.5                6.7             35.8
SourceTree  0:56         1                  5.2             53.8
git-sprite  0:31         0.5                3.2             63.3

Table 5.4: Results of CLI

Participant  Problem #  Time (h:mm)  # of Recorrection  # of Questions  SUS
A            1          0:45         0                  1               35
B            1          1:04         1                  6               47.5
C            2          0:47         2                  4               42.5
D            2          1:52         2                  11              35
E            3          1:50         2                  10              35
F            3          1:25         2                  8               20
Average                 1:17         1.5                6.7             35.8

Table 5.5: Results of SourceTree

Participant  Problem #  Time (h:mm)  # of Recorrection  # of Questions  SUS
A            2          1:23         4                  4               50
B            2          0:36         0                  1               50
C            3          0:31         0                  4               67.5
D            3          0:53         0                  6               55
E            1          1:08         1                  8               57.5
F            1          1:05         1                  8               42.5
Average                 0:56         1                  5.2             53.8

Table 5.6: Results of git-sprite

Participant  Problem #  Time (h:mm)  # of Recorrection  # of Questions  SUS
A            3          0:21         0                  1               70
B            3          0:25         0                  1               70
C            1          0:20         0                  4               67.5
D            1          0:40         1                  6               60
E            2          0:34         0                  4               57.5
F            2          0:48         2                  3               55
Average                 0:31         0.5                3.2             63.3
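The averages in Table 5.3 can be recomputed from the per-participant rows of Tables 5.4 to 5.6. The sketch below copies those rows; the variable and function names are illustrative.

```python
from statistics import mean

# (time "h:mm", recorrections, questions, SUS) for participants A-F,
# copied from Tables 5.4, 5.5, and 5.6.
RESULTS = {
    "CLI": [("0:45", 0, 1, 35), ("1:04", 1, 6, 47.5), ("0:47", 2, 4, 42.5),
            ("1:52", 2, 11, 35), ("1:50", 2, 10, 35), ("1:25", 2, 8, 20)],
    "SourceTree": [("1:23", 4, 4, 50), ("0:36", 0, 1, 50), ("0:31", 0, 4, 67.5),
                   ("0:53", 0, 6, 55), ("1:08", 1, 8, 57.5), ("1:05", 1, 8, 42.5)],
    "git-sprite": [("0:21", 0, 1, 70), ("0:25", 0, 1, 70), ("0:20", 0, 4, 67.5),
                   ("0:40", 1, 6, 60), ("0:34", 0, 4, 57.5), ("0:48", 2, 3, 55)],
}


def to_minutes(hmm: str) -> int:
    """Convert an "h:mm" string to minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


def averages(rows):
    """Return (mean minutes, mean recorrections, mean questions, mean SUS)."""
    return (
        mean(to_minutes(t) for t, _, _, _ in rows),
        mean(r for _, r, _, _ in rows),
        mean(q for _, _, q, _ in rows),
        mean(s for _, _, _, s in rows),
    )


summary = {tool: averages(rows) for tool, rows in RESULTS.items()}

if __name__ == "__main__":
    for tool, (minutes, recorrections, questions, sus) in summary.items():
        print(f"{tool}: {minutes:.1f} min, {recorrections:.1f}, {questions:.1f}, {sus:.1f}")
```

On these rows the mean CLI time is about 2.5 times, and the mean SourceTree time about 1.8 times, the mean git-sprite time. With six participants these figures are descriptive only.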
Chapter 6
Discussion

The evaluation in Chapter 5 showed git-sprite to be an efficient supporting tool for beginners in pull-based development. Participants using git-sprite were faster and more accurate, and rated its usability higher, than with the CLI and SourceTree. Each problem in the evaluation was designed around the unique practices that developers find difficult; these practices were investigated and defined in Chapter 3. The simplified features of git-sprite helped beginners operate Git easily and perform the unique practices, and these features led to the result. The reason participants worked faster with git-sprite is that each unique practice can be performed as a single operation, unlike in the other tools. In fact, participants using the other tools struggled to combine multiple Git operations to solve the problems. This result, together with the findings in Chapter 3, shows that the proposed tool encourages beginner developers to use unique practices.

In this experiment, the evaluation was performed by comparing the results of each participant. All participants were beginners with little experience of development using the CLI and SourceTree. Participants were also divided into three groups so that the designated tool changed for each problem. Since the conditions and the experience of pull-based development were equal for all participants, the efficiency of git-sprite was evaluated impartially.

However, no evaluation has been performed on developers with experience of pull-based development or of development using the CLI and SourceTree. Such experience may change the result of the evaluation. For further evaluation of git-sprite, an evaluation with developers covering a wide range of experience, from beginner to expert, and with experience in other tools is needed.
Chapter 7
Conclusion

This thesis proposed a tool, git-sprite, which supports beginner developers in pull-based development. To determine the requirements of a supporting tool for pull-based development, a large-scale survey of GitHub developers was conducted. The survey showed that developers find Git difficult because of the inconsistency of Git commands and the difficulty of undoing and redoing Git operations. It also showed that beginner developers need further support for unique practices. The requirements of git-sprite were derived from these findings. git-sprite helps developers in pull-based development by supporting its unique practices and by providing easy undo/redo.

To evaluate git-sprite, a comparative experiment between the command line interface (CLI), SourceTree, and git-sprite was performed. The results show that participants using our tool solved the problems faster and more accurately than with the other tools. The usability of each tool was also evaluated with SUS, and git-sprite scored 63.3, the highest of the three. From these results, git-sprite efficiently helps developers in pull-based development. However, the participants were all beginners with little experience in the other tools, such as the CLI and SourceTree. For further evaluation, it is necessary to recruit participants with a wide range of experience in software development and with experience in other tools.

During the evaluation, participants pointed out several possible improvements to the usability of git-sprite, and git-sprite needs to be modified accordingly. Also, the survey of GitHub developers showed that most developers use the CLI to operate Git. It is therefore also necessary to support beginner developers in the CLI, or to help them move from a GUI to the CLI.
List of Publications

Yusuke Saito, Kenji Fujiwara, Hiroshi Igaki, Norihiro Yoshida, Hajimu Iida, Discussion of a Tool for Supporting Pull Request Driven Software Development, IEICE Technical Report, volume 114, number 416, pages 103-108, 2015 (in Japanese).

Yusuke Saito, Kenji Fujiwara, Hiroshi Igaki, Norihiro Yoshida, Hajimu Iida, How do GitHub Users Feel with Pull-Based Development?, In the 7th International Workshop on Empirical Software Engineering in Practice (IWESEP).
References

[1] Kivanc Muslu, Christian Bird, Nachiappan Nagappan, and Jacek Czerwonka. Transition from Centralized to Decentralized Version Control Systems: A Case Study on Reasons, Barriers, and Outcomes. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 334-344, 2014.

[2] Yue Yu, Huaimin Wang, Gang Yin, and Charles X. Ling. Reviewer Recommender of Pull-Requests in GitHub. In Proceedings of the Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference, pages 609-612, 2014.

[3] Yang Zhang, Gang Yin, Yue Yu, and Huaimin Wang. An Exploratory Study of @-Mention in GitHub's Pull-Requests. In Proceedings of the 2014 21st Asia-Pacific Software Engineering Conference, APSEC '14, pages 343-350, 2014.

[4] Georgios Gousios, Martin Pinzger, and Arie van Deursen. An Exploratory Study of the Pull-based Software Development Model. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 345-355, 2014.

[5] Earl T. Barr, Christian Bird, Peter C. Rigby, Abram Hindle, Daniel M. German, and Premkumar Devanbu. Cohesive and Isolated Development with Branches. In Proceedings of the 15th International Conference on Fundamental Approaches to Software Engineering, FASE '12, pages 316-331, 2012.

[6] Santiago Perez De Rosso and Daniel Jackson. What's Wrong with Git?: A Conceptual Design Analysis. In Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Onward! 2013, pages 37-52, 2013.

[7] Brian de Alwis and Jonathan Sillito. Why Are Software Projects Moving from Centralized to Decentralized Version Control Systems? In Proceedings of the 2009 ICSE Workshop on Cooperative and Human Aspects on Software Engineering, CHASE '09, pages 36-39, 2009.

[8] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. The Promises and Perils of Mining GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pages 92-101, 2014.

[9] Jason Tsay, Laura Dabbish, and James Herbsleb. Influence of Social and Technical Factors for Evaluating Contribution in GitHub. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 356-366, 2014.

[10] Daricélio Moreira Soares, Manoel Limeira de Lima Júnior, Leonardo Murta, and Alexandre Plastino. Acceptance Factors of Pull Requests in Open-source Projects. In Proceedings of the 30th Annual ACM Symposium on Applied Computing, SAC '15, pages 1541-1546, 2015.

[11] Mohammad Masudur Rahman and Chanchal K. Roy. An Insight into the Pull Requests of GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pages 364-367, 2014.

[12] Georgios Gousios, Andy Zaidman, Margaret-Anne Storey, and Arie van Deursen. Work Practices and Challenges in Pull-Based Development: The Integrator's Perspective. In Proceedings of the 37th International Conference on Software Engineering, ICSE, pages 358-368, 2015.

[13] Martin Dias, Alberto Bacchelli, Georgios Gousios, Damien Cassou, and Stéphane Ducasse. Untangling Fine-Grained Code Changes. CoRR, abs/1502.06757, 2015.

[14] Hiroyuki Kirinuki, Yoshiki Higo, Keisuke Hotta, and Shinji Kusumoto. Hey! Are You Committing Tangled Changes? In Proceedings of the 22nd International Conference on Program Comprehension, ICPC 2014, pages 262-265, 2014.

[15] Atlassian. Sourcetree. https://www.atlassian.com/software/sourcetree.

[16] John Brooke. SUS: A quick and dirty usability scale. In Usability Evaluation in Industry. CRC Press, 1996.

[17] John Brooke. SUS: A Retrospective. J. Usability Studies, 8:29-40, 2013.
Appendix

A. Survey Used for the Investigation

1. How many years have you been programming?
   < 1 / 1-2 / 3-5 / 6-10 / > 10

2. You work for?
   The industry / The academics / The government / Open Source Software

3. How many years have you been using VCS?
   < 1 / 1-2 / 3-5 / 6-10 / > 10

4. How many years have you been using Git?
   < 1 / 1-2 / 3-5 / 6-8 / > 8

5. How many years have you been using GitHub?
   < 1 / 1-2 / 3-5 / > 5

6. How do you use Git?
   Command line interface / GUI Client / Other

7. Do you think Git is difficult to use?
   Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree

8. What makes you difficult when using Git?
   Too many commands and options / Hard to understand about Index and Working directory / Difficult to undo changes when mistaken Git command / Nothing in particular / Other

9. What command do you often use? (multiple choice question)
   git add -p / git merge / git rebase / git rebase -i / git reset / git revert / git stash / Other

10. If there are other Git commands which you frequently use and not listed above, write down here. (open-ended question)

11. What Git command do you often struggle with? (multiple choice question)
   git add -p / git merge / git rebase / git rebase -i / git reset / git revert / git stash / Other

12. If there are other Git commands which you struggle and not listed above, write down here. (open-ended question)

13. Do you think GitHub is difficult to use?
   Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree

14. What makes you difficult when using GitHub? (open-ended question)

15. Do you think pull request is difficult to use?
   Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree

16. What makes you difficult when using pull request? (open-ended question)

17. Do you use WIP (work in progress) pull request?
   Yes / No

18. Do you refactor the commits after you created a pull request?
   Yes / No
B. Detail of Evaluation

B.1 Contents of the Application

Participants developed Doubutsu Shogi, a small shogi variant for young children, during the evaluation. In the experiment, Doubutsu Shogi was developed using Processing (https://processing.org/). Since the evaluation was conducted in a short time, the goal of the development was limited to displaying the areas of the shogi board. The view that participants developed is shown in Figure 7.1.

Figure 7.1: View of Doubutsu Shogi Board
B.2 Problems and Their Instructions

Problem 1. Create BaseArea Class

Step 1. Open the designated tool
Step 2. Checkout to the master branch
Step 3. Create the add-base-area branch from master branch
Step 4. Checkout to the add-base-area branch
Step 5. Create the BaseArea class
Step 6. Run the program to check if it works
Step 7. Create a commit
Step 8. Push to the GitHub remote repository
Step 9. Open a web browser and go to the GitHub page (If you are using git-sprite, skip this step)
Step 10. Create a pull request
Step 11. Wait for review comments to be added
Step 12. Do the additional process written in the pull request comment
    Additional Step 1. Update the add-base-area branch to take in the latest commits in master branch
    Additional Step 2. Update the BaseArea class
    Additional Step 3. Compress the two commits in the add-base-area branch into one commit
    Additional Step 4. Run the program to check if it works
Step 13. After finishing the additional process, push again and wait for the next instruction

Problem 2. Create MochigomaArea Class and InfoArea Class

Step 1. Open the designated tool
Step 2. Checkout to the master branch
Step 3. Pull the latest master branch from the remote repository
Step 4. Create the add-mochigoma-area branch from master branch
Step 5. Checkout to the add-mochigoma-area branch
Step 6. Create the MochigomaArea class
Step 7. Run the program to check if it works
Step 8. Create a commit
Step 9. Create the InfoArea class
Step 10. Run the program to check if it works
Step 11. Create a commit
Step 12. Push to the GitHub remote repository
Step 13. Open a web browser and go to the GitHub page (If you are using git-sprite, skip this step)
Step 14. Create a pull request
Step 15. Wait for review comments to be added
Step 16. Do the additional process written in the pull request comment
    Additional Step 1. Update the add-mochigoma-area branch to take in the latest commits in master branch
    Additional Step 2. Since there are two commits, one for the MochigomaArea class and one for the InfoArea class, create the add-info-area branch and move the commit which created the InfoArea class to it to split the branch.
    Additional Step 3. Run the program to check if it works in the add-info-area branch
    Additional Step 4. Push the add-info-area branch and create a new pull request
    Additional Step 5. Checkout to the add-mochigoma-area branch and check if the program works
Step 17. After finishing the additional process, push again and wait for the next instruction

Problem 3. Create Board Class and Update Doubutsu.pde

Step 1. Open the designated tool
Step 2. Checkout to the master branch
Step 3. Pull the latest master branch from the remote repository
Step 4. Create the add-board branch from master branch
Step 5. Checkout to the add-board branch
Step 6. Create the Board class
Step 7. Create a commit
Step 8. Update the Board class
Step 9. Run the program to check if it works
Step 10. Create a commit
Step 11. Update the Doubutsu.pde
Step 12. Run the program to check if it works
Step 13. Create a commit
Step 14. Push to the GitHub remote repository
Step 15. Open a web browser and go to the GitHub page (If you are using git-sprite, skip this step)
Step 16. Create a pull request
Step 17. Wait for review comments to be added
Step 18. Do the additional process written in the pull request comment
    Additional Step 1. Since there are commits for creating the Board class and for updating Doubutsu.pde, create the update-doubutsu branch and move the commit which updated Doubutsu.pde to it to split the branch.
    Additional Step 2. Push the update-doubutsu branch and create a new pull request
    Additional Step 3. Checkout to add-board branch
    Additional Step 4. Compress the two commits in the add-board branch into one commit
    Additional Step 5. Run the program to check if it works
Step 19. After finishing the additional process, push again and wait for the next instruction