Version Uncontrolled! : How to Manage Your Version Control

Harold Dost III, Raastech

ABSTRACT

Are you constantly wondering what is in your production environment? Do you have any doubts about what code is running? Chances are that your version control software isn't the problem. No matter what system you're using, there is the ideal and there is the practice. This white paper helps you get on the right track by covering the types of version control systems, merging and branching techniques, and methods to get a solid source control workflow into place. The latter half discusses the organization of version control and how it can increase confidence in your code base.

TARGET AUDIENCE

The intended readers of this paper are those involved with development at all experience levels, especially those who are looking to drive up the quality of their production environments.

EXECUTIVE SUMMARY

Readers will learn:
- Differences between kinds of source control
- Basic usage of SVN and Git
- Version control release procedures

BACKGROUND

Whether working in IT, supporting corporate systems, or developing a new product for a customer, the most common tool of any department is the version control system. It is sometimes referred to as source control or the source repository, but whatever name a department uses for it, many simply misuse and underuse the available features. This failure to use source control systems to their fullest extent has a number of causes. Some developers never hear of version control until years into their careers, and when it is introduced to them they view it as a necessary evil. Some simply do not know how to use it, because they never had the time or interest to learn beyond the few commands they were taught. Others may just feel that using some of the features is futile.
Regardless of why departments underuse their version control systems, using them more completely can drastically improve the visibility and efficiency of the development process. Some of the key features can save employees both anguish and time. Combining these features with a thought-out release process can provide confidence that a team knows where changes are, when changes were made, when they got to production, and who made them. This kind of visibility goes very far, from global multi-team efforts down to two-person teams or even one-person pet projects.

TECHNICAL DISCUSSIONS AND EXAMPLES

RELEASE

Before diving into the technical details, I would like to place the disclaimer that I tend to be biased towards Git SCM (Source Code Management), as I find it to be an incredibly powerful tool. This bias is based on my personal experience, which consists mainly of SVN and Git.

At many companies the typical release process looks something like this. A developer creates code for a new feature, a bug fix, a performance enhancement, etc. While creating this code the developer is, hopefully, writing tests not just in a Word document but in the form of code that verifies the inputs and outputs are what is expected. Once the developer is satisfied with the code, it should get reviewed and tested by a third party. This helps with reducing inefficiencies, protecting against backdoors to an extent, and maintaining code style. After that, the code may be tested in a secondary environment.

For the purposes of this paper we will later look at what I will call "binary-based" and "environment-based" releases. Binary-based would be something like the Linux Kernel project, where there is no one specific place that the code will run in the end. Environment-based would be an IT infrastructure where code often needs to be migrated up through environments.
For example, code moves from development into test and finally into production. Even though binaries are involved in many cases, the procedures may be a bit different.

Even though the end product may be released as a binary or pushed into a production environment, the biggest question we are asking here is about the source code. Where is it? Unfortunately, in many cases the answer is still always trunk. Having code in trunk is not bad, but having it only there is. The advantage of having the code only in trunk is that it's easy for developers: make a change and commit, make another change and commit. They will do this repeatedly until there is something ready to test and eventually move on to production. The problem is that it's not simple to determine the last version in production. Maybe there is an extra tool that keeps track, but then how does that information get there? Even if you have such a tool, it places tracking outside of developer responsibility. Using the version control system more fully can solve this.

CENTRALIZED VS. DISTRIBUTED

Most if not all people in IT by now know what a version control system is, but they may not realize how many different ones there are and some of the inherent differences between them. The major divide in version control is organization as a centralized system or a distributed one.

A centralized version control system relies on a single repository. Developers check files out and check them in by pushing changes to the repository. Certain files and folders may be locked to keep them from being changed while others are working on them. However, those same locks can be overridden in many cases, giving developers a false sense of security. On local systems, different branches simply appear as folders on the file system, which requires multiple copies of the same file if the root level is checked out from the central repository. Also, viewing the commit history or committing changes requires contacting the central repository. Some of the systems classified as centralized include Subversion (SVN), Team Foundation Server (TFS), ClearCase, and Perforce (P4). For the purposes of this paper, Subversion will be used in examples for centralized repositories.

In a distributed version control system, repositories are local, meaning that each developer has everything they need. Instead of needing to contact a server to commit changes or check the commit history, it's all performed on the local repository.
To propagate changes out to anyone else's repository, including a "blessed" repository where everyone gets their latest copy, patches are exchanged. Also, since a developer has a repository on their machine, branching locally is cheap on disk space. While the initial cloning of a repository may take a little time, all subsequent pushes and pulls are comparatively quick. Some of the distributed version control systems include Fossil, Mercurial, Git, and Bazaar. For the purposes of this paper, Git will be used to demonstrate a distributed version control system.

USEFUL VERSION CONTROL COMMANDS

Now that we understand the differences between centralized and distributed version control, we can look at how they are similar. Over the next few paragraphs we will go through the commands of SVN and Git to show the analogs between the two. Some commands do not translate well between them and, as a result, might be mentioned for only a single tool.

CREATE A REPOSITORY

To begin we need a repository. For SVN this is no simple task, as from the beginning it requires some central place to start committing. For information on how to do this, see the Subversion how-to (Bansal, 2011) in the references at the end of this paper. For Git, simply navigate to the directory where the project should be initialized and issue the following command:

    git init

To share it with people, some initial setup is required. For basic projects, a shared network directory or even a web-based shared directory could be used to host a common source. If such a directory is used, it must hold a bare repository, which can be created using:

    git init --bare test-repo.git

There are also more advanced instructions for creating a scalable, shared repository in How To Create a Remote Shared Git Repository (Kovshenin, 2011).

Once a repository is available to share amongst people, it must then be either checked out or cloned.
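The Git side of this setup can be tried without any server. The following is a minimal sketch, assuming Git is installed and that a scratch directory created with mktemp stands in for the shared network directory; the names demo_dir and work are made up for illustration.

```shell
set -e

# Scratch directory standing in for a shared network directory (assumption).
demo_dir=$(mktemp -d)

# Create the bare repository that will act as the common source.
git init -q --bare "$demo_dir/test-repo.git"

# A developer obtains a working copy by cloning the bare repository.
# Git warns that the repository is empty; that is expected here.
git clone -q "$demo_dir/test-repo.git" "$demo_dir/work"

# The clone records where it came from as the remote named "origin".
git -C "$demo_dir/work" remote get-url origin
```

A bare repository has no working files of its own, which is why it is suited to being the shared copy that others push to and pull from.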
For SVN, a checkout is performed with the following command:

    svn co http://host.com/path/repo

For Git, a clone is performed using:

    git clone https://host.com/path/repo.git

Or:

    git clone user@server:path/repo.git

The SVN and Git commands may take some time initially, but for Git all further commits and log requests will be nearly instantaneous, since the repository is on the local machine. However, pushing changes to and pulling them from the remote repository requires additional commands.

COMMIT CHANGES

To make changes in a repository, they must be committed. Since SVN has only a single repository, one need only perform the following commands. Add a file:

    svn add test-file.txt

Push changes to the repository:

    svn commit -m "Some Message"

This will push whatever changes have been made out to the central repository. For Git, however, the repository is on your machine, so to make changes locally, first add the files that should be included in the commit:

    git add file-name.txt

Commit the changes:

    git commit -m "Some Message"

Push them to the repository:

    git push

With either process, pushing changes out to the remote repository may produce merge conflicts, but that is beyond the scope of this section. While the Git method may require a few more steps, it doesn't always. Once a file is being tracked, modifications to it can be committed simply by adding -a to the commit command. The advantage of the separate push step is that a developer can commit to their local repository multiple times and wait to push the changes out until they are final.

BRANCHING AND MERGING

The last feature we will cover is branching and merging, as this will be important in the next section. To branch in SVN:

    svn copy trunk branches/my_branch

Run in a working copy, this command copies all files locally, essentially doubling the disk space used by what was originally in trunk. To branch in Git, however:

    git branch my-branch

Or:

    git checkout -b my-branch

The first Git command creates a branch based on the current one, whereas the second also switches the user to that branch. Both commands only create a lightweight reference to the current commit; no files are copied. It is important to note that, from a user's perspective, local branches in SVN are seen as different directories in the file system. With Git, however, the tool tracks which branch is active, so the user only needs to work in a single directory. To know which branch a user is on:

    git status

This will output:

    On branch master
    Your branch is up-to-date with 'origin/master'.
    nothing to commit, working directory clean

To change branches use:

    git checkout branch-name

Once a developer has made sufficient changes, they should merge their code.
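Before moving on to merging, the Git branching commands above can be tried end to end in a throwaway repository. This is a minimal sketch, assuming Git is installed; the identity "Demo User" and the variable names repo_dir and base_branch are placeholders made up for illustration.

```shell
set -e

# Throwaway repository so nothing real is touched (assumption).
repo_dir=$(mktemp -d)
cd "$repo_dir"
git init -q .

# Placeholder identity, set only for this repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit on the initial branch.
echo "first line" > file-name.txt
git add file-name.txt
git commit -q -m "Initial commit"

# Remember the initial branch name (master or main, depending on Git settings).
base_branch=$(git rev-parse --abbrev-ref HEAD)

# Create a branch and switch to it in one step.
git checkout -q -b my-branch

# The file is already tracked, so -a picks up the modification.
echo "second line" >> file-name.txt
git commit -q -a -m "Change made on my-branch"

# Show the active branch, then return to the initial one.
git rev-parse --abbrev-ref HEAD
git checkout -q "$base_branch"
```

After the final checkout, file-name.txt contains only its first line again, because the second commit exists only on my-branch. The whole exercise happens in a single directory, which is the contrast with SVN's folder-per-branch layout.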
To merge code in SVN, for this example into trunk:

    cd /path/to/repo/trunk
    svn merge ../branches/my_branch/
    svn commit -m "Merge my_branch"

For Git (assuming master is the current branch):

    cd /path/to/repo
    git merge my-branch
    git push

WORKFLOWS

Now that we know how to use our different version control systems, we need to discuss good methods for using version control. I have observed trends that tend to be common in shops using centralized repositories:
- Merges are something to be feared.
- Commits may be made only when absolutely necessary.
- Everything goes into trunk.

To combat potentially hazardous behaviors, it is recommended to set up a workflow and follow it. There are many out there, but they can be distilled down to a few.

Figure 1: Centralized Workflow

The first workflow is not recommended; it is unfortunately very common amongst IT organizations despite better workflows having been known for many years. It is the centralized workflow. There is a single branch that everyone commits their changes to, and as a result it is a mishmash of the various states of different sections of code. This can slow down productivity, since consecutive commits may have nothing to do with each other. Tagging can't really help here, since everything in the single branch isn't guaranteed to be "production-ready".

Figure 2: Feature Branch Workflow

The next workflow, the Feature Branch Workflow, improves things quite a bit. Its defining feature is branches. With these, a developer can create a branch from the primary branch and work in their personal branch for however long it takes to get the feature finished. A developer can have more than one personal branch, and these can be used for features, bugs, etc. Once they are complete, a merge can take place. When the merge happens depends largely on the project and technology set. To use the Linux Kernel again, the merge would need to take place first and then a build can occur, since the build tool will be pulling from this "primary" branch. With a self-contained project, on the other hand, as long as code can be checked out to be deployed or otherwise used, the merge could happen afterwards; doing it beforehand, however, ensures that production is reflected by version control. Tagging can be very useful in this case, since only "production-ready" code is placed into the primary branch.

The last approach presented here builds on the previous one; I first discovered it when reading an article by Vincent Driessen. It involves, lo and behold, more branches! Just as in the other workflow there is a "primary" branch, which for this example will be called master. This should hold your production code and nothing else. The other constant branch is "develop". The develop branch is used as the intermediary where developers merge their features once they are done but before they are ready to be released. Other than the personal branches there are other transient branches, namely hotfix and release branches. Finally, tagging plays a pivotal role in how this model works.

Just as with the previous workflow, developers will check out personal branches, but will instead branch from develop. Once their feature is complete they will merge back into develop. The only exception to this is a hotfix that can't wait for the next major release. Assume the team is working on 1.5 and currently 1.4.1 has been released. The developer responsible will create a branch based on master and make the change, and once it has been successfully tested and is ready to go to production, the change will be merged into two places.
The first is master, where the new version tag will be 1.4.2. The second merge will be to develop, so that code still in the develop branch accounts for the change.

Finally, "release" branches are created from develop and are a staging area between develop and master. Once the release branch is created no new features are added to it, but features can still be merged into develop. Bug fixes are made directly to the release branch and can be merged back into develop as often as desired. Once a release is ready, it is merged into master and tagged. It is also merged back into develop to reflect any bug fixes that had not yet been merged.

Figure 3: Vincent Driessen Workflow

There may be variations of this depending on whether a company is supporting multiple versions of something at the same time, but that would simply require checking out from a specific tag and making the necessary changes. Other things that are not constrained by any of these models are:
1. How often are things pushed to production?
2. Who pushes to production?
3. What gets pushed to production?
4. Who controls what should go into a release?

The company implementing any of these workflows will determine many of these things.

BRANCHING

So far this paper has discussed tools, commands, and workflows. Of the three workflows, two implement branching. The last one especially requires a number of different branches. With all of these branches being created and destroyed, one of the things that quickly becomes necessary is a naming scheme. Some examples of naming schemes are:
- Feature: feature/had/00001-some-new-feature
- Bug: bug/had/010000-blue-screen-of-death-is-red
- Spike / Experimental:
    o spike/had/radical-new-things
    o exp/had/something-really-awesome
- Release: release/1.5 or release/20150507.1

Having the purpose of a branch in its name is very useful, so that we can see outstanding bugs and features. It may be desirable to have developers working on a change include their initials as part of the branch name. This helps with quick visual inspections of existing branches. Hopefully companies are using some sort of system to track changes, and the number from that system should be used to provide a correlation. Additionally, it's a good idea to include a description of the branch so that people don't necessarily need to memorize issue numbers. Releases should only require a version number, maybe including rc, alpha, beta, etc.

GETTING THERE

Assuming that your team doesn't use a well-formed workflow, you probably want to move towards one of these workflows. The steps towards using one at a team level are relatively easy. First you'll need to set aside a little time to think about how you can improve your personal workflow, and then move out into the team. Start by practicing on a dummy project. Create a branch and go through your typical process, adding in branching and merging. Next, if your team is using the centralized workflow, there is nothing stopping you individually from branching for all of your assigned features. Once you feel comfortable with how the process works, make a case for it at your next team meeting. If there aren't regular meetings, ask for one, and if that doesn't work, appeal to some of your coworkers and show them how it helps you. It can grow organically from you out to your coworkers. However, taking it to the next level will require team-level and sometimes unit-level cooperation. With your team on board and the benefits of these workflows demonstrated, making the business case should be simple.

https://about.gitlab.com/pricing/

REFERENCES

Atlassian. (2015). Comparing Workflows. Retrieved 2015, from Atlassian: https://www.atlassian.com/git/tutorials/comparing-workflows/
Bansal, N. (2011). HOWTO: Hosting a Subversion Repository. Retrieved 2015, from University of Toronto: http://queens.db.toronto.edu/~nilesh/linux/subversionhowto/
Driessen, V. (2010). A successful Git branching model. Retrieved 2015, from Nvie: http://nvie.com/posts/a-successful-git-branching-model/
Git SCM. (2014). Git Book. Retrieved 2015, from Git SCM: http://git-scm.com/book/en/v2/
Kovshenin, K. (2011). How To Create a Remote Shared Git Repository. Retrieved 2015, from Kovshenin: https://kovshenin.com/2011/howto-remote-shared-git-repository/
Tutorials Point. (2014). SVN Tutorial. Retrieved 2015, from Tutorials Point: http://www.tutorialspoint.com/svn/

APPENDICES

APPENDIX A: MIGRATION FROM GIT TO SVN

http://www.subgit.com/remote-book.html

APPENDIX B: ENTERPRISE GIT MANAGEMENT TOOLS

https://enterprise.github.com/
https://www.atlassian.com/software/stash
http://www.gitenterprise.com/pricing.html