Version Control with Mercurial and SSH

Lasse Kliemann
lki@informatik.uni-kiel.de
Vorkurs Informatik 2010

Wishlist

While working on a project, it is nice to...
- be able to switch back to older versions of a file, or to compare different versions against each other.
- have an easy procedure for doing backups.
- access your files from anywhere over the network (reading and writing).
- work with several persons on the same set of files, without data loss and with some kind of conflict management.

All this and more can be realized with...
- revision control systems, or version control systems if you prefer, e.g., CVS, Subversion, GNU arch, Darcs, Git, Bazaar, Mercurial; and
- tools for remote access, e.g., SSH.

The Naive Way

The user opens a new file and starts writing. The more work he spends on the file, the more precious it becomes. This precious content is available in only one place, namely in the file on the disk (and during editing also in the RAM of the machine).

Think of...
- the user deleting the file accidentally
- the user making an erroneous change to the file
- another user deleting the file (accidentally or maliciously)
- a malicious program (e.g., a virus) deleting the file
- the hard disk malfunctioning (can happen at any time!)

But I Have a Backup!

Backups are fine. However...
- how often do you back up your files?
- how comfortable or flexible is your backup procedure?
- how well-organized are your backup collections?

On our Suns, we have /backup/$user, but you might want to have something more fine-grained in addition.

Not to mention the mess when working on the same set of files...
- in different locations
- with several persons at the same time.

Collaboration Media

Infrastructures that support a group of contributors in working on the same files.

- e-mail
  - ubiquitous but limited possibilities
  - can lead to excessive extra work on all parts
- wikis
  - share some properties with (classical) version control systems
  - mostly for simple-structured text, provide their own rendering engine
  - user interface is web-based and brings some nice features but also some severe limitations
- (classical) version control systems
  - integrate smoothly with other tools (e.g., text editors)... and infrastructures (e.g., file system)
  - useful for all kinds of text files... and with limitations also binary files
  - can be distributed
  - some wiki-like features can be added

Initializing the Project

$ mkdir proj-1
$ cd proj-1
$ hg init
$ ls -l
total 0
$ ls -la
[...]
drwxr-x--- 3 lki lki 100 Oct 3 14:47 .hg

Everything below proj-1 is called a repository.
Everything below proj-1 but outside of .hg is called the working copy, or working copy area.
Everything below .hg is called the project metadata; sometimes it is also referred to as repository.
$ hg init initializes the repository; this is only required once.

The .hg Directory

$ find .hg
.hg
.hg/requires
.hg/00changelog.i
.hg/store

Those files should not be changed by hand.
.hg/hgrc may be edited and contains configuration for this repository.
Global configuration is done in ~/.hgrc.

Adding Some Files

$ pwd
/home/lki/proj-1
$ vim file-1
$ vim file-2
$ cat file-1
alpha
beta

$ hg status gives status information.
Files that shall be tracked must be registered using $ hg add file.

$ hg status
? file-1
? file-2
$ hg add file-1
$ hg status
A file-1
? file-2

Commit

To record the current state, do a commit with $ hg commit, or $ hg ci for short.

First, give Mercurial a name (your name) to associate with the commit. This is done in ~/.hgrc or locally in .hg/hgrc:

[ui]
username = Lasse Kliemann <lki@informatik.uni-kiel.de>

Upon commit, an editor opens:

[...]
HG: user: Lasse Kliemann <lki@informatik.uni-kiel.de>
HG: branch default
HG: added file-1

The message you enter there will be associated with the commit and appear in the logs. If there is nothing particularly interesting to say, use a dummy like na.

Examining Changes

$ hg status
? file-2
$ vim file-1
$ cat file-1
gamma
beta

$ hg status points out modified files.
$ hg diff shows changes in detail.

$ hg status
M file-1
? file-2
$ hg diff
[...]
-alpha
+gamma
 beta

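To make the diff notation concrete, here is a minimal sketch that reproduces the same kind of output with Python's standard difflib module. It only illustrates the format; the file labels a/file-1 and b/file-1 are assumptions chosen for the example, and this is not how Mercurial computes its diffs internally.

```python
import difflib

# Contents of file-1 before and after the edit, as shown on the slide.
old = ["alpha\n", "beta\n"]
new = ["gamma\n", "beta\n"]

# unified_diff yields two header lines, a hunk marker, and then one line per
# removed ("-"), added ("+"), or unchanged (" ") line of the file.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="a/file-1", tofile="b/file-1")
)

print("".join(diff_lines), end="")
```

Lines prefixed with "-" exist only in the old version, lines prefixed with "+" only in the new one, and lines prefixed with a space are unchanged context.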
Log

We do another commit using $ hg commit. Then we examine the log:

$ hg log
changeset: 1:a9deef06258b
tag: tip
user: Lasse Kliemann <lki@informatik.uni-kiel.de>
date: Sun Oct 03 18:32:34 2010 +0200
summary: na

changeset: 0:19092ca73642
user: Lasse Kliemann <lki@informatik.uni-kiel.de>
date: Sun Oct 03 18:23:26 2010 +0200
summary: na

We use changeset and revision synonymously.
More information is displayed with $ hg log -v or $ hg log -p.

$ hg log -p
changeset: 1:a9deef06258b
tag: tip
user: Lasse Kliemann <lki@informatik.uni-kiel.de>
date: Sun Oct 03 18:32:34 2010 +0200
summary: na
[...]
-alpha
+gamma
 beta

changeset: 0:19092ca73642
user: Lasse Kliemann <lki@informatik.uni-kiel.de>
date: Sun Oct 03 18:23:26 2010 +0200
summary: na
[...]
+alpha
+beta

Switching Back

$ cat file-1
gamma
beta
$ hg checkout 0
1 files updated, [...]
$ cat file-1
alpha
beta

You can bring back the working copy to any previously recorded state! Just use the checkout command.
checkout is the same as update. Do not confuse this with the meaning of update in other systems, like Subversion.

$ hg checkout 1
1 files updated, [...]
$ cat file-1
gamma
beta

Non-Linear Development

You can make changes to an older revision and then commit.

$ hg checkout 0
1 files updated, [...]
$ vim file-1
$ hg diff
[...]
 alpha
 beta
+delta
$ hg commit
created new head

$ hg heads shows all heads:

$ hg heads
changeset: 2:c3f86ff744d9
changeset: 1:a9deef06258b

Non-Linear Development (2)

We switch back to the other head, make changes, and commit.

$ hg checkout 1
1 files updated, [...]
$ vim file-1
$ cat file-1
epsilon
gamma
beta
$ hg diff
[...]
+epsilon
 gamma
 beta
$ hg commit

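The Graph of Revisions (1)

The situation can be displayed as a directed graph:

[Figure: revision graph labelled with file contents. The node "alpha beta" has one edge to "gamma beta" and one edge to "alpha beta delta"; the node "gamma beta" has an edge to "epsilon gamma beta".]
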
The Graph of Revisions (2)

Or shortly, using revision numbers:

0 -> 1 -> 3
0 -> 2

Heads are those nodes which have no outgoing edge, here 2 and 3.
$ hg glog can in fact display such graphs on the terminal! They grow into the sky then, not from left to right.

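The definition of a head can be checked mechanically. The following minimal sketch encodes the graph from the slides as a list of (parent, child) edges and collects the nodes without an outgoing edge. The function name heads and the edge-list representation are choices made for this illustration, not Mercurial's internal data structures.

```python
# Edges of the revision graph, written as (parent, child) pairs:
# 0 -> 1 -> 3 and 0 -> 2.
edges = [(0, 1), (1, 3), (0, 2)]


def heads(edges):
    """Return the sorted list of nodes that have no outgoing edge."""
    nodes = {node for edge in edges for node in edge}
    with_children = {parent for parent, _child in edges}
    return sorted(nodes - with_children)


print(heads(edges))  # [2, 3]
```

Adding a new commit on top of a head adds one edge, so that head is replaced by the new node; committing on top of a non-head creates an additional head.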
Merging

You can merge changes made in one line of development into another:

$ hg checkout 3
$ hg merge 2
0 files updated, 1 files merged, 0 files removed, [...]
$ hg diff
[...]
 epsilon
 gamma
 beta
+delta
$ hg commit

0 -> 1 -> 3 -> 4
0 -> 2 -> 4

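The New Graph of Revisions

[Figure: revision graph after the merge, labelled with file contents. "alpha beta" has edges to "gamma beta" and to "alpha beta delta"; "gamma beta" has an edge to "epsilon gamma beta"; both "epsilon gamma beta" and "alpha beta delta" have an edge to the merge result "epsilon gamma beta delta".]
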
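A merge combines two lines of development relative to a revision they share. As a rough sketch, the code below finds such a common ancestor for revisions 3 and 2 in the graph from the slides. The parents table and the helper names ancestors and common_ancestor are made up for this example, and picking the common ancestor with the highest revision number is a simplification that relies on parents always having smaller numbers than their children within one repository; Mercurial's own ancestor selection is more involved.

```python
# Parents of each revision in the graph after the merge.
parents = {0: [], 1: [0], 2: [0], 3: [1], 4: [3, 2]}


def ancestors(parents, rev):
    """Return rev together with everything reachable via parent links."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def common_ancestor(parents, a, b):
    """Return the common ancestor of a and b with the highest number."""
    return max(ancestors(parents, a) & ancestors(parents, b))


print(common_ancestor(parents, 3, 2))  # 0
```

Relative to revision 0 (alpha, beta), one side changed alpha to gamma and added epsilon, while the other side added delta; the merge result contains all of these changes.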
Unknown Files: Ignoring and Purging

It is common and okay to have files in the working copy area that are not known to Mercurial; we also say: not tracked.
The problem is that they clutter up the output of hg status. Use the -q switch to suppress them.
The long-term approach is to make appropriate entries in .hgignore; see man hg for detailed information.
For example, to ignore all files ending in .pdf, put this in .hgignore:

glob:*.pdf

The Purge extension provides a command to remove all files from the working copy area that are not tracked and not ignored:

$ hg purge

Use $ hg purge --all to even remove ignored files.

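To get a feeling for what a glob pattern such as *.pdf selects, here is a small sketch using Python's fnmatch module. This is only an approximation: Mercurial has its own pattern matcher, and its glob syntax differs from fnmatch in some details. The file names notes.txt and slides.pdf are invented for the example.

```python
from fnmatch import fnmatchcase

# Patterns as they might appear in .hgignore (without the "glob:" prefix).
ignore_globs = ["*.pdf"]

# Hypothetical untracked files in the working copy area.
untracked = ["notes.txt", "slides.pdf", "file-2"]

# Keep only the names that match none of the ignore patterns.
visible = [
    name
    for name in untracked
    if not any(fnmatchcase(name, pattern) for pattern in ignore_globs)
]

print(visible)  # ['notes.txt', 'file-2']
```

Only the files that remain visible would still be listed as unknown, and only those would be removed by a plain purge.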
Rename and Delete

Files under control of Mercurial should not be treated with mv or rm. Instead use Mercurial commands.

$ hg mv path-1 path-2
$ hg rm path

$ hg cp path-1 path-2 is also useful.

Extensions

Mercurial comes with a bundle of so-called extensions.
More extensions are available separately, see: http://mercurial.selenic.com/wiki/usingextensions
Extensions have to be activated before they can be used. To use $ hg glog or $ hg purge, the Graphlog or Purge extension, respectively, must be activated first. Put this into your ~/.hgrc:

[extensions]
hgext.graphlog =
hgext.purge =

Checking the Wishlist

- We have a history with many features.
- We made a tiny step towards a backup infrastructure: data is kept in two locations, in the repository and in the working copy. But this leaves much to be done.
- No network, yet.
- A single-person setup only, so far.

Cloning and Pushing

Cloning means replicating a repository, including all history. We create a clone of our proj-1 into proj-1a:

$ cd
$ hg clone proj-1 proj-1a
updating to branch default
1 files updated, 0 files merged, 0 files removed, [...]

We can work with proj-1a just as before; it provides a fully-fledged repository and working copy area.
If we say $ hg push, then everything newly committed in proj-1a will be integrated into proj-1.
Mercurial knows where to push to by looking at an entry in .hg/hgrc in the section called [paths]. The push command can also be given a destination explicitly on the command line.

Wow, It's Distributed!

It is important to realize that the cloned repository contains no more or less information than the original one.
New changesets can flow in both directions, using push or pull.
Default push and pull locations can be set in .hg/hgrc.
Be careful how you refer to a particular changeset: revision numbers are only meaningful within one repository; when more than one repository is involved, use the hash, aka changeset ID.
We are one step closer to a good backup infrastructure. We can have as many repositories as we like, in different locations of the file system. E.g., one could reside on a USB memory stick.
And what is best: cloning, pushing, and pulling work just as well over the network.

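Why numbers are local while hashes are global can be sketched in a few lines. In the simplified model below, a changeset ID is a SHA-1 hash over the parent IDs and the content, whereas the number is just the position at which a repository happened to receive the changeset. The function changeset_id and the lists repo_x and repo_y are inventions for this illustration; real Mercurial IDs are computed from more data (for example user, date, and commit message), so the hashes here will not match real ones.

```python
import hashlib


def changeset_id(parent_ids, content):
    """Toy changeset ID: SHA-1 over the sorted parent IDs and the content."""
    digest = hashlib.sha1()
    for parent_id in sorted(parent_ids):
        digest.update(parent_id.encode())
    digest.update(content.encode())
    return digest.hexdigest()[:12]  # short form, 12 hex digits


root = changeset_id([], "alpha beta")
a = changeset_id([root], "gamma beta")
b = changeset_id([root], "alpha beta delta")

# Each repository numbers changesets in the order it received them.
repo_x = [root, a, b]  # committed a first, then pulled b
repo_y = [root, b, a]  # committed b first, then pulled a

print(repo_x.index(b), repo_y.index(b))  # 2 1
```

Both repositories hold exactly the same changesets, but the changeset b is number 2 in one and number 1 in the other; only the ID identifies it in both.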
Network: SSH

The network allows us to work in different places with multiple contributors on the same project.
Mercurial features a built-in server, but we won't use it. We use SSH to wrap up communication.
SSH features public key authentication:
- You create a public/private keypair on an account A.
- The public key is registered on an account B.
- Now you can access account B from account A.
Even more, it can be configured on B's side so that only certain actions are allowed from A. Such a restriction could be that only cloning, pushing, and pulling of specific repositories is allowed.

On Account A: Creating Your Keypair

Create a public/private keypair on account A:

$ ssh-keygen -t rsa -b 4096

Go with the default for all questions (but configure a passphrase if you prefer).
Send the public key to the administrator of account B. The public key is in your home directory in ~/.ssh/id_rsa.pub. You can send it by e-mail:

$ mail admin@example.com < ~/.ssh/id_rsa.pub

Or use a USB memory stick, put it on a website, etc.
Do not give your secret key to others! The secret key is in your home directory in ~/.ssh/id_rsa.

On Account B: Give Permission to this Key

On account B, the public key created on account A is inserted into ~/.ssh/authorized_keys.
In the simplest case, the key is just copied there. Multiple keys are separated by newlines:

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAABAEApiB[...]
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAABAEA2gm[...]

This means full access. Often this is not what you want. Use the command directive to restrict access (all on one line, without spaces between the options):

command="command",no-port-forwarding,no-agent-forwarding,no-x11-forwarding,no-pty ssh-rsa AAAAB3NzaC1yc[...]

Any client using this key (e.g., from account A) is now restricted to executing the specified command. (Provided this is the only entry for this key.)

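Since a restricted entry has to be one long line with the options joined by commas, it is easy to get wrong when typed by hand. The following sketch assembles such a line. The function name restricted_entry is hypothetical, and the key string is the truncated placeholder from the slide, not a usable key.

```python
# Options that switch off everything except running the forced command.
RESTRICTIONS = [
    "no-port-forwarding",
    "no-agent-forwarding",
    "no-x11-forwarding",
    "no-pty",
]


def restricted_entry(command, public_key):
    """Build one authorized_keys line that forces the given command."""
    # Double quotes inside the command must be escaped with a backslash.
    escaped = command.replace('"', '\\"')
    options = ['command="{}"'.format(escaped)] + RESTRICTIONS
    # Options are comma-separated without spaces; one space precedes the key.
    return ",".join(options) + " " + public_key.strip()


entry = restricted_entry("hg-ssh path-1", "ssh-rsa AAAAB3NzaC1yc[...]")
print(entry)
```

The resulting line would be appended to ~/.ssh/authorized_keys on account B, one line per key.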
Which Command to Use to Allow Mercurial Access?

hg-ssh is provided by Mercurial, used like this:

command="hg-ssh path-1... path-n",no-port-forw[...]

This gives access to the repositories located in the listed paths.

I wrote something more flexible, called vcs-route:

command="vcs-route short-id long-id mode",no[...]

- The client (from account A) is confined to ~/hg/user/short-id on account B. The real repositories can be symlinked from there.
- Changesets are required to have long-id as username.
- mode can be rw for read-write access, or ro to only allow read access.

Example:

command="vcs-route lki Lasse Kliemann <lki@informatik.uni-kiel.de> rw",no[...]

URLs

With SSH, the URLs for cloning, pushing, and pulling look like this:

ssh://user@host/repos

- user corresponds to account B.
- host corresponds to the host (the machine) on which account B lives.
- repos is the name of the symbolic link in the directory ~/hg/user/short-id on account B.

Example

On account B, in ~/.ssh/authorized_keys:

command="vcs-route lki Lasse Kliemann <lki@informatik.uni-kiel.de> rw",no[...]

On account A, the appropriate username should be set in ~/.hgrc:

[ui]
username = Lasse Kliemann <lki@informatik.uni-kiel.de>

On account B, a repository is in, say, ~/hg/repos/proj-1.
On account B, there is a symbolic link:

~/hg/user/lki/my-proj -> ../../repos/proj-1

Account B is called, say, discopt and lives on host gauss.informatik.uni-kiel.de.
On account A, the clone operation then works like this:

$ hg clone ssh://discopt@gauss.informatik.uni-kiel.de/my-proj

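The three parts of such a URL can be picked apart with Python's standard URL parser, which may help when checking a URL before using it. This is only a sketch of the URL structure; Mercurial does its own parsing.

```python
from urllib.parse import urlsplit

# The clone URL from the example.
url = "ssh://discopt@gauss.informatik.uni-kiel.de/my-proj"

parts = urlsplit(url)
user = parts.username          # account B
host = parts.hostname          # machine on which account B lives
repos = parts.path.lstrip("/") # name of the symbolic link

print(user, host, repos)
```

Here user is discopt, host is gauss.informatik.uni-kiel.de, and repos is my-proj.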
Checking the Wishlist

- History
- Backup: you can have repositories in multiple locations, protected by the SSH command mechanism. This already makes sense on the same machine (with multiple accounts).
- Accessing files over the network
- Several persons
- Conflict management: merge partly covered so far, more to come in the exercises.

Activate Extensions and Hooks

In order for vcs-route to work as described, two extensions and hooks have to be activated in ~/.hgrc on account B:

[extensions]
requser = path/requser.py
readonly = path/readonly.py
[hooks]
pretxnchangegroup.requser = python:requser.hook
pretxnchangegroup.readonly = python:readonly.hook

On our Suns, use /home/discopt/software/sp/package/host/plastictree.net/vcs/hgext for path. I provide an installation of my software there.
Alternatively, you can put this one line in your ~/.hgrc:

%include /home/discopt/.hgrc-vcs-route

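For readers who have not seen an in-process Mercurial hook before, the sketch below shows the general shape of a pretxnchangegroup hook that checks the committer name. It is not the requser.py shipped with vcs-route: the hard-coded REQUIRED_USER is an assumption made for the example (how the real extension learns the required name is not shown in these slides), and the code assumes a Mercurial running on Python 3, where user names and messages are byte strings. A hook of this kind receives the ID of the first incoming changeset as node, and a true return value makes Mercurial reject the whole group.

```python
# Assumption for this sketch: the required committer is fixed in the code.
REQUIRED_USER = b"Lasse Kliemann <lki@informatik.uni-kiel.de>"


def hook(ui, repo, hooktype, node=None, **kwargs):
    """Reject a pushed group if any changeset has an unexpected user."""
    first = repo[node].rev()  # number of the first incoming changeset
    # The incoming changesets are those from `first` up to the tip.
    for rev in range(first, len(repo)):
        user = repo[rev].user()
        if user != REQUIRED_USER:
            ui.warn(b"rejected: changeset %d has user %s\n" % (rev, user))
            return True  # a true value aborts the transaction
    return False  # everything is fine, let the push go through
```

A read-only hook can be even simpler: it would print a message and return True unconditionally.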
More Path Adjustments

In fact, you also have to extend PATH in order for the vcs-route program (and probably also parts of Mercurial) to be found:

command="PATH=/home/discopt/command:\"$PATH\" vcs-route [...]

More documentation is in /home/discopt/software/sp/package/host/plastictree.net/vcs/localdoc/index.html.
You can download the package at http://unix.plastictree.net/software/latest/vcs.tar.xz for your own installations. It requires http://code.dogmap.org/prjlibs/.

Traps and Pitfalls

If you (intend to) work closely together:

NO: infrequently pulling (and merging) or pushing

Do pulls and merges often. Always do it before you begin your work.
Commit and push whenever some part of the work is done.
A pull and merge means that you get a chance to adapt your work to the changes made by others.
A commit and push means that you give others a chance to do the same regarding their work and your changes.
However, it may be desirable:
- not to push faulty code
- that a revision constitutes a consistent state

Traps and Pitfalls

NO: long lines

Start a new line often. Use semantic line breaks. It's a good idea to start a new line roughly after every clause.
- It makes diffs much more useful.
- It gives better chances for a merge to succeed.
- It also helps editing a lot if you use good line-oriented editors, e.g., Vim.
Do not confuse a line break with the wrap function of your editor!

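The effect of line breaks on diffs can be tried out with Python's difflib. In the sketch below, the same one-word change is made to a sentence stored once with semantic line breaks and once as a single long line; the helper changed_lines and the sample sentence are made up for this illustration.

```python
import difflib


def changed_lines(old, new):
    """Return the removed and added lines of a line-based diff."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# The same sentence, once with semantic line breaks and once as one line.
broken_old = (
    "Start a new line often,\n"
    "roughly after every clause,\n"
    "because it makes diffs more useful."
)
one_line_old = broken_old.replace("\n", " ")

# The same small edit applied to both variants.
broken_new = broken_old.replace("useful", "readable")
one_line_new = one_line_old.replace("useful", "readable")

for line in changed_lines(broken_old, broken_new):
    print(line)
print()
for line in changed_lines(one_line_old, one_line_new):
    print(line)
```

With line breaks, the diff shows only the clause that changed; with one long line, the whole sentence shows up as removed and re-added, and the actual change has to be searched for. The same holds for merges: two edits to different clauses touch different lines only if the clauses are on different lines.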
Traps and Pitfalls

Careful with confidential data!

Once a changeset is out in the wild, it is very difficult to have it obliterated. Depending on circumstances, it may be practically impossible.
So: think before you push.
Some local operations can be taken back using rollback.

Traps and Pitfalls

NO: forget to hg add a new file

It is OK to have files in the working copy area which are unknown to Mercurial.
However, if a new file shall be committed (and pushed), it must be added:

$ hg add file

Beginners often forget this step. Later they wonder why their co-workers cannot see the file.
A missing file might temporarily break things.

Traps and Pitfalls

NO: file open in editor during checkout or merge

- A copy of the file is in the editor.
- A checkout or merge is performed. This alters the file in the working copy on the disk. However, the copy in the editor stays the same.
- The file is saved from the editor to disk.
- The file on disk is now the former version again; the changes due to the checkout or merge have been overwritten.
- For Mercurial, it looks like normal changes to the file were made.
- Later, co-workers wonder where their changes have gone.

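The sequence of events above can be replayed in a few lines without any editor or Mercurial involved. In this sketch, the "editor" is simply a variable holding the file contents and the "merge" is a plain write to the file; the contents are borrowed from the earlier slides.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "file-1")
    with open(path, "w") as f:
        f.write("alpha\nbeta\n")

    # 1. The editor loads a copy of the file into its buffer.
    with open(path) as f:
        editor_buffer = f.read()

    # 2. A checkout or merge rewrites the file on disk (here: adds delta).
    with open(path, "w") as f:
        f.write("alpha\nbeta\ndelta\n")

    # 3. The editor saves its stale buffer back to disk.
    with open(path, "w") as f:
        f.write(editor_buffer)

    with open(path) as f:
        final = f.read()

print(repr(final))  # 'alpha\nbeta\n'
```

The line delta is gone, and to a version control system this is indistinguishable from someone deliberately deleting it. Closing files before a checkout or merge, or reloading them in the editor afterwards, avoids the problem.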
Where to Get Help?

$ hg help topic
$ man hg

- Navigate from http://mercurial.selenic.com
- Experts on mailing lists are extremely friendly, cooperative, and eager to help. When I brought up the question of committer authentication once, an OpenPGP plugin was written within 48 hours. With the help of the experts, I then designed and wrote vcs-route and the associated extensions.
- Comprehensive treatment in http://hgbook.red-bean.com/read/ by Bryan O'Sullivan
- Presentation by him: http://www.youtube.com/watch?v=1sv8z_lmpt4
- Linus Torvalds explains distributed version control using the example of Git: http://www.youtube.com/watch?v=4xpnkhjaok8