Two Best Practices for Scientific Computing




Two Best Practices for Scientific Computing
Version Control Systems & Automated Code Testing
David Love
Software Interest Group, University of Arizona
February 18, 2013

How This Talk Happened
- Applied alumnus, Carlos Chiquete, posted this paper on Facebook [a]
- All were great, but I'd never encountered many
- Paper: Best Practices for Scientific Computing (arXiv)
[a] Justifying every second ever wasted on Facebook
David Love VCS & Testing February 18, 2013 1 / 40

Software Carpentry
- Lead author Greg Wilson founded a group called Software Carpentry
- They have many videos documenting best practices for scientific computing
- A 2-day boot camp ($20) will be held April 4-5, 2013, teaching many of these techniques

1. Version Control System: Git
   - Basics
   - Branching
   - Remote Repositories
2. Unit (and other) Testing
   - Assertions
   - Unit Testing with xUnit


What is a Version Control System?
Version Control Systems are pieces of software designed to:
- Maintain a complete history of the state of a project
  - Works especially well with program code and LaTeX files: anything you can read in a text editor
  - Other file types aren't stored as efficiently
- Allow for different versions (branches) to exist concurrently and independently
  - Provides tools to integrate changes from different branches together
- Allow for much simpler collaboration with others

How It Works
- Version Control Systems maintain a database of document versions, called a repository
- Users check out files from the repository, change them, then commit those changes to the repository
- The VCS checks whether two editors (or branches) have edited the same lines, notes the conflict, and makes you resolve it
  - Greatly reduces the chance that editors will overwrite each other accidentally
  - Changes will not get lost
  - Repository determines the latest version

Best Use of Version Control
Best Practices for Scientific Computing:
"In practice, everything that has been created manually should be put in version control, including programs, original field observations, and the source files for papers. Automated output and intermediate files can be regenerated at need. Binary files (e.g., images and audio clips) may be stored in version control, but it is often more sensible to use an archiving system for them, and store the metadata describing their contents in version control instead."


Types of VCSs
There are two basic types of VCSs:
- Centralized: Maintains the repository on a centralized server. Clients only check out specific versions of files.
  - CVS (Concurrent Versions System)
  - SVN (Subversion) (Software Carpentry teaches SVN)
- Decentralized: Keeps a copy of the entire repository on every system. Any client could (potentially) act as a server.
  - Git (I will demonstrate Git)
  - Mercurial

Why Git?
- A very popular distributed VCS
- Does not require setting up a separate location to store the database
  - This makes being a single user easier
- Supported on most popular code hosting services
  - Google Code, SourceForge, GitHub [1, 2], Bitbucket
- git svn uses Git locally but works with a Subversion server
- Free & Open Source
- Why Git is Better than X
[1] Free student account
[2] github:windows, github:mac, github:mobile

Git Resources
1. Pro Git (used for this talk)
2. Version Control By Example
3. Top 10 Git Tutorials for Beginners
4. O'Reilly Webcast: Git in One Hour
5. Git + LaTeX Workflow
   - The highest rated answer to this Stack Overflow question is very good.

Basics

Basic Configuration
- Git stores your name and email and attaches them to your contributions
    git config --global user.name "David Love"
    git config --global user.email dlove@math.arizona.edu
- Name your favorite editor
    git config --global core.editor vim
- Select a diff & merge tool
    git config --global merge.tool meld
The --global flag stores the information in your home directory, where it applies to all of your Git repositories. Without it, the configuration is stored in the local Git repository.
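The difference between repository-level and --global settings can be tried safely in a throwaway repository. The following is a minimal sketch, not part of the original talk: the temporary directory and the name "Example User" are placeholders chosen for illustration, and nothing in your home directory is modified.

```shell
# Throwaway repository so the real ~/.gitconfig is never touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, settings are written to this repository's .git/config.
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config core.editor vim

# Read the repository-level values back.
local_name=$(git config --local --get user.name)
git config --local --list | grep -E '^(user\.|core\.editor)'
echo "local user.name: $local_name"
```

Running the same commands with --global would write to the per-user configuration file instead, so the values would apply to every repository you work in.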

Merge Tools
- Open Source: Diffuse, Emerge (emacs), gvimdiff (gvim), KDiff3, Meld, tkdiff, TortoiseMerge, xxdiff
- Free Commercial Software: opendiff (OS X developer tool), P4Merge
- Pay Software: Araxis Merge, Beyond Compare, ECMerge
GitHub for Windows & Mac provide their own merge tool

Creating a Git Repository
To create a new repository:
1. Move to the directory with your files
2. git init
To clone an existing repository:
- Use the git command clone
- Format: git clone <url> [<directory>]
- URLs can use the protocols git, http(s), ssh:
    git clone git://github.com/schacon/grit.git
    git clone http://github.com/schacon/grit.git
    git clone ssh://dlove@gila.math.arizona.edu:31415/$home/test.git

The File Status Lifecycle in Git
[Figure: Pro Git Image 2-1]
Command git status lists:
- Untracked files
- Modified but unstaged files
- Staged but uncommitted files
Moving within the lifecycle:
- Stage files with git add <file>
- Commit with git commit
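The lifecycle can be watched one step at a time with git status --short, which prints a two-character state code per file. This is an illustrative sketch: the file name model.txt, its contents, and the identity settings are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "x = 1" > model.txt
untracked=$(git status --short)            # new file, not yet known to Git

git add model.txt
staged=$(git status --short)               # staged, waiting for a commit

git commit -q -m "Add model parameters"
clean=$(git status --short)                # nothing to report after the commit

echo "x = 2" >> model.txt
modified=$(git status --short)             # tracked file changed, not staged

printf 'untracked: [%s]\nstaged:    [%s]\nclean:     [%s]\nmodified:  [%s]\n' \
  "$untracked" "$staged" "$clean" "$modified"
```

In the short format, ?? marks an untracked file, a letter in the first column marks a staged change, and a letter in the second column marks an unstaged change.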

Committing Changes to the Repository
- When you commit changes to the repository, Git asks for a commit message
- Git opens your favorite editor, and gives a (commented out) default message
- Now, type a short message describing what you changed during this commit
Structuring Commits
- Best practice: structure your editing so each commit is a logically separate idea

Commit Information
Once committed, Git gives a message like:
    [master b05ca11] Commit message
     1 file changed, 3 insertions(+), 2 deletions(-)
- master: Branch name
- b05ca11: SHA-1 hash key (abbreviated)
- Commit message: Your commit message
- 1 file changed: Number of files changed
- 3 insertions(+): Number of lines inserted
- 2 deletions(-): Number of lines deleted
Git stores commits by a 40-digit (hexadecimal) SHA-1 hash key
Git tracks lines of code. Editing a line = 1 insertion & 1 deletion

Committing all changes
git commit without add
- git commit -a allows for skipping git add by committing all modified files that Git already tracks. New (untracked) files still need git add.

Viewing Changes
- git diff: Prints the differences between a modified file and its staged version (or the most recent committed version, if nothing is staged)
- git difftool: Uses the merge tool to highlight the differences
- --cached: Modifies either command to show differences between the staged file and the most recent committed version

Viewing the Commit History
Commit Log: git log shows the commit history in reverse chronological order.
Default information:
- Commit hash
- Author
- Date & time committed
- Commit message
Options:
- -<number>: Latest <number> entries, e.g., git log -4
- --pretty=oneline: Abbreviates to one line of output
- --since=: Look at commits since some time, e.g., yesterday, 1.week, "2 months", 2013/02/01, 02/01/2013
- --until=: Look at commits until some time

Undoing Changes
- Changing Your Last Commit: You can replace your previous commit with a new commit using git commit --amend.
- Unstaging a Staged File: A file can be unstaged with git reset HEAD <file>
- Unmodifying a Modified File: You can discard modifications to a file with git checkout -- <file>
git status lists the latter two commands when appropriate.
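The three undo operations above can be exercised in sequence in a scratch repository. This is a sketch under stated assumptions: notes.txt, the commit messages, and the identity are hypothetical, and the commands are exactly the ones named on the slide.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# 1. Replace the last commit (here only its message changes).
git commit -q --amend -m "Add project notes"
commit_count=$(git rev-list --count HEAD)  # still a single commit
subject=$(git log -1 --pretty=%s)

# 2. Stage an edit, then unstage it; the edit stays in the working directory.
echo "second" >> notes.txt
git add notes.txt
git reset -q HEAD notes.txt
after_reset=$(git status --short)

# 3. Throw the edit away and restore the committed version.
git checkout -- notes.txt
after_checkout=$(git status --short)
content=$(cat notes.txt)

echo "commits: $commit_count, subject: $subject"
echo "after reset: [$after_reset], after checkout: [$after_checkout]"
```

Note that git checkout -- <file> permanently discards the uncommitted edit, so it is worth running git diff first.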

Branching

What is a Branch?
- In Git and other VCSs, a branch is an independent copy of the working directory
- Changes in one branch will not affect any other branch
- Different branches can be checked out of the repository
- Branches can be merged to combine their contents
- Branches are simpler in Git than in most other VCSs

Basic Branch Commands in Git
The basic branch operations:
- List branches: git branch
- Create branch: git branch <branch name>
- Check out branch: git checkout <branch name>
- Merge into current branch: git merge <branch name>
To see how branches work, we'll look at how Git stores data.

Data Storage in Git
- [Figure: Pro Git Image 1-5] Git stores data as a series of snapshots. Only when files A, B, or C change does Git store a new snapshot.
- [Figure: Pro Git Image 3-2] The data Git stores about a commit includes the hash, the author, the commit message, etc. Horizontal arrows are pointers to the previous commit.

Branching in Git, Conceptually
- [Figure: Pro Git Image 3-3] An abbreviated commit history marked by SHA-1 hashes
- [Figure: Pro Git Image 3-4] After git branch testing
- [Figure: Pro Git Image 3-5] The HEAD pointer keeps track of the current branch
- [Figure: Pro Git Image 3-6] After git checkout testing
- [Figure: Pro Git Image 3-7] Made some changes, then committed to the current branch (testing)
- [Figure: Pro Git Image 3-8] After git checkout master
- [Figure: Pro Git Image 3-9] Made further changes, then committed them to master
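Since the figures are not reproduced here, the same sequence can be followed on the command line: a branch is a pointer to a commit, and HEAD records which branch is current. The sketch below is illustrative only; data.txt and the commit messages are invented, and git symbolic-ref is used to name the first branch master so the example matches the slides regardless of local defaults.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master, as in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > data.txt
git add data.txt
git commit -q -m "Initial commit"

# A new branch is just a second pointer to the same commit.
git branch testing
master_tip=$(git rev-parse master)
testing_tip=$(git rev-parse testing)

# HEAD follows whichever branch is checked out.
head_before=$(git symbolic-ref --short HEAD)
git checkout -q testing
head_after=$(git symbolic-ref --short HEAD)

# Committing moves only the current branch (testing) forward.
echo "v2" >> data.txt
git commit -q -a -m "Change on testing"
ahead=$(git rev-list --count master..testing)

echo "HEAD: $head_before -> $head_after; testing is $ahead commit(s) ahead of master"
```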

Branch Merging, Conceptually
- [Figure: Pro Git Image 3-10, a small commit history] You want to fix issue #53. Next: create a branch for that purpose.
- [Figure: Pro Git Image 3-11, git branch iss53] Create the branch with git branch iss53 and switch to it with git checkout iss53. Next: make a change and commit it.
- [Figure: Pro Git Image 3-12, committed change on iss53] You stumble upon a bug that needs to be fixed immediately. Go back to master so your partial work on iss53 doesn't get integrated too early. Commands to execute:
    git checkout master
    git checkout -b hotfix    (creates and immediately checks out branch hotfix)
  Then make a commit to fix the bug.
- [Figure: Pro Git Image 3-13, git branch hotfix to fix a bug] After testing your work, you want to add the bug fix to master. Next: merge hotfix into master.
- [Figure: Pro Git Image 3-14, merging changes into master] To merge hotfix into master:
    git checkout master
    git merge hotfix
  Git responds with a message that includes "Fast forward", meaning Git simply moved the master label up the history of commits. Next: delete branch hotfix, since it is no longer needed, then go back to working on iss53.
- [Figure: Pro Git Image 3-15, delete branch hotfix] Delete branch hotfix with git branch -d hotfix, then make another commit to iss53.
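The hotfix part of this walkthrough can be replayed in a scratch repository. This is a sketch, not the speaker's demo: the file names and commit messages are hypothetical, while the branch names follow the slides. The check at the end relies on the fact that a fast-forward creates no new commit, so master ends up pointing at the very commit hotfix pointed to.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master, as in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "line 1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Start work on the issue in its own branch.
git checkout -q -b iss53
echo "work in progress" > feature.txt
git add feature.txt
git commit -q -m "Start work on issue 53"

# An urgent bug appears: branch off master, not off the unfinished work.
git checkout -q master
git checkout -q -b hotfix
echo "line 2" >> app.txt
git commit -q -a -m "Fix urgent bug"
hotfix_tip=$(git rev-parse hotfix)

# Merging hotfix into master only moves the master pointer forward.
git checkout -q master
git merge -q hotfix
master_tip=$(git rev-parse master)

# hotfix is fully merged, so it can be deleted; then return to the issue.
git branch -q -d hotfix
git checkout -q iss53
echo "master now points at the hotfix commit: $master_tip"
```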

Branch Merging, Conceptually: The Three-Way Merge
- [Figure: Pro Git Image 3-15, want to merge again] You want to merge iss53 into master, but master can't just move up the commit history. Git will do a three-way merge.
- [Figure: Pro Git Image 3-16, the three-way merge] Run git merge iss53. Git analyzes the changes applied to the common ancestor by master and iss53. If master and iss53 made changes to the same lines, Git notes a conflict that must be resolved manually.
- Git surrounds conflicts with standard conflict resolution markers:
  - Code between <<<<<<< and ======= is the code from HEAD (master)
  - Code between ======= and >>>>>>> is the code from the merging branch (iss53)
- Run git mergetool to use your merge tool to resolve the conflict. Git creates some files to help you merge the conflicts successfully:
  - file.local from the current branch (master)
  - file.base from the common ancestor
  - file.remote from the merging branch (iss53)
- [Figure: Pro Git Image 3-17, merge commit at end of three-way merge] Git creates a merge commit once the conflicts are resolved (or if there are no conflicts). Note: after resolving a conflict, you must then run git commit to generate the merge commit.
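A conflict is easy to provoke deliberately, which helps in recognizing the markers before meeting them in real work. In this sketch both branches edit the same line of a hypothetical params.txt; the conflict is then resolved by hand (overwriting the file) rather than with git mergetool, so the example can run without a graphical tool.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master, as in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'tolerance = 0.1\n' > params.txt
git add params.txt
git commit -q -m "Initial parameters"

# Both branches change the same line in different ways.
git checkout -q -b iss53
printf 'tolerance = 0.01\n' > params.txt
git commit -q -a -m "Tighten tolerance for issue 53"

git checkout -q master
printf 'tolerance = 0.25\n' > params.txt
git commit -q -a -m "Loosen tolerance on master"

# The merge stops and reports a conflict (non-zero exit status).
if git merge iss53 >/dev/null 2>&1; then
  merge_status=clean
else
  merge_status=conflict
fi
markers=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' params.txt)
cat params.txt                             # shows both versions between the markers

# Resolve by hand, stage the result, and commit to create the merge commit.
printf 'tolerance = 0.05\n' > params.txt
git add params.txt
git commit -q -m "Merge branch 'iss53'"
parent_count=$(git log -1 --pretty=%P | wc -w | tr -d ' ')
echo "merge status: $merge_status, markers: $markers, parents: $parent_count"
```

The merge commit has two parents, one per branch, which is what distinguishes it from the fast-forward case.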


Comparing Branches
- git difftool branch shows differences between the current branch and branch using the merge tool
Double Dot Notation
- For branches A and B, A..B selects all commits in the history of B since splitting from A
- git log A..B gives all commit messages in B since splitting from A
Triple Dot Notation
- A...B selects commits on both branches since splitting
- git log A...B gives all commit messages in either A or B since the common ancestor
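The two range notations are easiest to compare by counting commits. In the sketch below, branches A and B (named as on the slide) split from a common base; A gets one commit and B gets two. File names and messages are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master, as in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > f.txt
git add f.txt
git commit -q -m "Base"

git checkout -q -b A
echo "a1" > a.txt
git add a.txt
git commit -q -m "A1"

git checkout -q master
git checkout -q -b B
echo "b1" > b.txt
git add b.txt
git commit -q -m "B1"
echo "b2" >> b.txt
git commit -q -a -m "B2"

# Two dots: commits in B's history that are not in A's.
only_in_b=$(git rev-list --count A..B)
# Three dots: commits in either branch since the common ancestor.
in_either=$(git rev-list --count A...B)

git log --pretty=%s A..B                   # subjects of the commits only on B
echo "A..B: $only_in_b commit(s); A...B: $in_either commit(s)"
```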

Remote Repositories

Remote Repositories
- Git can connect to remote repositories over networks to collaborate with others
- origin repository: When you clone from a remote source, the remote repository is automatically added to your local repository and named origin
Commands:
- git remote: List remote repositories
- git remote -v: List remote repositories with more information
- git remote add: Add a new remote repository
- git remote rename: Rename a remote repository
- git remote remove: Remove a remote repository

Remote Branches
- Remote repositories have their own branches that you can examine and merge with
- Remote Branch Names: Remote branches have names <repository>/<branch>, e.g., origin/master
- git branch -r: Show remote branches
- git branch -a: Show all branches (local and remote)

Getting Updates from a Remote Repository
Two options to get data from a remote repository:
- git fetch origin: Updates your copies of the remote branches (e.g., origin/master) from origin. Does not change any local branches.
- git pull origin: Updates the remote branches from origin, then tries to merge these changes into your local branch.
  - You will have to resolve any conflicts
  - After resolving the conflict, run git commit to generate the merge commit

Adding Updates to a Remote Repository
One command updates a remote branch with your local copy:
- git push origin master: Update the master branch on origin with your local copy of master
- If no one has made changes to origin since your last pull, the push will go through.
- If someone else has pushed to origin, Git will prevent you from pushing your changes. You must first merge the changes in the local repository before pushing the new code:
  1. Use git pull to merge the changes into your copy
     a. git mergetool to resolve any conflicts
     b. git commit to generate the merge commit
  2. git push to update the remote repository

Workflow with Remote Git
1. Pull changes at the start of your work session
   a. Read the logs of changes made
2. Create local branches to make your changes
3. Once they are correct, merge your local changes back together
4. Push the changes back to the server
   a. If rejected, pull to merge changes
   b. Resolve conflicts and commit, if necessary
   c. Push changes back to the server
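The rejected-push case in step 4 can be simulated on one machine by letting a local bare repository stand in for the server. Everything named here (origin.git, the alice and bob working copies, the file names) is hypothetical; on a real project origin would be a URL such as the ones shown earlier. The --no-rebase and --no-edit options only make the pull merge without prompting.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the shared server.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# First collaborator creates the project and publishes it.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "alpha" > a.txt
git add a.txt
git commit -q -m "Add a.txt"
git remote add origin "$work/origin.git"
git push -q origin master

# Second collaborator clones, commits, and pushes.
cd "$work"
git clone -q origin.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "beta" > b.txt
git add b.txt
git commit -q -m "Add b.txt"
git push -q origin master

# Alice commits without pulling first, so her push is rejected.
cd "$work/alice"
echo "gamma" > c.txt
git add c.txt
git commit -q -m "Add c.txt"
if git push -q origin master 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi

# Pull to merge Bob's work (no conflict here), then push again.
git pull -q --no-rebase --no-edit origin master
git push -q origin master
local_tip=$(git rev-parse HEAD)
remote_tip=$(git --git-dir="$work/origin.git" rev-parse master)
echo "first push: $first_push; server and local copy match after pull and push"
```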

Sync in GitHub for Windows and Mac
- GitHub Sync: GitHub's GUI for Windows and Mac has a sync button that automatically deals with push and pull commands


Testing: Assertions

Assertions
Assertion: An assertion is a statement that something is true at a particular point in a program. If the statement is false, the program will halt immediately.
Assertions can be used to ensure that:
1. Inputs are valid
2. Program or function outputs are consistent
3. Theoretical properties of the algorithm are satisfied

Example: Assertions in MATLAB
- My code has a lower bound zlower that should be uniformly nondecreasing as the algorithm progresses
- It is updated with zlower = c*x
- I use an assertion to ensure the bound does not decrease when updating zlower
Code example:
    assert(c*x >= zlower, 'Decrease in zlower');
    zlower = c*x;

Runtime Testing
Best Practices for Scientific Computing:
"Assertions can make up a sizable fraction of the code in well-written applications, just as tools for calibrating scientific instruments can make up a sizable fraction of the equipment in a lab."
- If something goes wrong, the code halts immediately, greatly simplifying debugging
Best Practices for Scientific Computing:
"Assertions are executable documentation, i.e., they explain the program as well as checking its behavior. This makes them more useful in many cases than comments since the reader can be sure that they are accurate and up to date."

Testing: Unit Testing with xUnit

Automated Testing
Best Practices for Scientific Computing:
"[R]egression testing is the practice of running pre-existing tests after changes to the code in order to make sure that it hasn't regressed, i.e., that things which were working haven't been broken."
The next line of defense is Automated Testing:
- Unit Test: Tests a single unit of a program, e.g., a function or method
- Integration Test: Tests that units work correctly when put together

Kinds of Test Cases
Oracles: Anything that tells you how a program should be working
1. Closed form solutions to special cases
2. Simple/small cases of the problem
3. Older versions of the code
   a. Slow, simple algorithm to test complicated, fast algorithm
   b. High level implementation to test lower level code (e.g., MATLAB to C++)
Bugs: Write a test to trigger a fixed bug to prevent it from reappearing

MATLAB xUnit Test Framework
- xUnit is a framework for writing unit tests
- It has been implemented for almost any language you can think of
Links:
- MATLAB xUnit Test Framework
- Wikipedia's List of Unit Testing Frameworks

Building tests with xUnit
xUnit tests have the same basic structure:
    input = ...
    expectedOutput = ...
    realOutput = YourCode(input);
    assertEqual(expectedOutput, realOutput);
- Define the input and expected output (perhaps for multiple cases)
- Run your code for each input value
- Compare your expectation with what happened

xUnit Assertions
- assertEqual(A,B): A and B are equal.
- assertElementsAlmostEqual: Elements of floating point matrices A and B are within some (absolute or relative) tolerance
- assertVectorsAlmostEqual: norm(A-B) is within some (absolute or relative) tolerance of zero
- assertTrue, assertFalse: Check Boolean values
- assertFilesEqual: Checks that files are the same
- assertExceptionThrown: Checks that a specific exception was thrown

Running tests with xUnit
With the MATLAB xUnit Test Framework:
- Write your tests in their own directory
- Write each test case as an M-file function that returns no output arguments
- The function name should start or end with test or Test
- Go to the test directory
- Run all tests with runtests
- Run a specific test with runtests TestName

Test Driven Development
Test Driven Development: Broadly speaking, TDD is the practice of writing the test cases for new software before the software is written.
Benefits:
- Helps to clarify the purpose of the program before coding begins
- Tends to create more modular and extensible code
- Helps ensure tests are actually written!
Possible drawbacks:
- May include poorly written tests
- May create false confidence
- No clear evidence that TDD improves productivity

Thanks for listening! Questions?