Data management on HPC platforms

Similar documents
Version Control with Svn, Git and git-svn. Kate Hedstrom ARSC, UAF

MATLAB & Git Versioning: The Very Basics

Version Control with Git. Linux Users Group UT Arlington. Rohit Rawat

Version Control with Git. Kate Hedstrom ARSC, UAF

Version Control Tutorial using TortoiseSVN and. TortoiseGit

Version Control using Git and Github. Joseph Rivera

Version Control! Scenarios, Working with Git!

Advanced Computing Tools for Applied Research Chapter 4. Version control

Version Control with Git. Dylan Nugent

Using Git for Project Management with µvision

Version control. with git and GitHub. Karl Broman. Biostatistics & Medical Informatics, UW Madison

Revision control systems (RCS) and

Version Control with Subversion

Git - Working with Remote Repositories

Introduction to Version Control

Version Control with Git

Git Basics. Christopher Simpkins Chris Simpkins (Georgia Tech) CS 2340 Objects and Design CS / 22

Distributed Version Control with Mercurial and git

Version Control Systems

Version Control. Version Control

Source Control Systems

Version control. HEAD is the name of the latest revision in the repository. It can be used in subversion rather than the latest revision number.

Version Control for Computational Economists: An Introduction

Version Control with Git

Using GitHub for Rally Apps (Mac Version)

Introduction to Git. Markus Kötter Notes. Leinelab Workshop July 28, 2015

Version Control with. Ben Morgan

FEEG Applied Programming 3 - Version Control and Git II

Lab Exercise Part II: Git: A distributed version control system

Annoyances with our current source control Can it get more comfortable? Git Appendix. Git vs Subversion. Andrey Kotlarski 13.XII.

Version control with Subversion

CSE 374 Programming Concepts & Tools. Laura Campbell (Thanks to Hal Perkins) Winter 2014 Lecture 16 Version control and svn

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Version control with GIT

Introduc)on to Version Control with Git. Pradeep Sivakumar, PhD Sr. Computa5onal Specialist Research Compu5ng, NUIT

CS 2112 Lab: Version Control

An Introduction to Mercurial Version Control Software

Subversion. Nadir SOUALEM. Linux Users subversion client svn or higher. Windows users subversion client Tortoise 1.6.

Software Configuration Management. Slides derived from Dr. Sara Stoecklin s notes and various web sources.

CPSC 491. Today: Source code control. Source Code (Version) Control. Exercise: g., no git, subversion, cvs, etc.)

Derived from Chris Cannam's original at, an.

Introducing Xcode Source Control

Setting up a local working copy with SVN, MAMP and rsync. Agentic

Mercurial. Why version control (Single users)

Introduction to the Git Version Control System

Connecting to the School of Computing Servers and Transferring Files

Work. MATLAB Source Control Using Git

BlueJ Teamwork Tutorial

Git, GitHub & Web Hosting Workshop

Introduction to Version Control with Git

Continuous Integration. CSC 440: Software Engineering Slide #1

Version Control Using Subversion. Version Control Using Subversion 1 / 27

Version Control Systems: SVN and GIT. How do VCS support SW development teams?

Version Control Script

The Hitchhiker s Guide to Github: SAS Programming Goes Social Jiangtang Hu d-wise Technologies, Inc., Morrisville, NC

CISC 275: Introduction to Software Engineering. Lab 5: Introduction to Revision Control with. Charlie Greenbacker University of Delaware Fall 2011

Continuous Integration

Version Control with Mercurial and SSH

Introduction to Version Control with Git

Source Control Guide: Git

BioHPC Cloud Storage. portal.biohpc.swmed.edu Updated for

HPC system startup manual (version 1.30)

Developer Workshop Marc Dumontier McMaster/OSCAR-EMR

Two Best Practices for Scientific Computing

Version Uncontrolled! : How to Manage Your Version Control

Software configuration management

SFTP Server User Login Instructions. Open Internet explorer and enter the following url:

MOOSE-Based Application Development on GitLab

Memopol Documentation

Manual. CollabNet Subversion Connector to HP Quality Center. Version 1.2

Accessing the FTP Server - User Manual

Software Configuration Management and Continuous Integration

Content. Development Tools 2(63)

BACKUP YOUR SENSITIVE DATA WITH BACKUP- MANAGER

GENERAL FILE TRANSFER GUIDELINES

Source code management systems

Redmine: A project management software tool. January, 2013

Miguel A. Figueroa Villanueva Xabriel J. Collazo Mojica. ICOM 5047 Capstone Miguel A. Figueroa Villanueva University of Puerto Rico Mayagüez Campus

1. History 2. Structure 3. Git Comparison 4. File Storage 5. File Tracking 6. Staging 7. Queues (MQ) 8. Merge Tools 9. Interfaces

User s Manual

Using sftp in Informatica PowerCenter

How to Backup XenServer VM with VirtualIQ

What Does Tequila Have to Do with Managing Macs? Using Open Source Tools to Manage Mac OS in the Enterprise!

Export & Backup Guide

Beginning with SubclipseSVN

WinSCP Tutorial 01/28/09: Y. Liow

FTP Accounts Contents

Using WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014

HADOOP CLUSTER SETUP GUIDE:

IT at D-PHYS - Tutorial

Source Code Management for Continuous Integration and Deployment. Version 1.0 DO NOT DISTRIBUTE

Using Subversion in Computer Science

How To Backup In Cisco Uk Central And Cisco Cusd (Cisco) Cusm (Custodian) (Cusd) (Uk) (Usd).Com) (Ucs) (Cyse

Ad Hoc (Temporary) Accounts Instructions

Transcription:

Data management on HPC platforms Transferring data and handling code with Git scitas.epfl.ch September 10, 2015 http://bit.ly/1jkghz4

What kind of data Categorizing data to define a strategy Based on size? Based on format? Based on purpose? 1 / 22

What kind of data Categorizing data to define a strategy Based on size? some kilo, multiple mega, a tera,... Based on format? Based on purpose? 1 / 22

What kind of data Categorizing data to define a strategy Based on size? some kilo, multiple mega, a tera,... Based on format? binary/ascii, etc... Based on purpose? 1 / 22

What kind of data Categorizing data to define a strategy Based on size? some kilo, multiple mega, a tera,... Based on format? binary/ascii, etc... Based on purpose? code/input data/output data, etc... 1 / 22

Show me your data, I tell you what you do Different types of data considered Simulation input Simulation output Processed results Simulations code, pre/post processing scripts,... 2 / 22

Show me your data, I tell you what you do Different types of data considered Simulation input Simulation output Processed results Simulations code, pre/post processing scripts,... Type of tools Big data type of tools Versioning type of tools 2 / 22

Clusters folder structure /home User configurations, codes, input files, submission scripts /scratch Output files from running jobs /work Long term output files storage /tmp Node local space if needed 3 / 22

Clusters folder structure /home User configurations, codes, input files, submission scripts /scratch Output files from running jobs /work Long term output files storage /tmp Node local space if needed home visibly all clusters / all nodes size global 100TB, quota 100GB per user location /home/<username> backup backup on tapes, snapshots easy access $HOME 3 / 22

Clusters folder structure /home User configurations, codes, input files, submission scripts /scratch Output files from running jobs /work Long term output files storage /tmp Node local space if needed scratch visibly per cluster / all nodes size aries 87TB, bellatrix 200TB, castor 22TB, deneb 350TB location /scratch/<group>/<username> backup no backup, no snapshots, data removed if needed easy access $SCRATCH in jobs 3 / 22

Clusters folder structure /home User configurations, codes, input files, submission scripts /scratch Output files from running jobs /work Long term output files storage /tmp Node local space if needed work visibly all cluster / all nodes size global 100TB, quota 50GB per group (+1TB/300.- for 3years) location /work/<group> backup backup on demand (to pay), snapshots easy access $WORK in jobs 3 / 22

Clusters folder structure /home User configurations, codes, input files, submission scripts /scratch Output files from running jobs tmp /work Long term output files storage /tmp Node local space if needed visibly per node size node dependent location /tmp/${slurm JOB ID} only during job backup no backup, no snapshots, removed at job end easy access $TMPDIR in jobs 3 / 22

Exercise 1: Simple connection Questions: Connect to your favorite front node Check the different folders /home /scratch /work Ok this exercise is just to be sure you can connect to the cluster 4 / 22

Big data SSH based file transfer SSH: Secure SHell Different ways: scp secure copy sftp secure file transfer rsync remote synchronization sshfs ssh file system This data should be on your /scratch or /work 5 / 22

Move your Big data with scp scp works fine for Linux/MacOS, preinstalled Could be complicated on Windows: pscp.exe from a cmd.exe http://www.chiark.greenend.org.uk/~sgtatham/putty/ download.html 6 / 22

Exercise 2: Using scp Questions: Copy a file from your machine to the cluster. Retrieve a file from the cluster to your machine 7 / 22

Move your Big data with sftp sftp tones of GUIs, for example: Filezilla https://filezilla-project.org ( all OSes) Cyberduck https://cyberduck.io/ (Windows and Mac) Tunnelier https://bitvise.com/tunnelier (Windows) TUIs also exists like lftp 8 / 22

Exercise 3: Using sftp Questions: Try downloading one of these tools and connect to a cluster 9 / 22

Move your Big data with rsync sshfs On Linux type file location in your file browser sshfs://<remote>/<path> Note: seams to have ways to be used on MacOS or Windows rsync Synchronizes two folders, folders could be remote Could/should be used instead of mv GUIs exist for all OSes https://en.wikipedia.org/wiki/rsync 10 / 22

Exercise 4: Using rsync Questions: Create a temporary tmp/ folder in your home folder Copy the folder /ssoft/sources/cmake-3.2.3 to this tmp/ folder Create a backup/ folder in the tmp/ folder Use rsync to copy the cmake folder in your tmp/backup folder Modify the README file in the code of cmake in tmp/cmake-3.2.3 Re-synchronize the files 11 / 22

Versionning What version control means Keep track of the changes Merge changes from multiple sources Strategies Local version control (RCS) Remote on a central server (CVS, Subversion) Distributed version control (Git, Mercurial, Bazaar) 12 / 22

Versionning : with Git Git: the stupid content tracker Distributed revision control Originally developed by Linus Torvald Named after the egotistical bastard Linus Remote server File versions DB Version 3 Version 2 Version 1 Computer 1 File File versions DB Version 3 Version 2 Version 1 Computer 2 File File versions DB Version 3 Version 2 Version 1 13 / 22

Git basics working copy staging space.git directory remote server 14 / 22

Git basics: clone repository working copy staging space.git directory remote server clone Command git clone <repo url> 14 / 22

Git basics: add file working copy staging space.git directory remote server add Command git add <files>... 14 / 22

Git basics: commit modifications working copy staging space.git directory remote server commit Command git commit -m "comment" 14 / 22

Git basics: pull modifications working copy staging space.git directory remote server pull Command git pull 14 / 22

Git basics: push to server working copy staging space.git directory remote server push Command git push 14 / 22

Git basics working copy staging space.git directory remote server clone add commit pull push 14 / 22

Exercise 5: First step with Git Questions: If you do not have git installed, get it from https://git-scm.com/downloads Clone the repository https://<username>@git.epfl.ch/repo/scitas-test.git Create a file, use your username in the name of the file Check the state of your working copy Add the file to the repository Commit your modifications Pull the potential modifications from the server Push your changes to the server 15 / 22

Resolve conflicts working copy staging space.git directory remote server modify file text 16 / 22

Resolve conflicts working copy staging space.git directory remote server add commit 16 / 22

Resolve conflicts working copy staging space.git directory remote server pull remote modifications on text 16 / 22

Resolve conflicts working copy staging space.git directory remote server Correct conflit 16 / 22

Resolve conflicts working copy staging space.git directory remote server commit -a 16 / 22

Resolve conflicts working copy staging space.git directory remote server push 16 / 22

Resolve conflicts working copy staging space.git directory remote server modify file text add commit pull remote modifications on text Correct conflit commit -a push 16 / 22

Exercise 6: Generate and solve conflicts Questions: Clone the scitas-test repository in a different folder Modify the file created in the previous exercise in both clones Commit this both modifications Pull and push in one of the clone Pull in the second clone, You should get a conflict <<<<<<<<<< One version ========== Other version >>>>>>>>>> Check the local status Correct the conflict and commit using git commit -a Push the modifications 17 / 22

Multiple servers for one project Remote server 1 File versions DB Version 3 Version 2 Version 1 18 / 22

Multiple servers for one project Remote server 1 File versions DB Version 3 Version 2 Version 1 Computer Command On Computer: git clone <remote url 1> File File versions DB Version 3 Version 2 Version 1 18 / 22

Multiple servers for one project Remote server 1 File versions DB Version 3 Remote server 2 File versions DB Version 2 Version 1 origin Computer File File versions DB Version 3 Version 2 Version 1 Command On Remote server 2: git init --bare 18 / 22

Multiple servers for one project Remote server 1 File versions DB Version 3 Remote server 2 File versions DB Version 2 Version 1 origin Computer File server2 Command On Computer: git remote server2 \ <remote url 2> File versions DB Version 3 Version 2 Version 1 18 / 22

Multiple servers for one project Remote server 1 File versions DB Version 3 Version 2 Version 1 Remote server 2 File versions DB Version 3 Version 2 Version 1 origin server2 Command Computer File File versions DB Version 3 Version 2 Version 1 On Computer: git push server2 18 / 22

Exercise 7: Handle remotes Questions: Connect on the front node of your favorite cluster Create a new folder that will contain your server In this folder initialize a new git server In one of the former clone of scitas-test add the new remote URL <cluster name>:<path to repo> List the remotes to see if everything looks correct Push the local content to the new server On the cluster clone this new server URL <path to repo> Note: The access permission on this new server are based on the file system permissions 19 / 22

Branching - Merging version 1 20 / 22

Branching - Merging Command git commit... version 2 version 1 20 / 22

Branching - Merging version 3 Command git branch branch name git checkout branch name or git checkout -b branch name version 2 version 1 20 / 22

Branching - Merging version 3 version 3 Command git checkout master git commit... version 2 version 1 20 / 22

Branching - Merging version 4 Command git commit... version 3 version 3 version 2 version 1 20 / 22

Branching - Merging version 4 version 3 version 4 version 3 Command git checkout branch name git commit... git checkout branch name git commit... version 2 version 1 20 / 22

Branching - Merging version 4 merge 1 version 4 Command git merge master version 3 version 3 version 2 version 1 20 / 22

Branching - Merging version 5 version 4 merge 1 version 4 Command git commit... version 3 version 3 version 2 version 1 20 / 22

Branching - Merging version 5 version 5 version 4 version 3 merge 1 version 4 version 3 Command git checkout master git commit... version 2 version 1 20 / 22

Branching - Merging merge 2 version 5 version 5 version 4 merge 1 version 4 Command git merge branch name version 3 version 3 version 2 version 1 20 / 22

Branching - Merging version 6 merge 2 version 5 version 5 version 4 merge 1 version 4 Command git commit... version 3 version 3 version 2 version 1 20 / 22

Exercise 8: Handle remotes Questions: Create a branch Modify a file and commit the changes Checkout the master branch Modify a file and commit the changes Merge the branch previously created in the master branch List all branches Print the logs of the different modifications Delete the merged branch 21 / 22

Sources and extra infos Sources Wikipedia http://git-scm.com Manpages: rsync, git Learn more Optimizing your research data management http://sfp.epfl.ch/page-66455-en.html Git with a game: http://pcottle.github.io/learngitbranching/ Handling big files with git: https://git-annex.branchable.com 22 / 22