Cluster Computing With R




Stowers Institute for Medical Research
R/Bioconductor Discussion Group
Earl F. Glynn, Scientific Programmer
18 December 2007

Outline
- Accessing Linux Boxes from Windows: Linux servers, cluster head nodes, Linux command prompt, xterms, Linux home directories, setup for examples
- R Batch Jobs Using Linux
- Cluster Basics
- Sun Grid Engine
- R Batch Jobs on Cluster: toy examples and a time perturbation study

Accessing Linux Boxes from Windows
PuTTY and Xming are used to access Linux boxes from Windows.
Wiki: http://research/puttyxming

Accessing Linux Boxes from Windows: PuTTY / Xming, Accessing Linux Hosts
Linux hosts: genekc02 (64-bit) and genekc03 (32-bit).
Right-click the Xming icon to open a session; the hosts can also be reached via the PuTTY icon.

Accessing Linux Boxes from Windows: PuTTY / Xming, Accessing Cluster Head Nodes
Confusing names:
32-bit cluster nodes: cluster01 = Betelgeuse, cluster02 = Sirius
64-bit cluster nodes: cluster03 = Orion
Right-click the Xming icon to connect.

Image source: http://documents.wolfram.com/applications/astronomer/atlas/novdecjan.html

Accessing Linux Boxes from Windows: Linux Console and Command Prompt

    cat .bashrc    # or in .profile

    case $TERM in
      xterm*)
        PROMPT_COMMAND='echo -ne "\033]0;${USER} ${HOSTNAME}\007"'
        ;;
      *)
        ;;
    esac

    BLUETEXT="\[\e[34;1m\]"
    RESETTEXT="\[\e[0m\]"
    PS1="$BLUETEXT\n[\! \$(date '+%d%b%y %T') \$( pwd )]\n$RESETTEXT"

In an xterm, PROMPT_COMMAND sets the window title to the user and host name. PS1 prints, in bold blue on its own line, the history number, the date and time, and the current directory.
See "Making Your Linux Bash Command Prompt Useful": http://www.expertsrt.com/tutorials/matt/cmdprompt.html

Accessing Linux Boxes from Windows: Linux Command Prompt and xterms

Accessing Linux Boxes from Windows: Linux Home Directories
Files for cluster jobs should normally be in your home directory, e.g., /home/efg.
Use a single directory per cluster job, e.g., /home/efg/cluster/r/simplegraph.
Linux home directories can be accessed from Windows using a UNC name, e.g., \\lnaskc01\unixhomes\efg\cluster\r\simplegraph.
Not all filesystems available on genekc02, ..., are available on the clusters, e.g., $BLASTDB.
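The home-directory mapping above can be captured in a small Bash helper. This is only a sketch: `to_unc` is a hypothetical name, and it assumes every path under /home maps to \\lnaskc01\unixhomes exactly as in the example (other filesystems are not shared this way).

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a Linux home-directory path into the Windows
# UNC name for the same file. ASSUMES the mapping shown in this talk:
#   /home/<user>/...  ->  \\lnaskc01\unixhomes\<user>\...
to_unc() {
  local path="$1"
  case "$path" in
    /home/?*) ;;                                  # only /home is shared this way
    *) echo "to_unc: not under /home: $path" >&2; return 1 ;;
  esac
  local rest="${path#/home/}"                     # drop the leading /home/
  rest=$(printf '%s' "$rest" | tr '/' '\\')       # turn every / into \
  printf '%s%s\n' '\\lnaskc01\unixhomes\' "$rest"
}

# Example: the job directory used in this talk
to_unc /home/efg/cluster/r/simplegraph
```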

Accessing Linux Boxes from Windows: Summary
Use the Linux boxes for R "batch" jobs (discussed next); normally use genekc02 or genekc03.
Monitor how busy a box is with top; the cluster is a possible alternative when the boxes are busy.
(Example top display: only 1 of 4 CPUs busy.)
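top is interactive. For a quick check from a script, one rough heuristic (an assumption, not from the talk) is to compare the 1-minute load average with the number of CPUs. `is_busy` below is a hypothetical helper, and reading /proc/loadavg assumes a Linux box.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a rough, non-interactive stand-in for eyeballing top.
# Treats a box as "busy" when its load average is at least its CPU count.
is_busy() {
  local load="$1" ncpu="$2"
  # awk does the floating-point comparison that bash arithmetic cannot
  awk -v l="$load" -v n="$ncpu" 'BEGIN { exit !(l >= n) }'
}

# On a live Linux box: 1-minute load average and number of CPUs
if [[ -r /proc/loadavg ]]; then
  load=$(cut -d ' ' -f 1 /proc/loadavg)
  ncpu=$(nproc)
  if is_busy "$load" "$ncpu"; then
    echo "$HOSTNAME looks busy (load $load on $ncpu CPUs); consider the cluster"
  else
    echo "$HOSTNAME has spare CPUs (load $load on $ncpu CPUs)"
  fi
fi
```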

Accessing Linux Boxes from Windows: Setup for Examples
Log in on genekc02 or genekc03. In your home directory:

    mkdir simplegraph
    cd simplegraph
    cp /home/efg/cluster/r/simplegraph/* .
    cd ..
    mkdir array
    cd array
    cp /home/efg/cluster/r/array/* .

Or run: /home/efg/cluster/r/rclass.bash
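The talk does not show what rclass.bash contains. A plausible sketch of a script that performs the same setup steps, assuming only the two example directories above (`setup_examples` is a hypothetical name), might look like this:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the setup steps above; NOT the contents of rclass.bash.
# Copies each example directory from a source tree into a destination directory.
setup_examples() {
  local src_root="${1:-/home/efg/cluster/r}"   # where the examples live
  local dest_root="${2:-$HOME}"                # where the copies go
  local example
  for example in simplegraph array; do
    mkdir -p "$dest_root/$example" || return 1
    cp "$src_root/$example"/* "$dest_root/$example"/ || return 1
  done
}

# Usage: setup_examples    # copy both examples into your home directory
```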

R Batch Jobs Using Linux
1. Develop the R script using Windows or Linux.
2. Develop a parameterized R script [some limitations].
3. Run it with R CMD BATCH.
4. Create a Bash driver for the R script.
5. Test the Bash driver on a "normal" Linux box.

R Batch Jobs Using Linux: Simple R Script and Batch Driver

R Batch Jobs Using Linux: R CMD BATCH
At the Linux command prompt or in a script:

    R CMD BATCH --vanilla --slave scriptname.r output.txt

--vanilla is a combination of:
  --no-save: don't save the workspace at the end of the session
  --no-restore: don't restore anything (e.g., a saved .RData workspace)
  --no-site-file: don't read the site-wide Rprofile
  --no-init-file: don't read the user's .Rprofile or ~/.Rprofile file
  --no-environ: don't read the site and user environment files
--slave: make R run as quietly as possible (R 4.0.0 and later also accept the name --no-echo)
See man R on Linux.
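A Bash driver around R CMD BATCH can be as small as the function below. It is a sketch: `run_r_batch` and the `DRY_RUN` switch are hypothetical, and the output file defaults to the script name with a .txt extension so it opens easily on Windows.

```shell
#!/usr/bin/env bash
# Hypothetical minimal driver for an R script run in batch mode.
# Usage: run_r_batch script.r [output.txt]
run_r_batch() {
  local script="$1"
  local out="${2:-${script%.r}.txt}"     # default: simplegraph1.r -> simplegraph1.txt
  local cmd=(R CMD BATCH --vanilla --slave "$script" "$out")
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "${cmd[*]}"                     # show the command instead of running it
  else
    "${cmd[@]}"
  fi
}

# Usage: run_r_batch simplegraph1.r
#        DRY_RUN=1 run_r_batch simplegraph1.r   # just print the command
```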

R Batch Jobs Using Linux: Simple R Script and Batch Driver
q() is not needed at the end of the script.
Then run the script.

R Batch Jobs Using Linux: Simple R Script and Batch Driver
Look at the .txt output file using Linux.
Look at the .pdf output file using Windows, where /home/efg/cluster/r/simplegraph = \\lnaskc01\unixhomes\efg\cluster\r\simplegraph. In Windows Explorer you usually need to "refresh" to see new files.
Use Windows file "extensions" (.txt, .pdf, .r) so files can be processed in either Windows or Linux.

R Batch Jobs Using Linux: Parameterized R Script and Driver
See: http://quantitative-ecology.blogspot.com/2007/08/including-arguments-in-r-cmd-batch-mode.html
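One way to pass parameters, following the approach of the blog post cited above, is a single "--args name=value ..." option to R CMD BATCH. In this sketch the script name parameterized.r and the function run_species are hypothetical, and setting ChartTitle per species is an assumption about how the script uses it. On the R side, commandArgs(trailingOnly = TRUE) returns strings such as species='Mouse'; the cited post evaluates each one with eval(parse(text = ...)).

```shell
#!/usr/bin/env bash
# Hypothetical parameterized driver: one R CMD BATCH run per species.
# parameterized.r and the R variable names species/ChartTitle are assumptions.
run_species() {
  local species="$1"
  # One option string; R receives species='Mouse' and ChartTitle='Mouse'
  local args="--args species='${species}' ChartTitle='${species}'"
  local cmd=(R CMD BATCH --vanilla --slave "$args" parameterized.r "${species}.txt")
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "${cmd[*]}"                 # show the command instead of running it
  else
    "${cmd[@]}"
  fi
}

# Usage: for s in Mouse Zebrafish; do run_species "$s"; done
```

Keep values free of spaces: R CMD BATCH splits the --args string into separate words before handing it to R.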

R Batch Jobs Using Linux: Parameterized R Script and Driver
Results for Mouse and Zebrafish. ChartTitle should have been overridden. Suggestions?

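R Batch Jobs Using Linux: Parameterized R Script and Driver
On Windows, use WordPad (but NOT Notepad) to view the .txt file: Notepad of this era does not recognize Linux (LF-only) line endings, so the whole file appears as one long line.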

Cluster Basics
R is configured nearly identically on the 32-bit and 64-bit platforms.
Limitations: there can be subtle differences in environment variables and executable paths between the head node and the cluster nodes, or between the head node and other Linux boxes.
A "good" cluster job is mostly CPU-intensive, with many computations. Short jobs, or jobs with a lot of I/O, may not be good cluster jobs.
You must search the standard out/error files for errors.
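Searching the output files can be scripted. The helper below is a hypothetical sketch: it relies on SGE's default output names (<jobname>.o<jobid> and <jobname>.e<jobid>) and simply lists files that mention "error" or "warning" in any letter case, which can also match harmless text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list output files that mention an error or warning.
# Pass SGE output files (<job>.o<id>, <job>.e<id>) and/or R CMD BATCH output.
find_errors() {
  # -l: print file names only; -i: ignore case; -E: extended regex
  grep -l -i -E 'error|warning' "$@" 2>/dev/null
}

# Usage, in a job's working directory:
#   find_errors *.o* *.e* *.txt
```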

Cluster Basics: IT Wiki
http://wiki/it/clusters
Many changes are coming over the next few months, so the wiki pages are a bit dated right now.

Cluster Basics: Betelgeuse / cluster01
IT Wiki: http://wiki/it/clusters/userguide
Confusing names: 32-bit cluster nodes cluster01 = Betelgeuse, cluster02 = Sirius; 64-bit cluster nodes cluster03 = Orion.

Cluster Basics: Betelgeuse / cluster01
Load Monitor X

Cluster Basics: Cluster Head Node
Log in directly to the cluster head node using Xming (right-click, SIMR cluster, <cluster name>).
Don't do any unnecessary work on the head node. Use the head node ONLY to submit jobs via the Sun Grid Engine (discussed next).
If you forget, kill any jobs accidentally run on the cluster head node as soon as possible.
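One way to avoid running work on the head node by accident is to have each job script check that it was started by the Sun Grid Engine. SGE sets JOB_ID in a job's environment (for qsub and qrsh jobs); an ordinary login shell normally does not. The function name `require_sge_job` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard for a job script: refuse to do heavy work unless the
# script is running as a Sun Grid Engine job (SGE sets JOB_ID for jobs).
require_sge_job() {
  if [[ -z "${JOB_ID:-}" ]]; then
    echo "Not running under SGE; submit this script with qsub instead." >&2
    return 1
  fi
}

# At the top of a job script:
#   require_sge_job || exit 1
```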

Sun Grid Engine
qrsh: interactively run a job when genekc02, genekc03, ... are too busy
qsub: submit cluster jobs
qstat: show the status of submitted cluster jobs
qdel: delete a cluster job
qmon: GUI job monitor (submit, manage, monitor)

Sun Grid Engine: qrsh
What if genekc02, genekc03, ... are fully loaded and you want to use another CPU?
What if you have a very long-running job and would rather not "hog" the main Linux boxes?
Some people have no choice about where to run jobs because of license limitations: when the licensed CPUs are in use, they have no other options. Be kind to your neighbors by using a cluster node when possible.
If cluster nodes are available, qrsh can be used as an alternative.

Sun Grid Engine: qrsh
1. Check that cluster nodes are available.
2. Log in to the cluster head node: cluster01 (Betelgeuse) for now [right-click Xming, SIMR Cluster, cluster01].

Sun Grid Engine: qrsh
3. Open an xterm if desired.
4. Issue the qrsh command:

       qrsh -q all.q

   In this case we were assigned node0050.
5. Change to the work directory and process the work.

Sun Grid Engine: qrsh
5. (continued) Process any work on the cluster node.

Sun Grid Engine: qrsh
6. Enter "exit" to close the qrsh session on the node, then enter "exit" again to close the cluster01 session.

Sun Grid Engine: qsub
Submit a job to the cluster:
- Write a "wrapper" script that processes one "chunk" of work, perhaps a single file, each time it is invoked. The script can read and write any number of files, but avoid unnecessary network I/O.
- Submit the script to the cluster; the Sun Grid Engine selects the node that runs it.
- Consider keeping the qsub command in a submission script, for documentation.

    qsub -cwd -o output.txt -j yes -N "name" wrapper.bash [parms]

-cwd runs the job in the current working directory, -N sets the job name, and -j yes merges standard error into standard out, so both are written to output.txt. To capture standard error separately, use -e error.txt (with the default, -j no).
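A submission script keeps the exact qsub options on record. This sketch uses the options shown above plus the -q all.q advised later in this talk; `submit_job` and `DRY_RUN` are hypothetical names, and naming the output file after the job (instead of output.txt) is an assumption so that separate jobs do not share one file.

```shell
#!/usr/bin/env bash
# Hypothetical submission helper: builds and (unless DRY_RUN is set) runs qsub.
# Usage: submit_job <jobname> <wrapper.bash> [parms for the wrapper...]
submit_job() {
  local name="$1" wrapper="$2"
  shift 2                              # anything left is passed to the wrapper
  local cmd=(qsub -q all.q -cwd -o "${name}.txt" -j yes -N "$name" "$wrapper" "$@")
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "${cmd[*]}"                   # show the command instead of submitting
  else
    "${cmd[@]}"
  fi
}

# Usage: submit_job simple wrapper.bash 5 mouse
```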

Sun Grid Engine: qsub (array jobs)
Submit an array job to the cluster:
- Write a wrapper script to perform one task.
- The Sun Grid Engine sets the environment variable $SGE_TASK_ID to the task's number.
- The script decides what work to do based on $SGE_TASK_ID.

    qsub -cwd -t 1-4 -N "ArrayJob" arrayjob.bash

The -t option takes start-stop:increment, with start > 0.
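The sketch below shows both halves of an array job in Bash: which task IDs a -t start-stop:increment range produces, and how a wrapper can turn $SGE_TASK_ID into its own piece of work. `task_ids`, `pick_input`, and the input file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical array-job helpers. SGE starts one copy of the job script per
# task and sets SGE_TASK_ID; the script maps that number to its work.

# Which task IDs "-t start-stop:increment" creates, e.g. 1-10:3 -> 1 4 7 10
task_ids() {
  local spec="$1" start stop step rest i
  local -a ids=()
  start="${spec%%-*}"
  rest="${spec#*-}"                    # equals spec when there is no "-"
  stop="${rest%%:*}"
  if [[ "$rest" == *:* ]]; then step="${rest#*:}"; else step=1; fi
  if (( start < 1 || step < 1 )); then
    echo "task_ids: need start > 0 and increment > 0" >&2
    return 1
  fi
  for (( i = start; i <= stop; i += step )); do ids+=("$i"); done
  echo "${ids[*]}"
}

# Pick this task's input: task 1 gets the first argument, task 2 the second...
pick_input() {
  local id="${SGE_TASK_ID:-}"
  if [[ -z "$id" ]] || (( id < 1 || id > $# )); then
    echo "pick_input: no input for task '${id}'" >&2
    return 1
  fi
  echo "${!id}"                        # indirect expansion: the id-th argument
}

# In arrayjob.bash one might write (hypothetical input files):
#   input=$(pick_input mouse.txt zebrafish.txt chick.txt snake.txt) || exit 1
#   R CMD BATCH --vanilla --slave "--args input='$input'" arrayjob.r "task${SGE_TASK_ID}.txt"
```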

Sun Grid Engine: qsub
** CAUTION ** You can submit jobs to various queues, but not all queues are equal right now. Unfortunately, "-q all.q" is not shown consistently in the examples here. For now, always use -q all.q with qsub:

    qsub -q all.q -cwd -t 1-4 -N "ArrayJob" arrayjob.bash

Sun Grid Engine: qstat and qdel
List your jobs in the SGE queue (login ID xxx):

    qstat -u xxx

Delete all your jobs in the SGE queue:

    qdel -u xxx

qmon is a graphical alternative.

R Batch Jobs on Cluster
"Toy" examples to show concepts:
- SimpleGraph example
- ArrayJob example
Overview of a research project: Time Perturbation Somitogenesis Studies

R Batch Job on Cluster: SimpleGraph Example
Modify the batch script: batch.bash
Submission script: submit.bash
Fix editing in Windows
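The slide does not spell out "Fix Editing in Windows". One common problem it may refer to (an assumption) is that Windows editors end lines with CR LF, and Bash then sees a stray carriage return at the end of each line (e.g. "/bin/bash^M: bad interpreter"). The hypothetical helper below strips carriage returns in place; dos2unix, where installed, does the same.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert files edited on Windows from CRLF to LF, in place.
fix_crlf() {
  local f tmp
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # cat back into the original file so its permissions (e.g. +x) are kept
    tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}

# Usage: fix_crlf batch.bash submit.bash simplegraph1.r
```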

R Batch Job on Cluster: SimpleGraph Example
Job script: simplegraph1.bash

R Batch Job on Cluster: SimpleGraph Example
R script: simplegraph1.r

R Batch Job on Cluster: SimpleGraph Example
Run submit.bash to submit jobs to the cluster:
1. Log in to the cluster head node.
2. Change to the working directory.
3. Execute the submit.bash script.
To delete all of your jobs: qdel -u efg

R Batch Job on Cluster: SimpleGraph Example
Output files: the standard out and standard error files have no useful information and can be deleted.

R Batch Job on Cluster: R Array Job Example
Submission script: submit.bash
Job script: arrayjob.bash
R script: arrayjob.r

R Batch Job on Cluster: R Array Job Example
Submit the array job, then check the output files. The standard out and standard error files are zero-length and can be deleted.
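Deleting the empty files can be automated. This is a sketch: `clean_empty_sge_output` is a hypothetical name, and it assumes SGE's default names, <job>.o<jobid> and <job>.e<jobid>, with .<taskid> appended for array tasks. Only zero-length files are removed.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup helper: delete zero-length SGE output files in one
# directory (not subdirectories); non-empty files are kept.
clean_empty_sge_output() {
  local dir="${1:-.}"
  # -empty and -delete are supported by GNU and BSD find; -print lists removals
  find "$dir" -maxdepth 1 -type f -empty \
    \( -name '*.o[0-9]*' -o -name '*.e[0-9]*' \) -print -delete
}

# Usage, in the job's working directory:
#   clean_empty_sge_output
```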

R Batch Job on Cluster: R Array Job Example
Output files: JPEGs and PNGs cannot be created on the cluster nodes for now, but PDFs work for graphical output.

R Batch Job on Cluster: Time Perturbation Somitogenesis Studies

R Batch Job on Cluster: Time Perturbation Somitogenesis Studies
submit.bash
submit-batch.bash &
submit-batch.bash &
Compare250.bash
See the files in /home/efg/cluster/timeperturb/mouse17

Questions?