An Introduction to High Performance Computing in the Department
Ashley Ford & Chris Jewell
Department of Statistics, University of Warwick
October 30, 2012
Outline
  1 Some Background
  2 How is Buster used?
  3 Software
  4 Support
1 Some Background
What is Buster for?
  Raw computing power: large datasets need lots of memory; complex algorithms need fast processing
  Batch processing: the ability to set your algorithm running and get on with other work
  Interactive sessions: manipulating data in real time
  Cost effectiveness: a high-powered centralised computing facility shared among users
System architecture
  [Figure: system architecture diagram showing the Internet, Frontend, Execution and Fileserver components]
Rack mounted
Specifications
  Frontend node
    2 x 2.33GHz Intel E5410 quad-core processors
    8GB fully buffered RAM
  Storage cluster
    Lustre(TM) high-performance filesystem
    17TB RAID storage
  Execution nodes (11 machines, 108 CPUs)
    4 nodes: 2 x 3.16GHz Intel X5460 quad-core, 16GB FBRAM
    2 nodes: 2 x 2.93GHz Intel X5570 quad-core, 16GB FBRAM
    5 nodes: 2 x 2.80GHz Intel X5660 six-core, 48GB FBRAM
2 How is Buster used?
What you get
  Standard user accounts provide:
    Username (same as ITS username) and password
    Home: 1GB fault-tolerant home storage, expandable if required, backed up nightly to the ITS central backup service
    Storage: default 150GB storage space, fault tolerant but NOT backed up!
    Scratch: up to 800GB scratch space, shared with other users; files are deleted automatically after 14 days
    Standard software packages installed under the module system
Frontend node
  [Figure: system architecture diagram (Internet, Frontend, Execution, Fileserver)]
Logging in
  Login is provided over an SSH (secure shell) connection
  Hostname: buster.stats.warwick.ac.uk
  Provides password-protected access from anywhere on the internet
  Graphical forwarding enabled (e.g. R graphs, text editors)
  Clients:
    Linux / Mac OS X: native ssh client
    MS Windows: Xming client recommended
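As a minimal sketch of logging in from a Linux or Mac OS X terminal, the snippet below assembles the ssh command for Buster. The username u1234567 is a hypothetical placeholder, and the use of -X (the standard OpenSSH option for X11 forwarding) is an assumption about how graphical forwarding would be requested from the client side.

```shell
#!/bin/bash
# Hypothetical placeholder: replace with your own ITS username.
BUSTER_USER="u1234567"
# Hostname as given on the slide.
BUSTER_HOST="buster.stats.warwick.ac.uk"

# -X asks OpenSSH to forward X11, so that graphical windows opened
# on Buster (e.g. R graphs) appear on your local display.
LOGIN_CMD="ssh -X ${BUSTER_USER}@${BUSTER_HOST}"

# Print the command; run it yourself to start a session.
echo "${LOGIN_CMD}"
```

The command is only printed, not executed, so the snippet can be tried safely before connecting.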
Accessing files
  SSH
    Encrypted file transfer from anywhere on the internet
    Use scp (Linux / Mac OS X) or WinSCP (Windows)
  Windows fileshare
    Unencrypted, from within campus (still requires a password, though)
    Home: \\buster\<username>
    Storage: \\buster\storage\<username>
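A minimal sketch of an scp transfer in each direction, under some assumptions: the username u1234567 and the file name data.csv are hypothetical, and the remote directory /storage/<username> is assumed from the paths used in the example job script.

```shell
#!/bin/bash
# Hypothetical placeholders: replace with your own username and file.
BUSTER_USER="u1234567"
LOCAL_FILE="data.csv"
BUSTER_HOST="buster.stats.warwick.ac.uk"
# Assumed location of your storage area on Buster.
REMOTE_DIR="/storage/${BUSTER_USER}"

# Copy a local file up to your storage area.
UPLOAD_CMD="scp ${LOCAL_FILE} ${BUSTER_USER}@${BUSTER_HOST}:${REMOTE_DIR}/"
# Copy a result file back into the current local directory.
DOWNLOAD_CMD="scp ${BUSTER_USER}@${BUSTER_HOST}:${REMOTE_DIR}/r-output.pdf ."

# Print both commands; run them yourself to transfer the files.
echo "${UPLOAD_CMD}"
echo "${DOWNLOAD_CMD}"
```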
The module system
  Software packages are available via the module command
  Allows versioning of software packages
  Checks for conflicts between packages
  Commands:
    Available packages: module avail
    Adding a module (default version): module add R
    Adding a module (specific version): module add R/2.8.1
    Displaying information: module display R
    Help: module help
Submitting jobs: Grid Engine
  Jobs are managed by Grid Engine (Sun/Oracle/Open Grid Scheduler)
  Two kinds of job: interactive jobs and batch jobs
  Submit a job from the frontend node, and Grid Engine sends it to a free slot on an execution node
Interactive jobs
  Requested via the qlogin command on the frontend node
  Short jobs only, e.g. running individual commands in R or interactive Python
  Pros: interactivity, graphics, quick and simple to use
  Cons: you lose your job if your connection to Buster is interrupted; lower processor scheduling priority
  Please remember to exit at the end of your session!
Batch jobs
  Submitted with qsub from the frontend node, using a job submission script
  Ideal for long jobs run in batch mode
  Pros:
    Allows requests for multiple processors
    Provides the task array facility
    Saves the standard output and error buffers to disk
    Allows you to log out and get on with something else while your job runs
    High-priority processor scheduling
  Cons:
    You have to write a job submission script
    No interactivity or graphics
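Batch jobs: example job script
  /usr/sge/examples/jobs/r-example.sh

    #!/bin/bash
    # Run the job under bash
    #$ -S /bin/bash
    # Write standard output and standard error to files in your storage area
    #$ -o /storage/$USER/r-example.stdout
    #$ -e /storage/$USER/r-example.stderr
    # Request 500MB of virtual memory and set the hard run-time limit
    #$ -l h_vmem=500m,h_rt=0

    # Set up the environment, then load R through the module system
    . /etc/profile
    module add R

    cd /storage/$USER
    # Run R non-interactively on the commands up to EOF, timing the run
    time R --vanilla << EOF
    x <- runif(100)
    pdf("r-output.pdf")
    plot(x)
    dev.off()
    EOF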
Submitting jobs
  Simply: $ qsub <path to script>
  Queues:
    veryshort   1 hour
    short       12 hours
    medium      24 hours
    long        48 hours
    unlimited
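The queue limits above can be turned into a small helper that picks the shortest queue able to hold a job of a given length. This is only a sketch: pick_queue is a hypothetical helper, and whether Buster expects the queue to be named explicitly with qsub's -q option, rather than inferred from the requested run time, is an assumption not confirmed by the slides.

```shell
#!/bin/bash
# pick_queue HOURS: print the shortest queue whose limit covers HOURS
# (a whole number of hours). Hypothetical helper, not part of Grid Engine.
pick_queue() {
    local hours="$1"
    if   [ "$hours" -le 1 ];  then echo "veryshort"
    elif [ "$hours" -le 12 ]; then echo "short"
    elif [ "$hours" -le 24 ]; then echo "medium"
    elif [ "$hours" -le 48 ]; then echo "long"
    else                           echo "unlimited"
    fi
}

# Assumed usage: -q is the standard Grid Engine option for naming a queue.
QUEUE="$(pick_queue 10)"
# Print the submission command rather than running it.
echo "qsub -q ${QUEUE} r-example.sh"
```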
Batch jobs: advanced options
  Task arrays: instruct Grid Engine to run multiple instances (tasks) of your algorithm
    -t 1-N runs N tasks, numbered 1 to N
    -t 1-N:5 numbers the tasks from 1 to N in steps of 5 (i.e. 1, 6, 11, ...)
  Parallel environments: for running parallel algorithms only!
    Shared memory (smp) or distributed memory (mpi)
    -pe <smp|mpi> n
  See intranet for further details
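A minimal sketch of a task-array job script, assuming 100 replicates handled in blocks of 5. Grid Engine sets SGE_TASK_ID and SGE_TASK_STEPSIZE for each array task; the fallback values, the n_total variable and the per-replicate echo are hypothetical stand-ins so that the script can also be tried outside the cluster.

```shell
#!/bin/bash
#$ -S /bin/bash
# Tasks are numbered 1, 6, 11, ..., 96; each handles a block of 5 replicates.
#$ -t 1-100:5

# Hypothetical total; must match the upper bound given to -t above.
n_total=100

# Set by Grid Engine for array tasks; the defaults allow a local dry run.
first="${SGE_TASK_ID:-1}"
step="${SGE_TASK_STEPSIZE:-5}"

# Last replicate in this task's block, capped at the overall total.
last=$(( first + step - 1 ))
if [ "$last" -gt "$n_total" ]; then
    last="$n_total"
fi

i="$first"
while [ "$i" -le "$last" ]; do
    # Placeholder for the real per-replicate work.
    echo "processing replicate $i"
    i=$(( i + 1 ))
done
```

Using the task number as the start of a block lets one script cover every replicate without any task needing to know about the others.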
Monitoring jobs
  Monitoring jobs: qstat reports job status
    To see how busy the queue is: qstat -u \*
  Killing jobs: qdel deletes jobs
    Requires the job number (use qstat to find it)
    To kill all your jobs at once: qdel -u <username>
  [Figure: % utilisation by month, Jul to Nov]
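Since qdel needs the job number, it can be convenient to capture it at submission time. The sketch below extracts the number from the confirmation message that Grid Engine's qsub normally prints for a single job; the sample message and the job number 12345 are illustrative, and job_id_from_message is a hypothetical helper.

```shell
#!/bin/bash
# job_id_from_message MESSAGE: print the job number from a qsub confirmation
# of the form: Your job <id> ("<name>") has been submitted
job_id_from_message() {
    # The job number is the third whitespace-separated field.
    echo "$1" | awk '{ print $3 }'
}

# Illustrative message; on the cluster this would come from "$(qsub <script>)".
SAMPLE='Your job 12345 ("r-example.sh") has been submitted'
JOB_ID="$(job_id_from_message "$SAMPLE")"

# Print the follow-up commands rather than running them.
echo "qstat -j ${JOB_ID}"   # detailed status of this job
echo "qdel ${JOB_ID}"       # delete this job
```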
3 Software
Available software
  Applications: R, Maple, Ox, GGobi, Scilab, Octave
  Scripting languages: Python (+numpy), Perl, R, J
  Libraries: GSL, ATLAS, LAPACK, Boost, GNU Multiprecision, SPRNG, JAGS, ACML
  Compilers: GNU Compiler Collection, Sun Java 6 SE SDK
4 Support
Buster support
  Using Buster:
    1 Command man pages
    2 Web documentation: Dept. Homepage > Intranet > Local IT Info > Cluster
    3 Forum: New Forum
    4 Sysops: Phil Harvey-Smith & Simon Parkes
  Help on specific software:
    1 Software package documentation
    2 Web documentation (FAQs etc.)
    3 Mailing lists
    4 Google
Buster forum
  http://forums.warwick.ac.uk/wf > Departments > Stats > Buster
  A new forum has been set up to:
    1 provide hints and tips
    2 share solutions: if you spend a long time finding a solution to a problem, others might benefit from your answer
    3 make requests for new or upgraded software
    4 request help, on anything from trivial questions to research problems