Information Exchange for Users of the Aachen HPC Cluster

1 Information Exchange for Users of the Aachen HPC Cluster
Paul Kapinos, Marcus Wagner

2 Agenda
- (The RWTH Compute cluster)
- Project-based management of the cluster resources
- Interactive usage
- Using the batch system
- Integrative Hosting
- Discussion

3 The RWTH Compute cluster
- No. 32 in TOP500 (June 2011), no. 4 in Germany
- No. 272 in TOP500 (Nov 2014)
- MPI partition
  - 1358 Westmere EP nodes
  - 2x Xeon X5675 (6-core, 3.06 GHz) => 16k cores
  - 24 or 96 GB RAM (4:1)
- SMP partition
  - 88 Nehalem EX nodes
  - 16x Xeon X7550 (8-core, 2.00 GHz) => 11k cores
  - GB RAM
- Interactive front ends and back ends (1% of cluster)
  - 300 cores, max. 256 GB RAM, users p.n.
- (GPU cluster, MIC cluster, old hardware, IH systems)

4 Why?
- Because we must: the Science Council (Wissenschaftsrat) sets requirements for requesting funding for future computer systems:
  - a scientific process for the allocation of the expensive compute resources has to be established which will guarantee fair handling of all users.

5 Why? (II)
- fair distribution of resources
- main goals:
  - correlation between used resources and scientific value
  - defined (short...) job starting times
  - defined, predictable throughput for researchers
- effective and resource-saving usage patterns
  - would you drive fuel-efficiently if you did not have to pay for the fuel?
  - or if you had no clue how much fuel you had burned?
- and, last but not least, to keep some buddies within bounds

6 The status
- Implemented with projects and queues in the LSF batch system
  - JARA-HPC partition (30%): since 2012
  - general introduction: Q3/2014
  - up and running now
- Use a project: add a line to your batch file
  #BSUB -P abcd4321
- Check your quota:
  $ r_batch_usage

7 How? (II)
- Free quota
  - scientific employees: 2000 core-h per month (about a week on a 12-core node per month)
  - students: 500 core-h per month
- Need more? File a project!
  - RWTH Small (S): up to 10,000 core-h p.m. (0.01 mio core-h p.m.)
    - technical review only
  - RWTH Standard (M): up to 0.05 mio core-h p.m.
    - a project description is required (=> internal scientific review)
  - JARA-HPC/RWTH Big (XL/L): up to 2.5 mio core-h p.m.
    - submission twice a year following the JARA-HPC procedures
    - a detailed project description is required (=> external scientific reviews)
- Students need more for a lecture / course / thesis?
  - RWTH lecture, RWTH thesis (XS): up to 20,000 core-h p.m. (*)
    - technical review only
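
The limits on this slide can be turned into a small lookup, sketched here in Python. The helper name suggest_category and the dictionary layout are made up for illustration; only the numbers come from the slide. The XS category (lecture / thesis) is left out because it depends on the purpose, not only on the amount.

```python
# Monthly limits in core-h, copied from the slide.
FREE_QUOTA = {"employee": 2_000, "student": 500}
PROJECT_LIMITS = [
    ("RWTH Small (S)", 10_000),
    ("RWTH Standard (M)", 50_000),
    ("JARA-HPC/RWTH Big (XL/L)", 2_500_000),
]


def suggest_category(core_h_per_month, status="employee"):
    """Return the smallest category that covers the need (hypothetical helper)."""
    if core_h_per_month <= FREE_QUOTA[status]:
        return "free quota"
    for name, limit in PROJECT_LIMITS:
        if core_h_per_month <= limit:
            return name
    return None  # more than the largest category offers


# One 12-core node running for a full week, as in the rule of thumb:
week_on_one_node = 12 * 24 * 7  # 2016 core-h, just above the 2000 core-h free quota
```

For example, suggest_category(40_000) picks RWTH Standard (M), and a student who needs 600 core-h per month is already above the free quota.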

8 How to file an application for computing time
- go to
- decide which type of project you should apply for
- determine your needs; don't be shy!
  - don't try to be too exact
  - it's better to ask for 30% too much than for 1% too little
  - it's easier to ask for round sums (compute time, duration, ...), both for you and for us
- think about special requirements:
  - overlong compute time? (more than 120 h is not possible)
  - disk storage?
  - one huge project, or maybe multiple subprojects?
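
A rough sketch of the "30% too much, round sum" advice in Python. The function names, the rounding step of 1000 core-h, and the use of whole-number inputs are assumptions for illustration; only the 30% margin and the 120 h limit come from the slide.

```python
MAX_JOB_HOURS = 120  # longest job runtime named on the slide


def padded_request(estimate_core_h, margin_percent=30, round_to=1_000):
    """Add a safety margin and round up to a round sum.

    Works on whole numbers to avoid floating-point surprises;
    round_to is an assumed granularity, not a rule of the cluster.
    """
    padded_times_100 = estimate_core_h * (100 + margin_percent)
    step = round_to * 100
    # -(-a // b) is integer division rounded up instead of down.
    return -(-padded_times_100 // step) * round_to


def runtime_is_possible(job_hours):
    """True if a single job fits into the maximum runtime."""
    return job_hours <= MAX_JOB_HOURS
```

An estimate of 7300 core-h per month becomes a request for 10,000 core-h: 7300 plus 30% is 9490, rounded up to the next thousand.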

9 How to file an application for computing time (II)
- go to
- fill in the right form
  - use Acrobat X to edit the PDF file
  - we need the data to be extracted electronically
  - do not use meaningless values like "normal" or "much", e.g. for memory consumption
  - do not cut corners: we do not know who Mr "See Above" is!
- send the electronically readable PDF file to [email protected]
  - do not send us screenshots, JPG, PNG, DOCX, or TXT files
  - do not send us signed+scanned PDFs via e-mail
- print the same file, sign it, and fax or mail it to us
- At the end we need the same document in two versions: signed+legal (thus fax or mail), and electronically readable.

10 How to file an application for computing time (III)
- go to
- filing an application for an RWTH Standard (M) project?
  - a project description is required (for internal scientific reviews)
  - mention if your project is a follow-up project, is funded by some organisation, etc.
- filing an application for a JARA-HPC/RWTH Big (XL/L) project?
  - submission twice a year following the JARA-HPC procedures

11 How to file an application for computing time (IV)
- go to
- Application form filled in, e-mailed, printed out, signed, faxed/mailed? Then wait.
- Typically within a week: a message that both versions of the application form have arrived.
- Some days later:
  1) a message that the project is ready to use (for small projects), or
  2) a message that the project has been set up with a test quota of 0.01 mio core-h per month and that the scientific review process has started (for larger projects)
- For (2), some weeks (or even months) later: a message that the project is approved and the full remaining quota is granted (often the runtime of the project is adjusted, too, according to the delay)
- Yes, we know: this process is really tedious and lengthy.
  - We are working on improving it.
  - But the scientific review, at least, will remain a delaying factor.

12 What happens if you are over quota?
- running jobs continue to the end (and still consume core-h!)
- newly submitted and pending jobs are moved to the low-priority queue
  - they can still start!
  - but if and only if there are free resources not used by normal-priority jobs
- jobs started from the low-priority queue still consume core-h
  - the quota may go well into the red!
  - today, there is no hard limit in the low-priority queue
  - this will very likely change in the future
- on the 1st of every month, the next month's quota is added
  - if your quota is in the black then, new jobs will be submitted to the normal-priority queue and pending jobs will be moved there
- technically, it makes no difference from which queue a job is started
  - only the start time differs!
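
The over-quota rules can be played through with a toy model in Python. The function names are invented, and reading "in the black" as "more than 0 core-h left" is an assumption; the real decision is made by the batch system.

```python
def queue_for(consumable_core_h):
    """Queue for a newly submitted or pending job (hypothetical helper)."""
    # Assumption: "in the black" means strictly more than 0 core-h left.
    return "normal" if consumable_core_h > 0 else "low"


def charge(consumable_core_h, cores, hours):
    """Core-h left after a job; the charge is the same from either queue."""
    # There is no hard limit today, so the result may be negative.
    return consumable_core_h - cores * hours


budget = charge(500, cores=12, hours=48)  # 500 - 576 = -76, in the red
queue_now = queue_for(budget)             # "low"
queue_later = queue_for(budget + 2000)    # "normal" once 2000 core-h are added
```

The job that pushed the budget into the red still ran to the end; only jobs submitted afterwards land in the low-priority queue until new quota arrives.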

13 How is the quota computed?
- the main goal is to motivate users to use the resources continuously, while still allowing some peaks
- three-month sliding window
  - up to 300% of the monthly quota is available in a month
  - unused quota from the previous month is transferred to the current month, but not further
- the quotas for the previous, the current, and the next month are added up
- the consumed core-h for the previous and for the current month are added up
- the difference between both values is the amount of core-h available in the current month
- Huh?
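
For the "Huh?": a minimal sketch of the sliding-window rule in Python. The function and argument names are made up for illustration; the real accounting is done by r_batch_usage and may differ in details such as rounding or how negative values are displayed.

```python
def consumable_core_hours(quota_prev, quota_cur, quota_next, used_prev, used_cur):
    """Core-h available in the current month under the three-month window."""
    granted = quota_prev + quota_cur + quota_next  # previous + current + next month
    consumed = used_prev + used_cur                # previous + current month
    return granted - consumed                      # negative means over quota


# Monthly quota of 2000 core-h in all three months:
nothing_used = consumable_core_hours(2000, 2000, 2000, 0, 0)        # 6000 = 300%
steady_user = consumable_core_hours(2000, 2000, 2000, 2000, 1500)   # 2500
peak_user = consumable_core_hours(2000, 2000, 2000, 5000, 1500)     # -500
```

A user who consumed nothing last month and nothing so far this month can burn three monthly quotas at once; a user who overdrew last month starts the current month with less.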

14 Check your quota now!
- Check your quota:
  $ r_batch_usage -h    (manual of r_batch_usage)
  $ r_batch_usage       (overview; a big terminal window is advisable)
  $ r_batch_usage -q
  User: pk
  Status of user: RWTH-Mitarbeiter
  Quota monthly (core-h): 2000
  Remaining core-h of prev. month:
  Consumed core-h act. month: 3938
  Consumable core-h (%): -101
  Consumable core-h: 0

15 Check your project quota now!
- Usage of the batch system with projects:
  $ r_batch_usage -h                  (manual of r_batch_usage)
  $ r_batch_usage -p <projectname>    (overview)
  $ r_batch_usage -p jara0001 -q
  Group: jara0001
  Start of Accounting Period:
  End of Accounting Period:
  State of project: active
  Quota monthly (core-h):
  Remaining core-h of prev. month: 0
  Consumed core-h act. month:
  Consumable core-h (%): 70
  Consumable core-h:

16 Agenda
- (The RWTH Compute cluster)
- Project-based management of the cluster resources
- Interactive usage
- Using the batch system
- Integrative Hosting
- Discussion

17 Interactive usage
- Batch system:
  - MPI partition: 1358 Westmere EP nodes => 16k cores
  - SMP partition: 88 Nehalem EX nodes => 11k cores
- Interactive front ends and back ends (1% of cluster)
  - 300 cores, max. 256 GB RAM, users p.n.
  - a heavily contended resource!
  - an issue with an interactive node directly affects hundreds of users
  - we want to enable as many test options as possible
  - notorious trade-off between features and stability

18 Interactive usage
- Go to:
- Interactive front ends
  - cluster.rz.rwth-aachen.de, cluster-linux.rz.rwth-aachen.de (main front ends)
  - cluster-copy.rz.rwth-aachen.de, cluster-copy2.rz.rwth-aachen.de (for file transfer only)
  - cluster-x.rz.rwth-aachen.de, cluster-x2.rz.rwth-aachen.de (GUI / remote desktop)
  - ... and others
  - supported protocols: SSH (with X11 forwarding), SCP, remote desktop (FastX/XWin32)
    $ ssh -X -l ab cluster.rz.rwth-aachen.de
  - accessible from the RWTH network only (VPN helps!)
- Interactive back ends
  - used to off-load MPI processes started on front ends
  - off-loading is managed by the interactive MPIEXEC wrapper
  - hardware subject to change; currently: 8x 12-core Westmere with 96 GB RAM

19 Interactive usage
- Go to:
- Interactive front ends are frequented by hundreds of users! Any issue directly interrupts the work of these users!
- Purposes: data transfer, job submission, application porting, testing, tuning, debugging
- NOT FOR PRODUCTION RUNS: USE THE BATCH SYSTEM
- Rule of thumb: not more than 20 minutes of CPU time
  - that does not mean "I can start 80 runs of 19.5 minutes each, one after another"!
- Really need compute power and an interactive session? Batch jobs with GUI:
- To still allow advanced testing, we set flexible quotas using the cgroup system
  - CPU: the processes of a user are configured to get the same amount of CPU cycles as all processes of any other user
  - Memory: real memory is limited to a part of the available RAM
    - this prevents the situation where one user consumes all RAM and crashes the whole node
    - use the memquota command to find out the current situation

20 Interactive usage
- Go to:
- Interactive front ends are frequented by hundreds of users! Any issue directly interrupts the work of these users!
- Main performance issue on front ends: DATA TRANSFER
- Use the dedicated front ends for any data transfer, TAR, ZIP, ...
  - cluster-copy.rz.rwth-aachen.de
  - cluster-copy2.rz.rwth-aachen.de

21 Interactive usage
- Go to:
- Interactive back ends
  - not available for login
  - hardware subject to change (currently: 8x 12-core Westmere with 96 GB RAM)
  - used to off-load MPI processes started on front ends (reduce load!)
  - off-loading is managed by the interactive MPIEXEC wrapper, example:
    $MPIEXEC -np 2 hostname
  - processes are started on the less-loaded nodes, but massive overloading is allowed
  - furthermore, you're not alone on these systems: a load of 100+ is not unusual
  - any production runs and time measurements are absurd to the highest degree
  - the only sensible use of (overloading) test runs: testing "will my binary start with XYZ ranks?"
    - if yes, press Ctrl-C and proceed to the batch system
    - if not, you've got the reply immediately (instead of waiting a day for the batch job)
- NOT FOR PRODUCTION RUNS: USE THE BATCH SYSTEM

22 Interactive usage: Changed Terms of Use
- passing along your HPC account to third parties is explicitly forbidden
  - secondary logins will be gradually deactivated during the next months
  - secondary accounts (after 05/2014) are already configured without login permissions
  - the only use of secondary accounts now: data sharing
- Jobs and processes (in batch, on interactive front ends and back ends) which disturb other jobs/processes may be killed without further notice.
  - If your job has been killed, you probably did something bad: read the documentation!

23 "Would You Like to Know More?"
- Links
  - The Primer, it's worth reading: (slightly outdated now, update planned)
  - The Dokuweb, it's the most up-to-date and worth reading, too:
    - Linux Cluster:
    - Parallel Programming:
    - Events:
- Today's themes in Dokuweb:
  - Project-based management of the cluster resources
  - Manuals:
  - Interactive usage:
  - Batch system: