General Overview. Slurm Training 15. Alfred Gil & Jordi Blasco (HPCNow!)
1 Slurm Training 15
2 Agenda
About Slurm
Key Features of Slurm
Extending Slurm
Resource Management
Daemons
Job/step allocation (SMP, MPI, Parametric)
Job monitoring
Accounting
Scheduling
Administration
3 About Slurm
4 About Slurm
5 About Slurm
Slurm was originally an acronym for Simple Linux Utility for Resource Management. Development started in 2002 at LLNL, and it evolved into a capable job scheduler through the use of optional plugins. In 2010 the Slurm developers founded SchedMD. It has about 500,000 lines of C code, so it is not "Simple" anymore; it is now called the Slurm Workload Manager. It supports AIX, Linux, Solaris and other Unix variants, and it is used on many of the world's largest computers. Official website:
6 About Slurm
Support
Community support is available through the Slurm mailing list. We strongly recommend the 3rd-level support provided by SchedMD.
7 Key Features Available in Slurm
- Full control over CPU usage
- Topology awareness
- Full control over memory usage
- Job preemption support
- Scheduling techniques & performance
- Job requeue mechanism
- Job array support
- Integration with MPI
- Interactive sessions support
- High availability
- Debugger friendly
- Suspension and resume capabilities
- Kernel-level checkpointing & restart
- Job migration
- Job profiling
8 Extending Slurm
Extending Slurm features:
- Write a plug-in
- C API (slurm.h)
- Perl API
- Python API
Contributing back to Slurm (see the sketch below):
1 Fork the Slurm repository on GitHub.
2 Clone your fork.
3 Push your changes to your GitHub repository.
4 Create a pull request to the SchedMD GitHub repository.
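A minimal command-line sketch of that contribution workflow, assuming a hypothetical GitHub username and topic branch name:

# Fork github.com/SchedMD/slurm in the GitHub web UI first, then:
git clone https://github.com/<your-username>/slurm.git
cd slurm
git checkout -b my-plugin-fix          # hypothetical topic branch
# ...edit the code...
git commit -am "Describe the change"
git push origin my-plugin-fix          # upload the changes to your fork
# Finally, open a pull request against SchedMD/slurm in the GitHub web UI.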
9 Resource Management
Manage resources within a single cluster:
- Nodes: state (up/down/idle/allocated/mix/drained), access
- Jobs: queued and running, start/stop, suspended, resizing, completing
10 Job States
Figure: Job states and job flow. Source: SchedMD
11 Definitions of Socket, Core & Thread
Figure: Definitions of socket, core and thread. Source: SchedMD
12 Slurm daemons
- slurmctld: central management daemon (master/slave)
- slurmdbd (optional): accounting database daemon
- slurmd: runs on every compute node
- slurmstepd: started by slurmd for every job step
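A quick, hedged sketch of how to check that these daemons are alive, assuming systemd-based nodes:

scontrol ping                 # is the primary/backup slurmctld responding?
systemctl status slurmctld    # on the management node
systemctl status slurmd       # on a compute node
sinfo -R                      # nodes that are down or drained, and why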
13 Job/step allocation
- sbatch: submits a script job.
- salloc: creates an allocation.
- srun: runs a command across nodes.
- srun_cr: runs a command across nodes with checkpoint & restart.
- sattach: connects stdin/out/err to an existing job or job step.
- scancel: cancels a running or pending job.
- sbcast: transfers a file to the compute nodes allocated to a job.
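A minimal sketch of how these commands fit together in an interactive session (the job ID and file names are hypothetical):

salloc --nodes=2 --time=00:30:00     # create an allocation (prints a job ID)
srun hostname                        # run a job step on every allocated node
sbcast input.dat /tmp/input.dat      # copy a file to every allocated node
sattach <jobid>.0                    # attach to a job step (here step 0)
scancel <jobid>                      # release the allocation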
14 srun: a simple way to manage MPI, OpenMP, pthreads & serial jobs
Slurm provides a single command line tool to manage all the MPI flavors, OpenMP, pthreads and serial applications. Users don't need to worry about the flags and options of each MPI implementation (mpirun/mpiexec/mpiexec.hydra). The tool is called srun and it should be mandatory for submitting jobs in the cluster. It provides key features like:
- memory affinity
- CPU binding
- cgroups integration
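A hedged sketch of srun taking the place of mpirun, with explicit binding (the application names are hypothetical; older Slurm releases spell the binding flag --cpu_bind):

# MPI: 4 tasks bound to cores
srun --ntasks=4 --cpu-bind=cores ./my_mpi_app

# OpenMP: a single task with 8 CPUs; the threads inherit the binding
export OMP_NUM_THREADS=8
srun --ntasks=1 --cpus-per-task=8 ./my_openmp_app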
15 sbatch
~$ vim testjob.sl
~$ cat testjob.sl
#!/bin/bash
#SBATCH --nodes=10
srun echo "running on : $(hostname)"
srun echo "allocation : $SLURM_NODELIST"
~$ sbatch testjob.sl
Submitted batch job <jobid>
~$ cat slurm-<jobid>.out
running on : hsw001
allocation : hsw[ ]
18 Submitting a Job
Job Description Example: SMP
#!/bin/bash
#SBATCH -o blastn-%j.%n.out
#SBATCH -D /share/src/blast+/example
#SBATCH -J BLAST
#SBATCH --cpus-per-task=4
#SBATCH --mail-type=end
#SBATCH [email protected]
#SBATCH --time=08:00:00
ml blast+
export OMP_NUM_THREADS=4
srun blastn
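To keep the thread count in sync with the requested allocation, a common pattern (a sketch, not taken from the slide) is to derive it from Slurm's environment instead of hard-coding it:

# inside the batch script:
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
srun blastn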
19 Submitting a Job
Job Description Example: MPI
#!/bin/bash
#SBATCH -o migrate-%j.%n.out
#SBATCH -D /share/src/migrate/example
#SBATCH -J MIGRATE-MPI
#SBATCH --ntasks=96
#SBATCH --mail-type=end
#SBATCH [email protected]
#SBATCH --time=08:00:00
ml migrate/3.4.4-ompi
srun migrate-n-mpi parmfile.testml -nomenu
20 Submitting a Job
Parametric Job
Submitting a 10,000-element job array takes less than 1 second:
sbatch --array=1-10000 jobarray.sh
To identify the input files or the parameters, use the SLURM_ARRAY_TASK_ID environment variable as the array index.
21 Submitting a Job
Parametric Job Example: jobarray.sh
#!/bin/bash
#SBATCH -J JobArray
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1024
PARAMS=$(sed -n ${SLURM_ARRAY_TASK_ID}p params.dat)
srun your_binary $PARAMS
params.dat example:
-file your_input_file -x 1 -y 1 -z 1
-file your_input_file -x 1 -y 1 -z 1
-file your_input_file -x 1 -y 1 -z 1
22 Submitting a Job
Parametric Job
The management is really easy:
$ squeue
JOBID   PARTITION  NAME      USER   STATE  TIME  TIME_LIMIT
_[ ]    high       JobArray  jordi  PD     0:00  1:00:00
_1      high       JobArray  jordi  R      0:02  1:00:00
_2      high       JobArray  jordi  R      0:02  1:00:00
$ scancel _[ ]
$ scontrol hold
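For reference, a sketch of the same management operations written out with a hypothetical array job ID of 12345:

scancel 12345_[100-200]     # cancel one slice of the array
scancel 12345               # cancel every remaining task
scontrol hold 12345         # hold all pending tasks of the array
scontrol release 12345      # let them run again
squeue -j 12345 -r          # show one line per array task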
23 Realistic example
#!/bin/bash
#SBATCH -J GPU_JOB
#SBATCH --time=01:00:00        # Walltime
#SBATCH -A uoa99999            # Project Account
#SBATCH --ntasks=4             # number of tasks
#SBATCH --ntasks-per-node=2    # number of tasks per node
#SBATCH --mem-per-cpu=8132     # memory/core (in MB)
#SBATCH --cpus-per-task=4      # 4 OpenMP Threads
#SBATCH --gres=gpu:2           # GPUs per node
#SBATCH -C kepler
ml GROMACS/4.6.5-goolfc hybrid
srun mdrun_mpi
24
- squeue: shows the status of jobs.
- sinfo: provides information on partitions and nodes.
- sview: GUI to view job, node and partition information.
- smap: CLI to view job, node and partition information.
25 squeue
Show the job ID and allocated nodes for running jobs of the user jordi:
[4925] ~$ squeue -u jordi
JOBID  PARTITION  NAME       USER   ST  TIME  NODES  NODELIST(REASON)
       high       Migrate-N  jordi  PD  0:00  4      (Resources)
       high       Migrate-N  jordi  PD  0:00  4      (Priority)
       high       Migrate-N  jordi  R   0:27  512    hsw[1-512]
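sinfo gives the matching at-a-glance view of nodes and partitions; a few hedged examples:

sinfo -s          # one summary line per partition
sinfo -N -l       # long per-node listing with state and reason
squeue --start    # estimated start times for pending jobs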
26 sview
27 Slurm Commands
Accounting
- sacct: report accounting information by individual job and job step.
- sstat: report accounting information about currently running jobs and job steps (more detailed than sacct).
- sreport: report resource usage by cluster, partition, user, account, etc.
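A couple of hedged examples (the job ID and start date are hypothetical):

sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,State   # a finished job
sstat -j 12345 --format=JobID,AveCPU,MaxRSS                  # a running job
sreport cluster Utilization start=2015-01-01                 # cluster-wide usage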
28 Slurm Commands
Scheduling
- sacctmgr: database management tool; add/delete clusters, accounts, users, etc.; get/set resource limits, fair-share allocations, etc.
- sprio: view the factors comprising a job's priority.
- sshare: view current hierarchical fair-share information.
- sdiag: view statistics about scheduling module operations (execution time, queue length, etc.).
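Hedged examples of each command (the account and user names are hypothetical):

sacctmgr add account research_lab Description="Research lab"   # create an account
sacctmgr add user jordi Account=research_lab                    # attach a user to it
sprio -l     # long listing of the priority factors of pending jobs
sshare -a    # fair-share tree for all users and accounts
sdiag        # scheduler and backfill statistics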
29 Slurm Commands
Slurm Administration Tool
scontrol: administrator tool to view and/or update system, job, step, partition or reservation status. The most popular subcommands are: checkpoint, create, delete, hold, reconfigure, release, requeue, resume and suspend. More information: see the scontrol man page.
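A few hedged examples of common administrative actions (the job ID, node name and reason text are hypothetical):

scontrol show job 12345                                      # full details of a job
scontrol hold 12345                                          # keep a pending job from starting
scontrol release 12345
scontrol update NodeName=hsw001 State=DRAIN Reason="disk replacement"
scontrol reconfigure                                         # re-read slurm.conf on all daemons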