Work Environment. David Tur, HPC Expert. HPC Users Training, September 18th, 2015




1. Atlas Cluster: Accessing and using resources
2. Software Overview
3. Job Scheduler

1. Accessing Resources DIPC technicians did the job for us :-) http://dipc.ehu.es/cc/computing_resources/index.php/atlas

1. Accessing Resources Log in to the access node ac.sw.ehu.es: ssh username@ac.sw.ehu.es

1. Accessing Resources You can also connect to a specific login node manually: ssh username@ac-01.sw.ehu.es ssh username@ac-02.sw.ehu.es

1. Accessing Resources Establish a connection with atlas.sw.ehu.es: ssh username@atlas.sw.ehu.es

1. Accessing Resources You will need to bring your files and data over, compile your code (or use a precompiled binary), and create a batch submission script. Submit the script so that your application runs on the compute nodes. Pay attention to the various file systems available and to the choice of programming environment. A rough sketch of this workflow is shown below.
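A minimal sketch of that workflow; the project and script names are illustrative:
scp -r myproject/ username@atlas.sw.ehu.es:~/   # copy your files and data to the cluster
ssh username@atlas.sw.ehu.es                    # log in to the access node
cd ~/myproject && make                          # compile your code on the login node
qsub myscript                                   # submit the batch script to run on the compute nodes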


2. Software Overview snow! is a suite based on Open Source software, designed to manage HPC infrastructures. snow! is easy to use and installs all the necessary tools to deploy and operate a computing cluster, such as the OS, monitoring tools, a cluster filesystem, a batch queue system, and parallel and mathematical libraries.


2. Software Overview Critical Services All critical services are isolated in VMs (paravirtualization) behind a high-availability and load-balancing (HA + LB) layer. Development & Scientific Software snow! relies on EasyBuild for building development and scientific software.

2. Software Overview Deploy & Cloning System snow! uses native tools to deploy Linux distributions (Debian, SLES, RHEL and CentOS) and SystemImager for cloning via torrent. Configuration Manager snow! relies on CFEngine for configuration management, which includes pre-built, tuned OS setups for each role and usage: HPC, HTC and Big Data (HPDA). CFEngine has a very small memory footprint, runs fast, and has few dependencies.

2. Software Overview Focused on Performance & Efficiency All the software is tuned for the HPC solution (architecture, hardware setup, etc.). Designed for Mission-Critical Environments snow! is bullet-proof by design and also provides strategies to minimize downtime. Designed for Large-Scale Systems snow! scales easily: more physical nodes can be added to host more critical services.

2. Software Overview Enhances Security & Privacy in HPC Environments snow! includes network segmentation capabilities and proactive security for DMZ services. Minimizes Time to Production snow! reduces the effort of designing, building, managing and administering HPC facilities. EasyBuild Integration EasyBuild is a software build and installation framework that lets you manage (scientific) software on HPC systems in an efficient way.
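As an illustration (not from the slides), a typical EasyBuild session could look like this; the package and easyconfig names are hypothetical:
eb --search GROMACS                        # search the available easyconfig files
eb GROMACS-5.0.4-foss-2015a.eb --robot     # build and install, resolving dependencies automatically
module avail GROMACS                       # the installation is exposed as an environment module
module load GROMACS/5.0.4-foss-2015a       # load it into your environment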


2. Software Overview BeeGFS (formerly FhGFS) is a parallel cluster file system, developed with a strong focus on performance and designed for very easy installation and management. BeeGFS transparently spreads user data across multiple servers. By increasing the number of servers and disks in the system, you can scale the performance and capacity of the file system to the level you need.
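For illustration, assuming the BeeGFS client tools are installed on the login nodes, the file system can be inspected like this (the file path is hypothetical):
beegfs-df                                    # free space on metadata and storage targets
beegfs-ctl --listnodes --nodetype=storage    # list the storage servers
beegfs-ctl --getentryinfo /scratch/mydata    # show how a file is striped across targets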

3. Job Scheduler The queueing system is Torque with the Maui scheduler. TORQUE is a distributed resource manager providing control over batch jobs and distributed compute nodes. You need three main commands to use the system: showq, qsub, and qstat.


3. Job Scheduler showq This displays the main job queue. It shows three types of job: those running, those waiting to run, and those blocked for various reasons. It orders the waiting jobs by priority. This is the command to use when you want to know where your job is in the queue, or to check if it has been blocked.
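For example (the filter flags below are those of the Maui showq):
showq       # full listing: running, waiting and blocked jobs
showq -r    # show only the running jobs
showq -i    # show only the idle (waiting) jobs
showq -b    # show only the blocked jobs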

3. Job Scheduler qstat 'qstat -q' shows you the available job classes, or queues. Note that 'qstat -q' does not give you all of the interesting information about a given class: 'qstat -Q' gives a different summary for each class, and 'qstat -Qf classname' prints all of the user-readable information about the class called classname. 'qstat -f jobid' gives a full listing of the details for the job whose number is jobid; this is useful for debugging, for example when a job won't run or is running in the wrong place. 'qstat -n jobid' tells you which node(s) the job with number jobid is running on, which is useful to know if you are writing to non-shared filesystems. These commands are summarized below.
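In summary (the queue name and job id are illustrative):
qstat -q           # summary of the available queues
qstat -Q           # one-line status for each queue
qstat -Qf serial   # full details for the queue called 'serial'
qstat -f 12345     # full details for job 12345 (useful for debugging)
qstat -n 12345     # node(s) on which job 12345 is running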

3. Job Scheduler qsub submits a job. In its simplest form you would run 'qsub myscript', where myscript is a shell script that runs your job. Your script will be interpreted by your login shell; any #! line at the top is ignored. qsub has lots of command-line options, but you can make things easier by setting most of the available options within your job script. Here's a very short example job script:
# Set some Torque options: class name and max time for the job
#PBS -q serial
#PBS -l walltime=2:00:00
# Change to the directory you submitted the job from
cd $PBS_O_WORKDIR
# Run the program
/home/dtur/prog.exe input.dat
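Submitting and then checking on the job might look like this; the job id and server name are illustrative:
qsub myscript    # prints the job id, e.g. 12345.ac.sw.ehu.es
qstat 12345      # check the job's status
showq            # see where the job sits in the queue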

Thanks for your attention. David Tur, PhD, HPC Expert david.tur@hpcnow.com www.hpcnow.com