Guillimin HPC Users Meeting. Bryan Caron




November 13, 2014
Bryan Caron
bryan.caron@mcgill.ca
bryan.caron@calculquebec.ca
McGill University / Calcul Québec / Compute Canada
Montréal, QC Canada

Outline
- Compute Canada News
- October Service Interruption De-Brief
- November GPFS Online Maintenance
- Scheduler Updates
- Software and User Environment Updates
- Training News

Compute Canada News
- Resource Allocation Competition: 2015 RAC and Research Platforms & Portals (RPP)
  - Application submission deadline: October 23
  - Results announcement: December 2
- Call for Researcher Visualizations
  - 2D or 3D visualizations from any research area that has leveraged Compute Canada resources
  - Interactive or pre-recorded movies
  - Contact us: guillimin@calculquebec.ca
- Stay tuned for exciting Compute Canada announcements on December 2

Service Interruption De-Brief
- Guillimin service interruption: October 17-18
  - Scheduled outage due to a full ETS campus-wide power interruption for electrical maintenance
  - Start: Friday, October 17; target end: Saturday, October 18
  - Storage system and network maintenance
  - Power outage (23h - 03h)
  - All Guillimin services restored October 19 (afternoon)
- October 20: write time-out errors observed on the GSS nodes of the GPFS cluster (/gs/scratch and /gs/project)
  - Root cause: mismatch between the enclosure firmware and the GPFS version
- October 21: all user access halted in order to stop GPFS and update the firmware on all enclosures
  - Access re-opened the evening of October 21

GPFS Maintenance
- November 6: online replacement of one faulty disk drive enclosure drawer
  - Live drawer replacement was validated by testing under load on our most recent GPFS Storage System, which is not yet in active production
  - Impacted filesystems: /gs/project/ and /gs/scratch/
  - GPFS slowdown during the brief drawer replacement
  - File integrity maintained during the drawer replacement
  - Maximum performance and stability restored
- Note: the /scratch auto-clean-up has been re-scheduled to November 20

Scheduler Update
- In general, improved overall stability and performance
- Recall:
  - April 10: qsub enabled for job submission
  - November 15: msub submissions will be disabled
- qsub provides better support for queue options and faster submissions
- Please ensure that all scripts use the site Torque qsub instead of msub: /opt/torque/x86_64/bin/qsub
- Torque equivalents of Moab commands (old -> new):
  - canceljob -> qdel
  - checkjob -> qstat -f
  - showq -> qstat
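The Moab-to-Torque command mapping above can also be applied mechanically when updating old submission scripts. The following is a minimal Python sketch, not a site-provided tool. It assumes that scripts call the Moab commands by their bare names, and the helper names `translate_line` and `translate_script` are hypothetical. It swaps only the command names (msub becomes the site qsub path given on the slide) and leaves all options untouched. msub and qsub options do not map one-to-one, so translated scripts should still be reviewed by hand.

```python
import re

# Moab -> Torque command equivalents taken from the slide; the msub target
# is the site Torque qsub path given there.
MOAB_TO_TORQUE = {
    "canceljob": "qdel",
    "checkjob": "qstat -f",
    "showq": "qstat",
    "msub": "/opt/torque/x86_64/bin/qsub",
}

# Match bare command words only: not part of a longer word ("mymsub"),
# and not the tail of an explicit path ("/opt/moab/bin/msub"), which is
# left for manual review instead.
_CMD_RE = re.compile(
    r"(?<![\w/.-])(" + "|".join(map(re.escape, MOAB_TO_TORQUE)) + r")(?![\w.-])"
)


def translate_line(line: str) -> str:
    """Replace Moab command names in one line with their Torque equivalents.

    This is plain text substitution: options are not translated, and
    matching words inside comments or strings are rewritten too.
    """
    return _CMD_RE.sub(lambda m: MOAB_TO_TORQUE[m.group(1)], line)


def translate_script(text: str) -> str:
    """Apply translate_line to every line of a script."""
    return "\n".join(translate_line(line) for line in text.splitlines())


if __name__ == "__main__":
    # Hypothetical old-style snippet; job.pbs and 12345 are placeholders.
    old = "msub job.pbs\ncheckjob 12345\nshowq\ncanceljob 12345"
    print(translate_script(old))
```

Full paths to Moab binaries are deliberately not rewritten, because such scripts most likely need a closer look than a simple name swap can give.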

Software Update
- New installations:
  - HDF5/1.8.13-intel
  - OpenFOAM/2.2.2-GCC-OpenMPI
  - SAS 9.3 (licensed for McGill users only)
  - Matlab MDCS version R2014b
- New Matlab MDCS integration scripts, version 2.0
  - Easier configuration of job parameters (ppn, pmem, gpus, etc.)
  - Please see "Parallel Matlab on Guillimin" on our Documentation page for details (http://www.hpc.mcgill.ca/index.php/starthere)

Software Update
- Reminder: Guillimin Hadoop Cluster
  - 10 nodes available for MapReduce / Hadoop workloads
  - Updates in progress:
    - Major increase to the storage pool size per node using GPFS
    - Installation of Hadoop ecosystem components based on the Hortonworks Data Platform (HDP)
  - Please contact guillimin@calculquebec.ca for access
- Hadoop talk @ Big Data Montréal, November 4, by Dan Mazur of McGill HPC / Calcul Québec (http://www.bigdatamontreal.org/)

Training News
- See Training at www.hpc.mcgill.ca for our full calendar of training and workshops for 2014, and to register
- All materials from previous workshops are available online
- Suggestions for training in 2015? Please let us know!
- Upcoming:
  - November 27: Introduction to Xeon Phi
  - December 4: Introduction to GPU / CUDA
  - December 11: Introduction to Matlab Distributed Computing Server
- Recently completed:
  - November 6: Advanced MPI
  - October 23: Introduction to OpenMP
  - October 9: Introduction to MPI

User Feedback and Discussion
- Questions? Comments? We value your feedback.
- Guillimin operational news for users:
  - Status pages:
    - http://www.hpc.mcgill.ca/index.php/guillimin-status
    - http://serveurscq.computecanada.ca (all CQ systems)
  - Follow us on Twitter: http://twitter.com/mcgillhpc