Broadening Moab/TORQUE for Expanding User Needs


Gary D. Brown, HPC Product Manager
CUG 2016
© 2016 Adaptive Computing Enterprises, Inc.

Agenda
- DataWarp
- Intel MIC KNL
- Viewpoint Web Portal
  - User Portal
  - Administrator Portal
  - Remote Visualization
- Nitro HTC

Agenda
- DataWarp (fast I/O)
- Intel MIC KNL
- Viewpoint Web Portal
  - User Portal
  - Administrator Portal
  - Remote Visualization
- Nitro HTC

Use Cases
- Stage job data with fast SSD-based storage
  - Input data before job starts
  - Output data after job ends
- Fast persistent storage for job workflows
- Fast job checkpoint/restore

Cray DataWarp
- SSD-based storage system independent of parallel file system (PFS)
- Hosted on DataWarp (DW) service nodes
  - Contain actual SSDs
  - Nodes connect to Aries high-speed network
- DataWarp Service (DWS)
  - Organizes DW storage for jobs and persistent storage
  - Manages SSD storage across DW service nodes

Cray DataWarp Physical Architecture
- DW storage
  - 2-hop roundtrip over Aries HSN
  - Electronic access times
- PFS storage
  - 4-hop roundtrip (2-hop Aries HSN, 2-hop InfiniBand)
  - Electro-mechanical access times
- DWS not shown

Cray DataWarp Support
- DataWarp storage types
  - Job-based
  - Persistent
- Script-based integration with Moab/TORQUE
  - DW-enabled jobs processed by Moab job submit filter
  - Moab automatically creates DW job workflow

DataWarp Integration Project
- All DW storage scheduled by Moab
  - Job DW storage by job scheduling
  - Persistent DW storage by reservation scheduling
- Out-of-the-box functionality
  - Some site-specific configuration required

Moab 3-Job DataWarp Model
- Follows Moab data-staging model
- Uses Moab system jobs to create/destroy DW storage and stage data in/out
- Automatically creates 3-job workflow
  - DW creation / input data-staging system job
  - User job
  - Output data-staging / DW destruction system job

DataWarp-enabled Job Workflow
- User submits user job
- Job Submit Filter detects #DW directives
- Job Submit Filter creates 3-job workflow
  1. DW Creation / Input Data-staging System Job
  2. User Job
  3. Output Data-staging / DW Destruction System Job
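The detection step can be sketched in shell. This is only an illustration of the idea, not Moab's actual job submit filter: the function name has_dw_directives and both sample scripts are hypothetical, and the #DW jobdw line is an assumed example in the style of Cray DataWarp directives.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the job script text in $1
# contains at least one line that starts with "#DW".
has_dw_directives() {
    printf '%s\n' "$1" | grep -q '^#DW'
}

# Sample job scripts, for illustration only.
dw_script='#!/bin/sh
#DW jobdw type=scratch access_mode=striped capacity=1TiB
./mysim'

plain_script='#!/bin/sh
./mycheck'

for script in "$dw_script" "$plain_script"; do
    if has_dw_directives "$script"; then
        echo "DataWarp-enabled: would become a 3-job workflow"
    else
        echo "plain job: submitted unchanged"
    fi
done
```

A real filter would also have to parse the directive arguments to size the DW storage request; the sketch covers only the yes/no decision.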

Automatic Job/DW Workflow Generation
- What if a user submits a multiple-job workflow and some user jobs in the workflow use DataWarp storage?
- Job Submit Filter creates proper DW job dependencies, but not for the entire user job workflow
- Example
  - User Job A: data validation
  - User Job B (DW): compute (uses DataWarp)
  - User Job C: visualization

Automatic Job/DW Workflow Generation
- User job dependencies
  - User Job A (data validation)
  - User Job B (compute, DataWarp-enabled job) runs after Job A ends ok
  - User Job C (visualization) runs after Job B ends ok
- Job workflow generation script must work with Job Submit Filter to create proper job dependencies (solid lines in the slide diagram)

Multi-Job Workflow Generation Script

# Submit data validation job A
jobaid=`msub -l nodes=1,walltime=10:00 mycheck.sh`

# Submit DataWarp-enabled compute job B
# (3 jobids returned (dwin, user, dwout); put into array)
jobbids=(`msub -l nodes=16,walltime=01:30:00,\
depend=afterok:${jobaid}...\
mysim.sh`)

# Submit visualization job C
jobcid=`msub -l nodes=1,walltime=30:00,\
depend=afterok:${jobbids[1]}:${jobbids[2]} \
myviz.sh`
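How the three returned job IDs feed the dependency for job C can be shown without a scheduler. The sketch below assumes, as the script comment states, that a DataWarp-enabled submission returns three whitespace-separated IDs in the order dwin, user, dwout. The IDs themselves are made up, and msub is replaced by a fixed string so the example runs anywhere.

```shell
#!/bin/sh
# Stand-in for the msub output of a DataWarp-enabled job (IDs are invented).
msub_output="Moab.101 Moab.102 Moab.103"

# Split the output into positional parameters: dwin, user, dwout.
set -- $msub_output
dwin_id=$1
user_id=$2
dwout_id=$3

# Job C must wait for both the user job and the
# output data-staging / DW destruction system job.
depend="depend=afterok:${user_id}:${dwout_id}"
echo "$depend"
```

The positional parameters $2 and $3 correspond to the bash array elements jobbids[1] and jobbids[2] in the script above, since bash arrays are zero-based.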

Persistent DataWarp Storage
- Moab schedules and tracks all DW storage capacity
- Moab persistent DW storage uses DW storage reservation
- Persistent DataWarp storage management
  - Created by reservation start trigger
  - Destroyed by reservation end trigger
  - Jobs get mount point

Benefits
- Moab schedules and manages both job and persistent DataWarp storage
- Users have a good experience
  - No race conditions that cause cancelled jobs
- Jobs have fast I/O
  - Jobs execute in less time

Agenda
- DataWarp
- Intel MIC KNL (re-provisioning)
- Viewpoint Web Portal
  - User Portal
  - Administrator Portal
  - Remote Visualization
- Nitro HTC

Use Case
- Intel introduces 2nd-gen Xeon Phi processor
- KNL supports
  - 3 different NUMA configurations with 5 modes
  - 3 different MCDRAM configurations with 4 modes
  - 20 possible NUMA/MCDRAM configurations (5 x 4)
- Some applications execute fastest with a specific NUMA/MCDRAM configuration

Intel MIC Knights Landing (KNL)
- 5 NUMA configurations for cores and DDR4 and MCDRAM memories
  - No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4
  - Abbreviations: a2a, hemi, quad, snc2, snc4
- 4 MCDRAM mode configurations
  - Cache, flat, 50/50 hybrid, 25/75 hybrid
  - Abbreviations: cache, flat, equal, split

Moab Cray KNL Support
- 20 pre-defined combinations
- Select combination as OS at job submission
  - msub -l nodes=20,os=cle_snc4_flat myjobscript
- Moab re-provisions Cray compute nodes when necessary
  - Uses Moab's built-in re-provisioning capability
  - Quickly re-provisions Cray KNL compute nodes for jobs requesting specific NUMA/MCDRAM modes
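The 20 combinations follow from pairing each of the 5 NUMA modes with each of the 4 MCDRAM modes. The loop below enumerates them. Only cle_snc4_flat appears in the slides, so the cle_<numa>_<mcdram> pattern used for the other 19 names is an assumption inferred from that one example.

```shell
#!/bin/sh
# Mode abbreviations as listed on the KNL slide.
numa_modes="a2a hemi quad snc2 snc4"
mcdram_modes="cache flat equal split"

count=0
os_names=""
for numa in $numa_modes; do
    for mcdram in $mcdram_modes; do
        # Assumed naming pattern, inferred from the example cle_snc4_flat.
        os_names="$os_names cle_${numa}_${mcdram}"
        count=$((count + 1))
    done
done

echo "$count combinations:"
for name in $os_names; do
    echo "  $name"
done
```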

Cray KNL Re-provisioning Process
1. User submits job requiring specific KNL NUMA / MCDRAM configuration
   - msub -l nodes=20,os=cle_snc4_flat myjob
2. Moab schedules KNL job to nodes
3. Moab runs NodeModifyURL script to modify compute node configurations and reboot them
4. Script executes Cray capmc to set KNL NUMA node and MCDRAM configurations and to reboot nodes
5. Cray's capmc modifies nodes' BIOS settings and reboots nodes
6. Cray reports new NUMA node / MCDRAM configurations
7. pbs_reporter maps NUMA node / MCDRAM configuration to OS name and reports OS name to pbs_server
8. TORQUE pbs_server passes new OS name to Moab, which starts the KNL job
[Diagram: Moab Scheduler, Node Modify Script, capmc, TORQUE pbs_server, TORQUE pbs_reporter, and ALPS, connected by the steps above]
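The "when necessary" decision comes down to comparing the OS name a node currently reports with the one the job requests. The sketch below is a hypothetical illustration of that comparison, not code from Moab or TORQUE: os_name_for and needs_reprovision are invented names, and the cle_<numa>_<mcdram> naming is assumed from the single example in the slides.

```shell
#!/bin/sh
# Hypothetical mapping from a reported NUMA/MCDRAM pair to an OS name.
os_name_for() {
    echo "cle_$1_$2"
}

# Succeeds when the node's current OS name differs from the requested one.
needs_reprovision() {
    [ "$1" != "$2" ]
}

current=$(os_name_for quad cache)   # what the node reports now (example values)
requested="cle_snc4_flat"           # what the job asked for

if needs_reprovision "$current" "$requested"; then
    echo "re-provision: $current -> $requested"
else
    echo "node already matches $requested"
fi
```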

Benefits
- Users can run applications with the KNL NUMA and MCDRAM configuration for optimal job execution
- Easy to request a specific KNL configuration with job submission

Agenda
- DataWarp
- Intel MIC KNL
- Viewpoint Web Portal (non-technical users)
  - User Portal
  - Administrator Portal
  - Remote Visualization
- Nitro HTC

Use Case
- Many HPC sites asked to expand user base
- Non-technical users
  - What is a command line?
- Non-traditional users
  - Non-research engineers and scientists
  - Humanities
  - Local or regional industries with no HPC experience

Agenda
- DataWarp
- Intel MIC KNL
- Viewpoint Web Portal (non-technical users)
  - User Portal (job submission)
  - Administrator Portal
  - Remote Visualization
- Nitro HTC

Use Case
- Someone said I need to use xyz program
- I have data
  - Someone gave me my data
  - Someone mentored/trained me and I created my own data
  - I taught myself and created my own data
- How do I use xyz with my data?

Log in as user to Viewpoint Web Portal

Click Templates on your User Dashboard

Use xyz application template

Fill in application template

Choose a time limit

Tell it where your data file is

Submit your xyz job

Click Workload to check on your xyz job

Check out your xyz job's details

Look at your job's statistics

Download your xyz job's results

Benefits
- Allows non-technical users to submit jobs using pre-defined application templates
- Application templates contain
  - Job script for running application
  - Web form design with pre-defined and/or custom widgets and their variables
- Job submission replaces variables within job script with form's widget values
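The variable replacement step can be illustrated with a small shell sketch. The slides do not show Viewpoint's template syntax, so the {{NAME}} placeholder style, the variable names, and the xyz command line are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical job script template; the {{NAME}} placeholder syntax is an assumption.
template='#!/bin/sh
#PBS -l walltime={{WALLTIME}}
xyz --input {{DATAFILE}}'

# Values a user might enter in the web form (illustrative).
walltime="01:00:00"
datafile="/home/user/data.in"

# Replace each placeholder with its form value. "|" is the sed delimiter
# so that slashes in file paths need no escaping.
job_script=$(printf '%s\n' "$template" |
    sed -e "s|{{WALLTIME}}|$walltime|g" -e "s|{{DATAFILE}}|$datafile|g")

printf '%s\n' "$job_script"
```

Form values containing "|", "&", or backslashes would need escaping before being passed to sed; the sketch leaves that out.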

Agenda
- DataWarp
- Intel MIC KNL
- Viewpoint Web Portal (non-technical users)
  - User Portal
  - Administrator Portal (application templates)
  - Remote Visualization
- Nitro HTC

Use Case
- Administrator needs to support many non-technical users and their applications
- Define application templates using editor
  - Define and customize application template form using pre-defined widget types
  - Define widgets' variable name, default value, and other characteristics to build form
  - Create job script

Log in as admin to Viewpoint Web Portal

Click on Templates in Admin Dashboard

Choose an application template

View application template history

Edit and design application template

Edit application template (lower part)

Choose Basic Job Settings widgets

Choose Resources widgets

Choose Node Policies Settings widgets

Design custom xyz Settings widgets

Choose custom widget from widget type list

Create/import job script with Script Builder

Add widget variables to job script

Benefits
- Administrators can create application templates for non-technical users.
- Non-technical users have a simple way to submit jobs for specific applications.

Agenda
- DataWarp
- Intel MIC KNL
- Viewpoint Web Portal (non-technical users)
  - User Portal
  - Administrator Portal
  - Remote Visualization (graphical interfaces)
- Nitro HTC

Use Cases
- Users need to visualize their data to gain insights from it.
- Desktops or workstations with decent visualization hardware can be expensive.
- Data can be too large to move to another location.
- Non-technical users need a way to easily set up a remote visualization session.

Choose Remote Visualization Templates

Choose Remote Desktop App Template

Choose Graphical Desktop Type

Submit Remote Desktop Job

Remote Desktop Job's Compute Node

Remote Desktop Session Screen

Remote Visualization Session Controls

Resized Remote Viz Session Desktop

Job Detail with Remote Viz Session

Canceling the Remote Desktop Session

Choose ParaView Application Template

ParaView Input Data file selection

Submitting ParaView Remote Viz Job

ParaView Remote Viz Session

Adjusting Session Image Quality

Remote Desktop / ParaView Session

Benefits
- Technical and non-technical users can easily create remote visualization sessions with their applications and data.

Agenda
- DataWarp
- Intel MIC KNL
- Viewpoint Web Portal (non-technical users)
  - User Portal
  - Administrator Portal
  - Remote Visualization
- Nitro (HTC job scheduling)

Use Cases
- A user can have thousands or millions of single-core to single-node jobs.
  - Parameter sweeps
  - Monte Carlo simulations
  - Regression tests
  - Other
- Large job queues
- Long scheduling cycles

Nitro HTC Scheduler Application
- Scheduler-agnostic HTC scheduler application
  - TORQUE
  - Slurm
  - Platform LSF
  - Cray ALPS
- Submitted as HPC job to scheduler

Nitro Job
- User converts HTC jobs to Nitro tasks
  - Tasks defined in Nitro Task file
- Submit 1 Nitro job with Task file
  - Schedule 1 parallel HPC job
  - Execute millions of HTC tasks
- What about HTC for non-technical users?
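Converting many small jobs into one task list can be sketched as follows. The slides do not describe the Nitro Task file format, so the one-command-per-line layout is an assumption, and the ./simulate program and its options are hypothetical. The point is only that a parameter sweep becomes lines in a single file instead of separate scheduler jobs.

```shell
#!/bin/sh
# Build a task list for a small parameter sweep: one line per task.
# The one-command-per-line layout and the ./simulate program are assumptions.
tasks=""
task_count=0
for temperature in 280 290 300; do
    for pressure in 1 2; do
        tasks="${tasks}./simulate --temp ${temperature} --pressure ${pressure}
"
        task_count=$((task_count + 1))
    done
done

# Print the task list; in practice this would be written to the task file.
printf '%s' "$tasks"
echo "tasks: $task_count"
```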

Choose Nitro Application Template

Nitro Task File selection

Nitro Task file upload and job submission

Nitro job progress and statistics

Nitro job progress

Nitro Task Log file display during job

Benefits
- Remove thousands-to-millions of HTC jobs from HPC scheduler queue(s)
- Execute HTC jobs faster
- Non-technical users can
  - Submit Nitro jobs
  - Monitor Nitro job progress
  - Examine Nitro Task results

Questions?

PMIx Support
- Process Management Interface for exascale
- Open-source project on GitHub
  - Intel, Mellanox, IBM, Adaptive Computing, SchedMD
- Benefits
  - "Instant On": very fast job start (10 minutes -> 2 seconds)
  - Asynchronous, non-blocking operations
  - Blob-based data transfers
  - Fast broadcasts/collectives
- Interfaces with RMs (e.g., TORQUE)
- Will work with malleable/evolving jobs in future

Malleable and Evolving Jobs
- Next-generation application/RTE support
  - Malleable jobs (scheduler -> application/RTE)
  - Evolvable jobs (application/RTE -> scheduler)
- Need standard API to negotiate resources
  - Like MPI for parallel communications
- Gathering use cases for PMIx API definition
- Benefits
  - Standard application API for scheduler/app dialog
  - Scheduler and Malleable/Evolving Application Dialog (SMEAD) API definition
  - Allows applications to ask for and return resources
  - Allows scheduler to give and take resources to and from a job

Backup Slides