Berkeley Research Computing Town Hall Meeting: Savio Overview




SAVIO - The Need Has Been Stated
Inception and design were based on a specific need articulated by Eliot Quataert and nine other faculty:
"Dear Graham, We are writing to propose that UC Berkeley adopt a condominium computing model, i.e., a more centralized model for supporting research computing on campus..."

SAVIO - Condo Service Offering
- Purchase into Savio by contributing standardized compute hardware
- An alternative to running a cluster in a closet with grad students and postdocs
- The condo trade-off: idle resources are made available to others
- There are no (ZERO) operational costs for administration, colocation, base storage, optimized networking and access methods, and user services
- The scheduler gives priority access to resources equivalent to the hardware contribution

SAVIO - Faculty Computing Allowance
- Provides allocations to run on Savio, as well as support, to researchers who have not purchased Condo nodes
- 200k Service Units (core hours) annually
- More than just compute: file systems, training/support, user services
- PIs request their allocation via survey
- Early user access (based on readiness) now
- General availability planned for fall semester

SAVIO - System Overview
- Similar in design to a typical research cluster
- The traditional master node role has been broken out (management, scheduling, logins, file system, etc.)
- Home storage: enterprise level, backed up, with quotas
- Scratch space: large and fast (Lustre)
- Multiple login/interactive nodes
- DTN: Data Transfer Node
- Compute nodes are delineated based on role

SAVIO - System Architecture

SAVIO - Specification
Hardware
- Compute nodes: 20-core, 64 GB, InfiniBand
- BigMem nodes: 20-core, 512 GB, InfiniBand
Software stack
- Scientific Linux 6 (equivalent to Red Hat Enterprise Linux 6)
- Parallelization: OpenMPI, OpenMP, POSIX threads
- Intel compiler
- SLURM job scheduler
- Software Environment Modules
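As a rough illustration of this stack, an MPI program might be built and run along the following lines (a sketch only: the exact module names, such as intel and openmpi, are assumptions, and hello_mpi.c is a placeholder source file):

    $ module load intel openmpi          # load the Intel compiler and OpenMPI modules (names assumed)
    $ mpicc -o hello_mpi hello_mpi.c     # compile through the MPI wrapper compiler
    $ mpirun -np 20 ./hello_mpi          # launch 20 ranks, one per core of a standard node (inside a job allocation)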

SAVIO - OTP
- The biggest security threat that we encounter... STOLEN CREDENTIALS
- Credentials are stolen via keyboard sniffers installed on researchers' laptops or workstations that were incorrectly assumed to be secure
- OTP (One Time Passwords) offers mitigation
- Easy to learn, simple to use, and works on both computers and smartphones!

SAVIO - Future Services
Serial/HTC jobs
- Expanding the initial architecture beyond just HPC
- Specialized node hardware (12-core, 128 GB, PCI flash storage)
- Designed for jobs that use <= 1 node
- Nodes are shared between jobs
GPU nodes
- GPUs are optimal for massively parallel algorithms
- Specialized node hardware (8-core, 64 GB, 2x NVIDIA K80)
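Once the HTC partition is in production, a single-core job could be submitted with a minimal script along these lines (a sketch: the savio_htc partition name comes from the scheduler overview later in this deck, while the time limit and program name are assumptions):

    #!/bin/bash
    #SBATCH --job-name=htc_serial
    #SBATCH --partition=savio_htc        # shared HTC nodes (12-core, 128 GB, PCI flash)
    #SBATCH --ntasks=1                   # a single task...
    #SBATCH --cpus-per-task=1            # ...on a single core, sharing the node with other jobs
    #SBATCH --time=01:00:00              # one hour of wall clock

    ./my_serial_analysis input.dat       # hypothetical serial program and input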

Questions

Berkeley Research Computing Town Hall Meeting: Savio User Environment

SAVIO - Faculty Computing Allowance
Eligibility requirements
- Ladder-rank faculty or PI on the UCB campus
- In need of compute power to solve a research problem
Allowance request procedure
- First fill out the Online Requirements Survey
- The allowance can be used either by the faculty member or by immediate group members
- For additional cluster accounts, fill out the Additional User Account Request Form
Allowances
- New allowances start on June 1st of every year; mid-year requests are granted a prorated allocation
- A cluster-specific project (fc_projectname) with all user accounts is set up
- A scheduler account (fc_projectname) with 200K core hours is set up
- The annual allocation expires on May 31st of the following year

SAVIO - Access
Cluster access
- Connect using SSH (server name: hpc.brc.berkeley.edu)
- Uses OTP - One Time Passwords (multifactor authentication)
- Multiple login nodes (users are randomly distributed)
Coming in the future
- NERSC's NEWT REST API for web portal development
- IPython notebooks & JupyterHub integration
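For reference, a typical login session from a terminal would look roughly like this (the username is a placeholder, and the exact OTP prompt text is an assumption):

    $ ssh myusername@hpc.brc.berkeley.edu    # connect to one of the login nodes
    Password:                                # enter the one-time password from your OTP token or app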

SAVIO - Data Storage Options
Storage
- No local storage on compute nodes
- All storage is accessed over the network, using either the NFS or Lustre protocol
Multiple file systems
- HOME: NFS, 10 GB quota, backed up, no purge
- SCRATCH: Lustre, no quota, no backups, can be purged
- Project (GROUP) space: NFS, 200 GB quota, no backups, no purge
- No long-term archive
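A common way to use these file systems is to keep source code and small inputs in HOME, run jobs out of SCRATCH, and copy back only the results worth keeping. A minimal sketch, with /path/to/scratch standing in for the real Lustre mount point and run_simulation as a hypothetical program (see the user guide for the actual paths):

    $ mkdir -p /path/to/scratch/myrun                    # working directory on the fast, quota-free scratch space
    $ cp ~/inputs/config.dat /path/to/scratch/myrun/     # stage inputs from backed-up HOME
    $ cd /path/to/scratch/myrun && ./run_simulation      # large temporary output stays on SCRATCH
    $ cp results.tar.gz ~/results/                       # copy only final results back, mindful of the 10 GB HOME quota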

SAVIO - Data Transfers
- Use only the dedicated Data Transfer Node (DTN), server name: dtn.brc.berkeley.edu
- Globus (web interface) is highly recommended for transfer management
- Many other traditional tools are also supported on the DTN: SCP/SFTP, rsync, BBCP
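For command-line transfers, the traditional tools above are simply pointed at the DTN instead of the login nodes. A sketch (username and paths are placeholders):

    $ scp dataset.tar.gz myusername@dtn.brc.berkeley.edu:/path/on/savio/              # one-off copy
    $ rsync -avP results/ myusername@dtn.brc.berkeley.edu:/path/on/savio/results/     # incremental, resumable sync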

SAVIO - Software Support
Software module farm
- Many of the most commonly used packages are already available; in most cases packages are compiled from source
- Easy command-line tools to browse and access packages ($ module cmd)
Supported package list
- Open source tools: octave, gnuplot, imagemagick, visit, qt, ncl, paraview, lz4, git, valgrind, etc.
- Languages: GNU C/C++/Fortran compilers, Java (JRE), Python, R, etc.
- Commercial: Intel C/C++/Fortran compiler suite, Matlab with an 80-core license for MDCS
User applications
- Individual user/group-specific packages can be built from source by users
- The GROUP storage space is recommended for sharing with others in your group
- SAVIO consultants are available to answer your questions
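Day to day, the module farm is driven with a few subcommands; gnuplot below is just one example package from the list above:

    $ module avail             # browse every package and version in the module farm
    $ module load gnuplot      # add a package to your environment
    $ module list              # show what is currently loaded
    $ module unload gnuplot    # remove it again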

SAVIO - Job Scheduler
SLURM Quality of Service

  QoS            Max allowed running time/job   Max number of nodes/job
  savio_debug    30 minutes                     4
  savio_normal   72 hours (i.e., 3 days)        24

Multiple node options (partitions)

  Partition      # of nodes   # of cores/node   Memory/node   Local storage
  savio          160          20                64 GB         No local storage
  savio_bigmem   4            20                512 GB        No local storage
  savio_htc      12           12                128 GB        Local PCI flash

Interaction with the scheduler
- Only via command-line tools and utilities
- Online web interfaces for job management can be supported in the future via NERSC's NEWT REST API, IPython/Jupyter, or both
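Putting the pieces together, a batch script for a two-node parallel job in the standard partition could look like the following (the account, module, and executable names are placeholders; the partition and QoS names come from the tables above):

    #!/bin/bash
    #SBATCH --job-name=my_parallel_run
    #SBATCH --account=fc_projectname     # scheduler account from the Faculty Computing Allowance
    #SBATCH --partition=savio            # standard 20-core / 64 GB nodes
    #SBATCH --qos=savio_normal           # up to 72 hours and 24 nodes per job
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=20         # one MPI rank per core
    #SBATCH --time=05:00:00              # 5 hours of wall clock

    module load intel openmpi            # module names are assumptions; check module avail
    mpirun ./my_mpi_program              # hypothetical MPI executable

The script is submitted and monitored entirely from the command line, e.g. with sbatch job.sh and squeue -u $USER.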

SAVIO - Job Accounting
- Jobs gain exclusive access to the assigned compute nodes
- Jobs are expected to be highly parallel and capable of using all the resources on the assigned nodes
- For example: running on one standard node for 5 hours uses 1 (node) * 20 (cores) * 5 (hours) = 100 core-hours (or Service Units)
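By the same arithmetic, the hypothetical two-node, five-hour script sketched above would be charged for both nodes in full, i.e. 200 Service Units, whether or not every core is kept busy:

    $ echo $((2 * 20 * 5))    # nodes * cores per node * hours
    200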

SAVIO - How to Get Help
Online user documentation
- User Guide: http://research-it.berkeley.edu/services/high-performance-computing/user-guide
- New User Information: http://research-it.berkeley.edu/services/high-performance-computing/new-user-information
Helpdesk
- Email: brc-hpc-help@lists.berkeley.edu
- Monday - Friday, 9:00 am to 5:00 pm; best effort outside working hours

Thank you! Questions?