Berkeley Research Computing Town Hall Meeting: Savio Overview
SAVIO - The Need Has Been Stated
Savio's inception and design were based on a specific need articulated by Eliot Quataert and nine other faculty:
"Dear Graham, We are writing to propose that UC Berkeley adopt a condominium computing model, i.e., a more centralized model for supporting research computing on campus..."
SAVIO - Condo Service Offering
- Purchase into Savio by contributing standardized compute hardware
- An alternative to running a cluster in a closet with grad students and postdocs
- The condo trade-off: idle resources are made available to others
- No (zero) operational costs to contributors for administration, colocation, base storage, optimized networking and access methods, or user services
- The scheduler gives priority access to resources equivalent to the hardware contribution
SAVIO - Faculty Computing Allowance
- Provides allocations to run on Savio, as well as support, to researchers who have not purchased condo nodes
- 200K Service Units (core-hours) annually
- More than just compute: file systems, training/support, user services
- PIs request their allocation via survey
- Early user access (based on readiness) available now
- General availability planned for the fall semester
SAVIO - System Overview
- Similar in design to a typical research cluster
- The traditional master-node role has been broken out (management, scheduling, logins, file system, etc.)
- Home storage: enterprise level, backed up, quota-enforced
- Scratch space: large and fast (Lustre)
- Multiple login/interactive nodes
- DTN: Data Transfer Node
- Compute nodes are delineated by role
SAVIO - System Architecture
SAVIO - Specification
Hardware
- Compute nodes: 20-core, 64 GB, InfiniBand
- BigMem nodes: 20-core, 512 GB, InfiniBand
Software Stack
- Scientific Linux 6 (equivalent to Red Hat Enterprise Linux 6)
- Parallelization: OpenMPI, OpenMP, POSIX threads
- Intel compiler
- SLURM job scheduler
- Software environment modules (build sketch below)
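A minimal sketch of building against this stack from a login node; the module names below are assumptions and may differ on the live system:

    $ module load intel openmpi        # assumed module names for the Intel compiler and OpenMPI
    $ mpicc -qopenmp -o hello hello.c  # hybrid MPI + OpenMP build via the MPI compiler wrapper
    $ export OMP_NUM_THREADS=20        # one OpenMP thread per core on a standard 20-core node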
SAVIO - OTP
- The biggest security threat that we encounter: STOLEN CREDENTIALS
- Credentials are stolen via keyboard sniffers installed on researchers' laptops or workstations that were incorrectly assumed to be secure
- OTP (One-Time Passwords) mitigates this threat
- Easy to learn, simple to use, and works on both computers and smartphones!
SAVIO - Future Services
Serial/HTC jobs
- Expanding the initial architecture beyond just HPC
- Specialized node hardware (12-core, 128 GB, PCI flash storage)
- Designed for jobs that use at most one node
- Nodes are shared between jobs
GPU nodes
- GPUs are optimal for massively parallel algorithms
- Specialized node hardware (8-core, 64 GB, 2x Nvidia K80)
Questions
Berkeley Research Computing Town Hall Meeting: Savio User Environment
SAVIO - Faculty Computing Allowance
Eligibility requirements
- Ladder-rank faculty or PI on the UC Berkeley campus
- In need of compute power to solve a research problem
Allowance request procedure
- First, fill out the online Requirements Survey
- The allowance can be used either by the faculty member or by immediate group members
- For additional cluster accounts, fill out the Additional User Account Request Form
Allowances
- New allowances start on June 1st of every year; mid-year requests are granted a prorated allocation
- A cluster-specific project (fc_projectname) with all user accounts is set up
- A scheduler account (fc_projectname) with 200K core-hours is set up
- The annual allocation expires on May 31st of the following year
SAVIO - Access
Cluster access
- Connect using SSH (server name: hpc.brc.berkeley.edu); see the login sketch below
- Uses OTP, i.e., One-Time Passwords (multifactor authentication)
- Multiple login nodes (users are randomly distributed)
Coming in the future
- NERSC's NEWT REST API for web portal development
- IPython notebooks & JupyterHub integration
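A minimal login sketch; the username is a placeholder, and the exact prompt wording may differ:

    $ ssh myuser@hpc.brc.berkeley.edu
    Password:    # enter your one-time password from the OTP token, not a static password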
SAVIO - Data Storage Options
Storage
- No local storage on compute nodes; all storage is accessed over the network, via either the NFS or the Lustre protocol
Multiple file systems
- HOME: NFS, 10 GB quota, backed up, no purge
- SCRATCH: Lustre, no quota, no backups, can be purged
- Project (GROUP) space: NFS, 200 GB quota, no backups, no purge
- No long-term archive
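A quick sanity check on HOME usage from a login node, using standard Linux tools; whether quota reporting is enabled on the NFS mount is an assumption:

    $ quota -s       # summarize usage against the 10 GB HOME quota (assumes NFS quota reporting)
    $ du -sh "$HOME" # fall back to counting the size of your home directory directly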
SAVIO - Data Transfers
- Use only the dedicated Data Transfer Node (DTN); server name: dtn.brc.berkeley.edu
- Globus (web interface) is highly recommended for managing transfers
- Many other traditional tools are also supported on the DTN (examples below): SCP/SFTP, rsync, BBCP
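A sketch of the traditional tools against the DTN; the username and remote paths are placeholders:

    $ scp results.tar.gz myuser@dtn.brc.berkeley.edu:scratch/            # one-off copy to a placeholder path
    $ rsync -avP dataset/ myuser@dtn.brc.berkeley.edu:scratch/dataset/   # resumable, incremental sync with progress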
SAVIO - Software Support
Software module farm
- Many of the most commonly used packages are already available; in most cases, packages are compiled from source
- Easy command-line tools to browse and access packages ($ module <cmd>); see the sketch below
Supported package list
- Open-source tools: octave, gnuplot, imagemagick, visit, qt, ncl, paraview, lz4, git, valgrind, etc.
- Languages: GNU C/C++/Fortran compilers, Java (JRE), Python, R, etc.
- Commercial: Intel C/C++/Fortran compiler suite, Matlab with an 80-core license for MDCS
User applications
- Individual user/group-specific packages can be built from source by users
- We recommend using GROUP storage space for sharing with others in your group
- Savio consultants are available to answer your questions
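The core module commands, for reference; the package name in the load/unload lines is illustrative:

    $ module avail         # browse every package in the module farm
    $ module load gcc      # add a package (illustrative name) to your environment
    $ module list          # show currently loaded modules
    $ module unload gcc    # remove it again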
SAVIO - Job Scheduler
SLURM Quality of Service

    QoS            Max running time/job      Max nodes/job
    savio_debug    30 minutes                4
    savio_normal   72 hours (i.e., 3 days)   24

Multiple node options (partitions)

    Partition      # of nodes   Cores/node   Memory/node   Local storage
    savio          160          20           64 GB         none
    savio_bigmem   4            20           512 GB        none
    savio_htc      12           12           128 GB        PCI flash

Interaction with the scheduler
- Command-line tools and utilities only for now; a job-submission sketch follows below
- Online web interfaces for job management may be supported in the future via NERSC's NEWT REST API, IPython/Jupyter, or both
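A minimal batch-script sketch for the standard partition, assuming the fc_projectname scheduler account from the allowance slide; the job name, script name, module names, and executable are placeholders:

    #!/bin/bash
    #SBATCH --job-name=hello            # placeholder job name
    #SBATCH --account=fc_projectname    # scheduler account set up with the allowance
    #SBATCH --partition=savio           # standard 20-core / 64 GB nodes
    #SBATCH --qos=savio_normal          # up to 72 hours and 24 nodes per job
    #SBATCH --nodes=2
    #SBATCH --time=01:00:00             # wall-clock limit, HH:MM:SS

    module load intel openmpi           # assumed module names
    mpirun ./hello                      # OpenMPI picks up the allocated nodes from SLURM

Submit and monitor from a login node:

    $ sbatch hello.sh
    $ squeue -u $USER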
SAVIO - Job Accounting
- Jobs gain exclusive access to their assigned compute nodes
- Jobs are expected to be highly parallel and capable of using all the resources on the assigned nodes
- For example, running on one standard node for 5 hours uses 1 (node) * 20 (cores) * 5 (hours) = 100 core-hours (Service Units)
SAVIO - How to Get Help
Online user documentation
- User Guide: http://research-it.berkeley.edu/services/high-performance-computing/user-guide
- New User Information: http://research-it.berkeley.edu/services/high-performance-computing/new-user-information
Helpdesk
- Email: brc-hpc-help@lists.berkeley.edu
- Monday to Friday, 9:00 am to 5:00 pm; best effort outside working hours
Thank you
Questions?