LANL Computing Environment for PSAAP Partners



Robert Cunningham, rtc@lanl.gov
HPC Systems Group (HPC-3)
July 2011

LANL Resources Available to Alliance Users
- Mapache is new and has a Lobo-like allocation: Linux (TOSS) cluster, Moab scheduler, shared /scratchn; 4,736 Xeon cores with InfiniBand, 50.4 TF; ? procs/job
- Conejo: companion to Mapache, small ASC allocation
- Lobo is the current workhorse TLCC platform: Linux (TOSS), Moab scheduler, shared /scratchn; 4,352 compute CPUs; max 2,144 procs/job
- Small Roadrunner hybrid platform available: Cerrillos
- Big future for Turquoise (Mustang, etc.), but not big ASC

LANL Mapache Cluster
- SGI XE1300 series with quad-core Intel Xeon 5550 processors
- 75% allocation for ASC, primarily for PSAAP
- Architecture: 4,736 Intel Xeon 2.66 GHz cores on 592 compute nodes = 8 cores/node
- Mellanox InfiniBand interconnect
- 14.2 TB RAM = 24 GB/node
- Theoretical peak of ~50.4 teraflops

LANL Lobo Cluster
- Standard DOE lab cookie-cutter cluster from Appro International
- 75% allocation for ASC, primarily for PSAAP
- 2 Connected Units (CUs) combined include:
  - 4,352 AMD Opteron 2.2 GHz cores on compute nodes
  - Voltaire InfiniBand interconnect
  - 8.7 TB RAM
- Theoretical peak of ~17.4 teraflops

LANL Cerrillos Cluster
- Hybrid architecture: Opteron + Cell, from IBM
- 25% allocation for ASC, primarily for PSAAP
- 2 Connected Units (CUs) combined include:
  - 1,440 AMD Opteron 1.8 GHz cores on compute nodes
  - 1,440 Cell Broadband Engines
  - Voltaire InfiniBand interconnect
  - 11.8 TB RAM
- Theoretical peak of ~152 teraflops

LANL TLCC2 Cluster Plans
- No order has been placed, so this is a prediction only, for one ASC cluster in the Turquoise network: moonlight.lanl.gov
- Hybrid architecture: Opteron + GPGPU, from Appro
- Primarily for PSAAP, but other ASC projects will use it
- Expected in ~December
- 308 compute nodes, i.e., 2 SUs
- Intel Sandy Bridge, dual-processor nodes
- NVIDIA M2090 GPUs
- QLogic InfiniBand interconnect
- Theoretical peak of ~1.6 TF/node

LANL HPC Environment
- Obtain an account and acquire a CRYPTOCard (foreign nationals: start early!)
- Access HPC platforms in the Open Collaborative (Turquoise) network via the firewall/gateway: ssh wtrw.lanl.gov
- After connecting, ssh into a front-end: lo-fe[1,2], mp-fe1, ce-fe[1,2]
- Moab + [Slurm or Torque] schedules nodes, batch or interactive (example session below):
  - msub scriptname, or msub -I for an interactive session
  - checkjob, showstate, showq, mjobctl
- mpirun to run parallel jobs across nodes
- Fairshare scheduling to deliver the pre-determined allocation
- Unique security model: no connecting (ssh) between platforms
- No Kerberos tickets; keep your CRYPTOCard handy
- scp or sftp using the File Transfer Agents (FTAs), turq-fta1.lanl.gov and turq-fta2.lanl.gov
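
A minimal example session based on the hostnames and commands above; the user name, script name, job ID, and resource request are hypothetical.

    # Hypothetical login-and-submit session (names and sizes are illustrative only)
    ssh username@wtrw.lanl.gov                    # Turquoise gateway (CRYPTOCard password)
    ssh mp-fe1                                    # Mapache front-end (or lo-fe1, ce-fe1, ...)

    msub myjob.msub                               # submit a batch script to Moab
    msub -I -l nodes=2:ppn=8,walltime=1:00:00     # or request an interactive allocation
    checkjob 123456                               # detailed status of one job
    showq                                         # overview of the queue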

LANL HPC Environment: Usage Model
- Front-end: text editing, job-script set-up, pathname arrangements, compilation, linking, and job launching
- Compute nodes: run applications
- I/O nodes: out to local disk, no direct user access
- Possible intra-network File Transfer Agent (FTA) arrangement in the future
- Modulefiles establish compilers, libraries, and tools in $PATH and $LD_LIBRARY_PATH
- Compilers: Intel, PGI, PathScale; MPI: Open MPI, MVAPICH (see the sketch below)
- Math libraries provided by the compiler vendors, plus ATLAS
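
A minimal sketch of a batch script following this usage model; the module names, resource request, and executable are assumptions rather than site-specific values.

    #!/bin/bash
    # myjob.msub -- hypothetical Moab batch script
    #MSUB -l nodes=4:ppn=8
    #MSUB -l walltime=02:00:00
    #MSUB -N mycode

    module load intel openmpi        # assumed module names: compiler and MPI into $PATH/$LD_LIBRARY_PATH
    cd $PBS_O_WORKDIR                # directory the job was submitted from (Torque-style)
    mpirun -np 32 ./mycode           # launch across the allocated nodes

Compile and link on the front-end first (for example, mpicc -O2 -o mycode mycode.c after loading the same modules), then submit with msub myjob.msub.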

LANL Turquoise HPC Storage
- Tiny home directories, not shared between clusters (security)
- Larger NFS-based workspace in /usr/projects/proj_name
- Big parallel, globally accessible filesystem: /scratchn
  - Cross-mounted to all HPC nodes in Turquoise
  - ~800 TB total space
  - Fast for parallel I/O, slower than NFS for serial transfers
  - No automated back-ups
  - Purged weekly: files 30 days or older are removed! (staging example below)
- Archival storage, offline, available via GPFS
- File transfers are an ever-present problem on the brink of a solution
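
A hedged sketch of moving a run through these spaces; the project name, directory layout, and file names are hypothetical.

    # Stage inputs onto the parallel scratch, run there, then copy results back
    # to project space before the 30-day purge removes them.
    mkdir -p /scratchn/$USER/run042
    cp /usr/projects/my_proj/inputs.tar /scratchn/$USER/run042/
    # ... the job runs with its I/O on /scratchn ...
    cp /scratchn/$USER/run042/results.h5 /usr/projects/my_proj/results/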

Parallel Scalable Back Bone (PaScalBB)
[Diagram: disk-poor and diskless client clusters and a job queue connect through file-system gateways and a scalable backbone to a global parallel file system (OBFS), an enterprise global FS, and an object archive]
- Relieves the compute nodes: multiple clusters share a large, global-namespace parallel I/O subcluster
- Includes Cerrillos/Lobo/Coyote
- The network is a combination of the HPC interconnect and a commodity-networking bridge
- Panasas is the storage vendor
- I/O goes through a set of fileserver nodes over IB; these nodes serve as interconnect<->GigE routers

LANL Turquoise File Transfers
- File transfers between Turquoise and the outside world are S L O W; we are addressing this now and may need your help to test
- Throttled by the gateway/firewall and security: ~1 MB/s!
  - Packet reordering
  - Sniffing
  - Only scp allowed today: encryption
  - The Panasas filesystem (/scratchn) is slow for serial transfers
  - All data is routed through the tiny wtrw, twice
- Solution currently in testing: double-hop through a security-enclave, GPFS-based way station
  - Parallel transfers using bbcp (examples below)
  - Orders of magnitude faster: tens of MB/s up to the low hundreds
- Turquoise holds the future for unclassified work; big changes ahead
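
Two hedged examples: today's scp path through an FTA (hostname from the earlier slide) and a generic bbcp invocation of the kind the new service would use; the user name, paths, stream count, and destination host are assumptions.

    # Today, from a machine outside LANL: pull results through a File Transfer Agent.
    scp username@turq-fta1.lanl.gov:/scratchn/username/results.tar .

    # With the bbcp-based service: several parallel TCP streams per transfer.
    # -s sets the stream count, -w the TCP window; 'transfer-host' is a placeholder.
    bbcp -s 8 -w 2M results.tar username@transfer-host:/destination/path/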

Turquoise High Performance File Transfer Service
[Diagram: Turquoise HPC clusters reach LANL Yellow and collaborators on the Internet through wtrw and a new GPFS-based transfer enclave behind the LANL authentication firewall; existing SSH paths are supplemented by new bbcp/GridFTP file-transfer and perfSONAR measurement hosts]

LANL Turquoise Tools
- TAU (Tuning and Analysis Utilities): profiling and tracing toolkit, http://www.cs.uoregon.edu/research/tau/home.php (example below)
- STAT (Stack Trace Analysis Tool) from LLNL: https://computing.llnl.gov/code/stat/
- Boost C++ Utility Libraries: http://www.boost.org/doc/libs/1_46_1/libs/utility/index.html
- Open|SpeedShop: http://www.openspeedshop.org/wp/
  - Sampling experiments
  - Callstack analysis
  - Hardware performance counters
  - MPI profiling and tracing
  - I/O profiling and tracing
  - Floating-point exception analysis
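
A hedged sketch of a TAU profiling run; the module name, rank count, and executable are assumptions, and the exact TAU setup on these clusters may differ.

    module load tau                   # assumed module name
    mpirun -np 16 tau_exec ./mycode   # profile an uninstrumented MPI binary through tau_exec
    pprof                             # text summary of the profile.* files written by the run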

LANL Turquoise Tools
- Javelina: code-coverage tool that uses dynamic instrumentation, http://javelina-cc.sourceforge.net/
- Valgrind: instrumentation framework for dynamic analysis, http://valgrind.org/ (example below)
  - Memory errors
  - Cache and branch-prediction profiling
  - Thread error detection
  - Heap profiling
- memP: parallel heap profiling library, http://memp.sourceforge.net/
- mpiP: lightweight, scalable MPI profiling, http://mpip.sourceforge.net/
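
A hedged example of checking an MPI code for memory errors with Valgrind; the executable name is an assumption, and each rank runs under its own Valgrind instance, so expect a large slowdown.

    mpirun -np 4 valgrind --leak-check=full --log-file=vg.%p.log ./mycode
    # one log per rank (%p expands to the process ID); inspect vg.*.log for errors and leaks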

LANL Debugging
- gdb (GNU debugger): comes with the distro, http://www.gnu.org/software/gdb/ (example below)
- TotalView: interactive medium-scale parallel debugger, http://www.roguewave.com/company/overview/totalview-techacquisition.aspx
  - Parallel and independent process views
  - ThreadSpotter
  - MemoryScape
  - ReplayEngine
  - GUI or command line (Tcl)
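
A hedged post-mortem example with gdb; the file and variable names are hypothetical, and the code is assumed to have been built with debugging symbols.

    mpicc -g -O0 -o mycode mycode.c   # keep symbols, disable optimization
    gdb ./mycode core.12345           # open the core file left by a crashed rank
    (gdb) bt                          # backtrace to the failing frame
    (gdb) frame 2                     # select a frame of interest
    (gdb) print nranks                # inspect a variable (hypothetical name)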

3 Useful LANL Web Sites for Users
- User docs: http://bear.lanl.gov/drupal/?q=tlcc_home or https://bear.lanl.gov/drupal (or call 505-665-4444, option 3, or email consult@lanl.gov)
- Calendar: http://ccp.lanl.gov
- HPC training: http://int.lanl.gov/projects/asci/training

HPC Accounts
- Don't forget the photo op!

Questions?