Introductory Tutorial for Discover - NCCS's newest Linux Networx Cluster. Software Integration and Visualization Office (SIVO)
1 Introductory Tutorial for Discover - NCCS's newest Linux Networx Cluster by Software Integration and Visualization Office (SIVO)
2 Introduction to SIVO
Who we are:
- Leadership in the development of advanced scientific software industry best practices that minimize risk and cost
- Competency in state-of-the-art visualization tools and techniques, computational science, and modern software engineering practices
- An established and cordial relationship with the NASA scientific research community
What services we provide:
- HPC services: parallelization, tuning & debugging
- Scientific code re-factoring and redesign
- Object Oriented Design (OOD) of science applications
- State-of-the-art visualization support
Contact information:
3 Training Outline
- Overview of System Architecture
- Accessing the System
- File Systems
- Programming Environment
- Compiling/Optimization
- How to Run / Debugging
- Tools for Performance and Data Analysis
Presenters: Bruce Van Aartsen, Udaya Ranawake, Hamid Oloso
4 Overview of System Architecture - Nodes
4 Front-end nodes
128 Compute nodes:
- SuperMicro motherboard, 0.8U Evolocity II chassis
- Dual socket, dual core (4 cores per node)
- 120 GB hard drive
- 4 GB RAM
5 Overview of System Architecture - Processor
Base unit: Intel Dempsey dual-core 3.2 GHz (Xeon 5060)
- 1066 MHz front side bus
- 2 MB L2 cache per core (4 MB total per socket)
- 12.8 GFLOPS/socket peak performance (25.6 GFLOPS/node)
- Early indications with High Performance Linpack (HPL) are showing 60% - 65% of peak
6 Overview of System Architecture - Memory
- 4 GB RAM/node (4 x 1 GB DDR2 533 MHz); only 3 GB user-accessible
- Memory bandwidth: 4 GB/s to the core
- Each core has its own Level 1 and Level 2 cache
  - Level 1 cache: separate instruction and data, 32 KB
  - Level 2 cache: unified instruction/data, 2 MB
7 Overview of System Architecture - Interconnect
- Dual-ported 4x InfiniBand Host Channel Adapters (HCA)
- PCI-Express 8x (10 Gb/s)
- Aggregate bandwidth: 20 Gb/s bi-directional
- Latency: ~4 microseconds
8 Overview of System Architecture - Comparison
                            Discover         Explore                  Palm / Halem
Platform                    Linux Networx    SGI Altix                HP Alpha SC45
Peak Performance (TFLOPS)   3.3              ~7                       3.2
Operating System            SUSE Linux       SGI Advanced Linux       Tru64 UNIX
                                             Environ. (RHEL based)
(The slide also compared total number of nodes and total memory in GB.)
9 Overview of System Architecture - Future plan
Scalable Expansion Nodes
- Dell motherboard, 1U chassis
- Dual socket, dual core Intel Woodcrest 2.66 GHz; both cores share 4 MB L2 cache
- 4 GB DDR2 667 MHz FB-DIMM
- 160 GB hard drive
Visualization Nodes
- 16 nodes integrated into the base unit
- AMD Opteron dual-core 280 processors (2.6 GHz)
- 8 GB of PC3200/DDR400 S/R DIMM
- 250 GB SATA
- NVidia Quadro FX 4500 PCI-Express
10 Accessing the System - Requirements
- Need an active account: [email protected]
- Need an RSA SecurID fob
11 Accessing the System - Local (GSFC/NASA)
From GSFC hosts, log onto discover using: ssh [-XY] [email protected]
- -X or -Y is optional; either enables X11 forwarding
- Passcode: 10-digit number (4-digit PIN + 6-digit fob code)
- Host: discover
- Password: requires your user password
Note: discover is in LDAP (Lightweight Directory Access Protocol), which enables password sharing with other systems in LDAP
12 Accessing the System - Remote (offsite)
Login from other NASA hosts:
- Log into GSFC hosts to access login.nccs.nasa.gov
Login from non-NASA hosts:
- Provide your host name(s) or IP address to [email protected]
- Entry is through login.gsfc.nasa.gov
- Alternatively, apply for a GSFC VPN account
13 File Systems - /discover/home and /discover/nobackup
- IBM General Parallel File System (GPFS) used on all nodes
- Serviced by 4 I/O nodes, but with read/write access from all nodes
- /discover/home: 2 TB; /discover/nobackup: 4 TB
- Individual quota: /discover/home: 500 MB; /discover/nobackup: 100 GB
- Very fast (peak: 350 MB/sec)
14 File Systems - Overview
[Diagram: 128 compute nodes, 4 login nodes, and 2 gateway nodes connected over InfiniBand and 1/10 GigE to 4 I/O nodes, which attach through a SAN to DDN controllers and disk]
15 File Systems - /discover/nobackup (cont'd)
- Recommended for running jobs (more space)
- Not backed up
- Stage data in before a run; stage data out after a run
- For large data transfers, higher transfer rates (shared 10 Gb/sec) are possible via data transfer commands in a script submitted to the datamove queue (see the sketch below)
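As an illustration, a minimal data-staging job for the datamove queue might look like the sketch below; the group name (g0000, reused from the later example scripts), the user name, and the remote paths are placeholders rather than site defaults.
#!/bin/csh -f
#PBS -W group_list=g0000
#PBS -q datamove
#PBS -l walltime=1:00:00
#PBS -N stagein
#PBS -j oe
# Hypothetical staging job: pull input data from mass storage (dirac)
# into /discover/nobackup before the compute job runs.
cd /discover/nobackup/myuserid
scp myuserid@dirac.gsfc.nasa.gov:/path/to/inputs/input.tar .
tar xf input.tar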
16 File Systems - Mass Storage (dirac)
- Mass storage host: dirac.gsfc.nasa.gov
- Access using ssh, scp, sftp, ftp (anonymous)
17 File Systems - Moving Data In or Out
Moving/copying files into and out of the system:
- scp: file transfer to and from dirac.gsfc.nasa.gov
- sftp, ftp: file transfer from dirac.gsfc.nasa.gov
- dmf: ssh into the required system and execute DMF-related commands locally
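For example, the scp form of these transfers might look like the following; the user name and remote paths are placeholders.
# Copy a result file from discover to mass storage (dirac); paths are placeholders
scp results.nc myuserid@dirac.gsfc.nasa.gov:/remote/path/
# Copy an input file from dirac back into the current directory on discover
scp myuserid@dirac.gsfc.nasa.gov:/remote/path/input.nc .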
18 Programming Environment
- The programming environment is controlled using modules
- No modules are loaded by default; the user has to add the needed modules to the shell startup file
19 Programming Environment (cont'd)
- Checking available modules: module avail
- Checking loaded modules: module list
- Unloading a module: module unload <module name>
- Loading a module: module load <module name>
- Swapping modules: module swap <old module> <new module>
- Purging all modules: module purge
20 Programming Environment (cont'd)
Typical module commands for the shell startup file:
- Initialization for tcsh: source /usr/share/modules/init/tcsh
- Initialization for csh: source /usr/share/modules/init/csh
- Initialization for bash: . /usr/share/modules/init/bash
Then:
module purge
module load comp/intel
module load mpi/scali
module load lib/mkl
21 Programming Environment (cont'd)
Things to remember:
- No modules are loaded at login; explicitly load the required modules
- Your MKL module must match the compiler version (see the example below)
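For instance, a module session that switches compiler versions and keeps MKL in step might look roughly like this; the version strings are hypothetical, so check module avail for the real names.
module list                                   # see what is currently loaded
module avail                                  # list everything that is installed
module swap comp/intel-9.x comp/intel-10.x    # hypothetical version names
module swap lib/mkl-9.x lib/mkl-10.x          # keep MKL matched to the new compiler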
22 Compiling
How to compile:
- Different versions of the Intel compilers are available
- To see available compilers: module avail
- To see the loaded compiler: module list
- Use the appropriate module command to select another compiler
23 Compiling - Sequential Build
Intel compilers:
ifort <flags> code.f (or .f90)
icc <flags> code.c
icpc <flags> code.cpp (or .c)
24 Compiling - OpenMP Build
Intel compilers:
ifort -openmp <flags> code.f (or .f90)
icc -openmp <flags> code.c
icpc -openmp <flags> code.cpp (or .c)
25 Compiling - MPI Build
Default MPI library: ScaMPI by Scali
Intel compilers:
mpif90 <flags> code.f (or .f90)
mpicc <flags> code.c
mpic++ <flags> code.cpp (or .c)
26 Optimization and Debugging Flags
-O0: No optimization.
-O2: Default optimization level.
-O3: Includes more software pipelining, prefetching, and loop reorganization.
-g: Generates symbolic debugging information and line numbers in the object code for use by source-level debuggers.
-openmp: Enables the parallelizer to generate multithreaded code based on the OpenMP directives.
27 Optimization and Debugging Flags (cont'd)
-traceback: Available in version 8 and higher compilers only. Tells the compiler to generate extra information in the object file to allow the display of source file traceback information at runtime when a severe error occurs. The default is -notraceback.
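Putting these flags together with the compiler invocations from the earlier slides, a typical optimized build that still supports tracebacks might look like this (file and executable names are arbitrary):
# Optimized Fortran build with symbols and runtime traceback support
ifort -O3 -g -traceback -o myjob code.f90
# The same idea for an MPI code, using the Scali wrapper
mpif90 -O2 -g -traceback -o myjob code.f90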
28 Some Things to Remember
If the build fails:
- Check the compiler version; you may need to swap modules
- Check the optimization level; a different level might work
If the link fails:
- Check that environment variables include the correct paths
- If you see the following error during the link:
  /usr/local/intel/fce/ /lib/libcxa.so.5: undefined reference to uw_parse_lsda_info
  try adding -lunwind to your link flags
- If you see the following warning during the link:
  /usr/local/intel/fce/ /lib/libimf.so: warning: feupdateenv is not implemented and will always fail
  try adding the -i_dynamic option to your link flags
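For example, if both messages appear, the workaround flags would simply be appended to the link line, roughly as follows (object and executable names are placeholders):
# Hypothetical link line with both workaround flags added
mpif90 -o myjob main.o solver.o -lunwind -i_dynamic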
29 Third Party Libraries
- Will be installed under /usr/local/other; Baselibs will be in this directory
- Some of the important software packages available under Baselibs: esmf, netcdf, hdf
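A build against one of these packages might look roughly like the sketch below; the include and library paths under /usr/local/other are hypothetical, so check the actual Baselibs layout before using them.
# Hypothetical netCDF build; adjust the paths to the real Baselibs install
ifort -O2 -I/usr/local/other/baselibs/include/netcdf \
      readnc.f90 \
      -L/usr/local/other/baselibs/lib -lnetcdf -o readnc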
30 How to Run
Uses the PBS queuing system. The queues are listed below, with the * notations defined on the next slide; each queue carries its own limits on node count (# nodes) and wall-clock hours.
- high_priority (*)
- general_hi (*)
- general
- general_small
- debug
- pproc
- datamove (**)
- background
- router (***)
31 How to Run (cont'd)
(*) Restricted; requires permission
(**) Queue specifically set up for fast data transfer. Mapped to the gateway nodes; allows up to four data movement jobs per gateway node
(***) router is the default when no queue has been selected; it appropriately maps jobs to one of debug, pproc, general_small, and general
32 How to Run (cont'd)
Frequently used commands:
- qsub: to submit a job
  qsub myscript
- qstat: to check job status
  qstat -a (all jobs)
  qstat -au myuserid (my jobs alone)
  qstat -Q[f] (see what queues are available; adding the f option provides details)
- qdel: to delete a job from the queue
  qdel jobid (obtain jobid from qstat)
33 How to Run (cont'd)
Interactive batch:
  qsub -V -I -l select=4,walltime=04:00:00
- -V exports the user's environment to the PBS job
- -I requests an interactive batch session
- -l requests resources, e.g. 4 nodes and 4 hours of wall time
You will get a terminal session that you can use for regular interactive activities.
Batch:
  qsub <batch_script>
34 How to Run (cont'd): Common PBS options
#PBS -W group_list=<value> - To specify the group name(s) under which this job should run
#PBS -m mail_options - To control the conditions under which mail about the job is sent. Mail_options can be any combination of a (send mail when the job aborts), b (send mail when the job begins execution), and e (send mail when the job ends)
#PBS -l resource=<value> - To request a resource, e.g.:
  #PBS -l select=4 - To request four execution units (nodes)
  #PBS -l walltime=4:00:00 - To request 4 hours of run time
35 How to Run (cont'd): Common PBS options
#PBS -o myjob.out - To redirect standard output to myjob.out
#PBS -e myjob.err - To redirect standard error to myjob.err
#PBS -j oe - To merge standard error into standard output
#PBS -j eo - To merge standard output into standard error
#PBS -N myjob - To set the job name to myjob; otherwise the job name is the script name
A useful PBS environment variable is $PBS_O_WORKDIR: the absolute path to the directory from which a PBS job is submitted.
36 How to Run (cont'd): Example MPI batch script
#!/bin/csh -f
#PBS -W group_list=g0000
#PBS -m e
#PBS -l select=4
#PBS -l walltime=4:00:00
#PBS -o myjob.out
#PBS -j oe
#PBS -N myjob
cd $PBS_O_WORKDIR
# For MPI job.
# Note: Run a maximum of 4 processes per node.
# Depending on the application, fewer processes per node may be better.
mpirun -np 16 -inherit_limits ./myjob
37 How to Run (cont'd): Example serial batch script
#!/bin/csh -f
#PBS -W group_list=g0000
#PBS -m e
#PBS -l select=1
#PBS -l walltime=4:00:00
#PBS -o myjob.out
#PBS -j oe
#PBS -N myjob
cd $PBS_O_WORKDIR
# For serial job
./myjob
38 How to Run (cont'd): Example OpenMP batch script
#!/bin/csh -f
#PBS -W group_list=g0000
#PBS -m e
#PBS -l select=1
#PBS -l walltime=4:00:00
#PBS -o myjob.out
#PBS -j oe
#PBS -N myjob
cd $PBS_O_WORKDIR
# For OpenMP job
setenv OMP_NUM_THREADS 4  # or fewer than 4 if you so desire
./myjob
39 How to Run (cont'd): Example hybrid MPI/OpenMP batch script
#!/bin/csh -f
#PBS -W group_list=g0000
#PBS -m e
#PBS -l select=16
#PBS -l walltime=4:00:00
#PBS -o myjob.out
#PBS -j oe
#PBS -N myjob
cd $PBS_O_WORKDIR
# For hybrid MPI/OpenMP job
setenv OMP_NUM_THREADS 4
mpirun -np 16 -inherit_limits ./myjob
# Or fewer than 16 if you so desire.
# Note: np * OMP_NUM_THREADS should not exceed select * 4.
# As another example:
# setenv OMP_NUM_THREADS 2
# mpirun -np 32 -inherit_limits ./myjob
40 How to Run - Environment Variables
Important environment variables:
- OMP_NUM_THREADS: sets the number of OpenMP threads; the default value is 1 regardless of the type of job
User limits:
- stacksize: reset to a large value; unlimited works just fine
- memoryuse: set this to 3 GB / (# processes per node)
- vmemoryuse: set this to the memoryuse value
Set these with the limit command in tcsh or the ulimit command in bash:
- In tcsh: limit stacksize <size in KB> (e.g. for 500 MB)
- In bash: ulimit -s <size in KB> (e.g. for 500 MB)
MPI:
- mpirun -inherit_limits passes the user's limits to the jobs (see the sketch below)
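Inside a tcsh batch script, the limits can be raised just before the MPI launch so that -inherit_limits propagates them; a minimal sketch, with the memoryuse arithmetic left to the reader:
# Raise the stack limit (the slide notes that "unlimited works just fine")
limit stacksize unlimited
# memoryuse and vmemoryuse would be set similarly, to roughly 3 GB divided by
# the number of processes per node (values are given to limit in kbytes)
# Launch the MPI job so that the processes inherit these limits
mpirun -np 16 -inherit_limits ./myjob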
41 Tools for Debugging
List of available debuggers: totalview, gdb
(Remember to add the -g -traceback flags when compiling to enable source-level debugging.)
How to launch totalview (X-display issues are still being worked out):
- Sequential & OpenMP: totalview myjob
- MPI: totalview mpirun -a -np 4 -inherit_limits myjob
42 Performance Analysis Tools
The following tools for HEC software analysis are planned for installation on the system:
- Intel Trace Analyzer and Collector: helps optimize high-performance applications
- Intel VTune: allows developers to locate performance bottlenecks
- Open SpeedShop: an open source tool to support performance analysis
- PAPI: the Performance API for accessing hardware counters
- TAU: a profiling and tracing toolkit for performance analysis of parallel programs
Future availability and potential training sessions for these software tools will be advertised pending product compatibility testing on discover. More information about these tools is on the following slides.
43 Data Analysis and Visualization Tools
The following tools for data analysis and visualization are planned for installation on the system as well:
- GrADS (Grid Analysis and Display System)
- NCAR Graphics
- NCView (netCDF visual browser)
- NCO (NetCDF Operators)
- CDAT and VCDAT (Climate Data Analysis Tools and its visual GUI)
- GEL (Graphic Encapsulation Library)
- FFTW ("Fastest Fourier Transform in the West")
Future availability and potential training sessions for these software tools will be advertised pending product compatibility testing on discover. More information about these tools can be found on the following slides.
44 Performance Analysis Tools - Profiler: Intel VTune (currently untested)
The VTune Performance Analyzer allows developers to identify and locate performance bottlenecks in their code. VTune provides visibility into the inner workings of the application as it executes. Unique capabilities:
- Low overhead (less than 5%). The VTune Performance Analyzer profiles everything that executes on the CPU. The sampling profiler provides valuable information not only about the application but also about the OS, third-party libraries, and drivers.
- Parts of it (the Sampling and Call Graph profilers) do not require any source code.
- Sampling does not require instrumentation. The analyzer only needs debug information to provide valuable performance information; the developer can even profile modules without debug information. This allows developers to focus their effort when optimizing application performance.
45 Performance Analysis Tools - Intel Trace Analyzer and Collector (currently untested)
Helps to analyze, optimize, and deploy high-performance applications on Intel processor-based clusters. It provides information critical to understanding and optimizing MPI cluster performance by quickly finding performance bottlenecks in MPI communication. Includes trace file comparison, counter data displays, and an optional MPI correctness checking library. Features:
- Timeline views and parallelism display: advanced GUI, display scalability, detailed and aggregate views
- Metrics tracking: execution statistics, profiling library, statistics readability, logs, information for function calls, sent messages, etc.
- Scalability: low overhead, thread safety, fail-safe mode, filtering and memory handling
- Instrumentation: low-intrusion instrumentation, binary instrumentation (for IA-32 and Intel 64)
- Intel MPI Library support
46 Performance Analysis Tools - PAPI (coming soon)
The Performance API (PAPI) specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count events: occurrences of specific signals related to the processor's function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. This correlation has a variety of uses in performance analysis, including hand tuning, compiler optimization, benchmarking, monitoring, performance modeling, and analysis of bottlenecks in high performance computing.
47 Performance Analysis Tools - TAU (coming soon)
TAU (Tuning and Analysis Utilities) is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements. TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms. The user can quickly identify sources of performance bottlenecks in the application using the graphical interface. In addition, TAU can generate event traces that can be displayed with the Vampir, Paraver, or JumpShot trace visualization tools.
Open SpeedShop (coming soon)
Open SpeedShop is an open source, multi-platform Linux performance tool to support performance analysis of applications running on both single-node and large-scale IA64, IA32, EM64T, and AMD64 platforms. Open SpeedShop's base functionality includes metrics like exclusive and inclusive user time, MPI call tracing, and CPU hardware performance counter experiments. In addition, it supports several levels of plugins, which allow users to add their own performance experiments.
48 Data Analysis and Visualization Tools
GrADS (Grid Analysis and Display System) - coming soon
GrADS is an interactive desktop tool that is used for easy access, manipulation, and visualization of earth science data.
NCAR Graphics - coming soon
NCAR Graphics is a library containing Fortran/C utilities for drawing contours, maps, vectors, streamlines, weather maps, surfaces, histograms, X/Y plots, annotations, etc.
NCView (netCDF visual browser) - coming soon
NCView is a visual browser for netCDF format files. Typically you would use ncview to get a quick and easy, push-button look at your netCDF files. You can view simple movies of the data, view along various dimensions, take a look at the actual data values, change color maps, invert the data, etc.
49 Data Analysis and Visualization Tools
NCO (NetCDF Operators) - coming soon
NCO is a suite of programs, or operators, that perform a set of operations on a netCDF or HDF4 file to produce a new netCDF file as output.
CDAT and VCDAT (Climate Data Analysis Tools) - coming soon
VCDAT is the official Graphical User Interface (GUI) for the Climate Data Analysis Tools (CDAT). Pronounced as v-c-dat, it greatly simplifies the use of CDAT and can be used with no knowledge of Python.
50 Data Analysis and Visualization Tools
GEL (Graphic Encapsulation Library) - coming soon
GEL is a sample vertical application for visualizing CFD data built upon the horizontal products of the Component Framework, including FEL, VisTech, GelLib, and Env3D. It visualizes surfaces, contours, isosurfaces, cutting planes, vector fields, field lines, vortex cores, flow separation lines, etc.
FFTW ("Fastest Fourier Transform in the West") - coming soon
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.