Introducing High Performance Computing at Marquette Xizhou Feng, Ph.D. Research Engineer, IT Services Research Assistant Professor, MSCS Marquette University xizhou.feng@marquette.edu September 5, 2012
The Need for High Performance Computing
- Computing is the third pillar of scientific discovery, alongside theory and experiment
- Research computing provides the infrastructure that:
  - Enables science at scale
  - Advances research programs
  - Responds to new opportunities
[Diagram: discovery and innovation resting on a computing infrastructure that serves the physical sciences, economics, the social sciences, engineering, and the humanities]
Research Computing: Supporting HPC on Campus
Research computing is the application of computing resources and tools to research, scholarship, and creative activity. Its scope includes, but is not limited to:
- Computing, storage, and networking resources
- Large-scale data/database management
- Software for modeling, simulation, and analysis
- Ubiquitous, fully supported cyberinfrastructure
- Support for incorporating advanced computing technology into research programs
Research computing is more than just bigger and faster computers.
Research Computing Support @ Marquette
[Diagram: the HPCGC and Campus Champions advise on policy and direction and plan, monitor, and report; the ITS Systems and RCS groups manage systems and provide services and support; researchers, computational scientists, and HPC users make up the research computing community, exchanging requests, suggestions, and collaborations with the support groups]
HPC Resources Available to Marquette Users
- Local resources: the Pere, PARIO, and HPCL clusters; MUGrid (Condor pool)
- Regional resources: SeWHIP
- National resources: XSEDE; Open Science Grid; NCSA, ORNL, and DOE resources
- Commercial resources
The Pere Cluster
[Diagram: Pere sits in the Marquette data center; head nodes hn1 and hn2 connect to the campus Active Directory over 10GE and GE links; compute nodes cn1-cn128 are grouped into enclosures (E1: cn1-cn16 through E8: cn113-cn128) and share a DDR 4x (5 Gbps per lane) InfiniBand interconnect plus a Gigabit Ethernet interconnect; msa1 storage arrays attach to the head nodes]
Pere Hardware Configuration
- 2 HP ProLiant DL380 G6 servers as head nodes
  - Two Intel Xeon X5550 @ 2.67 GHz quad-core CPUs
  - Two 72 GB hard drives (RAID 1)
  - One Mellanox MT26418 IB DDR NIC
  - Two NetXen NX3031 Ethernet controllers
- 128 compute nodes: HP ProLiant BL280c G6 blades
  - Two Intel Xeon X5550 @ 2.67 GHz quad-core CPUs
  - Two local hard drives: 120 GB + 500 GB
  - One Mellanox MT25418 IB DDR NIC
  - One Intel 82576 Gigabit Ethernet controller
- 2 HP MSA2012sa storage racks
  - Each rack has 3 enclosures
  - Each enclosure has eleven 750 GB 7200 RPM SATA disks configured with RAID 10 (~20 TB available storage)
Pere Software Configuration
- OS: Red Hat Enterprise Linux 5.6
- Authentication: AD + winbind, integrated with the Marquette authentication infrastructure
- Workload schedulers:
  - TORQUE/PBS: cn1-cn64
  - Condor: cn65-cn128
- Programming models: task parallel, OpenMP, MPI, MPI+OpenMP
Sample Applications Running on Pere
- Biomedical: SimVascular (blood flow), NEURON (computational neuroscience), medical image processing, neural simulation
- Chemistry: Gaussian, Amber, CYANA, AutoDock, Molpro
- Mechanical: CONVERGE (CFD)
- Electrical: MATLAB
- MSCS: MATLAB, bioinformatics apps, parallel computing course
- Business: Stata
Accessing the Pere Cluster
- Get an account on Pere
  - Fill out the account request form and email it to its-rcs@mu.edu
- Log in to the cluster
  - ssh <your-mu-id>@pere.marquette.edu
  - ssh -X <your-mu-id>@pere.marquette.edu (with X11 forwarding)
- Account management
  - User authentication is based on Active Directory
  - Use the same user ID and password as eMarq/CheckMarq
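You can avoid typing the full hostname and options on every login with an SSH client configuration entry. A minimal sketch for `~/.ssh/config` on your desktop; the `pere` alias is illustrative, and you would substitute your own MU ID:

```
# ~/.ssh/config  (hypothetical entry; replace the user name with your MU ID)
Host pere
    HostName pere.marquette.edu
    User your-mu-id
    ForwardX11 yes      # same effect as ssh -X
```

With this in place, plain `ssh pere` logs you in with X11 forwarding enabled.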
Transferring Files between Pere and Your Desktop
- Method 1: sftp (text or GUI)
  sftp <muid>@pere.mu.edu
  put simple.c
  bye
- Method 2: scp
  scp simple.c muid@pere.mu.edu:example/
- Method 3: rsync
  rsync --rsh=ssh -av example muid@pere.mu.edu:
- Method 4: svn or cvs
  svn co svn+ssh://<svn-host-repo>/example
Transferring Files between Pere and Your Desktop (cont.)
- Method 5: mount your home directory on Pere as a network drive
  - Users need to request that this feature be enabled
Developing & Running Parallel Code
Workload Management / Job Schedulers
- Software that provides:
  - Job submission and automatic execution
  - Job monitoring and control
  - Resource management
  - Priority management
  - Checkpointing
- Usually implemented as a master/slave architecture
- Pere currently uses both PBS/TORQUE and Condor
Using PBS/TORQUE
Commonly used commands:
- qsub myjob.qsub: submit a job script
- qstat: view job status
- qdel <job-id>: delete a job
- pbsnodes: show node status
- pbstop: show queue status
Sample Job Script on Pere

#!/bin/sh
#PBS -N hpl                               # assign a name to the job
#PBS -l nodes=64:ppn=8,walltime=01:00:00  # request 64 nodes, 8 processors each, for 1 hour
#PBS -q batch                             # submit to the batch queue
#PBS -j oe                                # merge stdout and stderr
#PBS -o hpl-$PBS_JOBID.log                # redirect output to a file

module load mpich2/intel/1.4.1            # load environment variables
cd $PBS_O_WORKDIR                         # change to the submission directory
cat $PBS_NODEFILE                         # print the allocated nodes (not required)
mpirun -np 512 --hostfile $PBS_NODEFILE ./xhpl   # run the xhpl MPI program
Using Condor
Resources: http://www.cs.wisc.edu/condor/tutorials/
Using Condor
1. Write a submit script, simple.job:

   Universe   = vanilla
   Executable = simple
   Arguments  = 4 10
   Log        = simple.log
   Output     = simple.out
   Error      = simple.error
   Queue

2. Submit the script to the Condor pool:
   condor_submit simple.job
3. Watch the job run:
   condor_q
   condor_q -submitter <your-username>
Doing a Parameter Sweep
You can put a collection of jobs in the same submit script to perform a parameter sweep:

Universe   = vanilla
Executable = simple
Log        = simple.log
Output     = simple.$(Process).out    # a different output file for each job
Error      = simple.$(Process).error
Arguments  = 4 10
Queue                                 # each Queue statement submits one job
Arguments  = 4 11
Queue
Arguments  = 4 12
Queue

The individual jobs run independently of one another.
Condor DAGMan
DAGMan lets you submit complex sequences of jobs, as long as they can be expressed as a directed acyclic graph.
Commands:
- condor_submit_dag simple.dag
- ./watch_condor_q
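As a sketch, a diamond-shaped workflow (B and C depend on A, and D depends on both) could be described in a DAG file like the one below; the node names and submit-file names are hypothetical, and each JOB line points at an ordinary Condor submit file like simple.job above:

```
# simple.dag  (hypothetical diamond-shaped DAG)
JOB A a.job
JOB B b.job
JOB C c.job
JOB D d.job
PARENT A CHILD B C
PARENT B C CHILD D
```

Submitting it with condor_submit_dag simple.dag makes DAGMan run A first, then B and C in parallel, then D once both finish.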
Using XSEDE Resources
If you need more computing power, consider XSEDE.
- What is XSEDE? The Extreme Science and Engineering Discovery Environment: a single virtual system that scientists can use to interactively share computing resources, data, and expertise
- XSEDE resources are free to academic users
- Allocation requests are needed, but we can help
- Campus Champions: Lars Olson and me
HPC Systems Available on XSEDE
Best Practices for Using Shared HPC Systems
- Set up a comfortable local environment on your desktop
  - SSH client: SSH Secure Shell, PuTTY
  - Linux VM: VMware + CentOS + shared folder
  - Use public-key authentication
- Be familiar with the Unix environment
  - Editing files with vi or emacs
  - Working with files and directories
  - Working with the shell environment and scripting tools
  - Working with basic Unix programming tools
- Security concerns: backups, passwords, and file access permissions
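Setting up public-key authentication takes two steps: generate a key pair on your desktop, then install the public key on the cluster. A minimal sketch, assuming OpenSSH is installed; the key file name is illustrative:

```shell
# Generate a 4096-bit RSA key pair.  -N "" gives an empty passphrase for
# brevity; use a real passphrase in practice.
mkdir -p "$HOME/.ssh"
KEY="$HOME/.ssh/id_rsa_pere"
[ -f "$KEY" ] || ssh-keygen -q -t rsa -b 4096 -f "$KEY" -N ""
# Install the public key on the cluster (run manually; asks for your
# password one last time):
#   ssh-copy-id -i "$KEY.pub" <your-mu-id>@pere.marquette.edu
```

After the public key is installed, ssh and scp to the cluster no longer prompt for a password.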
Best Practices for Using Shared HPC Systems
- Understand the basics of HPC
  - Typical HPC system architectures: SMP, cluster, grid, heterogeneous systems
  - Parallel computing models/paradigms: job parallel, data parallel, OpenMP, MPI, PGAS, MapReduce
- Know the common tools available in an HPC environment
  - Environment modules
  - Job schedulers: PBS, SGE, LSF, Condor, etc.
  - Compilers: GCC, Intel, PGI, etc.
- Consult the system documentation
  - Queue systems
  - Data storage
  - System policies
Best Practices for Using Shared HPC Systems
- Automate your workflow
  - Develop scripts to wrap and simplify the commands for preparing, transferring, and cleaning data
  - Use scripts/tools to glue related tasks together
- Use the appropriate queues
  - cvtec: for SimVascular, limited to 5 jobs
  - batch: for other PBS jobs, no limit
  - condor: for Condor jobs
- Request the right number of nodes for each job
  - Typical parallel speedup follows a bell curve
  - Profile with short runs to determine the optimal number of nodes before launching many long runs
  - Try to use all the cores on a single node to prevent interference from other jobs
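The "develop scripts to wrap the commands" advice can be as simple as a shell loop that writes one PBS script per input file and submits it. A sketch, where the input pattern, resource request, and the ./simulate program are all hypothetical placeholders for your own workflow:

```shell
#!/bin/sh
# Create a demo input so this sketch runs end to end; in real use your
# input_*.dat files would already exist.
echo "demo data" > input_demo.dat

for data in input_*.dat; do
  name=${data%.dat}
  job="run_${name}.qsub"
  # Write a one-node, 8-processor PBS script for this input file.
  cat > "$job" <<EOF
#!/bin/sh
#PBS -N ${name}
#PBS -l nodes=1:ppn=8,walltime=04:00:00
#PBS -q batch
cd \$PBS_O_WORKDIR
./simulate ${data} > ${name}.out
EOF
  # qsub "$job"    # uncomment on the cluster to actually submit
done
```

One generated script per input keeps each run reproducible: you can resubmit, inspect, or edit any single job without touching the others.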
Best Practices for Using Shared HPC Systems
- Pay attention to data management
  - Consider using a database to manage input data, simulation configurations, and results
  - Store your data in a well-organized directory structure
  - Routinely back up data from the cluster to your desktop
  - Regularly check the available storage space on the cluster and remove unused temporary data
- Optimize jobs for better performance
  - Use an optimized version of your code
  - Reduce unnecessary data movement
  - Choose a proper interval for checkpointing
  - Use different file systems for different purposes
Best Practices for Using Shared HPC Systems
- Get help from the community
  - Research Computing Support at Marquette: solves technical issues, helps with scripts and solutions, advises on job/application optimization, and provides special training sessions
  - Attend training/tutorial sessions
  - Local user community
  - XSEDE resources
System and User Support
[Diagram: a layered stack, from computing resources (clusters, networks, storage, power, cooling, etc.) through the operating system, runtimes and middleware (MPI, OpenMP, UPC, PBS, Condor), applications, data stores, and visualization, up to user interfaces and collaboration tools; services are delivered on-demand, priority-based, or guaranteed]
Motivating Examples
Example 1: High-Performance Bayesian Phylogenetics
- The problem: accurately and efficiently construct large evolutionary trees from genomic data
- The challenges:
  - Extremely computationally intensive
  - Large memory footprint
  - Large number of datasets
[Figures: two example phylogenies, one of viral isolates labeled by place and year (Italy 1998, Romania 1996, Kenya 1998, New York 1999, Israel 1998) and one of primates (lemur, gorilla, chimpanzee, human)]
The Solution
1. Develop highly scalable parallel algorithms (PBPI)
   - 1400X speedup on 256 processors (reducing time from ~40 hours to 1.7 minutes)
   - Supports very large datasets with distributed memory
   - Scales up to 4000 processors, enabling large science
2. Customize scripts to automate data generation, analysis, and summary
3. Use HPC and TeraGrid to speed up analysis by running hundreds of analyses in parallel
   - Research previously done in years can be completed in weeks
Example 2: Individual-Based Computational Epidemiology
- The problem: preparing for pandemic influenza with policy informatics
  - The 1918 pandemic killed more than 25 million people worldwide (548,452 in the US)
  - It is only a matter of time before a human flu pandemic grips the world
  - A novel flu strain that can easily transmit between humans could trigger a disease pandemic that overburdens the existing public health infrastructure
The Solution: HPC-Supported Individual-Based Computational Epidemiology
- Investigate how infectious diseases spread through large populations
- Provide tools for experts to test different public health interventions
[Diagram: population, mobility, and disease models combine into a social contact network that drives the simulation engines]
The Results: High-Fidelity, High-Resolution, and High-Flexibility Models
Example Case Study using EpiSimdemics
Example 3: Cyber-Infrastructure for Complex Systems Research
- The problem: translating HPC software into a user-centric problem-solving environment, making HPC analytical capability available to domain experts who are not HPC experts
- The solution:
  - Abstract the scientific workflow into a web-based problem-solving platform
  - Hide the complexity of data preparation, job submission, resource scheduling, and simulation/analysis execution in the HPC and data grid
  - Let researchers and experts focus on what problem to solve instead of how to compute it
The DIDACTIC/ISIS System
[Diagram: the workflow Formulate Problem -> Select Models/Data -> Design Experiments -> Execute Experiments -> Analyze Results -> Draw Conclusions -> Recommend Policy runs through a graphical user interface backed by a GUI server, the SimfraSpace service broker, a job coordinator, a database, a simulation engine, and an analytical engine]
Demo system URL: http://zia.vbi.vt.edu:8080/didactic/didactic.html
Lessons and Summary
- Computing, particularly HPC, plays a central role in today's research
- Parallel computing is becoming mainstream
- There are many challenges in applying HPC to a new research program
- User-centric, ubiquitous HPC and cyberinfrastructure is a candidate solution
- Marquette ITS Research Computing Services is committed to helping you build the environment and explore new research opportunities