NVIDIA HPC Update. Carl Ponder, PhD; NVIDIA, Austin, TX, USA - Sr. Applications Engineer, Developer Technology Group
1 NVIDIA HPC Update. Carl Ponder, PhD; NVIDIA, Austin, TX, USA - Sr. Applications Engineer, Developer Technology Group. Stan Posey; sposey@nvidia.com; NVIDIA, Santa Clara, CA, USA - HPC Program Manager, ESM and CFD Solutions
2 IBM and NVIDIA CWO Review. NVIDIA HPC strategy for CWO. GPU model developments - Global: ECMWF ESCAPE Project, NOAA NGGPS; Meso-scale: COSMO, WRF
3 GPU Motivation (I): Performance Trends. [Charts: Peak Double Precision FLOPS (up to 8000 GFLOPS) and Peak Memory Bandwidth (up to 1400 GB/s) for NVIDIA GPUs (M1060, M2050, M2090, K20, K40, Pascal, Volta) vs. x86 CPUs; GPUs hold a roughly 5x advantage on both metrics, widening through the Pascal and Volta generations.]
4 GPU Motivation (II): Power Limiting Trends. "Challenges of Getting ECMWF's Weather Forecast Model (IFS) to the Exascale" - G. Mozdzynski, ECMWF 16th HPC Workshop. Conclusion: Current mathematics, algorithms and hardware will not be sufficient (by far)
5 GPU Motivation (III): ES Model Trends. Higher grid resolution with manageable compute and energy costs: global atmosphere models moving from 10 km today to cloud-resolving scales of 1-3 km. [Chart: IFS resolution progression 128 km, 16 km, 10 km, 3 km; Source: Project Athena.] Increase in ensemble use and ensemble members to manage uncertainty (10x-50x more jobs). Fewer model approximations (non-hydrostatic), more features (physics, chemistry, etc.). Accelerator technology identified as a cost-effective and practical approach to future NWP requirements
6 Tesla GPU Progression During Recent Years

                       Fermi (2012)   K20X (2012)   K40 (2014)   K80 (2014)
  Peak SP              1.03 TF        3.93 TF       4.29 TF      8.74 TF
  Peak SGEMM           -              2.95 TF       3.22 TF      -
  Peak DP              0.515 TF       1.31 TF       1.43 TF      2.90 TF
  Peak DGEMM           -              1.22 TF       1.33 TF      -
  Memory size          6 GB           6 GB          12 GB        24 GB (12 each)
  Mem BW (ECC off)     150 GB/s       250 GB/s      288 GB/s     480 GB/s (240 each)
  Memory Clock         2.6 GHz        3.0 GHz       3.0 GHz      -
  PCIe Gen             Gen 2          Gen 2         Gen 3        Gen 3
  # of Cores           -              -             -            (2496 each)
  Board Power          235W           235W          235W         300W (+28%)

Note: Tesla K80 specifications are shown as aggregate of two GPUs on a single board; cells marked "-" were lost in transcription.
7 Features of Pascal GPU Architecture (2016). NVLink interconnect at 80 GB/s (the speed of CPU memory). Stacked memory: 4x higher bandwidth (~1 TB/s), 3x capacity, 4x more efficient. Unified memory: lower development effort (available today in CUDA 6). [Diagram: Tesla GPU with stacked HBM at 1 TB/s, NVLink to a CPU (POWER, ?) with DDR memory, unified virtual memory across both.]
8 NVLink Interconnect Alternative to PCIe. High speed interconnect for CPU-to-GPU and GPU-to-GPU. Performance advantage over PCIe Gen-3 (5x to 12x faster). GPUs and CPUs share data structures at CPU memory speeds. [Diagram: Tesla GPU with stacked HBM, NVLink to CPU with DDR memory, unified virtual memory.]
9 NVLink Interconnect Alternative to PCIe. GPU density increasing. HPC vendors and NVLink in 2016: all will support GPU-to-GPU, several to support CPU-to-GPU. Examples: Cray CS-Storm (8 x K80), Dell C4130 (4 x K80), HP SL270 (8 x K80)
10 GPU and CPU Platforms and Configurations. 2014: Kepler GPU over PCIe with a choice of x86, ARM64, or POWER CPU - more CPU choices for GPU acceleration. 2016: Pascal GPU with NVLink to the POWER CPU (PCIe to x86 and ARM64) - NVLink interconnect as an alternative to PCIe
11 IBM Power + NVIDIA GPU Accelerated HPC. Next-gen IBM supercomputers and enterprise servers. OpenPOWER Foundation: open ecosystem built on the Power Architecture (30+ more members); long term roadmap integration of POWER CPU + Tesla GPU. First GPU-accelerated POWER-based systems available since
12 DOE CORAL Based on Power + GPU (2017). US DOE CORAL systems Summit (ORNL) and Sierra (LLNL), installation 2017 at ~150 PF each. Nodes of POWER9 CPUs + Tesla Volta GPUs, with NVLink interconnect for CPUs + GPUs. ORNL Summit system: approximately 3,400 total nodes, each node 40+ TF peak performance; about 1/5 of the node count of #2-ranked Titan (18K+ nodes), at the same energy use as Titan (27 PF). CORAL Summit system: 5-10x faster than Titan with 1/5th the nodes and the same energy use
13 Feature Comparison of Summit vs. Titan. [Comparison table; row labels not recoverable from transcription. Approximate ratios listed: ~5-10x, ~1/5x, ~30x, ~15x, ~10-20x.]
14 CORAL Systems for US DOE Climate Model ACME. ACME: Accelerated Climate Model for Energy. DOE labs: Argonne, LANL, LBL, LLNL, ORNL, PNNL, Sandia. Co-design project using US DOE LCF systems; branch of CESM using CAM-SE (atm) and MPAS-O (ocn). ACME project roadmap (page 2 of the strategy report, 11 Jul): CORAL (Summit, Sierra, Aurora); Trinity / NERSC8 (Cori); Titan OLCF [AMD + GPU]; Mira ALCF [Blue Gene/Q]
15 GPU Programming Models for CPU Platforms. Availability for x86 today; POWER and ARM under development. Three approaches: libraries (e.g. AmgX, cuBLAS), compiler directives, and programming languages
16 PGI Compilers for OpenPOWER + GPU. Feature parity with PGI compilers on Linux/x86 + Tesla: CUDA Fortran, OpenACC, OpenMP, CUDA C/C++ host compiler. Integrated with IBM's optimized LLVM / Power code generator. Limited access in 2015, beta 1H 2016, production in 2016. Existing x86 applications should simply recompile
17 NVIDIA GPU Focus for Atmosphere Models. Recent developments - Global: ECMWF ESCAPE Project, NOAA NGGPS; Regional: COSMO, WRF
18 ECMWF Project for Exascale Weather Prediction. "Expectations towards Exascale: Weather and Climate Prediction" - P. Bauer, ECMWF, EESI-2015. ESCAPE HPC goals: standardized, highly optimized kernels on specialized hardware; overlap of communication and computation; compilers/standards supporting portability. Project approach: co-design between domain scientists, computer scientists, and HPC vendors
19 ECMWF Initial IFS Investigation on Titan GPUs. "Challenges of Getting ECMWF's Weather Forecast Model (IFS) to the Exascale" - G. Mozdzynski, ECMWF 16th HPC Workshop. Legendre transform kernels: speedups of >3x to >4x observed for a Titan K20X vs. an XC Ivy Bridge core
20 NOAA NGGPS: Background on Program. From: "Next Generation HPC and Forecast Model Application Readiness at NCEP" - John Michalakes, NOAA NCEP; AMS, Phoenix, AZ, Jan 2015
21 NOAA NGGPS: NH Model Dycore Candidates. Five dycores evaluated; FV3 & MPAS selected for Level-II evaluation. NVIDIA and NOAA investigating potential for GPUs; NOAA evaluations under the HPC program SENA - Software Engineering for Novel Architectures (2015). From: "Next Generation HPC and Forecast Model Application Readiness at NCEP" - John Michalakes, NOAA NCEP; AMS, Phoenix, AZ, Jan 2015
22 COSMO Regional Model on GPUs. COSMO model fully implemented on GPUs by MeteoSwiss; COSMO consortium approved the GPU code for standard distribution (5.3). MeteoSwiss to deploy the first-ever operational NWP on GPUs: successful daily test runs on GPUs since Dec 2014, operational in 2016. [Images: COSMO-2 (2.2 km) and COSMO-1 (1.1 km) forecasts vs. reality - increased resolution = quality forecast]
23 Operational NWP at MeteoSwiss Before GPUs. "Delivering Performance in Scientific Simulations: Present and Future Role of GPUs in Supercomputers" - Dr. Thomas Schulthess, Director CSCS; NVIDIA GTC, March 2014
24 New Operational NWP at MeteoSwiss With GPUs. "Climate Simulation and Numerical Weather Prediction Using GPUs" - Dr. Xavier Lapillonne, MeteoSwiss; EGU Assembly, Apr 2015. MeteoSwiss will deploy the first-ever operational NWP with GPUs (2016); two identical systems purchased (Cray CS + K80 GPUs), to be installed during 2015. Successful daily runs of pre-operational COSMO on GPUs since Dec 2014 at CSCS. New operational NWP configurations, of higher resolution and with ensemble prediction, are only possible with the performance-per-energy gains from GPUs: short-range rapid update increases from 2.2 km to 1.1 km resolution (grid = 1158x774x80), a 24 hr forecast generated every 3 h (8 per day); medium-range ensemble (EPS) increases from 6.6 km to 2.2 km (grid = 582x390x60), a 5+ day forecast generated every 12 h (2 per day)
25 MeteoSwiss GPU-Driven Weather Prediction. COSMO NWP configurations since 2008 (before GPUs): IFS from ECMWF, 2 per day, 10 day forecast; COSMO-7 (6.6 km), 3 per day, 3 day forecast; COSMO-2 (2.2 km), 8 per day, 24 hr forecast. Configurations during 2016 (with GPUs): IFS from ECMWF, 2 per day, 10 day forecast; COSMO-E (2.2 km), 2 per day, 5 day forecast; COSMO-1 (1.1 km), 8 per day, 24 hr forecast. New configurations of higher resolution and ensemble prediction possible owing to the performance-per-energy gains from GPUs. X. Lapillonne, MeteoSwiss; EGU Assembly, Apr 2015
26 COSMO Model Performance for K20X - Oct 2014. "Are OpenACC Directives the Easy Way to Port NWP Applications to GPUs?" - Dr. Xavier Lapillonne, MeteoSwiss; ECMWF 16th HPC Workshop, Oct 2014. Benchmark and CPU + GPU configurations: COSMO-2 (2 km) configuration, 24 h run, grid = 520x350x60; system "todi", XK7 at CSCS; all results for 9 compute nodes with 9 x AMD CPU and 9 x K20X, compared socket-to-socket; CPU runs use all 16 cores, GPU runs use 1 CPU core. Physics with OpenACC optimizations: 3.4x speedup. Assimilation (OpenACC on 1 CPU core): 3.3x slower, but <15% of the profile. Total GPU speedup: 2.7x
27 COSMO Model Performance for K20X - Apr 2015. "Climate Simulation and Numerical Weather Prediction Using GPUs" - Dr. Xavier Lapillonne, MeteoSwiss; EGU Assembly, Apr 2015. Benchmark and CPU + GPU configurations: COSMO-1 (1 km) configuration, 24 h run, grid = 1158x774x80; system Piz Daint, XC30 at CSCS; all results for 70 compute nodes with 1 x Sandy Bridge CPU and 1 x K20X each, compared socket-to-socket; CPU runs use all 8 cores, GPU runs use 1 CPU core. Total GPU speedup: 2.2x; gains in energy per solution are more substantial, but were not reported
28 COSMO Development on GPUs by MeteoSwiss. "Climate Simulation and Numerical Weather Prediction Using GPUs" - Dr. Xavier Lapillonne, MeteoSwiss; EGU Assembly, Apr 2015. [Slides on approach and implementation]
29 References: COSMO Model on NVIDIA GPUs. "Towards GPU-accelerated Operational Weather Forecasting" - Oliver Fuhrer (MeteoSwiss), NVIDIA GTC 2013, Mar 2013. "Delivering Performance in Scientific Simulations: Present and Future Role of GPUs in Supercomputers" - Thomas Schulthess (CSCS), NVIDIA GTC 2014, Mar 2014. "Implementation of COSMO on Accelerators" - Oliver Fuhrer (MeteoSwiss), ECMWF Scalability Workshop, Apr 2014. "Implementation of COSMO on Accelerators" - Xavier Lapillonne (MeteoSwiss), ECMWF 16th HPC Workshop, Oct 2014
30 WRF Progress Update. Independent development projects for (I) CUDA and (II) OpenACC. I. CUDA, through a funded collaboration with SSEC; SSEC lead Dr. Bormin Huang, NVIDIA CUDA Fellow (research.nvidia.com/users/bormin-huang); objective: hybrid WRF benchmark capability starting 2H 2015. II. OpenACC directives, through collaboration with NCAR MMM; objective: NCAR GPU interest in a Fortran version to host on the distribution site
31 Project Status (I): NVIDIA-SSEC CUDA WRF. Objective: provide Q3 benchmark capability of hybrid CPU+GPU WRF. SSEC ported 20+ modules to CUDA across 5 physics types in WRF: Cloud MP = 11; Radiation = 4; Surface Layer = 3; Planetary BL = 1; Cumulus = 1. Project to integrate the CUDA modules into the latest full-model WRF: SSEC development began in 2010, based on WRF 3.3/3.4 vs. the latest release; SSEC's focus was publications, so module development was isolated from the WRF model. Start with the existing modules most common across target customers' input; objective: speedups measured on individual modules within a full model run. NVIDIA-SSEC project success: target modules achieve published GPU performance within a full WRF model run; final clean-up and performance studies underway, real benchmarks in 2H 2015. TQI/SSEC to fund/port all modules to CUDA towards a full WRF model on GPUs, with focus on additional physics modules and ongoing WRF-ARW dycore development
32 SSEC Speedups of WRF Physics Modules

  WRF Module             GPU Speedup (w / wo I/O)   Technical Paper Publication
  Kessler MP             70x / 816x                 J. Comp. & GeoSci., 52, 2012
  Purdue-Lin MP          156x / 692x                SPIE
  WSM 3-class MP         150x / 331x
  WSM 5-class MP         202x / 350x                JSTARS, 5, 2012
  Eta MP                 37x / 272x                 SPIE
  WSM 6-class MP         165x / 216x                Submitted to J. Comp. & GeoSci.
  Goddard GCE MP         348x / 361x                Accepted for publication in JSTARS
  Thompson MP            76x / 153x
  SBU 5-class MP         213x / 896x                JSTARS, 5, 2012
  WDM 5-class MP         147x / 206x
  WDM 6-class MP         150x / 206x                J. Atmo. Ocean. Tech., 30, 2896, 2013
  RRTMG LW               123x / 127x                JSTARS, 7, 2014
  RRTMG SW               202x / 207x                Submitted to J. Atmos. Ocean. Tech.
  Goddard SW             92x / 134x
  Dudhia SW              19x / 409x
  MYNN SL                6x / 113x
  TEMF SL                5x / 214x
  Thermal Diffusion LS   10x / 311x                 Submitted to JSTARS
  YSU PBL                34x / 193x                 Submitted to GMD

* Modules for the initial NVIDIA-funded integration project (asterisk placement lost in transcription). Hardware and benchmark case - CPU: Xeon Core-i7 3930K, 1 core used; benchmark: CONUS 12 km for 24 hours (Oct 24); 433 x 308 grid, 35 levels. Hybrid WRF customer benchmark capability starting in 2H 2015
33 NVIDIA-SSEC Phase-I Priority on 6 Modules. Priority modules by physics type (approximate share of profile): Cloud MP (~20%) - WSM 5-class MP, WSM 6-class MP, Thompson MP; Radiation (~6%) - Dudhia Radiation, RRTMG Radiation; Planetary BL - YSU. Target customer configurations: NCEP HRRR, NCEP NWS-1, NCAR, KISTI, KAUST (per-module applicability matrix lost in transcription). NOTE: New NCAR global model MPAS uses WRF physics: WSM6, RRTMG, and YSU
34 Project Status (II): NVIDIA-NCAR OpenACC WRF. Objective: provide benchmark capability of hybrid CPU+GPU WRF. NVIDIA ported 30+ routines to OpenACC for both dynamics and physics (list provided on the next page). Project to integrate the OpenACC modules with the full-model WRF release: open source project with plans for NCAR hosting on the WRF trunk distribution site; objective is to minimize code refactoring, for scientists' readability and acceptance. NVIDIA-NCAR project success: several performance-sensitive routines of dynamics and physics are completed, including the advection routines for dynamics and expensive microphysics schemes; hybrid WRF performance using a single-socket CPU + GPU is faster than CPU-only, with incremental improvements as more OpenACC code is developed for execution on the GPU
35 WRF Modules Available in OpenACC. Project to integrate OpenACC modules into the latest full-model WRF. Several dynamics routines, including all of advection. Several physics schemes: Planetary boundary layer - YSU, GWDO; Cumulus - KFETA; Microphysics - Kessler, Morrison, Thompson; Radiation - RRTM; Surface physics - Noah
36 Summary Highlights. NVIDIA observes strong NWP community interest in GPU acceleration; appeal of new technologies: Pascal, NVLink, more CPU platform choices. NVIDIA applications engineering collaboration in several model projects; investments in OpenACC: PGI 15 release, continued Cray collaborations. GPU progress for several models - we examined a few of these: MeteoSwiss developments towards COSMO operational NWP on GPUs; ECMWF developments of IFS and launch of the H2020 ESCAPE project. NVIDIA HPC technology behind the next-gen pre-exascale systems: new GPU technologies for the DOE CORAL program and the ACME climate model
37 Thank You Carl Ponder, PhD; NVIDIA, Austin, TX, USA
Public Presentation for Blue Gene Consortium Nov. 19, 2013 www.hpcinnovationcenter.com Jeff Wolf Deputy Director HPC Innovation Center This work was performed under the auspices of the U.S. Department
More informationSYSTAP / bigdata. Open Source High Performance Highly Available. 1 http://www.bigdata.com/blog. bigdata Presented to CSHALS 2/27/2014
SYSTAP / Open Source High Performance Highly Available 1 SYSTAP, LLC Small Business, Founded 2006 100% Employee Owned Customers OEMs and VARs Government TelecommunicaHons Health Care Network Storage Finance
More informationAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels Jiannan Ouyang, Brian Kocoloski, John Lange The Prognostic Lab @ University of Pittsburgh Kevin Pedretti Sandia National Laboratories HPDC 2015
More informationPart I Courses Syllabus
Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment
More informationInterconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, June 2016
Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, June 2016 Mellanox Leadership in High Performance Computing Most Deployed Interconnect in High Performance
More informationStovepipes to Clouds. Rick Reid Principal Engineer SGI Federal. 2013 by SGI Federal. Published by The Aerospace Corporation with permission.
Stovepipes to Clouds Rick Reid Principal Engineer SGI Federal 2013 by SGI Federal. Published by The Aerospace Corporation with permission. Agenda Stovepipe Characteristics Why we Built Stovepipes Cluster
More informationReview of SC13; Look Ahead to HPC in 2014. Addison Snell addison@intersect360.com
Review of SC13; Look Ahead to HPC in 2014 Addison Snell addison@intersect360.com New at Intersect360 Research HPC500 user organization, www.hpc500.com Goal: 500 users worldwide, demographically representative
More informationCopyright 2013, Oracle and/or its affiliates. All rights reserved.
1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information
More informationSoftware and Performance Engineering of the Community Atmosphere Model
Software and Performance Engineering of the Community Atmosphere Model Patrick H. Worley Oak Ridge National Laboratory Arthur A. Mirin Lawrence Livermore National Laboratory Science Team Meeting Climate
More informationEmerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting
Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Introduction Big Data Analytics needs: Low latency data access Fast computing Power efficiency Latest
More informationE6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices
E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,
More informationHigh Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates
High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of
More informationSR-IOV: Performance Benefits for Virtualized Interconnects!
SR-IOV: Performance Benefits for Virtualized Interconnects! Glenn K. Lockwood! Mahidhar Tatineni! Rick Wagner!! July 15, XSEDE14, Atlanta! Background! High Performance Computing (HPC) reaching beyond traditional
More informationASSESSMENT OF THE CAPABILITY OF WRF MODEL TO ESTIMATE CLOUDS AT DIFFERENT TEMPORAL AND SPATIAL SCALES
16TH WRF USER WORKSHOP, BOULDER, JUNE 2015 ASSESSMENT OF THE CAPABILITY OF WRF MODEL TO ESTIMATE CLOUDS AT DIFFERENT TEMPORAL AND SPATIAL SCALES Clara Arbizu-Barrena, David Pozo-Vázquez, José A. Ruiz-Arias,
More informationECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009
ECLIPSE Best Practices Performance, Productivity, Efficiency March 29 ECLIPSE Performance, Productivity, Efficiency The following research was performed under the HPC Advisory Council activities HPC Advisory
More informationInfrastructure Matters: POWER8 vs. Xeon x86
Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report
More informationKeeneland Enabling Heterogeneous Computing for the Open Science Community Philip C. Roth Oak Ridge National Laboratory
Keeneland Enabling Heterogeneous Computing for the Open Science Community Philip C. Roth Oak Ridge National Laboratory with contributions from the Keeneland project team and partners 2 NSF Office of Cyber
More informationSimulation Platform Overview
Simulation Platform Overview Build, compute, and analyze simulations on demand www.rescale.com CASE STUDIES Companies in the aerospace and automotive industries use Rescale to run faster simulations Aerospace
More informationOptimizing a 3D-FWT code in a cluster of CPUs+GPUs
Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Gregorio Bernabé Javier Cuenca Domingo Giménez Universidad de Murcia Scientific Computing and Parallel Programming Group XXIX Simposium Nacional de la
More informationCORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER
CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER Tender Notice No. 3/2014-15 dated 29.12.2014 (IIT/CE/ENQ/COM/HPC/2014-15/569) Tender Submission Deadline Last date for submission of sealed bids is extended
More informationEnergy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez
Energy efficient computing on Embedded and Mobile devices Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez A brief look at the (outdated) Top500 list Most systems are built
More informationOverview of HPC systems and software available within
Overview of HPC systems and software available within Overview Available HPC Systems Ba Cy-Tera Available Visualization Facilities Software Environments HPC System at Bibliotheca Alexandrina SUN cluster
More informationIntroduction to grid technologies, parallel and cloud computing. Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber
Introduction to grid technologies, parallel and cloud computing Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber OUTLINES Grid Computing Parallel programming technologies (MPI- Open MP-Cuda )
More informationRecent Advances in HPC for Structural Mechanics Simulations
Recent Advances in HPC for Structural Mechanics Simulations 1 Trends in Engineering Driving Demand for HPC Increase product performance and integrity in less time Consider more design variants Find the
More informationCFD Implementation with In-Socket FPGA Accelerators
CFD Implementation with In-Socket FPGA Accelerators Ivan Gonzalez UAM Team at DOVRES FuSim-E Programme Symposium: CFD on Future Architectures C 2 A 2 S 2 E DLR Braunschweig 14 th -15 th October 2009 Outline
More informationThe Top Six Advantages of CUDA-Ready Clusters. Ian Lumb Bright Evangelist
The Top Six Advantages of CUDA-Ready Clusters Ian Lumb Bright Evangelist GTC Express Webinar January 21, 2015 We scientists are time-constrained, said Dr. Yamanaka. Our priority is our research, not managing
More informationManaging GPUs by Slurm. Massimo Benini HPC Advisory Council Switzerland Conference March 31 - April 3, 2014 Lugano
Managing GPUs by Slurm Massimo Benini HPC Advisory Council Switzerland Conference March 31 - April 3, 2014 Lugano Agenda General Slurm introduction Slurm@CSCS Generic Resource Scheduling for GPUs Resource
More informationScalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age
Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics
More informationClimate-Weather Modeling Studies Using a Prototype Global Cloud-System Resolving Model
ANL/ALCF/ESP-13/1 Climate-Weather Modeling Studies Using a Prototype Global Cloud-System Resolving Model ALCF-2 Early Science Program Technical Report Argonne Leadership Computing Facility About Argonne
More informationMulticore Parallel Computing with OpenMP
Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large
More informationAltix Usage and Application Programming. Welcome and Introduction
Zentrum für Informationsdienste und Hochleistungsrechnen Altix Usage and Application Programming Welcome and Introduction Zellescher Weg 12 Tel. +49 351-463 - 35450 Dresden, November 30th 2005 Wolfgang
More informationPCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)
PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters from One Stop Systems (OSS) PCIe Over Cable PCIe provides greater performance 8 7 6 5 GBytes/s 4
More informationNew Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC
New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC Alan Gara Intel Fellow Exascale Chief Architect Legal Disclaimer Today s presentations contain forward-looking
More information