GPUs IN SCIENTIFIC COMPUTING (GPU EN CALCUL SCIENTIFIQUE)
1 GPUs IN SCIENTIFIC COMPUTING. Training session of the LAAS-CNRS Affiliates Club (Club des Affiliés du LAAS-CNRS), Toulouse, 22 March 2016. Frédéric Parienté, Tesla Accelerated Computing, NVIDIA
2 GAMING PRO ENTERPRISE VISUALIZATION DATA CENTER AUTO THE WORLD LEADER IN VISUAL COMPUTING 2
3 FIVE THINGS TO REMEMBER: (1) Time of accelerators has come; (2) NVIDIA is focused on co-design from top to bottom; (3) Accelerators are surging in supercomputing; (4) Machine learning is the next killer application for HPC; (5) Tesla platform leads in every way
4 "It's time to start planning for the end of Moore's Law, and it's worth pondering how it will end, not just when." Robert Colwell, Director, Microsystems Technology Office, DARPA
5 TESLA ACCELERATED COMPUTING PLATFORM: Focused on Co-Design from Top to Bottom. Fast GPU: Engineered for High Throughput. Fast GPU + Strong CPU (NVIDIA GPU, x86 CPU). Productive Programming Model & Tools. Expert Co-Design. Accessibility. Co-design layers: APPLICATION, MIDDLEWARE, SYS SW, LARGE SYSTEMS, PROCESSOR. [Chart: TFLOPS, 0.0 to 3.0, for M1060, M2090, K20, K40, K80]
6 ACCELERATORS SURGE IN WORLD'S TOP SUPERCOMPUTERS. [Chart: Top500, number of accelerated supercomputers] 100+ accelerated systems now on Top500 list. 1/3 of total FLOPS powered by accelerators. NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers. Tesla supercomputers growing at 50% CAGR over past five years
7 70% OF TOP HPC APPS ACCELERATED. Intersect360 survey of top apps: Top 10 HPC apps, 90% accelerated; Top 50 HPC apps, 70% accelerated. Source: Intersect360, Nov 2015, HPC Application Support for GPU Computing. Top 25 apps in survey: GROMACS, SIMULIA Abaqus, NAMD, AMBER, ANSYS Mechanical, MSC NASTRAN, SPECFEM3D, LAMMPS, NWChem, LS-DYNA, Schrodinger, Gaussian, GAMESS, ANSYS Fluent, WRF, VASP, OpenFOAM, CHARMM, Quantum Espresso, ANSYS CFX, Star-CD, CCSM, COMSOL, Star-CCM+, BLAST. Legend (per-app markers not preserved in this transcript): all popular functions accelerated; some popular functions accelerated; in development; not supported
8 370 GPU-Accelerated Applications 8
9 TESLA BOOSTS DATACENTER THROUGHPUT. $500M datacenter, 4x increase in ROI. 70% of applications 5x faster with GPU. CPU-only datacenter: 100% CPU nodes, 1000 jobs per day. Accelerated datacenter: 30% CPU nodes + 70% GPU-accelerated nodes, 3800 jobs per day
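The jobs-per-day figures on this slide can be reproduced with simple arithmetic. The sketch below is only an illustration: it assumes (the slide does not say so) that jobs are spread evenly across nodes and that every job on a GPU-accelerated node runs at the quoted speed-up. The function name and parameters are hypothetical.

```c
/* Estimated daily throughput of a partially accelerated datacenter.
 * Assumptions (not stated on the slide): jobs are spread evenly across
 * nodes, and each job on a GPU-accelerated node runs gpu_speedup times
 * faster than on a CPU-only node. */
double jobs_per_day(double baseline_jobs,     /* throughput with 100% CPU nodes */
                    double gpu_node_fraction, /* share of nodes with GPUs, 0..1 */
                    double gpu_speedup)       /* per-job speed-up on GPU nodes  */
{
    double cpu_jobs = baseline_jobs * (1.0 - gpu_node_fraction);
    double gpu_jobs = baseline_jobs * gpu_node_fraction * gpu_speedup;
    return cpu_jobs + gpu_jobs;
}
```

Calling `jobs_per_day(1000.0, 0.7, 5.0)` gives 300 jobs from the CPU nodes plus 3500 from the GPU nodes, which matches the 3800 jobs per day quoted on the slide.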
10 NEXT-GEN SUPERCOMPUTERS ARE GPU-ACCELERATED SUMMIT SIERRA U.S. Dept. of Energy Pre-Exascale Supercomputers for Science NOAA New Supercomputer for Next-Gen Weather Forecasting IBM Watson Breakthrough Natural Language Processing for Cognitive Computing 10
11 MACHINE LEARNING HPC 1ST CONSUMER KILLER-APP MICROSOFT CORTANA GOOGLE OPEN-SOURCE TENSORFLOW FACEBOOK MESSENGER FACIAL RECOGNITION MICROSOFT OPEN-SOURCE DMTK YOUTUBE CLICK-TO-BUY ADS GOOGLE PHOTO
12 TESLA PLATFORM LEADS IN EVERY WAY PROCESSOR INTERCONNECT SOFTWARE ECOSYSTEM 12
13 TESLA PLATFORM FOR HPC 13
14 "Approximately a third of HPC systems operating today are equipped with accelerators and nearly half of all newly deployed systems have them." Source: Accelerated Computing: A Tipping Point for HPC, Intersect360, Nov
15 TESLA FOR SIMULATION: LIBRARIES, DIRECTIVES, LANGUAGES. ACCELERATED COMPUTING TOOLKIT. TESLA ACCELERATED COMPUTING
16 Tesla Accelerates Discoveries. Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, the perfect target for fighting the infection. Without GPUs, the supercomputer would need to be 5x larger for similar performance.
17 TESLA K80: World's Fastest Accelerator for HPC & Data Analytics. 5x Faster AMBER Performance (Dual CPU Server vs. Tesla K80 Server): Simulation Time from 1 Month to 1 Week [chart: # of days]. Specifications: CUDA Cores: 4992; Peak DP: 1.9 TFLOPS; Peak DP w/ Boost: 2.9 TFLOPS; GDDR5 Memory: 24 GB; Bandwidth: 480 GB/s; Power: 300 W; GPU Boost: Dynamic. AMBER Benchmark: PME-JAC-NVE Simulation for 1 microsecond. CPU: 2.3GHz. 64GB System Memory, CentOS
18 TESLA K80: 10X FASTER ON REAL-WORLD APPS. [Chart: speed-up of K80 vs. CPU, 0x to 15x, for Benchmarks, Molecular Dynamics, Quantum Chemistry, Physics] CPU: 12 cores, 2.70GHz. 64GB System Memory, CentOS 6.2. GPU: Single Tesla K80, Boost enabled
19 TESLA K80 BOOSTS DATA CENTER THROUGHPUT. Accelerating key apps [chart: speed-up of K80 vs. dual CPU, 0x to 15x, for QMCPACK, LAMMPS, CHROMA, NAMD, AMBER]. 1/3 of nodes accelerated, 2x system throughput: CPU-only system, 100 jobs per day; accelerated system, 220 jobs per day. CPU: Dual E GHz, 64GB System Memory, CentOS 6.2. GPU: Single Tesla K80, Boost enabled
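As a rough consistency check, the average per-node speed-up implied by these figures can be worked out. This assumes, as a simplification not stated on the slide, that jobs are spread evenly over nodes and that a single average speed-up $s$ applies to every accelerated node:

```latex
\begin{align*}
100\left(\tfrac{2}{3} + \tfrac{1}{3}\,s\right) &= 220 \\
\tfrac{2}{3} + \tfrac{1}{3}\,s &= 2.2 \\
s &= 3 \times 2.2 - 2 = 4.6
\end{align*}
```

Under that assumption, an average speed-up of about 4.6x on one third of the nodes yields the quoted 220 jobs per day, which the slide rounds to "2x system throughput".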
20 TESLA FOR VISUALIZATION IRAY OPTIX INDEX VISUALIZATION TOOLS FOR HPC TESLA ACCELERATED COMPUTING 20
21 VISUALIZE DATA INSTANTLY FOR FASTER SCIENCE. Traditional (CPU supercomputer + viz cluster, with data transfer): slower time to discovery; simulation: 1 week; data transfer: days; viz: 1 day; multiple iterations; time to discovery = months. Tesla Platform (GPU-accelerated supercomputer, interactive): faster time to discovery; visualize while you simulate, without data transfers; restart simulation instantly; multiple iterations; time to discovery = weeks. Scalable. Flexible
22 VISUALIZATION-ENABLED SUPERCOMPUTERS Simulation + Visualization CSCS Piz Daint NCSA Blue Waters ORNL Titan Galaxy Formation Molecular Dynamics Cosmology 22
23 GROWING ADOPTION IN CLIMATE & WEATHER. MeteoSwiss Deploys World's First Accelerated Weather Supercomputer: 2x higher resolution for daily forecasts; 14x more simulation with ensemble approach for medium-range forecasts. NOAA Chooses Tesla To Improve Weather Forecast Research: develop global model with 3km resolution, five-fold increase from today's resolution; improved resolution requires 100x computational complexity
24 U.S. TO BUILD TWO FLAGSHIP SUPERCOMPUTERS Powered by the Tesla Platform PFLOPS Peak 10x in Scientific App Performance IBM POWER9 CPU + NVIDIA Volta GPU NVLink High Speed Interconnect 40 TFLOPS per Node, >3,400 Nodes 2017 Major Step Forward on the Path to Exascale 24
25 ACCELERATED COMPUTING DELIVERS 5X HIGHER ENERGY EFFICIENCY GB/s IBM POWER CPU Most Powerful Serial Processor NVIDIA NVLink Fastest CPU-GPU Interconnect NVIDIA Volta GPU Most Powerful Parallel Processor 25
26 CORAL: BUILT FOR GRAND SCIENTIFIC CHALLENGES Fusion Energy Role of material disorder, statistics, and fluctuations in nanoscale materials and systems Climate Change Study climate change adaptation and mitigation scenarios; realistically represent detailed features Biofuels Search for renewable and more efficient energy sources Astrophysics Radiation transport critical to astrophysics, laser fusion, atmospheric dynamics, and medical imaging Combustion Combustion simulations to enable the next gen diesel/biofuels to burn more efficiently Nuclear Energy Unprecedented high-fidelity radiation transport calculations for nuclear energy applications 26
27 TESLA PLATFORM FOR MACHINE LEARNING 27
28 THE BIG BANG IN MACHINE LEARNING: DNN, BIG DATA, GPU. "Google's AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs... And it depends on these chips more than the larger tech universe realizes."
29 Tesla Revolutionizes Machine Learning. Google Brain application (deep learning), before Tesla vs. after Tesla:
Cost: $5,000K vs. $200K
Servers: 1,000 servers vs. 16 Tesla servers
Energy: 600 kW vs. 4 kW
Performance: 1x vs. 6x
30 THE AI RACE IS ON 30
31 NVIDIA GPU THE ENGINE OF DEEP LEARNING WATSON CHAINER THEANO MATCONVNET TENSORFLOW CNTK TORCH CAFFE NVIDIA CUDA ACCELERATED COMPUTING PLATFORM 31
32 CUDA BOOSTS DEEP LEARNING 5X IN 2 YEARS. [Chart: Caffe performance over time, starting 11/2013, for K40, K40+cuDNN1, M40+cuDNN3, M40+cuDNN4] AlexNet training throughput based on 20 iterations, CPU: 1x E5-2680v3 12 Core 2.5GHz. 128GB System Memory, Ubuntu
33 AMAZING RATE OF IMPROVEMENT. [Charts: Image Recognition (ImageNet accuracy); Pedestrian Detection (Caltech, CV-based vs. DNN-based); Object Detection (KITTI, top score vs. NVIDIA DRIVENet); individual data points not recoverable from this transcript]
34 CUDA FOR DEEP LEARNING DEVELOPMENT. DEEP LEARNING SDK: DIGITS, cuDNN, cuSPARSE, cuBLAS, NCCL. TITAN X, DEVBOX, GPU CLOUD
35 FACEBOOK'S DEEP LEARNING MACHINE. Purpose-Built for Deep Learning Training. 2x Faster Training for Faster Deployment. 2x Larger Networks for Higher Accuracy. Powered by Eight Tesla M40 GPUs. Open Rack Compliant. "Most of the major advances in machine learning and AI in the past few years have been contingent on tapping into powerful GPUs and huge data sets to build and train advanced models." Serkan Piantino, Engineering Director of Facebook AI Research
36 DESIGNED FOR AI COMPUTING AT LARGE SCALE. Built on the NVIDIA Tesla Platform: 8 Tesla M40s deliver aggregate 96 GB GDDR5 memory and 56 teraflops of SP performance; leverages the world's leading deep learning platform to tap into frameworks such as Torch and libraries such as cuDNN. Operational Efficiency and Serviceability: free-air cooled design optimizes thermal and power efficiency; components swappable without tools; configurable PCIe for versatility
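The aggregate figures follow directly from the per-GPU Tesla M40 specifications quoted on the next slide (12 GB of GDDR5 memory and 7 TFLOPS peak single precision per GPU):

```latex
\begin{align*}
8 \times 12~\text{GB} &= 96~\text{GB GDDR5} \\
8 \times 7~\text{TFLOPS} &= 56~\text{TFLOPS (peak SP)}
\end{align*}
```

Both are simple sums of peak values across the eight GPUs, not measured application throughput.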
37 TESLA M40: World's Fastest Accelerator for Deep Learning Training. 13x Faster Training with Caffe (Dual CPU Server vs. GPU Server with 4x Tesla M40): Reduce Training Time from 5 Days to less than 10 Hours [chart: number of days]. Specifications: CUDA Cores: 3072; Peak SP: 7 TFLOPS; GDDR5 Memory: 12 GB; Bandwidth: 288 GB/s; Power: 250 W. Note: Caffe benchmark with AlexNet, training 1.3M images with 90 epochs. CPU server uses 2x Xeon E5-2699v3 CPU, 128GB System Memory, Ubuntu
38 TESLA M4: Highest Throughput Hyperscale Workload Acceleration. Video Processing (Stabilization and Enhancements): 4x. Image Processing (Resize, Filter, Search, Auto-Enhance): 5x. Video Transcode (H.264 & H.265, SD & HD): 2x. Machine Learning Inference: 2x. Specifications: CUDA Cores: 1024; Peak SP: 2.2 TFLOPS; GDDR5 Memory: 4 GB; Bandwidth: 88 GB/s; Form Factor: PCIe Low Profile; Power: W. Preliminary specifications. Subject to change.
39 TESLA PLATFORM FOR DEVELOPERS 39
40 10X GROWTH IN ACCELERATED COMPUTING. Each metric is given as the earlier value followed by the current value. CUDA Downloads: ,000 to 3 million. CUDA Apps: 27 to 370. Universities Teaching: 60 to 800. Academic Papers: 4,000 to 60,000. Tesla GPUs: 6,000 to 450,000. Supercomputing Teraflops: 77 to 54,000
41 HOW GPU ACCELERATION WORKS. Application code is split in two: compute-intensive functions (5% of code) run on the GPU, and the rest of the sequential code runs on the CPU
42 COMMON PROGRAMMING MODELS ACROSS MULTIPLE CPUS. Libraries (AmgX, cuBLAS), Compiler Directives, Programming Languages / x86
43 GPU ACCELERATED LIBRARIES: Drop-in Acceleration for Your Applications. Domain-specific (Deep Learning, GIS, EDA, Bioinformatics, Fluids): NVBIO, Triton Ocean SDK. Visual Processing (Image & Video): NVIDIA CODEC SDK, NVIDIA NPP. Linear Algebra (Dense, Sparse, Matrix): NVIDIA cuBLAS, cuSPARSE, cuSOLVER. Math Algorithms (AMG, Templates, Solvers): AmgX, NVIDIA cuRAND. developer.nvidia.com/gpu-accelerated-libraries
44 OpenACC: Simple, Powerful, Portable. Fueling the Next Wave of Scientific Discoveries in HPC

    main()
    {
      <serial code>
      #pragma acc kernels   // automatically runs on GPU
      {
        <parallel code>
      }
    }

RIKEN Japan, NICAM (Climate Modeling): 7-8x Speed-Up, 5% of Code Modified. University of Illinois, PowerGrid (MRI Reconstruction): 70x Speed-Up, 2 Days of Effort. Developers using OpenACC
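The `<parallel code>` placeholder in the slide's skeleton can be made concrete with a small loop. The following is a minimal sketch, not code from the talk: the `saxpy` function and its data clauses are illustrative choices. With an OpenACC-capable compiler the loop may be offloaded to the GPU; a compiler without OpenACC support ignores the pragma and runs the same loop serially on the CPU.

```c
#include <stddef.h>

/* SAXPY: y[i] = a * x[i] + y[i] for i in [0, n).
 * The kernels directive asks an OpenACC compiler to generate GPU code
 * for the loop. copyin(x[0:n]) sends x to the device; copy(y[0:n])
 * sends y to the device and brings the result back afterwards. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the directive is only a hint layered on ordinary C, the same source file serves both CPU and GPU builds, which is the "single source" point the deck makes about OpenACC.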
45 LS-DALTON: Large-scale Application for Calculating High-accuracy Molecular Energies. Minimal Effort: lines of code modified: <100 lines; # of weeks required: 1 week; # of codes to maintain: 1 source. Big Performance: LS-DALTON CCSD(T) module benchmarked on Titan supercomputer (AMD CPU vs Tesla K20X). [Chart: speedup vs CPU, 0x to 12x, for Alanine-1 (13 atoms), Alanine-2 (23 atoms), Alanine-3 (33 atoms)] "OpenACC makes GPU computing approachable for domain scientists. Initial OpenACC implementation required only minor effort, and more importantly, no modifications of our existing CPU implementation." Janus Juul Eriksen, PhD Fellow, qLEAP Center for Theoretical Chemistry, Aarhus University
46 OPENACC DELIVERS TRUE PERFORMANCE PORTABILITY. Paving the Path Forward: Single Code for All HPC Processors. [Chart: application performance benchmark, speedup vs single CPU core, 0x to 35x, for 359.miniGhost (Mantevo), NEMO (climate & ocean), CloverLeaf (physics); series: CPU: MPI + OpenMP, CPU: MPI + OpenACC, CPU + GPU: MPI + OpenACC; bar labels as extracted: 30.3x, 11.9x, 7.6x, 7.1x, 7.1x, 4.1x, 4.3x, 5.2x, 5.3x] 359.miniGhost: CPU: Intel Xeon E v3, 2 sockets, 32 cores total; GPU: Tesla K80, single GPU. NEMO: each socket CPU: Intel Xeon E v3, 16 cores; GPU: NVIDIA K80, both GPUs. CloverLeaf: CPU: dual socket Intel Xeon CPU E v2, 20 cores total; GPU: Tesla K80, both GPUs
47 CUDA: Super Simplified Memory Management Code

CPU Code:

    void sortfile(FILE *fp, int N) {
      char *data;
      data = (char *)malloc(N);
      fread(data, 1, N, fp);
      qsort(data, N, 1, compare);
      use_data(data);
      free(data);
    }

CUDA 6 Code with Unified Memory:

    void sortfile(FILE *fp, int N) {
      char *data;
      cudaMallocManaged(&data, N);
      fread(data, 1, N, fp);
      qsort<<<...>>>(data, N, 1, compare);
      cudaDeviceSynchronize();
      use_data(data);
      cudaFree(data);
    }
48 GPU DEVELOPER ECO-SYSTEM. Numerical Packages: MATLAB, Mathematica, LabVIEW. Debuggers & Profilers: CUDA-GDB, NV Visual Profiler, NVIDIA Nsight Visual Studio, Allinea, TotalView. Languages & Directives: C, C++, Fortran, Java, Python, OpenACC, OpenMP. Cluster Tools: GPUDirect RDMA, Datacenter GPU Manager. Libraries: FFT, BLAS, SPARSE, LAPACK, NPP, Video, Imaging. Consultants & Training: ANEO, GPU Tech. OEM Solution Providers
49 DEVELOP ON GEFORCE, DEPLOY ON TESLA. GeForce: Designed for Developers & Gamers; Available Everywhere; developer.nvidia.com/cuda-gpus, developer.nvidia.com/devbox. Tesla: Designed for the Data Center; ECC; 24x7 Runtime; GPU Monitoring; Cluster Management; GPUDirect-RDMA; Hyper-Q for MPI; 3 Year Warranty; Integrated OEM Systems, Professional Support
51 GTC EUROPE, Sep 28-29, 2016, Amsterdam, #GTC16. EUROPE'S BRIGHTEST MINDS & BEST IDEAS: Deep Learning & Artificial Intelligence; Self-Driving Cars; Virtual Reality & Augmented Reality; Supercomputing & HPC. GTC Europe is a two-day conference designed to expose the innovative ways developers, businesses and academics are using parallel computing to transform our world. 2 Days, 800 Attendees, 50+ Exhibitors, 50+ Speakers, 15+ Tracks, 15+ Workshops, 1-to-1 Meetings
Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.
More informationCOMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)
COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University
More informationParallel Computing with MATLAB
Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best
More information5x in 5 hours Porting SEISMIC_CPML using the PGI Accelerator Model
5x in 5 hours Porting SEISMIC_CPML using the PGI Accelerator Model C99, C++, F2003 Compilers Optimizing Vectorizing Parallelizing Graphical parallel tools PGDBG debugger PGPROF profiler Intel, AMD, NVIDIA
More informationHIGH PERFORMANCE CONSULTING COURSE OFFERINGS
Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...
More informationPart I Courses Syllabus
Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment
More informationSOSCIP Platforms. SOSCIP Platforms
SOSCIP Platforms SOSCIP Platforms 1 SOSCIP HPC Platforms Blue Gene/Q Cloud Analytics Agile Large Memory System 2 SOSCIP Platforms Blue Gene/Q Platform 3 top500.org Rank Site System Cores Rmax (TFlop/s)
More informationJean-Pierre Panziera Teratec 2011
Technologies for the future HPC systems Jean-Pierre Panziera Teratec 2011 3 petaflop systems : TERA 100, CURIE & IFERC Tera100 Curie IFERC 1.25 PetaFlops 256 TB ory 30 PB disk storage 140 000+ Xeon cores
More informationScientific Computing Programming with Parallel Objects
Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore
More informationTurbomachinery CFD on many-core platforms experiences and strategies
Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29
More informationIntroduction to the CUDA Toolkit for Building Applications. Adam DeConinck HPC Systems Engineer, NVIDIA
Introduction to the CUDA Toolkit for Building Applications Adam DeConinck HPC Systems Engineer, NVIDIA ! What this talk will cover: The CUDA 5 Toolkit as a toolchain for HPC applications, focused on the
More informationALPS Supercomputing System A Scalable Supercomputer with Flexible Services
ALPS Supercomputing System A Scalable Supercomputer with Flexible Services 1 Abstract Supercomputing is moving from the realm of abstract to mainstream with more and more applications and research being
More informationOverview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
More informationGraphic Processing Units: a possible answer to High Performance Computing?
4th ABINIT Developer Workshop RESIDENCE L ESCANDILLE AUTRANS HPC & Graphic Processing Units: a possible answer to High Performance Computing? Luigi Genovese ESRF - Grenoble 26 March 2009 http://inac.cea.fr/l_sim/
More informationOverview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
More informationA GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS
A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833
More informationOverview of HPC systems and software available within
Overview of HPC systems and software available within Overview Available HPC Systems Ba Cy-Tera Available Visualization Facilities Software Environments HPC System at Bibliotheca Alexandrina SUN cluster
More informationDr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED
Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory 21 st Century Research Continuum Theory Theory embodied in computation Hypotheses tested through experiment SCIENTIFIC METHODS
More informationClusters with GPUs under Linux and Windows HPC
Clusters with GPUs under Linux and Windows HPC Massimiliano Fatica (NVIDIA), Calvin Clark (Microsoft) Hillsborough Room Oct 2 2009 Agenda Overview Requirements for GPU Computing Linux clusters Windows
More informationPRIMERGY server-based High Performance Computing solutions
PRIMERGY server-based High Performance Computing solutions PreSales - May 2010 - HPC Revenue OS & Processor Type Increasing standardization with shift in HPC to x86 with 70% in 2008.. HPC revenue by operating
More informationSeeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
More informationBuild GPU Cluster Hardware for Efficiently Accelerating CNN Training. YIN Jianxiong Nanyang Technological University jxyin@ntu.edu.
Build Cluster Hardware for Efficiently Accelerating CNN Training YIN Jianxiong Nanyang Technological University jxyin@ntu.edu.sg Visual Object Search Private Large-scale Visual Object Database Domain Specifi
More informationUsing the Windows Cluster
Using the Windows Cluster Christian Terboven terboven@rz.rwth aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o Windows Cluster
More informationGraphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
More informationPanasas: High Performance Storage for the Engineering Workflow
9. LS-DYNA Forum, Bamberg 2010 IT / Performance Panasas: High Performance Storage for the Engineering Workflow E. Jassaud, W. Szoecs Panasas / transtec AG 2010 Copyright by DYNAmore GmbH N - I - 9 High-Performance
More informationEnergy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez
Energy efficient computing on Embedded and Mobile devices Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez A brief look at the (outdated) Top500 list Most systems are built
More informationGPGPU accelerated Computational Fluid Dynamics
t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute
More informationMedical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.
Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.edu Outline Motivation why do we need GPUs? Past - how was GPU programming
More informationOverview of HPC Resources at Vanderbilt
Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources
More informationE6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices
E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,
More informationINTEL PARALLEL STUDIO XE EVALUATION GUIDE
Introduction This guide will illustrate how you use Intel Parallel Studio XE to find the hotspots (areas that are taking a lot of time) in your application and then recompiling those parts to improve overall
More information