NVIDIA Tesla K20-K20X GPU Accelerators Benchmarks Application Performance Technical Brief
NVIDIA changed the high performance computing (HPC) landscape by introducing its Fermi-based GPUs, which delivered an impressive leap in performance and energy efficiency while offering a parallel programming model, CUDA, as an extension to industry-standard languages like C, C++, and Fortran. Now NVIDIA has introduced the new Tesla K20-K20X GPU Accelerators that further advance the HPC industry. Built on the revolutionary Kepler architecture, these accelerators redefine the standard for energy-efficient computing and feature innovative technologies like SMX, Hyper-Q, and Dynamic Parallelism to boost application performance by up to 10x.

SMX: 3x More Performance Per Watt

The new SMX (Next Generation Streaming Multiprocessor) is an architectural innovation designed from the ground up to deliver highly efficient performance. With SMX at its core, Tesla K20/K20X accelerators deliver the industry's highest single and double precision performance, including 1.31 teraflops of double precision for the Tesla K20X, at an unprecedented 93% computational efficiency.

[Figure: Double precision (DGEMM) and single precision (SGEMM) performance in teraflops for Xeon E5-2687W, Tesla M2090 (Fermi), and Tesla K20X.]

Hyper-Q: Easy Speed-up for Legacy MPI Codes

Many legacy MPI codes don't generate enough work to fully occupy the GPU because they were written for CPU cores. Rather than requiring developers to refactor their codes to put more work into each MPI process, the Hyper-Q feature reduces the effort considerably: developers can now run up to 32 MPI processes with small and medium-sized workloads on a shared GPU.

[Figure: CP2K (quantum chemistry). Speedup versus number of GPUs; series: K20 with Hyper-Q, K20 without Hyper-Q; annotation: 2.5x.]

To illustrate the power of Hyper-Q, we picked CP2K, a popular MPI-based quantum chemistry code that is traditionally difficult to run well on GPUs. It shows more than 2x performance improvement when 16 MPI ranks share the GPU.
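The sharing arrangement described for Hyper-Q can be pictured with a small sketch. The Python below is not NVIDIA code and does not touch MPI or CUDA; it only models how node-local MPI ranks might be spread across the GPUs in a node. The helper name `assign_ranks_to_gpus` and the round-robin policy are assumptions made for illustration, and the limit of 32 is the figure quoted in the text.

```python
# Hypothetical illustration: the function name and the round-robin policy
# are assumptions, not part of any NVIDIA or MPI API.
MAX_PROCESSES_PER_GPU = 32  # limit quoted in the brief for a shared GPU


def assign_ranks_to_gpus(num_ranks, num_gpus):
    """Map node-local MPI ranks to GPU indices in round-robin order."""
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")
    assignment = {gpu: [] for gpu in range(num_gpus)}
    for rank in range(num_ranks):
        assignment[rank % num_gpus].append(rank)
    # Reject layouts that put more processes on one GPU than the quoted limit.
    busiest = max(len(ranks) for ranks in assignment.values())
    if busiest > MAX_PROCESSES_PER_GPU:
        raise ValueError("more than 32 processes would share one GPU")
    return assignment


# The CP2K case from the text: 16 MPI ranks sharing a single GPU.
shared = assign_ranks_to_gpus(num_ranks=16, num_gpus=1)
print(len(shared[0]), "ranks share GPU 0")
```

In a real job the GPU index would typically be derived from each process's node-local rank before any GPU work starts; how that rank is obtained depends on the MPI launcher and is left out here.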
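The 93% computational efficiency quoted in the SMX section can be turned into a throughput figure with one multiplication. The sketch below assumes that the percentage is the fraction of the 1.31 teraflops double precision peak that DGEMM sustains on the Tesla K20X; the brief does not state this definition explicitly, so treat it as an assumption.

```python
# Assumption: efficiency = sustained DGEMM throughput / peak double precision.
PEAK_DP_TFLOPS = 1.31   # Tesla K20X double precision figure from the text
EFFICIENCY = 0.93       # computational efficiency quoted in the text


def sustained_tflops(peak_tflops, efficiency):
    """Throughput implied by a peak rate and an efficiency fraction."""
    return peak_tflops * efficiency


print(f"{sustained_tflops(PEAK_DP_TFLOPS, EFFICIENCY):.2f} teraflops sustained")
```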
Dynamic Parallelism: Simplifying Parallel Programming

Dynamic Parallelism allows the GPU to operate more autonomously from the CPU by generating new work for itself at runtime, from inside a kernel. The concept is simple, but the impact is powerful: it can make programming easier, particularly for algorithms traditionally considered difficult for GPUs, such as divide-and-conquer problems.

[Figure: Quicksort. Relative sorting performance versus problem size (millions of elements); series: without Dynamic Parallelism, with Dynamic Parallelism; annotation: 2x.]

To showcase its potential, we used Dynamic Parallelism on Quicksort, a well-known sorting algorithm, cutting the lines of CUDA code in half while improving performance by 2x.

K20X Benchmarks by Application

Accelerating Key Scientific Applications by up to 10x

Today, hundreds of applications take advantage of GPU acceleration, spanning all scientific disciplines and engineering domains, and the number of applications continues to grow. In the past year alone, the number of CUDA-accelerated applications has grown by over 60%. When Tesla K20 GPU Accelerators are added to servers with Sandy Bridge CPUs, CUDA-enabled applications are typically accelerated by up to 10x. The K20X benchmark below shows single-node performance of leading applications in various science domains.

[Figure: Tesla K20X performance advantage over Sandy Bridge CPUs, by application.]

CPU results: dual-socket E5-2687W; GPU results: dual-socket E5-2687W + 2 Tesla K20X GPUs.
*MATLAB results compare one i7-2600K CPU with one Tesla K20 GPU.
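The Quicksort example in the Dynamic Parallelism section comes down to who issues the follow-up work. The sketch below is a CPU-only Python model, not CUDA and not the code NVIDIA benchmarked: it sorts the same data twice and counts how often the "host" has to launch work. In the host-driven version the host owns the list of pending segments; in the device-driven version each simulated kernel spawns its own children, which is the pattern Dynamic Parallelism permits. All function names are illustrative.

```python
def partition(data, lo, hi):
    """Lomuto partition around data[hi]; returns the pivot's final index."""
    pivot = data[hi]
    i = lo
    for j in range(lo, hi):
        if data[j] < pivot:
            data[i], data[j] = data[j], data[i]
            i += 1
    data[i], data[hi] = data[hi], data[i]
    return i


def sort_host_driven(data):
    """Host keeps the work list and issues one launch per segment."""
    host_launches = 0
    pending = [(0, len(data) - 1)]
    while pending:
        lo, hi = pending.pop()
        if lo >= hi:
            continue
        host_launches += 1  # every partition step is a separate host launch
        p = partition(data, lo, hi)
        pending.append((lo, p - 1))
        pending.append((p + 1, hi))
    return host_launches


def sort_device_driven(data):
    """Host issues one launch; each simulated kernel spawns its children."""
    child_launches = 0

    def kernel(lo, hi):
        nonlocal child_launches
        if lo >= hi:
            return
        p = partition(data, lo, hi)
        for sub_lo, sub_hi in ((lo, p - 1), (p + 1, hi)):
            if sub_lo < sub_hi:
                child_launches += 1  # launched from inside the "kernel"
                kernel(sub_lo, sub_hi)

    kernel(0, len(data) - 1)
    return 1, child_launches  # (host launches, device-side launches)


values = [5, 2, 9, 1, 7, 3, 8]
a, b = list(values), list(values)
print("host-driven launches:", sort_host_driven(a))
print("device-driven (host, child) launches:", sort_device_driven(b))
```

The total amount of partitioning is the same in both versions; what changes is that the host drops out of the loop after the first launch, which is the source of the simpler code the brief describes.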
Here's a closer look at three of the applications listed above.

Chroma: High Energy & Nuclear Physics

Chroma is used by scientists to test alternatives to the Standard Model of physics in order to develop a deeper understanding of the fundamental properties of matter and energy. The benchmark below shows a significant performance boost when one or two Tesla K20X GPU Accelerators are added to dual-socket CPUs.

[Figure: Chroma, x128 Lattice Volume, BiCGStab solver. Performance relative to 2x CPU for 2xCPU, 2xCPU+1xM2090, 2xCPU+2xM2090, 2xCPU+1xK20X, and 2xCPU+2xK20X; data labels as transcribed: 18.0x, 14.6x, 1.0x, 7.5x, 9.4x.]

CPU results: dual-socket E5-2687W; GPU results: dual-socket E5-2687W + 2 Tesla K20X GPUs, 64GB per node.

SPECFEM3D: Earth Sciences

Researchers use SPECFEM3D for a deeper understanding of the phenomena that shape seismic activity, allowing engineers to assess potential hazards and to build structural countermeasures. This code was a Gordon Bell winner in 2008, before the GPU work, so it was already highly tuned before the recent effort to accelerate it on GPUs.
[Figure: SPECFEM3D, 256x128 3D Spatial Discretization. Performance relative to 2x CPU for 2xCPU, 2xCPU+1xM2090, 2xCPU+2xM2090, 2xCPU+1xK20X, and 2xCPU+2xK20X; data labels as transcribed: 5.4x, 9.3x, 1.0x, 12.8x, plus one label garbled as "28.8x".]

CPU results: dual-socket E5-2687W; GPU results: dual-socket E5-2687W + 2 Tesla K20X GPUs, 64GB per node.

AMBER: Molecular Dynamics

Molecular dynamics (MD) allows the study of biological and chemical systems at the atomistic level on timescales from femtoseconds to milliseconds. Numerous software packages exist for conducting MD simulations, and AMBER is one of the most widely used. With Tesla K20X GPU Accelerators, AMBER runs up to 80% faster than with Tesla M2090 GPUs.

[Figure: AMBER, SPFP-JAC_production_NVE. Performance relative to 2x CPU for 2xCPU, 2xCPU+1xM2090, 2xCPU+2xM2090, 2xCPU+1xK20X, and 2xCPU+2xK20X; data labels as transcribed: 1.0x, 3.4x, 4.6x, 7.1x, 8.2x.]

CPU results: dual-socket E5-2687W; GPU results: dual-socket E5-2687W + 2 Tesla K20X GPUs, 64GB per node.
Large Cluster Scaling for GPU-Accelerated Applications

Large clusters around the world are increasingly deploying GPUs to accelerate their workloads. In the Top500 list, the number of GPU-accelerated systems grew by over 400% from June 2011 to June 2012. Whether an application is GPU-accelerated or not, designing code to scale well across many nodes has never been an easy task. However, many GPU-accelerated applications scale well as developers extract more parallelism by allocating more parallel work within a node and use MPI to distribute GPU workloads.

WL-LSMS: Material Science

WL-LSMS simulates the magnetic behavior of materials at the atomic and nanoscale in order to design lightweight, powerful magnetic components for highly efficient electric motors, generators, and magnetic storage devices. WL-LSMS was a Gordon Bell winner in 2009, so it was already a highly tuned CPU code before the recent work to accelerate it on GPUs.

[Figure: WL-LSMS, 32 Atom Ensemble of Iron (Fe). Performance relative to a single CPU-only node versus number of sockets; series: Cray XK7 with Tesla K20X, Cray XK7 CPU.]

QMCPACK: Material Science

QMCPACK is used by researchers to develop new insights into condensed matter physics, materials science, and chemistry by using more physically accurate methods. For large systems on the order of 200 electrons or more, simulations that traditionally took upwards of 12 hours can now run in about 3 hours each.
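Two of the figures in this section are easier to compare once converted to plain factors. The sketch below does that arithmetic in Python; the helper names are illustrative, and the QMCPACK hours are the rough "on the order of" values from the text, so the result is an estimate rather than a measured speedup.

```python
def speedup(baseline_hours, accelerated_hours):
    """Ratio of the original run time to the accelerated run time."""
    if accelerated_hours <= 0:
        raise ValueError("accelerated run time must be positive")
    return baseline_hours / accelerated_hours


def growth_to_factor(percent_growth):
    """Convert 'grew by N%' into a multiple of the starting value."""
    return 1.0 + percent_growth / 100.0


# QMCPACK: roughly 12 hours before, roughly 3 hours after.
print("QMCPACK speedup: about", speedup(12.0, 3.0), "x")

# Top500: growth of over 400% means more than 5 times as many systems.
print("Top500 GPU systems: more than", growth_to_factor(400.0), "x")
```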
[Figure: QMCPACK, 3x3x1 Graphite. Compute efficiency versus number of compute nodes; series: Cray XK7 with Tesla K20X, Cray XK7 CPU.]

NAMD: Molecular Dynamics

NAMD (Not (just) Another Molecular Dynamics program) is a free-of-charge molecular dynamics simulation package written using the Charm++ parallel programming model. It is noted for its parallel efficiency and is often used to simulate large systems (millions of atoms). NAMD also scales well across many GPU-accelerated nodes.

[Figure: NAMD, STMV. Performance in ns/day versus number of compute nodes; series: Cray XK7 with Tesla K20X, Cray XK7 CPU.]

For more information on Tesla K20/K20X GPU Accelerators, visit
Notice

ALL INFORMATION PROVIDED IN THIS TECHNICAL BRIEF, INCLUDING COMMENTARY, OPINION, NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Copyright 2012 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, Tesla, Kepler, and CUDA are trademarks and/or registered trademarks of NVIDIA Corporation. All company and product names are trademarks or registered trademarks of the respective owners with which they are associated. Features, pricing, availability, and specifications are all subject to change without notice. NOV
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance
More informationThree Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture
White Paper Intel Xeon processor E5 v3 family Intel Xeon Phi coprocessor family Digital Design and Engineering Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture Executive
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationNVIDIA VIDEO ENCODER 5.0
NVIDIA VIDEO ENCODER 5.0 NVENC_DA-06209-001_v06 November 2014 Application Note NVENC - NVIDIA Hardware Video Encoder 5.0 NVENC_DA-06209-001_v06 i DOCUMENT CHANGE HISTORY NVENC_DA-06209-001_v06 Version
More informationAccelerating CFD using OpenFOAM with GPUs
Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide
More informationHigh Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
More informationOpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,
More informationNVIDIA GeForce GTX 750 Ti
Whitepaper NVIDIA GeForce GTX 750 Ti Featuring First-Generation Maxwell GPU Technology, Designed for Extreme Performance per Watt V1.1 Table of Contents Table of Contents... 1 Introduction... 3 The Soul
More informationHigh Performance Computing and Big Data: The coming wave.
High Performance Computing and Big Data: The coming wave. 1 In science and engineering, in order to compete, you must compute Today, the toughest challenges, and greatest opportunities, require computation
More informationProgramming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
More informationSummit and Sierra Supercomputers:
Whitepaper Summit and Sierra Supercomputers: An Inside Look at the U.S. Department of Energy s New Pre-Exascale Systems November 2014 1 Contents New Flagship Supercomputers in U.S. to Pave Path to Exascale
More informationNVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS
NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS DU-05349-001_v6.0 February 2014 Installation and Verification on TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2.
More informationGPU Hardware and Programming Models. Jeremy Appleyard, September 2015
GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once
More informationL20: GPU Architecture and Models
L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.
More informationwww.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING
www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging
More informationNVIDIA Jetson TK1 Development Kit
Technical Brief NVIDIA Jetson TK1 Development Kit Bringing GPU-accelerated computing to Embedded Systems P a g e 2 V1.0 P a g e 3 Table of Contents... 1 Introduction... 4 NVIDIA Tegra K1 A New Era in Mobile
More informationSimulation Platform Overview
Simulation Platform Overview Build, compute, and analyze simulations on demand www.rescale.com CASE STUDIES Companies in the aerospace and automotive industries use Rescale to run faster simulations Aerospace
More informationIntel Solid-State Drives Increase Productivity of Product Design and Simulation
WHITE PAPER Intel Solid-State Drives Increase Productivity of Product Design and Simulation Intel Solid-State Drives Increase Productivity of Product Design and Simulation A study of how Intel Solid-State
More informationOverview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
More informationBinomial option pricing model. Victor Podlozhnyuk vpodlozhnyuk@nvidia.com
Binomial option pricing model Victor Podlozhnyuk vpodlozhnyuk@nvidia.com Document Change History Version Date Responsible Reason for Change 0.9 2007/03/19 vpodlozhnyuk Initial release 1.0 2007/04/05 Mharris
More informationHP ProLiant SL270s Gen8 Server. Evaluation Report
HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich schoenemeyer@cscs.ch
More informationNVIDIA Tesla Compute Cluster Driver for Windows
NVIDIA Tesla Compute Cluster Driver for Windows v197.03 March 2010 Release Notes 01 NVIDIA TESLA COMPUTE CLUSTER DRIVER FOR WINDOWS This edition of Release 197 Notes describes the Release 197 Tesla Compute
More informationThe Fastest, Most Efficient HPC Architecture Ever Built
Whitepaper NVIDIA s Next Generation TM CUDA Compute Architecture: TM Kepler GK110 The Fastest, Most Efficient HPC Architecture Ever Built V1.0 Table of Contents Kepler GK110 The Next Generation GPU Computing
More informationNVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v6.5 August 2014 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About
More informationCase Study on Productivity and Performance of GPGPUs
Case Study on Productivity and Performance of GPGPUs Sandra Wienke wienke@rz.rwth-aachen.de ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia
More informationGPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
More informationAccelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing
Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools
More informationPLGrid Infrastructure Solutions For Computational Chemistry
PLGrid Infrastructure Solutions For Computational Chemistry Mariola Czuchry, Klemens Noga, Mariusz Sterzel ACC Cyfronet AGH 2 nd Polish- Taiwanese Conference From Molecular Modeling to Nano- and Biotechnology,
More informationIntroduction to GPU Programming Languages
CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure
More informationIntroduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1
Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?
More informationTESLA C2050/2070 COMPUTING PROCESSOR INSTALLATION GUIDE
TESLA C2050/2070 COMPUTING PROCESSOR INSTALLATION GUIDE TESLA C2050 INSTALLATION GUIDE NVIDIA Tesla C2050/2070 TABLE OF CONTENTS TABLE OF CONTENTS Introduction 1 About This Guide 1 Minimum System Requirements
More informationNVIDIA GeForce Experience
NVIDIA GeForce Experience DU-05620-001_v02 October 9, 2012 User Guide TABLE OF CONTENTS 1 NVIDIA GeForce Experience User Guide... 1 About GeForce Experience... 1 Installing and Setting Up GeForce Experience...
More informationTechnical Brief. DualNet with Teaming Advanced Networking. October 2006 TB-02499-001_v02
Technical Brief DualNet with Teaming Advanced Networking October 2006 TB-02499-001_v02 Table of Contents DualNet with Teaming...3 What Is DualNet?...3 Teaming...5 TCP/IP Acceleration...7 Home Gateway...9
More informationOptimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server
Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Technology brief Introduction... 2 GPU-based computing... 2 ProLiant SL390s GPU-enabled architecture... 2 Optimizing
More informationNVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist
NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get
More informationOverview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
More informationNVIDIA GeForce GTX 580 GPU Datasheet
NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines
More informationNVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v5.5 July 2013 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About
More informationGPUs for Scientific Computing
GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research
More informationWhitepaper. NVIDIA Miracast Wireless Display Architecture
Whitepaper NVIDIA Miracast Wireless Display Architecture 1 Table of Content Miracast Wireless Display Background... 3 NVIDIA Miracast Architecture... 4 Benefits of NVIDIA Miracast Architecture... 5 Summary...
More informationParallel Computing with MATLAB
Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best
More informationThe Value of High-Performance Computing for Simulation
White Paper The Value of High-Performance Computing for Simulation High-performance computing (HPC) is an enormous part of the present and future of engineering simulation. HPC allows best-in-class companies
More informationTESLA K20X GPU ACCELERATOR
TESLA K20 GPU ACCELERATOR BD-06397-001_v05 November 2012 Board Specification DOCUMENT CHANGE HISTORY BD-06397-001_v05 Version Date Authors Description of Change 01 July 31, 2012 GG, SM Preliminary Information
More informationRobust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code
Robust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code F. Rossi, S. Sinigardi, P. Londrillo & G. Turchetti University of Bologna & INFN GPU2014, Rome, Sept 17th
More informationTESLA M2075 DUAL-SLOT COMPUTING PROCESSOR MODULE
TESLA M2075 DUAL-SLOT COMPUTING PROCESSOR MODULE BD-05837-001_v01 May 2011 Board Specification DOCUMENT CHANGE HISTORY BD-05837-001_v01 Version Date Authors Description of Change 01 May 13, 2011 DS, SM
More informationTESLA M2050 AND TESLA M2070 DUAL-SLOT COMPUTING PROCESSOR MODULES
TESLA M2050 AND TESLA M2070 DUAL-SLOT COMPUTING PROCESSOR MODULES BD-05238-001_v02 April 2010 Board Specification DOCUMENT CHANGE HISTORY BD-05238-001_v02 Version Date Authors Description of Change 01
More informationQUADRO POWER GUIDELINES
QUADRO POWER GUIDELINES DA-07261-001_v03 July 2015 Application Note DOCUMENT CHANGE HISTORY DA-07261-001_v03 Version Date Authors Description of Change 01 June 6, 2014 VL, SM Initial Release 02 June 2,
More informationTechnical Brief. MediaShield Storage Technology: Confidently Storing Your Digital Assets. March 2007 TB-01760-001_v03
Technical Brief MediaShield Storage Technology: Confidently Storing Your Digital Assets March 2007 TB-01760-001_v03 Table of Contents Overview...1 Industry-Defined Multidish Techniques...3 RAID 0: Disk
More informationOpenCL Programming for the CUDA Architecture. Version 2.3
OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
More informationDUAL MONITOR DRIVER AND VBIOS UPDATE
DUAL MONITOR DRIVER AND VBIOS UPDATE RN-07046-001_v01 September 2013 Release Notes DOCUMENT CHANGE HISTORY RN-07046-001_v01 Version Date Authors Description of Change 01 September 30, 2013 MD, SM Initial
More informationST810 Advanced Computing
ST810 Advanced Computing Lecture 17: Parallel computing part I Eric B. Laber Hua Zhou Department of Statistics North Carolina State University Mar 13, 2013 Outline computing Hardware computing overview
More informationINTEL PARALLEL STUDIO XE EVALUATION GUIDE
Introduction This guide will illustrate how you use Intel Parallel Studio XE to find the hotspots (areas that are taking a lot of time) in your application and then recompiling those parts to improve overall
More informationPower Monitoring At NCSA ISL and Blue Waters
Power Monitoring At NCSA ISL and Blue Waters Salishan 2013 Conference Craig P Steffen Blue Waters Science and Engineering Applications Suppport Group csteffen@ncsa.illinois.edu NCSA: Measuring Power Use
More informationNVIDIA GRID DASSAULT CATIA V5/V6 SCALABILITY GUIDE. NVIDIA Performance Engineering Labs PerfEngDoc-SG-DSC01v1 March 2016
NVIDIA GRID DASSAULT V5/V6 SCALABILITY GUIDE NVIDIA Performance Engineering Labs PerfEngDoc-SG-DSC01v1 March 2016 HOW MANY USERS CAN I GET ON A SERVER? The purpose of this guide is to give a detailed analysis
More informationNVIDIA Tesla. GPU Computing Technical Brief. Version 1.0.0 5/24/07
NVIDIA Tesla GPU Computing Technical Brief Version 1.0.0 5/24/07 ii NVIDIA Tesla: GPU Compute Tech Brief, Version 1.0.0 Table of Contents Chapter 1. High-Performance Computing on the GPU... 1 1.1 High-Performance
More informationHigh Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates
High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
More informationTowards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com
Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Please Note IBM s statements regarding its plans, directions, and intent are subject to
More informationMonte-Carlo Option Pricing. Victor Podlozhnyuk vpodlozhnyuk@nvidia.com
Monte-Carlo Option Pricing Victor Podlozhnyuk vpodlozhnyuk@nvidia.com Document Change History Version Date Responsible Reason for Change 1. 3//7 vpodlozhnyuk Initial release Abstract The pricing of options
More informationThe PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver
1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution
More informationGPU Acceleration of the SENSEI CFD Code Suite
GPU Acceleration of the SENSEI CFD Code Suite Chris Roy, Brent Pickering, Chip Jackson, Joe Derlaga, Xiao Xu Aerospace and Ocean Engineering Primary Collaborators: Tom Scogland, Wu Feng (Computer Science)
More informationKeys to node-level performance analysis and threading in HPC applications
Keys to node-level performance analysis and threading in HPC applications Thomas GUILLET (Intel; Exascale Computing Research) IFERC seminar, 18 March 2015 Legal Disclaimer & Optimization Notice INFORMATION
More informationIntel Platform and Big Data: Making big data work for you.
Intel Platform and Big Data: Making big data work for you. 1 From data comes insight New technologies are enabling enterprises to transform opportunity into reality by turning big data into actionable
More informationIntroduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model
Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA Part 1: Hardware design and programming model Amin Safi Faculty of Mathematics, TU dortmund January 22, 2016 Table of Contents Set
More informationParallel Prefix Sum (Scan) with CUDA. Mark Harris mharris@nvidia.com
Parallel Prefix Sum (Scan) with CUDA Mark Harris mharris@nvidia.com April 2007 Document Change History Version Date Responsible Reason for Change February 14, 2007 Mark Harris Initial release April 2007
More informationA GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS
A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833
More informationIntelligent Business Operations
White Paper Intel Xeon Processor E5 Family Data Center Efficiency Financial Services Intelligent Business Operations Best Practices in Cash Supply Chain Management Executive Summary The purpose of any
More informationThe Hartree Centre helps businesses unlock the potential of HPC
The Hartree Centre helps businesses unlock the potential of HPC Fostering collaboration and innovation across UK industry with help from IBM Overview The need The Hartree Centre needs leading-edge computing
More informationData Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
More informationApplication Note. nforce 220/420 Platform Installing nforce Core Drivers Under Windows 98
Application Note nforce 220/420 Platform Installing nforce Core Drivers Under Windows 98 Installing Core Drivers Under Windows 98 This application note provides instructions to install the NVIDIA nforce
More informationTechnical Brief. NVIDIA nview Display Management Software. May 2009 TB-03966-001_v02
Technical Brief NVIDIA nview Display Management Software May 2009 TB-03966-001_v02 nview Display Management Software Introduction NVIDIA nview Display Management Software delivers maximum flexibility
More informationTESLA K20 GPU ACCELERATOR
TESLA K20 GPU ACCELERATOR BD-06455-001_v09 October 2014 Board Specification DOCUMENT CHANGE HISTORY BD-06455-001_v09 Version Date Authors Description of Change 01 August 1, 2012 GG, SM Preliminary Information
More informationE6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices
E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,
More informationParallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
More informationGPU Accelerated Signal Processing in OpenStack. John Paul Walters. Computer Scien5st, USC Informa5on Sciences Ins5tute jwalters@isi.
GPU Accelerated Signal Processing in OpenStack John Paul Walters Computer Scien5st, USC Informa5on Sciences Ins5tute jwalters@isi.edu Outline Motivation OpenStack Background Heterogeneous OpenStack GPU
More informationTHE AMD MISSION 2 AN INTRODUCTION TO AMD NOVEMBER 2014
THE AMD MISSION To be the leading designer and integrator of innovative, tailored technology solutions that empower people to push the boundaries of what is possible 2 AN INTRODUCTION TO AMD NOVEMBER 2014
More informationSGI HPC Systems Help Fuel Manufacturing Rebirth
SGI HPC Systems Help Fuel Manufacturing Rebirth Created by T A B L E O F C O N T E N T S 1.0 Introduction 1 2.0 Ongoing Challenges 1 3.0 Meeting the Challenge 2 4.0 SGI Solution Environment and CAE Applications
More informationNVIDIA CUDA INSTALLATION GUIDE FOR MICROSOFT WINDOWS
NVIDIA CUDA INSTALLATION GUIDE FOR MICROSOFT WINDOWS DU-05349-001_v7.5 September 2015 Installation and Verification on Windows TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements...
More informationThe Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.
White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,
On-Demand Supercomputing Multiplies the Possibilities
Microsoft Windows Compute Cluster Server 2003 Partner Solution Brief Image courtesy of Wolfram Research, Inc. On-Demand Supercomputing Multiplies the Possibilities Microsoft Windows Compute Cluster Server
A quick tutorial on Intel's Xeon Phi Coprocessor
A quick tutorial on Intel's Xeon Phi Coprocessor www.cism.ucl.ac.be damien.francois@uclouvain.be Architecture Setup Programming The beginning of wisdom is the definition of terms. * Name Is a... As opposed
NVIDIA GRID K1 GRAPHICS BOARD
GRAPHICS BOARD BD-06633-001_v02 January 2013 Board Specification DOCUMENT CHANGE HISTORY BD-06633-001_v02 Version Date Authors Description of Change 01 November 27, 2012 AP, SM Preliminary Information
IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform:
Creating an Integrated, Optimized, and Secure Enterprise Data Platform: IBM PureData System for Transactions with SafeNet's ProtectDB and DataSecure Table of contents 1. Data, Data, Everywhere... 3 2.
Turbomachinery CFD on many-core platforms experiences and strategies
Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29
HIGH PERFORMANCE CONSULTING COURSE OFFERINGS
Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...
TESLA K10 GPU ACCELERATOR
TESLA K10 GPU ACCELERATOR BD-06280-001_v06 September 2012 Board Specification DOCUMENT CHANGE HISTORY BD-06280-001_v06 Version Date Authors Description of Change 01 April 10, 2012 GG, SM Preliminary Information
Multicore Parallel Computing with OpenMP
Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large
Trends in High-Performance Computing for Power Grid Applications
Trends in High-Performance Computing for Power Grid Applications Franz Franchetti ECE, Carnegie Mellon University www.spiral.net Co-Founder, SpiralGen www.spiralgen.com This talk presents my personal views
CUDA in the Cloud Enabling HPC Workloads in OpenStack With special thanks to Andrew Younge (Indiana Univ.) and Massimo Bernaschi (IAC-CNR)
CUDA in the Cloud Enabling HPC Workloads in OpenStack John Paul Walters Computer Scientist, USC Information Sciences Institute jwalters@isi.edu With special thanks to Andrew Younge (Indiana Univ.) and Massimo
REMOTE VISUALIZATION ON SERVER-CLASS TESLA GPUS
REMOTE VISUALIZATION ON SERVER-CLASS TESLA GPUS WP-07313-001_v01 June 2014 White Paper TABLE OF CONTENTS Introduction... 4 Challenges in Remote and In-Situ Visualization... 5 GPU-Accelerated Remote Visualization
Deep Learning GPU-Based Hardware Platform
Deep Learning GPU-Based Hardware Platform Hardware and Software Criteria and Selection Mourad Bouache Yahoo! Performance Engineering Group Sunnyvale, CA +1.408.784.1446 bouache@yahoo-inc.com John Glover
Recent Advances in HPC for Structural Mechanics Simulations
Recent Advances in HPC for Structural Mechanics Simulations 1 Trends in Engineering Driving Demand for HPC Increase product performance and integrity in less time Consider more design variants Find the
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa-lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:
NVIDIA GRID K2 GRAPHICS BOARD
GRAPHICS BOARD BD-06580-001_v02 January 2013 Board Specification DOCUMENT CHANGE HISTORY BD-06580-001_v02 Version Date Authors Description of Change 01 October 16, 2012 AP, SM Initial Release 02 January
Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer
Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
Data Centric Interactive Visualization of Very Large Data
Data Centric Interactive Visualization of Very Large Data Bruce D'Amora, Senior Technical Staff Gordon Fossum, Advisory Engineer IBM T.J. Watson Research/Data Centric Systems #OpenPOWERSummit Data Centric
Parallel Computing. Introduction
Parallel Computing Introduction Thorsten Grahs, 14. April 2014 Administration Lecturer Dr. Thorsten Grahs (that's me) t.grahs@tu-bs.de Institute of Scientific Computing Room RZ 120 Lecture Monday 11:30-13:00
Revolutionising High Performance Computing
Revolutionising High Performance Computing with Supermicro Solutions Using NVIDIA Tesla. NVIDIA Tesla GPUs are revolutionizing computing. The high performance computing (HPC) industry's need for computation