GPU System Architecture. Alan Gray EPCC The University of Edinburgh
|
|
- Marcia Hardy
- 8 years ago
- Views:
Transcription
1 GPU System Architecture EPCC The University of Edinburgh
2 Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems From desktop to supercomputer Accelerator Technology NVIDIA AMD Intel Example GPU machines
3 Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems From desktop to supercomputer Accelerator Technology NVIDIA AMD Intel Example GPU machines
4 The power used by a CPU core is proportional to Clock Frequency x Voltage 2 In the past, computers got faster by increasing the frequency Voltage was decreased to keep power reasonable. Now, voltage cannot be decreased any further 1s and 0s in a system are represented by different voltages Reducing overall voltage further would reduce this difference to a point where 0s and 1s cannot be properly distinguished
5 Instead, performance increases can be achieved through exploiting parallelism Need a chip which can perform many parallel operations every clock cycle Many cores and/or many operations per core Want to keep power/core as low as possible Much of the power expended by CPU cores is on functionality not generally that useful for HPC Branch prediction, out-of-order execution etc
6 So, for HPC, we want chips with simple, low power, number-crunching cores But we need our machine to do other things as well as the number crunching Run an operating system, perform I/O, set up calculation etc Solution: Hybrid system containing both CPU and accelerator chips
7 It costs a huge amount of money to design and fabricate new chips Not feasible for relatively small HPC market Luckily, over the last few years, Graphics Processing Units (GPUs) have evolved for the highly lucrative gaming market And largely possess the right characteristics for HPC Many number-crunching cores GPU vendors have tailored existing GPU architectures to the HPC market
8 GPUs in HPC November 2011 Top500 list: GPUs (next generation will have GPUs) GPUs GPUs GPUs now firmly established in HPC industry
9 Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems From desktop to supercomputer Accelerator Technology NVIDIA AMD Intel Example GPU machines
10 AMD 12-core CPU = compute unit (= core) Not much space on CPU is dedicated to compute
11 NVIDIA Fermi GPU = compute unit (= SM = 32 CUDA cores) GPU dedicates much more space to compute At expense of caches, controllers, sophistication etc
12 Memory For many applications, performance is very sensitive to memory bandwidth CPUs use DRAM GPUs use Graphics DRAM Graphics memory: much higher bandwidth than standard CPU memory
13 GPU vs CPU: Theoretical Peak capabilities NVIDIA Fermi AMD Magny-Cours (6172) Cores 448 (1.15GHz) 12 (2.1GHz) Operations/cycle 1 4 DP Performance (peak) 515 GFlops 101 GFlops Memory Bandwidth (peak) 144 GB/s 27.5 GB/s For these particular products, GPU theoretical advantage is ~5x for both compute and main memory bandwidth Application performance very much depends on application Typically a small fraction of peak Depends how well application is suited to/tuned for architecture
14 Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems From desktop to supercomputer Accelerator Technology NVIDIA AMD Intel Example GPU machines
15 GPUs cannot be used instead of CPUs They must be used together GPUs act as accelerators Responsible for the computationally expensive parts of the code DRAM GDRAM CPU GPU I/O PCIe I/O
16 Scaling to larger systems I/O PCIe I/O Interconnect CPU DRAM GPU + GDRAM Interconnect allows multiple nodes to be connected CPU I/O PCIe GPU + GDRAM I/O Can have multiple CPUs and GPUs within each workstation or shared memory node E.g. 2 CPUs +2 GPUs (above) CPUs share memory, but GPUs do not
17 GPU Accelerated Supercomputer GPU/CPU Node GPU/CPU Node GPU/CPU Node GPU/CPU Node GPU/CPU Node GPU/CPU Node GPU/CPU Node GPU/CPU Node GPU/CPU Node
18 To utilise a GPU, programs must Contain parts targeted at host CPU (most lines of source code) Contain parts targeted at GPU (key computational kernels) Manage data transfers between distinct CPU and GPU memory spaces Traditional language (e.g C/Fortran) does not provide these facilities To run on multiple GPUs in parallel Normally use one host CPU core (thread) per GPU Program manages communication between host CPUs in the same fashion as traditional parallel programs e.g. MPI and/or OpenMP (latter shared memory node only)
19 Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems From desktop to supercomputer Accelerator Technology NVIDIA AMD Intel Example GPU machines
20 Latest Technology NVIDIA Tesla HPC specific GPUs have evolved from GeForce series Intel AMD Knights Corner many-core x86 chip is like hybrid between a GPU and many-core CPU FireStream HPC specific GPUs have evolved from (ATI) Radeon series
21 NVIDIA Tesla Series GPU Chip partitioned into Streaming Multiprocessors (SMs) Multiple cores per SM Sophisticated thread engine Shared L2 cache
22 NVIDIA GPU SM Less scheduling units than cores Threads are scheduled in groups of 32, called a warp Threads within a warp always execute the same instruction in lock-step (on different data elements) Configurable L1 Cache/ Shared Memory
23 NVIDIA Tesla Series Tesla 1060 Fermi 2050 Fermi 2070 Fermi 2090 Kepler K20 (not yet released) CUDA cores SP Performance 933 GFlops 1030 GFlops 1030 GFlops 1330 GFlops DP Performance 78 GFlops 515 GFlops 515 GFlops 665 GFlops?? Memory Bandwidth 102 GB/s 144 GB/s 144 GB/s 178 GB/s?? Memory 4 GB 3 GB 6 GB 6 GB?? Variety of packaging solutions??
24 NVIDIA Roadmap
25 Programming NVIDIA Tesla CUDA C: proprietary interface to the architecture. Extensions to the C language which allow interfacing to the hardware Now relatively mature, and supported by comprehensive documentation, user forums, etc Fortran support provided by third party PGI Cuda Fortran compiler OpenCL Cross platform API Similar in design to CUDA, but lower level and not so mature Directives based approach OpenMP style directives - abstract complexities away from programmer Several products with varying syntax etc. Still lack maturity OpenMP committee currently considering adopting accelerators as part of OpenMP standard With participation from all main vendors plus others such as EPCC
26 AMD FirePro AMD acquired ATI in 2006 AMD FirePro series: derivative of Radeon chips with HPC enhancements FirePro W9000 Released Aug 2012 Peak 1 TFLOP (double precision) Peak 4 TFLOPS (single precision) 6GB GDDR5 SDRAM Now with full ECC support Programming: OpenCL Packaging: just cards for workstations at the moment. Server solutions may follow.
27 Intel Xeon Phi Intel Larrabee: A Many-Core x86 Architecture for Visual Computing Release delayed such that the chip missed competitive window of opportunity. Larrabee was not released as a competitive product, but instead a platform for research and development (Knight s Ferry). Knights Corner derivative chip Many Integrated Cores (MIC) architecture. No longer aimed at graphics market Instead Accelerating Science and Discovery more than 50 cores Release expected in 2012 Now branded as Intel Xeon Phi Hybrid between GPU and many-core CPU
28 Intel Xeon Phi CPU-like x86 cores: same instruction set as standard CPUs Relatively small number cores/chip Fully cache coherent In principle could support an OS GPU-like Simple cores, lack sophistication e.g. no out-oforder execution Each core contains 512-bit vector processing unit (16 SP or 8 DP numbers) Supports multithreading Not expected to run OS, but used as accelerator (at least initially) Programming: Possible to use standard models such as OpenMP but still need additional directives to manage data
29 Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems From desktop to supercomputer Accelerator Technology NVIDIA AMD Intel Example GPU machines
30 DIY GPU Workstation Just need to slot GPU card into PCI-e Need to make sure there is enough space and power in workstation
31 GPU Servers Several vendors offer GPU Servers Example Configuration: 4 GPUs plus 2 (multi-core) CPUs Multiple servers can be connected via interconnect
32 EPCC HECToR GPU Machine EPCC own a GPU System which is partly available to UK researchers As part of the HECToR UK National Service Comprises 3 GPU servers, each with 4 NVIDIA Fermi GPUs Connected via infiniband interconnect Plus another node with single AMD Firestream GPU Plus a front end with a single NVIDIA Fermi More details at HECToRGPU.php
33 Cray XK6 The Cray XK6 supercomputer is a trifecta of scalar, network and many-core innovation. It combines Cray s proven Gemini interconnect, AMD s leading multi-core scalar processors and NVIDIA s powerful manycore GPU processors to create a true, productive hybrid supercomputer. Each compute node contains 1 CPU + 1 GPU Can scale up to 500,000 nodes
34 Cray XK6 Compute Node XK6 Compute Node Characteristics AMD Series 6200 (Interlagos) NVIDIA Tesla X2090 PCIe Gen2 Host Memory 16 or 32GB 1600 MHz DDR3 NVIDIA Tesla X2090 Memory 6GB GDDR5 capacity Gemini High Speed Interconnect Upgradeable to future GPUs H T 3 H T 3 Z Y X
35 Cray XK6 Compute Blade Compute Blade: 4 Compute Nodes 4 CPUs (middle) + 4 GPUs (right) + 2 interconnect chips (left) (2 compute nodes share a single interconnect chip)
36 Summary GPUs have higher compute and memory bandwidth capabilities than CPUs GPUs are not used alone, but work in tandem with CPUs Leading to complications with task decomposition and memory management GPU accelerated systems scale from simple workstations to largescale supercomputers Traditional languages are not enough to program such systems Current/imminent accelerator products from NVIDIA,AMD and Intel NVIDIA are market leaders at the moment, with the proprietary CUDA the most common programming method AMD largely rely on the cross-platform OpenCL Intel will enter the market later this year with a hybrid CPU/GPU product
Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1
Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?
More informationParallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
More informationGraphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
More informationOverview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
More informationHETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance
More informationGPUs for Scientific Computing
GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research
More informationA quick tutorial on Intel's Xeon Phi Coprocessor
A quick tutorial on Intel's Xeon Phi Coprocessor www.cism.ucl.ac.be damien.francois@uclouvain.be Architecture Setup Programming The beginning of wisdom is the definition of terms. * Name Is a... As opposed
More informationCase Study on Productivity and Performance of GPGPUs
Case Study on Productivity and Performance of GPGPUs Sandra Wienke wienke@rz.rwth-aachen.de ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia
More informationNext Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
More informationOverview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
More informationIntroducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child
Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.
More informationTrends in High-Performance Computing for Power Grid Applications
Trends in High-Performance Computing for Power Grid Applications Franz Franchetti ECE, Carnegie Mellon University www.spiral.net Co-Founder, SpiralGen www.spiralgen.com This talk presents my personal views
More informationOpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,
More informationST810 Advanced Computing
ST810 Advanced Computing Lecture 17: Parallel computing part I Eric B. Laber Hua Zhou Department of Statistics North Carolina State University Mar 13, 2013 Outline computing Hardware computing overview
More informationAccelerating CFD using OpenFOAM with GPUs
Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
More informationPCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)
PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters from One Stop Systems (OSS) PCIe Over Cable PCIe provides greater performance 8 7 6 5 GBytes/s 4
More informationIntroduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it
t.diamanti@cineca.it Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate
More informationPerformance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
More informationIntroduction to GPU Programming Languages
CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure
More informationLecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide
More informationProgramming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
More informationAccelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing
Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools
More informationLBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:
More informationTurbomachinery CFD on many-core platforms experiences and strategies
Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29
More informationScalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age
Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics
More informationGPU Parallel Computing Architecture and CUDA Programming Model
GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel
More informationLecture 1. Course Introduction
Lecture 1 Course Introduction Welcome to CSE 262! Your instructor is Scott B. Baden Office hours (week 1) Tues/Thurs 3.30 to 4.30 Room 3244 EBU3B 2010 Scott B. Baden / CSE 262 /Spring 2011 2 Content Our
More informationOptimizing a 3D-FWT code in a cluster of CPUs+GPUs
Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Gregorio Bernabé Javier Cuenca Domingo Giménez Universidad de Murcia Scientific Computing and Parallel Programming Group XXIX Simposium Nacional de la
More informationHigh Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
More informationExperiences on using GPU accelerators for data analysis in ROOT/RooFit
Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,
More informationA GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS
A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833
More informationHPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
More informationComputer Graphics Hardware An Overview
Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and
More informationCUDA programming on NVIDIA GPUs
p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view
More informationOverview of HPC Resources at Vanderbilt
Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources
More informationParallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
More informationEnergy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez
Energy efficient computing on Embedded and Mobile devices Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez A brief look at the (outdated) Top500 list Most systems are built
More informationGPU Hardware and Programming Models. Jeremy Appleyard, September 2015
GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationIntroduction to GPU Computing
Matthis Hauschild Universität Hamburg Fakultät für Mathematik, Informatik und Naturwissenschaften Technische Aspekte Multimodaler Systeme December 4, 2014 M. Hauschild - 1 Table of Contents 1. Architecture
More informationThe High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices
WS on Models, Algorithms and Methodologies for Hierarchical Parallelism in new HPC Systems The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices
More informationKalray MPPA Massively Parallel Processing Array
Kalray MPPA Massively Parallel Processing Array Next-Generation Accelerated Computing February 2015 2015 Kalray, Inc. All Rights Reserved February 2015 1 Accelerated Computing 2015 Kalray, Inc. All Rights
More informationNVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist
NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get
More informationwww.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING
www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging
More informationEnabling Technologies for Distributed Computing
Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies
More informationHigh Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates
High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of
More informationLecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com
CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Modern GPU
More information~ Greetings from WSU CAPPLab ~
~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)
More informationGPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2
GPGPU for Real-Time Data Analytics: Introduction Bingsheng He 1, Huynh Phung Huynh 2, Rick Siow Mong Goh 2 1 Nanyang Technological University, Singapore 2 A*STAR Institute of High Performance Computing,
More informationRWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky
RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke wienke@rz.rwth-aachen.de November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative
More informationHP Workstations graphics card options
Family data sheet HP Workstations graphics card options Quick reference guide Leading-edge professional graphics February 2013 A full range of graphics cards to meet your performance needs compare features
More informationOptimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server
Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Technology brief Introduction... 2 GPU-based computing... 2 ProLiant SL390s GPU-enabled architecture... 2 Optimizing
More informationHow To Build An Ark Processor With An Nvidia Gpu And An African Processor
Project Denver Processor to Usher in a New Era of Computing Bill Dally January 5, 2011 http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/ Project Denver Announced
More informationUsing the Intel Xeon Phi (with the Stampede Supercomputer) ISC 13 Tutorial
Using the Intel Xeon Phi (with the Stampede Supercomputer) ISC 13 Tutorial Bill Barth, Kent Milfeld, Dan Stanzione Tommy Minyard Texas Advanced Computing Center Jim Jeffers, Intel June 2013, Leipzig, Germany
More informationEnabling Technologies for Distributed and Cloud Computing
Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading
More informationMulti-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
More informationExascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation
Exascale Challenges and General Purpose Processors Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Jun-93 Aug-94 Oct-95 Dec-96 Feb-98 Apr-99 Jun-00 Aug-01 Oct-02 Dec-03
More informationThe Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Qingyu Meng, Alan Humphrey, Martin Berzins Thanks to: John Schmidt and J. Davison de St. Germain, SCI Institute Justin Luitjens
More informationCORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER
CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER Tender Notice No. 3/2014-15 dated 29.12.2014 (IIT/CE/ENQ/COM/HPC/2014-15/569) Tender Submission Deadline Last date for submission of sealed bids is extended
More informationGPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics
GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),
More information1 Bull, 2011 Bull Extreme Computing
1 Bull, 2011 Bull Extreme Computing Table of Contents HPC Overview. Cluster Overview. FLOPS. 2 Bull, 2011 Bull Extreme Computing HPC Overview Ares, Gerardo, HPC Team HPC concepts HPC: High Performance
More informationMixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms
Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State
More informationHow To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
More informationBuilding a Top500-class Supercomputing Cluster at LNS-BUAP
Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad
More informationOverview of HPC systems and software available within
Overview of HPC systems and software available within Overview Available HPC Systems Ba Cy-Tera Available Visualization Facilities Software Environments HPC System at Bibliotheca Alexandrina SUN cluster
More informationVisit to the National University for Defense Technology Changsha, China. Jack Dongarra. University of Tennessee. Oak Ridge National Laboratory
Visit to the National University for Defense Technology Changsha, China Jack Dongarra University of Tennessee Oak Ridge National Laboratory June 3, 2013 On May 28-29, 2013, I had the opportunity to attend
More informationOpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA
OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization
More informationAssessing the Performance of OpenMP Programs on the Intel Xeon Phi
Assessing the Performance of OpenMP Programs on the Intel Xeon Phi Dirk Schmidl, Tim Cramer, Sandra Wienke, Christian Terboven, and Matthias S. Müller schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum
More informationLecture 1: an introduction to CUDA
Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Overview hardware view software view CUDA programming
More informationHP ProLiant SL270s Gen8 Server. Evaluation Report
HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich schoenemeyer@cscs.ch
More informationEmerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting
Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Introduction Big Data Analytics needs: Low latency data access Fast computing Power efficiency Latest
More informationGPU Hardware CS 380P. Paul A. Navrá7l Manager Scalable Visualiza7on Technologies Texas Advanced Compu7ng Center
GPU Hardware CS 380P Paul A. Navrá7l Manager Scalable Visualiza7on Technologies Texas Advanced Compu7ng Center with thanks to Don Fussell for slides 15-28 and Bill Barth for slides 36-55 CPU vs. GPU characteris7cs
More informationNVIDIA GRID OVERVIEW SERVER POWERED BY NVIDIA GRID. WHY GPUs FOR VIRTUAL DESKTOPS AND APPLICATIONS? WHAT IS A VIRTUAL DESKTOP?
NVIDIA GRID OVERVIEW Imagine if responsive Windows and rich multimedia experiences were available via virtual desktop infrastructure, even those with intensive graphics needs. NVIDIA makes this possible
More informationGPGPU accelerated Computational Fluid Dynamics
t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute
More informationSUBJECT: SOLIDWORKS HARDWARE RECOMMENDATIONS - 2013 UPDATE
SUBJECT: SOLIDWORKS RECOMMENDATIONS - 2013 UPDATE KEYWORDS:, CORE, PROCESSOR, GRAPHICS, DRIVER, RAM, STORAGE SOLIDWORKS RECOMMENDATIONS - 2013 UPDATE Below is a summary of key components of an ideal SolidWorks
More informationIntroduction to GPU Architecture
Introduction to GPU Architecture Ofer Rosenberg, PMTS SW, OpenCL Dev. Team AMD Based on From Shader Code to a Teraflop: How GPU Shader Cores Work, By Kayvon Fatahalian, Stanford University Content 1. Three
More informationSystem Software for Big Data Computing. Cho-Li Wang The University of Hong Kong
System Software for Big Data Computing Cho-Li Wang The University of Hong Kong HKU High-Performance Computing Lab. Total # of cores: 3004 CPU + 5376 GPU cores RAM Size: 8.34 TB Disk storage: 130 TB Peak
More informationHPC Software Requirements to Support an HPC Cluster Supercomputer
HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software01-0417
More informationAccelerating Intensity Layer Based Pencil Filter Algorithm using CUDA
Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Technology, Computer Engineering by Amol
More informationSeeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
More informationIntroduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
More informationEvaluation of CUDA Fortran for the CFD code Strukti
Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationGPU-accelerated Large Scale Analytics using MapReduce Model
, pp.375-380 http://dx.doi.org/10.14257/ijhit.2015.8.6.36 GPU-accelerated Large Scale Analytics using MapReduce Model RadhaKishan Yadav 1, Robin Singh Bhadoria 2 and Amit Suri 3 1 Research Assistant 2
More informationRetargeting PLAPACK to Clusters with Hardware Accelerators
Retargeting PLAPACK to Clusters with Hardware Accelerators Manuel Fogué 1 Francisco Igual 1 Enrique S. Quintana-Ortí 1 Robert van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores.
More informationHPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014
HPC Cluster Decisions and ANSYS Configuration Best Practices Diana Collier Lead Systems Support Specialist Houston UGM May 2014 1 Agenda Introduction Lead Systems Support Specialist Cluster Decisions Job
More informationNVIDIA Solves the GPU Computing Puzzle
NVIDIA Solves the GPU Computing Puzzle Nathan Brookwood Research Fellow Nathan@Insight64.com An Insight 64 White Paper Sponsored by NVIDIA Anyone who has ever assembled a jigsaw puzzle knows the drill.
More informationHPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk
HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training
More informationEnhancing Cloud-based Servers by GPU/CPU Virtualization Management
Enhancing Cloud-based Servers by GPU/CPU Virtualiz Management Tin-Yu Wu 1, Wei-Tsong Lee 2, Chien-Yu Duan 2 Department of Computer Science and Inform Engineering, Nal Ilan University, Taiwan, ROC 1 Department
More informationA Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures
11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the
More informationHP Z Workstations graphics card options
Sales guide HP Z Workstations graphics card options Quick reference guide Table of contents Desktop Workstations compatibility... 3 Mobile and All-in-One Workstations compatibility... 4 Compare features:
More informationNVIDIA GeForce GTX 580 GPU Datasheet
NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines
More informationHP Workstations graphics card options
Family data sheet HP Workstations graphics card options Quick reference guide Leading-edge professional graphics March 2014 A full range of graphics cards to meet your performance needs compare features
More informationAppro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales
Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes Anthony Kenisky, VP of North America Sales About Appro Over 20 Years of Experience 1991 2000 OEM Server Manufacturer 2001-2007
More informationGeoImaging Accelerator Pansharp Test Results
GeoImaging Accelerator Pansharp Test Results Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance
More informationHIGH PERFORMANCE CONSULTING COURSE OFFERINGS
Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...
More informationCray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak
Cray Gemini Interconnect Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak Outline 1. Introduction 2. Overview 3. Architecture 4. Gemini Blocks 5. FMA & BTA 6. Fault tolerance
More informationGPU Computing - CUDA
GPU Computing - CUDA A short overview of hardware and programing model Pierre Kestener 1 1 CEA Saclay, DSM, Maison de la Simulation Saclay, June 12, 2012 Atelier AO and GPU 1 / 37 Content Historical perspective
More informationUnleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University
More information