Robust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code

Size: px
Start display at page:

Download "Robust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code"

Transcription

1 Robust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code F. Rossi, S. Sinigardi, P. Londrillo & G. Turchetti University of Bologna & INFN GPU2014, Rome, Sept 17th 2014

2 Laser-driven, plasma based acceleration is a breakthrough particle acceleration technology. In LWFA electron acceleration, a laser pulse propagating in a underdense plasma generates a wakefield that can be used as an accelerating structure. Wakefield = Micron-scale accelerating cavity. Wakefield accelerating fields can reach amplitudes of >100 GV/m. BELLA laser at Berkeley Lab accelerates ultra short electrons bunches up to a few GeVs in very short distances! In Bologna we perform modeling work for experimental groups at INFN Frascati (FLAME) and CNR INO.

3 Modeling the complex Physics of laser plasma accelerators requires advanced numerical models. Electron acceleration (bubble regime) Ion acceleration (TNSA) + post-acceleration Nanostructured targets Ion acceleration (RPA) ne z [λ] z [λ] 5 ni x [λ] [nc ] 0 5 x [λ] y [λ] 5 10 np ni z [λ] [nc ]

4 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Co Modeling the complex Physics of laser plasma accelerators requires advanced numerical models. Particle in Cell simulations 101 Physical model that describes laser plasma interaction: Maxwell equations for e.m. fields Relativistic Lorentz force for plasma Vlasov equation: t + ẋ@ x + ṗ@ p )f j (x, p, t) =0 Fluid model integrating Vlasov equation over first two momenta n j (x) = f i (x, p, t)dp n j u j (x) = vf i (x, p, t)dp One momentum per position: cannot describe wave-breaking Particle in cell (PIC) Sampling phase space as a sum of discrete, finite-sized, macro-particles X f i (x, p, t) =f 0 g(x x i ) (p p i ) More accurate Describes wave-breaking i Numerical noise

5 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations C Particle in cell method, algorithmically Particle-grid interactions: Force on particles is interpoladed (averaging) from fields grids Particle current/density is deposited to grid scatter operation: a particle adds its density value the cells that it overlaps

6 Modeling the complex Physics of laser plasma accelerators requires advanced numerical codes. AlaDyn code suite: CPU code ALADYN High order schemes For memory intensive ion acceleration simulations. Typical run: 3D, fully electromagnetic TNSA simulation (n/nc = 40 λsd=20nm) 100x25x25 points/μm ~1TB RAM (4000 CPU) Interaction time: 30μm GPU code JASMINE Particle tracking (radiation simulation) Tunneling ionization simulation Typical run: 3D, full-em bubble regime simulation (n = ~ e-/cm 3 ) 40x10x10 points/μm ~100GB of GPU RAM Interaction time: >4000μm

7 Current/Density Deposition on GPUs

8 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Co PIC Deposition algorithm on massively parallel architectures 1 Naive, (1 particle! 1 thread) parallelization! Race conditions on same memory cells: wrong results Atomic operations or other synchronization methods are required 2 N particles per cell, particle shape function of total order K (4~27) Density grid data is accessed K N ppc times It s worth caching in GPU multiprocessor s shared memory

9 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Co Caching using shared memory Many accesses to same memory location!worth caching Density grid data is accessed 2 K N ppc times Particles organized by grid block and then grid cell One block processes particles having center inside the grid block Grid block cells + halo (~ half the order of shape function) Cache electrical current grid block in multiprocessor s shared memory Saves ~N ppc K global memory traffic Use segmented reduction for sums in shared memory Rather than shared memory atomic operations Not natively available for double precision Can be slower than reduction

10 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Conc Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid

11 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

12 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

13 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

14 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Conc Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

15 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Conc Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

16 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Conc Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

17 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

18 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

19 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Deposition algorithm without atomics Algorithm - Sort by cell block and cell - Assign a CUDA block to a cell block - Perform a per-block, shared memory, segmented scan to compute density sum for each cell - Sum cached copy to global grid Scans:

20 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Performance LASER: a=7.7, waist=9.0 mum, fwhm=24.0 fs PLASMA: density=1.00e+19 1/cm^3. Simulation run for ct = 60µm, double precision. Note: 3D test with Esirkepov method runs stretched grid optimization while 2D Esirkepov runs without it. Francesco Rossi P. Londrillo, S. Sinigardi, A. Sgattoni and G. Turchetti Robust algorithms for current deposition and efficient memory usage in a GPU

21 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Performance gain Jasmine vs ALaDyn (our CPU code), same exact simulation. Performance of a single NVIDIA Fermi GPU equates ~200x BlueGene cores or ~45x IBM SP6 cores. Plus, since subdomains are much larger, load balancing and scalability are much easier In the simulation setups shown above (fair resolution), jasmine can simulate ~4mm of plasma per day on a ~24 GPUs cluster + Kepler k40 CNAF gave us an extra ~1.5x in performance per GPU

22 PIC parallelization on multiple GPUs

23 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Scaling to multiple GPUs: Hiding network transfer Exchange: 1 Fields halos 2 Particles leaving subdomain Cluster nodes communicate using standard MPI Transfer particles concurrently with current deposition. Communication can be hidden almost completely. Scalability test: warm plasma simulation on INFN APE ULa Sapienza and PLX CINECA

24 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Scaling

25 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Simple algorithm for load balancing The particle motion easily leads to inhomogeneous distribution of the load Shrink the volume of heavy-loaded nodes: Each few timesteps select a subdomain (and its row) and the direction where to shrink Subdomains topology remains intact (vertices conservation) Choice is done trying to minimize the cost function: k 1 Max(Load)/Average(Load)+k 2 Variance(Load)/Average(Load)

26 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Load balancer test case Setup (coarse test) LASER: a=5.8, waist=13.2 mum, fwhm=24.0 fs PLASMA: density=3.80e+18 1/cm^3 GRID: n=[729, 96, 96], dx=[ 6.25e-02, 5.00e-01, 5.00e-01 ] mum Francesco Rossi P. Londrillo, S. Sinigardi, A. Sgattoni androbust G. Turchetti algorithms for current deposition and efficient memory usage in a GPU

27 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Concl Test case: scaling with load balancing With 72 subdomains:

28 Introduction to GPUs GPU PIC current deposition algorithm Multi GPU parallelization Simulation of radiation emission Benchmarks and simulations Conc Memory intensive simulations Total memory availability represents a constraint for many simulations (for example ion acceleration ones). In a node, GPU memory is often much less than the total host memory available Using host memory to store simulation data make larger simulations possible on a cluster of fixed size Asynchronous stream of particle chunks stored in main CPU memory overlapped with computation using CUDA streams Slower but no longer memory bound to the GPU device memory Currently in testing stage

29 Meta-programming improves the code flexibility and GPU scan primitives are used in a wide range of auxiliary functions. Meta-programming via Python-based code template engine: Ionization levels from ADK model Flexibility/extensibility Port GPU kernels in other frameworks Optimization tweaks Auxiliary PIC functionalities require parallel scan primitives: Radiation emission spectrum using trajectories For tracking particles, stream compaction is used for particle selection. Scan is used for dynamically spawn a variable number of electrons from atoms in field tunneling ionization (ADK) model

30 Conclusions GPUs allow simulating few millimeter-long, fully 3D, laser wakefield electron acceleration stages on small clusters (~10 k40). For 10 GeV, >10 cm acceleration stages, reduced numerical methods (envelope, cylindrical symmetry) are usually much more efficient: our objective is to have them efficiently implemented on GPUs.

(Toward) Radiative transfer on AMR with GPUs. Dominique Aubert Université de Strasbourg Austin, TX, 14.12.12

(Toward) Radiative transfer on AMR with GPUs. Dominique Aubert Université de Strasbourg Austin, TX, 14.12.12 (Toward) Radiative transfer on AMR with GPUs Dominique Aubert Université de Strasbourg Austin, TX, 14.12.12 A few words about GPUs Cache and control replaced by calculation units Large number of Multiprocessors

More information

Parallel Simplification of Large Meshes on PC Clusters

Parallel Simplification of Large Meshes on PC Clusters Parallel Simplification of Large Meshes on PC Clusters Hua Xiong, Xiaohong Jiang, Yaping Zhang, Jiaoying Shi State Key Lab of CAD&CG, College of Computer Science Zhejiang University Hangzhou, China April

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

Texture Cache Approximation on GPUs

Texture Cache Approximation on GPUs Texture Cache Approximation on GPUs Mark Sutherland Joshua San Miguel Natalie Enright Jerger {suther68,enright}@ece.utoronto.ca, joshua.sanmiguel@mail.utoronto.ca 1 Our Contribution GPU Core Cache Cache

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

11th International Computational Accelerator Physics Conference (ICAP) August 19 24, 2012, Rostock-Warnemünde (Germany)

11th International Computational Accelerator Physics Conference (ICAP) August 19 24, 2012, Rostock-Warnemünde (Germany) Numerical Modeling of RF Electron Sources for FEL-Accelerators Erion Gjonaj Computational Electromagetics Laboratory (TEMF), Technische Universität Darmstadt, Germany 11th International Computational Accelerator

More information

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy

More information

Computation of Mutual Information Metric for Image Registration on Multiple GPUs

Computation of Mutual Information Metric for Image Registration on Multiple GPUs Computation of Mutual Information Metric for Image Registration on Multiple GPUs Andrew V. Adinetz 1, Markus Axer 2, Marcel Huysegoms 2, Stefan Köhnen 2, Jiri Kraus 3, Dirk Pleiter 1 1 JSC, Forschungszentrum

More information

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Qingyu Meng, Alan Humphrey, Martin Berzins Thanks to: John Schmidt and J. Davison de St. Germain, SCI Institute Justin Luitjens

More information

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:

More information

Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka

Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka René Widera1, Erik Zenker1,2, Guido Juckeland1, Benjamin Worpitz1,2, Axel Huebl1,2, Andreas Knüpfer2, Wolfgang E. Nagel2,

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

ParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008

ParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008 ParFUM: A Parallel Framework for Unstructured Meshes Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008 What is ParFUM? A framework for writing parallel finite element

More information

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools

More information

HPC Deployment of OpenFOAM in an Industrial Setting

HPC Deployment of OpenFOAM in an Industrial Setting HPC Deployment of OpenFOAM in an Industrial Setting Hrvoje Jasak h.jasak@wikki.co.uk Wikki Ltd, United Kingdom PRACE Seminar: Industrial Usage of HPC Stockholm, Sweden, 28-29 March 2011 HPC Deployment

More information

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

MOSIX: High performance Linux farm

MOSIX: High performance Linux farm MOSIX: High performance Linux farm Paolo Mastroserio [mastroserio@na.infn.it] Francesco Maria Taurino [taurino@na.infn.it] Gennaro Tortone [tortone@na.infn.it] Napoli Index overview on Linux farm farm

More information

Cellular Computing on a Linux Cluster

Cellular Computing on a Linux Cluster Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results

More information

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

walberla: A software framework for CFD applications on 300.000 Compute Cores

walberla: A software framework for CFD applications on 300.000 Compute Cores walberla: A software framework for CFD applications on 300.000 Compute Cores J. Götz (LSS Erlangen, jan.goetz@cs.fau.de), K. Iglberger, S. Donath, C. Feichtinger, U. Rüde Lehrstuhl für Informatik 10 (Systemsimulation)

More information

HP ProLiant SL270s Gen8 Server. Evaluation Report

HP ProLiant SL270s Gen8 Server. Evaluation Report HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich schoenemeyer@cscs.ch

More information

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop Emily Apsey Performance Engineer Presentation Overview What it takes to successfully virtualize ArcGIS Pro in Citrix XenApp and XenDesktop - Shareable

More information

NVIDIA Tesla K20-K20X GPU Accelerators Benchmarks Application Performance Technical Brief

NVIDIA Tesla K20-K20X GPU Accelerators Benchmarks Application Performance Technical Brief NVIDIA Tesla K20-K20X GPU Accelerators Benchmarks Application Performance Technical Brief NVIDIA changed the high performance computing (HPC) landscape by introducing its Fermibased GPUs that delivered

More information

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging

More information

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon

More information

Pedraforca: ARM + GPU prototype

Pedraforca: ARM + GPU prototype www.bsc.es Pedraforca: ARM + GPU prototype Filippo Mantovani Workshop on exascale and PRACE prototypes Barcelona, 20 May 2014 Overview Goals: Test the performance, scalability, and energy efficiency of

More information

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get

More information

Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures

Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy Perspectives of GPU Computing in Physics

More information

Accelerator Beam Dynamics on Multicore, GPU and MIC Systems. James Amundson, Qiming Lu, and Panagiotis Spentzouris Fermilab

Accelerator Beam Dynamics on Multicore, GPU and MIC Systems. James Amundson, Qiming Lu, and Panagiotis Spentzouris Fermilab Accelerator Beam Dynamics on Multicore, GPU and MIC Systems James Amundson, Qiming Lu, and Panagiotis Spentzouris Fermilab Synergia Synergia: A comprehensive accelerator beam dynamics package http://web.fnal.gov/sites/synergia/sitepages/synergia%20home.aspx

More information

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase

More information

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Please Note IBM s statements regarding its plans, directions, and intent are subject to

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Part I Courses Syllabus

Part I Courses Syllabus Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Speed Network Marketing, Energy Selection and Optimal Timetable

Speed Network Marketing, Energy Selection and Optimal Timetable COULOMB11 Optical Acceleration of Ions and Perspective for Biomedicine G. Turchetti Conclusive remarks The regime currently most investigated is TNSA. The problem is not maximum energy. Limits come from

More information

MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA

MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS Julien Demouth, NVIDIA STAC-A2 BENCHMARK STAC-A2 Benchmark Developed by banks Macro and micro, performance and accuracy Pricing and Greeks for American

More information

HPC enabling of OpenFOAM R for CFD applications

HPC enabling of OpenFOAM R for CFD applications HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,

More information

Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1

Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1 Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion

More information

Les Accélérateurs Laser Plasma

Les Accélérateurs Laser Plasma Les Accélérateurs Laser Plasma Victor Malka Laboratoire d Optique Appliquée ENSTA ParisTech Ecole Polytechnique CNRS PALAISEAU, France victor.malka@ensta.fr Accelerators : One century of exploration of

More information

ultra fast SOM using CUDA

ultra fast SOM using CUDA ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A

More information

Capacity Planning Process Estimating the load Initial configuration

Capacity Planning Process Estimating the load Initial configuration Capacity Planning Any data warehouse solution will grow over time, sometimes quite dramatically. It is essential that the components of the solution (hardware, software, and database) are capable of supporting

More information

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild. Parallel Computing: Strategies and Implications Dori Exterman CTO IncrediBuild. In this session we will discuss Multi-threaded vs. Multi-Process Choosing between Multi-Core or Multi- Threaded development

More information

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

Evaluation of CUDA Fortran for the CFD code Strukti

Evaluation of CUDA Fortran for the CFD code Strukti Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Network Traffic Monitoring & Analysis with GPUs

Network Traffic Monitoring & Analysis with GPUs Network Traffic Monitoring & Analysis with GPUs Wenji Wu, Phil DeMar wenji@fnal.gov, demar@fnal.gov GPU Technology Conference 2013 March 18-21, 2013 SAN JOSE, CALIFORNIA Background Main uses for network

More information

Multi-GPU Load Balancing for Simulation and Rendering

Multi-GPU Load Balancing for Simulation and Rendering Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks

More information

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, Ebay,

More information

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

More information

Trends in High-Performance Computing for Power Grid Applications

Trends in High-Performance Computing for Power Grid Applications Trends in High-Performance Computing for Power Grid Applications Franz Franchetti ECE, Carnegie Mellon University www.spiral.net Co-Founder, SpiralGen www.spiralgen.com This talk presents my personal views

More information

CUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing

CUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing CUDA SKILLS Yu-Hang Tang June 23-26, 2015 CSRC, Beijing day1.pdf at /home/ytang/slides Referece solutions coming soon Online CUDA API documentation http://docs.nvidia.com/cuda/index.html Yu-Hang Tang @

More information

Accelerating CFD using OpenFOAM with GPUs

Accelerating CFD using OpenFOAM with GPUs Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide

More information

Introduction to GPU Computing

Introduction to GPU Computing Matthis Hauschild Universität Hamburg Fakultät für Mathematik, Informatik und Naturwissenschaften Technische Aspekte Multimodaler Systeme December 4, 2014 M. Hauschild - 1 Table of Contents 1. Architecture

More information

Does Quantum Mechanics Make Sense? Size

Does Quantum Mechanics Make Sense? Size Does Quantum Mechanics Make Sense? Some relatively simple concepts show why the answer is yes. Size Classical Mechanics Quantum Mechanics Relative Absolute What does relative vs. absolute size mean? Why

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

GPUs for Scientific Computing

GPUs for Scientific Computing GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

More information

Jean-Pierre Panziera Teratec 2011

Jean-Pierre Panziera Teratec 2011 Technologies for the future HPC systems Jean-Pierre Panziera Teratec 2011 3 petaflop systems : TERA 100, CURIE & IFERC Tera100 Curie IFERC 1.25 PetaFlops 256 TB ory 30 PB disk storage 140 000+ Xeon cores

More information

GPGPU Computing. Yong Cao

GPGPU Computing. Yong Cao GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power

More information

Numerical Model for the Study of the Velocity Dependence Of the Ionisation Growth in Gas Discharge Plasma

Numerical Model for the Study of the Velocity Dependence Of the Ionisation Growth in Gas Discharge Plasma Journal of Basrah Researches ((Sciences)) Volume 37.Number 5.A ((2011)) Available online at: www.basra-science -journal.org ISSN 1817 2695 Numerical Model for the Study of the Velocity Dependence Of the

More information

Real-time Visual Tracker by Stream Processing

Real-time Visual Tracker by Stream Processing Real-time Visual Tracker by Stream Processing Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter Oscar Mateo Lozano & Kuzahiro Otsuka presented by Piotr Rudol

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

GPU Point List Generation through Histogram Pyramids

GPU Point List Generation through Histogram Pyramids VMV 26, GPU Programming GPU Point List Generation through Histogram Pyramids Gernot Ziegler, Art Tevs, Christian Theobalt, Hans-Peter Seidel Agenda Overall task Problems Solution principle Algorithm: Discriminator

More information

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent! Department of Aeronautics Imperial College London! 25 th March 2014 Overview Motivation Flux Reconstruction Many-Core

More information

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications

More information

Packet-based Network Traffic Monitoring and Analysis with GPUs

Packet-based Network Traffic Monitoring and Analysis with GPUs Packet-based Network Traffic Monitoring and Analysis with GPUs Wenji Wu, Phil DeMar wenji@fnal.gov, demar@fnal.gov GPU Technology Conference 2014 March 24-27, 2014 SAN JOSE, CALIFORNIA Background Main

More information

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,

More information

GPU Programming in Computer Vision

GPU Programming in Computer Vision Computer Vision Group Prof. Daniel Cremers GPU Programming in Computer Vision Preliminary Meeting Thomas Möllenhoff, Robert Maier, Caner Hazirbas What you will learn in the practical course Introduction

More information

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of

More information

Parallel Firewalls on General-Purpose Graphics Processing Units

Parallel Firewalls on General-Purpose Graphics Processing Units Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering

More information

The HipGISAXS Software Suite: Recovering Nanostructures at Scale

The HipGISAXS Software Suite: Recovering Nanostructures at Scale The HipGISAXS Software Suite: Recovering Nanostructures at Scale Abhinav Sarje Computational Research Division 06.24.15 OLCF User Meeting Oak Ridge, TN Nanoparticle Systems Materials (natural or artificial)

More information

GPU-Based Network Traffic Monitoring & Analysis Tools

GPU-Based Network Traffic Monitoring & Analysis Tools GPU-Based Network Traffic Monitoring & Analysis Tools Wenji Wu; Phil DeMar wenji@fnal.gov, demar@fnal.gov CHEP 2013 October 17, 2013 Coarse Detailed Background Main uses for network traffic monitoring

More information

Network Traffic Monitoring and Analysis with GPUs

Network Traffic Monitoring and Analysis with GPUs Network Traffic Monitoring and Analysis with GPUs Wenji Wu, Phil DeMar wenji@fnal.gov, demar@fnal.gov GPU Technology Conference 2013 March 18-21, 2013 SAN JOSE, CALIFORNIA Background Main uses for network

More information

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC ABSTRACT As data sets continue to grow, it is important for programs to be written very efficiently to make sure no time

More information

Performance and scalability of a large OLTP workload

Performance and scalability of a large OLTP workload Performance and scalability of a large OLTP workload ii Performance and scalability of a large OLTP workload Contents Performance and scalability of a large OLTP workload with DB2 9 for System z on Linux..............

More information

walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation

walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation SIAM Parallel Processing for Scientific Computing 2012 February 16, 2012 Florian Schornbaum,

More information

Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

More information

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt. Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.edu Outline Motivation why do we need GPUs? Past - how was GPU programming

More information

Poster Either Oral or Poster Will not attend. Predictive community computational tools for virtual plasma science experiments.

Poster Either Oral or Poster Will not attend. Predictive community computational tools for virtual plasma science experiments. White Paper for Frontiers of Plasma cience Panel Date of ubmission: 06/19/2015 Indicate the primary area this white paper addresses by placing P in right column. Indicate secondary area or areas by placing

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

Retargeting PLAPACK to Clusters with Hardware Accelerators

Retargeting PLAPACK to Clusters with Hardware Accelerators Retargeting PLAPACK to Clusters with Hardware Accelerators Manuel Fogué 1 Francisco Igual 1 Enrique S. Quintana-Ortí 1 Robert van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores.

More information

~ Greetings from WSU CAPPLab ~

~ Greetings from WSU CAPPLab ~ ~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)

More information

OpenACC Parallelization and Optimization of NAS Parallel Benchmarks

OpenACC Parallelization and Optimization of NAS Parallel Benchmarks OpenACC Parallelization and Optimization of NAS Parallel Benchmarks Presented by Rengan Xu GTC 2014, S4340 03/26/2014 Rengan Xu, Xiaonan Tian, Sunita Chandrasekaran, Yonghong Yan, Barbara Chapman HPC Tools

More information

Basin simulation for complex geological settings

Basin simulation for complex geological settings Énergies renouvelables Production éco-responsable Transports innovants Procédés éco-efficients Ressources durables Basin simulation for complex geological settings Towards a realistic modeling P. Havé*,

More information

ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy

ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy OVERVIEW The global communication and the continuous growth of services provided through the Internet or local infrastructure require to

More information

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs A general-purpose virtualization service for HPC on cloud computing: an application to GPUs R.Montella, G.Coviello, G.Giunta* G. Laccetti #, F. Isaila, J. Garcia Blas *Department of Applied Science University

More information