# 22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

2 Administrative Details Please sign up for the course so we can get you computer accounts. You will need an account on the cluster as well as an account on the math sciences network. The accounts should be available by the second class meeting. The class web page is There are some pointers on computing and resources available on that page. Class notes will also be posted there. If you are not yet familiar with Linux or R you should become familiar with them soon. Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

3 Outline A rough outline of what we might cover: Some background on HPC. Overview of the Statistics cluster. The snow package. Some tools for monitoring parallel applications. Other R parallel computing frameworks. Overview of PVM and MPI. Overview of OpenMP. Parallel linear algebra libraries. Batch scheduling on the cluster. Overview of Grid computing.... Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

4 Why High Performance/Parallel Computing? Many computations are almost instantaneous on desktop computers. Some computations are beyond a single desktop computer: they take too long need too much memory need too much disk storage Using multiple computers in an organized way is one solution. Using multiple processors on a single computer can also help. Doing this can be hard and/or expensive. Good tools can help. Before getting in too deep be sure to ask: is the computation really needed? Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

5 An Important Number: 2 32 = 4, 294, 967, 296 For many years computers used a 32-bit architecture: Standard computer integer types are usually restricted to 32 bits, usually to the ranges [ 2 31, ] or [0, 2 32 ]. The maximal amount of memory a process can address (the address space) is 2 32 bytes, or 4 GB. Using files more than 4G in size can be tricky. Larger amounts of memory can essentially only be used by working with multiple computers. More recently 64-bit architectures have become available: C int and FORTRAN integer are usually still 32 bit for backward compatibility. C long is usually 64 bit (except Win64) and supported in hardware. Maximal address space is = 10 7 TB = 10 4 PB = 10EB. Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

6 Some Supercomputer and Compute Cluster History Early supercomputers were very fast single processors (1960s). Single (1970s) and multiple (4 16, 1980s) vector processors. Multiple standard processors with shared or distributed memory (1990s). Beowulf clusters (mid 1990s): multiple (more or less) commodity computers reasonably fast dedicated communications network distributed memory (unavoidable for 32-bit) Distributed shared memory systems can use 64-bit Beowulf-style hardware software, hardware, or combination Can provide illusion of single multi-processor system Sometimes called NUMA (Non-Uniform Memory Architecture) Still mostly proprietary and expensive Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

7 Some Recent Developments Moore s law: number of transistors on a chip doubles every 18 months. Until recently this has meant speed increase at about the same rate. Recently speeds have remained flat limiting factors: heat power consumption Additional transistors have been used for integrating graphics, networking chipsets multiple cores 2 or 4 logical processors on a single chip Realistic 3D graphics have driven multiple processors on a chip: Some nvidia cards have 128 (specialized) cores for \$500. Sony/Toshiba/IBM Cell processor for PS 3 has one standard PowerPC Element (PPE) and 8 Synergistic Processing Elements (SPE). Special libraries and methods are needed to program these. Experimental interfaces from high level languages are available for some (Python, R for nvidia). Parallel programming is likely to become essential even for desktop computers. Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

8 Parallel Programming Tools Writing correct, efficient sequential programs is hard. Writing correct, efficient parallel programs is harder. Good tools can help: Low level tools: sockets for distributed memory programming threads for shared memory computing Intermediate level tools: PVM, MPI message-passing libraries for distributed memory OpenMP for shared memory Higher level tools: simple systems like snow for R distributed memory parallelized libraries (distributed or shared memory) parallelizing compilers (mostly shared memory) Some problems are easy to parallelize. Some problems at least seem inherently sequential: pseudo-random number generation Markov chain Monte Carlo Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

9 Parallelizable Computations A simple model says a computation runs N times faster when split over N processors. More realistically, a problem has a fraction S of its computation that is inherently sequential and 1 S that can be parallelized. Amdahl s law: Maximum Speedup = 1 S + (1 S)/N Problems with S 0 are called embarrassingly parallel. Some statistical problems are (or seem to be) embarrassingly parallel: computing column means bootsrapping Others seem inherently sequential: pseudo-random number generation Markov chain Monte Carlo Luke Tierney (U. of Iowa) HPC in Statistics August 30, / 9

### GPUs for Scientific Computing

GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

### 10- High Performance Compu5ng

10- High Performance Compu5ng (Herramientas Computacionales Avanzadas para la Inves6gación Aplicada) Rafael Palacios, Fernando de Cuadra MRE Contents Implemen8ng computa8onal tools 1. High Performance

### Introduction to GPU Programming Languages

CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

### Parallel Programming Survey

Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

### Message-passing Multiprocessors. Server-based systems

Message-passing Multiprocessors. In recent years, there has been interest (and even commercially available machines) which do not use common memory at all, but instead, communicate by passing messages.

### Module 2: "Parallel Computer Architecture: Today and Tomorrow" Lecture 4: "Shared Memory Multiprocessors" The Lecture Contains: Technology trends

The Lecture Contains: Technology trends Architectural trends Exploiting TLP: NOW Supercomputers Exploiting TLP: Shared memory Shared memory MPs Bus-based MPs Scaling: DSMs On-chip TLP Economics Summary

### CUDA programming on NVIDIA GPUs

p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view

### PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction

### Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1

Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip

### HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

### High Performance Computing

High Performance Computing Trey Breckenridge Computing Systems Manager Engineering Research Center Mississippi State University What is High Performance Computing? HPC is ill defined and context dependent.

### Building Blocks. CPUs, Memory and Accelerators

Building Blocks CPUs, Memory and Accelerators Outline Computer layout CPU and Memory What does performance depend on? Limits to performance Silicon-level parallelism Single Instruction Multiple Data (SIMD/Vector)

### Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis

### GMP implementation on CUDA - A Backward Compatible Design With Performance Tuning

1 GMP implementation on CUDA - A Backward Compatible Design With Performance Tuning Hao Jun Liu, Chu Tong Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto haojun.liu@utoronto.ca,

### Parallel Computing. Frank McKenna. UC Berkeley. OpenSees Parallel Workshop Berkeley, CA

Parallel Computing Frank McKenna UC Berkeley OpenSees Parallel Workshop Berkeley, CA Overview Introduction to Parallel Computers Parallel Programming Models Race Conditions and Deadlock Problems Performance

### GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

### Overview of High Performance Computing

Overview of High Performance Computing Timothy H. Kaiser, PH.D. tkaiser@mines.edu http://geco.mines.edu/workshop 1 This tutorial will cover all three time slots. In the first session we will discuss the

### Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

### Motivation and Goal. Introduction to HPC content and definitions. Learning Outcomes. Organization

Motivation and Goal Introduction to HPC content and definitions Jan Thorbecke, Section of Applied Geophysics Get familiar with hardware building blocks, how they operate, and how to make use of them in

### The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver

1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution

### ASC Workshop Catalogue Brochure CSIRO ASC Version 1.0 August 2, 2013

INFORMATION MANAGEMENT AND TECHNOLOGY www.csiro.au ASC Workshop Catalogue Brochure CSIRO ASC Version 1.0 August 2, 2013 Commercial In Confidence CSIRO Advanced Scientific Computing GPO Box 1289, Melbourne,

### Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

### Introduction to High Performance Computing

Introduction to High Performance Computing Gregory G. Howes Department of Physics and Astronomy University of Iowa Iowa High Performance Computing Summer School University of Iowa Iowa City, Iowa 6-8 June

### benchmarking Amazon EC2 for high-performance scientific computing

Edward Walker benchmarking Amazon EC2 for high-performance scientific computing Edward Walker is a Research Scientist with the Texas Advanced Computing Center at the University of Texas at Austin. He received

### Tamás Budavári / The Johns Hopkins University

PRACTICAL SCIENTIFIC ANALYSIS OF BIG DATA RUNNING IN PARALLEL / The Johns Hopkins University 2 Parallelism Data parallel Same processing on different pieces of data Task parallel Simultaneous processing

### An Introduction to Parallel Computing/ Programming

An Introduction to Parallel Computing/ Programming Vicky Papadopoulou Lesta Astrophysics and High Performance Computing Research Group (http://ahpc.euc.ac.cy) Dep. of Computer Science and Engineering European

2011 European HyperWorks Technology Conference Vladi Nosenzo, Roberto Vadori 20 Novembre, 2010 2011 ABSTRACT The work described below starts from an idea of a previous experience of Reply, developed in

### Scientific Computing Programming with Parallel Objects

Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore

### High Performance Computing. Course Notes 2007-2008. HPC Fundamentals

High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

### Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61

F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase

### Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.

### Multicore Parallel Computing with OpenMP

Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large

### Linux and High-Performance Computing

Linux and High-Performance Computing Outline Architectures & Performance Measurement Linux on High-Performance Computers Beowulf Clusters, ROCKS Kitten: A Linux-derived LWK Linux on I/O and Service Nodes

### David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM

### Financial computing on NVIDIA GPUs

Finance on GPUs p. 1/30 Financial computing on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantative Finance Oxford eresearch Centre

### Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

### Parallel Computing with MATLAB

Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best

### Computer Architecture TDTS10

why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers

### Microsoft Technical Computing The Advancement of Parallelism. Tom Quinn, Technical Computing Partner Manager

Presented at the COMSOL Conference 2010 Boston Microsoft Technical Computing The Advancement of Parallelism Tom Quinn, Technical Computing Partner Manager 21 1.2 x 10 New Bytes of Information in 2010 Source:

### Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

### Chapter 2 Parallel Architecture, Software And Performance

Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program

### Accelerating CST MWS Performance with GPU and MPI Computing. CST workshop series

Accelerating CST MWS Performance with GPU and MPI Computing www.cst.com CST workshop series 2010 1 Hardware Based Acceleration Techniques - Overview - Multithreading GPU Computing Distributed Computing

### Introduction to Cloud Computing

Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

### IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus

Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,

### Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca Carlo Cavazzoni CINECA Supercomputing Application & Innovation www.cineca.it 21 Aprile 2015 FERMI Name: Fermi Architecture: BlueGene/Q

### Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

### Basic Concepts in Parallelization

1 Basic Concepts in Parallelization Ruud van der Pas Senior Staff Engineer Oracle Solaris Studio Oracle Menlo Park, CA, USA IWOMP 2010 CCS, University of Tsukuba Tsukuba, Japan June 14-16, 2010 2 Outline

### on an system with an infinite number of processors. Calculate the speedup of

1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements

### Fortran Program Development with Visual Studio* 2005 ~ Use Intel Visual Fortran with Visual Studio* ~

Fortran Program Development with Visual Studio* 2005 ~ Use Intel Visual Fortran with Visual Studio* ~ 31/Oct/2006 Software &Solutions group * Agenda Features of Intel Fortran Compiler Integrate with Visual

### Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

### Part I Courses Syllabus

Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment

### Using Microsoft Visual Studio 2005 / 2008

Using Visual Studio 2005 / 2008 Christian Terboven terboven@rz.rwth aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o o The

### Basics of Computational Physics

Basics of Computational Physics What is Computational Physics? Basic computer hardware Software 1: operating systems Software 2: Programming languages Software 3: Problem-solving environment What does

### Parallel Algorithm Engineering

Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

### GPU programming using C++ AMP

GPU programming using C++ AMP Petrika Manika petrika.manika@fshn.edu.al Elda Xhumari elda.xhumari@fshn.edu.al Julian Fejzaj julian.fejzaj@fshn.edu.al Abstract Nowadays, a challenge for programmers is to

### BlockLib: A Skeleton Library for Cell Broadband Engine

BlockLib: A Skeleton Library for Cell Broadband Engine Markus Ålind, Mattias V. Eriksson, Christoph Kessler PELAB Linköping university Sweden IWMSE-2008, Leipzig, 11/5/2008 2 2008 PPE: Master processor

### White Paper The Numascale Solution: Extreme BIG DATA Computing

White Paper The Numascale Solution: Extreme BIG DATA Computing By: Einar Rustad ABOUT THE AUTHOR Einar Rustad is CTO of Numascale and has a background as CPU, Computer Systems and HPC Systems De-signer

### Cellular Computing on a Linux Cluster

Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results

### Parallelization: Binary Tree Traversal

By Aaron Weeden and Patrick Royal Shodor Education Foundation, Inc. August 2012 Introduction: According to Moore s law, the number of transistors on a computer chip doubles roughly every two years. First

### Performance Basics; Computer Architectures

8 Performance Basics; Computer Architectures 8.1 Speed and limiting factors of computations Basic floating-point operations, such as addition and multiplication, are carried out directly on the central

### Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1

Introduction to High Performance Cluster Computing Cluster Training for UCL Part 1 What is HPC HPC = High Performance Computing Includes Supercomputing HPCC = High Performance Cluster Computing Note: these

### Chapter 6, The Operating System Machine Level

Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General

### Next-Generation PRIMEHPC. Copyright 2014 FUJITSU LIMITED

Next-Generation PRIMEHPC The K computer and the evolution of PRIMEHPC K computer PRIMEHPC FX10 Post-FX10 CPU SPARC64 VIIIfx SPARC64 IXfx SPARC64 XIfx Peak perf. 128 GFLOPS 236.5 GFLOPS 1TFLOPS ~ # of cores

### numascale White Paper The Numascale Solution: Extreme BIG DATA Computing Hardware Accellerated Data Intensive Computing By: Einar Rustad ABSTRACT

numascale Hardware Accellerated Data Intensive Computing White Paper The Numascale Solution: Extreme BIG DATA Computing By: Einar Rustad www.numascale.com Supemicro delivers 108 node system with Numascale

### PARALLEL PROGRAMMING

PARALLEL PROGRAMMING TECHNIQUES AND APPLICATIONS USING NETWORKED WORKSTATIONS AND PARALLEL COMPUTERS 2nd Edition BARRY WILKINSON University of North Carolina at Charlotte Western Carolina University MICHAEL

### Building an Inexpensive Parallel Computer

Res. Lett. Inf. Math. Sci., (2000) 1, 113-118 Available online at http://www.massey.ac.nz/~wwiims/rlims/ Building an Inexpensive Parallel Computer Lutz Grosz and Andre Barczak I.I.M.S., Massey University

### Rodrigo Fernandes de Mello, Evgueni Dodonov, José Augusto Andrade Filho

Middleware for High Performance Computing Rodrigo Fernandes de Mello, Evgueni Dodonov, José Augusto Andrade Filho University of São Paulo São Carlos, Brazil {mello, eugeni, augustoa}@icmc.usp.br Outline

### Big Data Visualization on the MIC

Big Data Visualization on the MIC Tim Dykes School of Creative Technologies University of Portsmouth timothy.dykes@port.ac.uk Many-Core Seminar Series 26/02/14 Splotch Team Tim Dykes, University of Portsmouth

### The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.

White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,

### HPC with Multicore and GPUs

HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware

### GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once

### OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING

OBJECTIVE ANALYSIS WHITE PAPER MATCH ATCHING FLASH TO THE PROCESSOR Why Multithreading Requires Parallelized Flash T he computing community is at an important juncture: flash memory is now generally accepted

### Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

### HPC Software Requirements to Support an HPC Cluster Supercomputer

HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software01-0417

### What is High Performance Computing?

What is High Performance Computing? V. Sundararajan Scientific and Engineering Computing Group Centre for Development of Advanced Computing Pune 411 007 vsundar@cdac.in Plan Why HPC? Parallel Architectures

### Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of

### From Big Data, Data Mining, and Machine Learning. Full book available for purchase here.

From Big Data, Data Mining, and Machine Learning. Full book available for purchase here. Contents Forward xiii Preface xv Acknowledgments xix Introduction 1 Big Data Timeline 5 Why This Topic Is Relevant

### Improving Perfect Parallelism

Improving Perfect Parallelism Replacing shared resource competition with friendly cooperation Lars Karlsson in collaboration with Carl Christian Kjelgaard Mikkelsen and Bo Kågström Who are we? Parallel

### Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

### reduction critical_section

A comparison of OpenMP and MPI for the parallel CFD test case Michael Resch, Bjíorn Sander and Isabel Loebich High Performance Computing Center Stuttgart èhlrsè Allmandring 3, D-755 Stuttgart Germany resch@hlrs.de

### Multi-core Systems What can we buy today?

Multi-core Systems What can we buy today? Ian Watson & Mikel Lujan Advanced Processor Technologies Group COMP60012 Future Multi-core Computing 1 A Bit of History AMD Opteron introduced in 2003 Hypertransport

### Parallel Computing for Data Science

Parallel Computing for Data Science With Examples in R, C++ and CUDA Norman Matloff University of California, Davis USA (g) CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint

### High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates

High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of

### Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

### Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are

### Sieve of Eratosthenes. Finding Primes

Sieve of Eratosthenes Finding Primes Timing How fast is your code? How long to find the primes between 1 and 10,000,000? How long to find primes between 1 and 1 billion? Optimization: the C++ compilers

### 3DES ECB Optimized for Massively Parallel CUDA GPU Architecture

3DES ECB Optimized for Massively Parallel CUDA GPU Architecture Lukasz Swierczewski Computer Science and Automation Institute College of Computer Science and Business Administration in Łomża Lomza, Poland

### Adding large data support to R

Adding large data support to R Luke Tierney Department of Statistics & Actuarial Science University of Iowa January 4, 2013 Luke Tierney (U. of Iowa) Large data in R January 4, 2013 1 / 15 Introduction

### HPC enabling of OpenFOAM R for CFD applications

HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,

### Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

### RevoScaleR Speed and Scalability

EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

### Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine

Cell-SWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine Ashwin Aji, Wu Feng, Filip Blagojevic and Dimitris Nikolopoulos Forecast Efficient mapping of wavefront algorithms

### Scalability and Classifications

Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

### Using the Windows Cluster

Using the Windows Cluster Christian Terboven terboven@rz.rwth aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o Windows Cluster

### LSKA 2010 Survey Report Job Scheduler

LSKA 2010 Survey Report Job Scheduler Graduate Institute of Communication Engineering {r98942067, r98942112}@ntu.edu.tw March 31, 2010 1. Motivation Recently, the computing becomes much more complex. However,

### Introduction to OpenACC Directives. Duncan Poole, NVIDIA Thomas Bradley, NVIDIA

Introduction to OpenACC Directives Duncan Poole, NVIDIA Thomas Bradley, NVIDIA GPUs Reaching Broader Set of Developers 1,000,000 s 100,000 s Early Adopters Research Universities Supercomputing Centers

### Mixing Multi-Core CPUs and GPUs for Scientific Simulation Software

SUBMITTED TO IEEE TRANS. ON PARALLEL AND DISTRIBUTED SYSTEMS 1 Mixing Multi-Core CPUs and GPUs for Scientific Simulation Software K.A. Hawick, Member, IEEE, A. Leist, and D.P. Playne Abstract Recent technological

### Client/Server Computing Distributed Processing, Client/Server, and Clusters

Client/Server Computing Distributed Processing, Client/Server, and Clusters Chapter 13 Client machines are generally single-user PCs or workstations that provide a highly userfriendly interface to the