22S:295 Seminar in Applied Statistics
High Performance Computing in Statistics




Luke Tierney, Department of Statistics & Actuarial Science, University of Iowa. August 30, 2007.

Administrative Details

Please sign up for the course so that we can get you computer accounts. You will need an account on the cluster as well as an account on the math sciences network; the accounts should be available by the second class meeting. The class web page is www.stat.uiowa.edu/~luke/classes/295-hpc. That page has pointers to computing resources, and class notes will also be posted there. If you are not yet familiar with Linux or R, you should become familiar with them soon.

Outline

A rough outline of what we might cover:
- Some background on HPC
- Overview of the Statistics cluster
- The snow package
- Some tools for monitoring parallel applications
- Other R parallel computing frameworks
- Overview of PVM and MPI
- Overview of OpenMP
- Parallel linear algebra libraries
- Batch scheduling on the cluster
- Overview of Grid computing
- ...

Why High Performance/Parallel Computing?

Many computations are almost instantaneous on desktop computers, but some are beyond a single desktop computer:
- they take too long
- they need too much memory
- they need too much disk storage
Using multiple computers in an organized way is one solution; using multiple processors on a single computer can also help. Doing this can be hard and/or expensive, and good tools can help. Before getting in too deep, be sure to ask: is the computation really needed?

An Important Number: 2^32 = 4,294,967,296

For many years computers used a 32-bit architecture:
- Standard computer integer types are usually restricted to 32 bits, typically to the range [-2^31, 2^31 - 1] or [0, 2^32 - 1].
- The maximal amount of memory a process can address (the address space) is 2^32 bytes, or 4 GB.
- Using files more than 4 GB in size can be tricky.
- Larger amounts of memory can essentially only be used by working with multiple computers.
More recently, 64-bit architectures have become available:
- C int and FORTRAN integer are usually still 32-bit for backward compatibility.
- C long is usually 64-bit (except on Win64) and is supported in hardware.
- The maximal address space is 2^64 bytes, roughly 10^19 bytes = 10^7 TB = 10^4 PB = 10 EB.
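For a concrete check in R, which this course uses heavily, note that R's integer type is 32-bit even on 64-bit platforms. This is a minimal sketch to run in an interactive R session; the values in the comments assume a standard R build.

    .Machine$integer.max    # 2147483647, i.e. 2^31 - 1
    2^31 - 1                # the same value, computed in double precision
    as.integer(2^31 - 1)    # still representable as a 32-bit integer
    as.integer(2^31)        # NA, with a coercion warning: outside the 32-bit range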

Some Supercomputer and Compute Cluster History

- Early supercomputers were very fast single processors (1960s).
- Single (1970s) and multiple (4-16, 1980s) vector processors.
- Multiple standard processors with shared or distributed memory (1990s).
- Beowulf clusters (mid 1990s): multiple (more or less) commodity computers, a reasonably fast dedicated communications network, and distributed memory (unavoidable for 32-bit).
- Distributed shared memory systems: can use 64-bit Beowulf-style hardware; implemented in software, hardware, or a combination; can provide the illusion of a single multi-processor system; sometimes called NUMA (Non-Uniform Memory Architecture); still mostly proprietary and expensive.

Some Recent Developments

Moore's law: the number of transistors on a chip doubles every 18 months. Until recently this has also meant speeds increasing at about the same rate, but lately speeds have remained flat; the limiting factors are heat and power consumption. The additional transistors have instead been used for integrating graphics and networking chipsets and for multiple cores, i.e. 2 or 4 logical processors on a single chip. Realistic 3D graphics have driven multiple processors on a chip: some NVIDIA cards have 128 (specialized) cores for $500, and the Sony/Toshiba/IBM Cell processor in the PS3 has one standard PowerPC Element (PPE) and 8 Synergistic Processing Elements (SPEs). Special libraries and methods are needed to program these; experimental interfaces from high-level languages (Python and R for NVIDIA) are available for some. Parallel programming is likely to become essential even for desktop computers.

Parallel Programming Tools

Writing correct, efficient sequential programs is hard; writing correct, efficient parallel programs is harder. Good tools can help:
- Low-level tools: sockets for distributed memory programming; threads for shared memory computing.
- Intermediate-level tools: the PVM and MPI message-passing libraries for distributed memory; OpenMP for shared memory.
- Higher-level tools: simple systems like snow for R (distributed memory; a minimal example appears below); parallelized libraries (distributed or shared memory); parallelizing compilers (mostly shared memory).
Some problems are easy to parallelize. Others at least seem inherently sequential: pseudo-random number generation and Markov chain Monte Carlo.
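To give a flavor of the higher-level tools, here is a minimal snow sketch: it starts two worker R processes on the local machine using a socket cluster, squares the numbers 1 to 10 across the workers, and shuts the workers down. It assumes the snow package is installed; the worker count and the toy function are arbitrary choices for illustration.

    library(snow)
    cl <- makeCluster(2, type = "SOCK")    # launch two local worker processes
    parSapply(cl, 1:10, function(x) x^2)   # evaluate x^2 over 1:10 across the workers
    stopCluster(cl)                        # shut the workers down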

Parallelizable Computations

A simple model says a computation runs N times faster when split over N processors. More realistically, a problem has a fraction S of its computation that is inherently sequential and a fraction 1 - S that can be parallelized. Amdahl's law then gives

    maximum speedup = 1 / (S + (1 - S) / N).

Problems with S close to 0 are called embarrassingly parallel. Some statistical problems are (or seem to be) embarrassingly parallel: computing column means, bootstrapping. Others seem inherently sequential: pseudo-random number generation, Markov chain Monte Carlo.
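As a quick numerical illustration of Amdahl's law in R (the function name and the example values of S and N are chosen just for this sketch):

    # maximum speedup for sequential fraction S on N processors
    amdahl <- function(S, N) 1 / (S + (1 - S) / N)
    amdahl(S = 0.1, N = 16)     # about 6.4: 10% sequential work already caps the gain
    amdahl(S = 0.1, N = Inf)    # 10: even infinitely many processors give at most 1/S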