Tamás Budavári / The Johns Hopkins University

Similar documents
Introduction to GPU Programming Languages

Part I Courses Syllabus

An Introduction to Parallel Computing/ Programming

Chapter 1 Computer System Overview

Using WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014

Introduction to Cloud Computing

HPC Wales Skills Academy Course Catalogue 2015

GPUs for Scientific Computing

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

10- High Performance Compu5ng

Running applications on the Cray XC30 4/12/2015

Miami University RedHawk Cluster Working with batch jobs on the Cluster

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

What is a programming language?

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Overview of HPC Resources at Vanderbilt

Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises

Data Centric Systems (DCS)

Parallel Computing with MATLAB

High Performance Computing

Getting Started with HPC

Grid Engine Basics. Table of Contents. Grid Engine Basics Version 1. (Formerly: Sun Grid Engine)

Chapter 2 Parallel Architecture, Software And Performance

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

Introduction to GPU hardware and to CUDA

Work Environment. David Tur HPC Expert. HPC Users Training September, 18th 2015

HPC at IU Overview. Abhinav Thota Research Technologies Indiana University

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.

LSN 2 Computer Processors

A Comparison of Distributed Systems: ChorusOS and Amoeba

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Martinos Center Compute Clusters

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

Linux für bwgrid. Sabine Richling, Heinz Kredel. Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim. 27.

Next Generation GPU Architecture Code-named Fermi

Performance Analysis and Optimization Tool

Tutorial: Using WestGrid. Drew Leske Compute Canada/WestGrid Site Lead University of Victoria

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

NEC HPC-Linux-Cluster

Parallel Algorithm Engineering

An Introduction to High Performance Computing in the Department

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

L20: GPU Architecture and Models

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Guillimin HPC Users Meeting. Bryan Caron

Parallel Programming Survey

Scalability and Classifications

Middleware and Distributed Systems. Introduction. Dr. Martin v. Löwis

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

Bringing Big Data Modelling into the Hands of Domain Experts

Characteristics of Java (Optional) Y. Daniel Liang Supplement for Introduction to Java Programming

Turbomachinery CFD on many-core platforms experiences and strategies

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark

MPI and Hybrid Programming Models. William Gropp

The Methodology of Application Development for Hybrid Architectures

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

CUDA programming on NVIDIA GPUs

ST810 Advanced Computing

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

Matlab on a Supercomputer

Le langage OCaml et la programmation des GPU

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education

Pedraforca: ARM + GPU prototype

STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

Parallelization: Binary Tree Traversal

Chapter 2 Heterogeneous Multicore Architecture

Programming Languages & Tools

Numerical Analysis. Professor Donna Calhoun. Fall 2013 Math 465/565. Office : MG241A Office Hours : Wednesday 10:00-12:00 and 1:00-3:00

Manjrasoft Market Oriented Cloud Computing Platform

COSCO 2015 Heterogeneous Computing Programming

Optimizing Shared Resource Contention in HPC Clusters

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Tutorial-4a: Parallel (multi-cpu) Computing

Trends in High-Performance Computing for Power Grid Applications

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Data Analytics at NERSC. Joaquin Correa NERSC Data and Analytics Services

Introduction to DISC and Hadoop

CHAPTER 1 INTRODUCTION

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Introduction to Sun Grid Engine (SGE)

Parallel Programming

Transcription:

PRACTICAL SCIENTIFIC ANALYSIS OF BIG DATA RUNNING IN PARALLEL / The Johns Hopkins University

2 Parallelism Data parallel Same processing on different pieces of data Task parallel Simultaneous processing on the same data

On all levels of the hierarchy 3 Clouds Clusters Machines Cores Threads

4 Scalability Scale up Scale out Vertically Add resources to a node Bigger memory, Faster processor, Horizontally Use more of the Threads, cores, machines, clusters, clouds,

5 Cluster

6 High-Performance Computing Traditional HPC clusters Launching jobs on a cluster of machines Use MPI to communicate among nodes Message Passing Interface (not this class)

7 Queuing Systems Used for batch jobs on computer clusters Fair scheduling of user jobs Group policies Several systems Portable Batch System (PBS) Condor, etc

8 Portable Batch System Basic PBS commands qsub qdel qstat, showq

9 Job Requirements Which queue? How much memory? How many CPUs? Think MPI For how long? Send email, where and when?

10 Example Job Submit this Using qsub

11 Computer

Classification of Parallel Computers 12 Flynn s Taxonomy

13 SISD Single Instruction Single Data Classical Von Neumann machines Single threaded codes arstechnica.com

14 SIMD Single Instruction Multiple Data On x86 MMX: Math Matrix extension SSE: Streaming SIMD Extension and more GPU programming!! arstechnica.com

Amdahl s Laws 15 Bell, Gray & Szalay (2005) Petascale Computational Systems: Balanced CyberInfrastructure in a Data-Centric World

Amdahl s Law of Parallelism 16 Speed ratio with T(1) S P T( N) P S N P p S P T(1) T( N) (1 1 p) p N Before looking into parallelism, speed up the serial code, to figure out the max speedup, i.e., N

17 Chip

Moore s Law 18

New Limitation is Energy! 19 Power to compute the same thing? CPU is 10 less efficient than a digital signal processor DSP is 10 less efficient than a custom chip New design: multicores with slower clocks But the interconnect is expensive Need simpler components Swinburne University of Technology 9/1/2011

Emerging Architectures 20 Andrew Chien: 10 10 to replace the 90/10 rule Custom modules on chip, cf. SoC in cellphones Statistics on a video codec module? Swinburne University of Technology 9/1/2011

Emerging Architectures 21 Andrew Chien: 10 10 to replace the 90/10 rule Custom modules on chip, cf. SoC in cellphones Scientific analysis on such specialized units? Swinburne University of Technology 9/1/2011

GPUs Evolved to be General Purpose 22 Virtual world: simulation of real physics C for CUDA and OpenCL 512 cores 25k threads, running 1 billion/sec Old algorithms built on wrong assumption Today processing is free but memory is slow Swinburne University of Technology New programming paradigm! 9/1/2011

New Moore s Law 23 In the number of cores Faster than ever

24 Data Parallel Techniques Embarrassingly Parallel Decoupled problems, independent processing MapReduce Map Reduce

25 Programming

26 Programming Languages No one language to rule them all And many to choose from

27 Assembly Low-level (almost) machine code Different for each computer

28 The C Language Higher level but still close to hardware Pointers! Many things written in C Operating systems Other languages,

29 Java Pros Memory management with garbage collection Just-In-Time compilation from bytecode Cons Not so great performance Hard to include legacy codes New language features were an afterthought

30 Python Scripting to glue things together Easy to wrap legacy codes Lots of scientific modules and plotting Good for prototyping

31 Etc Perl Matlab Mathematica IDL R Lisp Haskell Ocaml Erlang Your favorite here

32 Programming in C

33 Programming in C Skeleton of an application

34 Programming in C Files Headers *.h Source *.c Building an application Compile source Link object files

Using Pointers 35

36 Arrays Dynamic arrays Memory allocation Freeing memory Pointer arithmetics

37 Matrix, etc Point to pointers Data allocated in v Pointers in A For 2D indexing One can have Matrix, tensor, Jagged arrays,