Spring 2011 Prof. Hyesoon Kim


Today we will study typical patterns of parallel programming. This is just one of the ways. The materials are based on the book Patterns for Parallel Programming by Timothy Mattson et al.

[Diagram: the original problem is decomposed into tasks with shared and local data; units of execution, plus new shared data for extracted dependencies, are then coded in a parallel programming environment as per-thread programs.] Patterns for parallel programming, Timothy et al.

A design pattern is a high-quality solution to a frequently recurring problem in some domain. Learning design patterns helps the programmer quickly understand a solution and its context. Patterns for parallel programming, Timothy et al.

Parallel programs often start as sequential programs: they are easy to write and debug, and are already developed and tested. Identify the program's hot spots and start parallelization with the hot spots first. Make sequences of small changes, each followed by testing. Patterns provide guidance. Dr. Rodric Rabbah, IBM

Speedup = performance for the entire task using the enhancement / performance for the entire task without the enhancement. Amdahl's law: Speedup = 1 / (S + P/N), where S = serial fraction, P = parallel fraction = 1 - S, and N = number of processors.

[Chart: speedup (0 to 18) vs. serial fraction (1.0 down to 0.05) for N = 2, 4, 8, 16, and 100 processors.]

Step 1: Find concurrency. Step 2: Structure the algorithm so that concurrency can be exploited. Step 3: Implement the algorithm in a suitable programming environment. Step 4: Execute and tune the performance of the code on a parallel system. Patterns for parallel programming, Timothy et al.

Decomposition: task decomposition, data decomposition. Dependency analysis: group tasks, order tasks, data sharing. Design evaluation. Things to consider: flexibility, efficiency, simplicity. Patterns for parallel programming, Timothy et al.

Flexibility: the program design should afford flexibility in the number and size of tasks generated, and tasks should not be tied to a specific architecture (fixed tasks vs. parameterized tasks). Efficiency: tasks should have enough work to amortize the cost of creating and managing them, and should be sufficiently independent that managing dependencies doesn't become the bottleneck. Simplicity: the code has to remain readable and easy to understand and debug. Dr. Rodric Rabbah, IBM

Data decomposition is often implied by task decomposition, but programmers need to address both task and data decomposition to create a parallel program. Which decomposition to start with? Data decomposition is a good starting point when the main computation is organized around manipulation of a large data structure, and similar operations are applied to different parts of that data structure. Dr. Rodric Rabbah, IBM

Flexibility: the size and number of data chunks should support a wide range of executions. Efficiency: data chunks should generate comparable amounts of work (for load balancing). Simplicity: complex data decompositions can get difficult to manage and debug. Dr. Rodric Rabbah, IBM

Geometric data structures: decomposition of arrays along rows, columns, or blocks. Recursive data structures: for example, lists, trees, graphs.

Organize by tasks: task parallelism, divide and conquer. Organize by data decomposition: geometric decomposition, recursive data. Organize by flow of data: pipeline, event-based coordination. Patterns for parallel programming, Timothy et al.

[Decision tree. Start: organize by tasks, by data decomposition, or by flow of data; each branch splits into linear vs. recursive. Tasks: linear gives task parallelism, recursive gives divide and conquer. Data decomposition: linear gives geometric decomposition, recursive gives recursive data. Flow of data: linear gives pipeline, recursive gives event-based coordination.] Patterns for parallel programming, Timothy et al.

Removable dependencies: temporary variables whose values can be computed directly from the loop index.

int ii = 0, jj = 0;
for (int i = 0; i < N; i++) {
    ii = ii + 1;
    d[ii] = big_time_consuming_work(ii);
    jj = jj + i;
    a[jj] = other_big_calc(jj);
}

Transformed code (ii equals i+1 and jj equals (i*i+i)/2, so the loop-carried variables disappear):

for (int i = 0; i < N; i++) {
    d[i+1] = big_time_consuming_work(i+1);
    a[(i*i+i)/2] = other_big_calc((i*i+i)/2);
}

Separable dependencies: an accumulation such as

for (int i = 0; i < N; i++) {
    sum = sum + f(i);
}

can be separated into per-UE partial results that are combined at the end. Patterns for parallel programming, Timothy et al.

Program structures: SPMD, master/worker, loop parallelism, fork/join. Data structures: shared data, shared queue, distributed array. Patterns for parallel programming, Timothy et al.

Single Program, Multiple Data (SPMD): all UEs execute the same program in parallel, but each has its own set of data. Initialize, obtain a unique identifier, run the same program on each processor over distributed data, and finalize. CUDA is an example. Patterns for parallel programming, Timothy et al.

A master process or thread sets up a pool of worker processes or threads and a bag of tasks. The workers execute concurrently, each repeatedly removing a task from the bag. Well suited to embarrassingly parallel problems. Master: initiate the computation, set up the problem, create the bag of tasks, launch the workers (worker 1 through worker N do work), wait, collect the results, terminate the computation. Patterns for parallel programming, Timothy et al.

Many programs are expressed using iterative constructs. Programming models like OpenMP provide directives to automatically assign loop iterations to execution units. Especially good when the code cannot be massively restructured.

#pragma omp parallel for
for (i = 0; i < 16; i++)
    c[i] = a[i] + b[i];

[Diagram: iterations i = 0 through i = 15 assigned across execution units.] Dr. Rodric Rabbah, IBM

A main UE forks off some number of other UEs that then continue in parallel to accomplish some portion of the overall work. The parent task creates new tasks (fork), then waits until they all complete (join) before continuing with the computation, e.g., at the end of a parallel region. Patterns for parallel programming, Timothy et al.

Examples: instruction pipelines in modern CPUs, algorithm-level pipelining, signal processing, graphics, and shell programs: cat samplefile | grep word | wc

[Table: how well each algorithm-structure pattern (task parallelism, divide and conquer, geometric decomposition, recursive data, pipeline, event-based coordination) maps onto each supporting structure (SPMD, loop parallelism, master/worker, fork/join).] Patterns for parallel programming, Timothy et al.

[Table: how well each programming environment (OpenMP, MPI, CUDA) supports SPMD, loop parallelism, master/slave, and fork/join.] Prof. Hwu and Dr. Kirk's lecture

UE management: thread control, process control. Synchronization: memory syncs/fences, barriers, mutual exclusion. Communications: message passing, collective communications, other communications. These mechanisms are supported by the programming language and the hardware. Patterns for parallel programming, Timothy et al.

Key terms: task; unit of execution (UE): a process or thread; processing element (PE); load balance; synchronization; race conditions; deadlocks; concurrency. Patterns for parallel programming, Timothy et al.

This lecture is just one set of guidelines. Most parallel programming is about finding ways of avoiding data dependences and finding efficient data structures. Can the compiler do it? Automatic parallelization? Speculative parallelization?

Paid and non-paid positions: game developers for a cloud game (or other companies; if you know of one, let me know). Other research positions: good programming skills required, not related to games. If you know other students, let me know.