Load Balancing. computing a file with grayscales. granularity considerations static work load assignment with MPI
|
|
|
- Chastity Ferguson
- 10 years ago
- Views:
Transcription
1 Load Balancing 1 the Mandelbrot set computing a file with grayscales 2 Static Work Load Assignment granularity considerations static work load assignment with MPI 3 Dynamic Work Load Balancing scheduling jobs to run in parallel dynamic work load balancing with MPI MCS 572 Lecture 7 Introduction to Supercomputing Jan Verschelde, 29 January 2014 Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
2 Load Balancing 1 the Mandelbrot set computing a file with grayscales 2 Static Work Load Assignment granularity considerations static work load assignment with MPI 3 Dynamic Work Load Balancing scheduling jobs to run in parallel dynamic work load balancing with MPI Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
3 the Mandelbrot set The number n of iterations ranges from 0 to 255. The grayscales are plotted in reverse, as 255 n. A pixel with coordinates (x, y) is mapped to c = x + iy, i = 1. Consider the map z z 2 + c, starting at z = 0. The grayscale for (x, y) is the number of iterations it takes for z 2 under the map. Grayscales for different pixels are calculated independently pleasingly parallel. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
4 the function iterate The prototype of the function iterate is int iterate ( double x, double y ); /* * Returns the number of iterations for z^2 + c * to grow larger than 2, for c = x + i*y, * where i = sqrt(-1), starting at z = 0. */ We call iterate for all pixels (x,y), for x and y ranging over all rows and columns of a pixel matrix. In our plot we compute 5,000 rows and 5,000 columns. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
5 code for iterate int iterate ( double x, double y ) double wx,wy,v,xx; int k = 0; wx = 0.0; wy = 0.0; v = 0.0; while ((v < 4) && (k++ < 254)) xx = wx*wx - wy*wy; wy = 2.0*wx*wy; wx = xx + x; wy = wy + y; v = wx*wx + wy*wy; return k; Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
6 computational cost In the code for iterate we count 6 multiplications on doubles, 3 additions and 1 subtraction. On a Mac OS X laptop 2.26 Ghz Intel Core 2 Duo, for a 5,000-by-5,000 matrix of pixels: $ time /tmp/mandelbrot Total number of iterations : real user sys 0m15.675s 0m14.914s 0m0.163s Performed 682, 940, flops in 15 seconds or 455,293,948 flops per second. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
7 optimizing with -O3 $ make mandelbrot_opt gcc -O3 -o /tmp/mandelbrot_opt mandelbrot.c $ time /tmp/mandelbrot_opt Total number of iterations : real user sys 0m9.846s 0m9.093s 0m0.163s With full optimization, the time drops from 15 to 9 seconds. After compilation with -O3, performed 758,823,246 flops per second. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
8 input and output Input parameters: (x, y) [a, b] [c, d], e.g.: [a, b] = [ 2,+2] = [c, d]; number n of rows (and columns) in pixel matrix determines the resolution of the image and the spacing between points: δx = (b a)/(n 1), δy = (d c)/(n 1). The output is a postscript file. Why? standard format, direct to print or view, allows for batch processing in an environment without visualization capabilities. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
9 Load Balancing 1 the Mandelbrot set computing a file with grayscales 2 Static Work Load Assignment granularity considerations static work load assignment with MPI 3 Dynamic Work Load Balancing scheduling jobs to run in parallel dynamic work load balancing with MPI Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
10 static work load assignment Static means: the decision which pixels are computed by which processor is fixed in advance by some algorithm. For the granularity in the communcation, we have two extremes: 1 Matrix of grayscales is divided up into p equal parts and each processor computes part of the matrix. For example: 5,000 rows among 5 processors, each processor takes 1,000 rows. Communication happens after all calculations are done, at the end all processors send their big submatrix to root node. 2 Matrix of grayscales is distributed pixel-by-pixel. Entry (i, j) of the n-by-n matrix is computed by processor with label (i n + j) mod p. Communication is completely interlaced with all computation. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
11 choice of granularity Problem with all communication at end: total cost = computational cost + communication cost. The communication cost is not interlaced with the computation. Problem with pixel-by-pixel distribution: To compute the grayscale of one pixel requires at most 255 iterations, but may finish much sooner. Even in the most expensive case, processors may be mostly busy handling send/recv operations. Compromise: distribute work load along rows: 1 row i is computed by node 1 + (i mod (p 1)). 2 root node 0 distributes row indices and collects the computed rows. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
12 Load Balancing 1 the Mandelbrot set computing a file with grayscales 2 Static Work Load Assignment granularity considerations static work load assignment with MPI 3 Dynamic Work Load Balancing scheduling jobs to run in parallel dynamic work load balancing with MPI Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
13 manager/worker algorithm for static load assignment Given n jobs to be completed by p processors, n p. Processor 0 is in charge of 1 distributing the jobs among the p 1 compute nodes, 2 collecting the results from the p 1 compute nodes. Assuming n is a multiple of p 1, let k = n/(p 1). The manager executes the following algorithm: for i from 1 to k do for j from 1 to p 1 do send the next job to compute node j; for j from 1 to p 1 do receive result from compute node j. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
14 run of an example program in C $ mpirun -np 3 /tmp/static_loaddist reading the #jobs per compute node... 1 sending 0 to 1 sending 1 to 2 node 1 received 0 -> 1 computes b node 1 sends b node 2 received 1 -> 2 computes c node 2 sends c received b from 1 received c from 2 sending -1 to 1 sending -1 to 2 The result : bc node 2 received -1 node 1 received -1 $ Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
15 the main program int main ( int argc, char *argv[] ) int i,p; MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD,&p); MPI_Comm_rank(MPI_COMM_WORLD,&i); if(i!= 0) worker(i); else printf("reading the #jobs per compute node...\n"); int nbjobs; scanf("%d",&nbjobs); manager(p,nbjobs*(p-1)); MPI_Finalize(); return 0; Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
16 code for the compute node int worker ( int i ) int myjob; MPI_Status status; do MPI_Recv(&myjob,1,MPI_INT,0,tag, MPI_COMM_WORLD,&status); if(v == 1) printf("node %d received %d\n",i,myjob); if(myjob == -1) break; char c = a + ((char)i); if(v == 1) printf("-> %d computes %c\n",i,c); if(v == 1) printf("node %d sends %c\n",i,c); MPI_Send(&c,1,MPI_CHAR,0,tag,MPI_COMM_WORLD); while(myjob!= -1); return 0; Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
17 manager distributes jobs int manager ( int p, int n ) char result[n+1]; int job = -1; int j; do for(j=1; j<p; j++) /* distribute jobs */ if(++job >= n) break; int d = 1 + (job % (p-1)); if(v == 1) printf("sending %d to %d\n",job,d); MPI_Send(&job,1,MPI_INT,d,tag,MPI_COMM_WORLD); if(job >= n) break; Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
18 manager collects results for(j=1; j<p; j++) /* collect results */ char c; MPI_Status status; MPI_Recv(&c,1,MPI_CHAR,j,tag, MPI_COMM_WORLD,&status); if(v == 1) printf("received %c from %d\n",c,j); result[job-p+1+j] = c; while (job < n); Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
19 end of job queue job = -1; for(j=1; j < p; j++) /* termination signal is -1 */ if(v==1) printf("sending -1 to %d\n",j); MPI_Send(&job,1,MPI_INT,j,tag,MPI_COMM_WORLD); result[n] = \0 ; printf("the result : %s\n",result); return 0; Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
20 Load Balancing 1 the Mandelbrot set computing a file with grayscales 2 Static Work Load Assignment granularity considerations static work load assignment with MPI 3 Dynamic Work Load Balancing scheduling jobs to run in parallel dynamic work load balancing with MPI Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
21 an example Consider scheduling 8 jobs on 2 processors: serial static p = 2 dynamic p = 2 Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
22 manager/worker algorithm for dynamic load balancing Scheduling n jobs on p processors, n p: node 0 manages the job queue, nodes 1 to p 1 are compute nodes. The manager executes the following algorithm: for j from 1 to p 1 do send job j 1 to compute node j; while not all jobs are done do if a node is done with a job then collect result from node; if there is still a job left to do then send next job to node; else send termination signal. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
23 Load Balancing 1 the Mandelbrot set computing a file with grayscales 2 Static Work Load Assignment granularity considerations static work load assignment with MPI 3 Dynamic Work Load Balancing scheduling jobs to run in parallel dynamic work load balancing with MPI Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
24 probing for incoming messages with MPI_Iprobe To check for incoming messages, the nonblocking (or Immediate) MPI command has the syntax: MPI_Iprobe(source,tag,comm,flag,status) where the arguments are source : rank of source or MPI_ANY_SOURCE tag : message tag or MPI_ANY_TAG comm : communicator flag : address of logical variable status : status object If flag is true on return, then status contains the rank of the source of the message and can be received. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
25 code for the manager The manager starts with distributing the first p 1 jobs: int manager ( int p, int n ) char result[n+1]; int j; for(j=1; j<p; j++) /* distribute first jobs */ if(v == 1) printf("sending %d to %d\n",j-1,j); MPI_Send(&j,1,MPI_INT,j,tag,MPI_COMM_WORLD); Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
26 probing and receiving messages int done = 0; int nextjob = p-1; do /* probe for results */ int flag; MPI_Status status; MPI_Iprobe(MPI_ANY_SOURCE,MPI_ANY_TAG, MPI_COMM_WORLD,&flag,&status); if(flag == 1) /* collect result */ char c; j = status.mpi_source; if(v == 1) printf("received message from %d\n",j); MPI_Recv(&c,1,MPI_CHAR,j,tag, MPI_COMM_WORLD,&status); Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
27 sending the next job if(v == 1) printf("received %c from %d\n",c,j); result[done++] = c; if(v == 1) printf("#jobs done : %d\n",done); if(nextjob < n) /* send the next job */ if(v == 1) printf("sending %d to %d\n", nextjob,j); MPI_Send(&nextjob,1,MPI_INT,j,tag, MPI_COMM_WORLD); nextjob = nextjob + 1; Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
28 at the end of the queue else /* send -1 to signal termination */ if(v == 1) printf("sending -1 to %d\n",j); flag = -1; MPI_Send(&flag,1,MPI_INT,j,tag, MPI_COMM_WORLD); while (done < n); result[done] = \0 ; printf("the result : %s\n",result); return 0; Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
29 Summary + Exercises Pleasingly parallel computations may need dynamic load balancing. We finished chapter 3 and covered topics of chapter 7. Exercises: 1 Apply the manager/worker algorithm for static load assignment to the computation of the Mandelbrot set. What is the speedup for 2, 4, and 8 compute nodes? 2 Apply the manager/worker algorithm for dynamic load balancing to the computation of the Mandelbrot set. What is the speedup for 2, 4, and 8 compute nodes? 3 Compare the performance of static load assignment with dynamic load balancing for the Mandelbrot set. Introduction to Supercomputing (MCS 572) Load Balancing L-7 29 January / 29
Parallelization: Binary Tree Traversal
By Aaron Weeden and Patrick Royal Shodor Education Foundation, Inc. August 2012 Introduction: According to Moore s law, the number of transistors on a computer chip doubles roughly every two years. First
To connect to the cluster, simply use a SSH or SFTP client to connect to:
RIT Computer Engineering Cluster The RIT Computer Engineering cluster contains 12 computers for parallel programming using MPI. One computer, cluster-head.ce.rit.edu, serves as the master controller or
High performance computing systems. Lab 1
High performance computing systems Lab 1 Dept. of Computer Architecture Faculty of ETI Gdansk University of Technology Paweł Czarnul For this exercise, study basic MPI functions such as: 1. for MPI management:
HPCC - Hrothgar Getting Started User Guide MPI Programming
HPCC - Hrothgar Getting Started User Guide MPI Programming High Performance Computing Center Texas Tech University HPCC - Hrothgar 2 Table of Contents 1. Introduction... 3 2. Setting up the environment...
Introduction to Hybrid Programming
Introduction to Hybrid Programming Hristo Iliev Rechen- und Kommunikationszentrum aixcelerate 2012 / Aachen 10. Oktober 2012 Version: 1.1 Rechen- und Kommunikationszentrum (RZ) Motivation for hybrid programming
WinBioinfTools: Bioinformatics Tools for Windows Cluster. Done By: Hisham Adel Mohamed
WinBioinfTools: Bioinformatics Tools for Windows Cluster Done By: Hisham Adel Mohamed Objective Implement and Modify Bioinformatics Tools To run under Windows Cluster Project : Research Project between
LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015. Hermann Härtig
LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015 Hermann Härtig ISSUES starting points independent Unix processes and block synchronous execution who does it load migration mechanism
Parallel Programming with MPI on the Odyssey Cluster
Parallel Programming with MPI on the Odyssey Cluster Plamen Krastev Office: Oxford 38, Room 204 Email: [email protected] FAS Research Computing Harvard University Objectives: To introduce you
Session 2: MUST. Correctness Checking
Center for Information Services and High Performance Computing (ZIH) Session 2: MUST Correctness Checking Dr. Matthias S. Müller (RWTH Aachen University) Tobias Hilbrich (Technische Universität Dresden)
Debugging with TotalView
Tim Cramer 17.03.2015 IT Center der RWTH Aachen University Why to use a Debugger? If your program goes haywire, you may... ( wand (... buy a magic... read the source code again and again and...... enrich
Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005
Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005 Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005... 1
What is Multi Core Architecture?
What is Multi Core Architecture? When a processor has more than one core to execute all the necessary functions of a computer, it s processor is known to be a multi core architecture. In other words, a
Lecture 6: Introduction to MPI programming. Lecture 6: Introduction to MPI programming p. 1
Lecture 6: Introduction to MPI programming Lecture 6: Introduction to MPI programming p. 1 MPI (message passing interface) MPI is a library standard for programming distributed memory MPI implementation(s)
An Incomplete C++ Primer. University of Wyoming MA 5310
An Incomplete C++ Primer University of Wyoming MA 5310 Professor Craig C. Douglas http://www.mgnet.org/~douglas/classes/na-sc/notes/c++primer.pdf C++ is a legacy programming language, as is other languages
Parallel Computing. Parallel shared memory computing with OpenMP
Parallel Computing Parallel shared memory computing with OpenMP Thorsten Grahs, 14.07.2014 Table of contents Introduction Directives Scope of data Synchronization OpenMP vs. MPI OpenMP & MPI 14.07.2014
Lightning Introduction to MPI Programming
Lightning Introduction to MPI Programming May, 2015 What is MPI? Message Passing Interface A standard, not a product First published 1994, MPI-2 published 1997 De facto standard for distributed-memory
OpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware
OpenMP & MPI CISC 879 Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware 1 Lecture Overview Introduction OpenMP MPI Model Language extension: directives-based
MPI Application Development Using the Analysis Tool MARMOT
MPI Application Development Using the Analysis Tool MARMOT HLRS High Performance Computing Center Stuttgart Allmandring 30 D-70550 Stuttgart http://www.hlrs.de 24.02.2005 1 Höchstleistungsrechenzentrum
Hybrid Programming with MPI and OpenMP
Hybrid Programming with and OpenMP Ricardo Rocha and Fernando Silva Computer Science Department Faculty of Sciences University of Porto Parallel Computing 2015/2016 R. Rocha and F. Silva (DCC-FCUP) Programming
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
Introduction. Reading. Today MPI & OpenMP papers Tuesday Commutativity Analysis & HPF. CMSC 818Z - S99 (lect 5)
Introduction Reading Today MPI & OpenMP papers Tuesday Commutativity Analysis & HPF 1 Programming Assignment Notes Assume that memory is limited don t replicate the board on all nodes Need to provide load
Introduction to Cloud Computing
Introduction to Cloud Computing Distributed Systems 15-319, spring 2010 11 th Lecture, Feb 16 th Majd F. Sakr Lecture Motivation Understand Distributed Systems Concepts Understand the concepts / ideas
C++ Programming Language
C++ Programming Language Lecturer: Yuri Nefedov 7th and 8th semesters Lectures: 34 hours (7th semester); 32 hours (8th semester). Seminars: 34 hours (7th semester); 32 hours (8th semester). Course abstract
16 node Linux cluster at SCFBio
Clustering Tutorial What is Clustering? Clustering is the use of multiple computers, typically PCs or UNIX workstations, multiple storage devices, and redundant interconnections, to form what appears to
Distributed communication-aware load balancing with TreeMatch in Charm++
Distributed communication-aware load balancing with TreeMatch in Charm++ The 9th Scheduling for Large Scale Systems Workshop, Lyon, France Emmanuel Jeannot Guillaume Mercier Francois Tessier In collaboration
University of Amsterdam - SURFsara. High Performance Computing and Big Data Course
University of Amsterdam - SURFsara High Performance Computing and Big Data Course Workshop 7: OpenMP and MPI Assignments Clemens Grelck [email protected] Roy Bakker [email protected] Adam Belloum [email protected]
Parallel and Distributed Computing Programming Assignment 1
Parallel and Distributed Computing Programming Assignment 1 Due Monday, February 7 For programming assignment 1, you should write two C programs. One should provide an estimate of the performance of ping-pong
COMP/CS 605: Introduction to Parallel Computing Lecture 21: Shared Memory Programming with OpenMP
COMP/CS 605: Introduction to Parallel Computing Lecture 21: Shared Memory Programming with OpenMP Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State
Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage
Outline 1 Computer Architecture hardware components programming environments 2 Getting Started with Python installing Python executing Python code 3 Number Systems decimal and binary notations running
Matlab on a Supercomputer
Matlab on a Supercomputer Shelley L. Knuth Research Computing April 9, 2015 Outline Description of Matlab and supercomputing Interactive Matlab jobs Non-interactive Matlab jobs Parallel Computing Slides
Sources: On the Web: Slides will be available on:
C programming Introduction The basics of algorithms Structure of a C code, compilation step Constant, variable type, variable scope Expression and operators: assignment, arithmetic operators, comparison,
Python and Cython. a dynamic language. installing Cython factorization again. working with numpy
1 2 Python and 3 MCS 275 Lecture 39 Programming Tools and File Management Jan Verschelde, 19 April 2010 Python and 1 2 3 The Development Cycle compilation versus interpretation Traditional build cycle:
G563 Quantitative Paleontology. SQL databases. An introduction. Department of Geological Sciences Indiana University. (c) 2012, P.
SQL databases An introduction AMP: Apache, mysql, PHP This installations installs the Apache webserver, the PHP scripting language, and the mysql database on your computer: Apache: runs in the background
Phys4051: C Lecture 2 & 3. Comment Statements. C Data Types. Functions (Review) Comment Statements Variables & Operators Branching Instructions
Phys4051: C Lecture 2 & 3 Functions (Review) Comment Statements Variables & Operators Branching Instructions Comment Statements! Method 1: /* */! Method 2: // /* Single Line */ //Single Line /* This comment
GPI Global Address Space Programming Interface
GPI Global Address Space Programming Interface SEPARS Meeting Stuttgart, December 2nd 2010 Dr. Mirko Rahn Fraunhofer ITWM Competence Center for HPC and Visualization 1 GPI Global address space programming
A Case Study - Scaling Legacy Code on Next Generation Platforms
Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 00 (2015) 000 000 www.elsevier.com/locate/procedia 24th International Meshing Roundtable (IMR24) A Case Study - Scaling Legacy
CHAPTER FIVE RESULT ANALYSIS
CHAPTER FIVE RESULT ANALYSIS 5.1 Chapter Introduction 5.2 Discussion of Results 5.3 Performance Comparisons 5.4 Chapter Summary 61 5.1 Chapter Introduction This chapter outlines the results obtained from
Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises
Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises Pierre-Yves Taunay Research Computing and Cyberinfrastructure 224A Computer Building The Pennsylvania State University University
MPI Runtime Error Detection with MUST For the 13th VI-HPS Tuning Workshop
MPI Runtime Error Detection with MUST For the 13th VI-HPS Tuning Workshop Joachim Protze and Felix Münchhalfen IT Center RWTH Aachen University February 2014 Content MPI Usage Errors Error Classes Avoiding
Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1
Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion
Running applications on the Cray XC30 4/12/2015
Running applications on the Cray XC30 4/12/2015 1 Running on compute nodes By default, users do not log in and run applications on the compute nodes directly. Instead they launch jobs on compute nodes
CS1010 Programming Methodology A beginning in problem solving in Computer Science. Aaron Tan http://www.comp.nus.edu.sg/~cs1010/ 20 July 2015
CS1010 Programming Methodology A beginning in problem solving in Computer Science Aaron Tan http://www.comp.nus.edu.sg/~cs1010/ 20 July 2015 Announcements This document is available on the CS1010 website
Informatica e Sistemi in Tempo Reale
Informatica e Sistemi in Tempo Reale Introduction to C programming Giuseppe Lipari http://retis.sssup.it/~lipari Scuola Superiore Sant Anna Pisa October 25, 2010 G. Lipari (Scuola Superiore Sant Anna)
Cluster@WU User s Manual
Cluster@WU User s Manual Stefan Theußl Martin Pacala September 29, 2014 1 Introduction and scope At the WU Wirtschaftsuniversität Wien the Research Institute for Computational Methods (Forschungsinstitut
Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C
Embedded Systems A Review of ANSI C and Considerations for Embedded C Programming Dr. Jeff Jackson Lecture 2-1 Review of ANSI C Topics Basic features of C C fundamentals Basic data types Expressions Selection
Why Choose C/C++ as the programming language? Parallel Programming in C/C++ - OpenMP versus MPI
Parallel Programming (Multi/cross-platform) Why Choose C/C++ as the programming language? Compiling C/C++ on Windows (for free) Compiling C/C++ on other platforms for free is not an issue Parallel Programming
Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC
Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC Goals of the session Overview of parallel MATLAB Why parallel MATLAB? Multiprocessing in MATLAB Parallel MATLAB using the Parallel Computing
Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles [email protected] hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach
Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden [email protected],
SQLITE C/C++ TUTORIAL
http://www.tutorialspoint.com/sqlite/sqlite_c_cpp.htm SQLITE C/C++ TUTORIAL Copyright tutorialspoint.com Installation Before we start using SQLite in our C/C++ programs, we need to make sure that we have
Library (versus Language) Based Parallelism in Factoring: Experiments in MPI. Dr. Michael Alexander Dr. Sonja Sewera.
Library (versus Language) Based Parallelism in Factoring: Experiments in MPI Dr. Michael Alexander Dr. Sonja Sewera Talk 2007-10-19 Slide 1 of 20 Primes Definitions Prime: A whole number n is a prime number
Miami University RedHawk Cluster Working with batch jobs on the Cluster
Miami University RedHawk Cluster Working with batch jobs on the Cluster The RedHawk cluster is a general purpose research computing resource available to support the research community at Miami University.
Libmonitor: A Tool for First-Party Monitoring
Libmonitor: A Tool for First-Party Monitoring Mark W. Krentel Dept. of Computer Science Rice University 6100 Main St., Houston, TX 77005 [email protected] ABSTRACT Libmonitor is a library that provides
Introduction to Sun Grid Engine (SGE)
Introduction to Sun Grid Engine (SGE) What is SGE? Sun Grid Engine (SGE) is an open source community effort to facilitate the adoption of distributed computing solutions. Sponsored by Sun Microsystems
High-speed image processing algorithms using MMX hardware
High-speed image processing algorithms using MMX hardware J. W. V. Miller and J. Wood The University of Michigan-Dearborn ABSTRACT Low-cost PC-based machine vision systems have become more common due to
PBS Tutorial. Fangrui Ma Universit of Nebraska-Lincoln. October 26th, 2007
PBS Tutorial Fangrui Ma Universit of Nebraska-Lincoln October 26th, 2007 Abstract In this tutorial we gave a brief introduction to using PBS Pro. We gave examples on how to write control script, and submit
High Performance Computing
High Performance Computing Oliver Rheinbach [email protected] http://www.mathe.tu-freiberg.de/nmo/ Vorlesung Introduction to High Performance Computing Hörergruppen Woche Tag Zeit Raum
Analysis report examination with CUBE
Analysis report examination with CUBE Brian Wylie Jülich Supercomputing Centre CUBE Parallel program analysis report exploration tools Libraries for XML report reading & writing Algebra utilities for report
McMPI. Managed-code MPI library in Pure C# Dr D Holmes, EPCC [email protected]
McMPI Managed-code MPI library in Pure C# Dr D Holmes, EPCC [email protected] Outline Yet another MPI library? Managed-code, C#, Windows McMPI, design and implementation details Object-orientation,
Advanced MPI. Hybrid programming, profiling and debugging of MPI applications. Hristo Iliev RZ. Rechen- und Kommunikationszentrum (RZ)
Advanced MPI Hybrid programming, profiling and debugging of MPI applications Hristo Iliev RZ Rechen- und Kommunikationszentrum (RZ) Agenda Halos (ghost cells) Hybrid programming Profiling of MPI applications
Parallel Computing. Shared memory parallel programming with OpenMP
Parallel Computing Shared memory parallel programming with OpenMP Thorsten Grahs, 27.04.2015 Table of contents Introduction Directives Scope of data Synchronization 27.04.2015 Thorsten Grahs Parallel Computing
RA MPI Compilers Debuggers Profiling. March 25, 2009
RA MPI Compilers Debuggers Profiling March 25, 2009 Examples and Slides To download examples on RA 1. mkdir class 2. cd class 3. wget http://geco.mines.edu/workshop/class2/examples/examples.tgz 4. tar
Using MATLAB to Measure the Diameter of an Object within an Image
Using MATLAB to Measure the Diameter of an Object within an Image Keywords: MATLAB, Diameter, Image, Measure, Image Processing Toolbox Author: Matthew Wesolowski Date: November 14 th 2014 Executive Summary
UTS: An Unbalanced Tree Search Benchmark
UTS: An Unbalanced Tree Search Benchmark LCPC 2006 1 Coauthors Stephen Olivier, UNC Jun Huan, UNC/Kansas Jinze Liu, UNC Jan Prins, UNC James Dinan, OSU P. Sadayappan, OSU Chau-Wen Tseng, UMD Also, thanks
Streamline Computing Linux Cluster User Training. ( Nottingham University)
1 Streamline Computing Linux Cluster User Training ( Nottingham University) 3 User Training Agenda System Overview System Access Description of Cluster Environment Code Development Job Schedulers Running
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview
Workshare Process of Thread Programming and MPI Model on Multicore Architecture
Vol., No. 7, 011 Workshare Process of Thread Programming and MPI Model on Multicore Architecture R. Refianti 1, A.B. Mutiara, D.T Hasta 3 Faculty of Computer Science and Information Technology, Gunadarma
Spring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem
HPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
MPI Hands-On List of the exercises
MPI Hands-On List of the exercises 1 MPI Hands-On Exercise 1: MPI Environment.... 2 2 MPI Hands-On Exercise 2: Ping-pong...3 3 MPI Hands-On Exercise 3: Collective communications and reductions... 5 4 MPI
Load Balancing. Load Balancing 1 / 24
Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait
Performance Analysis and Optimization Tool
Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL [email protected] Performance Analysis Team, University of Versailles http://www.maqao.org Introduction Performance Analysis Develop
Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014
Debugging in Heterogeneous Environments with TotalView ECMWF HPC Workshop 30 th October 2014 Agenda Introduction Challenges TotalView overview Advanced features Current work and future plans 2014 Rogue
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus A simple C/C++ language extension construct for data parallel operations Robert Geva [email protected] Introduction Intel
SLURM Workload Manager
SLURM Workload Manager What is SLURM? SLURM (Simple Linux Utility for Resource Management) is the native scheduler software that runs on ASTI's HPC cluster. Free and open-source job scheduler for the Linux
Lecture 3. Optimising OpenCL performance
Lecture 3 Optimising OpenCL performance Based on material by Benedict Gaster and Lee Howes (AMD), Tim Mattson (Intel) and several others. - Page 1 Agenda Heterogeneous computing and the origins of OpenCL
Program Coupling and Parallel Data Transfer Using PAWS
Program Coupling and Parallel Data Transfer Using PAWS Sue Mniszewski, CCS-3 Pat Fasel, CCS-3 Craig Rasmussen, CCS-1 http://www.acl.lanl.gov/paws 2 Primary Goal of PAWS Provide the ability to share data
Hodor and Bran - Job Scheduling and PBS Scripts
Hodor and Bran - Job Scheduling and PBS Scripts UND Computational Research Center Now that you have your program compiled and your input file ready for processing, it s time to run your job on the cluster.
Agenda. Using HPC Wales 2
Using HPC Wales Agenda Infrastructure : An Overview of our Infrastructure Logging in : Command Line Interface and File Transfer Linux Basics : Commands and Text Editors Using Modules : Managing Software
Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory
Graph Analytics in Big Data John Feo Pacific Northwest National Laboratory 1 A changing World The breadth of problems requiring graph analytics is growing rapidly Large Network Systems Social Networks
OPERATION MANUAL. MV-410RGB Layout Editor. Version 2.1- higher
OPERATION MANUAL MV-410RGB Layout Editor Version 2.1- higher Table of Contents 1. Setup... 1 1-1. Overview... 1 1-2. System Requirements... 1 1-3. Operation Flow... 1 1-4. Installing MV-410RGB Layout
MOSIX: High performance Linux farm
MOSIX: High performance Linux farm Paolo Mastroserio [[email protected]] Francesco Maria Taurino [[email protected]] Gennaro Tortone [[email protected]] Napoli Index overview on Linux farm farm
MUSIC Multi-Simulation Coordinator. Users Manual. Örjan Ekeberg and Mikael Djurfeldt
MUSIC Multi-Simulation Coordinator Users Manual Örjan Ekeberg and Mikael Djurfeldt March 3, 2009 Abstract MUSIC is an API allowing large scale neuron simulators using MPI internally to exchange data during
Asynchronous Dynamic Load Balancing (ADLB)
Asynchronous Dynamic Load Balancing (ADLB) A high-level, non-general-purpose, but easy-to-use programming model and portable library for task parallelism Rusty Lusk Mathema.cs and Computer Science Division
Lesson 10: Video-Out Interface
Lesson 10: Video-Out Interface 1. Introduction The Altera University Program provides a number of hardware controllers, called cores, to control the Video Graphics Array (VGA) Digital-to-Analog Converter
CSC230 Getting Starting in C. Tyler Bletsch
CSC230 Getting Starting in C Tyler Bletsch What is C? The language of UNIX Procedural language (no classes) Low-level access to memory Easy to map to machine language Not much run-time stuff needed Surprisingly
A3 Computer Architecture
A3 Computer Architecture Engineering Science 3rd year A3 Lectures Prof David Murray [email protected] www.robots.ox.ac.uk/ dwm/courses/3co Michaelmas 2000 1 / 1 6. Stacks, Subroutines, and Memory
Analysis and Implementation of Cluster Computing Using Linux Operating System
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661 Volume 2, Issue 3 (July-Aug. 2012), PP 06-11 Analysis and Implementation of Cluster Computing Using Linux Operating System Zinnia Sultana
Retour vers le futur des bibliothèques de squelettes algorithmiques et DSL
Retour vers le futur des bibliothèques de squelettes algorithmiques et DSL Sylvain Jubertie [email protected] Journée LaMHA - 26/11/2015 Squelettes algorithmiques 2 / 29 Squelettes algorithmiques
Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003
Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, [email protected] Abstract 1 Interconnect quality
Using Power to Improve C Programming Education
Using Power to Improve C Programming Education Jonas Skeppstedt Department of Computer Science Lund University Lund, Sweden [email protected] jonasskeppstedt.net jonasskeppstedt.net [email protected]
DYNAMIC DOMAIN CLASSIFICATION FOR FRACTAL IMAGE COMPRESSION
DYNAMIC DOMAIN CLASSIFICATION FOR FRACTAL IMAGE COMPRESSION K. Revathy 1 & M. Jayamohan 2 Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India 1 [email protected]
AMATH 352 Lecture 3 MATLAB Tutorial Starting MATLAB Entering Variables
AMATH 352 Lecture 3 MATLAB Tutorial MATLAB (short for MATrix LABoratory) is a very useful piece of software for numerical analysis. It provides an environment for computation and the visualization. Learning
In-situ Visualization
In-situ Visualization Dr. Jean M. Favre Scientific Computing Research Group 13-01-2011 Outline Motivations How is parallel visualization done today Visualization pipelines Execution paradigms Many grids
The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems
202 IEEE 202 26th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symposium Symposium Workshops Workshops & PhD Forum The Green Index: A Metric
Resource Allocation Schemes for Gang Scheduling
Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian
