Tutorial on Parallel Programming with Linda
- Toby Johnston
- 10 years ago
1 Tutorial on Parallel Programming with Linda. Copyright, all rights reserved. Scientific Computing Associates, Inc., 265 Church St, New Haven, CT
2 Linda Tutorial Outline
- Linda Basics (pg 3)
- Linda Programming Environment (pg 21)
- Exercise #1: Hello World (pg 26)
- Exercise #2: Ping/Pong (pg 33)
- Parallel Programming with Linda (pg 34)
- Exercise #3: Integral PI (pg 52)
- Linda Implementation (pg 54)
- Linda versus Competition (pg 68)
- Exercise #4: Monte Carlo PI (pg 76)
- Exercise #5: Matrix Multiplication (pg 78)
Slide 2
3 Linda Basics Slide 3
4 Linda Technology Overview
- Six simple commands enable existing programs to run in parallel
- Complements any standard programming language, building on users' investments in software and training
- Yields portable programs which run in parallel on different platforms, even across networks of machines from different vendors
Slide 4
5 What's in Tuple Space? A tuple is a sequence of typed fields:
("Linda is powerful", 2, 32.5, 62)
(1, 2, "Linda is efficient", a:20)
("Linda is easy to learn", i, f(i))
Slide 5
6 Tuple Space provides:
- Process creation
- Synchronization
- Data communication
These capabilities are provided in a way that is logically independent of language or machine. Slide 6
7 Operations on the tuple space. Generation: out, eval. Extraction: in, inp, rd, rdp. Slide 7
8 Linda Operations: Generation
out: converts its arguments into a tuple; all fields are evaluated by the outing process
eval: spawns a live tuple that evolves into a normal tuple; each field is evaluated separately, and when all fields are evaluated, a tuple is generated
Slide 8
9 Linda Operations: Extraction
in: defines a template for matching against Tuple Space; either finds and removes a matching tuple or blocks
rd: same as in, but doesn't remove the tuple
inp, rdp: same as in and rd, but return false instead of blocking
Slide 9
10 Out/Eval. Out evaluates its arguments and creates a tuple: out("cube", 4, 64); Eval does the same in a new process: eval("cube", 4, f(i)); Slide 10
11 In/Rd. These operations would match the tuples created by the out and eval: in("cube", 4, ?j); rd("cube", 4, ?j); As a side effect, j would be set to 64. Slide 11
12 Using the Virtual Shared Memory
A:  out('south', imin, jmin, u(2:nxloc+1,2))
A2: ...
B:  do i = 1, 3
      eval('table entry', i, f(i))
    end do
Tuples in Tuple Space:
('table entry', 1, 0.65)
('table entry', 2, 1.005)
('table entry', 3, 1.585)
('south', 1, 17, [ ])
('south', 17, 33, [ ])
C:  imin = 17
    jmax = 32
    in('south', imin, jmax+1, ?u(2:nxloc+1,nyloc+2))
D:  i = 3
    rd('table entry', i, ?fi)
13 Tuple/Template Matching Rules
- Same number of fields in tuple and template
- Corresponding field types match
- Fields containing data must match
Slide 13
14 C Tuple Data Types. In C, tuple fields may be of any of the following types:
- int, long, short, and char, optionally preceded by unsigned
- float and double
- struct, union
- arrays of the above types, of arbitrary dimension
- pointers (must always be dereferenced in tuples)
Slide 14
15 Fortran Tuple Types. In Fortran, tuple fields may be of these types:
- Integer (*1 through *8), Real, Double Precision, Logical (*1 through *8), Character, Complex, Complex*16
- Synonyms for these standard types (for example, Real*8)
- Arrays of these types of arbitrary dimension, including multidimensional arrays, and/or portions thereof
- Named common blocks
Slide 15
16 Array Fields. The format for an array field is name:len
char a[20];
out("a", a:);     /* all 20 elements */
out("a", a:10);   /* first 10 elements */
in("a", ?a:len);  /* stores the number received in len */
Slide 16
17 Matching Semantics
- Templates matching no tuples will block (except inp/rdp)
- Templates matching multiple tuples will match non-deterministically
- Neither tuples nor templates match oldest-first
These semantics lead to clear algorithms without timing dependencies! Slide 17
18 Linda Distributed Data Structures. Linda can be used to build distributed data structures in Tuple Space. These are easier to think about than data passing, and the atomicity of tuple operations provides data structure locking. Slide 18
19 Linda Distributed Data Structures (examples)
Counter:
in("counter", name, ?cnt);
out("counter", name, cnt+1);
Table:
for (i=0; i<n; i++)
    out("table", name, elem[i]);
Slide 19
20 Linda Distributed Data Structures (examples)
Queue:
init() {
    out("head", 0);
    out("tail", 0);
}
put(elem) {
    in("tail", ?tail);
    out("elem", tail, elem);
    out("tail", tail+1);
}
take(elem) {
    in("head", ?head);
    in("elem", head, ?elem);
    out("head", head+1);
}
Slide 20
21 The Linda Programming Environment Slide 21
22 Software Development with Linda Step 1: Develop and debug sequential modules Step 2: Use Linda Code Development System and TupleScope to develop parallel version Step 3: Use parallel Linda system to test and tune parallel version Slide 22
23 Linda Code Development System Implements full Linda system on a single workstation Provides comfortable development environment Runs using multitasking, permitting realistic testing Compatible with TupleScope visual debugger Slide 23
24 Parallel Hello World
#define NUM_PROCS 4

real_main() {
    int i, hello_world();
    out("count", 0);
    for (i = 1; i <= NUM_PROCS; i++)
        eval("worker", hello_world(i));
    in("count", NUM_PROCS);
    printf("all processes done.\n");
}

hello_world(i)
int i;
{
    int j;
    in("count", ?j);
    out("count", j+1);
    printf("hello world from process %d, count %d\n", i, j);
}
Slide 24
25 Using the Linda Code Development System
% setenv LINDA_CLC cds
% clc -o hello hello.cl
CLC (V3.1 CDS version) hello.cl:10: warning --- no matching Linda op.
% hello
Linda initializing (2000 blocks). Linda initialization complete.
Hello world from process 3 count 0
Hello world from process 2 count 1
Hello world from process 4 count 2
Hello world from process 1 count 3
All processes done.
all tasks are complete (5 tasks).
Slide 25
26 Hands on Session #1 - Hello World Compile hello.cl using the Code Development System. Run the program. Slide 26
27 Linda Termination. Linda programs terminate in three ways:
- Normal termination: all processes (real_main and any evals) have terminated, by returning or calling lexit()
- Abnormal termination: any process ends abnormally
- lhalt() termination: any process may call lhalt()
The system will clean up all processes upon termination. Do not call exit from within Linda programs! Slide 27
28 TupleScope Visual Debugger X windows visualization and debugging tool for parallel programs Displays tuple classes, process interaction, program code, and tuple data Contains usual debugger features like single-step of Linda operation Integrated with source debuggers such as dbx, gdb. Slide 28
29 Linda TupleScope Slide 29
30 Debugging with the Linda Code Development System Compile program for TupleScope and dbx: clc -g -o hello -linda tuple_scope hello.cl Run program, single stepping until desired process is evaled Middle click on process icon Set breakpoint in native debugger window Turn off TupleScope single stepping Control process via native debugger Slide 30
31 TCP Linda TCP Linda programs are started via ntsnet utility Ntsnet will: Read tsnet.config configuration file Determine network status Schedule nodes for execution Translate directory paths Copy executables Start Linda process on selected remote machines Monitor execution Etc, etc. Slide 31
32 Running a program with ntsnet Create file ~/.tsnet.config containing the machine names: Tsnet.Appl.nodelist: io ganymede rhea electra Compile the program with TCP Linda % setenv LINDA_CLC linda_tcp % clc -o hello hello.cl Run program with ntsnet % ntsnet hello Slide 32
33 Hands on Session #2 - Ping Pong. This exercise demonstrates basic Linda operations and TupleScope. Write a program that creates two workers called ping() and pong(). Ping() loops, sending a ping tuple and receiving a pong, while pong() does the opposite. Ping and pong should agree on the length of the game, and terminate when finished. Compile and run the program using the Code Development System and TupleScope. Slide 33
34 Parallel Programming with Linda Slide 34
35 Parallel Processing Vocabulary
- Granularity: ratio of computation to communication
- Efficiency: how effectively the resources are used
- Speedup: performance increase as CPUs are added
- Load Balance: is work evenly distributed?
Slide 35
36 Linda Algorithms Live Task Master/Worker Domain Decomposition Owner Computes This is not an exhaustive list: Linda is general and can support most styles of algorithms Slide 36
37 Live Task Algorithms. The simplest Linda algorithm: the master evals task tuples, then retrieves the completed task tuples. Caveats: simple parameters only; watch out for scheduling problems; think about granularity! Slide 37
38 Live Task Algorithms
Sequential code:
main() {
    /* initialize a[], b[] */
    for (i=0; i<limit; i++)
        res[i] = comp(a[i], b[i]);
}
Linda code:
real_main() {
    /* initialize a[], b[] */
    for (i=0; i<limit; i++)
        eval("task", i, comp(a[i], b[i]));
    for (i=0; i<limit; i++)
        in("task", i, ?res[i]);
}
Slide 38
39 Master/Worker Algorithms. Separate processes from tasks; tasks become lightweight. Master: evals workers, generates task tuples, consumes result tuples. Worker: loops continuously, consuming a task and generating a result. Slide 39
40 Dynamic Load Balancing (diagram: several WORKER processes drawing from a shared pool of tasks, each task marked as to do, in progress, or completed) Slide 40
41 Master/Worker Algorithms: sequential code
main() {
    RESULT r;
    TASK t;

    while (get_task(&t)) {
        r = compute_task(t);
        update_result(r);
    }
    output_result();
}
Slide 41
42 Master/Worker Algorithms: parallel code
real_main() {
    int i;
    RESULT r;
    TASK t;

    for (i=0; i<NWORKER; i++)
        eval("worker", worker());
    for (i=0; get_task(&t); i++)
        out("task", t);
    for (; i; --i) {
        in("result", ?r);
        update_result(r);
    }
    output_result();
}

worker() {
    RESULT r;
    TASK t;

    while (1) {
        in("task", ?t);
        r = compute_task(t);
        out("result", r);
    }
}
Slide 42
43 Master/Worker Algorithms. To be most effective, you need: relatively independent tasks; more tasks than workers; possibly ordered tasks. Benefits: easy to code; near-ideal speedups; automatic load balancing; lightweight tasks. Slide 43
44 Domain Decomposition Algorithms. A specific number of processes in a fixed organization. Relatively fixed, message-passing style of communication. Processes work in lockstep, often in time steps. Slide 44
45 A PDE Example. The heat equation
    ∂u/∂t = ∂²u/∂x² + ∂²u/∂y²
is discretized explicitly as
    u(x,y,t+dt) = u(x,y,t)
        + (dt/dx²) { u(x+dx,y,t) + u(x-dx,y,t) - 2u(x,y,t) }
        + (dt/dy²) { u(x,y+dy,t) + u(x,y-dy,t) - 2u(x,y,t) }
(diagram: MASTER spawning WORKERs, each running STEP)
46 Master Routine
subroutine real_main
common /parms/ cx, cy, nts
... GET INITIAL DATA ...
out('parms common', /parms/)
np = 0
do ix = 1, nx, nxloc
  ixmax = min(ix+nxloc-1, nx)
  do iy = 1, ny, nyloc
    iymax = min(iy+nyloc-1, ny)
    np = np + 1
    if (ix .gt. nxloc .or. iy .gt. nyloc) then
      eval('worker', worker(ix, ixmax, iy, iymax))
    endif
    out('initial data', ix, iy, u(ix:ixmax, iy:iymax))
  enddo
enddo
call worker(1, min(nxloc,nx), 1, min(nyloc,ny))
do i = 1, np
  in('result id', ?ixmin, ?ixmax, ?iymin, ?iymax)
  in('result', ixmin, iymin, ?u(ixmin:ixmax, iymin:iymax))
enddo
return
end
47 Worker Routines - I
subroutine worker(ixmin, ixmax, iymin, iymax)
common /parms/ cx, cy, nts
dimension uloc(NXLOCAL+2, NYLOCAL+2, 2)
nxloc = ixmax - ixmin + 1
nyloc = iymax - iymin + 1
rd('parms common', ?/parms/)
in('initial data', ixmin, iymin, ?uloc(2:nxloc+1, 2:nyloc+1))
iz = 1
do it = 1, nts
  call step(ixmin, ixmax, iymin, iymax, NXLOCAL+2,
 *          nxloc, nyloc, iz, uloc(1,1,iz), uloc(1,1,3-iz))
  iz = 3 - iz
enddo
out('result id', ixmin, ixmax, iymin, iymax)
out('result', ixmin, iymin, uloc(2:nxloc+1, 2:nyloc+1, iz))
return
end
48 Worker Routines - II
subroutine step(ixmin, ixmax, iymin, iymax, nrows,
*               nxloc, nyloc, iz, u1, u2)
common /parms/ cx, cy, nts
dimension u1(nrows, *), u2(nrows, *)
if (ixmin .ne. 1)  out('west',  ixmin, iymin, u1(2, 2:nyloc+1))
if (ixmax .ne. nx) out('east',  ixmax, iymin, u1(nxloc+1, 2:nyloc+1))
if (iymax .ne. ny) out('north', ixmin, iymax, u1(2:nxloc+1, nyloc+1))
if (iymin .ne. 1)  out('south', ixmin, iymin, u1(2:nxloc+1, 2))
if (ixmin .ne. 1)  in('east',  ixmin-1, iymin, ?u1(1, 2:nyloc+1))
if (ixmax .ne. nx) in('west',  ixmax+1, iymin, ?u1(nxloc+2, 2:nyloc+1))
if (iymin .ne. 1)  in('north', ixmin, iymin-1, ?u1(2:nxloc+1, 1))
if (iymax .ne. ny) in('south', ixmin, iymax+1, ?u1(2:nxloc+1, nyloc+2))
do ix = 2, nxloc + 1
  do iy = 2, nyloc + 1
    u2(ix,iy) = u1(ix,iy) +
*     cx * (u1(ix+1,iy) + u1(ix-1,iy) - 2.*u1(ix,iy)) +
*     cy * (u1(ix,iy+1) + u1(ix,iy-1) - 2.*u1(ix,iy))
  enddo
enddo
return
end
49 Owner-Computes Algorithm Share loop iterations among different processors Iteration allocation can be dynamic Allows for parallelization with little change to the code Good for parallelizing complex existing sequential codes Slide 49
50 Owner-Computes: sequential code
main() {
    /* lots of complex initialization */
    for (ol=0; ol<loops; ++ol) {
        for (i=0; i<n; ++i) {
            ...
            elem[i] = f(...);   /* complex calculation */
            ...
        }
        /* more complex calculations using elem[] */
    }
    /* output results to disk... */
}
Slide 50
51 Parallel Owner Computes
real_main() {
    out("task", 1);
    for (i=1; i<numprocs; ++i)
        eval(old_main(i));
    old_main(0);
}

old_main(id)
int id;
{
    /* complex init */
    for (ol=0; ol<loops; ++ol) {
        for (i=0; i<n; ++i) {
            if (!check()) continue;
            ...
            elem[i] = f(...);
            ...
            log_data(i, elem[i]);
        }
        gatherscatter();
        /* more complex calc */
    }
    /* output results to disk... */
}

check() {
    static int next=0, count=0;
    if (next==0) {
        in("task", ?next);
        out("task", next+1);
    }
    if (++count == next) {
        next = 0;
        return 1;
    }
    return 0;
}

log_data(i, val) {
    local_results[i].id = i;
    local_results[i].val = val;
}

gatherscatter() {
    if (myid == 0) {
        /* master ins local results */
        /* master outs all results */
    } else {
        /* worker outs local results */
        /* worker ins all results */
    }
}
52 Hands on Session #3 - Live Task PI Convert sequential C program to parallel using live task method. PI is the integral of 4/(1+x*x) from 0 to 1. Slide 52
53 Hands on Session #3 - Sequential Program
#include <stdio.h>
#include <stdlib.h>

double subrange();

main(argc, argv)
int argc; char *argv[];
{
    int nsteps;
    double step, pi;

    nsteps = atoi(argv[1]);
    step = 1.0/(double)nsteps;
    pi = subrange(nsteps, 0.0, step)*step;
    printf("pi = %f\n", pi);
}

double subrange(nsteps, x, step)
int nsteps; double x, step;
{
    double result = 0.0;
    while (nsteps > 0) {
        result += 4.0/(1.0 + x*x);
        x += step;
        nsteps--;
    }
    return result;
}
54 Linda Implementation Slide 54
55 Linda Implementation Efficiency. Tuple usage analysis and optimization is the key to Scientific's Linda systems. Optimization occurs at compile, link, and run time. In general, satisfying an in requires touching fewer than two tuples. On distributed-memory architectures, the communication pattern optimizes to near message-passing. Slide 55
56 Implementation: 3 major components
- Compile time: language front end; supports Linda syntax, debugging, and tuple usage analysis
- Link time: tuple-usage analyzer; optimizes run-time tuple storage and handling
- Run time: Linda support library; initializes the system, manages resources, provides custom tuple handlers, dynamically reconfigures to optimize handling
Slide 56
57 Linda Compile-time Processing (diagram: Linda source code feeds the parser and Linda engine, producing native source code for the native compiler; the resulting native object code and the tuple usage data together form the Linda object file) Slide 57
58 Linda Link-time Processing (diagram: Linda object files feed the analyzer, which generates C source code; the native compiler turns it into generated C object code, and the loader combines that with the program's object code, other object files, and the Linda library to produce the executable) Slide 58
59 Tuple Usage Analysis. Converts Linda operations into more efficient, low-level operations. Phase I partitions Linda operations into disjoint sets based on tuple and template content. Example: out("date", i, j) can never match in("sem", ?i). Slide 59
60 Tuple Usage Analysis Phase II analyzes each partition found in Phase I Detects patterns of tuple and template usage Maps each pattern to a conventional data structure Chooses a runtime support routine for each operation Slide 60
61 Linda Compile-Time Analysis Example
/* Add to ordered task list */
out("task", ++j, task_info);   /* S1 */
/* Extend table of squares */
out("squares", i, i*i);        /* S2 */
/* Consult table */
rd("squares", i, ?i2);         /* S3 */
/* Grab task */
in("task", ?t, ?ti);           /* S4 */
Slide 61
62 Linda Compile-Time Analysis Example
Phase I: two partitions are generated:
P1 = {S2, S3}
P2 = {S1, S4}
Phase II: each partition is optimized:
P1: field 1 can be suppressed; field 3 is copy-only (no matching); field 2 must be matched, but a key is available: hash implementation
P2: field 1 can be suppressed; fields 2 & 3 are copy-only (no matching): queue implementation
Slide 62
63 Linda Compile-Time Analysis Associative matching reduced to simple data structure lookups Common set paradigms are: counting semaphores queues hash tables trees In practice, exhaustive searching is never needed Slide 63
64 Run Time Library Contains implementations of set paradigms for tuple storage Structures the tuple space for efficiency Families of implementations for architecture classes Shared-memory Distributed-memory Put/get memory Slide 64
65 TCP Linda Runtime Optimizations
Tuple rehashing: the runtime system observes patterns of usage and remaps tuples to better locations. Examples: domain decomposition, result tuples.
Long field handling: large data fields can be stored on the outing machine (we know they are not needed for matching); bulk data is transferred only once.
Slide 65
66 Hints for Aiding the Analysis. Use string tags:
out("array elem", i, a[i]);
out("task", t);
Code is self-documenting and more readable. Helps with set partitioning. No runtime cost! Slide 66
67 Hints for Aiding the Analysis. Use care with hash keys. The hash key is non-constant and always an actual:
out("array elem", iter, i, a[i]);
in("array elem", iter, i, ?val);
The analyzer combines all such fields (fields 2 and 3). Avoid unnecessary use of a formal in a hash field (common in cleanup code):
in("array elem", ?int, ?int, ?float);
Slide 67
68 Linda vs. the Competition Slide 68
69 Portable Parallel Programming Four technology classes for Portable Parallel Programming: Message Passing - the machine language of parallel computing Language extensions - incremental build on traditional languages Inherently Parallel Languages - elegant but steep learning curve Compiler Tools - the solution to the dusty deck problem? Slide 69
70 Portable Parallel Programming: the major players Four technology classes for Portable Parallel Programming: Message Passing - MPI, PVM, Java RMI... Language extensions - Linda, Java Spaces... Inherently Parallel Languages -?? Compiler Tools - HPF Slide 70
71 Why not message passing? Message passing is the machine language of distributed-memory parallelism. It's part of the problem, not the solution. Linda's mission: comparable efficiency with much greater ease of use. Slide 71
72 Linda is a high-level approach. Point-to-point communication is trivial in Linda, so you can do message passing if you must; but Linda's shared associative object memory is extremely hard to implement in message passing. Message passing is a low-level approach. Slide 72
73 Simplicity of Expression
PVM:
/* Receive data from master */
msgtype = 0;
pvm_recv(-1, msgtype);
pvm_upkint(&nproc, 1, 1);
pvm_upkint(tids, nproc, 1);
pvm_upkint(&n, 1, 1);
pvm_upkfloat(data, n, 1);
/* Do calculations with data */
result = work(me, n, data, tids, nproc);
/* Send result to master */
pvm_initsend(PvmDataDefault);
pvm_pkint(&me, 1, 1);
pvm_pkfloat(&result, 1, 1);
msgtype = 5;
master = pvm_parent();
pvm_send(master, msgtype);
/* Program done. Exit PVM and stop */
pvm_exit();

Linda:
/* Receive data from master */
rd("init data", ?nproc, ?n, ?data);
/* Do calculations with data */
result = work(id, n, data, tids, nproc);
/* Send result to master */
out("result", id, result);
74 Global counter in Linda vs. Message Passing. The example in Using MPI (Gropp et al.) was more than two pages long! Several reasons: MPI cannot easily represent data apart from processes; you must build a special-purpose counter agent; all data marshalling is done by hand (error prone!); you must worry about group issues. In Linda, the counter requires 3 lines of code! Slide 74
75 Why Linda s Tuple Space is important Parallel program development is much easier than with message passing: Dynamic tasking Distributed data structures Uncoupled programming Anonymous communication Dynamically varying process pool Slide 75
76 Hands on session #4: Monte Carlo PI. Pi can be calculated from the probability of randomly placed points falling within a circle. Use the master/worker algorithm to parallelize the program.
For a circle of radius r inscribed in a square: a_square = 4r², a_circle = πr², so
    π = 4 · a_circle / a_square
Slide 76
77 Poison Pill Termination. A common Linda idiom in master/worker algorithms: the master creates a special task that causes evaled processes to terminate.
real_main() {
    for (i=0; i<NWORKERS; i++)
        eval("worker", worker());
    ...
    /* got all results */
    out("task", POISON, t);
    ...
}

worker() {
    ...
    in("task", ?tid, ?t);
    if (tid == POISON) {
        out("task", POISON, t);  /* put the pill back for the next worker */
        return;
    }
    ...
}
78 Hands on session #5: Matrix Multiplication. This exercise develops a Linda application with non-trivial communication costs. Write a program that computes C = A*B, where A, B, C are square matrices. Parallelism can be defined at any of the following levels:
- single elements of the result matrix C
- single rows (or columns) of C
- groups of rows (or columns) of C
- groups of rows and columns (blocks) of C
Slide 78
79 Hands on session #5: Matrix Multiplication. For your algorithm, estimate the ratio of communication to computation, assuming that: computational speed is 100 Mflops; communication speed is 1 Mbyte/sec with 1 msec latency (Ethernet). How much faster must the network be? Slide 79
Topics. Introduction. Java History CS 146. Introduction to Programming and Algorithms Module 1. Module Objectives
Introduction to Programming and Algorithms Module 1 CS 146 Sam Houston State University Dr. Tim McGuire Module Objectives To understand: the necessity of programming, differences between hardware and software,
Chapter 6 Load Balancing
Chapter 6 Load Balancing Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 2. Parallel Loops Chapter 3. Parallel Loop Schedules Chapter 4. Parallel Reduction Chapter 5. Reduction Variables
HPCC - Hrothgar Getting Started User Guide MPI Programming
HPCC - Hrothgar Getting Started User Guide MPI Programming High Performance Computing Center Texas Tech University HPCC - Hrothgar 2 Table of Contents 1. Introduction... 3 2. Setting up the environment...
KITES TECHNOLOGY COURSE MODULE (C, C++, DS)
KITES TECHNOLOGY 360 Degree Solution www.kitestechnology.com/academy.php [email protected] [email protected] Contact: - 8961334776 9433759247 9830639522.NET JAVA WEB DESIGN PHP SQL, PL/SQL
Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel
Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC
Channel Access Client Programming. Andrew Johnson Computer Scientist, AES-SSG
Channel Access Client Programming Andrew Johnson Computer Scientist, AES-SSG Channel Access The main programming interface for writing Channel Access clients is the library that comes with EPICS base Written
The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.
White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,
Performance Tuning for the Teradata Database
Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document
Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C
Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C 1 An essential part of any embedded system design Programming 2 Programming in Assembly or HLL Processor and memory-sensitive
Java (12 Weeks) Introduction to Java Programming Language
Java (12 Weeks) Topic Lecture No. Introduction to Java Programming Language 1 An Introduction to Java o Java as a Programming Platform, The Java "White Paper" Buzzwords, Java and the Internet, A Short
TivaWare Utilities Library
TivaWare Utilities Library USER S GUIDE SW-TM4C-UTILS-UG-1.1 Copyright 2013 Texas Instruments Incorporated Copyright Copyright 2013 Texas Instruments Incorporated. All rights reserved. Tiva and TivaWare
Q N X S O F T W A R E D E V E L O P M E N T P L A T F O R M v 6. 4. 10 Steps to Developing a QNX Program Quickstart Guide
Q N X S O F T W A R E D E V E L O P M E N T P L A T F O R M v 6. 4 10 Steps to Developing a QNX Program Quickstart Guide 2008, QNX Software Systems GmbH & Co. KG. A Harman International Company. All rights
Course MS10975A Introduction to Programming. Length: 5 Days
3 Riverchase Office Plaza Hoover, Alabama 35244 Phone: 205.989.4944 Fax: 855.317.2187 E-Mail: [email protected] Web: www.discoveritt.com Course MS10975A Introduction to Programming Length: 5 Days
University of Dayton Department of Computer Science Undergraduate Programs Assessment Plan DRAFT September 14, 2011
University of Dayton Department of Computer Science Undergraduate Programs Assessment Plan DRAFT September 14, 2011 Department Mission The Department of Computer Science in the College of Arts and Sciences
CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions
CS 2112 Spring 2014 Assignment 3 Data Structures and Web Filtering Due: March 4, 2014 11:59 PM Implementing spam blacklists and web filters requires matching candidate domain names and URLs very rapidly
CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013
Oct 4, 2013, p 1 Name: CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013 1. (max 18) 4. (max 16) 2. (max 12) 5. (max 12) 3. (max 24) 6. (max 18) Total: (max 100)
Chapter 13: Program Development and Programming Languages
Understanding Computers Today and Tomorrow 12 th Edition Chapter 13: Program Development and Programming Languages Learning Objectives Understand the differences between structured programming, object-oriented
Parallel Data Preparation with the DS2 Programming Language
ABSTRACT Paper SAS329-2014 Parallel Data Preparation with the DS2 Programming Language Jason Secosky and Robert Ray, SAS Institute Inc., Cary, NC and Greg Otto, Teradata Corporation, Dayton, OH A time-consuming
Variable Base Interface
Chapter 6 Variable Base Interface 6.1 Introduction Finite element codes has been changed a lot during the evolution of the Finite Element Method, In its early times, finite element applications were developed
Glossary of Object Oriented Terms
Appendix E Glossary of Object Oriented Terms abstract class: A class primarily intended to define an instance, but can not be instantiated without additional methods. abstract data type: An abstraction
Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles [email protected] hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
Chapter 12 Programming Concepts and Languages
Chapter 12 Programming Concepts and Languages Chapter 12 Programming Concepts and Languages Paradigm Publishing, Inc. 12-1 Presentation Overview Programming Concepts Problem-Solving Techniques The Evolution
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
Chapter 11 I/O Management and Disk Scheduling
Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization
COSC 6397 Big Data Analytics. 2 nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015
COSC 6397 Big Data Analytics 2 nd homework assignment Pig and Hive Edgar Gabriel Spring 2015 2 nd Homework Rules Each student should deliver Source code (.java files) Documentation (.pdf,.doc,.tex or.txt
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102
Best practices for efficient HPC performance with large models
Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
The Clean programming language. Group 25, Jingui Li, Daren Tuzi
The Clean programming language Group 25, Jingui Li, Daren Tuzi The Clean programming language Overview The Clean programming language first appeared in 1987 and is still being further developed. It was
16.1 MAPREDUCE. For personal use only, not for distribution. 333
For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several
#pragma omp critical x = x + 1; !$OMP CRITICAL X = X + 1!$OMP END CRITICAL. (Very inefficiant) example using critical instead of reduction:
omp critical The code inside a CRITICAL region is executed by only one thread at a time. The order is not specified. This means that if a thread is currently executing inside a CRITICAL region and another
Motorola 8- and 16-bit Embedded Application Binary Interface (M8/16EABI)
Motorola 8- and 16-bit Embedded Application Binary Interface (M8/16EABI) SYSTEM V APPLICATION BINARY INTERFACE Motorola M68HC05, M68HC08, M68HC11, M68HC12, and M68HC16 Processors Supplement Version 2.0
How To Write Portable Programs In C
Writing Portable Programs COS 217 1 Goals of Today s Class Writing portable programs in C Sources of heterogeneity Data types, evaluation order, byte order, char set, Reading period and final exam Important
CORBA Programming with TAOX11. The C++11 CORBA Implementation
CORBA Programming with TAOX11 The C++11 CORBA Implementation TAOX11: the CORBA Implementation by Remedy IT TAOX11 simplifies development of CORBA based applications IDL to C++11 language mapping is easy
GDB Tutorial. A Walkthrough with Examples. CMSC 212 - Spring 2009. Last modified March 22, 2009. GDB Tutorial
A Walkthrough with Examples CMSC 212 - Spring 2009 Last modified March 22, 2009 What is gdb? GNU Debugger A debugger for several languages, including C and C++ It allows you to inspect what the program
SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri
SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable
Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.
Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to
G563 Quantitative Paleontology. SQL databases. An introduction. Department of Geological Sciences Indiana University. (c) 2012, P.
SQL databases An introduction AMP: Apache, mysql, PHP This installations installs the Apache webserver, the PHP scripting language, and the mysql database on your computer: Apache: runs in the background
Advanced MPI. Hybrid programming, profiling and debugging of MPI applications. Hristo Iliev RZ. Rechen- und Kommunikationszentrum (RZ)
Advanced MPI Hybrid programming, profiling and debugging of MPI applications Hristo Iliev RZ Rechen- und Kommunikationszentrum (RZ) Agenda Halos (ghost cells) Hybrid programming Profiling of MPI applications
SmartArrays and Java Frequently Asked Questions
SmartArrays and Java Frequently Asked Questions What are SmartArrays? A SmartArray is an intelligent multidimensional array of data. Intelligent means that it has built-in knowledge of how to perform operations
Course Development of Programming for General-Purpose Multicore Processors
Course Development of Programming for General-Purpose Multicore Processors Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Richmond, VA 23284 [email protected]
New Tools Using the Hardware Pertbrmaiihe... Monitor to Help Users Tune Programs on the Cray X-MP*
ANL/CP--7 4428 _' DE92 001926 New Tools Using the Hardware Pertbrmaiihe... Monitor to Help Users Tune Programs on the Cray X-MP* D. E. Engert, L. Rudsinski J. Doak Argonne National Laboratory Cray Research
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
INTRODUCTION TO OBJECTIVE-C CSCI 4448/5448: OBJECT-ORIENTED ANALYSIS & DESIGN LECTURE 12 09/29/2011
INTRODUCTION TO OBJECTIVE-C CSCI 4448/5448: OBJECT-ORIENTED ANALYSIS & DESIGN LECTURE 12 09/29/2011 1 Goals of the Lecture Present an introduction to Objective-C 2.0 Coverage of the language will be INCOMPLETE
Microsoft Windows PowerShell v2 For Administrators
Course 50414B: Microsoft Windows PowerShell v2 For Administrators Course Details Course Outline Module 1: Introduction to PowerShell the Basics This module explains how to install and configure PowerShell.
Data-Intensive Science and Scientific Data Infrastructure
Data-Intensive Science and Scientific Data Infrastructure Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011 Overview Data-intensive science Publishing scientific
