Programming with Big Data in R
1 Programming with Big Data in R Drew Schmidt and George Ostrouchov useR! pbdr Core Team Programming with Big Data in R
2 The pbdr Core Team Wei-Chen Chen 1 George Ostrouchov 2,3 Pragneshkumar Patel 3 Drew Schmidt 3 Support This work used resources of the National Institute for Computational Sciences at the University of Tennessee, Knoxville, which is supported by the Office of Cyberinfrastructure of the U.S. National Science Foundation under Award No. ARRA-NSF-OCI for the NICS-RDAV center. This work also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR. 1 Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville TN, USA; 2 Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge TN, USA; 3 Joint Institute for Computational Sciences, University of Tennessee, Knoxville TN, USA pbdr Core Team Programming with Big Data in R
3 About This Presentation Downloads This presentation is available at: pbdr Core Team Programming with Big Data in R
4 About This Presentation Installation Instructions Installation instructions for setting up a pbdr environment are available: This includes instructions for installing R, MPI, and pbdr. pbdr Core Team Programming with Big Data in R
5 Contents 1 Introduction 2 Profiling and Benchmarking 3 The pbdr Project 4 Introduction to pbdmpi 5 The Generalized Block Distribution 6 Basic Statistics Examples 7 Data Input 8 Introduction to pbddmat and the ddmatrix Structure 9 Examples Using pbddmat 10 MPI Profiling 11 Wrapup pbdr Core Team Programming with Big Data in R
6 Contents Introduction 1 Introduction A Concise Introduction to Parallelism A Quick Overview of Parallel Hardware A Quick Overview of Parallel Software Summary pbdr Core Team Programming with Big Data in R
7 Introduction A Concise Introduction to Parallelism 1 Introduction A Concise Introduction to Parallelism A Quick Overview of Parallel Hardware A Quick Overview of Parallel Software Summary pbdr Core Team Programming with Big Data in R
8 Introduction A Concise Introduction to Parallelism Parallelism Serial Programming Parallel Programming pbdr Core Team Programming with Big Data in R 1/131
9 Introduction A Concise Introduction to Parallelism Difficulty in Parallelism 1 Implicit parallelism: Parallel details hidden from user 2 Explicit parallelism: Some assembly required... 3 Embarrassingly Parallel: Also called loosely coupled. Obvious how to make parallel; lots of independence in computations. 4 Tightly Coupled: Opposite of embarrassingly parallel; lots of dependence in computations. pbdr Core Team Programming with Big Data in R 2/131
10 Introduction A Concise Introduction to Parallelism Speedup Wallclock Time: Time of the clock on the wall from start to finish Speedup: unitless measure of improvement; more is better. $S_{n_1,n_2} = \frac{\text{Run time for } n_1 \text{ cores}}{\text{Run time for } n_2 \text{ cores}}$ $n_1$ is often taken to be 1; in this case, we are comparing the parallel algorithm to the serial algorithm. pbdr Core Team Programming with Big Data in R 3/131
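To make the definition concrete, here is a minimal R sketch (not from the slides) that times a serial run and a two-worker run of the same toy workload with system.time() and reports S_{1,2}; the workload and the use of parallel::mclapply are illustrative assumptions.

library(parallel)

# Toy workload: bootstrap means of a vector (illustrative only)
x <- rnorm(1e6)
work <- function(i) mean(sample(x, replace = TRUE))

t_serial   <- system.time(lapply(1:40, work))["elapsed"]
t_parallel <- system.time(mclapply(1:40, work, mc.cores = 2))["elapsed"]

speedup <- t_serial / t_parallel   # S_{1,2}: greater than 1 means the parallel run was faster
print(speedup)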
11 Introduction A Concise Introduction to Parallelism Good Speedup, Bad Speedup [Plots: speedup vs. cores for an application compared against optimal speedup, one panel showing good speedup and one showing bad speedup] pbdr Core Team Programming with Big Data in R 4/131
12 Introduction A Concise Introduction to Parallelism Scalability and Benchmarking 1 Strong: Fixed total problem size. Less work per core as more cores are added. 2 Weak: Fixed local (per core) problem size. Same work per core as more cores are added. pbdr Core Team Programming with Big Data in R 5/131
13 Introduction A Concise Introduction to Parallelism Good Strong Scaling, Good Weak Scaling [Plots: speedup vs. cores for strong scaling; run time vs. cores for weak scaling] pbdr Core Team Programming with Big Data in R 6/131
14 Introduction A Concise Introduction to Parallelism Shared and Distributed Memory Machines Shared Memory Direct access to read/change memory (one node) Distributed No direct access to read/change memory (many nodes); requires communication pbdr Core Team Programming with Big Data in R 7/131
15 Introduction A Concise Introduction to Parallelism Shared and Distributed Memory Machines Shared Memory Machines Thousands of cores Distributed Memory Machines Hundreds of thousands of cores Nautilus, University of Tennessee 1024 cores 4 TB RAM Kraken, University of Tennessee 112,896 cores 147 TB RAM pbdr Core Team Programming with Big Data in R 8/131
16 Introduction A Quick Overview of Parallel Hardware 1 Introduction A Concise Introduction to Parallelism A Quick Overview of Parallel Hardware A Quick Overview of Parallel Software Summary pbdr Core Team Programming with Big Data in R
17 Introduction A Quick Overview of Parallel Hardware Three Basic Flavors of Hardware [Diagram: distributed memory (processors with their own memory connected by an interconnection network), shared memory (cores sharing local memory), and a co-processor (GPU or MIC). GPU: Graphical Processing Unit; MIC: Many Integrated Core] pbdr Core Team Programming with Big Data in R 9/131
18 Introduction A Quick Overview of Parallel Hardware Your Laptop or Desktop [Same hardware diagram as the previous slide] pbdr Core Team Programming with Big Data in R 10/131
19 Introduction A Quick Overview of Parallel Hardware A Server or Cluster [Same hardware diagram] pbdr Core Team Programming with Big Data in R 11/131
20 Introduction A Quick Overview of Parallel Hardware Server to Supercomputer [Same hardware diagram] pbdr Core Team Programming with Big Data in R 12/131
21 Introduction A Quick Overview of Parallel Hardware Knowing the Right Words [Same hardware diagram annotated with terminology: cluster and distributing (distributed memory), multicore and multithreading (shared memory), GPU or manycore and offloading (co-processor)] pbdr Core Team Programming with Big Data in R 13/131
22 Introduction A Quick Overview of Parallel Software 1 Introduction A Concise Introduction to Parallelism A Quick Overview of Parallel Hardware A Quick Overview of Parallel Software Summary pbdr Core Team Programming with Big Data in R
23 Introduction A Quick Overview of Parallel Software Native Programming Models and Tools [Hardware diagram annotated with native tools: distributed memory focuses on who owns what data and what communication is needed (Sockets, MPI, Hadoop); co-processors run the same task on blocks of data (CUDA, OpenCL, OpenACC); shared memory focuses on which tasks can be parallel (OpenMP, Pthreads, fork). GPU: Graphical Processing Unit; MIC: Many Integrated Core] pbdr Core Team Programming with Big Data in R 14/131
24 Introduction A Quick Overview of Parallel Software 30+ Years of Parallel Computing Research [Same annotated diagram] pbdr Core Team Programming with Big Data in R 15/131
25 Introduction A Quick Overview of Parallel Software Last 10 Years of Advances [Same annotated diagram] pbdr Core Team Programming with Big Data in R 16/131
26 Introduction A Quick Overview of Parallel Software Putting It All Together: Challenge [Same annotated diagram] pbdr Core Team Programming with Big Data in R 17/131
27 Introduction A Quick Overview of Parallel Software R Interfaces to Native Tools [Same diagram annotated with R packages: RHadoop, snow, Rmpi, and pbdmpi for distributed memory; multicore for shared memory; snow + multicore = parallel; foreign language interfaces: .C, .Call, Rcpp, OpenCL, inline, ...] pbdr Core Team Programming with Big Data in R 18/131
28 Introduction Summary 1 Introduction A Concise Introduction to Parallelism A Quick Overview of Parallel Hardware A Quick Overview of Parallel Software Summary pbdr Core Team Programming with Big Data in R
29 Introduction Summary Summary Three flavors of hardware Distributed is stable Multicore and co-processor are evolving Two memory models Distributed works in multicore Parallelism hierarchy Medium to big machines have all three pbdr Core Team Programming with Big Data in R 19/131
30 Contents Profiling and Benchmarking 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Summary pbdr Core Team Programming with Big Data in R
31 Profiling and Benchmarking Why Profile? 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Summary pbdr Core Team Programming with Big Data in R
32 Profiling and Benchmarking Why Profile? Performance and Accuracy Sometimes π = 3.14 is (a) infinitely faster than the correct answer and (b) the difference between the correct and the wrong answer is meaningless.... The thing is, some specious value of correctness is often irrelevant because it doesn't matter. While performance almost always matters. And I absolutely detest the fact that people so often dismiss performance concerns so readily. Linus Torvalds, August 8, pbdr Core Team Programming with Big Data in R 20/131
33 Profiling and Benchmarking Why Profile? Why Profile? Because performance matters. Bad practices scale up! Your bottlenecks may surprise you. Because R is dumb. R users claim to be data people... so act like it! pbdr Core Team Programming with Big Data in R 21/131
34 Profiling and Benchmarking Why Profile? Compilers often correct bad behavior... A Really Dumb Loop:
int main(){
  int x, i;
  for (i=0; i<10; i++)
    x = 1;
  return 0;
}
Compiled with clang -O3 example.c, the useless loop is optimized away entirely:
main:
  xorl %eax, %eax
  ret
Compiled with plain clang example.c, the loop is executed in full:
main:
  movl $0, -4(%rsp)
  movl $0, -12(%rsp)
.LBB0_1:
  cmpl $10, -12(%rsp)
  jge .LBB0_4
  movl $1, -8(%rsp)
  movl -12(%rsp), %eax
  addl $1, %eax
  movl %eax, -12(%rsp)
  jmp .LBB0_1
.LBB0_4:
  movl $0, %eax
  ret
pbdr Core Team Programming with Big Data in R 22/131
35 Profiling and Benchmarking Why Profile? R will not! Dumb Loop:
for (i in 1:n){
  tA <- t(A)
  Y <- tA %*% Q
  Q <- qr.Q(qr(Y))
  Y <- A %*% Q
  Q <- qr.Q(qr(Y))
}
Q
Better Loop (the transpose is hoisted out of the loop):
tA <- t(A)
for (i in 1:n){
  Y <- tA %*% Q
  Q <- qr.Q(qr(Y))
  Y <- A %*% Q
  Q <- qr.Q(qr(Y))
}
Q
pbdr Core Team Programming with Big Data in R 23/131
36 Profiling and Benchmarking Why Profile? Example from a Real R Package Excerpt from original function:
while (i <= N){
  for (j in 1:i){
    d.k <- as.matrix(x)[l==j, l==j]
    ...
Excerpt from modified function:
x.mat <- as.matrix(x)
while (i <= N){
  for (j in 1:i){
    d.k <- x.mat[l==j, l==j]
    ...
By changing just 1 line of code, performance of the main method improved by over 350%! pbdr Core Team Programming with Big Data in R 24/131
37 Profiling and Benchmarking Why Profile? Some Thoughts R is slow. Bad programmers are slower. R isn t very clever (compared to a compiler). The Bytecode compiler helps, but not nearly as much as a compiler. pbdr Core Team Programming with Big Data in R 25/131
38 Profiling and Benchmarking Profiling R Code 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Summary pbdr Core Team Programming with Big Data in R
39 Profiling and Benchmarking Profiling R Code Timings Getting simple timings as a basic measure of performance is easy, and valuable. system.time() timing blocks of code. Rprof() timing execution of R functions. Rprofmem() reporting memory allocation in R. tracemem() detect when a copy of an R object is created. The rbenchmark package Benchmark comparisons. pbdr Core Team Programming with Big Data in R 26/131
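As a rough illustration of these tools (not from the slides), the following sketch times a loop-based versus a vectorized column sum with system.time(), compares them with rbenchmark's benchmark(), and profiles a longer run with Rprof(); the function names colsum_loop and colsum_vec are made up for the example.

library(rbenchmark)

x <- matrix(rnorm(1e6), ncol = 100)

# Two ways to compute column sums (names are illustrative)
colsum_loop <- function(x) { s <- numeric(ncol(x)); for (j in 1:ncol(x)) s[j] <- sum(x[, j]); s }
colsum_vec  <- function(x) colSums(x)

# One-off timing of a block of code
system.time(colsum_loop(x))

# Repeated, comparative timing
benchmark(loop = colsum_loop(x), vec = colsum_vec(x),
          replications = 100,
          columns = c("test", "replications", "elapsed", "relative"))

# Function-level profiling of a larger run
Rprof("profile.out")
invisible(replicate(200, colsum_loop(x)))
Rprof(NULL)
summaryRprof("profile.out")$by.self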
40 Profiling and Benchmarking Advanced R Profiling 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Summary pbdr Core Team Programming with Big Data in R
41 Profiling and Benchmarking Advanced R Profiling Other Profiling Tools perf (Linux) PAPI MPI profiling: fpmpi, mpip, TAU pbdr Core Team Programming with Big Data in R 27/131
42 Profiling and Benchmarking Advanced R Profiling Profiling MPI Codes with pbdprof 1. Rebuild pbdr packages: R CMD INSTALL pbdMPI_[version].tar.gz --configure-args="--enable-pbdprof" 2. Run code: mpirun -np 64 Rscript my_script.R 3. Analyze results:
library(pbdPROF)
prof <- read.prof("output.mpip")
plot(prof, plot.type="messages2")
pbdr Core Team Programming with Big Data in R 28/131
43 Profiling and Benchmarking Advanced R Profiling Profiling with pbdpapi Performance Application Programming Interface High and low level interfaces Linux only :( Function / Description of Measurement:
system.flips(): Time, floating point instructions, and Mflips
system.flops(): Time, floating point operations, and Mflops
system.cache(): Cache misses, hits, accesses, and reads
system.epc(): Events per cycle
system.idle(): Idle cycles
system.cpuormem(): CPU or RAM bound
system.utilization(): CPU utilization
pbdr Core Team Programming with Big Data in R 29/131
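A minimal usage sketch (not from the slides), assuming the pbdPAPI convention that these high-level functions take an R expression to measure; the matrix size is arbitrary and the code is Linux only:

library(pbdPAPI)

# Count floating point operations for a matrix multiply
A <- matrix(rnorm(1000 * 1000), nrow = 1000)
flops <- system.flops(A %*% A)
print(flops)

# Cache behavior of the same computation
print(system.cache(A %*% A))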
44 Profiling and Benchmarking Summary 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Summary pbdr Core Team Programming with Big Data in R
45 Profiling and Benchmarking Summary Summary Profile, profile, profile. Use system.time() to get a general sense of a method. Use rbenchmark's benchmark() to compare two methods. Use Rprof() for more detailed profiling. Other tools exist for more hardcore applications (pbdpapi and pbdprof). pbdr Core Team Programming with Big Data in R 30/131
46 Contents The pbdr Project 3 The pbdr Project The pbdr Project pbdr Connects R to HPC Libraries Using pbdr Summary pbdr Core Team Programming with Big Data in R
47 The pbdr Project The pbdr Project 3 The pbdr Project The pbdr Project pbdr Connects R to HPC Libraries Using pbdr Summary pbdr Core Team Programming with Big Data in R
48 The pbdr Project The pbdr Project Programming with Big Data in R (pbdr) Striving for Productivity, Portability, Performance Free R packages (MPL, BSD, and GPL licensed). Bridging high-performance compiled code with the high productivity of R. Scalable, big data analytics. Offers implicit and explicit parallelism. Methods have syntax identical to R. pbdr Core Team Programming with Big Data in R 31/131
49 The pbdr Project The pbdr Project pbdr Packages pbdr Core Team Programming with Big Data in R 32/131
50 The pbdr Project The pbdr Project pbdr Motivation Why HPC libraries (MPI, ScaLAPACK, PETSc,... )? The HPC community has been at this for decades. They're tested. They work. They're fast. You're not going to beat Jack Dongarra at dense linear algebra. pbdr Core Team Programming with Big Data in R 33/131
51 The pbdr Project The pbdr Project Simple Interface for MPI Operations with pbdmpi Rmpi:
# int
mpi.allreduce(x, type=1)
# double
mpi.allreduce(x, type=2)
pbdmpi:
allreduce(x)
Types in R:
> is.integer(1)
[1] FALSE
> is.integer(2)
[1] FALSE
> is.integer(1:2)
[1] TRUE
pbdr Core Team Programming with Big Data in R 34/131
52 The pbdr Project The pbdr Project Distributed Matrices and Statistics with pbddmat [Plot: matrix exponentiation speedup (relative to serial code) vs. nodes and cores]
library(pbdDMAT)
dx <- ddmatrix("rnorm", 5000, 5000)
expm(dx)
pbdr Core Team Programming with Big Data in R 35/131
53 The pbdr Project The pbdr Project Distributed Matrices and Statistics with pbddmat [Plot: least squares benchmark, fitting y~x with fixed local size of ~43.4 MiB; run time (seconds) vs. cores, by number of predictors]
x <- ddmatrix("rnorm", nrow=m, ncol=n)
y <- ddmatrix("rnorm", nrow=m, ncol=1)
mdl <- lm.fit(x=x, y=y)
pbdr Core Team Programming with Big Data in R 36/131
54 The pbdr Project The pbdr Project Getting Started with HPC for R Users: pbddemo 140 page, textbook-style vignette. Over 30 demos, utilizing all packages. Not just a hello world! Demos include: PCA Regression Parallel data input Model-based clustering Simple Monte Carlo simulation Bayesian MCMC pbdr Core Team Programming with Big Data in R 37/131
55 The pbdr Project pbdr Connects R to HPC Libraries 3 The pbdr Project The pbdr Project pbdr Connects R to HPC Libraries Using pbdr Summary pbdr Core Team Programming with Big Data in R
56 The pbdr Project pbdr Connects R to HPC Libraries R and pbdr Interfaces to HPC Libraries [Diagram mapping pbdr packages onto the hardware picture and the HPC libraries they interface: pbdmpi to MPI; pbddmat to ScaLAPACK, PBLAS, and BLACS (alongside PETSc and Trilinos); pbdprof and pbdpapi to mpiP, fpmpi, TAU, and PAPI; shared memory BLAS libraries MKL, LibSci, and ACML; GPU/MIC libraries PLASMA, DPLASMA, MAGMA, HiPLARM, and cuBLAS; pbdncdf4 to NetCDF4 and pbdadios to ADIOS; pbddemo ties the packages together. GPU: Graphical Processing Unit; MIC: Many Integrated Core] pbdr Core Team Programming with Big Data in R 38/131
57 The pbdr Project Using pbdr 3 The pbdr Project The pbdr Project pbdr Connects R to HPC Libraries Using pbdr Summary pbdr Core Team Programming with Big Data in R
58 The pbdr Project Using pbdr pbdr Paradigms pbdr programs are R programs! Differences: Batch execution (non-interactive). Parallel code utilizes Single Program/Multiple Data (SPMD) style Emphasizes data parallelism. pbdr Core Team Programming with Big Data in R 39/131
59 The pbdr Project Using pbdr Batch Execution Running a serial R program in batch: Rscript my_script.r or R CMD BATCH my_script.r Running a parallel (with MPI) R program in batch: mpirun -np 2 Rscript my_par_script.r pbdr Core Team Programming with Big Data in R 40/131
60 The pbdr Project Using pbdr Single Program/Multiple Data (SPMD) SPMD is a programming paradigm. Not to be confused with SIMD. Paradigms are programming models: OOP, Functional, SPMD, ... SIMD refers to hardware instructions: MMX, SSE, ... pbdr Core Team Programming with Big Data in R 41/131
61 The pbdr Project Using pbdr Single Program/Multiple Data (SPMD) SPMD is arguably the simplest extension of serial programming. Only one program is written, executed in batch on all processors. Different processors are autonomous; there is no manager. Dominant programming model for large machines for 30 years. pbdr Core Team Programming with Big Data in R 42/131
62 The pbdr Project Summary Summary pbdr connects R to scalable HPC libraries. The pbddemo package offers numerous examples and explanations for getting started with distributed R programming. pbdr programs are R programs. pbdr Core Team Programming with Big Data in R 43/131
63 Contents Introduction to pbdmpi 4 Introduction to pbdmpi Managing a Communicator Reduce, Gather, Broadcast, and Barrier Other pbdmpi Tools Summary pbdr Core Team Programming with Big Data in R
64 Introduction to pbdmpi Managing a Communicator 4 Introduction to pbdmpi Managing a Communicator Reduce, Gather, Broadcast, and Barrier Other pbdmpi Tools Summary pbdr Core Team Programming with Big Data in R
65 Introduction to pbdmpi Managing a Communicator Message Passing Interface (MPI) MPI : Standard for managing communications (data and instructions) between different nodes/computers. Implementations: OpenMPI, MPICH2, Cray MPT,... Enables parallelism (via communication) on distributed machines. Communicator: manages communications between processors. pbdr Core Team Programming with Big Data in R 44/131
66 Introduction to pbdmpi Managing a Communicator MPI Operations (1 of 2) Managing a Communicator: Create and destroy communicators. init() initialize communicator finalize() shut down communicator(s) Rank query: determine the processor s position in the communicator. comm.rank() who am I? comm.size() how many of us are there? Printing: Printing output from various ranks. comm.print(x) comm.cat(x) WARNING: only use these functions on results, never on yet-to-be-computed things. pbdr Core Team Programming with Big Data in R 45/131
67 Introduction to pbdmpi Managing a Communicator Quick Example 1: Rank Query (1_rank.r)
library(pbdMPI, quietly = TRUE)
init()
my.rank <- comm.rank()
comm.print(my.rank, all.rank = TRUE)
finalize()
Execute this script via: mpirun -np 2 Rscript 1_rank.r
Sample Output:
COMM.RANK = 0
[1] 0
COMM.RANK = 1
[1] 1
pbdr Core Team Programming with Big Data in R 46/131
68 Introduction to pbdmpi Managing a Communicator Quick Example 2: Hello World (2_hello.r)
library(pbdMPI, quietly = TRUE)
init()
comm.print("Hello, world")
comm.print("Hello again", all.rank = TRUE, quietly = TRUE)
finalize()
Execute this script via: mpirun -np 2 Rscript 2_hello.r
Sample Output:
COMM.RANK = 0
[1] "Hello, world"
[1] "Hello again"
[1] "Hello again"
pbdr Core Team Programming with Big Data in R 47/131
69 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier 4 Introduction to pbdmpi Managing a Communicator Reduce, Gather, Broadcast, and Barrier Other pbdmpi Tools Summary pbdr Core Team Programming with Big Data in R
70 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier MPI Operations 1 Reduce 2 Gather 3 Broadcast 4 Barrier pbdr Core Team Programming with Big Data in R 48/131
71 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier Reductions Combine results into single result pbdr Core Team Programming with Big Data in R 49/131
72 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier Gather Many-to-one pbdr Core Team Programming with Big Data in R 50/131
73 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier Broadcast One-to-many pbdr Core Team Programming with Big Data in R 51/131
74 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier Barrier Synchronization pbdr Core Team Programming with Big Data in R 52/131
75 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier MPI Operations (2 of 2) Reduction: each processor has a number x; add all of them up, find the largest/smallest,.... reduce(x, op="sum") reduce to one allreduce(x, op="sum") reduce to all Gather: each processor has a number; create a new object on some processor containing all of those numbers. gather(x) gather to one allgather(x) gather to all Broadcast: one processor has a number x that every other processor should also have. bcast(x) Barrier: computation wall; no processor can proceed until all processors can proceed. barrier() pbdr Core Team Programming with Big Data in R 53/131
76 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier Quick Example 3: Reduce and Gather (3_gt.r)
library(pbdMPI, quietly = TRUE)
init()
comm.set.seed(diff = TRUE)
n <- sample(1:10, size = 1)
gt <- gather(n)
comm.print(unlist(gt))
sm <- allreduce(n, op = "sum")
comm.print(sm, all.rank = TRUE)
finalize()
Execute this script via: mpirun -np 2 Rscript 3_gt.r
Sample Output:
COMM.RANK = 0
[1] ...
COMM.RANK = 0
[1] 10
COMM.RANK = 1
[1] 10
pbdr Core Team Programming with Big Data in R 54/131
77 Introduction to pbdmpi Reduce, Gather, Broadcast, and Barrier Quick Example 4: Broadcast (4_bcast.r)
library(pbdMPI, quietly = TRUE)
init()
if (comm.rank() == 0) {
  x <- matrix(1:4, nrow = 2)
} else {
  x <- NULL
}
y <- bcast(x, rank.source = 0)
comm.print(y, rank = 1)
finalize()
Execute this script via: mpirun -np 2 Rscript 4_bcast.r
Sample Output:
COMM.RANK = 1
     [,1] [,2]
[1,]    1    3
[2,]    2    4
pbdr Core Team Programming with Big Data in R 55/131
78 Introduction to pbdmpi Other pbdmpi Tools 4 Introduction to pbdmpi Managing a Communicator Reduce, Gather, Broadcast, and Barrier Other pbdmpi Tools Summary pbdr Core Team Programming with Big Data in R
79 Introduction to pbdmpi Other pbdmpi Tools Random Seeds pbdmpi offers a simple interface for managing random seeds: comm.set.seed(seed=1234, diff=FALSE) All processors use the same seed. comm.set.seed(seed=1234, diff=TRUE) All processors use different seeds. pbdr Core Team Programming with Big Data in R 56/131
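A short sketch (not from the slides) contrasting the two settings; run with, e.g., mpirun -np 2:

library(pbdMPI, quietly = TRUE)
init()

# Same seed everywhere: every rank draws identical numbers
comm.set.seed(seed = 1234, diff = FALSE)
comm.print(runif(2), all.rank = TRUE)

# Different streams per rank: draws differ across ranks
comm.set.seed(seed = 1234, diff = TRUE)
comm.print(runif(2), all.rank = TRUE)

finalize()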
80 Introduction to pbdmpi Other pbdmpi Tools Other Helper Tools pbdmpi also contains useful tools for Manager/Worker and task parallelism codes (see the sketch after this slide): Task Subsetting: Distributing a list of jobs/tasks: get.jid(n) *ply: Functions in the *ply family: pbdApply(X, MARGIN, FUN, ...) analogue of apply(); pbdLapply(X, FUN, ...) analogue of lapply(); pbdSapply(X, FUN, ...) analogue of sapply() pbdr Core Team Programming with Big Data in R 57/131
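A minimal task parallelism sketch (not from the slides), where each rank works only on its own share of 100 tasks; the simulate_task function is made up for illustration:

library(pbdMPI, quietly = TRUE)
init()

# A made-up, embarrassingly parallel task
simulate_task <- function(id) mean(rnorm(1e5, mean = id))

# get.jid(n) returns the task indices assigned to this rank
my.ids <- get.jid(100)
local.results <- sapply(my.ids, simulate_task)

# Collect everything on every rank; ranks hold contiguous id blocks, so
# concatenating in rank order reassembles the results in task order
all.results <- unlist(allgather(local.results))
comm.print(length(all.results))

finalize()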
81 Introduction to pbdmpi Summary 4 Introduction to pbdmpi Managing a Communicator Reduce, Gather, Broadcast, and Barrier Other pbdmpi Tools Summary pbdr Core Team Programming with Big Data in R
82 Introduction to pbdmpi Summary Summary Start by loading the package: library(pbdMPI, quiet = TRUE) Always initialize before starting and finalize when finished: init() # ... finalize() pbdr Core Team Programming with Big Data in R 58/131
83 Contents GBD 5 The Generalized Block Distribution GBD: a Way to Distribute Your Data Example GBD Distributions Summary pbdr Core Team Programming with Big Data in R
84 GBD GBD: a Way to Distribute Your Data 5 The Generalized Block Distribution GBD: a Way to Distribute Your Data Example GBD Distributions Summary pbdr Core Team Programming with Big Data in R
85 GBD GBD: a Way to Distribute Your Data Distributing Data Problem: How to distribute the 10 × 3 data matrix $x = (x_{i,j})$, with rows i = 1, ..., 10 and columns j = 1, 2, 3? pbdr Core Team Programming with Big Data in R 59/131
86 GBD GBD: a Way to Distribute Your Data Distributing a Matrix Across 4 Processors: Block Distribution [Diagram: the 10 rows of x divided into four contiguous row blocks, one block per processor] pbdr Core Team Programming with Big Data in R 60/131
87 GBD GBD: a Way to Distribute Your Data Distributing a Matrix Across 4 Processors: Local Load Balance [Diagram: the 10 rows of x divided into four contiguous row blocks of nearly equal size, one block per processor] pbdr Core Team Programming with Big Data in R 61/131
88 GBD GBD: a Way to Distribute Your Data The GBD Data Structure Throughout the examples, we will make use of the Generalized Block Distribution, or GBD distributed matrix structure (a local construction sketch follows this slide). 1 GBD is distributed. No processor owns all the data. 2 GBD is non-overlapping. Rows are uniquely assigned to processors. 3 GBD is row-contiguous. If a processor owns one element of a row, it owns the entire row. 4 GBD is globally row-major, locally column-major. 5 GBD is often locally balanced, where each processor owns (almost) the same amount of data. But this is not required. 6 The last row of the local storage of a processor is adjacent (by global row) to the first row of the local storage of the next processor (by communicator number) that owns data. 7 GBD is (relatively) easy to understand, but can lead to bottlenecks if you have many more columns than rows. pbdr Core Team Programming with Big Data in R 62/131
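As an illustration (not from the slides), here is one way to build a GBD row block by hand with pbdMPI: every rank generates only its own contiguous, nearly balanced chunk of rows of a global n × p matrix. The helper name my_row_range is made up for this sketch.

library(pbdMPI, quietly = TRUE)
init()

n <- 10   # global rows
p <- 3    # columns

# Contiguous, nearly balanced row ranges per rank (made-up helper)
my_row_range <- function(n) {
  size <- comm.size(); rank <- comm.rank()
  counts <- rep(n %/% size, size)
  if (n %% size > 0) counts[seq_len(n %% size)] <- counts[seq_len(n %% size)] + 1
  end <- cumsum(counts)
  start <- end - counts + 1
  if (counts[rank + 1] == 0) integer(0) else start[rank + 1]:end[rank + 1]
}

rows <- my_row_range(n)
X.gbd <- matrix(rnorm(length(rows) * p), nrow = length(rows), ncol = p)
comm.cat("rank", comm.rank(), "owns", nrow(X.gbd), "rows\n", all.rank = TRUE)

finalize()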
89 GBD Example GBD Distributions 5 The Generalized Block Distribution GBD: a Way to Distribute Your Data Example GBD Distributions Summary pbdr Core Team Programming with Big Data in R
90 GBD Example GBD Distributions Understanding GBD: Global Matrix [The global 9 × 9 matrix $x = (x_{ij})$; rows are color-coded by the processor that will own them] pbdr Core Team Programming with Big Data in R 63/131
91 GBD Example GBD Distributions Understanding GBD: Load Balanced GBD [Diagram: the 9 × 9 matrix x with its rows divided into contiguous, nearly equal row blocks, one block per processor] pbdr Core Team Programming with Big Data in R 64/131
92 GBD Example GBD Distributions Understanding GBD: Local View [Diagram: each processor's local storage is a small matrix holding only its own rows and all 9 columns; with the balanced distribution some processors hold 2 × 9 local matrices and others 1 × 9] pbdr Core Team Programming with Big Data in R 65/131
93 GBD Example GBD Distributions Understanding GBD: Non-Balanced GBD [Diagram: the same 9 × 9 matrix with its rows divided into contiguous row blocks of unequal sizes] pbdr Core Team Programming with Big Data in R 66/131
94 GBD Example GBD Distributions Understanding GBD: Local View [Diagram: local storage under the non-balanced distribution; local row counts differ across processors, and a processor may own no rows at all (a 0 × 9 local matrix)] pbdr Core Team Programming with Big Data in R 67/131
95 GBD Summary 5 The Generalized Block Distribution GBD: a Way to Distribute Your Data Example GBD Distributions Summary pbdr Core Team Programming with Big Data in R
96 GBD Summary Summary Need to distribute your data? Try splitting by row. May not work well if your data is square (or longer than tall). pbdr Core Team Programming with Big Data in R 68/131
97 Contents Basic Statistics Examples 6 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation pbdmpi Example: Sample Covariance pbdmpi Example: Linear Regression Summary pbdr Core Team Programming with Big Data in R
98 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation 6 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation pbdmpi Example: Sample Covariance pbdmpi Example: Linear Regression Summary pbdr Core Team Programming with Big Data in R
99 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation Example 1: Monte Carlo Simulation Sample N uniform observations $(x_i, y_i)$ in the unit square $[0, 1] \times [0, 1]$. Then $\pi \approx 4\left(\frac{\#\text{ Inside Circle}}{\#\text{ Total}}\right) = 4\left(\frac{\#\text{ Blue}}{\#\text{ Blue} + \#\text{ Red}}\right)$ pbdr Core Team Programming with Big Data in R 69/131
100 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation Example 1: Monte Carlo Simulation GBD Algorithm 1 Let n be big-ish; we'll take n = 50,000. 2 Generate an n × 2 matrix x of standard uniform observations. 3 Count the number of rows satisfying $x^2 + y^2 \le 1$. 4 Ask everyone else what their answer is; sum it all up. 5 Take this new answer, multiply by 4 and divide by n. 6 If my rank is 0, print the result. pbdr Core Team Programming with Big Data in R 70/131
101 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation Example 1: Monte Carlo Simulation Code Serial Code:
N <- 50000  # value garbled in the transcription; 50,000 assumed
X <- matrix(runif(N * 2), ncol = 2)
r <- sum(rowSums(X^2) <= 1)
PI <- 4 * r / N
print(PI)
Parallel Code:
library(pbdMPI, quiet = TRUE)
init()
comm.set.seed(seed = 1234, diff = TRUE)  # seed value garbled in the transcription; 1234 is a placeholder
N.gbd <- 50000 / comm.size()  # value garbled; 50,000 assumed
X.gbd <- matrix(runif(N.gbd * 2), ncol = 2)
r.gbd <- sum(rowSums(X.gbd^2) <= 1)
r <- allreduce(r.gbd)
PI <- 4 * r / (N.gbd * comm.size())
comm.print(PI)
finalize()
pbdr Core Team Programming with Big Data in R 71/131
102 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation Note For the remainder, we will exclude loading, init, and finalize calls. pbdr Core Team Programming with Big Data in R 72/131
103 Basic Statistics Examples pbdmpi Example: Sample Covariance 6 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation pbdmpi Example: Sample Covariance pbdmpi Example: Linear Regression Summary pbdr Core Team Programming with Big Data in R
104 Basic Statistics Examples pbdmpi Example: Sample Covariance Example 2: Sample Covariance $\operatorname{cov}(x_{n \times p}) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu_x)(x_i - \mu_x)^T$ pbdr Core Team Programming with Big Data in R 73/131
105 Basic Statistics Examples pbdmpi Example: Sample Covariance Example 2: Sample Covariance GBD Algorithm 1 Determine the total number of rows N. 2 Compute the vector of column means of the full matrix. 3 Subtract each column s mean from that column s entries in each local matrix. 4 Compute the crossproduct locally and reduce. 5 Divide by N 1. pbdr Core Team Programming with Big Data in R 74/131
106 Basic Statistics Examples pbdmpi Example: Sample Covariance Example 2: Sample Covariance Code Serial Code:
N <- nrow(X)
mu <- colSums(X) / N
X <- sweep(X, STATS = mu, MARGIN = 2)
Cov.X <- crossprod(X) / (N - 1)
print(Cov.X)
Parallel Code:
N <- allreduce(nrow(X.gbd), op = "sum")
mu <- allreduce(colSums(X.gbd) / N, op = "sum")
X.gbd <- sweep(X.gbd, STATS = mu, MARGIN = 2)
Cov.X <- allreduce(crossprod(X.gbd), op = "sum") / (N - 1)
comm.print(Cov.X)
pbdr Core Team Programming with Big Data in R 75/131
107 Basic Statistics Examples pbdmpi Example: Linear Regression 6 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation pbdmpi Example: Sample Covariance pbdmpi Example: Linear Regression Summary pbdr Core Team Programming with Big Data in R
108 Basic Statistics Examples pbdmpi Example: Linear Regression Example 3: Linear Regression Find $\beta$ such that $y = X\beta + \epsilon$. When X is full rank, $\hat{\beta} = (X^T X)^{-1} X^T y$ pbdr Core Team Programming with Big Data in R 76/131
109 Basic Statistics Examples pbdmpi Example: Linear Regression Example 3: Linear Regression GBD Algorithm 1 Locally, compute $tx = x^T$. 2 Locally, compute $A = tx \cdot x$. Query every other processor for this result and sum up all the results. 3 Locally, compute $B = tx \cdot y$. Query every other processor for this result and sum up all the results. 4 Locally, compute $A^{-1} B$. pbdr Core Team Programming with Big Data in R 77/131
110 Basic Statistics Examples pbdmpi Example: Linear Regression Example 3: Linear Regression Code Serial Code:
tx <- t(X)
A <- tx %*% X
B <- tx %*% y
ols <- solve(A) %*% B
Parallel Code:
tx.gbd <- t(X.gbd)
A <- allreduce(tx.gbd %*% X.gbd, op = "sum")
B <- allreduce(tx.gbd %*% y.gbd, op = "sum")
ols <- solve(A) %*% B
pbdr Core Team Programming with Big Data in R 78/131
111 Basic Statistics Examples Summary 6 Basic Statistics Examples pbdmpi Example: Monte Carlo Simulation pbdmpi Example: Sample Covariance pbdmpi Example: Linear Regression Summary pbdr Core Team Programming with Big Data in R
112 Basic Statistics Examples Summary Summary SPMD programming is (often) a natural extension of serial programming. More pbdmpi examples in pbddemo. pbdr Core Team Programming with Big Data in R 79/131
113 Contents Data Input 7 Data Input Cluster Computer and File System Serial Data Input Parallel Data Input Summary pbdr Core Team Programming with Big Data in R
114 Data Input Cluster Computer and File System 7 Data Input Cluster Computer and File System Serial Data Input Parallel Data Input Summary pbdr Core Team Programming with Big Data in R
115 Data Input Cluster Computer and File System One Node to One Storage Server [Diagram: compute nodes in a cluster, with one node reading from a file system consisting of one storage server and its disk] pbdr Core Team Programming with Big Data in R 80/131
116 Data Input Cluster Computer and File System Many Nodes to One Storage Server [Diagram: many compute nodes reading from a single storage server] pbdr Core Team Programming with Big Data in R 81/131
117 Data Input Cluster Computer and File System Many Nodes to Few Storage Servers [Diagram: many compute nodes reading from a parallel file system with several storage servers and disks] pbdr Core Team Programming with Big Data in R 82/131
118 Data Input Cluster Computer and File System Few Nodes to Few Storage Servers - Default [Diagram: a few I/O nodes reading from the parallel file system on behalf of the compute nodes] pbdr Core Team Programming with Big Data in R 83/131
119 Data Input Cluster Computer and File System Few Nodes to Few Storage Servers - Coordinated [Diagram: I/O nodes coordinated through pbdadios/ADIOS between the compute nodes and the parallel file system] pbdr Core Team Programming with Big Data in R 84/131
120 Data Input Serial Data Input 7 Data Input Cluster Computer and File System Serial Data Input Parallel Data Input Summary pbdr Core Team Programming with Big Data in R
121 Data Input Serial Data Input Separate manual: scan() read.table() read.csv() socket pbdr Core Team Programming with Big Data in R 85/131
122 Data Input Serial Data Input CSV Data: Read Serial then Distribute Listing:
library(pbdDMAT)
if (comm.rank() == 0) {  # only read on process 0
  x <- read.csv("myfile.csv")
} else {
  x <- NULL
}
dx <- as.ddmatrix(x)
pbdr Core Team Programming with Big Data in R 86/131
123 Data Input Parallel Data Input 7 Data Input Cluster Computer and File System Serial Data Input Parallel Data Input Summary pbdr Core Team Programming with Big Data in R
124 Data Input Parallel Data Input New Issues How to read in parallel? CSV, SQL, NetCDF4, HDF, ADIOS, custom binary How to partition data across nodes? How to structure for scalable libraries? Read directly into form needed or restructure?... A lot of work needed here! pbdr Core Team Programming with Big Data in R 87/131
125 Data Input Parallel Data Input CSV Data Serial Code:
x <- read.csv("x.csv")
x
Parallel Code:
library(pbdDEMO, quiet = TRUE)
init.grid()
dx <- read.csv.ddmatrix("x.csv", header=TRUE, sep=",",
                        nrows=10, ncols=10, num.rdrs=2, ICTXT=0)
dx
finalize()
pbdr Core Team Programming with Big Data in R 88/131
126 Data Input Parallel Data Input Binary Data: Vector
## set up start and length for reading a vector of n doubles
size <- 8  # bytes
my_ids <- get.jid(n, method="block")
my_start <- (my_ids[1] - 1) * size
my_length <- length(my_ids)
con <- file("binary.vector.file", "rb")
seekval <- seek(con, where=my_start, rw="read")
x <- readBin(con, what="double", n=my_length, size=size)
pbdr Core Team Programming with Big Data in R 89/131
127 Data Input Parallel Data Input Binary Data: Matrix
## read an nrow by ncol matrix of doubles split by columns
size <- 8  # bytes
my_ids <- get.jid(ncol, method="block")
my_ncol <- length(my_ids)
my_start <- (my_ids[1] - 1) * nrow * size
my_length <- my_ncol * nrow
con <- file("binary.matrix.file", "rb")
seekval <- seek(con, where=my_start, rw="read")
x <- readBin(con, what="double", n=my_length, size=size)
## glue together as a column-block ddmatrix
gdim <- c(nrow, ncol)
ldim <- c(nrow, my_ncol)
bldim <- c(nrow, allreduce(my_ncol, op="max"))
X <- new("ddmatrix", Data=matrix(x, nrow, my_ncol),
         dim=gdim, ldim=ldim, bldim=bldim, ICTXT=1)
## redistribute for ScaLAPACK's block-cyclic
X <- redistribute(X, bldim=c(2, 2), ICTXT=0)
Xprc <- prcomp(X)
pbdr Core Team Programming with Big Data in R 90/131
128 Data Input Parallel Data Input NetCDF4 Data
### parallel read after determining start and length
nc <- nc_open_par(file_name)
nc_var_par_access(nc, "variable_name")
new.x <- ncvar_get(nc, "variable_name", start, length)
nc_close(nc)
finalize()
pbdr Core Team Programming with Big Data in R 91/131
129 Data Input Summary 7 Data Input Cluster Computer and File System Serial Data Input Parallel Data Input Summary pbdr Core Team Programming with Big Data in R
130 Data Input Summary Summary Mostly do it yourself Parallel file system for big data Binary files for true parallel reads Know number of readers vs number of storage servers Redistribution help from ddmatrix functions More help under development pbdr Core Team Programming with Big Data in R 92/131
131 Introduction to pbddmat and the ddmatrix Structure Contents 8 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices pbddmat Summary pbdr Core Team Programming with Big Data in R
132 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices 8 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices pbddmat Summary pbdr Core Team Programming with Big Data in R
133 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices Distributed Matrices You can only get so far with one node. [Plot: elapsed time (seconds) vs. cores for several 10000x10000 benchmark scripts: eigenvalues, inverse, linear regression (c = a \\ b'), cross product matrix (b = a' * a), determinant, and Cholesky decomposition] The solution is to distribute the data. pbdr Core Team Programming with Big Data in R 93/131
134 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices Distributed Matrices (a) Block (b) Cyclic (c) Block-Cyclic Figure : Matrix Distribution Schemes pbdr Core Team Programming with Big Data in R 94/131
135 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices Distributed Matrices (a) 2d Block (b) 2d Cyclic (c) 2d Block-Cyclic Figure : Matrix Distribution Schemes Onto a 2-Dimensional Grid pbdr Core Team Programming with Big Data in R 95/131
136 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices Processor Grid Shapes (a) 1 × 6 (b) 2 × 3 (c) 3 × 2 (d) 6 × 1 Table: Processor Grid Shapes with 6 Processors pbdr Core Team Programming with Big Data in R 96/131
137 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices The ddmatrix Class For distributed dense matrix objects, we use the special S4 class ddmatrix, with slots: Data (the local submatrix, an R matrix), dim (dimension of the global matrix), ldim (dimension of the local submatrix), bldim (dimension of the blocks), ICTXT (MPI grid context). pbdr Core Team Programming with Big Data in R 97/131
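A small sketch (not from the slides) of creating a ddmatrix and inspecting these slots; run under mpirun with pbdDMAT installed, and note that the exact local dimensions depend on the process grid:

library(pbdDMAT, quietly = TRUE)
init.grid()

# A 1000 x 10 distributed matrix of standard normals, 2x2 blocking
dx <- ddmatrix("rnorm", nrow = 1000, ncol = 10, bldim = c(2, 2))

comm.print(dx@dim)      # global dimension: 1000 10
comm.print(dx@bldim)    # blocking factor:  2 2
comm.print(dim(dx@Data), all.rank = TRUE)  # local dimensions differ by process

finalize()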
138 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices Understanding ddmatrix: Global Matrix [The global 9 × 9 matrix $x = (x_{ij})$, used in the following distribution examples] pbdr Core Team Programming with Big Data in R 98/131
139 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices ddmatrix: 1-dimensional Row Block [Diagram: the 9 × 9 matrix distributed in contiguous row blocks onto a 4 × 1 processor grid: (0,0), (1,0), (2,0), (3,0)] pbdr Core Team Programming with Big Data in R 99/131
140 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices ddmatrix: 2-dimensional Row Block [Diagram: the matrix distributed in blocks onto a 2 × 2 processor grid: (0,0), (0,1), (1,0), (1,1)] pbdr Core Team Programming with Big Data in R 100/131
141 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices ddmatrix: 1-dimensional Row Cyclic [Diagram: rows dealt out cyclically to a 4 × 1 processor grid: (0,0), (1,0), (2,0), (3,0)] pbdr Core Team Programming with Big Data in R 101/131
142 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices ddmatrix: 2-dimensional Row Cyclic [Diagram: the matrix distributed cyclically onto a 2 × 2 processor grid: (0,0), (0,1), (1,0), (1,1)] pbdr Core Team Programming with Big Data in R 102/131
143 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices ddmatrix: 2-dimensional Block-Cyclic [Diagram: blocks of the matrix dealt out cyclically onto a 2 × 2 processor grid: (0,0), (0,1), (1,0), (1,1)] pbdr Core Team Programming with Big Data in R 103/131
144 Introduction to pbddmat and the ddmatrix Structure pbddmat 8 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices pbddmat Summary pbdr Core Team Programming with Big Data in R
145 Introduction to pbddmat and the ddmatrix Structure pbddmat The ddmatrix Data Structure The more complicated the processor grid, the more complicated the distribution. pbdr Core Team Programming with Big Data in R 104/131
146 Introduction to pbddmat and the ddmatrix Structure pbddmat ddmatrix: 2-dimensional Block-Cyclic with 6 Processors [Diagram: the 9 × 9 matrix distributed block-cyclically onto a 2 × 3 processor grid: (0,0), (0,1), (0,2), (1,0), (1,1), (1,2)] pbdr Core Team Programming with Big Data in R 105/131
147 Introduction to pbddmat and the ddmatrix Structure pbddmat Understanding ddmatrix: Local View [Diagram: each of the 6 processors stores only its own blocks as a small dense local matrix; for example, the process at grid position (0,0) holds the entries of columns 1-2 and 7-8 from rows 1-2, 5-6, and 9] pbdr Core Team Programming with Big Data in R 106/131
148 Introduction to pbddmat and the ddmatrix Structure pbddmat The ddmatrix Data Structure 1 ddmatrix is distributed. No one processor owns all of the matrix. 2 ddmatrix is non-overlapping. Any piece owned by one processor is owned by no other processors. 3 ddmatrix can be row-contiguous or not, depending on the processor grid and blocking factor used. 4 ddmatrix is locally column-major and globally, it depends... 6 GBD is a generalization of the one-dimensional block ddmatrix distribution. Otherwise there is no relation. 7 ddmatrix is confusing, but very robust. pbdr Core Team Programming with Big Data in R 107/131
149 Introduction to pbddmat and the ddmatrix Structure pbddmat Pros and Cons of This Data Structure Pros Robust for matrix computations. Cons Confusing layout. This is why we hide most of the distributed details. The details are there if you want them (you don't want them). pbdr Core Team Programming with Big Data in R 108/131
150 Introduction to pbddmat and the ddmatrix Structure pbddmat Methods for class ddmatrix pbddmat has over 100 methods with identical syntax to R: `[`, rbind(), cbind(), ...; lm.fit(), prcomp(), cov(), ...; `%*%`, solve(), svd(), norm(), ...; median(), mean(), rowSums(), ... Serial Code: cov(x) Parallel Code: cov(x) pbdr Core Team Programming with Big Data in R 109/131
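To illustrate the identical syntax, here is a sketch (not from the slides) that runs the same statistical calls on a distributed matrix and gathers a small result back as an ordinary R matrix; the sizes are arbitrary:

library(pbdDMAT, quietly = TRUE)
init.grid()

dx <- ddmatrix("rnorm", nrow = 5000, ncol = 10)

# Same syntax as ordinary R matrices
cm <- colMeans(dx)       # distributed result
cv <- cov(dx)            # 10 x 10 distributed covariance
cp <- t(dx) %*% dx       # distributed crossproduct

# Bring a small distributed result back as a regular R matrix
cov_local <- as.matrix(cv)
comm.print(round(cov_local[1:3, 1:3], 3))

finalize()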
151 Introduction to pbddmat and the ddmatrix Structure pbddmat Comparing pbdmpi and pbddmat pbdmpi: MPI + sugar. GBD not the only structure pbdmpi can handle (just a useful convention). pbddmat: Distributed matrices + statistics. The ddmatrix structure must be used for pbddmat. If the data is not 2d block-cyclic compatible, ddmatrix will definitely give the wrong answer. pbdr Core Team Programming with Big Data in R 110/131
152 Introduction to pbddmat and the ddmatrix Structure Summary 8 Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices pbddmat Summary pbdr Core Team Programming with Big Data in R
153 Introduction to pbddmat and the ddmatrix Structure Summary Summary 1 Start by loading the package: library(pbdDMAT, quiet = TRUE) 2 Always initialize before starting and finalize when finished: init.grid() # ... finalize() pbdr Core Team Programming with Big Data in R 111/131
154 Contents Examples Using pbddmat 9 Examples Using pbddmat RandSVD Summary pbdr Core Team Programming with Big Data in R
155 Examples Using pbddmat RandSVD 9 Examples Using pbddmat RandSVD Summary pbdr Core Team Programming with Big Data in R
156 Examples Using pbddmat RandSVD Randomized SVD [Excerpt from Halko, Martinsson, and Tropp (2011): the prototype randomized SVD, which given an m × n matrix A and a target number k of singular vectors computes an approximate rank-2k factorization UΣV*, together with Algorithms 4.3 and 4.4 (randomized power / subspace iteration, with orthonormalization between applications of A and A* to control round-off error)] Serial R:
randsvd <- function(A, k, q=3)
{
  ## Stage A
  Omega <- matrix(rnorm(n*2*k), nrow=n, ncol=2*k)
  Y <- A %*% Omega
  Q <- qr.Q(qr(Y))
  At <- t(A)
  for (i in 1:q)
  {
    Y <- At %*% Q
    Q <- qr.Q(qr(Y))
    Y <- A %*% Q
    Q <- qr.Q(qr(Y))
  }
  ## Stage B
  B <- t(Q) %*% A
  U <- La.svd(B)$u
  U <- Q %*% U
  U[, 1:k]
}
Reference: Halko N, Martinsson P-G and Tropp J A 2011 Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53 pbdr Core Team Programming with Big Data in R 112/131
157 Examples Using pbddmat RandSVD Randomized SVD

Serial R

randsvd <- function(A, k, q=3)
{
  ## Stage A
  n <- ncol(A)    ## A is m x n
  Omega <- matrix(rnorm(n*2*k),
                  nrow=n, ncol=2*k)
  Y <- A %*% Omega
  Q <- qr.Q(qr(Y))
  At <- t(A)
  for (i in 1:q)
  {
    Y <- At %*% Q
    Q <- qr.Q(qr(Y))
    Y <- A %*% Q
    Q <- qr.Q(qr(Y))
  }

  ## Stage B
  B <- t(Q) %*% A
  U <- La.svd(B)$u
  U <- Q %*% U
  U[, 1:k]
}

Parallel pbdr

randsvd <- function(A, k, q=3)
{
  ## Stage A
  n <- ncol(A)    ## A is a distributed m x n ddmatrix
  Omega <- ddmatrix("rnorm",
                    nrow=n, ncol=2*k)
  Y <- A %*% Omega
  Q <- qr.Q(qr(Y))
  At <- t(A)
  for (i in 1:q)
  {
    Y <- At %*% Q
    Q <- qr.Q(qr(Y))
    Y <- A %*% Q
    Q <- qr.Q(qr(Y))
  }

  ## Stage B
  B <- t(Q) %*% A
  U <- La.svd(B)$u
  U <- Q %*% U
  U[, 1:k]
}

The only change from serial to parallel is how Omega is generated: as a distributed ddmatrix rather than a native R matrix. Everything else is the same code, now operating on distributed objects.

pbdr Core Team Programming with Big Data in R 113/131
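To see how the parallel version is driven, here is a minimal SPMD sketch (not from the slides; the matrix size, the rank k, and the script name are illustrative assumptions) that builds a distributed test matrix, calls randsvd() as defined above, and prints the dimension of the result from rank 0:

library(pbdMPI, quietly = TRUE)
library(pbdDMAT, quietly = TRUE)

init.grid()

n <- 1000                                    ## illustrative dimension
A <- ddmatrix("rnorm", nrow = n, ncol = n)   ## distributed N(0,1) test matrix

U <- randsvd(A, k = 10, q = 3)               ## approximate leading 10 left singular vectors
comm.print(dim(U))                           ## print once, from rank 0

finalize()

Saved as, say, randsvd_driver.r (a hypothetical name), it launches like any other pbdr script, e.g. mpirun -np 4 Rscript randsvd_driver.r.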
158 Examples Using pbddmat RandSVD Randomized SVD

[Benchmark plots: "30 Singular Vectors from a 100,000 by 1,000 Matrix" (Algorithm: full vs. randomized) and "Speedup of Randomized vs. Full SVD"; both plot Speedup against Cores.]

pbdr Core Team Programming with Big Data in R 114/131
159 Examples Using pbddmat Summary 9 Examples Using pbddmat RandSVD Summary pbdr Core Team Programming with Big Data in R
160 Examples Using pbddmat Summary Summary pbddmat makes distributed (dense) linear algebra easier. Can enable rapid prototyping at large scale. pbdr Core Team Programming with Big Data in R 115/131
161 Contents MPI Profiling 10 MPI Profiling Profiling with the pbdprof Package Installing pbdprof Example Summary pbdr Core Team Programming with Big Data in R
162 MPI Profiling Profiling with the pbdprof Package 10 MPI Profiling Profiling with the pbdprof Package Installing pbdprof Example Summary pbdr Core Team Programming with Big Data in R
163 MPI Profiling Profiling with the pbdprof Package Introduction to pbdprof A successful Google Summer of Code 2013 project. Available on CRAN. Enables profiling of R scripts that use MPI. pbdr packages are officially supported; it can work with others... Also reads, parses, and plots profiler outputs. pbdr Core Team Programming with Big Data in R 116/131
164 MPI Profiling Profiling with the pbdprof Package How it works MPI calls are intercepted (hijacked) by the profiler and logged as they happen. pbdr Core Team Programming with Big Data in R 117/131
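As a rough analogy in R (an illustration only, not how pbdprof works internally), one could wrap a single pbdMPI collective by hand to log its wall-clock time; pbdprof does the equivalent transparently at the MPI library level, for every call, on every rank:

library(pbdMPI, quietly = TRUE)
init()

## Hypothetical helper: time one allreduce and report the elapsed seconds.
timed.allreduce <- function(x, op = "sum")
{
  t0 <- proc.time()["elapsed"]
  out <- allreduce(x, op = op)                  ## the actual MPI collective
  comm.cat("allreduce took",
           proc.time()["elapsed"] - t0, "seconds\n")
  out
}

comm.print(timed.allreduce(rnorm(10)))
finalize()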
165 MPI Profiling Profiling with the pbdprof Package Introduction to pbdprof Currently supports the profilers fpmpi and mpip. fpmpi is distributed with pbdprof and installs easily, but offers minimal profiling capabilities. mpip is also fully supported, but you have to install and link it yourself. pbdr Core Team Programming with Big Data in R 118/131
166 MPI Profiling Installing pbdprof 10 MPI Profiling Profiling with the pbdprof Package Installing pbdprof Example Summary pbdr Core Team Programming with Big Data in R
167 MPI Profiling Installing pbdprof Installing pbdprof 1 Build pbdprof. 2 Rebuild pbdmpi (linking with pbdprof). 3 Run your analysis as usual. 4 Interactively analyze profiler outputs with pbdprof. This is explained at length in the pbdprof vignette. pbdr Core Team Programming with Big Data in R 119/131
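Step 1 is an ordinary CRAN install; a minimal sketch (note that the package name is case sensitive, and, as mentioned earlier, the fpmpi profiler is bundled with it):

install.packages("pbdPROF")   ## builds the bundled fpmpi backend

Steps 2 through 4 are covered on the following slides.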
168 MPI Profiling Installing pbdprof Rebuild pbdmpi

R CMD INSTALL pbdMPI_<version>.tar.gz --configure-args="--enable-pbdprof"

Any package which explicitly links with an MPI library must be rebuilt in this way (pbdmpi, Rmpi, ...). Other pbdr packages link with pbdmpi, and so do not need to be rebuilt. See the pbdprof vignette if something goes wrong.

pbdr Core Team Programming with Big Data in R 120/131
169 MPI Profiling Example 10 MPI Profiling Profiling with the pbdprof Package Installing pbdprof Example Summary pbdr Core Team Programming with Big Data in R
170 MPI Profiling Example An Example from pbddmat Compute SVD in pbddmat package. Profile MPI calls with mpip. pbdr Core Team Programming with Big Data in R 121/131
171 MPI Profiling Example Example Script

library(pbdMPI, quietly = TRUE)
library(pbdDMAT, quietly = TRUE)

init.grid()

n <- ...                          ## set the matrix dimension here
x <- ddmatrix("rnorm", n, n)
my.svd <- La.svd(x)

finalize()

my_svd.r

pbdr Core Team Programming with Big Data in R 122/131
172 MPI Profiling Example Example Script

Run example with 4 ranks:

$ mpirun -np 4 Rscript my_svd.r
mpiP:
mpiP: mpiP V3.3.0 (Build Sep /14:00:47)
mpiP: Direct questions and errors to [email protected]
mpiP:
Using 2x2 for the default grid size
mpiP:
mpiP: Storing mpiP output in [./R mpip].
mpiP:

pbdr Core Team Programming with Big Data in R 123/131
173 MPI Profiling Example Read Profiler Data into R

Interactively (or in batch)

Read in Profiler Data

library(pbdPROF)
prof.data <- read.prof("R mpip")

Partial Output of Example Data

> prof.data
An mpip profiler object:
[[1]]
  Task AppTime MPITime MPI*
  ...

[[2]]
  ID Lev File.Address Line_Parent_Funct MPI_Call
  ...  e+14  [unknown]  Allreduce
  ...  e+14  [unknown]  Bcast

pbdr Core Team Programming with Big Data in R 124/131
174 MPI Profiling Example Generate plots plot(prof.data) pbdr Core Team Programming with Big Data in R 125/131
175 MPI Profiling Example Generate plots plot(prof.data, plot.type="stats1") pbdr Core Team Programming with Big Data in R 126/131
176 MPI Profiling Example Generate plots plot(prof.data, plot.type="stats2") pbdr Core Team Programming with Big Data in R 127/131
177 MPI Profiling Example Generate plots plot(prof.data, plot.type="messages1") pbdr Core Team Programming with Big Data in R 128/131
178 MPI Profiling Example Generate plots plot(prof.data, plot.type="messages2") pbdr Core Team Programming with Big Data in R 129/131
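The same plots can also be produced in batch by wrapping the calls from the preceding slides in a graphics device; a minimal sketch (the output file name is hypothetical):

library(pbdPROF)
prof.data <- read.prof("R mpip")        ## the profiler output read in earlier

pdf("svd_profile.pdf")                  ## hypothetical output file
plot(prof.data)
plot(prof.data, plot.type = "stats1")
plot(prof.data, plot.type = "messages1")
dev.off()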
179 MPI Profiling Summary 10 MPI Profiling Profiling with the pbdprof Package Installing pbdprof Example Summary pbdr Core Team Programming with Big Data in R
180 MPI Profiling Summary Summary pbdprof offers tools for profiling R code that uses MPI. It builds fpmpi easily and also supports mpip. pbdr Core Team Programming with Big Data in R 130/131
181 Contents Wrapup 11 Wrapup pbdr Core Team Programming with Big Data in R
182 Wrapup Summary Profile your code to understand your bottlenecks. pbdr makes distributed parallelism with R easier by distributing data across multiple nodes. For truly large data, I/O must be parallel as well. pbdr Core Team Programming with Big Data in R
183 Wrapup The pbdr Project Our website: Email us at: Our Google group: Where to begin? Start with the pbddemo package and the pbddemo vignette: pbdr Core Team Programming with Big Data in R
184 Thanks for coming! Questions? Come see our poster on Wednesday at 5:30! pbdr Core Team Programming with Big Data in R