Programming with Big Data in R


Programming with Big Data in R
Drew Schmidt and George Ostrouchov
useR! 2014
http://r-pbd.org/tutorial

The pbdR Core Team
Wei-Chen Chen (1), George Ostrouchov (2,3), Pragneshkumar Patel (3), Drew Schmidt (3)
(1) Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville TN, USA
(2) Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge TN, USA
(3) Joint Institute for Computational Sciences, University of Tennessee, Knoxville TN, USA

Support: This work used resources of the National Institute for Computational Sciences at the University of Tennessee, Knoxville, which is supported by the Office of Cyberinfrastructure of the U.S. National Science Foundation under Award No. ARRA-NSF-OCI-0906324 for the NICS-RDAV center. This work also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

About This Presentation: Downloads
This presentation is available at: http://r-pbd.org/tutorial

About This Presentation: Installation Instructions
Installation instructions for setting up a pbdR environment are available at: http://r-pbd.org/install.html
This includes instructions for installing R, MPI, and pbdR.

Contents
1 Introduction
2 Profiling and Benchmarking
3 The pbdR Project
4 Introduction to pbdMPI
5 The Generalized Block Distribution
6 Basic Statistics Examples
7 Data Input
8 Introduction to pbdDMAT and the ddmatrix Structure
9 Examples Using pbdDMAT
10 MPI Profiling
11 Wrapup

Contents: Introduction
1 Introduction
  A Concise Introduction to Parallelism
  A Quick Overview of Parallel Hardware
  A Quick Overview of Parallel Software
  Summary

Parallelism
(Figures: Serial Programming vs. Parallel Programming.)

Difficulty in Parallelism
1 Implicit parallelism: parallel details are hidden from the user.
2 Explicit parallelism: some assembly required...
3 Embarrassingly parallel: also called loosely coupled. It is obvious how to parallelize; there is lots of independence between computations.
4 Tightly coupled: the opposite of embarrassingly parallel; lots of dependence between computations.

Speedup
Wallclock time: the time on the clock on the wall from start to finish.
Speedup: a unitless measure of improvement; more is better.

    S_{n1,n2} = (run time on n1 cores) / (run time on n2 cores)

n1 is often taken to be 1; in this case we are comparing a parallel algorithm to the serial algorithm.
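
A minimal numeric illustration of the speedup formula in R; the timings here are made-up placeholders:

    # hypothetical wallclock times (seconds) on 1 and 4 cores
    t1 <- 100
    t4 <- 30

    speedup <- t1 / t4         # S_{1,4} ~= 3.33
    efficiency <- speedup / 4  # fraction of ideal linear speedup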

Good Speedup / Bad Speedup
(Figure: two panels plotting Speedup against Cores (2 to 8) for an Application curve versus the Optimal line.)

Scalability and Benchmarking
1 Strong scaling: fixed total problem size. Less work per core as more cores are added.
2 Weak scaling: fixed local (per core) problem size. Same work per core as more cores are added.

Good Strong Scaling / Good Weak Scaling
(Figure: left panel plots Speedup against Cores (up to 60); right panel plots Time against Cores (4,000 to 16,000).)

Shared and Distributed Memory Machines
Shared memory: direct access to read/change memory (one node).
Distributed memory: no direct access to read/change memory (many nodes); requires communication.

Shared and Distributed Memory Machines
Shared memory machines: thousands of cores.
  Nautilus, University of Tennessee: 1,024 cores, 4 TB RAM.
Distributed memory machines: hundreds of thousands of cores.
  Kraken, University of Tennessee: 112,896 cores, 147 TB RAM.

Three Basic Flavors of Hardware
Distributed memory: processors (each with its own cache and memory) connected by an interconnection network.
Shared memory: cores (each with its own cache) sharing local memory.
Co-processor: a GPU (Graphical Processing Unit) or MIC (Many Integrated Core) attached over the network.

Your Laptop or Desktop
(Figure: the hardware diagram above, revisited.)

A Server or Cluster
(Figure: the hardware diagram above, revisited.)

Server to Supercomputer
(Figure: the hardware diagram above, revisited.)

Knowing the Right Words
Distributed memory: cluster, distributing.
Shared memory: multicore, multithreading.
Co-processor: GPU or manycore, offloading.

Native Programming Models and Tools
Distributed memory: focus on who owns what data and what communication is needed. Tools: sockets, MPI, Hadoop.
Co-processor: same task on blocks of data. Tools: CUDA, OpenCL, OpenACC.
Shared memory: focus on which tasks can be parallel. Tools: OpenMP, Pthreads, fork, OpenACC.

30+ Years of Parallel Computing Research
(Same models-and-tools diagram as above.)

Last 10 Years of Advances
(Same models-and-tools diagram as above.)

Putting It All Together: Challenge
(Same models-and-tools diagram as above.)

R Interfaces to Native Tools
Distributed memory: RHadoop (Hadoop), snow (sockets, MPI), Rmpi and pbdMPI (MPI).
Co-processor: foreign language interfaces: .C, .Call, Rcpp, OpenCL, inline, ...
Shared memory: multicore (fork).
Note: snow + multicore = parallel.

Summary
Three flavors of hardware: distributed is stable; multicore and co-processor are evolving.
Two memory models: distributed works in multicore.
Parallelism hierarchy: medium to big machines have all three.

Contents: Profiling and Benchmarking
2 Profiling and Benchmarking
  Why Profile?
  Profiling R Code
  Advanced R Profiling
  Summary

Performance and Accuracy
"Sometimes π = 3.14 is (a) infinitely faster than the correct answer and (b) the difference between the correct and the wrong answer is meaningless. ... The thing is, some specious value of 'correctness' is often irrelevant because it doesn't matter. While performance almost always matters. And I absolutely detest the fact that people so often dismiss performance concerns so readily." — Linus Torvalds, August 8, 2008

Why Profile?
Because performance matters.
Bad practices scale up!
Your bottlenecks may surprise you.
Because R is dumb.
R users claim to be data people... so act like it!

Compilers often correct bad behavior...
A really dumb loop:

    int main(){
      int x, i;
      for (i=0; i<10; i++)
        x = 1;
      return 0;
    }

clang -O3 example.c (the loop is optimized away entirely):

    main:
            xorl %eax, %eax
            ret

clang example.c (no optimization; the loop runs in full):

    main:
            movl $0, -4(%rsp)
            movl $0, -12(%rsp)
    .LBB0_1:
            cmpl $10, -12(%rsp)
            jge .LBB0_4
            movl $1, -8(%rsp)
            movl -12(%rsp), %eax
            addl $1, %eax
            movl %eax, -12(%rsp)
            jmp .LBB0_1
    .LBB0_4:
            movl $0, %eax
            ret

R will not!
Dumb loop:

    for (i in 1:n){
      ta <- t(A)
      Y <- ta %*% Q
      Q <- qr.Q(qr(Y))
      Y <- A %*% Q
      Q <- qr.Q(qr(Y))
    }
    Q

Better loop (the invariant t(A) is hoisted out of the loop):

    ta <- t(A)

    for (i in 1:n){
      Y <- ta %*% Q
      Q <- qr.Q(qr(Y))
      Y <- A %*% Q
      Q <- qr.Q(qr(Y))
    }
    Q

Example from a Real R Package
Excerpt from the original function:

    while (i <= N){
      for (j in 1:i){
        d.k <- as.matrix(x)[l==j, l==j]
        ...

Excerpt from the modified function:

    x.mat <- as.matrix(x)

    while (i <= N){
      for (j in 1:i){
        d.k <- x.mat[l==j, l==j]
        ...

By changing just one line of code (computing as.matrix(x) once instead of on every iteration), performance of the main method improved by over 350%!

Some Thoughts
R is slow. Bad programmers are slower.
R isn't very clever (compared to a compiler).
The bytecode compiler helps, but not nearly as much as a compiler.

Timings
Getting simple timings as a basic measure of performance is easy, and valuable.
system.time(): timing blocks of code.
Rprof(): timing execution of R functions.
Rprofmem(): reporting memory allocation in R.
tracemem(): detecting when a copy of an R object is created.
The rbenchmark package: benchmark comparisons.
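
As a quick illustration of these tools (the matrix sizes and iteration counts are arbitrary placeholders), timing and then profiling the transpose-inside-the-loop pattern from earlier:

    A <- matrix(rnorm(1000 * 50), nrow=1000)
    Q <- qr.Q(qr(matrix(rnorm(1000 * 5), nrow=1000)))

    # wallclock timing of a block of code
    system.time({
      for (i in 1:100){
        ta <- t(A)       # invariant, recomputed every iteration
        Y <- ta %*% Q
      }
    })

    # function-level profiling of the same loop
    Rprof("loop.out")
    for (i in 1:100) Y <- t(A) %*% Q
    Rprof(NULL)
    summaryRprof("loop.out")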

Other Profiling Tools
perf (Linux)
PAPI
MPI profiling: fpmpi, mpiP, TAU

Profiling MPI Codes with pbdPROF
1. Rebuild pbdR packages:

    R CMD INSTALL pbdMPI_0.2-1.tar.gz \
      --configure-args="--enable-pbdprof"

2. Run code:

    mpirun -np 64 Rscript my_script.R

3. Analyze results:

    library(pbdPROF)
    prof <- read.prof("output.mpip")
    plot(prof, plot.type="messages2")

Profiling with pbdPAPI
Performance Application Programming Interface (PAPI).
High- and low-level interfaces. Linux only :(

    Function               Description of Measurement
    system.flips()         Time, floating point instructions, and Mflips
    system.flops()         Time, floating point operations, and Mflops
    system.cache()         Cache misses, hits, accesses, and reads
    system.epc()           Events per cycle
    system.idle()          Idle cycles
    system.cpuormem()      CPU- or RAM-bound
    system.utilization()   CPU utilization

Summary
Profile, profile, profile.
Use system.time() to get a general sense of a method.
Use rbenchmark's benchmark() to compare two methods.
Use Rprof() for more detailed profiling.
Other tools exist for more hardcore applications (pbdPAPI and pbdPROF).
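
For instance, a minimal sketch of a benchmark() comparison of the two loop variants from earlier (the test names and replication count are placeholders):

    library(rbenchmark)

    A <- matrix(rnorm(2000 * 200), nrow=2000)

    benchmark(
      transpose_inside  = { for (i in 1:10) Y <- t(A) %*% A },
      transpose_outside = { tA <- t(A); for (i in 1:10) Y <- tA %*% A },
      replications = 10
    )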

Contents: The pbdR Project
3 The pbdR Project
  The pbdR Project
  pbdR Connects R to HPC Libraries
  Using pbdR
  Summary

Programming with Big Data in R (pbdR)
Striving for productivity, portability, performance.
Free* R packages.
Bridges high-performance compiled code with the high productivity of R.
Scalable, big data analytics.
Offers implicit and explicit parallelism.
Methods have syntax identical to R.
* MPL, BSD, and GPL licensed.

pbdR Packages
(Figure: the collection of pbdR packages.)

pbdR Motivation
Why HPC libraries (MPI, ScaLAPACK, PETSc, ...)?
The HPC community has been at this for decades. They're tested. They work. They're fast.
You're not going to beat Jack Dongarra at dense linear algebra.

Simple Interface for MPI Operations with pbdMPI
Rmpi:

    mpi.allreduce(x, type=1)  # int
    mpi.allreduce(x, type=2)  # double

pbdMPI:

    allreduce(x)

Types in R:

    > is.integer(1)
    [1] FALSE
    > is.integer(2)
    [1] FALSE
    > is.integer(1:2)
    [1] TRUE

Distributed Matrices and Statistics with pbdDMAT

    library(pbdDMAT)

    dx <- ddmatrix("rnorm", 5000, 5000)
    expm(dx)

(Figure: matrix exponentiation speedup relative to serial code, across 1 to 4 nodes and 4 to 64 cores.)

Distributed Matrices and Statistics with pbdDMAT

    x <- ddmatrix("rnorm", nrow=m, ncol=n)
    y <- ddmatrix("rnorm", nrow=m, ncol=1)
    mdl <- lm.fit(x=x, y=y)

(Figure: least squares benchmark fitting y ~ x with a fixed local size of ~43.4 MiB, for 500, 1000, and 2000 predictors, run times in seconds on 504 to 8064 cores.)

Getting Started with HPC for R Users: pbdDEMO
140-page, textbook-style vignette.
Over 30 demos, utilizing all packages. Not just a "hello world"!
Demos include: PCA, regression, parallel data input, model-based clustering, simple Monte Carlo simulation, Bayesian MCMC.

R and pbdR Interfaces to HPC Libraries
Distributed memory: pbdMPI over MPI; pbdDMAT over ScaLAPACK, PBLAS, and BLACS (cf. PETSc and Trilinos); pbdDEMO on top.
MPI profiling: pbdPROF over mpiP, fpmpi, and TAU; pbdPAPI over PAPI.
Shared memory: tuned BLAS such as MKL, LibSci, and ACML (cf. PLASMA and DPLASMA).
Co-processor (GPU or MIC): HiPLARM over MAGMA and cuBLAS.
Parallel I/O: pbdNCDF4 over NetCDF4; pbdADIOS over ADIOS.

pbdR Paradigms
pbdR programs are R programs! Differences:
Batch execution (non-interactive).
Parallel code utilizes the Single Program/Multiple Data (SPMD) style.
Emphasizes data parallelism.

Batch Execution
Running a serial R program in batch:

    Rscript my_script.r

or

    R CMD BATCH my_script.r

Running a parallel (with MPI) R program in batch:

    mpirun -np 2 Rscript my_par_script.r

Single Program/Multiple Data (SPMD)
SPMD is a programming paradigm; not to be confused with SIMD.
Paradigms are programming models: OOP, functional, SPMD, ...
SIMD refers to hardware instructions: MMX, SSE, ...

Single Program/Multiple Data (SPMD)
SPMD is arguably the simplest extension of serial programming.
Only one program is written, executed in batch on all processors.
Different processors are autonomous; there is no manager.
It has been the dominant programming model for large machines for 30 years.

Summary
pbdR connects R to scalable HPC libraries.
The pbdDEMO package offers numerous examples and explanations for getting started with distributed R programming.
pbdR programs are R programs.

Contents: Introduction to pbdMPI
4 Introduction to pbdMPI
  Managing a Communicator
  Reduce, Gather, Broadcast, and Barrier
  Other pbdMPI Tools
  Summary

Message Passing Interface (MPI)
MPI: a standard for managing communications (data and instructions) between different nodes/computers.
Implementations: OpenMPI, MPICH2, Cray MPT, ...
Enables parallelism (via communication) on distributed machines.
Communicator: manages communications between processors.

MPI Operations (1 of 2)
Managing a communicator: create and destroy communicators.
  init(): initialize communicator.
  finalize(): shut down communicator(s).
Rank query: determine the processor's position in the communicator.
  comm.rank(): who am I?
  comm.size(): how many of us are there?
Printing: printing output from various ranks.
  comm.print(x), comm.cat(x)
  WARNING: only use these functions on results, never on yet-to-be-computed expressions.

Quick Example 1: Rank Query (1_rank.r)

    library(pbdMPI, quietly = TRUE)
    init()

    my.rank <- comm.rank()
    comm.print(my.rank, all.rank = TRUE)

    finalize()

Execute this script via:

    mpirun -np 2 Rscript 1_rank.r

Sample output:

    COMM.RANK = 0
    [1] 0
    COMM.RANK = 1
    [1] 1

Quick Example 2: Hello World (2_hello.r)

    library(pbdMPI, quietly = TRUE)
    init()

    comm.print("Hello, world")

    comm.print("Hello again", all.rank=TRUE, quietly=TRUE)

    finalize()

Execute this script via:

    mpirun -np 2 Rscript 2_hello.r

Sample output:

    COMM.RANK = 0
    [1] "Hello, world"
    [1] "Hello again"
    [1] "Hello again"

MPI Operations
1 Reduce
2 Gather
3 Broadcast
4 Barrier

Reductions: combine results into a single result.

Gather: many-to-one.

Broadcast: one-to-many.

Barrier: synchronization.

MPI Operations (2 of 2)
Reduction: each processor has a number x; add all of them up, find the largest/smallest, ...
  reduce(x, op="sum"): reduce to one.
  allreduce(x, op="sum"): reduce to all.
Gather: each processor has a number; create a new object on some processor containing all of those numbers.
  gather(x): gather to one.
  allgather(x): gather to all.
Broadcast: one processor has a number x that every other processor should also have.
  bcast(x)
Barrier: a computation "wall"; no processor can proceed until all processors can proceed.
  barrier()

Quick Example 3: Reduce and Gather (3_gt.r)

    library(pbdMPI, quietly=TRUE)
    init()

    comm.set.seed(diff=TRUE)

    n <- sample(1:10, size=1)

    gt <- gather(n)
    comm.print(unlist(gt))

    sm <- allreduce(n, op="sum")
    comm.print(sm, all.rank=TRUE)

    finalize()

Execute this script via:

    mpirun -np 2 Rscript 3_gt.r

Sample output:

    COMM.RANK = 0
    [1] 2 8
    COMM.RANK = 0
    [1] 10
    COMM.RANK = 1
    [1] 10

Quick Example 4: Broadcast (4_bcast.r)

    library(pbdMPI, quietly=TRUE)
    init()

    if (comm.rank() == 0){
      x <- matrix(1:4, nrow=2)
    } else {
      x <- NULL
    }

    y <- bcast(x, rank.source=0)

    comm.print(y, rank=1)

    finalize()

Execute this script via:

    mpirun -np 2 Rscript 4_bcast.r

Sample output:

    COMM.RANK = 1
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4

Random Seeds
pbdMPI offers a simple interface for managing random seeds:
comm.set.seed(seed=1234, diff=FALSE): all processors use the same seed.
comm.set.seed(seed=1234, diff=TRUE): processors use different (parallel) seeds.

Other Helper Tools
pbdMPI also contains useful tools for manager/worker and task parallelism codes:
Task subsetting: get.jid(n) distributes a list of n jobs/tasks.
Functions in the *ply family:
  pbdApply(X, MARGIN, FUN, ...): analogue of apply().
  pbdLapply(X, FUN, ...): analogue of lapply().
  pbdSapply(X, FUN, ...): analogue of sapply().
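
As a minimal task-parallelism sketch using get.jid() (the per-task work below is a made-up placeholder), each rank computes only its own subset of tasks and the results are gathered at the end:

    library(pbdMPI, quietly=TRUE)
    init()

    ntasks <- 100
    my_ids <- get.jid(ntasks)    # this rank's share of tasks 1..ntasks

    # placeholder per-task work: square the task id
    my_results <- lapply(my_ids, function(i) i^2)

    # collect everyone's results on all ranks
    all_results <- unlist(allgather(my_results))
    comm.print(length(all_results))    # 100

    finalize()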

Summary
Start by loading the package:

    library(pbdMPI, quiet = TRUE)

Always initialize before starting and finalize when finished:

    init()

    # ...

    finalize()

Contents: The Generalized Block Distribution
5 The Generalized Block Distribution
  GBD: a Way to Distribute Your Data
  Example GBD Distributions
  Summary

Distributing Data
Problem: how should we distribute a 10x3 matrix x = [x_ij] (rows x_{1,*} through x_{10,*}) across processors?

Distributing a Matrix Across 4 Processors: Block Distribution
(Figure: the 10x3 matrix split by rows into four contiguous blocks, one per processor 0-3.)

Distributing a Matrix Across 4 Processors: Local Load Balance
(Figure: the 10x3 matrix split by rows so that each of processors 0-3 owns nearly the same number of rows.)

The GBD Data Structure
Throughout the examples, we will make use of the Generalized Block Distribution, or GBD, distributed matrix structure.
1 GBD is distributed. No processor owns all of the data.
2 GBD is non-overlapping. Rows are uniquely assigned to processors.
3 GBD is row-contiguous. If a processor owns one element of a row, it owns the entire row.
4 GBD is globally row-major, locally column-major.
5 GBD is often locally balanced, where each processor owns (almost) the same amount of data; but this is not required.
6 The last row of the local storage of a processor is adjacent (by global row) to the first row of the local storage of the next processor (by communicator number) that owns data.
7 GBD is (relatively) easy to understand, but can lead to bottlenecks if you have many more columns than rows.
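
As a minimal sketch of building a locally balanced GBD matrix (the data is a placeholder, constructed so that x_ij = i + j/10 makes the global row indices visible), get.jid() handles the row bookkeeping:

    library(pbdMPI, quietly=TRUE)
    init()

    N <- 10; p <- 3
    my_rows <- get.jid(N, method="block")   # contiguous row indices for this rank

    # each rank holds only its own rows
    X.gbd <- matrix(rep(my_rows, p), ncol=p) + rep((1:p)/10, each=length(my_rows))

    comm.cat("rank", comm.rank(), "owns rows", range(my_rows), "\n", all.rank=TRUE)

    finalize()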

Understanding GBD: Global Matrix
(A 9x9 global matrix x = [x_ij], to be distributed across 6 processors, numbered 0 through 5.)

Understanding GBD: Load-Balanced GBD
(Figure: rows 1-2, 3-4, and 5-6 are owned by processors 0, 1, and 2; rows 7, 8, and 9 by processors 3, 4, and 5.)

Understanding GBD: Local View
(Figure: the local submatrices are 2x9 on processors 0-2 and 1x9 on processors 3-5.)

Understanding GBD: Non-Balanced GBD
(Figure: the same 9x9 matrix with rows assigned unevenly; some processors own several rows, others none.)

Understanding GBD: Local View
(Figure: the local submatrices are 0x9, 4x9, 2x9, 1x9, 0x9, and 2x9 on processors 0 through 5; processors 0 and 4 own no data.)

Summary
Need to distribute your data? Try splitting it by row.
This may not work well if your data is square (or wider than it is tall).

Contents: Basic Statistics Examples
6 Basic Statistics Examples
  pbdMPI Example: Monte Carlo Simulation
  pbdMPI Example: Sample Covariance
  pbdMPI Example: Linear Regression
  Summary

Example 1: Monte Carlo Simulation
Sample N uniform observations (x_i, y_i) in the unit square [0,1] x [0,1]. Then

    π ≈ 4 * (# inside circle / # total) = 4 * (# blue / (# blue + # red))

(Figure: random points in the unit square, colored blue inside the quarter circle and red outside.)

Example 1: Monte Carlo Simulation — GBD Algorithm
1 Let n be big-ish; we'll take n = 50,000.
2 Generate an n x 2 matrix x of standard uniform observations.
3 Count the number of rows satisfying x^2 + y^2 <= 1.
4 Ask everyone else what their answer is; sum it all up.
5 Take this new answer, multiply by 4, and divide by n.
6 If my rank is 0, print the result.

Example 1: Monte Carlo Simulation Code
Serial code:

    N <- 50000
    X <- matrix(runif(N * 2), ncol=2)
    r <- sum(rowSums(X^2) <= 1)
    PI <- 4*r/N
    print(PI)

Parallel code:

    library(pbdMPI, quiet = TRUE)
    init()
    comm.set.seed(seed=1234567, diff = TRUE)

    N.gbd <- 50000 / comm.size()
    X.gbd <- matrix(runif(N.gbd * 2), ncol = 2)
    r.gbd <- sum(rowSums(X.gbd^2) <= 1)
    r <- allreduce(r.gbd)
    PI <- 4*r/(N.gbd * comm.size())
    comm.print(PI)

    finalize()

Note
For the remainder, we will exclude the library loading, init(), and finalize() calls.

Example 2: Sample Covariance

    cov(X_{n x p}) = (1 / (n-1)) * Σ_{i=1}^{n} (x_i − μ_x)(x_i − μ_x)^T

Example 2: Sample Covariance — GBD Algorithm
1 Determine the total number of rows N.
2 Compute the vector of column means of the full matrix.
3 Subtract each column's mean from that column's entries in each local matrix.
4 Compute the crossproduct locally and reduce.
5 Divide by N-1.

Example 2: Sample Covariance Code
Serial code:

    N <- nrow(X)
    mu <- colSums(X) / N

    X <- sweep(X, STATS=mu, MARGIN=2)
    Cov.X <- crossprod(X) / (N-1)

    print(Cov.X)

Parallel code:

    N <- allreduce(nrow(X.gbd), op="sum")
    mu <- allreduce(colSums(X.gbd) / N, op="sum")

    X.gbd <- sweep(X.gbd, STATS=mu, MARGIN=2)
    Cov.X <- allreduce(crossprod(X.gbd), op="sum") / (N-1)

    comm.print(Cov.X)

Example 3: Linear Regression
Find β such that

    y = Xβ + ε

When X is full rank,

    β̂ = (X^T X)^{-1} X^T y

Example 3: Linear Regression — GBD Algorithm
1 Locally, compute tX = X^T.
2 Locally, compute A = tX %*% X. Query every other processor for this result and sum up all the results.
3 Locally, compute B = tX %*% y. Query every other processor for this result and sum up all the results.
4 Locally, compute A^{-1} B.

Example 3: Linear Regression Code
Serial code:

    tX <- t(X)
    A <- tX %*% X
    B <- tX %*% y

    ols <- solve(A) %*% B

Parallel code:

    tX.gbd <- t(X.gbd)
    A <- allreduce(tX.gbd %*% X.gbd, op = "sum")
    B <- allreduce(tX.gbd %*% y.gbd, op = "sum")

    ols <- solve(A) %*% B

Summary
SPMD programming is (often) a natural extension of serial programming.
More pbdMPI examples can be found in pbdDEMO.

Contents: Data Input
7 Data Input
  Cluster Computer and File System
  Serial Data Input
  Parallel Data Input
  Summary

One Node to One Storage Server
(Figure: a cluster computer whose compute nodes connect to a file system with a single storage server and disk; one node reads.)

Many Nodes to One Storage Server
(Figure: many compute nodes all reading through the single storage server.)

Many Nodes to Few Storage Servers
(Figure: compute nodes reading from a parallel file system with several storage servers and disks.)

Few Nodes to Few Storage Servers - Default
(Figure: designated I/O nodes sit between the compute nodes and the parallel file system's storage servers.)

Few Nodes to Few Storage Servers - Coordinated
(Figure: I/O nodes coordinated through ADIOS/pbdADIOS between the compute nodes and the parallel file system.)

Serial Data Input
Covered in a separate manual: http://r-project.org/
scan()
read.table()
read.csv()
socket connections

CSV Data: Read Serially, then Distribute

    library(pbdDMAT)
    if (comm.rank() == 0){  # only read on process 0
      x <- read.csv("myfile.csv")
    } else {
      x <- NULL
    }

    dx <- as.ddmatrix(x)

New Issues
How to read in parallel? CSV, SQL, NetCDF4, HDF, ADIOS, custom binary.
How to partition data across nodes?
How to structure for scalable libraries? Read directly into the form needed, or restructure?
... A lot of work is needed here!

CSV Data
Serial code:

    x <- read.csv("x.csv")

    x

Parallel code:

    library(pbdDEMO, quiet = TRUE)
    init.grid()

    dx <- read.csv.ddmatrix("x.csv", header=TRUE, sep=",",
                            nrows=10, ncols=10, num.rdrs=2, ICTXT=0)

    dx

    finalize()

Binary Data: Vector

    ## set up start and length for reading a vector of n doubles
    size <- 8  # bytes

    my_ids <- get.jid(n, method="block")

    my_start <- (my_ids[1] - 1)*size
    my_length <- length(my_ids)

    con <- file("binary.vector.file", "rb")
    seekval <- seek(con, where=my_start, rw="read")
    x <- readBin(con, what="double", n=my_length, size=size)

Binary Data: Matrix

    ## read an nrow by ncol matrix of doubles split by columns
    size <- 8  # bytes

    my_ids <- get.jid(ncol, method="block")
    my_ncol <- length(my_ids)
    my_start <- (my_ids[1] - 1)*nrow*size
    my_length <- my_ncol*nrow

    con <- file("binary.matrix.file", "rb")
    seekval <- seek(con, where=my_start, rw="read")
    x <- readBin(con, what="double", n=my_length, size=size)

    ## glue together as a column-block ddmatrix
    gdim <- c(nrow, ncol)
    ldim <- c(nrow, my_ncol)
    bldim <- c(nrow, allreduce(my_ncol, op="max"))
    X <- new("ddmatrix", Data=matrix(x, nrow, my_ncol),
             dim=gdim, ldim=ldim, bldim=bldim, ICTXT=1)

    ## redistribute for ScaLAPACK's block-cyclic format
    X <- redistribute(X, bldim=c(2, 2), ICTXT=0)
    Xprc <- prcomp(X)

NetCDF4 Data

    ### parallel read after determining start and length
    nc <- nc_open_par(file_name)

    nc_var_par_access(nc, "variable_name")
    new.x <- ncvar_get(nc, "variable_name", start, length)
    nc_close(nc)

    finalize()

Summary
Mostly do it yourself for now.
Use a parallel file system for big data.
Use binary files for true parallel reads.
Know the number of readers versus the number of storage servers.
Redistribution help is available from the ddmatrix functions.
More help is under development.

Contents: Introduction to pbdDMAT and the ddmatrix Structure
8 Introduction to pbdDMAT and the ddmatrix Structure
  Introduction to Distributed Matrices
  pbdDMAT
  Summary

Distributed Matrices
You can only get so far with one node...
(Figure: elapsed time versus cores (1 to 256) for several benchmarks on 10000x10000 matrices: eigenvalues, inverse, linear regression (c = a \\ b'), cross product (b = a' * a), determinant, and Cholesky decomposition.)
The solution is to distribute the data.

Distributed Matrices
Figure: matrix distribution schemes: (a) block, (b) cyclic, (c) block-cyclic.

Distributed Matrices
Figure: matrix distribution schemes onto a 2-dimensional grid: (a) 2d block, (b) 2d cyclic, (c) 2d block-cyclic.

Processor Grid Shapes
Table: processor grid shapes with 6 processors: (a) 1x6, (b) 2x3, (c) 3x2, (d) 6x1.

The ddmatrix Class
For distributed dense matrix objects, we use the special S4 class ddmatrix, with slots:

    Data    the local submatrix (an R matrix)
    dim     dimension of the global matrix
    ldim    dimension of the local submatrix
    bldim   dimension of the blocks
    ICTXT   MPI grid context
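
A minimal sketch of creating and inspecting a ddmatrix (run under mpirun; the dimensions and blocking factor here are arbitrary):

    library(pbdDMAT, quietly=TRUE)
    init.grid()

    # a 100x100 distributed matrix of N(0,1) values with 2x2 blocking
    dx <- ddmatrix("rnorm", nrow=100, ncol=100, bldim=c(2, 2))

    comm.print(dim(dx))                  # global dimension: 100 100
    comm.print(dx@ldim, all.rank=TRUE)   # each rank's local dimension

    finalize()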

Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices Understanding ddmatrix: Global Matrix x = x 11 x 12 x 13 x 14 x 15 x 16 x 17 x 18 x 19 x 21 x 22 x 23 x 24 x 25 x 26 x 27 x 28 x 29 x 31 x 32 x 33 x 34 x 35 x 36 x 37 x 38 x 39 x 41 x 42 x 43 x 44 x 45 x 46 x 47 x 48 x 49 x 51 x 52 x 53 x 54 x 55 x 56 x 57 x 58 x 59 x 61 x 62 x 63 x 64 x 65 x 66 x 67 x 68 x 69 x 71 x 72 x 73 x 74 x 75 x 76 x 77 x 78 x 79 x 81 x 82 x 83 x 84 x 85 x 86 x 87 x 88 x 89 x 91 x 92 x 93 x 94 x 95 x 96 x 97 x 98 x 99 9 9 http://r-pbd.org/tutorial pbdr Core Team Programming with Big Data in R 98/131

Introduction to pbddmat and the ddmatrix Structure Introduction to Distributed Matrices ddmatrix: 1-dimensional Row Block x = x 11 x 12 x 13 x 14 x 15 x 16 x 17 x 18 x 19 x 21 x 22 x 23 x 24 x 25 x 26 x 27 x 28 x 29 x 31 x 32 x 33 x 34 x 35 x 36 x 37 x 38 x 39 x 41 x 42 x 43 x 44 x 45 x 46 x 47 x 48 x 49 x 51 x 52 x 53 x 54 x 55 x 56 x 57 x 58 x 59 x 61 x 62 x 63 x 64 x 65 x 66 x 67 x 68 x 69 x 71 x 72 x 73 x 74 x 75 x 76 x 77 x 78 x 79 x 81 x 82 x 83 x 84 x 85 x 86 x 87 x 88 x 89 x 91 x 92 x 93 x 94 x 95 x 96 x 97 x 98 x 99 Processor grid = 0 1 2 3 = (0,0) (1,0) (2,0) (3,0) 9 9 http://r-pbd.org/tutorial pbdr Core Team Programming with Big Data in R 99/131

ddmatrix: 2-dimensional Block

The 9x9 matrix x is split into contiguous submatrices over the 2x2 processor grid: the grid blocks the rows and the columns simultaneously, one submatrix per processor.

    [ 0 1 ]   [ (0,0) (0,1) ]
    [ 2 3 ] = [ (1,0) (1,1) ]

ddmatrix: 1-dimensional Row Cyclic

The rows of the 9x9 matrix x are dealt round-robin over the 4x1 processor grid [0; 1; 2; 3] = [(0,0); (1,0); (2,0); (3,0)]: processor 0 gets rows 1, 5, 9; processor 1 gets rows 2, 6; processor 2 gets rows 3, 7; processor 3 gets rows 4, 8.

ddmatrix: 2-dimensional Cyclic

The rows and columns of the 9x9 matrix x are dealt round-robin over the 2x2 processor grid [0 1; 2 3] = [(0,0) (0,1); (1,0) (1,1)]: entry x_ij goes to processor ((i-1) mod 2, (j-1) mod 2).

ddmatrix: 2-dimensional Block-Cyclic

Blocks of the 9x9 matrix x (e.g., 2x2 blocks) are dealt round-robin in both dimensions over the 2x2 processor grid [0 1; 2 3] = [(0,0) (0,1); (1,0) (1,1)]. Block and cyclic layouts are the two extreme special cases of block-cyclic: maximal blocks and 1x1 blocks, respectively.

pbddmat

The ddmatrix Data Structure

The more complicated the processor grid, the more complicated the distribution.

ddmatrix: 2-dimensional Block-Cyclic with 6 Processors

The 9x9 matrix x is distributed with 2x2 blocks dealt round-robin in both dimensions over the 2x3 processor grid:

    [ 0 1 2 ]   [ (0,0) (0,1) (0,2) ]
    [ 3 4 5 ] = [ (1,0) (1,1) (1,2) ]

Understanding ddmatrix: Local View

With the 2x3 processor grid above and 2x2 blocks, each processor stores its piece of the 9x9 matrix as a single local submatrix:

    (0,0): rows 1,2,5,6,9, columns 1,2,7,8  (5 x 4)
    (1,0): rows 3,4,7,8,   columns 1,2,7,8  (4 x 4)
    (0,1): rows 1,2,5,6,9, columns 3,4,9    (5 x 3)
    (1,1): rows 3,4,7,8,   columns 3,4,9    (4 x 3)
    (0,2): rows 1,2,5,6,9, columns 5,6      (5 x 2)
    (1,2): rows 3,4,7,8,   columns 5,6      (4 x 2)
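The mapping from a global entry to its owner can be worked out directly. This small helper (hypothetical, not part of pbddmat) implements the standard ScaLAPACK-style block-cyclic mapping and reproduces the layout above:

    # Which process (row, col) owns global entry (i, j) under a
    # 2-d block-cyclic distribution with the given blocking and grid?
    owner <- function(i, j, bldim = c(2, 2), grid = c(2, 3)) {
      prow <- ((i - 1) %/% bldim[1]) %% grid[1]
      pcol <- ((j - 1) %/% bldim[2]) %% grid[2]
      c(prow, pcol)
    }

    owner(5, 7)  # returns c(0, 0): x_57 lives on process (0,0), as in the local view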

The ddmatrix Data Structure

1. ddmatrix is distributed. No one processor owns all of the matrix.
2. ddmatrix is non-overlapping. Any piece owned by one processor is owned by no other processor.
3. ddmatrix can be row-contiguous or not, depending on the processor grid and blocking factor used.
4. ddmatrix is locally column-major; globally, it depends.
6. GBD is a generalization of the one-dimensional block ddmatrix distribution. Otherwise there is no relation.
7. ddmatrix is confusing, but very robust.

Pros and Cons of This Data Structure

Pros: Robust for matrix computations.
Cons: Confusing layout.

This is why we hide most of the distributed details. The details are there if you want them (you don't want them).

Methods for Class ddmatrix

pbddmat has over 100 methods with identical syntax to R:

- `[`, rbind(), cbind(), ...
- lm.fit(), prcomp(), cov(), ...
- `%*%`, solve(), svd(), norm(), ...
- median(), mean(), rowSums(), ...

Serial code and parallel code are identical:

    cov(x)   # serial: x is a matrix; parallel: x is a ddmatrix
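A sketch of the resulting workflow (assumes an initialized grid; cov() and prcomp() are among the methods listed above, called exactly as for an ordinary matrix):

    # Launch with: mpirun -np 4 Rscript methods.r
    library(pbddmat, quietly = TRUE)
    init.grid()

    x <- ddmatrix("rnorm", nrow = 10000, ncol = 50)

    cv  <- cov(x)     # 50x50 distributed covariance matrix
    pca <- prcomp(x)  # principal components, same syntax as serial R

    finalize()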

Comparing pbdmpi and pbddmat

pbdmpi: MPI + sugar.
- GBD is not the only structure pbdmpi can handle (just a useful convention).

pbddmat: distributed matrices + statistics.
- The ddmatrix structure must be used with pbddmat. If your data is not 2-d block-cyclic compatible, then ddmatrix is definitely the wrong structure for it.
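If your data already lives in a GBD row-block layout, it can be converted to a ddmatrix rather than re-read. A hypothetical sketch, assuming the gbd2dmat() conversion helper from the pbddemo package:

    # Launch with: mpirun -np 4 Rscript convert.r
    library(pbddemo, quietly = TRUE)  # loads pbdmpi and pbddmat
    init.grid()

    x.gbd <- matrix(rnorm(10 * 3), ncol = 3)  # each rank owns a row block
    x <- gbd2dmat(x.gbd)                      # now a 2-d block-cyclic ddmatrix

    comm.print(dim(x))  # global dimension of the assembled matrix
    finalize()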

Summary

Summary

1. Start by loading the package:

    library(pbddmat, quiet = TRUE)

2. Always initialize before starting and finalize when finished:

    init.grid()

    # ...

    finalize()

9 Examples Using pbddmat

Contents:
- RandSVD
- Summary

RandSVD

Randomized SVD

The algorithm is the randomized SVD of Halko, Martinsson, and Tropp. Given an m x n matrix A, a target number k of singular vectors, and an exponent q (say, q = 1 or q = 2), it computes an approximate rank-2k factorization $U \Sigma V^*$, where U and V are orthonormal and $\Sigma$ is nonnegative and diagonal.

Stage A:
1. Generate an n x 2k Gaussian test matrix $\Omega$.
2. Form $Y = (AA^*)^q A\Omega$ by multiplying alternately with A and $A^*$.
3. Construct a matrix Q whose columns form an orthonormal basis for the range of Y, e.g., via the QR factorization Y = QR.

Stage B:
4. Form $B = Q^* A$.
5. Compute an SVD of the small matrix: $B = \tilde{U} \Sigma V^*$.
6. Set $U = Q \tilde{U}$.

Note: the computation of Y in step 2 is vulnerable to round-off errors. When high accuracy is required, an orthonormalization step is incorporated between each application of A and $A^*$ (randomized subspace iteration).

Serial R implementation:

    randsvd <- function(A, k, q=3)
    {
      ## Stage A
      Omega <- matrix(rnorm(n*2*k), nrow=n, ncol=2*k)  # n taken from the enclosing scope
      Y <- A %*% Omega
      Q <- qr.Q(qr(Y))
      At <- t(A)
      for (i in 1:q)
      {
        Y <- At %*% Q
        Q <- qr.Q(qr(Y))
        Y <- A %*% Q
        Q <- qr.Q(qr(Y))
      }

      ## Stage B
      B <- t(Q) %*% A
      U <- La.svd(B)$u
      U <- Q %*% U
      U[, 1:k]
    }

Reference: Halko N., Martinsson P.-G., and Tropp J. A. (2011), "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions," SIAM Rev. 53, 217-288.

Randomized SVD

The parallel pbdr version of randsvd() is identical to the serial code above except for the construction of Omega, which becomes a ddmatrix:

Serial R:

    Omega <- matrix(rnorm(n*2*k), nrow=n, ncol=2*k)

Parallel pbdr:

    Omega <- ddmatrix("rnorm", nrow=n, ncol=2*k)

Every other line of randsvd() runs unchanged on the distributed matrices.
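A hypothetical driver for the parallel version. Note that, as written on the slide, randsvd() reads n from the enclosing scope, so the caller must define it:

    # Launch with: mpirun -np 4 Rscript randsvd_demo.r
    library(pbddmat, quietly = TRUE)
    init.grid()

    n <- 1000
    A <- ddmatrix("rnorm", nrow = n, ncol = n)

    U <- randsvd(A, k = 30, q = 3)  # leading 30 approximate left singular vectors

    finalize()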

Randomized SVD

[Figure: "30 Singular Vectors from a 100,000 by 1,000 Matrix" -- two panels: speedup vs. cores (1 to 128) for the full and randomized SVD algorithms, and the speedup of the randomized over the full SVD.]

Summary

Summary

- pbddmat makes distributed (dense) linear algebra easier.
- It can enable rapid prototyping at large scale.

10 MPI Profiling

Contents:
- Profiling with the pbdprof Package
- Installing pbdprof
- Example
- Summary

Profiling with the pbdprof Package

Introduction to pbdprof

- A successful Google Summer of Code 2013 project.
- Available on CRAN.
- Enables profiling of R scripts that use MPI.
- pbdr packages are officially supported; it can work with others.
- Also reads, parses, and plots profiler outputs.

How It Works

MPI calls are intercepted by the profiler and logged.

Introduction to pbdprof

pbdprof currently supports the profilers fpmpi and mpip:

- fpmpi is distributed with pbdprof and installs easily, but offers minimal profiling capabilities.
- mpip is also fully supported, but you have to install and link it yourself.

Installing pbdprof

Installing pbdprof

1. Build pbdprof.
2. Rebuild pbdmpi (linking with pbdprof).
3. Run your analysis as usual.
4. Interactively analyze the profiler outputs with pbdprof.

This is explained at length in the pbdprof vignette.

Rebuild pbdmpi

    R CMD INSTALL pbdmpi_0.2-2.tar.gz --configure-args="--enable-pbdprof"

Any package which explicitly links with an MPI library must be rebuilt in this way (pbdmpi, Rmpi, ...). Other pbdr packages link with pbdmpi, and so do not need to be rebuilt. See the pbdprof vignette if something goes wrong.

Example

An Example from pbddmat

- Compute an SVD with the pbddmat package.
- Profile the MPI calls with mpip.

Example Script: my_svd.r

    library(pbdmpi, quietly = TRUE)
    library(pbddmat, quietly = TRUE)
    init.grid()

    n <- 1000
    x <- ddmatrix("rnorm", n, n)

    my.svd <- La.svd(x)

    finalize()

Run the example with 4 ranks:

    $ mpirun -np 4 Rscript my_svd.r
    mpip:
    mpip: mpiP V3.3.0 (Build Sep 23 2013/14:00:47)
    mpip: Direct questions and errors to mpip-help@lists.sourceforge.net
    mpip:
    Using 2x2 for the default grid size
    mpip:
    mpip: Storing mpip output in [./R.4.5944.1.mpip].
    mpip:

Read Profiler Data into R

Interactively (or in batch), read in the profiler data:

    library(pbdprof)
    prof.data <- read.prof("R.4.28812.1.mpip")

Partial output of the example data:

    > prof.data
    An mpip profiler object:
    [[1]]
      Task AppTime MPITime MPI.
    1    0    5.71  0.0387 0.68
    2    1    5.70  0.0297 0.52
    3    2    5.71  0.0540 0.95
    4    3    5.71  0.0355 0.62
    5    *   22.80  0.1580 0.69

    [[2]]
      ID Lev File.Address Line_Parent_Funct  MPI_Call
    1  1   0 1.397301e+14         [unknown] Allreduce
    2  2   0 1.397301e+14         [unknown]     Bcast

Generate Plots

    plot(prof.data)
    plot(prof.data, plot.type = "stats1")
    plot(prof.data, plot.type = "stats2")
    plot(prof.data, plot.type = "messages1")
    plot(prof.data, plot.type = "messages2")

Summary

Summary

- pbdprof offers tools for profiling R code that uses MPI.
- It builds fpmpi easily; mpip is also supported.

11 Wrapup

Summary

- Profile your code to understand your bottlenecks.
- pbdr makes distributed parallelism with R easier.
- Distributing data to multiple nodes lets you scale beyond a single machine.
- For truly large data, I/O must be parallel as well.

The pbdr Project

- Our website: http://r-pbd.org/
- Email us at: RBigData@gmail.com
- Our Google group: http://group.r-pbd.org/

Where to begin?

- The pbddemo package: http://cran.r-project.org/web/packages/pbddemo/
- The pbddemo vignette: http://goo.gl/hzkrt

Thanks for coming! Questions?

http://r-pbd.org/

Come see our poster on Wednesday at 5:30!