Big Data, R, and HPC



Similar documents
High Performance Computing with R

Programming with Big Data in R

Big Data and Parallel Work with R

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

The Saves Package. an approximate benchmark of performance issues while loading datasets. Gergely Daróczi

Introducing R: From Your Laptop to HPC and Big Data

Parallel Options for R

The Top Six Advantages of CUDA-Ready Clusters. Ian Lumb Bright Evangelist

Debugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma

Trends in High-Performance Computing for Power Grid Applications

Overview of HPC Resources at Vanderbilt

Parallel Programming Survey

Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC

An Introduction to Parallel Computing/ Programming

Parallel Computing for Data Science

Big data in R EPIC 2015

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

CMYK 0/100/100/20 66/54/42/17 34/21/10/0 Why is R slow? How to run R programs faster? Tomas Kalibera

The Assessment of Benchmarks Executed on Bare-Metal and Using Para-Virtualisation

Part I Courses Syllabus

Using WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014

HPC Wales Skills Academy Course Catalogue 2015

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

Parallel Computing. Benson Muite. benson.

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (

Advanced MPI. Hybrid programming, profiling and debugging of MPI applications. Hristo Iliev RZ. Rechen- und Kommunikationszentrum (RZ)

Programming Languages & Tools

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

Next Generation GPU Architecture Code-named Fermi

Introduction to Linux and Cluster Basics for the CCR General Computing Cluster

Big-data Analytics: Challenges and Opportunities

Cluster performance, how to get the most out of Abel. Ole W. Saastad, Dr.Scient USIT / UAV / FI April 18 th 2013

Intelligent Heuristic Construction with Active Learning

NVIDIA Tools For Profiling And Monitoring. David Goodwin

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

2: Computer Performance

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Architectures for Big Data Analytics A database perspective

Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises

The Not Quite R (NQR) Project: Explorations Using the Parrot Virtual Machine

Grid 101. Grid 101. Josh Hegie.

Workshare Process of Thread Programming and MPI Model on Multicore Architecture

HPC with Multicore and GPUs

11.1 inspectit inspectit

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

A Crash course to (The) Bighouse

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Parallel Large-Scale Visualization

Tools for Performance Debugging HPC Applications. David Skinner

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Performance Analysis and Optimization Tool

Getting Started with domc and foreach

Spring 2011 Prof. Hyesoon Kim

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

Enhancing SQL Server Performance

Parallelization Strategies for Multicore Data Analysis

Introduction to GPU hardware and to CUDA

FileNet System Manager Dashboard Help

Introduction to ACENET Accelerating Discovery with Computational Research May, 2015

JUROPA Linux Cluster An Overview. 19 May 2014 Ulrich Detert

Cluster Computing at HRI

Chapter 2 Parallel Architecture, Software And Performance

ST810 Advanced Computing

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February ISSN

Streaming Data, Concurrency And R

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation

Parallel and Distributed Computing Programming Assignment 1

Whitepaper: performance of SqlBulkCopy

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator

Multi-Threading Performance on Commodity Multi-Core Processors

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.

Performance Characteristics of Large SMP Machines

How To Monitor Performance On A Microsoft Powerbook (Powerbook) On A Network (Powerbus) On An Uniden (Powergen) With A Microsatellite) On The Microsonde (Powerstation) On Your Computer (Power

Automated Repair of Binary and Assembly Programs for Cooperating Embedded Devices

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age

SeqArray: an R/Bioconductor Package for Big Data Management of Genome-Wide Sequence Variants

INTEL PARALLEL STUDIO XE EVALUATION GUIDE

High Performance Computing in CST STUDIO SUITE

Effective Java Programming. measurement as the basis

SR-IOV: Performance Benefits for Virtualized Interconnects!

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

Hardware Performance Monitor (HPM) Toolkit Users Guide

Performance Application Programming Interface

The RWTH Compute Cluster Environment

Transcription:

Big Data, R, and HPC A Survey of Advanced Computing with R Drew Schmidt April 16, 2015 Drew Schmidt Big Data, R, and HPC

About Me @wrathematics http://librestats.com https://github.com/wrathematics http://wrathematics.info Drew Schmidt Big Data, R, and HPC

Introduction 1 Introduction XSEDE Big Data and Bio Why is my code slow? 2 Profiling and Benchmarking 3 A Hasty Introduction to Advanced Computing with R 4 Wrapup Drew Schmidt Big Data, R, and HPC

Introduction XSEDE 1 Introduction XSEDE Big Data and Bio Why is my code slow? Drew Schmidt Big Data, R, and HPC

Introduction XSEDE Extreme Science and Engineering Discovery Environment Follow on NSF project to TeraGrid in 2012 Centers operate machines, and XSEDE provides seamless infrastructure for allocaeons, access, and training Researchers propose resource use through XRAS Supports thousands of scienests in fields such as: Chemistry BioinformaEcs Materials Science Data Sciences Drew Schmidt Big Data, R, and HPC 1/48

Introduction XSEDE XSEDE Allocations Want to use XSEDE resources to teach a class? h3ps://portal.xsede.org/alloca;ons- overview#types- educa;on Just looking to try out a larger resource or a special resource your campus doesn t have? h3ps://portal.xsede.org/alloca;ons- overview#types- startup Drew Schmidt Big Data, R, and HPC 2/48

Introduction XSEDE XSEDE Allocations See a Campus Champion h.ps://www.xsede.org/current- champions Ready to scale up your research? h.ps://portal.xsede.org/alloca>ons- overview#types- research Drew Schmidt Big Data, R, and HPC 3/48

Introduction XSEDE More helpful resources xsede.orgà User Services Resources available at each Service Provider User Guides describing memory, number of CPUs, file systems, etc. Storage facili?es So@ware (Comprehensive Search) Training: portal.xsede.org à Training Course Calendar On- line training Cer?fica?ons Get face- to- face help from XSEDE experts at your ins?tu?on; contact your local Campus Champions. Extended Collabora?ve Support (formerly known as Advanced User Support (AUSS)) Drew Schmidt Big Data, R, and HPC 4/48

Introduction Big Data and Bio 1 Introduction XSEDE Big Data and Bio Why is my code slow? Drew Schmidt Big Data, R, and HPC

Introduction Big Data and Bio Big Data Volume, Velocity, Variety Volume Sequencers Velocity sensors??? Variety fasta, csv, databases, images, unstructured,... Complexity complicated models Drew Schmidt Big Data, R, and HPC 5/48

Introduction Big Data and Bio Common Computational Problems in Bio p > n. Many small tasks (workflow). Parallelization often difficult. Drew Schmidt Big Data, R, and HPC 6/48

Introduction Why is my code slow? 1 Introduction XSEDE Big Data and Bio Why is my code slow? Drew Schmidt Big Data, R, and HPC

Introduction Why is my code slow? Why is my code slow? Bad code. Abstraction. Serial. Drew Schmidt Big Data, R, and HPC 7/48

Introduction Why is my code slow? Bad Code R isn t very smart. R is slow, but bad programmers are slower! Bad parallel code may be slower than good serial code. Drew Schmidt Big Data, R, and HPC 8/48

Introduction Why is my code slow? Abstraction Never completely free! Could cost a microsecond (not worth worrying about!) or much, much more... But abstraction is A Good Thing TM. Have to find the right balance. Drew Schmidt Big Data, R, and HPC 9/48

Introduction Why is my code slow? Serial https://csgillespie.wordpress.com/2011/01/25/cpu-and-gpu-trends-over-time/ Drew Schmidt Big Data, R, and HPC 10/48

Profiling and Benchmarking 1 Introduction 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Benchmarking 3 A Hasty Introduction to Advanced Computing with R 4 Wrapup Drew Schmidt Big Data, R, and HPC

Profiling and Benchmarking Why Profile? 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Benchmarking Drew Schmidt Big Data, R, and HPC

Profiling and Benchmarking Why Profile? Performance and Accuracy Sometimes π = 3.14 is (a) infinitely faster than the correct answer and (b) the difference between the correct and the wrong answer is meaningless.... The thing is, some specious value of correctness is often irrelevant because it doesn t matter. While performance almost always matters. And I absolutely detest the fact that people so often dismiss performance concerns so readily. Linus Torvalds, August 8, 2008 Drew Schmidt Big Data, R, and HPC 11/48

Profiling and Benchmarking Why Profile? Why Profile? Because performance matters. Bad practices scale up! Your bottlenecks may surprise you. Because R is dumb. R users claim to be data people... so act like it! Drew Schmidt Big Data, R, and HPC 12/48

Profiling and Benchmarking Why Profile? Compilers often correct bad behavior... A Really Dumb Loop 1 int main (){ 2 int x, i; 3 for (i =0; i <10; i ++) 4 x = 1; 5 return 0; 6 } clang -O3 -S example.c main : # BB #0:. cfi _ startproc xorl %eax, % eax ret clang -S example.c main :. cfi _ startproc # BB #0: movl $0, -4(% rsp ) movl $0, -12(% rsp ). LBB0 _1: cmpl $10, -12(% rsp ) jge. LBB0 _4 # BB #2: movl $1, -8(% rsp ) # BB #3: movl -12(% rsp ), % eax addl $1, % eax movl %eax, -12(% rsp ) jmp. LBB0 _1. LBB0 _4: movl $0, % eax ret Drew Schmidt Big Data, R, and HPC 13/48

R will not! Profiling and Benchmarking Why Profile? Dumb Loop 1 for (i in 1:n){ 2 ta <- t(a) 3 Y <- ta %*% Q 4 Q <- qr.q(qr(y)) 5 Y <- A %*% Q 6 Q <- qr.q(qr(y)) 7 } 8 9 Q 1 ta <- t(a) 2 Better Loop 3 for (i in 1:n){ 4 Y <- ta %*% Q 5 Q <- qr.q(qr(y)) 6 Y <- A %*% Q 7 Q <- qr.q(qr(y)) 8 } 9 10 Q Drew Schmidt Big Data, R, and HPC 14/48

Profiling and Benchmarking Example from a Real R Package Why Profile? Exerpt from Original function 1 while (i <=N){ 2 for (j in 1:i){ 3 d.k <- as.matrix(x)[l==j,l==j] 4... Exerpt from Modified function 1 x.mat <- as.matrix(x) 2 3 while (i <=N){ 4 for (j in 1:i){ 5 d.k <- x.mat[l==j,l==j] 6... By changing just 1 line of code, performance of the main method improved by over 350%! Drew Schmidt Big Data, R, and HPC 15/48

Profiling and Benchmarking Why Profile? Some Thoughts R is slow. Bad programmers are slower. R can t fix bad programming. Drew Schmidt Big Data, R, and HPC 16/48

Profiling and Benchmarking Profiling R Code 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Benchmarking Drew Schmidt Big Data, R, and HPC

Profiling and Benchmarking Profiling R Code Timings Getting simple timings as a basic measure of performance is easy, and valuable. system.time() timing blocks of code. Rprof() timing execution of R functions. Rprofmem() reporting memory allocation in R. tracemem() detect when a copy of an R object is created. Drew Schmidt Big Data, R, and HPC 17/48

Profiling and Benchmarking Profiling R Code Performance Profiling Tools: system.time() system.time() is a basic R utility for timing expressions 1 x <- matrix ( rnorm (20000 * 750), nrow =20000, ncol =750) 2 3 system. time (t(x) %*% x) 4 # user system elapsed 5 # 2.187 0.032 2.324 6 7 system. time ( crossprod (x)) 8 # user system elapsed 9 # 1.009 0.003 1.019 10 11 system. time ( cov (x)) 12 # user system elapsed 13 # 6.264 0.026 6.338 Drew Schmidt Big Data, R, and HPC 18/48

Profiling and Benchmarking Profiling R Code Performance Profiling Tools: system.time() Put more complicated expressions inside of brackets: 1 x <- matrix ( rnorm (20000 * 750), nrow =20000, ncol =750) 2 3 system. time ({ 4 y <- x+1 5 z <- y* 2 6 }) 7 # user system elapsed 8 # 0.057 0.032 0.089 Drew Schmidt Big Data, R, and HPC 19/48

Profiling and Benchmarking Profiling R Code Performance Profiling Tools: Rprof() 1 Rprof ( filename =" Rprof. out ", append = FALSE, interval =0.02, 2 memory. profiling = FALSE, gc. profiling = FALSE, 3 line. profiling = FALSE, numfiles =100L, bufsize =10000 L) Drew Schmidt Big Data, R, and HPC 20/48

Profiling and Benchmarking Profiling R Code Drew Schmidt Big Data, R, and HPC 21/48

Profiling and Benchmarking Profiling R Code Performance Profiling Tools: Rprof() 1 x <- matrix ( rnorm (10000 * 250), nrow =10000, ncol =250) 2 3 Rprof () 4 invisible ( prcomp (x)) 5 Rprof ( NULL ) 6 7 summaryrprof () Drew Schmidt Big Data, R, and HPC 22/48

Profiling and Benchmarking Profiling R Code Performance Profiling Tools: Rprof() 1 $by. self 2 self. time self. pct total. time total. pct 3 " La. svd " 0.68 69.39 0.72 73.47 4 "%*%" 0.12 12.24 0.12 12.24 5 " aperm. default " 0.04 4.08 0.04 4.08 6 " array " 0.04 4.08 0.04 4.08 7 " matrix " 0.04 4.08 0.04 4.08 8 " sweep " 0.02 2.04 0.10 10.20 9 ### output truncated by presenter 10 11 $by. total 12 total. time total. pct self. time self. pct 13 " prcomp " 0.98 100.00 0.00 0.00 14 " prcomp. default " 0.98 100.00 0.00 0.00 15 " svd " 0.76 77.55 0.00 0.00 16 " La. svd " 0.72 73.47 0.68 69.39 17 ### output truncated by presenter 18 19 $ sample. interval 20 [1] 0.02 21 22 $ sampling. time 23 [1] 0.98 Drew Schmidt Big Data, R, and HPC 23/48

Profiling and Benchmarking Profiling R Code Performance Profiling Tools: Rprof() 1 Rprof ( interval =.99) 2 invisible ( prcomp (x)) 3 Rprof ( NULL ) 4 5 summaryrprof () Drew Schmidt Big Data, R, and HPC 24/48

Profiling and Benchmarking Profiling R Code Performance Profiling Tools: Rprof() 1 $by. self 2 [1] self. time self. pct total. time total. pct 3 <0 rows > (or 0- length row. names ) 4 5 $by. total 6 [1] total. time total. pct self. time self. pct 7 <0 rows > (or 0- length row. names ) 8 9 $ sample. interval 10 [1] 0.99 11 12 $ sampling. time 13 [1] 0 Drew Schmidt Big Data, R, and HPC 25/48

Profiling and Benchmarking Advanced R Profiling 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Benchmarking Drew Schmidt Big Data, R, and HPC

Profiling and Benchmarking Advanced R Profiling Other Profiling Tools perf, PAPI fpmpi, mpip, TAU pbdprof pbdpapi Drew Schmidt Big Data, R, and HPC 26/48

Profiling and Benchmarking Advanced R Profiling Profiling MPI Codes with pbdprof 1. Rebuild pbdr packages R CMD INSTALL pbdmpi _ 0.2-1. tar. gz \ -- configure - args = \ " --enable - pbdprof " 2. Run code mpirun - np 64 Rscript my_ script. R 3. Analyze results 1 library ( pbdprof ) 2 prof <- read. prof ( " output. mpip ") 3 plot (prof, plot. type =" messages2 ") Drew Schmidt Big Data, R, and HPC 27/48

Profiling and Benchmarking Advanced R Profiling Profiling with pbdpapi Bindings for Performance Application Programming Interface (PAPI) Gathers detailed hardware counter data. High and low level interfaces Function system.flips() system.flops() system.cache() system.epc() system.idle() system.cpuormem() system.utilization() Description of Measurement Time, floating point instructions, and Mflips Time, floating point operations, and Mflops Cache misses, hits, accesses, and reads Events per cycle Idle cycles CPU or RAM bound CPU utilization Drew Schmidt Big Data, R, and HPC 28/48

Profiling and Benchmarking Benchmarking 2 Profiling and Benchmarking Why Profile? Profiling R Code Advanced R Profiling Benchmarking Drew Schmidt Big Data, R, and HPC

Profiling and Benchmarking Benchmarking Benchmarking R functions are complicated! Symbol lookup, creating the abstract syntax tree, creating promises for arguments, argument checking, creating environments,... Executing a second time can have dramatically different performance over the first execution. Benchmarking several methods fairly requires some care. Drew Schmidt Big Data, R, and HPC 29/48

Profiling and Benchmarking Benchmarking Benchmarking tools: rbenchmark rbenchmark is a simple package that easily benchmarks different functions: 1 x <- matrix ( rnorm (10000 * 500), nrow =10000, ncol =500) 2 3 f <- function (x) t(x) %*% x 4 g <- function ( x) crossprod ( x) 5 6 library ( rbenchmark ) 7 benchmark (f(x), g(x), columns =c(" test ", " replications ", " elapsed ", " relative ")) 8 9 # test replications elapsed relative 10 # 1 f( x) 100 13.679 3.588 11 # 2 g( x) 100 3.812 1.000 Drew Schmidt Big Data, R, and HPC 30/48

Profiling and Benchmarking Benchmarking Benchmarking tools: microbenchmark microbenchmark is a separate package with a slightly different philosophy: 1 x <- matrix ( rnorm (10000 * 500), nrow =10000, ncol =500) 2 3 f <- function (x) t(x) %*% x 4 g <- function ( x) crossprod ( x) 5 6 library ( microbenchmark ) 7 microbenchmark (f(x), g(x), unit ="s") 8 9 # Unit : seconds 10 # expr min lq mean median uq max neval 11 # f( x) 0.11418617 0.11647517 0.12258556 0.11754302 0.12058145 0.17292507 100 12 # g( x) 0.03542552 0.03613772 0.03884497 0.03668231 0.03740173 0.07478309 100 Drew Schmidt Big Data, R, and HPC 31/48

Profiling and Benchmarking Benchmarking Benchmarking tools: microbenchmark I generally prefer rbenchmark, but the built-in plots for microbenchmark are nice: 1 bench <- microbenchmark (f(x), g(x), unit ="s") 2 3 boxplot ( bench ) log(time) [t] 40 60 80 100 120 140 160 f(x) g(x) Expression Drew Schmidt Big Data, R, and HPC 32/48

A Hasty Introduction to Advanced Computing with R 1 Introduction 2 Profiling and Benchmarking 3 A Hasty Introduction to Advanced Computing with R Free Better Code Compiled code Parallelism 4 Wrapup Drew Schmidt Big Data, R, and HPC

A Hasty Introduction to Advanced Computing with R Types of Improvements Free. Better code. Compiled code. Parallelism. Drew Schmidt Big Data, R, and HPC 33/48

A Hasty Introduction to Advanced Computing with R Free 3 A Hasty Introduction to Advanced Computing with R Free Better Code Compiled code Parallelism Drew Schmidt Big Data, R, and HPC

A Hasty Introduction to Advanced Computing with R Free Build R with a Better Compiler Not entirely painless. Can cost $$$. Better compiler = Faster R R Installation and Administration: http://cran.r-project.org/doc/manuals/r-admin.html Drew Schmidt Big Data, R, and HPC 34/48

A Hasty Introduction to Advanced Computing with R Free The Bytecode Compiler 1 f <- function ( n) for ( i in 1: n) 2* (3+4) 2 3 4 library ( compiler ) 5 f_ comp <- cmpfun ( f) 6 7 8 library ( rbenchmark ) 9 10 n <- 100000 11 benchmark (f(n), f_ comp (n), columns =c(" test ", " replications ", " elapsed ", 12 " relative "), 13 order =" relative ") 14 # test replications elapsed relative 15 # 2 f_ comp ( n) 100 2.604 1.000 16 # 1 f( n) 100 2.845 1.093 Drew Schmidt Big Data, R, and HPC 35/48

A Hasty Introduction to Advanced Computing with R Free Choice of BLAS Library 50 40 Comparison of Different BLAS Implementations for Matrix Matrix Multiplication and SVD x%*%x on 2000x2000 matrix (~31 MiB) x%*%x on 4000x4000 matrix (~122 MiB) 30 1 s e t. s e e d (1234) 2 m < 2000 3 n < 2000 4 x < m a t r i x ( 5 rnorm (m n ), 6 m, n ) 7 8 o b j e c t. s i z e ( x ) 9 10 l i b r a r y ( rbenchmark ) 11 12 benchmark ( x% %x ) 13 benchmark ( svd ( x ) ) Average Wall Clock Run Time (10 Runs) 20 10 0 50 40 30 svd(x) on 1000x1000 matrix (~8 MiB) svd(x) on 2000x2000 matrix (~31 MiB) 20 10 0 reference atlas openblas1 openblas2 reference atlas openblas1 openblas2 BLAS Impelentation Drew Schmidt Big Data, R, and HPC 36/48

A Hasty Introduction to Advanced Computing with R Better Code 3 A Hasty Introduction to Advanced Computing with R Free Better Code Compiled code Parallelism Drew Schmidt Big Data, R, and HPC

A Hasty Introduction to Advanced Computing with R Better Code Loops, Plys, and Vectorization Loops are slow. apply(), Reduce() are just for loops. Map(), lapply(), sapply(), mapply() (and most other core ones) are not for loops. Ply functions are not vectorized. Vectorization is fastest, but consumes lots of memory. Drew Schmidt Big Data, R, and HPC 37/48

A Hasty Introduction to Advanced Computing with R Compiled code 3 A Hasty Introduction to Advanced Computing with R Free Better Code Compiled code Parallelism Drew Schmidt Big Data, R, and HPC

A Hasty Introduction to Advanced Computing with R Compiled code Rcpp What Rcpp is R interface to compiled code. Package ecosystem (Rcpp, RcppArmadillo, RcppEigen,... ). Utilities to make writing C++ more convenient for R users. A tool which requires C++ knowledge to effectively utilize. What Rcpp is not Magic. Automatic R-to-C++ converter. A way around having to learn C++. As easy to use as R. Drew Schmidt Big Data, R, and HPC 38/48

A Hasty Introduction to Advanced Computing with R Compiled code Quickly Getting Started 1 code <- 2 # include <Rcpp.h> 3 4 // [[ Rcpp :: export ]] 5 int plustwo ( int n) 6 { 7 return n +2; 8 } 9 10 11 library ( Rcpp ) 12 sourcecpp ( code = code ) 13 14 plustwo (1) 15 # [1] 3 Drew Schmidt Big Data, R, and HPC 39/48

A Hasty Introduction to Advanced Computing with R Parallelism 3 A Hasty Introduction to Advanced Computing with R Free Better Code Compiled code Parallelism Drew Schmidt Big Data, R, and HPC

A Hasty Introduction to Advanced Computing with R Parallelism Parallelism Serial Programming Parallel Programming Drew Schmidt Big Data, R, and HPC 40/48

A Hasty Introduction to Advanced Computing with R Parallel Programming: In Theory Parallelism Drew Schmidt Big Data, R, and HPC 41/48

A Hasty Introduction to Advanced Computing with R Parallelism Parallel Programming: In Practice Drew Schmidt Big Data, R, and HPC 42/48

A Hasty Introduction to Advanced Computing with R Parallelism Shared and Distributed Memory Machines Shared Memory Machines Thousands of cores Distributed Memory Machines Hundreds of thousands of cores Titan, Oak Ridge National Lab 299,008 cores 584 TB RAM Nautilus, University of Tennessee 1024 cores 4 TB RAM Drew Schmidt Big Data, R, and HPC 43/48

A Hasty Introduction to Advanced Computing with R Parallelism Parallel Programming Packages for R Shared Memory Examples: parallel, snow, foreach, gputools, HiPLARM Distributed Examples: pbdr, Rmpi, RHadoop, RHIPE CRAN HPC Task View For more examples, see: http://cran.r-project.org/web/views/ HighPerformanceComputing.html Drew Schmidt Big Data, R, and HPC 44/48

A Hasty Introduction to Advanced Computing with R Parallelism Parallel Programming Packages for R Distributed Memory Interconnection Network PROC PROC PROC PROC + cache + cache + cache + cache snow Rmpi pbdmpi Focus on who owns what data and what communication is needed RHIPE Sockets MPI Hadoop LAPACK BLAS ScaLAPACK PBLAS BLACS Mem Mem Mem Mem Shared Memory CORE CORE CORE CORE + cache + cache + cache + cache Network Memory Co-Processor Focus on which tasks can be parallel Same Task on Blocks of data GPU or MIC CUDA OpenCL.C Local Memory OpenACC.Call GPU: Graphical Processing Unit Rcpp OpenMP MIC: Many Integrated Core OpenCL inline OpenACC OpenMP Threads fork magma HiPLARM PETSc Trilinos pbddmat CUBLAS MAGMA MKL ACML LibSci DPLASMA snow + multicore = parallel multicore (fork) PLASMA Drew Schmidt Big Data, R, and HPC 45/48

A Hasty Introduction to Advanced Computing with R Parallelism pbdr Packages Drew Schmidt Big Data, R, and HPC 46/48

Wrapup 1 Introduction 2 Profiling and Benchmarking 3 A Hasty Introduction to Advanced Computing with R 4 Wrapup Drew Schmidt Big Data, R, and HPC

Wrapup Performance-Centered Development Model 1 Just get it working. 2 Profile vigorously. 3 Weigh your options. Improve R code? (lapply(), vectorization, a package,... ) Incorporate C/C++? Go parallel? Some combination of these... 4 Don t forget the free stuff (BLAS, bytecode compiler,... ). 5 Repeat 2 4 until performance is acceptable. Drew Schmidt Big Data, R, and HPC 47/48

Wrapup Where to Learn More The Art of R Programming by Norm Matloff: http://nostarch.com/artofr.htm The R Inferno by Patrick Burns: http://www.burns-stat.com/pages/tutor/r_inferno.pdf Using R for HPC 4 Hour Tutorial Drew Schmidt Big Data, R, and HPC 48/48

Thanks so much for attending! Questions? Breakout Sessions Big Data, R and HPC Environmental/Habitat data Physiological data Population data Vegetation surveys Drew Schmidt Big Data, R, and HPC 48/48