Package snow. R topics documented: October 31, 2015



Package snow October 31, 2015

Title Simple Network of Workstations
Version 0.4-1
Author Luke Tierney, A. J. Rossini, Na Li, H. Sevcikova
Description Support for simple parallel computing in R.
Maintainer Luke Tierney <luke-tierney@uiowa.edu>
Suggests Rmpi, rlecuyer, nws
License GPL
Depends R (>= 2.13.1), utils
NeedsCompilation no
Repository CRAN
Date/Publication 2015-10-31 11:10:46

R topics documented:
    snow-cluster
    snow-parallel
    snow-rand
    snow-startstop
    snow-timing
    Index

snow-cluster            Cluster-Level SNOW Functions

Description:

Functions for computing on a SNOW cluster.

Usage:

clusterSplit(cl, seq)
clusterCall(cl, fun, ...)
clusterApply(cl, x, fun, ...)
clusterApplyLB(cl, x, fun, ...)
clusterEvalQ(cl, expr)
clusterExport(cl, list, envir = .GlobalEnv)
clusterMap(cl, fun, ..., MoreArgs = NULL, RECYCLE = TRUE)

Arguments:

cl        cluster object
fun       function or character string naming a function
expr      expression to evaluate
seq       vector to split
list      character vector of names of variables to export
envir     environment from which to export variables
x         array
...       additional arguments to pass to the standard function
MoreArgs  additional arguments for fun
RECYCLE   logical; if true, shorter arguments are recycled

Details:

These are the basic functions for computing on a cluster. All evaluations on the slave nodes are done using tryCatch. Currently an error is signaled on the master if any one of the nodes produces an error. More sophisticated approaches will be considered in the future.

clusterCall calls a function fun with identical arguments ... on each node in the cluster cl and returns a list of the results.

clusterEvalQ evaluates a literal expression on each cluster node. It is a cluster version of evalq, and is a convenience function defined in terms of clusterCall.

clusterApply calls fun on the first cluster node with arguments seq[[1]] and ..., on the second node with seq[[2]] and ..., and so on. If the length of seq is greater than the number of nodes in the cluster, then cluster nodes are recycled. A list of the results is returned; the length of the result list will equal the length of seq.

clusterApplyLB is a load-balancing version of clusterApply. If the length p of seq is greater than the number of cluster nodes n, then the first n jobs are placed in order on the n nodes. When the first job completes, the next job is placed on the node that has become available; this continues until all jobs are complete. Using clusterApplyLB can result in better cluster utilization than using clusterApply. However, increased communication can reduce performance. Furthermore, the node that executes a particular job is nondeterministic, which can complicate ensuring reproducibility in simulations.

clusterMap is a multi-argument version of clusterApply, analogous to mapply. If RECYCLE is true, shorter arguments are recycled; otherwise, the result length is the length of the shortest argument. Cluster nodes are recycled if the length of the result is greater than the number of nodes.
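Because the argument-handling rules above are easy to misread, here is a base-R sketch of the same semantics using the sequential analogues lapply and mapply. It runs without a cluster; the specific numbers are illustrative only, not taken from the package documentation.

```r
## Sequential analogues of the semantics described above; no cluster needed.
## clusterApply(cl, seq, fun, ...) returns one result per element of seq,
## recycling nodes as needed, so the result length equals length(seq) --
## just as lapply does sequentially.
res1 <- lapply(1:5, function(i) i + 3)
stopifnot(length(res1) == 5)

## clusterMap(cl, fun, a, b, RECYCLE = TRUE) recycles the shorter argument,
## as mapply does sequentially: here 1:2 is recycled against 1:6.
res2 <- mapply(function(a, b) a + b, 1:6, 1:2)
stopifnot(all(res2 == c(2, 4, 4, 6, 6, 8)))
```

With RECYCLE = FALSE the result would instead be truncated to the length of the shortest argument.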

clusterExport assigns the values on the master of the variables named in list to variables of the same names in the global environments of each node. The environment on the master from which variables are exported defaults to the global environment.

clusterSplit splits seq into one consecutive piece for each cluster node and returns the result as a list with length equal to the number of cluster nodes. Currently the pieces are chosen to be close to equal in length. Future releases may attempt to use relative performance information about nodes to choose splits proportional to performance.

For more details see http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html.

Examples:

cl <- makeSOCKcluster(c("localhost", "localhost"))
clusterApply(cl, 1:2, get("+"), 3)
clusterEvalQ(cl, library(boot))
x <- 1
clusterExport(cl, "x")
clusterCall(cl, function(y) x + y, 2)

snow-parallel           Higher Level SNOW Functions

Description:

Parallel versions of apply and related functions.

Usage:

parLapply(cl, x, fun, ...)
parSapply(cl, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl, X, MARGIN, FUN, ...)
parRapply(cl, x, fun, ...)
parCapply(cl, x, fun, ...)
parMM(cl, A, B)

Arguments:

cl   cluster object
fun  function or character string naming a function
X    array to be used
x    matrix to be used

FUN        function or character string naming a function
MARGIN     vector specifying the dimensions to use
simplify   logical; see sapply
USE.NAMES  logical; see sapply
...        additional arguments to pass to the standard function
A          matrix
B          matrix

Details:

parLapply, parSapply, and parApply are parallel versions of lapply, sapply, and apply. parRapply and parCapply are parallel row and column apply functions for a matrix x; they may be slightly more efficient than parApply. parMM is a very simple(-minded) parallel matrix multiply; it is intended as an illustration.

For more details see http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html.

Examples:

cl <- makeSOCKcluster(c("localhost", "localhost"))
parSapply(cl, 1:20, get("+"), 3)

snow-rand               Uniform Random Number Generation in SNOW Clusters

Description:

Initialize independent uniform random number streams to be used in a SNOW cluster. Uses either the L'Ecuyer random number generator (package rlecuyer required) or the SPRNG generator (package rsprng required).

Usage:

clusterSetupRNG(cl, type = "RNGstream", ...)
clusterSetupRNGstream(cl, seed = rep(12345, 6), ...)
clusterSetupSPRNG(cl, seed = round(2^32 * runif(1)), prngkind = "default", para = 0, ...)

Arguments:

cl        Cluster object.
type      type = "RNGstream" (default) initializes the L'Ecuyer RNG; type = "SPRNG" initializes the SPRNG generator.
...       Passed to the underlying function (see Details below).
seed      Integer value (SPRNG) or a vector of six integer values (RNGstream) used as seed for the RNG.
prngkind  Character string naming the generator type used with SPRNG.
para      Additional parameters for the generator.

Details:

clusterSetupRNG calls (subject to its argument values) one of the other functions, passing arguments (cl, ...). If the "SPRNG" type is used, then the function clusterSetupSPRNG is called. If the "RNGstream" type is used, then the function clusterSetupRNGstream is called.

clusterSetupSPRNG loads the rsprng package and initializes separate streams on each node. For further details see the documentation of init.sprng. The generator on the master is not affected. NOTE: SPRNG is currently not supported.

clusterSetupRNGstream loads the rlecuyer package, creates one stream per node and distributes the stream states to the nodes.

For more details see http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html.

Examples:

clusterSetupSPRNG(cl)
clusterSetupSPRNG(cl, seed = 1234)
clusterSetupRNG(cl, seed = rep(1, 6))

snow-startstop          Starting and Stopping SNOW Clusters

Description:

Functions to start and stop a SNOW cluster and to set default cluster options.

Usage:

makeCluster(spec, type = getClusterOption("type"), ...)
stopCluster(cl)
setDefaultClusterOptions(...)

makeSOCKcluster(names, ..., options = defaultClusterOptions)
makeMPIcluster(count, ..., options = defaultClusterOptions)
makeNWScluster(names, ..., options = defaultClusterOptions)
getMPIcluster()

Arguments:

spec     cluster specification
count    number of nodes to create
names    character vector of node names
options  cluster options object
cl       cluster object
...      cluster option specifications
type     character; specifies cluster type

Details:

makeCluster starts a cluster of the specified or default type and returns a reference to the cluster. Supported cluster types are "SOCK", "MPI", and "NWS". For "MPI" clusters the spec argument should be an integer specifying the number of slave nodes to create. For "SOCK" and "NWS" clusters spec should be a character vector naming the hosts on which slave nodes should be started; one node is started for each element in the vector. For "SOCK" and "NWS" clusters spec can also be an integer specifying the number of slave nodes to create on the local machine.

For SOCK and NWS clusters spec can also be a list of machine specifications, each a list of named option values. Such a list must include a character value named host specifying the name or address of the host to use. Any other option can be specified as well. For SOCK and NWS clusters this may be a more convenient alternative than the inhomogeneous cluster startup procedure. The options rscript and snowlib are often useful; see the examples below.

stopCluster should be called to properly shut down the cluster before exiting R. If it is not called, it may be necessary to use external means to ensure that all slave processes are shut down.

setDefaultClusterOptions can be used to specify alternate values for default cluster options. There are many options; the most useful ones are type and homogeneous. The default value of the type option is currently set to "MPI" if Rmpi is on the search path. Otherwise it is set to "MPI" if Rmpi is available, and to "SOCK" otherwise.

The homogeneous option should be set to FALSE to specify that the startup procedure for inhomogeneous clusters is to be used; this requires some additional configuration. The default setting is TRUE unless the environment variable R_SNOW_LIB is defined on the master host with a non-empty value.

The option outfile can be used to specify the file to which slave node output is to be directed. The default is /dev/null; during debugging of an installation it can be useful to set this to a proper file.

snow-startstop 7 On some systems setting outfile to "" or to /dev/tty will result in worker output being sent tothe terminal running the master process. The functions makesockcluster, makempicluster, and makenwscluster can be used to start a cluster of the corresponding type. In MPI configurations where process spawning is not available and something like mpirun is used to start a master and a set of slaves the corresponding cluster will have been pre-constructed and can be obtained with getmpicluster. It is also possible to obtain a reference to the running cluster using makecluster or makempicluster. In this case the count argument can be omitted; if it is supplied, it must equal the number of nodes in the cluster. This interface is still experimental and subject to change. For SOCK and NWS clusters the option manual = TRUE forces a manual startup mode in which the master prints the command to be run manually to start a worker process. Together with setting the outfile option this can be useful for debugging cluster startup. For more details see http://www.stat.uiowa.edu/~luke/r/cluster/cluster.html. ## Two workers run on the local machine as a SOCK cluster. cl <- makecluster(c("localhost","localhost"), type = "SOCK") clusterapply(cl, 1:2, get("+"), 3) ## Another approach to running on the local machine as a SOCK cluster. 
cl <- makecluster(2, type = "SOCK") clusterapply(cl, 1:2, get("+"), 3) ## A SOCK cluster with two workers on Mac OS X, two on Linux, and two ## on Windows: macoptions <- list(host = "owasso", rscript = "/Library/Frameworks/R.framework/Resources/bin/Rscript", snowlib = "/Library/Frameworks/R.framework/Resources/library") lnxoptions <- list(host = "itasca", rscript = "/usr/lib64/r/bin/rscript", snowlib = "/home/luke/tmp/lib") winoptions <- list(host="192.168.1.168", rscript="c:/program Files/R/R-2.7.1/bin/Rscript.exe", snowlib="c:/rlibs") cl <- makecluster(c(rep(list(macoptions), 2), rep(list(lnxoptions), 2), rep(list(winoptions), 2)), type = "SOCK") clusterapply(cl, 1:6, get("+"), 3)
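The three spec forms accepted for SOCK and NWS clusters (an integer count, a character vector of host names, or a list of per-node option lists) can be summarized with a small toy function. This is an illustrative sketch in base R, not snow's actual startup code, and normalizeSpec is a hypothetical name.

```r
## Toy sketch (not snow's internals): map the three SOCK/NWS 'spec' forms
## described above onto one host entry per worker node.
normalizeSpec <- function(spec) {
  if (is.numeric(spec)) {
    ## an integer: that many workers on the local machine
    rep(list(list(host = "localhost")), spec)
  } else if (is.character(spec)) {
    ## a character vector: one worker per named host
    lapply(spec, function(h) list(host = h))
  } else if (is.list(spec)) {
    ## a list of machine specifications: each must name a character 'host'
    stopifnot(all(sapply(spec, function(s) is.character(s$host))))
    spec
  } else stop("unsupported spec")
}

stopifnot(length(normalizeSpec(2)) == 2)
stopifnot(normalizeSpec(c("n1", "n2"))[[2]]$host == "n2")
opts <- list(list(host = "owasso", snowlib = "/tmp/lib"))
stopifnot(identical(normalizeSpec(opts), opts))
```

The point of the third form is that extra named options (rscript, snowlib, and so on) ride along with each host, which is what the mixed-platform example above relies on.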

snow-timing             Timing SNOW Clusters

Description:

Experimental functions to collect and display timing data for cluster computations.

Usage:

snow.time(expr)
## S3 method for class 'snowTimingData'
print(x, ...)
## S3 method for class 'snowTimingData'
plot(x, xlab = "Elapsed Time", ylab = "Node", title = "Cluster Usage", ...)

Arguments:

expr   expression to evaluate
x      timing data object to plot or print
xlab   x axis label
ylab   y axis label
title  plot main title
...    additional arguments

Details:

snow.time collects and returns timing information for cluster usage in evaluating expr. The return value is an object of class snowTimingData; details of the return value are subject to change.

The print method for snowTimingData objects shows the total elapsed time, the total communication time between master and worker nodes, and the compute time on each worker node. The plot method, motivated by the display produced by xpvm, produces a Gantt chart of the computation, with green rectangles representing active computation, blue horizontal lines representing a worker waiting to return a result, and red lines representing master/worker communications.

Examples:

cl <- makeCluster(2, type = "SOCK")
x <- rnorm(1000000)
tm <- snow.time(clusterCall(cl, function(x) for (i in 1:100) sum(x), x))
print(tm)
plot(tm)
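The utilization difference that snow.time's Gantt chart makes visible can also be seen in a toy scheduling simulation. The sketch below is base R only, with made-up job durations; it is not snow's scheduler, just an illustration of why clusterApplyLB can keep per-node busy times more even than clusterApply.

```r
## Toy simulation (not snow code): compare the static round-robin
## placement of clusterApply with the greedy placement of clusterApplyLB.
## Per-node busy time is the quantity snow.time's chart shows per worker.
jobs  <- c(4, 1, 1, 1, 4, 1)   # hypothetical job durations
nodes <- 2

## clusterApply-style: job i always runs on node ((i - 1) %% nodes) + 1.
static <- sapply(1:nodes, function(n)
  sum(jobs[seq(n, length(jobs), by = nodes)]))

## clusterApplyLB-style: each job goes to the node that frees up first.
lb <- numeric(nodes)
for (j in jobs) {
  n <- which.min(lb)
  lb[n] <- lb[n] + j
}

## With these durations the load-balanced run finishes earlier:
## max(static) is 9, max(lb) is 7.
stopifnot(max(lb) <= max(static))
```

The nondeterminism caveat from the snow-cluster section shows up here too: which node a job lands on under the greedy rule depends on completion order, which real clusters do not reproduce exactly from run to run.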

Index

Topic programming: snow-cluster, snow-parallel, snow-rand, snow-startstop, snow-timing

clusterApply (snow-cluster)
clusterApplyLB (snow-cluster)
clusterCall (snow-cluster)
clusterEvalQ (snow-cluster)
clusterExport (snow-cluster)
clusterMap (snow-cluster)
clusterSetupRNG (snow-rand)
clusterSetupRNGstream (snow-rand)
clusterSetupSPRNG (snow-rand)
clusterSplit (snow-cluster)
getMPIcluster (snow-startstop)
makeCluster (snow-startstop)
makeMPIcluster (snow-startstop)
makeNWScluster (snow-startstop)
makeSOCKcluster (snow-startstop)
parApply (snow-parallel)
parCapply (snow-parallel)
parLapply (snow-parallel)
parMM (snow-parallel)
parRapply (snow-parallel)
parSapply (snow-parallel)
plot.snowTimingData (snow-timing)
print.snowTimingData (snow-timing)
setDefaultClusterOptions (snow-startstop)
snow.time (snow-timing)
stopCluster (snow-startstop)