Package Rdsm. February 19, 2015




Version 2.1.1

Author <normmatloff@gmail.com>

Maintainer <normmatloff@gmail.com>

Date 10/01/2014

Title Threads Environment for R

Description Provides a threads-type programming environment for R. The package gives the R
programmer the clearer, more concise shared-memory world view, and in some cases gives superior
performance as well. In addition, it enables parallel processing on very large, out-of-core
matrices.

OS_type unix

Imports bigmemory (>= 4.0.0), parallel

Suggests synchronicity

LazyLoad no

License GPL (>= 2)

Repository CRAN

NeedsCompilation no

Date/Publication 2014-10-08 05:51:02

R topics documented:

   Rdsm-package
   barr
   getidxs
   getmatrix
   loadex
   makebarr
   mgrinit
   mgrmakelock
   mgrmakevar
   rdsmlock
   readsync
   stoprdsm

Rdsm-package            Adds a threaded parallel programming paradigm to R.

Description

This package provides a parallel shared-memory programming paradigm for R, very similar to
threaded programming in C/C++. This enables the programmer to write simpler, clearer code.
Furthermore, in some applications this package produces significantly faster code, compared to
versions written for other parallel R libraries. It also allows placing very large matrices in
secondary storage, while treating them as being in shared memory.

Details

Package:  Rdsm
Type:     Package
Version:  2.1.1
Date:     2014-02-16
License:  GPL (>= 2)

List of functions:

Initialization, run at the manager:

   mgrinit():      initialize the system
   mgrmakevar():   create a shared variable
   mgrmakelock():  create a lock
   makebarr():     create a barrier

Called by applications:

   barr():         barrier call
   rdsmlock():     lock operation (via realrdsmlock())
   rdsmunlock():   unlock operation (via realrdsmunlock())

Application utilities:

   getidxs():      partition a set of indices for work assignment
   getmatrix():    allow a matrix to be referenced regardless of whether it is specified as a
                   bigmemory object, a bigmemory descriptor, or via a quoted name
   stoprdsm():     shut down the cluster and clean up files

Built-in variables accessible by the threads, at the worker nodes:

   myinfo$nwrkrs:  total number of threads
   myinfo$id:      this thread's ID number

To run, set up a cluster via the parallel package; we'll refer to the R process from which this
is done as the manager, and to the processes running in the cluster as the workers. Create the
application's shared variables from the manager, using mgrmakevar(). Launch the worker threads,
again from the manager, via the parallel calls clusterEvalQ() or clusterCall(). One typically
codes so that the results end up in shared variables. See the examples below, and more in the
examples/ directory of this distribution.

The shared variables can be read and written by any of the workers and by the manager. In fact,
while an Rdsm application is running, other R processes on the same machine (or on a different
machine sharing the same file system, if the variables are filebacked) can access the shared
variables. See the file ExternalAccess.txt in the doc/ directory.

Rdsm uses the bigmemory library to store its shared variables. Though bigmemory can work on a
(physical) cluster of several machines sharing a file system, Rdsm does not run on such systems
at this time.

Further documentation is in the doc/ directory.

Author(s)

<matloff@cs.ucdavis.edu>

See Also

mgrinit, mgrmakevar, mgrmakelock, barr, rdsmlock, rdsmunlock, getidxs, getmatrix

Examples

library(parallel)
c2 <- makeCluster(2)            # form a 2-thread snow cluster
mgrinit(c2)                     # initialize Rdsm
mgrmakevar(c2,"m",2,2)          # make a 2x2 shared matrix
m[,] <- 3                       # 2x2 matrix of all 3s

# example of shared memory:
# at each thread, set id to the Rdsm built-in ID variable for that thread
clusterEvalQ(c2,id <- myinfo$id)
clusterEvalQ(c2,m[1,id] <- id^2)   # assignment executed by each thread
m[,]                               # top row of m should now be (1,4)

# matrix multiplication; the product u %*% v is computed, with the product
# placed in w
# note again: the mmul() call will be executed by each thread
mmul <- function(u,v,w) {
   require(parallel)
   # decide which rows of u this thread will work on
   myidxs <- splitIndices(nrow(u),myinfo$nwrkrs)[[myinfo$id]]
   # multiply this thread's part of u with v, placing the product in the
   # corresponding part of w
   w[myidxs,] <- u[myidxs,] %*% v[,]
   invisible(0)
}

# create test matrices
mgrmakevar(c2,"a",6,2)
mgrmakevar(c2,"b",2,6)
mgrmakevar(c2,"c",6,6)
# give them values
a[,] <- 1:12
b[,] <- 1                       # all 1s

clusterExport(c2,"mmul")        # send mmul() to the threads
clusterEvalQ(c2,mmul(a,b,c))    # run the threads
c[,]                            # check results

barr                    Barrier operation.

Description

Standard barrier operation.

Usage

barr()

Details

Standard barrier operation, to ensure that work done by one thread is ready before other
threads make use of it. When a thread executes barr(), it will block until all threads have
executed it.

getidxs                 Parallelizing work assignment.

Description

Assigns to an Rdsm thread its portion of a set of indices, for the purpose of partitioning work
among the threads.

Usage

getidxs(m)

Arguments

   m        The sequence 1:m will be partitioned, and one portion will be assigned to the
            calling thread.

Details

The range 1:m is partitioned into one subrange per thread; the i-th subrange is used by thread
i to determine which work that thread has been assigned.

Value

The subrange assigned to the invoking thread.

getmatrix               Referencing a matrix via different forms.

Description

Returns a matrix, whether requested via an R matrix, a bigmemory object, a bigmemory descriptor
or a quoted name.

Usage

getmatrix(m)

Arguments

   m        Specification of the matrix, as either an R matrix, a bigmemory object, a bigmemory
            descriptor or a quoted name.

Details

This utility function enables writing general Rdsm code, by allowing a matrix to be specified
in any of these forms.

Value

The requested matrix.

Examples

library(parallel)
c2 <- makeCluster(2)
mgrinit(c2)                     # initialize Rdsm
mgrmakevar(c2,"u",2,2)          # u is a 2x2 bigmemory matrix
u[] <- 8                        # fill u with 8s
u[]                             # prints a 2x2 matrix of 8s
v <- getmatrix(u)               # get u and assign it to v
# u and v are both addresses, pointing to the same memory location
v[]                             # prints all 8s
v[2,1] <- 3
v[]                             # prints three 8s and a 3
u[]                             # prints three 8s and a 3
w <- getmatrix("u")             # w will also refer to u
w[]                             # same as u

loadex                  Sources example files.

Description

Facilitates learning a new package, by sourcing any specified example file for an installed
package.

Usage

loadex(pkg,exfile=NA,subdir="examples")

Arguments

   pkg      Package name, quoted.
   exfile   Desired example file name, if any, quoted.
   subdir   Subdirectory name of the examples directory in the package.

Details

Loads the example file exfile, from the subdir directory within the tree where pkg is
installed. If exfile = NA, the names of the files are returned, without any loading.

Value

The names of the example files if exfile = NA; otherwise none.
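A minimal usage sketch for loadex(); the file name in the second call is a hypothetical
placeholder, so list the available example files first:

library(Rdsm)
exfiles <- loadex("Rdsm")         # exfile omitted: just return the example file names
exfiles
# loadex("Rdsm","SomeExample.R")  # then source one specific file (hypothetical name)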

makebarr                Create an Rdsm barrier.

Description

Creates an Rdsm barrier.

Usage

makebarr(cls,boost=F,barrback=F)

Arguments

   cls       The snow cluster.
   boost     Lock type; see mgrinit.
   barrback  If TRUE, the count/sense variables related to the barrier will be placed in
             backing store.

Details

Run this from the manager (the R process from which you create the cluster) if you need a
barrier. Only one barrier is allowed in an Rdsm program (but multiple calls to it are allowed).
It is accessible from application code only through barr().

mgrinit                 Initialize Rdsm.

Description

Initializes Rdsm on the given parallel cluster.

Usage

mgrinit(cls,boost=F,barrback=F)

Arguments

   cls       The parallel cluster.
   boost     Lock functions. If TRUE, boostlock() and boostunlock() are used; otherwise
             backlock() and backunlock().
   barrback  If TRUE, the count/sense variables associated with the barrier will be placed in
             backing store.

Details

Run this from the manager (the R process from which you create the cluster), before creating
the shared variables with mgrmakevar. The initialization need be done only once for the life of
the cluster.

If you put shared variables in backing store (fs = TRUE in mgrmakevar()), or if you are on a
Windows platform, you must have boost = FALSE.

mgrmakelock             Create an Rdsm lock.

Description

Creates an Rdsm lock.

Usage

mgrmakelock(cls,lockname,boost=F)

Arguments

   cls       The parallel cluster.
   lockname  Name of the lock, quoted.
   boost     If TRUE, boost locks will be used.

Details

Run this from the manager (the R process from which you create the cluster) if you need a lock.
The lock is created, lockable/unlockable by all threads. If boost is TRUE, the variable will be
of class boost.mutex; see the synchronicity library for details.
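A minimal sketch of creating a lock and using it from worker code, assuming a cluster c2 that
has already been set up with makeCluster() and initialized with mgrinit() as in the earlier
examples; the lock name "mylock" is arbitrary:

mgrmakelock(c2,"mylock")   # run at the manager; the lock is then usable by all threads
# in code run by the workers, guard a critical section with:
#    rdsmlock("mylock")
#    ... read/update shared variables ...
#    rdsmunlock("mylock")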

mgrmakevar              Create an Rdsm shared variable.

Description

Creates an Rdsm shared variable.

Usage

mgrmakevar(cls,varname,nr,nc,vartype="double",fs=FALSE,mgrcpy=TRUE,savedesc=TRUE)

Arguments

   cls       The parallel cluster.
   varname   Name of the shared variable, quoted. (The variable must be a matrix, though it
             could be 1x1, etc.)
   nr        Number of rows in the variable.
   nc        Number of columns in the variable.
   vartype   Atomic R type of the variable, quoted; "double" by default.
   fs        Place in backing store? FALSE by default.
   mgrcpy    Place a copy of the shared variable on the manager node? TRUE by default.
   savedesc  Save the bigmemory descriptor for this variable on disk? TRUE by default.

Details

Run this from the manager (the R process from which you create the cluster). The shared
variable will be created, readable/writable from all threads. The variable will be of class
big.matrix; see the bigmemory library for details.

rdsmlock                Lock/unlock operations.

Description

Lock/unlock operations to avoid race conditions among the threads.

Usage

rdsmlock(lck)
rdsmunlock(lck)

Arguments

   lck      Lock name, quoted.

Details

Standard lock/unlock operations from the threaded coding world. When one thread executes
rdsmlock(), any other thread attempting to do so will block until the first thread executes
rdsmunlock(). If a thread calls rdsmlock() on an unlocked lock, the call returns immediately
and the thread continues.

These functions are set in the call to mgrinit(), via its boost argument, to either boostlock()
and boostunlock() or backlock() and backunlock(), depending on whether you set boost to TRUE or
FALSE, respectively.

Code should be written so that locks are used as sparingly as possible, since they detract from
performance.

Examples

## Not run:

# unreliable function
s <- function(n) {
   for (i in 1:n) {
      tot[1,1] <- tot[1,1] + 1
   }
}

library(parallel)
c2 <- makeCluster(2)
clusterExport(c2,"s")
mgrinit(c2)
mgrmakevar(c2,"tot",1,1)
tot[1,1] <- 0
clusterEvalQ(c2,s(1000))
tot[1,1]   # should be 2000, but likely far from it

# same computation, now protected by a lock
s1 <- function(n) {
   require(Rdsm)
   for (i in 1:n) {
      rdsmlock("totlock")
      tot[1,1] <- tot[1,1] + 1
      rdsmunlock("totlock")
   }
}

mgrmakelock(c2,"totlock")
tot[1,1] <- 0

clusterExport(c2,"s1")
clusterEvalQ(c2,s1(1000))
tot[1,1]   # will print out 2000, the correct number

## End(Not run)

readsync                Syncing file-backed variables.

Description

Actions to propagate changes to file-backed Rdsm variables across a shared file system.

Usage

readsync(varname)
writesync(varname)

Arguments

   varname  Name of the Rdsm variable, quoted.

Details

This feature should be considered experimental, with poor performance and portability.

Suppose we have an Rdsm variable x which is in backing store (fs = TRUE in the call to
mgrmakevar()), and that we are on a shared file system. (These functions are not needed if all
threads are on the same machine.) When one Rdsm thread writes to x, the question here is when
the updated value of x becomes visible to the other Rdsm threads. The answer may depend on the
underlying file system.

The functions readsync() and writesync() force the updates across the network. Normally one
would call readsync() just after rdsmlock() and call writesync() just before calling
rdsmunlock(). These should work on systems with "close-to-open cache coherency," as with the
Network File System (NFS). On some systems, these functions should be unnecessary.

Value

None.

Examples

library(parallel)
c2 <- makeCluster(2)
mgrinit(c2)
mgrmakevar(c2,"x",1,1,fs=TRUE)
clusterEvalQ(c2,me <- myinfo$id)
clusterEvalQ(c2,if (me==1) x[1,1] <- 3)
# force the update across the network
clusterEvalQ(c2,if (me==1) writesync("x"))
clusterEvalQ(c2,if (me==2) readsync("x"))
clusterEvalQ(c2,if (me==2) x[1,1])    # should be 3
clusterEvalQ(c2,if (me==2) x[1,1] <- 8)
clusterEvalQ(c2,if (me==2) writesync("x"))
clusterEvalQ(c2,if (me==1) readsync("x"))
clusterEvalQ(c2,x[1,1])               # both should yield 8

stoprdsm                Shut down an Rdsm application.

Description

Shuts down the cluster and cleans up.

Usage

stoprdsm(cls)

Arguments

   cls      Cluster from the parallel package.

Details

Shuts down the given parallel cluster, and removes any .desc files that had been created.
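A minimal shutdown sketch, assuming the cluster c2 from the earlier examples:

stoprdsm(c2)   # shuts down the cluster and removes any .desc files that were created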

Index

Topic parallel computation: Rdsm-package
Topic shared memory: Rdsm-package
Topic threads: Rdsm-package

barr
getidxs
getmatrix
loadex
makebarr
mgrinit
mgrmakelock
mgrmakevar
Rdsm (Rdsm-package)
Rdsm-package
rdsmlock
rdsmunlock (rdsmlock)
readsync
stoprdsm
writesync (readsync)