Parallel Computing with R using GridRPC



Similar documents
GridSolve: : A Seamless Bridge Between the Standard Programming Interfaces and Remote Resources

How to Tunnel Remote Desktop Through SSH on a Windows Computer

F-SECURE MESSAGING SECURITY GATEWAY

1Z Oracle Weblogic Server 11g: System Administration I. Version: Demo. Page <<1/7>>

F-Secure Messaging Security Gateway. Deployment Guide

Application Note: FTP Server Setup on computers running Windows-7 For use with 2500P-ACP1

Active Directory Sync (AD) How to Setup

Clusterix Dynamic Clusters Administration Tutorial

Client/Server Computing Distributed Processing, Client/Server, and Clusters

SCOPTEL WITH ACTIVE DIRECTORY USER DOCUMENTATION

A Survey Study on Monitoring Service for Grid

Active Directory Sync (AD) How it Works in WhosOnLocation

Security Provider Integration RADIUS Server

Ciphermail Gateway Separate Front-end and Back-end Configuration Guide

Configuration Manual

CYAN SECURE WEB HOWTO. NTLM Authentication

PineApp Surf-SeCure Quick

PaperCut Payment Gateway Module - PayPal Payflow Link - Quick Start Guide

GT 6.0 GRAM5 Key Concepts

How To Configure A Bomgar.Com To Authenticate To A Rdius Server For Multi Factor Authentication

escan SBS 2008 Installation Guide

Preinstallation Requirements Guide

Oracle Exam 1z0-102 Oracle Weblogic Server 11g: System Administration I Version: 9.0 [ Total Questions: 111 ]

CS312 Solutions #6. March 13, 2015

WHMCS LUXCLOUD MODULE

PaperCut Payment Gateway Module - PayPal Payflow Link - Quick Start Guide

File Transfer Examples. Running commands on other computers and transferring files between computers

Configuring the BIG-IP and Check Point VPN-1 /FireWall-1

IP Interface for the Somfy Digital Network (SDN) & RS485 URTSII

Install FileZilla Client. Connecting to an FTP server

Chapter 2: Getting Started

LUCOM GmbH * Ansbacher Str. 2a * Zirndorf * Tel / * Fax / *

About This Document 3. About the Migration Process 4. Requirements and Prerequisites 5. Requirements... 5 Prerequisites... 5

GRAVITYZONE HERE. Deployment Guide VLE Environment

G-Monitor: Gridbus web portal for monitoring and steering application execution on global grids

Securing Windows Remote Desktop with CopSSH

Developing a Web Server Platform with SAPI Support for AJAX RPC using JSON

Parallel Computing with Mathematica UVACSE Short Course

/ Preparing to Manage a VMware Environment Page 1

Products for the registry databases and preparation for the disaster recovery

CycleServer Grid Engine Support Install Guide. version 1.25

Building Web Services with XML Service Utility Library (XSUL)

Assignment # 1 (Cloud Computing Security)

Chapter 2: Remote Procedure Call (RPC)

1. GridGain In-Memory Accelerator For Hadoop. 2. Hadoop Installation. 2.1 Hadoop 1.x Installation

What is new in Zorp Professional 6

The SSL device also supports the 64-bit Internet Explorer with new ActiveX loaders for Assessment, Abolishment, and the Access Client.

Stream Processing on GPUs Using Distributed Multimedia Middleware

Laptop Backup - Administrator Guide (Windows)

The MoCA CIS LIS WSDL Network SOAP/WS

[Setup procedure for Windows 95/98/Me]

Spam Marshall SpamWall Step-by-Step Installation Guide for Exchange 5.5

Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC

Parallels Cloud Server 6.0

Security Provider Integration Kerberos Authentication

Setting Up Specify to use a Shared Workstation as a Database Server

By Wick Gankanda Updated: August 8, 2012

Microsoft XP Professional Remote Desktop Connection

OMU350 Operations Manager 9.x on UNIX/Linux Advanced Administration

Lehrstuhl für Informatik 4 Kommunikation und verteilte Systeme. Middleware. Chapter 8: Middleware

HDFS Cluster Installation Automation for TupleWare

Quick Start Guide. User Manual. 1 March 2012

PERFORMANCE COMPARISON OF COMMON OBJECT REQUEST BROKER ARCHITECTURE(CORBA) VS JAVA MESSAGING SERVICE(JMS) BY TEAM SCALABLE

Integrating SAP BusinessObjects with Hadoop. Using a multi-node Hadoop Cluster

Juniper Networks Secure Access. Initial Configuration User Records Synchronization

Manager. Configuration Guide. ICS Software Solutions Clarendon House Church Lane Naphill HP14 4US Buckinghamshire

PaperCut Payment Gateway Module Realex Realauth Redirect Quick Start Guide

Introduction to Running Computations on the High Performance Clusters at the Center for Computational Research

McAfee Firewall Enterprise 8.2.1

PaperCut Payment Gateway Module - RBS WorldPay Quick Start Guide

MS 10972A Administering the Web Server (IIS) Role of Windows Server

Getting Started with the Internet Communications Engine

13.1 Backup virtual machines running on VMware ESXi / ESX Server

Technical Overview Emif Connector

Solving Large Scale Optimization Problems via Grid and Cluster Computing

PaperCut Payment Gateway Module CyberSource Quick Start Guide

Getting Started with SandStorm NoSQL Benchmark

Setting Up B2B Data Exchange for High Availability in an Active/Active Configuration

Using WhatsUp IP Address Manager 1.0

DB2 Connect for NT and the Microsoft Windows NT Load Balancing Service

10972-Administering the Web Server (IIS) Role of Windows Server

Configuring and Integrating MAPI

Extending Remote Desktop for Large Installations. Distributed Package Installs

ReadyNAS Remote Troubleshooting Guide NETGEAR

INSTALLING KAAZING WEBSOCKET GATEWAY - HTML5 EDITION ON AN AMAZON EC2 CLOUD SERVER

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Internet Information Services Agent Version Fix Pack 2.

CA ARCserve and CA XOsoft r12.5 Best Practices for protecting Microsoft SQL Server

Service-Oriented Architecture and Software Engineering

Web Service Robust GridFTP

estos ECSTA for OpenScape Business

NAS 224 Remote Access Manual Configuration

ivms-4500 (Android Tablet) Mobile Client Software User Manual (V3.0)

Transcription:

Parallel Computing with R using GridRPC Junji NAKANO Ei-ji NAKAMA The Institute of Statistical Mathematics, Japan COM-ONE Ltd., Japan The R User Conference 2010 July 20-23, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland, USA 1 / 23

1 Introduction 2 GridRPC 3 RGridRPC 4 Installation 5 Setup 6 Concluding remarks 2 / 23

Our aim We hope to use computing resources located on local and remote networks simultaneously by R easily and efficiently. For this aim, we make it possible to use GridRPC protocol in R. 3 / 23

Usual use of remote resources We log in to a front-end of the remote system by using ssh Job scheduler ssh Job scheduler ssh ssh Job scheduler ssh Job scheduler Different remote systems require different operations even for executing the same job ssh Systems have difficulty to access data located outside of firewall We can improve them by using GridRPC 4 / 23

GridRPC GridRPC is middleware that provides a model for access to remote libraries and parallel programming for tasks on a grid. Typical GridRPC middleware includes Ninf-G and Netsolve. The other GridRPC middleware includes GridSolve, DIET, and OmniRPC. We use Ninf-G to realize GridRPC functions in R. Ninf-G is a reference implementation of GridRPC system using the Globus Toolkit. Ninf-G provides GridRPC APIs which are discussed for the standardization at the Grid Remote Procedure Call Working Group of the Global Grid Forum. Some implementations of GridRPC (including Ninf-G) can work through ssh without any Grid middleware. 5 / 23

Overview of Ninf-G Ninf-G is a set of library functions that provide an RPC capability in a Grid environment, based on the GridRPC API specifications. Several processes shown below work together. See http://ninf.apgrid.org/. 6 / 23

Overview of RGridRPC RGridRPC is an implementation to use embedded R and submits jobs to stubs. One process starts from the generation of a handle and ends by the the destruction of it. GridRPC APIs are used like the following figure. grpc_initialize grpc_function_handle_init Client Server1 Server2 grpc_call_async grpc_wait stub(r) stub(r) grpc_function_handle_destruct grpc_finalize 7 / 23

RGridRPC primitive functions Client initialization and finalization functions.grpc initialize(config file).grpc finalize() Handle functions.grpc function handle init(hostname).grpc function handle default().grpc function handle destruct(handle) Session synchronous function.grpc call(handle,fun,...) Session asynchronous functions.grpc call async(handle,fun,...).grpc probe(session).grpc wait(session) 8 / 23

Examples of RGridRPC primitive functions > library(rgridrpc) >.grpc initialize() [1] TRUE > c1<-.grpc function handle default() > f<-function(){sys.sleep(1);paste(sys.info()["nodename"],sys.getpid(),sys.time())} > f() [1] "triton 13228 2010-07-05 12:34:27" >.grpc call(c1, f) [1] "r1400a 26504 2010-07-05 12:34:30" > s1<-.grpc call async(c1, f) > rc<-.grpc probe(s1) > while (rc$result) { cat(rc$message,fill=t); Sys.sleep(1) ; rc<-.grpc probe(s1) } Call has not completed Call has not completed > cat(rc$message,fill=t) No error > grpc wait(s1) [1] "r1400a 26504 2010-07-05 12:34:31" >.grpc R finalize(c1) # server finalize [1] TRUE >.grpc function handle destruct(c1) [1] TRUE >.grpc finalize() [1] TRUE 9 / 23

RGridRPC snow-like functions Client initialization and finalization functions GRPCmake(hostname) GRPCstop(handle) Synchronous functions GRPCevalq(handle,expr) GRPCexport(handle,names) GRPCcall(handle,fun,...) Asynchronous functions GRPCcallAsync(handle,fun,...) GRPCprobe(section) GRPCwait(section) 10 / 23

Examples of RGridRPC snow-like functions (1) > library(rgridrpc) > prt<-function(l){unlist(lapply(l,paste,collapse=":"))} > cpus<-get num cpus() > cl<-grpcmake(rep("localhost",cpus)) > unlist(grpccall(cl,sys.getpid)) [1] 14956 14962 > A<-matrix(rnorm(1e3^2),1e3,1e3) > B<-t(A) > GRPCexport(cl,c("A")) > prt(grpccall(cl,ls)) [1] "A" "A" > sl<-grpccallasync(cl,function(x){ %*% (A,x)},B) > prt(grpcprobe(sl)) [1] "12:Call has not completed" "12:Call has not completed" > str(grpcwait(sl)) List of 2 $ : num [1:1000, 1:1000] 983.48-43.7-9.81-30.66-58.44... $ : num [1:1000, 1:1000] 983.48-43.7-9.81-30.66-58.44... > unlist(grpcstop(cl)) [1] TRUE TRUE 11 / 23

Examples of RGridRPC snow-like functions (2-1) > # http://www.stat.uiowa.edu/~luke/r/cluster/cluster.html > library(rgridrpc) > > library(boot) > data(nuclear) > nuke <- nuclear[,c(1,2,5,7,8,10,11)] > nuke.lm <- glm(log(cost)~date+log(cap)+ne+ ct+log(cum.n)+pt, data=nuke) > nuke.diag <- glm.diag(nuke.lm) > nuke.res <- nuke.diag$res*nuke.diag$sd > nuke.res <- nuke.res-mean(nuke.res) > nuke.data <- data.frame(nuke,resid=nuke.res,fit=fitted(nuke.lm)) > new.data <- data.frame(cost=1, date=73.00, cap=886, ne=0,ct=0, cum.n=11, pt=1) > new.fit <- predict(nuke.lm, new.data) > nuke.fun <- function(dat, inds, i.pred, fit.pred, x.pred) { + assign(".inds", inds, envir=.globalenv) + lm.b <- glm(fit+resid[.inds] ~date+log(cap)+ne+ct+ + log(cum.n)+pt, data=dat) + pred.b <- predict(lm.b,x.pred) + remove(".inds", envir=.globalenv) + c(coef(lm.b), pred.b-(fit.pred+dat$resid[i.pred])) + } 12 / 23

Examples of RGridRPC snow-like functions (2-2) > N<-500 > cpus<-get num cpus() > system.time(nuke.boot <- boot(nuke.data, nuke.fun, R=N*cpus, m=1, + fit.pred=new.fit, x.pred=new.data)) user system elapsed 185.051 616.522 66.795 > > cl<-grpcmake(rep("localhost",cpus)) > GRPCevalq(cl, library(boot)) [[1]] [1] "boot" "stats" "graphics" "grdevices" "utils" "datasets"... [[12]] [1] "boot" "stats" "graphics" "grdevices" "utils" "datasets" > > system.time(cl.nuke.boot <- GRPCcall(cl,boot,nuke.data, nuke.fun, R=N, m=1, + fit.pred=new.fit, x.pred=new.data)) user system elapsed 0.008 0.004 7.189 > > GRPCstop(cl) [[1]] [1] TRUE... [[12]] [1] TRUE 13 / 23

RGridRPC installation by users Download $ wget http://prs.ism.ac.jp/rgridrpc/rgridrpc_0.10-197.tar.gz Client and server $ R -q -e dir.create(sys.getenv("r_libs_user"),rec=t) $ R CMD INSTALL RGridRPC_0.10-123.tar.gz Toolchain and Python are required. When we use Grid middleware (except ssh), we install Ninf-G for each system and set NG DIR environment variable properly and install RGridRPC. Using Grid middleware $ R -q -e dir.create(sys.getenv("r_libs_user"),rec=t) $ NG DIR=/opt/ng R CMD INSTALL RGridRPC_0.10-123.tar.gz 14 / 23

RGridRPC setup RGridRPC reads the file client.conf in the current directry as a configuration file. Two-way connections are required for RGridRPC. Client should be specified by a client hostname from server side in client.conf. Or Proxy should be specified by a Proxy IP address from server side in client.conf. An execution module of stub requires NG DIR environment variable to know the top directory of Ninf-G. RGridRPC uses NRF as Information sources. 15 / 23

client.conf : localhost only <CLIENT> hostname </CLIENT> <SERVER> hostname invoke server environment environment </SERVER> <INFORMATION SOURCE> type tag source </INFORMATION SOURCE> localhost localhost SSH NG DIR=${R LIBS USER}/RGridRPC OMP NUM THREADS=1 NRF nrf RGridRPC.localhost.nrf 16 / 23

Information flow : localhost only R Client RGridRPC Ninf G Client Library Client Ninf G Executable R Ninf G stub Ninf G Executable Library Information Service Information Service (NRF) Invoke Server Invoke Server (SSH) <server> Hostname localhost <client> Hostname localhost sshd 17 / 23

client.conf : using a remote server directly <CLIENT COMMUNICATION PROXY> type SSH </CLIENT COMMUNICATION PROXY> <SERVER> hostname r.ism.ac.jp invoke server SSH environment NG DIR=${R LIBS USER}/RGridRPC environment OMP NUM THREADS=1 communication proxy SSH communication proxy option ssh relaynode r.ism.ac.jp communication proxy option ssh bindaddress 127.0.0.1 </SERVER> <INFORMATION SOURCE> type NRF tag nrf source RGridRPC.r.ism.ac.jp.nrf </INFORMATION SOURCE> 18 / 23

Information flow : using a remote server directly Client R Client RGridRPC Ninf G Client Library Cluster frontend Ninf G Executable R Ninf G stub Ninf G Executable Library Information Service Information Service (NRF) Invoke Server Invoke Server (SSH) Client Communication Proxy Communication Proxy (SSH) Firewall Firewall Remote Communication Proxy Communication Proxy ssh relaynode r.ism.ac.jp ssh bindaddress 127.0.0.1 sshd 19 / 23

client.conf : using a remote cluster <CLIENT COMMUNICATION PROXY> type SSH </CLIENT COMMUNICATION PROXY> <SERVER> hostname r.ism.ac.jp invoke server SSH environment NG DIR=${R LIBS USER}/RGridRPC environment OMP NUM THREADS=1 jobmanager jobmanager-pbs communication proxy SSH communication proxy option ssh relaynode r.ism.ac.jp communication proxy option ssh bindaddress 192.168.0.1 </SERVER> <INFORMATION SOURCE> type NRF tag nrf source RGridRPC.r.ism.ac.jp.nrf </INFORMATION SOURCE> 20 / 23

Information flow : using a remote cluster Client R Client RGridRPC Ninf G Client Library Cluster frontend Cluster node Ninf G Executable R Ninf G stub Ninf G Executable Library Ninf G Executable R Information Service Information Service (SSH) Invoke Server Invoke Server (SSH) Client Communication Proxy Communication Proxy (SSH) Firewall Firewall Remote Communication Proxy Communication Proxy Ninf G stub Ninf G Executable Library ssh relaynode r.ism.ac.jp Job Scheduler ssh bindaddress 192.168.0.1 sshd 21 / 23

RGridRPC eazy setup We provide an R function makeninfgconf to generate client.conf and servername.nrf file. Bind address of each cluster needs to be specified manually. makeninfgconf makeninfgconf(hostname=c( "pbscluster.ism.ac.jp", "remotesv.ism.ac.jp", "localhost"), pkgpath=c( "/home/eiji/r/i486-pc-linux-gnu-library/2.11/rgridrpc/", "/home/eiji/r/x86 64-pc-linux-gnu-library/2.11/RGridRPC/", "/home/eiji/r/powerpc-unknown-linux-gnu-library/2.11/rgridrpc/"), ngdir=c( "/home/eiji/r/i486-pc-linux-gnu-library/2.11/rgridrpc/", "/home/eiji/r/x86 64-pc-linux-gnu-library/2.11/RGridRPC/", "/opt/ng"), invoke server=c("ssh", "SSH", "SSH"), jobmanager=c("jobmanager-pbs", NA, NA)) 22 / 23

Concluding remarks Advantages of RGridRPC R can use many different cluster servers. R can use resources outside of firewall. Disadvantages of present RGridRPC If the limit of job scheduler is tight and some scattered jobs are waiting in the queue, the execution does not stop at all. We cannot handle big objects because of serialization Serialize base64 RAW(max 2Gbyte) The present implementation depends on Ninf-G. Available Job Schedulers are limited to PBS,Torque and SGE. Other Grid RPC middleware is not available. 23 / 23