Implementation of a compute cluster for R/BUGS simulations


Implementation of a compute cluster for R/BUGS simulations
Maud Destrée, Alain Dalis, Will Talbot, Charles Castelain
UCB IT Department
26 May 2010

Agenda
- Background
- Goals & requirements
- Compute cluster architecture
- Changes in R code
- Conclusions and next steps

Background
- Statisticians run R/BUGS simulations on their laptops -> performance problems
- Not enough CPUs available
- In case of a BUGS trap, the simulations had to be restarted manually
- Upgrading the R/BUGS versions must be done on several machines
- Difficult to check simulation status from home

Goals & Requirements
- Move heavy simulations off the laptops -> remote environment
- Increase the number of CPUs available for BUGS simulations
- Windows-based environment to run WinBUGS 1.4
- Run simulations in batch mode
- Work on the UCB environment from home

R code - Example
- Initialisation: data, initial values, contexts
- BUGS simulations:
  - for loop on contexts
  - for loop on Markov chains
  - bugs(...)
- Final analysis
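The serial structure outlined above can be sketched roughly as follows, using the R2WinBUGS-style bugs() interface; the context count, the prepare_data/prepare_inits helpers, and the model details are illustrative placeholders, not part of the original program:

```r
# Illustrative sketch of the original serial structure.
# prepare_data(), prepare_inits(), "model.txt" and "theta" are placeholders.
library(R2WinBUGS)   # provides bugs()

n_contexts <- 27
n_chains   <- 3

for (ctx in seq_len(n_contexts)) {
  data  <- prepare_data(ctx)    # hypothetical helper
  inits <- prepare_inits(ctx)   # hypothetical helper
  for (chain in seq_len(n_chains)) {
    fit <- bugs(data, inits,
                parameters.to.save = c("theta"),
                model.file = "model.txt",
                n.chains = 1, n.iter = 10000)
    # ... store fit for the final analysis ...
  }
}
```

With every context and chain run one after another on a single laptop, the total runtime grows linearly, which is the performance problem the cluster is meant to solve.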

Initial investigation - Tests
- Metrum: open-source software for parallel computing of multiple MCMC chains with WinBUGS (W.R. Gillespie, M.R. Gastonguay, W. Knebel, G. Georgalis)
- MPI (Message Passing Interface): a library of functions for C or Fortran to execute programs on remote computers
- Rmpi: an implementation of MPI for R
- bugsparallel: an R module that starts multiple BUGS simulations on several processors
- Easy to implement
- Not robust enough in case of a WinBUGS crash -> the process must be killed manually
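For illustration, the kind of Rmpi usage evaluated in this investigation looks roughly like the sketch below; the run_chain worker function is a hypothetical stand-in for a call into bugs(...):

```r
# Minimal Rmpi sketch (illustrative only; run_chain is a hypothetical
# placeholder for the per-chain BUGS call).
library(Rmpi)

mpi.spawn.Rslaves(nslaves = 4)           # start 4 R worker processes

run_chain <- function(i) {
  # in practice this would call bugs(...) for Markov chain i
  sqrt(i)
}

mpi.bcast.Robj2slave(run_chain)          # ship the function to the workers
results <- mpi.parSapply(1:4, run_chain) # one chain per worker

mpi.close.Rslaves()
mpi.quit()
```

The weakness noted on the slide is visible here: if WinBUGS hangs inside the worker call, nothing in this scheme detects it, and the stuck process has to be killed by hand.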

Compute cluster
- Compute cluster: several (similar) computers grouped together in a (private) network in order to speed up calculations
- Job scheduler: manages the queue of incoming requests and their priorities (example: simulations 1-4 running, simulation 5 pending)
- Master node: the computer that receives requests and dispatches them to available processors on the other computers, the compute nodes

Cluster architecture (1)
- 3 servers, each with two quad-core CPUs and 4 GB RAM
- Job scheduler: LSF (Load Sharing Facility, Platform Computing)
- 1 master node and 2 compute nodes
- Master node: 4 cores reserved for managing incoming requests
- 20 cores available for running R/BUGS simulations
- Shared drive on the master node (250 GB)

Cluster architecture (2) - Client part
- Access to the cluster:
  - LSF client: submit simulations to the cluster, view running/pending jobs, check server status
  - Windows Explorer: access the shared drive on the cluster master node, launch simulations via DOS scripts
- Client components are installed on a Citrix server:
  - Easy to upgrade the LSF client
  - No additional installation required for a new user

Cluster architecture (3)
[architecture diagram not transcribed]

Changes in R code - Example
- Initialisation: data, initial values, contexts
- BUGS simulations:
  - for loop on contexts (moved out of R: each context now runs as a separate job, submitted by the DOS script as an LSF job array)
  - for loop on Markov chains
  - bugs(...)
- Final analysis

DOS script

    SET SUBDIR=Project                               REM Project name
    SET JOBDIR=\\BRABATBAP001\USERS\%USERNAME%\%SUBDIR%
    SET LOGDIR=%JOBDIR%\Logs                         REM Project folder
    SET NCTX=27                                      REM Number of contexts

    for /L %%i in (1,1,%NCTX%) do (
        MD %JOBDIR%\%%i                              REM Create one folder per context
        COPY %JOBDIR%\Data.txt %JOBDIR%\%%i\.        REM Copy files into the context folder
        COPY %JOBDIR%\Init.txt %JOBDIR%\%%i\.
    )

    bsub -J %SUBDIR%_[1-%NCTX%] -o %LOGDIR%\%SUBDIR%_%%I.txt -Q 128 %JOBDIR%\Sim.bat
    pause

- -J: job name; -o: output file name; -Q: requeue the job when it exits with the given error code
- bsub submits the commands in Sim.bat as NCTX separate runs named Project_1, Project_2, ... (%%I escapes LSF's %I job-array index inside the batch file)
- Log files are written to the Logs folder as Project_1.txt, Project_2.txt, ...
- Runs are requeued if the application exits with code 128

sim.r program
- Define the working directory in R code: JOBDIR is defined in the DOS script, LSB_JOBINDEX is set by LSF

      setwd(paste(Sys.getenv("JOBDIR"), "\\", Sys.getenv("LSB_JOBINDEX"), sep = ""))

- Make sure the R program can be restarted if OpenBUGS crashes (no recovery is possible with WinBUGS)
- Optimize memory use:
  - Define carefully which BUGS model variables are saved
  - Remove large R objects when no longer used
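The restart requirement can be met with a simple checkpoint scheme: save each chain's result to disk as soon as it finishes, and skip chains whose result file already exists when LSF requeues the job. The sketch below is illustrative; run_one_chain and the .rds file names are assumptions, not taken from the original sim.r:

```r
# Illustrative checkpoint/restart sketch. run_one_chain and the
# chain_<i>.rds file names are hypothetical, not from the original sim.r.
n_chains <- 3

run_one_chain <- function(chain) {
  # in practice this would call bugs(...) for one Markov chain
  list(chain = chain, draws = rnorm(100))
}

for (chain in seq_len(n_chains)) {
  out_file <- paste("chain_", chain, ".rds", sep = "")
  if (file.exists(out_file)) next       # already done: skip on restart
  result <- run_one_chain(chain)
  saveRDS(result, out_file)             # intermediate result survives a crash
}

# The final analysis reloads whatever was saved
results <- lapply(seq_len(n_chains),
                  function(chain) readRDS(paste("chain_", chain, ".rds", sep = "")))
```

Because a requeued run (bsub -Q 128) re-executes the whole script, the file-existence check is what makes the restart resume from intermediate results instead of recomputing finished chains.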

Summary
- Organize the simulations to take advantage of the compute resources: split the problem into several bunches of simulations that can run independently on the different servers
- DOS script: create working directories for simultaneous runs, define the environment variable values used in the R program, start the simulations
- Adapt the R program to run on the cluster: read the working-environment variables, restart from intermediate results files

Conclusions & Next Steps
- Compared to the previous environment, simulation times are divided by at least 5
- OpenBUGS crashes are restarted automatically
- Software is installed on the remote computers; no additional installation is needed for a new user
- Explore solutions to start parallel BUGS simulations directly from R code: the RLSF package is available under Linux
- OpenBUGS is very slow for complicated models: investigate other samplers and MCMC packages