Development of parallel codes using PL-Grid infrastructure
Rafał Kluszczyński 1, Marcin Stolarek 1, Grzegorz Marczak 1,2, Łukasz Górski 2, Marek Nowicki 2
bala@icm.edu.pl
1 ICM, University of Warsaw, Warsaw, PL
2 WMiI, Nicolaus Copernicus University, Toruń, PL
Motivation
The grid is widely used for execution of CPU-intensive applications. The PL-Grid infrastructure consists of clusters built of multicore nodes. Grid jobs can have different characteristics:
- trivially parallel / parallel tasks
- parallel applications (MPI/OpenMP)
- massively parallel jobs (from a few up to hundreds of nodes)
Most PL-Grid users use existing, already parallelized applications. There is a relatively small community of developers who deploy and test parallel applications. 2
Problems
Production and development jobs have different characteristics:
- Production parallel jobs require a few nodes and run for several hours (or even longer)
- Development jobs require a varying number of nodes:
  - 1-2 nodes for development
  - a large number of nodes for scalability tests
3
PL-Grid National Grid Initiative
Partners: Polish supercomputer centres: Cyfronet, ICM, PCSS, WCSS, TASK
Project aims:
- Build and operate the Polish National Grid
- Provide training and user support
- Provide support for application deployment on the grid
ICM role in PL-Grid:
- Operate a DEISA-compatible grid
- UNICORE Operating Center
- Domain applications: health, materials
4
UNICORE 7 Architecture
[Diagram: the Client Layer connects over HTTPS to the Gateway and Registry; behind the Gateway (HTTPS / HTTP), the UNICORE 6 Hosting Environment holds the Atomic Services in a WSRF container together with the User DB, Policies, and other services; these drive the Execution Manager and Target System Interface, which access the Target System's Computational Resources, Files, Data Storages, and Databases. Security spans all layers.]
5
UNICORE Client framework
- Eclipse-based rich client
- Eclipse-based workflow editor
- Command line client
- Web client
- UNICORE Portal
6
Job preparation 7
Job preparation files 8
Job preparation - script
module load plgrid/tools/openmpi
module load plgrid/tools/java8
# Get list of nodes
mpiexec bash -c 'hostname -s' > nodes.u
sort nodes.u | sed 's/-g7e//' | uniq > nodes.uniq
# Compile
javac -cp PCJ_4.jar Pcj_hpc_ra1.java
# Execute
mpiexec --hostfile nodes.uniq bash -c 'java -d64 -cp .:PCJ_4.jar Pcj_hpc_ra1'
9
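The node-list step above collects one hostname per MPI rank and reduces the list to unique node names for the PCJ hostfile. A minimal sketch on sample data (the hostnames wn123-g7e / wn124-g7e are made up for illustration):

```shell
# Hypothetical per-rank output of: mpiexec bash -c 'hostname -s'
# (4 ranks spread over 2 nodes, each name carrying the site-specific "-g7e" suffix)
printf '%s\n' wn123-g7e wn124-g7e wn123-g7e wn124-g7e > nodes.u

# Strip the suffix, then sort and deduplicate to one line per node
sort nodes.u | sed 's/-g7e//' | uniq > nodes.uniq
cat nodes.uniq
```

The result is a two-line hostfile (wn123, wn124) suitable for `mpiexec --hostfile`.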
PCJ - Parallel Computations in Java
Java library developed at ICM: pcj.icm.edu.pl
Programming paradigm: partitioned global address space (PGAS)
Features:
- Does not require modification of the JVM
- Works on almost all operating systems that have a JVM, e.g. IBM Java 1.7 on the Power7 architecture
- Uses the newest Java SE 7 features (NIO, SDP, ...)
- Works with Java SE 8
- Does not require other libraries!
10 7/6/2014
PCJ Hello world

import pl.umk.mat.pcj.*;

public class PcjHelloWorld extends Storage implements StartPoint {

    @Override
    public void main() {
        System.out.println("Hello!");
    }

    public static void main(String[] args) {
        String[] nodes = new String[]{"localhost", "localhost"};
        PCJ.deploy(PcjHelloWorld.class, PcjHelloWorld.class, nodes);
    }
}
11
PCJ - basics

double c;
if (PCJ.myId() == 0) {
    c = (double) PCJ.get(3, "a");
}
if (PCJ.myId() == 0) {
    PCJ.put(3, "a", 5.0);
}

Selected API methods:
public static void PCJ.barrier();
public static int PCJ.threadCount();
12
Development vs. Production
- Queueing systems are optimized for long production runs
- Development jobs are short but use a large amount of resources
- Development jobs can wait days or weeks to enter execution
- The plgrid-testing queue is configured with a short maximal execution time, but it shares the overall queue policy
- The solution is to use the reservation system. A PCJ reservation has been created and is used by the code developers
- The PCJ reservation contains 2 nodes of each type (6 nodes in total)
13
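Under SLURM, a reservation like the one above is requested at submission time. A minimal job-script sketch, assuming the reservation is named pcj (the actual reservation name, node counts, and application are site-specific assumptions):

```shell
#!/bin/bash
#SBATCH --reservation=pcj   # reservation name is an assumption; use the name assigned by the site
#SBATCH --nodes=2           # typical development run: 1-2 nodes
#SBATCH --time=00:30:00     # short wall time is enough for test jobs

module load plgrid/tools/openmpi
mpiexec ./my_parallel_app   # hypothetical placeholder application
```

Because the job lands inside the reservation, it starts as soon as the reserved nodes are free rather than waiting in the general queue.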
Job properties 14
Results (Raytrace performance) 15
Results (Global FFT performance) 16
Conclusions
- PL-Grid offers the resources and functionality necessary for the development of parallel codes
- Short parallel jobs can be run without delay
- This is possible using the resource reservation mechanism implemented in PL-Grid
- Server-side support is provided by a modified UNICORE TSI and extensions to the SLURM resource manager
- UNICORE is able to use reservations
- Reservations are easy to use
17
Acknowledgments
This work was made possible thanks to the PLGrid Plus project POIG.02.03.00-00-096/10. This research was supported in part by the PL-Grid Infrastructure. 18