Introduction to SDSC systems and data analytics software packages
Mahidhar Tatineni (mahidhar@sdsc.edu)
SDSC Summer Institute, August 05, 2013
Getting Started: System Access
Logging in from Linux/Mac: use any available ssh client.
ssh clients for Windows: PuTTY, Cygwin (http://www.chiark.greenend.org.uk/~sgtatham/putty/)
Login hosts for the machines: gordon.sdsc.edu, trestles.sdsc.edu
For NSF resources, users can also log in via the XSEDE user portal: https://portal.xsede.org/
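A minimal login example from a Linux/Mac terminal (train40 is the training account used elsewhere in these slides; substitute your own username):

$ ssh train40@gordon.sdsc.edu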
Access Via Science Gateways (XSEDE)
A community-developed set of tools, applications, and data integrated via a portal. Gateways let researchers in a particular community use HPC resources through a portal without having to become familiar with the underlying hardware and software details, allowing them to focus on their scientific goals.
The CIPRES gateway, hosted by SDSC PIs, enables large-scale phylogenetic reconstructions using applications such as MrBayes, RAxML, and GARLI. It enabled ~200 publications in 2012 and accounts for a significant fraction of XSEDE users.
The NSG portal, hosted by SDSC PIs, enables HPC jobs for neuroscientists.
Data Transfer (scp, globus-url-copy)
scp is fine for simple transfers and small file sizes (<1 GB). Example:

$ scp w.txt train40@gordon.sdsc.edu:/home/train40/
w.txt                                 100%   15KB  14.6KB/s   00:00

globus-url-copy is for large-scale data transfers between XD resources (and local machines with a Globus client).
Uses your XSEDE-wide username and password.
Retrieves your certificate proxies from the central server.
Highest performance between XSEDE sites; uses striping across multiple servers and multiple threads on each server.
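scp can also move a directory tree with its recursive flag (a sketch; the paths are placeholders):

$ scp -r results/ train40@gordon.sdsc.edu:/home/train40/results/

For anything much larger than ~1 GB, prefer globus-url-copy or Globus Online, covered next.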
Data Transfer: globus-url-copy
Step 1: Retrieve certificate proxies:

$ module load globus
$ myproxy-logon -l xsedeusername
Enter MyProxy pass phrase:
A credential has been received for user xsedeusername in /tmp/x509up_u555555.

Step 2: Initiate globus-url-copy:

$ globus-url-copy -vb -stripe -tcp-bs 16m -p 4 gsiftp://gridftp.ranger.tacc.teragrid.org:2811///scratch/00342/username/test.tar gsiftp://trestles-dm2.sdsc.xsede.org:2811///oasis/scratch/username/temp_project/test-gordon.tar
Source: gsiftp://gridftp.ranger.tacc.teragrid.org:2811///scratch/00342/username/
Dest:   gsiftp://trestles-dm2.sdsc.xsede.org:2811///oasis/scratch/username/temp_project/
  test.tar  ->  test-gordon.tar
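Before a long transfer it is worth confirming the proxy is still valid. Assuming the standard Globus Toolkit client tools are installed alongside globus-url-copy (an assumption; check your site), grid-proxy-info prints the proxy subject and its remaining lifetime:

$ grid-proxy-info

If the proxy has expired, repeat the myproxy-logon step above.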
Data Transfer: Globus Online
Works from Windows/Linux/Mac via the Globus Online website: https://www.globusonline.org
Gordon, Trestles, and Triton endpoints already exist.
Authentication can be done using your XSEDE-wide username and password for the NSF resources.
The Globus Connect application (available for Windows/Linux/Mac) can turn your laptop/desktop into an endpoint.
Data Transfer: Globus Online
Step 1: Create a Globus Online account.
Data Transfer: Globus Online
Step 2: Set up your local machine as an endpoint using Globus Connect.
Data Transfer: Globus Online
Step 3: Pick endpoints and initiate transfers.
SDSC HPC Resources: Running Jobs
Running Batch Jobs
All clusters use the TORQUE/PBS resource manager for running jobs. TORQUE allows the user to submit one or more jobs for execution, using parameters specified in a job script. The NSF resources add the Catalina scheduler to control the workload.
Copy the hands-on examples directory:

$ cp -r /home/diag/SI2013 .
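Day-to-day job control on all the clusters uses the standard TORQUE commands (a quick reference; the job ID below is a placeholder):

$ qsub myjob.cmd       # submit a job script
$ qstat -u $USER       # list your queued and running jobs
$ qdel 845444          # cancel a job by ID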
Gordon: Filesystems
Lustre filesystems: good for scalable, large-block I/O. Accessible from both native and vsmp nodes.
  /oasis/scratch/gordon: 1.6 PB, peak measured performance ~50 GB/s on reads and writes.
  /oasis/projects: 400 TB.
SSD filesystems:
  /scratch, local to each native compute node: 300 GB each.
  /scratch on the vsmp node: 4.8 TB SSD-based filesystem.
NFS filesystems (/home).
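Available space on the Lustre filesystems can be checked from a login node with df (a sketch using the mount points listed above):

$ df -h /oasis/scratch/gordon /oasis/projects

Note that the per-node SSD scratch directory (/scratch/$USER/$PBS_JOBID, used later in these slides) exists only for the duration of a job.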
Gordon: Compiling/Running Jobs
Copy the SI2013 directory:

$ cp -r /home/diag/SI2013 ~/

Change to the workshop directory:

$ cd ~/SI2013

Verify the modules loaded:

$ module li
Currently Loaded Modulefiles:
  1) binutils/2.22   2) intel/2011   3) mvapich2_ib/1.8a1p1

Compile the MPI hello world code:

$ mpif90 -o hello_world hello_mpi.f90

Verify the executable has been created:

$ ls -lt hello_world
-rwxr-xr-x 1 mahidhar hpss 735429 May 15 21:22 hello_world
Gordon: Compiling/Running Jobs
Job queue basics: Gordon uses the TORQUE/PBS resource manager with the Catalina scheduler to define and manage job queues.
Native/regular compute (non-vsmp) nodes are accessible via the normal queue.
The vsmp node is accessible via the vsmp queue.
The workshop examples illustrate use of both the native and vsmp nodes:
  hello_native.cmd: script for running the hello world example on native nodes (using MPI).
  hello_vsmp.cmd: script for running the hello world example on the vsmp node (using OpenMP).
The hands-on section of the tutorial has several scenarios.
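To see the queues and their limits on the system itself, standard TORQUE provides qstat -q (exact columns vary by version):

$ qstat -q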
Gordon: Hello World on native (non-vsmp) nodes
The submit script (located in the workshop directory) is hello_native.cmd:

#!/bin/bash
#PBS -q normal
#PBS -N hello_native
#PBS -l nodes=4:ppn=1:native
#PBS -l walltime=0:10:00
#PBS -o hello_native.out
#PBS -e hello_native.err
#PBS -V
##PBS -M youremail@xyz.edu
##PBS -m abe
#PBS -A gue998
cd $PBS_O_WORKDIR
mpirun_rsh -hostfile $PBS_NODEFILE -np 4 ./hello_world
Gordon: Output from Hello World
Submit the job using qsub:

$ qsub hello_native.cmd
845444.gordon-fe2.local

Output:

$ more hello_native.out
node 2 : Hello world
node 1 : Hello world
node 3 : Hello world
node 0 : Hello world
Nodes: gcn-15-58 gcn-15-62 gcn-15-63 gcn-15-68
Compiling OpenMP Example
Change to the SI2013 directory:

$ cd ~/SI2013

Compile using the openmp flag:

$ ifort -o hello_vsmp -openmp hello_vsmp.f90

Verify the executable was created:

$ ls -lt hello_vsmp
-rwxr-xr-x 1 train61 gue998 786207 May 9 10:31 hello_vsmp
Hello World on vsmp node (using OpenMP)
hello_vsmp.cmd:

#!/bin/bash
#PBS -q vsmp
#PBS -N hello_vsmp
#PBS -l nodes=1:ppn=16:vsmp
#PBS -l walltime=0:10:00
#PBS -o hello_vsmp.out
#PBS -e hello_vsmp.err
#PBS -V
##PBS -M youremail@xyz.edu
##PBS -m abe
#PBS -A gue998
cd $PBS_O_WORKDIR
export LD_PRELOAD=/opt/ScaleMP/libvsmpclib/0.1/lib64/libvsmpclib.so
export PATH="/opt/ScaleMP/numabind/bin:$PATH"
export KMP_AFFINITY=compact,verbose,0,`numabind --offset 8`
export OMP_NUM_THREADS=8
./hello_vsmp
Hello World on vsmp node (using OpenMP)
Code written using OpenMP:

      PROGRAM OMPHELLO
      INTEGER TNUMBER
      INTEGER OMP_GET_THREAD_NUM

!$OMP PARALLEL DEFAULT(PRIVATE)
      TNUMBER = OMP_GET_THREAD_NUM()
      PRINT *, 'HELLO FROM THREAD NUMBER = ', TNUMBER
!$OMP END PARALLEL

      STOP
      END
vsmp OpenMP binding info (from the hello_vsmp.err file)

OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {504}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {505}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {506}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {507}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {508}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {509}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {511}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {510}
Hello World (OpenMP version) Output

HELLO FROM THREAD NUMBER = 1
HELLO FROM THREAD NUMBER = 6
HELLO FROM THREAD NUMBER = 5
HELLO FROM THREAD NUMBER = 4
HELLO FROM THREAD NUMBER = 3
HELLO FROM THREAD NUMBER = 2
HELLO FROM THREAD NUMBER = 0
HELLO FROM THREAD NUMBER = 7
Nodes: gcn-3-11
Running on vsmp nodes: Guidelines
Identify the type of job: serial (large memory), threaded (pthreads, OpenMP), or MPI.
The workshop directory has examples for the different scenarios; the hands-on section will walk through each type.
Use affinity in conjunction with the automatic process placement utility (numabind).
An optimized MPI (mpich2 tuned for vsmp) is available.
vsmp Guidelines for Threaded Codes
OpenMP Matrix Multiply Example

#!/bin/bash
#PBS -q vsmp
#PBS -N openmp_mm_vsmp
#PBS -l nodes=1:ppn=16:vsmp
#PBS -l walltime=0:10:00
#PBS -o openmp_mm_vsmp.out
#PBS -e openmp_mm_vsmp.err
#PBS -V
##PBS -M youremail@xyz.edu
##PBS -m abe
#PBS -A gue998
cd $PBS_O_WORKDIR
# Set stacksize to unlimited.
ulimit -s unlimited
# ScaleMP preload library that throttles down unnecessary system calls.
export LD_PRELOAD=/opt/ScaleMP/libvsmpclib/0.1/lib64/libvsmpclib.so
source ./intel.sh
export MKL_VSMP=1
# Path to NUMABIND.
export PATH=/opt/ScaleMP/numabind/bin:$PATH
np=8
tag=`date +%s`
# Dynamic binding of OpenMP threads using numabind.
export KMP_AFFINITY=compact,verbose,0,`numabind --offset $np`
export OMP_NUM_THREADS=$np
/usr/bin/time ./openmp-mm > log-openmp-nbind-$np-$tag.txt 2>&1
Using SSD Scratch (Native Nodes)

#!/bin/bash
#PBS -q normal
#PBS -N ior_native
#PBS -l nodes=1:ppn=16:native
#PBS -l walltime=00:25:00
#PBS -o ior_scratch_native.out
#PBS -e ior_scratch_native.err
#PBS -V
##PBS -M youremail@xyz.edu
##PBS -m abe
#PBS -A gue998
cd /scratch/$USER/$PBS_JOBID
mpirun_rsh -hostfile $PBS_NODEFILE -np 4 $HOME/SI2013/IOR-gordon -i 1 -F -b 16g -t 1m -v -v > IOR_native_scratch.log
cp /scratch/$USER/$PBS_JOBID/IOR_native_scratch.log $PBS_O_WORKDIR/
Using SSD Scratch (Native Nodes)
Snapshot on the node during the run:

$ pwd
/scratch/mahidhar/72251.gordon-fe2.local
$ ls -lt
total 22548292
-rw-r--r-- 1 mahidhar hpss 5429526528 May 15 23:48 testfile.00000001
-rw-r--r-- 1 mahidhar hpss 6330253312 May 15 23:48 testfile.00000003
-rw-r--r-- 1 mahidhar hpss 5532286976 May 15 23:48 testfile.00000000
-rw-r--r-- 1 mahidhar hpss 5794430976 May 15 23:48 testfile.00000002
-rw-r--r-- 1 mahidhar hpss       1101 May 15 23:48 IOR_native_scratch.log

Performance from a single node (in the log file copied back):

Max Write: 250.52 MiB/sec (262.69 MB/sec)
Max Read:  181.92 MiB/sec (190.76 MB/sec)
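The same stage-in/compute/stage-out pattern applies to any application that benefits from the SSDs. A minimal sketch for a job script body, assuming hypothetical names input.dat, output.dat, and mycode; treat the per-job SSD directory as temporary and copy results back before the job exits:

cd /scratch/$USER/$PBS_JOBID
cp $PBS_O_WORKDIR/input.dat .       # stage input onto the local SSD
$HOME/bin/mycode input.dat          # compute against local scratch
cp output.dat $PBS_O_WORKDIR/       # stage results back out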
Running Jobs on Trestles
All nodes on Trestles are identical; each node has 32 cores and can be shared.
The scheduler is again PBS + Catalina.
Two options:
  normal: exclusive access to compute nodes. Allocation is charged for 32 cores/node.
  shared: shared access. Allocation is charged based on the number of cores requested.
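A shared-queue script looks like the Gordon examples with the queue name and core count changed, so only the requested cores are charged. A sketch (the account name and executable are placeholders):

#!/bin/bash
#PBS -q shared
#PBS -N shared_job
#PBS -l nodes=1:ppn=8
#PBS -l walltime=0:30:00
#PBS -A gue998
cd $PBS_O_WORKDIR
./my_app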
Data Intensive Computing & Viz Stack
Gordon was designed to enable data-intensive computing (details in the following slides). Additionally, some of the Triton nodes have large memory (up to 512 GB) to aid in such processing.
All clusters have access to the high-speed Lustre filesystem (Data Oasis; details in a separate presentation) with an aggregate peak measured data rate of 100 GB/s.
Several libraries and packages have been installed to enable data-intensive computing and visualization (see the sketch after this list):
  R: software environment for statistical computing and graphics.
  Weka: tools for data analysis and predictive modeling.
  RapidMiner: environment for machine learning, data mining, text mining, and predictive analytics.
  Octave
  Matlab
  VisIt
  Paraview
The myHadoop infrastructure was developed to enable the use of Hadoop for distributed data-intensive analysis.
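These packages are generally loaded through the same module system used elsewhere in these slides (a sketch; exact module names and versions differ per system, so check module avail first):

$ module avail
$ module load R
$ Rscript myanalysis.R      # myanalysis.R is a placeholder for your own script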
Hands-On Example: Hadoop
Examples are in /home/diag/SI2013/hadoop. Simple benchmark examples:
  TestDFS_2.cmd: TestDFS example to benchmark HDFS performance.
  TeraSort_2.cmd: sorting performance benchmark.
TestDFS Example
PBS variables part:

#!/bin/bash
#PBS -q normal
#PBS -N hadoop_job
#PBS -l nodes=2:ppn=1
#PBS -o hadoop_dfstest_2.out
#PBS -e hadoop_dfstest_2.err
#PBS -V
TestDFS Example
Set up Hadoop environment variables:

# Set this to the location of myHadoop on Gordon
export MY_HADOOP_HOME="/opt/hadoop/contrib/myHadoop"
# Set this to the location of Hadoop on Gordon
export HADOOP_HOME="/opt/hadoop"
#### Set this to the directory where Hadoop configs should be generated
# Don't change the name of this variable (HADOOP_CONF_DIR) as it is
# required by Hadoop - all config files will be picked up from here
#
# Make sure that this is accessible to all nodes
export HADOOP_CONF_DIR="/home/$USER/config"
TestDFS Example
Set up the configuration:

#### Set up the configuration
# Make sure the number of nodes is the same as what you have requested from PBS
# usage: $MY_HADOOP_HOME/bin/configure.sh -h
echo "Set up the configurations for myHadoop"
### Create a hadoop hosts file, change to ibnet0 interfaces - DO NOT REMOVE -
sed 's/$/.ibnet0/' $PBS_NODEFILE > $PBS_O_WORKDIR/hadoophosts.txt
export PBS_NODEFILEZ=$PBS_O_WORKDIR/hadoophosts.txt
### Copy over configuration files
$MY_HADOOP_HOME/bin/configure.sh -n 2 -c $HADOOP_CONF_DIR
### Point hadoop temporary files to local scratch - DO NOT REMOVE -
sed -i 's@haddtemp@'$PBS_JOBID'@g' $HADOOP_CONF_DIR/hadoop-env.sh
TestDFS Example
Format HDFS and start the Hadoop cluster:

#### Format HDFS, if this is the first time or not a persistent instance
echo "Format HDFS"
$HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR namenode -format
echo
sleep 1m
#### Start the Hadoop cluster
echo "Start all Hadoop daemons"
$HADOOP_HOME/bin/start-all.sh
#$HADOOP_HOME/bin/hadoop dfsadmin -safemode leave
echo
TestDFS Example
Run the benchmark and shut down:

#### Run your jobs here
echo "Run some test Hadoop jobs"
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-test-1.0.3.jar TestDFSIO -write -nrFiles 8 -fileSize 1024 -bufferSize 1048576
sleep 30s
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-test-1.0.3.jar TestDFSIO -read -nrFiles 8 -fileSize 1024 -bufferSize 1048576
echo
#### Stop the Hadoop cluster
echo "Stop all Hadoop daemons"
$HADOOP_HOME/bin/stop-all.sh
echo
Running the TestDFS Example
Submit the job:

$ qsub TestDFS_2.cmd

Check that the job is running (qstat). Once the job is running, the hadoophosts.txt file is created. For example, from a sample run:

$ more hadoophosts.txt
gcn-13-11.ibnet0
gcn-13-12.ibnet0
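When the job completes, the TestDFSIO benchmark appends its throughput numbers to a results file in the working directory (TestDFSIO_results.log is the benchmark's default name; an assumption for this setup - if it is absent, check the job's stdout file instead):

$ grep "Throughput" TestDFSIO_results.log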
Summary, Q/A
Access options: ssh clients, XSEDE User Portal.
Data transfer options: scp, globus-url-copy (GridFTP), Globus Online, and the XSEDE User Portal File Manager.
Two queues on Gordon: normal (native, non-vsmp) and vsmp. Follow the guidelines for serial, OpenMP, pthreads, and MPI jobs on the vsmp nodes.
Use SSD local scratch where possible; excellent for codes like Gaussian and Abaqus.