Running Hadoop and Stratosphere jobs on the TomPouce cluster, 16 October 2013
TomPouce cluster
TomPouce is a cluster of 20 calculation nodes = 240 cores
Located in the Inria Turing building (École Polytechnique)
Used jointly by Inria teams
Jobs are run with the help of a scheduler: SGE (Sun Grid Engine)
TomPouce cluster
SPECIFICATIONS:
Calculation: 20 nodes, each with two 6-core processors. Total: 240 cores. 48 GB RAM per node. 400 GB local space.
Storage: Dell R510, /home, 19 TB, NFS. 2x Dell R710, /scratch, 37 TB, FhGFS (Fraunhofer FS).
Network: Dell 5548 switch. Mellanox InfiniScale IV QDR InfiniBand switch.
TomPouce cluster (architecture diagram)
1. Copy your job from the local machine to the cluster front node
$ scp myjob.jar inria_username@195.83.212.209:~/
myjob.jar will be copied into the folder /home/leo/inria_username.
2. Connect via ssh to the front node
$ ssh inria_username@195.83.212.209
Welcome to Bright Cluster Manager 6.0
Based on Scientific Linux release 6
Cluster Manager ID: #120054
Use the following commands to adjust your environment:
'module avail' - show available modules
'module add <module>' - adds a module to your environment for this session
'module initadd <module>' - configure module to be loaded at every login
IMPORTANT: To connect to the cluster, your ssh key should be stored in the Inria LDAP. If not, send an e-mail with your public ssh key to: helpmi-saclay@inria.fr
3. Log in as the clustervision superuser using your LDAP password
$ sudo su - clustervision
- Needed to execute Hadoop and Stratosphere jobs and to edit configurations.
- If you don't have enough permissions, ask for them at: helpmi-saclay@inria.fr
4. Add the Hadoop/Stratosphere environment to your session
To add the Hadoop environment, type:
$ module add hadoop/1.1.1
To add the Stratosphere environment, type:
$ module add stratosphere/stratosphere
- To add an environment automatically when you log in:
$ module initadd hadoop/1.1.1
- To check all the environments loaded:
$ module list
Currently Loaded Modulefiles:
1) gcc/4.7.0  2) intel-cluster-checker/1.8  3) stratosphere/stratosphere-0.2.1  4) sge/2011.11  5) openmpi/gcc/64/1.4.5  6) gromacs/openmpi/gcc/64/4.0.7  7) hadoop/1.1.1
4. Add the Hadoop/Stratosphere environment to your session
Hadoop installation: /cm/shared/apps/hadoop/current/
Stratosphere installation: /cm/shared/apps/stratosphere/current/
5. Create an execution script (Hadoop)
#!/bin/bash
#$ -N hadoop_run
#$ -pe hadoop 12
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q
# Copy the input files into the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input
# Run the Hadoop task(s) here, specifying the jar, class and run parameters
hadoop --config /home/guests/clustervision/current/ jar myjob.jar org.myorg.job /input /output
# Copy the output files from the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ fs -get /output
SGE execution parameters:
They should be written after #$ at the beginning of the script.
-N <job_name>: gives a name to the job to run.
-pe <environment> N: specifies the parallel environment; N is the number of cores (limited to 180).
-j y: merge errors and standard output into the same output file.
SGE execution parameters:
-o output.$JOB_ID: the standard output will be in a file named output.$JOB_ID; $JOB_ID is the number SGE assigns automatically to our job.
-l name=value: requests a resource. In this case:
h_rt=00:10:00 indicates that the job should be killed after 10 minutes
hadoop=true indicates that the job to run is a Hadoop job (it DOES NOT CHANGE for Stratosphere jobs)
excl=true indicates that the job is executed exclusively
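The h_rt limit uses an hh:mm:ss format. As a quick sanity check, the 00:10:00 limit above can be converted to seconds with plain bash (this sketch runs anywhere, no SGE needed):

```shell
#!/bin/bash
# Convert an SGE h_rt wall-clock limit (hh:mm:ss) to seconds.
h_rt="00:10:00"
h=${h_rt%%:*}          # hours field
rest=${h_rt#*:}
m=${rest%%:*}          # minutes field
s=${rest#*:}           # seconds field
# 10# forces base-10 so leading zeros are not read as octal
seconds=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
echo "$seconds"        # 00:10:00 -> 600
```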
5. Create an execution script (Hadoop)
HADOOP COMMANDS
Copy input files into HDFS
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input
Run Hadoop tasks
hadoop --config /home/guests/clustervision/current/ jar /pathtojob/myjob.jar org.myorg.job /input /output
Copy output files from HDFS
hadoop --config /home/guests/clustervision/current/ fs -get /output
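The three HDFS commands above follow a copy-in, run, copy-out pattern. A minimal local sketch of that pattern, using ordinary directories instead of HDFS and `wc` as a stand-in for the Hadoop job (all names here are illustrative only):

```shell
#!/bin/bash
# Copy-in / run / copy-out, mimicking the HDFS workflow with local dirs.
workdir=$(mktemp -d)
mkdir "$workdir/input" "$workdir/output"

# Copy-in: stage an input file (stand-in for dfs -copyFromLocal)
echo "hello hadoop" > "$workdir/input/part-0"

# Run: a trivial word count plays the role of the Hadoop job
wc -w < "$workdir/input/part-0" > "$workdir/output/result"

# Copy-out: read the result back (stand-in for fs -get)
cat "$workdir/output/result"
```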
5. Create an execution script (Stratosphere):
#!/bin/bash
#$ -N strato_run
#$ -pe stratosphere 24
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q
export PATH=$PATH:'/cm/shared/apps/hadoop/current/conf/'
export STRATOSPHERE_HOME='/cm/shared/apps/stratosphere/current'
MASTER=`cat /home/guests/clustervision/current/masters`
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /var/hadoop/dfs.name.dir
$STRATOSPHERE_HOME/bin/pact-client.sh run -j myjob.jar -a 2 hdfs://$MASTER:50040/var/hadoop/dfs.name.dir/inputfile hdfs://$MASTER:50040/var/hadoop/dfs.name.dir/outputfile
hadoop --config /home/guests/clustervision/current/ fs -get /var/hadoop/dfs.name.dir/output
5. Create an execution script (Stratosphere):
#!/bin/bash
#$ -N strato_run
#$ -pe stratosphere 24
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q
export PATH=$PATH:'/cm/shared/apps/hadoop/current/conf/'
export STRATOSPHERE_HOME='/cm/shared/apps/stratosphere/current'
MASTER=`cat /home/guests/clustervision/current/masters`
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input
$STRATOSPHERE_HOME/bin/pact-client.sh run -j myjob.jar -a 2 hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output
hadoop --config /home/guests/clustervision/current/ fs -get /output
5. Create an execution script (Stratosphere)
STRATOSPHERE COMMANDS
Copy input files into HDFS
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input
Run Stratosphere tasks
$STRATOSPHERE_HOME/bin/pact-client.sh run -j /pathtojob/myjob.jar -a 2 hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output
Copy output files from HDFS
hadoop --config /home/guests/clustervision/current/ fs -get /output
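The hdfs:// paths passed to pact-client.sh are built from the master host name, which the script reads out of the SGE-generated masters file. A locally runnable sketch of that step, using a temporary file in place of /home/guests/clustervision/current/masters:

```shell
#!/bin/bash
# Build an HDFS URI from the first host listed in a masters file.
masters_file=$(mktemp)
echo "node011.cm.cluster" > "$masters_file"   # stand-in for the real masters file
MASTER=$(head -n 1 "$masters_file")
input_uri="hdfs://$MASTER:50040/input"
echo "$input_uri"   # hdfs://node011.cm.cluster:50040/input
```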
6. Submission of a job
To submit, execute:
$ qsub script.qsub
After submission, you can see the state of execution with the command:
$ qstat
job-ID prior name user state submit/start at queue slots ja-task-id
-----------------------------------------------------------------------------------------------------------------
159048 0.60500 strato_run clustervisio r 10/15/2013 23:17:59 hadoop.q@node011.cm.cluster 24
6. Submission of a job
Or, if you want more detailed information:
$ qstat -t
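The state column of qstat can be pulled out with awk, e.g. to check whether a job is still running from another script. A sketch against the sample line shown above (the job id 159048 is taken from that sample):

```shell
#!/bin/bash
# Extract the state field of a given job id from qstat-style output.
qstat_line='159048 0.60500 strato_run clustervisio r 10/15/2013 23:17:59 hadoop.q@node011.cm.cluster 24'
# Field 1 is the job id, field 5 is the state (r = running, qw = queued)
state=$(echo "$qstat_line" | awk '$1 == "159048" { print $5 }')
echo "$state"   # r
```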
7. Logs
/home/guests/clustervision/output.$JOB_ID: output of the job execution in SGE
/home/guests/clustervision/config.$JOB_ID/logs: logs of the Hadoop file system