Manual for Using Super Computing Resources
Super Computing Research and Education Centre (ScREC)
Research Centre for Modeling and Simulation (RCMS)
National University of Sciences and Technology
H-12 Campus, Islamabad
Table of Contents

1. Computational Resources
   1.1 Specifications of Super Computer
       1.1.1 Snapshot of Data Center
       1.1.2 Rack Diagram of Super Computer
       1.1.3 Locations of Compilers
       1.1.4 Locations of Runtimes
   1.2 Storage Area Network (SAN) Setup
   1.3 Network Diagram
2. Submitting Processes to Super Computer
   2.1 Steps to Login through SSH
       2.1.1 Linux Client
       2.1.2 Windows Client
   2.2 Steps for Password-less Authentication
   2.3 Steps for Submitting Jobs on the Cluster
       2.3.1 Running Sequential Scripts
       2.3.2 Running Parallel MPI Scripts
3. Contact Information
   3.1 Contact Persons
   3.2 Contact Address
1. Computational Resources

The RCMS Super Computer is installed in a state-of-the-art data center with 80 kVA of UPS backup and a 12-ton precision cooling system. The data center is protected by an FM-200 based Automatic Fire Detection and Suppression System as well as manual fire extinguishers. CCTV cameras and access control systems are being procured for effective surveillance of the data center. Specifications of the Super Computer are given below.

1.1 Specifications of Super Computer

The Super Computer comprises 32 Intel Xeon based machines, each connected to an NVIDIA Tesla S1070 (each of which contains 4 GPUs). All nodes are connected by a 40 Gbps QDR InfiniBand interconnect for internal communication. A high-performance, reliable SAN storage is linked to the servers and is accessible from all computational nodes. Table 1 shows the detailed specification of the RCMS Super Computer.

Cluster Name             afrit.rcms.nust.edu.pk
Brand                    HP ProLiant DL380 G6 Servers / HP ProLiant DL160se G6 Server
Total Processors         272 Intel Xeon
Total Nodes              34
Total Memory             1.312 TB
Operating System         Red Hat Enterprise Linux 5.6
Interconnects            InfiniBand Switch
Storage                  HP P2000 SAN Storage, 22 TB capacity; SAN Switches; Host Bus Adapters (HBAs); Fibre Channel Switch with RAID Controllers
Graphic Processing Unit  32 x NVIDIA Tesla S1070 (each system contains 4 GPUs)

Table 1: Specification of RCMS Super Computer
1.1.1 Snapshot of Data Center

Figure 1: Snapshot of RCMS Super Computer

Cluster Nodes: afrit.rcms.nust.edu.pk (head node), Compute-0-3 through Compute-0-33, and Compute-0-35.
1.1.2 Rack Diagram of Super Computer

Figure 2: Logical Rack Diagram of RCMS Cluster
1.1.3 Locations of Compilers

Name                      Command    Location
Make utility              make       /usr/bin/make
GNU C compiler            gcc        /usr/bin/gcc
GNU C++ compiler          g++        /usr/bin/g++
GNU F77 compiler          g77        /usr/bin/g77
MPI C compiler            mpicc      /usr/mpi/intel/openmpi-1.4.3/bin/mpicc
MPI C++ compiler          mpic++     /usr/mpi/intel/openmpi-1.4.3/bin/mpic++
MPI Fortran 77 compiler   mpif77     /usr/mpi/intel/openmpi-1.4.3/bin/mpif77
Java compiler             javac      /usr/java/latest/bin/javac
Ant utility               ant        /opt/rocks/bin/ant
C compiler                cc         /usr/bin/cc
F77 compiler              f77        /usr/bin/f77
GFortran compiler         gfortran   /usr/bin/gfortran
Fortran 95 compiler       f95        /usr/bin/f95
UPC compiler              upcc       /share/apps/upc/upc-installation/upcc

1.1.4 Locations of Runtimes

Name                      Command    Location
MPI runtime               mpirun     /usr/mpi/intel/openmpi-1.4.3/bin/mpirun
Java Virtual Machine      java       /usr/java/latest/bin/java
UPC runtime               upcrun     /share/apps/upc/upc-installation/upcrun
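Since the MPI and Java tools listed above live outside the default search path, it can be convenient to prepend their directories to PATH. A minimal sketch, assuming a bash-style login shell (add the export line to ~/.bashrc to make it permanent):

```shell
# Prepend the Open MPI and Java bin directories (paths from the tables above)
# to PATH so that mpicc, mpirun, javac, etc. can be invoked by name alone.
export PATH=/usr/mpi/intel/openmpi-1.4.3/bin:/usr/java/latest/bin:$PATH

# Show the first PATH entry to confirm the change took effect.
echo "$PATH" | cut -d: -f1   # prints /usr/mpi/intel/openmpi-1.4.3/bin
```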
1.2 SAN Setup

A total of 22 TB of SAN storage is available for storing users' data. Two SAN switches are installed in the UBB rack, with eight 8 Gb transceivers in each switch. A total of 48 drive slots are available, each occupied by a 450 GB drive. The system is configured as RAID-1 on one unit and RAID-5 on four units, each unit containing 16 drives. One online spare drive is designated in each disk enclosure for high availability: if a drive fails, the online spare takes over and the data is rebuilt according to the RAID level. Each unit is presented to the storage node, whose hostname is u2. The NFS server daemon runs on u2, and an NFS share has been created to make the storage available to the other nodes on the network. The storage is managed using an application called Storage Management Utility.

Figure 3: SAN Storage
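The exact export and mount paths of the NFS share are site-specific. Purely as an illustration, an entry on a compute node mounting the share from u2 might look like the following; the /export/home2 path is inferred from the home-directory paths shown later in this manual, so confirm the actual export with the system administrator:

```
# Hypothetical /etc/fstab entry on a compute node mounting the NFS share from u2.
u2:/export/home2   /export/home2   nfs   defaults   0 0
```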
1.3 Network Diagram for Super Computer

Figure 4: Network diagram of Super Computing Resources
2. Submitting Processes to Super Computer

2.1 Steps to Login using SSH

2.1.1 Linux Client

a) On the shell prompt, type:

$ ssh -p 2299 username@111.68.97.5

where username is the login name assigned by the System Administrator.

b) On the first connection, SSH will display the server's host key fingerprint and ask whether you want to continue connecting. Type yes and press Enter.

c) The system will then prompt for your password. Enter your password to log in. Please note that your password will not be displayed, for security reasons.
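To avoid retyping the port and address each time, the connection details can be stored in ~/.ssh/config. A minimal sketch: the alias "afrit" is an arbitrary choice, and "username" is a placeholder for your assigned login name.

```shell
# Create ~/.ssh if needed and append a host alias for the cluster.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host afrit
    HostName 111.68.97.5
    Port 2299
    User username
EOF
# ssh requires the config file to be private.
chmod 600 ~/.ssh/config
```

After this, the login from section 2.1.1 shortens to `$ ssh afrit`.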
2.1.2 Windows Client

a) An SSH client is required in order to log in using SSH. You may download PuTTY from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
b) In the Host Name field of PuTTY, enter the IP address: 111.68.97.5
c) Select SSH as the connection type and enter port 2299.
d) Click Open.
e) You will be asked to enter your username and password. Type each followed by Enter to log in to the Super Computer. Please note that your password will not be displayed, for security reasons.
2.2 Password-less Authentication

2.2.1 A private/public key pair is required in order to authenticate you on the target machine. To generate this key pair, type the following in the console:

$ ssh-keygen -t rsa

2.2.2 The shell will ask the following questions:

Enter file in which to save the key (/export/home2/rcms/usman/.ssh/id_rsa): (press Enter here)
Enter passphrase: (press Enter here, or give a passphrase)
Enter same passphrase again: (press Enter here, or give the same passphrase)

2.2.3 After generating the key pair, append the public key to a file named authorized_keys, as shown below:

$ cd ~
$ cat ./.ssh/id_rsa.pub >> ./.ssh/authorized_keys
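If password-less login still fails after the steps above, the usual cause is file permissions: sshd ignores authorized_keys when the file or the ~/.ssh directory is writable by other users. A minimal sketch of tightening them:

```shell
# Ensure ~/.ssh and authorized_keys exist, then restrict them to the owner:
# 700 on the directory, 600 on the key file, as sshd expects.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ls -ld ~/.ssh ~/.ssh/authorized_keys
```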
2.3 Running Scripts on Cluster

2.3.1 Running Sequential Scripts

a) Create a shell script:

#!/bin/bash
# myscript.sh
# ****************************
# The following line sets the title of the sequential job.
#$ -N SEQ_PRO
# The output and errors of the program will be written to
# SEQ_PRO.o<JOB_ID> and SEQ_PRO.e<JOB_ID> respectively.
# Path to the executable file:
/usr/bin/myscript
# End of script

b) Set execute permissions on the script:

$ chmod 755 myscript.sh

c) Now submit your script as follows:

$ qsub -V myscript.sh

d) To see the status of the job, type the following command followed by Enter:

$ qstat

e) To delete your running job, type the following command:

$ qdel 19

where 19 is the job ID of your script.
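Because the #$ directives are ordinary comments to bash, a job script can be sanity-checked locally before submission. A minimal sketch, where the echo line stands in for your real executable:

```shell
# Write a minimal sequential job script; bash treats the #$ SGE directive
# as a comment, so the same file also runs directly for a local test.
cat > myscript.sh <<'EOF'
#!/bin/bash
#$ -N SEQ_PRO
echo "Hello from $(hostname)"
EOF
chmod 755 myscript.sh
./myscript.sh    # local test run; on the cluster, submit with: qsub -V myscript.sh
```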
2.3.2 Running Parallel MPI Scripts

a) Create a shell script for parallel processing:

#!/bin/bash
# mpi_script.sh
# **********************
# The following line sets the name of the MPI job.
#$ -N MPI_PRO
# The output and error files will be
# MPI_PRO.o<JOB_ID> and MPI_PRO.e<JOB_ID> respectively.
# The following line assigns 16 cores to the job.
#$ -pe mpi 16
# $NSLOTS  : number of cores allocated by the Sun Grid Engine
# machines : a machine file containing the names of all available nodes
echo "Allocated $NSLOTS slots."
# Replace ./mpi_program with the path to your compiled MPI executable.
mpirun -np $NSLOTS -mca ras gridengine --hostfile machines ./mpi_program
# End of script

b) Set execute permissions on the script:

$ chmod 755 mpi_script.sh

c) Now submit your script as follows:

$ qsub -V mpi_script.sh

d) To see the status of the job, type the following command followed by Enter:

$ qstat

e) To delete your running job, type the following command:

$ qdel 19

where 19 is the job ID of your script.
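Inside an SGE job, $NSLOTS is set by the scheduler to the number of cores granted by the -pe request; when the script is run outside SGE (for example, for a local test), the variable is unset. A small sketch of guarding against that, where my_mpi_prog is a hypothetical executable name:

```shell
# Default NSLOTS to 1 when running outside the Sun Grid Engine, then show the
# mpirun command line that would be executed on the cluster.
NSLOTS=${NSLOTS:-1}
echo "mpirun -np $NSLOTS -mca ras gridengine --hostfile machines ./my_mpi_prog"
```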
3. Contact Information

3.1 Contact Persons

In case of any inquiry, or for assistance, feel free to contact the following persons:

S. No  Designation                     Name                      Contact No    Email Address
1      Director ScREC                  Engr. Taufique-ur-Rehman  051-90855730  taufique.rehman@rcms.nust.edu.pk
2      Faculty Member                  Mr. Tariq Saeed           051-90855731  tariq@rcms.nust.edu.pk
3      System Administrator            Engr. Muhammad Usman      051-90855717  usman@rcms.nust.edu.pk
4      Assistant System Administrator  Mr. Shahzad Shoukat       051-90855714  shahzad@rcms.nust.edu.pk

3.2 Contact Address

Super Computing Lab, 1st Floor, Acad1 Block,
Research Centre for Modeling and Simulation (RCMS),
National University of Sciences and Technology,
Sector H-12, Islamabad, Pakistan.

Special Thanks to:
Mr. Hammad Siddiqi, System Administrator,
High Performance Computing Lab,
School of Electrical Engineering and Computer Science,
National University of Sciences and Technology, H-12, Islamabad
http://hpc.seecs.nust.edu.pk/