Introduction to Linux and Cluster Basics for the CCR General Computing Cluster Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St Buffalo, NY 14203 Phone: 716-881-8959 Office: B1-109 cdc at ccr.buffalo.edu Fall 2013
CCR Resources CCR provides over 8000 processors in Linux cluster to the UB academic community. RedHat 6 High speed network: Infiniband Extensive store and data backup is available. The cluster environment is a command line interface for the most part. WebMO web portal Cluster machines are accessed through a batch scheduler.
Cluster Resources 372 nodes 12 cores with 48GB memory 256 nodes 8 cores with 24GB memory 32 nodes 16 processors with 128GB of memory 18 nodes 32 cores with 256GB memory 1 node 32 cores with 512GB memory 32 GPU nodes 12 cores with 48GB memory 2 GPUs per node NVIDIA Fermi M2050 (CUDA) more information on CCR cluster
Data Storage User home directories, group project directories, and application software are accessible to all compute nodes in the CCR cluster. The path to a user s home directory is /user/ubitusername/ The default user quota for a home directory is 2GB. There is a daily backup of home directories. The iquota command will show the usage and quota limit. iquota u UBITusername
Data Storage Project directories are available to UB research group. There is a daily backup of data files in the projects directory. A project directory starts at 200GB. The storage can be increased to 500GB without charge, after which there is a fee. The iquota command will show the usage and quota limit. -iquota p group UB Professors can request a projects directory by sending email to ccr-help.
Scratch Space All scratch spaces are for temporary use. Users can create files and directories in the scratch spaces. The /panasas/scratch file system is available on all nodes in the rush cluster. /panasas/scratch is 180TB. Files older than 2 weeks are automatically removed. There is no backup of data files in /panasas/scratch All compute nodes have the a local /scratch directory. User files are removed when the job exists.
Accessing the Cluster The CCR cluster front-end machine is rush.ccr.buffalo.edu Compute nodes are not directly accessible from outside the cluster. The storage spaces is not accessible from outside the cluster This login machine is accessible from computers on the UB network through SSH for login, and sftp or scp for file transfer. ssh, sftp, scp, and X are usually installed on Linux and Mac workstations by default.
Accessing the Clusters Windows users must install X-Win32 or PuTTY for the secure login. The X-Win32 program allows for a graphical display. Filezilla is graphical file transfer program, which is available for Windows, Linux and MAC computers. Be sure to specify port 22 to secure login. UBVPN must be used to access the rush login machine from off campus. Users should setup UB-Secure on their laptops for wireless. more on UB software
Login to the front-end of the cluster The front-end machine is intended for login to the cluster, compiling code, and submitting jobs. The login machine is a 32-core node with 256GB of memory. This machine is available for short and small scale computations. These computations should be less than 20 minutes. Usage of memory and CPU utilization should be monitored.
Accessing the Clusters Use the MyCCR web interface to change or reset the CCR password. more on MyCCR Login from Linux or Mac ssh -X rush.ccr.buffalo.edu ssh -X UBITusername@rush.ccr.buffalo.edu The -X flag enables a graphical display. This is optional.
Accessing the Clusters Details for X-Win32 Select Manual in the configuration window. Select ssh Connection name: rush Host: rush.ccr.buffalo.edu Login: your UBITusername Command: /usr/bin/xterm ls Password: CCR password Be sure to run only one X-Win32.
Command Line Environment The rush clusters is command line UNIX environment. The user s login executes a shell. This is a process that accepts and executes the commands typed by the user. By default the shell is set to bash. The.bashrc file is executed on login. This script resides in the user s home directory. Variables and paths can be set in the.bashrc file. It is also executed when a job starts on a compute node.
Command Line Environment List the contents of a directory Show details: ls -l Displays ownership and permissions. Show details and hidden files: ls la Show details in reverse time order: ls -latr Show current directory pathname pwd Change directory Change to given pathname: cd /pathname Change to one directory above: cd.. Change to two directories above: cd../.. Change to home directory: cd Create a directory mkdir dirname
Command Line Environment Remove Remove a file: rm filename Remove a directory: rm -R dirname If the directory is empty: rmdir dirname Copy Copy a file: cp oldfile newfile Copy a directory: cp -R olddir newdir View a file Display file to the screen: cat filename Display with page breaks: more filename Press space bar to page through the file. Press enter to advance one line. Forward direction only. Control c to quit.
Command Line Environment All directories and files have permissions. Users can set the permissions to allow or deny access to their files and directories. Permissions are set for the user (u), the group (g) and everyone else (o). The permissions are read, write and execute. Read (r) permission allows viewing and copying. Write (w) permission allows changing and deleting. Execute (x) permission allows execution of a file and access to a directory.
Command Line Environment Show permissions with ls -l. This will also show the user and group associated with the file or directory. drwxrwxrwx -rwxrwxrwx The left most is for a file and d for a directory. The next three are read, write and execute for the user (owner). The middle three are read, write and execute for the group. These permissions set the access for all members of the group. The rightmost three are read, write and execute for other.
Command Line Environment Change permissions with the chmod command. chmod ugo+rwx filename chmod ugo-rwx filename ugo is user, group and other rwx is read, write and execute + add a permission - removes a permission Adding write permission for group: chmod g+w filename Removing all permissions for group and other on a directory and its contents. chmod -R go-rwx dirname
Command Line Environment The grep command is used to search for a pattern. grep pattern file df will show the disk usage. df -h du will show the estimated disk space usage. du -s -k `ls` sort -rn more This will list the file and directories using the most space first. The pipe symbol is used to indicate that the output of the first command should be used as input to the second.
Command Line Environment The asterisk is a wild card for many UNIX commands. List all C files: ls *.c Show the type of file: file filename Manual pages are available for UNIX commands. man ls There are several editors available. emacs, nano and vi emacs can provide a graphical interface.
Command Line Environment Files edited on Windows machines can contain hidden characters, which may cause runtime problems. Use the file command to show the type of file. If the file is a DOS file, then use the dos2unix command to remove any hidden characters. See the CCR user guide for more information. more on the User Guide CCR Command Line Reference Card
Application Software A large number of software applications are installed on the cluster. more on Software Intel, GNU and PGI compilers CUDA is available for the GPU nodes. more on Compilers Parallel Programming support Message Passing Interface (MPI) OpenMP more MPI and Multi-thread
Managing the Environment Software modules are used to manage the user environment. A module is used to set paths, library paths and variables. Most applications installed on the clusters have a module. Loading a module adds the path for the application to path variables and sets variables. Unloading a module removes the path and unsets the variables.
Managing the Environment List all available modules module avail module avail intel List modules currently loaded module list Show what a module will do module show module-name Load a module module load module-name Unload a module module unload module-name
Managing the Environment Creating your own module. Create the privatemodules directory in your home directory. Copy the template or other module to the privatemodules directory. Edit the module. Using your own module. Load the use.own module. module load use.own This module will be listed by module avail. Load the module. module load your-module-name
Managing the Environment Modules can be loaded in the.bashrc file. The module would be loaded automatically when you login. This is helpful if you know that you will always want that module to be loaded. Adding module list is helpful to remind you of the modules that were loaded. Modules are loaded in batch scripts that run jobs on the clusters. Be sure to load the same module that was used in the compilation of the code.
Compilers The GNU compilers can be the easiest to use when building some codes. There are no modules for the GNU compilers. These compilers and libraries are part of the Operating System and are in the default path. C compiler: gcc gcc -o hello-gnu hello.c C++ compiler: g++ Fortran compilers: g77, gfortran gfortran is a Fortran 95 compiler.
Compilers Typically codes built with the Intel compilers can have the best performance. The Intel compilers can take advantage of the processor and core architecture and features. Load the module for the intel compiler: module load intel C compiler: icc icc -o hello-intel hello.c C++ compiler: icpc Fortran compiler: ifort
Compilers The PGI (Portland Group) compilers are also available. Load the module for the PGI compiler: module load pgi C compiler: pgcc pgcc -o hello-pgi hello.c C++ compiler: pgcc Fortran compilers: pgf77, pgf90, pgf95
Message Passing Interface There are several implementations of the Message Passing Interface Protocol available on the cluster. Typically, Intel MPI is the best performing and most reliable implementation for running parallel computations on the cluster. Show the Intel MPI modules: module avail intel-mpi Always use the most recent version. Intel MPI can be used with the Intel, PGI and GNU compilers. Use wrappers to compile code: mpicc o cpi-gnu cpi.c
Numerical Libraries Intel Math Kernel Libraries (MKL) MKL provides highly tuned BLAS, LAPACK, DFTs, BLACS and ScaLAPACK Intel MKL linking tool is very useful in setting up the linking to the libraries. PETSc (Portable Extensible Toolkit for Scientific Computation) PETSc is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. more on Numerical Libraries
Running Computations The CCR SLURM User Guide has the most complete and recent information for running and monitoring jobs on the CCR Cluster. CCR SLURM User Guide Code, scripts, READMEs and pdfs for download on the right-hand side. Navigation to SLURM topics on the left-hand side. Please do not use any instructions for PBS. We no longer support that scheduler.
Getting Help Online resources: User Guide New User Information Tutorials and Presentations more on getting help