CS4CC3 Advanced Operating Systems Architectures
Laboratory 7
GRID Computing: CAS Style

[Cover diagram: The CAS Grid Computer. Labels: campus trunk; C.I.S. router "birkhoff" server; 100BT ethernet; "gigabyte" Ethernet switch; 'penguin' FrontEnd (130.113.68.23, 192.168.0.1); nodes 1 to 8 (192.168.0.2 to 192.168.0.9).]

McMaster University
Hamilton, Ontario L8S 4K1
2004

Done: Round-Robin
File: 4cc04lb7.doc  Date: 25oct04/nm  Revision Level: 01
2004/2005 CS 4CC3/6CC3 -- Laboratory 7 page 7-2

Introduction

This lab will help you understand what lies behind the hype surrounding grid computing. Grid computing is not a new idea; the concept has been around in the research world for a long time, but it has lacked general-purpose, real-world tools. Named for the ubiquity of the electric power grid, grid computing is a flexible and scalable architecture that collects and concentrates available computational resources to solve business-critical and mission-critical computational problems.

Many definitions of grid computing can be found on the Internet. For example, a grid can be defined as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources. By organizing hundreds or thousands of interconnected heterogeneous computers into a single, unified computational resource, grid computing offers a cost-effective approach to solving compute-intensive problems while consolidating and simplifying the management of distributed resources.

For an example of a well-known grid computing project, visit SETI (Search for Extraterrestrial Intelligence) at http://setiathome.ssl.berkeley.edu/download.html. There, PC users worldwide donate unused processor cycles to the search for signs of extraterrestrial life by analyzing signals coming from outer space. The project relies on individual users volunteering to let it harness the unused processing power of their computers. This approach lets the researchers maximize their processing capacity while minimizing their operating costs. Many other real-life grid projects, addressing problems such as smallpox, cancer, and anthrax, are already in place and in use.
Objectives

* Understand grid computing
* Become familiar with Grid Engine
* Create and submit a task for a grid to solve

CAS Setup

As stated in the introduction, grid computing requires a cluster of computers along with tools that manage those nodes transparently. Our department (CAS) has one such cluster, which can provide a substantial amount of processing power. It consists of 9 machines: one acting as the main server (i.e. the frontend) and 8 nodes that serve as additional computational resources. Figure 1 illustrates the setup, and Table 1 lists the hardware.

Table 1. CAS Grid Computing Component Specifications

    Component     | Penguin (frontend)             | Each node
    --------------+--------------------------------+------------------------------
    Processors    | 2 x 2.4GHz Xeon                | 2 x 2.4GHz Xeon
    Main memory   | 4GB RAM                        | 1GB RAM
    Disks         | 2 x 36GB 15000 RPM SCSI        | 1 x 80GB IDE
    Network       | 2 gigabit Ethernet interfaces  | 1 gigabit Ethernet interface
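The capacity of the cluster is easier to appreciate once the per-machine figures in Table 1 are totalled. The sketch below simply does that arithmetic in POSIX shell; the variable names are illustrative, and only physical processors are counted (any hyper-threading on the Xeons is ignored).

```shell
#!/bin/sh
# Totals for the CAS grid, using the per-machine figures from Table 1.
NODES=8                        # compute nodes behind penguin
CPUS_PER_MACHINE=2             # every machine has 2 x 2.4GHz Xeon
FRONTEND_RAM_GB=4              # penguin: 4GB RAM
NODE_RAM_GB=1                  # each node: 1GB RAM
FRONTEND_DISK_GB=$((2 * 36))   # penguin: 2 x 36GB SCSI
NODE_DISK_GB=80                # each node: 1 x 80GB IDE

node_cpus=$((NODES * CPUS_PER_MACHINE))            # processors on the nodes only
total_cpus=$((node_cpus + CPUS_PER_MACHINE))       # plus penguin's two
total_ram_gb=$((FRONTEND_RAM_GB + NODES * NODE_RAM_GB))
total_disk_gb=$((FRONTEND_DISK_GB + NODES * NODE_DISK_GB))

echo "compute-node CPUs: $node_cpus"
echo "total CPUs:        $total_cpus"
echo "total RAM (GB):    $total_ram_gb"
echo "total disk (GB):   $total_disk_gb"
```

This gives 16 processors on the compute nodes and 18 in total, 12GB of RAM, and 712GB of disk across the cluster.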
Figure 1. CAS Grid Computing Setup

PART I
Introduction to the Grid Engine

The introduction and the description of the CAS setup provide the background needed to start this lab. Recall that the introduction cited SETI as a famous, real-life use of grid computing.

1. Survey the Internet to find another real-life implementation or use of grid computing. Your answer should state which group is using it, for what purposes, and how they take advantage of grid computing.

Grid computing requires special software suited to the computing project for which the grid is used. In our case, that software is Grid Engine. The Grid Engine project is an open source community effort to encourage the adoption of distributed computing solutions. Sun developed the initial versions of the software, which proved extremely successful and were free of charge to use. Sun now sponsors the open source Grid Engine project and develops its own Enterprise Edition, for which licenses are required. This management software lets us submit a job transparently, without worrying about how the task is split up within the cluster. To prepare you for Part II, Part I therefore reviews the documentation that comes with Grid Engine. This documentation can be found on the cs4cc3 website at: http://www.cas.mcmaster.ca/~cs4cc3/papers/sungeng.pdf
Open the link above and read the documentation to answer the following questions. You will need these answers to complete Part II, so take the time to understand them.

2. Grids can be classified broadly into three classes. List those classes, and identify which are supported by the Grid Engine project and which by Sun Grid Engine Enterprise Edition.

3. At the bottom of page 22, the documentation gives an analogy for how Grid Engine behaves. It relates a money-center bank to Grid Engine by walking through a typical day at the bank and relating each scenario to the grid software. Come up with another scenario that clearly shows you understand the task of the Grid Engine software and the big picture of this lab.

4. Identify the four types of hosts that are fundamental to the Sun Grid Engine system. Discuss their differences and what each is used for. How is penguin.cas.mcmaster.ca classified in our case? How are the eight nodes behind it classified?

5. What are the three daemons that must be running on a master host? Are they running on penguin? What command did you use to check? What two daemons must be running on an execution host?

6. Daemons and standard Internet applications listen on specific ports, and in UNIX the /etc/services file maps service names to port numbers for that purpose. One daemon runs on every host that uses Grid Engine. What is that daemon used for? Determine which port penguin listens on for SGE (Sun Grid Engine) TCP traffic.

7. Can a host be both an execution host and a master host? In general, can a host belong to two groups, or are the groups mutually exclusive?

8. What are Sun Grid Engine queues used for? Explain how queues and jobs are tied together (i.e. how one can affect the other).
As a submit host, do we need to worry about managing queues? If so, explain how; if not, explain who manages them for us.

9. What are the two ways of operating Grid Engine (i.e. its modes of operation)?

10. An account has been created for this lab. Log on to penguin and redirect the X display to your terminal, using the following logon information:

    user: cs4cc3st
    password: moores.law

Use SSH and make sure the X display is redirected to your current terminal. Determine how to do this, and include the command you used in your answer.

11. Once you log in, you must run a script that makes the Grid Engine commands available. The script simply updates your environment to add the appropriate paths. Run:

    source /usr/local/sge-5.3p6/default/common/settings.csh

You should now be able to execute the installed SGE binaries. Run QMON, the graphical user interface (GUI) that helps us manage the cluster and submit jobs. If the QMON X window, illustrated in Figure 2, does not appear on your X server display after several minutes, or reports an error, Grid Engine may need to be restarted. (Note: the same error occurs if the source command above has not been run in your current shell.) Please advise the TA or
more appropriately Derek Lipiec, who has root access and can restart SunGE on penguin.

Figure 2. QMON main GUI window for using SGE

This window should appear once QMON has started successfully.

12. Explain what the following terms mean:
   a. Cluster
   b. Cell
   c. Manager
   d. Job Class
   e. Operator

13. What does a queue represent in Grid Engine? Explain two different ways of checking the status of your queues.

This concludes Part I of the lab. Although little practical programming has been done so far, much more will be done in Part II. Record your answers to this part for inclusion in the lab report, which is completed at the end of Part II and submitted one week later via WebCT.

PART II
Running Programs on the Grid Engine

Now that you have had a chance to explore Grid Engine in Part I, we will put that knowledge to use. The core of any cluster or enterprise grid is the Distributed Resource Manager (DRM). Sun Grid Engine and its open source version, Grid Engine, are both examples of DRMs. You can think of Grid Engine as an extra layer above parallel environment libraries such as PVM, MPI, and Globus. This layer provides a graphical frontend that lets you manage your resources more effectively.
As Figure 1 illustrates, you connect to penguin, which acts as the frontend (i.e. the login node). As a user, you never have to communicate directly with the computing nodes. The following three experiments show how you can use Grid Engine simply to submit a job and wait for the results. Any task can be submitted to Grid Engine, but to get its full power you should use a parallel environment such as MPI. Essentially, a developer only has to learn a library such as MPI, compile the code locally, and then submit it to SGE. You should already be familiar with parallel environments from previous programming exercises, labs, or other courses.

A Parallel Environment (PE) is a software package designed for parallel computing on a network of computers; it allows the execution of both shared memory and distributed memory parallel applications. A variety of such systems have evolved over the past years into viable technology for distributed and parallel processing on various hardware platforms. The most commonly used parallel environments are Parallel Virtual Machine (PVM), Message Passing Interface (MPI), and OpenMP. These systems have different characteristics and unique requirements. To handle arbitrary parallel jobs running on top of such systems, Sun Grid Engine provides a flexible and powerful interface that satisfies these various needs. Grid Engine can interface with any PE as long as suitable startup and stop procedures are provided; for the last part of this lab, such procedures have been set up for MPI.

The rest of this lab shows how SGE handles the submission of several types of jobs. We will test jobs of all sorts, including standard shell commands, binaries, and a parallel environment job using MPI.
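A PE startup procedure typically converts the list of hosts and slots that Grid Engine granted into whatever the PE expects. As a minimal sketch (not the script actually installed on penguin), the hypothetical function below expands a Grid Engine PE host file, in which each line is assumed to start with a host name followed by its slot count, into an MPICH-style machines file with one line per slot. In a real start procedure the input would be the file named by $PE_HOSTFILE, which Grid Engine sets for parallel jobs.

```shell
#!/bin/sh
# make_machines_file HOSTFILE OUTFILE  (hypothetical helper)
# Expand a Grid Engine PE host file ("host slots ..." per line) into an
# MPICH-style machines file that lists each host once per granted slot.
make_machines_file() {
    hostfile=$1
    outfile=$2
    # Skip malformed lines; print the host name ($1) once per slot ($2).
    awk 'NF >= 2 { for (i = 0; i < $2; i++) print $1 }' "$hostfile" > "$outfile"
}

# Inside a PE start procedure this might be called as:
#   make_machines_file "$PE_HOSTFILE" "$TMPDIR/machines"
```

Listing a host once per slot lets MPICH start one process per granted CPU, so a 2-CPU node from Table 1 can appear twice.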
To get started, log back in to the class account and make sure step 11 from Part I has been completed before going on. Start QMON as well. Before practicing job submission, check the status of the available queues and how busy the grid is. Using what you learned in Part I, describe which nodes are being used as computational nodes and how many slots each node is given (a screenshot with an explanation is probably the best approach).
What does the orange bar signify in the cnode2 queue status?

1. Shell Script Job Submission

The easiest and most straightforward submissions are simple shell scripts. By default, SGE does not accept binaries and returns an error if you submit one (there are ways around this, as you will see in the next section). To get a feel for a simple submission, download the script sleeper.sh from the CS4CC3 website and place it in your home directory. Open the script; what it does should be obvious. Include a brief note in your discussion explaining what sleeper.sh does.

Before submitting the job, open both the Job Control window and the Queue Control window. With these two windows open alongside, you can watch the job being completed. Using what you learned in Part I, submit the job in whichever way you prefer. When the job is done, notice that a new file called sleeper.sh.o[job-id] has been generated in your current working directory. View the contents of that file and explain them in your write-up. A second output file may also appear: the .o extension refers to standard output and the .e extension to standard error, so anything written to the .e file indicates that an error occurred.

To see how SGE decides where each task runs, submit the sleeper.sh script repeatedly (perhaps a dozen times in a row) and, with QMON's Queue Control window open, watch how the queues fill up dynamically and then empty out as the jobs finish.

For practice, develop a script with your group that can be submitted to Grid Engine. The script does not have to be long or use any form of parallelism (something similar to sleeper.sh is enough). In your write-up, include that script, a screenshot of your submission or of the queue status (from either the GUI or the CLI), and the output file(s) that are generated.

2. Binary Submission

Like the previous section, this section shows that you are not restricted to scripts: you can also submit binaries. Again, we will not yet take advantage of any parallelism, but you will see a new method of submitting a task. Go to the 4CC3 website and download the machineeps.c file. Look at the code and determine the purpose of its while loop. In your discussion for this section, explain what is meant by machine epsilon and what that loop does. After analyzing the code, compile it to create the corresponding binary. The rest of this section is done strictly at the command line, so to submit your work correctly you should log the following commands and their output (hint: use the script command, which records your session in a file named typescript).
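Machine epsilon is the gap between 1.0 and the next larger representable floating-point number. A common way to find it is to keep halving a candidate eps until adding half of it to 1.0 no longer changes the result. The sketch below shows that halving technique using awk, whose numbers are double-precision floats on typical systems; it is an assumption that machineeps.c uses a similar loop, so compare it against the actual file. The function name machine_epsilon is hypothetical.

```shell
#!/bin/sh
# machine_epsilon: print the smallest power of two eps for which
# 1 + eps > 1 in double precision, found by repeated halving.
machine_epsilon() {
    awk 'BEGIN {
        eps = 1.0
        # Halve eps while adding half of it to 1.0 still changes the sum.
        while (1.0 + eps / 2.0 > 1.0)
            eps = eps / 2.0
        printf "%.17g\n", eps
    }'
}
```

On hardware with IEEE 754 double precision, running machine_epsilon should print 2.2204460492503131e-16, which is 2^-52, the same value as the C constant DBL_EPSILON.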
Although you could simply run the program locally, what if it were a far more computationally intensive program that a single CPU on a standalone machine could not handle? This is where SGE comes into play. At the command prompt, type the qsub command and press Return without specifying a job script. qsub then reads your job from the keyboard: type the name of the binary file you want to submit and press Return, and continue entering more binaries or shell commands if you wish. When you have finished specifying your job, press Ctrl-D. For example:

    % qsub
    machineeps.exe
    <Ctrl-D>
    your job 104 ("STDIN") has been submitted

To confirm that your job was submitted, type qstat -ne as soon as you press Ctrl-D; it describes the status of the queues in use. If nothing shows up yet, keep entering that command until you see a change in the queue status. You should see something like the following (this sample was captured while the parallel greetings job was running, which is why several SLAVE entries appear; a serial job such as machineeps.exe occupies a single queue):

    [cs4cc3st@penguin ~] qstat -ne
    job-id  prior  name       user      state  submit/start at      queue     master  ja-task-id
    ---------------------------------------------------------------------------------------------
       133      0  greetings  merizzn   qw     10/24/2004 01:59:57
    [cs4cc3st@penguin ~] qstat -ne
    job-id  prior  name       user      state  submit/start at      queue     master  ja-task-id
    ---------------------------------------------------------------------------------------------
       133      0  greetings  merizzn   qw     10/24/2004 01:59:57
    [cs4cc3st@penguin ~] qstat -ne
    job-id  prior  name       user      state  submit/start at      queue     master  ja-task-id
    ---------------------------------------------------------------------------------------------
       133      0  greetings  cs4cc3st  t      10/24/2004 02:00:12  cnode3.q  MASTER
                0  greetings  cs4cc3st  t      10/24/2004 02:00:12  cnode3.q  SLAVE
                0  greetings  cs4cc3st  t      10/24/2004 02:00:12  cnode3.q  SLAVE
                0  greetings  cs4cc3st  t      10/24/2004 02:00:12  cnode3.q  SLAVE
                0  greetings  cs4cc3st  t      10/24/2004 02:00:12  cnode3.q  SLAVE
    [cs4cc3st@penguin ~]
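Instead of re-entering qstat by hand, the polling can be scripted. The sketch below assumes, as in the sample output, that a job's ID is the first column of its qstat line; the function name wait_for_job and the default 5-second interval are illustrative choices, not part of Grid Engine.

```shell
#!/bin/sh
# wait_for_job JOB_ID [INTERVAL]  (hypothetical helper)
# Re-run qstat every INTERVAL seconds (default 5) until no line whose
# first column equals JOB_ID remains, then report that the job has left.
wait_for_job() {
    job_id=$1
    interval=${2:-5}
    # awk exits 0 (keep looping) only if some line starts with the job ID.
    while qstat | awk -v id="$job_id" '$1 == id { found = 1 } END { exit !found }'
    do
        sleep "$interval"
    done
    echo "job $job_id no longer listed by qstat"
}
```

For example, wait_for_job 104 10 would check every 10 seconds for job 104 from the submission above. Because qstat drops a job once it has finished, the function returns when the job is done, or immediately if the ID was mistyped.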
3. Parallel Environment Submission

To reduce the complexity of this lab, Grid Engine has already been configured to accept parallel jobs. Your task, once again, is simply to submit a job transparently and wait for the results. From your screenshots and from watching your queues after the previous two parts, you have probably noticed that one submission uses one queue. In this section, you will see a single submission that uses multiple queues. We will ignore the details of the code and of parallel programming; here we are strictly interested in how SGE can be used to submit complex algorithms to a larger, more efficient infrastructure. For the environment to work correctly, you need two files and an environment variable:

1. First, move to your home directory and check whether a .login file already exists. If it does, open it and make sure the following line is included (at the end); if it does not, create it with this line:

    setenv TMPDIR ~/SGElab/

2. Without going into the details of MPI, one key requirement for executing tasks on multiple CPUs is to provide MPI with a machines file. The scripts created for your use expect this file at $TMPDIR/machines, which with the setting above is ~/SGElab/machines. Create the ~/SGElab directory if necessary, and inside it create a file called machines containing the following lines:

    penguin
    cnode1
    cnode3
    cnode4

Below is the script that you will be executing:

    #!/bin/sh
    # Your job name
    #$ -N MPI_jacobi
    # Use current working directory
    #$ -cwd
    # Join stdout and stderr
    #$ -j y
    # PE request for MPICH. Set your number of processors here.
    #$ -pe mpich 4
    # Run job through the tcsh shell
    #$ -S /bin/tcsh
    #
    # The following is for reporting only. It is not really needed to run
    # the job. It will show up in your output file.
    echo "Got $NSLOTS processors."
    echo "Machines:"
    cat $TMPDIR/machines
    # Use full pathname to make sure we are using the right mpirun
    /usr/local/mpich/bin/mpirun -np $NSLOTS \
        -machinefile $TMPDIR/machines jacobi.exe
    # Commands to do something with the data after the program has finished.

When submitting parallel environment (PE) jobs, you must tell SGE so. The SGE administrator has already provided a parallel environment for your programs to run in, but you need to request it. That is what the line #$ -pe mpich 4 in the script above does: it asks for the mpich PE that is already set up and requests 4 processors for the job.

With the Queue Control window open next to a shell prompt, submit your job, keep refreshing the Queue Control window, and observe what happens. Discuss what you see. To help with your discussion, another MPI script, greetings.sh, is provided that you can modify slightly. Open greetings.sh and change the #$ -pe mpich 4 line to another value between 2 and 8 (e.g. #$ -pe mpich 6). Submit the program with your changed value and observe how many queues are taken up.

This completes the lab on Grid Engine. Although only introductory issues were covered, the system's capabilities extend much further, and many more experiments could be run to measure the performance gains of using such a system.

Acknowledgements

Thanks go to Mr. Nicholas Merizzi, a graduate student in the Applied Computersystems Group (ACsG), for conceiving, designing, and implementing this laboratory. His attention to detail has produced a very instructive mechanism for understanding group communications in a networked environment. (wfsp/2004)

File: 4cc04lb7.doc Revision Level: 01 Date:25oct04 / wfsp