1 Cloud Computing
Lecture 3: Grid Schedulers: Condor, Sun Grid Engine

Introduction: up until now
- Definition of Cloud Computing.
- Grid Computing: schedulers; Condor architecture.

2 Summary
- Condor: user perspective.
- Condor flocking.
- Sun Grid Engine.

Job Submission
Create a sub file:
% vi program.sub
Submit the job:
% condor_submit program.sub

program.sub:
Universe = standard
input = program.in
output = program.out
executable = program
queue 3

3 Job Submission
Executable = /bin/foo
Arguments = xpto $(Process)
Requirements = Memory >= 1024 && OpSys == "WINNT51" && Arch == "INTEL"
Universe = vanilla
input = test.data
output = $(Process).out
error = $(Process).error
log = $(Process).log
Initialdir = run_1
Queue 5
Initialdir = run_2
Queue 5

Machine attributes usable in Requirements: Arch, OpSys, Disk (KB), Memory (MB), Machine.
More: _Job.html

4 ClassAds
ClassAds are Condor's mechanism for:
- Representing resources and clients within the system.
- Expressing client and machine preferences.
- Allocating resources.
They are sufficiently expressive to represent characteristics (features), requests and policies, yet simple enough to allow matching (at the negotiator) between clients and resources.
They can be listed using condor_status.

condor_status example.

5 ClassAds
A machine ad:
MyType = "Machine"
TargetType = "Job"
Machine = "n3.grid.com"
Arch = "INTEL"
OpSys = "Linux"
Disk =
Rank = (Customer == "john" ? 1 : 0)

A job ad:
MyType = "Job"
TargetType = "Machine"
Owner = "john"
Cmd = "/usr/bin/java"
Rank = KFlops * 10 + Disk

Condor Scheduling
- Calculate the total available resources.
- Order requests by their users' priority (lower is better). Priority starts at a configured value and decays with resource use, for fairness.
- Calculate the proportional resource share per user priority.
- Start the jobs of the user with the best priority, by order of machine preference followed by job preference.
- Continue with the next user.
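As an illustrative toy model of the proportional-share step above (a sketch, not Condor's actual negotiator code): each user's share of the pool is proportional to 1/priority, and users are then served best priority first. The user names and numbers below are made up:

```python
# Toy sketch of Condor-style fair share: a lower priority value is
# better, and a user's slice of the pool is proportional to 1/priority.

def proportional_shares(priorities, total_slots):
    """priorities maps user -> effective priority (lower is better)."""
    weights = {user: 1.0 / p for user, p in priorities.items()}
    total = sum(weights.values())
    return {user: total_slots * w / total for user, w in weights.items()}

priorities = {"john": 5.0, "mary": 10.0, "ana": 20.0}
shares = proportional_shares(priorities, total_slots=70)
# john's priority is twice as good as mary's, so he gets twice her share.
order = sorted(priorities, key=priorities.get)  # serve best priority first
```

With these numbers, john, mary and ana get 40, 20 and 10 of the 70 slots, and jobs are started for john first, then mary, then ana.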

6 When One Connects Clusters
(Figure: several clusters, each with its own file server, crying "HELP! SOS!")

Unfriendly Environments
An executable may run with the correct OS and HW architecture and enough memory, but some elements may be missing:
- Input files.
- Disk space for output files.
- Absence of a shared file system.
- No login. Run as nobody?

7 Condor Applications
- Unix or Windows binary executables.
- Scripts.
- Interpreted programs (JVM, Mono, Perl).
- MPI.
- PVM.

Universe Types
Condor provides different universes:
- vanilla: UNIX jobs, no remote I/O.
- standard: UNIX jobs with remote I/O.
- scheduler: UNIX jobs with immediate local execution.
- globus: UNIX jobs over Globus.
- java: Java apps. Finds and benchmarks the VM.
- pvm: PVM jobs. Finds new nodes as the job progresses.
- mpi: MPI jobs. Reserves nodes before starting the job.
- vm: runs a job inside a system virtual machine (VMware or Xen).

8 vanilla Universe
Allows users to submit any UNIX process to Condor.
Pros:
- No program modification. Very flexible.
- Includes binaries, scripts, interpreted programs (Java, Perl) and multi-process jobs.

vanilla Universe (cont.)
Cons:
- No checkpointing.
- Limited I/O at remote machines: explicit description of input files; explicit description of output files.
- Condor does not start vanilla jobs at an unfriendly node: the FilesystemDomain and UIDDomain ClassAds must match.
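As a sketch, the explicit input/output description above looks like this in a vanilla submit file, using Condor's standard file-transfer commands (the executable and file names are made up):

```
# Hypothetical vanilla-universe submit file: input and output files are
# listed explicitly so the job can run without a shared file system.
Universe                = vanilla
Executable              = analyze
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = params.cfg, data.csv
output                  = analyze.out
error                   = analyze.err
log                     = analyze.log
Queue
```

Without such explicit lists, a vanilla job only runs where FilesystemDomain and UIDDomain match the submit machine.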

9 standard Universe
Allows users to submit jobs specially relinked with Condor.
Pros:
- Checkpointing.
- Remote I/O: a friendly environment anywhere; data buffering; I/O performance monitoring and reporting; remapping of file names.

standard Universe (cont.)
Cons:
- Applications must be relinked.
- Limited set of applications: only single-process UNIX apps; certain system calls are restricted.

10 Restrictions on System Calls
The standard universe does not allow:
- Multiple processes: fork(), exec(), system().
- Inter-process communication: semaphores, message passing, shared memory.
- Sophisticated I/O: mmap(), select(), poll(), non-blocking I/O, file locking.
- Threads.

Remote I/O
(Figure: remote I/O performed through the starter.)
File names can be remapped, e.g.:
file_remaps = "data = /home/john/data.$(process)"

11 Brief I/O Summary
% condor_q -io
-- Schedd: c01.cs.wisc.edu : <:2016>
ID   OWNER  READ     WRITE    SEEK  XPUT      BUFSIZE  BLKSIZE
...  joe    ... KB   ... KB   ...   ... KB/s  ... KB   32.0 KB
...  joe    ... KB   ... KB   ...   ... B/s   ... KB   32.0 KB
...  joe    44.7 KB  22.1 KB  ...   ... B/s   ... KB   32.0 KB
3 jobs; 0 idle, 3 running, 0 held
Great for performance debugging!

Complete I/O Summary (in the job log)
Your condor job "/usr/joe/records.remote input output" exited with status 0.
Total I/O: ... KB/s effective throughput, 5 files opened, 104 reads totaling ... KB, 316 writes totaling 1.2 MB, 102 seeks.
I/O by file:
- Buffered file /usr/joe/output: opened 2 times, 4 reads totaling 12.4 KB, 4 writes totaling 12.4 KB.
- Buffered file /usr/joe/input: opened 2 times, 100 reads totaling ... KB, 311 writes totaling 1.2 MB, 101 seeks.

12 File Remapping
Suppose a program opens a file called data, but one wants to open a different file according to the process number. In the job's sub file, add:
file_remaps = "data = /home/john/data.$(process)"
- Process 1 gets /home/john/data.1
- Process 2 gets /home/john/data.2
- And so on.
And, of course, free access to distributed file systems.

Relinking
Use condor_compile before the usual compilation command. For example:
gcc main.o utils.o -o program
becomes:
condor_compile gcc main.o utils.o -o program
Despite the name (compile), it is just relinking with the Condor libraries.

13 Checkpoint
To checkpoint an executing program is to take a snapshot of its current state, in such a way that the program can be restarted from that state at a later time, possibly at a different resource. It provides:
- Preemptive-resume scheduling.
- Fault tolerance, when checkpointing is done periodically.
In Condor, checkpointing running jobs is optional. If it is needed, the source should be linked with condor_syscall_lib.

Checkpointing in Condor
Implemented in condor_syscall_lib as a signal handler. When Condor sends a signal to checkpoint, the handler saves process state information in a checkpoint file:
- From the core: contents of the process u-area, data and stack segments.
- From the executable: symbol and debugging info, initialized data, text.

14 Checkpointing & Restart
- The starter periodically sends a checkpoint signal to the executing job.
- Code in condor_syscall_lib makes the job dump core and saves the process state information in the checkpoint file.
- When the job is vacated, the starter transfers the latest checkpoint file to the shadow.
- During restart, the shadow sends the latest checkpoint file to the new starter; the starter reads the job state from the checkpoint file and the execution continues.
(Figure: shadow process for the job on the submit machine, starter process for the remote job on the remote machine; checkpoint files are kept temporarily on the local file system and transferred when the job is vacated or restarted.)

Ganglia: GUI for Grid Monitoring

15 DAGMan
Directed Acyclic Graph Manager.
Manages dependencies between processes: "don't run B before A finishes."
The execution plan is represented as a directed acyclic graph (DAG), where:
- Nodes are processes.
- Edges are dependencies.

Defining DAGs
A DAG is specified in a .dag file that lists the tasks and their dependencies. For example:
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
(Figure: diamond-shaped DAG with Job A on top, Jobs B and C in the middle, Job D at the bottom.)
Each node corresponds to the job described in its .sub file.

16 Running a DAG
% condor_submit_dag diamond.dag
Starts a daemon process that follows the execution and interacts with the schedd. It is a meta-scheduler: it controls the scheduler and only submits jobs when the plan allows it.
Processing the DAG results in a list of execution levels:
Level 1: A
Level 2: B C D
Level 3: E

DAGs: other features
- Associate scripts with jobs: SCRIPT PRE and SCRIPT POST.
- Rescue: if a job fails, DAGMan generates a .dag.rescue file with the missing part of the DAG.
- Retry: if a job fails, it may be re-executed: RETRY A 5
- Throttling: it is possible to limit the number of concurrent jobs: condor_submit_dag -maxjobs N
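The features above can be combined in one DAG description; a minimal sketch based on the diamond DAG from the previous slide (the script names are made up):

```
# diamond.dag, extended (hypothetical script names)
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
SCRIPT PRE  A prepare_input.sh
SCRIPT POST D check_output.sh
RETRY B 5
```

Submitted with throttling as: % condor_submit_dag -maxjobs 2 diamond.dag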

17 Condor: Flocking
- A compile-time configuration plus a configuration file describing the other pools.
- Gateways share job and node characteristics among themselves.

Sun Grid Engine
- Grid/cluster scheduler from Oracle (formerly Sun).
- One machine can have multiple roles.

18 Sun Grid Engine (SGE)
Node types:
- Master: sge_qmaster and sge_schedd.
- Worker: sge_execd, queues and load balancing. Queues are independent from CPUs: they can refer to a CPU, to a machine or to a cluster.
- Shadow master: passive replica of the master; becomes the master if the master fails (hot spare).
- Administration.
- Submission.
Each node has an sge_commd daemon for communication between nodes.

SGE: job submission
g.job file:
#!/bin/csh
# 4 CPUs in an SMP node (alternatives: nothing, mpi [round robin], mpi_fillup [fill each node])
#$ -pe smp 4
#$ -M jog@gsd.inesc-id.pt
# notification when it aborts or ends
#$ -m ae
# no restart
#$ -r n
my_script < data.in
To start it: qsub g.job

19 SGE: execution model
- Users may specify requirements (CPU type, free disk, memory, etc.).
- SGE records the job request, requirements and control information (user, group, department, submission date/hour, etc.).
- As soon as a queue is available, SGE starts running one of the waiting jobs (FIFO): the job with the highest priority or longest waiting time is launched. If there are several available queues, the least loaded is selected.

SGE: Policies
Scheduling policies:
- Ticket-based (per user): the more tickets a user has, the higher her priority. Tickets are assigned statically according to the queue policy and the priority assigned to each user.
- Urgency-based (per job): deadline to finish the task (user-specified), the job's queue waiting time, and the requested resources.
- Personalized: allows arbitrary assignment of priorities to jobs (similar to the UNIX nice command).
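As a sketch of the per-job knobs above, assuming SGE's standard qsub flags -dl (deadline) and -p (priority), with made-up values:

```
# Urgency policy: ask SGE to try to finish the job by the given
# deadline (format [[CC]YY]MMDDhhmm).
qsub -dl 202506010600 g.job

# Personalized policy: lower this job's priority relative to the
# user's other jobs (nice-like; negative values reduce priority).
qsub -p -100 g.job
```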

20 SGE: job lifecycle
1. Submission.
2. The master (sge_qmaster) stores the job and informs the scheduler.
3. The scheduler (sge_schedd) inserts the job in the appropriate queue.
4. The master sends the job to the assigned node.
5. Before running, the execution daemon (sge_execd):
   - Changes the working directory to the job's directory.
   - Sets environment variables.
   - Sets the number of processors to be used.
   - Changes the current uid to the job owner's uid.
   - Initializes the job's resource limits.
   - Starts recording auditing information.
   - Stores the job information persistently.
6. When the job terminates, the master is notified and the database entry is removed.

21 Condor vs. SGE
- The systems are similar.
- Condor has flocking.
- Condor connects to the Grid (more on this later).
- SGE delegates account privileges across machines.
- SGE has more flexible scheduling: algorithms and queuing.

Next time
- Globus.
- Condor-G.