Generic Grid Computing Tools for Resource and Project Management
Erik Elmroth
Dept. of Computing Science & HPC2N, Umeå University, Sweden

Overall Objectives
Short term: generic infrastructure components for resource & project management; interoperable, standards-based
Long term: Grid-enabled & Grid-enabling tools for scientific computing
Accounting, Broker
Grid Projects Overview
Generic Grid computing research, multi-project:
1. Job submission and resource brokering: standards-based, cross-middleware (ARC, LCG2, GT4)
2. SweGrid Accounting System (SGAS) (with KTH, Stockholm): included in Globus Toolkit 4
3. Grid-wide fairshare scheduling: hierarchical three-party QoS support (user, resource owner, VO authority)
4. Grid interface generation for numerical software libraries: SLICOT interfaces for NetSolve and web portals
5. High-level data re-replication systems (new) & project portal for SNIC: HPC2N (coordinator), NSC, PDC; portal interface and functionality; SNIC-wide database & security solutions

An Interoperable, Standards-based Grid Broker and Job Submission Service
joint work with Johan Tordsson, UmU
Contributions - Summary
Web Service (GT4) based job submission service (JSS) and Grid resource broker
Decentralized broker, not assuming global control
Based on existing and emerging Grid standards: JSDL, GLUE, WS-Agreement, WSRF
Exchangeable modules: replaceable resource selection algorithms
Interoperable with multiple Grid middlewares
Supports advance reservations and benchmark-based estimation of job duration

JSS Architecture Overview
[Architecture diagram: ARC, GT4, and LCG2 clients submit through the Job Submission Module, which interacts with the ARC, GT4, and LCG2 middlewares; numbered arrows (1-9) indicate the interaction sequence.]
Middleware Integration Points (cont.)
[Diagram residue: numbered integration points 1-3.]

Selection Algorithms
Earliest job completion = shortest Total Time to Delivery (TTD)
TTD = stage in + wait + execute + stage out
How to predict each TTD part?
File stage in: network bandwidth / user estimation
Wait for resource access: advance reservation / load prediction
Application execution: benchmarks / user estimation
File stage out: network bandwidth / user estimation
Earliest possible job start: file stage in and wait for access (same predictions as above)
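The TTD-based selection above can be sketched in a few lines. The resource attributes, prediction formulas, and numbers below are illustrative assumptions, not the actual JSS code:

```python
# Sketch of TTD-based resource selection: pick the resource with the
# shortest predicted Total Time to Delivery. The simple prediction
# models (bandwidth division, benchmark scaling) stand in for the
# bandwidth/reservation/benchmark predictors named on the slide.

def predict_ttd(resource, job):
    stage_in = job["input_bytes"] / resource["bandwidth_Bps"]   # file stage in
    wait = resource["predicted_queue_wait_s"]                   # wait for access
    execute = job["benchmark_ops"] / resource["ops_per_s"]      # benchmark-based
    stage_out = job["output_bytes"] / resource["bandwidth_Bps"] # file stage out
    return stage_in + wait + execute + stage_out

def select_resource(resources, job):
    return min(resources, key=lambda r: predict_ttd(r, job))

job = {"input_bytes": 1e9, "output_bytes": 1e8, "benchmark_ops": 3.6e12}
resources = [
    {"name": "A", "bandwidth_Bps": 1e7, "predicted_queue_wait_s": 600, "ops_per_s": 1e9},
    {"name": "B", "bandwidth_Bps": 1e8, "predicted_queue_wait_s": 1800, "ops_per_s": 2e9},
]
best = select_resource(resources, job)  # B: longer wait, but far shorter staging
```

Note that resource B wins despite the longer queue wait, since the broker optimizes the whole delivery time rather than any single stage.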
Performance Evaluation
Response time, including all overhead: brokering, interaction with information services and resources
Five runs of 200 jobs each; one client submitting one job at a time
Observed response time: 1.3 seconds per job
Throughput: 40 jobs/minute (multiple clients via a single JSS)
Measured both without and with advance reservations

Current and Future Work
Integration with additional middlewares
Extended performance evaluation: JSS against different middlewares (ARC, GT4, LCG2)
Add co-allocation support: reuse the main framework, replace only the submitter with a CoAllocator
Enforcing Resource Allocations with the SweGrid Accounting System (SGAS)
joint work with Peter Gardfjäll, UmU; Lennart Johnsson, KTH; Olle Mulmo, KTH; Thomas Sandholm, KTH

SweGrid Accounting System (SGAS)
Decentralized resource allocation enforcement system
SGAS performs soft real-time enforcement of allocations
Real-time enforcement: resources can, at the time of job submission, deny access if the project quota has been used up
Soft: enforcement is subject to local resource policies (strict enforcement is not always appropriate)
Initially addressed allocation enforcement in SweGrid, but not restricted to SweGrid use
Developed with an emphasis on easy integration into different Grid middleware: single point of integration
In SweGrid: deployed on top of NorduGrid middleware
WSRF-compliant Java implementation using Globus Toolkit 4
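The soft enforcement idea can be illustrated with a minimal admission check. The function name and the strict/lenient policy flag are hypothetical, not the SGAS API:

```python
# Sketch of "soft" real-time allocation enforcement: deny a job at
# submission time if the project quota is exhausted, unless the local
# resource policy chooses to admit it anyway. Illustrative only.

def admit_job(project_balance, job_cost, local_policy_strict=True):
    if project_balance >= job_cost:
        return True                  # quota available: admit the job
    # Quota exhausted: a strict site denies, a lenient site may still
    # admit (enforcement is subject to local resource policy)
    return not local_policy_strict
```

The point of the `local_policy_strict` flag is that the same Grid-wide quota information supports different local decisions, which is what makes the enforcement "soft".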
Component Interactions
1. Contact resource
2. Authenticate/authorize (delegate credentials)
3. Submit job request
4. JARM intercepts request
5. Make account reservation
6. Run job
7. Collect usage info
8. Charge project account and log usage info

Project Information
Please visit us at http://www.sgas.se for:
SGAS download (version 2.0 available)
Documentation
Publications
Mailing list: swegrid-accounting@pdc.kth.se
Globus Toolkit contribution: http://www-unix.globus.org/toolkit/docs/4.0/techpreview/sgas/
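A rough sketch of steps 4-8 above (reserve before running, charge actual usage afterwards), with a hypothetical Bank class standing in for the SGAS bank service; all names and the cost model are illustrative assumptions:

```python
# Sketch of the reserve-then-charge pattern used in the SGAS job flow:
# the JARM makes an account reservation before the job runs, then
# charges the actual usage (refunding the unused part) once it finishes.

class Bank:
    def __init__(self, balances):
        self.balances = balances          # project -> remaining allocation

    def reserve(self, project, amount):
        if self.balances.get(project, 0) < amount:
            raise RuntimeError("project quota exhausted")
        self.balances[project] -= amount  # hold the estimated cost

    def charge(self, project, reserved, actual):
        self.balances[project] += reserved - actual  # refund unused hold

def run_job(bank, project, estimated_cost, execute):
    bank.reserve(project, estimated_cost)        # 5. account reservation
    usage = execute()                            # 6.-7. run job, collect usage
    bank.charge(project, estimated_cost, usage)  # 8. charge project account
    return usage
```

Reserving before execution is what allows real-time denial at submission, while charging actual usage afterwards keeps the accounting accurate.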
A Decentralized System for Grid-wide Fairshare Scheduling
joint work with Peter Gardfjäll, UmU

Fairshare Scheduling
(Logical) division of resource capacity
Users are granted target shares: an entitled portion of delivered utilization
Scheduler adjusts job priority according to the job owner's past usage:
job priority := f(target share, job submitter's historical usage)
History decay increases the impact of recent usage
Goal: fairness over time

We apply fairshare scheduling on a Grid-wide scale
Share policies that (logically) divide aggregate Grid capacity
Locally (on a resource) & globally (Grid-wide)
Hierarchical (between VOs, projects, users, ...)
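As a sketch of the priority function and history decay described above (the exponential half-life and the ratio-based form of f are illustrative assumptions, not the system's actual formula):

```python
# Sketch of fairshare priority with history decay: priority rises when a
# user's decayed historical usage falls below the target share, and
# falls when it exceeds it, steering utilization towards the targets.
import math

def decayed_usage(usage_records, now, half_life=7 * 24 * 3600.0):
    """Sum past usage, exponentially down-weighting older records."""
    lam = math.log(2) / half_life
    return sum(u * math.exp(-lam * (now - t)) for t, u in usage_records)

def fairshare_priority(target_share, usage, total_usage):
    actual_share = usage / total_usage if total_usage > 0 else 0.0
    if actual_share == 0:
        return float("inf")           # unused share: highest priority
    return target_share / actual_share  # >1 under-served, <1 over-served
```

With a one-week half-life, usage from a week ago counts half as much as usage right now, which is the "fairness over time" goal in miniature.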
Allocation Model
VO allocation authority: establishes share policies, coordinates VO utilization, grants Grid-wide shares
Resource owner: grants local shares, controls degree of contribution
VO user group: consumes its share, may subdivide it, controls usage within the group

FairShareGrid (FSGrid) System
Establishes and enforces share policies
VO users are granted shares of aggregate Grid capacity
Coordinates utilization across the Grid
QoS guarantees

Share Policy Illustration
Local scope: Local users (40%), SweGrid (40%), NorduGrid (20%)
Global scope: SweGrid and NorduGrid shares subdivided between projects and groups, e.g. Physics project (), Biology project (20%), Chemistry project (50%); Group 1 (50%), Group 2 (50%)

Share Policy Enforcement
Carried out locally by steering utilization towards target shares
Local shares: enforced locally (local usage data)
Global shares: collective enforcement (Grid-wide usage data)
Top-down enforcement
Decentralization: no central coordinator
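The hierarchical share policies can be modelled as a tree in which an entity's effective Grid-wide share is the product of the shares along its path. A minimal sketch, with the tree mirroring the illustration above (the unstated Physics share is omitted; the code is an illustrative assumption):

```python
# Sketch of a hierarchical share policy tree: effective share of total
# capacity = product of shares from the root down (VO -> project -> group).

policy_tree = {
    "SweGrid": (0.40, {
        "Biology": (0.20, {}),
        "Chemistry": (0.50, {"Group 1": (0.50, {}), "Group 2": (0.50, {})}),
    }),
    "NorduGrid": (0.20, {}),
    "Local users": (0.40, {}),
}

def effective_shares(tree, prefix=1.0, path=""):
    """Flatten the tree into {path: effective share of total capacity}."""
    out = {}
    for name, (share, children) in tree.items():
        p = path + "/" + name
        out[p] = prefix * share
        out.update(effective_shares(children, prefix * share, p))
    return out

shares = effective_shares(policy_tree)
# e.g. /SweGrid/Chemistry/Group 1 gets 0.40 * 0.50 * 0.50 = 10% of the Grid
```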
Framework Components
[Diagram: VO policy providers and VO usage data feed a policy reference; the runtime comprises a policy tree, a policy engine, local policy, a local usage DB, and a priority calculator invoked by the workload manager's scheduler via a fairshare factor callout.]

Simulated Grid
GridSim: discrete-event Grid simulation toolkit
SweGrid-like environment (6 x 100 CPUs)
Each resource has a cluster scheduler: space-shared (one job per processor), non-preemptive
Callout to determine the fairshare priority factor for each job
Global view of utilization data refreshed once per minute

Workload
Each user runs a stream of single-CPU batch jobs
Contention for resources
One-hour jobs (±40%)
1. Correctness
Share policy: VO-A (P-A1 50%, P-A2, P-A3 20%); VO-B 70% (P-B1 60%: U-B11 55%, U-B12, U-B13 15%; P-B2 40%)
[Plot: VO-B projects' utilization: aggregated utilization (%) vs. time (s) for P-B1 and P-B2.]
[Plot: P-B1 users' utilization: aggregated utilization (%) vs. time (s) for U-B11, U-B12, U-B13.]
3. Imbalanced Workload
P-A2 and P-A3 only submit jobs to half of the resources
[Plot: aggregated utilization (%) vs. time (s) for P-A1, P-A2, P-A3, using only local usage data.]
[Plot: aggregated utilization (%) vs. time (s) for P-A1, P-A2, P-A3, using Grid-wide usage data.]
Conclusion: Grid-wide usage data is important for global share enforcement

4. Subgroup Isolation
U-B12 becomes idle
[Plot: sibling shares: aggregated utilization (%) vs. time (s) for U-B11, U-B12, U-B13.]
[Plot: parent shares: aggregated utilization (%) vs. time (s) for P-B1, P-B2.]
Conclusion: performs subgroup isolation; an idle share is made available to (and only to) active sibling entries
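Subgroup isolation can be sketched as renormalizing target shares over active siblings only, so an idle member's share stays within its own group. The share values and names below are illustrative, not the system's exact algorithm:

```python
# Sketch of subgroup isolation: an idle sibling's share is redistributed
# among the *active* siblings in proportion to their own shares, so idle
# capacity never leaks outside the group.

def effective_sibling_shares(shares, active):
    """shares: {name: target share}; active: names with queued jobs."""
    active_total = sum(s for n, s in shares.items() if n in active)
    return {
        n: (s / active_total if n in active else 0.0)
        for n, s in shares.items()
    }

users = {"U-B11": 0.55, "U-B12": 0.30, "U-B13": 0.15}  # illustrative shares
# U-B12 becomes idle: its share flows only to U-B11 and U-B13
eff = effective_sibling_shares(users, active={"U-B11", "U-B13"})
```

Because the renormalization happens within the sibling group, the parent shares (P-B1 vs. P-B2) are unaffected, matching the parent-shares plot above.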
and project portal
joint work with Mats Nylén, Roger Oscarsson, UmU (additional parts jointly with PDC and NSC)

Grid Portal Development
Common, easy-to-use interface to a diverse set of heterogeneous systems (Grids or specific computers)
Features (ongoing work):
Access a general Grid or individual resources
Single sign-on
Submit Grid/batch jobs
Monitor/delete jobs
Integrated information services
View output
Use system commands
File transfer
Archive/retrieve data
Manage accounts
View/manipulate files, navigate in file systems
Open a terminal window
+ SNIC-wide local/global database!
Main developer: Roger Oscarsson
Collaboration with NSC and PDC
Recent Grid Computing Publications (2005)
E. Elmroth, M. Nylén, and R. Oscarsson. A User-Centric Cluster and Grid Computing Portal. International Journal of Computational Science and Engineering, 2005 (accepted).
E. Elmroth and J. Tordsson. An Interoperable Standards-based Grid Broker and Job Submission Service. In e-Science 2005: First IEEE Conference on e-Science and Grid Computing, IEEE Computer Society Press, USA, pp. 212-220, 2005.
E. Elmroth and P. Gardfjäll. Design and Evaluation of a Decentralized System for Grid-wide Fairshare Scheduling. In e-Science 2005: First IEEE Conference on e-Science and Grid Computing, IEEE Computer Society Press, USA, pp. 221-229, 2005.
E. Elmroth, P. Gardfjäll, and J. Tordsson. An Advanced Grid Computing Course for Application and Infrastructure Developers. In CCGrid 2005, IEEE Computer Society Press, USA, pp. 43-50, 2005.
E. Elmroth and R. Skelander. Semi-automatic Generation of Grid Computing Interfaces for Numerical Software Libraries. In State-of-the-art in Scientific Computing, Lecture Notes in Computer Science, Vol. 3732, Springer-Verlag, pp. 404-412, 2005.
E. Elmroth, P. Gardfjäll, O. Mulmo, and T. Sandholm. An OGSA-based Bank Service for Grid Accounting Systems. In State-of-the-art in Scientific Computing, Lecture Notes in Computer Science, Vol. 3732, Springer-Verlag, pp. 1051-1060, 2005.
E. Elmroth and J. Tordsson. A Grid Broker Supporting Advance Reservations and Benchmark-based Selection. In State-of-the-art in Scientific Computing, Lecture Notes in Computer Science, Vol. 3732, Springer-Verlag, pp. 1061-1070, 2005.
T. Sandholm, P. Gardfjäll, E. Elmroth, L. Johnsson, and O. Mulmo. A Service-Oriented Approach to Enforce Grid Allocations. (Submitted for journal publication.)
See http://www.cs.umu.se/~elmroth