Presentation Context
- This research has been conducted in the framework of the European DataGrid project (http://www.edg.org)
- Cooperation between ITC-irst (now Fondazione Bruno Kessler), the University of Glasgow, and CERN, involved in WP2 - Data Management

The Grid Vision (Foster and Kesselman)
Example of HEP Data Analysis: the CMS Testbed
- 20 sites (Europe + US), 6 countries
- Initially, all file master copies are at CERN and FNAL
- Physicists execute jobs to perform data analysis
- A job is a set of data files to be analysed
Assumptions
- Data management: large amounts of data at distributed sites
- Data is read-only
- Replication is required between Storage Elements (SEs)
- Need for storage and transfer optimization

Our Focus: Data Grid Optimization
There are three stages in the lifetime of a job where optimisation occurs:
- Scheduling: find the best site to run my job
- Replica selection: find the best replica for my running job (short-term optimisation)
- Dynamic replica optimisation: make sure replicas are in the best position for possible future jobs (long-term optimisation, depends on collected access patterns)
Contribution of Our Research
- Development of OptorSim, a Data Grid simulator
- Definition of strategies for Grid optimization:
  - scheduling algorithms for Grid jobs
  - economy-based algorithms for replica selection and dynamic replica optimization
- Definition of evaluation metrics
- Evaluation and comparison of algorithms using simulation

OptorSim
- OptorSim is a Grid simulator written in Java to model the behaviour of replica optimisation algorithms
- It mimics the DataGrid environment by simulating the execution of experiments that require distributed data
- Input scenario:
  - Grid topology, computational and data resources
  - set of jobs to be executed
  - optimisation strategy
- It allows testing and comparison of optimisation algorithms in various Grid scenarios
http://sourceforge.net/projects/optorsim
Simplified DataGrid Architecture Implemented in OptorSim
[Diagram: job submission -> scheduling -> job execution, supported by replica selection, dynamic replica optimization, and file storage]

Scheduling Algorithms
- Random: the site for job execution is selected at random by the resource broker
  schedule(j) = random(s)
- Shortest Queue: the site with the minimum-length job queue is selected
  schedule(j) = argmin_{s in S} jobqueue(s)
- Access Cost: the access time of all files needed by the current job is calculated; the site with the minimum total time is selected
  schedule(j) = argmin_{s in S} accesscost(j,s)
  accesscost(j,s) = sum_{f in j} accesstime(f,s) = sum_{f in j} min_{r in repl(f)} accesstime(r,s)
  accesstime(r,s) = size(r) / bandwidth(s, site(r))
- Queue Access Cost: the access cost of all files for all jobs in the site's queue is calculated; the site with the minimum total time is selected
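The Access Cost scheduler above can be sketched as follows. This is a minimal illustrative sketch, not OptorSim code; the site names, file sizes, and bandwidth figures are hypothetical.

```python
def access_time(replica_site, exec_site, size_gb, bandwidth):
    """Transfer time of a replica = file size / bandwidth between the sites."""
    if replica_site == exec_site:
        return 0.0  # local access costs nothing in this sketch
    return size_gb / bandwidth[(replica_site, exec_site)]

def access_cost(job_files, exec_site, replicas, sizes, bandwidth):
    """Sum, over the job's files, of the cheapest replica's access time."""
    return sum(
        min(access_time(r, exec_site, sizes[f], bandwidth) for r in replicas[f])
        for f in job_files
    )

def schedule(job_files, sites, replicas, sizes, bandwidth):
    """Access Cost scheduling: pick the site minimising the total access cost."""
    return min(sites, key=lambda s: access_cost(job_files, s, replicas, sizes, bandwidth))

# Toy scenario: masters at CERN, one extra replica of f1 at Glasgow.
sites = ["CERN", "Glasgow"]
replicas = {"f1": ["CERN", "Glasgow"], "f2": ["CERN"]}
sizes = {"f1": 1.0, "f2": 1.0}  # GB
bandwidth = {("CERN", "Glasgow"): 0.1, ("Glasgow", "CERN"): 0.1}  # GB/s

print(schedule(["f1", "f2"], sites, replicas, sizes, bandwidth))  # -> CERN
```

CERN wins here because both files are local to it, while Glasgow would have to fetch f2 remotely.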
Replication Algorithms (1)
Least Frequently Used (LFU)
- Replica selection: choose the replica with the minimum network transfer time to the job's execution site
  selectreplica(f,s) = argmin_{r in repl(f)} accesstime(r,s)
  NB: the selected replica for file f can be different from the best replica at scheduling time
- Dynamic replica optimization:
  - files are always replicated to the local SE of the running job
  - if storage space is full, files are replaced according to LFU within a time window in the past

Replication Algorithms (2)
Economy-based algorithms
- Replica selection: an auction mechanism selects the best replicas
- Dynamic replica optimization:
  - always replicate if there is space on the local SE
  - if not, use prediction functions to estimate the future value of the selected and local replicas, and decide whether replicating/deleting is worthwhile
  - if there is no local replication, access the best replica remotely
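The LFU replacement step can be sketched as follows: count each stored file's accesses within a past time window and evict the least frequently used one. This is a minimal sketch of the semantics described above, not OptorSim code; the history format and window length are assumptions.

```python
def lfu_evict(history, now, window, stored_files):
    """Return the stored file with the fewest accesses in [now - window, now]."""
    counts = {f: 0 for f in stored_files}
    for t, f in history:
        if f in counts and now - window <= t <= now:
            counts[f] += 1
    return min(stored_files, key=lambda f: counts[f])

# Access history as (time, file) pairs; "a" is hot, "c" is cold.
history = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a"), (6, "b")]
print(lfu_evict(history, now=6, window=5, stored_files=["a", "b", "c"]))  # -> c
```

Here "c" is evicted to make room for the new replica, since it has only one access within the window against three for "a" and two for "b".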
P2P Structure of the Replica Optimizer
- Access Mediator (AM): contacts other replica optimizers to locate the cheapest copies of files for the Computing Element
- Storage Broker (SB): manages the files stored in the Storage Element, trying to maximize profit from the finite amount of storage space available
- P2P Mediator (P2PM): establishes and maintains P2P communication between Grid sites

Auction Protocol for Replica Selection
- We need a mechanism to fix the price of a file sold by an SB to an AM (or another SB) that guarantees:
  - a low price for the purchaser
  - trading fairness
  - minimal messaging / as fast as possible
- We use a Vickrey auction (a one-round sealed-bid auction): every potential seller makes an offer (lower than or equal to the proposed price); the cheapest offer wins, and the sale takes place at the second-cheapest price
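The auction above is a reverse (procurement) Vickrey auction, which can be sketched in a few lines. This is an illustrative sketch of the protocol's pricing rule, not the actual EDG implementation; the SB names and prices are hypothetical.

```python
def vickrey_reverse_auction(bids):
    """Reverse Vickrey auction: bids maps seller -> offered price.

    The cheapest seller wins, but is paid the second-cheapest bid,
    which gives sellers an incentive to bid their true cost.
    """
    ordered = sorted(bids.items(), key=lambda kv: kv[1])
    winner = ordered[0][0]
    # With a single bidder, the winner is simply paid its own bid.
    price = ordered[1][1] if len(ordered) > 1 else ordered[0][1]
    return winner, price

bids = {"SB_CERN": 12.0, "SB_Glasgow": 8.0, "SB_FNAL": 10.0}
print(vickrey_reverse_auction(bids))  # -> ('SB_Glasgow', 10.0)
```

Glasgow's SB wins with the cheapest replica but receives the second price (10.0), so underbidding below true cost never pays off.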
Economic Model: Prediction Function
- An SB rationally decides to replicate file f (and possibly to delete another file f' in storage) if this increases its cumulative profit over time.
- Values are ascribed to files based on a prediction function.
- Assumption: files close together in file space are more likely to be requested close together in time.
- The prediction function returns the most probable number of times a file will be requested within a time window W in the future, based on the requests (for that or similar files) within a time window W in the past.
- We have experimented with two prediction functions:
  - binomial: the distance between file requests in the history has a binomial distribution
  - Zipf-like: file popularity has an inverse power-law distribution

Performance Metrics
- Mean Job Execution Time: total_job_execution_time / N_jobs
- Effective Network Usage: ENU = (N_remote_file_accesses + N_file_replications) / N_local_file_accesses
- SE Usage: percentage of storage used during the simulation
- CE Usage: percentage of CPU power used during the simulation
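The two scalar metrics above can be computed directly from the simulation counters. A minimal sketch, with function and counter names that are assumptions of this example (the formulas follow the definitions on the slide):

```python
def mean_job_execution_time(total_job_execution_time, n_jobs):
    """Average wall-clock time per job over the whole simulation."""
    return total_job_execution_time / n_jobs

def effective_network_usage(n_remote_file_accesses, n_file_replications,
                            n_local_file_accesses):
    """ENU as defined on the slide: lower values mean less network traffic
    per useful (local) file access."""
    return (n_remote_file_accesses + n_file_replications) / n_local_file_accesses

print(mean_job_execution_time(5000.0, 1000))   # -> 5.0
print(effective_network_usage(100, 50, 1000))  # -> 0.15
```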
Simulation Set-Up
- Uses the CMS Testbed: 20 sites (Europe + US), 6 countries
- Takes into account background network traffic
- Physics analysis jobs based on real CDF analysis jobs
- Total file size: 97 GB
- SE sizes: 100 GB at CERN and FNAL, 50 GB at all other sites
- Initially, all master copies are at CERN and FNAL

Access Patterns (per job)
- Sequential access pattern
- Access pattern following a Zipf-like distribution
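The two per-job access patterns can be sketched as request generators: sequential access walks the job's file set in order, while the Zipf-like pattern requests file i with probability proportional to 1 / i**alpha. This is an illustrative sketch, not OptorSim code; the exponent alpha and the fixed seed are assumptions.

```python
import random

def sequential_pattern(files):
    """Sequential access: each file of the job is requested once, in order."""
    return list(files)

def zipf_pattern(files, n_requests, alpha=1.0, rng=random.Random(42)):
    """Zipf-like access: file ranked i is requested with weight 1/(i+1)**alpha."""
    weights = [1.0 / (i + 1) ** alpha for i in range(len(files))]
    return rng.choices(files, weights=weights, k=n_requests)

files = [f"f{i}" for i in range(5)]
print(sequential_pattern(files))
print(zipf_pattern(files, n_requests=8))  # skewed towards the low-ranked files
```

With a Zipf-like pattern a few files dominate the requests, which is exactly the situation where keeping popular replicas local pays off.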
Job Mean Time & CE Usage: Sequential Access Pattern (1K Jobs)
Fig. 3b: the Queue Access Cost scheduler gives the best balance between placing jobs close to the data and neither overloading sites nor leaving them idle.

Job Mean Time & CE Usage: Zipf Access Pattern (1K Jobs)
Again, the Queue Access Cost scheduler gives the best balance between placing jobs close to the data and neither overloading sites nor leaving them idle.
SE Usage: Queue Access Cost vs. Access Cost for Various Optimisation Strategies (LFU, Eco Binomial, Eco Zipf)
The Queue Access Cost scheduler shows the best SE usage over the simulation run.

Mean Job Time & Effective Network Usage for Different Numbers of Jobs (Queue Access Cost scheduling, sequential access pattern)
For the CMS testbed, scalability tests show the economic models improving more than LFU as more jobs are added to the Grid.
Mean Available Network Bandwidth
- Measurements of actual available bandwidth between various sites
- Iperf [1] data gathered from e-Science monitoring pages [2], the GridNM [3] monitoring service, and SLAC [4]
- ~10-90% of bandwidth available, depending on the link
- Available bandwidth (Mbit/s) per day, averaged over up to 3 months

[1] http://dast.nlanr.net/projects/iperf/
[2] http://gridmon.ucs.ed.ac.uk/gridmon/
[3] http://www.hep.ucl.ac.uk/~ytl/monitoring/gridnm/gridnmclient.html
[4] http://www.slac.stanford.edu/comp/net/bandwidthtests/antonia/html/slac_wan_bw_tests.html

Effects of Network Traffic
- Large increase of simulation time with network traffic switched on
- LFU and Eco (binomial) show increased effective network usage
- Eco (Zipf) is more stable to fluctuations
Conclusions from Experimentation
- The economic models generally make more efficient use of Grid resources than traditional algorithms such as LFU
- In particular situations the economic models are considerably faster than LFU, and they improve over the runtime of the simulation

Contribution of Our Research
- Development of OptorSim, a Data Grid simulator
  - it has been used by several researchers as a reference Data Grid environment for realistic experimentation
- Definition and evaluation of strategies for Grid optimization
  - strategies have been partially integrated into the Replica Optimization service delivered by EDG WP2
Future Work
- OptorSim's Grid model is rather simplistic:
  - add simulation of CE internals
  - add simulation of unreliable resources (network, CEs, SEs)
- There is not enough real economy in our economic models:
  - embed more sophisticated economy-based mechanisms for resource allocation
  - work in this direction has been done in the CATNETS project (http://www.catnets.uni-bayreuth.de/)
- Ideas from the audience?

Thank you for your attention!