The Grid Vision (Foster and Kesselman)




Presentation context
- This research was conducted in the framework of the European DataGRID project (http://www.edg.org)
- Cooperation between ITC-irst (now Fondazione Bruno Kessler), the University of Glasgow and CERN, involved in WP2 (Data Management)

The Grid Vision (Foster and Kesselman)

Example of HEP data analysis: CMS Testbed
- 20 sites (Europe + US)
- 6 countries
- Initially all file master copies are at CERN and FNAL
- Physicists execute jobs to perform data analysis
- A job is a set of data files to be analysed

Assumptions
- Data management: large amounts of data at distributed sites
- Data is read-only
- Replication is required between Storage Elements (SEs)
- Need for storage and transfer optimization

Our focus: Data Grid Optimization
There are 3 stages in the lifetime of a job where optimisation occurs:
- Scheduling: find the best site to run my job
- Replica selection: find the best replica for my running job (short-term optimisation)
- Dynamic replica optimisation: make sure replicas are in the best position for possible future jobs (long-term optimisation, depends on collected access patterns)

Contribution of our research
- Development of OptorSim, a Data Grid simulator
- Definition of strategies for Grid optimization
  - scheduling algorithms for Grid jobs
  - economy-based algorithms for replica selection and dynamic replica optimization
- Definition of evaluation metrics
- Evaluation and comparison of algorithms using simulation

OptorSim
- OptorSim is a Grid simulator written in Java to model the behaviour of replica optimisation algorithms
- It mimics the DataGRID environment by simulating the execution of experiments that require distributed data
- Input scenario:
  - Grid topology, computational and data resources
  - set of jobs to be executed
  - optimisation strategy
- It allows testing and comparison of optimisation algorithms in various Grid scenarios
- http://sourceforge.net/projects/optorsim

Simplified DataGrid Architecture Implemented in OptorSim
[Figure: job submission, scheduling, job execution, file storage, replica selection, dynamic replica optimization]

Scheduling Algorithms
- Random: the site for job execution is randomly selected by the storage broker
  schedule(j) = random(s), s in S
- Shortest Queue: the site with the shortest job queue is selected
  schedule(j) = argmin_{s in S} jobqueue(s)
- Access Cost: the access time of all files for the current job is calculated; the site with the minimum total time is selected
  schedule(j) = argmin_{s in S} accesscost(j,s)
  accesscost(j,s) = sum_{f in j} accesstime(f,s) = sum_{f in j} min_{r in repl(f)} accesstime(r,s)
  accesstime(r,s) = size(r) / bandwidth(s,site(r))
- Queue Access Cost: the access cost of all files for all jobs in the queue is calculated; the site with the minimum total time is selected
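The first three schedulers can be illustrated with a small sketch. It is not OptorSim code: the function names, the toy sites and the data layout are hypothetical. It assumes that access time is file size divided by available bandwidth and that reading a replica already held at the candidate site costs nothing.

```python
import random

def access_time(size_mbit, bandwidth_mbit_s):
    """Transfer time of one replica: size divided by available bandwidth."""
    return size_mbit / bandwidth_mbit_s

def access_cost(job_files, site, replicas, bandwidth):
    """Sum over the job's files of the cheapest replica's access time from `site`.

    Assumption: a replica stored at `site` itself has zero access time.
    """
    total = 0.0
    for f in job_files:
        total += min(
            0.0 if holder == site else access_time(size, bandwidth[(site, holder)])
            for holder, size in replicas[f]
        )
    return total

def schedule_random(sites, rng=random):
    """Random: pick any site."""
    return rng.choice(sites)

def schedule_shortest_queue(sites, queue_length):
    """Shortest Queue: pick the site with the fewest queued jobs."""
    return min(sites, key=lambda s: queue_length[s])

def schedule_access_cost(job_files, sites, replicas, bandwidth):
    """Access Cost: pick the site with the lowest total file access time."""
    return min(sites, key=lambda s: access_cost(job_files, s, replicas, bandwidth))

# Hypothetical toy grid: sizes in Mbit, bandwidths in Mbit/s.
sites = ["CERN", "FNAL", "Glasgow"]
replicas = {"f1": [("CERN", 1000.0)], "f2": [("CERN", 1000.0), ("FNAL", 1000.0)]}
bandwidth = {(a, b): 100.0 for a in sites for b in sites if a != b}
bandwidth[("Glasgow", "CERN")] = 50.0
queue_length = {"CERN": 5, "FNAL": 1, "Glasgow": 0}
job = ["f1", "f2"]

print(schedule_shortest_queue(sites, queue_length))
print(schedule_access_cost(job, sites, replicas, bandwidth))
```

In this toy grid the two schedulers disagree: Shortest Queue sends the job to the idle site, while Access Cost sends it to the site that already holds both files. That is the trade-off Queue Access Cost is meant to balance.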

Replication Algorithms (1): Least Frequently Used (LFU)
- Replica selection: choose the replica with the minimum network transfer time to the job's execution site
  selectreplica(f,s) = argmin_{r in repl(f)} accesstime(r,s)
  NB: the selected replica for file f can be different from the best replica at scheduling time
- Dynamic replica optimization:
  - Files are always replicated to the local SE of the running job
  - If storage space is full, files are replaced according to LFU over a time window in the past

Replication Algorithms (2): Economy-based algorithms
- Replica selection: auction mechanism for selecting the best replicas
- Dynamic replica optimization:
  - Always replicate if there is space on the local SE
  - If not, use prediction functions to estimate the future value of the selected and local replicas and decide whether replicating/deleting is worthwhile
  - If there is no local replication, access the best replica remotely
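The LFU policy with a time window can be sketched as follows. This is an illustration only: the class name, its fields and the tie-breaking rule are hypothetical and are not taken from OptorSim. It assumes that accesses arrive in non-decreasing time order and that a file larger than the whole SE is simply read remotely.

```python
from collections import deque

class LfuStorageElement:
    """Toy SE that always replicates locally and evicts by windowed LFU."""

    def __init__(self, capacity, window):
        self.capacity = capacity   # total space, same unit as file sizes
        self.window = window       # length of the look-back window
        self.files = {}            # file name -> size
        self.history = deque()     # (timestamp, file name), oldest first

    def _recent_count(self, name, now):
        # Forget accesses that fell out of the time window, then count.
        while self.history and self.history[0][0] < now - self.window:
            self.history.popleft()
        return sum(1 for _, n in self.history if n == name)

    def access(self, name, size, now):
        """Record an access and replicate the file locally if it is missing.

        Returns the list of files evicted to make room.
        """
        self.history.append((now, name))
        evicted = []
        if name in self.files or size > self.capacity:
            return evicted
        while sum(self.files.values()) + size > self.capacity:
            # Evict the stored file with the fewest accesses in the window.
            victim = min(self.files, key=lambda n: self._recent_count(n, now))
            del self.files[victim]
            evicted.append(victim)
        self.files[name] = size
        return evicted

se = LfuStorageElement(capacity=2, window=10)
se.access("a", 1, now=0)
se.access("b", 1, now=1)
se.access("a", 1, now=2)
print(se.access("c", 1, now=3))   # "b" has the fewest recent accesses
```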

P2P Structure of Replica Optimizer
- Access Mediator (AM): contacts other replica optimizers to locate the cheapest copies of files for the Computing Element
- Storage Broker (SB): manages files stored in the storage element, trying to maximize profit for the finite amount of storage space available
- P2P Mediator (P2PM): establishes and maintains P2P communication between grid sites

Auction Protocol for Replica Selection
- We need a mechanism to fix the price of a file sold by an SB to an AM (or another SB) that guarantees:
  - low price for the purchaser
  - trading fairness
  - minimal messaging / as fast as possible
- We use a Vickrey auction (one-round sealed-bid auction):
  - Every potential seller makes an offer (lower than or equal to the proposed price)
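The transcript breaks off after the first auction step, so the sketch below fills in the rest with the textbook Vickrey rule, adapted to sellers: the lowest offer wins and the winner is paid the second-lowest offer. Whether OptorSim settles prices exactly this way is an assumption. The function name is hypothetical, as is the rule that a lone seller is paid the proposed price.

```python
def run_reverse_vickrey_auction(proposed_price, offers):
    """One-round sealed-bid auction among sellers.

    offers maps a seller name to its offered price. Offers above the
    proposed price are ignored. Returns (winner, price_paid), or None
    if no valid offer was made.
    """
    valid = {seller: p for seller, p in offers.items() if p <= proposed_price}
    if not valid:
        return None
    ranked = sorted(valid.items(), key=lambda item: item[1])
    winner = ranked[0][0]
    # Second-price rule: the winner is paid the runner-up's offer.
    # Assumption: with a single valid offer, the proposed price is paid.
    price_paid = ranked[1][1] if len(ranked) > 1 else proposed_price
    return winner, price_paid

# Hypothetical example: an AM proposes 10 and three SBs reply.
print(run_reverse_vickrey_auction(10.0, {"SB1": 7.0, "SB2": 4.0, "SB3": 12.0}))
```

A single round of sealed offers keeps messaging minimal, and under the second-price rule a seller gains nothing by misreporting its true value, which relates to the fairness requirement listed above.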

Economic Model: Prediction Function
- An SB rationally decides to replicate file f (and possibly to delete another file f' in storage) if this increases its cumulative profit over time
- Values are ascribed to files based on a prediction function
- Assumes that files close in file space are more likely to be requested close together in time
- The prediction function returns the most probable number of times a file will be requested within a time window W in the future, based on the requests (for that or similar files) within a time window W in the past
- We have experimented with 2 prediction functions:
  - Binomial: the distance between file requests in the history has a binomial distribution
  - Zipf-like: file popularity has an inverse power law distribution

Performance Metrics
- Mean Job Execution Time: total_job_execution_time / N_jobs
- Effective Network Usage: enu = (N_remote_file_accesses + N_file_replications) / N_local_file_accesses
- SE Usage: percentage of storage used during simulation
- CE Usage: percentage of CPU power used during simulation
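The first two metrics can be computed directly from simulation counters. This sketch follows the formulas as they appear on the slide; the function names and the sample numbers are hypothetical.

```python
def mean_job_execution_time(job_times):
    """Total job execution time divided by the number of jobs."""
    if not job_times:
        raise ValueError("at least one job is required")
    return sum(job_times) / len(job_times)

def effective_network_usage(n_remote_accesses, n_replications, n_local_accesses):
    """ENU as on the slide: (remote accesses + replications) / local accesses."""
    if n_local_accesses == 0:
        raise ValueError("ENU is undefined without local file accesses")
    return (n_remote_accesses + n_replications) / n_local_accesses

# Hypothetical counters from one simulation run.
print(mean_job_execution_time([100.0, 200.0, 300.0]))
print(effective_network_usage(30, 20, 100))
```

A lower ENU means less network traffic per local read, so a strategy that replicates well should reduce it.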

Simulation Set-Up
- Use the CMS Testbed
  - 20 sites (Europe + US)
  - 6 countries
- Take into account background network traffic
- Physics analysis jobs based on real CDF analysis jobs
- Total file size: 97 GB
- SE sizes: 100 GB at CERN and FNAL, 50 GB at all other sites
- Initially all master copies are at CERN and FNAL

Access Patterns (per job)
- Sequential access pattern
- Access pattern following a Zipf-like distribution
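The two access patterns can be generated roughly as follows. The slides do not give the Zipf exponent or the ranking of files, so `alpha`, the rank order (list position) and the function names are all assumptions made for illustration.

```python
import random

def zipf_weights(n_files, alpha=1.0):
    """Normalised weights proportional to 1 / rank**alpha (rank starts at 1)."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_files + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def sequential_pattern(files):
    """Sequential access: every file of the job, in order."""
    return list(files)

def zipf_pattern(files, n_accesses, alpha=1.0, rng=random):
    """Zipf-like access: earlier files in the list are requested more often."""
    weights = zipf_weights(len(files), alpha)
    return rng.choices(files, weights=weights, k=n_accesses)

job_files = ["f1", "f2", "f3", "f4"]
print(sequential_pattern(job_files))
print(zipf_pattern(job_files, 10, rng=random.Random(42)))
```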

Job Mean Time & CE Usage: Sequential Access Pattern (1K Jobs)
(Fig. 3b) The Queue Access Cost scheduler gives the best balance between placing jobs close to the data and not overloading sites or leaving them idle.

Job Mean Time & CE Usage: Zipf Access Pattern (1K Jobs)
Again, the Queue Access Cost scheduler gives the best balance between placing jobs close to the data and not overloading sites or leaving them idle.

SE Usage: Queue Access Cost vs. Access Cost for Various Optimisation Strategies (LFU, Eco (Binomial), Eco (Zipf))
- The Queue Access Cost scheduler shows the best SE usage over the simulation run

Mean Job Time & Effective Network Usage for Different Numbers of Jobs
- Queue Access Cost scheduling and sequential access pattern
- For the CMS testbed, scalability tests show the economic models improving more than LFU as more jobs are added to the Grid

Mean Available Network Bandwidth
- Measurements of actual available bandwidth between various sites
- Iperf [1] data gathered from e-science monitoring pages [2], the GridNM [3] monitoring service, and SLAC [4]
- ~10-90% of bandwidth available, depending on link
- [Figure: available bandwidth (Mbits/sec) per day, averaged over up to 3 months]

[1] http://dast.nlanr.net/projects/iperf/
[2] http://gridmon.ucs.ed.ac.uk/gridmon/
[3] http://www.hep.ucl.ac.uk/~ytl/monitoring/gridnm/gridnmclient.html
[4] http://www.slac.stanford.edu/comp/net/bandwidthtests/antonia/html/slac_wan_bw_tests.html

Effects of Network Traffic
- Large increase of simulation time with network traffic switched on
- LFU and Eco (Binomial) show increased effective network usage
- Eco (Zipf) is more stable to fluctuations

Conclusions from experimentation
- The economic models generally make more efficient use of Grid resources than traditional algorithms such as LFU
- In particular situations the economic models are considerably faster than LFU and improve over the runtime of the simulation

Contribution of our research
- Development of OptorSim, a Data Grid simulator
  - It has been used by several researchers as a reference Data Grid environment for realistic experimentation
- Definition and evaluation of strategies for Grid optimization
  - Strategies have been partially integrated in the Replica Optimization service delivered by EDG WP2

Future work
- OptorSim's grid model is rather simplistic
  - add simulation of CE internals
  - add simulation of unreliable resources (network, CEs, SEs)
- There is not enough real economy in our economic models
  - embed more sophisticated economy-based mechanisms for resource allocation
  - work in this direction done in the CATNETS project (http://www.catnets.uni-bayreuth.de/)
- Ideas from the audience?

Thank you for your attention!