A GRID BASED VIRTUAL REACTOR: PARALLEL PERFORMANCE AND ADAPTIVE LOAD BALANCING




Vladimir V. Korkhov 1,2, Valeria V. Krzhizhanovskaya 1,2 and P.M.A. Sloot 1
{vkorkhov, valeria, sloot}@science.uva.nl
1 University of Amsterdam, Section Computational Science, the Netherlands
2 St. Petersburg State Polytechnic University, Russia

Abstract. This paper addresses the problem of porting distributed parallel applications to the Grid. One of the challenges we address is the change from static homogeneous cluster environments to dynamic heterogeneous Grid resources. We introduce a generic technique for adaptive load balancing of parallel applications on heterogeneous resources and evaluate it using a case study application: a Virtual Reactor for simulation of plasma chemical vapour deposition. This application has a modular architecture with a number of loosely coupled components suitable for distribution over the Grid. It requires large parameter space exploration, which allows Grid resources to be used for high-throughput computing. The Virtual Reactor contains a number of parallel solvers originally designed for homogeneous computer clusters that needed adaptation to the heterogeneity of the Grid resources. In this paper we study the performance of one of the core parallel solvers on the Grid, apply the developed technique for adaptive load balancing to the solver, evaluate the efficiency of this approach and outline an automated procedure for optimal utilization of heterogeneous Grid resources for high-performance parallel computing.

Keywords: Grid, adaptive load balancing, heterogeneous resources, benchmarking, Virtual Reactor, PECVD

1 Introduction

Porting complex distributed applications to the Grid poses a grand challenge to the computer and computational sciences, mostly due to the dynamic and decentralized nature of the Grid. Involving parallel computational solvers further complicates the problem because of the severe heterogeneity of Grid resources, which are characterized by a wide range of processor and network communication performance. Lately, the scientific community has invested considerable effort in developing Grid-aware problem solving environments for complex applications [4,5].
The importance of fully integrated simulators (e.g. Virtual Reactors) is recognized by various research groups and scientific software companies [1]. The Virtual Reactor used here as a test case was developed for simulation of plasma enhanced chemical vapour deposition (PECVD) reactors, a multiphysics problem spanning a wide range of spatial and temporal scales [2,3]. Simulation of three-dimensional flow with chemical reactions and plasma discharge in complex geometries is one of the most challenging and demanding problems in computational science and engineering, requiring both high-performance and high-throughput computing. Grid computing technologies opened up new opportunities to access virtually unlimited computational resources, and inspired many researchers to work on adaptation of parallel methods and to develop new mechanisms for distributed applications on the Grid.

The PECVD Virtual Reactor discussed in this paper has also been on its way to the Grid [2]. It serves as a test case driving and validating the development of the Russian-Dutch computational Grid (RDG) for distributed high performance simulation [6]. The Virtual Reactor is particularly suitable for porting to a Grid environment since it can be decomposed into a number of functional components (services). In addition, this application requires large parameter space exploration, which can be efficiently organized on the Grid. Tools to support distributed parametric modelling on the Grid are being developed, in particular the Nimrod-G middleware [7], which is used in this project. Current work on porting the Virtual Reactor to the Grid started within the framework of the CrossGrid EU project [5] and the Virtual Laboratory for e-Science [8]. Some results of these efforts were reported in [2]. The RDG Grid is the successor of the CrossGrid in the sense that it uses many of the CrossGrid infrastructure services and operates as a testbed for the Virtual Reactor application. The final Grid-based Virtual Reactor problem solving environment aims at being a collaborative system, a distributed scientific workbench with advanced interaction and visualization facilities.

In this paper we address the issue of porting an existing complex problem-solving environment (PSE) of the Virtual Reactor from a homogeneous cluster environment to heterogeneous dynamic Grid resources. The Russian-Dutch Grid provides a strong hardware background for this research as it contains sites with both homogeneous and heterogeneous computing and networking resources. To build a Grid-enabled PSE based on a modular application, a proper functional decomposition of modules is required. To ensure that the components, especially the computational modules, are distributed efficiently, it is necessary to carry out performance evaluation of the individual modules on Grid resources and draw generic conclusions on their behaviour: how scalable they are depending on input data and resources used, what speedup is achievable, how infrastructure properties influence the application performance, etc.

A countless number of parallel applications have been developed for traditional (i.e. static homogeneous) parallel systems. When such applications are ported from homogeneous computing environments to dynamic heterogeneous computing and networking resources, maintaining a high level of application efficiency is a challenge. To ensure efficient utilization of Grid resources, special methods for workload distribution control should be applied. Proper workload optimization methods should take into account two aspects: (1) the application characteristics (e.g. the amount of data transferred between the processes, the amount of floating point operations and memory consumption) and (2) the resource characteristics (e.g. processor, network and memory capacities, as well as the level of heterogeneity of the dynamically assigned resources). The method should be computationally inexpensive so as not to induce a large overhead on application performance.
In this paper we present such a method and evaluate it using one of the core parallel solvers of the Virtual Reactor.

The issue of load balancing in a Grid environment is addressed by a number of research groups. Generally, studies on load balancing consider distribution of processes to computational resources on the system/library level with no modifications in the application code [14,15]. Less often, load balancing code is included into the application source code to improve performance in specific cases [16,17]. Some research projects concern load balancing techniques that use source code transformations to improve the execution of the application [13]. We employ an application-centric approach where the balancing decisions are taken by the application itself. The algorithm that estimates the available resources and suggests the optimal load balancing of a parallel job is generic and can be employed in any parallel application to be executed on heterogeneous resources.

A detailed description of global load optimization approaches for heterogeneous resources and adaptive mesh refinement applications is given in [23,24,25]. However, in [23] and [25] no heterogeneity of network links was considered, and only static resource estimation (initialization) was performed in [23] and [24]. These two issues are the major challenges of Grid computing: 1) the heterogeneity of the network links can be an order of magnitude higher than that of the processing power; and 2) Grid resources are inherently dynamic. In developing our algorithm, we tried to address specifically these two issues. The approaches discussed in [23] and [25] are only valid for batch sequential applications (specifically for queuing systems and computer cluster schedulers), whereas our effort is directed towards parallel programs utilizing heterogeneous resources.

The paper is organized as follows: Section 2 describes the proposed algorithm for adaptive workload balancing on heterogeneous resources. Section 3 outlines the architecture of the Virtual Reactor application and the Russian-Dutch Grid testbed infrastructure. Section 4 demonstrates the results of testing the behaviour of one of the parallel solvers on the RDG homogeneous sites. In Section 5 we show the results of applying the load balancing technique to our case study application. Section 6 draws conclusions from our work and presents some future research directions.

2 Adaptive load balancing on heterogeneous resources

One of the factors that determine the performance of parallel applications on heterogeneous resources is the quality of the workload distribution, e.g. through functional decomposition or domain decomposition. Optimal load distribution is characterized by two things: (1) all processors have a workload proportional to their computational capacity; (2) communications between the processors are minimized. These goals are conflicting, since the communication is minimized when all the workload is processed by a single processor and no communication takes place, while distributing the workload inevitably incurs communication overheads. A trade-off is therefore needed, together with a metric that characterizes the quality of workload distribution for a parallel problem. One of the existing methods is to introduce a cost function reflecting the application execution time. Minimization of this function corresponds to minimization of the application runtime. The function should be simple and independent of the details of the code. The generic form of such a cost function is [20,21,22]:

$$H = H_{calc} + \beta H_{comm}, \qquad (1)$$

where $H_{calc}$ is minimized when the workload distribution among the processors is proportional to the processors' capacity (or equal in case of homogeneous processors); $H_{comm}$ is minimized when the communication time is minimal; and $\beta$ is a parameter that can be varied in order to tune the balance between the calculation and communication terms. This parameter depends on the characteristics of both the application requirements and the resource capabilities.

The main generic parameters that define the performance of a parallel application are:
- An application parameter $f_c \sim N_{comm} / N_{calc}$ ($N_{comm}$ and $N_{calc}$ are the amounts of application communications and computations respectively);
- A resource parameter $\mu \sim t_{comm} / t_{calc}$ ($t_{comm}$ is a typical time taken to communicate a single word between the processors, $t_{calc}$ is a typical time required to perform a generic floating point calculation).

The product of these two parameters, $f_c \mu$, is often called the fractional communication overhead [20]. The goal of load balancing is to minimize the cost function (1).
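As a small numerical illustration of the two parameters defined above, the following Python sketch computes the application parameter, the resource parameter and their product, the fractional communication overhead. The function name and all input numbers are hypothetical and chosen only for illustration; they are not measurements from the Virtual Reactor.

```python
def fractional_overhead(n_comm, n_calc, t_comm, t_calc):
    """Return (f_c, mu, f_c * mu) for one application/resource pair.

    n_comm, n_calc: amounts of communications and computations (application).
    t_comm, t_calc: typical time per communicated word and per floating
                    point calculation (resource).
    """
    f_c = n_comm / n_calc   # application parameter
    mu = t_comm / t_calc    # resource parameter
    return f_c, mu, f_c * mu


# Hypothetical inputs: 2e4 words exchanged per 1e8 floating point operations,
# 1 microsecond per word and 1 nanosecond per operation.
f_c, mu, overhead = fractional_overhead(2.0e4, 1.0e8, 1.0e-6, 1.0e-9)
print(f"f_c = {f_c:.1e}, mu = {mu:.1f}, overhead = {overhead:.2f}")
```

With these made-up inputs the overhead stays well below 1, which is the regime in which distributing the work can be expected to pay off.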
The parameter $\beta$ in this expression is an aggregated value based on the application-specific and resource-specific parameters $f_c$ and $\mu$. Knowledge of these application and resource properties allows constructing an appropriate form of the parameter $\beta$ and performing suboptimal load distribution [22]. However, in most real-life complex simulation problems it is not possible to calculate the application-specific parameter $f_c$ theoretically with reasonable precision. Even a detailed analysis of the algorithms and codes can fail in many practical cases when the code has multiple logical switches and completely different algorithms and computational schemes are used while solving a problem, depending on the initial conditions and computational parameters. Estimation of the resource-specific parameter $\mu$ also poses a challenge on heterogeneous Grid resources, since there is a multitude of processors with the ratio of communication to computation performance spanning a few orders of magnitude. Moreover, the Grid exhibits dynamic network and processor performance, therefore static domain decomposition fails to provide realistic estimates and consequently the optimal load distribution.

To ensure efficient load balancing of a parallel application on the Grid, it is necessary to estimate the $\beta$ parameter experimentally. There are two possible approaches: (1) directly measure the lumped value of $\beta$ for the application on the allocated resources and (2) separately benchmark the resources, estimate $\mu$ and then find the application-specific parameter $f_c$ that provides an optimal workload distribution on a given set of resources. The first approach requires serious intrusion into the application code. This is certainly not desirable, especially when the aim is a generic load balancing system that abstracts from application-specific issues. Thus we have chosen the second approach, which is more generic and requires minimal modifications in the application code.

We have developed a meta-algorithm for adaptive load balancing on heterogeneous resources based on benchmarking the capacity of the available resources (defined as a set of individual resource parameters $\mu = \{\mu_i\}$) and experimental estimation of the application parameter $f_c$.
The algorithm ensures efficient load distribution, thus minimizing the application execution time. The cost function in our case is the experimentally measured execution time, which depends on the distribution of the workload between the participating processors. The target is to experimentally determine the value of $f_c$ that provides the best workload distribution, i.e. minimal runtime of the application mapped to the resources characterized by the parameter set $\mu$. The outline of the load balancing meta-algorithm is as follows:

1. Benchmark the resources dynamically assigned to a parallel application; measure the resource characteristics that constitute the set of resource parameters $\mu$ (available processor power, memory and link bandwidth).

2. Estimate the range of possible values of the application parameter $f_c$. The minimal value is $f_c^{min} = 0$, which corresponds to the case when no communications occur between the parallel processes of the application. The maximal value can be calculated based on the following reasoning: for the parallel processing to make sense, that is to ensure that running a parallel program on several processors is faster than sequential execution, the calculation time should exceed the communication time. For homogeneous resources this can be expressed as follows:

$$\frac{T_{comm}}{T_{calc}} < 1 \;\Rightarrow\; \frac{N_{comm}\, t_{comm}}{N_{calc}\, t_{calc}} = f_c\, \mu < 1 \;\Rightarrow\; f_c^{max} = 1/\mu$$

Analogously, for heterogeneous resources the upper limit can be found as:

$$f_c^{max} = \max(t_{calc}) / \min(t_{comm})$$

3. Run through the range of possible values of $f_c$ with a discrete step. For each value of $f_c$ calculate the corresponding load distribution based on the resource parameters $\mu$ determined in step 1 (details on calculating the load distribution weights follow this algorithm). With this distribution perform one or a few time steps/iterations, and measure the execution time. Proceed with the next value of $f_c$ for the subsequent iterations, ensuring that the simulation continues without delays, with a modified load distribution.

4. Analyze performance results for different values of $f_c$; find the optimal value of $f_c$, which provides the best performance of the application (i.e. minimal execution time).

5. Execute further calculations using the discovered optimal $f_c$.

6. In case of dynamic resources where performance is influenced by other factors (which is generally the case on the Grid), a periodic re-estimation of the resource parameters $\mu$ and load re-distribution shall be performed.

7. If the application is dynamically changing (for instance due to adaptive meshes, moving interfaces or different combinations of physical processes modelled at different simulation stages) then $f_c$ must be periodically re-estimated on the same set of resources.

Periodic re-estimations in steps 6 and 7 shall be performed frequently during the run-time of the application to correct the load imbalance with a reasonably short delay. The minimally required frequency of re-balancing can be estimated by calculating the relative imbalance introduced during the controlled period of time (the number of time steps/iterations).

The combination of $\mu$ and $f_c$ determines the distribution of the workload between the processors. To calculate the amount of the workload per processor, we assign a weight factor to each processor according to its processing power, memory and network connection. A similar approach was applied in [12] and in [17] for heterogeneous computer clusters, but the mechanism for adaptive calculation of the weights and application requirements was not developed there. Moreover, the tools developed for cluster systems cannot be used in Grid environments without modifications since static resource benchmarking is not suitable for dynamic Grid resources, where the weights shall be calculated every time the solver is started on a new set of dynamically assigned processors.

Let us assume that for the $i$-th processor $p_i$ is the available processor performance (e.g. in Flop/s), $m_i$ is the available memory (in MB) and $n_i$ is the available network bandwidth to the processor (in MB/s). An individual resource parameter $\mu_i$ can then be represented using the values of $p_i$, $m_i$, $n_i$. In a simple case when memory is considered only a constraining factor (and not driving the load balancing process) it is

$$\mu_i = p_i / n_i.$$

This resource parameter is widely used in scientific applications where the most important factor is the ratio of the computational power to the network bandwidth. In a more general case, two parameters shall be considered, $\mu_i$ and $m_i$. For memory-driven applications, the ratio of the available memory to the network capacity of that processor, $m_i / n_i$, should play the major role in resource evaluation.

To reflect the processor capacity, we introduce a weighting factor $w_i$ for each processor. It determines the final workload for a processor, given by $W_i = w_i W$, where $W$ is the total workload.

To determine the weighting factors we introduce parameters $c_p$, $c_m$ and $c_n$ that reflect the computational, memory and communication requirements of the application. The weight of each processor is then estimated using the following expression:

$$w_i = c_p\, p_i + c_m\, m_i + c_n\, n_i; \qquad \sum_i w_i = 1. \qquad (2)$$

This weighting factor $w_i$ reflects the relative capacity of the resources according to the estimated infrastructure parameters $\mu_i = \mu(p_i, m_i, n_i)$ and the application parameter $f_c$. The infrastructure parameters $\mu_i$ can be determined by a set of benchmark runs before the actual calculations start (but after the resources have been assigned to the application). Searching through $f_c$ with fixed values of $\mu_i$ gives the optimal value of $f_c$, which corresponds to the optimal mapping of the workload to the resources.

The parameters $c_p$, $c_m$ and $c_n$ depend not only on the application characteristics but also on the heterogeneity of the resources. Let us analyse how these parameters and the weighting factors $w_i$ are related to $f_c$ and $\mu_i$. Consider the traditional situation when memory is only a constraining factor ($c_m = 0$). Then the parameters $c_p$ and $c_n$ shall be proportional to the amount of application computations and communications, respectively, and to the heterogeneity factors:

$$c_p \sim N_{calc}\, \varphi_{proc}; \qquad c_n \sim N_{comm}\, \varphi_{net}. \qquad (3)$$

Here $\varphi_{proc}$ and $\varphi_{net}$ are heterogeneity metrics of processors and network links. In case of equal network links the weighting should be done only according to the processors' capacity, therefore the network heterogeneity parameter is nullified: $\varphi_{net} = 0$. Analogously, for homogeneous processors $\varphi_{proc} = 0$. The heterogeneity metrics of the network and computing resources can be defined as follows:

$$\varphi_{net} = \frac{\sum_{i=1}^{N} (n_i - n_{avg})^2}{N\, n_{avg}^2}, \qquad \varphi_{proc} = \frac{\sum_{i=1}^{N} (p_i - p_{avg})^2}{N\, p_{avg}^2}.$$

Substituting expressions (3) for $c_p$ and $c_n$ in eq. (2), the weights can be re-written as:

$$w_i \sim N_{calc}\, \varphi_{proc}\, p_i + N_{comm}\, \varphi_{net}\, n_i$$

For the trivial cases:
- $\varphi_{net} = 0$ (the network is homogeneous): $w_i \sim N_{calc}\, \varphi_{proc}\, p_i \sim p_i$
- $\varphi_{proc} = 0$ (the processors are homogeneous): $w_i \sim N_{comm}\, \varphi_{net}\, n_i \sim n_i$
- otherwise: $w_i \sim N_{calc}\, \varphi_{proc}\, p_i + N_{comm}\, \varphi_{net}\, n_i \sim p_i + \varphi\, f_c\, n_i$; $\sum_i w_i = 1$

Defining $\varphi = \varphi_{net} / \varphi_{proc}$ as an aggregate heterogeneity metric of resources and keeping in mind that $\mu_i = p_i / n_i$, we get:

$$w_i \sim p_i \left(1 + f_c\, \frac{\varphi}{\mu_i}\right)$$

Considering $\vartheta_i = \varphi / \mu_i$, which combines the characteristics of the resource performance and heterogeneity, we get:

$$w_i \sim p_i\, (1 + f_c\, \vartheta_i) \qquad (4)$$
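The weighting of Eq. (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $\mu_i = p_i / n_i$, so that $p_i (1 + f_c \vartheta_i) = p_i + f_c \varphi n_i$; it uses the variance-based heterogeneity metric given above; and it assigns equal shares when both processors and links are homogeneous, which is a choice made here for completeness. The names `heterogeneity`, `processor_weights` and `TOL` are hypothetical.

```python
from typing import List, Sequence

TOL = 1e-12  # assumed threshold below which a resource set counts as homogeneous


def heterogeneity(values: Sequence[float]) -> float:
    """Heterogeneity metric: sum((x_i - x_avg)^2) / (N * x_avg^2)."""
    count = len(values)
    avg = sum(values) / count
    return sum((x - avg) ** 2 for x in values) / (count * avg ** 2)


def processor_weights(p: Sequence[float], n: Sequence[float], f_c: float) -> List[float]:
    """Normalised weights w_i for processor power p_i and link bandwidth n_i."""
    phi_proc = heterogeneity(p)
    phi_net = heterogeneity(n)
    if phi_proc < TOL and phi_net < TOL:
        raw = [1.0] * len(p)       # fully homogeneous: equal shares (assumption)
    elif phi_proc < TOL:
        raw = list(n)              # homogeneous processors: w_i ~ n_i
    else:
        phi = phi_net / phi_proc   # aggregate heterogeneity metric
        # theta_i = phi / mu_i with mu_i = p_i / n_i, hence
        # p_i * (1 + f_c * theta_i) = p_i + f_c * phi * n_i
        raw = [p_i + f_c * phi * n_i for p_i, n_i in zip(p, n)]
    total = sum(raw)
    return [r / total for r in raw]  # enforce sum(w_i) = 1


# Hypothetical resources: two processors, the slower one has the faster link.
print(processor_weights([1.0, 3.0], [5.0, 1.0], f_c=0.1))
```

For a homogeneous network the function falls back to weights proportional to processor power alone, matching the first trivial case listed above.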

Knowing the fractional overhead of the application and the heterogeneity level of the resources, we can optimize the workload distribution using this fast weighting technique.

To evaluate the efficiency of the workload distribution we introduce the load balancing speedup $\Theta$:

$$\Theta = \frac{T_{non\text{-}balanced}}{T_{balanced}} \cdot 100\%, \qquad (5)$$

where $T_{non\text{-}balanced}$ is the execution time of the parallel application without load balancing, and $T_{balanced}$ is the execution time using load balancing on the same set of resources. This metric is used to estimate the $f_c$ that provides the best performance on given resources, that is, the largest value of $\Theta$ in a given range of $f_c$. In a non-trivial case we expect to find a maximum of $\Theta$ and thus an optimal $f_c$ for some workload distribution. A finite and non-zero value of $f_c$ means that the application requirements fit the resources best in this particular workload distribution, which minimizes the total run-time of the application. The case of $f_c \to 0$ while $\varphi \neq 0$ means that the application is totally computation dominated, i.e. there is no communication between different processes, and the optimal workload distribution will be proportional only to the computational power of the processors. The case of $\varphi = 0$ (i.e. $\varphi_{net} = 0$) means that we consider a resource infrastructure of heterogeneous processors connected by homogeneous network links, and the value of $f_c$ does not play a role: the distribution is again proportional only to the processing power.

In the discussion presented above while deriving eq. (4), we considered a simple case when memory requirements only put a Boolean constraint on the allocation of processes to the resources: either there is enough memory to run the application or not. But memory can also play a role in the load balancing process, being one of the determining factors of application performance. This is the case for applications that are able to control memory requirements according to the available resources. In this case there will be additional parameters analogous to $f_c$ and $\mu$ (or these functions will be more complex), but the idea and the load balancing mechanism remain the same.
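Steps 3 to 5 of the meta-algorithm and the speedup metric of Eq. (5) can be sketched as follows. This is an illustrative outline only: `weights_for` and `timed_run` are hypothetical callables standing in for the weight calculation and for running a few solver iterations with a given distribution, and the toy timing model at the bottom is invented so that the sketch runs on its own.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sweep_application_parameter(
    fc_values: Sequence[float],
    weights_for: Callable[[float], List[float]],
    timed_run: Callable[[List[float]], float],
) -> Tuple[float, Dict[float, float]]:
    """Try each f_c, time a few iterations with its weights, keep the fastest."""
    timings = {f_c: timed_run(weights_for(f_c)) for f_c in fc_values}
    best_fc = min(timings, key=timings.get)
    return best_fc, timings


def balancing_speedup(t_non_balanced: float, t_balanced: float) -> float:
    """Load balancing speedup Theta in percent, following Eq. (5)."""
    return t_non_balanced / t_balanced * 100.0


# --- Toy stand-ins, all hypothetical ---
SPEEDS = [1.5, 2.5]  # made-up effective speeds of two processors


def toy_weights(f_c: float) -> List[float]:
    raw = [1.0 + f_c, 3.0 + f_c]
    total = sum(raw)
    return [r / total for r in raw]


def toy_timed_run(weights: List[float]) -> float:
    # The slowest processor determines the duration of a time step.
    return max(w / s for w, s in zip(weights, SPEEDS))


best_fc, timings = sweep_application_parameter(
    [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0], toy_weights, toy_timed_run
)
theta = balancing_speedup(toy_timed_run([0.5, 0.5]), timings[best_fc])
print(best_fc, round(theta, 1))
```

In a real run the reference time for Eq. (5) would come from an unbalanced execution on the same resources; here an equal split plays that role.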
3 Case study on adaptive load balancing: the Virtual Reactor

3.1 The Virtual Reactor overview and its implementation on the Grid

A complex problem-solving environment usually has a modular architecture and consists of a number of loosely or tightly coupled components [9]. Our test case, the Virtual Reactor, includes the basic components for reactor geometry design; computational mesh generation; plasma, flow and chemistry simulation; editors of chemical processes and gas properties connected to the corresponding databases; pre- and postprocessors, visualization and archiving modules [2]. The aim of our research is to virtualize separate modules of the application to run them efficiently as services and access them on the Grid. The application components perform one (or a few) of the following functions: problem description, simulation, visualization and interaction. This is schematically shown in Fig. 1, where we emphasize the simulation components.

Fig. 1. Functional scheme of the Virtual Reactor application

The core components are modules simulating plasma discharge, gas flow, chemical reactions and film deposition processes occurring in a PECVD reactor. The details on numerical methods and parallel algorithms employed in the solvers are described in [10]. The most important features relevant to the Grid implementation are as follows: for stability reasons, implicit finite volume schemes were applied, thus forcing us to use a sweep-type algorithm for solving equations in every beam of computational cells in each spatial direction of the Cartesian mesh. A special parallel algorithm was developed with beams distributed among the processors. Communications are organized exploiting a Master-Slave model, where at each simulated time step the Master prepares instructions for the Slaves, sends them the data to be processed, receives the results, and processes them before proceeding to the next step. The algorithm was implemented in an SPMD model, using the MPI message passing interface with MPI Barrier points for synchronization. Data exchange between the Master and the Slaves is repeated every time step, and simulation proceeds for thousands to millions of steps. In the testbed we use generic MPICH-P4 built binaries that can be executed on all the testbed machines using the Globus job submission service.

To study the influence of various parameters on the simulated processes we run a number of simulations in parallel (shown in Fig. 1 as Simulation 1 ... Simulation N blocks) with the assistance of Nimrod-G [7].

To provide efficient execution of a parallel application on heterogeneous resources, it is necessary to clearly understand the application performance dependencies on homogeneous resources first. This gives an insight into the application scalability, induced fractional overhead, dependencies of the amount of communications and calculations on the number of processors used, etc. The results of such tests can help in estimating and predicting the behaviour of the application on heterogeneous resources, thus simplifying the adaptation process.
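Section 3.1 states that beams of computational cells are distributed among the processors. One possible way to turn fractional processor weights into whole numbers of beams is largest-remainder rounding, sketched below. This rounding scheme is an assumption made here for illustration; the source does not say how the solver rounds its distribution, and `split_beams` is a hypothetical name.

```python
from typing import List, Sequence


def split_beams(n_beams: int, weights: Sequence[float]) -> List[int]:
    """Assign whole beams to processors in proportion to their weights.

    Each processor first gets the integer part of its exact share; the beams
    left over go to the processors with the largest fractional remainders.
    """
    total = sum(weights)
    exact = [n_beams * w / total for w in weights]
    counts = [int(x) for x in exact]
    leftover = n_beams - sum(counts)
    by_remainder = sorted(
        range(len(weights)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical example: 100 beams over three processors weighted 1:2:1.
print(split_beams(100, [0.25, 0.5, 0.25]))
```

The counts always add up to the number of beams, so no beam is dropped or duplicated when the weights change between re-balancing steps.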
3.2 Russian-Dutch Grid testbed infrastructure

Generally the infrastructure of a site within a Grid testbed can be of one of the following types depending on the underlying resources:
I. traditional homogeneous computer cluster architecture: homogeneous worker nodes and uniform interconnection links;
II. homogeneous worker nodes with heterogeneous interconnections;
III. heterogeneous worker nodes with uniform interconnections;
IV. heterogeneous nodes with heterogeneous interconnections.

A complete Grid infrastructure is always of Type IV, characterized by severe heterogeneity with a wide range of processor and network communication parameters. As we show later in this paper, the type of resources allocated to a parallel application significantly influences its performance, and different load balancing techniques shall be applied to different combinations of the resources.

Currently the Russian-Dutch Grid testbed consists of six sites with different infrastructures:
- Amsterdam-1 (3 nodes, 4 processors): Type IV;
- Amsterdam-2 (32 nodes, 64 processors): Type I;
- St. Petersburg (4 nodes, 6 processors): Type IV;
- Novosibirsk (4 processors): Type II;
- Moscow-1 (13 nodes, 26 processors): Type I;
- Moscow-2 (12 nodes, 24 processors): Type I.

The Russian-Dutch Grid testbed is built with the CrossGrid middleware [5] based on the LCG-2 distributions and sustains interoperability with the CrossGrid testbed. More detailed information on the RDG testbed can be found in [6]. The RDG Virtual Organization (VO) is included into the CrossGrid VO, thus allowing the RDG certificate holders to access some of the CrossGrid resources and services. The CrossGrid testbed consists of 16 sites with the infrastructures of all 4 types.

4 Application performance analysis on homogeneous sites

4.1 Benchmark approach

Benchmarking of a complex application is required to evaluate its performance and reveal the dependencies of its behaviour on the underlying infrastructure. We use a structural approach to benchmarking the Virtual Reactor as an example of a complex application. Within this approach, the overall functionality of the whole system is studied, followed by performance measurements of the individual components while they are not influenced by activities of the other components. Benchmarking the components of a complex problem solving environment allows evaluating their performance depending on various parameters like types of input data and the resources used. This helps to predict the performance of a given component and use it for efficient resource allocation, thus improving the overall resource management within the whole application.

The earlier tests of the Virtual Reactor performed on the CrossGrid testbed showed that most of the interactive components of the Virtual Reactor do not put restrictions on the computer systems and network bandwidth and can be efficiently executed on distributed Grid resources [2].
Next, we focused on benchmarking of the simulation modules. Each simulation consists of two basic components: one for plasma simulation and another for reactive flow simulation (see Fig. 1). These two components exchange only a small amount of data every hundred or thousand time steps, therefore the network bandwidth is not critical for their communication. Finally, we concentrate on benchmarking the individual parallel solvers, starting from a 2D PECVD solver which maintains all the features of the 3D one but takes less time to estimate the solver behaviour on the Grid.

4.2 Benchmark setup

The goal of the benchmarking we carry out is to determine the scalability of the application and to find out the limitations on the efficiency posed by the application architecture, resources and types of the simulations. Uncovering such details will allow us to optimize the resource management strategy for allocating the application components within the whole Virtual Reactor problem solving environment.

The solver operates on a reactor geometry that is composed of a number of connected blocks. Different types of simulation can be performed within a single geometry: a chemically inactive flow and a flow with chemical and plasma processes. Physically the problem type is determined by the gas mixture composition, temperatures, pressures, and the plasma discharge operation mode. From the computational point of view these types of simulations differ by the ratio of computations to communications: in case of simulating chemical processes the computational load is significantly higher.

We started from a light-weighted problem not simulating the chemical and plasma processes, with a simplified reactor geometry consisting of a single block that allows easy tracking of parameter influence on the execution time. To measure the dependency of the solver performance upon the input data, multiparameter variation has been applied. We measured the solver execution time, speedup and communication time depending on the combinations of input parameters: the computational mesh size, number of simulation time steps and number of processors. The benchmark tests had to be automated because the parameter variation leads to a large number of job submissions. To solve this problem we have built an execution environment to support series of parameter-sweep Globus job submissions. The environment is generic and can be used for any kind of performance benchmarks with user-defined metrics and parameters to be analyzed. Within this environment, the application to benchmark is described using some templates that are filled with particular application data (e.g. Globus RSL template for job submission which also contains the list of input and output files). One of the functionalities of this execution environment is the support for parameter-sweep runs, analogous to what Nimrod-G or Condor-G provides. The advantage of our implementation is that we can specify the parameters (and their ranges) that shall be changed, as well as the characteristics to be measured and visualized automatically to analyze the influence of those parameters.

In these tests, a single-block topology was used. The block was subdivided into a (ncell x ncell) number of computational mesh cells, with ncell running from 40 to 100, thus forming 1600 to 10000 cells. We performed also some tests with real reactor geometries in order to check whether the reactor topology influences the parallel performance, since potentially it can introduce some load imbalance.
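A parameter sweep of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' execution environment: the command template, its flags and the chosen parameter ranges are assumptions, and no real Globus RSL syntax is reproduced.

```python
from itertools import product

# Hypothetical job-description template; the placeholder names and flags are
# illustrative only and do not reproduce the real Globus RSL syntax.
JOB_TEMPLATE = "solver --ncell {ncell} --steps {steps} --nproc {nproc}"


def parameter_sweep(ncells, steps, nprocs):
    """Return one job description per combination of the swept parameters."""
    jobs = []
    for ncell, nstep, nproc in product(ncells, steps, nprocs):
        jobs.append({
            "ncell": ncell,
            "mesh_cells": ncell * ncell,  # single block of ncell x ncell cells
            "steps": nstep,
            "nproc": nproc,
            "command": JOB_TEMPLATE.format(ncell=ncell, steps=nstep, nproc=nproc),
        })
    return jobs


# Assumed ranges: ncell = 40, 60, 80, 100; 100 time steps; 1 to 8 processors.
jobs = parameter_sweep(ncells=range(40, 101, 20), steps=[100], nprocs=range(1, 9))
print(len(jobs), "jobs; first command:", jobs[0]["command"])
```

Each dictionary could then be used to fill a submission template and to label the measured metrics of the corresponding run.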
4.3 Influence of the number of time steps and reactor topology

Experiments with a different number of time steps showed that the execution time and other measured parameters are linearly proportional to the number of time steps, provided that this number is high enough and the standard output and hard disk operations are kept minimal (that means no excessive logging, nor any storing of the 2D fields or other additional files every time step). All the results presented below are measured for 100 time steps.

Along with the single-block geometry, we studied the performance of the solver with a complex multi-block PECVD reactor topology, which consists of an equivalent number of computational mesh cells. The results showed that all the measured characteristics of the solver behaviour (execution time, speedup, computation and communication time) on the same resources do not differ for the single-block and multi-block topologies of equal number of cells within 1% accuracy. This assures us that the parallel algorithm used in the solver provides a good load balancing even in cases of complex topologies. Further we test the influence of the problem size (the number of mesh cells) with the single-block reactor geometry, since it is easier to vary the mesh size arbitrarily with a single-block geometry than with a multi-block complex topology.

4.4 Speedup of the chemistry-disabled and chemistry-enabled simulations

The measurements were carried out on all the Grid sites within the RDG testbed. The parallel solver showed a noticeable speedup on the Moscow and Amsterdam sites of Type I (homogeneous cluster with uniform communication links). Figures 2 and 3 demonstrate the total execution time and speedup of the parallel solver for different types of simulation: a chemistry-disabled light-weighted simulation (Fig. 2) and a chemistry-enabled heavy simulation (Fig. 3).
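For reference, the speedup plotted in such figures is conventionally the ratio of the single-processor execution time to the execution time on p processors; the sketch below assumes this standard definition (the text does not spell it out) and uses invented timings purely for illustration.

```python
def speedup(t_serial, t_parallel):
    """Conventional parallel speedup: T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nproc):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / nproc


# Invented execution times in seconds, keyed by processor count (not measured data).
timings = {1: 120.0, 2: 66.0, 4: 40.0, 8: 30.0}
for nproc, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, nproc)
    print(f"p={nproc}: speedup={s:.2f}, efficiency={e:.2f}")
```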

Fig. 2. Light-weighted (no chemistry) simulation: total execution time and speedup for different computational mesh sizes.

Fig. 3. Chemistry-enabled simulation: total execution time and speedup for different computational mesh sizes.

We observe different trends of the solver performance: for the light-weighted simulation, the speedup decreases with the increase of the mesh size (see the different curves in Fig. 2, right), while for the chemistry-enabled simulation, the speedup increases with the problem size increase (Fig. 3). The different absolute values of the speedup in Fig. 2 and 3 are mostly dependent on the resources: the results presented in Fig. 2 were obtained on the Moscow-1 site with slow interprocessor links, and Fig. 3 shows the results of the Amsterdam-2 site with fast communications. Different trends in the speedup dependency on the problem size are discussed and explained in detail in Sections 4.6 and 4.7.

The same parallel solver tested on homogeneous Grid sites with a higher ratio of the inter-process communication bandwidth to the processor performance achieved much higher speedups, for instance on lisa.sara.nl with Infiniband interconnections it was 3 times higher for the large problem size simulations. The type of MPI library also influences the parallel efficiency of a program: a specialized library optimized for the native communication technology (e.g. MPICH-GM for Myrinet communications on das2.nikhef.nl) increases the speedup up to two times compared to the generic MPICH-P4 or MPICH-G2.

4.5 Communication time trends

The time spent on inter-process communications within the solver is shown in Fig. 4 for different mesh sizes. The communication time was calculated as a sum of MPI Send/MPI Receive time on the master node over the total number of iterations.

Fig. 4. Dependency of the communication time on the computational mesh size for different number of processors (light-weighted simulation)

We observe that communication time grows super-linearly with the increase in mesh size, although the amount of data transferred is linearly proportional to the number of mesh cells. The exact understanding of this behaviour is not relevant and falls outside the scope of this paper. The fact that the time spent for sending the data is not equal to the time of receiving the data in Fig. 4 can be explained by the type of measurements: both plots represent the data sent or received by the master node only, as a synchronizing process. As one can see, the size of the received data (which represent the results of the calculations performed on the slave nodes) is significantly less than the data sent. Some peculiarities in the communication time can be seen in Fig. 4, which are even more explicit in Fig. 5: (1) the communication time grows non-monotonically with the number of processors, but drops down a little on every processor with an even number; and (2) the time of MPI Receive calls is an order of magnitude higher for the larger meshes on the first few processors. These observations are discussed in Section 4.7.

Fig. 5. Dependency of the communication time on the number of processors for different computational mesh sizes (light-weighted simulation)

4.6 Computation to communication ratio

In Figure 6 the total execution time is presented along with the contributions of calculation and communication. For a smaller computational mesh (Fig. 6 left), the communication time makes a relatively small contribution to the total execution time even for a large number of processors involved. For a larger mesh (Fig. 6 right), communication makes up to 30% of the execution time. This result confirms that the network bandwidth is not sufficient for this type of problem (see also the explanations to Fig. 3).

Fig. 6. Total execution time and contributions of the calculation and communication depending on the number of processors for different computational mesh sizes (light-weighted simulation)

As it was mentioned in the previous Section, the solver can simulate the chemical and plasma processes within the reactor along with the gas flow. Figure 7 demonstrates the ratio of computation to communication time for different mesh sizes with different types of the simulation. The higher the ratio is, the less communications are required, which obviously offers a better parallel efficiency and application scalability. The ratios in Fig. 7 explain the different speedup trends observed in Fig. 2 and 3 for chemistry-enabled and chemistry-disabled (light-weighted) simulations. From the presented graphs we can see that the behaviour of this ratio does not depend on the mesh size for the chemistry-enabled simulations, while this behaviour for the light-weighted simulations significantly differs for small and large mesh sizes. For a small mesh size, the ratio stays decently high, and for 6 processors and more it reaches the level of the chemistry-enabled simulations. For a larger mesh, the computation/communication ratio for the no-chemistry simulations is very low, thus diminishing the overall parallel efficiency.

Fig. 7. The ratio of the computation to communication time for chemistry-enabled and light-weighted simulations

4.7 Discussion of the results for homogeneous resources

The results presented in Section 4.4 show that the parallel speedup is lower for a larger problem size (with more computational mesh cells) for the simulations of problems without chemical processes (see Fig. 2). This fact indicates that the ratio of the inter-process communication bandwidth to the processor performance was not high enough for light-weighted problems with relatively small number of operations per computational cell. It means that for optimal usage of the computing power, a large number of processors for one parallel run shall only be used for relatively small computational meshes. Thus the communication technology puts a limit to the scalability of the solver for this problem type.
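The computation-to-communication ratio used in this discussion can be obtained from two measured quantities. The helper below is a minimal sketch that assumes the total execution time is simply the sum of computation and communication time (overlap and idle time are ignored); the function names are illustrative.

```python
def comp_comm_ratio(t_total, t_comm):
    """Computation/communication ratio, assuming t_total = t_comp + t_comm."""
    if t_comm <= 0 or t_comm >= t_total:
        raise ValueError("expected 0 < t_comm < t_total")
    return (t_total - t_comm) / t_comm


def ratio_from_fraction(comm_fraction):
    """Same ratio expressed through the communication share of the run time."""
    if not 0.0 < comm_fraction < 1.0:
        raise ValueError("comm_fraction must lie strictly between 0 and 1")
    return (1.0 - comm_fraction) / comm_fraction


# A communication share of 30% of the execution time gives a ratio of 7/3.
print(f"ratio at 30% communication: {ratio_from_fraction(0.30):.2f}")
```

Under this assumption, the 30% communication share reported for the larger mesh corresponds to a ratio of about 2.3, whereas a share of 10% would correspond to a ratio of 9.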
On the other hand, the simulation of the flow with chemical processes shows higher speedup with larger meshes

(see Fig. 3). Here the amount of computations brought by simulating the chemistry changes the behaviour of the solver qualitatively. This leads us to the conclusion that different resource allocation strategies should be applied for different types of simulation and meshes used.

The results in Fig. 5 reflect the network and node features of the tested Grid site:
1. Since the site consists of dual nodes, the network channels work more efficiently for data transfers between the Master and a Slave processor if a connection was already established with another Slave processor on the same node. This can be explained by implementation of the MPI library which saves network resources while opening and maintaining connections for concurrent processes on the same node.
2. The peaks of the MPI Receive time for the first few processors (see Fig. 5 right) are caused by the constraints on the portions of data that could be accommodated at once. The constraining factors could be the network bandwidth distribution, the processor cache size, the memory available on the node, or a combination of these factors.

5 Application performance on heterogeneous resources

5.1 Performance of the parallel solver on heterogeneous resources

The RDG Grid sites with heterogeneous processors and/or network links (Types II, III, IV) provided only a limited parallel speedup or even a slow-down of the original solver with a homogeneous parallel algorithm (data not shown). This was inevitable since, in addition to the low-bandwidth links, these sites are characterized by very diverse resources: the processor and network parameters differ by orders of magnitude for different nodes. The solver parallel algorithm was originally developed for homogeneous computer clusters with equal processor power, memory and inter-processor communication bandwidth. In case of submitting equal portions of a parallel job to the nodes with different performance, all the fast processors have to wait at the barrier synchronization point till the slowest ones catch up, thus the effect of slow-down on heterogeneous resources is not surprising.
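The barrier effect described above can be illustrated with a toy model in which the time of one synchronized step is set by the slowest processor. This is a simplified sketch, not the solver's algorithm: communication is ignored, clock frequency is used as a crude proxy for processor performance, and the workload of 10000 units is arbitrary.

```python
def step_time(work_shares, speeds):
    """Time of one barrier-synchronized step: the slowest processor sets the pace."""
    return max(work / speed for work, speed in zip(work_shares, speeds))


def equal_shares(total_work, nproc):
    """Homogeneous distribution: every processor gets the same portion."""
    return [total_work / nproc] * nproc


def proportional_shares(total_work, speeds):
    """Distribution proportional to processor speed only."""
    total_speed = sum(speeds)
    return [total_work * speed / total_speed for speed in speeds]


# Illustrative relative speeds: two fast and two slow processors (MHz as a proxy).
speeds = [1800.0, 1800.0, 450.0, 450.0]
work = 10000.0  # arbitrary work units

t_equal = step_time(equal_shares(work, len(speeds)), speeds)
t_prop = step_time(proportional_shares(work, speeds), speeds)
print(f"equal: {t_equal:.2f}, proportional: {t_prop:.2f}, gain: {t_equal / t_prop:.2f}x")
```

In this idealized setting the proportional split removes the waiting time at the barrier completely; with real communication costs the gain depends on the link bandwidths as well.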
The same problem occurs if the network connection from the Master processor to some of the Slave processors is much slower than to the others. As we have shown in the previous section, for communication-bound simulations (chemistry-disabled simulation with large computational meshes), the communication time on low-bandwidth networks is of the order of the calculation time, therefore the heterogeneity of the inter-processor communication links is a hindrance as considerable as the diversity of the processor power. One of the natural ways to adapt the solver to the heterogeneous Grid resources is to distribute the portions of job among the processors according to the processor performance and network connections, taking into account the application characteristics. To adapt the parallel solver, we applied the approach presented in Section 2.

5.2 Experimental results of the workload balancing algorithm

To illustrate the approach described in Section 2 we present the results obtained for different types of simulation (chemistry-disabled and enabled) of a reactor geometry with 10678 cells on the St. Petersburg Grid site. This site is heterogeneous in both the CPU power and the network connections of the processors (Type IV). There are two 1.8 GHz nodes (nwo1.csa.ru, nwo2.csa.ru) and two dual 450 MHz nodes (row2.csa.ru, row3.csa.ru), all having 512 MB RAM. One of the dual nodes (row3.csa.ru) is placed in a separate network segment with 10 times lower bandwidth (10 Mbit/s against 100 Mbit/s in the main segment).

The load balancing tests were performed with a moderate-size problem which does not pose restrictions on required memory, thus the memory influence parameter m was reduced to zero and the exploration was done for the application parameter f. The link bandwidth between the Master and Slave processors was estimated by measuring the time of MPI_Send transfers of a predefined data block (with the MPI buffer size equal to 10^6 MPI_DOUBLEs) during the solver execution, after the resources have been allocated. In these measurements

the same logical network topology was used as employed in the solver. The CPU power and available memory were obtained by a function from the PerfSuite library [11].

To validate the approach presented in Section 2 we applied the workload balancing technique for a single simulation running on different sets of heterogeneous resources. The estimation of performance for different possible values of the parameter f (hence different weighting and workload distribution) was carried out. For one simulation type we expect to obtain approximately the same value of the parameter f (that provides the best performance, see Section 2) on different sets of resources. Figure 8 (left) illustrates the load-balancing speedup Θ achieved by applying the workload balancing technique for different values of the parameter f on several fixed sets of heterogeneous resources for a light-weighted (chemistry-disabled) simulation. In Table 1 we summarize the combinations of processors dynamically allocated in 4 tests (different sets of resources) and the weights assigned to each processor for the values of f providing the best execution time, thus the maximal balancing speedup (see Fig. 8 left).

Processors (CPU clock / link bandwidth):
  nwo1    1.8 GHz / 100 Mb/s
  row2/1  450 MHz / 100 Mb/s
  row3/1  450 MHz / 10 Mb/s
  row2/2  450 MHz / 100 Mb/s
  nwo2    1.8 GHz / 100 Mb/s
  row3/2  450 MHz / 10 Mb/s

                          Weights assigned to each processor              Heterogeneity metrics   Balancing
Sets of resources         nwo1   row2/1  row3/1  row2/2  nwo2   row3/2    proc    net             speedup Θ
set I   (3 processors)    0.580  0.274   0.146   -       -      -         0.618   0.606           196 %
set II  (4 processors)    0.452  0.218   0.112   0.218   -      -         0.638   0.502           182 %
set III (5 processors)    0.314  0.146   0.080   0.146   0.314  -         0.591   0.439           201 %
set IV  (6 processors)    0.278  0.160   0.062   0.160   0.278  0.062     0.618   0.606           207 %

Table 1. Distribution of processors and balancing weights providing the best load balancing speedup for different sets of resources.

Figure 8 (left) shows that for a given simulation the best performance is delivered by weighting the resources with the value of f in the range 0.3-0.4. Notably, this corresponds to the value obtained for this simulation during the preliminary analysis on homogeneous resources (compare to results for similar simulations in Section 4.6, Fig. 7).
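To make the weights of Table 1 concrete, the sketch below converts the set IV weights into integer numbers of mesh cells for the 10678-cell geometry. It assumes that a weight translates directly into a share of mesh cells and uses a largest-remainder rounding rule; both are illustrative choices, not a description of how the solver partitions its mesh.

```python
def distribute_cells(total_cells, weights):
    """Split total_cells into integer shares proportional to the given weights."""
    total_weight = sum(weights)
    exact = [total_cells * w / total_weight for w in weights]
    shares = [int(x) for x in exact]  # round every share down first
    leftover = total_cells - sum(shares)
    # Hand the remaining cells to the processors with the largest fractional parts.
    order = sorted(range(len(weights)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Weights of set IV from Table 1, in the column order of the table.
processors = ["nwo1", "row2/1", "row3/1", "row2/2", "nwo2", "row3/2"]
weights_set_iv = [0.278, 0.160, 0.062, 0.160, 0.278, 0.062]

shares = distribute_cells(10678, weights_set_iv)
for name, cells in zip(processors, shares):
    print(f"{name}: {cells} cells")
```

The fast, well-connected processors receive the largest portions, while the slow processors behind the 10 Mb/s link receive the smallest ones.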
The results show that the algorithm gives an increase of the balancing speedup Θ up to 207 percent compared to the initial non-balanced version of the code (with homogeneous workload distribution) on the tested resource sets. We can see that the distribution of the workload proportional only to the processor performance (f = 0) also gives a significant increase of the performance, but introduction of the dependency on the application specific communication/computation ratio and resource infrastructure parameters µ adds another 40 percent to the balancing speedup Θ.

Fig. 8. Dependency of the balancing speedup Θ on the resource parameter f. Left: single simulation on different sets of resources. Right: different types of simulation on the same set of resources.
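The search for the best value of f can be pictured as a simple scan over candidate values. The sketch below is an assumption-laden illustration: Θ is taken to be the ratio of the non-balanced execution time to the balanced one, and the timing function is a synthetic stand-in, not measured data or the weighting formula of Section 2.

```python
def best_parameter(f_values, run_time, t_unbalanced):
    """Return (f, theta) maximizing theta = t_unbalanced / run_time(f)."""
    best_f, best_theta = None, float("-inf")
    for f in f_values:
        theta = t_unbalanced / run_time(f)
        if theta > best_theta:
            best_f, best_theta = f, theta
    return best_f, best_theta


def synthetic_run_time(f):
    """Made-up execution time (seconds) with a minimum at f = 0.35."""
    return 50.0 + 400.0 * (f - 0.35) ** 2


f_grid = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0
f_opt, theta = best_parameter(f_grid, synthetic_run_time, t_unbalanced=100.0)
print(f"best f = {f_opt:.2f}, balancing speedup = {theta * 100:.0f} %")
```

In practice `run_time` would wrap an actual balanced run or a performance estimate, and a coarser grid could be used to limit the number of trial executions.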

Figure 8 (right) shows the dependency of the balancing speedup Θ for different types of simulation (chemistry enabled or disabled) on the same set of resources (set III from Table 1). The chemistry-disabled simulation has a higher communication/computation ratio (as was shown also in Section 4.6, Fig. 7). This is clearly seen in the experimental results where the chemistry-disabled simulation obtains the highest balancing speedup Θ at higher f values. Moreover, the gain in the balancing speedup (maximal value of Θ) is higher for the simulation with a larger fraction of communications. These results illustrate that the introduced algorithm for resource adaptive workload balancing can bring a valuable increase in the performance for communication-intensive parallel programs running on heterogeneous resources.

5.3 Discussion and suggestions for generalized automated load balancing

The introduction of the load balancing technique allowed us to increase the efficiency of the parallel solver on heterogeneous resources. The proposed method of successive estimation of resource infrastructure parameters µ and further determination of the application specific f shows the possibility of automatic load balancing for applications whose internal structure (computations and communications) is not known. Analysis of the results achieved with the workload balancing algorithm suggested that the following issues shall be addressed in order to optimize the balancing technique:

1. To measure the inter-process communication rate, we sent a fixed amount of data from the Master to each Slave processor. However in some cases the response of the communication channels to the increasing amount of data is not linearly proportional, as shown in Fig. 4. For the slower networks this tendency is even more pronounced. This brings us to a conclusion that the amount of data sent to measure the links performance shall be close to the amount really transferred within the solver for every particular mesh size, geometry and solver type. Another option to estimate the inter-processor communication rate is to analyze the iteration data transfer time during the actual execution. However, this requires significant code modifications and might be undesirable.

2. To properly take into account the memory requirements of each particular instance of a parallel solver, similar reasoning shall be applied as for selecting the other weighting parameters: the choice of the memory influence coefficient m, which sets the significance (priority) of the memory factor influence on the application performance, must depend on the type of resources assigned, analogous to the resource infrastructure parameters µ.

3. The specialty of the memory factor is that in addition to this resource-dependency it is strongly influenced by the application features. To take into account the memory requirements of a parallel solver, the weighting algorithm must be enriched by the function measuring the memory requirements per processor for each simulation on each set of resources. In case of sufficient memory on allocated processors, the load balancing can be performed taking into account all the factors (CPU, memory and network) where the memory factor is a constraint. After this, another check of meeting the memory requirements on each processor must be performed. In the unfavourable case of insufficient memory on some of the processors, they must be disregarded from the parallel computation or replaced by other, better suited processors. This must be done preferably outside the application, on the level of parallel job scheduling and resource allocation. This brings us to the conclusion that ideally a combined technique shall be developed, where the application-centred load balancing approach is coupled with a system-level resource management.
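The memory check suggested in the last item can be sketched as follows. The per-cell memory footprint, the function names and the example numbers are hypothetical; the sketch only shows the control flow of checking the balanced shares against the available memory and renormalizing the weights of the remaining processors.

```python
def processors_short_of_memory(shares, bytes_per_cell, available_memory):
    """Indices of processors whose assigned cells do not fit into their memory."""
    return [
        i
        for i, (cells, memory) in enumerate(zip(shares, available_memory))
        if cells * bytes_per_cell > memory
    ]


def renormalize_without(weights, excluded):
    """Drop the excluded processors and rescale the remaining weights to sum to 1."""
    kept = {i: w for i, w in enumerate(weights) if i not in excluded}
    total = sum(kept.values())
    return {i: w / total for i, w in kept.items()}


# Hypothetical example: three processors, 1000 bytes per mesh cell.
weights = [0.6, 0.3, 0.1]
shares = [6000, 3000, 1000]
available_memory = [8_000_000, 2_000_000, 4_000_000]  # bytes

excluded = processors_short_of_memory(shares, 1000, available_memory)
new_weights = renormalize_without(weights, set(excluded))
print("excluded:", excluded, "new weights:", new_weights)
```

After the renormalization the workload would be redistributed and the check repeated, since the remaining processors now receive larger portions; as noted above, replacing unsuitable processors is preferably left to the job scheduling level.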

6 Conclusions

In this paper we address the issue of porting distributed problem solving environments to the Grid, using Virtual Reactor as an example of a complex application. One of the most challenging problems we encountered was porting parallel modules from homogeneous cluster environments to heterogeneous resources of the Grid, specifically the issue of keeping up a high parallel efficiency of the computational components. This problem arises for a wide class of parallel programs that employ homogeneous load distribution algorithms. To adapt these applications to heterogeneous Grid resources, we developed a theoretical approach and a generic workload balancing technique that takes into account specific parameters of the Grid resources dynamically assigned to a parallel job, as well as the application requirements. We validated the proposed algorithm by applying this technique to the Virtual Reactor parallel solvers running on the Russian-Dutch Grid testbed.

It is worth noting that the load balancing speedup goes through a maximum at an optimal value of the parameter f, as shown in Fig. 8. This indicates that the load balancing strategy does find an optimum in the complex parameter space of the heterogeneous application/architecture combination. The clear maximum gives an unbiased guide towards automatic load balancing. The developed approach is well suited for either static or dynamic load balancing, and can be combined with the Grid-level performance prediction models or application-level scheduling systems [18,19]. To further explore this new load balancing approach, we are currently working on the comparison of the theoretically derived optimization parameters for some specific topologies of parallel applications with those predicted by our heuristic algorithm.

In order to optimize the resource management strategy of mapping the distributed components of the application problem solving environment, we benchmarked the individual components of the Virtual Reactor on a set of diverse Russian-Dutch Grid resources, and extensively studied the behaviour of the parallel solvers with various problem types and input data on different resource infrastructures.
The results clearly show that even within one solver different trends can exist in the application requirements and parallel efficiency depending on the problem type and computational parameters, therefore distinct resource management and optimization strategies shall be applied, and automated procedures for load balancing are needed to successfully solve complex simulation problems on the Grid.

Acknowledgments. The authors would like to thank Irina Shoshmina, Alfredo Tirado-Ramos and the RDG Grid deployment team for their assistance. The research was conducted with financial support from the Dutch National Science Foundation NWO and the Russian Foundation for Basic Research under grants number 047.016.007 and 047.016.018, and with partial support from the Virtual Laboratory for e-Science Bsik project [8].

References

1. www.cfdrc.com, www.fluent.com, www.semitech.us, www.softimpact.ru
2. V.V. Krzhizhanovskaya, P.M.A. Sloot, and Yu.E. Gorbachev. Grid-based Simulation of Industrial Thin-Film Production. Simulation: Transactions of the Society for Modeling and Simulation International, V. 81, No. 1, pp. 77-85 (2005)
3. V.V. Krzhizhanovskaya, M.A. Zatevakhin, A.A. Ignatiev, Y.E. Gorbachev, W.J. Goedheer and P.M.A. Sloot. A 3D Virtual Reactor for Simulation of Silicon-Based Film Production. Proceedings of the ASME/JSME PVP Conference. ASME PVP-Vol. 491-2, pp. 59-68, PVP2004-3120 (2004)
4. proj-openlab-datagrid-public.web.cern.ch, www.nbirn.net, www.fusiongrid.org, www.globus.org/alliance/projects.php, cmcs.ca.sandia.gov, www.us-vo.org
5. The CrossGrid EU Science project: http://www.eu-crossgrid.org
6. High Performance Simulation on the Grid project: http://grid.csa.ru
7. Nimrod-G: http://www.csse.monash.edu.au/~davida/nimrod/
8. The Virtual Laboratory for e-Science project: http://www.vl-e.nl
9. David W. Walker, Maozhen Li, Omer Rana, Matthew S. Shields, and Y. Huang. The software architecture of a distributed problem-solving environment. Concurrency - Practice and Experience, 12(15):1455-1480, 2000.
10. V.V. Krzhizhanovskaya et al. Distributed Simulation of Silicon-Based Film Growth. Proceedings of the 4th PPAM conference, LNCS, V. 2328, pp. 879-888. Springer-Verlag 2002

11. R. Kufrin. PerfSuite: An Accessible, Open Source Performance Analysis Environment for Linux. 6th International Conference on Linux Clusters. Chapel Hill, NC. (2005)
12. J.D. Teresco et al. Resource-Aware Scientific Computation on a Heterogeneous Cluster. Computing in Science & Engineering, V. 7, N 2, pp. 40-50, 2005
13. R. David et al. Source Code Transformations Strategies to Load-Balance Grid Applications. LNCS vol. 2536, pp. 82-87, Springer-Verlag, 2002
14. A. Barak, G. Shai, and R. Wheeler. The MOSIX Distributed Operating System, Load Balancing for UNIX, LNCS, vol. 672, Springer-Verlag, 1993
15. K.A. Iskra; F. van der Linden; Z.W. Hendrikse; B.J. Overeinder; G.D. van Albada and P.M.A. Sloot: The implementation of Dynamite - an environment for migrating PVM tasks, Operating Systems Review, vol. 34, nr 3, pp. 40-55. Association for Computing Machinery, Special Interest Group on Operating Systems, July 2000.
16. G. Shao, R. Wolski and F. Berman. Master/Slave Computing on the Grid. Proceedings of Heterogeneous Computing Workshop, 3-16, IEEE Computer Society (2000)
17. S. Sinha, M. Parashar. Adaptive Runtime Partitioning of AMR Applications on Heterogeneous Clusters. In Proceedings of 3rd IEEE Intl. Conference on Cluster Computing, 435-442, 2001
18. F. Berman, R. Wolski, H. Casanova, W. Cirne, H. Dail, M. Faerman, S. Figueira, J. Hayes, G. Obertelli, J. Schopf, G. Shao, S. Smallen, N. Spring, A. Su, D. Zagorodnov. Adaptive Computing on the Grid Using AppLeS. IEEE Trans. on Parallel and Distributed Systems, vol. 14, no. 4 (2003) 369-382
19. X.-H. Sun, M. Wu. Grid Harvest Service - A System for Long-Term, Application-Level Task Scheduling. Proc. of 2003 IEEE International Parallel and Distributed Processing Symposium (IPDPS 2003) (2003)
20. G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker. Solving Problems on Concurrent Processors, volume 1, Prentice-Hall, 1988.
21. J.F. de Ronde; A. Schoneveld and P.M.A. Sloot: Load Balancing by Redundant Decomposition and Mapping, Future Generation Computer Systems, vol. 12, nr 5, pp. 391-407, April 1997
22. J.F. de Ronde. Mapping in High Performance Computing. A case study on Finite Element Simulation, PhD thesis, University of Amsterdam, 1998
23. Chin Lu, Sau-Ming Lau. An Adaptive Load Balancing Algorithm for Heterogeneous Distributed Systems with Multiple Task Classes, International Conference on Distributed Computing Systems (ICDCS'96)
24. Zhiling Lan, Valerie E. Taylor, Greg Bryan. Dynamic Load Balancing of SAMR Applications on Distributed Systems, Proceedings of the 2001 ACM/IEEE conference on Supercomputing
25. Yongbing Zhang, K. Hakozaki, H. Kameda, K. Shimizu. A performance comparison of adaptive and static load balancing in heterogeneous distributed systems, the 28th Annual Simulation Symposium, p. 332, 1995.