Load Balancing in Charm++ Eric Bohm


1 Load Balancing in Charm++ and AMPI Eric Bohm

2 How to Diagnose Load Imbalance?
Often hidden in statements such as:
o Very high synchronization overhead: most processors are waiting at a reduction
Count the total amount of computation (ops/flops) per processor
o In each phase!
o Because the balance may change from phase to phase
August 5th, 2009 Charm++ and AMPI: Session II 2

3 Golden Rule of Load Balancing
Fallacy: the objective of load balancing is to minimize variance in load across processors
Example: 50,000 tasks of equal size, 500 processors:
o A: all processors get 99 tasks, except the last 5, which get 199 each
o B: all processors get 101 tasks, except the last 5, which get 1 each
Identical variance, but situation A is much worse!
Golden Rule: it is OK if a few processors idle, but avoid having processors that are overloaded with work
Finish time = max_i {time on processor i}, excepting data dependence and communication overhead issues

4 Amdahl's Law and Grainsize
Before we get to load balancing:
Original law:
o If a program has a K% sequential section, then speedup is limited to 100/K, even if the rest of the program is parallelized completely
Grainsize corollary:
o If any individual piece of work takes more than K time units, and the sequential program takes T_seq, speedup is limited to T_seq / K
So:
o Examine performance data via histograms to find the sizes of remappable work units
o If some are too big, change the decomposition method to make smaller units

5 Grainsize
(Working) definition: the amount of computation per potentially parallel event (task creation, enqueue/dequeue, messaging, locking, ...)
[Figure: execution time vs. grainsize, for 1 processor and for p processors]

6 Rules of Thumb for Grainsize
Make it as small as possible, as long as it amortizes the overhead
More specifically, with per-event overhead v, ensure:
o Average grainsize is greater than k*v (say 10v)
o No single grain should be allowed to be too large
  Must be smaller than T/p, but actually we can express it as: must be smaller than k_m*v (say 100v)
Important corollary:
o You can be close to optimal grainsize without having to think about P, the number of processors

7 Molecular Dynamics in NAMD
Collection of [charged] atoms, with bonds
o Newtonian mechanics
o Thousands of atoms
o 1 femtosecond time step, millions needed!
At each time step:
o Calculate forces on each atom
  Bonded forces
  Non-bonded: electrostatic and van der Waals
    Short distance: every timestep
    Long distance: every 4 timesteps using PME (3D FFT)
    Multiple time stepping
o Calculate velocities and advance positions
Collaboration with K. Schulten, R. Skeel, and coworkers

8 Hybrid Decomposition
Object-based parallelization for MD: force decomposition + spatial decomposition
We have many objects to load balance:
o Each diamond can be assigned to any processor
o Number of diamonds (3D): 14 x number of patches

9 Grainsize Analysis via Histograms
Problem: grainsize distribution
[Histogram: number of objects vs. grainsize in milliseconds]
Solution: split compute objects that may have too much work, using a heuristic based on the number of interacting atoms

10 Fine-Grained Decomposition on BlueGene
Decomposing atoms into smaller bricks gives finer grained parallelism
[Figure: force evaluation and integration phases]

11 Load Balancing Strategies
Classified by when it is done:
o Initially
o Dynamic: periodically
o Dynamic: continuously
Classified by whether decisions are taken with global information:
o Fully centralized
  Quite a good choice when the load balancing period is high
o Fully distributed
  Each processor knows only about a constant number of neighbors
  Extreme case: totally local decision (send work to a random destination processor, with some probability)
o Use aggregated global information, and detailed neighborhood info

12 Dynamic Load Balancing Scenarios
Examples representing typical classes of situations:
o Particles distributed over simulation space
  Dynamic: because particles move. Cases:
    Highly non-uniform distribution (cosmology)
    Relatively uniform distribution
o Structured grids, with dynamic refinement/coarsening
o Unstructured grids, with dynamic refinement/coarsening

13 Measurement-Based Load Balancing
Principle of persistence:
o Object communication patterns and computational loads tend to persist over time, in spite of dynamic behavior
  Abrupt but infrequent changes
  Slow and small changes
Runtime instrumentation:
o Measures communication volume and computation time
Measurement-based load balancers:
o Use the instrumented database periodically to make new decisions
o Many alternative strategies can use the database

14 Load Balancing Steps
[Timeline: regular timesteps, then instrumented timesteps, then detailed/aggressive load balancing, followed by refinement load balancing]

15 Charm++ Strategies
Centralized: GreedyLB, GreedyCommLB, RecBisectBfLB, MetisLB, TopoCentLB, RefineLB, RefineCommLB, OrbLB
Distributed: NeighborLB, NeighborCommLB, WSLB
HybridLB:
o Combine strategies hierarchically

16 Load Balancer in Action
Automatic load balancing in crack propagation
[Figure: number of iterations per second vs. iteration number. Annotations: 1. elements added, 2. load balancer invoked, 3. chunks migrated]

17 Distributed Load Balancing
Centralized strategies:
o Still OK for 3000 processors for NAMD
Distributed balancing is needed when:
o The number of processors is large, and/or
o Load variation is rapid
Large machines:
o Need to handle locality of communication
  Topology-sensitive placement
o Need to work with scant global information
  Approximate or aggregated global information (average/max load)
  Incomplete global info (only neighborhood)
  Work diffusion strategies (1980s work by Kale and others!)
o Achieving global effects by local action

18 Load Balancing on Large Machines
Existing load balancing strategies don't scale on extremely large machines
Limitations of centralized strategies:
o Central node: memory/communication bottleneck
o Decision-making algorithms tend to be very slow
Limitations of distributed strategies:
o Difficult to achieve well-informed load balancing decisions

19 Simulation Study: Memory Overhead
Simulation performed with the performance simulator BigSim
[Chart: memory usage (MB) vs. number of objects (256K, 512K, 1M) for 32K and 64K processors]
The lb_test benchmark is a parameterized program that creates a specified number of communicating objects in a 2D mesh.

20 Hierarchical Load Balancers
Hierarchical distributed load balancers:
o Divide into processor groups
o Apply different strategies at each level
o Scalable to a large number of processors

21 Our HybridLB Scheme
[Diagram: tree of processor groups. Load data (OCG) flows up the tree; greedy-based load balancing within groups, refinement-based load balancing at the top level; tokens and objects migrate between groups]

22 Hybrid Load Balancing Performance
Simulation of lb_test for 64K processors
[Charts: load balance time (s) and maximum predicted load (s) vs. number of objects (256K, 512K, 1M), comparing GreedyCommLB and HybridLB(GreedyCommLB); memory usage: 6.8 MB, 22.57 MB, 22.63 MB]
lb_test benchmark's actual run on BG/L at IBM (512K objects)

23 Load Balancing: Hands on

24 Simple Imbalance
LB_Test.C: a 1D array of chares, half of which have 2x computation load
Strong scaling
o make will produce LB_Test
o Run LB_Test with these parameters:
  Arguments: chares per core, iterations, workload multiplier, array size
  Use at least 7 processors (precede those arguments with +p7)

25 Output Without Balancing
  Charm++> cpu topology info is being gathered.
  Charm++> 1 unique compute nodes detected.
  Running on 7 processors with 40 chares per pe
  All array elements ready at [...] seconds. Computation Begins
  [0] Element 0 took [...] seconds for work 1 at iteration 0 sumc 4.664e+13
  [1] Element 40 took [...] seconds for work 2 at iteration 0 sumc 8.748e+14
  [0] Element 0 took [...] seconds for work 1 at iteration 99 sumc 4.664e+13
  [1] Element 40 took [...] seconds for work 2 at iteration 99 sumc 8.748e+14
  Total work performed = [...] seconds
  Average total chare work per iteration = [...] seconds
  Average iteration time = [...] seconds
  Done after [...] seconds

26 Analyze Performance
Productivity => not wasting your time
o Measure twice, cut once
make projections
o Produces LB_Test_prj
o Change your job script to run LB_Test_prj
o mkdir nobalancetrace
o Add the arguments +logsize [...] +traceroot $PWD/nobalancetrace
o Execution will create trace files in nobalancetrace

27 Download and Visualize
Download the contents of nobalancetrace
o Or extract the sample from nobalancetrace.tar: tar xf nobalancetrace.tar
Run projections
o Load LB_Test_prj.sts
Open a time profile on several steps
o 4s to 8s for the sample

28 Time Profile no Balance

29 Fix Migration
Fix the pup routine for the LB_Test chare
o PUP each member variable:
  p | varname;
o Do memory allocation when unpacking:
  if (p.isUnpacking()) { /* allocate dynamic members */ }
o PUP dynamically created arrays:
  PUParray(p, varname, numelements);
o Remove the CkAbort

30 Add Load Balancer Support
Add a call to AtSync in LB_Test::next_iter:
  if ((iteration == balance_iteration) && usesAtSync) {
    AtSync();
  } else {
    compute();
  }
Add ResumeFromSync:
  void ResumeFromSync(void) {
    // Called by the load balancing framework
    compute();
  }
The answer is in LB_Test_final.C

31 Use GreedyLB
Change the job script to run LB_Test_LB
o Add the argument +balancer GreedyLB
o Run on the same number of processors with the same arguments

32 Output with Balancing
  Charm++> cpu topology info is being gathered.
  Charm++> 1 unique compute nodes detected.
  [0] GreedyLB created
  Running on 7 processors with 40 chares per pe
  All array elements ready at [...] seconds. Computation Begins
  [0] Element 0 took [...] seconds for work 1 at iteration 0 sumc 4.664e+13
  [1] Element 40 took [...] seconds for work 2 at iteration 0 sumc 8.748e+14
  [6] Element 0 took [...] seconds for work 1 at iteration 99 sumc 4.664e+13
  [6] Element 40 took [...] seconds for work 2 at iteration 99 sumc 8.748e+14
  Total work performed = [...] seconds
  Average total chare work per iteration = [...] seconds
  Average iteration time = [...] seconds
  Done after [...] seconds

33 Compare
Consider average iteration time
Consider total CPU time:
o Walltime * number of processors
o The more processors you use, the more important it is to reduce iteration time through efficiency
o Look for overloaded processors
  Underloading is just a symptom
  Overload implies a bottleneck

34 Usage Profile
Use Usage Profile from the Tools menu
Examine the area before load balancing
o Note: intervals are in 100ms
o 3000ms to 4000ms works for the sample

35 Analyze Performance Again
Productivity => not wasting your time
o Measure twice, cut once
make projections
o Produces LB_Test_LB_prj
o Change your job script to run LB_Test_LB_prj
o mkdir balancetrace
o Add the arguments +logsize [...] +traceroot $PWD/balancetrace
o Execution will create trace files in balancetrace

36 Usage Profile Before Balance

37 Timeline Across Balancer
Open a timeline spanning load balancing
o 4s to 8s works for the sample
o Try a large time span on a few cores, then zoom in

38 Summary
Look for load imbalance
Migratable objects are not hard to use
Charm++ has significant infrastructure to help
On your own, try this benchmark at varying processor numbers:
o See the impact on scaling with different array sizes
o See the impact on total runtime when the number of iterations grows large
o Try other load balancers
1p.html#lbFramework

39 Sanjay Kale & Eric Bohm
INTERMEDIATE CHARM++

40 Outline
Messages
Groups, nodegroups
Startup process
Fault tolerance
Advanced:
o Communication optimization
o Advanced arrays
o Conditional packing
o Make your own LB strategy
o Interact with CCS and Python
o Higher level languages

41 Parameter Marshalling
The application passes parameters by value:
o myproxy.myentry(... arguments ...);
o PUP::able types as arguments
The receiver cannot maintain a pointer to the input
The system allocates a message containing the parameters to send (CkMarshallMsg)
  entry void receive(int v);
  entry void startstep();
  entry void eastghost(int n, double vals[n]);
[Diagram: marshalled message layout: n, vals_off, vals_cnt, vals]

42 Messages
  message InfoMsg;

  class InfoMsg : public CMessage_InfoMsg {
    int iter;
    ... other data, methods ...
  };
Necessary in some situations
o E.g. to specify the order of operations (priority)
Possible optimizations:
o Avoid memcopy and memory allocation
o Reuse the same message multiple times
  E.g. yield the processor using a message:
  void MyArray::compute(InfoMsg *msg) {
    ... do some work ...
    if (workdone) delete msg;
    else thisproxy[thisindex].compute(msg);
  }

43 Variable Messages (Jacobi)
  message Ghost {
    double vals[];
  };

  class Ghost : public CMessage_Ghost {
    int len;
    double *vals;
  };

  Jacobi::startStep() {
    Ghost *msg = new (localRows) Ghost(localRows);
    for (int i = 1; i < localRows; i++)
      msg->vals[i-1] = values[i][localCols+1];
    thisProxy(thisIndex.x + 1, thisIndex.y).westGhost(msg);
    ...
  }

  Jacobi::northGhost(Ghost *msg) {
    north = msg;
    ghostReceived++;
    A[0] = msg->vals - 1;  // so A[0][1] .. A[0][localCols] alias vals
    attemptCompute();
  }

  Jacobi::attemptCompute() {
    ...
    delete north;
  }

44 Message Priorities
The application assigns priorities to some messages
The Charm scheduler respects priorities while draining message queues
o Separate message queues for zero, negative and positive priorities
It is an optimization
Beware of starvation!
o A message might never get scheduled
Charm++ does not guarantee the delivery order, only a best effort

45 Message Priorities (Cont.)
Different queuing strategies:
o CK_QUEUEING_FIFO, CK_QUEUEING_LIFO
o CK_QUEUEING_IFIFO, CK_QUEUEING_ILIFO
To specify the priority:
  int prio = ...
  MsgType *msg = new (8*sizeof(int)) MsgType;
  *CkPriorityPtr(msg) = prio;
  CkSetQueueing(msg, CK_QUEUEING_IFIFO);
Priority scale: negative = high, 0 = default, positive = low

46 Groups
Collection of chares in which exactly one chare is present on each processor
o Indexable with the processor rank
It is an optimization
o Useful for libraries, when each processor needs a local branch to service local chares
o Ex. a software cache manager: all chares on a processor share the same read-only data, avoiding extra communication
  mainchare Main { ... };
  array [1D] MyArray { ... };
  group MyGroup {
    entry MyGroup(int n);
    entry MyGroup();
  };

  CProxy_MyGroup group1, group2;

  Main::Main(CkArgMsg *m) : CBase_Main(m) {
    arrayproxy = CProxy_MyArray::ckNew(nElem);
    group1 = CProxy_MyGroup::ckNew();
    group2 = CProxy_MyGroup::ckNew(100);
  }

47 Groups (Cont.)
Should not be used to perform computation in place of chare arrays!
o Groups are not load balanced
Nodegroups:
o Like groups, but with one chare per node
o Differ from groups only if Charm++ is compiled for SMP
o Can execute on any processor within the node, even concurrently
  Keyword exclusive to prevent data races

48 Startup Process
initnode and initproc routines are executed
o Run once per node or per processor, respectively
o Declared in the .ci file
All mainchare constructors are executed
o Create chare arrays/groups
o Constructor methods are executed immediately on proc 0
o They can set readonly variables
Readonly variables are synchronized
Every other entry method is executed
o This includes constructor methods on processors other than 0

49 Fault Tolerance
Checkpointing:
o Simply PUP all Charm++ entities to disk
o Trigger with CkStartCheckpoint("dir", cb)
o The callback cb is called upon checkpoint completion
  Called both after checkpoint and upon restart
o To restart: +restart <logdir>
Live recovery methods (experimental):
o Double in-memory checkpoint
o Message logging
  Only the faulty processor rolls back

50 Fault Tolerance: Example
  readonly CProxy_Main mainproxy;
  mainchare [migratable] Main { ... };
  group [migratable] MyGroup { ... };

  Main::Main(CkMigrateMessage *m) : CBase_Main(m) {
    // Subtle: a chare proxy readonly needs to be updated
    // manually because of the object pointer inside it!
    mainproxy = thisproxy;
  }

  void Main::pup(PUP::er &p) { ... }

  void Main::next(CkReductionMsg *m) {
    if ((++step % 10) == 0) {
      CkCallback cb(CkIndex_Hello::sayHi(), helloproxy);
      CkStartCheckpoint("log", cb);
    } else {
      helloproxy.sayHi();
    }
    delete m;
  }

51 Sanjay Kale & Eric Bohm
ADVANCED TUTORIAL

52 Communication Optimization
Optimize the most recurrent communication patterns:
o Streaming: reduce the overhead of many small messages
o Multicast
o All-to-all
Each must be used with its own API
o Each may have multiple alternative implementations, embodying different strategies
o The programmer can choose the best strategy for their scenario

53 Advanced Arrays
Sections: create proxies representing slices of a chare array
o Optimize communication with Comlib or CkMulticast
o Ex: a row/column of a 2D chare array
Mapping: manually specify the map of chares to PEs
o Ex: place communicating objects on the same processor
Bound arrays: tie two chare arrays together
o The system places and migrates corresponding indices together
o Ex: an FFT helper library bound to a work array

54 Conditional Packing for SMP
Pass a pointer if the destination is on the same node
Copy the data into the message if the destination is remote
  message Slice {
    conditional Boomarray<double> data;
  };

  chare Integrate {
    entry Integrate(Slice *m);
    entry Integrate(Boomarray<double> d conditional);
  };

  class Slice : public CMessage_Slice {
    double *data;
  };

  void Integrate::Integrate(Slice *msg) {
    Boomarray<double> &b = *msg->data;
    ... do work using b ...
    // Send back the modified data
    mainproxy.results(msg);
  }

55 Make Your Own LB Strategy
You can override automatic measurements with application-supplied performance estimates:
o Reimplement UserSetLBLoad() in your chare
o Use setObjTime(time) and getObjTime()
Or, you can implement a new strategy:
o FooLB::work(CentralLB::LDStats* stats, int count)
o Use the gathered data to decide a new assignment of objects to processors
o The system will handle the migration of objects

56 CCS: Converse Client Server
Allows interactivity
The user registers callbacks to execute when certain messages are received by the application from the outside:
  CcsRegisterHandler("myrequest",
      CkCallback(CkIndex_Main::request(0), mainproxy));
Current uses:
o LiveViz (visualization)
o CharmDebug
o Projections

57 Interact with Python
Scripting: upload Python scripts via CCS and run them on demand
There are three ways in which Python scripts can interact with the application:
o Low level: read/write (access single variables)
o High level: call local entry methods
o Iterative: apply a Python function to a set of objects
Client bindings for C++ and Java

58 Higher Level Languages
Incomplete but simple languages
Target specific patterns of interaction
Interoperate effectively with each other
o And with Charm++ and AMPI
o Because of the message-driven scheduler in Charm++
SDAG: describes the life cycle of a chare clearly
Charisma: orchestrates multiple collections of chares, describing global flow of data and control
MSA (Multiphase Shared Arrays): disciplined shared memory

59 More References
Online tutorial
Charm++ manual
o CCS and LiveViz are described under the Converse manual
Comprehensive FAQ


More information

Feedback guided load balancing in a distributed memory environment

Feedback guided load balancing in a distributed memory environment Feedback guided load balancing in a distributed memory environment Constantinos Christofi August 18, 2011 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2011 Abstract

More information

Practical Performance Understanding the Performance of Your Application

Practical Performance Understanding the Performance of Your Application Neil Masson IBM Java Service Technical Lead 25 th September 2012 Practical Performance Understanding the Performance of Your Application 1 WebSphere User Group: Practical Performance Understand the Performance

More information

Big Data Processing with Google s MapReduce. Alexandru Costan

Big Data Processing with Google s MapReduce. Alexandru Costan 1 Big Data Processing with Google s MapReduce Alexandru Costan Outline Motivation MapReduce programming model Examples MapReduce system architecture Limitations Extensions 2 Motivation Big Data @Google:

More information

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp MPI and Hybrid Programming Models William Gropp www.cs.illinois.edu/~wgropp 2 What is a Hybrid Model? Combination of several parallel programming models in the same program May be mixed in the same source

More information

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY. 6.828 Operating System Engineering: Fall 2005

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY. 6.828 Operating System Engineering: Fall 2005 Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.828 Operating System Engineering: Fall 2005 Quiz II Solutions Average 84, median 83, standard deviation

More information

A Review of Customized Dynamic Load Balancing for a Network of Workstations

A Review of Customized Dynamic Load Balancing for a Network of Workstations A Review of Customized Dynamic Load Balancing for a Network of Workstations Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy Computer Science Department, University of Rochester

More information

HDFS Users Guide. Table of contents

HDFS Users Guide. Table of contents Table of contents 1 Purpose...2 2 Overview...2 3 Prerequisites...3 4 Web Interface...3 5 Shell Commands... 3 5.1 DFSAdmin Command...4 6 Secondary NameNode...4 7 Checkpoint Node...5 8 Backup Node...6 9

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem

More information

CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015

CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 1. Goals and Overview 1. In this MP you will design a Dynamic Load Balancer architecture for a Distributed System 2. You will

More information

NVIDIA Tools For Profiling And Monitoring. David Goodwin

NVIDIA Tools For Profiling And Monitoring. David Goodwin NVIDIA Tools For Profiling And Monitoring David Goodwin Outline CUDA Profiling and Monitoring Libraries Tools Technologies Directions CScADS Summer 2012 Workshop on Performance Tools for Extreme Scale

More information

Improved metrics collection and correlation for the CERN cloud storage test framework

Improved metrics collection and correlation for the CERN cloud storage test framework Improved metrics collection and correlation for the CERN cloud storage test framework September 2013 Author: Carolina Lindqvist Supervisors: Maitane Zotes Seppo Heikkila CERN openlab Summer Student Report

More information

Guideline for stresstest Page 1 of 6. Stress test

Guideline for stresstest Page 1 of 6. Stress test Guideline for stresstest Page 1 of 6 Stress test Objective: Show unacceptable problems with high parallel load. Crash, wrong processing, slow processing. Test Procedure: Run test cases with maximum number

More information

Grid Scheduling Dictionary of Terms and Keywords

Grid Scheduling Dictionary of Terms and Keywords Grid Scheduling Dictionary Working Group M. Roehrig, Sandia National Laboratories W. Ziegler, Fraunhofer-Institute for Algorithms and Scientific Computing Document: Category: Informational June 2002 Status

More information

Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers

Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers Load Balancing for Charm++ Applications on Large Supercomputers Gengbin Zheng, Esteban Meneses, Abhinav Bhatelé and Laxmikant V. Kalé Department of Computer Science University of Illinois at Urbana-Champaign

More information

Distributed Data Management

Distributed Data Management Introduction Distributed Data Management Involves the distribution of data and work among more than one machine in the network. Distributed computing is more broad than canonical client/server, in that

More information

The Complete Performance Solution for Microsoft SQL Server

The Complete Performance Solution for Microsoft SQL Server The Complete Performance Solution for Microsoft SQL Server Powerful SSAS Performance Dashboard Innovative Workload and Bottleneck Profiling Capture of all Heavy MDX, XMLA and DMX Aggregation, Partition,

More information

Job Scheduling with Moab Cluster Suite

Job Scheduling with Moab Cluster Suite Job Scheduling with Moab Cluster Suite IBM High Performance Computing February 2010 Y. Joanna Wong, Ph.D. yjw@us.ibm.com 2/22/2010 Workload Manager Torque Source: Adaptive Computing 2 Some terminology..

More information

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

A Novel Cloud Based Elastic Framework for Big Data Preprocessing School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview

More information

Database Replication with Oracle 11g and MS SQL Server 2008

Database Replication with Oracle 11g and MS SQL Server 2008 Database Replication with Oracle 11g and MS SQL Server 2008 Flavio Bolfing Software and Systems University of Applied Sciences Chur, Switzerland www.hsr.ch/mse Abstract Database replication is used widely

More information

Tableau Server 7.0 scalability

Tableau Server 7.0 scalability Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different

More information

How To Improve Performance On A Single Chip Computer

How To Improve Performance On A Single Chip Computer : Redundant Arrays of Inexpensive Disks this discussion is based on the paper:» A Case for Redundant Arrays of Inexpensive Disks (),» David A Patterson, Garth Gibson, and Randy H Katz,» In Proceedings

More information

The Hadoop Distributed File System

The Hadoop Distributed File System The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

GridSolve: : A Seamless Bridge Between the Standard Programming Interfaces and Remote Resources

GridSolve: : A Seamless Bridge Between the Standard Programming Interfaces and Remote Resources GridSolve: : A Seamless Bridge Between the Standard Programming Interfaces and Remote Resources Jack Dongarra University of Tennessee and Oak Ridge National Laboratory 2/25/2006 1 Overview Grid/NetSolve

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

MOSIX: High performance Linux farm

MOSIX: High performance Linux farm MOSIX: High performance Linux farm Paolo Mastroserio [mastroserio@na.infn.it] Francesco Maria Taurino [taurino@na.infn.it] Gennaro Tortone [tortone@na.infn.it] Napoli Index overview on Linux farm farm

More information

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

1. Comments on reviews a. Need to avoid just summarizing web page asks you for: 1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of

More information

Shoal: IaaS Cloud Cache Publisher

Shoal: IaaS Cloud Cache Publisher University of Victoria Faculty of Engineering Winter 2013 Work Term Report Shoal: IaaS Cloud Cache Publisher Department of Physics University of Victoria Victoria, BC Mike Chester V00711672 Work Term 3

More information

GraySort and MinuteSort at Yahoo on Hadoop 0.23

GraySort and MinuteSort at Yahoo on Hadoop 0.23 GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters

More information

find model parameters, to validate models, and to develop inputs for models. c 1994 Raj Jain 7.1

find model parameters, to validate models, and to develop inputs for models. c 1994 Raj Jain 7.1 Monitors Monitor: A tool used to observe the activities on a system. Usage: A system programmer may use a monitor to improve software performance. Find frequently used segments of the software. A systems

More information

Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing

Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing www.ijcsi.org 227 Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing Dhuha Basheer Abdullah 1, Zeena Abdulgafar Thanoon 2, 1 Computer Science Department, Mosul University,

More information

Real-Time Monitoring Framework for Parallel Processes

Real-Time Monitoring Framework for Parallel Processes International Journal of scientific research and management (IJSRM) Volume 3 Issue 6 Pages 3134-3138 2015 \ Website: www.ijsrm.in ISSN (e): 2321-3418 Real-Time Monitoring Framework for Parallel Processes

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

Distribution transparency. Degree of transparency. Openness of distributed systems

Distribution transparency. Degree of transparency. Openness of distributed systems Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science steen@cs.vu.nl Chapter 01: Version: August 27, 2012 1 / 28 Distributed System: Definition A distributed

More information

Load Distribution in Large Scale Network Monitoring Infrastructures

Load Distribution in Large Scale Network Monitoring Infrastructures Load Distribution in Large Scale Network Monitoring Infrastructures Josep Sanjuàs-Cuxart, Pere Barlet-Ros, Gianluca Iannaccone, and Josep Solé-Pareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel I/O (I) I/O basics Fall 2012 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network card 1 Network card

More information

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

Introduction to application performance analysis

Introduction to application performance analysis Introduction to application performance analysis Performance engineering We want to get the most science and engineering through a supercomputing system as possible. The more efficient codes are, the more

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Efficient database auditing

Efficient database auditing Topicus Fincare Efficient database auditing And entity reversion Dennis Windhouwer Supervised by: Pim van den Broek, Jasper Laagland and Johan te Winkel 9 April 2014 SUMMARY Topicus wants their current

More information

WebSphere Architect (Performance and Monitoring) 2011 IBM Corporation

WebSphere Architect (Performance and Monitoring) 2011 IBM Corporation Track Name: Application Infrastructure Topic : WebSphere Application Server Top 10 Performance Tuning Recommendations. Presenter Name : Vishal A Charegaonkar WebSphere Architect (Performance and Monitoring)

More information

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.

More information

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework

More information

A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin

A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin A Comparison Of Shared Memory Parallel Programming Models Jace A Mogill David Haglin 1 Parallel Programming Gap Not many innovations... Memory semantics unchanged for over 50 years 2010 Multi-Core x86

More information

In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015

In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015 In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015 June 29-30, 2015 Contacts Alexandre Boudnik Senior Solution Architect, EPAM Systems Alexandre_Boudnik@epam.com

More information

Fair Scheduling Algorithm with Dynamic Load Balancing Using In Grid Computing

Fair Scheduling Algorithm with Dynamic Load Balancing Using In Grid Computing Research Inventy: International Journal Of Engineering And Science Vol.2, Issue 10 (April 2013), Pp 53-57 Issn(e): 2278-4721, Issn(p):2319-6483, Www.Researchinventy.Com Fair Scheduling Algorithm with Dynamic

More information

Multi-GPU Load Balancing for Simulation and Rendering

Multi-GPU Load Balancing for Simulation and Rendering Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks

More information

Study of Various Load Balancing Techniques in Cloud Environment- A Review

Study of Various Load Balancing Techniques in Cloud Environment- A Review International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-04 E-ISSN: 2347-2693 Study of Various Load Balancing Techniques in Cloud Environment- A Review Rajdeep

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

Analyzing IBM i Performance Metrics

Analyzing IBM i Performance Metrics WHITE PAPER Analyzing IBM i Performance Metrics The IBM i operating system is very good at supplying system administrators with built-in tools for security, database management, auditing, and journaling.

More information

Reliable Adaptable Network RAM

Reliable Adaptable Network RAM Reliable Adaptable Network RAM Tia Newhall, Daniel Amato, Alexandr Pshenichkin Computer Science Department, Swarthmore College Swarthmore, PA 19081, USA Abstract We present reliability solutions for adaptable

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study

SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study Milan E. Soklic Abstract This article introduces a new load balancing algorithm, called diffusive load balancing, and compares its performance

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel Computation Parallel I/O (I) I/O basics Spring 2008 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network

More information

WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE

WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE 1 W W W. F U S I ON I O.COM Table of Contents Table of Contents... 2 Executive Summary... 3 Introduction: In-Memory Meets iomemory... 4 What

More information

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How

More information

Petascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp

Petascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges Why should you care? What are they? Which are different from non-petascale? What has changed since

More information

A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing

A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing N.F. Huysamen and A.E. Krzesinski Department of Mathematical Sciences University of Stellenbosch 7600 Stellenbosch, South

More information

Assignment # 1 (Cloud Computing Security)

Assignment # 1 (Cloud Computing Security) Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual

More information