Load Balancing in Charm++ Eric Bohm
1 Load Balancing in Charm++ and AMPI Eric Bohm
2 How to Diagnose Load Imbalance?
Often hidden in statements such as:
o Very high synchronization overhead: most processors are waiting at a reduction
Count the total amount of computation (ops/flops) per processor
o In each phase!
o Because the balance may change from phase to phase
August 5th, 2009 Charm++ and AMPI: Session II
3 Golden Rule of Load Balancing
Fallacy: the objective of load balancing is to minimize variance in load across processors.
Example: 50,000 tasks of equal size, 500 processors:
o A: all processors get 99 tasks, except the last 5, which get 199 each
o B: all processors get 101 tasks, except the last 5, which get 1 each
Identical variance, but situation A is much worse!
Golden Rule: it is OK if a few processors idle, but avoid having processors that are overloaded with work.
Finish time = max_i {Time on processor i}, excepting data dependence and communication overhead issues.
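The Golden Rule can be checked numerically. The sketch below (plain C++, not Charm++ code) rebuilds the slide's two scenarios and computes the finish time as the maximum per-processor load; the function and scenario names are illustrative:

```cpp
#include <algorithm>
#include <vector>

// Finish time of a phase is the maximum per-processor load,
// not a function of the load variance (the Golden Rule).
double finish_time(const std::vector<double>& load) {
    return *std::max_element(load.begin(), load.end());
}

// The slide's two scenarios: 50,000 equal tasks on 500 processors.
std::vector<double> scenario_a() {          // 495 procs with 99 tasks, 5 with 199
    std::vector<double> l(500, 99.0);
    for (int i = 495; i < 500; ++i) l[i] = 199.0;
    return l;
}
std::vector<double> scenario_b() {          // 495 procs with 101 tasks, 5 with 1
    std::vector<double> l(500, 101.0);
    for (int i = 495; i < 500; ++i) l[i] = 1.0;
    return l;
}
```

Both scenarios contain exactly 50,000 tasks and have identical variance, yet A finishes at time 199 while B finishes at 101.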
4 Amdahl's Law and Grainsize
Before we get to load balancing:
Original law:
o If a program has a K% sequential section, then speedup is limited to 100/K, even if the rest of the program is parallelized completely.
Grainsize corollary:
o If any individual piece of work takes more than K time units, and the sequential program takes T_seq, then speedup is limited to T_seq / K.
So:
o Examine performance data via histograms to find the sizes of remappable work units
o If some are too big, change the decomposition method to make smaller units
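The corollary is a one-line computation: the largest indivisible grain is a lower bound on finish time. A minimal sketch (function names are illustrative, not from the slides):

```cpp
#include <algorithm>

// Grainsize corollary: if the largest indivisible grain takes
// `largest_grain` time units, no schedule can finish faster than that,
// so speedup is at most t_seq / largest_grain on any processor count.
double max_speedup(double t_seq, double largest_grain) {
    return t_seq / largest_grain;
}

// With p processors, the bound is also capped by p itself.
double speedup_bound(double t_seq, double largest_grain, int p) {
    return std::min(static_cast<double>(p), t_seq / largest_grain);
}
```

For example, a 1000-second sequential program whose biggest grain takes 10 seconds can never exceed 100x speedup, no matter how many processors are used.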
5 Grainsize
(Working) definition: the amount of computation per potentially parallel event (task creation, enqueue/dequeue, messaging, locking, ...).
[Figure: total time vs. grainsize, on 1 processor and on p processors]
6 Rules of Thumb for Grainsize
Make it as small as possible, as long as it amortizes the overhead. More specifically, with v the per-event overhead, ensure:
o Average grainsize is greater than kv (say 10v)
o No single grain should be allowed to be too large: it must be smaller than T/p, but in practice we can express it as smaller than k_m v (say 100v)
Important corollary:
o You can be close to the optimal grainsize without having to think about P, the number of processors
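The two rules can be written as a simple predicate over measured grain sizes. A hedged sketch, using the slide's suggested constants 10 and 100 for k and k_m (treat them as tunable assumptions, not fixed values):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Check the rules of thumb for grain sizes (in the same time units as v,
// the per-event scheduling/messaging overhead):
//   1. average grain  > 10 * v
//   2. largest grain  < 100 * v
// Note that neither test mentions P, the number of processors.
bool grainsize_ok(const std::vector<double>& grains, double v) {
    if (grains.empty()) return false;
    double avg = std::accumulate(grains.begin(), grains.end(), 0.0)
                 / grains.size();
    double mx  = *std::max_element(grains.begin(), grains.end());
    return avg > 10.0 * v && mx < 100.0 * v;
}
```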
7 Molecular Dynamics in NAMD
Collection of [charged] atoms, with bonds
o Newtonian mechanics
o Thousands of atoms
o 1 femtosecond time step, millions of steps needed!
At each time step:
o Calculate forces on each atom
Bonded terms
Non-bonded: electrostatic and van der Waals
Short range: every timestep
Long range: every 4 timesteps using PME (3D FFT), i.e. multiple time stepping
o Calculate velocities and advance positions
Collaboration with K. Schulten, R. Skeel, and coworkers
8 Hybrid Decomposition
Object-based parallelization for MD: force decomposition + spatial decomposition. We have many objects to load balance:
o Each diamond (force computation) can be assigned to any processor
o Number of diamonds in 3D: 14 x number of patches
9 Grainsize Analysis via Histograms
Problem: the grainsize distribution (number of objects vs. grainsize in milliseconds) shows some objects with too much work.
Solution: split compute objects that may have too much work, using a heuristic based on the number of interacting atoms.
[Histogram: grainsize distribution before and after splitting]
10 Fine-Grained Decomposition on BlueGene
Decomposing atoms into smaller bricks gives finer-grained parallelism.
[Timeline figure: force evaluation and integration phases]
11 Load Balancing Strategies
Classified by when balancing is done:
o Initially
o Dynamic: periodically
o Dynamic: continuously
Classified by whether decisions are taken with global information:
o Fully centralized: quite a good choice when the load balancing period is long
o Fully distributed: each processor knows only about a constant number of neighbors; extreme case: totally local decisions (send work to a random destination processor, with some probability)
o In between: use aggregated global information, plus detailed neighborhood information
12 Dynamic Load Balancing Scenarios
Examples representing typical classes of situations:
o Particles distributed over simulation space; dynamic because particles move. Cases: highly non-uniform distribution (cosmology), relatively uniform distribution
o Structured grids, with dynamic refinement/coarsening
o Unstructured grids, with dynamic refinement/coarsening
13 Measurement-Based Load Balancing
Principle of persistence:
o Object communication patterns and computational loads tend to persist over time, in spite of dynamic behavior (abrupt but infrequent changes, or slow and small changes)
Runtime instrumentation:
o Measures communication volume and computation time
Measurement-based load balancers:
o Periodically use the instrumented database to make new decisions
o Many alternative strategies can use the database
14 Load Balancing Steps
[Timeline: regular timesteps, then instrumented timesteps, then a detailed, aggressive load balancing step, followed later by refinement load balancing]
15 Charm++ Strategies
Centralized: GreedyLB, GreedyCommLB, RecBisectBfLB, MetisLB, TopoCentLB, RefineLB, RefineCommLB, OrbLB
Distributed: NeighborLB, NeighborCommLB, WSLB
HybridLB:
o Combine strategies hierarchically
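To make the centralized strategies concrete, here is a hedged sketch of the idea behind a greedy balancer such as GreedyLB (this is plain C++, not Charm++'s actual implementation): sort measured object loads heaviest-first, then repeatedly place the next object on the currently least-loaded processor.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Greedy assignment of object loads to processors.
// Returns the resulting total load per processor.
std::vector<double> greedy_assign(std::vector<double> obj_load, int nprocs) {
    std::sort(obj_load.rbegin(), obj_load.rend());       // heaviest first
    using Entry = std::pair<double, int>;                // (load, proc)
    std::priority_queue<Entry, std::vector<Entry>,
                        std::greater<Entry>> pq;         // min-heap on load
    for (int p = 0; p < nprocs; ++p) pq.push({0.0, p});
    std::vector<double> proc_load(nprocs, 0.0);
    for (double w : obj_load) {
        auto [load, p] = pq.top();                       // least-loaded proc
        pq.pop();
        proc_load[p] = load + w;
        pq.push({proc_load[p], p});
    }
    return proc_load;
}
```

For the loads {4, 3, 3, 2, 2, 2} on 2 processors this yields a perfectly balanced 8/8 split, directly minimizing the maximum load that the Golden Rule cares about.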
16 Load Balancer in Action
Automatic load balancing in crack propagation.
[Graph: iterations per second vs. iteration number, annotated: 1. elements added, 2. load balancer invoked, 3. chunks migrated]
17 Distributed Load Balancing
Centralized strategies:
o Still OK for 3000 processors for NAMD
Distributed balancing is needed when:
o The number of processors is large, and/or
o Load variation is rapid
Large machines:
o Need to handle locality of communication: topology-sensitive placement
o Need to work with scant global information: approximate or aggregated global information (average/max load), or incomplete global information (only a neighborhood)
o Work diffusion strategies (1980s work by Kale and others!): achieving global effects by local action
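The "global effects by local action" idea can be illustrated with a diffusion step. A hedged sketch in the spirit of those 1980s schemes (the ring topology and the transfer fraction alpha are illustrative assumptions): each processor looks only at its two neighbors and moves a fraction of any load difference across each edge; repeated rounds drive all loads toward the global average without any processor ever seeing global state.

```cpp
#include <vector>

// One synchronous diffusion round on a ring of processors.
// Each edge transfers alpha * (neighbor - self); total load is conserved.
std::vector<double> diffuse_step(const std::vector<double>& load,
                                 double alpha) {
    int n = static_cast<int>(load.size());
    std::vector<double> next(load);
    for (int i = 0; i < n; ++i) {
        int l = (i + n - 1) % n, r = (i + 1) % n;
        next[i] += alpha * (load[l] - load[i])
                 + alpha * (load[r] - load[i]);
    }
    return next;
}
```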
18 Load Balancing on Large Machines
Existing load balancing strategies don't scale on extremely large machines.
Limitations of centralized strategies:
o The central node becomes a memory/communication bottleneck
o Decision-making algorithms tend to be very slow
Limitations of distributed strategies:
o Difficult to achieve well-informed load balancing decisions
19 Simulation Study: Memory Overhead
Simulation performed with the performance simulator BigSim.
[Graph: memory usage (MB) vs. number of objects (256K, 512K, 1M), for 32K and 64K processors]
The lb_test benchmark is a parameterized program that creates a specified number of communicating objects in a 2D mesh.
20 Hierarchical Load Balancers
Hierarchical distributed load balancers:
o Divide processors into groups
o Apply different strategies at each level
o Scalable to a large number of processors
21 Our HybridLB Scheme
[Diagram: a tree of processor groups; load data (OCG) flows up the tree, greedy-based load balancing is applied at the upper level and refinement-based load balancing at the lower level; tokens and objects migrate between levels]
22 Hybrid Load Balancing Performance
Simulation of lb_test for 64K processors.
[Graphs: load balance time (s), maximum predicted load (s), and application time vs. number of objects (256K, 512K, 1M), comparing GreedyCommLB against HybridLB(GreedyCommLB)]
[Table: memory usage per processor count: 6.8 MB, 22.57 MB, 22.63 MB]
Figures are from the lb_test benchmark's actual run on BG/L at IBM (512K objects).
23 Load Balancing: Hands-On
24 Simple Imbalance
LB_Test.C: a 1D array of chares, half of which have 2x the computation load; strong scaling.
o make will produce LB_Test
o Run LB_Test with these arguments: chares per core, iterations, workload multiplier, array size
o Use at least 7 processors (precede those arguments with np 7)
25 Output Without Balancing
Charm++> cpu topology info is being gathered.
Charm++> 1 unique compute nodes detected.
Running on 7 processors with 40 chares per pe
All array elements ready at [...] seconds. Computation Begins
[0] Element 0 took [...] seconds for work 1 at iteration 0 sumc 4.664e+13
[1] Element 40 took [...] seconds for work 2 at iteration 0 sumc 8.748e+14
[0] Element 0 took [...] seconds for work 1 at iteration 99 sumc 4.664e+13
[1] Element 40 took [...] seconds for work 2 at iteration 99 sumc 8.748e+14
Total work performed = [...] seconds
Average total chare work per iteration = [...] seconds
Average iteration time = [...] seconds
Done after [...] seconds
26 Analyze Performance
Productivity => not wasting your time:
o Measure twice, cut once
make projections
o Produces LB_Test_prj
o Change your job script to run LB_Test_prj
o mkdir nobalancetrace
o Add the arguments +logsize ... +traceroot $PWD/nobalancetrace
o Execution will create trace files in nobalancetrace
27 Download and Visualize
Download the contents of nobalancetrace, or extract the sample from nobalancetrace.tar:
o tar xf nobalancetrace.tar
Run Projections:
o Load LB_Test_prj.sts
o Open a time profile on several steps (4s to 8s for the sample)
28 Time Profile, No Balance
29 Fix Migration
Fix the pup routine for the LB_Test chare:
o PUP each member variable: p | varname;
o Do the memory allocation when unpacking: if (p.isUnpacking()) { /* allocate dynamic members */ }
o PUP dynamically created arrays: PUParray(p, varname, numElements);
o Remove the CkAbort
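The PUP pattern is worth seeing end to end: one routine drives both packing and unpacking, and the unpacking pass must allocate dynamic members before filling them. Below is a hedged plain-C++ sketch of that pattern (the `Pupper` class is a stand-in invented for this example; Charm++'s real `PUP::er` API differs):

```cpp
#include <cstring>
#include <vector>

// Minimal pack/unpack driver: same bytes() call packs on the way out
// and unpacks on the way back in, mirroring PUP's single-routine style.
struct Pupper {
    std::vector<unsigned char> buf;
    size_t pos = 0;
    bool unpacking;
    explicit Pupper(bool unpack) : unpacking(unpack) {}
    void bytes(void* p, size_t n) {
        if (unpacking) {
            std::memcpy(p, buf.data() + pos, n);
            pos += n;
        } else {
            auto* b = static_cast<unsigned char*>(p);
            buf.insert(buf.end(), b, b + n);
        }
    }
};

struct Chunk {
    int n = 0;
    double* data = nullptr;                  // dynamically allocated member
    void pup(Pupper& p) {
        p.bytes(&n, sizeof n);               // scalar member first
        if (p.unpacking) data = new double[n];  // allocate when unpacking!
        p.bytes(data, n * sizeof(double));   // then the dynamic array
    }
};
```

Round-tripping a `Chunk` through pack and unpack reproduces both the scalar and the dynamically allocated array, which is exactly what migration needs.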
30 Add Load Balancer Support
Add a call to AtSync in LB_Test::next_iter:
if ((iteration == balance_iteration) && usesAtSync) {
  AtSync();
} else {
  compute();
}
Add ResumeFromSync:
void ResumeFromSync(void) {
  // Called by the load balancing framework
  compute();
}
The answer is in LB_Test_final.C.
31 Use GreedyLB
o Change your job script to run LB_Test_LB
o Add the argument +balancer GreedyLB
o Run on the same number of processors with the same arguments
32 Output with Balancing
Charm++> cpu topology info is being gathered.
Charm++> 1 unique compute nodes detected.
[0] GreedyLB created
Running on 7 processors with 40 chares per pe
All array elements ready at [...] seconds. Computation Begins
[0] Element 0 took [...] seconds for work 1 at iteration 0 sumc 4.664e+13
[1] Element 40 took [...] seconds for work 2 at iteration 0 sumc 8.748e+14
[6] Element 0 took [...] seconds for work 1 at iteration 99 sumc 4.664e+13
[6] Element 40 took [...] seconds for work 2 at iteration 99 sumc 8.748e+14
Total work performed = [...] seconds
Average total chare work per iteration = [...] seconds
Average iteration time = [...] seconds
Done after [...] seconds
33 Compare
Consider the average iteration time. Consider total CPU time:
o Walltime * number of processors
o The more processors you use, the more important it is to reduce iteration time through efficiency
o Look for overloaded processors: underloading is just a symptom; overload implies a bottleneck
34 Usage Profile
Use Usage Profile from the Tools menu. Examine the area before load balancing:
o Note: intervals are in 100ms
o 3000ms to 4000ms works for the sample
35 Analyze Performance Again
Productivity => not wasting your time:
o Measure twice, cut once
make projections
o Produces LB_Test_LB_prj
o Change your job script to run LB_Test_LB_prj
o mkdir balancetrace
o Add the arguments +logsize ... +traceroot $PWD/balancetrace
o Execution will create trace files in balancetrace
36 Usage Profile Before Balance
37 Timeline Across Balancer
Open a timeline spanning load balancing:
o 4s to 8s works for the sample
o Try a large time span on a few cores, then zoom in
38 Summary
Look for load imbalance. Migratable objects are not hard to use, and Charm++ has significant infrastructure to help.
On your own, try this benchmark at varying processor counts:
o See the impact on scaling with different array sizes
o See the impact on total runtime when the number of iterations grows large
o Try other load balancers (see the LB framework section of the Charm++ manual, 1p.html#lbFramework)
39 Sanjay Kale & Eric Bohm: INTERMEDIATE CHARM++
40 Outline
Messages
Groups, nodegroups
Startup process
Fault tolerance
Advanced:
o Communication optimization
o Advanced arrays
o Conditional packing
o Make your own LB strategy
o Interact with CCS and Python
o Higher-level languages
41 Parameter Marshalling
The application passes parameters by value:
o myproxy.myentry(... arguments ...);
o PUP::able types may be passed as arguments
The receiver cannot maintain a pointer to the input. The system allocates a message containing the parameters to send (CkMarshallMsg).
entry void receive(int v);
entry void startstep();
entry void eastghost(int n, double vals[n]);
[Diagram of the marshalled message layout: n, vals_off, vals_cnt, vals]
42 Messages
Necessary in some situations:
o E.g. to specify the order of operations (priority)
Possible optimizations:
o Avoid memcpy and memory allocation
o Reuse the same message multiple times, e.g. to yield the processor using a message:
// in the .ci file:
message InfoMsg;
// in C++:
class InfoMsg : public CMessage_InfoMsg {
  int iter;
  // ... other data, methods ...
};
void MyArray::compute(InfoMsg *msg) {
  // ... do some work ...
  if (workdone) delete msg;
  else thisProxy[thisIndex].compute(msg);
}
43 Variable-Size Messages (Jacobi)
// in the .ci file:
message Ghost {
  double vals[];
};
// in C++:
class Ghost : public CMessage_Ghost {
public:
  int len;
  double *vals;
};
Jacobi::startStep() {
  Ghost *msg = new (localRows) Ghost(localRows);  // placement arg sizes vals[]
  for (int i = 1; i <= localRows; i++)
    msg->vals[i-1] = values[i][localCols+1];
  thisProxy(thisIndex.x + 1, thisIndex.y).westGhost(msg);
  // ...
}
Jacobi::northGhost(Ghost *msg) {
  north = msg;               // keep the message: A[0][1..localCols] aliases msg->vals
  ghostReceived++;
  A[0] = msg->vals - 1;
  attemptCompute();
}
Jacobi::attemptCompute() {
  // ...
  delete north;
}
44 Message Priorities
The application assigns priorities to some messages, and the Charm++ scheduler respects priorities while draining message queues (separate message queues for zero, negative, and positive priorities).
It is an optimization. Beware of starvation!
o A message might never get scheduled
o Charm++ does not guarantee delivery order, only a best effort
45 Message Priorities (Cont.)
Different queueing strategies: CK_QUEUEING_FIFO, CK_QUEUEING_LIFO, CK_QUEUEING_IFIFO, CK_QUEUEING_ILIFO.
To specify an integer priority (negative = high, 0, positive = low):
int prio = ...;
MsgType *msg = new (8*sizeof(int)) MsgType;
*(int*)CkPriorityPtr(msg) = prio;
CkSetQueueing(msg, CK_QUEUEING_IFIFO);
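The queueing discipline itself is easy to picture: more-negative integer priorities drain first, zero next, positive last. A hedged plain-C++ sketch of that ordering (this is not the Charm++ scheduler; the `PrioQueue` type is invented for illustration):

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Integer-priority queue: smaller (more negative) integer = served first,
// mirroring the negative/zero/positive ordering the slides describe.
struct PrioQueue {
    std::priority_queue<std::pair<int, std::string>,
                        std::vector<std::pair<int, std::string>>,
                        std::greater<std::pair<int, std::string>>> q;

    void push(int prio, const std::string& msg) { q.push({prio, msg}); }
    std::string pop() {
        std::string m = q.top().second;
        q.pop();
        return m;
    }
};
```

Note this sketch makes the starvation warning concrete: if negative-priority messages keep arriving, a positive-priority message may never reach the front.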
46 Groups
A collection of chares in which exactly one chare is present on each processor:
o Indexable by processor rank
It is an optimization:
o Useful for libraries, when each processor needs a local branch to service local chares
o Example: a software cache manager; all chares on a processor share the same read-only data, avoiding extra communication
// in the .ci file:
mainchare Main { ... };
array [1D] MyArray { ... };
group MyGroup {
  entry MyGroup();
  entry MyGroup(int n);
};
// in C++:
CProxy_MyGroup group1, group2;
Main::Main(CkArgMsg *m) : CBase_Main(m) {
  arrayProxy = CProxy_MyArray::ckNew(nElem);
  group1 = CProxy_MyGroup::ckNew();
  group2 = CProxy_MyGroup::ckNew(100);
}
47 Groups (Cont.)
Should not be used to perform computation in place of chare arrays!
o Groups are not load balanced
Nodegroups:
o Like groups, but with one chare per node
o Differ from groups only if Charm++ is compiled for SMP
o Can execute on any processor within the node, even concurrently; use the keyword exclusive to prevent data races
48 Startup Process
initnode and initproc routines are executed
o Run once per node or per processor, respectively
o Declared in the .ci file
All mainchare constructors are executed
o They create chare arrays/groups; these constructors run immediately on processor 0
o They can set readonly variables; the readonlys are then synchronized
Every other entry method is executed
o This includes chare constructors on processors other than 0
49 Fault Tolerance
Checkpointing:
o Simply PUP all Charm++ entities to disk
o Trigger with CkStartCheckpoint(dir, cb)
o The callback cb is invoked upon checkpoint completion, both after a checkpoint and upon restart
o To restart: +restart <logdir>
Live recovery methods (experimental):
o Double in-memory checkpoint
o Message logging: only the faulty processor rolls back
50 Fault Tolerance: Example
// in the .ci file:
readonly CProxy_Main mainProxy;
mainchare [migratable] Main { ... };
group [migratable] MyGroup { ... };
// in C++:
Main::Main(CkMigrateMessage *m) : CBase_Main(m) {
  // Subtle: a chare proxy readonly needs to be updated
  // manually because of the object pointer inside it!
  mainProxy = thisProxy;
}
void Main::pup(PUP::er &p) { ... }
void Main::next(CkReductionMsg *m) {
  if ((++step % 10) == 0) {
    CkCallback cb(CkIndex_Hello::sayHi(), helloProxy);
    CkStartCheckpoint("log", cb);
  } else {
    helloProxy.sayHi();
  }
  delete m;
}
51 Sanjay Kale & Eric Bohm: ADVANCED TUTORIAL
52 Communication Optimization
Optimize the most common communication patterns:
o Streaming: reduce the overhead of many small messages
o Multicast
o All-to-all
Each must be used with its own API:
o Each may have multiple alternative implementations, embodying different strategies
o The programmer can choose the best strategy for their scenario
53 Advanced Arrays
Sections: create proxies representing slices of a chare array
o Optimize communication with Comlib or CkMulticast
o Example: a row/column of a 2D chare array
Mapping: manually specify the map of chares to PEs
o Example: place communicating objects on the same processor
Bound arrays: tie two chare arrays together
o The system places and migrates corresponding indices together
o Example: an FFT helper library bound to a work array
54 Conditional Packing for SMP
Pass a pointer if the destination is on the same node; copy the data into the message if the destination is remote.
// in the .ci file:
message Slice {
  conditional Boomarray<double> data;
};
chare Integrate {
  entry Integrate(Slice *m);
  entry Integrate(Boomarray<double> d conditional);
};
// in C++:
class Slice : public CMessage_Slice {
  Boomarray<double> *data;
};
void Integrate::Integrate(Slice *msg) {
  Boomarray<double> &b = *msg->data;
  // ... do work using b ...
  // Send back the modified data
  mainProxy.results(msg);
}
55 Make Your Own LB Strategy
You can override the automatic measurements with application-supplied performance estimates:
o Reimplement UserSetLBLoad() in your chare
o Use setObjTime(time) and getObjTime()
Or, you can implement a new strategy:
o FooLB::work(CentralLB::LDStats* stats, int count)
o Use the gathered data to decide a new assignment of objects to processors
o The system will handle migration of the objects
56 CCS: Converse Client-Server
Allows interactivity: the user registers callbacks to execute when certain messages are received by the application from the outside.
CcsRegisterHandler("myrequest",
    CkCallback(CkIndex_Main::request(0), mainProxy));
Current uses:
o LiveViz (visualization)
o CharmDebug
o Projections
57 Interact with Python Scripting
Upload Python scripts via CCS and run them on demand. There are three ways in which Python scripts can interact with the application:
o Low level: read/write access to single variables
o High level: call local entry methods
o Iterative: apply a Python function to a set of objects
Client bindings exist for C++ and Java.
58 Higher-Level Languages
Incomplete but simple languages that target specific patterns of interaction. They interoperate effectively with each other:
o And with Charm++ and AMPI
o Because of the message-driven scheduler in Charm++
SDAG: describes the life cycle of a chare clearly
Charisma: orchestrates multiple collections of chares, describing global flow of data and control
MSA (Multiphase Shared Arrays): disciplined shared memory
59 More References
Online tutorial
Charm++ manual
o CCS and LiveViz are documented under the Converse manual
Comprehensive FAQ
More informationNVIDIA Tools For Profiling And Monitoring. David Goodwin
NVIDIA Tools For Profiling And Monitoring David Goodwin Outline CUDA Profiling and Monitoring Libraries Tools Technologies Directions CScADS Summer 2012 Workshop on Performance Tools for Extreme Scale
More informationImproved metrics collection and correlation for the CERN cloud storage test framework
Improved metrics collection and correlation for the CERN cloud storage test framework September 2013 Author: Carolina Lindqvist Supervisors: Maitane Zotes Seppo Heikkila CERN openlab Summer Student Report
More informationGuideline for stresstest Page 1 of 6. Stress test
Guideline for stresstest Page 1 of 6 Stress test Objective: Show unacceptable problems with high parallel load. Crash, wrong processing, slow processing. Test Procedure: Run test cases with maximum number
More informationGrid Scheduling Dictionary of Terms and Keywords
Grid Scheduling Dictionary Working Group M. Roehrig, Sandia National Laboratories W. Ziegler, Fraunhofer-Institute for Algorithms and Scientific Computing Document: Category: Informational June 2002 Status
More informationHierarchical Load Balancing for Charm++ Applications on Large Supercomputers
Load Balancing for Charm++ Applications on Large Supercomputers Gengbin Zheng, Esteban Meneses, Abhinav Bhatelé and Laxmikant V. Kalé Department of Computer Science University of Illinois at Urbana-Champaign
More informationDistributed Data Management
Introduction Distributed Data Management Involves the distribution of data and work among more than one machine in the network. Distributed computing is more broad than canonical client/server, in that
More informationThe Complete Performance Solution for Microsoft SQL Server
The Complete Performance Solution for Microsoft SQL Server Powerful SSAS Performance Dashboard Innovative Workload and Bottleneck Profiling Capture of all Heavy MDX, XMLA and DMX Aggregation, Partition,
More informationJob Scheduling with Moab Cluster Suite
Job Scheduling with Moab Cluster Suite IBM High Performance Computing February 2010 Y. Joanna Wong, Ph.D. yjw@us.ibm.com 2/22/2010 Workload Manager Torque Source: Adaptive Computing 2 Some terminology..
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationDatabase Replication with Oracle 11g and MS SQL Server 2008
Database Replication with Oracle 11g and MS SQL Server 2008 Flavio Bolfing Software and Systems University of Applied Sciences Chur, Switzerland www.hsr.ch/mse Abstract Database replication is used widely
More informationTableau Server 7.0 scalability
Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different
More informationHow To Improve Performance On A Single Chip Computer
: Redundant Arrays of Inexpensive Disks this discussion is based on the paper:» A Case for Redundant Arrays of Inexpensive Disks (),» David A Patterson, Garth Gibson, and Randy H Katz,» In Proceedings
More informationThe Hadoop Distributed File System
The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS
More informationArchitectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationGridSolve: : A Seamless Bridge Between the Standard Programming Interfaces and Remote Resources
GridSolve: : A Seamless Bridge Between the Standard Programming Interfaces and Remote Resources Jack Dongarra University of Tennessee and Oak Ridge National Laboratory 2/25/2006 1 Overview Grid/NetSolve
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationMOSIX: High performance Linux farm
MOSIX: High performance Linux farm Paolo Mastroserio [mastroserio@na.infn.it] Francesco Maria Taurino [taurino@na.infn.it] Gennaro Tortone [tortone@na.infn.it] Napoli Index overview on Linux farm farm
More information1. Comments on reviews a. Need to avoid just summarizing web page asks you for:
1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of
More informationShoal: IaaS Cloud Cache Publisher
University of Victoria Faculty of Engineering Winter 2013 Work Term Report Shoal: IaaS Cloud Cache Publisher Department of Physics University of Victoria Victoria, BC Mike Chester V00711672 Work Term 3
More informationGraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
More informationfind model parameters, to validate models, and to develop inputs for models. c 1994 Raj Jain 7.1
Monitors Monitor: A tool used to observe the activities on a system. Usage: A system programmer may use a monitor to improve software performance. Find frequently used segments of the software. A systems
More informationReal Time Network Server Monitoring using Smartphone with Dynamic Load Balancing
www.ijcsi.org 227 Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing Dhuha Basheer Abdullah 1, Zeena Abdulgafar Thanoon 2, 1 Computer Science Department, Mosul University,
More informationReal-Time Monitoring Framework for Parallel Processes
International Journal of scientific research and management (IJSRM) Volume 3 Issue 6 Pages 3134-3138 2015 \ Website: www.ijsrm.in ISSN (e): 2321-3418 Real-Time Monitoring Framework for Parallel Processes
More informationSAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011
SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,
More informationDistribution transparency. Degree of transparency. Openness of distributed systems
Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science steen@cs.vu.nl Chapter 01: Version: August 27, 2012 1 / 28 Distributed System: Definition A distributed
More informationLoad Distribution in Large Scale Network Monitoring Infrastructures
Load Distribution in Large Scale Network Monitoring Infrastructures Josep Sanjuàs-Cuxart, Pere Barlet-Ros, Gianluca Iannaccone, and Josep Solé-Pareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu
More informationCOSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters
COSC 6374 Parallel I/O (I) I/O basics Fall 2012 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network card 1 Network card
More informationImprove Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database
WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
More informationIntroduction to application performance analysis
Introduction to application performance analysis Performance engineering We want to get the most science and engineering through a supercomputing system as possible. The more efficient codes are, the more
More informationIntroduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
More informationEfficient database auditing
Topicus Fincare Efficient database auditing And entity reversion Dennis Windhouwer Supervised by: Pim van den Broek, Jasper Laagland and Johan te Winkel 9 April 2014 SUMMARY Topicus wants their current
More informationWebSphere Architect (Performance and Monitoring) 2011 IBM Corporation
Track Name: Application Infrastructure Topic : WebSphere Application Server Top 10 Performance Tuning Recommendations. Presenter Name : Vishal A Charegaonkar WebSphere Architect (Performance and Monitoring)
More informationVirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5
Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.
More informationParallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
More informationA Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin
A Comparison Of Shared Memory Parallel Programming Models Jace A Mogill David Haglin 1 Parallel Programming Gap Not many innovations... Memory semantics unchanged for over 50 years 2010 Multi-Core x86
More informationIn-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015
In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015 June 29-30, 2015 Contacts Alexandre Boudnik Senior Solution Architect, EPAM Systems Alexandre_Boudnik@epam.com
More informationFair Scheduling Algorithm with Dynamic Load Balancing Using In Grid Computing
Research Inventy: International Journal Of Engineering And Science Vol.2, Issue 10 (April 2013), Pp 53-57 Issn(e): 2278-4721, Issn(p):2319-6483, Www.Researchinventy.Com Fair Scheduling Algorithm with Dynamic
More informationMulti-GPU Load Balancing for Simulation and Rendering
Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks
More informationStudy of Various Load Balancing Techniques in Cloud Environment- A Review
International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-04 E-ISSN: 2347-2693 Study of Various Load Balancing Techniques in Cloud Environment- A Review Rajdeep
More informationCloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com
Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...
More informationAnalyzing IBM i Performance Metrics
WHITE PAPER Analyzing IBM i Performance Metrics The IBM i operating system is very good at supplying system administrators with built-in tools for security, database management, auditing, and journaling.
More informationReliable Adaptable Network RAM
Reliable Adaptable Network RAM Tia Newhall, Daniel Amato, Alexandr Pshenichkin Computer Science Department, Swarthmore College Swarthmore, PA 19081, USA Abstract We present reliability solutions for adaptable
More informationDistributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
More informationSIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study
SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study Milan E. Soklic Abstract This article introduces a new load balancing algorithm, called diffusive load balancing, and compares its performance
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationBenchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
More informationCOSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters
COSC 6374 Parallel Computation Parallel I/O (I) I/O basics Spring 2008 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network
More informationWITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE
WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE 1 W W W. F U S I ON I O.COM Table of Contents Table of Contents... 2 Executive Summary... 3 Introduction: In-Memory Meets iomemory... 4 What
More informationLecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at
Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How
More informationPetascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp
Petascale Software Challenges William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges Why should you care? What are they? Which are different from non-petascale? What has changed since
More informationA Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing
A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing N.F. Huysamen and A.E. Krzesinski Department of Mathematical Sciences University of Stellenbosch 7600 Stellenbosch, South
More informationAssignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
More information