Multi-GPU Load Balancing for Simulation and Rendering
Yong Cao
Computer Science Department, Virginia Tech, USA

In-situ Visualization and Visual Analytics
Instant visualization of, and interaction with, computing tasks
Applications:
  Computational Fluid Dynamics
  Seismic Propagation
  Molecular Dynamics
  Network Security Analysis

Generalized Execution Loop
Execution: Simulation, then Rendering
Memory: Simulation writes data; Rendering reads data

Generalized Execution Loop
Execution: Task 1, then Task 2
Memory: Task 1 writes data; Task 2 reads data

Parallel Execution: Task Split
Processor 1 and Processor 2 each run T1, then switch to T2
Memory: T1 writes data; T2 reads data
Problem: task (context) switch
Disadvantages of the context switch:
- Overhead of another kernel launch
- Flush of the cache lines
- Disallows persistent threads

Parallel Execution: Pipelining
Processor 1: Task 1 (time steps t, t+1)
Processor 2: Task 2 (time steps t, t+1)
Memory: Task 1 writes data; Task 2 reads data
+ Simplified kernel for each GPU
+ Better shared memory and cache usage
+ Persistent threads for distributed scheduling

Parallel Execution: Pipelining
Problem: bubble in the pipeline
Processor 1: Task 1 (time steps t, t+1)
Processor 2: Task 2 (time steps t, t+1)
Memory: Task 1 writes data; Task 2 reads data

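The size of a pipeline bubble can be estimated with a toy model. The sketch below assumes a two-stage pipeline whose stages synchronize at the end of every time step, so each step lasts as long as the slower stage and the faster processor idles for the difference. The function name and the timings are hypothetical and are not taken from the slides.

```python
def pipeline_bubble(t_task1, t_task2):
    """Return (step_time, idle_time) for a two-stage pipeline.

    Assumes both stages start each step together and synchronize at the
    end, so the step lasts as long as the slower stage.
    """
    step_time = max(t_task1, t_task2)
    idle_time = step_time - min(t_task1, t_task2)  # the "bubble"
    return step_time, idle_time


# Hypothetical timings in milliseconds: Task 1 is slower than Task 2.
step, bubble = pipeline_bubble(30.0, 10.0)
print(step, bubble)
```

Under this model the bubble disappears only when both stages take the same time, which is the motivation for balancing the number of processors assigned to each stage.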
Multi-GPU Pipeline Architecture
Multi-GPU array: simulation (Sim) GPUs write to a FIFO data buffer, which is then read for rendering
Time Step 1, Time Step 2, ..., Time Step n: Sim, Write (W), Read (R)

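The producer/consumer structure around the FIFO data buffer can be illustrated on the CPU. The following is a minimal sketch that uses Python threads and a bounded `queue.Queue` as stand-ins for the simulation side, the rendering side, and the buffer. The function name `run_pipeline`, the buffer capacity, and the trivial "simulation" are assumptions made for illustration; nothing here is GPU code.

```python
import queue
import threading


def run_pipeline(num_steps, buffer_capacity=4):
    """Run a toy simulate -> FIFO buffer -> render pipeline.

    Returns the list of time steps in the order they were rendered.
    """
    fifo = queue.Queue(maxsize=buffer_capacity)  # bounded FIFO data buffer
    rendered = []

    def simulate():
        state = 0
        for step in range(num_steps):
            state += 1               # stand-in for one simulation time step
            fifo.put((step, state))  # blocks while the buffer is full
        fifo.put(None)               # sentinel: no more time steps

    def render():
        while True:
            item = fifo.get()        # blocks while the buffer is empty
            if item is None:
                break
            step, _state = item
            rendered.append(step)    # stand-in for rendering the frame

    producer = threading.Thread(target=simulate)
    consumer = threading.Thread(target=render)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return rendered


print(run_pipeline(5))
```

The two blocking calls are where a stage stalls: a full buffer stalls the writer and an empty buffer stalls the reader.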
Adaptive Load Balancing
Multi-GPU array: Sim GPUs write to the FIFO data buffer, which is then read for rendering
Full buffer: shift toward rendering
Empty buffer: shift toward simulation
Adaptive and distributed scheduling

Task Partition
Intra-frame partition: all GPUs work on the same time step t
Inter-frame partition: each GPU works on a different time step (t, t+1, t+2, t+3)

Task Partition for Visual Simulation
Simulation: intra-frame partition
Rendering: inter-frame partition
Multi-GPU array: Sim GPUs write to the FIFO data buffer, which is then read for rendering

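The two partition schemes can be contrasted in a few lines. In this sketch, an intra-frame partition splits the work items of one time step (for example, simulated bodies) into contiguous ranges, one per GPU, while an inter-frame partition hands whole time steps to GPUs in round-robin order. Both function names and the round-robin policy are illustrative assumptions; the slides do not specify how frames are assigned.

```python
def intra_frame_partition(num_items, num_gpus):
    """Split the items of ONE time step into contiguous ranges, one per GPU."""
    base, extra = divmod(num_items, num_gpus)
    ranges, start = [], 0
    for g in range(num_gpus):
        size = base + (1 if g < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges


def inter_frame_partition(num_frames, num_gpus):
    """Assign WHOLE time steps to GPUs in round-robin order."""
    return {g: list(range(g, num_frames, num_gpus)) for g in range(num_gpus)}


print(intra_frame_partition(10, 3))
print(inter_frame_partition(5, 2))
```

Intra-frame partitioning suits simulation because time step t+1 depends on time step t, whereas rendered frames are independent of each other and can be processed whole.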
Problem: Scheduling Algorithm
Performance model:
  n: the number of assigned GPUs
Schedule to optimize:
  M_i: the number of assigned simulation GPUs

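The performance model itself is not reproduced in this text, so the following is only a sketch of what such a schedule search could look like under an assumed model. It assumes ideal linear scaling: with M simulation GPUs out of n, simulation costs `t_sim / M` per frame and rendering costs `t_render / (n - M)` per frame, and the pipeline runs at the pace of the slower stage. The function name and the cost model are hypothetical.

```python
def best_split(n_gpus, t_sim, t_render):
    """Pick M (simulation GPUs) that minimizes the slower stage's frame time.

    Assumption (not from the slides): ideal linear scaling, so simulation
    costs t_sim / M per frame and rendering costs t_render / (n_gpus - M).
    """
    best_m, best_cost = None, float("inf")
    for m in range(1, n_gpus):  # keep at least one GPU for each stage
        cost = max(t_sim / m, t_render / (n_gpus - m))
        if cost < best_cost:
            best_m, best_cost = m, cost
    return best_m, best_cost


# Hypothetical single-GPU stage times in milliseconds.
print(best_split(4, 30.0, 10.0))
```

An exhaustive search is affordable here because M takes at most n - 1 values.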
Case Study
Application: N-body simulation with ray-traced rendering
Performance model parameters:
  Simulation: number of iterations (i), number of simulated bodies (p)
  Rendering: number of samples for supersampling (s)
Scheduling optimization: M_t = f(i_t, s_t, p_t)

Static Load Balancing
Assumption: the performance parameters do NOT change at run time, so M_t = f(i_t, s_t, p_t) reduces to M = f(i, s, p).
Data-driven modeling approach:
  Sample the 3-dimensional (i, s, p) parameter space on a rigid grid
  Use trilinear interpolation to get the result for new inputs

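The table lookup described above can be sketched as a plain trilinear interpolation. The code assumes that the best M has already been measured at every grid point and stored in a nested list `table[a][b][c]`, with sorted sample positions for the three parameters in `axes`. The function names, the clamping of out-of-range queries, and the data layout are assumptions; in practice the interpolated value would also need rounding to a whole number of GPUs.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (lower index, fractional position) of x within a sorted axis."""
    x = min(max(x, axis[0]), axis[-1])  # clamp to the sampled range
    k = min(bisect_right(axis, x) - 1, len(axis) - 2)
    return k, (x - axis[k]) / (axis[k + 1] - axis[k])


def trilinear(axes, table, point):
    """Interpolate table[a][b][c], sampled on a grid, at point = (i, s, p)."""
    (a, fa), (b, fb), (c, fc) = (_bracket(ax, x) for ax, x in zip(axes, point))
    result = 0.0
    # Weighted sum over the 8 corners of the enclosing grid cell.
    for da, wa in ((0, 1 - fa), (1, fa)):
        for db, wb in ((0, 1 - fb), (1, fb)):
            for dc, wc in ((0, 1 - fc), (1, fc)):
                result += wa * wb * wc * table[a + da][b + db][c + dc]
    return result
```

Each axis needs at least two sample positions. Because the weights along each axis sum to one, the result always lies between the smallest and largest corner values of the enclosing cell.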
Static Load Balancing: Results
Figure panels: performance parameter sampling; load balancing (16 samples, 80 iterations; 4 samples, 80 iterations)

Dynamic Load Balancing
Assumption: performance parameters change during run time.
Find an indirect load-balance indicator:
  Execution time of the previous time step
    Problem: the performance difference between two time steps can be dramatic.
  The fullness of the buffer, F

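A buffer-driven adjustment rule can be sketched as follows. It combines the fullness indicator F with the direction stated on the Adaptive Load Balancing slide: a full buffer shifts GPUs toward rendering, and an empty buffer shifts them toward simulation. The thresholds `low` and `high`, the one-GPU step size, and the function name are hypothetical; the slides state only the direction of the shift.

```python
def adjust_simulation_gpus(m, n_gpus, fullness, low=0.25, high=0.75):
    """Return the new number of simulation GPUs.

    m        -- current number of simulation GPUs
    n_gpus   -- total number of GPUs
    fullness -- buffer fullness F in [0, 1]
    The thresholds are illustrative assumptions, not values from the slides.
    """
    if fullness >= high and m > 1:
        return m - 1  # buffer filling up: hand one GPU to rendering
    if fullness <= low and m < n_gpus - 1:
        return m + 1  # buffer draining: hand one GPU to simulation
    return m          # in between, or at a limit: keep the current split


# Hypothetical reading: 4 GPUs, 3 simulating, buffer 90% full.
print(adjust_simulation_gpus(3, 4, 0.9))
```

Because the buffer accumulates the difference between production and consumption over several time steps, its fullness reacts less abruptly than the execution time of a single previous step.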
Dynamic Load Balancing: Results
Stability of the dynamic scheduling algorithm
Figure panels: no parameter change (only at the beginning); parameters change at the dotted line

Comparison: Dynamic vs. Static Scheduling
Figure panels: 2000 particles; 4000 particles
Performance speedup over static load balancing

Conclusion
+ Pipelining
+ Dynamic load balancing
- Fine-granularity load balancing (SM level)
- Communication overhead
- Programmability: software framework, library

Questions?
Contact information:
  Yong Cao
  Computer Science Department, Virginia Tech
  Email: yongcao@vt.edu
  Website: www.cs.vt.edu/~yongcao