NAMD2: Greater Scalability for Parallel Molecular Dynamics. Presented by Abel Licon

NAMD2: Greater Scalability for Parallel Molecular Dynamics
Laxmikant Kale, Robert Skeel, Milind Bhandarkar, Robert Brunner, Attila Gursoy, Neal Krawetz, James Phillips, Aritomo Shinozaki, Krishnan Varadarajan, and Klaus Schulten
Presented by Abel Licon

Overview
Background
  Scalability and load imbalance
  Other approaches
NAMD2
  Design
  Addressing load imbalance
Results
  Load imbalance
  Performance
  Scalability
Conclusion

Scalability
What does it mean for a program to be scalable?
  More processors = faster turnaround
  Communication creates overhead
  No program scales indefinitely
Isoefficient scalability
  If efficiency can be maintained as processors are added by increasing the size of the problem, the program is said to be isoefficiently scalable.
  Efficiency = Sequential time / (P * Parallel time), where P is the number of processors
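The efficiency formula on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names and the timings in the usage loop are made up for illustration and are not measurements from NAMD2.

```python
def speedup(t_sequential, t_parallel):
    """Speedup = sequential time / parallel time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processors):
    """Efficiency = sequential time / (P * parallel time)."""
    return t_sequential / (processors * t_parallel)


if __name__ == "__main__":
    # Hypothetical timings (seconds) for a fixed problem size.
    t_seq = 100.0
    for p, t_par in [(1, 100.0), (4, 30.0), (16, 10.0)]:
        print(p, speedup(t_seq, t_par), efficiency(t_seq, t_par, p))
```

With a fixed problem size the efficiency in this made-up data drops as P grows, which is the behavior isoefficiency analysis tries to counter by growing the problem along with P.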

Load Imbalance
Processors will not all receive the same number of atoms.
Processors with few atoms finish early and sit idle while waiting for those with many atoms.
The advantage of having many processors is lost.
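To make the cost of imbalance concrete, here is a small sketch that estimates the fraction of processor time spent waiting. It assumes, purely for illustration, that a processor's work per step is proportional to its atom count; real non-bonded cost depends on the number of interacting pairs. The function name is hypothetical.

```python
def idle_fraction(atoms_per_processor):
    """Fraction of total processor time spent idle in one step.

    Assumption: work is proportional to the number of atoms, and every
    processor must wait for the slowest one before the next step.
    """
    slowest = max(atoms_per_processor)
    total_capacity = slowest * len(atoms_per_processor)
    return 1.0 - sum(atoms_per_processor) / total_capacity


if __name__ == "__main__":
    print(idle_fraction([100, 100, 100, 100]))  # perfectly balanced
    print(idle_fraction([400, 100, 100, 200]))  # one overloaded processor
```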

Distributed MD
Replicated Data (RD)
  Every node holds the same data
  OK for small systems
  Communication cost = O(N log P)
Atom Decomposition (AD)
  Atoms are distributed to processors arbitrarily
  A processor may need to communicate with all other processors
  Communication cost = O(N)

Distributed MD (II)
Force Decomposition (FD)
  Force matrix distributed among processors
  Better than RD but still not scalable
  Communication cost = O(N/P^(1/2))
Quantized Spatial Decomposition (QSD)
  Space decomposed into boxes
  Boxes bigger than the cut-off, so each box has 26 neighbors
  Isoefficiently scalable
  Communication cost = O(N/P)
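The communication costs quoted for RD, AD, FD, and QSD can be compared side by side. The sketch below only evaluates the asymptotic expressions with all constant factors set to 1 (an assumption), so the numbers show growth trends, not real message volumes. The function name and the values of N and P are illustrative.

```python
import math


def communication_cost(method, n_atoms, n_procs):
    """Evaluate the asymptotic communication cost with unit constants."""
    costs = {
        "RD": n_atoms * math.log2(n_procs),   # O(N log P)
        "AD": n_atoms,                        # O(N)
        "FD": n_atoms / math.sqrt(n_procs),   # O(N / P^(1/2))
        "QSD": n_atoms / n_procs,             # O(N / P)
    }
    return costs[method]


if __name__ == "__main__":
    n = 100_000  # hypothetical atom count
    for p in (16, 64, 256):
        row = {m: round(communication_cost(m, n, p)) for m in ("RD", "AD", "FD", "QSD")}
        print(p, row)
```

Only the QSD term shrinks in proportion to P, which is why the slides single it out as the scalable choice.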

Challenges in Existing Methods
None of these methods is both scalable and free of load imbalance
Communication can be redundant

Better Solution?
QSD is an attractive solution but has a load imbalance issue.
Need to address both load imbalance and scalability
None of the existing solutions offers both
What can we do?

NAMD2
NAMD2 combines QSD and FD
  QSD is isoefficiently scalable
  FD can help solve the load imbalance problem
Use both spatial and force decomposition:
  Distribute N atoms to P processors for scalability
  Distribute force calculations among processors to balance the load

Design
Use the object-oriented paradigm
  High modularity
  Easier to extend
  Easier to understand
Separate into classes:
  Patches
  Compute objects
  Proxies
  Sequencers

Patches
Box containing the coordinates and forces of its atoms
Linked list of atom neighbors
Dimensions slightly larger than the cut-off
  Updating the list every step is expensive
  A margin is added to reduce the cost of list updates
  Margin = 1.5 Angstroms
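A rough sketch of how atoms might be binned into patches is shown below. It assumes the patch edge length is the cut-off plus the 1.5 Angstrom margin, and it uses a hypothetical 12 Angstrom cut-off; neither the cut-off value nor the function names come from the slides, and NAMD2's actual data structures are not shown here.

```python
import math
from collections import defaultdict

CUTOFF = 12.0                 # Angstroms; hypothetical value for illustration
MARGIN = 1.5                  # Angstroms; margin quoted on the slide
PATCH_SIZE = CUTOFF + MARGIN  # assumption: patch edge = cut-off + margin


def patch_index(position, origin=(0.0, 0.0, 0.0)):
    """Map an (x, y, z) position to the integer index of its patch."""
    return tuple(int(math.floor((x - o) / PATCH_SIZE)) for x, o in zip(position, origin))


def build_patches(positions):
    """Group atom ids by the patch that contains them."""
    patches = defaultdict(list)
    for atom_id, pos in enumerate(positions):
        patches[patch_index(pos)].append(atom_id)
    return dict(patches)


def neighbor_indices(index):
    """Indices of the 26 patches surrounding a patch (no boundary handling)."""
    ix, iy, iz = index
    return [
        (ix + dx, iy + dy, iz + dz)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dz in (-1, 0, 1)
        if (dx, dy, dz) != (0, 0, 0)
    ]
```

Because each patch is at least one cut-off wide, an atom can only interact with atoms in its own patch or in one of those 26 neighbors.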

Compute Objects
Make it easy to add a new algorithm
  To try out a new algorithm, simply extend the class
Handle force computations
  Non-bonded, within the cut-off
  Bonded

Compute Objects (II)
Non-bonded interactions
  Self-compute objects for force calculation within a patch
  Pair-compute objects for force calculation between patches
Bonded interactions
  Common Downstream Method
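The split into self-compute and pair-compute objects can be illustrated by enumerating them for a small grid of patches. This is a sketch under simplifying assumptions: a non-periodic rectangular grid, one self-compute object per patch, and one pair-compute object per pair of adjacent patches. The tuple representation and function name are hypothetical.

```python
from itertools import product


def compute_objects(grid_shape):
    """Enumerate non-bonded compute objects for a non-periodic patch grid."""
    nx, ny, nz = grid_shape
    patches = list(product(range(nx), range(ny), range(nz)))

    # One self-compute object per patch (interactions within the patch).
    self_computes = [("self", p) for p in patches]

    # One pair-compute object per unordered pair of neighboring patches.
    pair_computes = []
    for p in patches:
        for d in product((-1, 0, 1), repeat=3):
            q = tuple(a + b for a, b in zip(p, d))
            in_grid = all(0 <= c < n for c, n in zip(q, grid_shape))
            if in_grid and p < q:  # p < q keeps each pair once and skips p == q
                pair_computes.append(("pair", p, q))
    return self_computes, pair_computes


if __name__ == "__main__":
    selfs, pairs = compute_objects((2, 2, 2))
    print(len(selfs), len(pairs))
```

There are many more pair-compute objects than patches, which gives a load balancer plenty of units of work to move between processors.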

Proxies
Communication could potentially be redundant
  There may be multiple compute objects per processor
  Those compute objects need the same information
Use a proxy object to handle communication
  Cuts communication costs
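The saving from proxies can be sketched by counting messages. In the hypothetical model below, each compute object lives on a processor and needs data from some patches; without proxies every compute object fetches each remote patch itself, while with proxies a remote patch is sent once per processor and shared. The data layout and function names are assumptions for illustration only.

```python
def messages_without_proxies(computes, patch_home):
    """One message per (compute object, remote patch) pair."""
    count = 0
    for processor, patches_needed in computes:
        for patch in patches_needed:
            if patch_home[patch] != processor:
                count += 1
    return count


def messages_with_proxies(computes, patch_home):
    """One message per (remote patch, processor) pair, shared via a proxy."""
    proxies = set()
    for processor, patches_needed in computes:
        for patch in patches_needed:
            if patch_home[patch] != processor:
                proxies.add((patch, processor))
    return len(proxies)


if __name__ == "__main__":
    # Hypothetical layout: patch A lives on processor 0, B and C on processor 1.
    patch_home = {"A": 0, "B": 1, "C": 1}
    # Each entry: (processor hosting the compute object, patches it needs).
    computes = [(0, ["A", "B"]), (0, ["A", "C"]), (0, ["B", "C"]), (1, ["A", "B"])]
    print(messages_without_proxies(computes, patch_home))
    print(messages_with_proxies(computes, patch_home))
```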

Sequencers
Describe the life cycle of a patch
Define the strategy
  Can be thought of as the driver
Again, new strategies can easily be added

NAMD2 -Design Communication

Addressing Load Balancing
Initial load balancing
  Non-bonded self-force compute objects are placed with their native patch
  Bonded compute objects are placed one per node
  Non-bonded pair-force compute objects are placed on upstream processors

Addressing Load Balancing (II)
Dynamically balance the load at runtime
Could make both bonded and non-bonded compute objects migratable
  Migration code complicates things
  The load can be balanced by migrating only the non-bonded compute objects

Addressing Load Balancing (III)
Keep a min-heap of processors
  Processor with the lightest load is next in the heap
Keep a max-heap of migratable objects
  Compute object with the highest cost is next in the heap
Assign compute objects, proxies, and patches, keeping spatial locality in mind.
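The two-heap idea can be sketched with Python's `heapq` module. This is a simplified greedy assignment, not NAMD2's balancer: it repeatedly gives the most expensive remaining object to the least loaded processor and ignores spatial locality, proxies, and patches entirely. The object names and costs are made up.

```python
import heapq


def greedy_assign(object_costs, n_processors):
    """Assign each object to the currently lightest processor, costliest first."""
    # heapq is a min-heap, so costs are negated to get max-heap behavior.
    objects = [(-cost, name) for name, cost in object_costs.items()]
    heapq.heapify(objects)

    # Min-heap of (current load, processor id).
    processors = [(0.0, p) for p in range(n_processors)]
    heapq.heapify(processors)

    assignment = {}
    while objects:
        neg_cost, name = heapq.heappop(objects)   # highest-cost object
        load, proc = heapq.heappop(processors)    # lightest processor
        assignment[name] = proc
        heapq.heappush(processors, (load - neg_cost, proc))

    loads = [0.0] * n_processors
    for load, proc in processors:
        loads[proc] = load
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical measured costs of migratable compute objects.
    costs = {"c1": 7, "c2": 5, "c3": 4, "c4": 3, "c5": 1}
    assignment, loads = greedy_assign(costs, 2)
    print(assignment, loads)
```

A locality-aware version would also prefer processors that already hold the patches or proxies an object needs, which this sketch leaves out.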

Results Load Balancing Results

Results Performance Across Molecules

Results Performance Across Machines

Results Time Step Performance

Results Scalability Results

Conclusion
NAMD2:
  Object-oriented design for easy extensibility
  Combines QSD and FD to produce a scalable, load-balanced program
  Shows that load balancing is feasible with QSD
  Achieved speedups of 120 using 180 processors