NAMD2: Greater Scalability for Parallel Molecular Dynamics
Laxmikant Kale, Robert Skeel, Milind Bhandarkar, Robert Brunner, Attila Gursoy, Neal Krawetz, James Phillips, Aritomo Shinozaki, Krishnan Varadarajan, and Klaus Schulten
Presented by Abel Licon
Overview
- Background
  - Scalability and load imbalance
  - Other approaches
- NAMD2
  - Design
  - Addressing load imbalance
- Results
  - Load imbalance
  - Performance
  - Scalability
- Conclusion
Scalability
- What does it mean for a program to be scalable?
  - More processors = faster turnaround
  - Communication creates overhead
  - No program keeps scaling indefinitely
- Isoefficiently scalable
  - If efficiency can be maintained by increasing the problem size as processors are added, the program is said to be isoefficiently scalable.
  - Efficiency = T_sequential / (P * T_parallel)
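The efficiency definition above can be checked numerically. The following is a minimal sketch; the function names and the timings are hypothetical and chosen only for illustration, not taken from the paper.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup = sequential run time / parallel run time."""
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, processors: int) -> float:
    """Efficiency = sequential run time / (P * parallel run time)."""
    return t_sequential / (processors * t_parallel)


# Hypothetical timings: 100 s sequentially, 2 s on 64 processors.
print(speedup(100.0, 2.0))         # 50.0
print(efficiency(100.0, 2.0, 64))  # 0.78125
```

An efficiency of 1.0 would mean perfect linear speedup; communication overhead and load imbalance push it below that.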
Load Imbalance
- Not all processors will hold the same number of atoms.
- Time is wasted when processors with few atoms finish before those with many atoms and then sit idle.
- The advantage of having many processors is lost.
Distributed MD
- Replicated Data (RD)
  - Every node has the same data
  - OK for small systems
  - Communication cost = O(N log P)
- Atom Decomposition (AD)
  - Atoms are distributed arbitrarily among processors
  - May need to communicate with all processors
  - Communication cost = O(N)
Distributed MD (II)
- Force Decomposition (FD)
  - Force matrix distributed among processors
  - Better than RD but still not scalable
  - Communication cost = O(N/√P)
- Quantized Spatial Decomposition (QSD)
  - Space decomposed into boxes
  - Boxes slightly bigger than the cut-off, so each box interacts only with its 26 neighbors
  - Isoefficiently scalable
  - Communication cost = O(N/P)
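To get a feel for how the four communication costs quoted above grow, the sketch below evaluates each expression with constant factors dropped. Only the relative trend is meaningful. The function name, the choice of base-2 logarithm, and the sample values of N and P are assumptions made for illustration.

```python
import math


def communication_cost(n_atoms: int, processors: int) -> dict[str, float]:
    """Evaluate the big-O expressions for each scheme, ignoring constants."""
    return {
        "RD": n_atoms * math.log2(processors),   # O(N log P)
        "AD": float(n_atoms),                    # O(N)
        "FD": n_atoms / math.sqrt(processors),   # O(N / sqrt(P))
        "QSD": n_atoms / processors,             # O(N / P)
    }


# Illustrative system of 100,000 atoms on a growing number of processors.
for p in (16, 64, 256):
    costs = communication_cost(100_000, p)
    print(p, {name: round(value) for name, value in costs.items()})
```

As P grows, the RD term increases, the AD term stays flat, and only the FD and QSD terms shrink, with QSD shrinking fastest.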
Challenges in Existing Methods
- None of these methods is both scalable and free of load imbalance
- Communication can be redundant
Better Solution?
- QSD is an attractive solution but has a load imbalance issue.
- We need to address both load imbalance and scalability.
- None of the existing solutions offers both.
- What can we do?
NAMD2
- NAMD2 combines QSD and FD
  - QSD is isoefficiently scalable
  - FD can help solve the load imbalance problem
- Use both spatial and force decomposition:
  - Distribute N atoms to P processors for scalability
  - Distribute force calculations among processors to balance the load
Design
- Use the object-oriented paradigm
  - High modularity
  - Easier to extend
  - Easier to understand
- Separate into classes:
  - Patches
  - Compute objects
  - Proxies
  - Sequencers
Patches
- Box containing the coordinates and forces of its atoms
- Linked list of atom neighbors
- Dimensions slightly larger than the cut-off
  - Updating the list every step is expensive
  - A margin is added to reduce the cost of list updates
  - Margin = 1.5 Angstroms
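The patch idea above can be sketched as a simple binning of atom positions into cubic boxes. This is an illustrative sketch, not NAMD2's code: the cut-off value of 12 Angstroms, the choice of patch edge = cut-off + margin, the absence of periodic boundaries, and all function names are assumptions.

```python
import math
from collections import defaultdict
from itertools import product

CUTOFF = 12.0                   # assumed cut-off radius in Angstroms (illustrative)
MARGIN = 1.5                    # margin from the slide, in Angstroms
PATCH_SIZE = CUTOFF + MARGIN    # assumed patch edge length


def patch_index(position):
    """Map an (x, y, z) position to the integer index of its patch."""
    return tuple(math.floor(c / PATCH_SIZE) for c in position)


def assign_to_patches(positions):
    """Group atom ids by the patch that contains them."""
    patches = defaultdict(list)
    for atom_id, position in enumerate(positions):
        patches[patch_index(position)].append(atom_id)
    return patches


def neighbor_patches(index):
    """Return the 26 patch indices surrounding a patch (no periodic wrap)."""
    return [
        tuple(i + d for i, d in zip(index, offset))
        for offset in product((-1, 0, 1), repeat=3)
        if offset != (0, 0, 0)
    ]


positions = [(1.0, 2.0, 3.0), (14.0, 0.5, 0.5), (2.0, 2.0, 2.0)]
patches = assign_to_patches(positions)
print(dict(patches))                         # {(0, 0, 0): [0, 2], (1, 0, 0): [1]}
print(len(neighbor_patches((0, 0, 0))))      # 26
```

Because each patch edge is at least as long as the cut-off, any atom within the cut-off of a given atom lies in the same patch or in one of the 26 surrounding patches.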
Compute Objects
- Make it easy to add a new algorithm
  - To try out a new algorithm, simply extend the class
- Handle force computations
  - Non-bonded (within cut-off)
  - Bonded
Compute Objects (II)
- Non-bonded interactions
  - Self-compute objects for force calculations within a patch
  - Pair-compute objects for force calculations between patches
- Bonded interactions
  - Common downstream method
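The split into self and pair compute objects described above can be sketched with a small class hierarchy. The class and function names here are hypothetical and do not reproduce NAMD2's actual classes; the sketch also assumes non-periodic boundaries and creates one pair object per pair of adjacent patches.

```python
from itertools import combinations


class Compute:
    """Base class; a new force algorithm would be added by subclassing."""

    def __init__(self, *patches):
        self.patches = patches

    def describe(self):
        raise NotImplementedError


class SelfCompute(Compute):
    """Non-bonded forces among atoms inside a single patch."""

    def describe(self):
        return f"self {self.patches[0]}"


class PairCompute(Compute):
    """Non-bonded forces between atoms of two neighboring patches."""

    def describe(self):
        return f"pair {self.patches[0]} {self.patches[1]}"


def are_neighbors(a, b):
    """Two distinct patches are neighbors if every index differs by at most 1."""
    return a != b and all(abs(x - y) <= 1 for x, y in zip(a, b))


def build_nonbonded_computes(patch_indices):
    """One self object per patch plus one pair object per neighboring pair."""
    computes = [SelfCompute(p) for p in patch_indices]
    computes += [
        PairCompute(a, b)
        for a, b in combinations(patch_indices, 2)
        if are_neighbors(a, b)
    ]
    return computes


# Three patches in a row: 3 self objects and 2 pair objects.
row = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
for compute in build_nonbonded_computes(row):
    print(compute.describe())
```

Because there are many more compute objects than patches, they give a load balancer fine-grained units of work to move between processors.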
Proxies
- Communication could be redundant
  - There may be multiple compute objects per processor
  - These compute objects need the same information
- Use a proxy object to handle communication
  - Cuts communication costs
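The saving from a proxy can be illustrated with a toy model in which a message counter stands in for network traffic. All class and method names below are hypothetical; the sketch only shows the caching idea, where several compute objects on one processor share a single copy of a remote patch's data.

```python
class Patch:
    """Home patch that owns the atom coordinates (toy model)."""

    def __init__(self, coordinates):
        self.coordinates = coordinates
        self.messages_sent = 0  # stands in for network messages

    def send_coordinates(self):
        self.messages_sent += 1
        return list(self.coordinates)


class ProxyPatch:
    """Local stand-in for a remote patch; fetches data once per time step."""

    def __init__(self, home_patch):
        self.home_patch = home_patch
        self._cached = None

    def coordinates(self):
        if self._cached is None:
            self._cached = self.home_patch.send_coordinates()
        return self._cached

    def end_of_step(self):
        """Discard the cached copy so the next step fetches fresh data."""
        self._cached = None


home = Patch([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
proxy = ProxyPatch(home)

# Three compute objects on the same processor all read through the proxy.
for _ in range(3):
    proxy.coordinates()

print(home.messages_sent)  # 1
```

Without the proxy, each of the three compute objects would have requested the coordinates separately, tripling the message count in this toy setup.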
Sequencers
- Describe the life cycle of a patch
- Define the strategy
- Can be thought of as the driver
- Again, new strategies can be added easily
NAMD2 -Design Communication
Addressing Load Balancing
- Initial load balancing
  - Non-bonded self-force compute objects are placed with their native patch
  - Bonded compute objects are placed one per node
  - Non-bonded pair-force compute objects are placed on upstream processors
Addressing Load Balancing (II)
- Dynamically balance the load at runtime
  - Could make both bonded and non-bonded compute objects migratable
  - Migration code complicates things
  - The load can be balanced by migrating only the non-bonded compute objects
Addressing Load Balancing (III)
- Keep a min-heap of processors
  - Processor with the lightest load is next in the heap
- Keep a max-heap of migratable objects
  - Compute object with the highest cost is next in the heap
- Assign compute objects, proxies, and patches with spatial locality in mind
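The two-heap scheme above can be sketched as a greedy assignment: repeatedly take the most expensive migratable object and give it to the currently lightest processor. This sketch deliberately ignores the spatial-locality criterion mentioned on the slide, and the function name, the input format, and the example costs are assumptions for illustration.

```python
import heapq


def greedy_balance(compute_costs, background_loads):
    """Assign each compute object to the least-loaded processor.

    compute_costs: {compute_id: cost} for migratable objects.
    background_loads: per-processor load from non-migratable work.
    Returns (assignment, final_loads).
    """
    # Min-heap of (load, processor): the lightest processor is popped first.
    processors = [(float(load), proc) for proc, load in enumerate(background_loads)]
    heapq.heapify(processors)

    # heapq only provides a min-heap, so costs are negated to get a max-heap.
    computes = [(-cost, cid) for cid, cost in compute_costs.items()]
    heapq.heapify(computes)

    assignment = {}
    while computes:
        neg_cost, cid = heapq.heappop(computes)   # most expensive object
        load, proc = heapq.heappop(processors)    # lightest processor
        assignment[cid] = proc
        heapq.heappush(processors, (load - neg_cost, proc))

    final_loads = [0.0] * len(background_loads)
    for load, proc in processors:
        final_loads[proc] = load
    return assignment, final_loads


assignment, loads = greedy_balance({"a": 5, "b": 4, "c": 3, "d": 2}, [1.0, 0.0])
print(assignment)  # {'a': 1, 'b': 0, 'c': 0, 'd': 1}
print(loads)       # [8.0, 7.0]
```

Placing the largest objects first leaves the small ones to even out the remaining differences, which is why the objects are drawn from a max-heap.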
Results Load Balancing Results
Results Performance Across Molecules
Results Performance Across Machines
Results Time Step Performance
Results Scalability Results
Conclusion
- NAMD2:
  - Object-oriented design for easy extensibility
  - Combines QSD and FD to produce a scalable, load-balanced program
  - Shows that load balancing is feasible with QSD
  - Achieved speedups of 120 using 180 processors