NAMD2: Greater Scalability for Parallel Molecular Dynamics. Presented by Abel Licon

NAMD2: Greater Scalability for Parallel Molecular Dynamics. Laxmikant Kalé, Robert Skeel, Milind Bhandarkar, Robert Brunner, Attila Gursoy, Neal Krawetz, James Phillips, Aritomo Shinozaki, Krishnan Varadarajan, and Klaus Schulten. Presented by Abel Licon

Overview: Background (scalability and load imbalance; other approaches). NAMD2 (design; addressing load imbalance). Results (load imbalance; performance; scalability). Conclusion.

Scalability What does it mean for a program to be scalable? More processors should mean faster turnaround, but communication creates overhead, so no program is continuously scalable. Isoefficient scalability: if we can retain efficiency by increasing the problem size along with the processor count, the program is said to be isoefficient. Efficiency = T_sequential / (P * T_parallel). Background - Scalability
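The efficiency formula above can be sketched directly. A minimal illustration with hypothetical timings (the 120x-on-180-processors figure echoes the speedup quoted in the conclusion of the talk):

```python
def speedup(t_seq, t_par):
    """Speedup = T_sequential / T_parallel."""
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    """Efficiency = T_sequential / (P * T_parallel), as on the slide."""
    return t_seq / (p * t_par)

# Hypothetical: 120 s sequentially, 1.0 s on 180 processors.
# Speedup is 120, but efficiency is only 120/180, about 0.67 --
# the gap is the overhead that limits continuous scaling.
print(speedup(120.0, 1.0))
print(efficiency(120.0, 1.0, 180))
```

Isoefficiency asks: as P grows, how fast must N grow so that this efficiency value stays constant?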

Load Imbalance Not all processors will have the same distribution of atoms. Time is wasted when processors with few atoms finish before those with many, and we lose the advantage of having many processors. Background Load Imbalance

Distributed MD Replicated Data (RD): every node has a copy of all the data; OK for small systems; communication cost = O(N log P). Atom Decomposition (AD): arbitrarily distribute atoms to processors; each processor may need to communicate with all the others; communication cost = O(N). Background Other Approaches

Distributed MD (II) Force Decomposition (FD): the force matrix is distributed among processors; better than RD but still not scalable; communication cost = O(N/√P). Quantized Spatial Decomposition (QSD): space is decomposed into boxes slightly bigger than the cutoff (26 neighbors per box); isoefficiently scalable; communication cost = O(N/P). Background Other Approaches
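The four asymptotic communication costs quoted on these two slides can be compared side by side. A minimal sketch (constants omitted, numbers purely illustrative):

```python
import math

def comm_cost(method, n, p):
    """Asymptotic per-step communication volume from the slides
    (constant factors dropped; illustrative only)."""
    return {
        "RD":  n * math.log2(p),    # Replicated Data: O(N log P)
        "AD":  n,                   # Atom Decomposition: O(N)
        "FD":  n / math.sqrt(p),    # Force Decomposition: O(N / sqrt(P))
        "QSD": n / p,               # Spatial Decomposition: O(N / P)
    }[method]

n, p = 100_000, 256
for method in ("RD", "AD", "FD", "QSD"):
    print(method, comm_cost(method, n, p))
```

For any fixed N, QSD's cost shrinks fastest as P grows, which is why it is the only method of the four whose communication-to-computation ratio stays bounded.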

Challenges in Existing Methods None of these methods is both scalable and free of load imbalance, and communication can be redundant. Background

Better Solution? QSD is an attractive approach but has a load imbalance issue. We need to address both load imbalance and scalability, and none of the existing solutions offers both. What can we do? Background

NAMD2 NAMD2 combines QSD and FD: QSD is isoefficiently scalable, and FD helps solve the load imbalance problem. Use both spatial and force decomposition: distribute N atoms to P processors for scalability, and distribute force calculations among processors to balance the load. NAMD2

Design Use an object-oriented paradigm: high modularity, easier to extend, easier to understand. Separate into classes: Patches, Compute objects, Proxies, Sequencers. NAMD2 -Design

Patches A patch is a box containing the coordinates and forces of its atoms, with a linked list of atom neighbors. Its dimensions are slightly larger than the cutoff: since updating the neighbor list every step is expensive, a margin of 1.5 Angstroms is added to optimize list updates. NAMD2 -Design
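The patch idea above can be sketched as a simple spatial hash. A minimal illustration, not NAMD2's actual data structure: the 12 Å cutoff is an assumed value (the slide gives only the 1.5 Å margin), and all names are hypothetical.

```python
from collections import defaultdict

CUTOFF = 12.0   # Angstroms -- assumed value for illustration
MARGIN = 1.5    # margin from the slide
SIDE = CUTOFF + MARGIN   # patch side length: slightly larger than cutoff

def patch_index(pos):
    """Map a 3-D coordinate (in Angstroms) to its patch's integer index."""
    return tuple(int(c // SIDE) for c in pos)

def build_patches(atoms):
    """Group atom indices by the patch their coordinates fall into."""
    patches = defaultdict(list)
    for i, pos in enumerate(atoms):
        patches[patch_index(pos)].append(i)
    return patches

atoms = [(1.0, 2.0, 3.0), (14.0, 2.0, 3.0), (1.5, 2.5, 3.5)]
print(build_patches(atoms))
# atoms 0 and 2 land in patch (0, 0, 0); atom 1 lands in (1, 0, 0)
```

Because the patch side exceeds the cutoff, any atom's interaction partners lie in its own patch or one of the 26 neighboring patches, which is exactly the property the QSD slide relies on.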

Compute Objects Handle the force computations: non-bonded within the cutoff, and bonded. To try out a new algorithm, simply extend the class; this makes adding new algorithms easy. NAMD2 -Design

Compute Objects (II) Non-Bonded Interactions Self-Compute Objects for within patch force calculation Pair-Compute Object for between patch force calculation Bonded Interactions Common Downstream Method NAMD2 -Design

Proxies Communication could potentially be redundant: there may be multiple compute objects per processor that need the same information. A proxy object handles the communication on their behalf, cutting communication costs. NAMD2 -Design

Sequencers A sequencer describes the life cycle of a patch and defines the simulation strategy; you can think of it as the driver. Again, new strategies can easily be added. NAMD2 -Design

NAMD2 -Design Communication

Addressing Load Balancing Initial load balancing: non-bonded self-force compute objects are placed with their native patch, bonded compute objects are placed one per node, and non-bonded pair-force objects are placed on upstream processors. NAMD2 -Addressing Load Imbalance

Addressing Load Balancing (II) Dynamically balance the load at runtime. We could make both bonded and non-bonded compute objects migratable, but migration code complicates things; we can balance the load using only the non-bonded compute objects. NAMD2 -Addressing Load Imbalance

Addressing Load Balancing (III) Keep a min-heap of processors, so the processor with the lightest load is next in the heap, and a max-heap of migratable objects, so the compute object with the highest cost is next. Assign compute objects, proxies, and patches keeping spatial locality in mind. NAMD2 -Addressing Load Imbalance
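The two-heap greedy strategy above can be sketched in a few lines. A minimal illustration only: real NAMD2 also weighs spatial locality and proxy placement, which this sketch ignores, and the names and costs are hypothetical.

```python
import heapq

def greedy_balance(object_costs, num_procs):
    """Assign each migratable object (costliest first) to the currently
    least-loaded processor, using a min-heap of processors and a
    max-heap (negated costs) of objects."""
    proc_heap = [(0.0, p) for p in range(num_procs)]   # (load, proc id)
    heapq.heapify(proc_heap)
    obj_heap = [(-cost, obj) for obj, cost in object_costs.items()]
    heapq.heapify(obj_heap)                            # max-heap via negation

    assignment = {}
    while obj_heap:
        neg_cost, obj = heapq.heappop(obj_heap)        # costliest object
        load, p = heapq.heappop(proc_heap)             # lightest processor
        assignment[obj] = p
        heapq.heappush(proc_heap, (load - neg_cost, p))  # add object's cost
    return assignment

costs = {"A": 5.0, "B": 3.0, "C": 2.0, "D": 1.0}
print(greedy_balance(costs, 2))
# A goes to processor 0, B and C to processor 1, D back to 0:
# final loads 6.0 and 5.0
```

Placing the largest items first is what keeps the final loads close: the small objects at the end fill in whatever gap the big ones left.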

Results Load Balancing Results

Results Performance Across Molecules

Results Performance Across Machines

Results Time Step Performance

Results Scalability Results

Conclusion NAMD2: an object-oriented design for easy extensibility that combines QSD and FD to obtain a scalable, load-balanced program. Shown that load balancing is feasible with QSD. Achieved speedups of 120 using 180 processors. Conclusion