Parallel Programming and Scientific Computing. Benjamin Madej PhD Student Walker Molecular Dynamics Lab. Benjamin Madej (C)


1 Parallel Programming and Scientific Computing Benjamin Madej PhD Student Walker Molecular Dynamics Lab

2 The Walker Molecular Dynamics Lab, March 2013 wmd-lab.org

3 wmd-lab.org

4 Overview 1. Scientific computing 2. Parallel programming 3. My experience: pmemd 4. Parallella

5 Scientific computing has many important applications, including computational chemistry. [Images: SDSC Gordon supercomputer (sdsc.edu); cellular membrane simulation]

6 Computational chemists win the Nobel Prize in Chemistry, 2013. [Image: nobelprize.org]

7 Molecular dynamics is a type of computational chemistry simulation that models the motions of atoms. [Figure labels: nucleic acids, proteins, lipids; AMBER Hamiltonian describing the motion of the atoms]

8 Molecular dynamics in motion

9 AMBER Molecular Dynamics ambermd.org

10 Amber software is developed in collaboration with many research groups around the world. [Photo: Amber Developers' Meeting, January 2012 (ambermd.org)]

11 Overview 1. Scientific computing 2. Parallel programming 3. My experience: pmemd 4. Parallella

12 How much faster is a parallel program? A certain portion of a program will likely always be serial, which caps the achievable speedup for a fixed problem size (Amdahl's law). However, when the problem size grows along with the number of processors, the achievable speedup grows roughly in proportion to the processor count (Gustafson's law). [Figure: wikipedia.org]
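The two laws on this slide can be written as short formulas. The sketch below is only an illustration: the function names and the 95% parallel fraction are assumptions chosen for the example, not values from the talk.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup for a fixed problem size (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never shrinks; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction: float, processors: int) -> float:
    """Scaled speedup when the problem grows with the machine (Gustafson's law)."""
    serial_fraction = 1.0 - parallel_fraction
    # The parallel workload is assumed to scale with the processor count.
    return serial_fraction + parallel_fraction * processors


if __name__ == "__main__":
    # Hypothetical program that is 95% parallelizable.
    for n in (1, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2), round(gustafson_speedup(0.95, n), 2))
```

With a 95% parallel fraction, the Amdahl speedup can never exceed 1 / 0.05 = 20 no matter how many processors are added, while the Gustafson estimate keeps growing because the problem itself is assumed to grow.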

13 Why is parallel programming being used now? Parallel hardware has been around a long time. It is no longer practical to keep increasing core frequency because of energy and heat limitations. One solution is to increase the number of cores. Multi-core processors and graphics processing units (GPUs) are widely available at low cost. Programming languages and frameworks are now available to write code for these processors. [Figure labels: increase frequency; increase core count]


14 Current popular parallel architectures and hardware. Multi-core computers: AMD, Intel, Intel MIC. Cluster computers: clusters and supercomputers. General-purpose computing on graphics processing units (GPGPU): NVIDIA, AMD. [Images: engadget.com, intel.com, nvidia.com]

15 Parallel programming paradigms: data and task parallelism; data dependencies; synchronization and race conditions; communication and message passing. [Figure labels: threads, time, sync]
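As a minimal sketch of data parallelism and synchronization, the example below splits a list across threads and protects the shared total with a lock. It uses Python's standard threading module purely for illustration; the function name parallel_sum and the thread count are assumptions, and this is not how pmemd is written.

```python
import threading


def parallel_sum(values, num_threads=4):
    """Data parallelism: each thread sums its own slice of the data."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)   # independent work, no shared state touched
        with lock:             # synchronization: one thread updates at a time
            total += partial   # without the lock this update could race

    # Strided slices give every thread a disjoint part of the input.
    chunks = [values[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait for all workers before reading the total
    return total
```

In CPython the global interpreter lock means these threads will not speed up CPU-bound work; the example only shows the structure of splitting data, doing independent work, and synchronizing on the shared result.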

16 Commonly used parallel programming methods: MPI (Message Passing Interface); OpenMP (Open Multi-Processing); POSIX Threads. GPU parallel programming methods: OpenACC; CUDA C/C++/Fortran (Compute Unified Device Architecture); CUDA Accelerated Libraries; OpenCL.

17 MPI (Message Passing Interface) A widely used distributed memory, message-passing system. An API is available for writing C or Fortran programs.
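The message-passing style can be imitated inside one process. The sketch below is not MPI: it uses Python threads and queues as stand-ins for ranks and for send/receive calls such as MPI_Send and MPI_Recv, and every name in it is hypothetical. The point is only that workers exchange explicit messages instead of sharing arrays.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """One 'rank': receive a chunk, compute on it, send the result back."""
    data = inbox.get()                             # blocking receive
    outbox.put((rank, sum(x * x for x in data)))   # send partial result


def scatter_and_reduce(values, num_workers=3):
    """Rank 0 scatters chunks, then reduces the partial sums of squares."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, inboxes[r], results))
        for r in range(num_workers)
    ]
    for t in threads:
        t.start()
    for r in range(num_workers):
        inboxes[r].put(values[r::num_workers])     # explicit send to rank r
    total = sum(results.get()[1] for _ in range(num_workers))
    for t in threads:
        t.join()
    return total
```

For example, scatter_and_reduce([1, 2, 3, 4]) sends [1, 4], [2], and [3] to three workers and combines their partial sums of squares into 30.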

18 OpenMP (Open Multi-Processing): a framework for shared-memory multiprocessing. It is a portable API that implements multithreading in the code, most commonly through compiler directives. POSIX Threads: a C library for shared-memory multithreading.


19 GPU parallel programming methods. OpenACC: an API for parallel computing on heterogeneous CPU/GPU hardware; similar to OpenMP, but with GPU support. CUDA C/C++/Fortran (Compute Unified Device Architecture): a platform for parallel programming on NVIDIA GPUs; extensions to programming languages; CUDA Accelerated Libraries. OpenCL: a framework for programming on heterogeneous processors. [Figure: GPU architecture, streaming multiprocessors (SM), nvidia.com]

20 Overview 1. Scientific computing 2. Parallel programming 3. My experience: pmemd 4. Parallella

21 Amber Molecular Dynamics and Parallel Architectures. Pmemd: the main molecular dynamics code included in Amber. An inherently serial code (Markov chain). [...] lines of code. Fortran 90, MPI, CUDA C. [Figure label: pmemd output]

22 Why does pmemd need accelerated hardware and code? Before optimization, simulation length was limited to picoseconds. However, biomolecular events often take place on time scales of microseconds to seconds. Molecular dynamics is computationally expensive. Some of the most expensive portions of the code are the force evaluations, specifically for nonbonded atomic interactions. This leads to theoretical and algorithmic choices.

23 Questions and decisions about optimization for pmemd. Who are the users of the code? Is it necessary to optimize the code? How much of the code can be split into independent parts? What hardware and software are available to use? What performance is necessary? [Diagram labels: Method 1, Method 2, Method 3, Software, Hardware, Developers, Supercomputing, Performance, Users]

24 Parallel coding. It's fairly easy to write parallel code, but there's no guarantee it will be faster. Usually it takes a lot of optimization to realize performance gains. Knowledge of the hardware is key. Profiling and benchmarking are essential for measuring results. Are these software tools available? [Image: nvidia.com]
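Benchmarking can start very simply. The helper below is a hedged sketch using only Python's standard library: it times a function several times and keeps the best run, which reduces noise from other processes. The name benchmark and the repeat count are assumptions for illustration, not tools mentioned in the talk.

```python
import time


def benchmark(func, *args, repeats=5):
    """Return (best wall-clock time in seconds, result of the last call)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    # Compare two ways of computing the same sum of squares.
    n = 200_000
    t_loop, r_loop = benchmark(lambda: sum(i * i for i in range(n)))
    t_formula, r_formula = benchmark(lambda: (n - 1) * n * (2 * n - 1) // 6)
    assert r_loop == r_formula   # always check the results agree first
    print(f"loop: {t_loop:.6f} s, closed form: {t_formula:.6f} s")
```

Checking that the optimized version returns the same answer before comparing timings matters as much as the timing itself.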

25 MPI. A bit of Amber history: Amber 10 (http://ambermd.org/amber10.bench1.html). Pmemd uses MPI for transfer of data between cores and nodes. It scales across cores and nodes. Pmemd used a spatial decomposition for evaluating forces in the system. However, communication costs overwhelm performance gains at a certain number of cores and nodes.
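A spatial decomposition assigns each atom to a region of the simulation box so that different cores can own different regions. The sketch below is a generic illustration of that idea with a uniform grid of cells; it is not pmemd's actual scheme, and the function name, the cubic box, and the cell counts are all assumptions.

```python
def decompose(positions, box_length, cells_per_side):
    """Map each atom index to the grid cell (domain) that contains it.

    Assumes a cubic box with coordinates in the range [0, box_length].
    """
    cell_size = box_length / cells_per_side
    domains = {}
    for index, (x, y, z) in enumerate(positions):
        # Clamp so an atom sitting exactly on the upper edge stays in the box.
        cell = tuple(
            min(int(c / cell_size), cells_per_side - 1) for c in (x, y, z)
        )
        domains.setdefault(cell, []).append(index)
    return domains


if __name__ == "__main__":
    atoms = [(0.5, 0.5, 0.5), (9.5, 9.5, 9.5), (0.1, 0.2, 0.3)]
    print(decompose(atoms, box_length=10.0, cells_per_side=2))
```

Each cell could then be handed to a different core. Atoms near a cell boundary interact with atoms owned by neighboring cells, which is one source of the communication cost mentioned on the slide.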

26 CUDA. Portions of the pmemd code were rewritten as CUDA C kernels that run on the GPU. The most significant portion was the new nonbonded electrostatics kernels. This includes spatial and temporal decomposition of the N^2 (all-pairs) problem. [Image: nvidia.com]
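To see why nonbonded interactions form an N^2 problem, consider a direct sum over every pair of charges. The sketch below is a plain-Python illustration in reduced units (the Coulomb constant is set to 1); it is not the CUDA kernel described on the slide, and the function name is an assumption.

```python
import math


def coulomb_energy(positions, charges):
    """Direct-sum electrostatic energy over all unique pairs of atoms."""
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):      # N * (N - 1) / 2 pair evaluations
            r = math.dist(positions[i], positions[j])
            energy += charges[i] * charges[j] / r
    return energy


if __name__ == "__main__":
    # Two opposite unit charges separated by a distance of 2.
    print(coulomb_energy([(0, 0, 0), (2, 0, 0)], [1.0, -1.0]))
```

Doubling the number of atoms roughly quadruples the number of pair evaluations. Each pair term is independent of the others, which is what makes this loop a natural target for GPU parallelization.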

27 CUDA progress. Pmemd now includes a multi-GPU implementation. The precision model was revised to deal with single- and double-precision performance issues on NVIDIA GPUs.

28 Costs of optimization of pmemd. Hardware: NVIDIA GPUs and CPUs are now readily available in many machines. Software: drivers and development tools are available. People: someone actually has to write the code. Related costs: the hardware has to be stored somewhere and the electricity costs have to be paid. [Images: nvidia.com, Rosevillechamber.com]

29 Returns of CUDA optimization Performance is greater than that achieved on CPU-based US supercomputers.

30 Returns of CUDA optimization. The scientific paradigm of simulations is changing because simulations are just as fast on consumer hardware. The hardware is accessible to everyone: laboratories, researchers, students. GPU simulations are more energy efficient than CPU simulations, as measured in performance per watt.

31 Problems with CUDA optimization. It is difficult to maintain the code. It is still an experimental code with an added layer of complexity. It requires updates to take advantage of future hardware. [Image: Webdesignerdepot.com]

32 Overview 1. Scientific computing 2. Parallel programming 3. My experience: pmemd 4. Parallella

33 Pmemd on ARM architecture. Compiled serial and GPU versions using gcc/nvcc ARM compilers. Platforms: Raspberry Pi, NVIDIA KAYLA, NVIDIA CARMA. [Images: raspberrypi.org, nvidia.com]

34 Parallella: Zynq-7000 Series dual-core ARM A9 CPU (Z-7010 or Z-7020); 16- or 64-core Epiphany Multicore Accelerator; 1 GB RAM; MicroSD card; 2x USB; general purpose expansion connectors; 10/100/1000 Ethernet; HDMI port; ships with Ubuntu OS; 3.4 x 2.15 inch form factor.


35 Parallella and parallel programming. Epiphany SDK: GCC, GDB; libraries to communicate with the Epiphany coprocessor; host-device structure, in which device programs are loaded onto each core. OpenCL: COPRTHR SDK from Brown Deer Technology, available as a release candidate.

36 Parallella features. Open hardware and drivers make it a community-focused project. It has great potential educational impact by making parallel computing more affordable. Parallella has low power requirements.

37 Information about Parallella: http://www.parallella. Hardware is not currently shipping (Nov. 2013).

38 Parallel hardware is fundamentally changing the way that programs can be developed and run, especially in scientific computing.
