Microscopic and mesoscopic simulations of amorphous systems using LAMMPS and GPU-based algorithms




Microscopic and mesoscopic simulations of amorphous systems using LAMMPS and GPU-based algorithms
E. Ferrero and F. Puosi
Laboratoire Interdisciplinaire de Physique (LIPhy), UJF and CNRS
14/5/2014, CIMENT Users' Day (Journée des utilisateurs CIMENT)

Amorphous materials
Hard glasses: metallic/oxide glasses, polymers (length scale ~1 nm). Soft glasses: colloids, foams (length scale ~0.1 μm).
Particles are arranged in a disordered way, like in liquids, but they are jammed, like in solids. The flow of amorphous media is an outstanding question in rheology and plasticity.

Length scales in the deformation of amorphous systems (Rodney et al., MSMSE (2011))
Microscopic scale (1st part, by F. Puosi): the scale of individual objects (droplets, colloids, particles). Dynamics is extremely complex and numerically very time-consuming at this scale.
Mesoscopic scale (2nd part, by E. Ferrero): the scale where the elementary local plastic events that constitute the flow of amorphous materials occur.
Macroscopic scale: the scale relevant to engineering problems; phenomenological laws describe the flow behavior.

Molecular Dynamics (MD) basics
- N particles (atoms, groups of atoms, meso-particles) in a box (rigid walls, periodic boundary conditions).
- All the physics is in the energy potential (forces): pair-wise interactions (LJ, Coulombic), many-body forces (EAM, Tersoff, REBO), molecular forces (springs, torsions).
- Integrate Newton's equations of motion in the velocity-Verlet formulation: update V by 1/2 step (using F); update X (using V); build neighbor lists (occasionally); compute F (using X); apply constraints & boundary conditions; update V by 1/2 step (using the new F). A minimal sketch is given below.
- CPU time break-down: inter-particle forces = 80%, neighbor lists = 15%, everything else = 5%.
- Properties are obtained via time-averaging over ensemble configurations.
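
A minimal sketch of one such velocity-Verlet time-step in plain C; the helper functions compute_forces() and apply_constraints_and_bc() are hypothetical placeholders, not code from the talk:

/* Forces from the potential; neighbor lists are rebuilt occasionally. */
void compute_forces(const double *x, double *f, int n);
/* Constraints and boundary conditions (e.g. periodic wrapping). */
void apply_constraints_and_bc(double *x, int n);

/* One velocity-Verlet time-step for n degrees of freedom of mass m. */
void timestep(double *x, double *v, double *f, double m, double dt, int n)
{
    for (int i = 0; i < n; i++) v[i] += 0.5 * dt * f[i] / m; /* update V by 1/2 step (using F) */
    for (int i = 0; i < n; i++) x[i] += dt * v[i];           /* update X (using V)            */
    apply_constraints_and_bc(x, n);
    compute_forces(x, f, n);                                 /* compute F (using new X)       */
    for (int i = 0; i < n; i++) v[i] += 0.5 * dt * f[i] / m; /* update V by 1/2 step (new F)  */
}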

What is LAMMPS
Large-scale Atomic/Molecular Massively Parallel Simulator, http://lammps.sandia.gov
- Classical MD code.
- Open source (GPL), highly portable C++.
- Simulations at varying length and time scales: electrons, atomistic, coarse-grained, continuum.
- Spatial decomposition of the simulation domain for parallelism.
- GPU and OpenMP enhanced.
- Can be coupled to other scales: QM, kMC, FE.

LAMMPS on Froggy
Benchmark: atomic fluid, 32,000 atoms for 100 time-steps; reduced density 0.8442 (liquid); force cutoff 2.5 sigma (~2.5 diameters); neighbor skin 0.3 sigma (~55 neighbors per atom); NVE time integration.
[Figure: parallel efficiency (%) vs. number of processors, comparing FROGGY with a Cray XT5, a Cray XT3, a dual hex-core Xeon, an IBM BG/L, an IBM p690+ and a Xeon/Myrinet cluster.]

GPU acceleration in LAMMPS: the USER-CUDA package
- GPU version of pair styles, fixes and computes.
- An entire LAMMPS calculation can run on the GPU for many time steps.
- Only one CPU core is used per GPU.
- Better speed-up if the number of atoms per GPU is large.
[Figure: millions of atom-timesteps per second vs. number of atoms (log scale) for the atomic-fluid benchmark: 1 GPU (mixed precision), 2 GPUs (mixed precision) and CPU (1 core).]

GLASSDEF project
Microscopic aspects:
- Parallel Replica Dynamics in glasses: a method to extend the timescales accessible to Molecular Dynamics simulation (FP, D. Rodney and J.-L. Barrat).
- Plasticity in 2D sheared amorphous solids: investigation of the spatio-temporal correlations of plastic activity in athermal amorphous solids (FP, A. Nicolas, J. Rottler and J.-L. Barrat).
- Microscopic test of a mean-field model for the flow of amorphous systems (FP, J. Olivier, K. Martens and J.-L. Barrat).
- Avalanches in amorphous systems: scaling description of avalanches in the shear flow of athermal amorphous solids at finite shear rates (FP, E. Ferrero, C. Liu, L. Marradi, K. Martens, A. Nicolas and J.-L. Barrat).

Elementary plastic events
At low temperature, the onset of plastic deformation in glasses is due to the accumulation of elementary plastic events: atomic rearrangements localized in space and time, involving only a few tens of atoms, the so-called Shear Transformations (STs). Tanguy et al., EPJE (2006).
Open question: how does the elastic response to a ST build up in time?

Fictitious STs in amorphous solids
2D binary mixture of athermal Lennard-Jones particles. We apply an instantaneous local shear transformation and observe the response of the system. Disorder average of the response: average over different spatial realizations of the ST.
[Figure: maps of the system's response at a sequence of increasing times after the ST.]
F. Puosi, J. Rottler and J.-L. Barrat, PRE (2014)

Comparison with Continuum Elasticity Theory
Equilibrium response: Eshelby inclusion problem (simulations vs. theory).
Transient regime: solution of a diffusion equation for the displacement field in a homogeneous medium in the presence of a perturbation due to a set of two force dipoles at the origin.
[Figure: ⟨u_r(r, θ = π/4, t)⟩ vs. r on log-log axes at a sequence of increasing times t, comparing simulations and theory.]
F. Puosi, J. Rottler and J.-L. Barrat, PRE (2014)

... a mesoscopic approximation
- A 2D scalar field for the local stress.
- The elastic propagator is local in Fourier space.
- Plastic strain change: case of thermally activated events, with plastic activation and elastic recovery.
- Plus (optionally) independent tracers moving according to the resulting deformation fields.

Workflow (physical protocol), with the GPU implementation of each step noted in the comments:

InitializeEverything();                    /* CUDA kernels, cudaMemset   */
for (i = 0; i < t_max; i++) {
    UpdateStateVariablesActivated();       /* CUDA kernel                */
    ComputePlasticStrainChange();          /* CUDA kernel or Thrust call */
    TransformToFourierSpace();             /* cuFFT                      */
    Convolution();                         /* CUDA kernel                */
#ifdef TRACERS
    CalculateDisplacementField();          /* CUDA kernel or Thrust call */
#endif
    AntitransformFromFourierSpace();       /* cuFFT                      */
    EulerIntegrationStep();                /* CUDA kernel                */
#ifdef TRACERS
    UpdateTracersPositions();              /* CUDA kernel                */
#endif
    if (condition(i)) {
        DoSomeMeasurementsAndAccumulate(); /* Thrust calls + C host code */
    }
}
PrintResults();                            /* C host code                */

CUDA basic kernel example: the Euler integration step (kernel definition and kernel call; the right-hand-side field is pre-computed in Fourier space).
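
The slide shows this code as a screenshot that is not reproduced in the transcription; what follows is only a sketch of what such a kernel and its call look like, with illustrative names (euler_step, d_sigma, d_rhs) that are not from the talk:

/* Explicit Euler update of the local stress field: sigma += dt * rhs, where
 * rhs holds the convolution with the elastic propagator, pre-computed in
 * Fourier space. One thread per lattice site. */
__global__ void euler_step(float *sigma, const float *rhs, float dt, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        sigma[i] += dt * rhs[i];
}

/* Kernel call, launching one thread per site of the n-site field: */
/* euler_step<<<(n + 255) / 256, 256>>>(d_sigma, d_rhs, dt, n);    */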

cuFFT library call: plan declaration and plan usage.
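
Again, the original code is a screenshot; here is a minimal cuFFT sketch under the same caveat, for an L x L complex-to-complex transform, with illustrative names:

#include <cufft.h>

/* d_field and d_field_hat are device buffers of cufftComplex, allocated
 * elsewhere; L is the linear size of the 2D field. */
void fourier_convolution_step(cufftComplex *d_field, cufftComplex *d_field_hat, int L)
{
    /* Plan declaration (in the real code, created once at startup and reused). */
    cufftHandle plan;
    cufftPlan2d(&plan, L, L, CUFFT_C2C);

    /* Plan usage: forward transform, point-wise multiplication by the
     * propagator (local in Fourier space), inverse transform. */
    cufftExecC2C(plan, d_field, d_field_hat, CUFFT_FORWARD);
    /* ... multiply d_field_hat by the propagator, e.g. in a small kernel ... */
    cufftExecC2C(plan, d_field_hat, d_field, CUFFT_INVERSE);
    /* Note: cuFFT's inverse transform is unnormalized; divide the result by L*L. */

    cufftDestroy(plan);
}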

CUDA kernel RNG use
Philox: a counter-based RNG from the Random123 library (2011). Fast and easy to parallelize, it uses minimal memory/cache resources (ideal for GPUs) and requires very little code. It has passed all of the rigorous SmallCrush, Crush and BigCrush tests in the extensive TestU01 suite.
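
The slide's kernel is not in the transcription; below is a minimal sketch of counter-based RNG use on the GPU, assuming the Random123 C interface (the kernel name activate_sites and its arguments are illustrative):

#include <Random123/philox.h>

/* The counter is built from the site index and the time step, so every
 * (site, step) pair gets an independent random stream and no RNG state has
 * to be stored between calls. */
__global__ void activate_sites(float *u, unsigned int step, unsigned int seed, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    philox4x32_ctr_t ctr = {{(unsigned int)i, step, 0u, 0u}}; /* unique per thread & step */
    philox4x32_key_t key = {{seed, 0u}};
    philox4x32_ctr_t r   = philox4x32(ctr, key);              /* 4 random 32-bit words */

    u[i] = r.v[0] * (1.0f / 4294967296.0f);                   /* uniform in [0,1) */
}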

Thrust Library basic usage
- High-level parallel algorithms library.
- Parallel analog of the C++ Standard Template Library (STL).
- Portability and interoperability (CUDA, TBB, OpenMP, ...).
(In the code, REAL is either float or double.)
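
The slide's code is not transcribed; a basic-usage sketch in the same spirit, with illustrative names:

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

typedef float REAL;  /* REAL is either float or double, as on the slide */

/* Device vectors replace raw cudaMalloc'd buffers; one-line parallel
 * primitives replace hand-written kernels. */
void basic_thrust_example(int n)
{
    thrust::device_vector<REAL> a(n, 1.0), b(n, 2.0), c(n);

    /* c = a + b, element-wise, executed on the GPU */
    thrust::transform(a.begin(), a.end(), b.begin(), c.begin(), thrust::plus<REAL>());

    /* parallel reduction: sum of all elements of c */
    REAL total = thrust::reduce(c.begin(), c.end(), (REAL)0.0);
    (void)total;  /* e.g. accumulated into a time average on the host */
}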

Thrust Library advanced usage
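
Again a sketch rather than the talk's code: a more advanced pattern fuses a custom functor with a reduction via thrust::transform_reduce, so the transformation and the sum happen in a single pass, with no temporary array and a single kernel launch:

#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>

typedef float REAL;

/* Unary functor applied on the fly to every element before reduction. */
struct square
{
    __host__ __device__ REAL operator()(REAL x) const { return x * x; }
};

/* Mean square of a field in one fused transform+reduce pass. */
REAL mean_square(const thrust::device_vector<REAL> &field)
{
    REAL s = thrust::transform_reduce(field.begin(), field.end(),
                                      square(), (REAL)0.0, thrust::plus<REAL>());
    return s / field.size();
}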

Profiling with NVVP (the NVIDIA Visual Profiler).

Performance
Speedups ~5x (the CPU code is not optimal). There is a lot more to exploit, starting with CPU-GPU concurrency.
Unexpected similarity between Fermi and Kepler GPUs:
- CUDA 6.0 vs. CUDA 5.0?
- ECC enabled/disabled?
- very low occupancy in both cases?

Froggy use
Nice nodes, nice GPUs, clear tutorials; everything runs smoothly. GPUs are evolving fast: an opportunity to upgrade the cluster's compute power by replacing ONLY the accelerators.
The OAR GPU target can perhaps be improved:
- possible cross access between socket and GPU when cudaSetDevice() is used by the application;
- allow the possibility of targeting the same GPU with many processes (timesharing=user,*);
- in general, 1 CPU + 1 GPU jobs waste 15 CPU cores; a GPU process should block the full node if it wants exclusive access to the GPU.
An annoying but necessary thing: max wall-time = 96 h.
- Stop/restart coding is difficult when we compute many things on-the-fly.
- Automatic checkpoints (BLCR-like)? Is that possible for a GPU context?