BigDFT An introduction



Similar documents
Graphic Processing Units: a possible answer to High Performance Computing?

2IP WP8 Materiel Science Activity report March 6, 2013

A Beginner s Guide to Materials Studio and DFT Calculations with Castep. P. Hasnip (pjh503@york.ac.uk)

Visualization and post-processing tools for Siesta

NMR and IR spectra & vibrational analysis

Quantum Effects and Dynamics in Hydrogen-Bonded Systems: A First-Principles Approach to Spectroscopic Experiments

First Step : Identifying Pseudopotentials

Analysis, post-processing and visualization tools

The Quantum ESPRESSO Software Distribution

An Introduction to Hartree-Fock Molecular Orbital Theory

DFT in practice : Part I. Ersen Mete

1. Follow the links from the home page to download socorro.tgz (a gzipped tar file).

ME6130 An introduction to CFD 1-1

AN INTRODUCTION TO NUMERICAL METHODS AND ANALYSIS

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

DARPA, NSF-NGS/ITR,ACR,CPA,

OpenFOAM Optimization Tools

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster

TESLA Report

High-fidelity electromagnetic modeling of large multi-scale naval structures

Electric Dipole moments as probes of physics beyond the Standard Model

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations

Q-Chem: Quantum Chemistry Software for Large Systems. Peter M.W. Gill. Q-Chem, Inc. Four Triangle Drive Export, PA 15632, USA. and

Introduction to CFD Analysis

YALES2 porting on the Xeon- Phi Early results

Dynamic Load Balancing in CP2K

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Wavelet analysis. Wavelet requirements. Example signals. Stationary signal 2 Hz + 10 Hz + 20Hz. Zero mean, oscillatory (wave) Fast decay (let)

Variational approach to restore point-like and curve-like singularities in imaging

Molecular Dynamics Simulations

FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG

Inner Product Spaces

HPC Deployment of OpenFOAM in an Industrial Setting

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

6 J - vector electric current density (A/m2 )

CHEM6085: Density Functional Theory Lecture 2. Hamiltonian operators for molecules

Statistical Machine Learning

ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH- PERFORMANCE COMPUTING

A Learning Based Method for Super-Resolution of Low Resolution Images

Linear Threshold Units

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

Robust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code

Mesh Discretization Error and Criteria for Accuracy of Finite Element Solutions

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Lecture 3: Linear methods for classification

Physics 9e/Cutnell. correlated to the. College Board AP Physics 1 Course Objectives

= N 2 = 3π2 n = k 3 F. The kinetic energy of the uniform system is given by: 4πk 2 dk h2 k 2 2m. (2π) 3 0

Customer Training Material. Lecture 2. Introduction to. Methodology ANSYS FLUENT. ANSYS, Inc. Proprietary 2010 ANSYS, Inc. All rights reserved.

The RAMSES code and related techniques 3. Gravity solvers

Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Express Introductory Training in ANSYS Fluent Lecture 1 Introduction to the CFD Methodology

Poisson Equation Solver Parallelisation for Particle-in-Cell Model

5MD00. Assignment Introduction. Luc Waeijen

COMPUTATION OF THREE-DIMENSIONAL ELECTRIC FIELD PROBLEMS BY A BOUNDARY INTEGRAL METHOD AND ITS APPLICATION TO INSULATION DESIGN

GenOpt (R) Generic Optimization Program User Manual Version 3.0.0β1

Course Outline for the Masters Programme in Computational Engineering

Summary. Load and Open GaussView to start example

A Case Study - Scaling Legacy Code on Next Generation Platforms

What is molecular dynamics (MD) simulation and how does it work?

Computer Graphics. Geometric Modeling. Page 1. Copyright Gotsman, Elber, Barequet, Karni, Sheffer Computer Science - Technion. An Example.

Introduction to CFD Analysis

W009 Application of VTI Waveform Inversion with Regularization and Preconditioning to Real 3D Data

Multi Scale Design of nanomaterials with simulations on hybrid architectures: (the muscade project)

Best practices for efficient HPC performance with large models

Free Electron Fermi Gas (Kittel Ch. 6)

Data Visualization for Atomistic/Molecular Simulations. Douglas E. Spearot University of Arkansas

Molecular vibrations: Iterative solution with energy selected bases

DO PHYSICS ONLINE FROM QUANTA TO QUARKS QUANTUM (WAVE) MECHANICS

Java Modules for Time Series Analysis

MPI Hands-On List of the exercises

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.

(Toward) Radiative transfer on AMR with GPUs. Dominique Aubert Université de Strasbourg Austin, TX,

Mulliken suggested to split the shared density 50:50. Then the electrons associated with the atom k are given by:

HPC enabling of OpenFOAM R for CFD applications

CONVERGE Features, Capabilities and Applications

How to run SIESTA Víctor M. García Suárez Gollum School. MOLESCO network 20-May-2015 Partially based on a talk by Javier Junquera

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication

MASTER OF SCIENCE IN PHYSICS MASTER OF SCIENCES IN PHYSICS (MS PHYS) (LIST OF COURSES BY SEMESTER, THESIS OPTION)

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Virtual Landmarks for the Internet

Multi-Block Gridding Technique for FLOW-3D Flow Science, Inc. July 2004

Visualization Plugin for ParaView

Nonlinear Iterative Partial Least Squares Method

Yousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Basis Sets in Quantum Chemistry C. David Sherrill School of Chemistry and Biochemistry Georgia Institute of Technology

Product Synthesis. CATIA - Product Engineering Optimizer 2 (PEO) CATIA V5R18

Sampling the Brillouin-zone:

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition

How High a Degree is High Enough for High Order Finite Elements?

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

Intelligent Heuristic Construction with Active Learning

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE

Transcription:

Atomistic and molecular simulations on massively parallel architectures CECAM PARIS, FRANCE An introduction Thierry Deutsch, Damien Caliste, Luigi Genovese L_Sim - CEA Grenoble 17 July 2013

Objectives of the lesson To know what are the main parameters in a DFT calculation To run a calculation (i.e. the physicist problem) 1 2 Exchange-correlation 3 High-Performance Computing (parallel and GPU) 4

A basis for nanosciences: the project STREP European project: (2005-2008) Four partners, 15 contributors: CEA-INAC Grenoble, U. Basel, U. Louvain-la-Neuve, U. Kiel Aim: To develop an ab-initio DFT code based on Daubechies Wavelets, to be integrated in ABINIT, distributed freely (GNU-GPL license) L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, et al., Daubechies wavelets as a basis set for density functional pseudopotential calculations, J. Chem. Phys. 129, 014109 (2008)

version 1.7: capabilities Free, surface and periodic boundary conditions Geometry optimizations (with constraints) Born-Oppenheimer Molecular Dynamics Saddle point searches (Nudged-Elastic Band Method) Vibrations External electric fields Unoccupied KS orbitals Collinear and Non-collinear magnetism All functionals of the ABINIT package Hybrid functionals Empirical van der Waals interactions (many flavors) Also available within the ABINIT package

Kohn-Sham formalism H-K theorem: E is an unknown functional of the density E = E[ρ] Density Functional Theory Kohn-Sham approach Mapping of an interacting many-electron system into a system with independent particles moving into an effective potential. Find a set of orthonormal orbitals Ψ i (r) that minimizes: E = 1 2 N/2 i=1 N/2 ρ(r) = i=1 Ψ i (r) 2 Ψ i (r)dr + 1 ρ(r)v H (r)dr 2 + E xc [ρ(r)] + V ext (r)ρ(r)dr Ψ i (r)ψ i (r) 2 V H (r) = 4πρ(r)

Spirit of Few input files (posinp.xyz, input.dft) Few input parameters ex. not parameters for parallel calculations Use input files with mandatory parameters Optional input files as input.kpt (for periodic systems) or input.perf (parameters for performance)

How to run Some pre-processing bigdft-tool nprocs [name] Help to find the optimal number of processors for the run that fit into the computer memory. Also useful to check that all input files are without errors. The main run mpirun -np 8 bigdft [name] tee [name].log Number of MPI tasks : 8 OpenMP parallelization : Yes Maximal OpenMP threads per MPI task : 1 Material acceleration : The outputs No #iproc=0 log is output on stdout, so redirect it. generated data are stored in a subdirectory called data[-name].

The input files (or not) posinp.xyz: the atomic coordinates and box definition. psppar.element: pseudo-potentials for each element; input.dft: DFT related parameters; input.geopt: geometrie relaxations and MD; input.kpt: the k-point mesh and band structure path; input.mix: diagonalisation mechanism; input.sic, tddft, occ, perf: other parameters; Only the coordinates are mandatory. All lines of different input files are mandatory and output on log. When is run, all default values are output in files called default.xxx. Naming scheme All input files may be renamed by providing a name argument to the command line i.e. bigdft name.

Atomic positions: posinp.xyz or posinp.ascii units reduced, angstroem, bohr or atomic Boundary conditions periodic, surface 8 reduced periodic 10.26085 10.26085 10.26085 Si 0. 0. 0. Si 0.5 0.5 0. Si 0.5 0. 0.5 Si 0. 0.5 0.5 Si 0.25 0.25 0.25 Si 0.75 0.75 0.25 Si 0.75 0.25 0.75 Si 0.25 0.75 0.75

Atomic positions: posinp.xyz or posinp.ascii units reduced, angstroem, bohr or atomic Boundary conditions periodic, surface 3 angstroem surface 3.0 0.0 3.0 O 1.5 0.0 1.5 H.7285 0.620919 1.5 H 2.2715 0.620919 1.5

Atomic positions: posinp.xyz or posinp.ascii units reduced, angstroem, bohr or atomic Boundary conditions periodic, surface 5 atomic Si 1.62 1.62 1.62 H 3.24 3.24 3.24 H 0 0 3.24 H 3.24 0 0 H 0 3.24 0

Atomic coordinate format uses XYZ format plus additional modifications: physical unit, after the number of atoms on the first line put either bohr or angstroem. boundary conditions, on second line, start with one of freebc, surface X lat 0 Z lat or periodic X lat Y lat Z lat. frozen positions, add a f, fy or fxz at the end of an atom line. input polarisation, add values at the end of an atom line. On output, forces are added for each element at the end of the XYZ file: 2 angstroem freebc Na -1 0 0 Cl +1 0 0 forces Na 2.654367E-2 0 0 Cl -2.654367E-2 0 0

Performing a DFT calculation A self-consistent equation ρ(r) = Ψ i (r)ψ i (r), where ψ i satisfies i ( 12 ) 2 + V H [ρ] + V xc [ρ] + V ext + V pseudo ψ i = E i ψ i, (Kohn-Sham) DFT Ingredients A basis set for expressing the ψ i An potential, functional of the density several approximations exists (LDA,GGA,... ) A choice of the pseudopotential (if not all-electrons) (norm conserving, ultrasoft, PAW,... ) An (iterative) algorithm for finding the wavefunctions ψ i A (good) computer...

s in real space: Wavelet basis sets Properties of wavelet basis sets: 1.5 1 0.5 0-0.5-1 -1.5 localized both in real and in Fourier space allow for adaptivity (for core electrons) are a systematic basis set -6-4 -2 0 2 4 6 8 A basis set both adaptive and systematic, real space based x φ(x) ψ(x)

A brief description of wavelet theory Two kind of basis functions A Multi-Resolution real space basis The functions can be classified following the resolution level they span. Scaling Functions The functions of low resolution level are a linear combination of high-resolution functions = + m φ(x) = h j φ(2x j) j= m Centered on a resolution-dependent grid: φ j = φ 0 (x j).

A brief description of wavelet theory Wavelets They contain the DoF needed to complete the information which is lacking due to the coarseness of the resolution. φ(2x) = m j= m = 1 2 + 1 2 h j φ(x j) + m j= m g j ψ(x j) Increase the resolution without modifying grid space SF + W = same DoF of SF of higher resolution ψ(x) = m j= m g j φ(2x j) All functions have compact support, centered on grid points.

Adaptivity of the mesh Adaptivity Resolution can be refined following the grid point. The grid is divided in: Data can thus be stored in compressed form. Localization property Low resolution pts (SF, 1 DoF) High resolution pts (SF + W, 8 DoF) Points of different resolution belong to the same grid. Empty regions must not be filled with basis functions. Optimal for big inhomogeneous systems, O(N) approach.

Adaptivity of the mesh Atomic positions (H 2 O), hgrid

Adaptivity of the mesh Fine grid (high resolution, frmult)

Adaptivity of the mesh Coarse grid (low resolution, crmult)

Defining the basis set in input.dft Main specific variables to control the basis set hgrid the grid size frmult the expansion around atoms for fine grid crmult the expansion around atoms for coarse grid Convergence properties Decreasing hgrid more degrees of freedom. Increasing crmult less box constraint. It is important to improve both parameters together to avoid error on one hiding improvement made on the other one.

input.dft: hgrid, crmult, frmult hgrid Step grid: twice as bigger than the lowest gaussian coefficient of pseudopotential Roughly 0.45 (only orthorhombic mesh) crmult Extension of the coarse resolution (factor 6.): depends on the extension of the HOMO for the atom frmult Extension of the fine resolution (factor 8.): depends on the pseudopotential coefficient In practice, we use only hgrid from 0.3 to 0.55.

Wavelets: No integration error Orthogonality, scaling relation dx φ k (x)φ j (x) = δ kj φ(x) = 1 2 m j= m h j φ(2x j) The hamiltonian-related quantities can be calculated up to machine precision in the given basis. The accuracy is only limited by the basis set (O ( h 14 grid) ) Exact evaluation of kinetic energy Obtained by convolution with filters: f(x) = c l φ l (x), 2 f(x) = c l φ l (x), l l c l = c j a l j, a l φ 0 (x) 2 xφ l (x), j

Wavelets: Separability in 3D The 3-dim scaling basis is a tensor product decomposition of 1-dim Scaling Functions/ Wavelets. φ e x,e y,e z j x,j y,j z (x,y,z) = φ e x j x (x)φ e y j y (y)φ e z j z (z) With (j x,j y,j z ) the node coordinates, φ (0) j and φ (1) j the SF and the W respectively. Gaussians and wavelets The separability of the basis allows us to save computational time when performing scalar products with separable functions (e.g. gaussians): Initial wavefunctions (input guess) Poisson solver Non-local pseudopotentials

Performing a DFT calculation A self-consistent equation ρ(r) = Ψ i (r)ψ i (r), where ψ i satisfies i ( 12 ) 2 + V H [ρ] + V xc [ρ] + V ext + V pseudo ψ i = E i ψ i, (Kohn-Sham) DFT Ingredients A basis set for expressing the ψ i An potential, functional of the density several approximations exists (LDA,GGA,... ) A choice of the pseudopotential (if not all-electrons) (norm conserving, ultrasoft, PAW,... ) An (iterative) algorithm for finding the wavefunctions ψ i A (good) computer...

Gaussian type separable s (HGH) Local part V loc (r) = Z [ ] [ ion r erf + exp 1 ( ) ] 2 r r 2rloc 2 r loc { ( ) 2 ( ) 4 ( r r r C 1 + C 2 + C 3 + C 4 r loc r loc Nonlocal (separable) part H( r, r ) H sep ( r, r ) = p l 1(r) = 2 r l+ 3 2 l 2 i=1 r loc Y s,m (ˆr) p s i (r) h s i p s i (r ) Ys,m(ˆr ) m + Y p,m (ˆr) p p 1 (r) hp 1 pp 1 (r ) Yp,m(ˆr ) m r l+ 7 2 l ) 6 } r l e 1 2 ( r ) 2 r l p Γ(l l r l+2 e 1 2 ( r ) 2 r l 2(r) = 2 + 32 ) Γ(l + 7) 2

file (as psppar.n) No need if you use LDA or PBE functionals Default pseudopotentials inside! Put in the same directory as the calculation. Large library: http://www.abinit.org/downloads/psp-links Accurate and transferable

Performing a DFT calculation A self-consistent equation ρ(r) = Ψ i (r)ψ i (r), where ψ i satisfies i ( 12 ) 2 + V H [ρ] + V xc [ρ] + V ext + V pseudo ψ i = E i ψ i, (Kohn-Sham) DFT Ingredients A basis set for expressing the ψ i An potential, functional of the density several approximations exists (LDA,GGA,... ) A choice of the pseudopotential (if not all-electrons) (norm conserving, ultrasoft, PAW,... ) An (iterative) algorithm for finding the wavefunctions ψ i A (good) computer...

Exchange-Correlation Energy (input.dft) E xc [ρ] = K exact [ρ] K KS [ρ] + V e e exact [ρ] V Hartree [ρ] K KS [ρ]: Kinetic energy for a non-interaction electron gas Kohn-Sham formalism is exact (mapping of an interacting electron system into a non-interacting system) Many types from ABINIT: Use same codes (ixc: LDA=1, PBE=11, see manual) lib: code for exchange code for correlation ( 101130 see manual) Hybrid functionals from lib

Performing a DFT calculation A self-consistent equation ρ(r) = Ψ i (r)ψ i (r), where ψ i satisfies i ( 12 ) 2 + V H [ρ] + V xc [ρ] + V ext + V pseudo ψ i = E i ψ i, (Kohn-Sham) DFT Ingredients A basis set for expressing the ψ i An potential, functional of the density several approximations exists (LDA,GGA,... ) A choice of the pseudopotential (if not all-electrons) (norm conserving, ultrasoft, PAW,... ) An (iterative) algorithm for finding the wavefunctions ψ i A (good) computer...

Kohn-Sham Equations: Operators Apply different operators Having a one-electron hamiltonian in a mean field. [ 1 ] 2 2 + V eff (r)(r) ψ i = ε i ψ i V eff (r) = V ext (r) + V ρ(r ) r r dr + µ xc (r) E[ρ] can be expressed by the orthonormalized states of one particule: ψ i (r) with the fractional occupancy number f i (0 f i 1) : ρ(r) = f i ψ i (r) 2 i

Kohn-Sham Equations: Computing Energies Calculate different integrals E[ρ] = K [ρ] + U[ρ] U[ρ] = K [ρ] = 1 2 2 m e i V dr f iψ i 2 ψ i dr V ext(r)ρ(r)+ 1 ρ(r)ρ(r ) dr dr V 2 V r r } {{ } Hartree We minimise with the variables ψ i (r) and f i with the constraint dr ρ(r) = N el. V + E xc [ρ] } {{ } exchange correlation

KS Equations: Self-Consistent Field Set of self-consistent equations: { 1 } 2 2 + V eff ψ i = ε i ψ i 2 m e with an effective potential: V eff (r) = V ext (r)+ ) dr ρ(r V r r } {{ } Hartree δe xc + δρ(r) } {{ } exchange correlation and: ρ(r) = 2 i f i ψ i (r) 2

KS Equations: Self-Consistent Field Set of self-consistent equations: { 1 } 2 2 + V eff ψ i = ε i ψ i 2 m e with an effective potential: V eff (r) = V ext (r)+ ) dr ρ(r V r r } {{ } Hartree δe xc + δρ(r) } {{ } exchange correlation and: ρ(r) = 2 i f i ψ i (r) 2 Poisson Equation: V Hartree = ρ (Laplacian: = 2 x 2 + 2 y 2 + 2 z 2 )

KS Equations: Self-Consistent Field Set of self-consistent equations: { 1 } 2 2 + V eff ψ i = ε i ψ i 2 m e with an effective potential: V eff (r) = V ext (r)+ ) dr ρ(r V r r } {{ } Hartree δe xc + δρ(r) } {{ } exchange correlation and: ρ(r) = 2 i f i ψ i (r) 2 Poisson Equation: V Hartree = ρ (Laplacian: = 2 x 2 + 2 y 2 + 2 z 2 )

KS Equations: Self-Consistent Field Set of self-consistent equations: { 1 } 2 2 + V eff ψ i = ε i ψ i 2 m e with an effective potential: V eff (r) = V ext (r)+ ) dr ρ(r V r r } {{ } Hartree δe xc + δρ(r) } {{ } exchange correlation and: ρ(r) = 2 i f i ψ i (r) 2 Poisson Equation: V Hartree = ρ (Laplacian: = 2 x 2 + 2 y 2 + 2 z 2 )

KS Equations: Self-Consistent Field Set of self-consistent equations: { 1 } 2 2 + V eff ψ i = ε i ψ i 2 m e with an effective potential: V eff (r) = V ext (r)+ ) dr ρ(r V r r } {{ } Hartree δe xc + δρ(r) } {{ } exchange correlation and: ρ(r) = 2 i f i ψ i (r) 2 Poisson Equation: V Hartree = ρ (Laplacian: = 2 x 2 + 2 y 2 + 2 z 2 )

KS Equations: Self-Consistent Field Set of self-consistent equations: { 1 } 2 2 + V eff ψ i = ε i ψ i 2 m e with an effective potential: V eff (r) = V ext (r)+ ) dr ρ(r V r r } {{ } Hartree δe xc + δρ(r) } {{ } exchange correlation and: ρ(r) = 2 i f i ψ i (r) 2 Poisson Equation: V Hartree = ρ (Laplacian: = 2 x 2 + 2 y 2 + 2 z 2 )

Direct Minimisation: Flowchart { } ψ i = caφ i a a Basis: adaptive mesh (0,...,N φ ) Orthonormalized

Direct Minimisation: Flowchart { } ψ i = caφ i a a inv FWT + Magic Filter Basis: adaptive mesh (0,...,N φ ) Orthonormalized occ ρ(r) = i ψ i (r) 2 Fine non-adaptive mesh: < 8N φ

Direct Minimisation: Flowchart { } ψ i = caφ i a a inv FWT + Magic Filter Basis: adaptive mesh (0,...,N φ ) Orthonormalized occ ρ(r) = i ψ i (r) 2 Fine non-adaptive mesh: < 8N φ Poisson solver V H (r) = V xc [ρ(r)] G(.)ρ V NL ({ψ i }) V effective

Direct Minimisation: Flowchart { } ψ i = caφ i a a inv FWT + Magic Filter Basis: adaptive mesh (0,...,N φ ) Orthonormalized occ ρ(r) = i ψ i (r) 2 Fine non-adaptive mesh: < 8N φ Poisson solver V H (r) = V xc [ρ(r)] G(.)ρ V NL ({ψ i }) V effective 1 2 2 Kinetic Term

Direct Minimisation: Flowchart { } ψ i = caφ i a a inv FWT + Magic Filter Basis: adaptive mesh (0,...,N φ ) Orthonormalized 1 2 2 occ ρ(r) = i ψ i (r) 2 Fine non-adaptive mesh: < 8N φ Poisson solver V H (r) = V xc [ρ(r)] G(.)ρ V NL ({ψ i }) FWT FWT δca i = E total c i (a) + Λ ij ca j j V effective Kinetic Term Λ ij =< ψ i H ψ j >

Direct Minimisation: Flowchart { } ψ i = caφ i a a inv FWT + Magic Filter Basis: adaptive mesh (0,...,N φ ) Orthonormalized 1 2 2 occ ρ(r) = i ψ i (r) 2 Fine non-adaptive mesh: < 8N φ Poisson solver V H (r) = V xc [ρ(r)] G(.)ρ V NL ({ψ i }) FWT δca i = E total c i (a) + Λ ij ca j j c new,i a FWT preconditioning = c i a + h step δc i a V effective Kinetic Term Λ ij =< ψ i H ψ j > Steepest Descent, DIIS

Direct Minimisation: Flowchart { } ψ i = caφ i a a inv FWT + Magic Filter Basis: adaptive mesh (0,...,N φ ) Orthonormalized 1 2 2 occ ρ(r) = i ψ i (r) 2 Fine non-adaptive mesh: < 8N φ Poisson solver V H (r) = V xc [ρ(r)] G(.)ρ V NL ({ψ i }) FWT Stop when δc i a small δca i = E total c i (a) + Λ ij ca j j c new,i a FWT preconditioning = c i a + h step δc i a V effective Kinetic Term Λ ij =< ψ i H ψ j > Steepest Descent, DIIS

input.dft: idsx DIIS history Use DIIS algorithm Direct Inversion in the Iterative Subspace Use idsx=6 previous iterations Memory-consuming (decrease if too big) Steepest descent if idsx=0 itermax=50 Specify maximum number of iterations Typically 15 iterations to converge gnrm_cv=1.e-4 convergence criterion Residue δψ 2 < gnrm_cv

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

Description of the file input.dft 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 1 0 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 7 6 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) 0 0 0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals T disable the symmetry detection

s: wavelets self-consistent (SCF) Harris-Foulkes functional relativistic pseudopotential norm-conserving all electrons PAW Hybrid functionals beyond LDA GW non-relativistic [ N 3 scaling 1 2 2 O(N) methods periodic non-periodic non-collinear collinear +V(r) non-spin polarized spin polarized LDA,GGA ] +µ xc (r) ψ k i = ε k i ψ k i atomic orbitals plane waves augmented real space LDA+U Gaussians Slater numerical finite difference Wavelet

Optimal for isolated systems 10 1 Test case: cinchonidine molecule (44 atoms) Plane waves Absolute energy precision (Ha) 10 0 10-1 10-2 10-3 10-4 h = 0.4bohr E c = 40 Ha h = 0.3bohr Wavelets E c = 90 Ha 10 5 10 6 Number of degrees of freedom E c = 125 Ha Allows a systematic approach for molecules Considerably faster than Plane Waves codes. 10 (5) times faster than ABINIT (CPMD) Charged systems can be treated explicitly with the same time

Systematic basis set Two parameters for tuning the basis The grid spacing hgrid The extension of the low resolution points crmult

Comparison with a code based on gaussian basis set The Daubechies expansion of a gaussian is known Results from gaussian codes can be imported in Gaussian code (CP2K) test, H 2 O and DZVP basis E [Ha] Input Guess Energy 17.033 17.123 CP2K Imported energy converged energy 6-31G Conv energy 17.132 Wavelet Conv energy with finite-size corrections 3 4 5 6 crmult (basis extension)

using interpolating scaling function The Hartree potential is calculated in the interpolating scaling function basis. Poisson solver with interpolating scaling functions From the density ρ( j) on an uniform grid it calculates: ρ(r ) V H (r) = r r dr Very fast and accurate, optimal parallelization Can be used independently from the DFT code Integrated quantities (energies) are easy to extract Non-adaptive, needs data uncompression Explicitly free boundary conditions No need to subtract supercell interactions (charged systems).

for surface boundary conditions A Poisson solver for surface problems Based on a mixed reciprocal-direct space representation V px,p y (z) = 4π dz G( p ;z z )ρ px,p y (z), Can be applied both in real or reciprocal space codes No supercell or screening functions More precise than other existing approaches Example of the plane capacitor: Periodic Hockney Our approach V V V y y y x x x

input.geopt: or molecular dynamics DIIS 500 ncount_cluster: max steps during geometry relaxation 1.d0 1.d-3 frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise maxval: maximum value of atomic forces 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history Different algorithms: SDCG: Steepest Descent Conjugate Gradient DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6

input.geopt: or molecular dynamics DIIS 500 ncount_cluster: max steps during geometry relaxation 1.d0 1.d-3 frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise maxval: maximum value of atomic forces 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history Different algorithms: SDCG: Steepest Descent Conjugate Gradient DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6

input.geopt: or molecular dynamics DIIS 500 ncount_cluster: max steps during geometry relaxation 1.d0 1.d-3 frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise maxval: maximum value of atomic forces 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history Different algorithms: SDCG: Steepest Descent Conjugate Gradient DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6

input.geopt: or molecular dynamics DIIS 500 ncount_cluster: max steps during geometry relaxation 1.d0 1.d-3 frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise maxval: maximum value of atomic forces 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history Different algorithms: SDCG: Steepest Descent Conjugate Gradient DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6

input.geopt: or molecular dynamics DIIS 500 ncount_cluster: max steps during geometry relaxation 1.d0 1.d-3 frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise maxval: maximum value of atomic forces 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history Different algorithms: SDCG: Steepest Descent Conjugate Gradient DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6

input.geopt: or molecular dynamics DIIS 500 ncount_cluster: max steps during geometry relaxation 1.d0 1.d-3 frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise maxval: maximum value of atomic forces 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history Different algorithms: SDCG: Steepest Descent Conjugate Gradient DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6

High-Performance Computing: Massively parallel Two kinds of parallelisation By orbitals (Hamiltonian application, preconditioning) By components (overlap matrices, orthogonalisation) A few (but big) packets of data More demanding in bandwidth than in latency No need of fast network Optimal speedup (eff. 85%), also for big systems Cubic scaling code For systems bigger than 500 atomes (1500 orbitals) : orthonormalisation operation is predominant (N 3 )

Orbital distribution scheme Used for the application of the hamiltonian The hamiltonian (convolutions) is applied separately onto each wavefunction ψ 1 MPI 0 ψ 2 ψ 3 ψ 4 ψ 5 MPI 1 MPI 2

Coefficient distribution scheme Used for scalar product & orthonormalisation BLAS routines (level 3) are called, then result is reduced MPI 0 MPI 1 MPI 2 ψ 1 ψ 2 ψ 3 ψ 4 ψ 5 Communications are performed via MPI_ALLTOALLV

Performance in parallel () Distribution per orbitals (wavelets) Nothing to specify in input files

GPU-ported operations in (double precision) 20 Convolutions (OpenCL rewritten) GPU speedups between 10 8 10 and 20 can be 6 obtained for different 4 sections 2 70 GPU speedup (Double prec.) 18 16 14 12 locden locham precond 0 10 20 30 40 50 60 70 Wavefunction size (MB) GPU speedup (Double prec.) 60 50 40 30 20 10 DGEMM DSYRK 0 0 500 1000 1500 2000 2500 Number of orbitals Linear algebra (CUBLAS library) The interfacing with CUBLAS is immediate, with considerable speedups

code on Hybrid architectures code can run on hybrid CPU/GPU supercomputers In multi-gpu environments, double precision calculations No Hot-spot operations Different code sections can be ported on GPU up to 20x speedup for some operations, 7x for the full parallel code 100 1000 80 Percent 60 40 100 10 Seconds (log. scale) 20 0 1 2 4 6 8 10 12 14 16 1 2 4 6 8 10 12 14 16 CPU code Hybrid code (rel.) Number of cores 1 Time (sec) Comm Other PureCPU precond locham locden LinAlg

input.perf file OpenCL acceleration: Add this line in in input.perf accel OCLGPU See the web page /Wiki/index.php?title=Input.perf for more information about all keywords (verbosity, input guess,...) associated to the performance of the code. The log file gives the list of possible keywords.

: Future developments self-consistent (SCF) Harris-Foulkes functional relativistic pseudopotential norm-conserving all electrons PAW Hybrid functionals beyond LDA GW non-relativistic [ N 3 scaling 1 2 2 O(N) methods periodic non-periodic non-collinear collinear +V(r) non-spin polarized spin polarized LDA,GGA ] +µ xc (r) ψ k i = ε k i ψ k i atomic orbitals plane waves augmented real space LDA+U Gaussians Slater numerical finite difference Wavelet

Linear scaling version 16 min: 1000 atoms with the cubic scaling, 8000 atoms with the linear scaling one

Functionality: Mixed between physics and chemistry approach Order N (real space basis set as wavelets) Multi-scale approach (more than 1000 atoms feasible for a better parametrization) Numerical experience: One-day simulation Better exploration of atomic configurations Molecular Dynamics (4s per step for 32 water molecules)

Hands on See /Wiki/index.php?title=Category:Tutorials First runs with Basis-set convergence Acceleration example on different platforms: Kohn-Sham DFT Operation with GPU acceleration