An asynchronous Oneshot method with Load Balancing



1 of 24: An asynchronous Oneshot method with Load Balancing
Torsten Bosse, Argonne National Laboratory, IL

2 of 24: Design Optimization Problem

    min f(u,y)  subject to  c(u,y) = 0,  (u,y) ∈ R^{m+n}

for sufficiently smooth functions f : R^{m+n} → R and c : R^{m+n} → R^n.

Assumption: a slowly convergent fixed-point solver G : R^{m+n} → R^n for c,

    y_{k+1} = G(u, y_k)        [c(u,y) = 0  ⟺  y = G(u,y)]

with a global contraction rate ||G_y(u,y)|| ≤ ρ < 1, e.g. ρ = 0.999.

Lagrange function with adjoint variable ȳ ∈ R^n:

    L(u,y,ȳ) = f(u,y) + ȳ^T [G(u,y) − y]

Find (u*, y*, ȳ*) satisfying the first-order necessary / KKT conditions

    0 = L_ȳ(u*,y*,ȳ*) = G(u*,y*) − y*                          (primal)
    0 = L_y(u*,y*,ȳ*) = f_y(u*,y*) + [G_y(u*,y*) − I]^T ȳ*     (adjoint)
    0 = L_u(u*,y*,ȳ*) = f_u(u*,y*) + G_u(u*,y*)^T ȳ*           (design)

3 of 24: Fixed-Point Solver for the State Equation

Assume: the optimal control u* is known.
Goal: find the state y* satisfying c(u*, y*) = 0.
Solution: use the slowly convergent state fixed-point solver G : U × Y → Y, i.e., start with an initial approximation y_0 ∈ Y and iterate

    y_{k+1} = G(u*, y_k)  for k = 0,1,2,...

such that y* = lim_{k→∞} y_k solves y* = G(u*, y*) ⟺ c(u*, y*) = 0, due to the global contraction rate ||G_y(u,y)|| ≤ ρ < 1 and the Banach fixed-point theorem.

    (state) (state) (state) ...

Interpretation: the fixed-point iteration is some solver for a PDE/numerical model that describes a physical process represented by the equality c(u,y) = 0.
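The state iteration can be sketched on a toy problem. Everything below is an illustrative assumption rather than the solver from the talk: c(u,y) = Ay − u with a 1D Laplace stencil A, and a damped Richardson sweep as the slowly contracting map G.

```python
import numpy as np

# Toy state equation c(u, y) = A y - u = 0 with a 1D Laplace stencil
# (assumed model problem, not the one from the talk).
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Damped Richardson sweep as the fixed-point map G(u, y) = y + w (u - A y);
# its Jacobian G_y = I - w A has norm rho = 1 - w*lambda_min(A), close to 1.
w = 0.45

def G(u, y):
    return y + w * (u - A @ y)

u = np.ones(n)            # fixed control
y = np.zeros(n)           # initial state guess y_0
for k in range(20000):    # slow convergence is expected since rho ~ 0.998
    y = G(u, y)

state_residual = np.linalg.norm(A @ y - u)
```

With this many cheap sweeps the residual drops to roughly machine precision; the point of the slide is precisely that such a solver contracts, but slowly.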

4 of 24: Fixed-Point Solver for the State + Adjoint Equation

Assume: the optimal control u* is known.
Goal: find the state y* satisfying c(u*, y*) = 0 and the adjoint solution ȳ* with

    0 = f_y(u*, y*) + [G_y(u*, y*) − I]^T ȳ*

BUT we are only allowed to use Jacobian-vector and vector-Jacobian products; evaluation/inversion/factorization of G_y might be too costly!

Idea: for an initial approximation ȳ_0 and given y, the adjoint iteration

    ȳ_{k+1} = f_y(u, y) + G_y(u, y)^T ȳ_k

is also a fixed-point iteration, due to ||G_y(u,y)|| ≤ ρ < 1.

Updating scheme according to Christianson (B.C.):

    (state) (state) ... (adjoint) (adjoint) ...
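Under the same toy assumptions (Richardson map G for c(u,y) = Ay − u, tracking objective ½||y − y_d||²), the two-phase Christianson scheme can be sketched as follows; only vector-Jacobian products with G_y = I − wA are used, never a factorization.

```python
import numpy as np

# Assumed toy model as before: G(u, y) = y + w (u - A y), f = 0.5*||y - y_d||^2.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
w = 0.45
u = np.ones(n)
y_d = np.linspace(0.0, 1.0, n)   # tracking target (made-up data)

def G(u, y):
    return y + w * (u - A @ y)

def vjp_Gy(v):
    # vector-Jacobian product G_y^T v without forming/factorizing G_y = I - w A
    return v - w * (A.T @ v)

# Phase 1 (B.C.): converge the primal iteration first
y = np.zeros(n)
for k in range(20000):
    y = G(u, y)

# Phase 2 (B.C.): adjoint fixed point ybar_{k+1} = f_y + G_y^T ybar_k at fixed y
f_y = y - y_d
ybar = np.zeros(n)
for k in range(20000):
    ybar = f_y + vjp_Gy(ybar)

adjoint_residual = np.linalg.norm(f_y + vjp_Gy(ybar) - ybar)
```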

5 of 24: Fixed-Point Solver for the State + Adjoint Equation (Griewank)

Problem: the B.C. updating is sequential, i.e., first converge the primal iteration

    y_{k+1} = G(u, y_k)  for k = 0,1,2,...

to a solution y* = G(u, y*) (or an approximation) before iterating

    ȳ_{k+1} = f_y(u, y*) + G_y(u, y*)^T ȳ_k  for k = 0,1,2,...

from an initial ȳ_0 until ȳ* = lim_{k→∞} ȳ_k with L_y(u, y*, ȳ*) = 0.

Alternative approach (Griewank, A.G.): update (y_k, ȳ_k) in parallel, i.e.,

    y_{k+1} = G(u, y_k)
    ȳ_{k+1} = f_y(u, y_k) + G_y(u, y_k)^T ȳ_k    for k = 0,1,2,...

or, in short, use the Piggyback method:

    (state, adjoint) (state, adjoint) ...
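On the same assumed toy problem, piggyback simply interleaves the two sweeps, feeding the current (unconverged) iterate y_k into the adjoint update:

```python
import numpy as np

# Assumed toy model as before: G(u, y) = y + w (u - A y), f = 0.5*||y - y_d||^2.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
w = 0.45
u = np.ones(n)
y_d = np.linspace(0.0, 1.0, n)

y = np.zeros(n)
ybar = np.zeros(n)
for k in range(40000):
    y_next = y + w * (u - A @ y)                  # y_{k+1} = G(u, y_k)
    ybar = (y - y_d) + ybar - w * (A.T @ ybar)    # ybar_{k+1} = f_y(u, y_k) + G_y^T ybar_k
    y = y_next

state_res = np.linalg.norm(A @ y - u)
adjoint_res = np.linalg.norm((y - y_d) - w * (A.T @ ybar))   # L_y residual
```

The adjoint lags behind the primal while y_k is still moving, but both converge at the common contraction rate.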

6 of 24: Observations

Observation I: changes of y_{k+1} = G(u, y_k) in the primal update cause a lag effect in the adjoint iteration ȳ_{k+1} = f_y(u, y_k) + G_y(u, y_k)^T ȳ_k, i.e., ||L_y(u, y_k, ȳ_k)|| might grow while y_k is not sufficiently converged.

Observation II: multiple primal updates for/before an adjoint update reduce the lag effect.

Observation III: the primal precision/number of forward iterations is unknown for B.C.

Question: what are the consequences, and can we do better?

[Plot: residual vs. time for the pure primal iteration and the adjoints of B.C. and A.G.]

Motivation: primal and adjoint updates usually have different execution times.

7 of 24: Blurred Piggyback - Best of both worlds

Idea I: run primal and adjoint updates in an asynchronous parallel fashion, i.e.,

    (state) (state) (state) (state) (state) ...
    (adjoint) (adjoint) (adjoint) (adjoint) ...

Idea II: always use the latest available information y for the adjoint update.

Quasi-Code:
 1: Input: u = t_u, y = t_y, ȳ = t_ȳ  (t_u, t_y, t_ȳ shared)
 2: while (y, ȳ) not converged do
 3:   Process 0:
 4:     Compute state t_y = G(t_u, y)
 5:     Update state y = t_y
 6:   Process 1:
 7:     Compute adjoint t_ȳ = f_y(t_u, t_y) + G_y(t_u, t_y)^T ȳ
 8:     Update adjoint ȳ = t_ȳ
 9: end while
10: Output: y, ȳ

Convergence: based on ||G_y(u,y)|| ≤ ρ < 1 and smoothness of f, G.
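A minimal executable sketch of the quasi-code, again on the assumed toy problem, with one Python thread per process sharing the latest iterates (threading stands in for the MPI-style processes of the talk). The synchronous clean-up sweeps at the end stand in for the adjoint process simply running longer than the state process:

```python
import threading
import numpy as np

# Assumed toy model: G(u, y) = y + w (u - A y), f = 0.5*||y - y_d||^2.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
w = 0.45
u = np.ones(n)
y_d = np.linspace(0.0, 1.0, n)

shared = {"y": np.zeros(n), "ybar": np.zeros(n)}    # the shared t_y, t_ybar

def state_process(sweeps):
    for _ in range(sweeps):                          # Process 0: state sweeps
        shared["y"] = shared["y"] + w * (u - A @ shared["y"])

def adjoint_process(sweeps):
    for _ in range(sweeps):                          # Process 1: adjoint sweeps
        y = shared["y"]                              # latest available state
        shared["ybar"] = (y - y_d) + shared["ybar"] - w * (A.T @ shared["ybar"])

t0 = threading.Thread(target=state_process, args=(20000,))
t1 = threading.Thread(target=adjoint_process, args=(20000,))
t0.start(); t1.start(); t0.join(); t1.join()

# let the adjoint keep sweeping once the state has converged
for _ in range(20000):
    y = shared["y"]
    shared["ybar"] = (y - y_d) + shared["ybar"] - w * (A.T @ shared["ybar"])

state_res = np.linalg.norm(A @ shared["y"] - u)
adjoint_res = np.linalg.norm((shared["y"] - y_d) - w * (A.T @ shared["ybar"]))
```

Each sweep builds a fresh array and then swaps the shared reference, so a reader always sees a complete old or new iterate, which is exactly the "blurred" consistency model of the slide.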

8 of 24: Numerical results for the Laplace problem

Objective: tracking-type functional f(u,y) = 1/2 ||y − y_d||^2 + μ/2 ||u||^2, y_d a triangle
Constraint: Laplace problem Δy = u, y|_∂Ω = 0 over Ω = [0,1]
Slow contraction: Jacobi iteration y_{k+1} = D^{-1}(u − R y_k) for A = D + R, D diagonal

Measurements of the runtime seem to be inconsistent with the convergence history.

[Plot: residual (10^1 down to 10^-9) vs. iteration (0 to 60000) for the pure primal iteration and the adjoints of B.C., B.P., and A.G.]

Runtimes:

    PURE   1.35978293e-02
    A.G.   1.5283478e-01
    B.C.   2.7537666e-02
    B.P.   1.7725375e-02
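The slide's Jacobi splitting can be reproduced directly; A below is again an assumed 1D Laplace stencil standing in for the actual discretization:

```python
import numpy as np

# Jacobi iteration y_{k+1} = D^{-1} (u - R y_k) for the splitting A = D + R
# (A is an assumed 1D Laplace stencil; D = diag(A), R the off-diagonal rest).
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
d = np.diag(A)                 # diagonal part D, stored as a vector
R = A - np.diag(d)             # off-diagonal remainder

u = np.ones(n)
y = np.zeros(n)
for k in range(20000):         # contraction rate cos(pi/(n+1)) ~ 0.998: slow
    y = (u - R @ y) / d

jacobi_residual = np.linalg.norm(A @ y - u)
```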


9 of 24: Load Balancing

Assumption I: the functions f, G and their derivatives can be implemented as parallel programs and only scale up to a certain degree.
Assumption II: a finite amount of computational resources λ ∈ R_+ is available.

Idea: use load balancing λ = λ_1 + λ_2 and speed up the overall process by an optimal distribution (λ_1, λ_2) of the primal/adjoint updates on the available resources.

Goal: minimize the computational time T(λ_1, λ_2) required to find (y*, ȳ*).

[Plots: residual vs. time of a static load balancing for three different tuples (λ_1, λ_2).]
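A toy static load-balancing calculation (all counts and costs invented) shows the basic trade-off: under ideal scaling, the makespan T(λ_1, λ_2) = max(work_1/λ_1, work_2/λ_2) is minimized when both processes finish at the same time.

```python
# Toy static load balancing (all numbers are invented for illustration):
# the primal needs k1 sweeps of cost c1, the adjoint k2 sweeps of cost c2,
# and each process scales ideally with its resource share lam_i.
k1, c1 = 20000, 1.0
k2, c2 = 20000, 2.5          # adjoint sweeps assumed 2.5x as expensive
lam = 8.0                    # total resources, lam = lam1 + lam2

def makespan(lam1):
    lam2 = lam - lam1
    return max(k1 * c1 / lam1, k2 * c2 / lam2)

# Equalizing the finish times k1*c1/lam1 = k2*c2/(lam - lam1) gives the optimum:
lam1_opt = lam * (k1 * c1) / (k1 * c1 + k2 * c2)
best = makespan(lam1_opt)
```

Here lam1_opt ≈ 2.29 of the 8 resource units go to the primal and the makespan is 8750 time units; any other split, e.g. an even 4/4, is strictly worse.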

10 of 24: Dynamic Load Balancing

Automatic setup based on information about f, G and their derivatives, residuals, scalability, extra cost for data transfer, communication, experience, ... Develop a self-adjusting method that efficiently distributes the computational resources during the course of the optimization.

Example: consider the following dynamic load balancing:
Phase I: mainly reduce the primal error (B.C. style)
Phase II: reduce the primal and the adjoint error (A.G./B.P. style)
Phase III: mainly reduce the adjoint error (B.C. style)

[Plots: residual vs. time in Phases I, II, and III.]

11 of 24: Summary I

The state and adjoint equations can be solved by three different fixed-point approaches:

Christianson (B.C.): (state) (state) ... (adjoint) (adjoint) ...
Griewank (A.G.):     (state, adjoint) (state, adjoint) ...
Blurred (B.P.):      (state) (state) (state) ... running alongside (adjoint) (adjoint) ...

The (new) Blurred Piggyback method (asynchronous + blurred state information for the adjoint) seems to be favorable (at least for this example, this setting, ...).

A.G. and B.C. can be recovered as special cases of B.P. by load balancing.

(Dynamic) load balancing to reduce the execution time: that's the important thing!

12 of 24: Oneshot methods

Goal: solve the optimization problem

    min_{(u,y) ∈ R^{m+n}} f(u,y)  s.t.  y = G(u,y).

Updates:

    u+ = u − α B^{-1} L_u(u,y,ȳ),   y+ = G(u,y),   ȳ+ = ȳ + L_y(u,y,ȳ)

Sequencing: combine the three update steps into an optimization method. Choose the order of the state, adjoint and design updates, their multiplicity s, and the parameters B and α to ensure (optimal) contraction of the combined method towards stationary points.

Scenarios:

    (design, state, adjoint) (design, state, adjoint) ...   (Jacobi)
    (design) (state) (adjoint) (design) ...                 (Seidel)
    (design) (state, adjoint) (design) ...                  (Mixed)
    (design) (state)^s (adjoint)^s (design) ...             (Multi-Step)

13 of 24: Most wanted Oneshot methods

Jacobi method by Griewank/Gauger/Slawig/Kaland/...

    (design, state, adjoint) (design, state, adjoint) ...

with the doubly augmented Lagrangian preconditioner

    B ≈ L_uu + α G_u^T G_u + β L_uy L_yu

Seidel (multi-step) by B[osse]/Lehmann/Griewank

    (design) (state)^s (adjoint)^s (design) ...

with the projected Lagrangian preconditioner (or modifications)

    B ≈ [I  Z^T] [L_uu  L_uy; L_yu  L_yy] [I; Z],   Z = (I − G_y)^{-1} G_u.

Generalization: (almost*) all considered one-shot methods are special cases of the following general one-shot method and can be derived by suitable load balancing schemes.

* To the best of my knowledge, one can even skip the "almost".

13 of 24: Algorithm 1 - General Oneshot Method
 1: Input: u, y, ȳ, α, B
 2: while (u, y, ȳ) not converged do
 3:   Process 0:
 4:     Monitor processes, determine allocation of resources
 5:   Process 1:
 6:     Update α and B
 7:     Compute design t_u = u − α B^{-1} L_u(u, y, ȳ)
 8:   Process 2:
 9:     Compute state t_y = G(u, y)
10:   Process 3:
11:     Compute adjoint t_ȳ = L_y(u, y, ȳ) + ȳ
12:   Process 4:
13:     Update design u = t_u
14:   Process 5:
15:     Update state y = t_y
16:   Process 6:
17:     Update adjoint ȳ = t_ȳ
18: end while
19: Output: u, y, ȳ, α, B

14 of 24: Load balancing

Resources need to be distributed over the 0+1+1+1+3 processes of Algorithm 1.

Jacobi method and corresponding load balancing:

    (design, state, adjoint) (design, state, adjoint) ...

M.Seidel method and corresponding load balancing:

    (design) (state)^s (adjoint)^s (design) ...

[Diagrams: work of the STATE/ADJOINT/DESIGN processes over the run-time for both schemes.]

Problem: both methods (Jacobi, Multi-Seidel) are not optimal: idle times vs. cross-communication.

15 of 24: Blurred Oneshot method

Idea: run all three processes (design, state, adjoint) in parallel but in an asynchronous fashion and allow blurred updates, analogous to the blurred piggyback approach.

[Diagram: work of the STATE/ADJOINT/DESIGN processes over the run-time.]

WARNING: we have a coupling: u depends on y, ȳ and vice versa.

Solution: use dynamic load balancing to enforce convergence. The load balancing should be mostly stable for efficiency reasons but may still vary.

Intuition: in the worst case, one can slow down the control updates by reducing the computational resources for this process and crawl along/to the primal and adjoint feasible manifold M.

16 of 24: Blurred Oneshot Algorithm
 1: Input: u = t_u, y = t_y, ȳ = t_ȳ, α, B  (t_u, t_y, t_ȳ shared)
 2: while (u, y, ȳ) not converged do
 3:   Process 0:
 4:     Monitor processes, determine allocation of resources
 5:   Process 1:
 6:     Update α and B
 7:     Compute design t_u = u − α B^{-1} L_u(u, t_y, t_ȳ)
 8:     Update design u = t_u
 9:   Process 2:
10:     Compute state t_y = G(t_u, y)
11:     Update state y = t_y
12:   Process 3:
13:     Compute adjoint t_ȳ = f_y(t_u, t_y) + G_y(t_u, t_y)^T ȳ
14:     Update adjoint ȳ = t_ȳ
15: end while
16: Output: u, y, ȳ, α, B

17 of 24: Numerical results for the Laplace design optimization problem

Setup: Blurred Oneshot with an L-BFGS approximation of the projected Hessian, no load balancing (not yet), α by quasi-backtracking.

[Plot: control, state, and adjoint residuals (1 down to 1e-15) of a typical optimization run vs. iteration (0 to 20000).]

18 of 24: Time-dependent problems - Gauger/Günther/Wang/(Griewank)

Problem:

    min_{u,y} (1/T) ∫_0^T f(u, y(τ)) dτ   such that
    ẏ(τ) + c(y(τ), u) = 0 for τ ∈ [0,T],  y(0) = y_0

State: y : [0,T] → Y is a function that varies over a time interval [0,T] ⊂ R.

Time discretization: 0 = τ_0 < τ_1 < ... < τ_N = T with approximations y_i ≈ y(τ_i) ∈ R^n.

Assume I: the time integration of the constraint is a one-step method* and the state y_i at time τ_i is implicitly given by the state y_{i−1} at the previous time τ_{i−1}, i.e.,

    R(u, y_i, y_{i−1}) = 0,  i = 1, ..., N.

Assume II: there exists a contractive fixed-point solver G : U × Y × Y → Y,

    y_i^{k+1} = G(u, y_i^k, y_{i−1}),

to compute the solutions lim_{k→∞} y_i^k of the integration scheme R, with contraction rate ||G_{y_i}(u, y_i, y_{i−1})|| ≤ ρ_G < 1.

* e.g. implicit Euler, although higher-order methods are also possible.

19 of 24: Extended Fixed-Point Iteration

Extended mapping G : U × Y → Y with Y = Y^N = Y × ... × Y, defined by

    G_1(u,y) = G(u, y_1, y_0)
    G_2(u,y) = G(u, y_2, G(u, y_1, y_0))
    ...
    G_N(u,y) = G(u, y_N, G(u, y_{N−1}, G(... G(u, y_1, y_0) ...)))

Contraction is inherited by the extended mapping G, i.e.

    ||G_y(u,y)|| ≤ ρ_G < 1.

Extended problem:

    min_{u,y} f(u,y)  such that  y = G(u,y), with y_0 = y(0),

where the objective functional is approximated by

    f(u,y) = Σ_{i=1}^N (Δτ_i / T) f(u, y_i) ≈ (1/T) ∫_0^T f(u, y(t)) dt.
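A scalar sketch of the extended iteration for implicit Euler on an assumed test ODE ẏ = −a y + u (so c(y,u) = a y − u): all time steps are relaxed simultaneously, each using the currently available value of its predecessor, and the iterates converge to the sequential time-stepping solution.

```python
# Assumed test problem: y' = -a*y + u, implicit Euler residual
# R(u, y_i, y_{i-1}) = y_i - y_{i-1} - h*(-a*y_i + u) = 0.
a, h, u, y0 = 1.0, 0.1, 1.0, 0.0
N = 50
theta = 0.5                  # relaxation factor; |1 - theta*(1 + h*a)| < 1

def G(u, yi, yim1):
    # one contractive sweep toward the solution of R(u, yi, yim1) = 0
    return yi - theta * (yi * (1.0 + h * a) - yim1 - h * u)

# extended ("all time steps at once") iteration, Jacobi-style in time
y = [y0] * (N + 1)
for sweep in range(2000):
    y = [y0] + [G(u, y[i], y[i - 1]) for i in range(1, N + 1)]

# reference: plain sequential implicit Euler time stepping
ref = [y0]
for i in range(N):
    ref.append((ref[-1] + h * u) / (1.0 + h * a))

time_parallel_error = max(abs(yi - ri) for yi, ri in zip(y, ref))
```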

20 of 24: Comments

Interpretation: the extended fixed-point mapping can be understood as a parallel update of the discrete state approximations for all time steps.

[Diagrams: state updates across the time steps; lag effect and different execution times of G across the processes over the run-time.]

Blurred updates with load balancing for the different time steps are strongly recommended!

21 of 24: Blurred Oneshot Method for time-dependent Problems
 1: Input: u = t_u, y = (t_y1, ..., t_yN), ȳ_i = t_ȳi, α, B
 2: while (u, y_1, ..., y_N, ȳ_1, ..., ȳ_N) not converged do
 3:   Process 0: Monitor processes, determine allocation of resources
 4:   Process 1:
 5:     Update α and B
 6:     Compute design t_u = u − α B^{-1} L_u(u, t_y1, ..., t_yN, t_ȳ1, ..., t_ȳN)
 7:     Update design u = t_u
 8:   Process 2:
 9:     Compute state t_y1 = G(t_u, y_1, y_0)
10:     Update state y_1 = t_y1
11:   Process 3:
12:     Compute state t_y2 = G(t_u, y_2, t_y1)
13:     Update state y_2 = t_y2
14:   ...
15:   Process N+1:
16:     Compute state t_yN = G(t_u, y_N, t_yN−1)
17:     Update state y_N = t_yN
18:   Process N+2:
19:     Compute adjoint t_ȳ1 = f_y(t_u, t_y1) + G_y(t_u, t_y1, ..., t_yN)^T ȳ_1
20:     Update adjoint ȳ_1 = t_ȳ1
21:   ...
22:   Process 2N+1:
23:     Compute adjoint t_ȳN = f_y(t_u, t_yN) + G_y(t_u, t_y1, ..., t_yN)^T ȳ_N
24:     Update adjoint ȳ_N = t_ȳN
25: end while

22 of 24: Observation

Load balance I: for the extended primal iteration, e.g. dynamic allocation of the computational resources across the time-step processes.
Load balance II: similarly for the extended adjoint iteration.
Load balance III: between the primal and adjoint updates, still possible.
Load balance IV: between (primal, adjoint) and design, also possible.

GOAL: allocating the resources to all 2N+2 processes needs to be harmonized like an orchestra to achieve maximal efficiency.

23 of 24: Summary

- Blurred Piggyback/Oneshot using asynchronous and inconsistent updates
- General Piggyback/Oneshot method for design optimization
- Extension to time-dependent problems
- Can be improved by (optimal) load balancing for use in HPC: convergence, efficiency

Outlook - TODO: it's still a loooooong way to go ...

Thank you for your attention!

24 of 24: Challenge

Shared memory in MPI 3.0 could be a great challenge for AD in the future. Any suggestions? (Except for saying: Don't use it!)