High Order Reverse Mode of AD Theory and Implementation


Mu Wang and Alex Pothen
Department of Computer Science, Purdue University
September 30, 2016

Research Overview
Second order reverse mode: more efficient for evaluating the Hessian, in both complexity and memory usage, in many applications. Proved to be equivalent to a variant of vertex elimination on the computational graph of the gradient.
High order reverse mode (this talk):
  High order reverse mode: evaluating the derivative tensor $\nabla^d f$ up to any order in reverse mode.
  Implementation: ReverseAD.
  Applications: uncertainty quantification; chemistry (exchange-correlation (XC) energy functional).

Background
For a scalar objective function $f : \mathbb{R}^n \to \mathbb{R}$.
First order AD:
  Forward: $[F^1 f](x, \dot{x}) = \frac{\partial f}{\partial x} \dot{x} = \nabla f^T \dot{x}$
  Reverse: $[R^1 f](x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right) = \nabla f$
Second order AD:
  (Pure) Forward: $[F^2 f](x, \dot{x}) = \frac{1}{2} \dot{x}^T \nabla^2 f \, \dot{x}$
  Mixed: $[R^1 F^1 f](x, \dot{x}) = \nabla^2 f \, \dot{x}$
  (Pure) Reverse: $[R^2 f](x) = \nabla^2 f$
High order AD:
  (Pure) Forward: high order Taylor coefficients
  (Pure) Reverse: high order reverse mode
  Mixed modes can then be generated.

High Order Forward Mode
Accumulate high order Taylor coefficients [1]:
  $\nabla^d f$: the d-th order derivative tensor (symmetric).
  $\nabla^d f \cdot \dot{x}$: a tensor-vector product, a (d-1)-th order symmetric tensor.
  $[\cdots[[\nabla^d f \cdot \dot{x}] \cdot \dot{x}] \cdots] \cdot \dot{x}$: a scalar, the d-th order Taylor coefficient (up to the factor $1/d!$).
d = 2:
  $[F^2 f](x, \dot{x}) = \frac{1}{2} \dot{x}^T \nabla^2 f \, \dot{x}$
  $[\nabla^2 f]_{ij} = [F^2 f](x, e_i + e_j) - [F^2 f](x, e_i) - [F^2 f](x, e_j)$
General case:
  $[F^d f](x, \dot{x}) = \frac{1}{d!} [\cdots[[\nabla^d f \cdot \dot{x}] \cdot \dot{x}] \cdots] \cdot \dot{x}$
  $[\nabla^d f]_{i_1 \cdots i_d}$: a linear combination of $\{ [F^d f](x, \dot{e}) : \dot{e} \in \mathrm{Span}\{e_{i_1}, \dots, e_{i_d}\} \}$
Complexity: $O\left( \binom{n+d-1}{d} \, l \right)$
[1] Griewank, Andreas, Jean Utke, and Andrea Walther. Evaluating higher derivative tensors by forward propagation of univariate Taylor series. Mathematics of Computation of the American Mathematical Society 69.231 (2000): 1117-1130.
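
The d = 2 identity on this slide can be checked numerically. The following is a minimal sketch, not ReverseAD code: `Taylor2`, `F2`, `hessian_entry` and the test function `f` are hypothetical names introduced here, and only `+` and `*` are supported. It propagates univariate Taylor coefficients truncated at order 2 and recovers Hessian entries from three directional evaluations each.

```python
from math import comb

class Taylor2:
    """Univariate Taylor polynomial c0 + c1*t + c2*t^2, truncated at order 2."""
    def __init__(self, c0, c1=0.0, c2=0.0):
        self.c = (c0, c1, c2)

    def __add__(self, other):
        return Taylor2(*(p + q for p, q in zip(self.c, other.c)))

    def __mul__(self, other):
        a, b = self.c, other.c
        return Taylor2(a[0] * b[0],
                       a[0] * b[1] + a[1] * b[0],
                       a[0] * b[2] + a[1] * b[1] + a[2] * b[0])

def f(x):
    # Hypothetical test function: f(x) = x0^2 * x1 + x1 * x2
    return x[0] * x[0] * x[1] + x[1] * x[2]

def F2(func, x, xdot):
    """Second order Taylor coefficient (1/2) xdot^T H xdot along x + t * xdot."""
    return func([Taylor2(xi, di) for xi, di in zip(x, xdot)]).c[2]

def hessian_entry(func, x, i, j):
    """H[i][j] = F2(e_i + e_j) - F2(e_i) - F2(e_j)."""
    n = len(x)
    e = lambda k: [1.0 if m == k else 0.0 for m in range(n)]
    e_ij = [p + q for p, q in zip(e(i), e(j))]
    return F2(func, x, e_ij) - F2(func, x, e(i)) - F2(func, x, e(j))

x = [1.5, -2.0, 0.5]
H = [[hessian_entry(f, x, i, j) for j in range(3)] for i in range(3)]

# Number of distinct entries of a symmetric order-d tensor in n variables,
# the binomial factor in the complexity bound (here n = 3, d = 2).
num_entries = comb(len(x) + 2 - 1, 2)
```

For this `f` the Hessian at `x` has entries $2x_1$, $2x_0$ and $1$ in positions (0,0), (0,1) and (1,2), which is what the sketch reproduces.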

Reverse Mode Revisit
Definition: After processing the SAC (single assignment code) $v_i = \varphi_i(v_j)_{\{v_j : v_j \prec v_i\}}$ in reverse mode, the processed SACs define an equivalent function $f_i(S_i)$. The objective function is the composition of $f_i$ and the remaining SACs, and $S_i$ is the current live variable set.
Observation:
  First order reverse mode computes the first order derivatives of $f_i(S_i)$ in each step by following the first order chain rule.
  Second order reverse mode computes the first and second order derivatives of $f_i(S_i)$ in each step by following the first and second order chain rule.
  High order reverse mode computes the derivatives up to order d of $f_i(S_i)$ in each step by following the high order chain rule.

High Order Chain Rule
Observation: High order reverse mode computes the derivatives up to order d of $f_i$ in each step by following the high order chain rule.
When processing $v_i = \varphi_i(v_j)_{\{v_j : v_j \prec v_i\}}$:
  $S_i = S_{i+1} \setminus \{v_i\} \cup \{v_j : v_j \prec v_i\}$
  $f_i(S_i) = f_{i+1}(S_{i+1} \setminus \{v_i\}, \; v_i = \varphi_i(v_j)_{\{v_j : v_j \prec v_i\}})$
High order chain rule: derivatives of $f_{i+1}(S_{i+1})$ $\Rightarrow$ derivatives of $f_i(S_i)$.
  General case of the Faà di Bruno equation.
  Special case of the equation in Ma, 2009 [2].
[2] Ma, Tsoy-Wo. Higher chain formula proved by combinatorics. The Electronic Journal of Combinatorics 16.1 (2009): N21.
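
The live set update on this slide is easy to trace by hand. The sketch below is only an illustration under stated assumptions: the trace format (a list of `(v_i, arguments)` pairs in evaluation order), the function name `live_set_sizes` and the example function are all hypothetical, not taken from ReverseAD.

```python
def live_set_sizes(trace, output):
    """Walk the trace backwards and apply
    S_i = S_{i+1} \\ {v_i} U {v_j : v_j is an argument of v_i}.
    Returns the size of the live set after each reverse step."""
    live = {output}
    sizes = []
    for v_i, args in reversed(trace):
        live.discard(v_i)   # v_i is eliminated
        live.update(args)   # its arguments become (or stay) live
        sizes.append(len(live))
    return sizes

# Hypothetical trace for y = (x0 * x1) + (x2 * x3)
trace = [
    ("v4", ("x0", "x1")),
    ("v5", ("x2", "x3")),
    ("y", ("v4", "v5")),
]
sizes = live_set_sizes(trace, "y")
s = max(sizes)  # the quantity s = max{s_i} used in the complexity bound
```

After the last reverse step the live set contains exactly the independent variables, so the derivatives of $f_i(S_i)$ at that point are the derivatives of the objective function.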

High Order Chain Rule
Multiset: A multiset $D$ is a generalization of the notion of a set in which members are allowed to appear more than once. We use $\mathcal{D}_S$ to represent the family of all multisets over $S$. That is:
  $\mathcal{D}_S = \{ D : D = \{e_1, e_2, \dots, e_d\}, \; e_i \in S, \; 1 \le i \le d \}$
Derivative Mapping: For a function $f(S)$, its order d derivative tensor can be represented as a mapping from $D \in \mathcal{D}_S$, $|D| = d$, to $\mathbb{R}$ as:
  $T_f(D) = \frac{\partial^{|D|} f}{\partial D} = \frac{\partial^{|D|} f}{\partial v_1 \partial v_2 \cdots \partial v_{|D|}}$
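
One way to picture the derivative mapping is a dictionary keyed by multisets. The sketch below assumes a multiset is stored as a sorted tuple of variable names; `key`, `T_f` and the example function are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement
from math import comb

def key(*variables):
    """Canonical form of a multiset of variable names: a sorted tuple."""
    return tuple(sorted(variables))

# Nonzero derivatives of the hypothetical function f(x, y) = x^2 * y at (3, 2)
x, y = 3.0, 2.0
T = {
    key("x"): 2 * x * y,
    key("y"): x * x,
    key("x", "x"): 2 * y,
    key("x", "y"): 2 * x,
    key("x", "x", "y"): 2.0,
}

def T_f(*variables):
    """Look up a derivative by multiset; entries that are not stored are zero."""
    return T.get(key(*variables), 0.0)

# A symmetric order-d tensor over n variables has one entry per multiset of size d.
n, d = 4, 3
distinct_entries = sum(1 for _ in combinations_with_replacement(range(n), d))
```

Because the key is order-independent, `T_f("y", "x", "x")` and `T_f("x", "x", "y")` return the same stored value, which is the symmetry the multiset notation encodes.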

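High Order Chain Rule
$$T_{f_i}(D) = T_{f_{i+1}}(D) + \sum_{D_L \subset D} \Big[ \sum_{D_1 \cup \cdots \cup D_r = D \setminus D_L} \prod_{j=1}^{r} \frac{\partial^{|D_j|} \varphi_i}{\partial D_j} \Big] T_{f_{i+1}}(D_L \cup \{v_i^r\})$$
$D_L, D_1, \dots, D_r$ is a partition of $D$, with $D_L$ a proper sub-multiset of $D$ and $\{v_i^r\}$ standing for r copies of $v_i$.
First order: $D = \{v\}$
  $a_i(v) = a_{i+1}(v) + \frac{\partial \varphi_i}{\partial v} a_{i+1}(v_i)$
  The term $\frac{\partial \varphi_i}{\partial v} a_{i+1}(v_i)$ corresponds to $D_L = \emptyset$, $D_1 = \{v\}$.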

High Order Chain Rule
$$T_{f_i}(D) = T_{f_{i+1}}(D) + \sum_{D_L \subset D} \Big[ \sum_{D_1 \cup \cdots \cup D_r = D \setminus D_L} \prod_{j=1}^{r} \frac{\partial^{|D_j|} \varphi_i}{\partial D_j} \Big] T_{f_{i+1}}(D_L \cup \{v_i^r\})$$
$D_L, D_1, \dots, D_r$ is a partition of $D$, with $D_L$ a proper sub-multiset of $D$ and $\{v_i^r\}$ standing for r copies of $v_i$.
Second order: $D = \{v, u\}$
  $h_i(v, u) = h_{i+1}(v, u) + \frac{\partial \varphi_i}{\partial v} h_{i+1}(v_i, u) + \frac{\partial \varphi_i}{\partial u} h_{i+1}(v, v_i) + \frac{\partial \varphi_i}{\partial v} \frac{\partial \varphi_i}{\partial u} h_{i+1}(v_i, v_i) + \frac{\partial^2 \varphi_i}{\partial v \partial u} a_{i+1}(v_i)$
Each term corresponds to one partition:
  $\frac{\partial \varphi_i}{\partial v} h_{i+1}(v_i, u)$: $D_L = \{u\}$, $D_1 = \{v\}$
  $\frac{\partial \varphi_i}{\partial u} h_{i+1}(v, v_i)$: $D_L = \{v\}$, $D_1 = \{u\}$
  $\frac{\partial \varphi_i}{\partial v} \frac{\partial \varphi_i}{\partial u} h_{i+1}(v_i, v_i)$: $D_L = \emptyset$, $D_1 = \{v\}$, $D_2 = \{u\}$
  $\frac{\partial^2 \varphi_i}{\partial v \partial u} a_{i+1}(v_i)$: $D_L = \emptyset$, $D_1 = \{v, u\}$
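
The first and second order updates can be run on a tiny example. What follows is a minimal sketch under stated assumptions, not the ReverseAD implementation: the trace format, the name `second_order_reverse`, the dense dictionary storage and the example function $y = x_0 \sin(x_0 x_1)$ are all hypothetical. Local partial derivatives are supplied by hand in the trace.

```python
import math
from collections import defaultdict

def second_order_reverse(trace, output):
    """trace: list of (v_i, dphi, d2phi) in evaluation order, where
    dphi[v] = d phi_i / d v and d2phi[(v, u)] = d^2 phi_i / (d v d u)."""
    a = defaultdict(float)  # a[v]: first order derivatives of f_i
    h = defaultdict(float)  # h[(v, u)]: second order, both (v, u) and (u, v) stored
    a[output] = 1.0
    for v_i, dphi, d2phi in reversed(trace):
        a_i = a.pop(v_i, 0.0)
        h_ii = h.pop((v_i, v_i), 0.0)
        # Row h_{i+1}(v_i, u) over the other live variables; v_i leaves the live set.
        row = {q: h.pop((p, q)) for (p, q) in list(h) if p == v_i}
        for u in row:
            h.pop((u, v_i), None)
        for v, d_v in dphi.items():
            a[v] += d_v * a_i                  # first order chain rule
            for u, h_iu in row.items():        # terms with D_L = {u}, D_1 = {v}
                h[(v, u)] += d_v * h_iu
                h[(u, v)] += d_v * h_iu
            for u, d_u in dphi.items():        # terms with D_L empty
                h[(v, u)] += d_v * d_u * h_ii + d2phi.get((v, u), 0.0) * a_i
    return a, h

x0, x1 = 0.7, 1.3
sn, cs = math.sin(x0 * x1), math.cos(x0 * x1)
trace = [
    # v2 = x0 * x1
    ("v2", {"x0": x1, "x1": x0}, {("x0", "x1"): 1.0, ("x1", "x0"): 1.0}),
    # v3 = sin(v2)
    ("v3", {"v2": cs}, {("v2", "v2"): -sn}),
    # y = v3 * x0
    ("y", {"v3": x0, "x0": sn}, {("v3", "x0"): 1.0, ("x0", "v3"): 1.0}),
]
grad, hess = second_order_reverse(trace, "y")
```

In this example `x0` is both an argument of `v2` and already live when `v2` is processed, so the update for `hess[("x0", "x0")]` picks up the mixed term twice, once for each of the two symmetric partitions.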

High Order Reverse Mode: Complexity
$$T_{f_i}(D) = T_{f_{i+1}}(D) + \sum_{D_L \subset D} \Big[ \sum_{D_1 \cup \cdots \cup D_r = D \setminus D_L} \prod_{j=1}^{r} \frac{\partial^{|D_j|} \varphi_i}{\partial D_j} \Big] T_{f_{i+1}}(D_L \cup \{v_i^r\})$$
$D_L, D_1, \dots, D_r$ is a partition of $D$.
$B_{d+1}$ summations: the (d+1)-th Bell number.
$O(B_{d+1} s_i^{d-1})$ updates for each SAC.
Overall complexity: $O(B_{d+1} s^{d-1} l)$, $s = \max\{s_i\}$.
  When d = 1: $O(l)$, the Baur-Strassen theorem.
  When d = 2: $O(l s)$, second order reverse mode.
  When d = 3: $O(l s^2)$, third order reverse mode.
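
The Bell number count can be verified by brute force for small d. The sketch below, with hypothetical helper names, enumerates every way to split a set $D$ of d distinct variables into $D_L$ and a set partition $D_1, \dots, D_r$ of the rest (including the case $D_L = D$, which is the leading $T_{f_{i+1}}(D)$ term) and compares the number of terms with $B_{d+1}$.

```python
from itertools import combinations
from math import comb

def set_partitions(items):
    """Yield all partitions of a list of distinct items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part

def chain_rule_terms(D):
    """Yield (D_L, [D_1, ..., D_r]) for every term of the high order chain rule."""
    D = list(D)
    for size in range(len(D) + 1):
        for D_L in combinations(D, size):
            rest = [e for e in D if e not in D_L]
            for blocks in set_partitions(rest):
                yield list(D_L), blocks

def bell(n):
    """Bell numbers via B_{m+1} = sum_k C(m, k) * B_k, with B_0 = 1."""
    B = [1]
    for m in range(n):
        B.append(sum(comb(m, k) * B[k] for k in range(m + 1)))
    return B[n]

counts = {d: sum(1 for _ in chain_rule_terms(range(d))) for d in range(1, 7)}
```

For d = 2 this gives the five terms of the second order rule, and for d = 6 it gives 877 terms, which suggests why precomputing the term structure is attractive at higher orders.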

High Order Reverse Mode: Implementation
$$T_{f_i}(D) = T_{f_{i+1}}(D) + \sum_{D_L \subset D} \Big[ \sum_{D_1 \cup \cdots \cup D_r = D \setminus D_L} \prod_{j=1}^{r} \frac{\partial^{|D_j|} \varphi_i}{\partial D_j} \Big] T_{f_{i+1}}(D_L \cup \{v_i^r\})$$
$D_L, D_1, \dots, D_r$ is a partition of $D$.
Generate all $D_L$ such that $T_{f_{i+1}}(D_L \cup \{v_i^r\}) \neq 0$, and all $D_1, \dots, D_r$ such that $\frac{\partial^{|D_j|} \varphi_i}{\partial D_j} \neq 0$ for $1 \le j \le r$.
Then perform incremental updates on $D = D_L \cup D_1 \cup \cdots \cup D_r$.
There is more than one way to partition $D$ into $D_L, D_1, \dots, D_r$.
SymCoeff($D_L, D_1, \dots, D_r$): the multiplicity with which $D$ partitions into $D_L, D_1, \dots, D_r$.
Flat code for pre-computed symmetric coefficients.
5k lines of generated code for up to sixth order.
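
The multiplicity matters as soon as $D$ contains a repeated variable. The sketch below shows one plausible reading of SymCoeff, not the generated ReverseAD code: it labels the copies in a multiset so they become distinguishable, enumerates all splittings, and counts how many collapse to the same unlabelled $(D_L, \{D_1, \dots, D_r\})$. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def set_partitions(items):
    """Yield all partitions of a list of distinct items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part

def sym_coeffs(D):
    """Map each unlabelled (D_L, blocks) to the number of labelled splittings of D."""
    labelled = list(enumerate(D))        # make repeated variables distinguishable
    counts = Counter()
    for size in range(len(labelled)):    # D_L is a proper sub-multiset of D
        for D_L in combinations(labelled, size):
            rest = [e for e in labelled if e not in D_L]
            for blocks in set_partitions(rest):
                canon_L = tuple(sorted(v for _, v in D_L))
                canon_blocks = tuple(sorted(
                    tuple(sorted(v for _, v in block)) for block in blocks))
                counts[(canon_L, canon_blocks)] += 1
    return counts

coeffs = sym_coeffs(("v", "v"))
```

For $D = \{v, v\}$ the splitting $D_L = \{v\}$, $D_1 = \{v\}$ gets coefficient 2, matching the factor 2 that appears when the second order rule is evaluated with $u = v$; the other two splittings get coefficient 1.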

Hgh Order Reverse Mode : Implementaton T f (D) = T f+1 (D) + D L D [ D D j = D 1 D r =D\D L ϕ ] T f+1 (D L {v r }) D 1 D r ReverseAD : an operator overloadng mplementaton of the hgh order reverse mode n C++11. Avalable at https://gthub.com/wangmu0701/reversead. Mu Wang and Alex Pothen Hgh Order Reverse AD September 30, 2016 11 / 1

ReverseAD: an operator overloading implementation of the high order reverse mode in C++11, available at https://github.com/wangmu0701/reversead.
Monotonic indexing for variables on the trace: $v_j \prec v_i \implies \mathrm{index}(v_j) < \mathrm{index}(v_i)$.
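The idea of an operator overloading trace with monotonic indexing can be sketched in a few lines. The example below is first order only and written in Python for brevity; ReverseAD itself is a C++11 library that handles high orders, and the `Tape`, `Var`, `sin`, and `gradient` names here are hypothetical, not its API. Each new variable receives the next index, so every argument has a smaller index than its result and the reverse sweep can simply walk the indices downward.

```python
import math


class Tape:
    """Records elemental operations in evaluation order."""

    def __init__(self):
        # ops[i] holds (argument index, local partial) pairs for variable i.
        self.ops = []

    def new_var(self, value, partials=()):
        self.ops.append(list(partials))
        return Var(self, len(self.ops) - 1, value)


class Var:
    """Active variable; arithmetic on it appends to the tape."""

    def __init__(self, tape, index, value):
        self.tape, self.index, self.value = tape, index, value

    def __add__(self, other):
        return self.tape.new_var(
            self.value + other.value,
            [(self.index, 1.0), (other.index, 1.0)])

    def __mul__(self, other):
        return self.tape.new_var(
            self.value * other.value,
            [(self.index, other.value), (other.index, self.value)])


def sin(x):
    return x.tape.new_var(math.sin(x.value), [(x.index, math.cos(x.value))])


def gradient(y):
    """Reverse sweep: visit variables in decreasing index order."""
    adjoint = [0.0] * len(y.tape.ops)
    adjoint[y.index] = 1.0
    for i in range(y.index, -1, -1):
        for j, partial in y.tape.ops[i]:
            assert j < i  # monotonic indexing: arguments precede results
            adjoint[j] += partial * adjoint[i]
    return adjoint


if __name__ == "__main__":
    tape = Tape()
    x1, x2 = tape.new_var(2.0), tape.new_var(3.0)
    y = x1 * x2 + sin(x1)
    g = gradient(y)
    print(g[x1.index], g[x2.index])
```

Because indices only grow, no dependency graph or topological sort is needed during the reverse sweep.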

Performance: Synthetic Function

A synthetic function designed with three parameters:
  n: number of independent variables
  s: size of the set of live variables during the function evaluation
  l: the complexity of the function
Dense derivatives.

$$t = \sum_{i=1}^{n} x_i, \qquad z_i = t, \qquad t_i = \mathrm{ID}_k \circ \dots \circ \mathrm{ID}_1(z_i), \qquad y = \sum_{i=1}^{s} t_i$$

Each ID(z) is an identity function written in one of six forms: log(exp(z)), 1.0/(1.0/z), sin(asin(z)), and three arithmetic forms whose operators did not survive extraction ("z z", "2.0 + z 2.0", "z 2.0 0.5").
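A rough sketch of such a synthetic function follows. It uses only the three identity forms that are legible in the slide, and it assumes that the chain length equals l and that the identities are applied in a fixed cycle; neither detail is confirmed by the slides. The names `IDENTITIES` and `synthetic` are hypothetical.

```python
import math

# Identity functions that are legible in the slide; each returns z up to rounding.
IDENTITIES = [
    lambda z: math.log(math.exp(z)),
    lambda z: 1.0 / (1.0 / z),        # requires z != 0
    lambda z: math.sin(math.asin(z)),  # requires |z| <= 1
]


def synthetic(x, s, l):
    """Evaluate y = sum_{i=1..s} ID_l(...ID_1(t)...) with t = sum(x).

    Assumptions: chain length is l, identities are cycled in list order.
    """
    t = sum(x)
    y = 0.0
    for _ in range(s):        # s independent chains stay live until the final sum
        z = t
        for k in range(l):    # l nested identity applications per chain
            z = IDENTITIES[k % len(IDENTITIES)](z)
        y += z
    return y


if __name__ == "__main__":
    x = [0.1, 0.2, 0.3]
    print(synthetic(x, s=4, l=6), 4 * sum(x))
```

Mathematically y equals s times the sum of the inputs, so every input interacts with every other input through t and the derivative tensors are dense, while s and l control memory pressure and trace length.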

Performance: Synthetic Function
[Figure: l fixed; n and s varied simultaneously.]

Performance: Synthetic Function
[Figure: l and n fixed; s varied.]

Performance: Synthetic Function
[Figure: l and s fixed; n varied.]

Application: XCFUN (ongoing)

Arbitrary order Exchange-Correlation functional library: https://github.com/dftlibs/xcfun
Using libtaylor to evaluate derivatives of functionals
Up to third order in current implementation
Small number of independents: 20 at most
Not-so-complex functionals
On a collection of functionals:
  Third order Libtaylor: 81 ms
  Third order ReverseAD: 20 ms
  Fourth order ReverseAD: 83 ms

Conclusion and Future Work

High order derivative tensors could (and probably should) be directly evaluated via reverse mode.
A series of algorithms to evaluate derivatives $T_f$ up to order $d$: $F^d$, $F^1R^{d-1}$, $R^d$
  $R^d$: symmetric derivative tensor $\nabla^d f$
  $F^1R^{d-1}$: tensor-vector product $\nabla^d f \cdot \dot{x}$
  $F^d$: $[[\nabla^d f \cdot \dot{x}] \cdot \dot{x} \cdots] \cdot \dot{x}$
The structural (and sparsity) properties of $T_f$ determine the optimal method.
General compression and recovery using $F^1R^{d-1}$: perfectly parallelizable.
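To make the three modes concrete, here is a small worked case for $d = 2$. The function $f(x_1, x_2) = x_1^2 x_2$ and the direction $\dot{x} = (1, 0)^T$ are chosen purely for illustration and do not come from the talk.

```latex
% R^2: the full symmetric second-order tensor (the Hessian)
\nabla^2 f =
\begin{pmatrix}
  2 x_2 & 2 x_1 \\
  2 x_1 & 0
\end{pmatrix}

% F^1 R^1: one tensor-vector product with \dot{x} = (1, 0)^T
\nabla^2 f \cdot \dot{x} =
\begin{pmatrix}
  2 x_2 \\
  2 x_1
\end{pmatrix}

% F^2: contract with \dot{x} in every slot, leaving a scalar
[\nabla^2 f \cdot \dot{x}] \cdot \dot{x} = 2 x_2
```

Each step from $R^2$ to $F^2$ contracts one more index with $\dot{x}$, so the output shrinks from a matrix to a vector to a scalar. Under this reading, recovering a dense or sparse tensor from $F^1R^{d-1}$ products requires one independent sweep per direction, which is consistent with the remark that the approach parallelizes well.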
