Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering
Department of Industrial Engineering and Management Sciences, Northwestern University
September 15th, 2014

Outline
1 Simulation Metamodeling: introduction and overview
2 Multi-Level Monte Carlo Metamodeling, with Imry Rosenbaum
  http://users.iems.northwestern.edu/~staum/mlmcm.pdf
3 Generalized Integrated Brownian Fields for Simulation Metamodeling, with Peter Salemi and Barry L. Nelson
  http://users.iems.northwestern.edu/~staum/gibf.pdf

MCQMC / IBC Application Domain
Industrial Engineering & Operations Research: using math to analyze systems and improve decisions
Stochastic Simulation: production, logistics, financial, ...
integration: µ = E[Y] = ∫ Y(ω) dω
parametric integration: approximate µ defined by µ(x) = E[Y(x)]
optimization: min{µ(x) : x ∈ X}

What is Stochastic Simulation Metamodeling?
Stochastic simulation model example: fuel injector production line
System performance measure µ(x) = E[Y(x)]
x: design of the production line
Y(x): number of fuel injectors produced
Simulating each scenario (20 replications) takes 8 hours
Stochastic simulation metamodeling
Simulation output Ȳ(x_i) at x_i, i = 1, ..., n
Predict µ(x) by µ̂(x), even without simulating at x
µ̂(x) is usually a weighted average of Ȳ(x_1), ..., Ȳ(x_n)

Overview of Multi-Level Monte Carlo (MLMC)
Error in Stochastic Simulation Metamodeling
prediction µ̂(x) = Σ_{i=1}^{k} w_i(x) Ȳ(x_i)
variance Var[µ̂(x)]: caused by the variance of the simulation output
interpolation error (bias): E[µ̂(x)] − µ(x)

Main Idea of Multi-Level Monte Carlo
Ordinary Monte Carlo
to reduce variance: large number n of replications per simulation run (design point)
to reduce bias: large number k of design points (fine grid)
very large computational effort kn
Multi-Level Monte Carlo
to reduce variance: coarser grids, many replications each
to reduce bias: finer grids, few replications each
less computational effort / better convergence rate
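As a rough illustration of the multi-level idea (not the exact estimator analyzed later), the sketch below builds a metamodel on each of a sequence of nested 1-D grids and telescopes the refinements between consecutive grids; `simulate` and `interpolate` are hypothetical stand-ins for a simulation model and an interpolation scheme.

```python
import numpy as np

def mlmc_metamodel(simulate, interpolate, grids, replications):
    """Minimal 1-D sketch of multi-level metamodeling (hypothetical interfaces).
    grids:        nested 1-D arrays of design points, coarse to fine
    replications: M_0 >= M_1 >= ... replications per design point at each level
    simulate(x):  one noisy replication of Y(x)
    interpolate(points, values): returns a callable metamodel x -> prediction
    """
    refinements = []
    for level, (grid, M) in enumerate(zip(grids, replications)):
        # sample average of M replications at every design point on this grid
        ybar = np.array([np.mean([simulate(x) for _ in range(M)]) for x in grid])
        fine = interpolate(grid, ybar)
        if level == 0:
            refinements.append(fine)
        else:
            # coarse metamodel built from the SAME level data restricted to the
            # coarser grid, so the refinement fine - coarse has small variance
            coarse_grid = grids[level - 1]
            idx = [int(np.argmin(np.abs(grid - xc))) for xc in coarse_grid]
            coarse = interpolate(coarse_grid, ybar[idx])
            refinements.append(lambda x, f=fine, c=coarse: f(x) - c(x))
    # the metamodel is the telescoping sum of level-wise refinements
    return lambda x: sum(r(x) for r in refinements)
```

For example, with grids = [np.linspace(0, 1, 2**l + 1) for l in range(5)], interpolate = lambda p, v: (lambda x: np.interp(x, p, v)), and decreasing replications such as [400, 100, 25, 6, 2], most of the simulation budget goes to the coarse grids while the fine grids control the bias.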

Our Contributions
Theoretical: mix and match or expand (from Heinrich's papers); derive desired conclusions under desired assumptions to suit IE goals and applications
Practical: algorithm design (based on Giles)
Experimental: show how much MLMC speeds up realistic examples in IE

Heinrich (2001): MLMC for Parametric Integration
Assumptions
Approximate µ given by µ(x) = ∫_Ω Y(x, ω) dω over x ∈ X.
X ⊂ R^d and Ω ⊂ R^{d₂}: bounded, open, Lipschitz boundary.
With respect to x, Y has weak derivatives up to rth order.
Y and its weak derivatives are L_q-integrable in (x, ω).
Sobolev embedding condition: r/d > 1/q.
Measure error as (∫_Ω ‖µ̂ − µ‖_q^p dω)^{1/p}, where p = min{2, q}.
Conclusion: there is an MLMC method with optimal rate.
MLMC attains the best rate of convergence in C, the number of evaluations of Y. The error bound is proportional to
C^{−r/d}            if r/d < 1 − 1/p
C^{1/p − 1} log C   if r/d = 1 − 1/p
C^{1/p − 1}         if r/d > 1 − 1/p.

Assumptions
Smoothness: assume r = 1
Stock option: Y(x, ω) = max{x r(ω) − K, 0}
Queueing: waiting time W_{n+1} = max{W_n + B_n − A_{n+1}, 0}
Inventory: S_n = min{I_n + P_n, D_n}, I_{n+1} = I_n − S_n
Parameter Domain
Assume X ⊂ R^d is compact (not open); cf. Heinrich and Sindambiwe (1999), Daun and Heinrich (2014).
If X were open, we would have to extrapolate.
No need to approximate an unbounded µ near a boundary of X.
Domain of Integration
Ω ⊂ R^{d₂} is not important; d₂ does not appear in the theorem.
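To make the r = 1 smoothness concrete, here is a small sketch of the first two examples as simulation outputs Y(x, ·); the input distributions (lognormal return, exponential service and interarrival times) and the role of x in the queue are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def option_payoff(x, K=1.0):
    """Y(x, w) = max{x r(w) - K, 0}: Lipschitz in x, with a kink where x r(w) = K,
    so it has one weak derivative in x (r = 1) but not two."""
    r = rng.lognormal(mean=0.0, sigma=0.2)      # assumed return distribution
    return max(x * r - K, 0.0)

def waiting_time(x, n_customers=50):
    """Lindley recursion W_{n+1} = max{W_n + B_n - A_{n+1}, 0}; here x is taken,
    for illustration, to be the mean service time."""
    W = 0.0
    for _ in range(n_customers):
        B = rng.exponential(x)                  # service time, mean x
        A = rng.exponential(1.0)                # interarrival time, mean 1
        W = max(W + B - A, 0.0)
    return W                                    # waiting time of the last customer
```

The inventory recursion can be simulated in the same way; in each case the max/min operations leave Y Lipschitz in x but introduce kinks, which is exactly the r = 1 setting.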

Changing Perspective
Measure of Error
Use p = q = 2 to get the Root Mean Integrated Squared Error
(∫_Ω ∫_X (µ̂(x) − µ(x))² dx dω)^{1/2}
Sobolev Embedding Criterion with r = 1, q = 2
r/d > 1/q becomes 1/d > 1/2, i.e. d = 1!??
Why We Don't Need the Sobolev Embedding Condition
Assume the domain X is compact.
Assume Y(·, ω) is (almost surely) Lipschitz continuous.
Conclude Y(·, ω) is (almost surely) bounded.

Our Assumptions
On the Stochastic Simulation Metamodeling Problem
X ⊂ R^d is compact
Y(x) has finite variance for all x ∈ X
|Y(x, ω) − Y(x′, ω)| ≤ κ(ω) ‖x − x′‖ a.s., and E[κ²] < ∞
On the Approximation Method and MLMC Design
µ̂(x) = Σ_{i=1}^{N} w_i(x) Ȳ(x_i) where each w_i(x) ≥ 0 and
total weight on points x_i far from x gets close to 0;
total weight on points x_i near x gets close to 1;
thresholds for "far" / "near" and "close to" are O(N^{−1/(2φ)}) as the number N of points increases.
Examples: piecewise-linear interpolation on a grid; nearest neighbors, Shepard's method, kernel smoothing

Approximation Method Used in Examples
Kernel Smoothing
µ̂(x) = Σ_{i=1}^{N} w_i(x) Ȳ(x_i)
weight w_i(x) is 0 if x_i is outside the cell containing x
otherwise, proportional to exp(−‖x − x_i‖)
weights are normalized to sum to 1
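A small sketch of these weights; `cell_of` is a hypothetical helper mapping a point to the index of the grid cell containing it (the cell structure comes from the experiment design and is not spelled out here).

```python
import numpy as np

def kernel_weights(x, design_points, cell_of):
    """Kernel-smoothing weights as described above: zero outside the cell
    containing x, proportional to exp(-||x - x_i||) inside, normalized to 1."""
    x = np.atleast_1d(x)
    w = np.zeros(len(design_points))
    for i, xi in enumerate(design_points):
        xi = np.atleast_1d(xi)
        if cell_of(xi) == cell_of(x):                  # zero weight outside the cell
            w[i] = np.exp(-np.linalg.norm(x - xi))     # proportional to exp(-||x - x_i||)
    assert w.sum() > 0, "the cell containing x must hold at least one design point"
    return w / w.sum()                                  # normalize to sum to 1

def predict(x, design_points, ybar, cell_of):
    """mu_hat(x) = sum_i w_i(x) Ybar(x_i)."""
    return kernel_weights(x, design_points, cell_of) @ np.asarray(ybar)
```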

Our Conclusions
MLMC Performance
As the number N of points used in a level increases, the errors due to bias and refinement variance are like O(N^{−1/φ}).
Example: nearest-neighbor approximation on a grid, φ = d/2
Computational Complexity (based on Giles 2013)
To attain RMISE < ε, the required number of evaluations of Y is O(ε^{−2(1+φ)}) for standard Monte Carlo, and for MLMC it is
O(ε^{−2φ})              if φ > 1
O((ε^{−1} log ε^{−1})²) if φ = 1
O(ε^{−2})               if φ < 1.
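For a rough sense of scale (ignoring the constants hidden in the O-notation): with the nearest-neighbor grid approximation in d = 3 dimensions, φ = d/2 = 1.5, so reaching RMISE ε = 0.01 costs on the order of ε^{−2(1+φ)} = 10^{10} evaluations of Y with standard Monte Carlo but only ε^{−2φ} = 10^{6} with MLMC.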

Sketch of Algorithm (based on Giles 2008)
Goal: add levels until the target RMISE < ε is achieved.
1 INITIALIZE level l = 0.
2 SIMULATE at level l:
  1 Run the level-l simulation experiment with M_0 replications.
  2 Observe the sample variance of the simulation output.
  3 Choose the number of replications M_l to control variance; run more replications if needed.
3 TEST CONVERGENCE:
  1 Use Monte Carlo to estimate the size of the refinement µ̂_l, ∫_X (µ̂_l(x))² dx.
  2 If refinements are too large compared to the target RMISE, increment l and return to step 2.
4 CLEAN UP: finalize the numbers of replications M_0, ..., M_l to control variance; run more replications at each level if needed.
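The loop below is a loose sketch of this outline, not the paper's exact rules; `run_level` and `refinement_size` are hypothetical helpers (run M replications of the level-l experiment and return the sample variance of its output; Monte Carlo estimate of ∫_X (µ̂_l(x))² dx), and the replication-allocation formula is a simplified placeholder.

```python
import numpy as np

def mlmc_loop(run_level, refinement_size, eps, M0=100):
    """Loose sketch of the algorithm outline above (hypothetical interfaces)."""
    l, M = 0, []
    while True:
        # SIMULATE at level l: pilot replications, then top up to control variance
        var_l = run_level(l, M0)
        M_l = max(M0, int(np.ceil(2.0 * var_l / eps**2)))   # simplified allocation rule
        if M_l > M0:
            var_l = run_level(l, M_l - M0)
        M.append(M_l)

        # TEST CONVERGENCE: stop once the newest refinement is small vs. target RMISE
        if l > 0 and refinement_size(l) < eps**2 / 2.0:
            break
        l += 1

    # CLEAN UP: re-balance M_0, ..., M_l so the total variance of the metamodel
    # stays below the target (omitted in this sketch)
    return M
```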

Asian Option Example, d = 3
MLMC up to 150 times better than standard Monte Carlo

Inventory System Example, d = 2
MLMC was 130-8900 times better than standard Monte Carlo

Conclusion on Multi-Level Monte Carlo
Celebration
Multi-Level Monte Carlo works for typical IE stochastic simulation metamodeling too!
Future Research
Handle discontinuities in simulation output.
Combine with good experiment designs; grids are not good in high dimension.

Introduction: Generalized Integrated Brownian Field
Kriging / Interpolating Splines
Pretend µ is a realization of a Gaussian random field M with mean function m and covariance function σ².
Kriging predictor: µ̂(x) = m(x) + σ²(x) Σ⁻¹ (Ȳ − m) = m(x) + Σ_i β_i σ²(x, x_i)
σ²(x) is a vector with ith element σ²(x, x_i)
Σ is a matrix with (i, j)th element σ²(x_i, x_j)
Ȳ − m is a vector with ith element Ȳ(x_i) − m(x_i)
Stochastic Kriging / Smoothing Splines
µ̂(x) = m(x) + σ²(x) (Σ + C)⁻¹ (Ȳ − m) = m(x) + Σ_i β_i σ²(x, x_i)
C = covariance matrix of the noise, estimated from replications
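A compact sketch of this predictor; `mean_fn` and `cov_fn` stand in for m and σ², `design` holds the design points x_1, ..., x_n, and C is the estimated noise covariance matrix (set C = 0 to recover the interpolating kriging predictor).

```python
import numpy as np

def kriging_predict(x, design, ybar, mean_fn, cov_fn, C):
    """Sketch of mu_hat(x) = m(x) + sigma2(x) (Sigma + C)^{-1} (Ybar - m)."""
    Sigma = np.array([[cov_fn(xi, xj) for xj in design] for xi in design])
    s2_x = np.array([cov_fn(x, xi) for xi in design])         # vector sigma2(x)
    resid = np.asarray(ybar) - np.array([mean_fn(xi) for xi in design])
    beta = np.linalg.solve(Sigma + C, resid)                   # (Sigma + C)^{-1} (Ybar - m)
    return mean_fn(x) + s2_x @ beta
```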

Radial Basis Functions vs. Integrated Brownian Field
[Figures: radial basis functions, and basis functions from the r-fold integrated Brownian field for r = 0, 1, 2]

Response Surfaces in IE Stochastic Simulation
[Figures: credit risk and inventory response surfaces]

r-Integrated Brownian Field B_r
Covariance function / reproducing kernel
σ²(x, y) = ∏_{i=1}^{d} (1/(r!)²) ∫₀¹ (x_i − u_i)₊^r (y_i − u_i)₊^r du_i
Inner product
⟨f, g⟩ = ∫_{(0,1)^d} f^{([r,...,r])}(u) g^{([r,...,r])}(u) du
Space
Tensor product of the Sobolev Hilbert space H^r(0, 1) with boundary conditions f^{(j)}(0) = 0 for j = 0, ..., r
What's missing? Polynomials of degree ≤ r
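A small sketch that evaluates this kernel numerically, using a midpoint rule on each coordinate (the quadrature is only for illustration; the one-dimensional integral can also be done in closed form).

```python
import numpy as np
from math import factorial

def ibf_kernel(x, y, r, n_quad=500):
    """Sketch of sigma2(x, y) = prod_i (1/(r!)^2) int_0^1 (x_i-u)_+^r (y_i-u)_+^r du,
    evaluated with a midpoint rule on (0, 1) in each coordinate."""
    x, y = np.atleast_1d(x), np.atleast_1d(y)
    u = (np.arange(n_quad) + 0.5) / n_quad
    val = 1.0
    for xi, yi in zip(x, y):
        fx = np.where(u < xi, (xi - u) ** r, 0.0)    # (x_i - u)_+^r
        fy = np.where(u < yi, (yi - u) ** r, 0.0)    # (y_i - u)_+^r
        val *= (fx * fy).mean() / factorial(r) ** 2  # midpoint-rule value of the integral
    return val

# e.g. r = 0 recovers (approximately) the Brownian-field kernel prod_i min(x_i, y_i)
```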

Removing Boundary Conditions: d = 1
Generalized integrated Brownian motion
X_r(x) = Σ_{k=0}^{r} √θ_k Z_k x^k / k! + √θ_{r+1} B_r(x)
Covariance function / reproducing kernel
σ²(x, y) = Σ_{k=0}^{r} θ_k x^k y^k / (k!)² + θ_{r+1} ∫₀¹ (x − u)₊^r (y − u)₊^r / (r!)² du
Sobolev space H^r(0, 1), no boundary conditions
Inner product
⟨f, g⟩ = Σ_{k=0}^{r} (1/θ_k) ∫₀¹ f^{(k)}(u) g^{(k)}(u) du + (1/θ_{r+1}) ∫₀¹ f^{(r)}(u) g^{(r)}(u) du

Multidimensional, Without Boundary Conditions
Tensor-Product RKHS with Weights
Example of reproducing kernel for d = 2, r = 1, writing k(s, t) for the one-dimensional integrated-Brownian-motion kernel ∫₀¹ (s − u)₊ (t − u)₊ du:
K(x, y) = θ₀₀ + θ₁₀ x₁y₁ + θ₂₀ k(x₁, y₁) + θ₀₁ x₂y₂ + θ₀₂ k(x₂, y₂)
  + θ₁₁ x₁x₂y₁y₂ + θ₁₂ x₁y₁ k(x₂, y₂)
  + θ₂₁ k(x₁, y₁) x₂y₂ + θ₂₂ k(x₁, y₁) k(x₂, y₂)
In general, one weight for each of the ∏_{i=1}^{d} (r_i + 2) subspaces.
Generalized Integrated Brownian Field
Covariance function / reproducing kernel
σ²(x, y) = ∏_{i=1}^{d} ( Σ_{k=0}^{r_i} θ_{i,k} x_i^k y_i^k / (k!)² + θ_{i,r_i+1} ∫₀¹ (x_i − u_i)₊^{r_i} (y_i − u_i)₊^{r_i} / (r_i!)² du_i )
In general, the number of weights is Σ_{i=1}^{d} (r_i + 2).
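A sketch of this kernel along the same lines as before; `theta` uses a hypothetical layout in which theta[i] holds the r_i + 2 weights for dimension i, the polynomial part is exact, and the integrated-Brownian-field part is again evaluated by a midpoint rule.

```python
import numpy as np
from math import factorial

def gibf_kernel(x, y, r, theta, n_quad=500):
    """Sketch of the GIBF covariance: in each dimension i, the weighted polynomial
    terms theta[i][k] * x_i^k * y_i^k / (k!)^2 for k = 0..r_i plus theta[i][r_i+1]
    times the integrated-Brownian-field term; the results are multiplied across
    dimensions."""
    x, y = np.atleast_1d(x), np.atleast_1d(y)
    u = (np.arange(n_quad) + 0.5) / n_quad                 # midpoint rule on (0, 1)
    val = 1.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        ri = r[i]
        poly = sum(theta[i][k] * xi**k * yi**k / factorial(k)**2
                   for k in range(ri + 1))
        fx = np.where(u < xi, (xi - u) ** ri, 0.0)         # (x_i - u)_+^{r_i}
        fy = np.where(u < yi, (yi - u) ** ri, 0.0)
        ibf = (fx * fy).mean() / factorial(ri) ** 2
        val *= poly + theta[i][ri + 1] * ibf
    return val

# (1,1)-GIBF in d = 2 with unit weights:
# gibf_kernel([0.3, 0.7], [0.5, 0.2], r=[1, 1], theta=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
```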

Our Contributions
More parsimonious parametrization: makes maximum likelihood estimation easier, and enables an MLE search for r_1, ..., r_d
GIBF has the Markov property (d = 1: proof; d > 1: conjecture)
IE simulation examples: stochastic and deterministic simulation, standard and nonstandard information

Credit Risk Example, d = 2
Experiment design: 63 Sobol points, predictions in a smaller square
Factor by which MISE decreased using (1,1)-GIBF:

Noise level                  none   low   medium
Number of replications              100   25
Without gradient estimates   94     111   120
With gradient estimates      81     83    69

[Figures: (i) credit risk surface, (j) Gaussian, (k) (1,1)-GIBF]

Conclusion on Generalized Integrated Brownian Field
Emancipating Simulation Metamodeling from Geostatistics
a new covariance function for kriging, designed for simulation metamodeling in engineering
Superior Practical Performance
4-120 times better than the Gaussian covariance function in 2-6 dimensional examples, with or without gradient information

Thank You!