Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering
Department of Industrial Engineering and Management Sciences, Northwestern University
September 15th, 2014

Outline
1 Simulation Metamodeling: introduction and overview
2 Multi-Level Monte Carlo Metamodeling, with Imry Rosenbaum
  http://users.iems.northwestern.edu/~staum/mlmcm.pdf
3 Generalized Integrated Brownian Fields for Simulation Metamodeling, with Peter Salemi and Barry L. Nelson
  http://users.iems.northwestern.edu/~staum/gibf.pdf

MCQMC / IBC Application Domain
Industrial Engineering & Operations Research: using math to analyze systems and improve decisions
Stochastic Simulation: production, logistics, financial, ...
integration: µ = E[Y] = ∫ Y(ω) dω
parametric integration: approximate µ defined by µ(x) = E[Y(x)]
optimization: min{µ(x) : x ∈ X}

What is Stochastic Simulation Metamodeling?
Stochastic simulation model example: fuel injector production line
System performance measure µ(x) = E[Y(x)]
x: design of the production line
Y(x): number of fuel injectors produced
Simulating each scenario (20 replications) takes 8 hours
Stochastic simulation metamodeling
Simulation output Ȳ(x_i) at x_i, i = 1, ..., n
Predict µ(x) by µ̂(x), even without simulating at x
µ̂(x) is usually a weighted average of Ȳ(x_1), ..., Ȳ(x_n)

Overview of Multi-Level Monte Carlo (MLMC)
Error in Stochastic Simulation Metamodeling
prediction µ̂(x) = Σ_{i=1}^{k} w_i(x) Ȳ(x_i)
variance Var[µ̂(x)]: caused by the variance of the simulation output
interpolation error (bias): E[µ̂(x)] − µ(x)

Main Idea of Multi-Level Monte Carlo
Ordinary Monte Carlo
to reduce variance: large number n of replications per simulation run (design point)
to reduce bias: large number k of design points (fine grid)
very large computational effort kn
Multi-Level Monte Carlo
to reduce variance: coarser grids, many replications each
to reduce bias: finer grids, few replications each
less computational effort / better convergence rate
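As a rough illustration of the multi-level idea (not the exact estimator analyzed later), the sketch below builds a metamodel on each of a sequence of nested 1-D grids and telescopes the refinements between consecutive grids; `simulate` and `interpolate` are hypothetical stand-ins for a simulation model and an interpolation scheme.

```python
import numpy as np

def mlmc_metamodel(simulate, interpolate, grids, replications):
    """Minimal 1-D sketch of multi-level metamodeling (hypothetical interfaces).
    grids:        nested 1-D arrays of design points, coarse to fine
    replications: M_0 >= M_1 >= ... replications per design point at each level
    simulate(x):  one noisy replication of Y(x)
    interpolate(points, values): returns a callable metamodel x -> prediction
    """
    refinements = []
    for level, (grid, M) in enumerate(zip(grids, replications)):
        # sample average of M replications at every design point on this grid
        ybar = np.array([np.mean([simulate(x) for _ in range(M)]) for x in grid])
        fine = interpolate(grid, ybar)
        if level == 0:
            refinements.append(fine)
        else:
            # coarse metamodel built from the SAME level data restricted to the
            # coarser grid, so the refinement fine - coarse has small variance
            coarse_grid = grids[level - 1]
            idx = [int(np.argmin(np.abs(grid - xc))) for xc in coarse_grid]
            coarse = interpolate(coarse_grid, ybar[idx])
            refinements.append(lambda x, f=fine, c=coarse: f(x) - c(x))
    # the metamodel is the telescoping sum of level-wise refinements
    return lambda x: sum(r(x) for r in refinements)
```

For example, with grids = [np.linspace(0, 1, 2**l + 1) for l in range(5)], interpolate = lambda p, v: (lambda x: np.interp(x, p, v)), and decreasing replications such as [400, 100, 25, 6, 2], most of the simulation budget goes to the coarse grids while the fine grids control the bias.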

Our Contributions
Theoretical: mix and match or expand (from Heinrich's papers); derive desired conclusions under desired assumptions to suit IE goals and applications
Practical: algorithm design (based on Giles)
Experimental: show how much MLMC speeds up realistic examples in IE

Heinrich (2001): MLMC for Parametric Integration
Assumptions
Approximate µ given by µ(x) = ∫_Ω Y(x, ω) dω over x ∈ X.
X ⊂ R^d and Ω ⊂ R^{d₂}: bounded, open, Lipschitz boundary.
With respect to x, Y has weak derivatives up to rth order.
Y and its weak derivatives are L_q-integrable in (x, ω).
Sobolev embedding condition: r/d > 1/q.
Measure error as (∫_Ω ‖µ̂ − µ‖_q^p dω)^{1/p}, where p = min{2, q}.
Conclusion: there is an MLMC method with optimal rate.
MLMC attains the best rate of convergence in C, the number of evaluations of Y. The error bound is proportional to
C^{−r/d}            if r/d < 1 − 1/p
C^{1/p − 1} log C   if r/d = 1 − 1/p
C^{1/p − 1}         if r/d > 1 − 1/p.

Assumptions
Smoothness: assume r = 1
Stock option: Y(x, ω) = max{x r(ω) − K, 0}
Queueing: waiting time W_{n+1} = max{W_n + B_n − A_{n+1}, 0}
Inventory: S_n = min{I_n + P_n, D_n}, I_{n+1} = I_n − S_n
Parameter Domain
Assume X ⊂ R^d is compact (not open); cf. Heinrich and Sindambiwe (1999), Daun and Heinrich (2014).
If X were open, we would have to extrapolate.
No need to approximate an unbounded µ near a boundary of X.
Domain of Integration
Ω ⊂ R^{d₂} is not important; d₂ does not appear in the theorem.
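To make the r = 1 smoothness concrete, here is a small sketch of the first two examples as simulation outputs Y(x, ·); the input distributions (lognormal return, exponential service and interarrival times) and the role of x in the queue are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def option_payoff(x, K=1.0):
    """Y(x, w) = max{x r(w) - K, 0}: Lipschitz in x, with a kink where x r(w) = K,
    so it has one weak derivative in x (r = 1) but not two."""
    r = rng.lognormal(mean=0.0, sigma=0.2)      # assumed return distribution
    return max(x * r - K, 0.0)

def waiting_time(x, n_customers=50):
    """Lindley recursion W_{n+1} = max{W_n + B_n - A_{n+1}, 0}; here x is taken,
    for illustration, to be the mean service time."""
    W = 0.0
    for _ in range(n_customers):
        B = rng.exponential(x)                  # service time, mean x
        A = rng.exponential(1.0)                # interarrival time, mean 1
        W = max(W + B - A, 0.0)
    return W                                    # waiting time of the last customer
```

The inventory recursion can be simulated in the same way; in each case the max/min operations leave Y Lipschitz in x but introduce kinks, which is exactly the r = 1 setting.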

Changing Perspective
Measure of Error
Use p = q = 2 to get the Root Mean Integrated Squared Error
(∫_Ω ∫_X (µ̂(x) − µ(x))² dx dω)^{1/2}
Sobolev Embedding Criterion with r = 1, q = 2
r/d > 1/q becomes 1/d > 1/2, i.e. d = 1!??
Why We Don't Need the Sobolev Embedding Condition
Assume the domain X is compact.
Assume Y(·, ω) is (almost surely) Lipschitz continuous.
Conclude Y(·, ω) is (almost surely) bounded.

Our Assumptions
On the Stochastic Simulation Metamodeling Problem
X ⊂ R^d is compact
Y(x) has finite variance for all x ∈ X
|Y(x, ω) − Y(x′, ω)| ≤ κ(ω) ‖x − x′‖ a.s., and E[κ²] < ∞
On the Approximation Method and MLMC Design
µ̂(x) = Σ_{i=1}^{N} w_i(x) Ȳ(x_i) where each w_i(x) ≥ 0 and
total weight on points x_i far from x gets close to 0;
total weight on points x_i near x gets close to 1;
thresholds for "far" / "near" and "close to" are O(N^{−1/(2φ)}) as the number N of points increases.
Examples: piecewise-linear interpolation on a grid; nearest neighbors, Shepard's method, kernel smoothing

Approximation Method Used in Examples
Kernel Smoothing
µ̂(x) = Σ_{i=1}^{N} w_i(x) Ȳ(x_i)
weight w_i(x) is 0 if x_i is outside the cell containing x
otherwise, proportional to exp(−‖x − x_i‖)
weights are normalized to sum to 1
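A small sketch of these weights; `cell_of` is a hypothetical helper mapping a point to the index of the grid cell containing it (the cell structure comes from the experiment design and is not spelled out here).

```python
import numpy as np

def kernel_weights(x, design_points, cell_of):
    """Kernel-smoothing weights as described above: zero outside the cell
    containing x, proportional to exp(-||x - x_i||) inside, normalized to 1."""
    x = np.atleast_1d(x)
    w = np.zeros(len(design_points))
    for i, xi in enumerate(design_points):
        xi = np.atleast_1d(xi)
        if cell_of(xi) == cell_of(x):                  # zero weight outside the cell
            w[i] = np.exp(-np.linalg.norm(x - xi))     # proportional to exp(-||x - x_i||)
    assert w.sum() > 0, "the cell containing x must hold at least one design point"
    return w / w.sum()                                  # normalize to sum to 1

def predict(x, design_points, ybar, cell_of):
    """mu_hat(x) = sum_i w_i(x) Ybar(x_i)."""
    return kernel_weights(x, design_points, cell_of) @ np.asarray(ybar)
```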

Our Conclusions
MLMC Performance
As the number N of points used in a level increases, the errors due to bias and refinement variance are like O(N^{−1/φ}).
Example: nearest-neighbor approximation on a grid, φ = d/2
Computational Complexity (based on Giles 2013)
To attain RMISE < ε, the required number of evaluations of Y is O(ε^{−2(1+φ)}) for standard Monte Carlo, and for MLMC it is
O(ε^{−2φ})              if φ > 1
O((ε^{−1} log ε^{−1})²) if φ = 1
O(ε^{−2})               if φ < 1.
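For a rough sense of scale (ignoring the constants hidden in the O-notation): with the nearest-neighbor grid approximation in d = 3 dimensions, φ = d/2 = 1.5, so reaching RMISE ε = 0.01 costs on the order of ε^{−2(1+φ)} = 10^{10} evaluations of Y with standard Monte Carlo but only ε^{−2φ} = 10^{6} with MLMC.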

Sketch of Algorithm (based on Giles 2008)
Goal: add levels until the target RMISE < ε is achieved.
1 INITIALIZE level l = 0.
2 SIMULATE at level l:
  1 Run the level-l simulation experiment with M_0 replications.
  2 Observe the sample variance of the simulation output.
  3 Choose the number of replications M_l to control variance; run more replications if needed.
3 TEST CONVERGENCE:
  1 Use Monte Carlo to estimate the size of the refinement µ̂_l, ∫_X (µ̂_l(x))² dx.
  2 If refinements are too large compared to the target RMISE, increment l and return to step 2.
4 CLEAN UP: finalize the numbers of replications M_0, ..., M_l to control variance; run more replications at each level if needed.
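The loop below is a loose sketch of this outline, not the paper's exact rules; `run_level` and `refinement_size` are hypothetical helpers (run M replications of the level-l experiment and return the sample variance of its output; Monte Carlo estimate of ∫_X (µ̂_l(x))² dx), and the replication-allocation formula is a simplified placeholder.

```python
import numpy as np

def mlmc_loop(run_level, refinement_size, eps, M0=100):
    """Loose sketch of the algorithm outline above (hypothetical interfaces)."""
    l, M = 0, []
    while True:
        # SIMULATE at level l: pilot replications, then top up to control variance
        var_l = run_level(l, M0)
        M_l = max(M0, int(np.ceil(2.0 * var_l / eps**2)))   # simplified allocation rule
        if M_l > M0:
            var_l = run_level(l, M_l - M0)
        M.append(M_l)

        # TEST CONVERGENCE: stop once the newest refinement is small vs. target RMISE
        if l > 0 and refinement_size(l) < eps**2 / 2.0:
            break
        l += 1

    # CLEAN UP: re-balance M_0, ..., M_l so the total variance of the metamodel
    # stays below the target (omitted in this sketch)
    return M
```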

Asian Option Example, d = 3
MLMC up to 150 times better than standard Monte Carlo

Inventory System Example, d = 2
MLMC was 130-8900 times better than standard Monte Carlo

Conclusion on Multi-Level Monte Carlo
Celebration
Multi-Level Monte Carlo works for typical IE stochastic simulation metamodeling too!
Future Research
Handle discontinuities in simulation output.
Combine with good experiment designs; grids are not good in high dimension.

Introduction: Generalized Integrated Brownian Field
Kriging / Interpolating Splines
Pretend µ is a realization of a Gaussian random field M with mean function m and covariance function σ².
Kriging predictor: µ̂(x) = m(x) + σ²(x) Σ⁻¹ (Ȳ − m) = m(x) + Σ_i β_i σ²(x, x_i)
σ²(x) is a vector with ith element σ²(x, x_i)
Σ is a matrix with (i, j)th element σ²(x_i, x_j)
Ȳ − m is a vector with ith element Ȳ(x_i) − m(x_i)
Stochastic Kriging / Smoothing Splines
µ̂(x) = m(x) + σ²(x) (Σ + C)⁻¹ (Ȳ − m) = m(x) + Σ_i β_i σ²(x, x_i)
C = covariance matrix of the noise, estimated from replications
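A compact sketch of this predictor; `mean_fn` and `cov_fn` stand in for m and σ², `design` holds the design points x_1, ..., x_n, and C is the estimated noise covariance matrix (set C = 0 to recover the interpolating kriging predictor).

```python
import numpy as np

def kriging_predict(x, design, ybar, mean_fn, cov_fn, C):
    """Sketch of mu_hat(x) = m(x) + sigma2(x) (Sigma + C)^{-1} (Ybar - m)."""
    Sigma = np.array([[cov_fn(xi, xj) for xj in design] for xi in design])
    s2_x = np.array([cov_fn(x, xi) for xi in design])         # vector sigma2(x)
    resid = np.asarray(ybar) - np.array([mean_fn(xi) for xi in design])
    beta = np.linalg.solve(Sigma + C, resid)                   # (Sigma + C)^{-1} (Ybar - m)
    return mean_fn(x) + s2_x @ beta
```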

Radial Basis Functions vs. Integrated Brownian Field
[Figures: radial basis functions, and basis functions from the r-fold integrated Brownian field for r = 0, 1, 2]

Response Surfaces in IE Stochastic Simulation
[Figures: credit risk and inventory response surfaces]

r-Integrated Brownian Field B_r
Covariance function / reproducing kernel
σ²(x, y) = ∏_{i=1}^{d} (1/(r!)²) ∫₀¹ (x_i − u_i)₊^r (y_i − u_i)₊^r du_i
Inner product
⟨f, g⟩ = ∫_{(0,1)^d} f^{([r,...,r])}(u) g^{([r,...,r])}(u) du
Space
Tensor product of the Sobolev Hilbert space H^r(0, 1) with boundary conditions f^{(j)}(0) = 0 for j = 0, ..., r
What's missing? Polynomials of degree ≤ r
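A small sketch that evaluates this kernel numerically, using a midpoint rule on each coordinate (the quadrature is only for illustration; the one-dimensional integral can also be done in closed form).

```python
import numpy as np
from math import factorial

def ibf_kernel(x, y, r, n_quad=500):
    """Sketch of sigma2(x, y) = prod_i (1/(r!)^2) int_0^1 (x_i-u)_+^r (y_i-u)_+^r du,
    evaluated with a midpoint rule on (0, 1) in each coordinate."""
    x, y = np.atleast_1d(x), np.atleast_1d(y)
    u = (np.arange(n_quad) + 0.5) / n_quad
    val = 1.0
    for xi, yi in zip(x, y):
        fx = np.where(u < xi, (xi - u) ** r, 0.0)    # (x_i - u)_+^r
        fy = np.where(u < yi, (yi - u) ** r, 0.0)    # (y_i - u)_+^r
        val *= (fx * fy).mean() / factorial(r) ** 2  # midpoint-rule value of the integral
    return val

# e.g. r = 0 recovers (approximately) the Brownian-field kernel prod_i min(x_i, y_i)
```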

Removing Boundary Conditions: d = 1
Generalized integrated Brownian motion
X_r(x) = Σ_{k=0}^{r} √θ_k Z_k x^k / k! + √θ_{r+1} B_r(x)
Covariance function / reproducing kernel
σ²(x, y) = Σ_{k=0}^{r} θ_k x^k y^k / (k!)² + θ_{r+1} ∫₀¹ (x − u)₊^r (y − u)₊^r / (r!)² du
Sobolev space H^r(0, 1), no boundary conditions
Inner product
⟨f, g⟩ = Σ_{k=0}^{r} (1/θ_k) ∫₀¹ f^{(k)}(u) g^{(k)}(u) du + (1/θ_{r+1}) ∫₀¹ f^{(r)}(u) g^{(r)}(u) du

Multidimensional, Without Boundary Conditions
Tensor-Product RKHS with Weights
Example of reproducing kernel for d = 2, r = 1, writing k(s, t) for the one-dimensional integrated-Brownian-motion kernel ∫₀¹ (s − u)₊ (t − u)₊ du:
K(x, y) = θ₀₀ + θ₁₀ x₁y₁ + θ₂₀ k(x₁, y₁) + θ₀₁ x₂y₂ + θ₀₂ k(x₂, y₂)
  + θ₁₁ x₁x₂y₁y₂ + θ₁₂ x₁y₁ k(x₂, y₂)
  + θ₂₁ k(x₁, y₁) x₂y₂ + θ₂₂ k(x₁, y₁) k(x₂, y₂)
In general, one weight for each of the ∏_{i=1}^{d} (r_i + 2) subspaces.
Generalized Integrated Brownian Field
Covariance function / reproducing kernel
σ²(x, y) = ∏_{i=1}^{d} ( Σ_{k=0}^{r_i} θ_{i,k} x_i^k y_i^k / (k!)² + θ_{i,r_i+1} ∫₀¹ (x_i − u_i)₊^{r_i} (y_i − u_i)₊^{r_i} / (r_i!)² du_i )
In general, the number of weights is Σ_{i=1}^{d} (r_i + 2).
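A sketch of this kernel along the same lines as before; `theta` uses a hypothetical layout in which theta[i] holds the r_i + 2 weights for dimension i, the polynomial part is exact, and the integrated-Brownian-field part is again evaluated by a midpoint rule.

```python
import numpy as np
from math import factorial

def gibf_kernel(x, y, r, theta, n_quad=500):
    """Sketch of the GIBF covariance: in each dimension i, the weighted polynomial
    terms theta[i][k] * x_i^k * y_i^k / (k!)^2 for k = 0..r_i plus theta[i][r_i+1]
    times the integrated-Brownian-field term; the results are multiplied across
    dimensions."""
    x, y = np.atleast_1d(x), np.atleast_1d(y)
    u = (np.arange(n_quad) + 0.5) / n_quad                 # midpoint rule on (0, 1)
    val = 1.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        ri = r[i]
        poly = sum(theta[i][k] * xi**k * yi**k / factorial(k)**2
                   for k in range(ri + 1))
        fx = np.where(u < xi, (xi - u) ** ri, 0.0)         # (x_i - u)_+^{r_i}
        fy = np.where(u < yi, (yi - u) ** ri, 0.0)
        ibf = (fx * fy).mean() / factorial(ri) ** 2
        val *= poly + theta[i][ri + 1] * ibf
    return val

# (1,1)-GIBF in d = 2 with unit weights:
# gibf_kernel([0.3, 0.7], [0.5, 0.2], r=[1, 1], theta=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
```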

Our Contributions
More parsimonious parametrization: makes maximum likelihood estimation easier, and enables an MLE search for r_1, ..., r_d
GIBF has the Markov property (d = 1: proof; d > 1: conjecture)
IE simulation examples: stochastic and deterministic simulation, standard and nonstandard information

Credit Risk Example, d = 2
Experiment design: 63 Sobol points, predictions in a smaller square
Factor by which MISE decreased using (1,1)-GIBF:

Noise level                  none   low   medium
Number of replications              100   25
Without gradient estimates   94     111   120
With gradient estimates      81     83    69

[Figures: (i) credit risk surface, (j) Gaussian, (k) (1,1)-GIBF]

Conclusion on Generalized Integrated Brownian Field
Emancipating Simulation Metamodeling from Geostatistics
a new covariance function for kriging, designed for simulation metamodeling in engineering
Superior Practical Performance
4-120 times better than the Gaussian covariance function in 2-6 dimensional examples, with or without gradient information

Thank You!