Stochastic Model Predictive Control


- stochastic finite horizon control
- stochastic dynamic programming
- certainty equivalent model predictive control

Prof. S. Boyd, EE364b, Stanford University

Causal state-feedback control

linear dynamical system, over a finite time horizon:
$$x_{t+1} = A x_t + B u_t + w_t, \quad t = 0, \ldots, T-1$$

- $x_t \in \mathbf{R}^n$ is the state and $u_t \in \mathbf{R}^m$ is the input at time $t$
- $w_t$ is the process noise (or exogenous input) at time $t$
- $X_t = (x_0, \ldots, x_t)$ is the state history up to time $t$

causal state-feedback control:
$$u_t = \phi_t(X_t) = \psi_t(x_0, w_0, \ldots, w_{t-1}), \quad t = 0, \ldots, T-1$$

- $\phi_t : \mathbf{R}^{(t+1)n} \to \mathbf{R}^m$ is called the control policy at time $t$
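
To make the rollout concrete, here is a minimal numpy sketch of simulating the system under a causal state-feedback policy; the `simulate` helper and the zero-input placeholder policy are illustrative assumptions, not part of the slides.

```python
import numpy as np

def simulate(A, B, policy, x0, w):
    """Roll out x_{t+1} = A x_t + B u_t + w_t under a causal policy.

    policy(t, hist) sees only the history hist = [x_0, ..., x_t],
    so the resulting input u_t = phi_t(X_t) is causal by construction.
    """
    hist, inputs = [x0], []
    for t in range(len(w)):
        u = policy(t, hist)                    # u_t = phi_t(X_t)
        hist.append(A @ hist[-1] + B @ u + w[t])
        inputs.append(u)
    return np.array(hist), np.array(inputs)

# tiny usage example (all numbers are placeholder assumptions)
rng = np.random.default_rng(0)
n, m, T = 3, 2, 50
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, m))
w = 0.5 * rng.standard_normal((T, n))          # one noise realization
X, U = simulate(A, B, lambda t, hist: np.zeros(m), rng.standard_normal(n), w)
```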

Stochastic finite horizon control

- $(x_0, w_0, \ldots, w_{T-1})$ is a random variable
- objective: $J = \mathbf{E} \left( \sum_{t=0}^{T-1} \ell_t(x_t, u_t) + \ell_T(x_T) \right)$
- convex stage cost functions $\ell_t : \mathbf{R}^n \times \mathbf{R}^m \to \mathbf{R}$, $t = 0, \ldots, T-1$
- convex terminal cost function $\ell_T : \mathbf{R}^n \to \mathbf{R}$
- $J$ depends on the control policies $\phi_0, \ldots, \phi_{T-1}$
- constraints: $u_t \in \mathcal{U}_t$, $t = 0, \ldots, T-1$, with convex input constraint sets $\mathcal{U}_0, \ldots, \mathcal{U}_{T-1}$
- stochastic control problem: choose the control policies $\phi_0, \ldots, \phi_{T-1}$ to minimize $J$, subject to the constraints

Stochastic finite horizon control

- an infinite-dimensional problem: the variables are the functions $\phi_0, \ldots, \phi_{T-1}$
- can restrict the policies to a finite-dimensional subspace, e.g., all $\phi_t$ affine
- key idea: we have recourse (a.k.a. feedback, closed-loop control): we can change $u_t$ based on the observed state history $x_0, \ldots, x_t$
- cf. the standard ("open-loop") optimal control problem, where we commit to $u_0, \ldots, u_{T-1}$ ahead of time
- in the general case, we need to evaluate $J$ (for given control policies) via Monte Carlo simulation, as sketched below
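
A minimal sketch of such a Monte Carlo estimate of $J$ for a fixed policy. The interface (callables for the policy, costs, and samplers) is an assumption made for illustration.

```python
import numpy as np

def monte_carlo_cost(A, B, policy, stage_cost, final_cost,
                     sample_x0, sample_w, T, n_runs=1000, seed=0):
    """Estimate J = E(sum_t l_t(x_t, u_t) + l_T(x_T)) by simulation."""
    rng = np.random.default_rng(seed)
    costs = np.empty(n_runs)
    for k in range(n_runs):
        x, J = sample_x0(rng), 0.0
        hist = [x]
        for t in range(T):
            u = policy(t, hist)              # causal: sees x_0, ..., x_t only
            J += stage_cost(t, x, u)
            x = A @ x + B @ u + sample_w(rng, t)
            hist.append(x)
        costs[k] = J + final_cost(x)
    return costs.mean(), costs.std() / np.sqrt(n_runs)  # estimate, std. error
```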

Solution via dynamic programming

- let $V_t(X_t)$ be the optimal value of the objective, from $t$ on, starting from the initial state history $X_t$
- $V_T(X_T) = \ell_T(x_T)$; $J = \mathbf{E}\, V_0(x_0)$
- $V_t$ can be found by backward recursion: for $t = T-1, \ldots, 0$,
$$V_t(X_t) = \inf_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E} \left( V_{t+1}\big( (X_t,\, A x_t + B v + w_t) \big) \,\middle|\, X_t \right) \right\}$$
- $V_t$, $t = 0, \ldots, T$, are convex functions
- the optimal policy is causal state feedback:
$$\phi_t^\star(X_t) = \operatorname*{argmin}_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E} \left( V_{t+1}\big( (X_t,\, A x_t + B v + w_t) \big) \,\middle|\, X_t \right) \right\}$$

Independent process noise

- assume $x_0, w_0, \ldots, w_{T-1}$ are independent
- then $V_t$ depends only on the current state $x_t$ (and not on the state history $X_t$)
- Bellman equations: $V_T(x_T) = \ell_T(x_T)$; for $t = T-1, \ldots, 0$,
$$V_t(x_t) = \inf_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\, V_{t+1}(A x_t + B v + w_t) \right\}$$
- the optimal policy is a function of the current state $x_t$:
$$\phi_t^\star(x_t) = \operatorname*{argmin}_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\, V_{t+1}(A x_t + B v + w_t) \right\}$$
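
For intuition, the recursion can be carried out numerically in low dimensions. Below is a sketch for a scalar system ($n = m = 1$), with the value functions tabulated on a grid and the expectation replaced by a sample average; the system, costs, grids, and box constraint are all assumptions chosen for illustration, since the slides keep the recursion abstract.

```python
import numpy as np

# Illustrative scalar problem: x_{t+1} = a x + b v + w, U = [-0.5, 0.5],
# stage cost x^2 + v^2, terminal cost x^2.  All numbers are assumptions.
a, b, T = 1.2, 0.8, 20
xs = np.linspace(-5.0, 5.0, 201)                  # state grid
vs = np.linspace(-0.5, 0.5, 51)                   # input grid (the set U)
ws = 0.5 * np.random.default_rng(0).standard_normal(200)   # noise samples

V = xs**2                                         # V_T(x) = l_T(x)
policy_tables = []
for t in range(T - 1, -1, -1):
    # next state for every (x, v, w) triple: shape (len(xs), len(vs), len(ws))
    x_next = a * xs[:, None, None] + b * vs[None, :, None] + ws[None, None, :]
    EV = np.interp(x_next, xs, V).mean(axis=2)    # E V_{t+1}(a x + b v + w)
    Q = xs[:, None]**2 + vs[None, :]**2 + EV      # l_t(x, v) + E V_{t+1}(...)
    V = Q.min(axis=1)                             # V_t, tabulated on the grid
    policy_tables.append(vs[Q.argmin(axis=1)])    # phi_t^*, tabulated
policy_tables.reverse()                           # policy_tables[t] ~ phi_t^*
# (np.interp clips outside the grid, so the tails of V_t are only approximate)
```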

Linear quadratic stochastic control

special case of linear stochastic control:
- $\mathcal{U}_t = \mathbf{R}^m$ (no input constraints)
- $x_0, w_0, \ldots, w_{T-1}$ are independent, with $\mathbf{E}\, x_0 = 0$, $\mathbf{E}\, w_t = 0$, $\mathbf{E}\, x_0 x_0^T = \Sigma$, $\mathbf{E}\, w_t w_t^T = W_t$
- $\ell_t(x_t, u_t) = x_t^T Q_t x_t + u_t^T R_t u_t$, with $Q_t \succeq 0$, $R_t \succ 0$
- $\ell_T(x_T) = x_T^T Q_T x_T$, with $Q_T \succeq 0$

- can show the value functions are quadratic, i.e., $V_t(x_t) = x_t^T P_t x_t + q_t$, $t = 0, \ldots, T$
- Bellman recursion: $P_T = Q_T$, $q_T = 0$; for $t = T-1, \ldots, 0$,
$$V_t(z) = \inf_v \left\{ z^T Q_t z + v^T R_t v + \mathbf{E} \big( (Az + Bv + w_t)^T P_{t+1} (Az + Bv + w_t) + q_{t+1} \big) \right\}$$
- works out to
$$P_t = A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R_t)^{-1} B^T P_{t+1} A + Q_t$$
$$q_t = q_{t+1} + \mathbf{Tr}(W_t P_{t+1})$$

- the optimal policy is linear state feedback:
$$\phi_t^\star(x_t) = K_t x_t, \qquad K_t = -(B^T P_{t+1} B + R_t)^{-1} B^T P_{t+1} A$$
  (which, strangely, does not depend on $\Sigma, W_0, \ldots, W_{T-1}$)
- optimal cost:
$$J = \mathbf{E}\, V_0(x_0) = \mathbf{Tr}(\Sigma P_0) + q_0 = \mathbf{Tr}(\Sigma P_0) + \sum_{t=0}^{T-1} \mathbf{Tr}(W_t P_{t+1})$$
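
A minimal numpy sketch of this recursion; for brevity it assumes time-invariant $Q$, $R$, and $W$ (the slides allow them to vary with $t$).

```python
import numpy as np

def lq_stochastic(A, B, Q, R, QT, Sigma, W, T):
    """Backward Riccati recursion for finite-horizon LQ stochastic control.

    Returns the gains K_0, ..., K_{T-1} and the optimal cost
    J = Tr(Sigma P_0) + sum_{t=0}^{T-1} Tr(W P_{t+1}).
    """
    P, q, Ks = QT, 0.0, []
    for t in range(T - 1, -1, -1):
        K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)   # K_t
        q = q + np.trace(W @ P)             # q_t = q_{t+1} + Tr(W P_{t+1})
        P = A.T @ P @ (A + B @ K) + Q       # compact form of the P_t update
        Ks.append(K)
    Ks.reverse()
    return Ks, np.trace(Sigma @ P) + q      # gains, J = Tr(Sigma P_0) + q_0
```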

Certainty equivalent model predictive control

- at every time $t$ we solve the certainty equivalent problem
$$\begin{array}{ll} \mbox{minimize} & \sum_{\tau=t}^{T-1} \ell_\tau(x_\tau, u_\tau) + \ell_T(x_T) \\ \mbox{subject to} & u_\tau \in \mathcal{U}_\tau, \quad \tau = t, \ldots, T-1 \\ & x_{\tau+1} = A x_\tau + B u_\tau + \hat{w}_{\tau|t}, \quad \tau = t, \ldots, T-1 \end{array}$$
  with variables $x_{t+1}, \ldots, x_T$, $u_t, \ldots, u_{T-1}$ and data $x_t$, $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$
- $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$ are predicted values of $w_t, \ldots, w_{T-1}$ based on $X_t$ (e.g., conditional expectations)
- call the solution $\tilde{x}_{t+1}, \ldots, \tilde{x}_T$, $\tilde{u}_t, \ldots, \tilde{u}_{T-1}$; we take $\phi_t^{\rm mpc}(X_t) = \tilde{u}_t$
- $\phi_t^{\rm mpc}$ is a function of $X_t$, since $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$ are functions of $X_t$
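
One receding-horizon step can be written in a few lines with cvxpy (my choice of modeling tool; the slides do not prescribe one). The quadratic costs and box input constraint below match the numerical example later in the section and are one possible instantiation.

```python
import cvxpy as cp
import numpy as np

def ce_mpc_step(A, B, x_t, w_hat, u_max=0.5):
    """Solve one certainty equivalent problem and return u_t = phi_t^mpc(X_t).

    w_hat stacks the predictions w_hat_{t|t}, ..., w_hat_{T-1|t}
    (e.g., conditional means of w_t, ..., w_{T-1} given X_t).
    """
    n, m = B.shape
    H = len(w_hat)                              # remaining horizon, T - t
    x = cp.Variable((H + 1, n))
    u = cp.Variable((H, m))
    cost = cp.sum_squares(x) + cp.sum_squares(u)      # quadratic stage costs
    constr = [x[0] == x_t, cp.abs(u) <= u_max]        # U = {u : ||u||_inf <= u_max}
    constr += [x[tau + 1] == A @ x[tau] + B @ u[tau] + w_hat[tau]
               for tau in range(H)]
    cp.Problem(cp.Minimize(cost), constr).solve()
    return u.value[0]                           # apply only the first input
```

In closed loop, one would call `ce_mpc_step` at each $t$ with the current state and refreshed predictions, re-solving as the horizon shrinks.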

Certainty equivalent model predictive control

- widely used, e.g., in revenue management
- based on (bad) approximations: future values of the disturbance are exactly as predicted; there is no future uncertainty; in the future, no recourse is available
- yet, it often works very well

Example

- system with $n = 3$ states, $m = 2$ inputs; horizon $T = 50$
- $A$, $B$ chosen randomly
- quadratic stage cost: $\ell_t(x, u) = \|x\|_2^2 + \|u\|_2^2$
- quadratic final cost: $\ell_T(x) = \|x\|_2^2$
- constraint set: $\mathcal{U} = \{ u \mid \|u\|_\infty \leq 0.5 \}$
- $x_0, w_0, \ldots, w_{T-1}$ iid $\mathcal{N}(0, 0.25 I)$
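
In code, the example data might be set up as follows; the seed and the scaling of the random $A$, $B$ are assumptions, so runs of this sketch will not reproduce the exact cost values quoted below.

```python
import numpy as np

rng = np.random.default_rng(364)        # arbitrary seed (assumption)
n, m, T = 3, 2, 50
A = rng.standard_normal((n, n))         # "A, B chosen randomly"
B = rng.standard_normal((n, m))
Q = QT = np.eye(n)                      # l_t(x, u) = ||x||^2 + ||u||^2
R = np.eye(m)
Sigma = W = 0.25 * np.eye(n)            # x_0, w_t iid N(0, 0.25 I)
u_max = 0.5                             # U = {u : ||u||_inf <= 0.5}
```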

sample trace of $x_1$ and $u_1$

[Figure "Stochastic MPC: Sample trajectory": plots of $x_1(t)$, ranging over roughly $\pm 2$, and $u_1(t)$, saturating at $\pm 0.5$, for $t = 0, \ldots, 50$.]

Cost histogram

[Figure: two histograms of realized cost over the range 0 to 1000; the top panel shows the $J_{\rm sat}$ runs against the bound $J_{\rm relax}$, the bottom panel shows the $J_{\rm mpc}$ runs against $J_{\rm relax}$.]

Simple lower bound for quadratic stochastic control

- assume $x_0, w_0, \ldots, w_{T-1}$ are independent, with quadratic stage and final costs
- relaxation: ignore the input constraints $\mathcal{U}_t$; this yields a linear quadratic stochastic control problem
- solve the relaxed problem exactly; its optimal cost $J_{\rm relax}$ satisfies $J \geq J_{\rm relax}$
- for our numerical example:
  - $J_{\rm mpc} = 224.7$ (certainty equivalent MPC, via Monte Carlo)
  - $J_{\rm sat} = 271.5$ (linear quadratic stochastic control with saturation)
  - $J_{\rm relax} = 141.3$
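
Assuming the `lq_stochastic`, `monte_carlo_cost`, and example-data sketches from earlier in the section, $J_{\rm relax}$ and the saturated heuristic behind $J_{\rm sat}$ take only a few lines; since the problem data there were generated with an arbitrary seed, the resulting numbers will differ from those quoted above.

```python
import numpy as np

# lower bound from the unconstrained LQ problem (uses lq_stochastic above)
Ks, J_relax = lq_stochastic(A, B, Q, R, QT, Sigma, W, T)
print(f"J_relax = {J_relax:.1f}")               # J >= J_relax

# the saturation heuristic behind J_sat: apply the unconstrained LQ gains,
# clipped elementwise into U = {u : ||u||_inf <= u_max}, then estimate the
# cost by Monte Carlo
def sat_policy(t, hist):
    return np.clip(Ks[t] @ hist[-1], -u_max, u_max)

J_sat, stderr = monte_carlo_cost(
    A, B, sat_policy,
    stage_cost=lambda t, x, u: x @ x + u @ u,
    final_cost=lambda x: x @ x,
    sample_x0=lambda rng: 0.5 * rng.standard_normal(n),   # N(0, 0.25 I)
    sample_w=lambda rng, t: 0.5 * rng.standard_normal(n),
    T=T)
```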