Stochastic Model Predictive Control

- stochastic finite horizon control
- stochastic dynamic programming
- certainty equivalent model predictive control

Prof. S. Boyd, EE364b, Stanford University
Causal state-feedback control

- linear dynamical system, over a finite time horizon:
  $x_{t+1} = Ax_t + Bu_t + w_t$, $t = 0, \ldots, T-1$
- $x_t \in \mathbf{R}^n$ is the state, $u_t \in \mathbf{R}^m$ is the input at time $t$
- $w_t$ is the process noise (or exogenous input) at time $t$
- $X_t = (x_0, \ldots, x_t)$ is the state history up to time $t$
- causal state-feedback control:
  $u_t = \phi_t(X_t) = \psi_t(x_0, w_0, \ldots, w_{t-1})$, $t = 0, \ldots, T-1$
- $\phi_t : \mathbf{R}^{(t+1)n} \to \mathbf{R}^m$ is called the control policy at time $t$
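To make the setup concrete, here is a minimal simulation sketch in Python (not from the slides): it rolls out the dynamics under an arbitrary causal policy, passed as a function of $t$ and the state history $X_t$. The name simulate and its signature are our own.

    import numpy as np

    def simulate(A, B, policy, x0, W, T, rng):
        """Roll out x_{t+1} = A x_t + B u_t + w_t under the causal
        state-feedback policy u_t = policy(t, X_t), where X_t is the
        state history (x_0, ..., x_t); W is the process noise covariance."""
        n = A.shape[0]
        X = [x0]                  # state history X_t, grown as we go
        U = []
        for t in range(T):
            u = policy(t, X)      # recourse: u_t may use the whole history
            w = rng.multivariate_normal(np.zeros(n), W)
            X.append(A @ X[-1] + B @ u + w)
            U.append(u)
        return np.array(X), np.array(U)

An open-loop policy simply ignores X, e.g., policy = lambda t, X: u_plan[t].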
Stochastic finite horizon control

- $(x_0, w_0, \ldots, w_{T-1})$ is a random variable
- objective: $J = \mathbf{E}\left( \sum_{t=0}^{T-1} \ell_t(x_t, u_t) + \ell_T(x_T) \right)$
- convex stage cost functions $\ell_t : \mathbf{R}^n \times \mathbf{R}^m \to \mathbf{R}$, $t = 0, \ldots, T-1$
- convex terminal cost function $\ell_T : \mathbf{R}^n \to \mathbf{R}$
- $J$ depends on the control policies $\phi_0, \ldots, \phi_{T-1}$
- constraints: $u_t \in \mathcal{U}_t$, $t = 0, \ldots, T-1$
- convex input constraint sets $\mathcal{U}_0, \ldots, \mathcal{U}_{T-1}$
- stochastic control problem: choose control policies $\phi_0, \ldots, \phi_{T-1}$ to minimize $J$, subject to the constraints
Stochastic finite horizon control

- an infinite-dimensional problem: the variables are the functions $\phi_0, \ldots, \phi_{T-1}$
- can restrict policies to a finite-dimensional subspace, e.g., $\phi_t$ all affine
- key idea: we have recourse (a.k.a. feedback, closed-loop control); we can change $u_t$ based on the observed state history $x_0, \ldots, x_t$
- cf. the standard (open-loop) optimal control problem, where we commit to $u_0, \ldots, u_{T-1}$ ahead of time
- in the general case, we need to evaluate $J$ (for given control policies) via Monte Carlo simulation (see the sketch below)
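Continuing the earlier sketch, $J$ for a fixed policy can be estimated by Monte Carlo: sample $x_0$ and the noise trajectory, simulate, and average the realized cost. The helper below reuses simulate; all names are ours.

    def mc_objective(A, B, policy, stage_cost, final_cost, Sigma, W, T,
                     n_samples=1000, seed=0):
        """Monte Carlo estimate of J = E(sum_t l_t(x_t, u_t) + l_T(x_T))
        with x_0 ~ N(0, Sigma) and w_t ~ N(0, W) iid."""
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        total = 0.0
        for _ in range(n_samples):
            x0 = rng.multivariate_normal(np.zeros(n), Sigma)
            X, U = simulate(A, B, policy, x0, W, T, rng)
            total += sum(stage_cost(t, X[t], U[t]) for t in range(T))
            total += final_cost(X[T])
        return total / n_samples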
Solution via dynamic programming

- let $V_t(X_t)$ be the optimal value of the objective, from $t$ on, starting from the initial state history $X_t$
- $V_T(X_T) = \ell_T(x_T)$; $J^\star = \mathbf{E}\, V_0(x_0)$
- $V_t$ can be found by backward recursion: for $t = T-1, \ldots, 0$,
  $V_t(X_t) = \inf_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\left( V_{t+1}((X_t, Ax_t + Bv + w_t)) \mid X_t \right) \right\}$
- $V_t$, $t = 0, \ldots, T$, are convex functions
- optimal policy is causal state feedback:
  $\phi_t^\star(X_t) = \operatorname{argmin}_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\left( V_{t+1}((X_t, Ax_t + Bv + w_t)) \mid X_t \right) \right\}$
Independent process noise

- assume $x_0, w_0, \ldots, w_{T-1}$ are independent
- $V_t$ depends only on the current state $x_t$ (and not the state history $X_t$)
- Bellman equations: $V_T(x_T) = \ell_T(x_T)$; for $t = T-1, \ldots, 0$,
  $V_t(x_t) = \inf_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\, V_{t+1}(Ax_t + Bv + w_t) \right\}$
- optimal policy is a function of the current state $x_t$:
  $\phi_t^\star(x_t) = \operatorname{argmin}_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\, V_{t+1}(Ax_t + Bv + w_t) \right\}$
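With independent noise the recursion is over functions of $x_t$ alone, so for very low state dimension it can be carried out numerically on a grid. Below is a sketch for a scalar system ($n = m = 1$), with the expectation over $w_t$ replaced by a sample average and $V_{t+1}$ evaluated by linear interpolation; all names and discretization choices here are illustrative, not from the slides.

    def dp_scalar(A, B, u_grid, x_grid, w_samples, stage_cost, final_cost, T):
        """Tabulated backward Bellman recursion for a scalar system.
        V is stored on x_grid; inputs are restricted to the finite set
        u_grid (a discretization of the constraint set U)."""
        V = final_cost(x_grid)                      # V_T on the grid
        policies = [None] * T
        for t in reversed(range(T)):
            Q = np.empty((x_grid.size, u_grid.size))
            for j, v in enumerate(u_grid):
                x_next = A * x_grid[:, None] + B * v + w_samples[None, :]
                EV = np.interp(x_next, x_grid, V).mean(axis=1)   # E V_{t+1}
                Q[:, j] = stage_cost(x_grid, v) + EV
            policies[t] = u_grid[Q.argmin(axis=1)]  # greedy input per grid point
            V = Q.min(axis=1)
        return V, policies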
Linear quadratic stochastic control

- special case of linear stochastic control
- $\mathcal{U}_t = \mathbf{R}^m$ (no input constraints)
- $x_0, w_0, \ldots, w_{T-1}$ are independent, with $\mathbf{E} x_0 = 0$, $\mathbf{E} w_t = 0$, $\mathbf{E} x_0 x_0^T = \Sigma$, $\mathbf{E} w_t w_t^T = W_t$
- $\ell_t(x_t, u_t) = x_t^T Q_t x_t + u_t^T R_t u_t$, with $Q_t \succeq 0$, $R_t \succ 0$
- $\ell_T(x_T) = x_T^T Q_T x_T$, with $Q_T \succeq 0$
- can show the value functions are quadratic, i.e., $V_t(x_t) = x_t^T P_t x_t + q_t$, $t = 0, \ldots, T$
- Bellman recursion: $P_T = Q_T$, $q_T = 0$; for $t = T-1, \ldots, 0$,
  $V_t(z) = \inf_v \left\{ z^T Q_t z + v^T R_t v + \mathbf{E}\left( (Az + Bv + w_t)^T P_{t+1} (Az + Bv + w_t) + q_{t+1} \right) \right\}$
- works out to
  $P_t = A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R_t)^{-1} B^T P_{t+1} A + Q_t$
  $q_t = q_{t+1} + \mathbf{Tr}(W_t P_{t+1})$
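The recursion translates directly to code; a sketch assuming lists Qs, Rs, Ws of length T holding $Q_t$, $R_t$, $W_t$:

    def lq_backward(A, B, Qs, Rs, QT, Ws):
        """Backward recursion for V_t(x) = x^T P_t x + q_t; returns
        P_0, ..., P_T and q_0, ..., q_T."""
        T = len(Qs)
        P, q = [None] * (T + 1), [0.0] * (T + 1)
        P[T] = QT
        for t in reversed(range(T)):
            M = B.T @ P[t + 1] @ B + Rs[t]          # B^T P_{t+1} B + R_t
            P[t] = (A.T @ P[t + 1] @ A
                    - A.T @ P[t + 1] @ B @ np.linalg.solve(M, B.T @ P[t + 1] @ A)
                    + Qs[t])
            q[t] = q[t + 1] + np.trace(Ws[t] @ P[t + 1])
        return P, q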
- optimal policy is linear state feedback: $\phi_t^\star(x_t) = K_t x_t$, with
  $K_t = -(B^T P_{t+1} B + R_t)^{-1} B^T P_{t+1} A$
  (which, strangely, does not depend on $\Sigma, W_0, \ldots, W_{T-1}$)
- optimal cost:
  $J^\star = \mathbf{E}\, V_0(x_0) = \mathbf{Tr}(\Sigma P_0) + q_0 = \mathbf{Tr}(\Sigma P_0) + \sum_{t=0}^{T-1} \mathbf{Tr}(W_t P_{t+1})$
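From the recursion output, the gains and the optimal cost follow in a couple of lines (again a sketch; lq_gains_and_cost is our name):

    def lq_gains_and_cost(A, B, P, q, Rs, Sigma):
        """Feedback gains K_t = -(B^T P_{t+1} B + R_t)^{-1} B^T P_{t+1} A
        and the optimal cost J* = Tr(Sigma P_0) + q_0."""
        T = len(P) - 1
        K = [-np.linalg.solve(B.T @ P[t + 1] @ B + Rs[t],
                              B.T @ P[t + 1] @ A) for t in range(T)]
        return K, np.trace(Sigma @ P[0]) + q[0]

As a sanity check, running mc_objective with the policy lambda t, X: K[t] @ X[-1] should reproduce $J^\star$ up to Monte Carlo error.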
Certainty equivalent model predictive control

- at every time $t$ we solve the certainty equivalent problem
  minimize   $\sum_{\tau=t}^{T-1} \ell_\tau(x_\tau, u_\tau) + \ell_T(x_T)$
  subject to $u_\tau \in \mathcal{U}_\tau$, $\tau = t, \ldots, T-1$
             $x_{\tau+1} = Ax_\tau + Bu_\tau + \hat{w}_{\tau|t}$, $\tau = t, \ldots, T-1$
  with variables $x_{t+1}, \ldots, x_T$, $u_t, \ldots, u_{T-1}$ and data $x_t$, $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$
- $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$ are predicted values of $w_t, \ldots, w_{T-1}$ based on $X_t$ (e.g., conditional expectations)
- call the solution $\tilde{x}_{t+1}, \ldots, \tilde{x}_T$, $\tilde{u}_t, \ldots, \tilde{u}_{T-1}$; we take $\phi_t^{\mathrm{mpc}}(X_t) = \tilde{u}_t$
- $\phi_t^{\mathrm{mpc}}$ is a function of $X_t$ since $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$ are functions of $X_t$
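A sketch of one CE-MPC step with CVXPY, specialized to the quadratic costs and box constraint of the example below; we take $\hat{w}_{\tau|t} = 0$ (the conditional means for zero-mean iid noise). Function name and interface are our own.

    import cvxpy as cp

    def ce_mpc_step(A, B, x_t, w_hat, u_max):
        """Solve the certainty-equivalent problem from the current state
        x_t over the remaining horizon and return the first input u_t.
        w_hat has shape (T_rem, n): the predicted disturbances."""
        T_rem = w_hat.shape[0]
        n, m = A.shape[0], B.shape[1]
        x = cp.Variable((T_rem + 1, n))
        u = cp.Variable((T_rem, m))
        cost = cp.sum_squares(x) + cp.sum_squares(u)   # quadratic stage + final cost
        cons = [x[0] == x_t]
        for tau in range(T_rem):
            cons += [x[tau + 1] == A @ x[tau] + B @ u[tau] + w_hat[tau],
                     cp.norm_inf(u[tau]) <= u_max]
        cp.Problem(cp.Minimize(cost), cons).solve()
        return u.value[0]

Only $\tilde{u}_t$ is applied; the remaining planned inputs are discarded and the problem is re-solved at $t+1$ from the observed state.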
Certainty equivalent model predictive control

- widely used, e.g., in revenue management
- based on (bad) approximations: future values of the disturbance are exactly as predicted, i.e., there is no future uncertainty; in the future, no recourse will be available
- yet, it often works very well
Example

- system with $n = 3$ states, $m = 2$ inputs; horizon $T = 50$
- $A$, $B$ chosen randomly
- quadratic stage cost: $\ell_t(x, u) = \|x\|_2^2 + \|u\|_2^2$
- quadratic final cost: $\ell_T(x) = \|x\|_2^2$
- constraint set: $\mathcal{U} = \{ u : \|u\|_\infty \leq 0.5 \}$
- $x_0, w_0, \ldots, w_{T-1}$ iid $\mathcal{N}(0, 0.25I)$
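A sketch instantiating this example's data in Python; the slides do not say how $A$, $B$ were drawn or scaled, so the seed and the spectral-radius normalization below are our assumptions.

    n, m, T = 3, 2, 50
    rng = np.random.default_rng(1)               # seed: our choice
    A = rng.standard_normal((n, n))
    A /= np.max(np.abs(np.linalg.eigvals(A)))    # normalize spectral radius (assumption)
    B = rng.standard_normal((n, m))
    Sigma = 0.25 * np.eye(n)                     # x_0 ~ N(0, 0.25 I)
    W = 0.25 * np.eye(n)                         # w_t ~ N(0, 0.25 I), iid
    u_max = 0.5                                  # U = {u : ||u||_inf <= 0.5}
    Qs, Rs, QT, Ws = [np.eye(n)] * T, [np.eye(m)] * T, np.eye(n), [W] * T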
sample trace of $x_1$ and $u_1$

[Figure: stochastic MPC, sample trajectory; top panel: $x_1(t)$, bottom panel: $u_1(t)$, for $t = 0, \ldots, 50$]
Cost histogram

[Figure: histograms of the cost over Monte Carlo runs; top panel: $J_{\mathrm{sat}}$, bottom panel: $J_{\mathrm{mpc}}$, each with $J_{\mathrm{relax}}$ marked for comparison]
Simple lower bound for quadratic stochastic control

- $x_0, w_0, \ldots, w_{T-1}$ independent; quadratic stage and final cost
- relaxation: ignore the constraints $u_t \in \mathcal{U}_t$; yields a linear quadratic stochastic control problem
- solve the relaxed problem exactly; its optimal cost is $J_{\mathrm{relax}}$
- $J^\star \geq J_{\mathrm{relax}}$
- for our numerical example (see the sketch below):
  $J_{\mathrm{mpc}} = 224.7$ (via Monte Carlo)
  $J_{\mathrm{sat}} = 271.5$ (linear quadratic stochastic control with saturation)
  $J_{\mathrm{relax}} = 141.3$
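Tying the sketches together: the lower bound is the exact optimal cost of the unconstrained LQ relaxation, and $J_{\mathrm{sat}}$ comes from clipping the relaxed policy into $\mathcal{U}$ and evaluating by Monte Carlo. (Exact numbers will differ from the slide's, since the random instance above is our own.)

    # exact lower bound from the unconstrained LQ relaxation
    P, q = lq_backward(A, B, Qs, Rs, QT, Ws)
    K, J_relax = lq_gains_and_cost(A, B, P, q, Rs, Sigma)

    # heuristic upper bound: saturate the relaxed policy into U
    sat_policy = lambda t, X: np.clip(K[t] @ X[-1], -u_max, u_max)
    J_sat = mc_objective(A, B, sat_policy,
                         lambda t, x, u: x @ x + u @ u,   # stage cost ||x||^2 + ||u||^2
                         lambda x: x @ x,                 # final cost ||x||^2
                         Sigma, W, T)
    print(J_relax, J_sat)                                 # J_sat >= J* >= J_relax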