Policies for Simultaneous Estimation and Optimization

Miguel Sousa Lobo*    Stephen Boyd**

Abstract

Policies for the joint identification and control of uncertain systems are presented. The discussion focuses on the case of a multiple input, single output linear system, with no dynamics and quadratic cost, and system parameters assumed to have a known Gaussian distribution. Extensions for multiple output, and for finite impulse response systems, are straightforward. The policies proposed are heuristics, and an approximation of the optimal dynamic programming solution, that exploit convex optimization techniques. Numerical experiments are encouraging.

1  Introduction

This paper addresses the problem of controlling uncertain systems, where a policy for joint identification and control (or dual control) is required. While the measure of success of such a policy is its control performance alone, it may be desirable to sacrifice some immediate control performance in order to select inputs that generate more information about the system and thereby improve control performance in the future. If the policy is passive with respect to learning, i.e., if in the selection of the inputs no attention is paid to their effect on system identification, the overall control performance can be severely degraded (in the extreme case, and for some systems, this can lead to intermittent instability, or bursting phenomena, as described by B. Anderson [1]).

This paper discusses the case of a multiple input, single output linear system, with no dynamics and quadratic cost. This choice is justified by the need to clarify concepts, and to keep expressions simple. Extensions for multiple output, and for finite impulse response systems, are straightforward. The simple problem discussed here has, nevertheless, a number of industrial applications. With the currently available convex optimization techniques, and given ever increasing processor speed and memory, convex programs can be solved in real-time at ever faster rates, which opens the way to many new control policies.

2  Problem statement

Consider a sequence of linear input-output relations with random disturbances,

    y_k = b^T u_k + v_k,    k = 1, ..., T.    (1)

* Stanford University, e-mail: mlobo@stanford.edu
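As a concrete illustration of the input-output relation (1) and the tracking cost it induces, the following minimal numpy sketch simulates the system for illustrative values of n, T, b, and σ (all chosen here for illustration only, not taken from the paper's examples):

```python
import numpy as np

rng = np.random.default_rng(0)

n, T = 2, 5                       # input dimension and horizon (illustrative)
b = np.array([1.0, -0.5])         # true parameters, unknown to the controller
sigma = 0.1                       # output-noise standard deviation

# apply some input sequence and observe noisy outputs y_k = b^T u_k + v_k
U = rng.normal(size=(T, n))       # rows are the inputs u_1, ..., u_T
v = rng.normal(scale=sigma, size=T)
y = U @ b + v

# squared tracking error against a desired trajectory
y_des = np.ones(T)
cost = np.sum((y - y_des) ** 2)
print(cost)
```

This only simulates one realization of the disturbances; the expected cost in (2) below is the average of such costs over b and v_1, ..., v_T.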
** Stanford University, e-mail: boyd@stanford.edu. For updates, see: http://www.stanford.edu/~boyd/group/index.html. Research supported in part by NSF Grant ECS-977, by AFOSR Grant F496-98--47, and by the Portuguese Government under Praxis XXI.

We refer to k as the time index, and to T as the horizon. Inputs u_1, ..., u_T ∈ R^n are to be selected, with the goal of producing outputs y_1, ..., y_T ∈ R with some desired property. The v_1, ..., v_T ∈ R are disturbances (or output noises), each with normal distribution N(0, σ^2), and mutually independent. The system parameters b ∈ R^n are imprecisely known. The a priori distribution of b is normal N(b̂_0, Σ_0), and independent of the v_k. We'll assume the covariance matrix Σ_0 ∈ R^(n×n) to be positive definite, and define the a priori information matrix as Π_0 = Σ_0^(-1).

The goal is to optimize a performance measure which is a function of the outputs y_1, ..., y_T. In this paper, we seek to minimize the expected value of the sum of the squares of the deviations from some desired output trajectory y_1^des, ..., y_T^des ∈ R (i.e., the l_2-norm of the tracking error). The full sequence y_1^des, ..., y_T^des is assumed known a priori. In addition, we consider an additional cost term quadratic in the inputs, weighted by ρ. The expected cost is then

    φ = E Σ_{k=1}^T ( (y_k − y_k^des)^2 + ρ u_k^T u_k )
      = E Σ_{k=1}^T ( (b^T u_k − y_k^des + v_k)^2 + ρ u_k^T u_k ).    (2)

The expectation is over the distributions of b and v_1, ..., v_T. We define a feasible policy to be one where the choice of the u_k is non-anticipating, in the sense that it relies only on information available up to time k, in particular on y_1, ..., y_{k−1}. It may also use the a priori information about the distributions of b and v_1, ..., v_T, and about the full sequence of desired outputs y_1^des, ..., y_T^des. Formally, the problem consists in finding the functions ψ_1, ..., ψ_T, of the form

    u_k = ψ_k(b̂_0, Π_0, σ, y_1^des, ..., y_T^des, y_1, ..., y_{k−1}),    (3)

that minimize (2). In a feasible policy, u_k is a random variable measurable σ(y_1, ..., y_{k−1}). (We'll also consider a generalization, which we call a randomized feasible policy, where u_k is measurable σ(y_1, ..., y_{k−1}, w), with w some independent random variable introduced to allow for the randomization of u_k.)

Intuitively, finding the best input u_k requires solving the tradeoff between 1) choosing a u_k expected to produce an output that is close to y_k^des, and 2) introducing perturbations in the input to improve knowledge of b and, as a
consequence, obtain better performance in problems k+1, ..., T. The two goals may conflict. For instance, if a zero output is desired at some time k, the smallest expected error is obtained by
selecting a zero input u_k = 0. But a zero input is also the least informative. In terms of the overall expected cost, it may be better to select a (small) non-zero input that is more informative, in the sense that it improves the accuracy with which we can estimate b, leading to improved tracking performance in future times k+1 to T. This trade-off between design for estimation (or experiment design, or system identification) and optimization (or control) is the central concern of our study.

The true solution to the problem is given by a dynamic program, which is very hard to solve numerically. It requires numerical integration over a high dimensional space (what has been called the "curse of dimensionality"). We propose an approximation which results in a semidefinite program, for which very effective solution methods have been developed in recent years. Prior to the development of these algorithms, the approximation we introduce might have been considered almost as complex as the original problem. For further discussion of dual control and dynamic programming, see A. Fel'dbaum [2, 3], Y. Bar-Shalom [4, 5, 6], Kumar and Varaiya [7], and D. Bertsekas [8].

The other approach discussed in this paper is heuristic in nature, and relates to some recent work on plant-friendly identification (see Genceli and Nikolaou [9], and Cooley and Lee [10]). We place this idea in a more general and productive framework, by introducing measures of input informativeness.

Results for this problem translate directly to the receding horizon case, where after the application of the first input u_1 the problem is extended to include consideration of y_{T+1}^des (so that the horizon remains constant). Note, however, that receding horizon control is but a heuristic for the infinite horizon problem, which we do not address in this paper. For references on model predictive control see, e.g., R. Bitmead et al. [11].

2.1  Conditional distribution of b

This section succinctly presents, for later use, standard results on the conditional distribution of b given the outputs y_1, ..., y_k (see, e.g., L. Ljung [12]). Define

    U_k = [u_1 ... u_k]^T,    Y_k = [y_1 ... y_k]^T,    Y_k^des = [y_1^des ... y_k^des]^T.

The conditional distribution of b given U_k and Y_k is normal N(b̂_k, Σ_k) = N(b̂_k, Π_k^(-1)), with

    Π_k = Π_0 + σ^(-2) U_k^T U_k,
    b̂_k = Π_k^(-1) ( Π_0 b̂_0 + σ^(-2) U_k^T Y_k ).

Equivalent, recursive formulas are

    Π_{k+1} = Π_k + σ^(-2) u_{k+1} u_{k+1}^T,
    b̂_{k+1} = b̂_k + σ^(-2) Π_{k+1}^(-1) u_{k+1} ( y_{k+1} − b̂_k^T u_{k+1} ).

Note that, since all distributions are assumed normal, b̂_k and Π_k are sufficient statistics and summarize all information available about b. They can be interpreted as a system state. We'll also use the notation b̃_k = b − b̂_k.

3  Passive learning (adaptive control)

3.1  Certainty equivalent policy

This section addresses a suboptimal feasible policy, which is the equivalent of adaptive control applied to our problem. Consider the conditional distribution, described by b̂_k and Π_k, and updated as described in §2.1. At each time index k, the input u_k is chosen as if b were precisely known, and equal to b̂_{k−1}. That is, the input is selected as if Σ_{k−1} = 0 and, in this sense, we call it a certainty equivalent policy. In the selection of the input at each time step, we use b̂_{k−1}, which depends on y_1, ..., y_{k−1} (b̂_{k−1} also depends on u_1, ..., u_{k−1}, which are in turn functions of y_1, ..., y_{k−2}).

If it were true that Σ_{k−1} = 0, the expected cost conditional on Y_{k−1} would be

    φ = Σ_{l=1}^{k−1} (y_l − y_l^des)^2 + σ^2 + (b̂_{k−1}^T u_k − y_k^des)^2 + ρ u_k^T u_k
        + Σ_{l=k+1}^{T} ( σ^2 + E( (b̂_{k−1}^T u_l − y_l^des)^2 + ρ u_l^T u_l ) ).

Differentiating with respect to u_k and equating to zero, we obtain the desired policy,

    u_k = ( b̂_{k−1} b̂_{k−1}^T + ρ I )^(-1) b̂_{k−1} y_k^des = b̂_{k−1} y_k^des / ( ρ + b̂_{k−1}^T b̂_{k−1} ),

where we used the matrix inversion lemma for a rank one update (and the fact that, if Σ_{k−1} = 0, then u_{k+1}, ..., u_T are independent of u_k). With this policy, the expected cost (using the true value of Σ_0) is

    φ = T σ^2 + Σ_{k=1}^T (y_k^des)^2 E( ( b̂_{k−1}^T Π_{k−1}^(-1) b̂_{k−1} + ρ ( ρ + b̂_{k−1}^T b̂_{k−1} ) ) / ( ρ + b̂_{k−1}^T b̂_{k−1} )^2 ).

Note that, for k = 1, ..., T−1, b̂_k and Π_k are random variables, because they are functions of y_1, ..., y_k. In this policy, b̂_k is used as an estimate of b. The accuracy of this estimate improves at each time index, due to the information gained from successive outputs (summarized in the updating of Π_k and b̂_k). From the last equation we see that small Π_0, ..., Π_{T−1} yield a large expected cost (where "small" here may be taken to mean, e.g., a small λ_min). Nevertheless, in the selection of u_1, ..., u_T, no effort is made to make the Π_k large (note, from §2.1, that Π_k is quadratic in u_1, ..., u_k). The inputs are designed without regard for their effect on the estimation procedure, warranting the term passive learning.

3.2  Regularized policy

Consider now another passive learning policy, where instead of using the certainty equivalent approximation at time k, the conditional distribution of b given y_1, ..., y_{k−1} is taken into account. The input u_k is selected to minimize only the immediate expected cost given the available information,

    E( (y_k − y_k^des)^2 + ρ u_k^T u_k | Y_{k−1} ) = u_k^T Π_{k−1}^(-1) u_k + (b̂_{k−1}^T u_k − y_k^des)^2 + σ^2 + ρ u_k^T u_k.

The minimizing input, obtained by differentiating and equating to zero, is

    u_k = ( Π_{k−1}^(-1) + b̂_{k−1} b̂_{k−1}^T + ρ I )^(-1) b̂_{k−1} y_k^des.

Note that Π_{k−1}^(-1) can be seen as a regularization term. In some sense, it adds a measure of caution to account for the
uncertainty in the estimate of b. As Π_{k−1}^(-1) → ∞, the optimal input goes to zero. The expected cost, for ρ = 0, simplifies to

    φ = T σ^2 + Σ_{k=1}^T (y_k^des)^2 E( 1 / ( 1 + b̂_{k−1}^T Π_{k−1} b̂_{k−1} ) ).

As Π_{k−1} → 0, the minimum expected cost approaches an upper bound, which is the cost of selecting a zero input. Again, small Π_0, ..., Π_{T−1} yield a large expected cost. The policy is suboptimal because the selection of u_k does not take into account its effect on Π_k, ..., Π_{T−1} (which are quadratic in u_k). This is, in effect, a greedy policy: at each time index, u_k is selected to minimize the immediate expected cost, E((y_k − y_k^des)^2), without regard for future costs. As in §3.1, there is no design for estimation, in the sense that the benefits to be gained from selecting inputs that make the Π_k large are not considered.

3.3  Derivative of the expected cost with respect to information

An invaluable and generally overlooked fact is that, for many regularized or robust policies, the derivative of the cost with respect to the information matrix is easily computed. For the regularized passive learning policy with ρ = 0, we have that

    d/dΠ ( σ^2 + (y^des)^2 / ( 1 + b̂^T Π b̂ ) ) = − (y^des)^2 b̂ b̂^T / ( 1 + b̂^T Π b̂ )^2.

4  Persistency of excitation, dithering, and maximally informative inputs

We have seen that the cost incurred at time k may be large if λ_min(Π_{k−1}) is small. In other words, if the covariance Σ_{k−1} = Π_{k−1}^(-1) is large, b̂_{k−1} is an unreliable estimate of the system parameters b, and this leads to poor performance. The most immediate solution is to ensure that some measure of information, such as λ_min(Π_k) = λ_min( Π_0 + σ^(-2) U_k^T U_k ), is large. This can be translated into a requirement that U_k be well-conditioned, which is what is usually meant by persistency of excitation. Informally, we want the u_k to span the whole input space.

4.1  Dithering

Several solutions have been proposed to satisfy the persistency of excitation requirement. Dithering is a randomized feasible policy that consists of adding to the inputs some white noise (i.e., normal, zero mean and independent random terms). This can work well since, given a high enough noise level, Π_k will be well-conditioned with high probability. Although it has the advantage of very simple implementation, dithering is obviously a sub-optimal policy, and the selection of a good noise level can be problematic.

Example 1. Consider the problem
described by n = 2 and fixed values of b̂_0, Π_0, σ, T, and Y^des. We implement a dithered policy based on regularized passive learning (§3.2). Figure 1 plots the expected cost as the variance of the random terms (the input noise level) is varied. These values were obtained by Monte Carlo simulation, with repeated runs at each input noise level (the corresponding error bars are also plotted). Note that this may seem counterintuitive: the performance of the policy is improved by adding independent noise to the inputs. As the noise level goes to zero, we approach the non-dithered regularized passive learning policy, which has an average cost of 9 (±4). At the optimal input noise level (selected a posteriori), the average cost is 4 (±9). Of course, any practical dithering policy must select the input noise level a priori, which may be difficult.

Figure 1: Expected cost as a function of dithering level (horizontal axis: log σ_dith).

4.2  Measuring and valuing information

A more thoughtful approach is to select a perturbation that, for a given level of control disturbance, maximizes the information gathered. For this purpose, we need a measure of information. A list of possible measures, using the naming convention from experiment design, is

    E-optimal:  λ_max(Σ_k) = λ_min(Π_k)^(-1)
    D-optimal:  log det Π_k^(-1)
    A-optimal:  Tr(Σ_k) = Tr(Π_k^(-1))

The E-optimal and A-optimal measures may be scaled to account for the derivative of the expected cost with respect to information (§3.3), while the D-optimal measure is invariant with scaling. For a numerically effective heuristic, we would like a measure that is concave in the inputs. These measures are concave in Π_k, which is quadratic in the inputs, and linearizing Π_k in U_k makes them concave in the inputs. This linearization can be expected to work well if the perturbations introduced for the purpose of identification are small. We return to this point in §5.5.

4.3  Maximally informative inputs

One approach consists of expressing explicitly the trade-off between control and information, by adding to the objective function an extra term valuing information. A very simple example of such a policy is to select the u_1 that minimizes

    u_1^T Π_0^(-1) u_1 + (b̂_0^T u_1 − y_1^des)^2 + ρ u_1^T u_1 − γ λ_min(Π_1),

and likewise for u_2, ..., u_T, with the appropriate updating of Π_k and b̂_k. The extra term makes this policy, in
part, an experiment design problem. As with dithering, a perturbation will be introduced in the input, what we might now call
an "intelligent noise". The factor γ > 0 weighs the trade-off between identification and control. Selecting γ presents the same difficulties as selecting the dithering level.

An alternative approach is what has been termed plant-friendly identification. Although it is essentially a solution to a different problem, plant-friendly identification can be used as a heuristic for simultaneous estimation and control. The idea is to select the maximally informative input from within the set of inputs that keep some measure of the tracking error within a bound. For a simple example, we use a constraint on the absolute tracking error (specified by M ∈ R). The policy is defined by the program

    maximize    λ_min(Π_1)
    subject to  | b̂_0^T u_1 − y_1^des | ≤ M.

The bound M can be seen as the trade-off factor, with a role similar to γ in the previous problem. However, M is a more physically meaningful number, and should be easier to select in practical applications. The constraint on performance used here disregards the uncertainty in b. A robust constraint can be used in its place (such a constraint is convex in the inputs; in fact, it is a second-order cone constraint, see S. Boyd et al. [13]).

Both these problems are convex if we linearize Π_1 in u_1, in which case they are readily solved. If this procedure (linearization followed by the solution of a convex program) is iterated, a (local) minimum of the non-convex problem will be reached.

Note that these heuristics do not use the future desired outputs y_k^des, which we assumed known. This is part of the suboptimal nature of the heuristics, and has the effect of reducing the sensitivity of their performance with respect to the future trajectory. This reduced sensitivity may be a positive feature in applications where the future trajectory is not fully certain.

5  Optimal policy, dynamic program and approximation

These heuristic approaches still leave us with some questions, in particular about 1) what measure of information to use, and 2) how to decide on the inevitable trade-off between the informativeness of u_k and the output error expected to result from its application. Roughly speaking, the answer to the second question is that the trade-off should be such that the current loss in tracking performance (incurred for the
sake of informativeness) equals the total expected future gains in tracking performance (due to improved information about the system). This, in turn, leads to an answer for the first question: the information measure should be such that it captures the expected future gain in tracking performance.

The true solution to the problem is given by a dynamic program, of which we will outline the derivation. This dynamic program is, however, hard to solve. We propose an approximation which results in a semidefinite program.

5.1  Optimal policy for T = 1

We assume, from here on, ρ = 0. Consider the simplest case, where T = 1. An input u_1 is to be selected so as to minimize the expected cost

    φ_{T=1} = E_{b,v_1} (y_1 − y_1^des)^2
            = E_{b,v_1} ( b̃_0^T u_1 + (b̂_0^T u_1 − y_1^des) + v_1 )^2
            = u_1^T Π_0^(-1) u_1 + (b̂_0^T u_1 − y_1^des)^2 + σ^2.

The three terms in the last expression can be interpreted as 1) the cost due to inaccuracy in the estimate of b, 2) the cost due to deviation from the certainty equivalence policy, and 3) the cost due to output noise. The input that minimizes this function, obtained by differentiating and equating to zero, is

    u_1 = ψ_1(b̂_0, Π_0, σ, y_1^des) = ( Π_0^(-1) + b̂_0 b̂_0^T )^(-1) b̂_0 y_1^des.

Note that Π_0^(-1) can be seen as a regularization term. As Π_0 becomes small, the optimal input goes to zero. The minimum expected cost is

    φ_{T=1}(b̂_0, Π_0, σ, y_1^des) = σ^2 + (y_1^des)^2 / ( 1 + b̂_0^T Π_0 b̂_0 ),

where we used the matrix inversion lemma for a rank one update. Note that φ_{T=1} is convex in Σ_0 and concave in Π_0. For small Π_0, the minimum expected cost approaches an upper bound, which is the cost of selecting a zero input.

5.2  Optimal policy for T = 2

For T = 2, the expected cost is

    φ_{T=2} = E_{b,v_1,v_2} ( (y_1 − y_1^des)^2 + (y_2 − y_2^des)^2 )
            = E_{b,v_1,v_2} ( ( b̃_0^T u_1 + (b̂_0^T u_1 − y_1^des) + v_1 )^2 + ( b̃_1^T u_2 + (b̂_1^T u_2 − y_2^des) + v_2 )^2 )
            = E_{b,v_1} ( b̃_0^T u_1 + (b̂_0^T u_1 − y_1^des) + v_1 )^2 + E_{y_1} E_{b,v_2} ( ( b̃_1^T u_2 + (b̂_1^T u_2 − y_2^des) + v_2 )^2 | y_1 )
            = u_1^T Π_0^(-1) u_1 + (b̂_0^T u_1 − y_1^des)^2 + σ^2 + E_{y_1} ( u_2^T Π_1^(-1) u_2 + (b̂_1^T u_2 − y_2^des)^2 + σ^2 ),    (4)

where (from §2.1)

    Π_1 = Π_0 + σ^(-2) u_1 u_1^T,    b̂_1 = Π_1^(-1) ( Π_0 b̂_0 + σ^(-2) u_1 y_1 ).

We used the tower property of conditional expectation, and the fact that, if y_1 is given, then b̂_1 and u_2 are constants and b̃_1 has zero mean and covariance Π_1^(-1). Also, it is trivial to see that b̃_0 and v_1 are independent, and b̃_1 and v_2 are independent. φ_{T=2} is to be minimized over u_1 = ψ_1 and u_2 = ψ_2, with ψ_2 a function of y_1 and u_1 (both ψ_1 and ψ_2 are also functions of b̂_0, Π_0, σ, y_1^des and y_2^des, but for
clarity these parameters
will be omitted). The minimum of φ_{T=2} can be found by minimizing first over ψ_2 (i.e., finding the minimizing second input u_2 as a function of the first input u_1 and output y_1). To find the minimum of (4) we will need to compute

    inf_{ψ_2(·,·)} E_{y_1} ( ψ_2(u_1,y_1)^T Π_1^(-1) ψ_2(u_1,y_1) + ( b̂_1^T ψ_2(u_1,y_1) − y_2^des )^2 + σ^2 )
        = E_{y_1} inf_{ψ_2(·,·)} ( ψ_2(u_1,y_1)^T Π_1^(-1) ψ_2(u_1,y_1) + ( b̂_1^T ψ_2(u_1,y_1) − y_2^des )^2 + σ^2 )
        = E_{y_1} φ_{T=1}(b̂_1, Π_1, σ, y_2^des)
        = σ^2 + (y_2^des)^2 E_{y_1} ( 1 / ( 1 + b̂_1^T Π_1 b̂_1 ) ).    (5)

We conclude that the minimum expected cost is

    φ_{T=2}(b̂_0, Π_0, σ, y_1^des, y_2^des) = inf_{u_1} ( u_1^T Π_0^(-1) u_1 + (b̂_0^T u_1 − y_1^des)^2 + σ^2 + E_{y_1} φ_{T=1}(b̂_1, Π_1, σ, y_2^des) ).

Note that we have just derived Bellman's principle of optimality from first principles for this particular problem. The solution requires computing an integral of the form

    E_X ( 1 / ( a_2 X^2 + a_1 X + a_0 ) ),    X ~ N(0, σ_X^2),

where

    a_2 = σ^(-4) u_1^T Π_1^(-1) u_1,    a_1 = 2 σ^(-2) b̂_0^T u_1,    a_0 = 1 + b̂_0^T Π_1 b̂_0,
    X = b̃_0^T u_1 + v_1 ~ N(0, σ_X^2),    σ_X^2 = u_1^T Π_0^(-1) u_1 + σ^2.

The denominator polynomial can be shown to be positive for all X. If an iterative optimization procedure is to be used, this expectation must be evaluated numerically at each iteration. Alternatively, we will propose using a simple approximation.

Example 2. Consider the previous example (in §4.1), but with a shorter horizon. In particular, T = 2, a correspondingly truncated Y^des, and n, b̂_0, Π_0, σ as before. For a given u_1, the expected cost φ_{T=2} is computed assuming that u_2 is selected optimally at k = 2. The expectation in (5) is evaluated by numerical integration. Ranging over values for the two entries of u_1, this produces Figure 2. The optimum is achieved at u_1 = 998, for which the expected cost is φ_{T=2} = 568. This is to be compared with the standard procedure of minimizing the expected square error at each time step (i.e., the regularized passive learning policy of §3.2), which yields a different u_1, and φ = 9.

Figure 2: Expected cost as a function of the first and second entry of u_1 (with u_2 optimal).

5.3  Approximate solution for T = 2

Consider the approximation

    E_X ( 1 / ( a_2 X^2 + a_1 X + a_0 ) ) ≈ 1 / a_0.

With this approximation, the problem becomes that of minimizing

    u_1^T Π_0^(-1) u_1 + (b̂_0^T u_1 − y_1^des)^2 + 2 σ^2 + (y_2^des)^2 / ( 1 + b̂_0^T Π_1 b̂_0 )

over u_1 ∈ R^n. Undoing the minimization over u_2, we see that this is equivalent to minimizing

    f_{T=2} = u_1^T Π_0^(-1) u_1 + u_2^T ( Π_0 + σ^(-2) u_1 u_1^T )^(-1) u_2 + (b̂_0^T u_1 − y_1^des)^2 + (b̂_0^T u_2 − y_2^des)^2 + 2 σ^2

over u_1, u_2 ∈ R^n. This approximation is equivalent to making the approximation b̂_1 ≈ b̂_0 in (4),
which will be the motivation for an extension of the approximation to any T > 2. We will take b̂_k ≈ b̂_0, k = 1, ..., T−1, in the equivalent expression for the expected cost.

An intuitive description of this approximation is as follows. First, note that the a priori distribution of b can be described by the ellipsoid (x − b̂_0)^T Π_0 (x − b̂_0) ≤ 1 (the maximum volume set with a given probability). Likewise, the conditional distribution of b given y_1 can be described by the ellipsoid (x − b̂_1)^T Π_1 (x − b̂_1) ≤ 1. The total cost will depend on both the centers (b̂_0, b̂_1) and the volumes (defined by Π_0, Π_1) of the two ellipsoids. From one time index to the next, with the added knowledge of y_1, the center and volume of the ellipsoid change (see Figure 3). The center changes randomly, and this is the term that introduces increased complexity in the dynamic program (as a side note, this random change has a zero mean normal distribution that depends on the inputs, and is easily computed). On the other hand, the volume changes in a deterministic fashion. Given the inputs, this change in volume can be precisely predicted. With the approximation described, we are assuming that the change in volume is more important in determining the cost than the change in center, i.e., we assume that the cost is much less sensitive to the mean of the distribution than to its covariance. This is reasonable for systems that are not overdetermined, which includes our problem.

Figure 3: Changes in the conditional distribution of b (ellipsoids with centers b̂_0, b̂_1 and covariances Σ_0, Σ_1).
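The deterministic-volume, random-center observation above can be checked numerically. A minimal sketch, with illustrative values for Π_0, b̂_0, σ, and u_1 (none taken from the paper's example): Π_1 is fixed once u_1 is chosen, while b̂_1 fluctuates with y_1 around b̂_0 with zero mean.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 2
sigma = 0.5
Pi0 = np.eye(n)                    # a priori information matrix (illustrative)
b0_hat = np.array([1.0, 0.0])      # a priori mean (illustrative)
u1 = np.array([0.7, 0.3])          # a fixed first input

# information update: deterministic given u1, independent of y1
Pi1 = Pi0 + np.outer(u1, u1) / sigma**2

# center update: depends on the random output y1
def b1_hat(y1):
    return b0_hat + np.linalg.solve(Pi1, u1) * (y1 - b0_hat @ u1) / sigma**2

# sample y1 from its prior predictive distribution and observe the center move
Sigma0 = np.linalg.inv(Pi0)
y1_std = np.sqrt(u1 @ Sigma0 @ u1 + sigma**2)
samples = np.array([b1_hat(b0_hat @ u1 + y1_std * rng.normal())
                    for _ in range(20000)])

print(samples.mean(axis=0))        # close to b0_hat: zero-mean center change
```

The sample mean of b̂_1 stays near b̂_0, while Π_1 never changes across realizations, which is exactly why freezing the center (b̂_1 ≈ b̂_0) and keeping the exact volume update is a natural simplification.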
Example 3. With the same example as in §5.2, for a given u_1 we compute f_{T=2}. Again we assume that u_2 is selected optimally. Ranging over values for the two entries of u_1, this produces Figure 4. The minimum of the approximate objective function f_{T=2} is achieved at u_1 = (84, 98). The approximate expected cost at this point is f_{T=2} = 997, and the true expected cost is φ_{T=2} = 59. The performance degradation relative to the optimal policy is 9% with the approximation, as compared to 55% with the regularized passive learning policy. Figure 5 plots the approximation error as a function of u_1. Note the small error in the region where the optimum is located, which seems to be a general feature of this approximation.

Figure 4: Approximation of the expected cost as a function of the first and second entry of u_1 (with u_2 optimal).

Figure 5: Approximation error.

5.4  Optimal policy and approximation for T > 2

Following the previous analysis for T = 1 and T = 2, and by induction on T, the problem of minimizing φ can be written as a dynamic program. The optimum is given by φ = ϕ_1, with

    ϕ_k = inf_{u_k} ( E( (y_k − y_k^des)^2 ) + E(ϕ_{k+1}) )
        = inf_{u_k} ( u_k^T Π_{k−1}^(-1) u_k + (b̂_{k−1}^T u_k − y_k^des)^2 + σ^2 + E(ϕ_{k+1}) ),

for k = T, ..., 1, and ϕ_{T+1} = 0. All infimums are over the space of feasible policies, i.e., over all u_k measurable σ(y_1, ..., y_{k−1}). The three terms preceding E(ϕ_{k+1}) can be interpreted as 1) the cost due to inaccuracy in the estimate of b, 2) the cost due to the perturbation introduced to improve estimation of b, and 3) the cost due to output noise.

To make the dynamic program tractable we take the same approach as before, and use the approximation b̂_1 ≈ b̂_0, ..., b̂_{T−1} ≈ b̂_0. With this approximation, we can remove the nested conditional expectations, and group the inf operators, so that

    φ ≈ inf_{u_1,...,u_T} f,    with    f = Σ_{k=1}^T u_k^T Π_{k−1}^(-1) u_k + Σ_{k=1}^T ( (b̂_0^T u_k − y_k^des)^2 + σ^2 ).

Finding this minimum is not a convex program, which greatly limits our ability to solve large scale problems in practice.

5.5  Convex approximation (linearization of Π_k)

A convex approximation of the objective function above can be obtained by linearizing the information matrix in the inputs. Writing U_k = Ū_k + ΔU_k, and for ΔU_k small,

    Π_k = Π_0 + σ^(-2) U_k^T U_k
        = Π_0 + σ^(-2) ( Ū_k^T U_k + U_k^T Ū_k − Ū_k^T Ū_k + ΔU_k^T ΔU_k )
        ≈ Π_0 + σ^(-2) ( Ū_k^T U_k + U_k^T Ū_k − Ū_k^T Ū_k ) = P_k.

The term omitted is σ^(-2) ΔU_k^T ΔU_k = O(σ^(-2) ||ΔU_k||^2). It is positive semidefinite, hence the approximation
undervalues information. We can expect that a solution based on this approximation will be conservative in the introduction of perturbations for the purpose of identification. The problem now involves a sum of matrix fractional and quadratic terms, all of which are convex:

    minimize  Σ_{k=1}^T u_k^T P_{k−1}^(-1) u_k + Σ_{k=1}^T (b̂_0^T u_k − y_k^des)^2,

where P_k is as above, and the variables are u_1, ..., u_T ∈ R^n. This is a matrix-fractional and second-order cone program, which is equivalent to the semidefinite program

    minimize    Σ_{k=1}^T ( α_k + β_k )
    subject to  [ α_k, b̂_0^T u_k − y_k^des ; b̂_0^T u_k − y_k^des, 1 ] ⪰ 0,    k = 1, ..., T,
                [ β_k, u_k^T ; u_k, P_{k−1} ] ⪰ 0,    k = 1, ..., T,
                P_k = Π_0 + σ^(-2) Σ_{j=1}^k ( ū_j u_j^T + u_j ū_j^T − ū_j ū_j^T ),    k = 1, ..., T−1,

where the variables are α_1, ..., α_T, β_1, ..., β_T ∈ R, and u_1, ..., u_T ∈ R^n. Algorithms for solving semidefinite programs are of polynomial complexity; the complexity of solving this particular problem with an interior-point method is polynomial in T and n. For more on semidefinite programming see, e.g., Vandenberghe and Boyd [14].

5.6  Algorithm

A possible practical algorithm is as follows:

1. Find a nominal input sequence ū_1, ..., ū_T according to a simple policy, such as minimizing Σ_{k=1}^T u_k^T Π_0^(-1) u_k + Σ_{k=1}^T (b̂_0^T u_k − y_k^des)^2. This amounts to solving without accounting for the benefits of extra information.
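Step 1 decouples across time: each ū_k minimizes u^T Π_0^(-1) u + (b̂_0^T u − y_k^des)^2, which has the closed form ū_k = (Π_0^(-1) + b̂_0 b̂_0^T)^(-1) b̂_0 y_k^des (the T = 1 policy of §5.1 applied at each k). A minimal numpy sketch, with illustrative problem data (not the paper's example values):

```python
import numpy as np

n = 2
Pi0 = np.eye(n)                      # a priori information matrix (illustrative)
b0_hat = np.array([1.0, -0.5])       # a priori mean (illustrative)
y_des = np.array([1.0, 0.0, -1.0])   # desired output trajectory, T = 3

Sigma0 = np.linalg.inv(Pi0)
A = Sigma0 + np.outer(b0_hat, b0_hat)

# nominal sequence: the T = 1 optimal input, applied independently at each k
U_bar = np.array([np.linalg.solve(A, b0_hat) * yd for yd in y_des])

# per-step cost u^T Sigma0 u + (b0_hat^T u - y_des_k)^2; at the minimizer it
# equals (y_des_k)^2 / (1 + b0_hat^T Pi0 b0_hat), per Section 5.1
def step_cost(u, yd):
    return u @ Sigma0 @ u + (b0_hat @ u - yd) ** 2

print(U_bar)
print(step_cost(U_bar[0], 1.0))      # equals 1 / (1 + 1.25) = 1 / 2.25 here
```

Note that a zero desired output yields a zero nominal input, which is exactly the degenerate case that motivates the small random term added before linearizing in step 2.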
2. Linearize the information matrices Π_1, ..., Π_{T−1} around the nominal input sequence, to obtain the affine functions P_1(u_1), ..., P_{T−1}(u_1, ..., u_{T−1}). (To avoid the obvious convergence problems that occur when ū_k = 0, we add a small random term to the nominal input sequence before linearizing.)

3. Solve the semidefinite program above, to obtain a new nominal input sequence.

4. Relinearize around the new nominal input sequence and repeat the optimization. This may be repeated for a fixed number of times or until convergence (numerical experiments have shown convergence after a very small number of iterations).

5. Apply the first input of the resulting input sequence to the system, measure the output, update the distribution of b, and repeat with horizon T−1. (For the receding horizon case, instead of repeating with a decreasing horizon, a new desired output y_{T+1}^des is introduced after application of u_1.)

Example 4. As a numerical example, consider the problem described for the dithering example in §4.1, with the same simulation methodology. We saw then that the expected cost with regularized passive learning was 9 (±4), and that the expected cost with the best dithering level was 4 (±9). With the algorithm described here, the expected cost is (±8).

6  Conclusions

While the computation of the exact solution to the simultaneous estimation and optimization problem seems to be fundamentally intractable, the mathematical tools and computing resources now available should allow us to solve effective approximations of the problem in real-time for many applications. In this paper, we have described some early results for a simple class of problems. This class is nevertheless complex enough to explore the key ideas involved, and straightforward extensions include the class of finite impulse response dynamic systems. Numerical examples have shown that, at least in some cases, the approximation introduced can perform vastly better than standard adaptive control techniques. Future research will look into extending these results, in particular to wider classes of problems, and into developing a better understanding of the properties of the different heuristics and approximations.

References

[1] Brian D. O. Anderson. Adaptive systems, lack of persistency of excitation and bursting phenomena. Automatica, 21:247-258, 1985.

[2] A. A. Fel'dbaum. Theory of dual control, I. Automation and Remote Control, 21(9):1240-1249, 1960.

[3] A. A. Fel'dbaum. Optimal Control Systems. Academic Press, New York, 1965.

[4] Y. Bar-Shalom. Stochastic dynamic programming: Caution and probing. IEEE Trans. Aut. Control, AC-26(5):1184-1195, 1981.

[5] P. Dersin, M. Athans, and D. Kendrick. Some properties of the dual adaptive stochastic control algorithm. IEEE Trans. Aut. Control, AC-26(5), 1981.

[6] P. Mookerjee and Y. Bar-Shalom. An adaptive dual controller for a MIMO-ARMA system. IEEE Trans. Aut. Control, 34(7):795-800, 1989.

[7] P. R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice-Hall, New Jersey, 1986.

[8] D. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, Massachusetts, 1995.

[9] H. Genceli and M. Nikolaou. New approach to constrained predictive control with simultaneous model identification. AIChE Journal, 42(10):2857-2868, October 1996.

[10] B. L. Cooley and J. H. Lee. Control-relevant experiment design for multivariable systems. Draft, contact: jhl@eng.auburn.edu, 1997.

[11] R. R. Bitmead, M. Gevers, and V. Wertz. Adaptive Optimal Control: The Thinking Man's GPC. Prentice Hall, New Jersey, 1990.

[12] L. Ljung. System Identification: Theory for the User. Prentice-Hall, 1987.

[13] S. Boyd, C. Crusius, and A. Hansson. Control applications of nonlinear convex programming. Journal of Process Control, 8(5-6):313-324, 1998. Special issue for papers presented at the 1997 IFAC Conference on Advanced Process Control, June 1997, Banff.

[14] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 1995.