adaptive control; stochastic systems; certainty equivalence principle; long-term


COMMUNICATIONS IN INFORMATION AND SYSTEMS
© 2006 International Press
Vol. 6, No. 4

ADAPTIVE CONTROL OF LINEAR TIME INVARIANT SYSTEMS: THE "BET ON THE BEST" PRINCIPLE

S. BITTANTI AND M. C. CAMPI

Abstract. Over the last three decades, the certainty equivalence principle has been the fundamental paradigm in the design of adaptive control laws. It is well known, however, that for general control criteria the performance achieved through its use is strictly suboptimal. In order to overcome this difficulty, two different approaches have been proposed: i) the use of cost-biased parameter estimators; and ii) the injection of probing signals into the system so as to enforce consistency in the parameter estimate. This paper presents an overview of the cost-biased approach. New insight is achieved in this paper by the formalization of a general cost-biased principle named "Bet On the Best" - BOB. BOB may work in situations in which more standard implementations of the cost-biasing idea may fail to achieve optimality.

Key words: adaptive control; stochastic systems; certainty equivalence principle; long-term average cost; optimality.

1. Introduction: an overview of adaptation as a means to achieve an ideal control objective. An adaptive control problem is a control problem in which some parameter describing the system is known with uncertainty. During the operation of the control system, the controller collects information on the system behavior, thereby reducing the level of uncertainty regarding the value of the parameter. In turn, as the level of uncertainty is reduced, the controller is tuned more accurately on the system parameter so as to obtain a better control result. In this procedure it is essential that the controller chooses the control actions so as to minimize the performance index, as well as to probe the system so that uncertainty is reduced and future control actions can be better selected.

In this paper we consider adaptive long-term average optimal control problems. In adaptive control, due to the uncertainty affecting the true value of the system parameter, the control law cannot be expected to be optimal in finite time.
When the cost criterion is of the long-term average type, however, the control performance in finite time does not affect the asymptotic value of the control cost. Hence, even in an adaptive context there is a hope to achieve optimality, i.e. to drive the long-term average cost to the value which would have been obtained under complete knowledge of the system. When this happens, we say that the adaptive control law meets the ideal objective.

(The authors would like to acknowledge the financial support of the National Interest Research Project "Innovative techniques for identification and adaptive control of industrial systems". S. Bittanti: Dep. of Electrical Engineering and Information, Politecnico di Milano, piazza L. da Vinci 38, 2023, Milano, Italy. [email protected] M. C. Campi: Dep. of Electrical Engineering and Automation, University of Brescia, via Branze 38, 2523 Brescia, Italy. [email protected])

A control problem with an unknown system parameter is equivalent to a control problem with complete system knowledge in which the state comprises the set of all probability distributions on the unknown parameter, Striebel (1965). This theoretical result, however, does not translate into practical solution methods due to the complexity involved in handling the corresponding infinite dimensional problem. To make the problem tractable, it is common practice to resort to special solution methods able to abate the computational complexity.

The most common special solution methods rely on the so-called certainty equivalence principle, Bar-Shalom and Tse (1974), Bar-Shalom and Wall (1974). The unknown parameter is estimated via some estimation method and the estimate is used as if it were the true value of the unknown parameter. In this approach, the distribution of the unknown parameter is simply substituted by a single estimate representing, in some sense, its most probable value.

Certainty equivalent adaptive control schemes have been studied by many authors. Goodwin et al. (1981) prove that a certainty equivalent controller based on the stochastic approximation algorithm achieves the ideal objective for minimum output variance costs. This result has been extended to least squares minimum output variance adaptive control in Sin and Goodwin (1982), Bittanti et al. (1990), Campi (1991), and Bittanti and Campi (1996). A complete analysis of a minimum output variance self-tuning regulator equipped with the extended least squares algorithm can be found in Guo and Chen (1991). Again, the main result is that this adaptive scheme achieves the ideal objective. The fact that the ideal objective is met in the situations described in the above mentioned papers is due to the special properties of the minimum output variance cost criterion.
On the other hand, it is well known that the certainty equivalence principle suffers from a general identifiability problem, namely the parameter estimate can converge with positive probability to a false value, e.g. Åström and Wittenmark (1973), Becker et al. (1985), Campi (1996), Campi and Kumar (1998). When a cost criterion other than the output variance is considered, this identifiability problem leads to a strictly suboptimal performance. See e.g. Lin et al. (1985), Polderman (1986a,b), and van Schuppen (1994) for a discussion on this problem in different contexts.

In order to overcome this problem, two approaches have been proposed in the literature.

The first one consists in adding a dither noise to the control input so as to improve the excitation characteristics of the signals, Caines and Lafortune (1984). As a consequence, standard parameter estimators are then able to provide consistent estimates and the mentioned identifiability problem automatically disappears. However, as noted by Chen and Guo (1987a), this may result in a degradation of the control system performance. Asymptotic optimality is recovered by letting the additive noise vanish in the long run (attenuating excitation). Many optimality results have been established along this line, Chen and Guo (1986, 1987a,b, 1988, 1991), Guo and Chen (1991), Guo (1996), Duncan et al. (1999), while persistence of excitation conditions of different types have been used in Duncan and Pasik-Duncan (1986, 1991), Caines (1992).

The second approach has its origins in the work by Kumar and Becker (1982). It consists in the employment of a cost-biased parameter estimator, and does not require the use of any extra probing signal. The basic idea is as follows. Consider a standard (i.e. without biasing) estimator operating in a closed-loop adaptive control system. It is natural to expect that this estimator is able to correctly describe the closed-loop behavior of the system. Thus, one expects that the asymptotic behavior of the true system with the loop closed by the adaptively chosen controller will be the same as the behavior of the estimated system with the loop closed by the same controller. This implies that the long-term average cost associated with these two control systems will be the same. Since the adaptive controller is selected to be optimal for the estimated system, this also means that the adaptively controlled true system attains the optimal performance for the estimated system. On the other hand, the fact that the estimator is able to describe the closed-loop behavior of the true system by no means implies that the true system has been correctly estimated. As a matter of fact, it is possible that the estimated system and the true system share the same behavior in the actual closed-loop conditions, while they would behave differently in other situations.
Even more so, it can be the case that if one knew the true system at the start, an optimal controller for it could be designed that outdoes the performance obtained by the adaptively chosen controller. These observations carry two consequences. First, the adaptive controller can be strictly suboptimal. Second, if this is the case, then the asymptotically estimated system has an associated optimal cost which is strictly larger than the optimal cost for the true system. In this way, we come to the conclusion that the standard parameter estimator has a natural tendency to return estimates with an optimal cost larger than or equal to that of the true system and, if it is strictly larger, this leads to a strictly suboptimal performance.

Motivated by this observation, Kumar and Becker (1982) conceived of introducing a cost-biasing term in the parameter estimator that favors those parameter estimates corresponding to a smaller optimal cost. The cost-bias must be strong enough such that the estimator can never stick at a parameter estimate with an optimal cost larger than the true one. At the same time, however, it must be delicate so that the ability of identifying the closed-loop dynamics is not destroyed. If these two objectives are met simultaneously, then the performance of the true closed-loop system will be the same as the one of the estimated closed-loop system. Moreover, the latter is no worse than the optimal performance for the true system due to the cost-biasing and, therefore, optimality is achieved. This approach has been investigated in different contexts in the following papers, Kumar and Becker (1982), Kumar (1983a,b), Milito and Cruz (1987), Borkar (1993), Campi and Kumar (1998), Prandini and Campi (2001).

Objective of this paper

This paper is primarily an overview of cost-biased adaptation as a means to achieve optimality. Additionally, the cost-biased idea is here cast into a novel and fruitful viewpoint via the introduction of a new principle named "Bet On the Best" - BOB. BOB bears a promise of more general applicability than standard implementations of the cost-biasing idea.

Structure of the paper

The structure of the paper is as follows. The BOB principle is presented in Sections 2 and 3. As an example, in Section 4 the BOB principle is applied to a scalar adaptive linear quadratic Gaussian (LQG) control problem.

2. The adaptive control setting. This section serves the purpose of introducing the general control set-up and that of fixing notations. Explicit assumptions on the stochastic nature of signals are delayed to subsequent sections. Measurability conditions are taken for granted throughout.

Consider a linear time invariant system described as

$$x_{t+1} = A(\theta^\circ)x_t + B(\theta^\circ)u_t + w^{(1)}_{t+1}, \qquad (1)$$
$$y_t = C(\theta^\circ)x_t + w^{(2)}_{t+1}, \qquad (2)$$

where $x_t \in \mathbb{R}^n$ is the state, $u_t \in \mathbb{R}$ is the control variable, $y_t \in \mathbb{R}$ is the system output, and $w^{(1)}_{t+1}$ and $w^{(2)}_{t+1}$ are noise processes. $\theta^\circ$ is an unknown true parameter belonging to a given parameter set $\Theta$.

The adaptive control process takes place as follows. At time $t$ the adaptive controller has access to the observations $o_t = \{u_1, u_2, \ldots, u_{t-1}, y_1, y_2, \ldots, y_t\}$. Based on this, it selects the control input $u_t$.
As a consequence of this control action, the state transits from $x_t$ to $x_{t+1}$ according to equation (1), a new output $y_{t+1}$ generated according to equation (2) becomes available and a cost $c(u_t, y_t)$ is paid. Then, the observation set is updated to $o_{t+1} = o_t \cup \{u_t, y_{t+1}\}$ and the controller selects the subsequent control input.

A control law is a sequence of functions $l_t : \mathbb{R}^{t-1} \times \mathbb{R}^{t} \to \mathbb{R}$, and $l_t(o_t)$ is the corresponding control input after we have observed $o_t = \{u_1, u_2, \ldots, u_{t-1}, y_1, y_2, \ldots, y_t\}$. The control objective is to minimize the long-term average cost criterion

$$J = \limsup_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} c(u_t, y_t).$$

For any $\theta \in \Theta$ and for a system as in (1) and (2) with $\theta$ in place of $\theta^\circ$, we assume that, for any control law $l_t$, $J \ge J^*_\theta$ a.s. (almost surely), where $J^*_\theta$ is a deterministic quantity, and that $J^*_\theta$ is achieved a.s. by applying a control law $l^*_{\theta,t}$. $l^*_{\theta,t}$ is named an optimal control law. Our objective is that of driving $J$ to the optimal value $J^*_{\theta^\circ}$ for the true system with parameter $\theta^\circ$. In the actual implementation of a control action, however, $\theta^\circ$ is not known and, therefore, information regarding its value must be accrued through time via the observations $u_t$ and $y_t$ (adaptive control problem).

3. The "Bet On the Best" (BOB) principle. The BOB principle first appeared in the conference paper Campi (1997). This is the first time this principle is discussed in a journal paper.

We start with an example in which the certainty equivalence principle leads to a control cost which is strictly suboptimal. A similar example is also provided in Kumar (1983b), where, differently from the present case, a finite parameter set is considered. This example will serve as a start for the subsequent discussion where we first summarize some well-recognized facts regarding the certainty equivalence approach. The discussion will then culminate in the formulation of the BOB principle.

Example. Consider the system $x_{t+1} = a^\circ x_t + b^\circ u_t + w_{t+1}$, where $w_t$ is an i.i.d. $N(0, 1)$ noise process and state $x_t$ is accessible: $y_t = x_t$. Vector $[a^\circ\ b^\circ]$ is unknown but we know that it belongs to a compact set $\Theta = \{[a\ b] : b = 8a/5 - 3/5,\ a \in [0, 1]\}$. Our objective is to minimize the long-term average cost $\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} [q x_t^2 + u_t^2]$, where $q = 25/24$.

In order to determine an estimate of $[a^\circ\ b^\circ]$ the standard least squares algorithm is used. This amounts to selecting at time $t$ the vector $[a_t^{LS}\ b_t^{LS}]$ which minimizes the index $\sum_{k=1}^{t-1} (x_{k+1} - a x_k - b u_k)^2$.
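The least squares step of the Example can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function name `constrained_ls` is hypothetical, and the minimization is assumed to be carried out over the set $\Theta$ of the Example (the segment $b = 8a/5 - 3/5$, $a \in [0, 1]$), which reduces the problem to a one-dimensional quadratic in $a$ whose unconstrained minimizer is clipped to $[0, 1]$.

```python
def constrained_ls(xs, us):
    """Least squares estimate restricted to the Example's parameter set
    Theta = {[a, b] : b = 8a/5 - 3/5, a in [0, 1]}  (assumed constraint).

    xs: observed states x_1, ..., x_t
    us: applied inputs  u_1, ..., u_{t-1}
    Returns (a, b), or None when the index does not depend on a.
    """
    # With b = 8a/5 - 3/5 the residual x_{k+1} - a x_k - b u_k becomes
    # r_k - a z_k, with r_k = x_{k+1} + (3/5) u_k and z_k = x_k + (8/5) u_k.
    num = 0.0
    den = 0.0
    for k in range(len(us)):
        r = xs[k + 1] + 0.6 * us[k]
        z = xs[k] + 1.6 * us[k]
        num += z * r
        den += z * z
    if den == 0.0:
        return None  # flat index: every a in [0, 1] is a minimizer
    # A convex quadratic on an interval is minimized by clipping
    # the unconstrained minimizer num / den to the interval.
    a = min(1.0, max(0.0, num / den))
    return a, (8.0 * a - 3.0) / 5.0


# One observed transition, x_1 = 1, u_1 = 0, x_2 = 1.7:
print(constrained_ls([1.0, 1.7], [0.0]))  # (1.0, 1.0)
```

When the unconstrained minimizer falls outside $[0, 1]$, the estimate lands on an endpoint of the segment, which is how a boundary value such as $[1\ 1]$ can be returned with positive probability.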
Once estimate $[a_t^{LS}\ b_t^{LS}]$ has been determined, according to the certainty equivalence principle the optimal control law for parameter $[a_t^{LS}\ b_t^{LS}]$ is applied.

Suppose now that at a certain instant point $t$ the least squares estimate is $[a_t^{LS}\ b_t^{LS}] = [1\ 1]$. Since the corresponding optimal control law is given by $u_t = -\frac{5}{8} x_t$ (see e.g. Bertsekas (1987)), the squared error at time $t+1$ turns out to be

$$(x_{t+1} - a x_t - b u_t)^2 = \left(x_{t+1} - a x_t - (8a/5 - 3/5)\left(-\tfrac{5}{8} x_t\right)\right)^2 = \left(x_{t+1} - \tfrac{3}{8} x_t\right)^2, \quad \forall [a\ b] \in \Theta.$$

The important feature of this last expression is that it is independent of parameter $[a\ b] \in \Theta$. Hence, the term added at time $t+1$ to the least squares index does not influence the location of its minimizer and the least squares estimate remains unchanged at time $t+1$: $[a_{t+1}^{LS}\ b_{t+1}^{LS}] = [a_t^{LS}\ b_t^{LS}] = [1\ 1]$. As the same rationale can be repeated in the subsequent instant points, we can conclude that the estimate sticks at $[1\ 1]$.

Now, the important fact is that the least squares estimates can in fact take value $[1\ 1]$ with positive probability, even when the true parameter is different from $[1\ 1]$. Moreover, the optimal cost for the true parameter may be strictly lower than the incurred cost obtained by applying the optimal control law for parameter $[1\ 1]$. To see that this is the case, suppose that $[a^\circ\ b^\circ] = [0\ {-3/5}]$ and assume that the system is initialized with $x_1 = 1$ and $u_1 = 0$. Then, at time $t = 2$ the least squares estimate minimizes the cost $(x_2 - a)^2 = (w_2 - a)^2$. Thus, $[a_2^{LS}\ b_2^{LS}] = [1\ 1]$ whenever $w_2 > 1$, which happens with positive probability. In addition, it is easily seen that the cost associated with the optimal control law for parameter $[1\ 1]$ is $5/3$ whereas the optimal cost for the true parameter $[a^\circ\ b^\circ] = [0\ {-3/5}]$ is $25/24$.

A careful analysis of the example above reveals where the trouble comes from with a straightforward use of the certainty equivalence principle. When the suboptimal control $u_t = -\frac{5}{8} x_t$ is selected based on the current estimate $[a_t^{LS}\ b_t^{LS}] = [1\ 1]$, the resulting observation is $y_{t+1} = x_{t+1} = \frac{3}{8} x_t + w_{t+1}$. This observation is in perfect agreement with the one which would have been obtained if $[a_t^{LS}\ b_t^{LS}] = [1\ 1]$ were the true parameter. Therefore, there is no reason for having doubts as to the correctness of the estimate $[a_t^{LS}\ b_t^{LS}]$ and thus this estimate is kept unchanged at the next time point.

This is just a single example of a general estimability problem arising in adaptive control problems.
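The numbers quoted in the Example can be checked with a short Python sketch. It is offered as a hedged illustration rather than as the authors' code: the function names are hypothetical, the Riccati equation and gain are the standard scalar LQG expressions for unit noise variance, and the stationary cost formula assumes a stable closed loop.

```python
import math


def riccati_p(a, b, q):
    """Positive root of p = a^2 p / (b^2 p + 1) + q (scalar case, b != 0).

    Equivalent quadratic: b^2 p^2 + (1 - a^2 - q b^2) p - q = 0.
    """
    c1 = 1.0 - a * a - q * b * b
    return (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q)) / (2.0 * b * b)


def lqg_gain(a, b, q):
    """Optimal feedback gain K such that u_t = K x_t."""
    p = riccati_p(a, b, q)
    return -a * b * p / (b * b * p + 1.0)


def average_cost(a_true, b_true, gain, q):
    """Long-term average of q x^2 + u^2 when u_t = gain * x_t is applied
    to x_{t+1} = a_true x_t + b_true u_t + w_{t+1}, with Var(w) = 1."""
    c = a_true + b_true * gain          # closed-loop coefficient
    assert abs(c) < 1.0, "closed loop must be stable"
    state_var = 1.0 / (1.0 - c * c)     # stationary variance of x_t
    return (q + gain * gain) * state_var


q = 25.0 / 24.0
K = lqg_gain(1.0, 1.0, q)                  # gain designed for [1 1]
print(K)                                   # -0.625, i.e. u_t = -(5/8) x_t
print(riccati_p(1.0, 1.0, q))              # about 1.6667 = 5/3
print(average_cost(0.0, -0.6, K, q))       # about 1.6667 = 5/3
print(riccati_p(0.0, -0.6, q))             # about 1.0417 = 25/24
```

The same gain $K = -5/8$ yields the closed-loop coefficient $a + bK = 3/8$ for every $[a\ b] \in \Theta$, which is the algebraic reason why the new residual carries no information about the parameter.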
This general estimability problem can be described as follows:

- applying to the true system a control which is optimal for the estimated system may result in observations which concur with those that would have been obtained if the estimated system were the true system;
- if the estimation method drives the estimate to a value such that the above happens, then there is no clue that the system is incorrectly estimated and, consequently, the estimate remains unchanged;
- the adopted control law is optimal for the estimated system, while it may be strictly suboptimal for the true system.

A way out of this pernicious mechanism is to employ a more fine grained estimation method based on the optimal long-term average cost for the different systems with parameters $\theta \in \Theta$. Developing this idea will lead us to the formulation of the "Bet On the Best" principle.

We start by observing the following elementary fact: suppose we apply to the true system a control law which is optimal for another system. If the long-term average cost we pay is different from the optimal cost for this second system, then this system is falsified by the observations and it can be dropped from the set of possible true systems.

Suppose now that at a certain instant point, we select among the systems which are still unfalsified the one with the lowest optimal cost. Then, if we pay a cost different from the expected one, we can falsify this system. In the opposite case, we cannot falsify it, but then we are paying a cost which is minimal over the set of possible true systems. Indeed, this implies that we are actually paying the optimal cost for the true system.

These considerations can be summarized as follows: selecting a control law which is optimal for the best unfalsified system (i.e. the system with the lowest optimal cost among those that are so far unfalsified by the observations) may lead to an estimability problem only when we are achieving optimality. This is in contrast with what happens with the straightforward certainty equivalence principle, where an estimability problem may arise and, yet, the incurred cost may be strictly suboptimal.

The above observations suggest that a very natural way to overcome the estimability problem posed by the certainty equivalence principle is simply to iteratively select among the unfalsified systems the one with minimal optimal cost and then apply the optimal control law for it. We then arrive at formulating the following procedure of general validity:

The "Bet On the Best" (BOB) principle
At the generic instant point $t$, do the following:
1. determine the set of unfalsified systems;
2. select the system in the unfalsified set with lowest optimal cost;
3. apply the decision which is optimal for the selected system.

3.1. Mathematical formalization of the BOB-principle.
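In this section we more precisely formalize the concept of unfalsified system and exhibit in mathematical terms the properties of the unfalsified set such that applying the BOB-principle leads to optimality.

Let $U_t$ denote the unfalsified set at time $t$. Clearly, this set will depend on the observations $o_t = \{u_1, u_2, \ldots, u_{t-1}, y_1, y_2, \ldots, y_t\}$ available at time $t$, and so it is in fact a stochastic set. Moreover, we note that set $U_t$ depends through $u_1, u_2, \ldots, u_{t-1}$ on the control law $l_k$ applied from time $k = 1$ to time $k = t-1$. Once the control law has been fixed, processes $u_t$ and $y_t$ are completely determined and so is the sequence of unfalsified sets $U_t$.

The question that we need now to address is: what are the mathematical conditions $U_t$ has to satisfy so that application of the BOB principle leads to optimality? This question is answered in this section.

Assume that $\arg\min_{\theta \in U_t} J^*_\theta$ exists and call it $\theta_t^{min}$. Select the optimal control action for $\theta_t^{min}$: $u_t = l^*_{\theta_t^{min},t}(o_t)$. Then,

Condition i) $\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} c(u_t, y_t) = \limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} J^*_{\theta_t^{min}}$ a.s.

Condition ii) $\theta^\circ \in \bigcup_{t \ge 1} \bigcap_{k>t} U_k$ a.s.

Securing condition i) appears a doable objective under general circumstances. In fact, if the long-term average cost paid by applying the optimal control law for $\theta_t^{min}$ were different from the expected average cost, there would be evidence that such a $\theta_t^{min}$ has to be falsified (and, therefore, $\theta_t^{min}$ should not be in $U_t$ for some $t$). Condition ii) simply says that the falsification procedure must not be overselective so that it also falsifies the true system (note that considering $\bigcup_{t \ge 1}\bigcap_{k>t} U_k$ rather than the straightforward $\bigcap_{t} U_t$ allows for transient phenomena due to stochastic fluctuations).

The following simple theorem points out the effectiveness of the BOB-principle when conditions i) and ii) are met.

Theorem 1. Under conditions i) and ii), the BOB-procedure achieves the ideal objective, i.e.

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} c(u_t, y_t) = J^*_{\theta^\circ} \quad \text{a.s.}$$

Proof. Condition ii) implies that $\theta^\circ \in U_t$, $\forall t \ge \bar{t}$, where $\bar{t}$ is a suitable instant point, a.s. From this, $\inf_{\theta \in U_t} J^*_\theta \le J^*_{\theta^\circ}$, $\forall t \ge \bar{t}$, a.s. Since, according to the BOB-procedure, at each instant point we select in $U_t$ the parameter $\theta_t^{min}$ with the lowest optimal cost $J^*_{\theta_t^{min}}$, we obtain

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} J^*_{\theta_t^{min}} \le \limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} J^*_{\theta^\circ} = J^*_{\theta^\circ}.$$

Thus, applying condition i) yields

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} c(u_t, y_t) = \limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} J^*_{\theta_t^{min}} \le J^*_{\theta^\circ}, \quad \text{a.s.}$$

Since no control law can achieve a cost lower than $J^*_{\theta^\circ}$ on the true system, equality holds, so concluding that the incurred cost is optimal.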
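The three steps of the BOB principle can be sketched generically in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the parameter set is replaced by a finite list of candidates, and `optimal_cost` and `is_unfalsified` are hypothetical callables standing in for $J^*_\theta$ and for whatever falsification rule defines $U_t$ in a specific set-up.

```python
def bob_select(candidates, optimal_cost, is_unfalsified):
    """One 'Bet On the Best' selection step over a finite candidate list.

    candidates:     iterable of parameter values (finite grid, an assumption)
    optimal_cost:   theta -> optimal long-term average cost of theta
    is_unfalsified: theta -> True if theta is still compatible with the data
                    (placeholder for the set-up-specific falsification rule)
    """
    # Step 1: determine the set of unfalsified systems.
    unfalsified = [theta for theta in candidates if is_unfalsified(theta)]
    if not unfalsified:
        raise ValueError("every candidate has been falsified")
    # Step 2: select the unfalsified system with the lowest optimal cost.
    # Step 3 (applying its optimal control law) is left to the caller.
    return min(unfalsified, key=optimal_cost)


# Toy usage with made-up costs: the cheapest candidate has been falsified.
costs = {"theta_A": 1.0, "theta_B": 1.5, "theta_C": 2.0}
falsified = {"theta_A"}
best = bob_select(costs, costs.get, lambda th: th not in falsified)
print(best)  # theta_B
```

Whenever the true parameter is among the unfalsified candidates, the selected candidate has an optimal cost no larger than the true one, which is the inequality the proof of Theorem 1 relies on.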

3.2. Discussion. The BOB principle bears a promise of more general applicability than other formulations of the cost-biasing idea. In previous contributions such as Kumar (1983b) and Campi and Kumar (1998), identification was based on a delicate two-term estimation criterion, where the first term was the standard maximum likelihood and the second term was a cost-biasing term. A correct estimation of the closed-loop dynamics relied on the presence of the first term, which, so to say, had not to be bogged down by the second term that was pushing the estimate towards parameter locations corresponding to lower optimal costs. In turn, this called for the presence of system noise that could suitably excite the true system. This delicate balance is automatically overcome with the BOB philosophy: if the estimate is biased towards parameters that correspond to a superoptimal cost, in the long run an average cost larger than the expected superoptimal cost will certainly be obtained and, therefore, according to condition i) in Section 3.1 this parameter will be discarded.

For a practical implementation of the BOB procedure, what remains to determine is the actual falsification rule. This determination is dependent on the specific control set-up. To make things concrete, in the next section we present an application to a scalar adaptive control problem.

The recent interesting work by Levanony and Caines (2005) can be seen as another application of this same BOB principle. In this latter paper, the analysis is carried out for systems with multivariate state thanks to the observation that optimizing the LQG cost restricted to a region that shrinks around where closed-loop identification holds necessarily leads to a consistent estimate. See also Levanony and Caines (2001a,b) for a recursive implementation of the algorithm.

4. An application of the BOB principle: scalar adaptive LQG control.

4.1. Problem position. Consider the scalar system

$$x_{t+1} = a^\circ x_t + b^\circ u_t + w_{t+1}, \qquad (3)$$

where $w_t$ is a noise process described as an i.i.d. Gaussian sequence with zero mean and unit variance.
The true parameter $\theta^\circ = [a^\circ\ b^\circ]$ is unknown and belongs to a known compact set $\Theta \subset \mathbb{R}^2$ such that $b \ne 0$, $\forall [a\ b] \in \Theta$ (controllability condition). The system state is observed without noise, i.e. $y_t = x_t$. Finally, the long-term cost criterion is given by

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} [q x_t^2 + u_t^2], \quad q > 0. \qquad (4)$$

In the case in which the true parameter $\theta^\circ$ is known, it is a standard matter to compute the optimal control law that minimizes criterion (4) (see e.g. Bertsekas (1987)). Letting $p(a^\circ, b^\circ)$ be the positive solution to the scalar Riccati equation

$$p = \frac{(a^\circ)^2 p}{(b^\circ)^2 p + 1} + q,$$

the control input at time $t$ is computed as

$$u_t = K(a^\circ, b^\circ)\, x_t,$$

where gain $K(a^\circ, b^\circ)$ is given by

$$K(a^\circ, b^\circ) = -\frac{a^\circ b^\circ p(a^\circ, b^\circ)}{(b^\circ)^2 p(a^\circ, b^\circ) + 1}.$$

The corresponding optimal cost is simply $J^*_{(a^\circ,b^\circ)} = p(a^\circ, b^\circ)$.

In the adaptive case where $\theta^\circ$ is not known, we set the following

Adaptive control problem
Find a control law $l_t$ such that, with the position $u_t = l_t(o_t)$, we achieve the ideal objective, i.e. $\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} [q x_t^2 + u_t^2] = J^*_{(a^\circ,b^\circ)}$ a.s., $\forall [a^\circ\ b^\circ] \in \Theta$.

4.2. Solving the adaptive control problem via the BOB-principle. To attack the adaptive control problem with the BOB-principle we need to find a suitable falsification criterion. The resulting unfalsified sets should satisfy conditions i) and ii) in Theorem 1.

A hint on how to select the unfalsified sets so as to satisfy condition ii) is provided by Lemma 1 below.

Name $[a_t^{LS}\ b_t^{LS}]$ the least squares estimate of $[a^\circ\ b^\circ]$:

$$[a_t^{LS}\ b_t^{LS}] := \arg\min_{[a\ b] \in \mathbb{R}^2} \sum_{k=1}^{t-1} (x_{k+1} - a x_k - b u_k)^2,$$

and define $\phi_k := [x_k\ u_k]$, and $V_t := \sum_{k=1}^{t-1} \phi_k^T \phi_k$.

Lemma 1. Choose a function $\mu_t$ such that $\log \sum_{k=1}^{t} x_k^2 = o(\mu_t)$ and define the unfalsified set sequence through equation

$$U_t := \left\{ [a\ b] \in \Theta : ([a\ b] - [a_t^{LS}\ b_t^{LS}])\, V_t\, ([a\ b] - [a_t^{LS}\ b_t^{LS}])^T \le \mu_t \right\}. \qquad (5)$$

Then, $[a^\circ\ b^\circ] \in \bigcup_{t \ge 1}\bigcap_{k>t} U_k$ a.s.

Proof. The least squares estimate $[a_t^{LS}\ b_t^{LS}]$ writes:

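$$[a_t^{LS}\ b_t^{LS}] = \left(\sum_{k=1}^{t-1} \phi_k x_{k+1}\right)\left(\sum_{k=1}^{t-1} \phi_k^T \phi_k\right)^{-1} = [a^\circ\ b^\circ] + \left(\sum_{k=1}^{t-1} \phi_k w_{k+1}\right)\left(\sum_{k=1}^{t-1} \phi_k^T \phi_k\right)^{-1}.$$

Thus,

$$([a^\circ\ b^\circ] - [a_t^{LS}\ b_t^{LS}])\, V_t\, ([a^\circ\ b^\circ] - [a_t^{LS}\ b_t^{LS}])^T = \left(\sum_{k=1}^{t-1} \phi_k w_{k+1}\right)\left(\sum_{k=1}^{t-1} \phi_k^T \phi_k\right)^{-1}\left(\sum_{k=1}^{t-1} \phi_k^T w_{k+1}\right).$$

By applying result iii) in Lemma 1 of Lai and Wei (1982) we know that the right hand side of this last equation is $O(\log \sum_{k=1}^{t-1} \|\phi_k\|^2)$ a.s. Moreover, by equation $|u_k| \le \sup_{[a\ b] \in \Theta} |K(a, b)|\, |x_k|$ we know that $\log \sum_{k=1}^{t-1} \|\phi_k\|^2 = O(\log \sum_{k=1}^{t} x_k^2)$. In conclusion,

$$([a^\circ\ b^\circ] - [a_t^{LS}\ b_t^{LS}])\, V_t\, ([a^\circ\ b^\circ] - [a_t^{LS}\ b_t^{LS}])^T = O\left(\log \sum_{k=1}^{t} x_k^2\right).$$

The thesis immediately follows from this last equation by recalling that $\log \sum_{k=1}^{t} x_k^2 = o(\mu_t)$.

Lemma 1 delivers a lower bound for $\mu_t$, the fulfillment of which implies that condition ii) is satisfied. Next, we need to determine a condition such that condition i) is satisfied as well. This will lead us to introduce an upper bound for $\mu_t$.

We start by proving the following stability result.

Theorem 2. Choose a function $\mu_t$ such that $\mu_t = o(\log^2 \sum_{k=1}^{t} x_k^2)$ and set $u_t = K(a_t, b_t) x_t$, where $[a_t\ b_t]$ belongs almost surely to set $U_t$ defined through equation (5). Then,

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} [|x_t|^r + |u_t|^r] < \infty \quad \text{a.s.}, \ \forall r \ge 1.$$

The proof of Theorem 2 is based on the following auxiliary lemma, the technical proof of which is given in Appendix A.

Lemma 2. Under the same assumptions as in Theorem 2 we have

$$\sum_{t=1,\, t \notin T_N}^{N} |(a^\circ - a_t) x_t + (b^\circ - b_t) u_t|^r = o\left(\sum_{t=1}^{N} |x_t|^r\right) \quad \text{a.s.}, \ \forall r \ge 2,$$

where $T_N$ is a set of time instant points depending on $N$ such that $|T_N| \le 2$ ($|\cdot|$ stands for cardinality).

Proof of Theorem 2. Fix an integer $N$. For $t \in [1, N]$, rewrite system (3) as follows

$$x_{t+1} = \begin{cases} (a_t + b_t K(a_t, b_t))\, x_t + p_t + w_{t+1}, & t \notin T_N \\ (a^\circ + b^\circ K(a_t, b_t))\, x_t + w_{t+1}, & t \in T_N \end{cases} \qquad (6)$$

where $p_t$ is a perturbation term defined as $p_t := (a^\circ - a_t) x_t + (b^\circ - b_t) u_t$, and $T_N$ is the set of instant points mentioned in Lemma 2. Set

$$\alpha := \sup_{[a\ b] \in \Theta} |a^\circ + b^\circ K(a, b)|, \qquad \rho := \sup_{[a\ b] \in \Theta} |a + b K(a, b)|.$$

Since $K(a, b)$ is the optimal gain for system $x_{t+1} = a x_t + b u_t + w_{t+1}$, the closed-loop dynamical matrix $a + bK(a, b)$ is stable, i.e. $|a + bK(a, b)| < 1$. Since $\Theta$ is a compact set, we then have $\rho < 1$. With these positions, state $x_t$ generated by system (6) can be bounded as follows

$$|x_t| \le \alpha^2 \sum_{k=1,\, k \notin T_N}^{t-1} \rho^{(t-k-1)-2} |p_k| + \alpha^2 \sum_{k=1}^{t-1} \rho^{(t-k-1)-2} |w_{k+1}| + \alpha^2 \rho^{(t-1)-2} |x_1|.$$

From this,

$$\frac{1}{N}\sum_{t=1}^{N} |x_t|^r \le c\, \frac{1}{N}\sum_{t=1,\, t \notin T_N}^{N} |p_t|^r + c\, \frac{1}{N}\sum_{t=1}^{N} |w_{t+1}|^r + c, \qquad (7)$$

where $c$ is a suitable constant. Note now that $\frac{1}{N}\sum_{t=1}^{N} |w_{t+1}|^r = O(1)$ a.s. Moreover, in view of Lemma 2, for any $r \ge 2$ we have: $\frac{1}{N}\sum_{t=1,\, t \notin T_N}^{N} |p_t|^r = o\left(\frac{1}{N}\sum_{t=1}^{N} |x_t|^r\right)$ a.s. By substituting these estimates in (7) we obtain

$$\frac{1}{N}\sum_{t=1}^{N} |x_t|^r = o\left(\frac{1}{N}\sum_{t=1}^{N} |x_t|^r\right) + O(1) \quad \text{a.s.}, \ \forall r \ge 2,$$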

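from which the conclusion is immediately drawn that $\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} |x_t|^r < \infty$ a.s., $\forall r \ge 2$. Result $\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} |u_t|^r < \infty$ a.s., $\forall r \ge 2$ also follows by noting that $|u_t| \le \sup_{[a\ b] \in \Theta} |K(a, b)|\, |x_t|$. In conclusion,

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} [|x_t|^r + |u_t|^r] < \infty \quad \text{a.s.}, \ \forall r \ge 2.$$

Finally, we observe that the boundedness result for $r \ge 2$ obviously implies that a similar result holds true for any $r \ge 1$, so that the stability result remains proven for any $r \ge 1$.

The next lemma gives an upper bound for $\mu_t$ such that condition i) in Theorem 1 is satisfied. This result used in conjunction with Lemma 1 provides us with the conditions such that the BOB-principle can be successfully used.

Lemma 3. Let $\mu_t = \log^s \sum_{k=1}^{t} x_k^2$ for some $s < 2$ and set $u_t = K(a_t^{min}, b_t^{min}) x_t$, where $[a_t^{min}\ b_t^{min}] := \arg\min_{[a\ b] \in U_t} J^*_{(a,b)}$. Then

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} [q x_t^2 + u_t^2] = \limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} J^*_{(a_t^{min}, b_t^{min})} \quad \text{a.s.}$$

Proof. For notational simplicity, throughout we write $a_t$ and $b_t$ for $a_t^{min}$ and $b_t^{min}$. Let $F_t := \sigma(w_1, w_2, \ldots, w_t)$.

The dynamic programming equation for model $x_{t+1} = a_t x_t + b_t u_t + w_{t+1}$ writes (Bertsekas (1987))

$$J^*_{(a_t,b_t)} + p(a_t, b_t) x_t^2 = q x_t^2 + u_t^2 + E[p(a_t, b_t)(a_t x_t + b_t u_t + w_{t+1})^2 \mid F_t]$$
$$= q x_t^2 + u_t^2 + E[p(a_t, b_t) x_{t+1}^2 \mid F_t] + p(a_t, b_t)\left\{(a_t x_t + b_t u_t)^2 - (a^\circ x_t + b^\circ u_t)^2\right\},$$

from which

$$\frac{1}{N}\sum_{t=1}^{N} J^*_{(a_t,b_t)} + \underbrace{\frac{1}{N}\sum_{t=1}^{N} \left\{ p(a_t, b_t) x_t^2 - E[p(a_{t+1}, b_{t+1}) x_{t+1}^2 \mid F_t] \right\}}_{A_N}$$
$$= \frac{1}{N}\sum_{t=1}^{N} [q x_t^2 + u_t^2] + \underbrace{\frac{1}{N}\sum_{t=1}^{N} E[(p(a_t, b_t) - p(a_{t+1}, b_{t+1})) x_{t+1}^2 \mid F_t]}_{B_N} + \underbrace{\frac{1}{N}\sum_{t=1}^{N} p(a_t, b_t)\left\{(a_t x_t + b_t u_t)^2 - (a^\circ x_t + b^\circ u_t)^2\right\}}_{C_N}. \qquad (8)$$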

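Let us study separately the different terms appearing in this equation.

A) Term ($A_N$) can be rewritten as

$$\frac{1}{N} p(a_1, b_1) x_1^2 - \frac{1}{N} p(a_{N+1}, b_{N+1}) x_{N+1}^2 + \frac{1}{N}\sum_{t=1}^{N} \left\{ p(a_{t+1}, b_{t+1}) x_{t+1}^2 - E[p(a_{t+1}, b_{t+1}) x_{t+1}^2 \mid F_t] \right\}.$$

The first term obviously tends to zero. As for the second term, note that if we assume that it does not tend to zero, then there exists a time sequence $N_k$ and a real number $\alpha > 0$ such that $x_{N_k}^2 > \alpha N_k$, $\forall k$. From this,

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} x_t^4 \ge \limsup_{k\to\infty} \frac{1}{N_k} x_{N_k}^4 > \limsup_{k\to\infty} \frac{1}{N_k} \alpha^2 N_k^2 = \infty.$$

This contradicts Theorem 2 and, so, the second term tends to zero as well. In the third term,

$$\alpha_{t+1} := p(a_{t+1}, b_{t+1}) x_{t+1}^2 - E[p(a_{t+1}, b_{t+1}) x_{t+1}^2 \mid F_t]$$

is a martingale difference. Therefore, $\frac{1}{N}\sum_{t=1}^{N} \alpha_{t+1} \to 0$, provided that $\sum_{t=1}^{\infty} \frac{1}{t^2} E[\alpha_{t+1}^2 \mid F_t] < \infty$ (see Hall and Heyde (1980), Theorem 2.18). Since $p(a_{t+1}, b_{t+1})$ is bounded, it is easily seen that this last condition is implied by $\sum_{t=1}^{\infty} \frac{1}{t^2} [|x_t|^4 + |u_t|^4] < \infty$. Again, this conclusion can be drawn by contradiction from Theorem 2. In fact, if this conclusion were false, sequence $\frac{1}{t^{1/2}} [|x_t|^4 + |u_t|^4]$ would be unbounded, and therefore there would exist a sequence of instant points $N_k$ such that $[|x_{N_k}|^4 + |u_{N_k}|^4] > N_k^{1/2}$, $\forall k$. From this,

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} [|x_t|^4 + |u_t|^4]^4 \ge \limsup_{k\to\infty} \frac{1}{N_k} [|x_{N_k}|^4 + |u_{N_k}|^4]^4 > \limsup_{k\to\infty} \frac{1}{N_k} N_k^2 = \infty$$

and this is in contradiction with Theorem 2. In conclusion, $A_N \to 0$ a.s.

B) Notice first that, by Schwarz inequality,

$$\left| \frac{1}{N}\sum_{t=1}^{N} (p(a_t, b_t) - p(a_{t+1}, b_{t+1})) x_{t+1}^2 \right| \le \frac{1}{N}\sum_{t=1}^{N} |p(a_t, b_t) - p(a_{t+1}, b_{t+1})|\, x_{t+1}^2$$
$$\le \left( \frac{1}{N}\sum_{t=1}^{N} (p(a_t, b_t) - p(a_{t+1}, b_{t+1}))^2 \right)^{1/2} \left( \frac{1}{N}\sum_{t=1}^{N} x_{t+1}^4 \right)^{1/2}.$$

In this last expression, the second term remains bounded by Theorem 2, while the first term tends to zero (see Appendix B for the proof of this fact). Thus,

$$\lim_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} (p(a_t, b_t) - p(a_{t+1}, b_{t+1})) x_{t+1}^2 = 0 \quad \text{a.s.} \qquad (9)$$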

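Finally, conclusion $\lim_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} E[(p(a_t, b_t) - p(a_{t+1}, b_{t+1})) x_{t+1}^2 \mid F_t] = 0$ a.s. is drawn from (9) by observing that

$$\beta_{t+1} := (p(a_t, b_t) - p(a_{t+1}, b_{t+1})) x_{t+1}^2 - E[(p(a_t, b_t) - p(a_{t+1}, b_{t+1})) x_{t+1}^2 \mid F_t]$$

is a martingale difference for which, by calculations resembling those developed in point (A) for $\alpha_t$, we have $\frac{1}{N}\sum_{t=1}^{N} \beta_{t+1} \to 0$.

C) By Schwarz inequality,

$$\left| \frac{1}{N}\sum_{t=1}^{N} p(a_t, b_t)\left\{(a_t x_t + b_t u_t)^2 - (a^\circ x_t + b^\circ u_t)^2\right\} \right|$$
$$\le \sup_{[a\ b] \in \Theta} p(a, b) \left( \frac{1}{N}\sum_{t=1}^{N} ((a_t x_t + b_t u_t) - (a^\circ x_t + b^\circ u_t))^2 \right)^{1/2} \left( \frac{1}{N}\sum_{t=1}^{N} ((a_t x_t + b_t u_t) + (a^\circ x_t + b^\circ u_t))^2 \right)^{1/2}.$$

Since $\frac{1}{N}\sum_{t=1}^{N} ((a_t x_t + b_t u_t) + (a^\circ x_t + b^\circ u_t))^2$ remains bounded (see Theorem 2), to show that $C_N \to 0$ it suffices to prove that $\frac{1}{N}\sum_{t=1}^{N} ((a_t x_t + b_t u_t) - (a^\circ x_t + b^\circ u_t))^2 \to 0$. From Lemma 2 we have

$$\frac{1}{N}\sum_{t=1}^{N} ((a_t x_t + b_t u_t) - (a^\circ x_t + b^\circ u_t))^2 = o\left( \frac{1}{N}\sum_{t=1}^{N} x_t^2 \right) + \frac{1}{N}\sum_{t \in T_N} ((a_t x_t + b_t u_t) - (a^\circ x_t + b^\circ u_t))^2.$$

The first term in the right hand side tends to zero because of the stability Theorem 2. As for the second term, by recalling that $|T_N| \le 2$, it is easy to prove that it tends to zero by arguments similar to those used in point (A) to show that $\frac{1}{N} p(a_{N+1}, b_{N+1}) x_{N+1}^2 \to 0$.

By inserting all the partial results in equation (8) the thesis is obtained.

By selecting the unfalsified set at time $t$ as given in definition (5) with the bounds on $\mu_t$ as suggested by Lemmas 1 and 3, the BOB procedure writes

Adaptive control method
At time $t$, do the following:
1. determine $U_t$ as in definition (5) with $\mu_t = \log^s \sum_{k=1}^{t} x_k^2$, $s \in (1, 2)$;
2. compute $[a_t^{min}\ b_t^{min}]$ as the minimizer of $J^*_{(a,b)}$ in $U_t$: $[a_t^{min}\ b_t^{min}] := \arg\min_{[a\ b] \in U_t} J^*_{(a,b)}$;
3. compute $u_t$ by applying the optimal control law for $[a_t^{min}\ b_t^{min}]$: $u_t = K(a_t^{min}, b_t^{min}) x_t$.

The effectiveness of this adaptive control method is guaranteed by Theorem 1, since conditions i) and ii) follow from Lemma 3 and Lemma 1, respectively. This delivers the following optimality theorem.

Theorem 3. With the control law chosen according to the adaptive control method, we achieve the ideal objective, i.e.

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} [q x_t^2 + u_t^2] = J^*_{(a^\circ,b^\circ)} \quad \text{a.s.}, \ \forall [a^\circ\ b^\circ] \in \Theta.$$

Appendix A

Define $v_t := [(a^\circ - a_t)\ (b^\circ - b_t)]$. Since $[a_t\ b_t] \in \Theta$, sequence $v_t$ is bounded. Denote by $\bar{v}$ an upper bound for $\|v_t\|$: $\|v_t\| \le \bar{v}$, $\forall t$.

We start by proving that (remember that $\phi_t = [x_t\ u_t]$):

$$\sum_{t=1}^{N-1} |\phi_t v_N^T|^r = o\left( \sum_{t=1}^{N} |x_t|^r \right) \quad \text{a.s.}, \ \forall r \ge 2. \qquad (10)$$

To this purpose, note first that

$$\sum_{t=1}^{N} x_t^2 = O\left( \sum_{t=1}^{N} |x_t|^r \right) \quad \text{a.s.} \qquad (11)$$

Indeed,

$$\sum_{t=1}^{N} x_t^2 = N \left[ \left( \frac{1}{N}\sum_{t=1}^{N} x_t^2 \right)^{r/2} \right]^{2/r} \le N \left[ \frac{1}{N}\sum_{t=1}^{N} |x_t|^r \right]^{2/r} \ \text{(using Jensen's inequality)} \ = \left[ \frac{1}{N}\sum_{t=1}^{N} |x_t|^r \right]^{\frac{2}{r} - 1} \sum_{t=1}^{N} |x_t|^r,$$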

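and $\limsup_{N\to\infty} 1 \big/ \left( \frac{1}{N}\sum_{t=1}^{N} |x_t|^r \right) < \infty$ a.s. since process $x_t$ is affected by the noise process $w_t$.

Secondly, by observing that $[a_N\ b_N] \in U_N$ and $[a^\circ\ b^\circ] \in U_N$ too for $N$ large enough (see Lemma 1),

$$\sum_{t=1}^{N-1} (\phi_t v_N^T)^2 = o\left( \log^2 \sum_{t=1}^{N} x_t^2 \right) \quad \text{a.s.} \qquad (12)$$

Equation (10) is easily derived from (11) and (12) as follows:

$$\sum_{t=1}^{N-1} |\phi_t v_N^T|^r \le \left( \sum_{t=1}^{N-1} (\phi_t v_N^T)^2 \right)^{r/2} = o\left( \log^r \sum_{t=1}^{N} x_t^2 \right) \ \text{(using (12))} \ = o\left( \sum_{t=1}^{N} x_t^2 \right) = o\left( \sum_{t=1}^{N} |x_t|^r \right). \ \text{(using (11))}$$

Fix now a real number $\epsilon > 0$ and an integer $N$. Define a sequence of subspaces $S_t$, $t = 1, 2, \ldots, N+1$ of $\mathbb{R}^2$ by the following backward recursive procedure: for $t = N+1$, set $S_{N+1} = \emptyset$; for $t = N, N-1, \ldots, 1$, set (the symbol $v_{t,S}$ stands for the projection of vector $v_t$ onto subspace $S$)

$$S_t = \begin{cases} S_{t+1}, & \text{if } \|v_{t,S_{t+1}^\perp}\| \le \epsilon \\ S_{t+1} \oplus \mathrm{span}\{v_t\}, & \text{otherwise.} \end{cases} \qquad (13)$$

Denote by $T_N$ the set of instant points at which subspace $S_t$ expands: if $t \in T_N$, then $S_t \supset S_{t+1}$ strictly. These instant points are obviously at most two. Let us denote them by $t_1$ and $t_2$ ($t_1 > t_2$). Moreover, let $i(t) := \max\{i : t_i \ge t\}$. Since $\|v_t\| \le \bar{v}$, $\forall t$, the angle between $v_{t_1}$ and $v_{t_2}$ may tend to zero only if $\epsilon \to 0$. Then, there exists a constant $c(\epsilon)$ dependent on $\epsilon$, but independent of $N$, such that

$$\|\phi_{t,S_t}\|^r \le c(\epsilon) \sum_{i=1}^{i(t)} |\phi_t v_{t_i}^T|^r.$$

Thus, for each $t \in [1, N]$ we have

$$|\phi_t v_t^T|^r \le k |\phi_{t,S_t^\perp} v_{t,S_t^\perp}^T|^r + k |\phi_{t,S_t} v_{t,S_t}^T|^r \le k \epsilon^r \|\phi_t\|^r + k \bar{v}^r c(\epsilon) \sum_{i=1}^{i(t)} |\phi_t v_{t_i}^T|^r,$$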

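where $k$ is a suitable constant depending on $r$. So,

$$\sum_{t=1,\, t \notin T_N}^{N} |\phi_t v_t^T|^r \le k \epsilon^r \sum_{t=1}^{N} \|\phi_t\|^r + k \bar{v}^r c(\epsilon) \sum_{t=1,\, t \notin T_N}^{N} \sum_{i=1}^{i(t)} |\phi_t v_{t_i}^T|^r$$
$$\le k \epsilon^r \sum_{t=1}^{N} \|\phi_t\|^r + k \bar{v}^r c(\epsilon) \sum_{i=1}^{i(1)} \sum_{t=1}^{t_i - 1} |\phi_t v_{t_i}^T|^r$$
$$\le k \epsilon^r \sum_{t=1}^{N} \|\phi_t\|^r + k \bar{v}^r c(\epsilon)\, 2\, o\left( \sum_{t=1}^{N} |x_t|^r \right). \ \text{(using (10))}$$

Since $\|\phi_t\|^r \le c |x_t|^r$, where $c$ is a suitable constant, from this last inequality we conclude that

$$\limsup_{N\to\infty} \frac{\sum_{t=1,\, t \notin T_N}^{N} |\phi_t v_t^T|^r}{\sum_{t=1}^{N} |x_t|^r} \le k \epsilon^r c.$$

Due to the arbitrariness of $\epsilon$, this completes the proof of the lemma.

Appendix B

Note first that

$$\mu_{t+1} - \mu_t \to 0 \quad \text{a.s.} \qquad (14)$$

Indeed,

$$\mu_{t+1} - \mu_t = \log^s \sum_{k=1}^{t+1} x_k^2 - \log^s \sum_{k=1}^{t} x_k^2$$
$$\le \log^2 \sum_{k=1}^{t+1} x_k^2 - \log^2 \sum_{k=1}^{t} x_k^2 \quad \text{(when } \log \sum_{k=1}^{t} x_k^2 > 1 \text{, since } s < 2)$$
$$= \log \frac{\sum_{k=1}^{t+1} x_k^2}{\sum_{k=1}^{t} x_k^2} \left( \log \sum_{k=1}^{t+1} x_k^2 + \log \sum_{k=1}^{t} x_k^2 \right)$$
$$\le 2 \log \left( \sum_{k=1}^{t+1} x_k^2 \right) \log \left( 1 + \frac{x_{t+1}^2}{\sum_{k=1}^{t} x_k^2} \right)$$
$$\le 2 \log \left( \sum_{k=1}^{t+1} x_k^2 \right) \frac{x_{t+1}^2}{\sum_{k=1}^{t} x_k^2}. \quad \text{(using relation } \log(1 + x) \le x) \qquad (15)$$

In this last expression, $\sum_{k=1}^{t} x_k^2$ grows linearly (in fact, $\sum_{k=1}^{t} x_k^2$ does not grow less than linearly because of the presence of noise $w_t$ affecting the system equation (3) and does not grow faster than linearly because of the stability Theorem 2). Moreover, by similar arguments as those used in point (A) of the proof of Lemma 3, $x_{t+1}^2 = o(t^{1/2})$. Using these estimates in (15) we obtain

$$\mu_{t+1} - \mu_t \le o\left( \frac{\log t}{t^{1/2}} \right) \quad \text{a.s.},$$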

which implies (14).

Consider now definition (5) of set $U_t$. In the light of equation (14), and also considering that kernel $V_t$ is increasing and that $[a_t^{LS}\ b_t^{LS}] - [a_{t+1}^{LS}\ b_{t+1}^{LS}] \to 0$ a.s., we can conclude that any parameter $[a\ b] \in U_{t+1} \setminus U_t$ has a distance from $U_t$ that tends to zero as $t \to \infty$, namely

\[ \sup_{[a\ b] \in U_{t+1} \setminus U_t}\ \inf_{[a'\ b'] \in U_t} \big| [a\ b] - [a'\ b'] \big| \to 0, \quad t \to \infty. \]

Since $J(\cdot,\cdot) = p(\cdot,\cdot)$ is a continuous function in $\Theta$, we then have that there exists a vanishing function $\epsilon_t$ ($\epsilon_t \to 0$) such that

\[ p(a_t, b_t) - p(a_{t+1}, b_{t+1}) = J(a_t, b_t) - J(a_{t+1}, b_{t+1}) \le \epsilon_t. \]

Finally, letting $\mathcal{T}^+$ denote the set of instant points $t \in [1, N]$ such that $p(a_t, b_t) - p(a_{t+1}, b_{t+1}) \ge 0$,

\[
\begin{aligned}
\sum_{t=1}^{N} \big| p(a_t, b_t) - p(a_{t+1}, b_{t+1}) \big|
&\le \sup_{[a\ b] \in \Theta} p(a, b) + 2 \sum_{t \in \mathcal{T}^+} \big( p(a_t, b_t) - p(a_{t+1}, b_{t+1}) \big) \\
&\le \sup_{[a\ b] \in \Theta} p(a, b) + 2 \sum_{t \in \mathcal{T}^+} \epsilon_t \\
&= o(N) \quad \text{a.s.,}
\end{aligned}
\]

so concluding the proof.

REFERENCES

[1] K. Åström and B. Wittenmark, 1973, On self-tuning regulators, Automatica, 9, pp.
[2] Y. Bar-Shalom and E. Tse, 1974, Dual effect, certainty equivalence and separation in stochastic control, IEEE Trans. on Automatic Control, AC-19, pp.
[3] Y. Bar-Shalom and K. Wall, 1974, Dual adaptive control and uncertainty effects in macroeconomic modelling, Automatica, 6, pp.
[4] A. Becker, P. R. Kumar and C. Z. Wei, 1985, Adaptive control with the stochastic approximation algorithm: Geometry and convergence, IEEE Trans. on Automatic Control, AC-30, pp.
[5] D. Bertsekas, 1987, Dynamic Programming, Prentice-Hall, NJ.
[6] S. Bittanti, P. Bolzern and M. C. Campi, 1990, Recursive least squares identification algorithms with incomplete excitation: convergence analysis and application to adaptive control, IEEE Trans. on Automatic Control, AC-35, pp.
[7] S. Bittanti and M. C. Campi, 1996, Least squares based self-tuning control systems, In: Identification, Adaptation, Learning. The science of learning models from data (S. Bittanti and G. Picci eds.). Springer-Verlag NATO ASI series - Computer and systems sciences,
[8] V. S. Borkar, 1993, On the Milito-Cruz adaptive control scheme for Markov chains, J. of Optimization Theory and Applications, 77, pp.

[9] P. E. Caines, 1992, Continuous-time stochastic adaptive control: non-explosion, ǫ-consistency and stability, Syst. Contr. Lett., 19, pp.
[10] P. E. Caines and S. Lafortune, 1984, Adaptive control with recursive identification for stochastic linear systems, IEEE Trans. on Automatic Control, AC-29, pp.
[11] M. C. Campi, 1991, On the convergence of minimum-variance directional-forgetting adaptive control schemes, Automatica, 28, pp.
[12] M. C. Campi, 1996, The problem of pole-zero cancellation in transfer function identification and application to adaptive stabilization, Automatica, 32, pp.
[13] M. C. Campi, 1997, Achieving optimality in adaptive control: the Bet On the Best approach, In: Proc. 36th Conf. on Decision and Control.
[14] M. C. Campi and P. R. Kumar, 1998, Adaptive linear quadratic Gaussian control: the cost-biased approach revisited, SIAM Journal on Control and Optimization, 36, pp.
[15] H. F. Chen and L. Guo, 1986, Convergence rate of least squares identification and adaptive control for stochastic systems, International Journal of Control, 44, pp.
[16] H. F. Chen and L. Guo, 1987a, Asymptotically optimal adaptive control with consistent parameter estimates, SIAM Journal on Control and Optimization, 25, pp.
[17] H. F. Chen and L. Guo, 1987b, Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost, SIAM Journal on Control and Optimization, 25, pp.
[18] H. F. Chen and L. Guo, 1988, A robust stochastic adaptive controller, IEEE Trans. on Automatic Control, AC-33, pp.
[19] H. F. Chen and L. Guo, 1991, Identification and stochastic adaptive control, Birkhäuser.
[20] T. E. Duncan, L. Guo and B. Pasik-Duncan, 1999, Adaptive continuous-time linear quadratic Gaussian control, IEEE Trans. on Automatic Control, AC-44, pp.
[21] T. E. Duncan and B. Pasik-Duncan, 1986, A parameter estimate associated with the adaptive control of stochastic systems, In: Analysis and optimization of systems, Control and information sc., Springer, 83, pp.
[22] T. E. Duncan and B. Pasik-Duncan, 1991, Some methods for the adaptive control of continuous time linear stochastic systems, In: Topics in stochastic systems: modelling, estimation and adaptive control, Control and information sc., Springer, 161, pp.
[23] G. C. Goodwin, P. J. Ramadge and P. E. Caines, 1981, Discrete-time stochastic adaptive control, SIAM Journal on Control and Optimization, 19, pp.
[24] L. Guo, 1996, Self-convergence of weighted least-squares with applications to stochastic adaptive control, IEEE Trans. on Automatic Control, AC-41, pp.
[25] L. Guo and H. F. Chen, 1991, The Åström-Wittenmark self-tuning regulator revisited and ELS-based adaptive trackers, IEEE Trans. on Automatic Control, AC-36, pp.
[26] P. Hall and C. C. Heyde, 1980, Martingale limit theory and its applications, Academic Press, N.Y.
[27] P. R. Kumar, 1983a, Simultaneous identification and adaptive control of unknown systems over finite parameter sets, IEEE Trans. on Automatic Control, AC-28, pp.
[28] P. R. Kumar, 1983b, Optimal adaptive control of linear-quadratic-Gaussian systems, SIAM Journal on Control and Optimization, 21, pp.
[29] P. R. Kumar and A. Becker, 1982, A new family of optimal adaptive controllers for Markov chains, IEEE Trans. on Automatic Control, AC-27, pp.
[30] T. L. Lai and C. Z. Wei, 1982, Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, Annals of Statistics, 10, pp.
[31] D. Levanony and P. E. Caines, 2001a, Stochastic Lagrangian adaptation, Internal Report, McGill University.
[32] D. Levanony and P. E. Caines, 2001b, On persistent excitation for linear systems with stochastic coefficients, SIAM Journal on Control and Optimization, 40, pp.
[33] D. Levanony and P. E. Caines, 2005, Stochastic linear-quadratic adaptive control: a conceptual scheme, In: Proc. 44th Conf. on Decision and Control, pp.
[34] W. Lin, P. R. Kumar and T. I. Seidman, 1985, Will the self-tuning approach work for general cost criteria? Syst. Contr. Lett., 6, pp.
[35] R. A. Milito and J. B. Cruz, 1987, An optimization-oriented approach to the adaptive control of Markov chains, IEEE Trans. on Automatic Control, AC-32, pp.
[36] J. W. Polderman, 1986a, On the necessity of identifying the true parameter in adaptive LQ control, Syst. Contr. Lett., 8, pp.
[37] J. W. Polderman, 1986b, A note on the structure of two subsets of the parameter space in adaptive control problems, Syst. Contr. Lett., 8, pp.
[38] M. Prandini and M. C. Campi, 2001, Adaptive LQG control of input-output systems: a cost-biased approach, SIAM Journal on Control and Optimization, 39, pp.
[39] K. S. Sin and G. C. Goodwin, 1982, Stochastic adaptive control using a modified least squares algorithm, Automatica, 18, pp.
[40] C. T. Striebel, 1965, Sufficient statistics in the optimal control of stochastic systems, J. Math. Anal. Appl., 12, pp.
[41] J. H. van Schuppen, 1994, Tuning of Gaussian stochastic control systems, IEEE Trans. on Automatic Control, AC-39, pp.