Authorized licensed use limited to: University of Illinois. Downloaded on July 27,2010 at 06:52:39 UTC from IEEE Xplore. Restrictions apply.

Universal Data Compression and Linear Prediction

Meir Feder* and Andrew C. Singer†

January, 1998

* Meir Feder is with the Department of Electrical Engineering - Systems, Tel-Aviv University, Tel-Aviv, 69978, ISRAEL, E-mail: meir@eng.tau.ac.il
† Andrew Singer is with the Advanced Systems Directorate at Sanders, A Lockheed Martin Company, Nashua, NH 0306-0868, Tel: (603) 645-5647, Fax: (603) 645-573, E-mail: acs@alum.mit.edu

The relationship between prediction and data compression can be extended to universal prediction schemes and universal data compression. Recent work shows that minimizing the sequential squared prediction error for individual sequences can be achieved using the same strategies which minimize the sequential codelength for data compression of individual sequences. Defining a "probability" as an exponential function of sequential loss, results from universal data compression can be used to develop universal linear prediction algorithms. Specifically, we present an algorithm for linear prediction of individual sequences which is twice-universal, over parameters and model orders.

1 Introduction

We describe a sequential linear prediction algorithm which is "twice universal," over parameters and model orders, for individual sequences under the square-error loss function; the sequentially accumulated mean-square prediction error is as good as that of any linear predictor of order up to some $M$, where the parameters may be tuned to the data. The linear prediction problem is transformed into one of sequential probability assignment, equivalent to lossless compression, which is accomplished through a double mixture: first over all linear predictors of a given model order using a Gaussian prior, and then over all model orders up to some maximum order $M$. For square error loss functions, the Gaussian prior enables the mixture probability over the continuum of models to be found in closed form. With respect to model orders, a finite mixture is used with an arbitrary prior. Using lattice filters, the coding distributions of all possible linear predictors with model orders up to $M$ can be weighted in an efficient recursive procedure whose complexity is not larger than that for a conventional linear predictor of the largest model order.

We derive an upper bound on the excess prediction error which can be identified with the excess coding redundancy in the assigned
mixture probabilities. The bound holds for all individual sequences of all lengths, not only for asymptotically long sequences. The two terms in the bound correspond to a parameter redundancy term, which is proportional to $p\ln(n)/n$, and a model order redundancy term which is proportional to $\ln(p)/n$, where $n$ is the data length, and $p$ is the best model order.

2 Statement of the Problem and Main Result

Consider the problem of designing a causal predictor which observes a sequence $x^{n-1} = x[0], x[1], \ldots, x[n-1]$, and then computes a prediction of the value of $x[n]$ given the past. We assume that the sequence $x[n]$ is bounded such that $|x[t]| < A < \infty$ for all $t$, but is otherwise an arbitrary, real-valued sequence. We would like to design a predictor whose performance is at least as good as the best batch linear predictor of any order less than some $M < \infty$. This goal will be accomplished in two steps. First, we will demonstrate a fixed-order sequential prediction algorithm which performs as well as the best batch linear predictor of that order. We will then construct a predictor which performs as well as the best fixed-order predictor of order less than $M$.

Theorem 1 Let $x^n$ be a bounded, real-valued arbitrary sequence, such that $|x[t]| < A$, $\forall t$. Let $R^n_{xx}$ and $r^n_x$ be the $p$-th order deterministic autocorrelation matrix and vector defined as
\[ R^n_{xx} = \sum_{t=1}^{n} \mathbf{x}[t]\mathbf{x}[t]^T, \qquad r^n_x = \sum_{t=1}^{n} x[t]\,\mathbf{x}[t], \]
where $\mathbf{x}[t] = [x[t-1], \ldots, x[t-p]]^T$. Also assume that $\frac{1}{t}R^t_{xx}$ has a unique minimum eigenvalue bounded away from zero, $\lambda_0 > 0$, $\forall t$. Let $\hat{x}_\theta[n] = \theta^T\mathbf{x}[n]$ be the fixed linear predictor with parameters $\theta$. Define a universal $p$-th order linear predictor as $\hat{x}_p[n] = \hat{\theta}_u[n]^T\mathbf{x}[n]$, where
\[ \hat{\theta}_u[n] = \left[R^{n-1}_{xx} + \frac{c}{\sigma^2} I\right]^{-1} r^{n-1}_x, \]
and $\sigma$ and $c$ are positive constants. Let $l(x^n; \hat{x}_p; n)$ be the running total squared prediction error for the $p$-th order universal linear predictor, i.e. $l(x^n; \hat{x}_p; n) = \sum_{t=1}^{n} (x[t] - \hat{x}_p[t])^2$.
Define a twice-universal predictor $\hat{x}_{tu}[n]$ as $\hat{x}_{tu}[n] = \sum_{i=1}^{M} \mu_i[n]\,\hat{x}_i[n]$, where $\mu_i[n]$ is defined as
\[ \mu_i[n] = \frac{\exp\left(-\frac{1}{2c}\, l(x^{n-1}; \hat{x}_i; n-1)\right)}{\sum_{k=1}^{M} \exp\left(-\frac{1}{2c}\, l(x^{n-1}; \hat{x}_k; n-1)\right)}. \]
Then the total squared prediction error of the twice-universal predictor, $l(x^n; \hat{x}_{tu}; n) = \sum_{t=1}^{n} (x[t] - \hat{x}_{tu}[t])^2$, satisfies
\[ \frac{1}{n}\, l(x^n; \hat{x}_{tu}; n) \le \min_{p,\theta}\left\{ \frac{1}{n}\, l(x^n; \hat{x}^p_\theta; n) + \frac{4A^2 p}{n}\left[\ln\left(\frac{n A^4 (p+1)}{8\lambda_0^2}\right) + 1\right] \right\} + \frac{8A^2}{n}\ln(M) + O(n^{-2}). \]

Theorem 1 tells us that the average squared prediction error of the universal prediction algorithm is within $O(p\ln(n)/n)$ of the best batch linear prediction algorithm, uniformly, for every individual sequence $x^n$. As we shall see, the cost terms can be identified as a parameter redundancy term, proportional to $p\ln(n)/n$, and a model order redundancy term, proportional to $\ln(M)/n$.

The proof of Theorem 1 is completed in two steps. First we demonstrate that a predictor generated by a mixture over all
$p$-th-order linear predictors is universal with respect to the class of all $p$-th-order linear predictors. We then show that a second mixture over all model orders provides a predictor which is universal with respect to both model orders and parameters. These steps are contained in the proofs of Theorems 2 and 1, in Sections 3 and 4, respectively. The result is a twice-universal [1] [2] linear predictor which implements a double-mixture over model orders and parameters. This resembles the context tree weighting procedure in [3], which implements a double-mixture over the parameters and model orders of context-trees used in data compression. Key to the development of such universal algorithms is that the mixture be implementable by an efficient algorithm. We will show that the computational complexity of this twice-universal predictor is no larger than that for a conventional linear predictor of the order $M$.

3 Fixed-Order Linear Prediction

In this section, we consider the problem of linear prediction with a predictor of fixed order $p$. The predictor is parameterized by the vector $\theta = [\theta_1, \ldots, \theta_p]^T$, and the predicted value can be written $\hat{x}_\theta[t] = \theta^T\mathbf{x}[t]$, where $\mathbf{x}[t] = [x[t-1], \ldots, x[t-p]]^T$. If the parameter vector is selected such that the total squared prediction error is minimized over a batch of data of length $n$, then the coefficients are given by
\[ \theta[n] = \arg\min_\theta \sum_{t=1}^{n} \left(x[t] - \theta^T\mathbf{x}[t]\right)^2. \]
The well-known least-squares solution to this problem is given by $\theta[n] = (R^n_{xx})^{-1} r^n_x$, where $R^n_{xx} = \sum_{t=1}^{n} \mathbf{x}[t]\mathbf{x}[t]^T$ and $r^n_x = \sum_{t=1}^{n} x[t]\,\mathbf{x}[t]$. The parameters $\theta[n]$ can be computed recursively with the recursive least squares (RLS) algorithm.

A common approach to sequential prediction is to use the parameters $\theta[t-1]$ to predict $\hat{x}[t] = \theta[t-1]^T\mathbf{x}[t]$. This is the so-called "plug-in" approach, since the best estimate of the parameters based on the data $x^{t-1}$ is "plugged in" to the predictor model for $x[t]$. It can be shown [4] [5] that the least-squares optimal batch prediction error can be achieved sequentially by the plug-in approach of the RLS algorithm to within $O(p\ln(n)/n)$.
This indicates that the rate at which RLS achieves the batch performance is slower than the $(p/2)\ln(n)/n$ which might be expected from universal coding results [6] [7], and is in agreement with the result in [7] which demonstrates that although the plug-in approach to sequential probability assignment can be optimal for certain model classes in the stochastic context, it is not optimal for individual sequences. For this reason, rather than selecting a single set of parameters to use for prediction, we use the mixture approach of universal coding to obtain the universal predictor coefficients. This idea has already been applied in [2] for prediction in a probabilistic context. By transforming the problem into one of probability assignment, we can sequentially assign a probability to the sequence which is almost as good as that assigned by the best linear predictor. As such, we consider a means of estimating the parameters of the $p$-th order linear predictor $\theta[t]$ through an a priori mixture over
the continuum of all possible parameters according to some prior. We now show that the prediction error of this universal predictor is as good as the best linear predictor determined from all of the data.

Theorem 2 Let $x^n$ be a bounded, real-valued arbitrary sequence, such that $|x[t]| < A$ for all $t$ and $\frac{1}{t}R^t_{xx}$ has a unique minimum eigenvalue bounded away from zero, $\lambda_0 > 0$. Let $\hat{x}_\theta[t]$ be the output of a $p$-th-order linear predictor with parameter vector $\theta$, and $l(x^n; \hat{x}_\theta; n)$ be the running total squared prediction error, i.e. $l(x^n; \hat{x}_\theta; n) = \sum_{t=1}^{n} (x[t] - \hat{x}_\theta[t])^2$, where $\hat{x}_\theta[n] = \theta^T\mathbf{x}[n]$. Define a universal predictor $\hat{x}_u[n]$ as $\hat{x}_u[n] = \theta_u[n-1]^T\mathbf{x}[n]$, where
\[ \theta_u[n] = \left[R^n_{xx} + \frac{c}{\sigma^2} I\right]^{-1} r^n_x, \qquad R^n_{xx} = \sum_{k=1}^{n} \mathbf{x}[k]\mathbf{x}[k]^T, \qquad r^n_x = \sum_{k=1}^{n} x[k]\,\mathbf{x}[k], \]
and $\sigma$ and $c$ are positive constants. Then the total squared prediction error of the $p$-th-order universal predictor, $l(x^n; \hat{x}_u; n) = \sum_{t=1}^{n} (x[t] - \hat{x}_u[t])^2$, satisfies
\[ \frac{1}{n}\, l(x^n; \hat{x}_u; n) \le \frac{1}{n}\min_\theta l(x^n; \hat{x}_\theta; n) + \frac{4A^2 p}{n}\ln\left(\frac{n A^4 (p+1)}{8\lambda_0^2}\right) + \frac{4A^2 p}{n} + O(n^{-2}). \]

Theorem 2 tells us that the average squared prediction error of the $p$-th-order universal predictor is within $O(p\ln(n)/n)$ of the best batch $p$-th-order linear prediction algorithm, uniformly, for every individual sequence $x^n$.

The basic idea behind the proof of Theorem 2 is the following. We define a "probability" assignment of each of the continuum of predictors to the data sequence $x^n$ such that the probability will be an exponentially decreasing function of the total squared error for that predictor. This use of prediction error as a probability or likelihood was also used by Rissanen [6] and Vovk [8]. By defining a universal probability as an a priori average of the assigned probabilities, then to first order in the exponent, the universal probability will be dominated by the largest exponential, i.e., the probability assignment of the model with the smallest total squared error. For a finite collection of predictors, the redundancy of the mixture can be bounded by the negative logarithm of the weight assigned to the best model. However, for a mixture over a continuum of models, we must seek an alternate bound on the redundancy. Specifically, we obtain the conjugate prior such that the mixture over the parameters can be obtained in closed form.
We then relate the universal probability assignment to the accumulated squared error of the universal predictor, giving the desired result.

Proof of Theorem 2: For each set of parameters $\theta$, we define the probability
\[ P_\theta(x^n) = B\exp\left(-\frac{1}{2c}\, l(x^n; \hat{x}_\theta; n)\right) \]
as an exponential function of the sequential loss on the data. Over the continuum of predictors with coefficients $\theta$, we assign the a priori Gaussian mixture
\[ p(\theta) = \left(\sqrt{2\pi}\,\sigma\right)^{-p}\exp\left\{-\frac{1}{2\sigma^2}\,\theta^T\theta\right\}, \]
and define the universal probability
\[ P_u(x^n) = \int p(\theta)\,P_\theta(x^n)\,d\theta. \]
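
The Gaussian prior is conjugate to the exponentiated squared loss, so the integral defining $P_u(x^n)$ reduces to a standard Gaussian integral. The following is a sketch of that step, not a derivation taken from the paper. It uses the shorthand $Q = R^n_{xx} + \frac{c}{\sigma^2} I$ (introduced here only for illustration) and $R^n_x[0] = \sum_{k=1}^{n} x^2[k]$.

```latex
\begin{align*}
l(x^n;\hat{x}_\theta;n)
  &= R^n_x[0] - 2\,\theta^T r^n_x + \theta^T R^n_{xx}\,\theta, \\
p(\theta)\,P_\theta(x^n)
  &= \frac{B}{(\sqrt{2\pi}\,\sigma)^{p}}
     \exp\Big\{-\frac{1}{2c}\big(R^n_x[0] - 2\,\theta^T r^n_x + \theta^T Q\,\theta\big)\Big\}, \\
\theta^T Q\,\theta - 2\,\theta^T r^n_x
  &= (\theta - Q^{-1}r^n_x)^T Q\,(\theta - Q^{-1}r^n_x) - (r^n_x)^T Q^{-1} r^n_x, \\
\int \exp\Big\{-\frac{1}{2c}(\theta - Q^{-1}r^n_x)^T Q\,(\theta - Q^{-1}r^n_x)\Big\}\,d\theta
  &= (2\pi c)^{p/2}\,|Q|^{-1/2}, \\
P_u(x^n)
  &= B\,\sigma^{-p}\,c^{p/2}\,|Q|^{-1/2}
     \exp\Big\{-\frac{1}{2c}\big(R^n_x[0] - (r^n_x)^T Q^{-1} r^n_x\big)\Big\}.
\end{align*}
```

Since $c^{p/2}|Q|^{-1/2} = \left|\frac{1}{c}R^n_{xx} + \frac{1}{\sigma^2}I\right|^{-1/2}$, the last line is one way of writing the closed form. The completed square also shows that the weight on $\theta$ is Gaussian with mean $Q^{-1}r^n_x$, which has the same form as the universal parameter vector $\theta_u[n]$ of Theorem 2.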

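We can then obtain this universal probability in closed form,
\[ P_u(x^n) = B\,\sigma^{-p}\left|\frac{1}{c}R^n_{xx} + \frac{1}{\sigma^2}I\right|^{-1/2}\exp\left\{-\frac{1}{2c}\left(R^n_x[0] - (r^n_x)^T\left[R^n_{xx} + \frac{c}{\sigma^2}I\right]^{-1} r^n_x\right)\right\}, \]
where $R^n_x[0] = \sum_{k=1}^{n} x^2[k]$. To compare the universal probability with the maximum probability over all parameters, observe that
\[ \max_\theta P_\theta(x^n) = P_\theta(x^n)\big|_{\theta = \hat{\theta}_{ML}} = B\exp\left\{-\frac{1}{2c}\, l(x^n; \hat{x}_{ML}; n)\right\}, \]
where $\hat{\theta}_{ML} = (R^n_{xx})^{-1} r^n_x$. Since
\[ l(x^n; \hat{x}_{ML}; n) = \sum_{k=1}^{n}\left(x[k] - \left((R^n_{xx})^{-1} r^n_x\right)^T\mathbf{x}[k]\right)^2 = R^n_x[0] - (r^n_x)^T (R^n_{xx})^{-1} r^n_x, \]
we obtain
\[ P_{\hat{\theta}_{ML}}(x^n) = B\exp\left\{-\frac{1}{2c}\left(R^n_x[0] - (r^n_x)^T (R^n_{xx})^{-1} r^n_x\right)\right\}. \]
Taking their ratio, and after some algebra, we obtain
\[ \frac{P_u(x^n)}{\max_\theta P_\theta(x^n)} = \sigma^{-p}\left|\frac{1}{c}R^n_{xx} + \frac{1}{\sigma^2}I\right|^{-1/2}\exp\left\{-\frac{1}{2}(r^n_x)^T\left[\sigma^2 R^n_{xx}R^n_{xx} + cR^n_{xx}\right]^{-1} r^n_x\right\}. \]
Taking the logarithm, and substituting $R^n_{xx} = n\bar{R}^n_{xx}$, yields
\begin{align*}
-\ln\frac{P_u(x^n)}{\max_\theta P_\theta(x^n)}
 &= \frac{1}{2}\ln\left|\frac{\sigma^2}{c}R^n_{xx} + I\right| + \frac{1}{2}(r^n_x)^T\left[\sigma^2 R^n_{xx}R^n_{xx} + cR^n_{xx}\right]^{-1} r^n_x \\
 &= \frac{1}{2}\ln\left|\frac{n\sigma^2}{c}\bar{R}^n_{xx} + I\right| + \frac{1}{2}(r^n_x)^T\left[\sigma^2 n^2\bar{R}^n_{xx}\bar{R}^n_{xx} + cn\bar{R}^n_{xx}\right]^{-1} r^n_x \\
 &= \frac{p}{2}\ln(n) + \frac{1}{2}\ln\left|\frac{\sigma^2}{c}\bar{R}^n_{xx} + \frac{1}{n}I\right| + \frac{1}{2}(r^n_x)^T\left[\sigma^2 n^2\bar{R}^n_{xx}\bar{R}^n_{xx} + cn\bar{R}^n_{xx}\right]^{-1} r^n_x \\
 &\le \frac{p}{2}\ln(n) + \frac{1}{2}\ln\left(c^{-p}\sigma^{2p}\right) + \frac{1}{2}\ln\left|\bar{R}^n_{xx} + \frac{c}{n\sigma^2}I\right| + \frac{pA^4}{2\sigma^2\lambda_0^2}. \tag{1}
\end{align*}
To continue, we need the following lemma bounding the logarithm of the determinant of a positive definite matrix, which is proved in the appendix.

Lemma 1 For a $p \times p$ positive definite matrix $M$ whose elements are each bounded by $C$, i.e., $|m_{i,j}| < C$, and a positive constant $\epsilon$, the logarithm of the determinant of the matrix $M + \epsilon I$ satisfies
\[ \ln|M + \epsilon I| \le p\ln\left(\frac{p+1}{2}\right) + p\ln(C) + p\ln\left(1 + \frac{\epsilon}{\lambda_0}\right), \tag{2} \]
where $\lambda_0$ is the smallest eigenvalue of $M$.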
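
The determinant bound of Lemma 1 can be checked numerically on a small case. The sketch below does this for $p = 2$ only, building $M$ from two chosen eigenvalues and a rotation so that the smallest eigenvalue is known exactly. The function name `lemma1_sides` and the test values are hypothetical, chosen purely for illustration.

```python
import math

def lemma1_sides(lam_max, lam_min, angle, eps):
    """Return (lhs, rhs) of the log-determinant bound for a 2 x 2 matrix.

    M = Q diag(lam_max, lam_min) Q^T with Q a rotation by `angle`, so the
    eigenvalues of M are known and lam_min plays the role of lambda_0.
    """
    p = 2
    co, si = math.cos(angle), math.sin(angle)
    m11 = lam_max * co * co + lam_min * si * si
    m22 = lam_max * si * si + lam_min * co * co
    m12 = (lam_max - lam_min) * co * si
    # C bounds every entry in magnitude (tiny slack keeps the bound strict).
    C = max(abs(m11), abs(m22), abs(m12)) * (1.0 + 1e-12)
    # Left side: ln |M + eps I| from the 2 x 2 determinant formula.
    lhs = math.log((m11 + eps) * (m22 + eps) - m12 * m12)
    # Right side: p ln((p+1)/2) + p ln(C) + p ln(1 + eps / lambda_0).
    rhs = (p * math.log((p + 1) / 2.0) + p * math.log(C)
           + p * math.log(1.0 + eps / lam_min))
    return lhs, rhs

if __name__ == "__main__":
    for eps in (0.01, 0.2, 5.0):
        lhs, rhs = lemma1_sides(3.0, 0.5, 0.7, eps)
        print(f"eps={eps}: ln|M + eps I| = {lhs:.4f}, bound = {rhs:.4f}")
```

The bound is loose by design: it only needs to contribute terms that are constant in $n$, so that the $\frac{p}{2}\ln(n)$ term dominates the redundancy.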

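Applying Lemma 1 to (1), we obtain
\[ -2c\ln P_u(x^n) \le \min_\theta l(x^n; \hat{x}_\theta; n) + cp\ln\left(\frac{n\sigma^2 (p+1) A^2}{2c}\left(1 + \frac{c}{n\sigma^2\lambda_0}\right)\right) + \frac{cpA^4}{\sigma^2\lambda_0^2}. \tag{3} \]
We expand the definition of the universal conditional probability as
\[ P_u(x[n]\,|\,x^{n-1}) = \frac{\int p(\theta)P_\theta(x^n)\,d\theta}{\int p(\theta')P_{\theta'}(x^{n-1})\,d\theta'} = \int \mu_n(\theta)\,P_\theta(x[n]\,|\,x^{n-1})\,d\theta, \]
i.e.,
\[ \mu_n(\theta) = \frac{p(\theta)\,P_\theta(x^{n-1})}{\int p(\theta')P_{\theta'}(x^{n-1})\,d\theta'}. \]
Note that $\mu_n(\theta)$ is proportional to the performance of the model on the data up to time $n-1$, $P_\theta(x^{n-1})$. That is, while the universal probability is an a priori Gaussian mixture over the probabilities assigned to the sequence by each of the parameters, in order to maintain this a priori probability, the conditional probabilities $P_u(x[n]\,|\,x^{n-1})$ must be weighted according to their performance on the data so far, $\mu_n(\theta)$.

We define the universal predictor as a mixture over the parameters using the same conditional weights as the conditional probabilities, $\mu_n(\theta)$. A straightforward but tedious calculation verifies that the universal predictor defined by this mixture uses the parameter vector $\theta_u[t-1]$ at each time $t$ for prediction of the sample $x[t]$, where
\[ \theta_u[t] = \int \mu_{t+1}(\theta)\,\theta\,d\theta = \left[R^t_{xx} + \frac{c}{\sigma^2}I\right]^{-1} r^t_x. \]
Defining $\tilde{P}_u(x^n)$ as the probability from the predictor which is a mixture over the parameters using the same weights as the mixture over the probabilities $P_\theta(x^n)$, we have
\[ \tilde{P}_u(x^n) = B\exp\left\{-\frac{1}{2c}\sum_{k=1}^{n}\left(x[k] - \int \mu_k(\theta)\,\theta^T d\theta\;\mathbf{x}[k]\right)^2\right\}. \tag{4} \]
Comparing $P_u(x[n]\,|\,x^{n-1})$ and $\tilde{P}_u(x[n]\,|\,x^{n-1})$,
\[ \tilde{P}_u(x[n]\,|\,x^{n-1}) = B\exp\left\{-\frac{1}{2c}\left(x[n] - \int \mu_n(\theta)\,\theta^T\mathbf{x}[n]\,d\theta\right)^2\right\}, \]
and
\[ P_u(x[n]\,|\,x^{n-1}) = \int \mu_n(\theta)\,B\exp\left\{-\frac{1}{2c}\left(x[n] - \theta^T\mathbf{x}[n]\right)^2\right\}d\theta, \]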
we observe that $\tilde{P}_u(x[n]\,|\,x^{n-1})$ is a function of a convex combination of the predicted values $\hat{x}_\theta[n]$, while $P_u(x[n]\,|\,x^{n-1})$ is the same convex combination of the function evaluated at the same values. By Jensen's inequality,
\[ \tilde{P}_u(x[n]\,|\,x^{n-1}) \ge P_u(x[n]\,|\,x^{n-1}), \tag{5} \]
provided that the function $f(z) = B\exp\left(-(x[t] - z)^2/2c\right)$ is concave over the domain of $z$, which leads to $-\sqrt{c} \le (x[k] - \hat{x}_\theta[k]) \le \sqrt{c}$. Since the coefficients $\theta_i$ are bounded (see [4]), the inequality (5) holds for a sufficiently large value of $c$. However, since $x[n]$ is bounded, we can always decrease the prediction error by enforcing $|\hat{x}_\theta[n]| < A$, which leads to the selection $c \ge 4A^2$. Using $c = 4A^2$ and (4) in (3), we obtain
\[ l(x^n; \hat{x}_u; n) \le \min_\theta l(x^n; \hat{x}_\theta; n) + 4A^2 p\ln\left(\frac{n\sigma^2 (p+1)}{8}\left(1 + \frac{4A^2}{n\sigma^2\lambda_0}\right)\right) + \frac{4A^6 p}{\sigma^2\lambda_0^2}. \tag{6} \]
Our "probability" assignment algorithm had two free constants to be set. Now that we have selected a range for the constant $c$, we can investigate the constant $\sigma$. Minimizing the expression in (6) with respect to $\sigma$ yields
\[ \frac{1}{n}\, l(x^n; \hat{x}_u; n) \le \frac{1}{n}\min_\theta l(x^n; \hat{x}_\theta; n) + \frac{4A^2 p}{n}\left[\ln\left(\frac{n A^4 (p+1)}{8\lambda_0^2}\right) + 1\right] + O(n^{-2}), \]
where $\sigma^2 = (A^4/\lambda_0^2) + O(n^{-1})$.

We note in particular that the parameter redundancy term in (6) is proportional to $p\ln(n)/2n$ rather than the $p\ln(n)/n$ redundancy shown for the plug-in method of RLS. The redundancy is actually of the form $(p/2)\ln(n)/n$, scaled by the factor $2c$ which accounts for the effect of the range $A$ of the sequence $x[n]$. Comparing this result with a finite number $M$ of models, where the parameter redundancy term would be bounded by $O(\ln(M)/n)$, we see that the "effective" number of models for the Gaussian mixture grows linearly with $n$. This completes the proof of Theorem 2.

4 Proof of the Main Result

The proof of the main result of the paper, Theorem 1, uses the results from Section 3 which bound the parameter redundancy of the mixture model, and a result from [4] bounding the model order redundancy from a second mixture over the model orders.

Proof of Theorem 1: Suppose a set of linear predictors of order $k$, $1 \le k \le M$, is given, such that at each time sample, the $k$-th linear predictor produces the estimate $\hat{x}_k[n]$. For the "loss" of the $k$-th order predictor defined as its running total squared prediction error, define the probability
\[ P_k(x^n) = B\exp\left\{-\frac{1}{2c}\, l(x^n; \hat{x}_k; n)\right\}, \]
and the universal probability $P_u(x^n)$,
\[ P_u(x^n) = \frac{1}{M}\sum_{i=1}^{M} P_i(x^n). \]
When $\hat{x}_u[n]$ is defined as a universal predictor obtained by the same sequential mixture over the individual predictors as over the probabilities, Theorem 1 in [4] shows that
\[ \frac{1}{n}\, l(x^n; \hat{x}_u; n) \le \min_i \frac{1}{n}\, l(x^n; \hat{x}_i; n) + \frac{8A^2}{n}\ln(M). \]
When each of the fixed-order predictors is a $k$-th-order universal linear predictor as defined in Section 3, the overall predictor is formed by a double-mixture: first over parameters, and then over model orders. The resulting prediction error of this twice-universal predictor, $\hat{x}_{tu}[n]$, satisfies
\[ \frac{1}{n}\, l(x^n; \hat{x}_{tu}; n) \le \min_{p,\theta}\left\{ \frac{1}{n}\, l(x^n; \hat{x}^p_\theta; n) + \frac{4A^2 p}{n}\left[\ln\left(\frac{n A^4 (p+1)}{8\lambda_0^2}\right) + 1\right] \right\} + \frac{8A^2}{n}\ln(M) + O(n^{-2}). \tag{7} \]
This completes the proof of Theorem 1.

Theorem 1, the main result of this paper, demonstrates that a prediction algorithm based on a double-mixture over model orders and parameters is indeed twice-universal. One observation from this result is that the predictor parameters are very similar to those which arise from the recursive least squares procedure. In fact, if the covariance matrix of the RLS algorithm is initialized with the value of $R^0_{xx} = (c/\sigma^2)I \approx 4(\lambda_0^2/A^2)I$, then the remaining RLS procedure is unchanged. For $c \ge 4A^2$, we see that $c$ is greater than the largest instantaneous square prediction error. We also have that $A^2/\lambda_0$ is a ratio of the maximum possible square value to the minimum average square value, or a measure of the "spread" of the sequence. To be universal, the a priori mixture over the parameters should have a large enough "variance" to cover this range.

The first term of the redundancy in (7) can be identified as a parameter redundancy term, since this is the excess prediction error induced above the batch error for a given model order due to the lack of knowledge of the best batch parameters for that model order a priori. Note that the parameter redundancy term here is of the form $O(p\ln(n)/n)$, which is in agreement with the stochastic case, as implied both by Davisson in [9] and the more general MDL [6]. We also note that the model order redundancy term, $8A^2\ln(M)/n$, can be slightly improved upon.
Rather than using a priori weights $w_i = 1/M$, we could have weighted each of the models inversely proportional to their model order, i.e.,
\[ w_i = i^{-1}\left(\sum_{j=1}^{M} j^{-1}\right)^{-1}. \]
The proof in [4] remains intact with the model order redundancy being $-\ln(w_p)/n$ rather than $-\ln(1/M)/n$, where $p$ is the order of the model with the smallest prediction error. The resulting model order redundancy term becomes $\ln(p)/n + \ln\ln(M)/n$.
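
The double mixture of this section can be sketched directly in code. The version below is a plain illustration rather than the efficient lattice implementation: it re-solves a regularized least-squares system for every order at every step, uses uniform a priori weights $1/M$, clips each fixed-order prediction to $[-A, A]$, and pads the regressor with zeros before the start of the data. The names `solve` and `twice_universal`, the parameter `delta` (standing in for $c/\sigma^2$), and the test signal are assumptions made for this example.

```python
import math

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def twice_universal(x, max_order, A, delta):
    """Return (mixture_loss, per_order_losses) for orders 1..max_order.

    delta stands in for c / sigma**2, and c = 4 * A**2 as selected in the text.
    """
    c = 4.0 * A * A
    orders = range(1, max_order + 1)
    R = {p: [[0.0] * p for _ in range(p)] for p in orders}
    r = {p: [0.0] * p for p in orders}
    loss = {p: 0.0 for p in orders}
    mix_loss = 0.0
    for t in range(len(x)):
        past, pred = {}, {}
        for p in orders:
            # Regressor [x[t-1], ..., x[t-p]], zero-padded before the start.
            past[p] = [x[t - k] if t - k >= 0 else 0.0 for k in range(1, p + 1)]
            Q = [[R[p][i][j] + (delta if i == j else 0.0) for j in range(p)]
                 for i in range(p)]
            theta = solve(Q, r[p])  # universal parameters from past data only
            raw = sum(th * v for th, v in zip(theta, past[p]))
            pred[p] = max(-A, min(A, raw))  # enforce |prediction| <= A
        # Weights proportional to exp(-loss / 2c); shifting by the best loss
        # avoids underflow and leaves the normalized weights unchanged.
        best = min(loss.values())
        w = {p: math.exp(-(loss[p] - best) / (2.0 * c)) for p in orders}
        mix = sum(w[p] * pred[p] for p in orders) / sum(w.values())
        mix_loss += (x[t] - mix) ** 2
        for p in orders:
            loss[p] += (x[t] - pred[p]) ** 2
            for i in range(p):
                r[p][i] += x[t] * past[p][i]
                for j in range(p):
                    R[p][i][j] += past[p][i] * past[p][j]
    return mix_loss, loss

if __name__ == "__main__":
    data = [math.sin(0.3 * t) + 0.4 * math.cos(1.7 * t) for t in range(200)]
    mix_loss, loss = twice_universal(data, max_order=4, A=1.5, delta=1.0)
    print("mixture loss:", mix_loss)
    print("per-order losses:", loss)
```

Because every clipped prediction lies within $2A = \sqrt{c}$ of the sample, the concavity condition behind (5) holds at each step, so the accumulated mixture loss should exceed the best per-order loss by at most $8A^2\ln(M)$.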

5 Algorithmic Issues

An issue that remains is the computational complexity of the universal approach, which incorporates the $\{1, \ldots, M\} \times \mathbb{R}^p$ predicted values from each of the $M$ model orders and then each of the continuum of predictors within a given model order, along with their sequential prediction errors, to compute each predicted value. At first glance, it might appear that the cost of universality is rather high, requiring the solution of an infinite number of linear prediction problems in parallel. However, since the mixture over the parameters can be accomplished through a properly initialized RLS algorithm, it only remains to solve for each of the RLS predictors for $i = 1, \ldots, M$. The linear prediction problems for each model order have a great deal in common with one another, and this structure can be exploited. Indeed, just as the RLS algorithm for a given model order can be written as a time-recursion, there exist time and order-recursive solutions to the least squares prediction problem, in which at each time step, the $M$-th order prediction problem can be constructed by recursively solving for each of the predictors of lower order. The resulting complexity of these algorithms can be made to have $O(M)$ operations per time sample, which results in a total complexity of $O(nM)$. An example lattice prediction algorithm is given in [4].

6 Concluding Remarks

The main result of this paper, stated in Theorem 1, is an algorithm which is "twice universal" [1] [2] for linear prediction with respect to model orders and parameters. The universal predictor presented in this paper will perform as well as the best linear predictor of any order up to some maximum order, uniformly, for every individual sequence. With this algorithm, the problems of model order selection and parameter estimation for linear prediction have been mitigated in favor of a performance-weighted average among all model orders and all parameters. Efficient lattice algorithms, which recursively generate all of the linear predictors at the computational price of only the largest model order, and closed-form mixture parameters yield an algorithm that is computationally very efficient.
Since the mixture parameters of the universal predictor can be identified as the RLS parameters with a properly initialized covariance, this paper also gives a concrete rationale for initializing an RLS or Kalman filter algorithm with an a priori covariance; it makes the algorithm universal with respect to parameters for individual sequences.
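
The identification of the mixture parameters with a suitably initialized RLS recursion can be illustrated with a short sketch. It assumes the standard RLS update with unit forgetting factor, stores the inverse of the regularized autocorrelation matrix (initialized to $I/\delta$, with `delta` standing in for $c/\sigma^2$), and pads the regressor with zeros before the start of the data. The function names and the test signal are hypothetical.

```python
import math

def regressor(x, t, p):
    """[x[t-1], ..., x[t-p]], zero-padded before the start of the data."""
    return [x[t - k] if t - k >= 0 else 0.0 for k in range(1, p + 1)]

def rls_universal(x, p, delta):
    """RLS parameters after processing x, with inverse matrix started at I / delta."""
    P = [[(1.0 / delta if i == j else 0.0) for j in range(p)] for i in range(p)]
    theta = [0.0] * p
    for t in range(len(x)):
        u = regressor(x, t, p)
        Pu = [sum(P[i][j] * u[j] for j in range(p)) for i in range(p)]
        denom = 1.0 + sum(u[i] * Pu[i] for i in range(p))
        gain = [v / denom for v in Pu]
        err = x[t] - sum(theta[i] * u[i] for i in range(p))  # a priori error
        theta = [theta[i] + gain[i] * err for i in range(p)]
        # Rank-one downdate of the (symmetric) inverse matrix.
        P = [[P[i][j] - gain[i] * Pu[j] for j in range(p)] for i in range(p)]
    return theta

def normal_equation_residual(x, p, delta, theta):
    """Largest entry of |(R + delta I) theta - r| using batch statistics."""
    R = [[0.0] * p for _ in range(p)]
    r = [0.0] * p
    for t in range(len(x)):
        u = regressor(x, t, p)
        for i in range(p):
            r[i] += x[t] * u[i]
            for j in range(p):
                R[i][j] += u[i] * u[j]
    return max(abs(sum(R[i][j] * theta[j] for j in range(p))
                   + delta * theta[i] - r[i]) for i in range(p))

if __name__ == "__main__":
    data = [math.sin(0.3 * t) + 0.4 * math.cos(1.7 * t) for t in range(100)]
    theta = rls_universal(data, p=3, delta=2.0)
    print("theta:", theta)
    print("residual:", normal_equation_residual(data, 3, 2.0, theta))
```

A near-zero residual indicates that the recursion reproduces $[R^n_{xx} + \delta I]^{-1} r^n_x$, so in this sketch only the initialization distinguishes the universal parameters from ordinary RLS.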

A Proof of Lemma 1

To prove this bound, we note that $|M| < p!\,C^p$. Therefore,
\[ \ln|M| \le \ln(p!) + p\ln(C) = \sum_{k=1}^{p}\ln(k) + p\ln(C) \le p\ln\left(\frac{1}{p}\sum_{k=1}^{p} k\right) + p\ln(C) = p\ln\left(\frac{p+1}{2}\right) + p\ln(C). \]
Therefore, for eigenvalues of $M$, $\lambda_k \ge \lambda_0$,
\[ \ln|M + \epsilon I| \le p\ln\left(\frac{p+1}{2}\right) + p\ln(C) + \sum_{k=1}^{p}\ln\left(1 + \frac{\epsilon}{\lambda_k}\right) \le p\ln\left(\frac{p+1}{2}\right) + p\ln(C) + p\ln\left(1 + \frac{\epsilon}{\lambda_0}\right). \]

References

[1] B. Y. Ryabko, "Twice-universal coding," Prob. Inf. Trans., vol. 20, pp. 173-177, Jul-Sep 1984.
[2] B. Y. Ryabko, "Prediction of random sequences and universal coding," Prob. Inf. Transmission, vol. 24, pp. 87-96, Apr-June 1988.
[3] F. Willems, Y. Shtarkov, and T. Tjalkens, "The context-tree weighting method: Basic properties," IEEE Trans. Info. Theory, vol. IT-41, pp. 653-664, May 1995.
[4] A. Singer and M. Feder, "Universal linear prediction over parameters and model orders," submitted to IEEE Transactions on Signal Processing.
[5] N. Merhav and M. Feder, "Universal schemes for sequential decision from individual sequences," IEEE Trans. Info. Theory, vol. 39, pp. 1280-1292, July 1993.
[6] J. Rissanen, "Universal coding, information, prediction, and estimation," IEEE Trans. Info. Theory, vol. IT-30, pp. 629-636, 1984.
[7] M. J. Weinberger, N. Merhav, and M. Feder, "Optimal sequential probability assignment for individual sequences," IEEE Trans. Info. Theory, vol. 40, pp. 384-396, March 1994.
[8] V. Vovk, "Aggregating strategies (learning)," in Proceedings of the Third Annual Workshop on Computational Learning Theory (M. Fulk and J. Case, eds.), (San Mateo, CA), pp. 371-383, Morgan Kaufmann, 1990.
[9] L. D. Davisson, "The prediction error of stationary Gaussian time series of unknown covariance," IEEE Trans. Info. Theory, vol. IT-11, pp. 527-532, Oct. 1965.