Authorized licensed use limited to: University of Illinois. Downloaded on July 27,2010 at 06:52:39 UTC from IEEE Xplore. Restrictions apply.




Universal Data Compression and Linear Prediction

Meir Feder* and Andrew C. Singer†

January 1998

* Meir Feder is with the Department of Electrical Engineering - Systems, Tel-Aviv University, Tel-Aviv, 69978, ISRAEL, E-mail: meir@eng.tau.ac.il
† Andrew Singer is with the Advanced Systems Directorate at Sanders, A Lockheed Martin Company, Nashua, NH 0306-0868, Tel: (603) 645-5647, Fax: (603) 645-573, E-mail: acs@alum.mit.edu

The relationship between prediction and data compression can be extended to universal prediction schemes and universal data compression. Recent work shows that minimizing the sequential squared prediction error for individual sequences can be achieved using the same strategies which minimize the sequential codelength for data compression of individual sequences. Defining a "probability" as an exponential function of sequential loss, results from universal data compression can be used to develop universal linear prediction algorithms. Specifically, we present an algorithm for linear prediction of individual sequences which is twice-universal, over parameters and model orders.

1 Introduction

We describe a sequential linear prediction algorithm which is "twice universal," over parameters and model orders, for individual sequences under the square-error loss function; the sequentially accumulated mean-square prediction error is as good as that of any linear predictor of order up to some $M$, where the parameters may be tuned to the data. The linear prediction problem is transformed into one of sequential probability assignment, equivalent to lossless compression, which is accomplished through a double mixture; first over all linear predictors of a given model order using a Gaussian prior, and then over all model orders up to some maximum order $M$. For square error loss functions, the Gaussian prior enables the mixture probability over the continuum of models to be found in closed form. With respect to model orders, a finite mixture is used with an arbitrary prior. Using lattice filters, the coding distributions of all possible linear predictors with model orders up to $M$ can be weighted in an efficient recursive procedure whose complexity is not larger than that for a conventional linear predictor of the largest model order.

We derive an upper bound on the excess prediction error which can be identified with the excess coding redundancy in the assigned
mixture probabilities. The bound holds for all individual sequences of all lengths, not only for asymptotically long sequences. The two terms in the bound correspond to a parameter redundancy term, which is proportional to $p\ln(n)/n$, and a model order redundancy term, which is proportional to $\ln(p)/n$, where $n$ is the data length and $p$ is the best model order.

2 Statement of the Problem and Main Result

Consider the problem of designing a causal predictor which observes a sequence $x^{n-1} = x[0], x[1], \ldots, x[n-1]$, and then computes a prediction of the value of $x[n]$ given the past. We assume that the sequence $x[n]$ is bounded such that $|x[t]| < A < \infty$ for all $t$, but is otherwise an arbitrary, real-valued sequence. We would like to design a predictor whose performance is at least as good as the best batch linear predictor of any order less than some $M < \infty$. This goal will be accomplished in two steps. First, we will demonstrate a fixed-order sequential prediction algorithm which performs as well as the best batch linear predictor of that order. We will then construct a predictor which performs as well as the best fixed-order predictor of order less than $M$.

Theorem 1 Let $x^n$ be a bounded, real-valued arbitrary sequence, such that $|x[t]| < A$, $\forall t$. Let $R^n_{xx}$ and $r^n_x$ be the $p$-th order deterministic autocorrelation matrix and vector defined as $R^n_{xx} = \sum_{t=1}^{n} \mathbf{x}[t]\mathbf{x}[t]^T$ and $r^n_x = \sum_{t=1}^{n} x[t]\mathbf{x}[t]$, where $\mathbf{x}[t] = [x[t-1], \ldots, x[t-p]]^T$. Also assume that $\frac{1}{t}R^t_{xx}$ has a unique minimum eigenvalue bounded away from zero, $\lambda_0 > 0$, $\forall t$. Let $\hat{x}_\theta[n] = \theta^T\mathbf{x}[n]$ be the fixed linear predictor with parameters $\theta$. Define a universal $p$-th order linear predictor as $\hat{x}_p[n] = \hat{\theta}_u[n]^T\mathbf{x}[n]$, where $\hat{\theta}_u[n] = \left[R^{n-1}_{xx} + \frac{c}{\delta^2} I\right]^{-1} r^{n-1}_x$, and $\delta$ and $c$ are positive constants. Let $l_n(x, \hat{x}_p)$ be the running total squared prediction error for the $p$-th order universal linear predictor, i.e. $l_n(x, \hat{x}_p) = \sum_{t=1}^{n} (x[t] - \hat{x}_p[t])^2$.
Define a twice-universal predictor $\hat{x}_{tu}[n]$ as $\hat{x}_{tu}[n] = \sum_{i=1}^{M} \mu_i[n]\hat{x}_i[n]$, where $\mu_i[n]$ is defined as

$$\mu_i[n] = \frac{\exp\left(-\frac{1}{2c}\, l_{n-1}(x, \hat{x}_i)\right)}{\sum_{k=1}^{M} \exp\left(-\frac{1}{2c}\, l_{n-1}(x, \hat{x}_k)\right)}.$$

Then the total squared prediction error of the twice-universal predictor, $l_n(x, \hat{x}_{tu}) = \sum_{t=1}^{n} (x[t] - \hat{x}_{tu}[t])^2$, satisfies

$$\frac{1}{n} l_n(x, \hat{x}_{tu}) \le \min_{p,\theta} \frac{1}{n} l_n(x, \hat{x}^{(p)}_\theta) + \frac{4A^2 p}{n}\ln\left(\frac{A^4 (p+1)\, n}{8\lambda_0^2}\right) + \frac{4A^2 p}{n} + \frac{8A^2}{n}\ln(M) + O(n^{-2}),$$

where $\hat{x}^{(p)}_\theta$ denotes the fixed $p$-th order linear predictor with parameters $\theta$.

Theorem 1 tells us that the average squared prediction error of the universal prediction algorithm is within $O(p\ln(n)/n)$ of the best batch linear prediction algorithm, uniformly, for every individual sequence $x^n$. As we shall see, the cost terms can be identified as a parameter redundancy term, proportional to $p\ln(n)/n$, and a model order redundancy term, proportional to $\ln(M)/n$. The proof of Theorem 1 is completed in two steps. First we demonstrate that a predictor generated by a mixture over all
$p$-th-order linear predictors is universal with respect to the class of all $p$-th-order linear predictors. We then show that a second mixture over all model orders provides a predictor which is universal with respect to both model orders and parameters. These steps are contained in the proofs of Theorems 2 and 1, in Sections 3 and 4, respectively. The result is a twice-universal [1] [2] linear predictor which implements a double-mixture over model orders and parameters. This resembles the context tree weighting procedure in [3], which implements a double-mixture over the parameters and model orders of context-trees used in data compression. Key to the development of such universal algorithms is that the mixture be implementable by an efficient algorithm. We will show that the computational complexity of this twice-universal predictor is no larger than that for a conventional linear predictor of the order $M$.

3 Fixed-Order Linear Prediction

In this section, we consider the problem of linear prediction with a predictor of fixed order $p$. The predictor is parameterized by the vector $\theta = [\theta_1, \ldots, \theta_p]^T$, and the predicted value can be written $\hat{x}_\theta[t] = \theta^T\mathbf{x}[t]$, where $\mathbf{x}[t] = [x[t-1], \ldots, x[t-p]]^T$. If the parameter vector is selected such that the total squared prediction error is minimized over a batch of data of length $n$, then the coefficients are given by

$$\theta[n] = \arg\min_\theta \sum_{t=1}^{n} \left(x[t] - \theta^T\mathbf{x}[t]\right)^2.$$

The well-known least-squares solution to this problem is given by $\theta[n] = (R^n_{xx})^{-1} r^n_x$, where $R^n_{xx} = \sum_{t=1}^{n} \mathbf{x}[t]\mathbf{x}[t]^T$ and $r^n_x = \sum_{t=1}^{n} x[t]\mathbf{x}[t]$. The parameters $\theta[n]$ can be computed recursively with the recursive least squares (RLS) algorithm. A common approach to sequential prediction is to use the parameters $\theta[t-1]$ to predict $\hat{x}[t] = \theta[t-1]^T\mathbf{x}[t]$. This is the so-called "plug-in" approach, since the best estimate of the parameters based on the data $x^{t-1}$ is "plugged in" to the predictor model for $x[t]$. It can be shown [4] [5] that the least-squares optimal batch prediction error can be achieved sequentially by the plug-in approach of the RLS algorithm to within $O(p\ln(n)/n)$.
This indicates that the rate at which RLS achieves the batch performance is slower than the $(p/2)\ln(n)/n$ which might be expected from universal coding results [6] [7], and is in agreement with the result in [7] which demonstrates that although the plug-in approach to sequential probability assignment can be optimal for certain model classes in the stochastic context, it is not optimal for individual sequences. For this reason, rather than selecting a single set of parameters to use for prediction, we use the mixture approach of universal coding to obtain the universal predictor coefficients. This idea has already been applied in [2] for prediction in a probabilistic context. By transforming the problem into one of probability assignment, we can sequentially assign a probability to the sequence which is almost as good as that assigned by the best linear predictor. As such, we consider a means of estimating the parameters of the $p$-th order linear predictor $\theta[t]$ through an a priori mixture over
the continuum of all possible parameters according to some prior. We now show that the prediction error of this universal predictor is as good as that of the best linear predictor determined from all of the data.

Theorem 2 Let $x^n$ be a bounded, real-valued arbitrary sequence, such that $|x[t]| < A$ for all $t$ and $\frac{1}{t}R^t_{xx}$ has a unique minimum eigenvalue bounded away from zero, $\lambda_0 > 0$. Let $\hat{x}_\theta[t]$ be the output of a $p$-th-order linear predictor with parameter vector $\theta$, and $l_n(x, \hat{x}_\theta)$ be the running total squared prediction error, i.e. $l_n(x, \hat{x}_\theta) = \sum_{t=1}^{n} (x[t] - \hat{x}_\theta[t])^2$, where $\hat{x}_\theta[n] = \theta^T\mathbf{x}[n]$. Define a universal predictor $\hat{x}_u[n]$ as $\hat{x}_u[n] = \theta_u[n-1]^T\mathbf{x}[n]$, where

$$\theta_u[n] = \left[R^n_{xx} + \frac{c}{\delta^2} I\right]^{-1} r^n_x,$$

$R^n_{xx} = \sum_{k=1}^{n} \mathbf{x}[k]\mathbf{x}[k]^T$, $r^n_x = \sum_{k=1}^{n} x[k]\mathbf{x}[k]$, and $\delta$ and $c$ are positive constants. Then the total squared prediction error of the $p$-th-order universal predictor, $l_n(x, \hat{x}_u) = \sum_{t=1}^{n} (x[t] - \hat{x}_u[t])^2$, satisfies

$$\frac{1}{n} l_n(x, \hat{x}_u) \le \min_\theta \frac{1}{n} l_n(x, \hat{x}_\theta) + \frac{4A^2 p}{n}\ln\left(\frac{A^4 (p+1)\, n}{8\lambda_0^2}\right) + \frac{4A^2 p}{n} + O(n^{-2}).$$

Theorem 2 tells us that the average squared prediction error of the $p$-th-order universal predictor is within $O(p\ln(n)/n)$ of the best batch $p$-th-order linear prediction algorithm, uniformly, for every individual sequence $x^n$.

The basic idea behind the proof of Theorem 2 is the following. We define a "probability" assignment of each of the continuum of predictors to the data sequence $x^n$ such that the probability is an exponentially decreasing function of the total squared error for that predictor. This use of prediction error as a probability or likelihood was also used by Rissanen [6] and Vovk [8]. By defining a universal probability as an a priori average of the assigned probabilities, then to first order in the exponent, the universal probability will be dominated by the largest exponential, i.e., the probability assignment of the model with the smallest total squared error. For a finite collection of predictors, the redundancy of the mixture can be bounded by the negative logarithm of the weight assigned to the best model. However, for a mixture over a continuum of models, we must seek an alternate bound on the redundancy. Specifically, we use the conjugate prior, such that the mixture over the parameters can be obtained in closed form.
We then relate the universal probability assignment to the accumulated squared error of the universal predictor, giving the desired result.

Proof of Theorem 2: For each set of parameters $\theta$, we define the probability

$$P_\theta(x^n) = B\exp\left(-\frac{1}{2c}\, l_n(x, \hat{x}_\theta)\right)$$

as an exponential function of the sequential loss on the data. Over the continuum of predictors with coefficients $\theta$, we assign the a priori Gaussian mixture

$$p(\theta) = \left(\sqrt{2\pi}\,\delta\right)^{-p}\exp\left\{-\frac{\theta^T\theta}{2\delta^2}\right\},$$

and define the universal probability

$$P_u(x^n) = \int p(\theta) P_\theta(x^n)\, d\theta.$$

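We can then obtain this universal probability in closed form,

$$P_u(x^n) = B\,\delta^{-p}\left|\frac{1}{c}R^n_{xx} + \frac{1}{\delta^2} I\right|^{-1/2}\exp\left\{-\frac{1}{2c}\left(R^n_x[0] - r^{nT}_x\left[R^n_{xx} + \frac{c}{\delta^2} I\right]^{-1} r^n_x\right)\right\},$$

where $R^n_x[0] = \sum_{k=1}^{n} x^2[k]$. To compare the universal probability with the maximum probability over all parameters, observe that

$$\max_\theta P_\theta(x^n) = P_\theta(x^n)\big|_{\theta = \hat{\theta}_{ML}} = B\exp\left\{-\frac{1}{2c}\, l_n(x, \hat{x}_{ML})\right\},$$

where $\hat{\theta}_{ML} = (R^n_{xx})^{-1} r^n_x$. Since

$$l_n(x, \hat{x}_{ML}) = \sum_{k=1}^{n}\left(x[k] - \left((R^n_{xx})^{-1} r^n_x\right)^T\mathbf{x}[k]\right)^2 = R^n_x[0] - r^{nT}_x (R^n_{xx})^{-1} r^n_x,$$

we obtain

$$P_{\hat{\theta}_{ML}}(x^n) = B\exp\left\{-\frac{1}{2c}\left(R^n_x[0] - r^{nT}_x (R^n_{xx})^{-1} r^n_x\right)\right\}.$$

Taking their ratio, and after some algebra, we obtain

$$\frac{P_u(x^n)}{\max_\theta P_\theta(x^n)} = \delta^{-p}\left|\frac{1}{c}R^n_{xx} + \frac{1}{\delta^2} I\right|^{-1/2}\exp\left\{-\frac{1}{2\delta^2}\, r^{nT}_x\left[R^n_{xx}R^n_{xx} + \frac{c}{\delta^2}R^n_{xx}\right]^{-1} r^n_x\right\}.$$

Taking the logarithm, and substituting $R^n_{xx} = n\bar{R}^n_{xx}$ (and likewise $r^n_x = n\bar{r}^n_x$), yields

$$\begin{aligned}
-2\ln\frac{P_u(x^n)}{\max_\theta P_\theta(x^n)} &= \ln\left|\frac{\delta^2}{c}R^n_{xx} + I\right| + \frac{1}{\delta^2}\, r^{nT}_x\left[R^n_{xx}R^n_{xx} + \frac{c}{\delta^2}R^n_{xx}\right]^{-1} r^n_x \\
&= \ln\left|\frac{n\delta^2}{c}\bar{R}^n_{xx} + I\right| + \frac{1}{\delta^2}\, \bar{r}^{nT}_x\left[\bar{R}^n_{xx}\bar{R}^n_{xx} + \frac{c}{n\delta^2}\bar{R}^n_{xx}\right]^{-1} \bar{r}^n_x \\
&= p\ln(n) + \ln\left|\frac{\delta^2}{c}\bar{R}^n_{xx} + \frac{1}{n} I\right| + \frac{1}{\delta^2}\, \bar{r}^{nT}_x\left[\bar{R}^n_{xx}\bar{R}^n_{xx} + \frac{c}{n\delta^2}\bar{R}^n_{xx}\right]^{-1} \bar{r}^n_x \\
&\le p\ln(n) + \ln\left(c^{-p}\delta^{2p}\right) + \ln\left|\bar{R}^n_{xx} + \frac{c}{n\delta^2} I\right| + pA^2\delta^{-2}\lambda_0^{-2}A^2. \qquad (1)
\end{aligned}$$

To continue, we need the following lemma bounding the logarithm of the determinant of a positive definite matrix, which is proved in the appendix.

Lemma 1 For a $p \times p$ positive definite matrix $M$ whose elements are each bounded by $C$, i.e., $|m_{i,j}| < C$, and a positive constant $\epsilon$, the logarithm of the determinant of the matrix $M + \epsilon I$ satisfies

$$\ln|M + \epsilon I| \le p\ln\left(\frac{p+1}{2}\right) + p\ln(C) + p\ln\left(1 + \frac{\epsilon}{\lambda_0}\right), \qquad (2)$$

where $\lambda_0$ is the smallest eigenvalue of $M$.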
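The closed-form mixture obtained earlier in this proof can be checked numerically in the scalar case $p = 1$. The sketch below is illustrative only: it assumes the Gaussian prior has variance $\delta^2$ (so the regularizer is $c/\delta^2$), takes $B = 1$ by default, and uses hypothetical function names and a plain trapezoidal rule that are not part of the paper.

```python
import math

def closed_form_mixture(x, c, delta, B=1.0):
    """Closed-form P_u(x^n) for a first-order predictor (p = 1)."""
    # Scalar autocorrelation quantities over t = 1..n, with regressor x[t-1].
    R = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    r = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    R0 = sum(x[t] ** 2 for t in range(1, len(x)))
    det_term = (R / c + 1.0 / delta ** 2) ** -0.5
    exponent = -(R0 - r * r / (R + c / delta ** 2)) / (2.0 * c)
    return B / delta * det_term * math.exp(exponent)

def numeric_mixture(x, c, delta, B=1.0, half_width=10.0, steps=20001):
    """Trapezoidal approximation of the integral of p(theta) * P_theta(x^n)."""
    lo = -half_width * delta
    h = 2.0 * half_width * delta / (steps - 1)
    total = 0.0
    for i in range(steps):
        theta = lo + i * h
        # Total squared error of the fixed predictor theta * x[t-1].
        loss = sum((x[t] - theta * x[t - 1]) ** 2 for t in range(1, len(x)))
        prior = math.exp(-theta ** 2 / (2.0 * delta ** 2)) / (math.sqrt(2.0 * math.pi) * delta)
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * prior * B * math.exp(-loss / (2.0 * c))
    return total * h

# Example with an arbitrary bounded sequence (|x[t]| < A = 1, so c = 4 A^2 = 4).
sample = [0.3, -0.5, 0.8, 0.1, -0.7, 0.4]
print(closed_form_mixture(sample, 4.0, 1.5), numeric_mixture(sample, 4.0, 1.5))
```

The two printed values should agree to within the accuracy of the numerical integration, which is what makes the Gaussian prior convenient: the continuum mixture costs no more than one regularized least-squares solve.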

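Applying Lemma 1 to (1), we obtain

$$-2c\ln\frac{P_u(x^n)}{B} \le \min_\theta l_n(x, \hat{x}_\theta) + cp\ln\left(\frac{\delta^2 n (p+1) A^2}{2c}\left(1 + \frac{c}{n\delta^2\lambda_0}\right)\right) + \frac{cpA^4}{\delta^2\lambda_0^2}. \qquad (3)$$

We expand the definition of the universal conditional probability as

$$P_u(x[n]\,|\,x^{n-1}) = \frac{\int p(\theta)P_\theta(x^n)\, d\theta}{\int p(\theta')P_{\theta'}(x^{n-1})\, d\theta'} = \int \frac{p(\theta)P_\theta(x^{n-1})}{\int p(\theta')P_{\theta'}(x^{n-1})\, d\theta'}\, P_\theta(x[n]\,|\,x^{n-1})\, d\theta,$$

i.e.,

$$P_u(x[n]\,|\,x^{n-1}) = \int \mu_n(\theta) P_\theta(x[n]\,|\,x^{n-1})\, d\theta, \qquad \mu_n(\theta) = \frac{p(\theta)P_\theta(x^{n-1})}{\int p(\theta')P_{\theta'}(x^{n-1})\, d\theta'}.$$

Note that $\mu_n(\theta)$ is proportional to the performance of the model $\theta$ on the data up to time $n-1$, $P_\theta(x^{n-1})$. That is, while the universal probability is an a priori Gaussian mixture over the probabilities assigned to the sequence by each of the parameters, in order to maintain this a priori probability, the conditional probabilities $P_u(x[n]\,|\,x^{n-1})$ must be weighted according to their performance on the data so far, $\mu_n(\theta)$. We define the universal predictor as a mixture over the parameters using the same conditional weights as the conditional probabilities, $\mu_n(\theta)$. A straightforward but tedious calculation verifies that the universal predictor defined by this mixture uses the parameter vector $\theta_u[t-1]$ at each time $t$ for prediction of the sample $x[t]$, where

$$\theta_u[t] = \int \theta\,\mu_{t+1}(\theta)\, d\theta = \left[R^t_{xx} + \frac{c}{\delta^2} I\right]^{-1} r^t_x.$$

Defining $\tilde{P}_u(x^n)$ as the probability from the predictor which is a mixture over the parameters using the same weights as the mixture over the probabilities $P_\theta(x^n)$, we have

$$\tilde{P}_u(x^n) = B\exp\left\{-\frac{1}{2c}\sum_{k=1}^{n}\left(x[k] - \int \mu_k(\theta)\,\theta^T d\theta\;\mathbf{x}[k]\right)^2\right\}. \qquad (4)$$

Comparing $P_u(x[n]\,|\,x^{n-1})$ and $\tilde{P}_u(x[n]\,|\,x^{n-1})$,

$$\tilde{P}_u(x[n]\,|\,x^{n-1}) = B\exp\left\{-\frac{1}{2c}\left(x[n] - \int \mu_n(\theta)\,\theta^T\mathbf{x}[n]\, d\theta\right)^2\right\},$$

and

$$P_u(x[n]\,|\,x^{n-1}) = \int \mu_n(\theta)\, B\exp\left\{-\frac{1}{2c}\left(x[n] - \theta^T\mathbf{x}[n]\right)^2\right\} d\theta,$$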
we observe that $\tilde{P}_u(x[n]\,|\,x^{n-1})$ is a function of a convex combination of the predicted values $\hat{x}_\theta[n]$, while $P_u(x[n]\,|\,x^{n-1})$ is the same convex combination of the function evaluated at the same values. By Jensen's inequality,

$$\tilde{P}_u(x[n]\,|\,x^{n-1}) \ge P_u(x[n]\,|\,x^{n-1}), \qquad (5)$$

provided that the function $f(z) = B\exp\left(-(x[t] - z)^2/2c\right)$ is concave over the domain of $z$, which leads to $-\sqrt{c} \le (x[k] - \hat{x}_\theta[k]) \le \sqrt{c}$. Since the coefficients $|\theta_i|$ are bounded (see [4]), the inequality (5) holds for sufficiently large $c$. However, since $x[n]$ is bounded, we can always decrease the prediction error by enforcing $|\hat{x}_\theta[n]| < A$, which leads to the selection $c \ge 4A^2$. Using $c = 4A^2$ and (4) in (3), we obtain

$$l_n(x, \hat{x}_u) \le \min_\theta l_n(x, \hat{x}_\theta) + 4A^2 p\ln\left(\frac{\delta^2 n (p+1)}{8}\left(1 + \frac{4A^2}{n\delta^2\lambda_0}\right)\right) + \frac{4A^6 p}{\delta^2\lambda_0^2}. \qquad (6)$$

Our "probability" assignment algorithm had two free constants to be set. Now that we have selected a range for the constant $c$, we can investigate the constant $\delta$. Minimizing the expression in (6) with respect to $\delta$ yields

$$\frac{1}{n} l_n(x, \hat{x}_u) \le \min_\theta \frac{1}{n} l_n(x, \hat{x}_\theta) + \frac{4A^2 p}{n}\ln\left(\frac{A^4 (p+1)\, n}{8\lambda_0^2}\right) + \frac{4A^2 p}{n} + O(n^{-2}),$$

where $\delta^2 = (A^4/\lambda_0^2) + O(n^{-1})$.

We note in particular that the parameter redundancy term in (6) is proportional to $(p/2)\ln(n)/n$ rather than the $p\ln(n)/n$ redundancy shown for the plug-in method of RLS. The redundancy is actually of the form $(p/2)\ln(n)/n$, scaled by the factor $2c$, which accounts for the effect of the range $A$ of the sequence $x[n]$. Comparing this result with that for a finite number $M$ of models, where the redundancy term would be bounded by $O(\ln(M)/n)$, we see that the "effective" number of models for the Gaussian mixture grows linearly with $n$. This completes the proof of Theorem 2.

4 Proof of the Main Result

The proof of the main result of the paper, Theorem 1, uses the results from Section 3 which bound the parameter redundancy of the mixture model, and a result from [4] bounding the model order redundancy from a second mixture over the model orders.

Proof of Theorem 1: Suppose a set of linear predictors of order $k$, $1 \le k \le M$, are given, such that at each time sample, the $k$-th linear predictor produces the estimate $\hat{x}_k[n]$. For the "loss" of the $k$-th order predictor defined as its running total squared prediction error, define the probability

$$P_k(x^n) = B\exp\left\{-\frac{1}{2c}\, l_n(x, \hat{x}_k)\right\},$$
and the universal probability $P_u(x^n)$,

$$P_u(x^n) = \frac{1}{M}\sum_{i=1}^{M} P_i(x^n).$$

When $\hat{x}_u[n]$ is defined as a universal predictor obtained by the same sequential mixture over the individual predictors as over the probabilities, the corresponding theorem in [4] shows that

$$l_n(x, \hat{x}_u) \le \min_i l_n(x, \hat{x}_i) + 8A^2\ln(M).$$

When each of the fixed-order predictors is a $k$-th-order universal linear predictor as defined in Section 3, then the overall predictor is formed by a double-mixture; first over parameters, and then over model orders. The resulting prediction error of this twice-universal predictor, $\hat{x}_{tu}[n]$, satisfies

$$\frac{1}{n} l_n(x, \hat{x}_{tu}) \le \min_{p,\theta} \frac{1}{n} l_n(x, \hat{x}^{(p)}_\theta) + \frac{4A^2 p}{n}\ln\left(\frac{A^4 (p+1)\, n}{8\lambda_0^2}\right) + \frac{4A^2 p}{n} + \frac{8A^2}{n}\ln(M) + O(n^{-2}). \qquad (7)$$

This completes the proof of Theorem 1.

Theorem 1, the main result of this paper, demonstrates that a prediction algorithm based on a double-mixture over model orders and parameters is indeed twice-universal. One observation from this result is that the predictor parameters are very similar to those which arise from the recursive least squares procedure. In fact, if the covariance matrix of the RLS algorithm is initialized with the value $R^0_{xx} = (c/\delta^2) I \approx 4(\lambda_0^2/A^2) I$, then the remaining RLS procedure is unchanged. For $c \ge 4A^2$, we see that $c$ is greater than the largest instantaneous squared prediction error. We also have that $A^2/\lambda_0$ is a ratio of the maximum possible square value to the minimum average square value, or a measure of the "spread" of the sequence. To be universal, the a priori mixture over the parameters should have a large enough "variance" to cover this range.

The first term of the redundancy in (7) can be identified as a parameter redundancy term, since this is the excess prediction error induced above the batch error for a given model order due to the lack of a priori knowledge of the best batch parameters for that model order. Note that the parameter redundancy term here is of the form $O(p\ln(n)/n)$, which is in agreement with the stochastic case, as implied both by Davisson in [9] and the more general MDL [6]. We also note that the model order redundancy term, $8A^2\ln(M)/n$, can be slightly improved upon.
Rather than using a priori weights $w_i = 1/M$, we could have weighted each of the models inversely proportional to its model order, i.e.,

$$w_i = i^{-1}\left(\sum_{j=1}^{M} j^{-1}\right)^{-1}.$$

The proof in [4] remains intact with the model order redundancy being $-\ln(w_p)/n$ rather than $-\ln(1/M)/n$, where $p$ is the order of the model with the smallest prediction error. The resulting model order redundancy term becomes $\ln(p)/n + \ln\ln(M)/n$.
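The double mixture described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the lattice implementation of [4]: each order runs its own RLS recursion started from covariance $(\delta^2/c) I$ (assuming prior variance $\delta^2$), predictions are clipped to $[-A, A]$, uniform weights $w_i = 1/M$ are used, and prediction starts only once $M$ past samples are available. The function names are hypothetical.

```python
import math

def rls_step(theta, P, u, target):
    """One RLS update of (theta, P) for regressor u and observed target."""
    k = len(u)
    Pu = [sum(P[i][j] * u[j] for j in range(k)) for i in range(k)]
    denom = 1.0 + sum(u[i] * Pu[i] for i in range(k))
    err = target - sum(theta[i] * u[i] for i in range(k))
    new_theta = [theta[i] + Pu[i] * err / denom for i in range(k)]
    new_P = [[P[i][j] - Pu[i] * Pu[j] / denom for j in range(k)] for i in range(k)]
    return new_theta, new_P

def twice_universal(x, M, A, delta):
    """Return (total loss of the mixture, list of per-order total losses)."""
    c = 4.0 * A * A  # c = 4 A^2, the selection made in Section 3
    thetas = [[0.0] * k for k in range(1, M + 1)]
    # Initial RLS covariance (delta^2 / c) I, i.e. R^0_xx = (c / delta^2) I.
    Ps = [[[delta ** 2 / c if i == j else 0.0 for j in range(k)]
           for i in range(k)] for k in range(1, M + 1)]
    losses = [0.0] * M
    loss_tu = 0.0
    for t in range(M, len(x)):  # assumption: wait until M past samples exist
        preds = []
        for k in range(1, M + 1):
            u = [x[t - j] for j in range(1, k + 1)]
            raw = sum(th * ui for th, ui in zip(thetas[k - 1], u))
            preds.append(max(-A, min(A, raw)))  # enforce |prediction| <= A
        best = min(losses)  # subtract the minimum loss for numerical stability
        w = [math.exp(-(loss - best) / (2.0 * c)) for loss in losses]
        pred_tu = sum(wi * pi for wi, pi in zip(w, preds)) / sum(w)
        loss_tu += (x[t] - pred_tu) ** 2
        for k in range(1, M + 1):
            u = [x[t - j] for j in range(1, k + 1)]
            losses[k - 1] += (x[t] - preds[k - 1]) ** 2
            thetas[k - 1], Ps[k - 1] = rls_step(thetas[k - 1], Ps[k - 1], u, x[t])
    return loss_tu, losses

# Example on a bounded deterministic sequence with |x[t]| <= 0.95 < A = 1.
signal = [0.9 * math.sin(0.3 * t) + 0.05 * math.cos(1.7 * t) for t in range(200)]
print(twice_universal(signal, 3, 1.0, 2.0))
```

With clipped predictions and $c = 4A^2$, the mixture loss returned here stays within $8A^2\ln(M)$ of the smallest per-order loss, which is the model order redundancy discussed above. This direct version costs $O(M^3)$ per sample; the order-recursive lattice form is what brings it down to $O(M)$.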

5 Algorithmic Issues

An issue that remains is the computational complexity of the universal approach, which incorporates the $\{1, \ldots, M\} \times R^p$ predicted values from each of the $M$ model orders, and then each of the continuum of predictors within a given model order, along with their sequential prediction errors, to compute each predicted value. At first glance, it might appear that the cost of universality is rather high, requiring the solution of an infinite number of linear prediction problems in parallel. However, since the mixture over the parameters can be accomplished through a properly initialized RLS algorithm, it only remains to solve for each of the RLS predictors for $i = 1, \ldots, M$. The linear prediction problems for each model order have a great deal in common with one another, and this structure can be exploited. Indeed, just as the RLS algorithm for a given model order can be written as a time-recursion, there exist time- and order-recursive solutions to the least squares prediction problem, in which at each time step, the $M$-th order prediction problem can be constructed by recursively solving for each of the predictors of lower order. The resulting complexity of these algorithms can be made to have $O(M)$ operations per time sample, which results in a total complexity of $O(nM)$. An example lattice prediction algorithm is given in [4].

6 Concluding Remarks

The main result of this paper, stated in Theorem 1, is an algorithm which is "twice universal" [1] [2] for linear prediction with respect to model orders and parameters. The universal predictor presented in this paper will perform as well as the best linear predictor of any order up to some maximum order, uniformly, for every individual sequence. With this algorithm, the problems of model order selection and parameter estimation for linear prediction have been mitigated in favor of a performance-weighted average among all model orders and all parameters. Efficient lattice algorithms which recursively generate all of the linear predictors at the computational price of only the largest model order, together with closed-form mixture parameters, yield an algorithm that is computationally very efficient.
Since the mixture parameters of the universal predictor can be identified as the RLS parameters with a properly initialized covariance, this paper also gives a concrete rationale for initializing an RLS or Kalman filter algorithm with an a priori covariance; it makes the algorithm universal with respect to parameters for individual sequences.
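The identification of the mixture parameters with a suitably initialized RLS recursion can be illustrated in the first-order case. The sketch below assumes a scalar predictor ($p = 1$), prior variance $\delta^2$ (so the regularizer is $c/\delta^2$), and hypothetical function names; it compares the recursive estimate with the regularized batch solution $r/(R + c/\delta^2)$.

```python
def rls_first_order(x, c, delta):
    """Scalar RLS started from covariance delta^2 / c and theta = 0."""
    theta = 0.0
    P = delta ** 2 / c  # inverse of the initial "autocorrelation" c / delta^2
    for t in range(1, len(x)):
        u = x[t - 1]                         # regressor for a first-order predictor
        P = P / (1.0 + u * u * P)            # covariance update
        theta += P * u * (x[t] - theta * u)  # correct theta by the prediction error
    return theta

def regularized_batch(x, c, delta):
    """Batch solution r / (R + c / delta^2) over the same data."""
    R = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    r = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    return r / (R + c / delta ** 2)

# Example with an arbitrary bounded sequence.
data = [0.3, -0.5, 0.8, 0.1, -0.7, 0.4]
print(rls_first_order(data, 4.0, 1.5), regularized_batch(data, 4.0, 1.5))
```

The two estimates coincide up to rounding, so the only change relative to a standard RLS implementation is the choice of the initial covariance.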

A Proof of Lemma 1

To prove this bound, we note that $|M| < p!\,C^p$. Therefore,

$$\ln|M| \le \ln(p!) + p\ln(C) = \sum_{k=1}^{p}\ln(k) + p\ln(C) \le p\ln\left(\frac{1}{p}\sum_{k=1}^{p} k\right) + p\ln(C) = p\ln\left(\frac{p+1}{2}\right) + p\ln(C).$$

Therefore, for eigenvalues of $M$, $\lambda_k \ge \lambda_0$,

$$\ln|M + \epsilon I| \le p\ln\left(\frac{p+1}{2}\right) + p\ln(C) + \sum_{k=1}^{p}\ln\left(1 + \frac{\epsilon}{\lambda_k}\right) \le p\ln\left(\frac{p+1}{2}\right) + p\ln(C) + p\ln\left(1 + \frac{\epsilon}{\lambda_0}\right).$$

References

[1] B. Y. Ryabko, "Twice-universal coding," Prob. Inf. Trans., vol. 20, pp. 173-177, Jul-Sep 1984.
[2] B. Y. Ryabko, "Prediction of random sequences and universal coding," Prob. Inf. Transmission, vol. 24, pp. 87-96, Apr-June 1988.
[3] F. Willems, Y. Shtarkov, and T. Tjalkens, "The context-tree weighting method: Basic properties," IEEE Trans. Info. Theory, vol. IT-41, pp. 653-664, May 1995.
[4] A. Singer and M. Feder, "Universal linear prediction over parameters and model orders," submitted to IEEE Transactions on Signal Processing.
[5] N. Merhav and M. Feder, "Universal schemes for sequential decision from individual sequences," IEEE Trans. Info. Theory, vol. 39, pp. 1280-1292, July 1993.
[6] J. Rissanen, "Universal coding, information, prediction, and estimation," IEEE Trans. Info. Theory, vol. IT-30, pp. 629-636, 1984.
[7] M. J. Weinberger, N. Merhav, and M. Feder, "Optimal sequential probability assignment for individual sequences," IEEE Trans. Info. Theory, vol. 40, pp. 384-396, March 1994.
[8] V. Vovk, "Aggregating strategies (learning)," in Proceedings of the Third Annual Workshop on Computational Learning Theory (M. Fulk and J. Case, eds.), (San Mateo, CA), pp. 371-383, Morgan Kaufmann, 1990.
[9] L. D. Davisson, "The prediction error of stationary Gaussian time series of unknown covariance," IEEE Trans. Info. Theory, vol. IT-11, pp. 527-532, Oct. 1965.