Universal coding for classes of sources




Connexions module: m46228, by Denver Greene. This work is produced by The Connexions Project and licensed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Version 1.2 (2013), available at http://cnx.org/content/m46228/1.2/.

We have discussed several parametric sources (see the module "Source models", http://cnx.org/content/m4623/latest/#uid), and will now start developing mathematical tools in order to investigate properties of universal codes that offer universal compression w.r.t. a class of parametric sources.

1 Preliminaries

Consider a class of parametric models, where the parameter set $\Theta$ characterizes the distribution for a specific source within this class, $\{p_\theta(\cdot),\ \theta \in \Theta\}$.

Example 1
Consider the class of memoryless sources over an alphabet $\alpha = \{1, 2, \ldots, r\}$. Here we have

$$\theta = \{p(1), p(2), \ldots, p(r-1)\}. \qquad (1)$$

The goal is to find a fixed-to-variable-length lossless code that is independent of $\theta$, which is unknown, yet achieves

$$\frac{1}{n} E_\theta\left[l(X^n)\right] \rightarrow H_\theta(X), \qquad (2)$$

where the expectation is taken w.r.t. the distribution implied by $\theta$.

We have seen for

$$p(x) = \frac{1}{2} p_1(x) + \frac{1}{2} p_2(x) \qquad (3)$$

that a code that is good for two sources (distributions) $p_1$ and $p_2$ exists, modulo the one-bit loss. As an expansion beyond this idea, consider

$$p(x) = \int dw(\theta)\, p_\theta(x), \qquad (4)$$

where $w(\theta)$ is a prior.
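To make the one-bit loss behind (3) concrete, here is a minimal Python sketch; the two i.i.d. sources and the test sequence are hypothetical choices for illustration. Coding with the sequence-level mixture costs at most one bit more than coding with whichever of the two sources fits the data better, since $-\log_2\left(\frac{1}{2} p_i(x)\right) = -\log_2 p_i(x) + 1$ for either $i$.

```python
from math import log2, prod

def seq_prob(p, x):
    """Probability of sequence x under an i.i.d. model p (dict: symbol -> prob)."""
    return prod(p[s] for s in x)

p1 = {0: 0.9, 1: 0.1}          # hypothetical source 1
p2 = {0: 0.3, 1: 0.7}          # hypothetical source 2
x = [0, 0, 1, 0, 0, 0, 1, 0]   # arbitrary test sequence

l1 = -log2(seq_prob(p1, x))    # ideal length knowing the source is p1
l2 = -log2(seq_prob(p2, x))    # ideal length knowing the source is p2
l_mix = -log2(0.5 * seq_prob(p1, x) + 0.5 * seq_prob(p2, x))  # mixture of Eq. (3)

# The mixture is at most one bit worse than the better matched code.
assert l_mix <= min(l1, l2) + 1
print(f"l1 = {l1:.2f}, l2 = {l2:.2f}, l_mix = {l_mix:.2f}")
```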

Example 2
Let us revisit the memoryless source, choose $r = 2$, and define the scalar parameter

$$\theta = \Pr(X_i = 1) = 1 - \Pr(X_i = 0) \qquad (5)$$

and

$$p_\theta(x) = \theta^{n_x(1)} (1-\theta)^{n_x(0)}, \qquad (6)$$

where $n_x(1)$ and $n_x(0)$ denote the numbers of ones and zeros in $x$, respectively. Moreover, it can be shown that the mixture under a uniform prior,

$$p(x) = \int_0^1 d\theta\, \theta^{n_x(1)} (1-\theta)^{n_x(0)}, \qquad (7)$$

satisfies

$$p(x) = \frac{n_x(0)!\, n_x(1)!}{(n+1)!}; \qquad (8)$$

this result appears in Krichevsky and Trofimov [2].
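As a quick numerical check of (8), note that the uniform-prior mixture can also be computed one symbol at a time with the add-one (Laplace) rule $\Pr(x_{t+1} = s \mid x^t) = \frac{n_{x^t}(s)+1}{t+2}$, whose running product telescopes to the factorial expression; this sequential form is not spelled out in the text but follows directly from the ratio $p(x^{t+1})/p(x^t)$ under (7)-(8). The sketch below, with an arbitrary test sequence, verifies that the two agree.

```python
from math import factorial

def laplace_sequential(x):
    """Uniform-prior mixture probability of x, computed one symbol at a time."""
    prob, counts = 1.0, [0, 0]
    for t, sym in enumerate(x):
        prob *= (counts[sym] + 1) / (t + 2)  # add-one rule: Pr(x_{t+1}=sym | x^t)
        counts[sym] += 1
    return prob

x = [1, 0, 0, 1, 1, 0, 1, 1]   # arbitrary test sequence
n, n1 = len(x), sum(x)
closed_form = factorial(n - n1) * factorial(n1) / factorial(n + 1)  # Eq. (8)
assert abs(laplace_sequential(x) - closed_form) < 1e-12
print(closed_form)
```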

Is the source $X$ implied by the distribution $p(x)$ an ergodic source? Consider the event $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i \le \frac{1}{2}$. Owing to symmetry, in the limit of large $n$ the probability of this event under $p(x)$ must be $\frac{1}{2}$,

$$\Pr\left\{\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i \le \frac{1}{2}\right\} = \frac{1}{2}. \qquad (9)$$

On the other hand, recall that an ergodic source must allocate probability 0 or 1 to this flavor of event. Therefore, the source implied by $p(x)$ is not ergodic.

Recall the definitions of $p_\theta(x)$ and $p(x)$ in (6) and (7), respectively. Based on these definitions, consider the following:

$$H_\theta(X^n) = -\sum_{X^n \in \alpha^n} p_\theta(X^n) \log p_\theta(X^n) = H(X^n \mid \Theta = \theta),$$
$$H(X^n) = -\sum_{X^n} p(X^n) \log p(X^n),$$
$$H(X^n \mid \Theta) = \int dw(\theta)\, H(X^n \mid \Theta = \theta). \qquad (10)$$

We get the following quantity for the mutual information between the random variable $\Theta$ and the random sequence $X^n$,

$$I(\Theta; X^n) = H(X^n) - H(X^n \mid \Theta). \qquad (11)$$

Note that this quantity represents the gain in bits that knowing the parameter creates; more will be said about this quantity later.

2 Redundancy

We now define the conditional redundancy,

$$r_n(l, \theta) = \frac{1}{n} \left[E_\theta(l(X^n)) - H_\theta(X^n)\right]; \qquad (12)$$

this quantifies how far a coding length function $l$ is from the entropy when the parameter $\theta$ is known. Note that the expected length under the mixture satisfies

$$E(l(X^n)) = \int dw(\theta)\, E_\theta(l(X^n)) \ge H(X^n). \qquad (13)$$

Denote by $\mathcal{C}_n$ the collection of lossless codes for length-$n$ inputs, and define the expected redundancy of a code $l \in \mathcal{C}_n$ by

$$R_n(w, l) = \int dw(\theta)\, r_n(l, \theta), \quad R_n(w) = \inf_{l \in \mathcal{C}_n} R_n(w, l). \qquad (14)$$

The asymptotic expected redundancy follows,

$$R(w) = \lim_{n \to \infty} R_n(w), \qquad (15)$$

assuming that the limit exists. We can also define the minimum redundancy that incorporates the worst prior for the parameter, while keeping the best code,

$$R_n^- = \sup_{w \in W} R_n(w). \qquad (16)$$

Similarly,

$$R^- = \lim_{n \to \infty} R_n^-. \qquad (17)$$

Let us derive $R_n^-$:

$$R_n^- = \sup_w \inf_l \frac{1}{n} \int dw(\theta) \left[E_\theta(l(X^n)) - H(X^n \mid \Theta = \theta)\right] = \sup_w \inf_l \frac{1}{n} \left[E_p(l(X^n)) - H(X^n \mid \Theta)\right] = \sup_w \frac{1}{n} \left[H(X^n) - H(X^n \mid \Theta)\right] = \sup_w \frac{1}{n} I(\Theta; X^n) = C_n, \qquad (18)$$

where $C_n$ is the capacity of a channel from the sequence $x^n$ to the parameter $\theta$ [4]. That is, we try to estimate the parameter from the noisy channel.

In an analogous manner, we define

$$R_n^+ = \inf_l \sup_\theta r_n(l, \theta) = \inf_l \sup_\theta \frac{1}{n} E_\theta\left[\log \frac{p_\theta(x^n)}{2^{-l(x^n)}}\right] = \inf_Q \sup_\theta \frac{1}{n} D(P_\theta \| Q), \qquad (19)$$

where $Q$ is the distribution induced by the coding length function $l$, i.e., $Q(x^n) = 2^{-l(x^n)}$.

3 Minimal redundancy

Note that for every prior $w$ and code $l$,

$$\sup_\theta r_n(l, \theta) \ge \int w(d\theta)\, r_n(l, \theta) \ge \inf_{l \in \mathcal{C}_n} \int w(d\theta)\, r_n(l, \theta). \qquad (20)$$

Therefore,

$$R_n^+ = \inf_l \sup_\theta r_n(l, \theta) \ge \sup_w \inf_{l \in \mathcal{C}_n} \int w(d\theta)\, r_n(l, \theta) = R_n^-. \qquad (21)$$
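To make the redundancy-capacity identity (18) concrete, the sketch below evaluates $I(\Theta; X^n)$ exactly for the Bernoulli class under the uniform prior of Example 2 (a convenient choice, not the capacity-achieving prior). By (8), $p(x^n) = \frac{1}{(n+1)\binom{n}{k}}$ depends only on the number of ones $k$, so $H(X^n)$ collapses to a sum over counts, while $H(X^n \mid \Theta) = n \int_0^1 h_2(\theta)\, d\theta = \frac{n}{2 \ln 2}$, where $h_2$ is the binary entropy. The printout shows $I(\Theta; X^n) - \frac{1}{2} \log_2(n)$ settling toward a constant, consistent with the $\frac{1}{2}\log(n)$ bits-per-parameter behavior derived later in the module.

```python
from math import comb, log, log2

def mutual_information(n):
    """Exact I(Theta; X^n) in bits for Bernoulli(theta), theta ~ Uniform[0,1]."""
    # H(X^n): p(x^n) = 1/((n+1) C(n,k)) for a sequence with k ones, hence
    # H(X^n) = sum_k C(n,k) p_k log2(1/p_k) = (1/(n+1)) sum_k log2((n+1) C(n,k)).
    h_xn = sum(log2((n + 1) * comb(n, k)) for k in range(n + 1)) / (n + 1)
    # H(X^n | Theta) = n * integral of h_2(theta) over [0,1] = n / (2 ln 2).
    h_xn_given_theta = n / (2 * log(2))
    return h_xn - h_xn_given_theta

for n in (10, 100, 1000):
    print(n, mutual_information(n) - 0.5 * log2(n))
```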

In fact, Gallager showed that $R_n^+ = R_n^-$. That is, the min-max and max-min redundancies are equal.

Let us revisit the Bernoulli source $p_\theta$ where $\theta \in \Theta = [0, 1]$. From the definition of the mixture (7), which relies on a uniform prior for the sources, i.e., $w(\theta) = 1$, $\forall \theta \in \Theta$, it can be shown that there exists a universal code with length function $l$ such that

$$E_\theta\left[l(X^n)\right] \le n\, E\left[h_2\left(\frac{n_X(1)}{n}\right)\right] + \log(n+1) + 2, \qquad (22)$$

where $h_2(p) = -p \log(p) - (1-p) \log(1-p)$ is the binary entropy. That is, the redundancy is approximately $\log(n)$ bits.

Clarke and Barron [1] studied the weighting approach,

$$p(x) = \int dw(\theta)\, p_\theta(x), \qquad (23)$$

and constructed a prior that achieves $R_n^- = R_n^+$ precisely for memoryless sources.

Theorem 5 [1] For a memoryless source with an alphabet of size $r$, $\theta = (p(0), p(1), \ldots, p(r-1))$,

$$R_n(w) = \frac{r-1}{2n} \log\left(\frac{n}{2\pi e}\right) + \frac{1}{n} \int w(d\theta) \log\left(\frac{\sqrt{\det I(\theta)}}{w(\theta)}\right) + O\left(\frac{1}{n}\right), \qquad (24)$$

where the $O\left(\frac{1}{n}\right)$ term vanishes uniformly as $n \to \infty$ for any compact subset of $\Theta$, and

$$I(\theta) = E_\theta\left[\left(\frac{\partial \ln p_\theta(x_i)}{\partial \theta}\right) \left(\frac{\partial \ln p_\theta(x_i)}{\partial \theta}\right)^T\right] \qquad (25)$$

is Fisher's information. Note that when the parameter is sensitive to change we have large $I(\theta)$, which increases the redundancy. That is, good sensitivity means bad universal compression.

Denote

$$J(\theta) = \frac{\sqrt{\det I(\theta)}}{\int \sqrt{\det I(\theta')}\, d\theta'}; \qquad (26)$$

this is known as Jeffreys' prior. Using $w(\theta) = J(\theta)$, it can be shown that $R_n^- = R_n^+$.

Example 3
Let us derive the Fisher information $I(\theta)$ for the Bernoulli source. For a single symbol $x$, with $n_x(1), n_x(0) \in \{0, 1\}$ and $n_x(1) + n_x(0) = 1$,

$$p_\theta(x) = \theta^{n_x(1)} (1-\theta)^{n_x(0)},$$
$$\ln p_\theta(x) = n_x(1) \ln(\theta) + n_x(0) \ln(1-\theta),$$
$$\frac{\partial \ln p_\theta(x)}{\partial \theta} = \frac{n_x(1)}{\theta} - \frac{n_x(0)}{1-\theta},$$
$$\left(\frac{\partial \ln p_\theta(x)}{\partial \theta}\right)^2 = \frac{n_x(1)^2}{\theta^2} + \frac{n_x(0)^2}{(1-\theta)^2} - \frac{2\, n_x(1)\, n_x(0)}{\theta(1-\theta)},$$
$$E_\theta\left[\left(\frac{\partial \ln p_\theta(x)}{\partial \theta}\right)^2\right] = \frac{1}{\theta} + \frac{1}{1-\theta} - 0 = \frac{1}{\theta(1-\theta)}, \qquad (27)$$

where we used $E_\theta[n_x(1)] = \theta$, $E_\theta[n_x(0)] = 1-\theta$, and $n_x(1)\, n_x(0) = 0$. Therefore, the Fisher information satisfies $I(\theta) = \frac{1}{\theta(1-\theta)}$.
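As a sanity check on Example 3, the short Monte-Carlo sketch below (the choice $\theta = 0.3$ and the sample size are arbitrary) averages the squared score of a single Bernoulli symbol and compares the estimate to $\frac{1}{\theta(1-\theta)}$.

```python
import random

def empirical_fisher(theta, trials=200_000, seed=1):
    """Monte-Carlo estimate of E[(d/dtheta ln p_theta(x))^2], one Bernoulli symbol."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = 1 if rng.random() < theta else 0
        score = x / theta - (1 - x) / (1 - theta)  # d/dtheta ln p_theta(x)
        total += score ** 2
    return total / trials

theta = 0.3
print(empirical_fisher(theta), 1 / (theta * (1 - theta)))  # both close to 4.76
```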

Example 4
Recall the Krichevsky-Trofimov coding, which was mentioned in Example 2. Using the definition of Jeffreys' prior (26) with $I(\theta) = \frac{1}{\theta(1-\theta)}$, we see that $J(\theta) \propto \frac{1}{\sqrt{\theta(1-\theta)}}$. Taking the integral over Jeffreys' prior,

$$p_J(x^n) = \int_0^1 \frac{c}{\sqrt{\theta(1-\theta)}}\, \theta^{n_x(1)} (1-\theta)^{n_x(0)}\, d\theta = c \int_0^1 \theta^{n_x(1)-\frac{1}{2}} (1-\theta)^{n_x(0)-\frac{1}{2}}\, d\theta = \frac{\Gamma\left(n_x(0)+\frac{1}{2}\right) \Gamma\left(n_x(1)+\frac{1}{2}\right)}{\pi\, \Gamma(n+1)}, \qquad (28)$$

where $c = \frac{1}{\pi}$ is the normalization constant of $J(\theta)$ and $\Gamma(\cdot)$ is the gamma function. It can be shown that

$$p_J(x^n) = \prod_{t=0}^{n-1} p_J(x_{t+1} \mid x^t), \qquad (29)$$

where

$$p_J(x_{t+1} \mid x^t) = \frac{p_J(x^{t+1})}{p_J(x^t)}, \quad p_J(x_{t+1} = 0 \mid x^t) = \frac{n_{x^t}(0) + \frac{1}{2}}{t+1}, \quad p_J(x_{t+1} = 1 \mid x^t) = \frac{n_{x^t}(1) + \frac{1}{2}}{t+1}.$$

Similar to before, this universal code can be implemented sequentially. It is due to Krichevsky and Trofimov [2], its redundancy satisfies Theorem 5 by Clarke and Barron [1], and it is commonly used in universal lossless compression.
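The sketch below implements the sequential Krichevsky-Trofimov assignment of (29) and checks that its running product reproduces the gamma-function closed form (28); lgamma is used so the closed form stays numerically stable. The test sequence is arbitrary.

```python
from math import exp, lgamma, pi

def kt_sequential(x):
    """KT probability of a binary sequence, computed one symbol at a time."""
    prob, counts = 1.0, [0, 0]
    for t, sym in enumerate(x):
        prob *= (counts[sym] + 0.5) / (t + 1)  # Eq. (29) conditional
        counts[sym] += 1
    return prob

x = [0, 1, 1, 0, 1, 1, 1, 0]   # arbitrary test sequence
n, n1 = len(x), sum(x)
# Eq. (28): Gamma(n0 + 1/2) Gamma(n1 + 1/2) / (pi Gamma(n + 1)).
closed_form = exp(lgamma(n - n1 + 0.5) + lgamma(n1 + 0.5) - lgamma(n + 1)) / pi
assert abs(kt_sequential(x) - closed_form) < 1e-12
print(closed_form)
```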

4 Rissanen's bound

Let us consider on an intuitive level why

$$C_n \approx \frac{r}{2n} \log(n). \qquad (30)$$

Expending $r \log(\sqrt{n})$ bits allows us to differentiate between $(\sqrt{n})^r$ parameter vectors. That is, we would differentiate between each of the $r$ parameters with $\sqrt{n}$ levels. Now consider a Bernoulli RV with (unknown) parameter $\theta$. One perspective is that with $n$ drawings of the RV, the standard deviation in the number of 1's is $O(\sqrt{n})$. That is, $\sqrt{n}$ levels differentiate between parameter values up to a resolution that reflects the randomness of the experiment.

A second perspective is that of coding a sequence of Bernoulli outcomes with an imprecise parameter, where it is convenient to think of a universal code in terms of first quantizing the parameter and then using that (imprecise) parameter to encode the input $x$. For the Bernoulli example, the maximum likelihood parameter $\theta_{ML}$ satisfies

$$\theta_{ML} = \arg\max_\theta \left\{\theta^{n_x(1)} (1-\theta)^{n_x(0)}\right\}, \qquad (31)$$

and plugging this parameter $\theta = \theta_{ML}$ into $p_\theta(x)$ minimizes the coding length among all possible parameters $\theta$. It is readily seen that

$$\theta_{ML} = \frac{n_x(1)}{n}. \qquad (32)$$

Suppose, however, that we were to encode with $\theta' = \theta_{ML} + \Delta$. Then the coding length would be

$$l_{\theta'}(x) = -\log\left((\theta')^{n_x(1)} (1-\theta')^{n_x(0)}\right). \qquad (33)$$

It can be shown that this coding length is suboptimal w.r.t. $l_{\theta_{ML}}(x)$ by $O(n\Delta^2)$ bits. Keep in mind that doubling the number of parameter levels used by our universal encoder requires an extra bit to encode the extra factor of 2 in resolution. It makes sense to expend this extra bit only if it buys us at least one other bit, meaning that $O(n\Delta^2) = 1$, which implies that we encode $\theta_{ML}$ to a resolution of $\Delta = \frac{1}{\sqrt{n}}$, corresponding to $O(\sqrt{n})$ levels. Again, this is a redundancy of $\frac{1}{2} \log(n)$ bits per parameter.

Having described Rissanen's result intuitively, let us formalize matters. Consider $\{p_\theta, \theta \in \Theta\}$, where $\Theta \subset \mathbb{R}^K$ is a compact set. Suppose that there exists an estimator $\hat{\theta}$ such that for every $c$,

$$p_\theta\left\{\left\|\hat{\theta}(x^n) - \theta\right\| > \frac{c}{\sqrt{n}}\right\} \le \delta(c), \qquad (34)$$

where $\lim_{c \to \infty} \delta(c) = 0$. Then we have the following converse result.

Theorem 6 (Converse for universal coding [5]) Given a parametric class that satisfies the above condition (34), for all $\epsilon > 0$ and all codes $l$ that do not know $\theta$,

$$r_n(l, \theta) \ge (1-\epsilon)\, \frac{K}{2n} \log(n), \qquad (35)$$

except for a class of $\theta$ in $B_\epsilon(n)$ whose Lebesgue volume shrinks to zero as $n$ increases.

That is, a universal code cannot compress at a redundancy substantially below $\frac{1}{2} \log(n)$ bits per parameter. Rissanen also proved the following achievability result in his seminal paper.

Theorem 7 (Achievability for universal coding [5]) If $p_\theta(x)$ is twice differentiable in $\theta$ for every $x$, then there exists a universal code such that for all $\theta \in \Theta$: $r_n(l, \theta) \le (1+\epsilon)\, \frac{K}{2n} \log(n)$.

5 Universal coding for piecewise i.i.d. sources

We have emphasized stationary parametric classes, but a parametric class can be nonstationary. Let us show how universal coding can be achieved for some nonstationary classes of sources by providing an example. Consider $\Theta = \{0, 1, \ldots, n\}$, where

$$p_\theta(x^n) = Q_1\left(x_1^\theta\right) Q_2\left(x_{\theta+1}^n\right) \qquad (36)$$

and $Q_1$ and $Q_2$ are both known i.i.d. sources. This is a piecewise i.i.d. source; in each segment it is i.i.d., and there is an abrupt transition in statistics when the first segment ends and the second begins. Here are two approaches to coding this source (a small numerical comparison of the two appears at the end of this section).

1. Encode the best index $\theta_{ML}$ using $\lceil \log(n+1) \rceil$ bits, then encode $x^n$ using $p_{\theta_{ML}}(x^n)$. This is known as a two-part code or plug-in; after encoding the index, we plug the best parameter into the distribution. Clearly,

$$l(x^n) = \min_{0 \le \theta \le n} \left\lceil -\log p_\theta(x^n) \right\rceil + \left\lceil \log(n+1) \right\rceil \le -\log p_{\theta_{ML}}(x^n) + \log(n+1) + 2. \qquad (37)$$

2. The second approach is a mixture; we allocate weights to all possible parameters,

$$l(x^n) = -\log\left(\frac{1}{n+1} \sum_{i=0}^{n} p_i(x^n)\right) < -\log\left(\frac{1}{n+1}\, p_{\theta_{ML}}(x^n)\right) = -\log\left(p_{\theta_{ML}}(x^n)\right) + \log(n+1). \qquad (38)$$

Merhav [3] provided redundancy theorems for this class of sources. Algorithmic approaches to the mixture appear in Shamir and Merhav [6] and Willems [7]. The theme that is common to both approaches, the plug-in and the mixture, is that they lose approximately $\log(n)$ bits in encoding the location of the transition. Indeed, Merhav showed that the penalty for each transition in universal coding is approximately $\log(n)$ bits [3]. Intuitively, the reason that the redundancy required to encode the location of the transition is larger than the $\frac{1}{2} \log(n)$ from Rissanen [5] is that the location of the transition must be described precisely, to prevent paying a big coding length penalty for encoding segments with the wrong i.i.d. statistics. In contrast, in encoding our Bernoulli example an imprecision of $\Delta = \frac{1}{\sqrt{n}}$ in encoding $\theta_{ML}$ in the first part of the code yields only an $O(1)$ bit penalty in the second part of the code.
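Here is a minimal numerical comparison of the two approaches, with hypothetical known sources $Q_1 = \mathrm{Bernoulli}(0.1)$ and $Q_2 = \mathrm{Bernoulli}(0.8)$ and a true transition at $n/2$; it evaluates the two-part length of (37) and the mixture length of (38). Both land roughly $\log(n+1)$ bits above the known-$\theta$ ideal, in line with the $\log(n)$ transition penalty discussed above, with the mixture slightly ahead.

```python
import random
from math import ceil, log2

def log2_p_theta(x, theta, q1=0.1, q2=0.8):
    """log2 p_theta(x^n) = log2 Q1(x_1^theta) + log2 Q2(x_{theta+1}^n)."""
    lp = 0.0
    for i, sym in enumerate(x):
        q = q1 if i < theta else q2        # symbols 1..theta come from Q1
        lp += log2(q if sym == 1 else 1 - q)
    return lp

rng = random.Random(0)
n = 200
theta_true = n // 2
x = [int(rng.random() < (0.1 if i < theta_true else 0.8)) for i in range(n)]

# Two-part code, Eq. (37): best index, plus ceil(log2(n+1)) bits to send it.
l_plugin = min(ceil(-log2_p_theta(x, t)) for t in range(n + 1)) + ceil(log2(n + 1))

# Mixture, Eq. (38): uniform weights 1/(n+1) over all transition times.
l_mix = -log2(sum(2.0 ** log2_p_theta(x, t) for t in range(n + 1)) / (n + 1))

print(f"plug-in: {l_plugin} bits, mixture: {l_mix:.1f} bits")
```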

It is well known that mixtures out-compress the plug-in. However, in many cases they do so by only a small amount per parameter. For example, Baron et al. showed that the plug-in for i.i.d. sources loses approximately 1 bit per parameter w.r.t. the mixture.

References

[1] B.S. Clarke and A.R. Barron. Jeffreys' prior is asymptotically least favorable under entropy risk. J. Stat. Planning and Inference, 41(1):37–60, 1994.

[2] R. Krichevsky and V. Trofimov. The performance of universal encoding. IEEE Trans. Inf. Theory, 27(2):199–207, 1981.

[3] N. Merhav. On the minimum description length principle for sources with piecewise constant parameters. IEEE Trans. Inf. Theory, 39(6):1962–1967, 1993.

[4] N. Merhav and M. Feder. A strong version of the redundancy-capacity theorem of universal coding. IEEE Trans. Inf. Theory, 41(3):714–722, 1995.

[5] J. Rissanen. Universal coding, information, prediction, and estimation. IEEE Trans. Inf. Theory, 30(4):629–636, Jul. 1984.

[6] G.I. Shamir and N. Merhav. Low-complexity sequential lossless coding for piecewise-stationary memoryless sources. IEEE Trans. Inf. Theory, 45(5):1498–1519, 1999.

[7] F.M.J. Willems. Coding for a binary independent piecewise-identically-distributed source. IEEE Trans. Inf. Theory, 42(6):2210–2217, 1996.