How To Find The Optimal Data Compressio

Transcription

1 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY Optimal Lossless Data Compressio: No-Asymptotics ad Asymptotics Ioais Kotoyiais, Fellow, IEEE, ad Sergio Verdú, Fellow, IEEE Abstract This paper provides a extesive study of the behavior of the best achievable rate ad other related fudametal limits i variable-legth strictly lossless compressio. I the oasymptotic regime, the fudametal limits of fixed-to-variable lossless compressio with ad without prefix costraits are show to be tightly coupled. Several precise, quatitative bouds are derived, coectig the distributio of the optimal code legths to the source iformatio spectrum, ad a exact aalysis of the best achievable rate for arbitrary sources is give. Fie asymptotic results are proved for arbitrary ot ecessarily prefix compressors o geeral mixig sources. Noasymptotic, explicit Gaussia approximatio bouds are established for the best achievable rate o Markov sources. The source dispersio ad the source varetropy rate are defied ad characterized. Together with the etropy rate, the varetropy rate serves to tightly approximate the fudametal oasymptotic limits of fixed-to-variable compressio for all but very small block legths. Idex Terms Lossless data compressio, fixed-to-variable source codig, fixed-to-fixed source codig, etropy, fiite-block legth fudametal limits, cetral limit theorem, Markov sources, varetropy, miimal codig variace, source dispersio. I. FUNDAMENTAL LIMITS A. Asymptotics: Etropy Rate For a radom source X ={P X }, assumed for simplicity to take values i a fiite alphabet A, the miimum asymptotically achievable source codig rate bits per source sample is the etropy rate, H X = lim = lim H X Eı X X ], 2 where X = X, X 2,...,X ad the iformatio i bits of a radom variable Z with distributio P Z is defied as ı Z a = log 2 P Z a, 3 Mauscript received December, 202; revised November 3, 203; accepted November 5, 203. Date of publicatio November 4, 203; date of curret versio Jauary 5, 204. S. Verdú was supported i part by the Natioal Sciece Foudatio uder Grat CCF , i part by the Ceter for Sciece of Iformatio, ad i part by the NSF Sciece ad Techology Ceter uder Grat CCF I. Kotoyiais was supported by the Europea Uio ad Greek Natioal Fuds through the Operatioal Program Educatio ad Lifelog Learig of the Natioal Strategic Referece Framework Research Fudig Program THALES: Ivestig i kowledge society through the Europea Social Fud. I. Kotoyiais is with the Departmet of Iformatics, Athes U of Eco ad Busiess, Athes 0675, Greece yiais@aueb.gr. S. Verdú is with the Departmet of Electrical Egieerig, Priceto Uiversity, Priceto, NJ USA verdu@priceto.edu. Commuicated by T. Weissma, Associate Editor for Shao Theory. Color versios of oe or more of the figures i this paper are available olie at Digital Object Idetifier 0.09/TIT The foregoig asymptotic fudametal limit holds uder the followig settigs: Almost-lossless -to-k fixed-legth data compressio: Provided that the source is statioary ad ergodic ad the ecodig failure probability does ot exceed 0 <ɛ<, the miimum achievable rate k is give by as. This is a direct cosequece of the Shao- MacMilla theorem 25]. Droppig the assumptio of statioarity/ergodicity, the fudametal limit is the limsup i probability of the ormalized iformatios 3]. 2 Strictly lossless variable-legth prefix data compressio: Provided that the limit i exists for which statioarity is sufficiet the miimal average source codig rate coverges to. This is a cosequece of the fact that for prefix codes the average ecoded legth caot be smaller tha the etropy 26], ad the miimal average ecoded legth achieved by the Huffma code, ever exceeds the etropy plus oe bit. If the limit i does ot exist, the the asymptotic miimal average source codig rate is simply the lim sup of the ormalized etropies 3]. For statioary ergodic sources, the source codig rate achieved by ay prefix code is asymptotically almost surely bouded below by the etropy rate as a result of Barro s lemma 2], a boud which is achieved by the Shao code. 3 Strictly lossless variable-legth data compressio: As we elaborate below, if o prefix costraits are imposed ad the source is statioary ad ergodic, the radom rate of the optimum code coverges i probability to. As oted i 40], 4], prefix costraits at the level of the whole ecoded file are superfluous i most applicatios. Stored files do ot rely o prefix costraits to determie the boudaries betwee files. Istead, a poiter directory cotais the startig ad edig locatios of the sequece of blocks occupied by each file i the storage medium e.g. 36]. It should be oted that otwithstadig his itroductio of the Shao code, i 34], Shao ever imposed prefix costraits, ad cosidered codig of the whole file rather tha symbol-by-symbol. B. Noasymptotics: Optimum Fixed-to-Variable Codes A fixed-to-variable compressor for strigs of legth from a alphabet A is a ijective fuctio f : A {0, } where {0, } ={, 0,, 00, 0, 0,, 000, 00,...}. 4 The legth of a strig a {0, } is deoted by la. Therefore, a block or strig, or file of symbols IEE. Persoal use is permitted, but republicatio/redistributio requires IEEE permissio. See for more iformatio.

2 778 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY 204 a = a, a 2,...,a A is losslessly compressed by f ito a biary strig whose legth is lf a bits. If the file X = X, X 2,...,X to be compressed is geerated by a probability law P X, a basic iformatiotheoretic object of study is the distributio of the rate of the optimal compressor, as a fuctio of the blocklegth ad the distributio P X. The best achievable compressio performace at fiite blocklegths is characterized by the followig fudametal limits: R,ɛ is the lowest rate R such that the compressio rate of the best code exceeds R bits/symbol with probability ot greater tha 0 ɛ<: mi Plf X > R] ɛ. 5 f 2 ɛ, k is the best achievable excess-rate probability, amely, the smallest possible probability that the compressed legth is greater tha or equal to k: ɛ, k = mi Plf X k]. 6 f 3 R,ɛ is the smallest blocklegth at which compressio at rate R is possible with probability at least ɛ; i other words, the miimum required for 5 to hold. 4 R is the miimal average compressio rate: R = mi Elf X ]. 7 f Naturally, the fudametal limits i, 2 ad 3 are equivalet i the sese that kowledge of oe of them as a fuctio of its parameters determies the other two. For example, R,ɛ = k iff ɛ, k + ɛ<ɛ, k. 8 The miima i the fudametal limits 5, 6, 7 are achieved by a optimal compressor f that assigs the elemets of A ordered i decreasig probabilities to the elemets i {0, } ordered lexicographically as i 4. I particular, R,ɛ is the smallest R such that Plf X > R] ɛ. 9 As for 4, we observe that 8 results i: R = Elf X ] 0 = = ɛ, k k= 0 R, x dx 2 We emphasize that the optimum code f is idepedet of the desig target, i that, e.g., it is the same regardless of whether we wat to miimize average legth or the probability that the ecoded legth exceeds KB or MB. I fact, the code f possesses the followig strog stochastic competitive optimality property over ay other compressor f : Plf X k] Plf X k], k 0. 3 A optimal compressor f must satisfy the followig costructive property. Property : Let k = log 2 + A.Fork =, 2,...,k, ay optimal code f assigs strigs of legth 0,, 2,...,k to each of the k = 2 k 4 most likely elemets of A.Iflog 2 + A is ot a iteger, the f assigs strigs of legth k to the A + 2 k least likely elemets of A. Accordig to Property, the legth of the logest codeword assiged by f is log 2 A. To check this, ote that if log 2 + A is a iteger, the the maximal legth is log 2 + A = log 2 A, otherwise, the maximal legth is log 2 + A = log 2 A. Note that Property is a ecessary ad sufficiet coditio for optimality but it does ot determie f uiquely: Not oly does it ot specify how to break ties amog probabilities but ay swap betwee two codewords of the same legth preserves optimality. As i the followig example, it is coveiet, however, to thik of f as the uique compressor costructed by breakig ties lexicographically ad by assigig the elemets of {0, } i the lexicographic order of 4. The, if a is the kth most probable outcome its ecoded versio has legth lf a = log 2 k 5 Example : Suppose = 4, A ={, }, ad the source is memoryless with PX = ]> PX = ]. The the followig compressor is optimal: f4 = f4 = 0 f4 = f4 = 00 f4 = 0 f4 = 0 f4 = f4 = 000 f4 = 00 f4 = 00 f4 = 0 f4 = 00 f4 = 0 f4 = 0 f4 = f4 = Although f is ot a prefix code, the decompressor is able to recover the source file a exactly from f a ad its kowledge of ad P X. Sice the whole source file is compressed, it is ot ecessary to impose a prefix coditio i order for the decompressor to kow where the compressed file starts ad eds. Oe of the goals of this paper is to aalyze how much compressed efficiecy ca be improved i strictly lossless source codig by droppig the the prefix-free costrait at the block level.

3 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 779 C. Noasymptotics: Optimum Fixed-to-Variable Prefix Codes I this subsectio we deal with the case whe the prefix coditio is imposed at the level of the whole ecoded file, which is more efficiet tha imposig it at the level of each symbol a well-studied setup which does ot exploit memory ad which we do ot deal with i this paper. The fixed-tovariable prefix code that miimizes the average legth is the Huffma code, achievig the average compressio rate R p which is strictly larger tha R, defied as i 7 but restrictig the miimizatio to prefix codes. Also of iterest is to ivestigate the optimum rate of the prefix code that miimizes the probability that the legth exceeds a give threshold ; if the miimizatio i 5 is carried out with respect to codes that satisfy the prefix coditio, the the correspodig fudametal limit is deoted by R p,ɛ, ad aalogously ɛ p, k for 6. Note that the optimum prefix code achievig those fudametal limits will, i geeral, deped o k. The followig ew result shows that the correspodig fudametal limits, with ad without the prefix coditio, are tightly coupled: Theorem : Suppose all elemets i A have positive probability. For all : For each iteger k : { ɛ, k k < log ɛ p, k + = 2 A 6 0 k log 2 A. 2 If A is ot a power of 2, the for 0 ɛ<: R p,ɛ= R,ɛ+. 7 If A is a power of 2, the 7 holds for ɛ mi a A P X a, while we have, R p,ɛ= R,ɛ= log 2 A +, 8 for 0 ɛ<mi a A P X a. Proof: : Fix k ad satisfyig 2 k < A. Sice there is o beefit i assigig shorter legths, ay Kraft-iequalitycompliat code f p that miimizes Plf X > k] assigs strigs of legth k to each of the 2 k largest masses of P X. Assigig to all the other elemets of A biary strigs of legth equal to l max = k + log 2 A 2 k + 9 guaratees that the Kraft sum is satisfied. O the other had, accordig to Property, the optimum code f without prefix costraits ecodes each of the 2 k largest masses of P X with strigs of legths ragig from 0 to k. Therefore, Plf p X k + ] =Plf X k], 20 or, equivaletly, ɛ p, k + = ɛ, k. If 2 k A, the a zero-error -to-k code exists, ad hece ɛ p, k + = 0. This was cosidered i 5] usig the miimizatio of the momet geeratig fuctio at a give argumet as a proxy see also 4]. 2: For brevity, let p mi = mi a A P X a. From Property it follows that if A is ot a power of 2, the while if A is a power of 2, the ɛ, log 2 A = 0, 2 ɛ, log 2 A + = 0 22 ɛ, log 2 A = p mi 23 O the other had, implies that: ɛ p, log 2 A + = 0 24 ɛ p, log 2 A = ɛ, log 2 A p mi. 25 Furthermore, R p, ca be obtaied from ɛ p, aalogously to 8: R p,ɛ= i iff ɛ p, i ɛ<ɛ p, i. 26 Together with 8 ad 6, 26 implies that 7 holds if ɛ p mi. If ɛ < p mi, the 2 25 result i 8 whe A is a powerof2.if A is ot a power of 2 ad 0 ɛ<p mi, the, R,ɛ = log 2 A 27 R p,ɛ = log 2 A +, 28 completig the proof. D. Noasymptotics: Optimum Fixed-to-Fixed Almost-Lossless Codes As poited out i 40], 4], the quatity ɛ, k is itimately related to the problem of almost-lossless fixed-tofixed data compressio. Assume the otrivial compressio regime i which 2 k < A. The optimal -to-k fixed-to-fixed compressor assigs a uique strig of legth k to each of the 2 k most likely elemets of A, ad assigs all the others to the remaiig biary strig of legth k, which sigals a ecoder failure. Thus, the source strigs that are decodable error-free by the optimal -to-k scheme are precisely those that are ecoded with legths ragig from 0 to k by the optimum variable-legth code Property. Therefore, ɛ, k, defied i 6 as a fudametal limit of strictly lossless variable-legth codes is, i fact, equal to the miimum error probability of a -to-k code. Accordigly, the results obtaied i this paper apply to the stadard paradigm of almost-lossless fixed-to-fixed compressio as well as to the setup of lossless fixed-to-variable compressio without prefixfree costraits at the block level. The case 2 k A is rather trivial: The miimal probability of ecodig failure for a -to-k code is 0, which agai coicides with ɛ, k, uless A = 2 k, i which case, as we saw i 23, ɛ, k = mi a A P X a. 29

4 780 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY 204 E. Existig Asymptotic Results Optimal Variable-Legth Codes: Based o the correspodece we just saw betwee optimal almost-lossless fixedto-fixed codes ad optimal strictly lossless fixed-to-variable codes, previous results o the asymptotics of fixed-to-fixed compressio ca be brought to bear. I particular the Shao- MacMilla theorem 25], 34] implies that for a statioary ergodic fiite-alphabet source with etropy rate H X, ad for all 0 <ɛ<, lim R,ɛ= H X. 30 Suppose X is geerated by a memoryless source with distributio, P X = P X P X P X, 3 where P X has etropy H X ad varetropy 2 σ 2 X = Varı X X. 32 For the expected legth, Szpakowski ad Verdú 38] show that the behavior of 7 for o-equiprobable sources is R = H X 2 log 2 + O, 33 whichisalsorefiedtoshowthatifı X X is o-lattice, 3 the, R = H X 2 log 2 8πeσ 2 X + o. 34 Yushkevich 44] showed that R,ɛ= H X + σx Q ɛ + o 35 a result that was refied for o-equiprobable memoryless sources such that ı X X is o-lattice by Strasse 37]: 4 R,ɛ = H X + σx Q ɛ 2 log 2 2πσ 2 Xe Q ɛ 2 + 6σ 2 X Q ɛ 2 + o 36 where Qx = 2π x e t 2 /2 dt deotes the stadard Gaussia tail fuctio, ad is the third cetered absolute momet of ı X X. Also i the cotext of memoryless sources with fiitealphabet A all whose letters have positive probabilitites, the expoetial decrease of the error probability is give by e.g. 9] the parametric expressio: lim log ɛ, R = DP X α X 37 R = H X α 38 2 σ 2 X is called miimal codig variace i 9]. 3 A discrete radom variable is lattice if all its masses are o a subset of some lattice {ν + kς ; k Z}. 4 See the discussio i Sectio V regardig Strasse s claimed proof of this result. where R H X, log A, α 0,, ad P Xα a = c Pα X a 39 with c = a A Pα X a. 2 Optimal Prefix Variable-Legth Codes: I cotrast to 33, whe a prefix-free coditio is imposed, we have the well-kow behavior for the average rate, R p = H X + O, 40 for ay source for which the limit i exists, see, e.g., 8]. It follows immediately from Theorem that the prefix-free coditio icurs o loss as far as the limit i 30 is cocered: lim R p,ɛ= H X. 4 Kotoyiais 9] gives a differet kid of Gaussia approximatio for the codelegths lf X of arbitrary prefix codes f o memoryless data X, showig that, with probability oe, lf X is evetually bouded below by a radom variable that has a approximately Gaussia distributio, lf X Z where Z D NHX, σ 2 X; 42 ad σ 2 X is the varetropy as i 32. Therefore, the codelegths lf X will have at least Gaussia fluctuatios of O ; this is further sharpeed i 9] to a correspodig law of the iterated logarithm, statig that, with probability oe, the compressed legths lf X will have fluctuatios of O l l, ifiitely ofte: With probability oe: lim sup lf X H X 2 l l σx. 43 Both results 42 ad 43 are show to hold for Markov sources as well as for a wide class of mixig sources with ifiite memory. As far as the large deviatios of the distributio of the legths, 27] shows for a class of sources with memory that either the prefix costrait or uiversality of the compressor degrade the optimum error expoet, which is i fact achieved by the Lempel-Ziv compressor. F. Outlie of Mai New Results Theorem shows that the prefix costrait oly icurs oe bit pealty as far as the distributio of the optimum rate is cocered. Note that requires the prefix code to be optimized for a give overflow probability. I cotrast, as we commeted i Sectios I-E. ad I-E.2, the pealty i average rate icurred by a hypothetical Huffma code that operated at the whole file level is larger. Sectio II gives a geeral aalysis of the distributio of the legths of the optimal lossless code for ay discrete iformatio source. We adopt a sigle-shot approach which ca be particularized to the covetioal case i which the source produces a strig of symbols of either determiistic or radom legth. I Theorems 3 ad 4 we give simple achievability ad coverse bouds, showig that the distributio fuctio of the optimal codelegths, P lf X t ], is itimately related to the distributio fuctio of the iformatio radom variable,

5 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 78 P ı X X t]. Also we observe that the optimal codelegths lf X are always bouded above by ı X X, ad we give a example where their distributios are oticeably differet. However, sigificat deviatios caot be too probable accordig to Theorem 5. Theorem 7 offers a exact, o-asymptotic expressio for the best achievable rate R,ɛ. A exact expressio for the average probability of error achieved by almost-lossless radom biig, is give i Theorem 8. Geeral o-asymptotic ad asymptotic results for the expected optimal legth, R, are obtaied i Sectio III. I Sectio IV we revisit the refied asymptotic results 42 ad 43 of 9], ad show that they remai valid for geeral ot ecessarily prefix compressors, ad for a broad class of possibly ifiite-memory sources. Sectio V examies i detail the fiite-blocklegth behavior of the fudametal limit R,ɛ for the case of memoryless sources. We prove tight, o-asymptotic ad easily computable bouds for R,ɛ. combiig the results of Theorems 7 ad 8 implies the followig approximatio: Gaussia approximatio I: For a memoryless source with etropy H X ad varetropy σ 2 X, the best achievable rate R,ɛ satisfies R,ɛ HX + σx Q ɛ 2 log 2 + O 44 I view of Theorem, the same holds for R p,ɛ i the case of prefix codes. The approximatio 44 is established by combiig the geeral results of Sectio II with the classical Berry-Essée boud 22], 30]. This approximatio is made precise i a oasymptotic way, ad all the costats ivolved are explicitly idetified. I Sectio VI, achievability ad coverse bouds Theorems 9 ad 20 are established for R,ɛ, i the case of geeral ergodic Markov sources. Those results are aalogous though slightly weaker to those i Sectio V. We also defie the varetropy rate of a arbitrary source as the limitig ormalized variace of the iformatio radom variables ı X X, ad we show that, for Markov chais, it plays the same role as the varetropy defied i 32 for memoryless sources. Those results i particular imply the followig: Gaussia approximatio II: For ay ergodic Markov source with etropy rate H X ad varetropy rate σ 2 X, the blocklegth R,ɛ required for the compressio rate to exceed + ηh X with probability o greater tha ɛ>0, satisfies, + ηh X, ɛ σ 2 X Q 2 ɛ H X η Fially, Sectio VII defies the source dispersio D as the limitig ormalized variace of the optimal codelegths. I effect, the dispersio gauges the time oe must wait for the source realizatio to become typical withi a give probability, as i 45 above, with D i place of σ 2 X. For a large class of sources icludig ergodic Markov chais of ay order, the dispersio D is show to equal the varetropy rate σ 2 X of the source. II. NON-ASYMPTOTICS FOR ARBITRARY SOURCES I this sectio we aalyze the best achievable compressio performace o a completely geeral discrete radom source. I particular, except where oted we do ot ecessarily assume that the alphabet is fiite ad we do ot exploit the fact that i the origial problem we are iterested i compressig ablockof symbols. I this way we eve ecompass the case where the source strig legth is a priori ukow at the decompressor. Thus, we cosider a give probability mass fuctio P X defied o a arbitrary fiite alphabet X, which may but is ot assumed to cosist of variable-legth strigs draw from some alphabet. The results ca the be particularized to the settig i Sectio I, lettig X A ad P X P X. Coversely, we ca simply let = isectioi to yield the settig i this sectio. The best achievable rate R,ɛ at blocklegth = is abbreviated as R ɛ = R,ɛ. By defiitio, R ɛ is the lowest R such that Plf X > R] ɛ, 46 which is equal to the quatile fuctio 5 of the iteger-valued radom variable lf X evaluated at ɛ. A. Achievability Boud Our goal is to express the distributio of the optimal codelegths lf X i terms of the distributio of ı X X.The first result is the followig simple ad powerful relatioship: Theorem 2: Assume that the elemets of X are itegervalued with decreasig probabilities: P X P X 2, ad that f is the optimal code that assigs sequetially ad lexicographically the shortest available strig. The, for all a X 6 : lf a ı X a 47 Proof: Fix i =, 2,... Sice there are i outcomes at least as likely as i we must have 7 P X i i. 48 as otherwise the sum of the probabilities would exceed. Takig log 2 of 48, results i where 5 follows as i 5. ı X i log 2 i 49 log 2 i 50 = lf i 5 5 The quatile fuctio Q: 0, ] R is the iverse of the cumulative distributio fuctio F. Specifically, Qα = mi{x : Fx = α} if the set is oempty; otherwise α lies withi a jump lim x xα Fx <α<fx α ad we defie Qα = x α. 6 A legacy of the Kraft iequality midset, the term ideal codelegth is sometimes used for ı X X, which is either a codelegth or ideal i view of 47. As we illustrate i Figure, the differece betwee both sides of 48 may be substatial. 7 This iequality was used by Wyer 43] to show that for a positive-itegervalued radom variable Z with decreasig probabilities Elog Z] H Z.

6 782 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY 204 Theorem 2 results i the followig achievability result: Theorem 3:40] For ay a 0, P lf X a ] P ı X X a]. 52 Proof: Sice either the actual choice of f or the labelig of the elemets of X ca affect either side of 52, we are free to choose those specified i Theorem 2. The, 52 follows immediately. We ote that either Theorem 2 or Theorem 3 require X to take values o a fiite alphabet. Theorem 3 is the startig poit for the achievability result for R,ɛ established for Markov sources i Theorem 9. B. Coverse Bouds I Theorem 4 we give a correspodig coverse result; cf. 40]. It will be used later to obtai sharp coverse bouds for R,ɛ for memoryless ad Markov sources, i Theorems 8 ad 20, respectively. Theorem 4: For ay oegative iteger k, { P ı X X k + τ] 2 τ } P lf X k ]. 53 sup τ>0 Proof: As i the proof of Theorem 3, we label the values take by X as the positive itegers i decreasig probabilities. Fix a arbitrary τ>0. Defie: L ={i X : P X i 2 k τ } 54 C ={, 2,...2 k }. 55 The, abbreviatig P X B = i B P X i, B X, P ı X X k + τ ] = P X L 56 = P X L C + P X L C c 57 P X L C + P X C c 58 2 k 2 k τ + P X C c 59 < 2 τ + P log 2 X k ] 60 = 2 τ + P lf X k ], 6 where 6 follows i view of 5. Next we give aother geeral coverse boud, similar to that of Theorem 4, where this time we directly compare the codelegths lfx of a arbitrary compressor with the values of the iformatio radom variable ı X X. Whereas from 47 we kow that lfx is always smaller tha ı X X, Theorem 5 says that it caot be much smaller with high probability. This is a atural aalog of the correspodig coverse established for prefix compressors i 2], ad stated as Theorem 6 below. Applied to a fiite-alphabet source, Theorem 5 is the key boud i the derivatio of all the poitwise asymptotic results of Sectio IV, amely, Theorems 2, 3 ad 4. It is also the mai techical igrediet of the proof of Theorem 23 i Sectio VII statig that the source dispersio is equal to its varetropy. Theorem 5: For ay compressor f ad ay τ>0: P lfx ı X X τ ] 2 τ log 2 X Proof: Lettig {A} deote the idicator fuctio of the evet A, the probability i 62 ca be bouded above as PlfX ı X X τ] = { P X x P X x 2 τ lfx} 63 x X 2 τ x X log 2 X 2 τ j=0 2 lfx 64 2 j 2 j, 65 where the sum i 64 is maximized if f assigs a strig of legth j + oly if it also assigs all strigs of legth j. Therefore, 65 holds because that code cotais all the strigs of legths 0,,..., log 2 X plus X 2 log 2 X + 2 log 2 X 66 strigs of legth log 2 X. We saw i Theorem that the optimum prefix code uder the criterio of miimum excess legth probability icurs a pealty of at most oe bit. The followig elemetary coverse is derived i 2], 9]; its short proof is icluded for completeess. Theorem 6: For ay prefix code f, adayτ>0: Proof: to 64, PlfX < ı X X τ] 2 τ. 67 We have, as i the proof of Theorem 5 leadig PlfX ı X X τ] 2 τ 2 lfx x X 68 2 τ, 69 where 69 is Kraft s iequality. C. Exact Fudametal Limit The followig result expresses the o-asymptotic data compressio fudametal limit R ɛ = R,ɛ as a parametric fuctio of the source iformatio spectrum. Theorem 7: For all a 0, the exact miimum rate compatible with give excess-legth probability satisfies, with R ɛ = log 2 + M2 a, 70 ɛ = Pı X X a], 7 where Mβ deotes the umber of masses with probability strictly larger tha β, ad which ca be expressed as: Mβ = β P ı X X <log 2 β ] β P ı X X log 2 t ] dt. 72 Proof: As above, the values take by X are labeled as the positive itegers i order of decreasig probability. By the defiitio of M, for ay positive iteger i, ada > 0, P X i 2 a + M2 a i, 73

7 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 783 Therefore, P log 2 X log 2 + M X 2 a ] = P ı X X a] 74 Moreover, because of 5, for all b 0, P log 2 X b ] = P lf X X b] 75 P log 2 X b ] 76 Particularizig 75 to b = log 2 + M X 2 a ad lettig ɛ< be give by 7 we see that if R = log 2 + M X 2 a, ad Plf X > R] =ɛ 77 Plf X > R] >ɛ 78 if R < log 2 + M X 2 a. Therefore, 70 holds. The proof of 72 follows a sequece of elemetary steps: Mβ = { P X x > } 79 β x X {PX X > β = E } ] 80 P X X {PX X > β = P } ] > t dt 8 0 P X X β = P 0 β < P X X < ] dt 82 t β ] = P 0 β < P X X P P X X ] dt 83 t = β P ı X X <log 2 β ] β P ı X X log 2 t ] dt. 84 While Theorem 7 gives R ɛ = R,ɛ exactly for those ɛ which correspod to values take by the complemetary cumulative distributio fuctio of the iformatio radom variable ı X X, a cotiuous sweep of a > 0 gives a very dese grid of values, uless X whose alphabet size typically grows expoetially with i the fixed-to-variable setup takes values i a very small alphabet. From the value of a we ca obtai the probability i the right side of 7. The optimum code achieves a excess probability ɛ = P lf ] X l a for legths equal to l a = a + log 2 2 a + 2 a M2 a, 85 where the secod term is egative ad represets the exact gai with respect to the iformatio spectrum of the source. For later use we observe that if we let M X + β be the umber of masses with probability larger or equal tha β, the,8 M X + β = { P X x } 86 β x X = E exp ı X X { ı X X log 2 β }] Where typographically coveiet we use expa = 2 a Fig.. Cumulative distributio fuctios of lf X ad ı X X whe X is the umber of tails obtaied i 0,000 fair coi flips. Figure shows the cumulative distributio fuctios of lf X ad ı X X whe X is a biomially distributed radom variable: the umber of tails obtaied i 0,000 fair coi flips. Therefore, ı X X rages from 0 4 log to 0 4 ad, H X = Elf X] =6.29, 89 where all figures are i bits. This example illustrates that i the o-asymptotic regime, the optimum codig legth may be substatially smaller tha the iformatio cf. Footote 6. D. Exact Behavior of Radom Biig The followig result gives a exact expressio for the performace of radom biig for arbitrary sources cf. 32] for the exact performace of radom codig i chael codig, as a fuctio of the cumulative distributio fuctio of the radom variable ı X X via 72. I biig, the compressor is o loger costraied to be a ijective mappig. Whe the label received by the decompressor ca be explaied by more tha oe source realizatio, it chooses the most likely oe, breakig ties arbitrarily. Theorem 8: Averagig uiformly over all biig compressors f: X {, 2,...N}, results i a expected error probability equal to J X J X l E X N l, 90 + l l=0 where M is give i 72, the umber of masses whose probability is equal to P X x is deoted by: Jx = PP X X = P X x], 9 P X x ad x = M P X x +J x 92 N Proof: For the purposes of the proof, it is coveiet to assume that ties are broke uiformly at radom amog the most likely source outcomes i the bi. To verify 90, ote that give that the source realizatio is x 0 : The umber of masses with probability strictly higher tha that of x 0 is M P X x 0.

8 784 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY Correct decompressio of x 0 requires that o x with P X x >P X x 0 be assiged to the same bi as x 0. This occurs with probability: N M P X x If, i additio to x 0, its bi cotais l masses with the same probability as x 0, correct decompressio occurs, assumig 2 is satisfied, with probability +l. 4 The probability that there are l masses with the same probability as x 0 i the same bi is equal to: Jx0 l N J x0 l N l. 94 The, 90 follows sice all the bi assigmets are idepedet. Theorem 8 leads to a achievability boud for both almost-lossless fixed-to-fixed compressio ad lossless fixedto-variable compressio. However, i view of the simplicity ad tightess of Theorem 3, the mai usefuless of Theorem 8 is to gauge the suboptimality of radom biig i the fiite i fact, rather short because of computatioal complexity blocklegth regime. III. MINIMAL EXPECTED LENGTH Recall the defiitio of the best achievable rate R i Sectio I, expressed i terms of f as i 0. A immediate cosequece of Theorem 3 is the boud, R = E lf X ] H X, 95 Ideed, by liftig the prefix coditio it is possible to beat the etropy o average as we saw i the asymptotic results 33 ad 34. Lower bouds o the miimal average legth as a fuctio of H X ca be foud i ], 3], 5], ], 23], 33], 38], 42]. A explicit expressio ca be obtaied easily by labelig the outcomes as the positive itegers with decreasig probabilities as i the proof of Theorem 3: Elf X] =E log 2 X ] 96 = P log 2 X k] 97 = k= PX 2 k ]. 98 k= I cotrast, the correspodig miimum legth for prefix codes ca be computed umerically, i the special case of fiite alphabets, usig the Huffma algorithm, but o exact expressio i terms of P X is kow. Example 2: The average umber of bits required to ecode at which flip of a fair coi the first tail appears is equal to PX 2 k ]= 99 k= = 2 2 j k= j=2 k 2 2k 00 k= , 0 sice, i this case, X is a geometric radom variable with PX = j] =2 j. I cotrast, imposig a prefix costrait disables ay compressio: The optimal prefix code cosists of all, possibly empty, strigs of 0s termiatedby, achievig a average legth of 2. Example 3: Suppose X M elemets: The miimal average legth is E lf X M ] = log 2 M + M which simplifies to is equiprobable o a set of M 2 + log 2 M 2 log 2 M + 02 E lf X M ] = M + log 2 M + 2, 03 M whe M + isapowerof2. 2 For large alphabet sizes M, { lim sup H X M E lf X M ]} = 2 04 M { lim if H X M E lf X M ]} M = + log 2 e log 2 log 2 e, 05 where the etropy is expressed i bits. Theorem 9: For ay source X with fiite etropy rate, H X = lim sup H X <, 06 the ormalized miimal average legth satisfies: lim sup R = H X. 07 Proof: The achievability upper boud i 07 holds i view of 95. I the reverse directio, we ivoke the boud ]: H X Elf X ] log 2 H X + + log 2 e. 08 Upo dividig both sides of 08 by ad takig lim sup the desired result follows, sice for ay δ>0, for all sufficietly large, H X HX + δ. I view of 40, we see that the pealty icurred o the average rate by the prefix coditio vaishes asymptotically i the very wide geerality allowed by Theorem 9. I fact, the same proof we used for Theorem 9 shows the followig result: Theorem 0: For ay ot ecessarily serial, i.e. A is ot ecessarily a Cartesia product source X ={P X A }, lim R H X = lim as log as H X diverges. Elf X ] H X =, 09 IV. POINTWISE ASYMPTOTICS A. Normalized Poitwise Redudacy Before turig to the precise evaluatio of the best achievable rate R,ɛ, i this sectio we examie the asymptotic behavior of the actual codelegths lf X of arbitrary compressors f. More specifically, we examie the differece

9 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 785 betwee the codelegth lf X ad the iformatio fuctio, a quatity sometimes called the poitwise redudacy. Theorem : For ay fiite-alphabet discrete source X ad ay positive diverget determiistic sequece κ such that log lim = 0, 0 κ the followig hold. a For ay sequece {f } of codes, with probability w.p.: lim if κ lf X ı X X 0. b The sequece of optimal codes {f } achieves w.p. : lim lf κ X ı X X = 0, 2 Proof: Part a: We ivoke the geeral coverse i Theorem 5, with X ad A i place of X ad X, respectively. Fixig arbitrary ɛ>0 ad lettig τ = τ = ɛκ, we obtai that P lf X ı X X ] ɛκ 2 log 2 ɛκ log 2 A +, 3 which is summable i. Therefore, the Borel-Catelli lemma implies that the lim sup of the evet o the left side of 3 has zero probability, or equivaletly, with probability, lf X ı X X ɛκ 4 is violated oly a fiite umber of times. Sice ɛ ca be chose arbitrarily small, follows. Part b follows from a ad 47. B. Statioary Ergodic Sources Theorem 9 states that for ay discrete process X the expected rate of the optimal codes f satisfy, lim sup Elf X ] =H X. 5 The ext result shows that if the source is statioary ad ergodic, the the same asymptotic relatio holds ot just i expectatio, but also with probability. Moreover, o compressor ca beat the etropy rate asymptotically with positive probability. The correspodig results for prefix codes were established i 2], 8], 9]. Theorem 2: Suppose that X is a statioary ergodic source with etropy rate H X. i For ay sequece {f } of codes, lim if lf X H X, w.p.. 6 ii The sequece of optimal codes {f } achieves, lim lf X = H X, w.p.. 7 Proof: The Shao-Macmilla-Breima theorem states that ı X X H X, w.p.. 8 Therefore, the result is a immediate cosequece of Theorem with κ =. C. Statioary Ergodic Markov Sources We assume that the source is a statioary ergodic firstorder Markov chai, with trasitio kerel, P X X x x x, x A 2, 9 o the fiite alphabet A. Further restrictig the source to be Markov eables us to aalyze more precisely the behavior of the iformatio radom variables ad, i particular, the zeromea radom variables, Z = ı X X H X, 20 will be see to be asymptotically ormal with variace give by the varetropy rate. This geeralizes the varetropy of a sigle radom variable defied i 32. Defiitio : The varetropy rate of a radom process X is: σ 2 X = lim sup Varı X X. 2 Remarks: If X is a statioary memoryless process each of whose letters X is distributed accordig to P X, the the varetropy rate of X is equal to the varetropy of X. The varetropy of X is zero if ad oly if it is equiprobable o its support. 2 I cotrast to the first momet, we do ot kow whether statioarity is sufficiet for lim sup = lim if i 2. The difficulty appears to be that, ulike ı X, ı 2 X does ot satisfy a chai rule. 3 While the etropy rate of a Markov chai admits a two-letter expressio, the varetropy does ot. I particular, if σ 2 a deotes the varetropy of the distributio P X X a, the the varetropy of the chai is, i geeral, ot give by Eσ 2 X 0 ]. The reaso is the ozero correlatio betwee the radom variables {ı Xk X k X k X k }. 4 The varetropy rate of Markov sources is typically ozero. For example, for a first order Markov chai it was observed i 20], 44] that σ 2 = 0 if ad oly if the source satisfies the followig determiistic equipartitio property: Every strig x + that starts ad eds with the same symbol, has probability give that X = x q, for some costat 0 q. The simplest otrivial example of the varetropy of a source with memory is the followig. Example 4: Cosider a biary symmetric homogeeous statioary Markov chai with PX 2 = 0 X = ] =PX 2 = X = 0] =α 22 The etropy i bits is H X = H X + H X 2 X 23 = + hα 24 Furthermore, it is easy to check that for k =, 2,... Elog 2 P Xk+ X k X k+ X k ] = α log 2 α + α log 2 α 25

10 786 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY 204 while for k = 2, 3,... Elog P Xk+ X k X k+ X k log P X2 X X 2 X ]=h 2 α 26 We ca the write ] E log 2 P X X 2 = E log P Xk+ X k X k+ X k 27 k= = + 2 hα + 2h 2 α + Elog 2 P X2 X X 2 X ] 28 = H 2 X + α α log 2 α 29 α Therefore, we coclude that Varı X X = α α log 2 α 30 α ad cosequetly σ 2 X = α α log 2 α 3 α which coicides with the varetropy of the Beroulli source with bias α. The followig result is kow see 3] ad the refereces therei. We iclude its proof for completeess ad for later referece. Theorem 3: Let X be a statioary ergodic fiite-state Markov chai. i The varetropy rate σ 2 X is also equal to the correspodig lim if of the ormalized variaces i 2, ad it is fiite. ii The ormalized iformatio radom variables are asymptotically ormal, i the sese that, as, ı X X H X N0,σ 2 X, 32 i distributio. iii The ormalized iformatio radom variables satisfy a correspodig law of the iterated logarithm: lim sup lim if ı X X H X 2 l l = σx, w.p. 33 ı X X H X 2 l l = σx, w.p. 34 Proof: i ad ii: Cosider the bivariate Markov chai { X = X, X + } o the alphabet B = {x, y A 2 : P X X y x >0} ad the fuctio f : B R defied by f x, y = ı X X y x. Sice X is statioary ad ergodic, so is { X }, hece, by the cetral limit theorem for fuctios of Markov chais 6], ı X X X X H X X = f X i E f X i ], 35 i= coverges i distributio to the zero-mea Gaussia law with fiite variace Furthermore, sice 2 = lim Varı X X X X. 36 ı X X H X = ı X X X X H X X + ı X X H X, 37 where the secod term is bouded, 32 must hold ad we must have 2 = σ 2 X. iii: Normalizig 35 by 2 l l istead of,we ca ivoke the law of the iterated logarithm for fuctios of Markov chais 6] to show that the lim sup / lim if of the sum behave as claimed. Sice, upo ormalizatio, the last term i 37 vaishes almost surely, ı X X H X must exhibit the same asymptotic behavior. Similarly, the followig result readily follows from Theorem 3 ad Theorem with κ = 2 l l. Theorem 4: Suppose X is a statioary ergodic Markov chai with etropy rate H X ad varetropy rate σ 2 X. The: i lf X HX N0,σ 2 X, 38 ii For ay sequece of codes {f }: lim sup lim if lf X H X 2 l l σx, w.p.; 39 lf X H X 2 l l σx, w.p.. 40 ii The sequece of optimal codes {f } achieves the bouds i 39 ad 40 with equality. As far as the poitwise ad 2 l l asymptotics the optimal codelegths exhibit the same behavior as the Shao prefix code ad arithmetic codig. However, the large deviatios behavior of the arithmetic ad Shao codes is cosiderably worse tha that of the optimal codes without prefix costraits. D. Beyod Markov The Markov sufficiet coditio i Theorem 3 eabled the applicatio of the cetral limit theorem ad the law of the iterated logarithm to the sum i 35. Accordig to Theorem 9. of 3] a more geeral sufficiet coditio is that X be a statioary process with αd = Od 336 ad γd = Od 48, where the mixig coefficiets αd ad γd are defied as: ı γd = max E a A X0 X a X ı X 0 X d a X d 4 αd = sup { PB A PBPA } 42 where the sup is over A F 0 ad B F d. Here F 0 ad F d deote the σ -algebras geerated by the collectios of radom variables X 0, X,... ad X d, X d+,..., respectively. The αd are the strog mixig

11 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 787 coefficiets 4] of X, adtheγd were itroduced by Ibragimov i 6]. Although these mixig coditios may be hard to verify i practice, they are fairly weak i that they require oly polyomial decay of αd ad γd. I particular, ay ergodic Markov chai of ay order satisfies these coditios. From the Lideberg-Feller theorem 7], aother sufficiet coditio for 32 to hold cosists i assumig that the lim sup i 2 is equal to the lim if ad is fiite, ad that for all η>0, where lim Z j = E Z j {Z j >ηv X } ] = 0 43 j= ı X j X j X j X j H X j X j Fig. 2. The optimum rate R, 0., the Gaussia approximatio R, 0. i 49, ad the upper boud R u, 0., for a Beroulli-0. source ad blocklegths V. GAUSSIAN APPROXIMATION FOR MEMORYLESS SOURCES We tur our attetio to the o-asymptotic behavior of the best rate R,ɛ that ca be achieved whe compressig a statioary memoryless source X with values i the fiite alphabet A ad margial distributio P X, whose etropy ad varetropy are deoted by H X ad σ 2 X, respectively. Specifically, we will derive explicit upper ad lower bouds o R,ɛ i terms of the first three momets of the iformatio radom variable ı X X. Although usig Theorem 7 it is possible, i priciple, to compute R,ɛ exactly, it is more desirable to derive approximatios that are both easier to compute ad offer more ituitio ito the behavior of the fudametal limit R,ɛ. Theorems 7 ad 8 imply that, for all ɛ 0, /2, the best achievable rate R,ɛ satisfies, c R,ɛ H X + σx Q ɛ log 2 ] c The upper boud is valid for all, ad the lower boud is valid for 0. Explicit values are derived for the costats 0, c ad c. I view of Theorem which states that miimizig the probability that the ecoded legth exceeds a give boud, the prefix costrait oly icurs oe bit pealty, essetially the same results as i 45 hold for prefix codes: c R p,ɛ H X + σx Q ɛ log 2 ] 2 c The bouds i 45 ad 46 result i the Gaussia approximatio 44 stated i Sectio I. Before establishig the precise o-asymptotic relatios leadig to 45 ad 46, we illustrate their utility. To facilitate this, ote that Theorem 3 immediately yields the followig simple boud: Theorem 5: For all, ɛ>0, R,ɛ R u,ɛ, 47 Fig. 3. The optimum rate R, 0. ad the Gaussia approximatio R, 0. i 49, for a Beroulli-0. source ad blocklegths where R u,ɛ is the quatile fuctio of the iformatio spectrum, i.e., the lowest R such that: ] P ı X X i R ɛ. 48 i= I Figure 2, we exhibit the behavior of the fudametal compressio limit R,ɛ i the case of coi flips with bias 0. = h 0.5 for which H X = 0.5 bits. I particular, we compare R,ɛ ad R u,ɛ for ɛ = 0.. The omootoic ature of both R,ɛ ad R u,ɛ with is ot surprisig: Although, the larger the value of, thelessweare at the mercy of the source radomess, we also eed to compress more iformatio. Figure 2 also illustrates that R,ɛ is tracked rather closely by the Gaussia approximatio, R,ɛ = H X + Q ɛ σx 2 log 2, 49 suggested by 45. Figure 3 focuses the compariso betwee R, 0. ad R, 0. o the short blocklegth rage up to 200 ot show i Figure 2. For > 60, the discrepacy betwee the two ever exceeds 4%. The remaider of the sectio is devoted to justifyig the use of 49 as a accurate approximatio to R,ɛ.Tothat ed, i Theorems 8 ad 7 we establish the bouds give i 45. Their derivatio requires that we overcome two techical hurdles:

12 788 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY 204 The distributio fuctio of the optimal ecodig legth is ot the same as the distributio of i= ı X X i ; 2 The distributio of i= ı X X i is oly approximately Gaussia. To cope with the secod hurdle we will appeal to the classical Berry-Essée boud 22], 30]: Theorem 6: Let {Z i } be idepedet ad idetically distributed radom variables with zero mea ad uit variace, ad let Z be stadard ormal. The, for all adaya: ] P Z i a P Z a ] E Z 3 ] i= Ivokig the Berry-Essée boud, Strasse 37] claimed the followig approximatio for > 9600, δ 6 R,ɛ R,ɛ 40 δ 8, 5 where, { } δ mi σx, ɛ, ɛ, μ / = E ı X X H X 3 ]. 53 Ufortuately, there appears to be a gap i how Strasse 37] justifies the applicatio of a asymptotic form of the Berry- Essée theorem with uiform covergece, i order to boud the error itroduced by takig expectatios with respect to Gaussia approximatios cf. equatios 2.7, 3.8 ad the displayed equatio betwee 3.5 ad 3.6 i 37]. The followig achievability result holds for all blocklegths. Theorem 7: For all 0 <ɛ 2 ad all, R,ɛ H X + σx Q ɛ log log log2 e 2 2πσ 2 X + μ 3 σ 3 X + σ 2 Xφ Q ɛ+ μ 54 3 σ 3 X, as log as the varetropy σ 2 X is strictly positive, where = Q ad φ = are the stadard Gaussia distributio fuctio ad desity, respectively, ad is the third absolute momet of ı X X defied i 53. Proof: The proof follows Strasse s costructio, but the essetial approximatio steps are differet. The positive costat β is uiquely defied by: P ı X X ] log 2 β ɛ, 55 P ı X X ] <log 2 β < ɛ. 56 Sice the iformatio spectrum i.e., the distributio fuctio of the iformatio radom variable ı X X is piecewise costat, log 2 β is the locatio of the jump where the iformatio spectrum reaches or exceeds for the first time the value ɛ. Furthermore, defiig the ormalized costat, λ = log 2 β HX, 57 σx the probability i the left side of 55 satisfies ı X X ] HX P λ σx λ + 2σ 3 X, 58 where we have applied Theorem 6. Aalogously 50 also holds for P Zi < a], we obtai, ı X X ] HX P <λ σx λ 2σ 3 X. 59 Sice ɛ is sadwiched betwee the right sides of 58 ad 59, as we must have λ λ, where, λ = ɛ = Q ɛ. 60 By a simple first-order Taylor boud, λ μ 3 λ + 2σ 3 X 6 = λ + 2σ 3 X ξ 62 = λ + 2σ 3 X φ ξ, 63 for some ξ λ, λ + 2σ 3 X ]. Sice ɛ /2, we have λ 0ad λ /2, so that ξ /2. Ad sice t is strictly icreasig for all t, while φ is strictly decreasig for t 0, from 63 we obtai, λ λ + 2σ 3 X φ λ σ 3 X. The evet E i the left side of 55 cotais all the high probability strigs, ad has probability ɛ. Its cardiality is M + X β, defied i 86 with X X. Therefore, deotig, we obtai, with ϕt = 2 t {t 0} 65 Y i = σx ı XX i H X, 66 R,ɛ log 2 M+ X β 67 = log 2 E exp ı X X { ı X X } ] log 2 β 68 σx = H X + λ + log 2 α, 69 α = E ϕlog 2 β ı X X ] 70 σx = E ϕ λ ] Y i, 7 where we have used 57, expa = 2 a,ad{y i } are idepedet, idetically distributed, with zero mea ad uit variace. i=

13 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 789 Let ᾱ be defied as 7 except that Y i are replaced by Ȳ i which are stadard ormal. The, straightforward algebra yields, ] ᾱ = E 2 σxλ Ȳ {Ȳ λ } = x λσx 2 x e 2σ 2 X 72 2 dx 0 2πσ 2 X 73 log 2 e 2πσ 2 X. 74 To deal with the fact that the radom variables i 7 are ot ormal, we apply the Lebesgue-Stieltjes formula for itegratio by parts to 7. Deotig the distributio of the ormalized sum i 7 by F t, α becomes, α = λ = F λ λ 2 σxλ t df t 75 F t σx2 σxλ t dt loge 2 76 =ᾱ + F λ λ σx log e 2 λ F t t2 σxλ t dt 77 ᾱ + 2σ 3 X + μ λ 3 2σ 2 2 σxλ t dt loge 2 78 X =ᾱ + σ 3 X 79 log2 e 2πσ 2 X + μ 3 σ 3, 80 X where 78 results from applyig Theorem 6 twice. The desired result ow follows from 69 after assemblig the bouds o λ ad α i 64 ad 80, respectively. Next we give a complemetary coverse result. Theorem 8: For all 0 <ɛ< 2 ad all such that > 0 = 4 + μ 2 3 2σ 3 X φq ɛq ɛ 2, 8 the followig lower boud holds, R,ɛ H X + σx Q ɛ log σ 3 X σ 2 XφQ ɛ, 82 as log as the varetropy σ 2 X is strictly positive. Proof: Let η = 2σ 2 X + σx φq, 83 ɛ ad cosider P ı X X i HX + σx ] Q ɛ η i= = P Q ɛ + i= ı X X i H X σx Q ɛ η σx Q ɛ η σx φq ɛ 2σ 3 X 2σ 3 X ] η σx = ɛ +, 87 where 85 follows from Theorem 6, ad 86 follows from Qa Qa + φqa, 88 which holds at least as log as a > > Lettig a = Q η ɛ ad = σx, 89 is equivalet to 8. We proceed to ivoke Theorem 4 with X X, k equal to times the right side of 82, ad τ = 2 log 2.Iviewof the defiitio of R,ɛ ad 84 87, the desired result follows. VI. GAUSSIAN APPROXIMATION FOR MARKOV SOURCES Let X be a irreducible, aperiodic, kth order Markov chai o the fiite alphabet A, with trasitio probabilities, P X X k x k+ x k, x k+ A k+, 90 ad etropy rate H X. Note that we do ot assume that the source is statioary. I Theorem 3 of Sectio IV we established that the varetropy rate defied i geeral i equatio 2, for statioary ergodic chais exists as the limit, σ 2 X = lim Varı X X. 9 A examiatio of the proof shows that, by a applicatio of the geeral cetral limit theorem for uiformly ergodic Markov chais 6], 28], the assumptio of statioarity is ot ecessary, ad 9 holds for all irreducible aperiodic chais. Theorem 9: Suppose X is a irreducible ad aperiodic kth order Markov source, ad let ɛ 0, /2. The, there is a positive costat C such that, for all large eough, R,ɛ HX + σx Q ɛ + C, 92 where the varetropy rate σ 2 X is give by 9 ad it is assumed to be strictly positive. Theorem 20: Uder the same assumptios as i Theorem 9, for all large eough, R,ɛ HX + σx Q ɛ 2 log 2 C, 93 where C > 0 is a fiite costat, possibly differet from tha i Theorem 9.

14 790 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY 204 Remarks: By defiitio, the lower boud i Theorem 20 also applies to R p,ɛ, while i view Theorem, the upper boud i Theorem 9 also applies to R p,ɛ provided C is replaced by C +. 2 Note that, ulike the direct ad coverse codig theorems for memoryless sources Theorems 7 ad 8, respectively i the results of Theorems 9 ad 20 we do ot give explicit bouds for the costat terms. This is because the mai probabilistic tool we use i the proofs the Berry-Essée boud i Theorem 6 does ot have a equally precise couterpart for Markov chais. Specifically, i the proof of Theorem 2 below we appeal to a Berry-Essée boud established by Nagaev i 29], which does ot give a explicit value for the multiplicative costat A i If we restrict our attetio to the much more arrow class of reversible chais, the it is ideed possible to apply the Berry-Essée boud of Ma 24] to obtai explicit values for the costats i Theorems 9 ad 20; but the resultig values are pretty loose, drastically limitig the egieerig usefuless of the resultig bouds. For example, i Ma s versio of the Berry-Essée boud, the correspodig right side of the iequality i Theorem 6 is multiplied by a factor of Therefore, we have opted for the less explicit but much more geeral statemets give above. 4 Similar commets to those i the last two remarks apply to the observatio that Theorem 9 is a weaker boud tha that established i Theorem 7 for memoryless sources, by a /2 log 2 term. Istead of restrictig our result to the much more arrow class of reversible Markov chais, or extedig the ivolved proof of Theorem 7 to the case of a Markovia source, we chose to illustrate how this weaker boud ca be established i full geerality, with a much shorter proof. 5 The proof of Theorem 9 shows that the costat i its statemet ca be chose as C = 2AσX φq ɛ, 94 for all 2A 2 πeφq ɛ 4, 95 where A is the costat appearig i Theorem 2, below. Similarly, from the proof of Theorem 20 we see that the costat i its statemet ca be chose as σxa + C = φq +, 96 ɛ for all, A + 2 Q ɛφq. 97 ɛ Note that, i both cases, the values of the costats ca easily be improved, but they still deped o the implicit costat A of Theorem 2. As metioed above, we will eed a Berry-Essée-type boud o the scaled iformatio radom variables, ı X X HX. σx Beyod the Shao-McMilla-Breima Theorem, several more refied asymptotic results have bee established for this sequece; see, i particular, 6], 3], 37], 44] ad the discussios i 20] ad i Sectio IV. Ulike these asymptotic results, we will use the followig o-asymptotic boud. Theorem 2: For a ergodic, kth order Markov source X with etropy rate H X ad positive varetropy rate σ 2 X, there exists a fiite costat A > 0 such that, for all, sup P ı X X HX>z σx ] Qz A. 98 z R Proof: For itegers i j, we adopt the otatio x j i ad X j i for blocks of strigs x i, x i+,...,x j ad radom variables X i, X i+,...,x j, respectively. For all x +k A +k such that P X k x k >0adP X X j x j x j j k >0, j k for j = k +, k + 2,... + k, wehave ı X x = log 2 = = k+ j=k+ P X k x k j=k+ P X X j j k log 2 P X X j j k x j x j j k log 2 P X k x k +k j=+ P X X j j k x j x j j k 99 x j x j j k 200 f x j+k +, 20 j= where the fuctio f : A k+ R is defied o the state space by ad A ={x k+ A k+ : P X X k x k+ x k >0}. 202 f x k+ = ı X X k x k+ x k 203 = log 2 P X X k x k+ x k, 204 = log 2 +k The we ca boud, j=+ P X X j j k P X k x k x j x j j k. 205 δ 206 = max log P X k x k ] 2 +k j=+ P X X k x j x j j k <, 207 where the maximum is over the positive probability strigs for which we have established 20.

15 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 79 Let {Y } deote the first-order Markov source defied by takig overlappig k + -blocks i the origial chai, Y = X, X +,...,X +k. 208 Sice X is irreducible ad aperiodic, so is {Y }. Now, sice the chai {Y } is irreducible ad aperiodic o a fiite state space, coditio 0.2 of 29] is satisfied, ad sice the fuctio f is bouded, Theorem of 29] implies that there exists a fiite costat A such that, for all, j= sup P f Y j HX σx z R ] > z where the etropy rate is H X = E f Ỹ ] ad σ 2 X = lim E j= Qz A, ] f Ỹ j HX, 20 where {Ỹ } is a statioary versio of {Y }, that is, it has the same trasitio probabilities but its iitial distributio is its uique ivariat distributio, PỸ = x k+ ]=πx k P X X k x k+ x k, 2 where π is the uique ivariat distributio of the origial chai X. Sice the fuctio f is bouded ad the distributio of the chai {Y } coverges to statioarity expoetially fast, it is easy to see that 20 coicides with the source varetropy rate. Let F z, G z deote the complemetary cumulative distributio fuctios F z = P ı X X HX >z σx ], 22 G z = P f Y j HX >z σx. 23 j= Sice F z ad G z are o-icreasig, 20 ad 207 imply that F z G z + δ σ 24 Q z + δ σ A, 25 Qz A, 26 uiformly i z, where 25 follows from 209, ad 26 holds with A = A +δ/ 2π sice Q z = φz is bouded by / 2π. A similar argumet shows F z G z δ/ 27 Qz δ/ + A 28 Qz + A. 29 Sice both 26 ad 29 hold uiformly i z R, together they form the statemet of the theorem. Proof of Theorem 9: Let K = HX + σ Q ɛ + C, 220 where we abbreviate σ = σx ad, for ow C is a arbitrary positive costat. Theorem 3 with X i place of X states Plf X K ] Pı X X K ] 22 Q Q ɛ + C σ + A, 222 where 222 follows from Theorem 2. Sice Q x = φx Q x = xφx, 2πe x 0, 224 a secod-order Taylor expasio of the first term i the right side of 222 gives Plf X K ] ɛ C σ { φq ɛ C 2σ 2πe Aσ }, 225 C ad choosig C as i 94 for satisfyig 95 the right side of 225 is bouded above by ɛ. Therefore, Plf X > K ] ɛ, which, by defiitio implies that R,ɛ K,as claimed. Proof of Theorem 20: Applyig Theorem 4 with X i place of X ad with δ>0adk arbitrary, we obtai Plf X K ] ] P ı X X K + δ 2 δ 226 K HX + δ Q σ A 2 δ, 227 where 227 ow follows from Theorem 2. Lettig δ = δ = 2 log 2 ad yields K = HX + σ Q ɛ δ σa + φq ɛ, 228 Plf X K ] Q Q A + ɛ φq ɛ A >ɛ 230 which holds as log as ɛ 0, /2 ad A + 2 Q ɛφq, 23 ɛ i order to esure that the argumet of the Q fuctio i 229 is oegative. Note that i 229 we have used a two-term Taylor expasio of Q based o Q ɛ > 0adQ x = φx. We coclude from 230 that R,ɛ > K, as claimed.

16 792 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY 204 VII. SOURCE DISPERSION AND VARENTROPY Traditioally, refied aalyses i lossless data compressio have focused attetio o the redudacy, defied as the differece betwee the miimum average compressio rate ad the etropy rate. As metioed i Sectio I-E, the persymbol redudacy is positive ad of order O whe the prefix coditio is eforced, while it is 2 log 2 + O without the prefix coditio. But sice the results i Sectios V ad VI show that the stadard deviatio of the best achievable compressio rate is of order O, the deviatio of the rate from the etropy will be domiated by these stochastic fluctuatios. Therefore, as oted i 9], it is of primary importace to aalyze the variace of the optimal codelegths. To that ed, we itroduce the followig operatioal defiitio: Defiitio 2: The dispersio D measured i bits 2 of a source X is D = lim sup Varlf X, 232 where lf is the legth of the optimum fixed-to-variable lossless code cf. Sectio I-B. As we show i Theorem 23 below, for a broad class of sources, the dispersio D is equal to the source varetropy rate σ 2 X defied i 2. Moreover, i view of the Gaussia approximatio bouds for R,ɛ i Sectios V ad VI ad more geerally, as log as a similar two-term Gaussia approximatio i terms of the etropy rate ad varetropy rate ca be established up to o/ accuracy we ca coclude the followig: By the defiitio of R,ɛ i Sectio I-B, the source blocklegth required for the compressio rate to exceed + ηh X with probability o greater tha ɛ>0is approximated by + ηh X, ɛ σ 2 X Q 2 ɛ H X η = D Q 2 ɛ H 2, 234 X η i.e., by the product of a factor that depeds oly o the source through σ 2 X/H 2 X, ad a factor that depeds oly o the desig requiremets ɛ ad η. Note the close parallel with the otio of chael dispersio itroduced i 32]. Example 5: Coi flips with bias p have varetropy, σ 2 X = p p log 2 p, 235 p so the key parameter i 234 which characterizes the time horizo required for the source to become typical is σ 2 X H 2 X = D p 2 log2 p H 2 = p p 236 X hp where h deotes the biary etropy fuctio i bits. I view of Example 4, the ormalized dispersio for the biary symmetric Markov chai with trasitio probability p is also give by 236. Example 6: For a memoryless source whose margial is the geometric distributio, P X k = q q k, k 0, 237 Fig. 4. sources. Normalized dispersio as a fuctio of etropy for memoryless the ratio of varetropy to squared etropy is σ 2 X H 2 X = D log2 q 2 H 2 = q. 238 X hq Figure 4 compares the ormalized dispersio to the etropy for the Beroulli, geometric ad Poisso distributios. We see that, as the source becomes more compressible lower etropy per letter, the horizo over which we eed to compress i order to squeeze most of the redudacy out of the source gets loger. Defiitio 3: A source X takig values o the fiite alphabet A is a liear iformatio growth source if the probability of every strig is either zero or is asymptotically lower bouded by a expoetial, that is, if there is a fiite costat A ad ad a iteger N 0 suchthat,forall N 0,every ozero-probability strig x A satisfies ı X x A. 239 Ay memoryless source belogs to the class of liear iformatio growth. Also ote that every irreducible ad aperiodic Markov chai is a liear iformatio growth source: Writig q for the smallest ozero elemet of the trasitio matrix, ad π for the smallest ozero probability for X, we easily see that 239 is satisfied with N 0 = 2, A = log 2 /q + log 2 q/π. The class of liear iformatio growth sources is related, at least at the level of ituitio, to the class of fiite-eergy processes cosidered by Shields 35] ad to processes satisfyig the Doebli-like coditio of Kotoyiais ad Suhov 2]. We proceed to show a iterestig regularity result for liear iformatio growth sources: Lemma : Suppose X is a ot ecessarily statioary or ergodic liear iformatio growth source. The: 2 ] lim E lf X ı X X = Proof: For brevity, deote l = lf X ad ı = ı X X, respectively. Select a arbitrary τ. The expectatio of iterest is El ı 2 ]=El ı 2 {l ı τ }] +El ı 2 {l < ı τ }]. 24

17 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 793 Sice l ı always, o the evet {l ı τ } we have l ı 2 τ 2. Also, by the liear iformatio growth assumptio we have the boud 0 ı l ı C for a fiite costat C ad all large eough. Combiig these two observatios with Theorem 5, we obtai that El ı 2 ] τ 2 + C2 2 P{l < ı τ } 242 τ 2 + C2 2 2 τ log 2 A τ 2 + C 3 2 τ, 244 for some C < ad all large eough. Takig τ = 3log 2, dividig by ad lettig gives the claimed result. Note that we have actually proved a stroger result, amely, 2 ] E lf X ı X X = Olog Liear iformatio growth is sufficiet for dispersio to equal varetropy rate: Theorem 22: If the source X has liear iformatio growth ad fiite varetropy rate, the: D = σ 2 X. 246 Proof: For otatioal coveiece, we abbreviate H X as H, ad as i the previous proof we write, l = lf X ad ı = ı X X. Expadig the defiitio of the variace of l, we obtai, Varlf X = E l El ] 2] ] = E l ı + ı H El ı ] 248 = El ı 2 ]+Eı H 2 ] E 2 l ı ] +2 El ı ı H ], 249 ad therefore, usig the Cauchy-Schwarz iequality twice, Varlf X Varı X X = El ı 2 ] E 2 l ı ] +2 El ı ı H ] El ı 2 ] +2 El ı 2 ]σı X X. 25 where σ idicates the stadard deviatio. Dividig by ad lettig, we obtai that the first term teds to zero by Lemma, ad the secod term becomes the square root of 4 El ı 2 ] Varı X X 252 which also teds to zero by Lemma ad the fiite-varetropy rate assumptio. Therefore, lim Varlf X Varı X X =0, 253 which, i particular, implies that σ 2 X = D. I view of 245, if we ormalize by log, istead of i the last step of the proof of Theorem 22, we obtai the stroger result: Varlf X Varı X X =O log Also, Lemma ad Theorem 22 remai valid if istead of the liear iformatio growth coditio we ivoke the weaker assumptio that there exists a sequece ɛ = o, such that max x : P X x =0 ı X x = o 2 ɛ / For Markov chais, the various results satisfied by varetropy ad dispersio are as follows. Theorem 23: Let X be a irreducible, aperiodic ot ecessarily statioary Markov source with etropy rate H X. The: The varetropy rate σ 2 X defied i 2 exists as the limit σ 2 X = lim Varı X X The dispersio D defied i 232 exists as the limit D = lim Varlf X D = σ 2 X. 4 The varetropy rate or, equivaletly, the dispersio ca be characterized i terms of the best achievable rate R,ɛ as σ 2 R,ɛ H X 2 X = lim lim ɛ 0 2l 258 ɛ = lim lim ɛ 0 as log as σ 2 X is ozero. R,ɛ H X 2 Q, 259 ɛ Proof: The limitig expressio i part was already established i Theorem 3 of Sectio IV; see also the discussio leadig to 9 i Sectio VI. Recallig that every irreducible ad aperiodic Markov source is a liear iformatio growth source, combiig part with Theorem 22 immediately yields the results of parts 2 ad 3. Fially, part 4 follows from the results of Sectio VI. Uder the preset assumptios, Theorems 9 ad 20 together imply that there is a fiite costat C such that R,ɛ H X σxq ɛ log 2 + C, for all ɛ 0, /2 ad all large eough. Therefore, lim R,ɛ H X 2 = σ 2 XQ ɛ Dividig by 2 l ɛ, lettig ɛ 0, ad recallig the simple fact that Q ɛ 2 2l ɛ see, e.g., 39, Sectio 3.3] proves 259 ad completes the proof of the theorem. From Theorem 22 it follows that, for a broad class of sources icludig all ergodic Markov chais with ozero varetropy rate, Var lf lim X Var ı X X =. 262 Aalogously to Theorem 0, we could explore whether 262 might hold uder broader coditios, icludig the geeral

18 794 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 2, FEBRUARY 204 settig of possibly o-serial sources. However, cosider the followig simple example. Example 7: As i Example 3, let X M be equiprobable o asetofm elemets, the, H X M = log 2 M 263 Var ı X M X M = lim sup Var lf X M = M 4 lim if Var lf X M = M To verify 265 ad 266, defie the fuctio, K sk = i 2 2 i 267 i= = K + 3 2K + K It is straightforward to check that ] E l 2 f X M = s log M 2 M log 2 M 2 2 log 2 M + M 269 Together with 02, 269 results i, Var lf X M = 3ξ M ξm 2 + o, 270 with ξ M = 2+ log 2 M, M 27 which takes values i, 2]. O that iterval, the parabola 3x x 2 takes a miimum value of 2 ad a maximum value of 3/2 2, ad 265, 266 follow. Although the ratio of optimal codelegth variace to the varetropy rate may be ifiity as illustrated i Example 7, we do have the followig couterpart of the first-momet result i Theorem 0 for the secod momets: Theorem 24: For ay ot ecessarily serial source X ={P X }, lim El2 f X ] ] =, 272 E ı 2 X X as log as the deomiator diverges. Proof: Theorem 3 implies that ] El 2 f X ] E ı 2 X X. 273 Therefore, the lim sup i 272 is bouded above by. To establish the correspodig lower boud, fix a arbitrary ϑ>0. The, El 2 f X ] = ] P l 2 f X k 274 k = P l ] k 275 k = ] P l k 276 k k P ı X X + ϑ ] ] k 2 ϑ k, 277 where 277 follows by lettig τ = ϑ k i the coverse Theorem 4. Therefore, El 2 f X ] C ϑ + P ı 2 X X + ϑ 2 k 2] 278 k D ϑ + k P ı 2 X ] X + ϑ 3 k 279 D ϑ + + ϑ 3 Eı 2 X X ]. 280 where C ϑ, D ϑ are positive scalars that oly vary with ϑ. Note that 278 holds because a k is summable for all 0 < a < ; 279 holds because + ϑk k 2 for all sufficietly large k; ad 280 holds because the mea of a oegative radom variable is the itegral of the complemetary cumulative distributio fuctio, which i tur satisfies k+ k Fx dx Fk +, 28 Dividig both sides of by the secod momet Eı 2 X X ] ad lettig, we coclude that the ratio i 272 is bouded below by + ϑ 3.Siceϑ ca be take to be arbitrarily small, this proves that the lim if 272 is bouded below by, as required. REFERENCES ] N. Alo ad A. Orlitsky, A lower boud o the expected legth of oeto-oe codes, IEEE Tras. If. Theory, vol. 40, o. 5, pp , Sep ] A. R. Barro, Logically smooth desity estimatio, Ph.D. dissertatio, Dept. Electr. Eg., Staford Uiv., Staford, CA, USA, Sep ] C. Bludo ad R. de Prisco, New bouds o the expected legth of oe-to-oe codes, IEEE Tras. If. Theory, vol. 42, o., pp , Ja ] B. C. Bradley, Basic properties of strog mixig coditios, i Depedece i Probability ad Statistics, E. Wilel ad M. S. Taqqu, Eds. Bosto, MA, USA: Birkhäuser, 986, pp ] J. Cheg, T. K. Huag, ad C. Weidma, New bouds o the expected legth of optimal oe-to-oe codes, IEEE Tras. If. Theory, vol. 53, o. 5, pp , May ] K. L. Chug, Markov Chais with Statioary Trasitio Probabilities. New York, NY, USA: Spriger-Verlag, ] K. L. Chug, A Course i Probability Theory, 2d ed. New York, NY, USA: Academic, ] T. Cover ad J. Thomas, Elemets of Iformatio Theory, 2d ed. New York, NY, USA: Wiley, ] I. Csiszár ad J. Körer, Iformatio Theory: Codig Theorems for Discrete Memoryless Systems, 2d ed. Cambridge, U.K.: Cambridge Uiv. Press, 20. 0] A. Dembo ad I. Kotoyiais, Source codig, large deviatios, ad approximate patter matchig, IEEE Tras. If. Theory, vol. 48, o. 6, pp , Ju ] J. G. Duham, Optimal oiseless codig of radom variables Corresp., IEEE Tras. If. Theory, vol. 26, o. 3, p. 345, May ] P. Hall, Rates of Covergece i the Cetral Limit Theorem. Lodo, U.K.: Pitma, ] T. S. Ha ad S. Verdú, Approximatio theory of output statistics, IEEE Tras. If. Theory, vol. 39, o. 3, pp , May ] T. C. Hu, D. J. Kleitma, ad J. K. Tamaki, Biary trees optimum uder various criteria, SIAM J. Appl. Math., vol. 37, o. 2, pp , ] P. Humblet, Geeralizatio of Huffma codig to miimize the probability of buffer overflow Corresp., IEEE Tras. If. Theory, vol. 27, o. 2, pp , Mar ] I. A. Ibragimov, Some limit theorems for statioary processes, Theory Probab. Appl., vol. 7, o. 4, pp , 962.

19 KONTOYIANNIS AND VERDÚ: OPTIMAL LOSSLESS DATA COMPRESSION 795 7] M. Hayashi, Secod-order asymptotics i fixed-legth source codig ad itrisic radomess, IEEE Tras. If. Theory, vol. 54, o. 0, pp , Oct ] J. C. Kieffer, Sample coverses i source codig theory, IEEE Tras. If. Theory, vol. 37, o. 2, pp , Mar ] I. Kotoyiais, Secod-order oiseless source codig theorems, IEEE Tras. If. Theory, vol. 43, o. 3, pp , Jul ] I. Kotoyiais, Asymptotic recurrece ad waitig times for statioary processes, J. Theoretical Probab., vol., o. 3, pp , ] I. Kotoyiais ad Y. M. Suhov, Prefixes ad the etropy rate for log-rage sources, Probability Statistics ad Optimizatio, F. P. Kelly, Ed. New York, NY, USA: Wiley, ] V. Yu. Korolev ad I. G. Shevtsova, O the upper boud for the absolute costat i the Berry Essée iequality, Theory Probab. Appl., vol. 54, o. 4, pp , ] S. K. Leug-Ya-Cheog ad T. Cover, Some equivaleces betwee Shao etropy ad Kolmogorov complexity, IEEE Tras. If. Theory, vol. 24, o. 3, pp , May ] B. Ma, Berry Essée cetral limit theorems for Markov chais, Ph.D. dissertatio, Dept. Math., Harvard Uiv., Cambridge, MA, USA, ] B. McMilla, The basic theorems of iformatio theory, A. Math. Statist., vol. 24, pp , Ju ] B. McMilla, Two iequalities implied by uique decipherability, IRE Tras. If. Theory, vol. 2, o. 4, pp. 5 6, Dec ] N. Merhav, Uiversal codig with miimum probability of codeword legth overflow, IEEE Tras. If. Theory, vol. 37, o. 3, pp , May ] S. P. Mey ad R. L. Tweedie, Markov Chais ad Stochastic Stability, 2d ed. Cambridge, U.K.: Cambridge Uiv. Press, ] S. V. Nagaev, More exact limit theorems for homogeeous Markov chais, Theory Probab. Appl., vol. 6, o., pp. 62 8, ] V. V. Petrov, Limit Theorems of Probability Theory: Sequeces of Idepedet Radom Variables. Oxford, U.K.: Oxford Sciece, ] W. Philipp ad W. Stout, Almost Sure Ivariace Priciples for Partial Sums of Weakly Depedet Radom Variables. Providece, RI, USA: AMS, ] Y. Polyaskiy, H. V. Poor, ad S. Verdú, Chael codig rate i the fiite blocklegth regime, IEEE Tras. If. Theory, vol. 56, o. 5, pp , May ] J. Rissae, Tight lower bouds for optimum code legth, IEEE Tras. If. Theory, vol. 28, o. 2, pp , Mar ] C. E. Shao, A mathematical theory of commuicatio, Bell Syst. Tech. J., vol. 27, o. 4, pp , ] P. C. Shields, The Ergodic Theory of Discrete Sample Paths Graduate Studies i Mathematics. Providece, RI, USA: AMS, ] G. Somasudaram ad A. Shrivastava, Iformatio Storage ad Maagemet: Storig, Maagig, ad Protectig Digital Iformatio i Classic, Virtualized, ad Cloud Eviromets. New York, NY, USA: Wiley, ] V. Strasse, Asymptotische abschäzuge i Shaos iformatiostheorie, i Proc. Tras. Third Prague Cof. If. Theory, Statist., Decisio Fuct., Radom Process., 964, pp ] W. Szpakowski ad S. Verdú, Miimum expected legth of fixed-tovariable lossless compressio without prefix costraits, IEEE Tras. If. Theory, vol. 57, o. 7, pp , Jul ] S. Verdú, Multiuser Detectio. Cambridge, U.K.: Cambridge Uiv. Press, ] S. Verdú, teachig it, preseted at the IEEE It. Symp. If. Theory, Nice, Frace, Ju ] S. Verdú, Teachig lossless data compressio, IEEE If. Theory Soc. Newsletter, vol. 6, o., pp. 8 9, Apr ] E. I. Verriest, A achievable boud for optimal oiseless codig of a radom variable, IEEE Tras. If. Theory, vol. 32, o. 4, pp , Jul ] A. D. Wyer, A upper boud o the etropy series, If. Cotrol, vol. 20, o. 2, pp. 76 8, ] A. A. Yushkevich, O limit theorems coected with the cocept of etropy of Markov chais, Uspekhi Mat. Nauk, vol. 8, o. 557, pp , 953. Ioais Kotoyiais was bor i Athes, Greece, i 972. He received the B.Sc. degree i mathematics i 992 from Imperial College Uiversity of Lodo, U.K., ad i 993 he obtaied a distictio i Part III of the Cambridge Uiversity Pure Mathematics Tripos. He received the M.S. degree i statistics ad the Ph.D. degree i electrical egieerig, both from Staford Uiversity, Staford, CA, i 997 ad 998, respectively. From Jue 998 to August 200, he was a Assistat Professor with the Departmet of Statistics, Purdue Uiversity, West Lafayette, IN ad also, by courtesy, with the Departmet of Mathematics, ad the School of Electrical ad Computer Egieerig. From August 2000 util July 2005, he was a Assistat, the Associate Professor, with the Divisio of Applied Mathematics ad with the Departmet of Computer Sciece, Brow Uiversity, Providece, RI. Sice March 2005, he has bee with the Departmet of Iformatics, Athes Uiversity of Ecoomics ad Busiess, where he is curretly a Professor. Dr. Kotoyiais was awarded the Maig edowed Assistat Professorship i 2002, ad was awarded a hoorary Master of Arts degree Ad Eudem, i 2005, both by Brow Uiversity. I 2004, he was awarded a Sloa Foudatio Research Fellowship. He has served two terms as a Associate Editor for the IEEE TRANSACTIONS ON INFORMATION THEORY. His research iterests iclude data compressio, applied probability, iformatio theory, statistics, simulatio, ad mathematical biology. Sergio Verdú S 80 M 84 SM 88 F 93 is o the faculty of the School of Egieerig ad Applied Sciece at Priceto Uiversity. A member of the Natioal Academy of Egieerig, Verdú is the recipiet of the 2007 Claude E. Shao Award ad of the 2008 IEEE Richard W. Hammig Medal.