Lec 03 Entropy and Coding II: Huffman and Golomb Coding


Outline (CS/EE 559 / ENG 4 Special Topics, Class Ids: 784, 785, 783)
Lecture ReCap; Huffman Coding; Golomb Coding and JPEG Lossless Coding
Lec 03 Entropy and Coding II: Huffman and Golomb Coding. Zhu Li, Multimedia Communication.

Entropy (recap)
Self-information of an event X = x_k: I(x_k) = -log Pr(X = x_k) = -log p(x_k)
Entropy of a source: H(X) = -sum_k p(x_k) log p(x_k)
Conditional entropy and mutual information: I(X; Y) = H(X) - H(X|Y), and H(X, Y) = H(X) + H(Y|X)
Relative entropy
(Venn diagram: total area H(X, Y); regions H(X|Y), I(X; Y), H(Y|X) inside the circles H(X) and H(Y).)
Main application: context modeling.

Context Reduces Entropy: Example
Conditioning reduces entropy: H(x5) >= H(x5 | x4, x3, x2, x1), and H(x5) >= H(x5 | f(x4, x3, x2, x1)).
Context: the symbol stream a b c b c a b c b a b c b a ...
Context function: f(x4, x3, x2, x1) = sum(x4, x3, x2, x1); contexts are formed by the value of f.
(MATLAB demos: getentropy.m, lossless_coding.m)
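The following is a minimal C sketch (not the course's getentropy.m / lossless_coding.m MATLAB demos): it estimates H(X) and H(X | previous symbol) from the toy stream above, illustrating that conditioning on a context lowers the measured entropy. The sequence and the single-symbol context are assumptions for illustration.

    #include <stdio.h>
    #include <string.h>
    #include <math.h>

    int main(void) {
        const char *s = "abcbcabcbabcba";   /* toy symbol stream from the slide */
        int n = (int)strlen(s);
        double cnt[3] = {0}, joint[3][3] = {{0}};

        for (int i = 0; i < n; i++) cnt[s[i] - 'a']++;                 /* marginal counts   */
        for (int i = 1; i < n; i++) joint[s[i - 1] - 'a'][s[i] - 'a']++; /* (prev, cur) pairs */

        double H = 0.0;                      /* H(X) = -sum p log2 p */
        for (int k = 0; k < 3; k++) {
            double p = cnt[k] / n;
            if (p > 0) H -= p * log2(p);
        }

        double Hc = 0.0;                     /* H(X | prev) = -sum p(prev,x) log2 p(x | prev) */
        for (int j = 0; j < 3; j++) {
            double rowsum = 0;
            for (int k = 0; k < 3; k++) rowsum += joint[j][k];
            for (int k = 0; k < 3; k++)
                if (joint[j][k] > 0)
                    Hc -= (joint[j][k] / (n - 1)) * log2(joint[j][k] / rowsum);
        }
        printf("H(X) = %.3f bits/symbol, H(X | prev) = %.3f bits/symbol\n", H, Hc);
        return 0;
    }

Compile with -lm; the conditional estimate comes out well below the marginal one for this stream.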

Lossless Coding Outline / Prefix Coding
Prefix code: codewords sit on the leaves of a binary code tree (root node, internal nodes, leaf nodes); no codeword is a prefix of any other; simple encoding and decoding.
Outline: Lecture ReCap; Huffman Coding; Golomb Coding and JPEG Lossless Coding.
Kraft-McMillan Inequality: for a uniquely decodable code with codeword lengths l_1, l_2, ..., l_n, sum_k 2^(-l_k) <= 1.
Conversely, given a set of integer lengths {l_1, l_2, ..., l_n} that satisfies the inequality, we can always find a prefix code with codeword lengths l_1, l_2, ..., l_n.

Huffman Coding
A procedure to construct an optimal prefix code.
Result of David Huffman's term paper in 1951 when he was a PhD student at MIT (Shannon, Fano, Huffman; Huffman 1925-1999).

Huffman Code Design
Requirement: the source probability distribution (but not available in many cases).
Procedure:
1. Sort the probabilities of all source symbols in descending order.
2. Merge the last two into a new symbol and add their probabilities.
3. Repeat Step 2 until only one symbol (the root) is left.
4. Code assignment: traverse the tree from the root to each leaf node, assigning 0 to the top branch and 1 to the bottom branch.
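A minimal C sketch of this procedure, computing only the codeword lengths. The five probabilities are the example distribution used on the next slide; ties between equal probabilities are broken arbitrarily here, so individual lengths may differ from the slide's tree, but the average length is the same (Huffman is optimal either way).

    #include <stdio.h>

    #define N 5

    int main(void) {
        /* leaf probabilities (example distribution {0.2, 0.4, 0.2, 0.1, 0.1}) */
        double p[2 * N - 1] = {0.2, 0.4, 0.2, 0.1, 0.1};
        int parent[2 * N - 1];
        int merged[2 * N - 1] = {0};
        for (int i = 0; i < 2 * N - 1; i++) parent[i] = -1;

        /* Steps 1-3: repeatedly merge the two least probable unmerged nodes. */
        for (int next = N; next < 2 * N - 1; next++) {
            int a = -1, b = -1;
            for (int i = 0; i < next; i++) {
                if (merged[i]) continue;
                if (a < 0 || p[i] < p[a]) { b = a; a = i; }
                else if (b < 0 || p[i] < p[b]) { b = i; }
            }
            p[next] = p[a] + p[b];           /* new internal node */
            parent[a] = parent[b] = next;
            merged[a] = merged[b] = 1;
        }

        /* Step 4 (lengths only): a leaf's codeword length equals its depth in the tree. */
        double avg = 0.0;
        for (int i = 0; i < N; i++) {
            int len = 0;
            for (int j = i; parent[j] >= 0; j = parent[j]) len++;
            avg += p[i] * len;
            printf("symbol %d: p = %.2f, length = %d\n", i + 1, p[i], len);
        }
        printf("average length = %.3f bits/symbol\n", avg);
        return 0;
    }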

Example
Source alphabet A = {a1, a2, a3, a4, a5}
Probability distribution: {0.2, 0.4, 0.2, 0.1, 0.1}
Sort, then merge the two smallest (0.1 + 0.1 = 0.2); sort and merge again (0.2 + 0.2 = 0.4); continue (0.2 + 0.4 = 0.6, then 0.4 + 0.6 = 1.0); finally assign codes from the root. The most probable symbol a2 gets the shortest codeword; a4 and a5 get the longest.

Huffman code is prefix-free
All codewords are leaf nodes, so no code is a prefix of any other code (prefix-free).

Average Codeword Length vs Entropy
Source alphabet A = {a, b, c, d, e}
Probability distribution: {0.2, 0.4, 0.2, 0.1, 0.1}
Code (one valid Huffman assignment, lengths 2, 1, 3, 4, 4): {01, 1, 000, 0010, 0011}
Entropy: H(S) = -(0.2*log2(0.2)*2 + 0.4*log2(0.4) + 0.1*log2(0.1)*2) = 2.122 bits/symbol
Average Huffman codeword length: L = 0.2*2 + 0.4*1 + 0.2*3 + 0.1*4 + 0.1*4 = 2.2 bits/symbol
This verifies H(S) <= L < H(S) + 1.

Huffman Code is not unique
Two choices for each split: {0, 1} or {1, 0}.
Multiple ordering choices for tied probabilities: the merged 0.2 node can be combined with either of the remaining 0.2 symbols, giving different trees (the slide shows two orderings of a, b, c reaching the same 0.6 node) with the same average length.

Huffman Coding is Optimal
Assume the probabilities are ordered: p1 >= p2 >= ... >= pm.
Lemma: for any distribution, there exists an optimal prefix code that satisfies:
If p_j >= p_k, then l_j <= l_k; otherwise we could swap codewords to reduce the average length.
The two least probable letters have codewords of the same length; otherwise we could truncate the longer one without violating the prefix-free condition.
The two longest codewords differ only in the last bit and correspond to the two least probable symbols; otherwise we could rearrange to achieve this.
Proof skipped.

Canonical Huffman Code
The Huffman algorithm is needed only to compute the optimal codeword lengths; the optimal codewords for a given data set are not unique.
The canonical Huffman code is well structured: given the codeword lengths, we can find a canonical Huffman code.
Also known as slice code or alphabetic code.

Canonical Huffman Code: construction
Rules:
Assign 0 to the left branch and 1 to the right branch.
Build the tree from left to right in increasing order of depth.
Each leaf is placed at the first available position.
Example: codeword lengths 2, 2, 3, 3, 3, 4, 4. Verify that they satisfy the Kraft-McMillan inequality: sum_i 2^(-l_i) = 2/4 + 3/8 + 2/16 = 1.
(The slide contrasts a non-canonical tree with the canonical tree for these lengths.)

Canonical Huffman properties
The first code is a series of 0s.
Codes of the same length are consecutive.
If we pad zeros to the right so that all codewords have the same length, shorter codes have lower values than longer codes.
Coding from length level n to level n+1: C(n+1, first) = 2 * (C(n, last) + 1), i.e., append a 0 to the next available level-n code (first code of length n+1, last code of length n).
If we jump from length n to n+2 directly (e.g., lengths 1, 3, 3, 3, 4, 4): C(n+2, first) = 4 * (C(n, last) + 1).
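A small C sketch of the assignment rules above, assuming the codeword lengths are already sorted in increasing order: the first code is all zeros, codes of the same length are consecutive, and moving to a longer length left-shifts by the length difference.

    #include <stdio.h>

    int main(void) {
        int len[] = {2, 2, 3, 3, 3, 4, 4};   /* codeword lengths from the slide example */
        int n = sizeof(len) / sizeof(len[0]);
        unsigned code = 0;
        int prev = len[0];
        for (int i = 0; i < n; i++) {
            /* shifting by the length difference implements C(n+d, first) = 2^d * (C(n, last) + 1) */
            code <<= (len[i] - prev);
            prev = len[i];
            printf("length %d: ", len[i]);
            for (int b = len[i] - 1; b >= 0; b--) putchar('0' + (int)((code >> b) & 1));
            putchar('\n');
            code++;                          /* codes of the same length are consecutive */
        }
        return 0;
    }

For the lengths {2, 2, 3, 3, 3, 4, 4} this prints 00, 01, 100, 101, 110, 1110, 1111.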

Advantages of Canonical Huffman
1. Reduced memory requirement.
A non-canonical tree needs all codewords plus the lengths of all codewords: a lot of space for a large table.
A canonical tree only needs:
Min: shortest codeword length
Max: longest codeword length
Distribution: number of codewords in each level
Example: Min = 2, Max = 4, # in each level: 2, 3, 2.

Outline
Lecture ReCap; Huffman Coding; Golomb Coding.

Unary Code (Comma Code)
Encode a nonnegative integer n by n 1s and a 0 (or n 0s and a 1): n = 0, 1, 2, 3, ... maps to 0, 10, 110, 1110, ...
Is this code prefix-free? (Yes: every codeword ends in the terminating bit, so no codeword can begin another.)
When is this code optimal? When the probabilities are 1/2, 1/4, 1/8, 1/16, 1/32, ...: the dyadic Huffman code becomes the unary code in this case.
Implementation is very efficient: no codeword table to store, very simple.
Encoding:
UnaryEncode(n) {
  while (n > 0) {
    WriteBit(1);
    n--;
  }
  WriteBit(0);
}
Decoding:
UnaryDecode() {
  n = 0;
  while (ReadBit() == 1) {
    n++;
  }
  return n;
}

Golomb Code [Golomb, 1966]
A multi-resolution approach: divide all numbers into groups of equal size m; denote it Golomb(m) or Golomb-m.
Groups with smaller symbol values have shorter codes; symbols in the same group have codewords of similar lengths.
The codeword length grows much more slowly than in the unary code.
Groups: [0, m), [m, 2m), [2m, 3m), ..., up to the maximum value.
Codeword = group ID (unary code) followed by the index within the group (fixed-length code).

Golomb Code: quotient and remainder
Write n = qm + r with 0 <= r < m.
q: quotient, coded with the unary code.
r: remainder, coded with a fixed-length code: k bits if m = 2^k (e.g., m = 8: r coded as 000, 001, ..., 111).
If m is not a power of two (not desired): floor(log2 m) bits for the smaller remainders, ceil(log2 m) bits for the larger ones.
Example m = 5: remainders 0, 1, 2 use 2 bits (00, 01, 10); remainders 3, 4 use 3 bits (110, 111).

Golomb Code with m = 5 (Golomb-5)
(Table of n, q, r, and codeword for n = 0, 1, 2, ...: each codeword is q in unary followed by r in the truncated binary form above.)

Golomb vs Canonical Huffman
Arranged on a code tree, the Golomb codewords follow the canonical form: built from left to right, from short to long, each codeword taking the first valid spot.
A Golomb code is a canonical Huffman code, with additional structural properties.
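A minimal encoder sketch for Golomb(m) with general m, using the split described above (the function name golomb_encode and the print-characters bit output are illustrative choices, not from the slides; the unary convention of q ones followed by a zero is also an assumption).

    #include <stdio.h>

    static void put_bit(int bit) { putchar(bit ? '1' : '0'); }

    static void put_bits(unsigned v, int nbits) {
        for (int i = nbits - 1; i >= 0; i--) put_bit((v >> i) & 1);
    }

    static void golomb_encode(unsigned n, unsigned m) {
        unsigned q = n / m, r = n % m;
        for (unsigned i = 0; i < q; i++) put_bit(1);   /* quotient in unary */
        put_bit(0);

        int b = 0;
        while ((1u << b) < m) b++;                     /* b = ceil(log2 m) */
        unsigned cutoff = (1u << b) - m;               /* first 'cutoff' remainders get b-1 bits */
        if (r < cutoff) put_bits(r, b - 1);
        else            put_bits(r + cutoff, b);       /* remaining remainders get b bits */
    }

    int main(void) {
        for (unsigned n = 0; n < 12; n++) {            /* e.g. Golomb-5, as in the slide's table */
            printf("n=%2u  Golomb-5: ", n);
            golomb_encode(n, 5);
            putchar('\n');
        }
        return 0;
    }

For m = 5 the remainder codes come out as 00, 01, 10, 110, 111; for m a power of two the cutoff is zero and the remainder is always b bits, which is the Golomb-Rice case on the next slide.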

Golomb-Rice Code
A special Golomb code with m = 2^k: the remainder r is simply the k least significant bits of n.
Example: m = 8 (k = 3). The codeword is q = n >> 3 in unary, followed by the lower 3 bits of n.

Implementation
Remainder bits: RBits = 3 for m = 8.
Encoding:
GolombEncode(n, RBits) {
  q = n >> RBits;
  UnaryCode(q);
  WriteBits(n, RBits);   // output the lower RBits bits of n
}
Decoding:
GolombDecode(RBits) {
  q = UnaryDecode();
  n = (q << RBits) + ReadBits(RBits);
  return n;
}

Exponential Golomb Code (Exp-Golomb)
The Golomb code divides the alphabet into groups of equal size m.
In the Exp-Golomb code, the group size increases exponentially: 1, 2, 4, 8, ...
Codes still contain two parts: a unary code (group ID) followed by a fixed-length code (index within the group).
Proposed by Teuhola in 1978.

Decoding implementation
ExpGolombDecode() {
  GroupID = UnaryDecode();
  if (GroupID == 0) {
    return 0;
  } else {
    Base = (1 << GroupID) - 1;
    Index = ReadBits(GroupID);
    return Base + Index;
  }
}
(Table of n, codeword, and group ID omitted.)
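The slide gives only the Exp-Golomb decoder; below is a matching encoder sketch in C. The function name exp_golomb_encode and the character-printing bit output are illustrative, and the unary convention (ones terminated by a zero) is an assumption that simply has to agree with whatever UnaryDecode expects.

    #include <stdio.h>

    static void exp_golomb_encode(unsigned n) {
        unsigned g = 0;
        while (((1u << (g + 1)) - 1) <= n) g++;        /* group g: 2^g - 1 <= n < 2^(g+1) - 1 */
        for (unsigned i = 0; i < g; i++) putchar('1'); /* group ID in unary */
        putchar('0');
        unsigned index = n - ((1u << g) - 1);          /* index within the group (Base = 2^g - 1) */
        for (int b = (int)g - 1; b >= 0; b--) putchar('0' + (int)((index >> b) & 1));
    }

    int main(void) {
        for (unsigned n = 0; n < 9; n++) {
            printf("n=%u: ", n);
            exp_golomb_encode(n);
            putchar('\n');
        }
        return 0;
    }

Running the loop gives 0, 100, 101, 11000, 11001, ..., and feeding each codeword to the decoder above recovers n = Base + Index.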

Outline
Golomb code family: Unary Code, Golomb Code, Golomb-Rice Code, Exponential Golomb Code. Why Golomb code?

Geometric Distribution (GD)
Geometric distribution with parameter ρ: P(x) = ρ^x (1 - ρ), x >= 0, integer.
Probability of the number of failures before the first success in a series of independent Yes/No experiments (Bernoulli trials).
Unary code is the optimal prefix code for a geometric distribution with ρ <= 1/2:
ρ = 1/4: P(x) = 0.75, 0.19, 0.05, 0.01, 0.003, ... Huffman coding never needs to re-order, so it is equivalent to the unary code. The unary code is the optimal prefix code, but not efficient (average length >> entropy).
ρ = 3/4: P(x) = 0.25, 0.19, 0.14, 0.11, 0.08, ... Re-ordering is needed for the Huffman code; the unary code is not the optimal prefix code.
ρ = 1/2: expected length = entropy. The unary code is not only the optimal prefix code, but also optimal among all entropy coding (including arithmetic coding).

Geometric Distribution
The geometric distribution is very useful for image/video compression.
Example 1: run-length coding. Binary sequence with i.i.d. distribution and P(0) = ρ close to 1: the per-symbol entropy is much less than 1 bit, so any per-symbol prefix code (at least 1 bit/symbol) performs poorly.
Run-length coding is efficient to compress the data: code the number of consecutive 0s between two 1s.
Run-length representation of the example sequence: 5, 8, 4, ..., 6.
Probability distribution of the run length r: P(r = n) = ρ^n (1 - ρ), i.e., n 0s followed by a 1.
The run has a one-sided geometric distribution with parameter ρ.
(Plot: P(r) versus r.)
GD is the discrete analogy of the exponential distribution f(x) ~ e^(-λx).
The two-sided geometric distribution is the discrete analogy of the Laplacian distribution (also called the double exponential distribution), f(x) ~ e^(-λ|x|).
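A tiny sketch of the run-length step, on a hypothetical binary string (the input and the handling of a trailing run are assumptions): it emits the count of 0s before each 1, which is exactly the quantity modeled above and which could then be passed to a Golomb encoder such as the golomb_encode sketch earlier.

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *seq = "0001001000001101";   /* hypothetical binary input */
        size_t len = strlen(seq);
        int run = 0;
        for (size_t i = 0; i < len; i++) {
            if (seq[i] == '0') {
                run++;
            } else {
                printf("%d ", run);  /* run of 0s preceding this 1; Golomb-code it in a real codec */
                run = 0;
            }
        }
        /* a trailing run of 0s with no terminating 1 would need an end-of-data convention */
        printf("\n");
        return 0;
    }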

Why Golomb Code?
Significance of the Golomb code: for any geometric distribution (GD), the Golomb code is the optimal prefix code and is as close to the entropy as possible (among all prefix codes).
How to determine the Golomb parameter? How to apply it in a practical codec?

Geometric Distribution, Example 2
GD is also a good model for prediction errors: e(n) = x(n) - pred(x(1), ..., x(n-1)).
Most e(n) values are small and concentrated around 0, roughly p(n) ~ ρ^|n|: they can be modeled by a (two-sided) geometric distribution.
(Plot: p(n) versus n.)

Optimal Code for Geometric Distribution
Geometric distribution with parameter ρ: P(X = n) = ρ^n (1 - ρ).
The unary code is the optimal prefix code when ρ <= 1/2, and optimal among all entropy coding for ρ = 1/2.
How to design the optimal code when ρ > 1/2? Transform into a GD with parameter <= 1/2 (as close to 1/2 as possible). How? By grouping m events together.
Each x can be written as x = qm + r with 0 <= r < m. Then P(X = qm + r) = ρ^(qm + r)(1 - ρ) = (ρ^m)^q ρ^r (1 - ρ), so q has a geometric distribution with parameter ρ^m.
The unary code is optimal for q if ρ^m <= 1/2, with m the minimal such integer: m = ceil( log(1/2) / log ρ ).

Golomb Parameter Estimation (JK book: pp. 55)
Goal of the adaptive Golomb code: for the given data, find the best m such that ρ^m <= 1/2.
How to find ρ from the statistics of past data? E(X) = ρ/(1 - ρ), so ρ = E(X)/(1 + E(X)).
Then m = ceil( log(1/2) / log ρ ) = ceil( -1 / log2( E(X)/(1 + E(X)) ) ); with m = 2^k, k = ceil(log2 m).
Too costly to compute.
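A sketch of this exact rule in C, assuming a buffer of past samples (e.g., prediction residual magnitudes) is available; the sample values are made up for illustration.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* hypothetical past samples, assumed roughly geometric with mean > 0 */
        int x[] = {3, 0, 7, 1, 12, 4, 0, 9, 2, 5};
        int n = sizeof(x) / sizeof(x[0]);

        double mean = 0.0;
        for (int i = 0; i < n; i++) mean += x[i];
        mean /= n;

        double rho = mean / (1.0 + mean);       /* from E[X] = rho / (1 - rho) */
        int m = (int)ceil(-1.0 / log2(rho));    /* smallest m with rho^m <= 1/2 */
        if (m < 1) m = 1;
        printf("mean = %.2f, rho = %.3f, m = %d\n", mean, rho, m);
        return 0;
    }

For these samples the mean is 4.3, giving rho about 0.81 and m = 4; the faster approximation on the next slide avoids the per-update logarithms.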

Golomb Parameter Estimation (JK book: pp. 55)
A faster method: assume ρ is close to 1. Then ρ^m = (1 - (1 - ρ))^m is approximately 1 - m(1 - ρ), and requiring ρ^m <= 1/2 gives m >= 1/(2(1 - ρ)) = (1 + E(X))/2, i.e., m is approximately E(X)/2.
For a Golomb-Rice code with m = 2^k this gives k = max(0, ceil(log2(E(X)/2))), which needs only the running mean of the past data.

Summary
Huffman Coding
A prefix code that is optimal in average code length.
Canonical form reduces the memory needed to represent the code (only the codeword lengths and level counts are stored).
Widely used.
Golomb Coding
Suitable for coding prediction errors in images.
Optimal prefix code for geometrically distributed sources (choose m so that ρ^m is about 0.5).
Simple to encode and decode.
Many practical applications, e.g., JPEG lossless coding.

Q&A
