Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur


Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS

Lesson 3 Lossless Compression: Huffman Coding

Instructional Objectives
At the end of this lesson, the students should be able to:
1. Define and measure source entropy.
2. State Shannon's Coding theorem for noiseless channels.
3. Measure the coding efficiency of an encoding scheme.
4. State the basic principles of Huffman coding.
5. Assign Huffman codes to a set of symbols of known probabilities.
6. Encode a string of symbols into a Huffman coded bit stream.
7. Decode a Huffman coded bit stream.

3.0 Introduction
In lesson-2, we have learnt the basic difference between lossless and lossy image compression schemes. There are some applications, such as satellite image analysis, medical and business document archival, medical images for diagnosis etc., where loss may not be tolerable and lossless compression techniques are to be used. Some of the popular lossless image compression techniques being used are (a) Huffman coding, (b) Arithmetic coding, (c) Ziv-Lempel coding, (d) Bit-plane coding, (e) Run-length coding etc. In this lesson, we first begin with the information-theoretic basis of lossless coding and see how to measure the information content of a set of source symbols based on the probabilities of the symbols. We shall thereafter present Shannon's theorem, which provides the theoretical lower bound of bit-rate achievable through lossless compression techniques. The rest of the lesson is devoted to a detailed treatment of Huffman coding, which is one of the most popular lossless compression techniques adopted in multimedia standards.

3.1 Source entropy - a measure of information content
Generation of information is generally modeled as a random process that has a probability associated with it. If P(E) is the probability of an event, its information content I(E), also known as self-information, is measured as

    I(E) = log [1 / P(E)]    ...(3.1)

If P(E) = 1, that is, the event always occurs (like saying "The sun rises in the east"), then we obtain from the above that I(E) = 0, which means that there is no information associated with it. The base of the logarithm expresses the unit of information and if the base is 2, the unit is bits.

For other values m of the base, the information is expressed as m-ary units. Unless otherwise mentioned, we shall be using the base-2 system to measure information content.

Now, suppose that we have an alphabet of n symbols {a_i | i = 1, 2, ..., n} having probabilities of occurrence P(a_1), P(a_2), ..., P(a_n). If k is the number of source outputs generated, which is considered to be sufficiently large, then the average number of occurrences of symbol a_i is k P(a_i) and the average self-information obtained from the k outputs is given by

    -k Σ_{i=1}^{n} P(a_i) log P(a_i)

and the average information per source output for the source z is given by

    H(z) = -Σ_{i=1}^{n} P(a_i) log P(a_i)    ...(3.2)

The above quantity is defined as the entropy of the source and measures the uncertainty of the source. The relationship between uncertainty and entropy can be illustrated by a simple example of two symbols a_1 and a_2, having probabilities P(a_1) and P(a_2) respectively. Since the summation of the probabilities is equal to 1, P(a_2) = 1 - P(a_1), and using equation (3.2), we obtain

    H(z) = -P(a_1) log P(a_1) - [1 - P(a_1)] log [1 - P(a_1)]    ...(3.3)

If we plot H(z) versus P(a_1), we obtain the graph shown in Fig. 3.1.
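The behaviour of equation (3.3) is easy to check numerically. The following is a minimal Python sketch (the helper name binary_entropy is ours, purely for illustration) that evaluates it at a few values of P(a_1):

import math

def binary_entropy(p):
    # H(z) = -p log2(p) - (1 - p) log2(1 - p); by convention H(z) = 0 at p = 0 or p = 1
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(binary_entropy(p), 3))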

It is interesting to note that the entropy is equal to zero for P(a_1) = 0 and for P(a_1) = 1. These correspond to the cases where at least one of the two symbols is certain to be present. H(z) assumes its maximum value of 1 bit for P(a_1) = 1/2. This corresponds to the most uncertain case, where both symbols are equally probable.

3.1.1 Example: Measurement of source entropy
If the probabilities of the source symbols are known, the source entropy can be measured using equation (3.2). Say, we have five symbols a_1, a_2, ..., a_5 having the following probabilities:

    P(a_1) = 0.2, P(a_2) = 0.1, P(a_3) = 0.05, P(a_4) = 0.6, P(a_5) = 0.05

Using equation (3.2), the source entropy is given by

    H(z) = -0.2 log 0.2 - 0.1 log 0.1 - 0.05 log 0.05 - 0.6 log 0.6 - 0.05 log 0.05 bits = 1.67 bits

3.2 Shannon's Coding Theorem for noiseless channels
We are now going to present a very important theorem by Shannon, which expresses the lower limit of the average code word length of a source in terms of its entropy. Stated formally, the theorem says that in any coding scheme, the average code word length of a source of symbols can at best be equal to the source entropy and can never be less than it. The theorem assumes the coding to be lossless and the channel to be noiseless.
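The figure of 1.67 bits can be cross-checked with a few lines of Python (an illustrative sketch of equation (3.2), not a prescribed implementation):

import math

probs = {"a1": 0.2, "a2": 0.1, "a3": 0.05, "a4": 0.6, "a5": 0.05}

# H(z) = -sum of P(a_i) log2 P(a_i) over all symbols
H = -sum(p * math.log2(p) for p in probs.values())
print(round(H, 2))   # 1.67 bits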

If m(z) is the minimum of the average code word lengths obtained out of different uniquely decipherable coding schemes, then as per Shannon's theorem, we can state that

    m(z) ≥ H(z)    ...(3.4)

3.3 Coding efficiency
The coding efficiency (η) of an encoding scheme is expressed as the ratio of the source entropy H(z) to the average code word length L(z) and is given by

    η = H(z) / L(z)    ...(3.5)

Since L(z) ≥ H(z) according to Shannon's Coding theorem, and both L(z) and H(z) are positive,

    0 ≤ η ≤ 1    ...(3.6)

3.4 Basic principles of Huffman Coding
Huffman coding is a popular lossless Variable Length Coding (VLC) scheme (Section-2.4.3), based on the following principles:
(a) Shorter code words are assigned to more probable symbols and longer code words are assigned to less probable symbols.
(b) No code word of a symbol is a prefix of another code word. This makes Huffman coding uniquely decodable.
(c) Every source symbol must have a unique code word assigned to it.
In image compression systems (Section-2.4), Huffman coding is performed on the quantized symbols. Quite often, Huffman coding is used in conjunction with other lossless coding schemes, such as run-length coding, to be discussed in lesson-4. In terms of Shannon's noiseless coding theorem, Huffman coding is optimal for a fixed alphabet size, subject to the constraint that the source symbols are coded one at a time.
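Both the prefix condition of principle (b) and the coding efficiency of equation (3.5) can be checked mechanically. A minimal Python sketch follows (the helper names are ours; the code table used here anticipates the one derived in Section-3.5):

import math

def is_prefix_free(codes):
    # Principle (b): no code word may be a prefix of another code word.
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def coding_efficiency(probs, codes):
    # eta = H(z) / L(z), from equations (3.2) and (3.5)
    H = -sum(p * math.log2(p) for p in probs.values())
    L = sum(probs[s] * len(codes[s]) for s in probs)
    return H / L

probs = {"a1": 0.2, "a2": 0.1, "a3": 0.05, "a4": 0.6, "a5": 0.05}
codes = {"a4": "0", "a1": "10", "a2": "110", "a3": "1110", "a5": "1111"}
print(is_prefix_free(codes))                       # True
print(round(coding_efficiency(probs, codes), 2))   # 0.98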

3.5 Assigning Binary Huffman codes to a set of symbols
We shall now discuss how Huffman codes are assigned to a set of source symbols of known probability. If the probabilities are not known a priori, they should be estimated from a sufficiently large set of samples. The code assignment is based on a series of source reductions and we shall illustrate this with reference to the example shown in Section-3.1.1. The steps are as follows:

Step-1: Arrange the symbols in decreasing order of their probabilities.

    Symbol    Probability
    a_4       0.6
    a_1       0.2
    a_2       0.1
    a_3       0.05
    a_5       0.05

Step-2: Combine the lowest probability symbols into a single compound symbol that replaces them in the next source reduction.

    Symbol       Probability
    a_4          P(a_4) = 0.6
    a_1          P(a_1) = 0.2
    a_2          P(a_2) = 0.1
    a_3 V a_5    P(a_3) + P(a_5) = 0.1

In this example, a_3 and a_5 are combined into a compound symbol of probability 0.1.

Step-3: Continue the source reductions of Step-2, until we are left with only two symbols.

    Symbol               Probability
    a_4                  P(a_4) = 0.6
    a_1                  P(a_1) = 0.2
    a_2 V (a_3 V a_5)    P(a_2) + P(a_3) + P(a_5) = 0.2

    Symbol                      Probability
    a_4                         P(a_4) = 0.6
    a_1 V (a_2 V (a_3 V a_5))   P(a_1) + P(a_2) + P(a_3) + P(a_5) = 0.4

The second symbol in the last table indicates a compound symbol of probability 0.4. We are now in a position to assign codes to the symbols.

Step-4: Assign the codes 0 and 1 to the last two symbols.

    Symbol                      Probability   Assigned Code
    a_4                         0.6           0
    a_1 V (a_2 V (a_3 V a_5))   0.4           1

In this case, 0 is assigned to the symbol a_4 and 1 is assigned to the compound symbol a_1 V (a_2 V (a_3 V a_5)). All the elements within this compound symbol will therefore have 1 as a prefix.

Step-5: Work backwards along the tables to assign the codes to the elements of the compound symbols. Continue till codes are assigned to all the elementary symbols.

    Symbol               Probability   Assigned Code
    a_4                  0.6           0
    a_1                  0.2           10
    a_2 V (a_3 V a_5)    0.2           11

11 is therefore going to be the prefix of a_2, a_3 and a_5, since this is the code assigned to the compound symbol of these three.

    Symbol       Probability   Assigned Code
    a_4          0.6           0
    a_1          0.2           10
    a_2          0.1           110
    a_3 V a_5    0.1           111

    Symbol   Probability   Assigned Code
    a_4      0.6           0
    a_1      0.2           10
    a_2      0.1           110
    a_3      0.05          1110
    a_5      0.05          1111

This completes the Huffman code assignment pertaining to this example. From the last table, it is evident that the shortest code word (length = 1) is assigned to the most probable symbol a_4 and the longest code words (length = 4) are assigned to the two least probable symbols a_3 and a_5. Also, each symbol has a unique code and no code word is a prefix of the code word of another symbol. The coding has therefore fulfilled the basic requirements of Huffman coding, stated in Section-3.4.
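The successive source reductions of Steps 1-5 can also be carried out programmatically. The sketch below (illustrative only; it uses a priority queue rather than the hand-sorted tables above) reproduces the code word lengths of 1, 2, 3, 4 and 4 bits, although the exact 0/1 labels may differ because ties can be broken either way:

import heapq
import itertools

def huffman_codes(probs):
    # Repeatedly merge the two least probable (compound) symbols, as in Section-3.5.
    counter = itertools.count()          # tie-breaker for equal probabilities
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_lo, _, lo = heapq.heappop(heap)   # least probable group
        p_hi, _, hi = heapq.heappop(heap)   # next least probable group
        merged = {s: "0" + c for s, c in hi.items()}        # '0' for the group popped second
        merged.update({s: "1" + c for s, c in lo.items()})  # '1' for the least probable group
        heapq.heappush(heap, (p_lo + p_hi, next(counter), merged))
    return heap[0][2]

probs = {"a1": 0.2, "a2": 0.1, "a3": 0.05, "a4": 0.6, "a5": 0.05}
print(huffman_codes(probs))   # code lengths: a4 -> 1, a1 -> 2, a2 -> 3, a3 -> 4, a5 -> 4 bits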

For this example, we can compute the average code word length. If L(a_i) is the codeword length of symbol a_i, then the average codeword length is given by

    L(z) = Σ_{i=1}^{n} L(a_i) P(a_i) = 0.6 × 1 + 0.2 × 2 + 0.1 × 3 + 0.05 × 4 + 0.05 × 4 = 1.7 bits

The coding efficiency is given by

    η = H(z) / L(z) = 1.67 / 1.7 = 0.98

3.6 Encoding a string of symbols using Huffman codes
After obtaining the Huffman codes for each symbol, it is easy to construct the encoded bit stream for a string of symbols. For example, if we have to encode the string of symbols a_4 a_3 a_5 a_4 a_1 a_4 a_2, we start from the left, taking one symbol at a time. The code corresponding to the first symbol a_4 is 0, the second symbol a_3 has the code 1110, and so on. Proceeding as above, we obtain the encoded bit stream 0111011110100110. In this example, 16 bits were used to encode the string of 7 symbols. A straight binary encoding of 7 symbols, chosen from an alphabet of 5 symbols, would have required 21 bits (3 bits/symbol), and this encoding scheme therefore demonstrates substantial compression.

3.7 Decoding a Huffman coded bit stream
Since no codeword is a prefix of another codeword, Huffman codes are uniquely decodable. The decoding process is straightforward and can be summarized below:

Step-1: Examine the leftmost bit in the bit stream. If this corresponds to the codeword of an elementary symbol, add that symbol to the list of decoded symbols, remove the examined bit from the bit stream and go back to Step-1 until all the bits in the bit stream are considered. Else, follow Step-2.

Step-2: Append the next bit from the left to the already examined bit(s) and check whether the group of bits corresponds to the codeword of an elementary symbol. If yes, add that symbol to the list of decoded symbols, remove the examined bits from the bit stream and go back to Step-1 until all the bits in the bit stream are considered. Else, repeat Step-2 by appending more bits.
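The encoding of Section-3.6 and the two decoding steps above can be expressed compactly in Python. This is only an illustrative sketch (the helper names encode and decode are ours), using the code table derived in Section-3.5:

codes = {"a4": "0", "a1": "10", "a2": "110", "a3": "1110", "a5": "1111"}

def encode(symbols, codes):
    # Concatenate the code word of each symbol, taken from left to right (Section-3.6).
    return "".join(codes[s] for s in symbols)

def decode(bitstream, codes):
    # Steps 1 and 2 of Section-3.7: grow a group of bits until it matches a code word.
    inverse = {c: s for s, c in codes.items()}
    symbols, group = [], ""
    for bit in bitstream:
        group += bit
        if group in inverse:
            symbols.append(inverse[group])
            group = ""
    return symbols

bits = encode(["a4", "a3", "a5", "a4", "a1", "a4", "a2"], codes)
print(bits)                  # 0111011110100110 (16 bits)
print(decode(bits, codes))   # ['a4', 'a3', 'a5', 'a4', 'a1', 'a4', 'a2']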

In the encoded bit stream example of Section-3.6, if we receive the bit stream 0111011110100110 and follow the steps described above, we shall first decode a_4 ("0"), then a_3 ("1110"), followed by a_5 ("1111"), a_4 ("0"), a_1 ("10"), a_4 ("0") and a_2 ("110"). This is exactly what we had encoded.

3.8 Discussions and Further Reading
In this lesson, we have discussed Huffman Coding, which is one of the most popular lossless Variable Length Coding (VLC) schemes, named after the scientist who proposed it. The details of the coding scheme can be read from his original paper, listed in [1]. For a better understanding of the coding theory in the light of Shannon's theorem, the reader is referred to Shannon's original paper, listed in [2].

We have discussed how Huffman codes can be constructed from the probabilities of the symbols. The symbol probabilities can be obtained from the relative frequencies of occurrence of the symbols, and these essentially give a first order estimate of the source entropy. Better source entropy estimates can be obtained if we examine the relative frequencies of occurrence of groups of symbols, say by considering two consecutive symbols at a time. With reference to images, we can form pairs of gray level values, considering two consecutive pixels at a time, and thus form a second order estimate of the source entropy. In this case, Huffman codes will be assigned to the pairs of symbols, instead of to individual symbols. Although third, fourth and even higher order estimates would make better approximations of the source entropy, the convergence will be slow and excessive computations are involved.

We have also seen that Huffman code assignment is based on successive source reductions. For an n-symbol source, (n-2) source reductions must be performed. When n is large, as in the case of the gray level values of images, for which n = 256, we require 254 steps of reduction, which is excessively high. In such cases, Huffman coding is done only for a few symbols of higher probability, and for the remaining symbols a suitable prefix code, followed by a fixed length code, is adopted. This scheme is referred to as truncated Huffman coding. It is somewhat less optimal as compared to Huffman coding, but code assignment is much easier.

There are other variants of Huffman coding. In one of the variants, the source symbols, arranged in order of decreasing probabilities, are divided into a few blocks. Special shift up and/or shift down symbols are used to identify each block, and the symbols within a block are assigned Huffman codes. This encoding scheme is referred to as Shift Huffman Coding. The shift symbol is the most probable symbol and is assigned the shortest code word. Interested readers may refer to [3] for further discussions on Huffman coding variants.
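The first and second order entropy estimates mentioned above can be compared on any sample of source outputs. The rough Python sketch below (the symbol string is a made-up example, purely for illustration) estimates the per-symbol entropy from the relative frequencies of single symbols and of non-overlapping pairs:

from collections import Counter
import math

def entropy_per_symbol(blocks):
    # Relative frequencies of the blocks estimate the block entropy;
    # dividing by the block length gives an estimate per source symbol.
    counts = Counter(blocks)
    total = sum(counts.values())
    H = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return H / len(blocks[0])

data = "aababcaabaacabaa"   # hypothetical sequence of source symbols
first_order = entropy_per_symbol(list(data))
second_order = entropy_per_symbol([data[i:i + 2] for i in range(0, len(data) - 1, 2)])
print(round(first_order, 2), round(second_order, 2))
# The second order estimate is typically no larger than the first order one,
# since it captures the dependence between consecutive symbols.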

Questions
NOTE: The students are advised to thoroughly read this lesson first and then answer the following questions. Only after attempting all the questions should they click the solution button and verify their answers.

PART-A
A.1. Define the entropy of a source of symbols.
A.2. How is entropy related to uncertainty?
A.3. State Shannon's coding theorem on noiseless channels.
A.4. Define the coding efficiency of an encoding scheme.
A.5. State the basic principles of Huffman coding.

PART-B: Multiple Choice
In the following questions, click the best out of the four choices.

B.1 The entropy of a source of symbols is dependent upon
(A) The number of source outputs generated.
(B) The average codeword length.
(C) The probabilities of the source symbols.
(D) The order in which the source outputs are generated.

B.2 We have two sources of symbols and wish to compare their entropies. Source-1 has three symbols a_1, a_2 and a_3 with probabilities P(a_1) = 0.9, P(a_2) = P(a_3) = 0.05. Source-2 also has three symbols a_1, a_2 and a_3, but with probabilities P(a_1) = 0.4, P(a_2) = P(a_3) = 0.3.
(A) The entropy of source-1 is higher than that of source-2.
(B) The entropy of source-1 is lower than that of source-2.
(C) The entropies of source-1 and source-2 are the same.
(D) It is not possible to compute the entropies from the given data.

B.3 Shannon's coding theorem on noiseless channels provides us with
(A) A lower bound on the average codeword length.
(B) An upper bound on the average codeword length.
(C) A lower bound on the source entropy.
(D) An upper bound on the source entropy.

B.4 Which one of the following is not true for Huffman coding?
(A) No codeword of an elementary symbol is a prefix of the codeword of another elementary symbol.
(B) Each symbol has a one-to-one mapping with its corresponding codeword.
(C) The symbols are encoded as a group, rather than encoding one symbol at a time.
(D) Shorter code words are assigned to more probable symbols.

B.5 A source of 4 symbols a_1, a_2, a_3, a_4 having probabilities P(a_1) = 0.5, P(a_2) = 0.25, P(a_3) = P(a_4) = 0.125 is encoded by four different encoding schemes and the corresponding codes are shown below. Which of the following gives us the best coding efficiency?
(A) a_1 = 00, a_2 = 01, a_3 = 10, a_4 = 11
(B) a_1 = 1, a_2 = 01, a_3 = 001, a_4 = 000
(C) a_1 = 001, a_2 = 010, a_3 = 100, a_4 = 110
(D) a_1 = 1, a_2 = 10, a_3 = 100, a_4 = 1000

B.6 Which of the following must be ensured before assigning binary Huffman codes to a set of symbols?
(A) The channel is noiseless.
(B) There must be exactly 2^n symbols to encode.
(C) No two symbols should have the same probability.
(D) The probabilities of the symbols should be known a priori.

B.7 Refer to the Huffman code words assigned to the five symbols a_1, a_2, ..., a_5 in the example shown in Section-3.5. The bit stream assigned to the sequence of symbols a_4 a_1 a_5 a_1 a_4 a_2 a_4 a_4 a_2 a_4 a_4 is
(A) 01011111001100011000
(B) 01011110001100011000
(C) 01011111001100011010
(D) 01011111011000110000

B.8 A 4-symbol alphabet has the probabilities P(a_1) = 0.5, P(a_2) = 0.25, P(a_3) = 0.125, P(a_4) = 0.125 and the codes a_1 = 0, a_2 = 10, a_3 = 110, a_4 = 111 are assigned to the symbols. The average code word length for this source is
(A) 1.5
(B) 1.25
(C) 1.75
(D) 2.0

B.9 Decode the Huffman encoded bit stream 110100111110100010, which follows the code assignment of the above problem. The sequence of symbols is
(A) a_3 a_1 a_3 a_2 a_3 a_1 a_2 a_1 a_2
(B) a_3 a_1 a_2 a_4 a_3 a_1 a_2 a_3 a_1
(C) a_3 a_2 a_1 a_3 a_1 a_2 a_1 a_2
(D) a_3 a_2 a_1 a_4 a_3 a_2 a_1 a_1 a_2

PART-C: Problems
C-1. A long sequence of symbols generated from a source is seen to have the following occurrences:

    Symbol   Occurrences
    a_1      3003
    a_2      996
    a_3      2017
    a_4      1487
    a_5      2497

(a) Assign Huffman codes to the above symbols, following the convention that the group/symbol with higher probability is assigned a 0 and that with lower probability is assigned a 1.
(b) Calculate the entropy of the source.

(c) Calculate the average code word length obtained from the Huffman coding.
(d) Calculate the coding efficiency.
(e) Why is the coding efficiency less than 1?

SOLUTIONS

A.1 The entropy of a source of symbols is defined as the average information per source output. If we have an alphabet z of n symbols {a_i | i = 1, 2, ..., n} having probabilities of occurrence P(a_1), P(a_2), ..., P(a_n), the entropy of the source H(z) is given by

    H(z) = -Σ_{i=1}^{n} P(a_i) log P(a_i)

The unit of entropy depends upon the base of the logarithm. For a base of 2, the unit is bits. In general, for a base m, the unit of entropy is m-ary units.

A.2 The entropy of a source is related to the uncertainty of the symbols associated with it. The greater the uncertainty, the higher is the entropy. We can illustrate this by the two-symbol example discussed in Section-3.1.

A.3 Shannon's coding theorem on noiseless channels states that in any coding scheme, the average code word length of a source of symbols can at best be equal to the source entropy and can never be less than it. If m(z) is the minimum of the average code word lengths obtained out of different uniquely decipherable coding schemes, then Shannon's theorem states that

    m(z) ≥ H(z)

A.4 Refer to Section-3.3.

A.5 Refer to Section-3.4.

B.1 (C)   B.2 (B)   B.3 (A)   B.4 (C)   B.5 (B)   B.6 (D)   B.7 (A)   B.8 (C)   B.9 (D)

C.1 (a) Since the symbols are observed over a sufficiently long sequence, the probabilities can be estimated from their relative frequencies of occurrence:

    P(a_1) = 0.3, P(a_2) = 0.1, P(a_3) = 0.2, P(a_4) = 0.15, P(a_5) = 0.25

Based on these probabilities, the source reductions can be done as follows:

    Original:      a_1 (0.3), a_5 (0.25), a_3 (0.2), a_4 (0.15), a_2 (0.1)
    Reduction-1:   a_1 (0.3), a_5 (0.25), a_4+a_2 (0.25), a_3 (0.2)
    Reduction-2:   a_4+a_2+a_3 (0.45), a_1 (0.3), a_5 (0.25)
    Reduction-3:   a_1+a_5 (0.55), a_4+a_2+a_3 (0.45)

We can now work backwards to assign Huffman codes to the compound symbols and proceed to the elementary symbols:

    Reduction-3:   a_1+a_5 = 0,  a_4+a_2+a_3 = 1
    Reduction-2:   a_1 = 00,  a_5 = 01,  a_4+a_2+a_3 = 1
    Reduction-1:   a_1 = 00,  a_5 = 01,  a_4+a_2 = 10,  a_3 = 11
    Original:      a_1 = 00,  a_5 = 01,  a_3 = 11,  a_4 = 100,  a_2 = 101

(b) The source entropy is given by

    H(z) = -Σ_{i=1}^{n} P(a_i) log P(a_i)
         = -0.3 log 0.3 - 0.1 log 0.1 - 0.2 log 0.2 - 0.15 log 0.15 - 0.25 log 0.25
         = 2.23 bits/symbol

(c) The average code word length is given by

    L(z) = Σ_{i=1}^{n} P(a_i) L(a_i) = 0.3 × 2 + 0.1 × 3 + 0.2 × 2 + 0.15 × 3 + 0.25 × 2 = 2.25 bits/symbol

(d) The coding efficiency is

    η = H(z) / L(z) = 2.23 / 2.25 ≈ 0.99

(e) The student should think of the reason and check later.

References
1. Huffman, D.A., "A Method for the Construction of Minimum-Redundancy Codes," Proc. IRE, vol. 40, pp. 1098-1101, 1952.
2. Shannon, C.E., "A Mathematical Theory of Communication," The Bell System Technical Journal, vol. XXVII, no. 3, pp. 379-423, 1948.
3.