ECE 534: Elements of Information Theory, Fall 2010
Homework Solutions

Ex. 2.1 (Davide Basilio Bartolini)

Text
Coin flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required.
(a) Find the entropy H(X) in bits.
(b) A random variable X is drawn according to this distribution. Find an "efficient" sequence of yes-no questions of the form "Is X contained in the set S?". Compare H(X) to the expected number of questions required to determine X.

Solution
(a) The random variable X takes values in $\mathcal{X} = \{1, 2, 3, \ldots\}$ and denotes the number of flips needed to get the first head, i.e. one plus the number of consecutive tails appearing before the first head. Since the coin is fair, $p(\text{head}) = p(\text{tail}) = \frac{1}{2}$ and hence (exploiting the independence of the coin flips):

$$p(X = 1) = p(\text{head}) = \frac{1}{2}$$
$$p(X = 2) = p(\text{tail})\, p(\text{head}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$
$$p(X = n) = \underbrace{p(\text{tail}) \cdots p(\text{tail})}_{n-1 \text{ times}}\, p(\text{head}) = \left(\frac{1}{2}\right)^{n-1} \frac{1}{2} = \left(\frac{1}{2}\right)^{n}$$

From this, it is clear that the probability mass function of X is

$$p_X(x) = \left(\frac{1}{2}\right)^{x}, \qquad x = 1, 2, 3, \ldots$$
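As a quick sanity check of this pmf, the probabilities sum to 1 and match the empirical distribution of simulated coin flips (a minimal Python sketch; the sample size and the truncation at 60 terms are arbitrary choices):

```python
import random

# Analytical pmf: p(X = x) = (1/2)**x for x = 1, 2, 3, ...
pmf = {x: 0.5 ** x for x in range(1, 61)}      # truncate the tail at x = 60
print(sum(pmf.values()))                        # ~1.0 (the neglected tail is ~1e-18)

# Empirical check: flip a fair coin until the first head and record the count
def flips_until_head():
    n = 1
    while random.random() < 0.5:                # "tail" with probability 1/2
        n += 1
    return n

samples = [flips_until_head() for _ in range(100_000)]
for x in (1, 2, 3):
    print(x, samples.count(x) / len(samples), pmf[x])   # empirical vs. (1/2)**x
```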

Once the distribution is known, H(X) can be computed from the definition:

$$H(X) = -\sum_{x \in \mathcal{X}} p_X(x) \log_2 p_X(x) = -\sum_{x=1}^{\infty} \left(\frac{1}{2}\right)^{x} \log_2 \left(\frac{1}{2}\right)^{x}.$$

Extending the sum to $x = 0$ (the added term equals 0) and using $\log_2 (1/2)^{x} = -x$ (property of logarithms):

$$H(X) = \sum_{x=0}^{\infty} x \left(\frac{1}{2}\right)^{x} = \frac{1/2}{(1 - 1/2)^2} = 2 \ \text{[bits]},$$

exploiting $\sum_{x=0}^{\infty} x k^{x} = \frac{k}{(1-k)^{2}}$ for $|k| < 1$.

(b) Since the most likely value for X is 1 ($p(X = 1) = \frac{1}{2}$), the most efficient first question is "Is X = 1?"; the next question is "Is X = 2?", and so on, until a positive answer is found. If this strategy is used, the random variable Y representing the number of questions has the same distribution as X, so that

$$\mathbb{E}[Y] = \sum_{y=1}^{\infty} y \left(\frac{1}{2}\right)^{y} = \frac{1/2}{(1 - 1/2)^2} = 2,$$

which is exactly equal to the entropy of X. An interpretation of this fact is that 2 bits (the entropy of X) is the amount of memory required to store the outcomes of the two binary questions that are enough, on average, to determine the value of X.
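Both closed-form values can be double-checked numerically (a minimal Python sketch; the series is truncated at 60 terms):

```python
import math

# Truncated series for H(X) = -sum_x p(x) log2 p(x) and E[Y] = sum_y y p(y),
# with p(x) = (1/2)**x; 60 terms leave a negligible tail.
p = [0.5 ** x for x in range(1, 61)]
H = -sum(px * math.log2(px) for px in p)
EY = sum(x * px for x, px in enumerate(p, start=1))
print(round(H, 6), round(EY, 6))   # both equal 2.0 up to truncation error
```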

Exercise 2.4 (Matteo Carminati)

Text
Entropy of functions of a random variable. Let X be a discrete random variable. Show that the entropy of a function of X is less than or equal to the entropy of X by justifying the following steps:

$$H(X, g(X)) \stackrel{(a)}{=} H(X) + H(g(X) \mid X) \stackrel{(b)}{=} H(X) \qquad (1)$$

$$H(X, g(X)) \stackrel{(c)}{=} H(g(X)) + H(X \mid g(X)) \stackrel{(d)}{\geq} H(g(X)) \qquad (2)$$

Thus, $H(g(X)) \leq H(X)$.

Solution
(a) This follows from the chain rule for entropy applied to the random variables X and g(X), i.e. $H(X, Y) = H(X) + H(Y \mid X)$, so $H(X, g(X)) = H(X) + H(g(X) \mid X)$.
(b) Intuitively, since g(X) depends only on X, once the value of X is known g(X) is completely specified and takes a deterministic value. The entropy of a deterministic quantity is 0, so $H(g(X) \mid X) = 0$ and $H(X) + H(g(X) \mid X) = H(X)$.
(c) Again, this follows from the chain rule for entropy, here in the form $H(X, Y) = H(Y) + H(X \mid Y)$.
(d) Proving that $H(g(X)) + H(X \mid g(X)) \geq H(g(X))$ amounts to proving that $H(X \mid g(X)) \geq 0$: non-negativity is one of the basic properties of entropy and follows from the definition, since the logarithm of a probability (a quantity always less than or equal to 1) is non-positive. In particular, $H(X \mid g(X)) = 0$ if knowledge of the value of g(X) completely specifies the value of X (for example, if g is injective); otherwise $H(X \mid g(X)) > 0$.
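The inequality is easy to confirm on a small example (a minimal Python sketch; the uniform X and the non-injective g below are arbitrary illustrative choices):

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Entropy in bits of a {value: probability} dict."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Illustrative choice: X uniform on {1, 2, 3, 4} and the non-injective g(x) = x mod 2
p_X = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
p_gX = defaultdict(float)
for x, p in p_X.items():
    p_gX[x % 2] += p

print(entropy(p_X))    # 2.0 bits
print(entropy(p_gX))   # 1.0 bit, i.e. H(g(X)) <= H(X)
```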

Ex. 2.7(a) (Davide Basilio Bartolini)

Text
Coin weighing. Suppose that one has n coins, among which there may or may not be one counterfeit coin. If there is a counterfeit coin, it may be either heavier or lighter than the other coins. The coins are to be weighed by a balance. Find an upper bound on the number of coins n so that k weighings will find the counterfeit coin (if any) and correctly declare it to be heavier or lighter.

Solution
Let X be a string of n characters over the alphabet $\{-1, 0, 1\}$, each of which represents one coin. Each character of X may take three different values (say 1 if the corresponding coin is heavier than a regular one, 0 if it is regular, and -1 if it is lighter). Since at most one of the coins may be counterfeit, X is either the all-zero string (if all the coins are regular) or a string with a single 1 or -1 in exactly one position. Thus, the number of possible configurations of X is $2n + 1$. Under the hypothesis of a uniform distribution over which coin (if any) is counterfeit and in which direction, the entropy of X is

$$H(X) = \log(2n + 1).$$

Now let $Z = [Z_1, Z_2, \ldots, Z_k]$ be a random vector representing the weighings; each $Z_i$ takes three possible values, indicating whether the result of the i-th weighing is balanced, left pan heavier, or right pan heavier. The entropy of each $Z_i$ is therefore upper-bounded by the three possible values it can assume:

$$H(Z_i) \leq \log 3, \qquad i \in \{1, \ldots, k\},$$

and for Z (under the hypothesis of independent weighings), by the chain rule and independence,

$$H(Z) = H(Z_1, Z_2, \ldots, Z_k) = \sum_{i=1}^{k} H(Z_i \mid Z_{i-1}, \ldots, Z_1) = \sum_{i=1}^{k} H(Z_i) \leq k \log 3.$$

Since the k weighings must yield at least as much information as is carried by the configuration of X (i.e. they must reveal which coin, if any, is counterfeit and whether it is heavier or lighter), we need

$$H(X) \leq H(Z) \leq k \log 3 \;\Longrightarrow\; \log(2n + 1) \leq k \log 3 \;\Longrightarrow\; 2n + 1 \leq 3^{k} \;\Longrightarrow\; n \leq \frac{3^{k} - 1}{2},$$

which is the desired upper bound.
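The resulting bound can be tabulated for the first few values of k (a minimal Python sketch):

```python
import math

# Upper bound from the solution: with k weighings one can handle at most
# n = (3**k - 1) / 2 coins; check that log2(2n + 1) <= k * log2(3) at the bound.
for k in range(1, 6):
    n_max = (3 ** k - 1) // 2
    print(k, n_max, round(math.log2(2 * n_max + 1), 3), round(k * math.log2(3), 3))
```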

Ex. 2.12 (Kenneth Palacio)

Text
Let p(x, y) be given by

           Y = 0   Y = 1
  X = 0     1/3     1/3
  X = 1      0      1/3

Table 1: p(x, y) for problem 2.12.

Find:
(a) H(X), H(Y).
(b) H(X|Y), H(Y|X).
(c) H(X, Y).
(d) H(Y) - H(Y|X).
(e) I(X; Y).
(f) Draw a Venn diagram for the quantities in parts (a) through (e).

Solution:
Computation of the marginal distributions:

$$p(x) = \left[\tfrac{2}{3}, \tfrac{1}{3}\right], \qquad p(y) = \left[\tfrac{1}{3}, \tfrac{2}{3}\right]$$

(a) H(X), H(Y).

$$H(X) = -\tfrac{2}{3} \log_2 \tfrac{2}{3} - \tfrac{1}{3} \log_2 \tfrac{1}{3} \approx 0.918 \ \text{bits}$$

$$H(Y) = -\tfrac{1}{3} \log_2 \tfrac{1}{3} - \tfrac{2}{3} \log_2 \tfrac{2}{3} \approx 0.918 \ \text{bits}$$

[Figure 1: Venn diagram for H(X), H(Y)]
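These marginals and entropies can be checked directly from the joint table (a minimal Python sketch):

```python
import math

# Joint pmf p(x, y) from Table 1, keyed as (x, y)
p = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}

def H(probs):
    """Entropy in bits of an iterable of probabilities (zeros are skipped)."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

p_x = [p[(0, 0)] + p[(0, 1)], p[(1, 0)] + p[(1, 1)]]   # [2/3, 1/3]
p_y = [p[(0, 0)] + p[(1, 0)], p[(0, 1)] + p[(1, 1)]]   # [1/3, 2/3]
print(p_x, p_y)
print(round(H(p_x), 3), round(H(p_y), 3))               # 0.918 0.918
```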

(b) H(X|Y), H(Y|X).

$$H(X \mid Y) = \sum_{i=0}^{1} p(Y = i)\, H(X \mid Y = i) = \tfrac{1}{3} H(1, 0) + \tfrac{2}{3} H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{2}{3}$$

$$H(Y \mid X) = \sum_{i=0}^{1} p(X = i)\, H(Y \mid X = i) = \tfrac{2}{3} H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{3} H(0, 1) = \tfrac{2}{3}$$

[Figure 2: Venn diagram for H(X|Y), H(Y|X)]

(c) H(X, Y).

$$H(X, Y) = -\sum_{x, y \in \{0, 1\}} p(x, y) \log_2 p(x, y) = -3 \cdot \tfrac{1}{3} \log_2 \tfrac{1}{3} = \log_2 3 \approx 1.585 \ \text{bits}$$

[Figure 3: Venn diagram for H(X, Y)]
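The conditional and joint entropies can be verified in the same way (a minimal Python sketch):

```python
import math

p = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}   # joint pmf from Table 1
p_x = {0: 2/3, 1: 1/3}
p_y = {0: 1/3, 1: 2/3}

def H(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)

# H(X|Y) = sum_y p(y) H(X | Y=y), and symmetrically for H(Y|X)
H_X_given_Y = sum(p_y[y] * H([p[(x, y)] / p_y[y] for x in (0, 1)]) for y in (0, 1))
H_Y_given_X = sum(p_x[x] * H([p[(x, y)] / p_x[x] for y in (0, 1)]) for x in (0, 1))
H_XY = H(p.values())

print(round(H_X_given_Y, 3), round(H_Y_given_X, 3), round(H_XY, 3))  # 0.667 0.667 1.585
```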

(d) H(Y) - H(Y|X).

$$H(Y) - H(Y \mid X) \approx 0.9183 - 0.6667 = 0.2516 \ \text{bits}$$

[Figure 4: Venn diagram for H(Y) - H(Y|X)]

(e) I(X; Y).

$$I(X; Y) = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\, p(y)} = \tfrac{1}{3} \log_2 \frac{1/3}{(2/3)(1/3)} + \tfrac{1}{3} \log_2 \frac{1/3}{(2/3)(2/3)} + \tfrac{1}{3} \log_2 \frac{1/3}{(1/3)(2/3)} \approx 0.2516 \ \text{bits}$$

[Figure 5: Venn diagram for I(X; Y)]

(f) The Venn diagram for each quantity has been shown with the corresponding item above.
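The identities I(X; Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y) can be confirmed numerically (a minimal Python sketch):

```python
import math

p = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}   # joint pmf from Table 1
p_x = {0: 2/3, 1: 1/3}
p_y = {0: 1/3, 1: 2/3}

# Direct definition of mutual information (0 * log 0 terms are skipped)
I = sum(pxy * math.log2(pxy / (p_x[x] * p_y[y]))
        for (x, y), pxy in p.items() if pxy > 0)

def H(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)

print(round(I, 4))                                                   # 0.2516
print(round(H(p_x.values()) + H(p_y.values()) - H(p.values()), 4))   # 0.2516
```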

Run-length coding (Kenneth Palacio)

Text
Let X1, X2, ..., Xn be (possibly dependent) binary random variables. Suppose that one calculates the run lengths R = (R1, R2, ...) of this sequence (in order as they occur). For example, the sequence X = 0001100100 yields run lengths R = (3, 2, 2, 1, 2). Compare H(X1, X2, ..., Xn), H(R), and H(Xn, R). Show all equalities and inequalities, and bound all the differences.

Solution:
Assume that one random variable Xj (0 < j <= n) is known. If R is also known, then H(Xj, R) carries the same uncertainty as H(X1, X2, ..., Xj, ..., Xn), since the whole sequence X can be completely recovered from the knowledge of Xj and R. For example, with X5 = 1 and the run lengths R = (3, 2, 2, 1, 2), position 5 falls in the second run, so that run consists of 1s; since consecutive runs alternate in value, the original sequence X = 0001100100 is recovered. It can be concluded that

$$H(X_j, R) = H(X_1, X_2, \ldots, X_n).$$

Using the chain rule, H(Xj, R) can be written as

$$H(X_j, R) = H(R) + H(X_j \mid R) \leq H(R) + H(X_j),$$

since conditioning reduces entropy ($H(X_j \mid R) \leq H(X_j)$). It is therefore possible to write

$$H(X_j) \geq H(X_1, X_2, \ldots, X_n) - H(R), \qquad H(X_j) + H(R) \geq H(X_1, X_2, \ldots, X_n).$$

For $H(X_j) = -\sum_{x_j} p(x_j) \log_2 p(x_j)$ the distribution of Xj is unknown, so assume a probability p for Xj = 0 and (1 - p) for Xj = 1. The entropy is maximized at p = 1/2, giving max H(Xj) = 1 bit. Then

$$1 + H(R) \geq H(X_1, X_2, \ldots, X_n).$$

Considering the result obtained in problem 2.4, we can also write $H(R) \leq H(X_1, X_2, \ldots, X_n)$, because R is a function of (X1, X2, ..., Xn).
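The key step, namely that the whole sequence is recoverable from R together with any single bit, can be illustrated with a short sketch (the helper functions run_lengths and reconstruct are illustrative names, not from the text):

```python
from itertools import groupby

def run_lengths(bits):
    """Run lengths of a binary string, in order of occurrence."""
    return [len(list(g)) for _, g in groupby(bits)]

def reconstruct(j, xj, runs):
    """Rebuild the sequence from one known bit X_j (1-indexed) and the run lengths."""
    # Find which run position j falls into, then alternate values outward from it.
    starts, pos = [], 1
    for r in runs:
        starts.append(pos)
        pos += r
    k = max(i for i, s in enumerate(starts) if s <= j)   # index of the run containing j
    values = [str((int(xj) + (i - k)) % 2) for i in range(len(runs))]
    return "".join(v * r for v, r in zip(values, runs))

x = "0001100100"
r = run_lengths(x)
print(r)                          # [3, 2, 2, 1, 2]
print(reconstruct(5, x[4], r))    # '0001100100', recovered from X5 and R alone
print(reconstruct(5, x[4], r) == x)
```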