# Exercises with solutions (1)


1. Investigate the relationship between independence and correlation.

(a) Two random variables X and Y are said to be correlated if and only if their covariance C_XY is not equal to 0. Can two independent random variables X and Y be correlated?

Without loss of generality, we assume that the statistical properties of the random variables X and Y are given by the joint probability density function f_XY(x, y) and the marginal probability density functions f_X(x) and f_Y(y). Note that for a discrete random variable X with alphabet A, the pdf f_X(x) can be written using the probability mass function p_X(a) and the Dirac delta function δ(x),

f_X(x) = Σ_{a∈A} p_X(a) δ(x − a).

Similarly, a joint pdf f_XY(x, y) can be constructed using the Dirac delta function if either or both of the random variables X and Y are discrete random variables.

Two random variables X and Y are independent if and only if the joint pdf is equal to the product of the marginal pdfs,

∀x, y ∈ ℝ:  f_XY(x, y) = f_X(x) f_Y(y).

For the covariance C_XY of two independent random variables X and Y, we then obtain

C_XY = E{(X − E{X})(Y − E{Y})}
= E{XY − X E{Y} − E{X} Y + E{X}E{Y}}
= E{XY} − E{X}E{Y} − E{X}E{Y} + E{X}E{Y}
= E{XY} − E{X}E{Y}
= ∫∫ x y f_XY(x, y) dx dy − E{X}E{Y}
= ∫∫ x y f_X(x) f_Y(y) dx dy − E{X}E{Y}
= (∫ x f_X(x) dx) (∫ y f_Y(y) dy) − E{X}E{Y}
= E{X}E{Y} − E{X}E{Y} = 0.

Two independent random variables are always uncorrelated.
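As a quick numerical cross-check of the derivation above (not part of the original solution; the two marginal pmfs below are arbitrary illustrative choices), one can build the joint pmf of two independent discrete random variables as a product of marginals and confirm that the covariance vanishes:

```python
import itertools

# Arbitrary illustrative marginal pmfs for X and Y.
px = {-1: 0.2, 0: 0.5, 2: 0.3}
py = {1: 0.6, 4: 0.4}

# Joint pmf of independent X and Y: p_XY(x, y) = p_X(x) * p_Y(y).
pxy = {(x, y): px[x] * py[y] for x, y in itertools.product(px, py)}

ex = sum(x * p for x, p in px.items())              # E{X}
ey = sum(y * p for y, p in py.items())              # E{Y}
exy = sum(x * y * p for (x, y), p in pxy.items())   # E{XY}
cov = exy - ex * ey                                 # C_XY = E{XY} - E{X}E{Y}
print(cov)   # ≈ 0 up to floating-point rounding
```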

(b) Let X be a continuous random variable with a variance σ_X² > 0 and a pdf f_X(x). The pdf shall be non-zero for all real numbers, f_X(x) > 0 ∀x ∈ ℝ. Furthermore, the pdf f_X(x) shall be symmetric around zero, f_X(x) = f_X(−x) ∀x ∈ ℝ. Let Y be a random variable given by Y = aX² + bX + c with a, b, c ∈ ℝ. For which values of a, b, and c are X and Y uncorrelated? For which values of a, b, and c are X and Y independent?

First we investigate the correlation of the random variables X and Y. Due to the symmetry of the pdf around zero, f_X(x) = f_X(−x) ∀x ∈ ℝ, the expectation values of the odd integer powers of the random variable X are equal to 0. With the integer variable n ≥ 0, we have

E{X^{2n+1}} = ∫_{−∞}^{∞} x^{2n+1} f_X(x) dx
= ∫_{0}^{∞} x^{2n+1} f_X(x) dx + ∫_{−∞}^{0} x^{2n+1} f_X(x) dx.

Using the substitution t = −x for the second integral, we obtain

E{X^{2n+1}} = ∫_{0}^{∞} x^{2n+1} f_X(x) dx + ∫_{∞}^{0} (−t)^{2n+1} f_X(−t) (−dt)
= ∫_{0}^{∞} x^{2n+1} f_X(x) dx − ∫_{0}^{∞} t^{2n+1} f_X(t) dt
= 0.

In particular, we have E{X} = 0 and E{X³} = 0. For the covariance C_XY, we then obtain, using E{X²} = σ_X² (since E{X} = 0),

C_XY = E{XY} − E{X}E{Y} = E{XY}
= E{aX³ + bX² + cX}
= a E{X³} + b E{X²} + c E{X}
= b σ_X².

The random variables X and Y are uncorrelated if and only if b is equal to 0. Now, we investigate the dependence of the random variables X and Y. The random variables X and Y are independent if and only if

f_XY(x, y) = f_X(x) f_Y(y). Since f_XY(x, y) = f_{Y|X}(y|x) f_X(x), we can also say that X and Y are independent if and only if the marginal pdf f_Y(y) is equal to the conditional pdf f_{Y|X}(y|x). The value of the random variable Y is completely determined by the value of the random variable X. Hence, the conditional pdf f_{Y|X}(y|x) is given by the Dirac delta function

f_{Y|X}(y|x) = δ(y − ax² − bx − c).

If the conditional pdf f_{Y|X}(y|x) depends on the value x of the random variable X, the random variables X and Y are not independent, since f_Y(y) cannot be equal to f_{Y|X}(y|x) in this case. The conditional pdf f_{Y|X}(y|x) does not depend on x if one of the following conditions is fulfilled:

- a ≠ 0 and f_X(x) = w δ(x − x₁) + (1 − w) δ(x − x₂), where x₁ and x₂ are the roots of the quadratic equation ax² + bx = d for any value of d > −b²/(4a), and 0 ≤ w ≤ 1;
- a = 0, b ≠ 0, and f_X(x) = δ(x − x₀) with x₀ being any constant real value;
- a = 0 and b = 0.

Since it is given that f_X(x) > 0 ∀x ∈ ℝ, we do not need to consider the first two cases. Hence, for all parameters a, b, and c with a ≠ 0 or b ≠ 0, the random variables X and Y are dependent. For the case a = 0 and b = 0, the conditional pdf is given by f_{Y|X}(y|x) = δ(y − c), and the random variable Y is given by Y = c. The random variable Y is always equal to c. Consequently, its marginal pdf is given by f_Y(y) = δ(y − c) and is equal to the conditional pdf f_{Y|X}(y|x). The random variables X and Y are independent if and only if a = 0 and b = 0.

(c) Which of the following statements for two random variables X and Y are true?

i. If X and Y are uncorrelated, they are also independent.
ii. If X and Y are independent, E{XY} = 0.
iii. If X and Y are correlated, they are also dependent.

i. The statement "if X and Y are uncorrelated, they are also independent" is wrong. As a counterexample, consider the random variables X and Y in problem 1(b) for a ≠ 0 and b = 0. In this case, the random variables are uncorrelated, but they are dependent.

ii. The statement "if X and Y are independent, E{XY} = 0" is wrong. As shown in problem 1(a), the independence of X and Y implies C_XY = E{XY} − E{X}E{Y} = 0, i.e., E{XY} = E{X}E{Y}. If, however, both mean values E{X} and E{Y} are not equal to zero, then E{XY} is also not equal to zero.

iii. The statement "if X and Y are correlated, they are also dependent" is true. This statement is the contraposition of the statement "if X and Y are independent, they are also uncorrelated", which has been proved in problem 1(a).

2. A fair coin is tossed an infinite number of times. Let Y_n be a random variable, with n ∈ ℤ, that describes the outcome of the n-th coin toss. If the outcome of the n-th coin toss is head, Y_n is equal to 1; if it is tail, Y_n is equal to 0. Now consider the random process X = {X_n}. The random variables X_n are determined by X_n = Y_n + Y_{n−1}, and thus describe the total number of heads in the n-th and (n−1)-th coin tosses.

(a) Determine the marginal pmf p_Xn(x_n) and the marginal entropy H(X_n). Is it possible to design a uniquely decodable code with one codeword per possible outcome of X_n that has an average codeword length equal to the marginal entropy?

Since we consider a fair coin, both possible outcomes (head and tail) of a single coin toss are equally likely. Hence, the pmf for the random variables Y_n is given by p_Yn(0) = P(Y_n = 0) = 1/2 and p_Yn(1) = P(Y_n = 1) = 1/2. The random variables Y_n and Y_m with n ≠ m are independent. Furthermore, two different k-symbol sequences of heads and tails Y_n Y_{n−1} … Y_{n−k+1} are mutually exclusive events. The alphabet A for the random variables X_n consists of three possible outcomes, A = {0, 1, 2}. Hence, the marginal pmf can be obtained as follows:

p_Xn(0) = P(X_n = 0) = P(Y_n Y_{n−1} = 00) = p_Yn(0) p_Yn−1(0) = 1/2 · 1/2 = 1/4,

p_Xn(1) = P(X_n = 1) = P({Y_n Y_{n−1} = 01} ∪ {Y_n Y_{n−1} = 10})
= P(Y_n Y_{n−1} = 01) + P(Y_n Y_{n−1} = 10)
= p_Yn(0) p_Yn−1(1) + p_Yn(1) p_Yn−1(0) = 1/4 + 1/4 = 1/2,

p_Xn(2) = P(X_n = 2) = P(Y_n Y_{n−1} = 11) = p_Yn(1) p_Yn−1(1) = 1/2 · 1/2 = 1/4.

The marginal entropy H(X_n) is given by

H(X_n) = −Σ_{x_n∈A} p_Xn(x_n) log₂ p_Xn(x_n)
= −p_Xn(0) log₂ p_Xn(0) − p_Xn(1) log₂ p_Xn(1) − p_Xn(2) log₂ p_Xn(2)
= −(1/4) log₂(1/4) − (1/2) log₂(1/2) − (1/4) log₂(1/4)
= 1/2 + 1/2 + 1/2 = 3/2.

Since all marginal probabilities are integer powers of 1/2, it is possible to develop a Huffman code for which the average codeword length is equal to the marginal entropy. An example of such a code is given in the table below.

| x_n | p_Xn(x_n) | codeword | l(x_n) |
|-----|-----------|----------|--------|
| 0 | 1/4 | 00 | 2 |
| 1 | 1/2 | 1 | 1 |
| 2 | 1/4 | 01 | 2 |

The average codeword length is

l̄ = Σ_{x_n∈A} p_Xn(x_n) l(x_n) = (1/4)·2 + (1/2)·1 + (1/4)·2 = 3/2,

and is thus equal to the marginal entropy H(X_n).

(b) Determine the conditional pmf p_{Xn|Xn−1}(x_n|x_{n−1}) and the conditional entropy H(X_n|X_{n−1}). Design a conditional Huffman code. What is the average codeword length of the conditional Huffman code?

The conditional pmf p_{Xn|Xn−1}(x_n|x_{n−1}) can be calculated using the relationship

p_{Xn|Xn−1}(x_n|x_{n−1}) = p_{XnXn−1}(x_n, x_{n−1}) / p_{Xn−1}(x_{n−1}).

The probability masses of the marginal pmf p_Xn(x_n) have been calculated in (a). The joint probability masses p_{XnXn−1}(x_n, x_{n−1}) can be calculated in a similar way.
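A short numerical check of part (a) (not part of the original solution), enumerating the four equally likely outcomes of two coin tosses:

```python
from itertools import product
from math import log2

# X_n = Y_n + Y_{n-1} with fair, independent coin tosses Y in {0, 1}.
pmf = {0: 0.0, 1: 0.0, 2: 0.0}
for y_prev, y_cur in product((0, 1), repeat=2):
    pmf[y_prev + y_cur] += 0.25   # each 2-toss outcome has probability 1/4

entropy = -sum(p * log2(p) for p in pmf.values())
print(pmf, entropy)   # {0: 0.25, 1: 0.5, 2: 0.25} and 1.5 bits
```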

As an example, consider the joint probability mass

p_{XnXn−1}(1, 1) = P(X_n = 1, X_{n−1} = 1)
= P(Y_n Y_{n−1} Y_{n−2} = 010) + P(Y_n Y_{n−1} Y_{n−2} = 101)
= (1/2)³ + (1/2)³ = 1/4.

Note that for some combinations of x_n and x_{n−1}, the joint probability masses p_{XnXn−1}(x_n, x_{n−1}) are equal to zero, since the corresponding event {X_n = x_n, X_{n−1} = x_{n−1}} cannot occur. If X_{n−1} = 0, i.e., if the result of the (n−1)-th and the (n−2)-th coin toss is tail, the random variable X_n can only take the values 0 or 1. Similarly, if X_{n−1} = 2, X_n can only take the values 1 or 2. Consequently, the joint probability masses p_{XnXn−1}(2, 0) and p_{XnXn−1}(0, 2) are equal to 0. The following table shows the probability masses of the joint pmf p_{XnXn−1}(x_n, x_{n−1}) and the conditional pmf p_{Xn|Xn−1}(x_n|x_{n−1}).

| x_{n−1} | x_n | p_{Xn−1}(x_{n−1}) | p_{XnXn−1}(x_n, x_{n−1}) | p_{Xn|Xn−1}(x_n|x_{n−1}) |
|---------|-----|-------------------|--------------------------|--------------------------|
| 0 | 0 | 1/4 | 1/8 | 1/2 |
| 0 | 1 | 1/4 | 1/8 | 1/2 |
| 0 | 2 | 1/4 | 0 | 0 |
| 1 | 0 | 1/2 | 1/8 | 1/4 |
| 1 | 1 | 1/2 | 1/4 | 1/2 |
| 1 | 2 | 1/2 | 1/8 | 1/4 |
| 2 | 0 | 1/4 | 0 | 0 |
| 2 | 1 | 1/4 | 1/8 | 1/2 |
| 2 | 2 | 1/4 | 1/8 | 1/2 |

The conditional entropy H(X_n|X_{n−1}) is given by

H(X_n|X_{n−1}) = −Σ_{x_n∈A} Σ_{x_{n−1}∈A} p_{XnXn−1}(x_n, x_{n−1}) log₂ p_{Xn|Xn−1}(x_n|x_{n−1}).

Some of the joint probability masses are equal to 0. These terms can simply be excluded from the summation, as can be shown by considering the following limit, where p denotes the joint probability mass p_{XnXn−1}(x_n, x_{n−1}) and q denotes the marginal probability p_{Xn−1}(x_{n−1}), which is always greater than 0:

lim_{p→0} p log₂(p/q)  with q > 0.

By applying L'Hôpital's rule, we obtain

lim_{p→0} p log₂(p/q) = lim_{p→0} log₂(p/q) / (1/p) = lim_{p→0} [1/(p ln 2)] / (−1/p²)

= −(1/ln 2) lim_{p→0} p = 0.

Inserting the values of the joint and conditional pmf, which are given in the table above, into the expression for the conditional entropy yields

H(X_n|X_{n−1}) = −4·(1/8) log₂(1/2) − (1/4) log₂(1/2) − 2·(1/8) log₂(1/4)
= 1/2 + 1/4 + 1/2 = 5/4.

An example of a conditional Huffman code is shown in the following table. Note that we do not assign a codeword to the impossible events {X_n = 2, X_{n−1} = 0} and {X_n = 0, X_{n−1} = 2}.

| x_{n−1} | x_n | p_{Xn|Xn−1}(x_n|x_{n−1}) | codeword | l(x_n|x_{n−1}) |
|---------|-----|--------------------------|----------|----------------|
| 0 | 0 | 1/2 | 0 | 1 |
| 0 | 1 | 1/2 | 1 | 1 |
| 1 | 0 | 1/4 | 00 | 2 |
| 1 | 1 | 1/2 | 1 | 1 |
| 1 | 2 | 1/4 | 01 | 2 |
| 2 | 1 | 1/2 | 0 | 1 |
| 2 | 2 | 1/2 | 1 | 1 |

The average codeword length of the conditional Huffman code is given by

l̄ = Σ_{x_n∈A} Σ_{x_{n−1}∈A} p_{XnXn−1}(x_n, x_{n−1}) l(x_n|x_{n−1})
= 4·(1/8)·1 + (1/4)·1 + 2·(1/8)·2 = 5/4.

The average codeword length of the conditional Huffman code is equal to the conditional entropy H(X_n|X_{n−1}).

(c) Is the random process X a Markov process?

The characteristic property of a Markov process is that the future states of the process depend only on the present state, not on the sequence of events that precede it. Using the conditional pmfs, this property can be written as

p_{Xn|Xn−1 Xn−2 …}(x_n|x_{n−1}, x_{n−2}, …) = p_{Xn|Xn−1}(x_n|x_{n−1}).

Now, let us investigate the given process X. If X_{n−1} = 0, we know that the result of the (n−1)-th and (n−2)-th coin tosses was tail. Hence, the random variable X_n can only take the values 0 (for Y_n = 0) or 1 (for Y_n = 1), both with the probability of 1/2. By considering additional random variables X_{n−k} with k > 1, we cannot improve the knowledge about X_n. We have

p_{Xn|Xn−1 Xn−2 …}(x_n|0, x_{n−2}, …) = p_{Xn|Xn−1}(x_n|0) = { 0.5 : x_n = 0;  0.5 : x_n = 1;  0.0 : x_n = 2 }.

Similarly, if X_{n−1} = 2, we know that the result of the (n−1)-th and (n−2)-th coin tosses was head. Hence, the random variable X_n can only take the values 1 (for Y_n = 0) or 2 (for Y_n = 1), both with the probability of 1/2. By considering additional random variables X_{n−k} with k > 1, we cannot improve the knowledge about X_n. We have

p_{Xn|Xn−1 Xn−2 …}(x_n|2, x_{n−2}, …) = p_{Xn|Xn−1}(x_n|2) = { 0.0 : x_n = 0;  0.5 : x_n = 1;  0.5 : x_n = 2 }.

However, for X_{n−1} = 1, the situation is different. Here, we do not know the exact sequence Y_{n−1} Y_{n−2}; we only know that it was either 01 or 10. By considering an additional random variable X_{n−2}, we can improve our knowledge about X_n. If, for example, X_{n−2} = 0, we know that the sequence Y_{n−1} Y_{n−2} Y_{n−3} was equal to 100, and then the random variable X_n can only take the values 1 or 2, both with a probability of 1/2.

For an analytic proof that X is not a Markov process, we consider the conditional probabilities p_{Xn|Xn−1}(0|1) and p_{Xn|Xn−1 Xn−2}(0|1, 2). In problem (b), we calculated the conditional pmf p_{Xn|Xn−1}(x_n|x_{n−1}) and obtained p_{Xn|Xn−1}(0|1) = 1/4. The probability mass p_{Xn|Xn−1 Xn−2}(0|1, 2) is given by

p_{Xn|Xn−1 Xn−2}(0|1, 2) = p_{Xn Xn−1 Xn−2}(0, 1, 2) / p_{Xn−1 Xn−2}(1, 2)
= P(Y_n Y_{n−1} Y_{n−2} Y_{n−3} = 0011) / P(Y_{n−1} Y_{n−2} Y_{n−3} = 011)
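The non-Markov property argued above can be confirmed by brute force (a sketch, not part of the original solution): enumerate all equally likely realizations of four consecutive coin tosses and compare the two conditional probabilities.

```python
from itertools import product

def cond_prob(history):
    """P(X_n = 0 | X_{n-1}, X_{n-2}, ... = history) by enumerating the
    16 equally likely toss sequences (y_{n-3}, y_{n-2}, y_{n-1}, y_n)."""
    match = hit = 0
    for y in product((0, 1), repeat=4):
        # x = (x_{n-2}, x_{n-1}, x_n) with x_k = y_k + y_{k-1}
        x = [y[i] + y[i - 1] for i in (1, 2, 3)]
        # history = (x_{n-1}, x_{n-2}, ...), most recent first
        if all(x[-2 - k] == h for k, h in enumerate(history)):
            match += 1
            hit += x[-1] == 0
    return hit / match

print(cond_prob((1,)))     # P(X_n=0 | X_{n-1}=1)             -> 0.25
print(cond_prob((1, 2)))   # P(X_n=0 | X_{n-1}=1, X_{n-2}=2)  -> 0.5
```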

= (1/2)⁴ / (1/2)³ = 1/2.

Hence, we have p_{Xn|Xn−1 Xn−2}(0|1, 2) ≠ p_{Xn|Xn−1}(0|1). The process X is not a Markov process.

(d) Derive a general formula for the N-th order block entropy H_N = H(X_n, …, X_{n−N+1}). How many symbols have to be coded jointly at minimum for obtaining a code that is more efficient than the conditional Huffman code developed in (b)?

For the following derivation, let p_N(x_0, …, x_{N−1}) denote the N-th order joint pmf p_{Xn … Xn−N+1}(x_n, …, x_{n−N+1}). The N-th order block entropy is given by

H_N = H(X_n, …, X_{n−N+1}) = −Σ_{x_0∈A} … Σ_{x_{N−1}∈A} p_N(x_0, …, x_{N−1}) log₂ p_N(x_0, …, x_{N−1}).

The summation in the above equation is done over 3^N terms. Each N-symbol sequence x_0 … x_{N−1} can be represented by a number of (N+1)-symbol sequences y_0 … y_N, where y_n represents a possible value of the random variable Y_n. There are 2^{N+1} possible (N+1)-symbol sequences y_0 … y_N. We have to differentiate the following three cases:

- All symbols of the symbol sequence x_0 … x_{N−1} are equal to 1, x_n = 1 ∀n ∈ [0, N−1]. In this case, the N-symbol sequence x_0 … x_{N−1} can be obtained by exactly two (N+1)-symbol sequences y_0 … y_N, namely 0101… and 1010…. Consequently, the joint probability mass p_N(x_0, …, x_{N−1}) is equal to

  p_N1 = (1/2)^{N+1} + (1/2)^{N+1} = 2^{−N}.

- The symbol sequence x_0 … x_{N−1} is possible and contains at least one 0 or one 2. In this case, the N-symbol sequence is obtained by exactly one (N+1)-symbol sequence y_0 … y_N and the joint probability mass p_N(x_0, …, x_{N−1}) is equal to

  p_N2 = (1/2)^{N+1} = 2^{−(N+1)}.

  Since there are 2^{N+1} outcomes of tossing a coin N+1 times, exactly 2^{N+1} − 2 probability masses (the number of possible outcomes minus the two outcomes considered above) are equal to p_N2.

- The symbol sequence x_0 … x_{N−1} is impossible. This is, for example, the case if the symbol sequence contains one of the sub-sequences 02, 20, 010, or 212, which cannot be represented as an outcome of the coin tossing experiment. The joint probability mass p_N(x_0, …, x_{N−1}) for the impossible symbol sequences is, of course, equal to p_N0 = 0. The number of impossible N-symbol sequences is equal to the total number of symbol sequences (which is 3^N) minus the number of symbol sequences for which all symbols are equal to 1 (which is 1) minus the number of symbol sequences that correspond to exactly one outcome of N+1 coin tosses (which is 2^{N+1} − 2). Hence, there are 3^N − 2^{N+1} + 1 impossible N-symbol sequences x_0 … x_{N−1}.

For problem (b), we have shown that lim_{p→0} p log₂ p = 0. Hence, we do not need to consider the impossible N-symbol sequences, with probability masses equal to 0, for calculating the N-th order block entropy. Consequently, we obtain

H_N = −p_N1 log₂ p_N1 − (2^{N+1} − 2) p_N2 log₂ p_N2
= −2^{−N} log₂(2^{−N}) − (2^{N+1} − 2) 2^{−(N+1)} log₂(2^{−(N+1)})
= N 2^{−N} + (N+1)(1 − 2^{−N})
= N 2^{−N} + (N+1) − (N+1) 2^{−N}
= (N+1) − 2^{−N}.

Since all joint probability masses are either equal to 0 or negative integer powers of 2, we can always construct a Huffman code with an average codeword length per N-symbol sequence equal to the N-th order block entropy. Such an N-th order block Huffman code is more efficient than the conditional Huffman code if its average codeword length per symbol l̄_N is less than the average codeword length per symbol l̄_C of the conditional Huffman code. Hence, we want to find the number of symbols N so that

l̄_N = H_N / N < l̄_C = 5/4.

By inserting the expression for H_N, we obtain

(N + 1 − 2^{−N}) / N < 5/4
4N + 4 − 2^{2−N} < 5N
N > 4 − 2^{−(N−2)}.

We can manually check that the above inequality is not fulfilled for the case N = 1 (1 is not greater than 2). For N > 1, the term

2^{−(N−2)} is always greater than 0 and less than or equal to 1. Since N is an integer number, we can then write N ≥ 4. At minimum, we have to code 4 symbols jointly for obtaining a code that is more efficient than the conditional Huffman code developed in (b). The following table lists the N-th order block entropy H_N and the average codeword length per symbol (assuming a redundancy of zero) for the block codes with N equal to 1, 2, 3, 4, and 5.

| N | H_N | H_N / N |
|---|-----|---------|
| 1 | 3/2 | 3/2 = 1.5 |
| 2 | 11/4 | 11/8 = 1.375 |
| 3 | 31/8 | 31/24 ≈ 1.2917 |
| 4 | 79/16 | 79/64 ≈ 1.2344 |
| 5 | 191/32 | 191/160 = 1.19375 |

The data in the table additionally show that a joint coding of 4 or more symbols yields an average codeword length per symbol (assuming a redundancy of zero, which can be achieved with a block Huffman code, since all probability masses are integer powers of 1/2) that is less than the average codeword length of 1.25 for the conditional Huffman code developed in (b).

(e) Calculate the entropy rate H(X) of the random process X. Is it possible to design a variable length code with finite complexity and an average codeword length equal to the entropy rate? If yes, what requirement has to be fulfilled?

The entropy rate H(X) is defined by

H(X) = lim_{N→∞} H(X_n, …, X_{n−N+1}) / N = lim_{N→∞} H_N / N.

By inserting the expression for the N-th order block entropy, which we have derived in (d), we obtain

H(X) = lim_{N→∞} (N + 1 − 2^{−N}) / N = 1 + lim_{N→∞} 1/N − lim_{N→∞} 2^{−N}/N = 1.

The entropy rate H(X) for the random process X = {X_n} is equal to 1 bit per symbol. It should be noted that the entropy rate H(X)
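The closed form H_N = N + 1 − 2^{−N} derived in (d) can be verified by brute force (a sketch, not part of the original solution): build the N-th order block pmf by enumerating all N+1 coin tosses.

```python
from itertools import product
from math import log2
from collections import Counter

def block_entropy(n):
    """N-th order block entropy of X_k = Y_k + Y_{k-1} for a fair coin,
    computed by enumerating all 2^{n+1} equally likely toss sequences."""
    counts = Counter()
    for y in product((0, 1), repeat=n + 1):
        x = tuple(y[i] + y[i + 1] for i in range(n))
        counts[x] += 1
    total = 2 ** (n + 1)
    return -sum(c / total * log2(c / total) for c in counts.values())

for n in range(1, 6):
    print(n, block_entropy(n), n + 1 - 2.0 ** -n)   # the two columns agree
```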

for the random process X = {X_n} is equal to the entropy rate H(Y) and the marginal entropy H(Y_n) of the iid process Y = {Y_n}.

We first consider the joint coding of N symbols. The average codeword length per symbol is given by l̄_N = H_N / N. By using the expression for H_N that we derived in (d), we obtain

l̄_N = H_N / N = (N + 1 − 2^{−N}) / N = 1 + (1 − 2^{−N}) / N > 1.

By coding a finite number N of symbols jointly, we cannot develop a code with an average codeword length per symbol that is equal to the entropy rate.

Similarly, we cannot achieve the entropy rate by considering a finite number N of previously coded symbols for a conditional code. If we consider the N previously coded symbols x_{n−1} to x_{n−N}, inclusive, we always have to consider the case that all these symbols are equal to 1. If all considered previously coded symbols are equal to 1, there are always two possibilities for the sequence of the corresponding random variables Y_{n−1} … Y_{n−N−1}, namely 0101… and 1010…. For this condition, the pmf is equal to {1/4, 1/2, 1/4} and, thus, the average codeword length is equal to 3/2. For all other possible conditions, the pmf is equal to {1/2, 1/2, 0} or {0, 1/2, 1/2}, and the average codeword length is equal to 1. But since the probability of the condition that all N previously coded symbols are equal to 1 is greater than 0, the average codeword length for the entire conditional code is always greater than 1. By only observing the random variables X_n, it is not possible to construct a code that achieves the entropy rate.

The general problem is that when considering a finite number N of symbols, all symbols can be equal to 1, and in this case we cannot know whether the outcome of the corresponding sequence of coin tosses is head, tail, head, tail, … or tail, head, tail, head, …. If, however, we do not only know the values of the random variables X_n at the encoder side, but also the values of the random variables Y_n, we can construct a simple code that achieves the entropy rate.
We do not transmit the values of X_n, but the values of Y_n, using the simple code in the table below.

| y_n | p_Yn(y_n) | codeword |
|-----|-----------|----------|
| 0 | 1/2 | 0 |
| 1 | 1/2 | 1 |

At the decoder side, the values x_n of the random variables X_n are obtained from the transmitted values y_n of the random variables Y_n by x_n = y_n + y_{n−1}. The average codeword length for this code is

l̄ = Σ_{y_n∈{0,1}} p_Yn(y_n) l(y_n) = 1/2 · 1 + 1/2 · 1 = 1.

It is equal to the entropy rate H(X) of the random process X = {X_n} and the entropy rate H(Y) of the random process Y = {Y_n}.

3. Given is a discrete iid process X with the alphabet A = {a, b, c, d, e, f, g}. The pmf p_X(x) and 6 example codes are listed in the following table.

| x | p_X(x) | A | B | C | D | E | F |
|---|--------|---|---|---|---|---|---|
| a | 1/3 | | | | | | |
| b | 1/9 | | | | | | |
| c | 1/27 | | | | | | |
| d | 1/27 | | | | | | |
| e | 1/27 | | | | | | |
| f | 1/9 | | | | | | |
| g | 1/3 | | | | | | |

(a) Develop a Huffman code for the given pmf p_X(x), calculate its average codeword length and its absolute and relative redundancy.

The Huffman algorithm can be described as follows: First, we create a symbol group for each alphabet letter. Then, in each iteration, the symbol groups are sorted according to their associated probabilities, the two symbol groups with the smallest probabilities are selected, each of the two selected symbol groups is characterized by a single bit, and the two selected symbol groups are merged into a new symbol group. This process is repeated until a single symbol group is obtained. Finally, the constructed binary code tree is converted into a prefix code using the assigned bits.

The construction of the binary code tree for the given pmf is illustrated in the following list; in each step, the two last-listed symbol groups are merged and assigned the bits 0 and 1.

- Step 1: 1/3, 1/3, 1/9, 1/9, 1/27, 1/27, 1/27 (a, g, b, f, c, d, e); merge d and e.
- Step 2: 1/3, 1/3, 1/9, 1/9, 2/27, 1/27 (a, g, b, f, de, c); merge de and c.
- Step 3: 1/3, 1/3, 1/9, 1/9, 1/9 (a, g, b, f, cde); merge f and cde.
- Step 4: 1/3, 1/3, 2/9, 1/9 (a, g, cdef, b); merge cdef and b.
- Step 5: 1/3, 1/3, 1/3 (a, g, bcdef); merge g and bcdef.
- Step 6: 2/3, 1/3 (bcdefg, a); merge bcdefg and a.

Given the developed binary code tree, the codeword for each particular symbol x ∈ A is constructed by concatenating the bits that are assigned to the symbol groups containing the particular symbol, starting with the last iteration (i.e., the largest symbol groups) of the above described algorithm. The resulting code for the above illustrated code construction is shown in the table below.

| x | codeword |
|---|----------|
| a | 1 |
| b | 011 |
| c | 01011 |
| d | 010100 |
| e | 010101 |
| f | 0100 |
| g | 00 |

Note that there are multiple codes for a given pmf that can be constructed with the Huffman algorithm. We could sort the probabilities with the same values in a different order, and we could switch the assignment of the 0 and 1 bits in some or all of the iteration steps.

The average codeword length per symbol is given by

l̄ = Σ_{x∈A} p_X(x) l(x) = (1/3)·1 + (1/9)·3 + (1/27)·5 + (1/27)·6 + (1/27)·6 + (1/9)·4 + (1/3)·2 = 65/27 ≈ 2.4074.

The entropy of the random variables X_n = X is given by

H(X) = −Σ_{x∈A} p_X(x) log₂ p_X(x)
= 2·(1/3) log₂ 3 + 2·(1/9) log₂ 9 + 3·(1/27) log₂ 27
= (2/3) log₂ 3 + (4/9) log₂ 3 + (1/3) log₂ 3
= (13/9) log₂ 3 ≈ 2.2891.

The absolute redundancy of the Huffman code is

ρ = l̄ − H(X) = 65/27 − (13/9) log₂ 3 = (13/27)(5 − 3 log₂ 3) ≈ 0.118.

The absolute redundancy of the Huffman code is approximately 0.118 bit per symbol. The relative redundancy of the Huffman code is

ρ / H(X) = (l̄ − H(X)) / H(X) = l̄ / H(X) − 1 = 65 / (39 log₂ 3) − 1 ≈ 0.0515.

The relative redundancy of the Huffman code is approximately 5.15%.

(b) For all codes A, B, C, D, E, and F, do the following:

- Calculate the average codeword length per symbol;
- Determine whether the code is a singular code;
- Determine whether the code is uniquely decodable;
- Determine whether the code is a prefix code;
- Determine whether the code is an optimal prefix code.

The average codeword length per symbol is given by

l̄ = Σ_{x∈A} p_X(x) l(x),

where l(x) denotes the length of the codeword for the alphabet letter x. As an example, the average codeword length for code C is

l̄_C = Σ_{x∈A} p_X(x) l_C(x) = 65/27 ≈ 2.4074.

The average codeword lengths for all given codes are summarized in the following table, which also includes a summary of the answers to the other questions.

|  | A | B | C | D | E | F |
|--|---|---|---|---|---|---|
| l̄ | 65/27 | 99/27 | 65/27 | 63/27 | — | 65/27 |
| singular | no | no | no | no | yes | no |
| uniquely decodable | yes | yes | yes | no | no | yes |
| prefix | yes | yes | yes | no | no | no |
| optimal prefix | yes | no | yes | no | no | no |

In the following, the properties of the given codes are briefly analyzed:

Code A: The code is not singular, since a different codeword is assigned to each alphabet letter. The code is a prefix code, since no codeword represents a prefix (or the complete bit string) of another codeword. Since the code is a prefix code, it is uniquely decodable. The code is an optimal prefix code, since it is a prefix code and has the same average codeword length as a Huffman code for the given pmf (see 3a).

Code B: The code is not singular, since a different codeword is assigned to each alphabet letter. The code is a prefix code, since no codeword represents a prefix (or the complete bit string) of another codeword. Since the code is a prefix code, it is uniquely decodable. The code is not an optimal prefix code, since its average codeword length is greater than that of the Huffman code for the given pmf (see 3a).

Code C: The code is not singular, since a different codeword is assigned to each alphabet letter. The code is a prefix code, since no codeword represents a prefix (or the complete bit string) of another codeword. Since the code is a prefix code, it is uniquely decodable. The code is an optimal prefix code, since it is a prefix code and has the same average codeword length as a Huffman code for the given pmf (see 3a).

Code D: The code is not singular, since a different codeword is assigned to each alphabet letter. The code is not a prefix code, since the codeword for the letter a represents a prefix of the codeword for the letter d. The code is not uniquely decodable, since the letter sequences aaa and db give the same bit string. The code is not an optimal prefix code, since it is not a prefix code.

Code E: The code is singular, since the same codeword is assigned to the alphabet letters b and g. Since the code is singular, it is not uniquely decodable, it is not a prefix code, and it is not an optimal prefix code.
Code F: The code is not singular, since a different codeword is assigned to each alphabet letter.

The code is not a prefix code, since, for example, the codeword for the letter a represents a prefix of the codeword for the letter b. The code is uniquely decodable, since, based on the number of successive bits equal to 0, the symbol sequence can be unambiguously determined for a given bit sequence. This will be further explained in (3c). The code is not an optimal prefix code, since it is not a prefix code.

(c) Briefly describe a process for decoding a symbol sequence given a finite sequence of K bits that is coded with code F.

The decoding process can be described as follows:

(1) Set n = 0.
(2) Read the next bit b_n.
(3) Read all bits b_{n+i} until the next bit b_{n+m} equal to 1, excluding the bit b_{n+m} equal to 1, or, if the remaining bit sequence does not contain a bit equal to 1, until the end of the bit sequence.
(4) Determine the number N_0 of read bits equal to 0, excluding the bit b_n and all previously read bits.
(5) Depending on the value of b_n, do the following:
    - If b_n is equal to 0, output (N_0 + 1)/6 times the symbol e.
    - If b_n is equal to 1, do the following:
        - If N_0 mod 6 == 0, output the symbol a.
        - If N_0 mod 6 == 1, output the symbol g.
        - If N_0 mod 6 == 2, output the symbol b.
        - If N_0 mod 6 == 3, output the symbol f.
        - If N_0 mod 6 == 4, output the symbol d.
        - If N_0 mod 6 == 5, output the symbol c.
        - If N_0 ≥ 6, output ⌊N_0/6⌋ times the symbol e.
(6) Set n = n + N_0 + 1.
(7) If n < K, go to step (2).

Note that although the considered code F is uniquely decodable, it is not instantaneously decodable. In general, the next symbol is not known before the next bit equal to 1 (or the end of the message) has been detected, and the number of successive zero bits can be arbitrarily large.

4. Given is a Bernoulli process X with the alphabet A = {a, b} and the pmf p_X(a) = p, p_X(b) = 1 − p. Consider the three codes in the following table.

| Code A: symbols | codeword | Code B: symbols | codeword | Code C: symbol | codeword |
|-----------------|----------|-----------------|----------|----------------|----------|
| aa | 1 | aa | 0001 | a | 0 |
| ab | 01 | ab | 001 | b | 1 |
| b | 00 | ba | 01 | | |
| | | bb | 1 | | |

(a) Calculate the average codeword length per symbol for the three codes.

The code A is a code that assigns variable-length codewords to variable-length symbol sequences. Let s_k be the symbol sequences to which the codewords are assigned. The average codeword length per symbol is the average codeword length per symbol sequence s_k divided by the average number of symbols per symbol sequence s_k. With l(s_k) denoting the length of the codeword that is assigned to s_k, n(s_k) denoting the number of symbols in the symbol sequence s_k, and p_S(s_k) denoting the probability of the symbol sequence s_k, we have

l̄_A = Σ_k p_S(s_k) l(s_k) / Σ_k p_S(s_k) n(s_k).

Note that the probability p_S(s_k) is given by p_S(s_k) = p^{n_a(s_k)} (1 − p)^{n_b(s_k)}, where n_a(s_k) and n_b(s_k) represent the number of symbols equal to a and b, respectively, in the symbol sequence s_k. Hence, the average codeword length per symbol for the code A is

l̄_A = [p_S(aa) l(aa) + p_S(ab) l(ab) + p_S(b) l(b)] / [p_S(aa) n(aa) + p_S(ab) n(ab) + p_S(b) n(b)]
= [p²·1 + p(1−p)·2 + (1−p)·2] / [p²·2 + p(1−p)·2 + (1−p)·1]
= [p² + 2p − 2p² + 2 − 2p] / [2p² + 2p − 2p² + 1 − p]
= (2 − p²) / (1 + p).

The code B is a code that assigns a codeword to each possible sequence of two symbols. Hence, the average codeword length per symbol is equal to the average codeword length per symbol sequence divided by 2,

l̄_B = (1/2) Σ_k p_S(s_k) l(s_k)

= (1/2) [p_S(aa) l(aa) + p_S(ab) l(ab) + p_S(ba) l(ba) + p_S(bb) l(bb)]
= (1/2) [p²·4 + p(1−p)·3 + (1−p)p·2 + (1−p)²·1]
= (1/2) [4p² + 3p − 3p² + 2p − 2p² + 1 − 2p + p²]
= (1 + 3p)/2.

The code C assigns a codeword of length 1 to both alphabet letters. Hence, its average codeword length per symbol is l̄_C = 1.

(b) For which probabilities p is the code A more efficient than code B?

The code A is more efficient than code B if its average codeword length is less than the average codeword length of code B,

l̄_A < l̄_B
(2 − p²)/(1 + p) < (1 + 3p)/2
2(2 − p²) < (1 + 3p)(1 + p)
4 − 2p² < 1 + 4p + 3p²
0 < 5p² + 4p − 3
p² + (4/5)p − 3/5 > 0.

The quadratic function y = p² + (4/5)p − 3/5 is a parabola which opens upward (since the term p² is multiplied by a positive number). Hence, y > 0 if p < p₁ or p > p₂, where p₁ and p₂ are the roots of 0 = p² + (4/5)p − 3/5 with p₁ ≤ p₂. The roots p₁ and p₂ are given by

p₁,₂ = −2/5 ∓ √(4/25 + 3/5) = (−2 ∓ √19)/5.

Hence, we have

p₁ = −(2 + √19)/5 ≈ −1.2718,
p₂ = (√19 − 2)/5 ≈ 0.4718.

Consequently, the code A is more efficient than code B if

(√19 − 2)/5 < p,
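The closed forms for l̄_A and l̄_B and the crossover point can be checked numerically (a sketch, not part of the original solution):

```python
from math import sqrt

def l_a(p):
    return (2 - p * p) / (1 + p)    # code A, derived in part (a)

def l_b(p):
    return (1 + 3 * p) / 2          # code B, derived in part (a)

p_star = (sqrt(19) - 2) / 5         # claimed crossover, ≈ 0.4718
print(p_star)
print(l_a(p_star) - l_b(p_star))    # ≈ 0 at the crossover
print(l_a(0.6) < l_b(0.6))          # A more efficient above the crossover
print(l_a(0.3) > l_b(0.3))          # B more efficient below
```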

or, approximately, if 0.4718 < p.

(c) For which probabilities p is the simple code C more efficient than both code A and code B?

The first condition is

l̄_C < l̄_A
1 < (2 − p²)/(1 + p)
1 + p < 2 − p²
p² + p − 1 < 0.

The quadratic function y = p² + p − 1 is a parabola which opens upward. Hence, y < 0 if p₁ < p < p₂, where p₁ and p₂ are the roots of 0 = p² + p − 1 with p₁ ≤ p₂. The roots p₁ and p₂ are given by

p₁,₂ = −1/2 ∓ √(1/4 + 1) = (−1 ∓ √5)/2.

Hence, we have

p₁ = −(1 + √5)/2 ≈ −1.6180,
p₂ = (√5 − 1)/2 ≈ 0.6180.

Consequently, the code C is more efficient than code A if

0 ≤ p < (√5 − 1)/2.

The second condition is

l̄_C < l̄_B
1 < (1 + 3p)/2
2 < 1 + 3p
p > 1/3.

Hence, the code C is more efficient than code B if 1/3 < p. By combining both derived conditions, we obtain that the simple code C is more efficient than both code A and code B if

1/3 < p < (√5 − 1)/2,

or, approximately, if 0.3333 < p < 0.6180. For 0 ≤ p < 1/3, code B is more efficient than code A and code C, and for (√5 − 1)/2 < p, code A is more efficient than code B and code C.
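The three regions found in parts (b) and (c) can be spot-checked by evaluating the closed forms at sample probabilities (a sketch, not part of the original solution):

```python
from math import sqrt

def l_a(p): return (2 - p * p) / (1 + p)   # code A
def l_b(p): return (1 + 3 * p) / 2         # code B; code C has length 1

lo, hi = 1 / 3, (sqrt(5) - 1) / 2          # claimed region where C wins
for p in (0.2, 0.5, 0.8):
    c_wins = 1 < l_a(p) and 1 < l_b(p)     # C beats both A and B
    print(p, c_wins, lo < p < hi)          # the two flags agree
```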

5. Given is a Bernoulli process B = {B_n} with the alphabet A_B = {0, 1}, the pmf p_B(0) = p, p_B(1) = 1 − p, and 0 ≤ p < 1. Consider the random variable X that specifies the number of random variables B_n that have to be observed to get exactly one 1. Calculate the entropies H(B_n) and H(X). For which value of p, with 0 < p < 1, is H(X) four times as large as H(B_n)?

Hint: for |a| < 1, Σ_{k=0}^∞ a^k = 1/(1 − a), and Σ_{k=0}^∞ k a^k = a/(1 − a)².

The entropy for the Bernoulli process B is

H_B(p) = H(B_n) = −p_B(0) log₂ p_B(0) − p_B(1) log₂ p_B(1) = −p log₂ p − (1 − p) log₂(1 − p).

For calculating the entropy of the random variable X, we first determine its pmf p_X(x). The alphabet for the random variable X is A_X = {1, 2, 3, …}. We may see a 1 in the first observation of B_n (for the special case p = 0, we always see a 1 in the first observation), or in the second observation of B_n, etc. It is, however, also possible that we have to look at an arbitrarily large number of random variables B_n before we see a 1. The probability mass p_X(k) is the probability that X = k and, hence, the probability that we see a symbol string B_n B_{n+1} … B_{n+k−2} B_{n+k−1} equal to 00…01,

p_X(k) = P(X = k) = P({B_n = 0} ∩ {B_{n+1} = 0} ∩ … ∩ {B_{n+k−2} = 0} ∩ {B_{n+k−1} = 1}).

Since the random variables B_n are independent (a Bernoulli process is a binary iid process), we have

p_X(k) = P(B_{n+k−1} = 1) Π_{i=0}^{k−2} P(B_{n+i} = 0) = p_B(1) p_B(0)^{k−1} = (1 − p) p^{k−1}.

The pmf for the random variable X is a geometric pmf. Its entropy is given by

H_X(p) = H(X) = −Σ_{i=1}^∞ p_X(i) log₂ p_X(i)
= −Σ_{i=1}^∞ (1 − p) p^{i−1} log₂((1 − p) p^{i−1})
= −Σ_{k=0}^∞ (1 − p) p^k log₂((1 − p) p^k)

= −Σ_{k=0}^∞ (1 − p) p^k (log₂(1 − p) + k log₂ p)
= −((1 − p) log₂(1 − p)) Σ_{k=0}^∞ p^k − ((1 − p) log₂ p) Σ_{k=0}^∞ k p^k.

For the case we are considering, 0 ≤ p < 1, the series in the above equation converge and we can write

H_X(p) = −((1 − p) log₂(1 − p)) · 1/(1 − p) − ((1 − p) log₂ p) · p/(1 − p)²
= −log₂(1 − p) − p/(1 − p) · log₂ p.

By reformulating the above expression, we obtain

H_X(p) = [−(1 − p) log₂(1 − p) − p log₂ p] / (1 − p) = H_B(p) / (1 − p).

We now determine the value of p, with 0 < p < 1, for which H_X(p) is four times as large as H_B(p),

H_X(p) = 4 H_B(p)
H_B(p) / (1 − p) = 4 H_B(p).

For 0 < p < 1, H_B(p) is greater than 0. Hence, we can divide the above equation by H_B(p) and obtain

4 = 1/(1 − p)
p = 3/4 = 0.75.

For p = 0.75, the entropy of the random variable X is four times as large as the entropy of the Bernoulli process.
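The closed form H_X(p) = H_B(p)/(1 − p) and the value p = 0.75 can be checked by truncating the geometric series numerically (a sketch, not part of the original solution; the truncation length is an arbitrary choice that makes the tail negligible):

```python
from math import log2

def h_b(p):
    """Entropy of one Bernoulli symbol with P(0) = p."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def h_x(p, terms=1000):
    """Entropy of the geometric pmf p_X(k) = (1 - p) p^{k-1}, truncated."""
    return -sum((1 - p) * p ** (k - 1) * log2((1 - p) * p ** (k - 1))
                for k in range(1, terms))

p = 0.75
print(h_x(p), h_b(p) / (1 - p))   # the two values agree
print(h_x(p) / h_b(p))            # ≈ 4
```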

6. Prove the chain rule for the joint entropy, H(X, Y) = H(X) + H(Y|X).

With A_X and A_Y being the alphabets of the random variables X and Y, respectively, the joint entropy H(X, Y) is defined as

H(X, Y) = −∑_{x∈A_X} ∑_{y∈A_Y} p_XY(x, y) log₂ p_XY(x, y).

Using the chain rule p_XY(x, y) = p_X(x) p_{Y|X}(y|x) for the joint probability masses, we obtain

H(X, Y) = −∑_{x∈A_X} ∑_{y∈A_Y} p_XY(x, y) log₂ p_X(x) − ∑_{x∈A_X} ∑_{y∈A_Y} p_XY(x, y) log₂ p_{Y|X}(y|x)
        = −∑_{x∈A_X} ( ∑_{y∈A_Y} p_XY(x, y) ) log₂ p_X(x) + H(Y|X)
        = −∑_{x∈A_X} p_X(x) log₂ p_X(x) + H(Y|X)
        = H(X) + H(Y|X).
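The chain rule can also be verified numerically for any concrete joint pmf; a small sketch with an arbitrarily chosen p_XY (the values below are our own illustration, not part of the exercise):

```python
import math

# An arbitrary valid joint pmf p_XY on A_X = {0, 1}, A_Y = {0, 1, 2}.
p_xy = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.25, (1, 1): 0.05, (1, 2): 0.30,
}

def entropy(pmf):
    """H = -sum p log2 p over the masses in the dict pmf (in bits)."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Marginal p_X(x) = sum over y of p_XY(x, y).
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

h_xy = entropy(p_xy)
h_x = entropy(p_x)
# H(Y|X) = -sum p_XY(x, y) log2 p_{Y|X}(y|x), with p_{Y|X} = p_XY / p_X.
h_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)

assert abs(h_xy - (h_x + h_y_given_x)) < 1e-12  # H(X,Y) = H(X) + H(Y|X)
```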

7. Investigate the entropy of a function of a random variable X. Let X be a discrete random variable with the alphabet A_X = {0, 1, 2, 3, 4} and the binomial pmf

p_X(x) = 1/16 for x = 0 or x = 4,  1/4 for x = 1 or x = 3,  3/8 for x = 2.

(a) Calculate the entropy H(X).

Inserting the given probability masses into the definition of entropy yields

H(X) = −2·(1/16) log₂(1/16) − 2·(1/4) log₂(1/4) − (3/8) log₂(3/8)
     = 1/2 + 1 + (3/8)(3 − log₂ 3)
     = 21/8 − (3/8) log₂ 3
     = (3/8)(7 − log₂ 3) ≈ 2.0306.

(b) Consider the functions g₁(x) = x² and g₂(x) = (x − 2)². Calculate the entropies H(g₁(X)) and H(g₂(X)).

Let Y be the random variable Y = g₁(X). The alphabet of Y is given by the alphabet of the random variable X and the function g₁(x),

A_Y = { g₁(x) : x ∈ A_X } = {0, 1, 4, 9, 16}.

Similarly, let Z be the random variable Z = g₂(X). The alphabet of Z is given by

A_Z = { g₂(x) : x ∈ A_X } = {0, 1, 4}.

The pmf for Y is given by

p_Y(y) = ∑_{x∈A_X: y=g₁(x)} p_X(x) = 1/16 for y = 0 or y = 16,  1/4 for y = 1 or y = 9,  3/8 for y = 4.

Similarly, the pmf for Z is given by

p_Z(z) = ∑_{x∈A_X: z=g₂(x)} p_X(x) = 3/8 for z = 0,  1/2 for z = 1,  1/8 for z = 4.

Using the determined pmfs for calculating the entropies, we obtain

H(g₁(X)) = H(Y) = −∑_{y∈A_Y} p_Y(y) log₂ p_Y(y)
         = −2·(1/16) log₂(1/16) − 2·(1/4) log₂(1/4) − (3/8) log₂(3/8)
         = (3/8)(7 − log₂ 3)
         = H(X)

and

H(g₂(X)) = H(Z) = −∑_{z∈A_Z} p_Z(z) log₂ p_Z(z)
         = −(3/8) log₂(3/8) − (1/2) log₂(1/2) − (1/8) log₂(1/8)
         = (3/8)(3 − log₂ 3) + 1/2 + 3/8
         = (1/8)(16 − 3 log₂ 3)
         = H(X) − 5/8.

(c) Prove that the entropy H(g(X)) of a function g(X) of a random variable X is not greater than the entropy of the random variable X,

H(g(X)) ≤ H(X).

Determine the condition under which equality is achieved.

Using the chain rule, H(X, Y) = H(X) + H(Y|X), we can write

H(X, g(X)) = H(X) + H(g(X)|X) = H(g(X)) + H(X|g(X)),

and hence

H(g(X)) = H(X) + H(g(X)|X) − H(X|g(X)).

Since the random variable g(X) is a function of the random variable X, the value of g(X) is known if the value of X is known. Hence, the conditional probability mass function p_{g(X)|X}(y|x) is given by

p_{g(X)|X}(y|x) = 1 for y = g(x), and 0 for y ≠ g(x).

Let A_X denote the alphabet of the random variable X with p_X(x) > 0 for all x ∈ A_X. Similarly, let A_Y denote the alphabet of the random variable Y = g(X) with p_Y(y) > 0 for all y ∈ A_Y. The conditional entropy H(g(X)|X) is given by

H(g(X)|X) = −∑_{x∈A_X} ∑_{y∈A_Y} p_{X,g(X)}(x, y) log₂ p_{g(X)|X}(y|x)
          = −∑_{x∈A_X} p_X(x) ( ∑_{y∈A_Y} p_{g(X)|X}(y|x) log₂ p_{g(X)|X}(y|x) ).

The terms with p_{g(X)|X}(y|x) = 0 do not contribute to the sum in parentheses and can be ignored. We obtain

H(g(X)|X) = −∑_{x∈A_X} p_X(x) p_{g(X)|X}(g(x)|x) log₂ p_{g(X)|X}(g(x)|x)
          = −(1 · log₂ 1) ∑_{x∈A_X} p_X(x) = −log₂ 1 = 0.

Hence, the conditional entropy H(g(X)|X) is always equal to 0, and we obtain for H(g(X)),

H(g(X)) = H(X) − H(X|g(X)).

Since the conditional entropy H(X|g(X)) is always greater than or equal to 0, we have proved that

H(g(X)) ≤ H(X).

If and only if g(x) is an injective function on the alphabet A_X, i.e., if a, b ∈ A_X with a ≠ b implies g(a) ≠ g(b), we can define an inverse function h(y) so that h(g(x)) = x for all x ∈ A_X. In this case, we obtain

H(X|g(X)) = H(h(g(X))|g(X)) = 0.

Consequently, if g(x) is an injective function on the alphabet A_X, the entropy H(g(X)) is equal to the entropy H(X),

H(g(X)) = H(X).

If g(x) is not an injective function on the alphabet A_X, i.e., if there are two alphabet letters a and b ≠ a with g(a) = g(b), the entropy H(g(X)) is less than the entropy H(X),

H(g(X)) < H(X).
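The results of parts (a)–(c) can be checked numerically; a short sketch (the helper names entropy and pmf_of_function are ours):

```python
import math
from collections import defaultdict

def entropy(pmf):
    """H = -sum p log2 p over the masses in the dict pmf (in bits)."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def pmf_of_function(p_x, g):
    """pmf of g(X): p_Y(y) is the sum of p_X(x) over all x with g(x) = y."""
    p_y = defaultdict(float)
    for x, p in p_x.items():
        p_y[g(x)] += p
    return dict(p_y)

# Binomial pmf from the exercise.
p_x = {0: 1/16, 1: 1/4, 2: 3/8, 3: 1/4, 4: 1/16}

h_x = entropy(p_x)
h_y = entropy(pmf_of_function(p_x, lambda x: x * x))         # g1(x) = x^2, injective on A_X
h_z = entropy(pmf_of_function(p_x, lambda x: (x - 2) ** 2))  # g2(x) = (x-2)^2, not injective

assert abs(h_x - (3/8) * (7 - math.log2(3))) < 1e-12  # part (a)
assert abs(h_y - h_x) < 1e-12                          # part (b): H(g1(X)) = H(X)
assert abs(h_z - (h_x - 5/8)) < 1e-12                  # part (b): H(g2(X)) = H(X) - 5/8

# Part (c): the joint pmf of (X, g(X)) puts all mass on pairs (x, g(x)),
# so H(X, g(X)) = H(X), i.e. H(g(X)|X) = 0, and H(g(X)) <= H(X).
p_joint = {(x, (x - 2) ** 2): p for x, p in p_x.items()}
assert abs(entropy(p_joint) - h_x) < 1e-12
assert h_z <= h_x + 1e-12
```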
