Gambling and Data Compression



Based on Cover & Thomas, Chapters 5 and 6.

1 Gambling

1.1 Horse Race

Definition. The wealth relative S(X) = b(X)o(X) is the factor by which the gambler's wealth grows if horse X wins the race, where b(X) is the fraction of the gambler's wealth invested in horse X and o(X) is the corresponding odds.

Definition. The doubling rate of a horse race is

W(b, p) = E[log S(X)] = Σ_{k=1}^{m} p_k log(b_k o_k).

Theorem. Let the race outcomes X_1, X_2, ... be i.i.d. ~ p(x). Then the wealth of the gambler using betting strategy b grows exponentially at rate W(b, p); that is,

S_n ≐ 2^{n W(b, p)} (equality to first order in the exponent).

Definition. The optimum doubling rate W*(p) is the maximum doubling rate over all choices of the portfolio b:

W*(p) = max_b W(b, p) = max_{b : b_i ≥ 0, Σ_i b_i = 1} Σ_{i=1}^{m} p_i log(b_i o_i).

Theorem (proportional gambling is log-optimal). The optimal doubling rate is given by

W*(p) = Σ_i p_i log o_i − H(p)

and is achieved by the proportional gambling scheme b* = p.

Theorem (Conservation theorem). For uniform fair odds,

W*(p) + H(p) = log m.

Thus, the sum of the doubling rate and the entropy is a constant.

If the gambler does not always bet all the money, then the optimum strategy may depend on the odds and will not necessarily have the simple form of proportional gambling. There are three cases:

1. Fair odds with respect to some distribution: Σ_i 1/o_i = 1. By betting b_i = 1/o_i, one achieves S(X) = 1, which is the same as keeping some cash aside. Proportional betting is optimal.
2. Superfair odds: Σ_i 1/o_i < 1. By choosing b_i = c/o_i, where c = 1/(Σ_i 1/o_i), one has S(X) = c > 1 with probability 1. In this case, the gambler will always want to bet all the money, and the optimum strategy is again proportional betting.
3. Subfair odds: Σ_i 1/o_i > 1. Proportional gambling is no longer log-optimal. The gambler may want to bet only some of the money and keep the rest aside as cash, depending on the odds.
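
As a quick numerical illustration (not part of the original notes), the Python sketch below computes the doubling rate for a hypothetical three-horse race with uniform fair odds, checks the conservation identity W*(p) + H(p) = log m, and confirms that a few random portfolios never beat proportional betting. The probabilities, odds, and function names are made up for the example.

```python
import numpy as np

def doubling_rate(b, p, o):
    """W(b, p) = sum_k p_k * log2(b_k * o_k)."""
    b, p, o = (np.asarray(v, dtype=float) for v in (b, p, o))
    return float(np.sum(p * np.log2(b * o)))

# Hypothetical 3-horse race with uniform fair odds o_i = m = 3.
p = np.array([0.5, 0.25, 0.25])
o = np.array([3.0, 3.0, 3.0])

H = -np.sum(p * np.log2(p))            # entropy H(p) in bits
W_star = doubling_rate(p, p, o)        # proportional betting b* = p

print(W_star + H, np.log2(3))          # conservation: W*(p) + H(p) = log m

# Proportional betting is log-optimal: random portfolios never do better.
rng = np.random.default_rng(0)
for _ in range(5):
    b = rng.dirichlet(np.ones(3))
    assert doubling_rate(b, p, o) <= W_star + 1e-9
```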

1.2 Side Information and Entropy Rate

Definition. The increase ΔW is defined as

ΔW = W*(X | Y) − W*(X),

where

W*(X) = max_{b(x)} Σ_x p(x) log(b(x)o(x)),
W*(X | Y) = max_{b(x|y)} Σ_{x,y} p(x, y) log(b(x|y)o(x)).

Theorem. The increase ΔW in doubling rate due to side information Y for a horse race X is

ΔW = I(X; Y).

1.3 Dependent Horse Races and the Entropy Rate

If the horse races are dependent, suppose that the winning horses form a stochastic process {X_k}. The optimal doubling rate for uniform fair odds (m-for-1) is

W*(X_k | X_{k−1}, X_{k−2}, ..., X_1) = log m − H(X_k | X_{k−1}, X_{k−2}, ..., X_1),

which is achieved by b*(x_k | x_{k−1}, ..., x_1) = p(x_k | x_{k−1}, ..., x_1). The doubling rate then satisfies

(1/n) E[log S_n] = log m − (1/n) H(X_1, ..., X_n).

Thus, in the limit as n → ∞, the doubling rate is related to the entropy rate H(X) of the process:

lim_{n→∞} (1/n) E[log S_n] = log m − H(X).
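
The side-information theorem above, ΔW = I(X; Y), can be checked numerically in a special case. The sketch below (not from the notes) uses a made-up joint pmf for a two-horse race with binary side information and uniform fair odds; under proportional betting, W*(X) = log m − H(X) and W*(X | Y) = log m − H(X | Y), so the increase equals the mutual information.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint pmf p(x, y): rows index the winning horse x, columns the side information y.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
m = len(px)                                   # uniform fair odds, m-for-1

H_X = entropy(px)
H_X_given_Y = sum(py[j] * entropy(pxy[:, j] / py[j]) for j in range(len(py)))

W_X = np.log2(m) - H_X                        # optimal doubling rate without side information
W_X_given_Y = np.log2(m) - H_X_given_Y        # optimal doubling rate with side information

print(W_X_given_Y - W_X, H_X - H_X_given_Y)   # Delta W equals I(X; Y)
```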

2 Data Compression: Codes and Optimality

2.1 Definitions and Examples of Codes

Definition. A source code C for a random variable X is a mapping from X, the range of X, to D*, the set of finite-length strings of symbols from a D-ary alphabet.

Definition. Let C(x) denote the codeword corresponding to x and let l(x) denote its length. Then the expected length of the source code C for a random variable X with pmf p(x) is

L(C) := E_p[l(X)] = Σ_{x ∈ X} p(x) l(x).

Definition. A code is said to be nonsingular if every element of the range of X maps into a different string in D*; that is, x ≠ x' ⇒ C(x) ≠ C(x').

Definition. The extension C* of a code C is the mapping from finite-length strings of X to finite-length strings of D, defined by

C*(x_1 x_2 ... x_n) = C(x_1)C(x_2) ... C(x_n),

where C(x_1)C(x_2) ... C(x_n) indicates concatenation of the corresponding codewords.

Definition. A code is called uniquely decodable if its extension is nonsingular. One may have to inspect the entire string before decoding even the first codeword.

Definition. A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword. An instantaneous code is self-punctuating: each codeword can be decoded as soon as it is received.

2.2 Instantaneous Codes and the Kraft Inequality

Theorem (Kraft inequality). For any instantaneous code (prefix code) over an alphabet of size D, the codeword lengths l_1, l_2, ..., l_m must satisfy the inequality

Σ_i D^{−l_i} ≤ 1.

Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.

Theorem (Extended Kraft inequality). For any countably infinite set of codewords that form a prefix code, the codeword lengths satisfy the extended Kraft inequality

Σ_{i=1}^{∞} D^{−l_i} ≤ 1.

Conversely, given any l_1, l_2, ... satisfying the extended Kraft inequality, we can construct a prefix code with these codeword lengths.

3 Data Compression: Optimal Codes and Length Bounds

3.1 Optimal Codes

Definition. A probability distribution is called D-adic if each of the probabilities is equal to D^{−n} for some integer n.

Theorem (Lower bound on codeword length). The expected length L of any instantaneous D-ary code for a random variable X is greater than or equal to the base-D entropy H_D(X):

L ≥ H_D(X),

with equality iff D^{−l_i} = p_i for all i, i.e., iff the distribution of X is D-adic.
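
To make the Kraft inequality and the lower bound L ≥ H_D(X) concrete, here is a small Python sketch (an illustration, not part of the notes) that checks the inequality for a set of binary codeword lengths, builds a prefix code from them via the standard cumulative-sum construction, and verifies that for a dyadic pmf with l_i = log2(1/p_i) the expected length equals the entropy. The pmf and helper names are made up, and the float-based digit extraction is intended only for small dyadic examples.

```python
from math import log2

def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality, sum_i D**(-l_i)."""
    return sum(D ** (-l) for l in lengths)

def prefix_code_from_lengths(lengths, D=2):
    """Build a prefix code from lengths satisfying Kraft: assign codewords in
    order of increasing length as the D-ary expansion of the running Kraft sum."""
    assert kraft_sum(lengths, D) <= 1 + 1e-12, "lengths violate the Kraft inequality"
    code, acc = {}, 0.0
    for idx, l in sorted(enumerate(lengths), key=lambda t: t[1]):
        digits, frac = [], acc
        for _ in range(l):
            frac *= D
            digits.append(str(int(frac)))
            frac -= int(frac)
        code[idx] = "".join(digits)
        acc += D ** (-l)
    return code

# Dyadic example: p_i = 2**(-l_i), so the lower bound L >= H(X) is met with equality.
p = [0.5, 0.25, 0.125, 0.125]
lengths = [int(log2(1 / pi)) for pi in p]      # [1, 2, 3, 3]
code = prefix_code_from_lengths(lengths)       # e.g. {0: '0', 1: '10', 2: '110', 3: '111'}

H = -sum(pi * log2(pi) for pi in p)
L = sum(pi * len(code[i]) for i, pi in enumerate(p))
print(code, L, H)                              # expected length equals entropy
```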

3.2 Bounds on the Optimal Code Length

The previous theorem suggests finding the D-adic distribution vector r closest to a given source distribution vector p, and then designing a code for r. By minimizing D(p ‖ r) over D-adic r, we may exhibit a code (not necessarily optimal) whose length L satisfies the following bound:

Theorem (Optimal expected codeword length). Let l_1, l_2, ..., l_m be optimal codeword lengths for a source distribution p and a D-ary alphabet, and let L be the associated expected length of an optimal code (L = Σ_i p_i l_i). Then

H_D(X) ≤ L < H_D(X) + 1.

Consider sending a sequence of n symbols drawn i.i.d. according to p(x) in a block, so that we have a supersymbol from X^n. Let L_n be the expected codeword length per input symbol:

L_n := (1/n) E[l(X_1, X_2, ..., X_n)].

Then by letting the block length n become large, we may achieve an expected length per symbol L_n arbitrarily close to the entropy:

Theorem (Distributing the extra overhead bit). The minimum expected codeword length per symbol satisfies

H(X_1, X_2, ..., X_n)/n ≤ L_n < H(X_1, X_2, ..., X_n)/n + 1/n.

Moreover, if X_1, X_2, ..., X_n is a stationary stochastic process, then L_n → H(X), where H(X) is the entropy rate of the process.

The previous theorem confirms that the entropy rate of a stationary stochastic process is indeed the minimum expected number of bits per symbol needed to describe the process.

If we design a code for the wrong input distribution, then the increase in expected description length is given, to within one bit, by the relative entropy:

Theorem (Wrong code). The expected length under p(x) of the code assignment l(x) = ⌈log 1/q(x)⌉ satisfies

H(p) + D(p ‖ q) ≤ E_p[l(X)] < H(p) + D(p ‖ q) + 1.

3.3 Kraft Inequality for Uniquely Decodable Codes

In the sense of expected length, the set of uniquely decodable codes, while larger, does not improve upon instantaneous codes:

Theorem (McMillan). The codeword lengths of any uniquely decodable D-ary code must satisfy the Kraft inequality

Σ_i D^{−l_i} ≤ 1.

Conversely, given a set of codeword lengths satisfying this inequality, it is possible to construct a uniquely decodable code with these lengths.

Corollary. A uniquely decodable code for an infinite source alphabet X also satisfies the Kraft inequality.
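
The wrong-code penalty can also be seen numerically. The sketch below (illustrative only, with a made-up true pmf p and design pmf q) assigns Shannon code lengths ⌈log 1/q(x)⌉ and checks that the expected length under p lands between H(p) + D(p ‖ q) and H(p) + D(p ‖ q) + 1.

```python
import numpy as np

def entropy(p):
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    return np.sum(p * np.log2(p / q))

# Hypothetical true distribution p and mismatched design distribution q.
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

lengths = np.ceil(np.log2(1.0 / q))            # code designed for q: l(x) = ceil(log 1/q(x))
expected_len = float(np.sum(p * lengths))      # but evaluated under the true pmf p

lower = entropy(p) + kl_divergence(p, q)
print(lower, expected_len, lower + 1)          # lower <= E_p[l(X)] < lower + 1
```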

3.4 Huffman Codes: Optimality and Examples

Consider the tree construction we used earlier to suggest a proof of the Kraft inequality for finite instantaneous codes. It suggests a constructive procedure for assigning codewords whose lengths are roughly inversely related to the corresponding symbol probabilities: less likely symbols receive longer codewords. We now formalize this idea through Huffman codes:

- Construct a D-ary tree from which codewords can be assigned.
- Build up the tree recursively by combining the D lowest-probability symbols at each stage.

A simple algorithm due to Huffman allows for the construction of optimal prefix codes for a given distribution:

Lemma (Existence of a particular optimal code). For any distribution, there exists an optimal instantaneous code (with minimum expected length) that satisfies the following properties:

1. The lengths are ordered inversely with the probabilities (i.e., if p_j > p_k, then l_j ≤ l_k).
2. The two longest codewords have the same length.
3. Two of the longest codewords differ only in the last bit and correspond to the two least likely symbols.

Theorem (Optimality of Huffman coding). Huffman coding is optimal; that is, if C* is a Huffman code and C' is any other uniquely decodable code, then L(C*) ≤ L(C').

3.5 Shannon-Fano-Elias Coding

Fano proposed a suboptimal procedure based on recursively partitioning the unit interval, under the assumption that symbol probabilities are given in decreasing order. A related procedure, Shannon-Fano-Elias coding, makes direct use of the cumulative distribution function (cdf) F(x) to assign codewords. By using the midpoint F̄(x) of each jump in the cdf, we may exhibit a prefix-free code C satisfying:

- Codeword lengths l(x) = ⌈log 1/p(x)⌉ + 1.
- Expected length L(C) < H(X) + 2.
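
As a closing illustration (not part of the notes), the Python sketch below builds a binary Huffman code by repeatedly merging the two least likely nodes, builds a Shannon-Fano-Elias code from the cdf midpoints, and compares their expected lengths with the bounds H(X) + 1 and H(X) + 2. The four-symbol pmf is a made-up dyadic example, so the Huffman code meets the entropy exactly.

```python
import heapq
from math import ceil, log2

p = {"a": 0.25, "b": 0.5, "c": 0.125, "d": 0.125}   # hypothetical source pmf

def huffman(pmf):
    """Binary Huffman code: repeatedly merge the two least likely subtrees."""
    heap = [(prob, i, {sym: ""}) for i, (sym, prob) in enumerate(pmf.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}      # prepend branch labels
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

def shannon_fano_elias(pmf):
    """SFE code: first ceil(log2(1/p)) + 1 bits of the cdf midpoint F-bar(x)."""
    code, F = {}, 0.0
    for sym, prob in pmf.items():
        mid = F + prob / 2
        l = ceil(log2(1 / prob)) + 1
        bits, frac = [], mid
        for _ in range(l):
            frac *= 2
            bits.append(str(int(frac)))
            frac -= int(frac)
        code[sym] = "".join(bits)
        F += prob
    return code

def expected_length(pmf, code):
    return sum(prob * len(code[sym]) for sym, prob in pmf.items())

H = -sum(prob * log2(prob) for prob in p.values())
huff, sfe = huffman(p), shannon_fano_elias(p)
print(huff, expected_length(p, huff), H + 1)   # Huffman:            H <= L < H + 1
print(sfe, expected_length(p, sfe), H + 2)     # Shannon-Fano-Elias:      L < H + 2
```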