CSC 310: Information Theory
University of Toronto, Fall 2011
Instructor: Radford M. Neal
Week 2

What's Needed for a Theory of (Lossless) Data Compression?

- A context for the problem. What are we trying to compress, and what are we compressing it into?
- A notion of what data compression schemes are possible. A data compression scheme must allow us to encode data, and then decode it, recovering the original data.
- A measure of how good a data compression scheme is. We will have to look at how good a scheme is on average, given some model for the source.

One danger: if we don't formalize things well, we might eliminate data compression schemes that would have been practical.

What Might We Hope for From a Theory of Data Compression?

- Easier ways of telling whether a data compression scheme is possible, and if so, how good it is.
- A theorem that tells us how good a scheme can possibly be: the theoretical limit.
- Some help in finding a scheme that approaches this theoretical limit.
- Insight into the nature of the problem, which may help for other problems.

One insight: compression is limited by the entropy of the source, which is a measure of information content that has many other uses.

Formalizing the Source of Data

We'll assume that we are trying to compress data from a digital source that produces a sequence of symbols, X_1, X_2, .... These will be viewed as random variables; some particular values they take on will be denoted by x_1, x_2, .... These source symbols are from a finite source alphabet, A_X. Examples:

  A_X = {A, B, ..., Z, ␣}
  A_X = {0, 1, 2, ..., 255}
  A_X = {C, G, T, A}
  A_X = {0, 1}

The source alphabet is known to the receiver (who may be us at a later time, for storage applications).

Formalizing What We Compress To

The output of the compression program is a sequence of code symbols, Z_1, Z_2, ..., from a finite code alphabet, A_Z. These symbols are sent through the channel to the receiver. We assume for now that the channel is noise-free: the symbol received is always the symbol that was sent.

We'll almost always assume that the code alphabet is {0, 1}, since computer files and digital transmissions are usually binary, but the theory can easily be generalized to any finite code alphabet.

Possible Compression Programs

A compression program (ie, a code) defines a mapping of each source symbol to a finite sequence of code symbols (a codeword). For example, suppose our source alphabet is A_X = {C, G, T, A}. One possible code is

  C -> 0
  G -> 10
  T -> 110
  A -> 1110

We encode a sequence of source symbols by concatenating the codewords of each:

  CCAT -> 001110110

We require that the mapping be such that we can decode this sequence. Later, we'll see that the above formalization isn't really right...
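To make this concrete, here is a minimal sketch in Python (not part of the original slides; the codeword table simply copies the example above) of encoding by concatenation:

    # Codeword table from the slide: C -> 0, G -> 10, T -> 110, A -> 1110
    code = {'C': '0', 'G': '10', 'T': '110', 'A': '1110'}

    def encode(source, code):
        """Encode a sequence of source symbols by concatenating their codewords."""
        return ''.join(code[x] for x in source)

    print(encode('CCAT', code))   # prints 001110110, as on the slide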

What Codes are Decodable?

Let's consider only codes that can be decoded. What does that mean? This may depend on how the channel behaves at the end of a transmission. Four possibilities:

- The end of the transmission is explicitly marked, say by $: 011101101$
- After the end of the transmission, subsequent symbols all have a single known value, say 0: 0111011010000000000...
- After the end of the transmission, subsequent symbols are random garbage: 0111011011100100101...
- There is no end to the transmission.

When Do We Need the Decoding?

Another possible issue is when we require that a decoded symbol be known. Possibilities:

- As soon as the codeword for the symbol has been received. If this is possible, the code is instantaneously decodable.
- With no more than a fixed delay after the codeword for the symbol has been received. If this is possible, the code is decodable with bounded delay.
- Not until the entire message has been received. Assuming that the end of transmission is explicitly marked, we then require only that the code be uniquely decodable.

How Much Difference Does it Make?

We could develop theories of data compression with various definitions of decodability. Question: how much difference will it make? Will we find that we can't compress data as much if we insist on using a code that is instantaneously decodable? Or will we find that a single theory is robust, not sensitive to the exact details of the channel and decoding requirements?

Easiest: assume the end of transmission is explicitly marked; don't require any symbols to be decoded until the entire message is received.

Hardest: require instantaneous decoding. (It then won't matter whether the end of transmission is marked, as far as decoding the symbols that were actually sent is concerned.)

Notation for Sequences & Codes

A_X and A_Z are the source and code alphabets. A_X^+ and A_Z^+ denote sequences of one or more symbols from the source or code alphabets.

A symbol code, C, is a mapping A_X -> A_Z^+. We use c(x) to denote the codeword C maps x to. We can use concatenation to extend this to a mapping for the extended code, C^+ : A_X^+ -> A_Z^+:

  c^+(x_1 x_2 ... x_N) = c(x_1) c(x_2) ... c(x_N)

That is, we code a string of symbols by just stringing together the codes for each symbol.

We sometimes also use C to denote the set of codewords: {w : w = c(a) for some a in A_X}.

Formalizing Uniquely Decodable and Instantaneous Codes

We can now define a code to be uniquely decodable if the mapping C^+ : A_X^+ -> A_Z^+ is one-to-one. In other words:

  For all x and x' in A_X^+, x ≠ x' implies c^+(x) ≠ c^+(x').

A code is obviously not uniquely decodable if two symbols have the same codeword (ie, if c(a) = c(a') for some a ≠ a'), so we'll usually assume that this isn't the case.

We define a code to be instantaneously decodable if any source sequences x and x' in A_X^+ for which x is not a prefix of x' have encodings z = c^+(x) and z' = c^+(x') for which z is not a prefix of z'. (Since otherwise, after receiving z, we wouldn't yet know whether the message starts with x or with x'.)

Examples

Examples with A_X = {a, b, c} and A_Z = {0, 1}:

       Code A   Code B   Code C   Code D
  a    10       0        0        0
  b    11       10       01       01
  c    111      110      011      11

- Code A: not uniquely decodable; both bbb and cc encode as 111111.
- Code B: instantaneously decodable; the end of each codeword is marked by a 0.
- Code C: decodable with one-symbol delay; the end of a codeword is marked by the following 0.
- Code D: uniquely decodable, but with unbounded delay:
    011111111111111 decodes as accccccc
    01111111111111 decodes as bcccccc
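A quick check of the first claim (a sketch, reusing the encoding-by-concatenation idea from earlier) shows that Code A maps two different source strings to the same code string:

    # Code A from the table: a -> 10, b -> 11, c -> 111
    code_A = {'a': '10', 'b': '11', 'c': '111'}
    encode = lambda s: ''.join(code_A[x] for x in s)

    print(encode('bbb'))   # 111111
    print(encode('cc'))    # 111111 -- the same string, so Code A is not uniquely decodable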

How to Check Whether a Code is Uniquely Decodable

The Sardinas-Patterson Theorem tells us how to check whether a code is uniquely decodable. Let C be the set of codewords. Define C_0 = C. For n > 0, define

  C_n = {w in A_Z^+ : uw = v, where u in C and v in C_{n-1}, or u in C_{n-1} and v in C}

Finally, define

  C_∞ = C_1 ∪ C_2 ∪ C_3 ∪ ...

The code C is uniquely decodable if and only if C and C_∞ are disjoint.

We won't bother much with this theorem, since, as we'll see, it isn't of much practical use.
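The test is easy to mechanize, though. The sketch below is one way it could be implemented, following the definition of the sets C_n above (the function names are mine, not from the course):

    def dangling_suffixes(A, B):
        """All nonempty w such that uw = v for some u in A and v in B."""
        return {v[len(u):] for u in A for v in B
                if len(v) > len(u) and v.startswith(u)}

    def uniquely_decodable(codewords):
        """Sardinas-Patterson test: uniquely decodable iff no C_n contains a codeword."""
        C = set(codewords)
        Cn = dangling_suffixes(C, C)          # this is C_1
        seen = set()                          # the sets C_n computed so far
        while Cn:
            if Cn & C:                        # C and C_infinity are not disjoint
                return False
            if frozenset(Cn) in seen:         # the C_n now repeat; nothing new can appear
                return True
            seen.add(frozenset(Cn))
            Cn = dangling_suffixes(C, Cn) | dangling_suffixes(Cn, C)
        return True

    print(uniquely_decodable(['10', '11', '111']))   # Code A: False
    print(uniquely_decodable(['0', '01', '11']))     # Code D: True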

How to Check Whether a Code is Instantaneously Decodable

A code is instantaneous if and only if no codeword is a prefix of some other codeword.

Proof ("only if"): If codeword c(a) is a prefix of codeword c(a'), then the encoding of the sequence x = a is obviously a prefix of the encoding of the sequence x' = a', so the code is not instantaneous.

("if"): If the code is not instantaneous, let z = c^+(x) be an encoding that is a prefix of another encoding z' = c^+(x'), but with x not a prefix of x', and let x be as short as possible. The first symbols of x and x' can't be the same, since if they were, we could drop these symbols and get a shorter instance. So these two symbols must be different, but one of their codewords must be a prefix of the other.
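In code, this check is a single prefix test over all pairs of codewords; the sketch below (function name is mine) applies it to Codes B and C from the earlier table:

    def is_prefix_free(codewords):
        """True iff no codeword is a prefix of some other codeword,
        ie the code is instantaneously decodable."""
        cw = list(codewords)
        return not any(i != j and cw[j].startswith(cw[i])
                       for i in range(len(cw)) for j in range(len(cw)))

    print(is_prefix_free(['0', '10', '110']))   # Code B: True
    print(is_prefix_free(['0', '01', '011']))   # Code C: False (0 is a prefix of 01)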

Existence of Codes With Given Lengths of Codewords

Since we hope to compress data, we would like codes that are uniquely decodable and whose codewords are short. If we could make all the codewords really short, life would be really easy. Too easy. Instead, making some codewords short will require that other codewords be long, if the code is to be uniquely decodable.

Questions:
- What sets of codeword lengths are possible?
- Is the answer to this question different for instantaneous codes than for uniquely decodable codes?

McMillan's Inequality

There is a uniquely decodable binary code with codewords having lengths l_1, ..., l_I if and only if

  \sum_{i=1}^{I} \frac{1}{2^{l_i}} \le 1

Examples:
- There is a uniquely decodable binary code with lengths 1, 2, 3, 3, since 1/2 + 1/4 + 1/8 + 1/8 = 1. An example of such a code is {0, 01, 011, 111}.
- There is no uniquely decodable binary code with lengths 2, 2, 2, 2, 2, since 1/4 + 1/4 + 1/4 + 1/4 + 1/4 > 1.

Kraft's Inequality

There is an instantaneous binary code with codewords having lengths l_1, ..., l_I if and only if

  \sum_{i=1}^{I} \frac{1}{2^{l_i}} \le 1

Examples:
- There is an instantaneous binary code with lengths 1, 2, 3, 3, since 1/2 + 1/4 + 1/8 + 1/8 = 1. An example of such a code is {0, 10, 110, 111}.
- There is an instantaneous binary code with lengths 2, 2, 2, since 1/4 + 1/4 + 1/4 < 1. An example of such a code is {00, 10, 01}.
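Checking either inequality for a proposed set of lengths is a one-line computation; this small sketch uses exact fractions to avoid any rounding:

    from fractions import Fraction

    def kraft_sum(lengths):
        """Sum of 2^(-l_i) over the proposed codeword lengths."""
        return sum(Fraction(1, 2**l) for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))       # 1   -> such codes exist
    print(kraft_sum([2, 2, 2]))          # 3/4 -> such codes exist
    print(kraft_sum([2, 2, 2, 2, 2]))    # 5/4 -> no uniquely decodable code exists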

Implications for Instantaneous and Uniquely Decodable Codes

Combining Kraft's and McMillan's inequalities, we conclude that there is an instantaneous binary code with lengths l_1, ..., l_I if and only if there is a uniquely decodable code with these lengths.

Implication: there is probably no practical benefit to using uniquely decodable codes that aren't instantaneous.

Proving the Two Inequalities

We can prove both Kraft's and McMillan's inequalities by proving that, for any set of lengths l_1, ..., l_I for binary codewords:

A) If \sum_{i=1}^{I} 1/2^{l_i} \le 1, we can construct an instantaneous code with codewords having these lengths.

B) If \sum_{i=1}^{I} 1/2^{l_i} > 1, there is no uniquely decodable code with codewords having these lengths.

(A) is half of Kraft's inequality. (B) is half of McMillan's inequality. Since instantaneous codes are uniquely decodable, we also see that (A) gives the other half of McMillan's inequality, and (B) gives the other half of Kraft's inequality.

Visualizing Prefix Codes as Trees

We can view codewords of an instantaneous (prefix) code as leaves of a tree. The root represents the null string; each branch corresponds to adding another code symbol. Here is the tree for a code with codewords 0, 11, 100, 101:

  [Tree diagram: the root (NULL) branches to nodes 0 and 1; node 1 branches to 10 and 11; node 10 branches to 100 and 101. The codewords 0, 11, 100, 101 are the leaves.]

Extending the Tree to Maximum Depth

We can extend the tree to the depth of the longest codeword. Each codeword then corresponds to a subtree. The extension of the previous tree, with each codeword's subtree circled:

  [Diagram: the complete binary tree of depth 3, with nodes NULL, 0, 1, 00, 01, 10, 11, 000, ..., 111; the subtree below each codeword is circled.]

Short codewords occupy more of the tree. For a binary code, the fraction of leaves taken by a codeword of length l is 1/2^l.

Constructing Instantaneous Codes When the Inequality Holds

Suppose that Kraft's inequality holds:

  \sum_{i=1}^{I} \frac{1}{2^{l_i}} \le 1

Order the lengths so l_1 \le ... \le l_I. In the binary tree with depth l_I, how can we allocate subtrees to codewords with these lengths? We go from shortest to longest, i = 1, ..., I:

1) Pick a node at depth l_i that isn't in a subtree previously used, and let codeword i be the code string at that node.
2) Mark all nodes in the subtree headed by the node just picked as being used, and not available to be picked later.

Will there always be a node available in step (1) above?
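One way to carry out this allocation (a sketch, not from the slides) is the standard "canonical code" construction: sort the lengths, keep an integer counter giving the next free node at the current depth, and shift it left when moving to a greater depth. This amounts to always picking the leftmost available node in step (1):

    def prefix_code_from_lengths(lengths):
        """Construct binary codewords with the given lengths such that no
        codeword is a prefix of another.  Assumes Kraft's inequality holds."""
        lengths = sorted(lengths)
        assert sum(2.0 ** -l for l in lengths) <= 1, "Kraft's inequality violated"
        codewords, v, depth = [], 0, lengths[0]
        for l in lengths:
            v <<= (l - depth)                 # move down to depth l (leftmost free node)
            codewords.append(format(v, '0%db' % l))
            v += 1                            # advance to the next free node at this depth
            depth = l
        return codewords

    print(prefix_code_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']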

Why the Construction Will be Possible

If Kraft's inequality holds, we will always be able to do this. To begin, there are 2^{l_b} nodes at depth l_b. When we pick a node at depth l_a, the number of nodes that become unavailable at depth l_b (assumed not less than l_a) is 2^{l_b - l_a}.

When we need to pick a node at depth l_j, after having picked earlier nodes at depths l_i (with i < j and l_i \le l_j), the number of nodes left to pick from will be

  2^{l_j} - \sum_{i=1}^{j-1} 2^{l_j - l_i} = 2^{l_j} \left[ 1 - \sum_{i=1}^{j-1} \frac{1}{2^{l_i}} \right] > 0

since \sum_{i=1}^{j-1} \frac{1}{2^{l_i}} < \sum_{i=1}^{I} \frac{1}{2^{l_i}} \le 1, by assumption.

Why Uniquely Decodable Codes Must Obey the Inequality

Let l_1, ..., l_I be the codeword lengths. Define

  K = \sum_{i=1}^{I} \frac{1}{2^{l_i}}

For any positive integer n,

  K^n = \left[ \sum_{i=1}^{I} \frac{1}{2^{l_i}} \right]^n = \sum_{i_1, \ldots, i_n} \frac{1}{2^{l_{i_1}}} \cdots \frac{1}{2^{l_{i_n}}}

The sum is over all combinations of values for i_1, ..., i_n in {1, ..., I}. Let's rewrite this in terms of possible values for j = l_{i_1} + ... + l_{i_n}:

  K^n = \sum_{j=1}^{n l_I} \frac{N_{j,n}}{2^j}

where N_{j,n} is the number of sequences of n codewords that have total length j. If the code is uniquely decodable, N_{j,n} \le 2^j, so K^n \le n l_I, which for big enough n is possible only if K \le 1.
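The last step can be illustrated numerically (a sketch): if K > 1, then K^n grows exponentially while the bound n l_I grows only linearly, so K^n \le n l_I must eventually fail.

    # Example: five codewords of length 2, so K = 5/4 and l_I = 2.
    K, l_max = 1.25, 2
    for n in (1, 10, 50, 100):
        print(n, K**n, n * l_max, K**n <= n * l_max)   # the bound fails for large n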