Polarization codes and the rate of polarization



Polarization codes and the rate of polarization
Erdal Arıkan (Bilkent U.), Emre Telatar (EPFL)
Sept 10, 2008

Channel Polarization

Given a binary-input DMC $W$, take i.i.d. uniformly distributed inputs $(X_1,\dots,X_n) \in \{0,1\}^n$, in one-to-one correspondence with binary data $(U_1,\dots,U_n) \in \{0,1\}^n$ through an invertible map $G_n : \mathbb{F}_2^n \to \mathbb{F}_2^n$, so that $X^n = G_n(U^n)$ and each $X_i$ is sent over an independent copy of $W$ to produce $Y^n$. Observe that the $U_i$ are i.i.d., uniform on $\{0,1\}$.

Channel Polarization

$$I(U^n; Y^n) = I(X^n; Y^n) = \sum_i I(X_i; Y_i) = n\, I(W),$$
since $G_n$ is one-to-one and the $X_i$ are i.i.d. uniform inputs to independent copies of $W$. By the chain rule,
$$I(U^n; Y^n) = \sum_i I(U_i; Y^n U^{i-1}).$$

Notation: $I(P)$ denotes the mutual information between the input and output of a channel $P$ when the input is uniformly distributed.

[Figure: the empirical distribution $\zeta \mapsto \#\{i : I(U_i; Y^n U^{i-1}) \le \zeta\}/n$. The corresponding plot for the $I(X_i; Y_i)$ is a single step at $\zeta = I(W)$; for the $I(U_i; Y^n U^{i-1})$ the curve is shown for growing $n$ ($n = 1, 2, 2^4, 2^8, 2^{12}, 2^{20}$) and approaches a two-step shape, with a jump of $1 - I(W)$ at $\zeta = 0$ and a jump of $I(W)$ at $\zeta = 1$.]

Channel Polarization

We say channel polarization takes place if almost all of the numbers $I(U_i; Y^n U^{i-1})$ are near the extremal values, i.e., if
$$\tfrac{1}{n}\#\{\, i : I(U_i; Y^n U^{i-1}) \approx 1 \,\} \to I(W) \quad\text{and}\quad \tfrac{1}{n}\#\{\, i : I(U_i; Y^n U^{i-1}) \approx 0 \,\} \to 1 - I(W).$$

Channel Polarization

If polarization takes place and we wish to communicate at rate $R$:

Pick $n$, and $k = nR$ good indices $i$ such that $I(U_i; Y^n U^{i-1})$ is high;
let the transmitter set $U_i$ to uncoded binary data for the good indices, and set $U_i$ to random but publicly known values for the rest;
let the receiver decode the $U_i$ successively: $U_1$ from $Y^n$; $U_i$ from $(Y^n, \hat{U}^{i-1})$.

One would expect this scheme to do well as long as $R < I(W)$.
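A minimal sketch (mine, not from the talk) of the transmitter side of this scheme, in Python. It assumes a per-index reliability score is already available (for instance the Bhattacharyya parameters introduced later in the talk); the function name and the toy reliabilities are illustrative only.

```python
import random

def build_input_vector(data_bits, reliabilities, n):
    """Place k = len(data_bits) data bits on the most reliable indices and
    publicly known (here: random but fixed) values elsewhere.

    reliabilities: list of n scores, larger = better channel
    (assumed supplied, e.g. derived from Bhattacharyya parameters).
    """
    k = len(data_bits)
    order = sorted(range(n), key=lambda i: -reliabilities[i])  # best first
    good = set(order[:k])                      # indices that carry data
    frozen = {i: random.randint(0, 1)          # "random but publicly known"
              for i in range(n) if i not in good}

    u = [0] * n
    it = iter(data_bits)
    for i in range(n):
        u[i] = next(it) if i in good else frozen[i]
    return u, good, frozen

# toy usage: n = 8, rate 1/2, made-up reliabilities
u, good, frozen = build_input_vector(
    [1, 0, 1, 1], [0.1, 0.4, 0.5, 0.9, 0.3, 0.8, 0.95, 0.99], n=8)
print(good, u)
```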

Polarization HowTo

Given two copies of a binary-input channel $P : \mathbb{F}_2 \to \mathcal{Y}$, fed by $X_1 = U_1 + U_2$ and $X_2 = U_2$ (with outputs $Y_1$ and $Y_2$), consider this transformation to generate two channels $P^- : \mathbb{F}_2 \to \mathcal{Y}^2$ and $P^+ : \mathbb{F}_2 \to \mathcal{Y}^2 \times \mathbb{F}_2$ with
$$P^-(y_1 y_2 \mid u_1) = \sum_{u_2} \tfrac{1}{2}\, P(y_1 \mid u_1 + u_2)\, P(y_2 \mid u_2),$$
$$P^+(y_1 y_2 u_1 \mid u_2) = \tfrac{1}{2}\, P(y_1 \mid u_1 + u_2)\, P(y_2 \mid u_2).$$
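As a concrete illustration (not part of the talk), the two synthesized channels can be written out as transition matrices. The sketch below takes a binary-input channel as a 2 x |Y| row-stochastic array and builds $P^-$ and $P^+$ directly from the formulas above; the BSC at the end is just an example input.

```python
import itertools

def channel_minus(P):
    """P^-(y1 y2 | u1) = sum_{u2} 0.5 * P(y1 | u1 ^ u2) * P(y2 | u2)."""
    ny = len(P[0])
    Pm = [[0.0] * (ny * ny) for _ in range(2)]
    for u1 in range(2):
        for y1, y2 in itertools.product(range(ny), repeat=2):
            Pm[u1][y1 * ny + y2] = sum(
                0.5 * P[u1 ^ u2][y1] * P[u2][y2] for u2 in range(2))
    return Pm

def channel_plus(P):
    """P^+(y1 y2 u1 | u2) = 0.5 * P(y1 | u1 ^ u2) * P(y2 | u2)."""
    ny = len(P[0])
    Pp = [[0.0] * (ny * ny * 2) for _ in range(2)]
    for u2 in range(2):
        for y1, y2, u1 in itertools.product(range(ny), range(ny), range(2)):
            Pp[u2][(y1 * ny + y2) * 2 + u1] = 0.5 * P[u1 ^ u2][y1] * P[u2][y2]
    return Pp

# example: BSC with crossover probability 0.1
bsc = [[0.9, 0.1], [0.1, 0.9]]
for Q in (channel_minus(bsc), channel_plus(bsc)):
    assert all(abs(sum(row) - 1.0) < 1e-12 for row in Q)  # rows are distributions
```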

Polarization HowTo

Observe that
$$\begin{bmatrix} U_1 \\ U_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}.$$
With independent, uniform $U_1, U_2$,
$$I(P^-) = I(U_1; Y_1 Y_2), \qquad I(P^+) = I(U_2; Y_1 Y_2 U_1).$$
Thus
$$I(P^-) + I(P^+) = I(U_1 U_2; Y_1 Y_2) = I(X_1; Y_1) + I(X_2; Y_2) = 2 I(P),$$
and $I(P^-) \le I(P) \le I(P^+)$.
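For the binary erasure channel these quantities have closed forms, which makes the identity easy to check numerically (a standard fact, not stated on the slide): if $P$ is a BEC with erasure probability $\epsilon$, then $P^-$ is a BEC with erasure probability $2\epsilon - \epsilon^2$ and $P^+$ a BEC with erasure probability $\epsilon^2$. A short check:

```python
def bec_mutual_info(eps):
    """I(P) for a BEC with erasure probability eps and uniform input."""
    return 1.0 - eps

for eps in (0.1, 0.3, 0.5, 0.9):
    i_minus = bec_mutual_info(2 * eps - eps ** 2)   # P^- is BEC(2e - e^2)
    i_plus = bec_mutual_info(eps ** 2)              # P^+ is BEC(e^2)
    i = bec_mutual_info(eps)
    assert abs(i_minus + i_plus - 2 * i) < 1e-12    # I(P-) + I(P+) = 2 I(P)
    assert i_minus <= i <= i_plus                   # extremal spreading
    print(f"eps={eps}: I(P-)={i_minus:.4f}  I(P)={i:.4f}  I(P+)={i_plus:.4f}")
```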

Polarization HowTo

What we can do once, we can do many times. Given $W$:

Duplicate $W$ and obtain $W^-$ and $W^+$.
Duplicate $W^-$ ($W^+$), and obtain $W^{--}$ and $W^{-+}$ ($W^{+-}$ and $W^{++}$).
Duplicate $W^{--}$ ($W^{-+}$, $W^{+-}$, $W^{++}$) and obtain $W^{---}$ and $W^{--+}$ ($W^{-+-}$, $W^{-++}$, $W^{+--}$, $W^{+-+}$, $W^{++-}$, $W^{+++}$).
...

Polarization HowTo

We now have a one-to-one transformation $G_n$ between $U_1,\dots,U_n$ and $X_1,\dots,X_n$ for $n = 2^l$. Furthermore, the $n = 2^l$ quantities
$$I(W^{b_1 \dots b_l}), \quad b_j \in \{+,-\},$$
are exactly the $n$ quantities
$$I(U_i; Y^n U^{i-1}), \quad i = 1,\dots,n,$$
whose empirical distribution we seek. This suggests:

Set $B_1, B_2, \dots$ to be i.i.d., $\{+,-\}$-valued, uniformly distributed random variables.
Define the random variable $I_l = I(W^{B_1,\dots,B_l})$.
Study the distribution of $I_l$.
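A small illustration (mine, not from the talk): for a BEC the $2^l$ values $I(W^{b_1 \dots b_l})$ can be enumerated exactly via the erasure-probability recursion noted above, and their empirical distribution visibly piles up near 0 and 1.

```python
def polarized_erasure_probs(eps, l):
    """All 2**l erasure probabilities of W^{b1...bl} for W = BEC(eps)."""
    zs = [eps]
    for _ in range(l):
        zs = [z for e in zs for z in (2 * e - e * e, e * e)]  # minus, plus
    return zs

eps, l = 0.5, 14
infos = [1.0 - z for z in polarized_erasure_probs(eps, l)]
n = len(infos)
near_one = sum(i > 0.99 for i in infos) / n
near_zero = sum(i < 0.01 for i in infos) / n
print(f"n = 2^{l}: fraction with I > 0.99 is {near_one:.3f}, "
      f"fraction with I < 0.01 is {near_zero:.3f}  (I(W) = {1 - eps})")
```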

Polarization HowTo

Properties of the process $I_l$:

$I_0 = I(W)$ is a constant.
$I_l$ lies in $[0,1]$, so is bounded.
Conditional on $B_1,\dots,B_l$, and with $P := W^{B_1,\dots,B_l}$, $I_{l+1}$ can take only two values, $I(P^-)$ and $I(P^+)$, each with probability $1/2$, so
$$E[I_{l+1} \mid B_1,\dots,B_l] = \tfrac{1}{2}\,[I(P^-) + I(P^+)] = I(P) = I_l,$$
and we see that $\{I_l\}$ is a martingale.

Polarization HowTo

Bounded martingales converge almost surely. Thus $I_\infty := \lim_l I_l$ exists with probability one. So we know that the empirical distribution we seek exists: it is the distribution of $I_\infty$, and $E[I_\infty] = I_0 = I(W)$.

Almost sure convergence of $\{I_l\}$ implies $I_{l+1} - I_l \to 0$ a.s., and $|I_{l+1} - I_l| = \tfrac{1}{2}\,[I(P^+) - I(P^-)]$. But $I(P^+) - I(P^-) < \epsilon$ implies $I(P) \notin (\delta, 1-\delta)$ for a suitable $\delta = \delta(\epsilon)$. (The BEC and BSC turn out to be extremal among all binary-input channels.) Thus $I_\infty$ is $\{0,1\}$-valued!

[Figure: $I(P^+) - I(P^-)$ plotted against $I(P)$, ranging up to $1/2$; the gap is small only when $I(P)$ is near 0 or 1.]
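For the BEC this gap has a simple closed form that illustrates the claim (a standard computation, not on the slide): with $I(P) = 1-\epsilon$ we get $I(P^+) - I(P^-) = (1-\epsilon^2) - (1-\epsilon)^2 = 2\epsilon(1-\epsilon) = 2\,I(P)\,(1 - I(P))$, which is small only when $I(P)$ is near 0 or 1.

```python
# gap I(P+) - I(P-) for a BEC, as a function of I(P) = 1 - eps
for eps in (0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
    i = 1 - eps
    gap = (1 - eps ** 2) - (1 - eps) ** 2   # = 2 * eps * (1 - eps)
    print(f"I(P) = {i:.2f}   I(P+) - I(P-) = {gap:.4f}")
```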

Polarization Rate

We have seen that polarization takes place. How fast?

Idea 0: We have seen in the $I(P^+) - I(P^-)$ vs. $I(P)$ graph that the BSC is least polarizing, so we should be able to obtain a lower bound on the speed of polarization by studying the BSC. However, if $P$ is a BSC, $P^+$ and $P^-$ are not, and the estimate based on this idea turns out to be too pessimistic.

Instead, we introduce an auxiliary quantity
$$Z(P) = \sum_y \sqrt{P(y \mid 0)\, P(y \mid 1)}$$
as a companion to $I(P)$.

Polarization Rate

Properties of $Z(P)$:

$Z(P) \in [0,1]$.
$Z(P) \approx 0$ iff $I(P) \approx 1$; $Z(P) \approx 1$ iff $I(P) \approx 0$.
$Z(P^+) = Z(P)^2$.
$Z(P^-) \le 2\,Z(P)$.

[Figure: $Z(P)$ versus $I(P)$ on the unit square; large $Z$ corresponds to small $I$ and vice versa.]

Note that $Z(P)$ is the Bhattacharyya upper bound on the probability of error for uncoded transmission over $P$. Thus we can choose the good indices on the basis of $Z(P)$; the sum of the $Z$'s of the chosen channels will upper bound the block error probability. This suggests studying the polarization rate of $Z$.
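A sketch (not from the talk) that checks $Z(P^+) = Z(P)^2$ and $Z(P^-) \le 2Z(P)$ numerically for a BSC, building the synthesized channels directly from their defining formulas:

```python
import itertools, math

def bhattacharyya(P):
    """Z(P) = sum_y sqrt(P(y|0) P(y|1)) for a 2 x |Y| transition matrix."""
    return sum(math.sqrt(a * b) for a, b in zip(P[0], P[1]))

def channel_minus(P):
    ny = len(P[0])
    return [[sum(0.5 * P[u1 ^ u2][y1] * P[u2][y2] for u2 in range(2))
             for y1, y2 in itertools.product(range(ny), repeat=2)]
            for u1 in range(2)]

def channel_plus(P):
    ny = len(P[0])
    return [[0.5 * P[u1 ^ u2][y1] * P[u2][y2]
             for y1, y2, u1 in itertools.product(range(ny), range(ny), range(2))]
            for u2 in range(2)]

bsc = [[0.9, 0.1], [0.1, 0.9]]            # example channel
z = bhattacharyya(bsc)
z_plus = bhattacharyya(channel_plus(bsc))
z_minus = bhattacharyya(channel_minus(bsc))
assert abs(z_plus - z * z) < 1e-12        # Z(P+) = Z(P)^2 holds with equality
assert z_minus <= 2 * z                   # Z(P-) <= 2 Z(P)
print(z, z_minus, z_plus)
```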

Polarization Rate

Given a binary-input channel $W$, just as for $I_l$, define $Z_l = Z(W^{B_1,\dots,B_l})$. We know that $\Pr(Z_l \to 0) = I(W)$. It turns out that when $Z_l \to 0$ it does so fast:

Theorem. For any $\beta < 1/2$,
$$\lim_{l \to \infty} \Pr\bigl(Z_l < 2^{-2^{\beta l}}\bigr) = I(W).$$

This means that for any $\beta < 1/2$, as long as $R < I(W)$ the error probability of polarization codes decays to 0 faster than $2^{-n^\beta}$.
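To see the theorem at work one can simulate the process for a BEC, where $Z_l$ follows the exact recursion $Z \mapsto Z^2$ on a '+' step and $Z \mapsto 2Z - Z^2$ on a '−' step. The Monte Carlo sketch below is my own, not from the talk; the parameter choices ($\epsilon$, $l$, $\beta$, number of trials) are arbitrary, and the exponent is tracked in the log domain to avoid floating-point underflow.

```python
import math, random

def sample_neg_log2_Z(eps, l):
    """-log2(Z_l) for one sample path, W = BEC(eps), random B_1..B_l."""
    x = math.log2(eps)                      # x = log2(Z)
    for _ in range(l):
        if random.random() < 0.5:           # '+' step: Z -> Z^2
            x = 2 * x
        else:                               # '-' step: Z -> 2Z - Z^2
            z = 2.0 ** x if x > -1000 else 0.0
            x = x + math.log2(2 - z)
    return -x

eps, l, beta, trials = 0.5, 60, 0.3, 20000
# Z_l < 2^(-2^(beta l))  is the same as  -log2(Z_l) > 2^(beta l)
needed = 2.0 ** (beta * l)
hits = sum(sample_neg_log2_Z(eps, l) > needed for _ in range(trials)) / trials
print(f"Pr(Z_l < 2^(-2^(beta l))) ~ {hits:.3f}  "
      f"(approaches I(W) = {1 - eps} as l grows)")
```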

Polarization Rate

Proof idea. The subset of the probability space where $Z_l(\omega) \to 0$ has probability $I(W)$. For any $\epsilon > 0$ there is an $m$ and a subset of this set, with probability at least $I(W) - \epsilon$, on which $Z_l(\omega) < 1/8$ whenever $l > m$. Intersect this set with a high-probability event: that $B_i = +$ and $B_i = -$ occur with almost equal frequency for $i \le l$. It is then easy to see that $\lim_l P(\log_2 Z_l \le -l) = I(W)$.
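A rough numerical caricature of this bootstrapping (my own sketch, not from the talk): once $Z$ is below $1/8$, a '−' step at worst doubles $Z$, costing one unit of $-\log_2 Z$, while a '+' step squares $Z$, doubling $-\log_2 Z$; with roughly balanced step types the exponent grows on the order of $2^{l/2}$.

```python
import random

def exponent_after(l, start_exponent=3):
    """Track x = -log2(Z) under the worst-case rules: x -> 2x on '+',
    x -> x - 1 on '-', never dropping below 3 (i.e. Z stays below 1/8,
    as on the subset chosen in the proof)."""
    x = start_exponent
    for _ in range(l):
        x = 2 * x if random.random() < 0.5 else max(x - 1, 3)
    return x

for l in (10, 20, 30, 40):
    xs = sorted(exponent_after(l) for _ in range(2000))
    print(f"l={l:2d}: median -log2(Z) ~ {xs[len(xs) // 2]:.3g} "
          f"(compare 2^(l/2) = {2 ** (l / 2):.3g})")
```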

Polarization Rate

Proof idea, continued. [Figure: $-\log_2 Z_l$ plotted against $l$, with the horizon split into phases of lengths $l_0$, $l_1 = l_0$, $l_2 = 2^{\frac{1}{2} l_1}$, $l_3 = 2^{\frac{1}{2} l_2}, \dots$; by the end of each phase the exponent $-\log_2 Z_l$ has grown to roughly $2^{\frac{1}{2} l_j}$, which sets the length of the next phase. Iterating this bootstrap yields the $2^{-2^{\beta l}}$ behaviour for any $\beta < 1/2$.]

Remarks

With the particular one-to-one mapping described here (Hadamard transform) and with successive cancellation decoding, polarization codes

are $I(W)$-achieving,
have encoding complexity $n \log n$,
have decoding complexity $n \log n$,
have probability of error decaying like $2^{-\sqrt{n}}$.

However, for small $l$ their performance is surpassed by the usual suspects.
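To illustrate the $n \log n$ encoding complexity, here is a small sketch (mine, not from the talk) of the recursive transform $x = u\,F^{\otimes l}$ over $\mathbb{F}_2$ with $F = \begin{bmatrix}1 & 0\\ 1 & 1\end{bmatrix}$, using the convention without the bit-reversal permutation; each recursion level performs $n/2$ XORs, for $\tfrac{n}{2}\log_2 n$ XORs in total.

```python
def polar_transform(u):
    """Compute x = u F^{(tensor l)} over GF(2), len(u) = 2^l, via
        x = ((u_first XOR u_second) F^{(tensor l-1)}, u_second F^{(tensor l-1)}),
    which follows from the block structure of the Kronecker power."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    first = [u[i] ^ u[i + half] for i in range(half)]   # n/2 XORs per level
    second = list(u[half:])
    return polar_transform(first) + polar_transform(second)

# example: n = 8
print(polar_transform([1, 0, 1, 1, 0, 0, 1, 0]))
```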

Remarks

Moreover:

For symmetric channels the dummy symbols can be chosen to be constant, yielding a deterministic construction; the error probability guarantees are not based on simulation.
They can be generalized to channels with Galois field input alphabets: a similar two-by-two construction, with the same complexity and error probability bounds.

Open Questions

Many open questions. A few:

Improve the error probability estimate to include rate dependence.
Polarization appears to be a general phenomenon; how general? The restriction to Galois fields must be an artifact of the construction technique.
Polarization based on combining more than 2 channels may yield stronger codes.
Successive cancellation is too naive; different decoding algorithms should do better.
Numerical evidence suggests scaling laws for the distributions of $I_l$ and $Z_l$. It would be nice to have them.