Polarization codes and the rate of polarization
1 Polarization codes and the rate of polarization
Erdal Arıkan, Emre Telatar (Bilkent U., EPFL), Sept 10, 2008
2-5 Channel Polarization
Given a binary-input DMC W, consider i.i.d. uniformly distributed inputs (X_1, ..., X_n) ∈ {0,1}^n, in one-to-one correspondence with binary data (U_1, ..., U_n) ∈ {0,1}^n through a map G_n : F_2^n → F_2^n. Observe that the U_i are also i.i.d., uniform on {0,1}.
[Figure: U^n ∈ F_2^n enters G_n : F_2^n → F_2^n, producing X^n ∈ F_2^n, which is sent over n uses of W to yield Y^n.]
6-9 Channel Polarization
I(U^n; Y^n) = I(X^n; Y^n) = Σ_i I(X_i; Y_i) = Σ_i I(W) = n I(W),
I(U^n; Y^n) = Σ_i I(U_i; Y^n U^{i-1}).
Notation: I(P) denotes the mutual information between the input and output of a channel P when the input is uniformly distributed.
10-20 Channel Polarization
I(U^n; Y^n) = I(X^n; Y^n) = Σ_i I(X_i; Y_i) = n I(W), and I(U^n; Y^n) = Σ_i I(U_i; Y^n U^{i-1}).
[Figures: the fraction #{i : I(X_i; Y_i) ≤ ζ}/n plotted against ζ ∈ [0, 1] is a unit step at I(W), since every I(X_i; Y_i) equals I(W). The corresponding plots of #{i : I(U_i; Y^n U^{i-1}) ≤ ζ}/n are shown for n = 1, 2, 2^4, 2^8, 2^12, 2^20; as n grows, the quantities I(U_i; Y^n U^{i-1}) accumulate near the endpoints 0 and 1.]
Notation: I(P) denotes the mutual information between the input and output of a channel P when the input is uniformly distributed.
21-22 Channel Polarization
We say channel polarization takes place if almost all of the numbers I(U_i; Y^n U^{i-1}) are near the extremal values, i.e., if
(1/n) #{i : I(U_i; Y^n U^{i-1}) ≈ 1} → I(W)   and   (1/n) #{i : I(U_i; Y^n U^{i-1}) ≈ 0} → 1 − I(W).
23-27 Channel Polarization
If polarization takes place and we wish to communicate at rate R:
- Pick n, and k = nR good indices i, i.e., indices for which I(U_i; Y^n U^{i-1}) is high,
- let the transmitter set U_i to uncoded binary data for the good indices, and set U_i to random but publicly known values for the rest,
- let the receiver decode the U_i successively: U_1 from Y^n; U_i from (Y^n, Û^{i-1}).
One would expect this scheme to do well as long as R < I(W).
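As a concrete illustration of the transmitter side, here is a minimal sketch. It assumes the set of good indices and the one-to-one map G_n are supplied from elsewhere (how to obtain them is the subject of the rest of the talk); the block length, index set, and zero frozen values below are hypothetical choices for the example only.

```python
def build_u(data_bits, good_indices, n, frozen_value=0):
    """Place information bits on the good indices; the remaining ('frozen')
    positions carry a publicly known value shared with the receiver."""
    assert len(data_bits) == len(good_indices)
    u = [frozen_value] * n
    for bit, i in zip(data_bits, good_indices):
        u[i] = bit
    return u

def transmit(data_bits, good_indices, n, G_n):
    """Assemble U^n and apply the one-to-one map G_n to get the channel input X^n."""
    return G_n(build_u(data_bits, good_indices, n))

# Hypothetical usage with n = 8 and rate 1/2; the identity map stands in for G_n,
# it is NOT the actual polar transform.
x = transmit([1, 0, 1, 1], good_indices=[3, 5, 6, 7], n=8, G_n=lambda u: u)
print(x)
```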
28-33 Polarization HowTo
Given two copies of a binary-input channel P : F_2 → Y,
[Figure: a 2×2 circuit in which U_1 and U_2 are combined so that X_1 = U_1 + U_2 enters the first copy of P, producing Y_1, and X_2 = U_2 enters the second copy, producing Y_2.]
consider the transformation above to generate two channels P^- : F_2 → Y^2 and P^+ : F_2 → Y^2 × F_2 with
P^-(y_1 y_2 | u_1) = (1/2) Σ_{u_2} P(y_1 | u_1 + u_2) P(y_2 | u_2),
P^+(y_1 y_2 u_1 | u_2) = (1/2) P(y_1 | u_1 + u_2) P(y_2 | u_2).
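A minimal sketch of this splitting step, assuming the base channel is given as a table of transition probabilities P(y | x); the BSC with crossover probability 0.1 used at the end is just a hypothetical example.

```python
from itertools import product

def split(P, outputs):
    """Given P[(y, x)] = P(y | x) for x in {0, 1}, build the transition tables of
    the synthesized channels P-: F2 -> Y^2 and P+: F2 -> Y^2 x F2, following the
    formulas on the slide above."""
    P_minus, P_plus = {}, {}
    for u1, u2 in product((0, 1), repeat=2):
        for y1, y2 in product(outputs, repeat=2):
            w = 0.5 * P[(y1, (u1 + u2) % 2)] * P[(y2, u2)]
            # P-(y1 y2 | u1): the unknown u2 is summed out.
            P_minus[((y1, y2), u1)] = P_minus.get(((y1, y2), u1), 0.0) + w
            # P+(y1 y2 u1 | u2): u1 becomes part of the output.
            P_plus[((y1, y2, u1), u2)] = w
    return P_minus, P_plus

# Example: a BSC with crossover probability 0.1 as the base channel.
p = 0.1
bsc = {(y, x): (1 - p) if y == x else p for y, x in product((0, 1), repeat=2)}
P_minus, P_plus = split(bsc, outputs=(0, 1))

# Sanity check: both tables are proper conditional distributions for each input.
for table in (P_minus, P_plus):
    for u in (0, 1):
        total = sum(pr for (out, uu), pr in table.items() if uu == u)
        assert abs(total - 1.0) < 1e-12
```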
34-37 Polarization HowTo
Observe that
[U_1; U_2] = [1 1; 0 1] [X_1; X_2]   (arithmetic over F_2),
so (U_1, U_2) and (X_1, X_2) determine one another. With independent, uniform U_1, U_2,
I(P^-) = I(U_1; Y_1 Y_2),   I(P^+) = I(U_2; Y_1 Y_2 U_1).
Thus
I(P^-) + I(P^+) = I(U_1 U_2; Y_1 Y_2) = I(X_1; Y_1) + I(X_2; Y_2) = 2 I(P),
and
I(P^-) ≤ I(P) ≤ I(P^+).
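Here is a brute-force numeric check of these identities, assuming the base channel P is a BSC with crossover probability 0.1 (an arbitrary choice; the identities hold for any binary-input DMC).

```python
from itertools import product
from math import log2

p = 0.1                                    # BSC crossover probability (example)
P = {(y, x): (1 - p) if y == x else p      # P(y | x)
     for y, x in product((0, 1), repeat=2)}

# Joint law of (U1, U2, Y1, Y2) with U1, U2 i.i.d. uniform,
# X1 = U1 + U2 (mod 2), X2 = U2, and each Yi drawn from P given Xi.
joint = {(u1, u2, y1, y2): 0.25 * P[(y1, (u1 + u2) % 2)] * P[(y2, u2)]
         for u1, u2, y1, y2 in product((0, 1), repeat=4)}

def mutual_info(pairs):
    """I(A; B) in bits for a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), pr in pairs.items():
        pa[a] = pa.get(a, 0.0) + pr
        pb[b] = pb.get(b, 0.0) + pr
    return sum(pr * log2(pr / (pa[a] * pb[b]))
               for (a, b), pr in pairs.items() if pr > 0)

# I(P-) = I(U1; Y1 Y2): marginalize U2 out of the joint law.
minus = {}
for (u1, u2, y1, y2), pr in joint.items():
    minus[(u1, (y1, y2))] = minus.get((u1, (y1, y2)), 0.0) + pr

# I(P+) = I(U2; Y1 Y2 U1): the partner of U2 is the triple (Y1, Y2, U1).
plus = {(u2, (y1, y2, u1)): pr for (u1, u2, y1, y2), pr in joint.items()}

def h2(x):                                 # binary entropy function
    return -x * log2(x) - (1 - x) * log2(1 - x)

I_P, I_minus, I_plus = 1 - h2(p), mutual_info(minus), mutual_info(plus)
print(I_minus, I_plus)                     # I(P-) < I(P) < I(P+)
assert abs((I_minus + I_plus) - 2 * I_P) < 1e-9 and I_minus <= I_P <= I_plus
```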
38-43 Polarization HowTo
What we can do once, we can do many times. Given W:
- Duplicate W and obtain W^- and W^+.
- Duplicate W^- (and W^+), and obtain W^{--} and W^{-+} (and W^{+-} and W^{++}).
- Duplicate W^{--} (and W^{-+}, W^{+-}, W^{++}) and obtain W^{---} and W^{--+} (and W^{-+-}, W^{-++}, W^{+--}, W^{+-+}, W^{++-}, W^{+++}).
- ...
44-48 Polarization HowTo
We now have a one-to-one transformation G_n between U_1, ..., U_n and X_1, ..., X_n for n = 2^l. Furthermore, the n = 2^l quantities
I(W^{b_1, ..., b_l}),   b_j ∈ {+, −},
are exactly the n quantities
I(U_i; Y^n U^{i-1}),   i = 1, ..., n,
whose empirical distribution we seek. This suggests:
- set B_1, B_2, ... to be i.i.d., {+, −}-valued, uniformly distributed random variables,
- define the random variable I_l = I(W^{B_1, ..., B_l}),
- study the distribution of I_l.
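A small simulation sketch of this random process, assuming W is a binary erasure channel, for which the transform has the standard closed form I(P^-) = I(P)^2 and I(P^+) = 2 I(P) − I(P)^2 (a known BEC fact, not derived in this talk).

```python
import random

def sample_I_l(I0, l, trials=100_000, seed=0):
    """Sample the process I_l = I(W^{B_1,...,B_l}) for W = BEC with I(W) = I0."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        I = I0
        for _ in range(l):
            # '-' branch squares I; '+' branch maps I to 2I - I^2 (BEC closed form).
            I = I * I if rng.random() < 0.5 else 2 * I - I * I
        samples.append(I)
    return samples

samples = sample_I_l(I0=0.5, l=20)
mean = sum(samples) / len(samples)                       # stays near I(W) = 0.5
near_extremes = sum(s < 0.01 or s > 0.99 for s in samples) / len(samples)
print(f"mean = {mean:.3f}, fraction within 0.01 of an endpoint = {near_extremes:.3f}")
```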
49-52 Polarization HowTo
Properties of the process I_l:
- I_0 = I(W) is a constant.
- I_l lies in [0, 1], so the process is bounded.
- Conditional on B_1, ..., B_l, and with P := W^{B_1, ..., B_l}, the variable I_{l+1} can take only the two values I(P^-) and I(P^+), each with probability 1/2, so
E[I_{l+1} | B_1, ..., B_l] = (1/2)[I(P^-) + I(P^+)] = I(P) = I_l,
and we see that {I_l} is a martingale.
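For a concrete check, take W to be a BEC, where (as in the sketch above, a standard BEC fact) I(P^-) = I(P)^2 and I(P^+) = 2 I(P) − I(P)^2; then (1/2)[I(P^-) + I(P^+)] = (1/2)[I(P)^2 + 2 I(P) − I(P)^2] = I(P), which is exactly the martingale identity.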
53-57 Polarization HowTo
Bounded martingales converge almost surely. Thus I_∞ := lim_{l→∞} I_l exists with probability one. So we know that the empirical distribution we seek exists (it is the distribution of I_∞), and E[I_∞] = I_0 = I(W).
Almost sure convergence of {I_l} implies I_{l+1} − I_l → 0 a.s., and
|I_{l+1} − I_l| = (1/2)[I(P^+) − I(P^-)].
But I(P^+) − I(P^-) < ε implies I(P) ∉ (δ, 1 − δ), i.e., I(P) must be near 0 or 1. (The BEC and BSC turn out to be extremal among all binary-input channels.) Thus I_∞ is {0, 1}-valued!
[Figure: I(P^+) − I(P^-) plotted against I(P); the gap is bounded away from 0 whenever I(P) is bounded away from 0 and 1.]
58-61 Polarization Rate
We have seen that polarization takes place. How fast?
Idea 0: We have seen in the I(P^+) − I(P^-) vs. I(P) graph that the BSC is the least polarizing channel. So we should be able to obtain a lower bound on the speed of polarization by studying the BSC. However, if P is a BSC, P^+ and P^- are not BSCs, and the estimate based on this idea turns out to be too pessimistic.
Instead, we introduce an auxiliary quantity
Z(P) = Σ_y √(P(y | 0) P(y | 1))
as a companion to I(P).
62-67 Polarization Rate
Properties of Z(P):
- Z(P) ∈ [0, 1].
- Z(P) ≈ 0 iff I(P) ≈ 1, and Z(P) ≈ 1 iff I(P) ≈ 0.
- Z(P^+) = Z(P)^2.
- Z(P^-) ≤ 2 Z(P).
[Figure: Z(P) versus I(P).]
Note that Z(P) is the Bhattacharyya upper bound on the probability of error for uncoded transmission over P. Thus we can choose the good indices on the basis of Z(P); the sum of the Z's of the chosen channels then upper-bounds the block error probability. This suggests studying the polarization rate of Z.
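A sketch of this index-selection idea, tracking an upper bound on Z for each synthesized channel with the two rules above (Z → Z^2 on a '+' step, Z → 2Z on a '-' step, capped at 1). The starting value Z(W) = 0.3, block length 2^10, and rate 0.3 are hypothetical, and the index order is simply this sketch's enumeration of sign sequences, not necessarily a particular decoder's bit ordering.

```python
def z_upper_bounds(z0, l):
    """Upper bounds on Z for the 2^l synthesized channels after l splitting steps,
    using Z(P+) = Z(P)^2 and Z(P-) <= 2 Z(P) (capped at 1, since Z lies in [0, 1])."""
    zs = [z0]
    for _ in range(l):
        zs = [min(1.0, 2 * z) for z in zs] + [z * z for z in zs]
    return zs

def choose_good_indices(z0, l, rate):
    """Pick the k = rate * 2^l channels with the smallest Z bound; the sum of their
    Z bounds upper-bounds the block error probability of the scheme."""
    zs = z_upper_bounds(z0, l)
    k = int(rate * len(zs))
    good = sorted(range(len(zs)), key=zs.__getitem__)[:k]
    return good, sum(zs[i] for i in good)

good, block_error_bound = choose_good_indices(z0=0.3, l=10, rate=0.3)
print(len(good), block_error_bound)
```

The generic bound Z(P^-) ≤ 2Z(P) is loose; for the BEC, where the exact recursion Z(P^-) = 2Z(P) − Z(P)^2 is available, the same procedure gives tighter numbers.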
68-72 Polarization Rate
Given a binary-input channel W, just as for I_l, define Z_l = Z(W^{B_1, ..., B_l}). We know that Pr(Z_l → 0) = I(W). It turns out that when Z_l → 0, it does so fast:
Theorem. For any β < 1/2,
lim_{l→∞} Pr(Z_l < 2^{−2^{βl}}) = I(W).
This means that for any β < 1/2, as long as R < I(W), the error probability of polarization codes decays to 0 faster than 2^{−n^β}.
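A quick numeric illustration of the theorem's statement, again assuming W is a BEC so that the exact recursions Z(P^+) = Z(P)^2 and Z(P^-) = 2 Z(P) − Z(P)^2 apply (standard BEC facts, not derived in this talk); with I(W) = 0.5 the estimated probability should approach 0.5 as l grows.

```python
import random

def prob_Z_below_threshold(z0, l, beta, trials=50_000, seed=1):
    """Monte Carlo estimate of Pr(Z_l < 2^(-2^(beta*l))) for W = BEC with Z(W) = z0."""
    rng = random.Random(seed)
    # Note: the threshold underflows double precision once beta*l gets large.
    threshold = 2.0 ** -(2.0 ** (beta * l))
    hits = 0
    for _ in range(trials):
        z = z0
        for _ in range(l):
            # '+' branch squares Z; '-' branch maps Z to 2Z - Z^2 (BEC closed form).
            z = z * z if rng.random() < 0.5 else 2 * z - z * z
        hits += z < threshold
    return hits / trials

for l in (10, 16, 20):
    print(l, prob_Z_below_threshold(z0=0.5, l=l, beta=0.45))
```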
73-75 Polarization Rate: Proof idea
The subset of the probability space where Z_l(ω) → 0 has probability I(W). For any ε > 0 there is an m and a subset of this set, of probability at least I(W) − ε, on which Z_l(ω) < 1/8 whenever l > m. Intersect this set with a high-probability event: that B_i = + and B_i = − occur with almost equal frequency for i ≤ l. It is then easy to see that lim_{l→∞} P(log_2 Z_l ≤ −l) = I(W).
76-80 Polarization Rate: Proof idea (cont.)
[Figures: sketches of log_2 Z_l as a function of l, with the horizontal axis marked at l_0, l_0 + l_1, l_0 + l_1 + l_2, ...; the length of each stage is determined by the previous ones, illustrating how the bound on log_2 Z_l is bootstrapped stage by stage.]
81-86 Remarks
With the particular one-to-one mapping described here (the Hadamard transform) and with successive cancellation decoding, polarization codes
- are I(W)-achieving,
- have encoding complexity n log n,
- have decoding complexity n log n,
- have probability of error decaying roughly like 2^{−√n}.
However, for small l their performance is surpassed by the usual suspects.
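A sketch of why encoding takes n log n operations: the transform factors into log_2 n stages of n/2 XORs each. This minimal version assumes the plain recursive ordering of the 2×2 kernel (an actual implementation may additionally apply a bit-reversal permutation of the indices, which does not affect the complexity).

```python
def encode(u):
    """Map U^n to X^n for n = 2^l by applying the 2x2 step (x1, x2) = (u1 + u2, u2)
    recursively; total work is O(n log n) XORs."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    # Combine the two halves with the kernel, then encode each half recursively.
    left = encode([a ^ b for a, b in zip(u[:half], u[half:])])
    right = encode(u[half:])
    return left + right

print(encode([1, 0, 1, 1]))   # a length-4 example
```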
87-89 Remarks
Moreover:
- For symmetric channels the dummy symbols can be chosen to be constant, yielding a deterministic construction; the error probability guarantees are not based on simulation.
- The construction can be generalized to channels with Galois-field input alphabets: a similar two-by-two construction, with the same complexity and error probability bounds.
90-96 Open Questions
Many open questions. A few:
- Improve the error probability estimate to include rate dependence.
- Polarization appears to be a general phenomenon; how general? The restriction to Galois fields must be an artifact of the construction technique.
- Polarization based on combining more than 2 channels may yield stronger codes.
- Successive cancellation is too naive; different decoding algorithms should do better.
- Numerical evidence suggests scaling laws for the distributions of I_l and Z_l. It would be nice to have them.