1 The Gaussian channel

ECE 77 Lecture 0: The Gaussian channel

Objective: In this lecture we will learn about communication over a channel of practical interest, in which the transmitted signal is subjected to additive white Gaussian noise. We will derive the famous capacity formula.

The Gaussian channel

Suppose we send information over a channel that is subjected to additive white Gaussian noise. Then the output is

$$Y_i = X_i + Z_i,$$

where $Y_i$ is the channel output, $X_i$ is the channel input, and $Z_i$ is zero-mean Gaussian with variance $N$: $Z_i \sim \mathcal{N}(0, N)$. This is different from the channel models we saw before, in that the output can take on a continuum of values. This is also a good model for a variety of practical communication channels.

We will assume that there is a constraint on the input power. If we have an input codeword $(x_1, x_2, \ldots, x_n)$, we will assume that the average power is constrained so that

$$\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P.$$

Let us consider the probability of error for binary transmission. Suppose that we can send either $+\sqrt{P}$ or $-\sqrt{P}$ over the channel, each with probability $1/2$. The receiver looks at the received signal amplitude and determines the signal transmitted using a threshold test. Then

$$
\begin{aligned}
P_e &= \tfrac{1}{2} P(Y < 0 \mid X = +\sqrt{P}) + \tfrac{1}{2} P(Y > 0 \mid X = -\sqrt{P}) \\
    &= \tfrac{1}{2} P(Z < -\sqrt{P} \mid X = +\sqrt{P}) + \tfrac{1}{2} P(Z > \sqrt{P} \mid X = -\sqrt{P}) \\
    &= P(Z > \sqrt{P}) \\
    &= \int_{\sqrt{P}}^{\infty} \frac{1}{\sqrt{2\pi N}}\, e^{-x^2/2N}\, dx \\
    &= Q\!\left(\sqrt{P/N}\right) = 1 - \Phi\!\left(\sqrt{P/N}\right),
\end{aligned}
$$

where

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-t^2/2}\, dt$$

or

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$

Definition 1 The information capacity of the Gaussian channel with power constraint $P$ is

$$C = \max_{p(x):\, E X^2 \le P} I(X; Y).$$
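The threshold-test error probability derived above, $P_e = Q(\sqrt{P/N})$, can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names (`q_function`, `binary_error_probability`, `simulate_error_rate`) and the values $P = N = 1$ are illustrative assumptions, and the simulation assumes the two signals are equally likely.

```python
import math
import random


def q_function(x: float) -> float:
    """Tail probability Q(x) = P(G > x) for a standard normal G."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_error_probability(P: float, N: float) -> float:
    """Closed form P_e = Q(sqrt(P/N)) for signalling with +sqrt(P) or -sqrt(P)."""
    return q_function(math.sqrt(P / N))


def simulate_error_rate(P: float, N: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P_e using a threshold test at zero."""
    rng = random.Random(seed)
    amplitude = math.sqrt(P)
    sigma = math.sqrt(N)  # random.gauss takes a standard deviation, not a variance
    errors = 0
    for _ in range(trials):
        x = amplitude if rng.random() < 0.5 else -amplitude  # equally likely signals
        y = x + rng.gauss(0.0, sigma)                        # Y = X + Z
        decision = amplitude if y > 0 else -amplitude        # threshold at zero
        if decision != x:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    P, N = 1.0, 1.0  # illustrative values only
    print("closed form:", binary_error_probability(P, N))
    print("simulated:  ", simulate_error_rate(P, N))
```

With enough trials the simulated error rate should settle near the closed-form value; the agreement is statistical, so the two numbers will not match exactly.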

We can compute the information capacity as follows:

$$
\begin{aligned}
I(X; Y) &= h(Y) - h(Y \mid X) \\
        &= h(Y) - h(X + Z \mid X) \\
        &= h(Y) - h(Z \mid X) \\
        &= h(Y) - h(Z) \\
        &\le \tfrac{1}{2} \log 2\pi e (P + N) - \tfrac{1}{2} \log 2\pi e N \\
        &= \tfrac{1}{2} \log(1 + P/N),
\end{aligned}
$$

where $h(Z \mid X) = h(Z)$ because the noise is independent of the input, and the inequality holds since $E Y^2 = P + N$ and the Gaussian is the maximum-entropy distribution for a given variance. So

$$C = \tfrac{1}{2} \log(1 + P/N)$$

bits per channel use (logarithms are base 2). The maximum is obtained when $X$ is Gaussian distributed. (How do we make the input distribution look Gaussian?)

Definition 2 An $(M, n)$ code for the Gaussian channel with power constraint $P$ consists of the following:

1. An index set $\{1, 2, \ldots, M\}$.
2. An encoding function $x : \{1, \ldots, M\} \to \mathcal{X}^n$, which maps an input index into a sequence that is $n$ elements long, $x^n(1), x^n(2), \ldots, x^n(M)$, such that the average power constraint is satisfied:

$$\sum_{i=1}^{n} (x_i(w))^2 \le nP \quad \text{for } w = 1, 2, \ldots, M.$$

3. A decoding function $g : \mathcal{Y}^n \to \{1, 2, \ldots, M\}$.

Definition 3 A rate $R$ is said to be achievable for a Gaussian channel with a power constraint $P$ if there exists a sequence of $(2^{nR}, n)$ codes with codewords satisfying the power constraint such that the maximal probability of error $\lambda^{(n)} \to 0$. The capacity of the channel is the supremum of the achievable rates.

Theorem 1 The capacity of a Gaussian channel with power constraint $P$ and noise variance $N$ is

$$C = \tfrac{1}{2} \log\left(1 + \frac{P}{N}\right) \text{ bits per transmission.}$$

Geometric plausibility For a codeword of length $n$, the received vector (in $n$-space) is normally distributed with mean equal to the true codeword. With high probability, the received vector is contained in a sphere about the mean of radius $\sqrt{n(N + \epsilon)}$. Why? Because with high probability, the vector falls within one standard deviation away from the mean in each direction, and the total distance away is the Euclidean sum:

$$E[z_1^2 + z_2^2 + \cdots + z_n^2] = nN.$$

This is the square of the distance within which we expect to fall. If we assign everything within this sphere to the given codeword, we misdetect only if we fall outside this sphere. Other codewords will have other spheres, each with radius approximately $\sqrt{n(N + \epsilon)}$. The received vectors are limited in energy (at most about $n(P + N)$, because of the power constraint $P$), so they all must lie in a sphere of radius $\sqrt{n(P + N)}$. The number of (approximately) nonintersecting decoding spheres is therefore

$$\text{number of spheres} \approx \frac{\text{volume of sphere in } n\text{-space with radius } r = \sqrt{n(P + N)}}{\text{volume of sphere in } n\text{-space with radius } r = \sqrt{n(N + \epsilon)}}.$$

The volume of a sphere of radius $r$ in $n$-space is proportional to $r^n$. Substituting in this fact we get

$$\text{number of spheres} \approx \frac{(n(P + N))^{n/2}}{(n(N + \epsilon))^{n/2}} \approx 2^{\frac{n}{2} \log\left(1 + \frac{P}{N}\right)}.$$

Proof We will follow essentially the same steps as before.

1. First we generate a codebook at random. This time we generate the codebook according to the Gaussian distribution: let $X_i(w)$, $i = 1, 2, \ldots, n$, be the code sequence corresponding to input index $w$, where each $X_i(w)$ is selected at random i.i.d. according to $\mathcal{N}(0, P - \epsilon)$. (With high probability, this has average power $\le P$.) The codebook is known by both transmitter and receiver.
2. Encode as described above.
3. The receiver gets a $Y^n$, looks at the list of codewords $\{X^n(w)\}$, and searches for one which is jointly typical with the received vector. If there is only one such vector, it is declared as the transmitted vector. If there is more than one such vector, an error is declared. An error is also declared if the chosen codeword does not satisfy the power constraint.

For the probability of error, assume without loss of generality that codeword 1 is sent:

$$Y^n = X^n(1) + Z^n.$$

Define the following events:

$$E_0 = \left\{ \frac{1}{n} \sum_{i=1}^{n} X_i^2(1) > P \right\}$$

(the event that the codeword exceeds the power constraint) and

$$E_i = \left\{ (X^n(i), Y^n) \text{ is in } A_\epsilon^{(n)} \right\}.$$

The probability of error is then

$$
\begin{aligned}
P(E) &= P(E_0 \cup E_1^c \cup E_2 \cup E_3 \cup \cdots \cup E_{2^{nR}}) \\
     &\le P(E_0) + P(E_1^c) + \sum_{i=2}^{2^{nR}} P(E_i) \qquad \text{(union bound)}.
\end{aligned}
$$

By the LLN, $P(E_0) \to 0$. By the joint AEP, $P(E_1^c) \to 0$, so $P(E_1^c) \le \epsilon$ for sufficiently large $n$.
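By the code generation process, $X^n(1)$ and $X^n(i)$ are independent, and so are $Y^n$ and $X^n(i)$, $i \ne 1$. So the probability that $X^n(i)$ and $Y^n$ are jointly typical is $\le 2^{-n(I(X;Y) - 3\epsilon)}$ by the joint AEP. So

$$
\begin{aligned}
P_e^{(n)} &\le \epsilon + \epsilon + \sum_{i=2}^{2^{nR}} 2^{-n(I(X;Y) - 3\epsilon)} \\
          &\le 2\epsilon + (2^{nR} - 1)\, 2^{-n(I(X;Y) - 3\epsilon)} \\
          &\le 2\epsilon + 2^{nR}\, 2^{-n(I(X;Y) - 3\epsilon)} \\
          &\le 3\epsilon
\end{aligned}
$$

for $n$ sufficiently large, if $R < I(X; Y) - 3\epsilon$. This gives the average probability of error; we then go through the same kinds of arguments as before to conclude that the maximum probability of error also must go to zero.

The converse is that rates $R > C$ are not achievable, or, equivalently, that if $P_e^{(n)} \to 0$ then it must be that $R \le C$.

Proof The proof starts with Fano's inequality:

$$H(W \mid Y^n) \le 1 + nR P_e^{(n)} = n\epsilon_n,$$

where

$$\epsilon_n = \frac{1}{n} + R P_e^{(n)}$$

and $\epsilon_n \to 0$ as $n \to \infty$. Let $P_i$ denote the average power of the $i$th coordinate over the codebook, so that the power constraint gives $\frac{1}{n}\sum_i P_i \le P$. The proof is a string of inequalities:

$$
\begin{aligned}
nR &= H(W) = I(W; Y^n) + H(W \mid Y^n) && \text{uniform } W \text{; definition of } I \\
   &\le I(W; Y^n) + n\epsilon_n && \text{Fano's inequality} \\
   &\le I(X^n; Y^n) + n\epsilon_n && \text{data processing} \\
   &= h(Y^n) - h(Y^n \mid X^n) + n\epsilon_n \\
   &= h(Y^n) - h(Z^n) + n\epsilon_n \\
   &\le \sum_{i=1}^{n} h(Y_i) - h(Z^n) + n\epsilon_n \\
   &= \sum_{i=1}^{n} h(Y_i) - \sum_{i=1}^{n} h(Z_i) + n\epsilon_n \\
   &\le \sum_{i=1}^{n} \left[ \tfrac{1}{2} \log 2\pi e (P_i + N) - \tfrac{1}{2} \log 2\pi e N \right] + n\epsilon_n && \text{entropies of } Y \text{ and } Z \text{; power constraint} \\
   &= \sum_{i=1}^{n} \tfrac{1}{2} \log(1 + P_i/N) + n\epsilon_n \\
   &= n \left( \frac{1}{n} \sum_{i=1}^{n} \tfrac{1}{2} \log(1 + P_i/N) \right) + n\epsilon_n \\
   &\le \frac{n}{2} \log\left(1 + \frac{1}{n} \sum_{i=1}^{n} P_i/N \right) + n\epsilon_n && \text{Jensen's} \\
   &\le \frac{n}{2} \log(1 + P/N) + n\epsilon_n.
\end{aligned}
$$

Dividing through by $n$,

$$R \le \tfrac{1}{2} \log(1 + P/N) + \epsilon_n.$$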
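To make the capacity formula and the sphere-counting heuristic of this section concrete, here is a small numerical sketch. It is not from the original notes: the function names and the values $P = 15$, $N = 1$, $n = 1000$ are assumptions chosen for illustration, and logarithms are taken base 2 so that results are in bits.

```python
import math


def gaussian_capacity(P: float, N: float) -> float:
    """C = 1/2 log2(1 + P/N), in bits per channel use."""
    return 0.5 * math.log2(1.0 + P / N)


def log2_sphere_count(n: int, P: float, N: float, eps: float = 0.0) -> float:
    """log2 of the volume ratio (n(P+N))^(n/2) / (n(N+eps))^(n/2).

    The factor n cancels inside the ratio, leaving (n/2) log2((P+N)/(N+eps)).
    """
    return 0.5 * n * math.log2((P + N) / (N + eps))


if __name__ == "__main__":
    P, N, n = 15.0, 1.0, 1000  # illustrative values only
    print("capacity (bits/use):", gaussian_capacity(P, N))
    for eps in (0.1, 0.01, 0.0):
        # Bits per channel use suggested by counting decoding spheres.
        print("eps =", eps, "rate =", log2_sphere_count(n, P, N, eps) / n)
```

As `eps` shrinks, the per-use rate implied by the sphere count approaches the capacity, which is the content of the geometric plausibility argument.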

2 Band-limited channels

We now come to the first time in the book that the information is actually carried by a time waveform, instead of a random variable. We will consider transmission over a band-limited channel (such as a phone channel). A key result is the sampling theorem:

Theorem 2 If $f(t)$ is bandlimited to $W$ Hz, then the function is completely determined by samples of the function taken every $\frac{1}{2W}$ seconds apart.

This is the classical Nyquist sampling theorem. However, Shannon's name is also attached to it, since he provided a proof and used it. A representation of the function $f(t)$ is

$$f(t) = \sum_{n} f\!\left(\frac{n}{2W}\right) \operatorname{sinc}\!\left(t - \frac{n}{2W}\right),$$

where

$$\operatorname{sinc}(t) = \frac{\sin(2\pi W t)}{2\pi W t}.$$

From this theorem, we conclude (the dimensionality theorem) that a bandlimited function has only $2W$ degrees of freedom per second. For a signal which has most of its energy in bandwidth $W$ and most of its energy in a time $T$, there are about $2WT$ degrees of freedom, and the time- and band-limited function can be represented using $2WT$ orthogonal basis functions, known as the prolate spheroidal functions. We can view band- and time-limited functions as vectors in a $2TW$-dimensional vector space.

Assume that the noise power spectral density of the channel is $N_0/2$. Then the noise power is $(N_0/2)(2W) = N_0 W$. Over the time interval of $T$ seconds, the signal energy per sample (per channel use) is

$$\frac{PT}{2WT} = \frac{P}{2W},$$

and the noise energy per sample is $\frac{N_0 W T}{2WT} = \frac{N_0}{2}$, so the per-sample signal-to-noise ratio is $P/(N_0 W)$. Use this information in the capacity:

$$C = \tfrac{1}{2} \log\left(1 + \frac{P}{N}\right) = \tfrac{1}{2} \log\left(1 + \frac{P}{N_0 W}\right) \text{ bits per channel use.}$$

There are $2W$ samples each second (channel uses), so the capacity is

$$C = (2W)\, \tfrac{1}{2} \log\left(1 + \frac{P}{N_0 W}\right) \text{ bits/second}$$

or

$$C = W \log\left(1 + \frac{P}{N_0 W}\right) \text{ bits/second.}$$

This is the famous and key result of information theory. As $W \to \infty$, we have to do a little calculus to find that

$$C = \frac{P}{N_0} \log_2 e \text{ bits per second.}$$
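The band-limited capacity formula and its wideband limit are easy to evaluate numerically. The sketch below is an illustration only: the function names are hypothetical, $N_0$ is normalized to 1 when working from an SNR in dB, and the sample values (a 3300 Hz bandwidth at 40 dB and 20 dB) are assumptions chosen for a telephone-like channel.

```python
import math


def bandlimited_capacity(P: float, N0: float, W: float) -> float:
    """C = W log2(1 + P / (N0 W)), in bits per second."""
    # log1p keeps precision when P / (N0 W) is tiny (very large W).
    return W * math.log1p(P / (N0 * W)) / math.log(2.0)


def infinite_bandwidth_capacity(P: float, N0: float) -> float:
    """Limit of the capacity as W -> infinity: (P / N0) log2(e)."""
    return (P / N0) * math.log2(math.e)


def capacity_from_snr_db(W: float, snr_db: float) -> float:
    """Capacity when the SNR P / (N0 W) is given in dB."""
    snr = 10.0 ** (snr_db / 10.0)
    # Normalize N0 = 1 (an assumption), so that P = snr * W.
    return bandlimited_capacity(P=snr * W, N0=1.0, W=W)


if __name__ == "__main__":
    W = 3300.0  # Hz, illustrative
    for snr_db in (40.0, 20.0):
        print(snr_db, "dB:", capacity_from_snr_db(W, snr_db), "bits/second")
    # With P and N0 fixed, widening the band approaches the finite limit.
    for width in (1.0, 10.0, 1e3, 1e6):
        print(width, bandlimited_capacity(1.0, 1.0, width))
    print("limit:", infinite_bandwidth_capacity(1.0, 1.0))
```

The second loop shows the capacity increasing with bandwidth but saturating at $(P/N_0)\log_2 e$, since the total noise power $N_0 W$ grows along with the band.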

It is interesting that even with infinite bandwidth, the capacity is not infinite, but grows linearly with the power.

Example 1 For a phone channel, take $W = 3300$ Hz. If the SNR is $P/(N_0 W) = 40\text{ dB} = 10000$, we get

$$C = 43850 \text{ bits per second.}$$

If $P/(N_0 W) = 20\text{ dB} = 100$ we get

$$C = 21972 \text{ bits/second.}$$

(The book is dated.)

We cannot do better than capacity!

3 Kuhn-Tucker Conditions

Before proceeding with the next section, we need a result from constrained optimization theory known as the Kuhn-Tucker condition. Suppose we are minimizing some convex objective function $L(x)$,

$$\min L(x),$$

subject to a constraint

$$f(x) \le 0.$$

Let the optimal value of $x$ be $x_0$. Then either the constraint is inactive, in which case we get

$$\left. \frac{\partial L}{\partial x} \right|_{x_0} = 0,$$

or, if the constraint is active, it must be the case that the objective function increases for all admissible values of $x$:

$$\left. \frac{\partial L}{\partial x} \right|_{x \in A} \ge 0,$$

where $A$ is the set of admissible values $y$, for which $f(y) \le 0$. (Think about what happens if this is not the case.) Thus,

$$\operatorname{sgn} \frac{\partial L}{\partial x} = -\operatorname{sgn} \frac{\partial f}{\partial x}$$

or

$$\frac{\partial L}{\partial x} + \lambda \frac{\partial f}{\partial x} = 0, \qquad \lambda \ge 0. \tag{1}$$

We can create a new objective function

$$J(x, \lambda) = L(x) + \lambda f(x),$$

so the necessary conditions become

$$\frac{\partial J}{\partial x} = 0$$

and

$$f(x) \le 0, \qquad \lambda \begin{cases} \ge 0 & f(x) = 0 \quad \text{(constraint is active)} \\ = 0 & f(x) < 0 \quad \text{(constraint is inactive).} \end{cases}$$

For a vector variable $x$, condition (1) means: $\frac{\partial L}{\partial x}$ is parallel to $\frac{\partial f}{\partial x}$ and pointing in the opposite direction, where $\frac{\partial L}{\partial x}$ is interpreted as the gradient. In words, what condition (1) says is: the gradient of $L$ with respect to $x$ at a minimum must be pointed in such a way that a decrease of $L$ can only come by violating the constraints. Otherwise, we could decrease $L$ further. This is the essence of the Kuhn-Tucker condition.

4 Parallel Gaussian channels

Parallel Gaussian channels are used to model bandlimited channels with a non-flat frequency response. We assume we have $k$ Gaussian channels,

$$Y_j = X_j + Z_j, \qquad j = 1, 2, \ldots, k,$$

where

$$Z_j \sim \mathcal{N}(0, N_j)$$

and the channels are independent. The total power used is constrained:

$$E \sum_{j=1}^{k} X_j^2 \le P.$$

One question we might ask is: how do we distribute the power across the $k$ channels to get maximum throughput? We can find the maximum mutual information (the information channel capacity) as

$$
\begin{aligned}
I(X_1, \ldots, X_k; Y_1, \ldots, Y_k) &= h(Y_1, \ldots, Y_k) - h(Y_1, \ldots, Y_k \mid X_1, \ldots, X_k) \\
 &= h(Y_1, \ldots, Y_k) - h(Z_1, \ldots, Z_k) \\
 &= h(Y_1, \ldots, Y_k) - \sum_{i=1}^{k} h(Z_i) \\
 &\le \sum_{i=1}^{k} \left[ h(Y_i) - h(Z_i) \right] \\
 &\le \sum_{i} \tfrac{1}{2} \log(1 + P_i/N_i),
\end{aligned}
$$

where $P_i = E X_i^2$. Equality is obtained when the $X$s are independent and normally distributed. We want to distribute the power available among the various channels, subject to not exceeding the power constraint:

$$J(P_1, \ldots, P_k) = \sum_{i} \tfrac{1}{2} \log\left(1 + \frac{P_i}{N_i}\right) + \lambda \sum_{i=1}^{k} P_i$$

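with a side constraint (not shown) that $P_i \ge 0$. Differentiate w.r.t. $P_j$ to obtain

$$\frac{1}{2} \frac{1}{P_j + N_j} + \lambda \le 0,$$

with equality only if all the constraints are inactive. (Since the mutual information is being maximized, the multiplier $\lambda$ is negative here.) After some fiddling, we obtain

$$P_j = \nu - N_j$$

with $\nu = -1/(2\lambda)$ (a constant, since $\lambda$ is a constant). However, we must also have $P_j \ge 0$, so we must ensure that we don't violate that if $N_j > \nu$. Thus, we let

$$P_j = (\nu - N_j)^+,$$

where

$$(x)^+ = \begin{cases} x & x \ge 0 \\ 0 & x < 0 \end{cases}$$

and $\nu$ is chosen so that

$$\sum_{i} (\nu - N_i)^+ = P.$$

(Draw picture; explain water filling.) Picture the noise levels $N_i$ as the uneven floor of a vessel: pouring a total amount of power $P$ into it raises the water to a common level $\nu$. Channels with $N_i < \nu$ receive power $\nu - N_i$, and channels with $N_i \ge \nu$ receive none.

5 Channels with colored Gaussian noise

We will now extend the results of the previous section to channels with non-white Gaussian noise. Let $K_z$ be the covariance of the noise and $K_x$ the covariance of the input, with the input constrained by

$$\frac{1}{n} \sum_{i} E X_i^2 \le P,$$

which is the same as

$$\frac{1}{n} \operatorname{tr}(K_x) \le P.$$

We can write

$$I(X_1, \ldots, X_n; Y_1, \ldots, Y_n) = h(Y_1, \ldots, Y_n) - h(Z_1, \ldots, Z_n),$$

where

$$h(Y_1, \ldots, Y_n) \le \tfrac{1}{2} \log\left( (2\pi e)^n |K_x + K_z| \right).$$

Now how do we choose $K_x$ to maximize $|K_x + K_z|$, subject to the power constraint? Let

$$K_z = Q \Lambda Q^T$$

be the eigendecomposition of the noise covariance, with $Q$ orthogonal and $\Lambda$ diagonal. Then

$$
\begin{aligned}
|K_x + K_z| &= |K_x + Q \Lambda Q^T| \\
            &= |Q|\, |Q^T K_x Q + \Lambda|\, |Q^T| \\
            &= |Q^T K_x Q + \Lambda| \\
            &= |A + \Lambda|
\end{aligned}
$$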

where $A = Q^T K_x Q$. Observe that

$$\operatorname{tr}(A) = \operatorname{tr}(Q^T K_x Q) = \operatorname{tr}(Q Q^T K_x) = \operatorname{tr}(K_x).$$

So we want to maximize $|A + \Lambda|$ subject to $\operatorname{tr}(A) \le nP$. The key is to use an inequality, in this case Hadamard's inequality. Hadamard's inequality follows directly from the "conditioning reduces entropy" theorem:

$$h(X_1, \ldots, X_n) \le \sum_{i} h(X_i).$$

Let $X \sim \mathcal{N}(0, K)$. Then

$$h(X) = \tfrac{1}{2} \log\left( (2\pi e)^n |K| \right)$$

and

$$h(X_i) = \tfrac{1}{2} \log\left( 2\pi e K_{ii} \right).$$

Substituting in and simplifying gives

$$|K| \le \prod_{i} K_{ii},$$

with equality iff $K$ is diagonal.

Getting back to our problem,

$$|A + \Lambda| \le \prod_{i} (A_{ii} + \Lambda_{ii}),$$

with equality iff $A$ is diagonal. We have

$$\sum_{i} A_{ii} \le nP$$

(the power constraint), and $A_{ii} \ge 0$. As before, we take

$$A_{ii} = (\nu - \lambda_i)^+,$$

where $\lambda_i = \Lambda_{ii}$ and $\nu$ is chosen so that

$$\sum_{i} A_{ii} = nP.$$

Now we want to generalize to a continuous-time system. If the additive Gaussian noise process of the channel is stationary, then its covariance matrix $K_Z^{(n)}$ is Toeplitz, and the eigenvalues of the covariance matrix tend to a limit as $n \to \infty$. The density of the eigenvalues on the real line tends to the power spectrum of the stochastic process. That is, if $K_{ij} = K_{i-j} = r_{i-j}$ are the autocorrelation values and the power spectrum is

$$S(\omega) = \mathcal{F}[r_k],$$

then

$$\lim_{M \to \infty} \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_M}{M} = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, d\omega.$$

In this case, the water filling translates to water filling in the spectral domain. The capacity of the channel with noise spectrum $N(f)$ can be shown to be

$$C = \int \tfrac{1}{2} \log\left( 1 + \frac{(\nu - N(f))^+}{N(f)} \right) df,$$

where $\nu$ is chosen so that

$$\int (\nu - N(f))^+\, df = P.$$
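The water-filling rule used for parallel channels, and again for the eigenvalues of the noise covariance, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the notes' own procedure: the water level $\nu$ is found by bisection (one of several possible methods), the function names are hypothetical, and the noise levels `[1, 2, 5]` with total power 3 are made-up example values.

```python
import math


def water_fill(noise, total_power, iterations=200):
    """Find nu with sum_i max(nu - noise[i], 0) = total_power by bisection.

    Returns (nu, powers), where powers[i] = max(nu - noise[i], 0).
    """
    lo = min(noise)                # at this level no power is poured
    hi = max(noise) + total_power  # at this level at least total_power is poured
    for _ in range(iterations):
        nu = 0.5 * (lo + hi)
        used = sum(max(nu - n_i, 0.0) for n_i in noise)
        if used > total_power:
            hi = nu  # water level too high
        else:
            lo = nu  # water level too low (or exactly right)
    nu = 0.5 * (lo + hi)
    powers = [max(nu - n_i, 0.0) for n_i in noise]
    return nu, powers


def parallel_capacity(noise, powers):
    """Sum of 1/2 log2(1 + P_i / N_i) over the channels, in bits."""
    return sum(0.5 * math.log2(1.0 + p / n_i) for p, n_i in zip(powers, noise))


if __name__ == "__main__":
    noise = [1.0, 2.0, 5.0]  # illustrative noise variances N_i
    nu, powers = water_fill(noise, total_power=3.0)
    print("water level:", nu)
    print("powers:", powers)
    print("capacity (bits):", parallel_capacity(noise, powers))
```

In this example the noisiest channel sits above the water level and receives no power. For colored noise, the same routine could be applied to the eigenvalues $\lambda_i$ of $K_z$ with total power $nP$.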