1 The Gaussian channel




ECE 77 Lecture 0 The Gaussian channel

Objective: In this lecture we will learn about communication over a channel of practical interest, in which the transmitted signal is subjected to additive white Gaussian noise. We will derive the famous capacity formula.

Suppose we send information over a channel that is subjected to additive white Gaussian noise. Then the output is

$$Y_i = X_i + Z_i,$$

where $Y_i$ is the channel output, $X_i$ is the channel input, and $Z_i$ is zero-mean Gaussian with variance $N$: $Z_i \sim \mathcal{N}(0, N)$. This is different from the channel models we saw before, in that the output can take on a continuum of values. It is also a good model for a variety of practical communication channels.

We will assume that there is a constraint on the input power. If we have an input codeword $(x_1, x_2, \ldots, x_n)$, we will assume that the average power is constrained so that

$$\frac{1}{n} \sum_{i=1}^{n} x_i^2 \le P.$$

Let us consider the probability of error for binary transmission. Suppose that we can send either $+\sqrt{P}$ or $-\sqrt{P}$ over the channel. The receiver looks at the received signal amplitude and determines the signal transmitted using a threshold test. Then

$$\begin{aligned}
P_e &= \tfrac{1}{2} P(Y < 0 \mid X = +\sqrt{P}) + \tfrac{1}{2} P(Y > 0 \mid X = -\sqrt{P}) \\
&= \tfrac{1}{2} P(Z < -\sqrt{P} \mid X = +\sqrt{P}) + \tfrac{1}{2} P(Z > \sqrt{P} \mid X = -\sqrt{P}) \\
&= P(Z > \sqrt{P}) = \int_{\sqrt{P}}^{\infty} \frac{1}{\sqrt{2\pi N}} e^{-x^2/2N}\, dx \\
&= Q(\sqrt{P/N}) = 1 - \Phi(\sqrt{P/N}),
\end{aligned}$$

where

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-t^2/2}\, dt$$

or

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$

Definition 1 The information capacity of the Gaussian channel with power constraint $P$ is

$$C = \max_{p(x):\, EX^2 \le P} I(X; Y).$$
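Returning to the binary signaling example earlier in this section, the error probability $Q(\sqrt{P/N})$ is easy to evaluate numerically. The following is a minimal sketch; the function names `q_function` and `binary_error_probability` are illustrative choices rather than names from the lecture, and the $Q$ function is computed through the standard identity $Q(x) = \tfrac12 \operatorname{erfc}(x/\sqrt{2})$.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(standard normal > x)."""
    # Standard identity: Q(x) = 0.5 * erfc(x / sqrt(2)).
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_error_probability(power: float, noise_variance: float) -> float:
    """Error probability Q(sqrt(P/N)) for +/- sqrt(P) signaling with a zero threshold."""
    return q_function(math.sqrt(power / noise_variance))


if __name__ == "__main__":
    # The SNR values below are arbitrary illustrative choices.
    for snr_db in (0, 5, 10):
        snr = 10 ** (snr_db / 10)
        p_e = binary_error_probability(snr, 1.0)
        print(f"SNR = {snr_db:2d} dB: P_e = {p_e:.3e}")
```

The error probability falls quickly as $P/N$ grows, but it is never zero for a single uncoded transmission, which is what motivates coding over long blocks.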

We can compute the information capacity as follows:

$$\begin{aligned}
I(X; Y) &= h(Y) - h(Y \mid X) \\
&= h(Y) - h(X + Z \mid X) \\
&= h(Y) - h(Z \mid X) \\
&= h(Y) - h(Z) \\
&\le \tfrac{1}{2} \log 2\pi e (P + N) - \tfrac{1}{2} \log 2\pi e N \\
&= \tfrac{1}{2} \log(1 + P/N),
\end{aligned}$$

since $EY^2 = P + N$ and the Gaussian is the maximum-entropy distribution for a given variance. So

$$C = \tfrac{1}{2} \log(1 + P/N)$$

bits per channel use. The maximum is obtained when $X$ is Gaussian distributed. (How do we make the input distribution look Gaussian?)

Definition 2 An $(M, n)$ code for the Gaussian channel with power constraint $P$ consists of the following:

1. An index set $\{1, 2, \ldots, M\}$.
2. An encoding function $x : \{1, \ldots, M\} \to \mathcal{X}^n$, which maps an input index into a sequence that is $n$ elements long, $x^n(1), x^n(2), \ldots, x^n(M)$, such that the average power constraint is satisfied:
$$\sum_{i=1}^{n} (x_i(w))^2 \le nP$$
for $w = 1, 2, \ldots, M$.
3. A decoding function $g : \mathcal{Y}^n \to \{1, 2, \ldots, M\}$.

Definition 3 A rate $R$ is said to be achievable for a Gaussian channel with a power constraint $P$ if there exists a sequence of $(2^{nR}, n)$ codes with codewords satisfying the power constraint such that the maximal probability of error $\lambda^{(n)} \to 0$. The capacity of the channel is the supremum of the achievable rates.

Theorem 1 The capacity of a Gaussian channel with power constraint $P$ and noise variance $N$ is

$$C = \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \text{ bits per transmission.}$$

Geometric plausibility

For a codeword of length $n$, the received vector (in $n$-space) is normally distributed with mean equal to the true codeword. With high probability, the received vector is contained in a sphere about the mean of radius $\sqrt{n(N + \epsilon)}$. Why? Because with high probability, the vector falls within one standard deviation away from the mean in each direction, and the total distance away is the Euclidean sum:

$$E[z_1^2 + z_2^2 + \cdots + z_n^2] = nN.$$

This sum, $nN$, is the square of the distance within which we expect the received vector to fall. If we assign everything within this sphere to the given codeword, we misdetect only if we fall outside this codeword's sphere. Other codewords will have other spheres, each with radius approximately $\sqrt{n(N + \epsilon)}$. The received vectors are limited in energy by $n(P + N)$, so they all must lie in a sphere of radius $\sqrt{n(P + N)}$. The number of (approximately) nonintersecting decoding spheres is therefore

$$\text{number of spheres} \approx \frac{\text{volume of sphere in } n\text{-space with radius } r = \sqrt{n(P + N)}}{\text{volume of sphere in } n\text{-space with radius } r = \sqrt{n(N + \epsilon)}}.$$

The volume of a sphere of radius $r$ in $n$-space is proportional to $r^n$. Substituting in this fact we get

$$\text{number of spheres} \approx \frac{(n(P + N))^{n/2}}{(n(N + \epsilon))^{n/2}} \approx 2^{\frac{n}{2} \log\left(1 + \frac{P}{N}\right)}.$$

Proof We will follow essentially the same steps as before.

1. First we generate a codebook at random. This time we generate the codebook according to the Gaussian distribution: let $X_i(w)$, $i = 1, 2, \ldots, n$, be the code sequence corresponding to input index $w$, where each $X_i(w)$ is selected at random i.i.d. according to $\mathcal{N}(0, P - \epsilon)$. (With high probability, this has average power $\le P$.) The codebook is known by both transmitter and receiver.
2. Encode as described above.
3. The receiver gets a $Y^n$, looks at the list of codewords $\{X^n(w)\}$, and searches for one which is jointly typical with the received vector. If there is only one such vector, it is declared as the transmitted vector. If there is more than one such vector, an error is declared. An error is also declared if the chosen codeword does not satisfy the power constraint.

For the probability of error, assume w.l.o.g. that codeword 1 is sent:

$$Y^n = X^n(1) + Z^n.$$

Define the following events:

$$E_0 = \left\{ \frac{1}{n} \sum_{i=1}^{n} X_i^2(1) > P \right\}$$

(the event that the codeword exceeds the power constraint) and

$$E_i = \left\{ (X^n(i), Y^n) \text{ is in } A_\epsilon^{(n)} \right\}.$$

The probability of error is then

$$P(E) = P(E_0 \cup E_1^c \cup E_2 \cup E_3 \cup \cdots \cup E_{2^{nR}}) \le P(E_0) + P(E_1^c) + \sum_{i=2}^{2^{nR}} P(E_i) \quad \text{(union bound)}.$$

By the LLN, $P(E_0) \to 0$. By the joint AEP, $P(E_1^c) \to 0$, so $P(E_1^c) \le \epsilon$ for sufficiently large $n$.
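By the code generation process, $X^n(1)$ and $X^n(i)$ are independent for $i \ne 1$, and so are $Y^n$ and $X^n(i)$, $i \ne 1$. So the probability that $X^n(i)$ and $Y^n$ are jointly typical is $\le 2^{-n(I(X;Y) - 3\epsilon)}$ by the joint AEP. So

$$\begin{aligned}
P_e^{(n)} &\le \epsilon + \epsilon + \sum_{i=2}^{2^{nR}} 2^{-n(I(X;Y) - 3\epsilon)} \\
&= 2\epsilon + (2^{nR} - 1)\, 2^{-n(I(X;Y) - 3\epsilon)} \\
&\le 2\epsilon + 2^{nR}\, 2^{-n(I(X;Y) - 3\epsilon)} \\
&\le 3\epsilon
\end{aligned}$$

for $n$ sufficiently large, if $R < I(X; Y) - 3\epsilon$. This gives the average probability of error: we then go through the same kinds of arguments as before to conclude that the maximum probability of error also must go to zero.

The converse is that rates $R > C$ are not achievable, or, equivalently, that if $P_e^{(n)} \to 0$ then it must be that $R \le C$.

Proof The proof starts with Fano's inequality:

$$H(W \mid Y^n) \le 1 + nR P_e^{(n)} = n\epsilon_n,$$

where $\epsilon_n = \frac{1}{n} + R P_e^{(n)}$ and $\epsilon_n \to 0$ as $n \to \infty$. Let $P_i$ denote the average power of the $i$th codeword symbol, averaged over the codebook, so that $\frac{1}{n} \sum_i P_i \le P$. The proof is a string of inequalities:

$$\begin{aligned}
nR = H(W) &= I(W; Y^n) + H(W \mid Y^n) && \text{uniform } W \text{; definition of } I \\
&\le I(W; Y^n) + n\epsilon_n && \text{Fano's inequality} \\
&\le I(X^n; Y^n) + n\epsilon_n && \text{data processing inequality} \\
&= h(Y^n) - h(Y^n \mid X^n) + n\epsilon_n \\
&= h(Y^n) - h(Z^n) + n\epsilon_n \\
&\le \sum_{i=1}^{n} h(Y_i) - h(Z^n) + n\epsilon_n \\
&= \sum_{i=1}^{n} h(Y_i) - \sum_{i=1}^{n} h(Z_i) + n\epsilon_n \\
&\le \sum_{i=1}^{n} \left[ \tfrac{1}{2} \log 2\pi e (P_i + N) - \tfrac{1}{2} \log 2\pi e N \right] + n\epsilon_n && \text{entropies of } Y \text{ and } Z \text{; power constraint} \\
&= \sum_{i=1}^{n} \tfrac{1}{2} \log(1 + P_i/N) + n\epsilon_n \\
&= n \left( \frac{1}{n} \sum_{i=1}^{n} \tfrac{1}{2} \log(1 + P_i/N) \right) + n\epsilon_n \\
&\le \frac{n}{2} \log\left(1 + \frac{1}{n} \sum_{i=1}^{n} P_i/N\right) + n\epsilon_n && \text{Jensen's inequality} \\
&\le \frac{n}{2} \log(1 + P/N) + n\epsilon_n.
\end{aligned}$$

Dividing through by $n$,

$$R \le \tfrac{1}{2} \log(1 + P/N) + \epsilon_n.$$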
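The capacity formula and the sphere-packing picture from this section can be checked numerically. The sketch below is illustrative only: the function names, the seed, the trial count, and the test values are assumptions, not part of the lecture. It evaluates $C = \tfrac12 \log_2(1 + P/N)$, estimates by Monte Carlo how often a noise vector lands inside the sphere of radius $\sqrt{n(N + \epsilon)}$, and computes the base-2 logarithm of the sphere count in the limit $\epsilon \to 0$.

```python
import math
import random


def gaussian_capacity(power: float, noise_variance: float) -> float:
    """Capacity in bits per transmission: 0.5 * log2(1 + P/N)."""
    return 0.5 * math.log2(1.0 + power / noise_variance)


def fraction_inside_noise_sphere(n, noise_variance, eps, trials=200, seed=0):
    """Estimate P(||Z||^2 <= n (N + eps)) for Z with n i.i.d. N(0, N) entries."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sigma = math.sqrt(noise_variance)
    radius_sq = n * (noise_variance + eps)
    inside = 0
    for _ in range(trials):
        norm_sq = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))
        if norm_sq <= radius_sq:
            inside += 1
    return inside / trials


def log2_sphere_count(n, power, noise_variance):
    """log2 of the volume ratio (n(P+N))^(n/2) / (nN)^(n/2), taking eps to 0."""
    return 0.5 * n * math.log2((power + noise_variance) / noise_variance)


if __name__ == "__main__":
    print("capacity at P/N = 3:", gaussian_capacity(3.0, 1.0), "bits per transmission")
    for n in (10, 100, 1000):
        frac = fraction_inside_noise_sphere(n, noise_variance=1.0, eps=0.2)
        print(f"n = {n:4d}: fraction inside noise sphere = {frac:.3f}")
    print("log2(number of spheres), n = 100:", log2_sphere_count(100, 3.0, 1.0))
```

As $n$ grows, the fraction of noise vectors inside the sphere approaches one, and the logarithm of the sphere count equals $n$ times the capacity, matching the geometric argument.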

2 Band-limited channels

We now come to the first time in the book that the information is actually carried by a time-waveform, instead of a random variable. We will consider transmission over a band-limited channel (such as a phone channel). A key result is the sampling theorem:

Theorem 2 If $f(t)$ is bandlimited to $W$ Hz, then the function is completely determined by samples of the function taken every $\frac{1}{2W}$ seconds apart.

This is the classical Nyquist sampling theorem. However, Shannon's name is also attached to it, since he provided a proof and used it. A representation of the function $f(t)$ is

$$f(t) = \sum_{n} f\left(\frac{n}{2W}\right) \operatorname{sinc}\left(t - \frac{n}{2W}\right), \qquad \operatorname{sinc}(t) = \frac{\sin(2\pi W t)}{2\pi W t}.$$

From this theorem, we conclude (the dimensionality theorem) that a bandlimited function has only $2W$ degrees of freedom per second. For a signal which has most of its energy in bandwidth $W$ and most of its energy in a time $T$, there are about $2WT$ degrees of freedom, and the time- and band-limited function can be represented using $2WT$ orthogonal basis functions, known as the prolate spheroidal functions. We can view band- and time-limited functions as vectors in a $2TW$-dimensional vector space.

Assume that the noise power-spectral density of the channel is $N_0/2$. Then the noise power is $(N_0/2)(2W) = N_0 W$. Over the time interval of $T$ seconds, the signal energy per sample (per channel use) is

$$\frac{PT}{2WT} = \frac{P}{2W},$$

and the noise energy per sample is $\frac{N_0 W T}{2WT} = \frac{N_0}{2}$, so the per-sample signal-to-noise ratio is $P/(N_0 W)$. Use this information in the capacity:

$$C = \frac{1}{2} \log\left(1 + \frac{P}{N}\right) = \frac{1}{2} \log\left(1 + \frac{P}{N_0 W}\right) \text{ bits per channel use.}$$

There are $2W$ samples each second (channel uses), so the capacity is

$$C = (2W)\, \frac{1}{2} \log\left(1 + \frac{P}{N_0 W}\right) \text{ bits/second}$$

or

$$C = W \log\left(1 + \frac{P}{N_0 W}\right).$$

This is the famous and key result of information theory. As $W \to \infty$, we have to do a little calculus (using $\ln(1 + x) \approx x$ for small $x$) to find that

$$C = \frac{P}{N_0} \log_2 e \text{ bits per second.}$$
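The band-limited capacity formula and its infinite-bandwidth limit can be explored numerically. This is a minimal sketch under stated assumptions: the function names are hypothetical, the units are arbitrary, and `math.log1p` is used only to keep precision when $P/(N_0 W)$ is very small.

```python
import math


def bandlimited_capacity(power: float, n0: float, bandwidth: float) -> float:
    """C = W log2(1 + P / (N0 W)) in bits per second."""
    # log1p(x) = ln(1 + x) stays accurate when x = P / (N0 W) is tiny.
    return bandwidth * math.log1p(power / (n0 * bandwidth)) / math.log(2.0)


def infinite_bandwidth_capacity(power: float, n0: float) -> float:
    """Limit of the capacity as W grows without bound: (P / N0) log2(e)."""
    return (power / n0) * math.log2(math.e)


if __name__ == "__main__":
    power, n0 = 1.0, 1.0  # arbitrary illustrative units
    for bandwidth in (1.0, 10.0, 100.0, 1e4, 1e6):
        c = bandlimited_capacity(power, n0, bandwidth)
        print(f"W = {bandwidth:>9.0f}: C = {c:.6f} bits/second")
    print(f"limit as W grows: {infinite_bandwidth_capacity(power, n0):.6f} bits/second")
```

For fixed $P$ and $N_0$, the capacity increases with $W$ but saturates at the limiting value instead of growing without bound.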

The infinite-bandwidth limit is interesting: even with infinite bandwidth, the capacity is not infinite, but grows linearly with the power.

Example For a phone channel, take $W = 3300$ Hz. If the SNR is $P/N_0 W = 40\text{ dB} = 10000$, we get

$$C = 43850 \text{ bits per second.}$$

If $P/N_0 W = 20\text{ dB} = 100$ we get

$$C = 21972 \text{ bits/second.}$$

(The book is dated.) We cannot do better than capacity!

3 Kuhn-Tucker Conditions

Before proceeding with the next section, we need a result from constrained optimization theory known as the Kuhn-Tucker condition. Suppose we are minimizing some convex objective function $L(x)$, subject to a constraint:

$$\min L(x) \quad \text{subject to} \quad f(x) \le 0.$$

Let the optimal value of $x$ be $x_0$. Then either the constraint is inactive, in which case we get

$$\left. \frac{\partial L}{\partial x} \right|_{x_0} = 0,$$

or, if the constraint is active, it must be the case that the objective function increases for all admissible values of $x$:

$$\left. \frac{\partial L}{\partial x} \right|_{x \in A} \ge 0,$$

where $A$ is the set of admissible values, for which $\frac{\partial f}{\partial x} \le 0$. (Think about what happens if this is not the case.) Thus,

$$\operatorname{sgn} \frac{\partial L}{\partial x} = -\operatorname{sgn} \frac{\partial f}{\partial x}$$

or

$$\frac{\partial L}{\partial x} + \lambda \frac{\partial f}{\partial x} = 0, \qquad \lambda \ge 0. \tag{1}$$

We can create a new objective function

$$J(x, \lambda) = L(x) + \lambda f(x),$$

so the necessary conditions become

$$\frac{\partial J}{\partial x} = 0$$

and

$$f(x) \le 0, \qquad \lambda \begin{cases} \ge 0 & f(x) = 0 \text{ (constraint is active)} \\ = 0 & f(x) < 0 \text{ (constraint is inactive).} \end{cases}$$

For a vector variable $x$, condition (1) means: $\frac{\partial L}{\partial x}$ is parallel to $\frac{\partial f}{\partial x}$ and pointing in the opposite direction, where $\frac{\partial L}{\partial x}$ is interpreted as the gradient. In words, what condition (1) says is: the gradient of $L$ with respect to $x$ at a minimum must be pointed in such a way that a decrease of $L$ can only come by violating the constraints. Otherwise, we could decrease $L$ further. This is the essence of the Kuhn-Tucker condition.

4 Parallel Gaussian channels

Parallel Gaussian channels are used to model bandlimited channels with a non-flat frequency response. We assume we have $k$ Gaussian channels,

$$Y_j = X_j + Z_j, \qquad j = 1, 2, \ldots, k,$$

where $Z_j \sim \mathcal{N}(0, N_j)$ and the channels are independent. The total power used is constrained:

$$E \sum_{j=1}^{k} X_j^2 \le P.$$

One question we might ask is: how do we distribute the power across the $k$ channels to get maximum throughput? We can find the maximum mutual information (the information channel capacity) as

$$\begin{aligned}
I(X_1, \ldots, X_k; Y_1, \ldots, Y_k) &= h(Y_1, \ldots, Y_k) - h(Y_1, \ldots, Y_k \mid X_1, \ldots, X_k) \\
&= h(Y_1, \ldots, Y_k) - h(Z_1, \ldots, Z_k) \\
&= h(Y_1, \ldots, Y_k) - \sum_{i=1}^{k} h(Z_i) \\
&\le \sum_{i=1}^{k} \left[ h(Y_i) - h(Z_i) \right] \\
&\le \sum_{i} \frac{1}{2} \log\left(1 + \frac{P_i}{N_i}\right),
\end{aligned}$$

where $P_i = E X_i^2$. Equality is obtained when the $X$s are independent and normally distributed. We want to distribute the power available among the various channels, subject to not exceeding the power constraint:

$$J(P_1, \ldots, P_k) = \sum_{i} \frac{1}{2} \log\left(1 + \frac{P_i}{N_i}\right) + \lambda \sum_{i=1}^{k} P_i$$

with a side constraint (not shown) that $P_i \ge 0$. Differentiate w.r.t. $P_j$ to obtain

$$\frac{1}{2} \frac{1}{P_j + N_j} + \lambda \le 0,$$

with equality when the side constraint $P_j \ge 0$ is inactive (that is, when $P_j > 0$). After some fiddling, we obtain

$$P_j = \nu - N_j$$

(since $\lambda$ is a constant, so is $\nu = -1/(2\lambda)$). However, we must also have $P_j \ge 0$, so we must ensure that we don't violate that if $N_j > \nu$. Thus, we let

$$P_j = (\nu - N_j)^+, \qquad (x)^+ = \begin{cases} x & x \ge 0 \\ 0 & x < 0, \end{cases}$$

and $\nu$ is chosen so that

$$\sum_{i} (\nu - N_i)^+ = P.$$

Draw picture; explain water filling.

5 Channels with colored Gaussian noise

We will now extend the results of the previous section to channels with non-white Gaussian noise. Let $K_z$ be the covariance of the noise and $K_x$ the covariance of the input, with the input constrained by

$$\frac{1}{n} \sum_{i} E X_i^2 \le P,$$

which is the same as

$$\frac{1}{n} \operatorname{tr}(K_x) \le P.$$

We can write

$$I(X_1, \ldots, X_n; Y_1, \ldots, Y_n) = h(Y_1, \ldots, Y_n) - h(Z_1, \ldots, Z_n),$$

where

$$h(Y_1, \ldots, Y_n) \le \frac{1}{2} \log\left((2\pi e)^n |K_x + K_z|\right).$$

Now how do we choose $K_x$ to maximize $|K_x + K_z|$, subject to the power constraint? Let $K_z = Q \Lambda Q^T$. Then

$$|K_x + K_z| = |K_x + Q \Lambda Q^T| = |Q|\, |Q^T K_x Q + \Lambda|\, |Q^T| = |Q^T K_x Q + \Lambda| = |A + \Lambda|,$$

where $A = Q^T K_x Q$. Observe that

$$\operatorname{tr}(A) = \operatorname{tr}(Q^T K_x Q) = \operatorname{tr}(Q Q^T K_x) = \operatorname{tr}(K_x).$$

So we want to maximize $|A + \Lambda|$ subject to $\operatorname{tr}(A) \le nP$. The key is to use an inequality, in this case Hadamard's inequality. Hadamard's inequality follows directly from the "conditioning reduces entropy" theorem:

$$h(X_1, \ldots, X_n) \le \sum_{i} h(X_i).$$

Let $X \sim \mathcal{N}(0, K)$. Then

$$h(X) = \frac{1}{2} \log\left((2\pi e)^n |K|\right)$$

and

$$h(X_i) = \frac{1}{2} \log\left((2\pi e) K_{ii}\right).$$

Substituting in and simplifying gives

$$|K| \le \prod_{i} K_{ii},$$

with equality iff $K$ is diagonal. Getting back to our problem,

$$|A + \Lambda| \le \prod_{i} (A_{ii} + \Lambda_{ii}),$$

with equality iff $A$ is diagonal. We have

$$\frac{1}{n} \sum_{i} A_{ii} \le P$$

(the power constraint), and $A_{ii} \ge 0$. As before, we take

$$A_{ii} = (\nu - \lambda_i)^+,$$

where $\nu$ is chosen so that

$$\sum_{i} A_{ii} = nP.$$

Now we want to generalize to a continuous-time system. Consider a channel with additive Gaussian noise and covariance matrix $K_Z^{(n)}$. If the channel noise process is stationary, then the covariance matrix is Toeplitz, and the eigenvalues of the covariance matrix tend to a limit as $n \to \infty$. The density of the eigenvalues on the real line tends to the power spectrum of the stochastic process. That is, if $K_{ij} = r_{i-j}$ are the autocorrelation values and the power spectrum is

$$S(\omega) = \mathcal{F}[r_k],$$

then

$$\lim_{M \to \infty} \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_M}{M} = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, d\omega.$$

In this case, the water filling translates to water filling in the spectral domain. The capacity of the channel with noise spectrum $N(f)$ can be shown to be

$$C = \int \frac{1}{2} \log\left(1 + \frac{(\nu - N(f))^+}{N(f)}\right) df,$$

where $\nu$ is chosen so that

$$\int (\nu - N(f))^+\, df = P.$$
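The water-filling rule used for both the parallel channels and the colored-noise channel can be sketched in a few lines. The code below is an illustration under assumptions, not the lecture's own algorithm: the function names and the example noise levels are hypothetical, and the water level $\nu$ is found by a simple sorted search over how many channels receive power. For the colored-noise case, the same routine would be applied to the eigenvalues $\lambda_i$ with total power $nP$.

```python
import math


def water_fill(noise_levels, total_power):
    """Return (nu, powers) with powers[j] = max(nu - noise_levels[j], 0)
    and sum(powers) equal to total_power."""
    if total_power < 0 or not noise_levels:
        raise ValueError("need at least one channel and non-negative power")
    ordered = sorted(noise_levels)
    # Try using the m quietest channels, from all of them down to one.
    for m in range(len(ordered), 0, -1):
        nu = (total_power + sum(ordered[:m])) / m
        if nu >= ordered[m - 1]:  # water level covers every channel in use
            break
    powers = [max(nu - level, 0.0) for level in noise_levels]
    return nu, powers


def parallel_capacity(noise_levels, powers):
    """Sum of 0.5 * log2(1 + P_j / N_j), in bits per use of the parallel channels."""
    return sum(0.5 * math.log2(1.0 + p / n) for p, n in zip(powers, noise_levels))


if __name__ == "__main__":
    noise = [1.0, 2.0, 5.0]  # hypothetical noise variances N_j
    nu, powers = water_fill(noise, total_power=3.0)
    print("water level nu:", nu)
    print("power allocation:", powers)
    print("capacity (bits):", parallel_capacity(noise, powers))
    print("uniform allocation (bits):", parallel_capacity(noise, [1.0, 1.0, 1.0]))
```

In this example the noisiest channel lies above the water level and receives no power, and the water-filling allocation gives a larger sum rate than splitting the power evenly.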