The Information Capacity of Multiuser Channels

Mohammad Jafar Rezaeian
B.S. (Electronics), M.S. (Telecommunication)

Doctor of Philosophy Dissertation
Institute for Telecommunications Research, The University of South Australia, May 2002
Contents

1 Introduction
  Some Simplified Network Models
  Overview of Thesis

2 Discrete Memoryless Multiuser Channels
  Single User Channel
    Time Varying DMC
  Multiple Access Channel
    Time Varying MAC
  Interference Channel
    The Strong Interference Channel
    Outer Bounds
    Inner Bounds
  Appendix

3 Interference Channel Capacity Region
  The Han and Kobayashi Region
  Capacity Region by a Limiting Expression
  An Alternative Limiting Expression
  Capacity Region Versus the Best Achievable Region
4 Single User Decomposition of Multiuser Channels
  Model Extension for Single User Channel
  Multiuser Channel Analysis Using the SCMC Model
  An Upper Bound on Code Rates for SCMC Channel
  Capacity Region Outer Bound
  Outer Bound by a Limiting Expression

5 Capacity Region Characterizations
  General Limiting Forms of Capacity Region
  General Single Letter Formula
  Multiple Letter Extension
  Example (Strong Interference Channel)
  The Equality of Expressions for Multiple Access Capacity
  Finite Memory Multiple Access Channel
  Non Existence of a Single Letter Description

6 Conclusion
  Future Work
List of Figures

  Block diagram of communication system
  A simple network model
  Multiple access channel
  Two user interference channel
  Random coding for a TVDMC
  Discrete memoryless multiple access channel
  Achievable region for a given distribution P(X_1)P(X_2)
  Random coding for TVMAC
  Discrete memoryless interference channel
  A code for interference channel
  Channel decomposition for inner bound (2.7)
  Channel decomposition for inner bound (2.73)
  Channel decomposition for the Carleial inner bound
  Channel decomposition for the Han and Kobayashi bound
  Communicating channels for the two receivers
  Factorization (4.2)
List of Abbreviations

AEP     asymptotic equipartition property
DC      discrete channel
DMAC    discrete multiple access channel
DMC     discrete memoryless channel
IC      interference channel
LHS     left hand side
MAC     multiple access channel
RHS     right hand side
SCMC    state conditioned memoryless channel
SSC     state specified channel
TVDMC   time variant discrete memoryless channel
TVMAC   time variant multiple access channel
Notation

X            Upper case indicates a random variable
x            Lower case indicates a realization of a random variable
X (bold)     Indicates a random vector (X_1, X_2, ..., X_n); n is implied
x_i          The i-th element of the vector x
x^[t]        The first t elements of the vector x
X̂            Indicates a random matrix
X(i)         The i-th row of the random matrix X̂
~x           Interchangeably, an l-vector or an element of ~X; l is implied
~x (bold)    (~x_1, ~x_2, ..., ~x_n), where ~x_i = (x_{i1}, x_{i2}, ..., x_{il}), or ~x_i ∈ ~X
X (script)   Indicates a discrete set
~X           Indicates a discrete set with cardinality |X|^l; l is implied
X^c          The complement of the discrete set X
|X|          The cardinality of the discrete set X
X × Y        The Cartesian product of discrete sets
X^l          The l-th Cartesian product of a set X
X^{ln}       The set of all elements ~x (i.e. ~X^n, or (X^l)^n = X^{ln})
P(·)         A probability distribution on random variables, vectors or matrices
P_V          A probability distribution on the set of random variables V
P(Y | X)     A conditional probability function defined on the domains X and Y
P(y | x)     The evaluation of a predetermined conditional probability at a point
List of Symbols

closure(·)      The closure of a continuous set
co(·)           The convex hull of a continuous set
C               A codebook
C_HK            Han and Kobayashi achievable region for the interference channel
C(·,·,·,·,·)    H & K region for specific cardinalities of auxiliary sets
C_IC            Capacity region for the interference channel
C_MAC           Capacity region for the multiple access channel
E[·]            Expectation
H(·)            Entropy function
I(·;·)          Mutual information
M               Message set
P_e             Average probability of error
q               Time sharing sequence
R_+             Nonnegative real numbers
R(·)            A region in R^2 defined by the argument
T_t(·)          One dimensional empirical time distribution of the code
T^[t](·)        The first t-dimensional empirical distribution of the code
X → Y → Z       A Markov chain (a distribution P(X)P(Y | X)P(Z | Y))
ω(y | x)        A collection of probability assignments on y, for each value of x
P(A | x)        Σ_{y∈A} P(y | x)
(·,·,·)_DC             A discrete channel
(·,·,·)_DMC            A discrete memoryless channel without feedback
(·,·,·,·,·)_TVDMC      A time variant discrete memoryless channel
(·,·,·,·,·,·,·)_SCMC   A state conditioned memoryless channel
(·,·,·,·)_DMAC         A discrete multiple access channel
(·,·,·,·)_MAC          A discrete memoryless multiple access channel
(·,·,·,·,·,·)_TVMAC    A time variant discrete memoryless multiple access channel
(·,·,·,·,·)_IC         A discrete memoryless interference channel
(Z | X, Y)             A probability distribution on Z that has all probability mass at one point depending on X and Y
Summary

In this thesis, we investigate channel capacity for discrete memoryless multiuser channels. For point-to-point channels the memoryless assumption leads to a single letter formula for capacity. A single letter formula gives the capacity by the maximization of the mutual information between the input and output random variables in a single channel use. For multiuser channels, however, a single letter description of the capacity region has been found only for the multiple access channel, a multiuser channel in which one receiver decodes the messages of all transmitters. Finding single letter formulas for other multiuser channel models is a long-standing open problem in multiuser information theory. The significance of single letter formulas, as compared to limiting expressions, is mainly due to the direct conversion of such representations into functions of signal and noise power for the capacity of the corresponding continuous alphabet (e.g. Gaussian noise) channels. The main focus of this thesis is the capacity region of the interference channel, which models the simultaneous utilization of a common channel by a number of transmitters communicating with a number of receivers, where each transmitter intends only to send messages to some specific receivers. Although it has not yet been proved that a single letter formula for the capacity of the memoryless interference channel does not exist, extensive attempts over the past 30 years at obtaining such a formulation motivate this idea. Most of the content of this thesis consists of side results of our attempt to find a single letter solution for the capacity region of the interference channel. In
this thesis we provide some evidence supporting the non existence of a single letter solution for this channel. This evidence suggests that for memoryless multiuser channels, the effect of the source memory of non-intended users for each receiver can introduce memory into the communicating links. Unless additional assumptions are made on the channel statistics, the resulting channel capacity is a limiting expression defined on an infinite set of random variables. The difference in the characterization of capacity for the interference channel as compared to the multiple access channel corresponds to different classes of interference affecting communication in each of these channels. This categorization of interference for multiuser channels is introduced in this thesis, and the different effects of each type on the statistical properties of the communication links are examined. The limiting operator that appears in the formula for the channel capacity of the interference channel is a symbolic mathematical statement: exact computation of the capacity region requires computation of an infinite number of terms. In principle these limiting expressions can be used for approximation of the capacity region by finite, but sufficiently large, terms of the sequence (like the solutions of many differential equations that have limiting expressions). In this dissertation we introduce new limiting expressions that converge faster to the capacity region of the memoryless interference channel as compared to previous ones. These limiting expressions use a sequence of regions that starts from the best known single letter subset of the capacity region and monotonically approaches the capacity region.
Declaration

I declare that this thesis does not incorporate without acknowledgment any material previously submitted for a degree or diploma in any university; and that to the best of my knowledge it does not contain any material previously published or written by another person except where due reference is made in the text.

Mohammad Jafar Rezaeian
Acknowledgment

I would like to thank Dr. Alex Grant, my research supervisor, for his shrewd views and guidance throughout the course of this research and his insightful advice during the completion of this thesis. Also my thanks to Dr. Gerhard Kramer, Bell Labs, Lucent Technologies, USA, for his exemplary comments in some discussions and for sharing his knowledge on this subject with us. This work greatly benefited from the financial support of The Institute for Telecommunication Research and The Sir Ross and The Sir Keith Smith Center for Aviation Operation Research. The grant of the scholarship that covered most of this educational period is highly appreciated; special thanks to Prof. Mike Miller and Prof. Bill Cowley for the recommendation and continuation of this grant. Also my thanks to Prof. Ken Lever and Prof. Daniel McMichael for their guidance on the first work of my Ph.D. I am thankful to my family for their support and encouragement during this study period, which coincided with our immigration to our new home country, Australia.
Chapter 1

Introduction

Design problems always involve a set of constraints related to the physics of the problem as well as trade-offs between the different design parameters. Despite the early communication engineering mindset that, for a communication system, the rate of information transmission through a channel trades off against the probability of error, in 1948 C. E. Shannon showed that this rate is fundamentally constrained by the channel capacity, which is related to the statistical behavior of the channel. In the basic communication system of Figure 1.1, the probability of error trades off against the complexity of encoding and decoding rather than against rate, as long as the rate is below the channel capacity.

[Figure: Source → Encoder → Channel (with noise) → Decoder → Destination]
Figure 1.1: Block diagram of communication system

In the point to point communication model of Figure 1.1, the source selects a message out of a known set of messages. This message is transmitted by encoding it into a sequence of symbols which are sent over the channel. The channel does not
provide a deterministic and reversible replica of the transmitted signal at the output, but there is a known statistical dependency between the output and the input of the channel. Using the knowledge of these statistics and the observation of the output sequence, a message is estimated at the decoder for the destination. This estimation process has a probability of error due to the randomness of the input-output relation of the channel. While the reliability of a communication system depends on how much this error probability can be diminished, the efficiency of the communication system depends on the information rate through the channel, i.e. how much information (here measured by the logarithm of the number of possible messages) can be transmitted in the same number of channel uses (the encoded sequence length). A fundamental question in communication theory was this: Is reliability only achieved by losing efficiency in channel use? While the answer seemed to be positive (because for lower information rates a reliable system design was simpler), Shannon's theory showed that the answer is negative. Shannon's contribution was to cast the communication model of Figure 1.1 into probability theory. By introducing new measures and concepts Shannon provided a mathematical framework for the analysis of this communication model. The basic concepts in this theory are the information content of a process (source entropy) and the information exchanged between two dependent processes (mutual information). Shannon showed that if we take care not to lose any information in the process of extracting information from the received signal at the decoder (by more information processing inside the decoder or smarter design of the encoder-decoder) we can have any desirable degree of reliability for the system. Better reliability is achieved by more complex system design.
However, the information that can be transmitted to the other side of the channel is limited by the channel capacity. If we try to send more information than capacity, part of this information will be lost and cannot be retrieved by more information processing in the decoder. Therefore reliability to any desirable level may not be achieved for rates
more than capacity. In other words, this theory showed that error in a communication system comes as a result of two factors:

1. Ignoring part of the information in the received signal, as a trade-off for simplifying the process of extracting this information.

2. Losing information when the source information exceeds the channel capacity.

Below capacity there is no loss. If the constraint of capacity is observed at the input of the channel, reliability can be achieved by extracting all the information received at the output of the channel. Channel capacity is defined as the maximum information rate that can be transmitted through the channel such that the receiver can detect messages with arbitrarily small probability of error. Channel capacity depends on the statistical behavior of the channel (for a discrete channel it is also restricted by the alphabet sizes of the input and output). An important aspect of these statistics affecting channel capacity is memory. Capacity calculation for memoryless channels is generally much simpler than for channels with memory. For channels with memory a well designed system may exploit dependency for greater information transfer, so extra (and far reaching) rate points can be added to the channel capacity. Capacity for such channels is found as the limit of an infinite sequence. Each element of this infinite sequence is found by maximizing a function over a set of probability distributions. In contrast, for memoryless channels computation of channel capacity requires only one such optimization. We call such formulations of capacity computable. Telecommunication systems have developed from the point to point model of Figure 1.1 to network models, where multiple users communicate in the system. Network models are more versatile, depending on the number of users and the different scenarios for interconnection of these users; an example is shown in Figure 1.2.
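The single optimization mentioned above — maximizing mutual information over input distributions — can be carried out numerically with the classical Blahut-Arimoto iteration. The sketch below is an illustration, not part of the thesis; the function name, iteration count and the binary symmetric channel example are our own choices.

```python
import numpy as np

def blahut_arimoto(W, iters=200):
    """Capacity (bits per channel use) of a DMC with transition matrix W[x, y] = w(y|x)."""
    nx = W.shape[0]
    p = np.full(nx, 1.0 / nx)                      # start from the uniform input distribution
    for _ in range(iters):
        q = p @ W                                  # induced output distribution
        # d[x] = D( w(.|x) || q ), relative entropy of each channel row to the output law
        with np.errstate(divide="ignore", invalid="ignore"):
            logratio = np.where(W > 0, np.log2(W / q), 0.0)
        d = (W * logratio).sum(axis=1)
        p = p * np.exp2(d)                         # multiplicative update toward the maximizer
        p /= p.sum()
    return float(p @ d)

# Binary symmetric channel with crossover 0.1: C = 1 - H(0.1) ≈ 0.531 bits/use
W = np.array([[0.9, 0.1], [0.1, 0.9]])
C = blahut_arimoto(W)
```

For the symmetric channel above the uniform input is a fixed point of the iteration, so the routine returns the closed-form value 1 − H(0.1) exactly; for asymmetric channels the iteration converges to the maximizing input distribution.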
Ever-increasing demand for fast, reliable and voluminous data transfer in today's networks
[Figure: three users, grouped into transmitter sets T_1 and T_2, communicating with receivers 1 and 2]
Figure 1.2: A simple network model

pushes system design to the limits imposed by available resources. In order to optimize data rates in these systems, designers need to know the fundamental limits of a channel in a multiuser environment. Channel capacity provides a benchmark that shows how much a system can be improved, by either more information processing in the encoder and decoder or by smarter design of these system components. In a multiuser channel, a subset of transmitters (say T_i) communicates with a given (say the i-th) receiver (Figure 1.2). We assume that each transmitter has a common message for all its recipients. The disturbing factors affecting the detection of each of the signals T_i by the i-th receiver can be categorized as follows:

1. The internal noise (random behavior) of the channel.

2. The signals of the transmitters in the set T_i. We call this multiple access interference.

Therefore we do not consider broadcast channels, in which users may have different messages for each receiver.
3. The signals of the non intended transmitters T_i^c. We call this non intended user interference.

In this thesis we reach the conclusion that multiple access interference statistics are similar to memoryless noise, but that non intended user interference behaves like noise with memory. Due to this memory characteristic, the derivation of capacity for multiuser channels (even with memoryless internal noise) can be computationally complex if non intended user interference exists for a receiver. The capacity of a multiuser channel is a region, called the capacity region, defined by all possible combinations of information rates that can be allocated to the different users such that each receiver can detect all intended users' messages with arbitrarily small probability of error. This concept of capacity is called the information capacity of the channel (sometimes called Shannon capacity) and is the subject of this thesis. In multiuser communication systems, another concept also referred to by the term capacity is the constraint on the number of users that can use the system simultaneously at a given data rate. In contrast, for information capacity the number of users is fixed, and the term capacity refers to the constraint on the information rates that can be assigned to users such that they can communicate reliably with their intended receivers.

1.1 Some Simplified Network Models

One possible network topology is the multiple access system depicted in Figure 1.3, where a number of transmitters communicate with a single receiver through a common channel. An example of such a network is the uplink of one cell of a mobile communication system, in which the base station detects all mobile stations in the cell, ignoring intercell interference. The increasing number of users in a cell and the demand for high data rates with limited available bandwidth motivate efficient use of this resource. The capacity region for this system indicates the optimum possible sharing
[Figure: several users transmitting to a single receiver]
Figure 1.3: Multiple access channel

of this resource between users. In the multiple access topology, there are no non-intended transmitters for the common receiver. The receiver is confronted with only the internal channel noise and multiple access interference. If the channel is memoryless, the capacity region can be computed by a so called single letter formula, defined by the mutual informations between the input and output random variables in a single channel use. The boundary of the capacity region is determined by an optimization process over an infinite set of probability distributions. In this thesis the single letter capacity (which is already known) of the two user memoryless multiple access channel is discussed in detail. A more complicated topology that we can consider is the interference channel, with multiple receivers and transmitters. Any receiver only intends to detect the messages of a specific subset of transmitters, but all users' signals affect each received signal. The simplest such model is the two user interference channel depicted in Figure 1.4. This model was introduced by Shannon in the context of two way channels. An example of a two user interference channel is a system of two neighboring geosynchronous satellites attempting to send independent messages to two separate earth stations, with the signals affected both by thermal noise and by interference
from each other. A more complex example of an interference channel is the uplink of a set of neighboring cells of a mobile communication system where intercell interference is noticeable.

[Figure: two users, each passing through noise to its own receiver, with cross links interfering]
Figure 1.4: Two user interference channel

Despite many attempts over the last 30 years, a single letter formula for the interference channel capacity region has not been found, not even for the simple two user case. So far, the only condition that yields a single letter formula is the unusual condition that the non intended interfering signals for any receiver are stronger than (or equal to) the intended users' signals. For the example of a mobile cellular communication system, this means that the received signals at any base station from all the neighboring cells are stronger than (or equal to) the received signals from the users inside that cell. Under such conditions, the receiver can decode the neighboring cells' users' messages, even if it does not need them. For each receiver the whole system appears as a multiple access channel. In this thesis we speculate on some factors that may inhibit a single letter formulation of the capacity region of the interference channel. In contrast with the multiple access channel, an interference channel is affected by non intended user interference in addition to the channel's internal noise and multiple access interference. Similar to channels with memory, the capacity region of the
interference channel is the limit of an infinite sequence of regions, where finding the boundary of each region requires an optimization process. The capacity region can be approximated by finite, but sufficiently large, terms of this sequence. A limiting expression has already been derived for the capacity region. In this thesis we derive two other limiting sequence forms for the capacity region, and show that these regions converge faster to the capacity region than the former limiting expression. There are different classes of (single or multi user) channel models that can be used for the analysis of communication systems. For digital communication systems, the transmitter sends only one of a finite number of signals, and the discrete time discrete alphabet model is used to analyze the system. Moreover, in conjunction with rate distortion theory, one can approximate continuous signals with a sufficiently large finite alphabet and use the discrete channel model in the analysis of capacity for continuous communication systems. We consider only discrete time discrete alphabet channel models throughout this thesis.

1.2 Overview of Thesis

The remaining parts of this dissertation are organized in the following way. In Chapter 2 we give the mathematical description of some basic discrete memoryless channel models and concepts related to capacity. The two user multiple access channel and the two user interference channel are the basic multiuser models, presented in a unified framework consistent with, and as a generalization of, the single user memoryless channel. In this chapter we introduce the specific method that we use to prove direct and converse coding theorems for these basic models. This chapter provides the core definitions and logical development for our main objectives in the next chapters, which target the capacity region of the interference channel. It also comprises a literature review on the subject of the interference channel capacity region.
In Chapter 3 the main contribution to the problem of the capacity region of the interference channel is presented. We extend the best previous inner bound for the interference channel to a sequence of inner bounds that approaches the capacity region. This sequence of regions defines a new limiting expression for the capacity region of the interference channel. An alternative limit sequence is also derived using a convex hull operation. Comparison of the inner bound and the capacity region highlights the shortcomings of the inner bound in representing the capacity region. In Chapter 4 we take another approach to the problem of the interference channel capacity region by decomposing the channel into several single user channels. The objective is to target the core element that inhibits the derivation of a single letter capacity region for the interference channel. A two user interference channel can be considered as a combination of two main links and two interfering links. We try to capture the statistics of the main links by introducing an extension to the single user channel model. Through this analysis a limiting expression as an outer bound on the capacity region of the interference channel is derived. In Chapter 5 we discuss in more detail the lack of a single letter description of the capacity region for the memoryless interference channel. Based on a generalization of the results in Chapters 3 and 4, we give an argument which suggests the possible non existence of single letter descriptions for the general interference channel capacity region. Chapter 2 is mainly a literature review, but the methodology used in this chapter for the proof of the coding theorems is original; e.g. Lemma 2.6 is new. This special method is devised for the purpose of obtaining the capacity region of the interference channel and may be used in similar applications. Chapters 3 (except Section 3.1), 4 and 5 are all original work presented in this thesis.
Chapter 2

Discrete Memoryless Multiuser Channels

In this chapter the focus is on the class of discrete memoryless channels. The main objectives are as follows:

1. To present a prototype description of discrete memoryless channels and concepts related to capacity. In order to define the discrete memoryless multiuser channel models in a consistent way, as a generalization of single user channels, we start with the basic model of a point to point channel. These concepts provide the basic framework for our discussion on the interference channel capacity region.

2. To introduce a particular method for proving coding theorems. This argument is generalized for multiuser channels.

3. To provide a comprehensive literature review of previous contributions to the computation of the capacity region for the interference channel.
2.1 Single User Channel

A discrete channel (X, Y, P(y | x))_DC is defined by an input alphabet X, an output alphabet Y and a sequence of transition probabilities P(y | x) for all x ∈ X^n, y ∈ Y^n, n = 1, 2, .... The term P(y | x) denotes a function of (x, y) satisfying Σ_y P(y | x) = 1 for all x. This term is also used as the evaluation of a conditional probability function P_{Y|X}(· | ·) (at the specific point (x, y)) in the context of a predetermined joint distribution P(X, Y) on the random variables X and Y. The variables x and y are n-vectors (n is implied) of elements of X and Y, respectively. Equivalently, x and y can be considered as elements of the sets X^n and Y^n, the n-th Cartesian products of X and Y. We now specify the property that we refer to as memoryless.

Definition 2.1 (DMC). A discrete memoryless channel without feedback (X, Y, ω(y | x))_DMC is a (X, Y, P(y | x))_DC where P(y | x) satisfies

    P(y_t | x, y^[t-1]) = ω(y_t | x_t)                                  (2.1)

for all t. The single letter channel transition probability ω(y | x) is a collection of probability assignments on Y for any x ∈ X. The term y^[t-1] refers to the first t-1 elements of the vector y.

Equation (2.1) shows that the output at time t, conditioned on the input at this time, is independent of previous input and output symbols. This is the memoryless characterization of the channel. The memoryless property by itself can be identified by P(y_t | x^[t], y^[t-1]) = ω(y_t | x_t). The fact that the left hand side of (2.1) is also independent of x_{t+1}, x_{t+2}, ..., x_n means that the memoryless channel is used without feedback. If the memoryless channel is used with feedback, then these future input symbols are dependent on y_t through the feedback information. The no feedback property allows the sequence of channel transition probabilities P(y | x), n =
1, 2, ..., to be defined independently of the input distribution. Note that as a result of (2.1), in a DMC we have

    P(y | x) = ∏_{t=1}^{n} P(y_t | x, y^[t-1]) = ∏_{t=1}^{n} ω(y_t | x_t)        (2.2)

Equation (2.1) shows that ω(y_t | x_t) is in fact P(y_t | x_t). The notation ω(y | x) is used instead of P(y_t | x_t) to emphasize that P(y_t | x_t) is not a function of t. Hereafter the notation ω is used to show that a conditional probability is independent of the time index of its variables. The conditional distribution ω(y | x) comprises all the knowledge of the receiver about the channel statistics in the message detection process.

For the purpose of communicating through a discrete channel, a set of messages is prearranged between the two communicating parties. For each message, an element of X^n is considered as the channel input (specifying an encoding function) and a subset of Y^n as the respective possible outputs (specifying a decoding function). Any such setting is called a code for the channel. Having arranged a code, the two parties can communicate through the channel, because the receiver can distinguish (or estimate) the message of the sender by observing the channel output. However, there is a probability of error in estimating the sender's message, which depends on the encoding and decoding functions and the channel transition probability. We consider this probability of error as part of the characterization of a code and represent a code by a triple (n, M, ε) as follows:

Definition 2.2 (Code). An (n, M, ε) code consists of a message set M = {1, 2, ..., M}, a collection of M codewords x(i) ∈ X^n, i ∈ M, and M disjoint decoding sets A_i ⊆ Y^n, i ∈ M, such that

    P_e ≜ (1/M) Σ_{i=1}^{M} P(A_i^c | x(i)) ≤ ε,                        (2.3)

where for a set B ⊆ Y^n, P(B | x) = Σ_{y∈B} P(y | x). The term A^c denotes the complement set of A.
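Definition 2.2 can be made concrete with a small numerical example (an illustration, not part of the thesis): a two-codeword repetition code over a binary symmetric channel, with majority-rule decoding sets, whose average probability of error P_e is computed from (2.3) using the product factorization (2.2). The channel parameters and code are our own choices.

```python
import numpy as np
from itertools import product

eps = 0.1                                        # BSC crossover probability (illustrative)
w = np.array([[1 - eps, eps], [eps, 1 - eps]])   # w[x, y] = omega(y | x)

def P(y, x):
    """Block transition probability P(y | x) = prod_t omega(y_t | x_t), per (2.2)."""
    return float(np.prod([w[xt, yt] for xt, yt in zip(x, y)]))

n = 3
codebook = {1: (0, 0, 0), 2: (1, 1, 1)}          # M = 2 codewords

def decode(y):
    """Majority rule; induces the disjoint decoding sets A_1, A_2."""
    return 1 if sum(y) <= 1 else 2

# P_e = (1/M) sum_i P(A_i^c | x(i)), where A_i^c is the set of outputs not decoded to i
M = len(codebook)
Pe = sum(P(y, x) for i, x in codebook.items()
         for y in product((0, 1), repeat=n) if decode(y) != i) / M
# For this code Pe = 3*eps^2*(1-eps) + eps^3 = 0.028
```

By symmetry both codewords contribute the same error probability, P(two or more bit flips) = 3(0.1)²(0.9) + (0.1)³ = 0.028, so this is an (n, M, ε) = (3, 2, 0.028) code.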
The set of codewords C = {x(i) | i = 1, 2, ..., M} is called the codebook, and P_e is called the average probability of error. For a given code, the empirical time distribution of the code, T_t(·): X → [0, 1], defined by T_t(x) = |{x ∈ C : x_t = x}| / M, is the letter distribution for the t-th element of the codewords. Note that any (n, M, ε) code is also an (n, M, ε') code for any ε' ≥ ε.

Definition 2.3 (Approachable Rate - Capacity). A rate R is approachable if for any ε > 0, there exists an (n, M, ε) code such that log(M)/n ≥ R - δ for any arbitrarily small δ > 0 and all sufficiently large n. The capacity of a channel is the maximum of the approachable rates.

As implied by its definition, the concept of channel capacity is an argument about the existence or nonexistence of codes under different conditions on the code parameters. The following two lemmas provide a basic method for obtaining the capacity of a DMC. Given a DMC (a fixed ω(y | x)), the first lemma shows that rates arbitrarily close to I(X;Y) on the joint distribution P(X)ω(Y | X), for some distribution P(X), are approachable, and therefore all rates below max_{P(X)} I(X;Y) are approachable. The second lemma shows that any approachable rate has to be less than I(X;Y) for some P(X), and therefore less than max_{P(X)} I(X;Y). These direct and converse arguments prove that the capacity of a DMC is C = max_{P(X)} I(X;Y). Lemma 2.1 features the Shannon random coding technique and is proved by the Asymptotic Equipartition Property (AEP) for jointly typical sequences.

Lemma 2.1 (Direct). Consider a joint distribution P(Y, X) = P_X(X)P_{Y|X}(Y | X). Define the M × n random matrix X̂ ∈ X^{Mn} with distribution P(X̂) = ∏_{i,j} P_X(X_{i,j}). We call X̂ a random codebook (the codewords X(i), i = 1, ..., M, are the rows).
For a fixed codebook x̂, let ε(x̂) be the probability of error minimized over the selection of decoding regions for a DMC with transition probability ω(y | x) = P_{Y|X}(y | x), i.e.

    ε(x̂) = min_{A_i} (1/M) Σ_{i=1}^{M} P(A_i^c | x(i)),                 (2.4)
where P(y | x) = ∏_{t=1}^{n} P_{Y|X}(y_t | x_t). The average error probability ε̄ ≜ E[ε] (E[·] is expectation over P(X̂)) can be made arbitrarily small by sufficiently large n (i.e. lim_{n→∞} ε̄ = 0) if for some δ > 0 and some R satisfying¹

    (1/n) log(M) = R - δ                                                (2.5)
    R ≤ I(X;Y)²                                                         (2.6)

This lemma, and also Lemmas 2.3 and 2.5, are specializations of Lemma 3.3, for which we give a complete proof in Section 3.2. In fact Lemma 2.1 is a special case of Lemma 2.3, where |Q| = 1. Lemma 2.1 shows that under the conditions (2.5) and (2.6), for any distribution P(X), any ε > 0 and all sufficiently large n, we can have ε̄ ≤ ε. Since ε̄ is the average of ε(x̂), these conditions imply the existence of an x̂ such that ε(x̂) ≤ ε, and due to the above definition of x̂, the existence of an (n, M, ε) code for the channel (X, Y, P_{Y|X}(Y | X))_DMC. This shows that any rate R satisfying (2.6) for some distribution P(X) is approachable, and hence all rates R ≤ max_{P(X)} I(X;Y) are approachable.

Lemma 2.2 (Converse). In a channel (X, Y, ω(y | x))_DMC, any (n, M, ε) code with empirical time distribution T_t(x) satisfies

    (1/n) log(M) ≤ [I(X;Y) + 1/n](1 - ε)^{-1},                          (2.7)

where the mutual information is on the distribution P(X, Y) = Σ_{t=1}^{n} [(1/n) T_t(X)] ω(Y | X).

¹ Equations (2.5) and (2.6) can be merged as (1/n) log(M) = I(X;Y) - δ, but this separation makes it clearer that the rates R ≤ I(X;Y) are approachable, using the definition of approachable rates.
² I(X;Y) is the mutual information function of the distribution P(X, Y).
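Before the proof, the bound (2.7) can be checked numerically for a concrete code (an illustration of ours, not part of the thesis): for a 3-repetition code over a BSC with crossover 0.1 and majority decoding, log(M)/n = 1/3, the empirical time distribution T_t is uniform for every t, so the mixture distribution in the lemma is a uniform-input BSC, and the right-hand side of (2.7) comfortably exceeds 1/3.

```python
import numpy as np

crossover = 0.1
w = np.array([[1 - crossover, crossover], [crossover, 1 - crossover]])

def mutual_info(p, W):
    """I(X; Y) in bits for input distribution p and channel W[x, y] = w(y|x)."""
    q = p @ W                                    # output distribution
    joint = p[:, None] * W                       # joint distribution P(x, y)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2((W / q)[mask])))

n, M, pe = 3, 2, 0.028                           # 3-repetition code, majority decoding error
# Codewords (0,0,0) and (1,1,1): T_t(0) = T_t(1) = 1/2 for every t,
# so the mixture sum_t (1/n) T_t in Lemma 2.2 is the uniform input distribution.
p_mix = np.array([0.5, 0.5])
I = mutual_info(p_mix, w)                        # = 1 - H(0.1) ≈ 0.531 bits
lhs = np.log2(M) / n                             # = 1/3
rhs = (I + 1 / n) / (1 - pe)                     # right-hand side of (2.7)
# lhs <= rhs, consistent with the converse bound
```

The gap between the two sides reflects that this short repetition code operates well below the rates the converse permits.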
Proof. The term P_e = (1/M) Σ_{i=1}^{M} P(A_i^c | x(i)) in the definition of the code is the average probability of error for an estimator of the random vector X based on the observation Y, for which Fano's inequality ([?], Theorem 2.11.1) gives

    H(X|Y) ≤ 1 + P_e log(M)   (2.8)

for the joint distribution P(X, Y) = P(X)P(Y|X) that has P(Y|X) = Π_{t=1}^{n} ω(y_t | x_t) and

    P(X) = 1/M if X ∈ C, and 0 otherwise.   (2.9)

The term P(Y|X) throughout the thesis refers to the conditional probability distribution P_{Y|X}(Y|X) of random variables X, Y under an assumed joint distribution. By (2.9) we have H(X) = log(M). This equality, (2.8), and the inequality P_e ≤ ε result in

    log(M) = I(X; Y) + H(X|Y) ≤ I(X; Y) + 1 + ε log(M)

and therefore

    log(M) ≤ (I(X; Y) + 1)(1 − ε)^{−1}.   (2.10)

The t-th one-dimensional marginal of P(X, Y) is P_t(X_t, Y_t) = T_t(X) ω(Y|X). For the distribution P(X, Y) we have

    P(Y|X) = Π_{t=1}^{n} P(Y_t|X_t)  ⟹  H(Y|X) = Σ_{t=1}^{n} H(Y_t|X_t).   (2.11)

This and the general inequality H(Y) ≤ Σ_{t=1}^{n} H(Y_t) ([?], Theorem 2.6.6) result in

    I(X; Y) ≤ Σ_{t=1}^{n} I(X_t; Y_t).   (2.12)

Note that I(X_t; Y_t) is in fact I(X; Y) for the joint distribution P_t(X, Y). Consider the following convex combination of these distributions,

    P̄(X, Y) = (1/n) Σ_{t=1}^{n} P_t(X, Y) = Σ_{t=1}^{n} [(1/n) T_t(X)] ω(Y|X).   (2.13)
Because mutual information is a concave function of the input distribution for a fixed transition probability ([?], Theorem 2.7.4), we have

    (1/n) Σ_{t=1}^{n} I(X_t; Y_t) ≤ I(X; Y),   (2.14)

where I(X; Y) is on the above distribution P̄(X, Y). So for this distribution we have

    log(M)/n ≤ [I(X; Y) + 1/n](1 − ε)^{−1}.

Note that (2.9) does not need to be part of the assumption of the lemma. Equation (2.10) is derived from Fano's inequality for an estimator of X based on the observation Y, under the assumption that P_e ≤ ε. The random variables X and Y have the joint distribution constructed by (2.9) and a specific P(Y|X). Therefore, from the single assumption P_e = (1/M) Σ_{i=1}^{M} P(A_i^c | x(i)) ≤ ε, in which P(Y|X) is given by the channel definition, one can assert the existence of such an estimator for which inequality (2.10) must be satisfied, and from that derive the inequality (2.7) between the parameters M, n, and ε.

According to the definition of an approachable rate, if R is approachable, then for any ε > 0 and all sufficiently large n there exists an (n, M, ε) code such that log(M)/n ≥ R − δ for some δ > 0. But according to Lemma 2.2, any such code must also satisfy (2.7). So any approachable rate is upper bounded as R ≤ [I(X; Y) + 1/n](1 − ε)^{−1} + δ. Since δ, ε, n are arbitrary, this bound also holds in the limiting case δ → 0, ε → 0, n → ∞, and thus any approachable rate R satisfies R ≤ I(X; Y) for some P(X), which implies that rates R > max_{P(X)} I(X; Y) are not approachable.

2.1.1 Time Varying DMC

We now introduce a variant of the DMC in which the channel transition probability at each time varies in a manner known to both transmitter and receiver. We
suppose that the set of all possible channel transition probabilities over the different time instances is finite. Therefore we can write P(y_t | x_t) = ω(y_t | x_t, q_t), where q_t ∈ Q for every t, with Q a finite set. All the channel statistics are given by the matrix ω(y|x, q) (which is a collection of probability assignments on Y, one for each x ∈ X and q ∈ Q) and a deterministic infinite sequence q. We call the sequence q a time sharing sequence.

Definition 2.4 (TVDMC). A time varying discrete memoryless channel without feedback (X, Y, Q, q, ω(y|x, q))_TVDMC is a (X, Y, P(y|x))_DC where P(y|x) satisfies

    P(y_t | x, y^{[t−1]}) = ω(y_t | x_t, q_t)   (2.15)

for all t and the deterministic sequence q.

This model will be extended in Section 4.1 to the model of the State Conditioned Memoryless Channel (SCMC), which we will use in the analysis of multiuser channels. We now extend the random coding technique of Lemma 2.1 to the TVDMC.

Lemma 2.3 (Random TVDMC). Consider a joint distribution P(Y, X, Q) = P_Q(Q) P_{X|Q}(X|Q) P_{Y|XQ}(Y|X, Q). Define the M × n random matrix X̂ ∈ X^{M×n} and the random vector Q ∈ Q^n with

    P(X̂, Q) = P(Q) P(X̂|Q),   (2.16)

where P(Q) = Π_{t=1}^{n} P_Q(Q_t) and P(X̂|Q) = Π_{i,j} P_{X|Q}(X_{ij} | Q_j). For a fixed codebook x̂ and vector q, let ε(x̂, q) be the minimum probability of error over the selection of decoding regions for a TVDMC with ω(y|x, q) = P_{Y|XQ}(y|x, q), i.e.

    ε(x̂, q) = min_{A_i} (1/M) Σ_{i=1}^{M} P(A_i^c | x(i)),   (2.17)

where P(y|x) = Π_{t=1}^{n} P_{Y|XQ}(y_t | x_t, q_t).
[Figure 2.1: Random coding for a TVDMC.]

The average error probability ε̄ ≜ E[ε] can be made arbitrarily small by taking n sufficiently large (i.e. lim_{n→∞} ε̄ = 0) if for some δ > 0 and any R satisfying

    (1/n) log(M) = R − δ,   (2.18)
    R ≤ I(X; Y | Q).   (2.19)

This lemma is a special case of Lemma 2.5, where X = X_1 and |X_2| = 1, and its proof is by specialization of the proof of Lemma 3.3. Lemma 2.3 states that for the joint distributions P(X̂, Q) defined by (2.16) for different M and n, if we increase n and M under the constraint (2.18), then the expectation of the random variable ε, as a function of (X̂, Q) (see Figure 2.1 and Equation (2.17)), approaches zero. We can also prove an equivalent converse lemma for this channel as follows.

Lemma 2.4. In a channel (X, Y, Q, q, ω(y|x, q))_TVDMC, any (n, M, ε) code with empirical time distribution T_t(x) satisfies

    (1/n) log(M) ≤ [I(X; Y | Q) + 1/n](1 − ε)^{−1}   (2.20)

where the mutual information is calculated on the random variables X, Y defined on the sets X, Y by the distribution P(X, Y, Q) = Σ_{t: q_t = Q} [(1/n) T_t(X)] ω(Y|X, Q).
Proof. Similar to the proof of Lemma 2.2, an (n, M, ε) code implies

    log(M) ≤ (I(X; Y) + 1)(1 − ε)^{−1}.   (2.21)

Now the memoryless property of the TVDMC results in

    P(Y|X) = Π_{t=1}^{n} P_t(Y_t|X_t)  ⟹  H(Y|X) = Σ_{t=1}^{n} H(Y_t|X_t).   (2.22)

Therefore

    I(X; Y) ≤ Σ_{t=1}^{n} I_t(X; Y),   (2.23)

where I_t(X; Y) is the mutual information of the distribution P_t(X, Y) = T_t(X) ω(Y|X, q_t). We can write

    (1/n) Σ_{t=1}^{n} I_t(X; Y) = Σ_{q∈Q} (s_q/n) Σ_{t: q_t = q} (1/s_q) I_t(X; Y),   (2.24)

where s_q = |{t : 1 ≤ t ≤ n, q_t = q}|. The inner summation is a convex combination of mutual informations on the distributions P_t(X, Y) = T_t(X) ω(Y|X, q), with fixed transition probability ω(Y|X, q) but different input distributions T_t(X). By a concavity argument similar to that in the proof of Lemma 2.2, for each q ∈ Q we have

    Σ_{t: q_t = q} (1/s_q) I_t(X; Y) ≤ I_q(X; Y),   (2.25)

where I_q(X; Y) is based on the same convex combination of distributions, i.e. P_q(X, Y) = ω(Y|X, q) Σ_{t: q_t = q} (1/s_q) T_t(X). Equations (2.24) and (2.25) result in

    (1/n) Σ_{t=1}^{n} I_t(X; Y) ≤ Σ_{q∈Q} (s_q/n) I_q(X; Y).   (2.26)

But the right hand side of (2.26) is I(X; Y | Q) for the joint distribution P(X, Y, Q) = P(Q) P(X, Y|Q), where P(Q) = s_Q/n and P(X, Y|Q) = P_Q(X, Y). Therefore (2.21),
(2.23) and (2.26) result in the inequality (2.20) for the joint distribution

    P(X, Y, Q) = P(Q) P(X, Y|Q) = (s_Q/n) Σ_{t: q_t = Q} [(1/s_Q) T_t(X)] ω(Y|X, Q) = Σ_{t: q_t = Q} [(1/n) T_t(X)] ω(Y|X, Q).   (2.27)

Lemmas 2.3 and 2.4 do not have a direct application to single user channels, but their extensions to multiple users are used in the next section to derive the capacity region of the multiple access channel, and achievable regions for the interference channel in Chapter 3.

2.2 Multiple Access Channel

In this section we extend the single user channel model to the case where the channel has two inputs. A discrete multiple access channel (X_1, X_2, Y, P(y|x_1, x_2))_DMAC (referred to as a DMAC) is defined by two input alphabets X_1, X_2, one output alphabet Y, and a transition probability P(y|x_1, x_2) for all y ∈ Y^n, x_1 ∈ X_1^n, x_2 ∈ X_2^n, n = 1, 2, 3, .... Note that this channel can also be uniquely determined by the input and output sets and the set of conditional probabilities P(y_t | x_1, x_2, y^{[t−1]}), t = 1, 2, ..., so in general the output at a time instance depends on the inputs at all times and on the outputs at previous time instances, i.e. there is arbitrary memory in the channel. We now focus on the class of DMACs with the additional properties of being memoryless and without feedback. This section, up to Subsection 2.2.1, is literature review.

Definition 2.5. A discrete memoryless multiple access channel (MAC) without feedback (X_1, X_2, Y, ω(y|x_1, x_2))_MAC is a DMAC where P(y|x_1, x_2) satisfies

    P(y_t | x_1, x_2, y^{[t−1]}) = ω(y_t | x_{1t}, x_{2t})   (2.28)

for all t.
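As a concrete (and entirely illustrative, not drawn from the thesis) instance of Definition 2.5, consider the standard binary adder MAC, Y = X_1 + X_2. The sketch below stores the single letter kernel ω(y|x_1, x_2) as a lookup table and evaluates a block probability as the product of per-letter terms, which is exactly the memoryless property (2.28).

```python
# Illustrative binary adder MAC: X1, X2 in {0,1}, Y = X1 + X2 in {0,1,2}.
# omega[(x1, x2)][y] is the single-letter transition probability w(y | x1, x2).
omega = {
    (0, 0): {0: 1.0, 1: 0.0, 2: 0.0},
    (0, 1): {0: 0.0, 1: 1.0, 2: 0.0},
    (1, 0): {0: 0.0, 1: 1.0, 2: 0.0},
    (1, 1): {0: 0.0, 1: 0.0, 2: 1.0},
}

def block_prob(y, x1, x2):
    """P(y | x1, x2) for a memoryless MAC: the product of per-letter terms."""
    p = 1.0
    for yt, x1t, x2t in zip(y, x1, x2):
        p *= omega[(x1t, x2t)][yt]
    return p

# The output is determined letter by letter, with no dependence on the past.
print(block_prob((0, 1, 2), (0, 0, 1), (0, 1, 1)))  # -> 1.0
print(block_prob((2, 1, 0), (0, 0, 1), (0, 1, 1)))  # -> 0.0
```

Any memoryless MAC over finite alphabets can be represented this way; only the table `omega` changes.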
[Figure 2.2: Discrete memoryless multiple access channel.]

Given a (X_1, X_2, Y, ω(y|x_1, x_2))_MAC and an integer n, define the n-th transmission probability matrix

    ω_n(y|x_1, x_2) ≜ Π_{t=1}^{n} ω(y_t | x_{1t}, x_{2t}).   (2.29)

From (2.28), we have ω_n(y|x_1, x_2) = P(y|x_1, x_2). We use the notation ω_n throughout the thesis to highlight the fact that P(Y_1, Y_2 | X_1, X_2) is the product conditional distribution generated by the channel single letter conditional probabilities.³

Definition 2.6. An (n, M_1, M_2, ε) code for the channel (X_1, X_2, Y, ω(y|x_1, x_2))_MAC consists of two message sets M_1 = {1, 2, ..., M_1} and M_2 = {1, 2, ..., M_2}, a collection of M_1 codewords x_1(i) ∈ X_1^n, i ∈ M_1, M_2 codewords x_2(j) ∈ X_2^n, j ∈ M_2, and M_1 M_2 disjoint decoding sets A_{ij} ⊆ Y^n, i ∈ M_1, j ∈ M_2, such that

    P_e = (1/(M_1 M_2)) Σ_{i=1}^{M_1} Σ_{j=1}^{M_2} ω_n(A_{ij}^c | x_1(i), x_2(j)) ≤ ε.   (2.30)

For an (n, M_1, M_2, ε) code defined above, P_e is called the average probability of error. The sets C_1 = {x_1(i) : i = 1, 2, ..., M_1} and C_2 = {x_2(j) : j = 1, 2, ..., M_2} are called the codebooks of the first and second transmitter. The functions T_{1t}(x_1) = |{x_1 ∈ C_1 : x_{1t} = x_1}|/M_1 and T_{2t}(x_2) = |{x_2 ∈ C_2 : x_{2t} = x_2}|/M_2 are called the empirical time distributions of the code.

³ Consider for example Equation (2.56), where ω_n is replaced by P(Y_1, Y_2 | X_1, X_2). Then this equation only shows that X_1 is independent of X_2, and we would need to add an explanation about P(Y_1, Y_2 | X_1, X_2), whereas the notation ω_n already contains this explanation. Furthermore, without the notation ω_n, equations like (3.6) would be very confusing.
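The empirical time distributions of Definition 2.6 admit a direct computation. The following small sketch (our own illustration; the codebook is hypothetical) computes T_t(x) for every position t of a codebook:

```python
from collections import Counter

def empirical_time_distributions(codebook):
    """T_t(x) = |{codewords whose t-th letter is x}| / M, one dict per position t."""
    M = len(codebook)
    n = len(codebook[0])
    return [
        {x: c / M for x, c in Counter(word[t] for word in codebook).items()}
        for t in range(n)
    ]

# Hypothetical codebook C1 with M1 = 4 codewords of length n = 3 over {0, 1}.
C1 = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
T = empirical_time_distributions(C1)
print(T[0])  # -> {0: 0.5, 1: 0.5}
print(T[2])  # -> {1: 0.75, 0: 0.25}
```

Note that T_t describes the code, not the channel: it is the column-wise letter frequency of the codebook, which is exactly the quantity that appears in the converse distributions such as (2.39).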
Definition 2.7 (Approachable Rate). A pair (R_1, R_2) of nonnegative real values is called an approachable rate pair for the multiple access channel if for any ε > 0, any 0 < δ < 1, and any sufficiently large n, there exists an (n, M_1, M_2, ε) code such that

    log(M_1)/n ≥ R_1 − δ,  log(M_2)/n ≥ R_2 − δ.   (2.31)

The set of all approachable rate pairs for a multiple access channel is called the capacity region of the channel, which we denote by C_MAC. A closed region in which all points are approachable is called an achievable region.

The capacity region of the memoryless multiple access channel was first derived in [?] in a limiting form. The first single letter formula for the capacity region of this channel was obtained in [?] in the form C_MAC = co(G_1 ∪ G_2), where co(R) is the convex hull of the region R and

    G_1 = ∪_{P(X_1), P(X_2)} {(R_1, R_2) : R_1 = I(X_1; Y | X_2), R_2 = I(X_2; Y)}
    G_2 = ∪_{P(X_1), P(X_2)} {(R_1, R_2) : R_1 = I(X_1; Y), R_2 = I(X_2; Y | X_1)}.

This formula is (geometrically) equivalent to

    C_MAC = co ∪_{P(X_1), P(X_2)} {(R_1, R_2) : R_1 ≤ I(X_1; Y | X_2), R_2 ≤ I(X_2; Y | X_1), R_1 + R_2 ≤ I(X_1, X_2; Y)},

which was obtained in [?] and [?]. Another single letter formula for the capacity region, using a time sharing random variable instead of the convex hull operation, has also been obtained (e.g. [?]). A limiting form was also obtained for the multiple access channel with finite memory [?] (see Section 5.3.3). Gaarder and Wolf [?] showed that, in contrast with the DMC, feedback increases the capacity region of the DMAC. A single letter achievable region for the multiple access channel with feedback was derived in [?].
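For a fixed product input distribution P(X_1)P(X_2), the three bounds of the pentagon above can be computed directly from the single letter kernel. The sketch below (an illustration of ours, not an algorithm from the thesis) does so for the binary adder MAC with uniform inputs, where the bounds evaluate to I(X_1; Y|X_2) = 1, I(X_2; Y|X_1) = 1 and I(X_1, X_2; Y) = 1.5 bits.

```python
import math

def mutual_info_bounds(p1, p2, omega):
    """For a product input distribution P(x1)P(x2) and a single-letter kernel
    omega[(x1, x2)][y], return (I(X1;Y|X2), I(X2;Y|X1), I(X1,X2;Y)) in bits."""
    def H(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    ys = {y for row in omega.values() for y in row}
    # Output distribution P(y) under the product input distribution.
    py = {y: sum(p1[a] * p2[b] * omega[(a, b)].get(y, 0.0)
                 for a in p1 for b in p2) for y in ys}
    # Conditional output entropies H(Y|X1,X2), H(Y|X1), H(Y|X2).
    h_y_x1x2 = sum(p1[a] * p2[b] * H(omega[(a, b)]) for a in p1 for b in p2)
    h_y_x1 = sum(p1[a] * H({y: sum(p2[b] * omega[(a, b)].get(y, 0.0)
                                   for b in p2) for y in ys}) for a in p1)
    h_y_x2 = sum(p2[b] * H({y: sum(p1[a] * omega[(a, b)].get(y, 0.0)
                                   for a in p1) for y in ys}) for b in p2)
    return (h_y_x2 - h_y_x1x2,   # I(X1;Y|X2) = H(Y|X2) - H(Y|X1,X2)
            h_y_x1 - h_y_x1x2,   # I(X2;Y|X1) = H(Y|X1) - H(Y|X1,X2)
            H(py) - h_y_x1x2)    # I(X1,X2;Y) = H(Y)    - H(Y|X1,X2)

# Binary adder MAC Y = X1 + X2 with uniform inputs: the bounds are (1, 1, 1.5).
adder = {(a, b): {a + b: 1.0} for a in (0, 1) for b in (0, 1)}
print(mutual_info_bounds({0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}, adder))
```

Here the sum-rate constraint R_1 + R_2 ≤ 1.5 is the binding one, which is why the region of Figure 2.3 has its characteristic cut corner.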
[Figure 2.3: Achievable region for a given distribution P(X_1)P(X_2).]

2.2.1 Time Varying MAC

Similar to the TVDMC (Definition 2.4), we consider a memoryless multiple access channel that has a predetermined variation of transition probability,⁴ i.e. P(y_t | x_{1t}, x_{2t}) = ω(y_t | x_{1t}, x_{2t}, q_t), where q is a deterministic infinite sequence, called a time sharing sequence. We use this model to give a proof of the coding theorem for the memoryless multiple access channel as a special case.

Definition 2.8 (TVMAC). A time varying discrete memoryless multiple access channel without feedback (X_1, X_2, Y, Q, q, ω(y|x_1, x_2, q))_TVMAC is a (X_1, X_2, Y, P(y|x_1, x_2))_DMAC where P(y|x_1, x_2) satisfies

    P(y_t | x_1, x_2, y^{[t−1]}) = ω(y_t | x_{1t}, x_{2t}, q_t)   (2.32)

for all t and the deterministic sequence q.

⁴ We developed the extended version of this model (similar to the state conditioned extension of the DMC in Chapter 4) for the purpose of applying its 3 user version, in conjunction with a list coding argument, in a converse coding theorem for the Han and Kobayashi achievable region for the general model of the interference channel.
The n-th transmission probability for a TVMAC is ω_n(y | x_1, x_2) = Π_{t=1}^{n} ω(y_t | x_{1t}, x_{2t}, q_t). The extension of Lemma 2.3 to the two user case is the basis for the direct part of the coding theorem for the (time varying) multiple access channel. This extension is given here, and its 3 user version is used in the next chapter for the direct part of the coding theorem for the interference channel.

Lemma 2.5. Consider X_1 ∈ X_1, X_2 ∈ X_2, Y ∈ Y, and Q ∈ Q jointly distributed according to

    P(Y, X_1, X_2, Q) = P(Q) P(X_1 | Q) P(X_2 | Q) P(Y | X_1, X_2, Q).   (2.33)

Define the random matrices X̂_1 ∈ X_1^{M_1×n}, X̂_2 ∈ X_2^{M_2×n} and the random vector Q ∈ Q^n with distribution

    P(X̂_1, X̂_2, Q) = P(Q) P(X̂_1 | Q) P(X̂_2 | Q),   (2.34)

where P(Q) = Π_{t=1}^{n} P_Q(Q_t), P(X̂_1 | Q) = Π_{i,j} P(X_{1ij} | Q_j) and P(X̂_2 | Q) = Π_{i,j} P(X_{2ij} | Q_j). We call (X̂_1, X̂_2) a random codebook. For fixed codebooks (x̂_1, x̂_2) and vector q, let ε(x̂_1, x̂_2, q) be the minimum probability of error over the selection of decoding regions for a TVMAC with ω(y | x_1, x_2, q) = P(y | x_1, x_2, q), i.e.

    ε(x̂_1, x̂_2, q) = min_{A_{ij}} (1/(M_1 M_2)) Σ_{i=1}^{M_1} Σ_{j=1}^{M_2} P(A_{ij}^c | x_1(i), x_2(j)),   (2.35)

where P(y | x_1, x_2) = Π_{t=1}^{n} P(y_t | x_{1t}, x_{2t}, q_t).

The average error probability ε̄ ≜ E[ε] can be made arbitrarily small by taking n sufficiently large (i.e. lim_{n→∞} ε̄ = 0) if

    (1/n) log(M_i) = R_i − δ,  i = 1, 2   (2.36)
[Figure 2.4: Random coding for the TVMAC.]

for some δ > 0, and some (R_1, R_2) satisfying

    R_1 ≤ I(Y; X_1 | X_2, Q)
    R_2 ≤ I(Y; X_2 | X_1, Q)
    R_1 + R_2 ≤ I(Y; X_1, X_2 | Q).   (2.37)

This lemma is the specialization of Lemma 3.3, where U = X_1, V = X_2 and |W| = 1. Lemma 2.5 states that for the joint distributions P(X̂_1, X̂_2, Q) defined by (2.34) for different M_1, M_2 and n, if we increase n, M_1 and M_2 under the constraint (2.36), then the expectation of the random variable ε, as a function of (X̂_1, X̂_2, Q) (see Figure 2.4 and Equation (2.35)), approaches zero.

We extend Lemma 2.4 to the two user case to prove the converse part of the coding theorem for the multiple access channel.

Lemma 2.6. Any (n, M_1, M_2, ε) code for a channel (X_1, X_2, Y, Q, q, ω(y|x_1, x_2, q))_TVMAC, with empirical distributions T_{1t}(x_1), T_{2t}(x_2), satisfies the following inequalities:

    log(M_1)/n ≤ [I(X_1; Y | X_2, Q) + 1/n](1 − ε)^{−1} + δ + δ log(|X_1|)
    log(M_2)/n ≤ [I(X_2; Y | X_1, Q) + 1/n](1 − ε)^{−1} + δ + δ log(|X_2|)
    log(M_1 M_2)/n ≤ [I(X_1, X_2; Y | Q) + 1/n](1 − ε)^{−1}   (2.38)

for any 0 < δ < 1, where the mutual information is on the joint distribution

    P(X_1, X_2, Y, Q) = Σ_{t: q_t = Q} [(1/n) T_{1t}(X_1) T_{2t}(X_2)] ω(Y | X_1, X_2, Q).   (2.39)
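The conditional mutual informations appearing in (2.37) are convex combinations, over the time sharing variable Q, of the corresponding per-state quantities, e.g. I(X_1; Y | X_2, Q) = Σ_q P(q) I_q(X_1; Y | X_2). A minimal sketch of this mixing, with entirely hypothetical per-state values, shows how time sharing trades rate between the two users:

```python
# Hypothetical illustration: each state q has its own pentagon bounds
# (I_q(X1;Y|X2), I_q(X2;Y|X1), I_q(X1,X2;Y)); conditioning on Q averages them.
def conditional_bounds(pq, per_state_bounds):
    """pq: distribution P(q); per_state_bounds: q -> (I1_q, I2_q, I12_q)."""
    return tuple(sum(pq[q] * per_state_bounds[q][k] for q in pq)
                 for k in range(3))

# Two made-up states: state 'a' favours user 1, state 'b' favours user 2.
bounds = {'a': (1.0, 0.0, 1.0), 'b': (0.0, 1.0, 1.0)}
print(conditional_bounds({'a': 0.5, 'b': 0.5}, bounds))  # -> (0.5, 0.5, 1.0)
```

This is the mechanism by which the deterministic sequence q (or, in the single letter formulas, a time sharing random variable Q) realizes convex combinations of achievable rate pairs without an explicit convex hull operation.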