The mathematics behind wireless communication

June 2008

Questions and setting. In wireless communication, information is sent through what is called a channel. The channel is subject to noise, so that there will be some loss of information. How should we send information so that there is as little information loss as possible? How should we define the capacity of a channel? Can we find an expression for the capacity from the characteristics of the channel?

What is information? Assume that the random variable X takes values in the alphabet X = {α_1, α_2, ...}. Set p_i = Pr(X = α_i). How can we define a measure H of how much choice/uncertainty/information is associated with each outcome? Shannon [1] proposed the following requirements for H:
1. H should be continuous in the p_i.
2. If all the p_i are equal (p_i = 1/n), then H should be an increasing function of n (with equally likely events there is more uncertainty when there are more possible events).
3. If a choice can be broken down into successive choices, the original H should be the weighted sum of the individual values of H: a choice between {α_1, α_2, α_3} can first be split into a choice between α_1 and {α_2, α_3}, followed (in the latter case) by a choice between α_2 and α_3.

Entropy. Definition: The entropy of X is defined by

H(X) = H(p_1, p_2, ...) = -Σ_i p_i log_2(p_i).

The entropy is measured in bits. Shannon showed that an information measure which satisfies the requirements of the previous foil necessarily has this form! If p_1 = 1/2, p_2 = 1/3, p_3 = 1/6, the weighting described on the previous foil can be verified as

H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3),

where the weight 1/2 appearing on the right side is computed as p_2 + p_3 = 1/2.
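As a quick numerical sanity check of this definition and of the weighting identity above, here is a minimal Python/NumPy sketch (the language choice and the helper name `entropy` are mine, not from the talk):

```python
import numpy as np

def entropy(probs):
    """Shannon entropy in bits, H = -sum p_i log2 p_i (terms with p_i = 0 contribute 0)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# H(1/2, 1/3, 1/6) should equal H(1/2, 1/2) + (1/2) * H(2/3, 1/3)
lhs = entropy([1/2, 1/3, 1/6])
rhs = entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3])
print(lhs, rhs)   # both are about 1.459 bits
```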

Shannon's source coding theorem. We would like to represent data generated by the random variable X in a shorter way (i.e. compress it). Shannon's source coding theorem addresses the limits of such compression: Theorem: Assume that we have independent outcomes x_1, x_2, x_3, ... of the random variable X. The average number of bits per symbol for any lossless compression strategy is always greater than or equal to the entropy H(X). The entropy H is therefore a lower limit for achievable compression. The theoretical limit given by the entropy is also achievable. In a previous talk, I focused on methods for achieving the limit given by the entropy (Huffman coding, arithmetic coding).
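Huffman coding, mentioned above as one way to approach the entropy limit, can be sketched in a few lines. This is a minimal illustration, not the talk's material; the example distribution is dyadic, so the average code length meets the entropy exactly:

```python
import heapq
from math import log2

def huffman_code(probs):
    """Build a binary Huffman code for symbols 0..len(probs)-1; returns dict symbol -> codeword."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]   # (probability, tiebreak, symbols in subtree)
    heapq.heapify(heap)
    codes = {i: "" for i in range(len(probs))}
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)                 # merge the two least probable subtrees
        p2, _, s2 = heapq.heappop(heap)
        for s in s1: codes[s] = "0" + codes[s]
        for s in s2: codes[s] = "1" + codes[s]
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return codes

probs = [0.5, 0.25, 0.125, 0.125]
codes = huffman_code(probs)
avg_len = sum(p * len(codes[i]) for i, p in enumerate(probs))
H = -sum(p * log2(p) for p in probs)
print(codes)        # e.g. {0: '0', 1: '10', 2: '110', 3: '111'}
print(avg_len, H)   # 1.75 bits per symbol in both cases
```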

Sketch of Shannon's proof. There exists a subset A_ε^(n) of all length-n sequences (x_1, x_2, ..., x_n) such that
- the size of A_ε^(n) is approximately 2^(nH(X)) (which can be small when compared to the number of all sequences),
- Pr(A_ε^(n)) > 1 - ε.
A_ε^(n) is called the typical set, and consists of all (x_1, x_2, ..., x_n) with empirical entropy -(1/n) log_2 p(x_1, x_2, ..., x_n) close enough to the actual entropy H(X). Shannon proved the source coding theorem by
1. assigning codes with a (smaller) fixed length to ALL elements in the typical set,
2. assigning codes with another (longer) fixed length to ALL elements outside the typical set,
3. letting n → ∞ and ε → 0.
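The heart of the argument is that the empirical entropy -(1/n) log_2 p(x_1, ..., x_n) concentrates around H(X) as n grows. A short simulation illustrating this (the source distribution and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.7, 0.2, 0.1])                # source distribution, chosen for illustration
H = -np.sum(p * np.log2(p))                  # true entropy, about 1.157 bits

for n in (10, 100, 1000, 10000):
    x = rng.choice(len(p), size=n, p=p)      # one length-n source sequence
    emp = -np.mean(np.log2(p[x]))            # empirical entropy -(1/n) log2 p(x_1,...,x_n)
    print(n, round(emp, 3), "vs H =", round(H, 3))
```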

What is a communication channel? That A communicates with B means that the physical acts of A induce a desired physical state in B. This transfer of information is subject to noise and to the imperfections of the physical signaling process itself. The communication is successful if the receiver B and the transmitter A agree on what was sent.

Definition: A discrete channel, denoted by (X, p(y|x), Y), consists of two finite sets X (the input alphabet) and Y (the output alphabet), and a probability transition matrix p(y|x) that expresses the probability of observing the output symbol y given that we send the symbol x. The channel is said to be memoryless if the probability distribution of the output depends only on the input at that time, and is conditionally independent of previous channel inputs and outputs.
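As a concrete instance of this definition, a binary symmetric channel (a standard textbook example, used here purely for illustration) can be written as a 2x2 transition matrix and sampled from:

```python
import numpy as np

# Binary symmetric channel with crossover probability eps, written as a
# transition matrix P[x, y] = p(y | x) over input/output alphabets {0, 1}.
eps = 0.1
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

rng = np.random.default_rng(1)

def send(x):
    """Pass one input symbol through the channel by sampling from p(. | x)."""
    return int(rng.choice(2, p=P[x]))

print([send(0) for _ in range(10)])   # mostly 0s, each flipped with probability eps
```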

A general scheme for communication: W → Encoder → X^n → Channel p(y|x) → Y^n → Decoder → Ŵ
- W ∈ {1, 2, ..., M} is the message we seek to transfer via the channel.
- The encoder is a map X^n : {1, 2, ..., M} → X^n (length-n input sequences), taking values in a codebook (X^n(1), X^n(2), ..., X^n(M)) of size M.
- The decoder is a map from Y^n to {1, 2, ..., M}. This is a deterministic rule that assigns a guess to each possible received vector.
- Ŵ ∈ {1, 2, ..., M} is the message retrieved by the decoder.
- n is the block length. It says how many times the channel is used for each transmission.
- M is the number of possible messages. A message can thus be represented with log_2(M) bits.

The encoder/decoder pair is called an (M, n)-code (i.e. a code with M possible messages and n uses of the channel per transmission). When the encoder maps a message to a codeword in the data transmission process, it adds redundancy in a controlled fashion to combat errors in the channel. This is in contrast to data compression, where one goes the opposite way, i.e. removes redundancy in the data to obtain the most compressed form possible. The basic question is: how can one construct an encoder/decoder pair such that there is a high probability that the received message Ŵ equals the transmitted message W?
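The simplest illustration of adding redundancy is a repetition code: repeat each message bit n times and decode by majority vote. A rough sketch over the binary symmetric channel from before (the parameters eps and n are arbitrary choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
eps, n = 0.1, 5                          # crossover probability and block length (assumptions)

def encode(bit):
    """Rate-1/n repetition code: add redundancy by repeating the message bit n times."""
    return np.full(n, bit)

def decode(y):
    """Majority vote over the received block."""
    return int(y.sum() > n / 2)

bits = rng.integers(0, 2, size=10000)
received = [(encode(b) + (rng.random(n) < eps)) % 2 for b in bits]   # BSC flips each position
decoded = np.array([decode(y) for y in received])
print("bit error rate:", np.mean(decoded != bits))   # much smaller than eps, at rate R = 1/n
```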

Definition: Let λ_W be the probability that the received message Ŵ is different from the sent message W. This is called the conditional probability of error given that W was sent. We also define the maximal probability of error as λ^(n) = max_{W ∈ {1, 2, ..., M}} λ_W.

Definition: The rate of an (M, n)-code is defined as R = log_2(M)/n, measured in bits per transmission.

Definition: A rate R is said to be achievable if for each n there exists a (2^(nR), n)-code such that lim_{n→∞} λ^(n) = 0 (i.e. the maximal probability of error goes to 0).

Definition: The (operational) capacity of a channel is the supremum of all achievable rates.

Shannon's channel coding theorem. Expresses the capacity in terms of the probability distribution of the channel, irrespective of the use of encoders/decoders. Theorem: The capacity of a discrete memoryless channel is given by

C = max_{q(x)} I(X; Y),

where X/Y is the random input/output to the channel, with X having distribution q(x) on X. Here I(X; Y) is the mutual information between the random variables X and Y, defined by

I(X; Y) = Σ_{x,y} p(x, y) log_2( p(x, y) / (p(x) p(y)) ),    (1)

where p(x, y) is the joint p.d.f. of X and Y.
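For the binary symmetric channel used earlier, the maximization over input distributions can be done numerically and compared with the known closed form 1 - H(eps). A brute-force sketch (for illustration only; the grid search is mine):

```python
import numpy as np

def mutual_information(q, P):
    """I(X;Y) in bits for input distribution q and channel matrix P[x, y] = p(y|x)."""
    joint = q[:, None] * P                      # p(x, y)
    py = joint.sum(axis=0)                      # p(y)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (q[:, None] * py[None, :])[mask])))

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])  # binary symmetric channel (illustration)

# Brute-force search over input distributions q = (a, 1 - a).
grid = np.linspace(0.001, 0.999, 999)
caps = [mutual_information(np.array([a, 1 - a]), P) for a in grid]
print("max I(X;Y) ~", round(max(caps), 4))      # about 0.531, attained at the uniform input
print("1 - H(eps)  =", round(1 + eps*np.log2(eps) + (1-eps)*np.log2(1-eps), 4))
```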

Sketch of proof I. We generalize the definition of the typical set (from the proof of the source coding theorem) to the following: the jointly typical set consists of all jointly typical sequences (x^n, y^n) = ((x_1, x_2, ..., x_n), (y_1, y_2, ..., y_n)), defined as those sequences where
1. the empirical entropy of (x_1, x_2, ..., x_n) is close enough to the actual entropy H(X),
2. the empirical entropy of (y_1, y_2, ..., y_n) is close enough to the actual entropy H(Y),
3. the joint empirical entropy -(1/n) log_2( Π_{i=1}^n p(x_i, y_i) ) of ((x_1, ..., x_n), (y_1, ..., y_n)) is close enough to the actual joint entropy H(X, Y), defined by

H(X, Y) = -Σ_{x∈X} Σ_{y∈Y} p(x, y) log_2 p(x, y),

where p(x, y) is the joint distribution of X and Y.

Sketch of proof II. The jointly typical set is, just as the typical set, denoted A_ε^(n). It has the following properties, similar to the corresponding properties of the typical set:
1. The size of A_ε^(n) is approximately 2^(nH(X,Y)) (which is small when compared to the number of all pairs of sequences).
2. Pr(A_ε^(n)) → 1 as n → ∞.

Sketch of proof III. The channel coding theorem can be proved in the following way for a given rate R < C:
1. Construct a randomly generated codebook with 2^(nR) codewords from X^n (the codewords are drawn according to some fixed distribution of the input). Define the encoder as any mapping from {1, ..., 2^(nR)} into this set.
2. Define the decoder in the following way: if the output (y_1, y_2, ..., y_n) of the channel is jointly typical with a unique codeword (x_1, ..., x_n), define (x_1, ..., x_n) as the output of the decoder. Otherwise, the output of the decoder should be some dummy index, declaring an error.
3. One can show that, with high probability (going to 1 as n → ∞), the input to the channel (x_1, x_2, ..., x_n) is jointly typical with the output (y_1, y_2, ..., y_n). The expression for the mutual information enters the picture when computing the probability that the output is jointly typical with another codeword, which is approximately 2^(-nI(X;Y)).
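The random-coding argument can be mimicked numerically for the binary symmetric channel: draw a random codebook at a rate R < C, transmit, and decode. Minimum Hamming distance decoding is used below as a computable stand-in for joint-typicality decoding, and all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.1                                            # BSC crossover probability (illustration)
C = 1 + eps*np.log2(eps) + (1-eps)*np.log2(1-eps)    # capacity 1 - H(eps) of this channel
R = 0.2                                              # a rate below capacity
print(f"R = {R}, C = {C:.3f}")

for n in (10, 30, 60):
    M = 2 ** int(n * R)                              # number of messages, roughly 2^(nR)
    codebook = rng.integers(0, 2, size=(M, n), dtype=np.int8)   # step 1: random codebook
    errors, trials = 0, 500
    for _ in range(trials):
        w = rng.integers(M)                                      # message to send
        y = (codebook[w] + (rng.random(n) < eps)) % 2            # received word after the BSC
        # Step 2 of the proof uses joint-typicality decoding; minimum Hamming
        # distance (maximum likelihood for this channel) is a computable stand-in.
        w_hat = int(np.argmin(np.sum(codebook != y, axis=1)))
        errors += (w_hat != w)
    print(f"n = {n:3d}, M = {M:5d}: estimated error rate {errors/trials:.3f}")
# For R < C the error rate should fall towards 0 as n grows (step 3 of the proof).
```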

More general channels I. In general, channels do not use finite alphabet inputs/outputs. The most important continuous alphabet channel is the Gaussian channel. This is a time-discrete channel with output Y_i at time i given by Y_i = X_i + Z_i, where X_i is the input and Z_i ~ N(0, N) is Gaussian noise with variance N. Capacity can be defined in a similar fashion for such channels. The capacity can be infinite unless we restrict the input. The most common such restriction is a limitation on its variance. Assume that the variance of the input is less than P. One can then show that the capacity of the Gaussian channel is

C = (1/2) log_2(1 + P/N),

and that the capacity is achieved when X ~ N(0, P).
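A short evaluation of this capacity formula for a few signal-to-noise ratios (the SNR values are arbitrary choices):

```python
import numpy as np

def awgn_capacity(P, N):
    """Capacity (1/2) log2(1 + P/N), in bits per channel use, of the Gaussian channel
    with noise variance N and input power constraint P."""
    return 0.5 * np.log2(1 + P / N)

for snr in (1, 10, 100):    # P/N values chosen for illustration
    print(f"P/N = {snr:4d}:  C = {awgn_capacity(snr, 1):.3f} bits per channel use")
```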

More general channels II. In general, communication systems consist of multiple transmitters and receivers, talking to and interfering with each other. Such communication systems are described by a channel matrix whose dimensions match the number of transmitters and receivers; its entries are functions of the geometry of the transmitting and receiving antennas. Capacity can be described in a meaningful way for such systems also. It turns out that, for a wide class of channels, the capacity is given by

C = (1/n) log_2 det( I_n + (ρ/m) H H^H ),

where H is the n × m channel matrix, n and m are the numbers of receiving and transmitting antennas, and ρ = P/N is the signal-to-noise ratio (as for the Gaussian channel).
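The formula can be evaluated directly once a channel matrix is given. In the sketch below an i.i.d. complex Gaussian H stands in for the geometry-dependent channel matrix; this modelling choice, the antenna counts, and the SNR are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n_rx, n_tx, rho = 4, 4, 10.0    # receive/transmit antenna counts and SNR (assumptions)

# Random i.i.d. complex Gaussian entries stand in for the geometry-dependent H (n_rx x n_tx).
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)

# C = (1/n) log2 det( I_n + (rho/m) H H^H ), as in the formula above.
C = np.log2(np.linalg.det(np.eye(n_rx) + (rho / n_tx) * H @ H.conj().T).real) / n_rx
print(f"{C:.3f} bits per channel use per receive antenna")
```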

Active areas of research and open problems.
- How do we construct codebooks which help us achieve rates close to the capacity? In other words, how can we find the input distribution p(x) which maximizes I(X; Y) (the mutual information between the input and the output)? Such codes should also be implementable. Much progress has been made in recent years: convolutional codes, Turbo codes, LDPC (Low-Density Parity Check) codes.
- Error-correcting codes: these codes are able to detect where bit errors have occurred in the received data, and hence to correct them. Hamming codes are a classical example (a small sketch follows below).
- What is the capacity of more general systems? One has to account for any number of receivers/transmitters, any type of interference, and cooperation and feedback between the sending and receiving antennas. The general case is far from being solved.
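As an example of the Hamming codes mentioned above, here is a minimal Hamming(7,4) encoder/decoder sketch; the systematic generator and parity-check matrices are one standard choice, picked here for illustration:

```python
import numpy as np

# Hamming(7,4): encodes 4 data bits into 7 and corrects any single bit error.
G = np.array([[1,0,0,0,1,1,0],     # generator matrix, systematic form [I_4 | P]
              [0,1,0,0,1,0,1],
              [0,0,1,0,0,1,1],
              [0,0,0,1,1,1,1]])
H = np.array([[1,1,0,1,1,0,0],     # parity-check matrix [P^T | I_3], so G H^T = 0 (mod 2)
              [1,0,1,1,0,1,0],
              [0,1,1,1,0,0,1]])

def encode(data):
    return data @ G % 2

def decode(received):
    syndrome = H @ received % 2
    if syndrome.any():                                       # nonzero syndrome: locate and flip the bad bit
        error_pos = int(np.argmax((H.T == syndrome).all(axis=1)))
        received = received.copy()
        received[error_pos] ^= 1
    return received[:4]                                      # data bits sit in the first 4 positions

data = np.array([1, 0, 1, 1])
codeword = encode(data)
codeword[2] ^= 1                                             # flip one bit in the channel
print(decode(codeword), "recovered from", data)
```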

Good sources on information theory are the books [2] (which most of these foils are based on) and [3]. Related courses at UNIK: UNIK4190, UNIK4220, UNIK4230. Related courses at NTNU: TTT4125, TTT4110. This talk is available at http://heim.ifi.uio.no/~oyvindry/talks.shtml. My publications are listed at http://heim.ifi.uio.no/~oyvindry/publications.shtml

[1] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, October 1948.
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006.
[3] D. J. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.