Sphere-Bound-Achieving Coset Codes and Multilevel Coset Codes




820 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 46, NO. 3, MAY 2000

Sphere-Bound-Achieving Coset Codes and Multilevel Coset Codes

G. David Forney, Jr., Fellow, IEEE, Mitchell D. Trott, Member, IEEE, and Sae-Young Chung, Student Member, IEEE

Abstract—A simple sphere bound gives the best possible tradeoff between the volume per point of an infinite array A and its error probability on an additive white Gaussian noise (AWGN) channel. It is shown that the sphere bound can be approached by a large class of coset codes or multilevel coset codes with multistage decoding, including certain binary lattices. These codes have structure of the kind that has been found to be useful in practice. Capacity curves and design guidance for practical codes are given. Exponential error bounds for coset codes are developed, generalizing Poltyrev's bounds for lattices. These results are based on the channel coding theorems of information theory, rather than on the Minkowski-Hlawka theorem of lattice theory.

Index Terms—AWGN channel coding, coset codes, de Buda's result, lattice constellations, multilevel codes, sphere-bound-achieving codes.

I. INTRODUCTION

Shannon proved by random coding arguments that the channel capacity of an additive white Gaussian noise (AWGN) channel with noise variance σ² per dimension and power constraint P per dimension is C = log(1 + P/σ²) bits per two dimensions (b/2D), where P may represent either an average power constraint or a peak power constraint [49]. (In this paper "log" will denote the binary logarithm.) The subsequent challenge for coding theorists and inventors has been to find structured classes of codes that can be encoded and decoded with reasonable complexity in practice, and with performance that can approach Shannon's limit. A class of codes that has often been proposed for this channel, particularly when the signal-to-noise ratio (SNR) is large, is the class of lattice constellations. An n-dimensional lattice constellation is the finite set of points in a translate Λ + t of an infinite n-dimensional
lattice Λ that lie in a certain bounding region R, where R is typically spherical or quasi-spherical. De Buda [8] (see also [39]) has shown that spherical lattice constellations can in fact achieve a rate of

Manuscript received September 17, 1997; revised June 24, 1999. The material in this paper was presented in part at the 1996 International Symposium on Information Theory and its Applications, Victoria, BC, Canada, September 17-20, 1996, and at the 1997 IEEE International Symposium on Information Theory, Ulm, Germany, June 29-July 4, 1997. G. D. Forney, Jr. and S.-Y. Chung are with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: forneyd@mediaone.net; sychung@mit.edu). M. D. Trott was with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. He is now with ArrayComm, Inc., San Jose, CA 95134 USA (e-mail: mitch@arraycomm.com). Communicated by S. Shamai, Associate Editor for Shannon Theory. Publisher Item Identifier S 0018-9448(00)02897-2.

log(P/σ²) bits per two dimensions, which is almost the desired result. It has been recognized in recent years that, at least for codes of moderate complexity on high-SNR AWGN channels, the problems of coding (packing) and constellation shaping are largely separable [18], [21]. That is, if C is a discrete constellation consisting of the points in an infinite array A that lie in a shaping region R, then when C is large the coding properties of the constellation are largely determined by the properties of A. This paper will therefore focus on the coding problem, i.e., on the properties of A. We expect that a good infinite array A will lead to good finite constellations, although we do not prove this in this paper (but see Section IX). We will consider infinite dense regular n-dimensional discrete arrays A, not necessarily lattices, and associated decoding schemes such that the following parameters are well defined: the volume V(A) of A (i.e., the volume of n-space per
point of A); the average probability of error Pr(E) of a given decoder for A if a random point of A is transmitted over an AWGN channel with noise variance σ² per dimension. For example, if A is a lattice Λ, or more generally an array all of whose minimum-distance (Voronoi) regions are congruent, then V(A) is the volume of a Voronoi region, and Pr(E) is the probability that an n-dimensional independent and identically distributed (i.i.d.) Gaussian random variable with mean located at a point of A and variance σ² per dimension falls outside the Voronoi region associated with that point. More generally, we will show that coset codes and multilevel coset codes have well-defined V(A) and Pr(E). We seek the best tradeoff between the volume V(A) and the error probability function Pr(E). (Poltyrev [45] has called this problem "coding without restrictions" for the AWGN channel.) A fundamental lower bound on the tradeoff between V(A) and Pr(E) is the following sphere bound. From the law of large numbers, in order for Pr(E) to be small, the volume V(A) of A must be larger than the volume V⊗(σ²) of a noise sphere of squared radius nσ². Therefore, the best possible curve of error probability as a function of V(A) would be

Pr(E) ≈ 0 if V(A) > V⊗(σ²); (1)

Pr(E) must be large otherwise. We will say that A is sphere-bound-achieving if (1) holds.

0018-9448/00$10.00 © 2000 IEEE

Fig. 1. Two-level coding/shaping and two-stage decoding of a coset code.

The main result of this paper is that there exist sphere-bound-achieving arrays and associated decoding schemes that have a great deal of structure, and moreover the kind of structure that has been found to be useful in actual practice: namely, A may be a coset code or a multilevel coset code.

A coset code C(Λ/Λ′; C) is based on a lattice partition Λ/Λ′ and a block code C of length N over the alphabet Λ/Λ′ [4], [18]. A codeword c ∈ C specifies a sequence of representatives for a sequence of cosets of Λ′. The coset code is then the set of all sequences that lie in a sequence of cosets specified by some codeword; i.e., C(Λ/Λ′; C) = C + (Λ′)^N. In general, C(Λ/Λ′; C) need not be a lattice. However, if C is a group code over the quotient group Λ/Λ′, then C(Λ/Λ′; C) is a lattice.

Fig. 1 shows a typical application of a coset code on an AWGN channel. A first-level sequence is determined by a codeword c ∈ C, and a second sequence in (Λ′)^N is chosen independently using a shaping scheme such that the transmitted sequence lies in a quasi-spherical region. At the receiver, the received sequence is decoded as follows. First, it is reduced mod Λ′. While this mod-Λ′ operation is not in general information-lossless, it makes the first-level decoding independent of the second-level shaping; moreover, we will argue that it is nearly information-lossless. Given the reduced sequence, a decoder for C finds a decoded codeword and a corresponding sequence of coset representatives. The decoded sequence is subtracted from the received sequence. (This requires storage of the received sequence until first-level decoding is complete; this delay is not shown in Fig. 1.) If the first-level decoding was correct, then the subtraction exactly removes the first-level sequence, so the first-level coding does not affect the second level. Given the difference sequence, a decoder for Λ′ finds a sequence of elements of Λ′.

We will prove that there exist sphere-bound-achieving coset codes, even when the receiver uses a mod-Λ′ front end and multistage decoding as above; i.e., we show that it is possible to find a lattice partition Λ/Λ′ and a code C such that the volume-to-noise ratio of the coset code is arbitrarily close to the sphere bound and Pr(E) is arbitrarily small. The proof is quite straightforward, and is information-theoretic rather than lattice-theoretic.

An important separability result shows that the capacity of the first level (the Λ/Λ′ channel) is simply the difference between the capacities of a mod-Λ′ channel (the first-level channel with no restriction that the input lie in Λ) and a mod-Λ channel. This result follows from the fact that the Λ/Λ′ channel is regular in the sense of Delsarte and Piret [10], or simply from the fact that the channel is an additive-noise channel. As Λ becomes dense, the capacity of the mod-Λ channel goes to zero, so that the capacity of the Λ/Λ′ channel approaches the capacity of the mod-Λ′ channel. As the noise becomes small relative to Λ′, the second-level decoding error probability becomes arbitrarily small, and the capacity of the mod-Λ′ channel asymptotically approaches log(V(Λ′)/V⊗(σ²)), where n is the dimension of Λ and Λ′ and V⊗(σ²) is again the volume of a noise sphere of squared radius nσ².

By the usual Shannon-type coding theorems, there exists a code C with rate arbitrarily close to the capacity of the Λ/Λ′ channel and a first-level decoder such that the first-level decoding error probability is arbitrarily small. We show that if the rate of C is arbitrarily close to this capacity, then the coset code has a well-defined volume such that its volume-to-noise ratio is arbitrarily close to the sphere bound. Notice that this result does not depend on the dimension of Λ and Λ′. Moreover, while this result appears to require an arbitrarily large partition, we will show that in practice a four-way one-dimensional partition (or more generally a four-way-per-dimension n-dimensional partition Λ/4Λ) suffices if the target error probability is of the order of 10⁻⁶.

If |Λ/Λ′| = pᵐ for some prime p and positive integer m, then by a further result of Delsarte and Piret [10] the code C may be taken as a linear code over the finite field GF(p) in coding theorems for the Λ/Λ′ channel. Because a coset code based on the one-dimensional lattice partition Z/pZ with a linear code over GF(p) is a mod-p lattice, our results prove the existence of sphere-bound-achieving mod-p lattices. The existence of sphere-bound-achieving mod-p lattices was already proved implicitly in de Buda's original paper [8], as Loeliger has observed [37]. Poltyrev has given an explicit proof of the existence of sphere-bound-achieving mod-p lattices in [45], with exponential error bounds that are tight near capacity. We obtain exponential error bounds for coset codes that agree with Poltyrev's bounds for lattices. Whereas the results of both de Buda and Poltyrev are based on the Minkowski-Hlawka theorem of lattice theory, ours are based simply on the Shannon [49] and Gallager [28] channel coding theorems and the fact that the Λ/Λ′ channel is regular in the sense of Delsarte and Piret [10]. Moreover, our results apply to more general classes of lattices and to nonlattice coset codes.
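The two-stage decoding procedure described above (mod-Λ′ front end, decode C, subtract, then decode Λ′) can be sketched in a few lines. The following is a toy illustration, not a construction from the paper: it assumes the one-dimensional partition Z/2Z and substitutes a hypothetical length-4 binary repetition code for C; the mod-Λ′ front end is reduction mod 2, and the second stage decodes Λ′ = 2Z by rounding to the nearest even integer.

```python
import numpy as np

# Hypothetical toy coset code: partition Z/2Z (Lambda = Z, Lambda' = 2Z),
# with a length-4 binary repetition code standing in for C.
CODE = np.array([[0, 0, 0, 0], [1, 1, 1, 1]])

def two_stage_decode(y):
    # Stage 1: mod-Lambda' front end -- reduce y mod 2, then decode C by
    # minimum squared distance to the coset representatives, measured mod 2.
    y_mod = np.mod(y, 2.0)
    d = np.minimum(np.abs(y_mod[None, :] - CODE),
                   2.0 - np.abs(y_mod[None, :] - CODE))
    c_hat = CODE[np.argmin((d ** 2).sum(axis=1))]
    # Stage 2: subtract the decoded coset representatives and decode
    # Lambda' = 2Z by rounding to the nearest even integer.
    b_hat = 2.0 * np.round((y - c_hat) / 2.0)
    return c_hat, b_hat

# Transmit codeword 1111 plus an even-integer (second-level) sequence.
rng = np.random.default_rng(1)
x = CODE[1] + np.array([0.0, 2.0, -2.0, 4.0])
y = x + 0.1 * rng.standard_normal(4)      # AWGN, sigma = 0.1
c_hat, b_hat = two_stage_decode(y)
```

With correct first-stage decoding, c_hat + b_hat recovers the transmitted point, illustrating why the subtraction step makes the second stage independent of the first-level code.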

Using our separability result, we are able to prove similar results for multilevel coset codes based on lattice partition chains, using sufficiently powerful block or convolutional codes over the finite alphabets at each level, and using multistage decoding [31] with a mod-Λ front end at each level. The chains may be standard binary partition chains such as Z/2Z/4Z/··· or their two-dimensional counterparts, and then all codes may be binary linear codes. We show how to arrange such constructions so that the multilevel coset code is a binary lattice [19]. In practice, it is not necessary to use chains of greater depth than these standard chains. Moreover, it appears that a powerful probabilistic code (e.g., a turbo code) is really needed only at the first level; for the remaining levels, low-redundancy error-correcting algebraic codes will probably suffice.

Our results for multilevel coset codes are much like those for multilevel finite-constellation codes found by Kofman, Zehavi, and Shamai [32], and independently by Huber and Wachsmann [30]. These prior derivations are based on the chain rule of information theory, whereas ours are based on our separability result. We will compare and contrast these approaches. Moreover, Wachsmann et al. [56], [57] have exhibited impressive performance results using binary turbo codes at each level, which indicate that these multilevel results are not merely theoretical.

The organization of this paper is as follows. In Section II, we review basic lattice concepts. In Section III, we introduce mod-Λ and Λ/Λ′ channels and their capacities, and prove our basic separability result. Section IV gives simple lower bounds and asymptotes for the capacity of a mod-Λ channel. In Section V, we use these results to prove the existence of sphere-bound-achieving coset codes, lattices, and multilevel coset codes. In Section VI, we give universal curves for the normalized capacities of the lattices that are most often used in practice. The capacity of any lattice partition involving versions of these lattices may be read directly from these curves. The Appendix gives good estimates of these curves. The results of Section VI were first presented in [25]. In Section VII, using these curves, we derive guidelines for practical coset codes that provide a theoretical basis for design choices that have proven to be successful in practice (e.g., why certain small standard partitions suffice to approach capacity, and why four-dimensional (4-D) trellis codes have become popular). We discuss why two-level coset codes with a powerful, complex, probabilistic code at the first level and a low-redundancy algebraic code at the second level may be an attractive practical approach. In Section VIII, we derive the exponential error bounds mentioned above. These bounds support the attractiveness of our proposed two-level approach. Finally, in Section IX we consider spherical constellations based on sphere-bound-achieving lattices. We sketch an argument, based on an improved continuous approximation, that such constellations can achieve the true channel capacity C = log(1 + P/σ²) bits per two dimensions with ordinary lattice decoding. However, our argument relies on an unproved though plausible assumption, as well as various approximations. Therefore, the long-standing question of whether spherical lattice constellations can achieve the true channel capacity with lattice decoding remains open.

II. LATTICES AND LATTICE PARTITIONS

In this section we first review lattice fundamentals, particularly the notions of a fundamental region R(Λ) of a lattice Λ, and of reducing an arbitrary real vector to R(Λ) by a mod-Λ map. We then state and prove the sphere bound, which motivates our definition of a sphere-bound-achieving lattice (or array), as well as a normalized volume-to-noise ratio parameter. Finally, we introduce sublattices, lattice partitions, and coset decompositions.

A. Lattices

An n-dimensional lattice Λ is a discrete subgroup of real Euclidean n-space Rⁿ. Without essential loss of generality, we will
assume that Λ spans Rⁿ. For example, the set Z of integers is a discrete subgroup of R, so Z is a one-dimensional lattice.

A fundamental region R(Λ) of Λ is a region that includes one and only one point from each coset of Λ in Rⁿ. Algebraically, R(Λ) is a set of coset representatives for the cosets of Λ in Rⁿ. Every x ∈ Rⁿ may, therefore, be written uniquely as x = λ + r for some λ ∈ Λ and r ∈ R(Λ). Symbolically, we may write Rⁿ = Λ + R(Λ). Geometrically, it follows that the set of translates {R(Λ) + λ : λ ∈ Λ} tiles n-space. Consequently, the volume of R(Λ) must be equal to V(Λ), the volume of n-space per point of Λ.

Given R(Λ), the mod-Λ map is defined by x mod Λ = x − λ(x), where λ(x) is the unique element of Λ such that x − λ(x) ∈ R(Λ). (This is just a concrete way of writing the natural homomorphism from Rⁿ to Rⁿ/Λ.)

A fundamental Voronoi region R_V(Λ) of Λ is a fundamental region in which every point is a minimum-energy point in its coset, with ties broken arbitrarily. The set of translates of a fundamental Voronoi region tiles n-space, and these translates are a set of decision regions for a minimum-distance decoder for Λ. If a point λ ∈ Λ is transmitted over an AWGN channel with noise variance σ² per dimension, then the probability of error of a minimum-distance decoder for Λ is

Pr(E) = 1 − ∫_{R_V(Λ)} g_{σ²}(x) dx

where g_{σ²}(x) is the probability density function (pdf) of an n-dimensional zero-mean Gaussian random variable with variance σ² per dimension. This expression holds for any lattice point λ and evidently does not depend on the resolution of boundary ties in R_V(Λ). Conway and Sloane [6, p. 69] call the problem of minimizing Pr(E) for a given V(Λ) the (AWGN) channel coding problem.
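The mod-Λ map and the error-probability integral above are easy to check numerically. The following is a minimal sketch (the helper `mod_lattice` is our own, not from the paper), assuming the lattice is given by a generator matrix G whose rows are basis vectors; it reduces a vector into the fundamental parallelotope, then Monte Carlo estimates Pr(E) for Λ = Z, whose Voronoi region is (−1/2, 1/2].

```python
import numpy as np

def mod_lattice(x, G):
    """Reduce x into the fundamental parallelotope of the lattice with
    generator matrix G (rows are basis vectors): x mod Lambda = x - aG,
    with a = floor(x G^{-1})."""
    a = np.floor(x @ np.linalg.inv(G))
    return x - a @ G

# For Lambda = Z^n, G = I and x mod Z^n is just the fractional part.
G = np.eye(2)
r = mod_lattice(np.array([2.7, -1.2]), G)   # -> [0.7, 0.8]

# Monte Carlo estimate of Pr(E) for minimum-distance decoding of Z:
# an error occurs when the noise leaves the Voronoi region (-1/2, 1/2].
rng = np.random.default_rng(0)
sigma = 0.25
n = rng.normal(0.0, sigma, size=200_000)
p_err = np.mean(np.abs(n) > 0.5)
# theoretical value: 2 Q(0.5/sigma) = 2 Q(2) ~ 0.0455
```

Note that Pr(E) here depends only on the ratio of the Voronoi-region size to the noise standard deviation, foreshadowing the scale invariance discussed in Section II-C.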

B. Sphere Bound

There is a simple sphere lower bound for the channel coding problem, which in the spirit of Shannon relies on the following ideas: For a given V(Λ), the best possible shape for a decision region is an n-sphere. For Pr(E) to be small, this sphere must be larger than a noise sphere of squared radius nσ².

The volume of an n-sphere of squared radius r², for n even, is given by

V⊗(n, r²) = (πr²)^{n/2} / (n/2)!

so that

V⊗(n, nσ²)^{2/n} → 2πeσ² as n → ∞.

The limit for large n uses Stirling's approximation. By the law of large numbers, as n → ∞, the probability that a Gaussian noise n-tuple with variance σ² per dimension falls outside a sphere of squared radius r² goes to 0 if r² > nσ², whereas it goes to 1 if r² < nσ². Since Pr(E) would be minimized for a given V(Λ) if every decision region were an n-sphere, we have

Pr(E) ≥ 1 − ∫_{S} g_{σ²}(x) dx

where the sphere S is chosen so that its volume equals V(Λ). It follows that for large n we must have V(Λ)^{2/n} > 2πeσ² if Pr(E) is to be small. Moreover, it is clear that if V(Λ)^{2/n} is close to 2πeσ², then n must be large in order for Pr(E) to be small. In summary, we have proved the following bound, which we call the sphere bound:

Theorem 1 (Sphere Bound): For large n, the probability of error of a minimum-distance decoder for an n-dimensional lattice Λ on an AWGN channel with noise variance σ² per dimension cannot be small unless V(Λ)^{2/n} > 2πeσ². Moreover, if V(Λ)^{2/n} is close to 2πeσ², then Pr(E) cannot be small unless n is large.

The same argument clearly applies to any array A that has a well-defined average volume per point V(A) and average error probability Pr(E), since practically all decision regions must have volumes exceeding the noise-sphere volume if Pr(E) is to be small. Therefore, we call an array A sphere-bound-achieving if Pr(E) → 0 whenever V(A)^{2/n} > 2πeσ².

C. Normalized Parameters

We now introduce some useful normalized parameters. The normalized volume of an n-dimensional lattice Λ is defined as V(Λ)^{2/n}. The normalized volume may be regarded as the volume of Λ per two dimensions. The logvolume of Λ is defined as log V(Λ)^{2/n}. The units of logvolume are bits per two dimensions (b/2D). The logvolume is simply a logarithmic measure of normalized volume. The volume-to-noise ratio (VNR) of Λ is defined as

α²(Λ, σ²) = V(Λ)^{2/n} / (2πeσ²). (2)

For large n, the VNR is the ratio of the normalized volume of Λ to the normalized volume of a noise sphere of squared radius nσ². We will find it useful to plot the probability of error as a function of α²(Λ, σ²), because if an n-dimensional lattice Λ is scaled by α while σ² is scaled by α², then 1) the volume of Λ scales by αⁿ, i.e., the normalized volume scales by α²; and 2) the probability of error does not change. Thus Pr(E) depends only on the ratio α²(Λ, σ²) (called the generalized SNR in [45]). The sphere bound may be expressed as follows: Pr(E) cannot be small unless α²(Λ, σ²) > 1. A lattice is sphere-bound-achieving if Pr(E) → 0 whenever α²(Λ, σ²) > 1. We will often measure α²(Λ, σ²) in decibels. Alternatively, since it is a ratio of normalized volumes, we may use binary logarithms and measure it in bits per two dimensions (b/2D) as the difference between the logvolume of Λ and the asymptotic logvolume log 2πeσ² of a sphere of squared radius nσ² for large n.

D. Lattice Partitions

A sublattice Λ′ of an n-dimensional lattice Λ is a subgroup of Λ. We will assume that Λ′ also spans Rⁿ. The quotient group Λ/Λ′ then has finite order |Λ/Λ′|, and Λ is the disjoint union of |Λ/Λ′| cosets (translates) of Λ′:

Λ = ⋃_{c ∈ [Λ/Λ′]} (Λ′ + c)

where [Λ/Λ′] is a set of coset representatives for the cosets of Λ′ in Λ. We call Λ/Λ′ a lattice partition. Every λ ∈ Λ lies in a coset Λ′ + c, and thus may be written uniquely as λ = λ′ + c, where λ′ ∈ Λ′ and c ∈ [Λ/Λ′]. This coset decomposition may be written symbolically as Λ = Λ′ + [Λ/Λ′]. Given [Λ/Λ′], this defines a mod-Λ′ map from Λ to [Λ/Λ′], written λ ↦ λ mod Λ′. (This is just a way of writing the natural homomorphism from Λ to Λ/Λ′.)

If Λ ⊇ Λ′ ⊇ Λ″ is a chain of lattices with quotients Λ/Λ′ and Λ′/Λ″, then Λ/Λ′/Λ″ is a lattice partition chain. If [Λ/Λ′] and [Λ′/Λ″] are sets of coset representatives for Λ/Λ′ and Λ′/Λ″, respectively, then every λ ∈ Λ lies in a distinct coset Λ″ + c′ + c, and may be written uniquely as λ = λ″ + c′ + c with λ″ ∈ Λ″, c′ ∈ [Λ′/Λ″], and c ∈ [Λ/Λ′]. This is a chain coset decomposition, written symbolically as Λ = Λ″ + [Λ′/Λ″] + [Λ/Λ′]; it shows that [Λ′/Λ″] + [Λ/Λ′] is a set of coset representatives for Λ/Λ″.
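Both the Stirling limit behind the sphere bound and the scale invariance of the VNR can be verified numerically. The following is a hedged sketch (the function names are ours): the normalized volume of a noise sphere of squared radius nσ² approaches 2πeσ² as n grows, and α²(Λ, σ²) in dB is unchanged when Λ is scaled by α while σ² is scaled by α².

```python
import math

def normalized_sphere_volume(n, r2):
    """V^(2/n) for an n-sphere of squared radius r2 (n even), computed in
    the log domain: log V = (n/2) log(pi r2) - log((n/2)!)."""
    log_v = (n / 2) * math.log(math.pi * r2) - math.lgamma(n / 2 + 1)
    return math.exp(2.0 * log_v / n)

def vnr_db(volume, n, sigma2):
    """VNR alpha^2 = V(Lambda)^(2/n) / (2 pi e sigma2), in decibels."""
    alpha2 = volume ** (2.0 / n) / (2 * math.pi * math.e * sigma2)
    return 10 * math.log10(alpha2)

# Normalized volume of a noise sphere of squared radius n*sigma^2
# increases toward the Stirling limit 2*pi*e*sigma^2 as n grows.
sigma2 = 1.0
vals = [normalized_sphere_volume(n, n * sigma2) for n in (2, 16, 256, 4096)]
limit = 2 * math.pi * math.e * sigma2

# Scale invariance: scaling Lambda by alpha multiplies V by alpha^n and
# sigma^2 by alpha^2, leaving alpha^2(Lambda, sigma^2) unchanged.
base = vnr_db(1.0, 2, 0.02)
scaled = vnr_db(4.0, 2, 0.08)   # alpha = 2 in two dimensions
```

Working in the log domain via `math.lgamma` avoids overflow of (πr²)^{n/2} for large n.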

Similarly, if Λ′ is a sublattice of Λ, and [Λ/Λ′] and R(Λ) are sets of coset representatives for Λ/Λ′ and Rⁿ/Λ, respectively, then every x ∈ Rⁿ lies in a distinct coset Λ′ + c + r, and thus may be written uniquely as x = λ′ + c + r. This chain coset decomposition may be written symbolically as Rⁿ = Λ′ + [Λ/Λ′] + R(Λ); it shows that [Λ/Λ′] + R(Λ), the disjoint union of |Λ/Λ′| fundamental regions of Λ, is a fundamental region of Λ′. It follows that V(Λ′) = |Λ/Λ′| V(Λ).

For example, if Λ = Z is the integer lattice, then a fundamental region for Z is [0, 1), whose volume is 1. For x ∈ R, x mod Z is then the fractional part of x. The even-integer lattice 2Z is a sublattice of Z. The lattice partition Z/2Z is a two-way partition; i.e., Z is the union of two cosets of 2Z, namely 2Z and 2Z + 1, and {0, 1} is a set of coset representatives for Z/2Z. A fundamental region for 2Z is [0, 2). Continuing this example, we consider the one-dimensional lattice partition chain Z/2Z/4Z. We may take {0, 1} and {0, 2} as pairs of coset representatives for the two-way partitions Z/2Z and 2Z/4Z. Then {0, 1} + {0, 2} = {0, 1, 2, 3} is the set of integers in [0, 4), which is a set of coset representatives for Z/4Z.

III. THE MOD-Λ AND Λ/Λ′ CHANNELS

We will define the mod-Λ channel using a mod-Λ map in the receiver front end. The input, noise, and output may then all be defined on a fundamental region R(Λ) of Λ; the noise becomes Λ-aliased white Gaussian noise, and the channel becomes an additive noise channel (mod-Λ). The mod-Λ map makes the mod-Λ channel independent of higher levels of coding and/or shaping. It is not information-lossless, but we argue that for large constellations it is nearly so. The Λ/Λ′ channel is a mod-Λ′ channel with the input restricted to discrete lattice points in a fundamental region of Λ′. We will later use |Λ/Λ′|-ary coset codes designed for the Λ/Λ′ channel. As V(Λ) becomes small, the Λ/Λ′ channel approaches a mod-Λ′ channel. We compute the capacity of the mod-Λ channel. We then show that the capacity of the Λ/Λ′ channel is the difference between the capacities of the mod-Λ′ and mod-Λ channels. This key separability result follows from the fact that the Λ/Λ′ channel is regular in the sense of Delsarte and Piret [10], which is a strong symmetry property. The mod-Λ channel was used by Loeliger [37] to motivate his derivation of the Minkowski-Hlawka theorem. Mod-Λ channels were considered in [12] in connection with precoding for channel equalization. The capacity of the mod-Λ channel in a precoding application was computed by Wesel and Cioffi [62]. We expect that there exist other prior references to mod-Λ channels.

A. The mod-Λ Channel

Given an n-dimensional lattice Λ and an AWGN channel with noise variance σ² per dimension, we define the mod-Λ channel as follows. The mod-Λ channel input is a point x ∈ R(Λ), where R(Λ) is any fundamental region of Λ. For transmission, an arbitrary lattice point λ ∈ Λ is added to x to form an input x + λ to an AWGN channel. The AWGN channel output is

y = x + λ + n

where n is a white Gaussian noise variable with variance σ² per dimension. As usual, we assume that σ² is nonzero and finite. At the receiver front end, the channel output y is first mapped to y′ = y mod Λ, the unique element of R(Λ) that is congruent to y mod Λ. Then y′ = (x + n) mod Λ, since λ mod Λ = 0. The effect of the second input λ thus disappears completely. The Λ-aliased white Gaussian noise random variable is defined as n′ = n mod Λ. Since (x + n) mod Λ = (x + n′) mod Λ, we have y′ = (x + n′) mod Λ. Thus the mod-Λ channel may be regarded as an additive Λ-aliased white Gaussian noise (Λ-AWGN) channel, with all variables defined on R(Λ). Fig. 2(a) illustrates the physical mod-Λ channel, with two inputs x and λ; Fig. 2(b) illustrates the equivalent additive Λ-aliased WGN channel.

Concretely, the domain of the mod-Λ channel is the fundamental region R(Λ): the input alphabet is R(Λ), the noise is the Λ-aliased WGN variable n′, and the output is x + n′, again reduced mod Λ to lie in R(Λ). Algebraically, R(Λ) represents the quotient group Rⁿ/Λ: the input x is a representative of the coset x + Λ of Λ in Rⁿ, the noise variable n′ is a representative of the coset n + Λ, and the output y′ is a representative of the coset y + Λ, which is the sum of the cosets x + Λ and n + Λ. Topologically, R(Λ) is homeomorphic to the n-torus Rⁿ/Λ. The reader should therefore try to think of R(Λ) as an n-torus. For this and other purposes it is good to take R(Λ) as a fundamental parallelotope of Λ: if G is any generator matrix for Λ, i.e., if Λ = {aG : a ∈ Zⁿ}, then the corresponding fundamental parallelotope is the region {aG : a ∈ [0, 1)ⁿ}.
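The defining property of the mod-Λ front end, that the lattice input λ disappears completely after reduction mod Λ, can be demonstrated for Λ = Z in a few lines. This is a sketch with our own helper name; a near-noiseless channel is used so that the effect is directly visible.

```python
import numpy as np

def mod_z_channel(x, lam, sigma, rng):
    """mod-Z channel: input x in [0,1); an arbitrary integer lam is added,
    AWGN noise is added, and the receiver reduces the output mod 1."""
    n = rng.normal(0.0, sigma, size=np.shape(x))
    return np.mod(x + lam + n, 1.0)

x = np.array([0.1, 0.4, 0.9])
sigma = 1e-9   # (near-)noiseless, to isolate the effect of lam

# Same noise seed, different integer input lam: the outputs agree,
# so the effect of the second input vanishes at the receiver.
y1 = mod_z_channel(x, lam=0, sigma=sigma, rng=np.random.default_rng(5))
y2 = mod_z_channel(x, lam=7, sigma=sigma, rng=np.random.default_rng(5))
```

This is exactly why the mod-Λ′ front end of Fig. 1 makes first-level decoding independent of the second-level shaping sequence.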

Fig. 2. (a) The physical mod-Λ channel. (b) Equivalent additive Λ-aliased WGN channel.

Now R(Λ) becomes homeomorphic to an n-torus if we connect opposite sides; i.e., if we identify opposite faces of the half-open fundamental parallelotope, just as the half-open n-cube [0, 1)ⁿ may be identified with the n-torus.

B. Probability Density Functions

The Λ-aliased Gaussian density function f_{σ²,Λ} is defined as the probability density function of the Λ-aliased WGN variable n′ = n mod Λ. Let g_{σ²} be the probability density function of the Gaussian noise n. Then, since n maps to n′ if and only if n ∈ n′ + Λ, we have

f_{σ²,Λ}(n′) = Σ_{λ ∈ Λ} g_{σ²}(n′ + λ). (3)

It is useful to think of f_{σ²,Λ} as the restriction to R(Λ) of the infinite-support Λ-aliased Gaussian function (3). Clearly, f_{σ²,Λ} is Λ-periodic, meaning that f_{σ²,Λ}(n′ + λ) = f_{σ²,Λ}(n′) for any λ ∈ Λ, so it is completely defined by its values on any fundamental region R(Λ). It is evidently nonnegative. Moreover, its integral over R(Λ) is 1, since the translates of R(Λ) tile Rⁿ and g_{σ²} integrates to 1. Therefore f_{σ²,Λ} restricted to R(Λ) is a valid probability density function.

Example 1 (Λ = Z): For example, let Λ = Z. Fig. 3(a) illustrates the set of one-dimensional Gaussian density functions g_{σ²}(n + λ), λ ∈ Z, for small and large values of σ². Fig. 3(b) illustrates the corresponding Z-aliased Gaussian densities, which are the sums of the sets of functions shown in Fig. 3(a). The restrictions of these Z-periodic functions to [0, 1) represent the actual Z-aliased Gaussian probability density functions, which integrate to 1. Note that as σ² becomes small, f_{σ²,Z} becomes Gaussian, whereas as σ² becomes large, f_{σ²,Z} becomes uniform over [0, 1). Note also that for small σ², for all n′, there are at most two nearest-neighbor Gaussian probability density functions that contribute significantly to f_{σ²,Z}(n′).

Since the input x produces the output y′ = (x + n′) mod Λ if and only if n′ = (y′ − x) mod Λ, the conditional probability density function is given by

p(y′ | x) = f_{σ²,Λ}((y′ − x) mod Λ). (4)

In other words, the mod-Λ channel is an additive noise channel (mod-Λ). Consequently, we have:

Lemma 2 (Conditional Entropy): The conditional differential entropy h(Y′ | X) is equal to the differential entropy h(f_{σ²,Λ}) of the Λ-aliased noise, independent of the input density.

Proof: By (4) and the translation invariance of integrals over R(Λ) regarded as an n-torus, h(Y′ | X = x) = h(f_{σ²,Λ}) for every x. (5)

C. Noise Differential Entropy

The differential entropy of the additive noise variable n′ is

h(f_{σ²,Λ}) = −∫_{R(Λ)} f_{σ²,Λ}(n′) log f_{σ²,Λ}(n′) dn′.

D. Capacity of the mod-Λ Channel

We can now compute the capacity of the mod-Λ channel.

Theorem 3 (Capacity of mod-Λ Channel): The capacity of the mod-Λ channel is

C(Λ, σ²) = log V(Λ) − h(f_{σ²,Λ}).

Capacity is achieved if and only if the input density is uniform over R(Λ).

Proof: The capacity of the mod-Λ channel is the maximum mutual information I(X; Y′) = h(Y′) − h(Y′ | X) over all probability densities over the input alphabet R(Λ), where h(Y′) is the differential entropy of the output. From Lemma 2, we have h(Y′ | X) = h(f_{σ²,Λ}), independent of the input density. Maximizing I(X; Y′) is therefore equivalent to maximizing h(Y′). The maximum differential entropy of any probability density over R(Λ) is log V(Λ), which is achieved if and only if the output density is uniform over R(Λ). But if we choose the input density to be uniform, then the output is uniform, independent of the noise density.
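Theorem 3 can be checked numerically for Λ = Z, where V(Z) = 1 and so C(Z, σ²) = −h(f_{σ²,Z}). The following is a sketch under our own discretization choices (midpoint rule on [0, 1), truncated aliasing sum): for small σ² the capacity approaches −(1/2) log₂ 2πeσ², while for large σ² the aliased density becomes uniform and the capacity goes to 0, exactly as in Example 1.

```python
import math
import numpy as np

def aliased_gaussian(u, sigma2, K=50):
    """Z-aliased Gaussian density f(u) = sum_k g_{sigma2}(u + k),
    with the sum truncated to |k| <= K."""
    k = np.arange(-K, K + 1)
    return np.sum(np.exp(-(u[:, None] + k) ** 2 / (2 * sigma2)),
                  axis=1) / math.sqrt(2 * math.pi * sigma2)

def mod_z_capacity(sigma2, m=20_000):
    """C(Z, sigma2) = log2 V(Z) - h(f) = -h(f) bits, with the
    differential entropy computed by the midpoint rule on [0, 1)."""
    u = (np.arange(m) + 0.5) / m
    f = aliased_gaussian(u, sigma2)
    h = -np.mean(f * np.log2(f))     # approximates -int f log2 f over [0,1)
    return -h

c_small = mod_z_capacity(0.01)    # ~ -0.5*log2(2*pi*e*0.01) ~ 1.27 bits
c_large = mod_z_capacity(100.0)   # aliased density nearly uniform -> ~ 0
```

The small-σ² behavior matches Fig. 3: with negligible aliasing, h(f) is just the Gaussian differential entropy (1/2) log₂ 2πeσ².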

Fig. 3. (a) One-dimensional Gaussian densities g_{σ²}(n + λ), λ ∈ Z, for small and large σ². (b) Z-aliased Gaussian densities f_{σ²,Z}(n′) for small and large σ².

It follows from Shannon's channel coding theorem that there exists a code over R(Λ) of any rate less than C(Λ, σ²) that achieves an arbitrarily low error probability on the mod-Λ channel. The exponential error bounds of Gallager [28] may be used to obtain asymptotically tight bounds on the error probability for rates near capacity (see Section VIII).

E. The Λ/Λ′ Channel

Given an n-dimensional lattice partition Λ/Λ′, the Λ/Λ′ channel will be defined as a mod-Λ′ channel in which the input alphabet is restricted to the discrete set (Λ + a) ∩ R(Λ′): the elements of a translate Λ + a of the lattice Λ that fall in a fundamental region R(Λ′) of Λ′. Since Λ + a is the union of |Λ/Λ′| cosets of Λ′ + a, there are precisely |Λ/Λ′| such elements, consisting of one representative of each coset of Λ′ in Λ + a. We will prove two key theorems about the Λ/Λ′ channel, which hold for any offset a:

Theorem 4 (Regularity): The Λ/Λ′ channel is regular in the sense of Delsarte and Piret, which implies that it is symmetric in the sense of Gallager, and that the capacity-achieving input distribution is uniform. Moreover, if |Λ/Λ′| = pᵐ with p prime, then linear codes over GF(p) may be used in coding theorems without loss of optimality.

Theorem 5 (Capacity Separability): The capacity of the Λ/Λ′ channel is the difference between the capacities C(Λ′, σ²) and C(Λ, σ²) of the mod-Λ′ and mod-Λ channels, respectively:

C(Λ/Λ′, σ²) = C(Λ′, σ²) − C(Λ, σ²).

Capacity is achieved with a uniform distribution over the input alphabet.

Delsarte and Piret [10] call a channel with transition probabilities p(y | x) regular if the input alphabet can be identified with an Abelian group that acts on the output alphabet by permutation; i.e., if a set of permutations π_x indexed by the inputs can be defined such that π_x π_{x′} = π_{x + x′} for all inputs x and x′, and such that p(π_x(y) | x) depends only on y.
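Theorem 5 can be exercised numerically for the one-dimensional partition Z/2Z by computing the two mod-Λ capacities of Theorem 3 and taking their difference. This is a sketch under our own discretization choices (midpoint rule, truncated aliasing sum): for moderate noise, the Z/2Z channel capacity comes out just below its maximum of log |Z/2Z| = 1 bit.

```python
import math
import numpy as np

def aliased_entropy(period, sigma2, m=40_000, K=60):
    """Differential entropy (bits) of the (period*Z)-aliased Gaussian
    density on [0, period), computed by the midpoint rule."""
    u = (np.arange(m) + 0.5) * period / m
    k = period * np.arange(-K, K + 1)
    f = np.sum(np.exp(-(u[:, None] + k) ** 2 / (2 * sigma2)),
               axis=1) / math.sqrt(2 * math.pi * sigma2)
    return -np.mean(f * np.log2(f)) * period

sigma2 = 0.0225   # sigma = 0.15

# Theorem 3: C(Lambda, sigma^2) = log2 V(Lambda) - h(f), so
c_mod_2z = math.log2(2.0) - aliased_entropy(2.0, sigma2)  # mod-2Z channel
c_mod_z = math.log2(1.0) - aliased_entropy(1.0, sigma2)   # mod-Z channel

# Theorem 5: capacity of the Z/2Z channel as a difference of capacities.
c_partition = c_mod_2z - c_mod_z
```

At this noise level the 2Z-aliasing is negligible, so c_mod_2z is essentially 1 − (1/2) log₂ 2πeσ², while the Z-aliasing slightly reduces h(f_{σ²,Z}); their difference stays strictly below 1 bit, as it must for a binary-input channel.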

FORNEY et al: SPHERE-BOUND-ACHIEVING COSET CODES AND MULTILEVEL COSET CODES 827 The regularity of the channel (Theorem 4) then follows from the fact that depends only on the difference ; ie, the channel is an additive noise channel, mod- We make the natural identification of the input alphabet with the quotient group, and define the action of on as the natural action of on by translation ie, ; then and depends only on Gallager [28, Sec 45] calls a channel with transition probabilities symmetric if the output alphabet can be partitioned into classes such that in each matrix, every row is a permutation of every other row, and every column is a permutation of every other column (This is the sense in which a binary erasure channel or a binary-input Gaussian channel is symmetric) Delsarte and Piret show that a regular channel is symmetric; the classes are the orbits of under the action of In our case, the classes are the equivalence classes of ie, the representatives of each coset in Using the Kuhn Tucker conditions, Gallager shows that on a symmetric channel, capacity is achieved by an equiprobable input distribution [28, Theorem 452] Therefore, the spherebound-achieving input distribution for a channel is a uniform distribution over Delsarte and Piret show that if a channel is regular and the input group is isomorphic to an elementary Abelian group for some prime, then the ensemble of random linear (or rather affine) codes over the finite field GF may be used in place of the usual random code ensemble in proofs of the coding theorem and in Gallager s exponential error bounds [10], [38], thus dramatically generalizing Elias result that random binary linear codes may be used in place of random codes for the binary-symmetric channel [11], [28, Sec 62], or Gabidulin s [27] more general result for symmetric channels with finitefield input alphabets More generally, one may use ensembles of group codes over In our context, this implies that if is isomorphic to, then in 
coding theorems for the channel the usual random codes may be replaced by random linear codes over GF Moreover, no expurgation of bad codewords is required Remark 1: While the only action considered in Theorem 4 and in the rest of this paper is translation, the Delsarte Piret result is more general For example, consider the lattice partition Instead of identifying the input alphabet with, we may identify with via and we may define the corresponding permutation group by and Then it can be verified that this defines an action of on, and furthermore, since by reflection symmetry, depends only on Consequently, the ensemble of random linear codes over GF, or equivalently pairs of random binary linear codes, may be used in coding theorems for the channel (We shall see later that coding for the channel should suffice to approach capacity in practice) Even more generally, this group-theoretic development may be extended to channels, where is any geometrically uniform partition [22] Given that the sphere-bound-achieving input distribution is uniform, it is straightforward to compute the capacity of the channel With an offset, the conditional probability density function is Thus from Lemma 2, we have again To compute,we observe Lemma 6 (Uniform Input -Aliased Output): If the input probability distribution to a channel with offset is uniform, then the output probability density function is where is The differential entropy of this distribution where is the differential entropy of restricted to Proof: If the input distribution is uniform, then the output density is the -periodic infinite-support -aliased Gaussian density restricted to and scaled by, since For the second part, noting that is the disjoint union (mod- )of fundamental regions of, we observe that we may regard as the sum (mod- )oftwo independent random variables and, where is a uniform discrete -ary random variable with entropy, while is a continuous random variable over with density, and thus with differential 
entropy h(Λ, σ²). Therefore h(f) = log₂ |Λ/Λ'| + h(Λ, σ²).

We conclude therefore that:

Theorem 7 (Capacity of the Λ/Λ' Channel): The capacity of the Λ/Λ' channel is

C(Λ/Λ'; σ²) = log₂ |Λ/Λ'| + h(Λ, σ²) − h(Λ', σ²)

where h(Λ, σ²) and h(Λ', σ²) are the differential entropies of Λ- and Λ'-aliased WGN densities, respectively. Since C(Λ, σ²) = log₂ V(Λ) − h(Λ, σ²) and V(Λ') = |Λ/Λ'| V(Λ), our capacity separability result (Theorem 5) follows:

C(Λ/Λ'; σ²) = C(Λ', σ²) − C(Λ, σ²).
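The separability formula can be checked numerically for the simplest one-dimensional partition Z/2Z. The sketch below (our own illustration; helper names are ours, and the mod-2Z channel is realized on [0, 2)) computes the mutual information of the two-point-input Z/2Z channel directly and compares it with C(2Z, σ²) − C(Z, σ²):

```python
import math

def aliased(x, sigma2, period, terms=60):
    # period-aliased Gaussian density on [0, period)
    return sum(
        math.exp(-(x - k * period) ** 2 / (2 * sigma2))
        / math.sqrt(2 * math.pi * sigma2)
        for k in range(-terms, terms + 1)
    )

def h_mod(sigma2, period, steps=4000):
    # differential entropy (bits) of the period-aliased Gaussian density
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * period / steps
        f = aliased(x, sigma2, period)
        total -= f * math.log2(f) * period / steps
    return total

def C_mod(sigma2, period):
    # capacity of the mod-(period * Z) channel: log2 V - h
    return math.log2(period) - h_mod(sigma2, period)

def C_partition_direct(sigma2, steps=4000):
    # direct I(X; Y) of the Z/2Z channel with uniform inputs {0, 1}
    h_out = 0.0
    for i in range(steps):
        y = (i + 0.5) * 2 / steps
        f = 0.5 * (aliased(y, sigma2, 2) + aliased(y - 1, sigma2, 2))
        h_out -= f * math.log2(f) * 2 / steps
    return h_out - h_mod(sigma2, 2)

sigma2 = 0.1
c_direct = C_partition_direct(sigma2)
c_diff = C_mod(sigma2, 2) - C_mod(sigma2, 1)
```

The two quantities agree to within numerical integration error, as the separability theorem predicts.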

Fig. 4. Diagram for alternative proof of capacity separability.

We remark that while we have defined the Λ/Λ' channel as a restriction of the mod-Λ' channel, the mod-Λ' channel may alternatively be regarded as the limit of a Λ/Λ' channel for fixed Λ' as V(Λ) goes to zero, i.e., as the input set becomes denser and denser. In the limit, the alphabet becomes the entire fundamental region R(Λ'), and the Λ-periodic output density becomes constant. The capacity loss due to using the discrete signal set (Λ + a) ∩ R(Λ') rather than R(Λ') is precisely the capacity C(Λ, σ²) of the mod-Λ channel. This loss becomes arbitrarily small as V(Λ) goes to zero, and thus C(Λ, σ²) goes to 0.

F. Alternative Proof

We now present an alternative, instructive derivation of the capacity separability theorem, along lines suggested by a reviewer. As shown in Fig. 4, let the input to an AWGN channel be X = U + V, where U is a continuous random variable uniformly distributed in R(Λ), and V is a discrete Λ-valued random variable uniformly distributed in Λ ∩ R(Λ'). Then X is a continuous random variable uniformly distributed in R(Λ'). Let the output Y be reduced first to Y' = Y mod Λ' and then to Y'' = Y' mod Λ. Since Λ' is a sublattice of Λ, we have, in fact, Y'' = Y mod Λ. Consider the Markov chain U → Y' → Y''. By Lemma 6, the conditional density p(y' | u) is the Λ-aliased Gaussian density f_{σ²}(y' − u), restricted to R(Λ') and scaled by 1/|Λ/Λ'|.

Now the alternative proof of Theorem 5 goes as follows. By the chain rule of information theory, I(X; Y') = I(U; Y') + I(V; Y' | U). By Lemma 8, I(U; Y') = I(U; Y''). From X to Y' we have a uniform-input mod-Λ' channel, so I(X; Y') = C(Λ', σ²). From U to Y'' we have a uniform-input mod-Λ channel, so I(U; Y'') = C(Λ, σ²). From V to Y' we have a uniform-input Λ/Λ' channel with random offset U. Given U, the offset may be subtracted at the receiver, so the capacity of the Λ/Λ' channel with offset u is independent of u: I(V; Y' | U) = C(Λ/Λ'; σ²). Hence C(Λ', σ²) = C(Λ, σ²) + C(Λ/Λ'; σ²).

Remarks: 1) Following this proof, it is clear that the capacity separability theorem holds for any additive-noise channel, not just for AWGN channels. 2) This argument shows that if we have a mod-Λ' channel, and we code separately for U on the mod-Λ channel (with a mod-Λ receiver front end) and for V on the Λ/Λ' channel, then capacity is not reduced. This foreshadows later results for multilevel codes. 3) This proof
resembles the well-known proof using the chain rule that multilevel codes with multistage decoding can achieve capacity, or at least the equiprobable-input capacity Here the argument would be that since a -periodic function The joint density is thus Consequently, if otherwise if otherwise Therefore, does not depend on ; ie, is conditionally independent of given In other words (see [28, Problem 218] or [7, Secs 28, 210]): Lemma 8 (Sufficiency of mod- Output): In Fig 4, if is uniform, then the output of the mod- map is a sufficient statistic for ; the mod- map is information-lossless; ie, ; forms a Markov chain with and if, else we can code for at a rate up to, decode with negligible error probability even in the presence of interference from, and then, given at the transmitter and receiver, code for at a rate up to and decode with negligible error probability From this perspective, what our argument adds is the observation that is sufficient for, so with the modmap we can completely eliminate interference from with no loss of information (in addition to other benefits, such as making the channel highly symmetrical) The philosophical difference between our approach and the chain-rule approach is perhaps best illustrated by the fact that the chain-rule argument works perfectly well with the alternative decomposition whose practical utility is much more questionable (despite the encouraging results of [56] and [57]) 4) By an extension of this development, we can argue that the output of the mod- map is quasi-sufficient in the usual

AWGN channel context illustrated in Fig. 1. Whereas in Fig. 4 the second-level input is uniform over R(Λ), in Fig. 1 the distribution of the second-level input will typically be Gaussian-like. If the SNR is large, however, then the distribution will typically be quasi-uniform on the scale of R(Λ). Then, following the arguments in the proof of Lemma 8, the conditional output density will be almost Λ-periodic, the second-level input will be almost conditionally independent of the first level given the mod-Λ reduction, the output of the mod-Λ map will be a quasi-sufficient statistic, and the mod-Λ map will be almost information-lossless. Thus we expect that at large SNRs we can obtain arbitrarily small error probabilities at rates arbitrarily close to the AWGN channel capacity even with use of the mod-Λ map and multistage decoding.

IV. BASIC PROPERTIES OF C(Λ, σ²)

We now give lower bounds and asymptotic properties of the capacity C(Λ, σ²) of the mod-Λ channel, which hold for any Λ. These properties will suffice to prove our main results.

A. Lower Bounds and Asymptotic Behavior

It is intuitively evident that as the noise variance σ² becomes large, the mod-Λ channel becomes very noisy and its capacity goes to zero. On the other hand, as σ² → 0 the mod-Λ channel becomes very clean; the Λ-aliased density f_{σ²} approaches the Gaussian density g_{σ²}; the differential entropy h(Λ, σ²) approaches that of g_{σ²}, namely (n/2) log₂ 2πeσ²; and the capacity C(Λ, σ²) = log₂ V(Λ) − h(Λ, σ²) thus goes to log₂ V(Λ) − (n/2) log₂ 2πeσ². In terms of the volume-to-noise ratio (VNR) α² = V(Λ)^{2/n}/(2πeσ²), this limiting capacity can be expressed as (n/2) log₂ α².

These properties are formalized in the following theorems:

Theorem 9 (Low-VNR Behavior): For any lattice Λ and finite noise variance σ², we have C(Λ, σ²) > 0. As σ² → ∞, C(Λ, σ²) monotonically approaches zero.

Proof: C(Λ, σ²) ≥ 0 by the nonnegativity of mutual information. Since the Λ-aliased Gaussian density f_{σ²} is restricted to a region of finite volume V(Λ), its differential entropy is upper-bounded by log₂ V(Λ), with equality if and only if f_{σ²} is uniform over R(Λ). But f_{σ²} is not in fact uniform if σ² is finite, so h(Λ, σ²) < log₂ V(Λ) and C(Λ, σ²) > 0. However, as σ² → ∞, the mod-Λ channel becomes very noisy, f_{σ²} tends toward uniformity, h(Λ, σ²) approaches log₂ V(Λ), and C(Λ, σ²) → 0. The approach must be monotonic because the mod-Λ channel with noise variance σ² + ε is a degraded version of the mod-Λ channel with noise variance σ², obtained by adding more noise with variance ε.

Theorem 10 (High-VNR Behavior): For any n-dimensional lattice Λ and noise variance σ², we have C(Λ, σ²) > (n/2) log₂ α². As σ² → 0, C(Λ, σ²) − (n/2) log₂ α² monotonically approaches zero.

Proof: From the fact that the Λ-aliased WGN density f_{σ²} is obtained by passing a Gaussian variable with density g_{σ²} through a many-to-one mod-Λ map, we have h(Λ, σ²) ≤ (n/2) log₂ 2πeσ², since h(Λ, σ²) and (n/2) log₂ 2πeσ² are the differential entropies of f_{σ²} and g_{σ²}, respectively. Since the variance of f_{σ²} is strictly less than σ², the inequality is strict. However, as σ² → 0, the effects of aliasing become insignificant, the Λ-aliased Gaussian density approaches the Gaussian density, and h(Λ, σ²) → (n/2) log₂ 2πeσ². Again, the approach must be monotonic because the mod-Λ channel with larger noise variance is a degraded version of the mod-Λ channel with smaller noise variance.

In summary, C(Λ, σ²) is strictly lower-bounded by the piecewise-linear curve

C(Λ, σ²) > max{0, (n/2) log₂ α²}.   (6)

The breakpoint occurs at α² = 1, the sphere-bound limit. For any lattice Λ, C(Λ, σ²) approaches the first asymptote as σ² → ∞, and the second as σ² → 0.

B. Normalized Capacity

A version of Λ is any lattice that can be obtained from Λ by scaling and/or orthogonal transformation (change of basis). Sometimes it is helpful also to regard an m-fold Cartesian product Λ^m as a version of Λ. Since V(Λ) and σ² are invariant to orthogonal transformations, it is clear that C(Λ, σ²) is invariant to orthogonal transformations. As for scale invariance, if we scale Λ by α and σ² by α², then C(Λ, σ²) is unchanged. It follows that C(Λ, σ²) depends only on the VNR α², which determines the ratio V(Λ)^{2/n}/σ². We therefore define the normalized capacity of Λ as the function C(α²) = (2/n) C(Λ, σ²), a universal function that is invariant for any version of Λ. Moreover, by normalizing capacity in bits per two dimensions (b/2D), we make it invariant under Cartesian products, since by the additivity of mutual information C(Λ^m, σ²) = m C(Λ, σ²).

Fig. 5. Normalized capacity C(α²) versus α² in decibels, and lower bounds (asymptotes).

From Theorems 9 and 10 we have the following lower bounds and asymptotes: C(α²) > 0, and C(α²) → 0 as α² → 0; C(α²) > log₂ α², and C(α²) → log₂ α² as α² → ∞ (or σ² → 0). In other words, for any lattice, as α² → ∞, C(α²) eventually increases by 1 b/2D for each factor of 2 (3.01 dB) increase in α², and approaches the lower bound max{0, log₂ α²}.

Fig. 5 shows the computed normalized capacity curve for the integer lattice Z, which is also the normalized capacity curve for Z^n for any n. (This curve was computed in a different context in [62].) The piecewise-linear lower bound is also shown. Note that the lower bound is approached within about 0.1 b/2D at about 1.5 dB and at about −1.5 dB, respectively.

C. Probability of Error

Among all versions of a lattice Λ, the decoding error probability is also clearly a function only of the VNR α², since we may normalize the error probability under Cartesian products by defining the normalized error probability (per two dimensions) P(α²). Fig. 6 plots the normalized error probability curve for the one-dimensional integer lattice Z, or indeed for Z^n for any n. Note that P(α²) ≈ 10⁻³ at about 4.5 dB, and P(α²) ≈ 10⁻⁶ at about 7.7 dB. We also show the curve for a sphere-bound-achieving lattice: a vertical line at the sphere-bound limit (α² = 1, or 0 dB). For plots of the sphere lower bound for various n (namely, the error probability that would be achieved if the Voronoi regions were n-spheres of volume V(Λ)), see [51].

V. SPHERE-BOUND-ACHIEVING COSET CODES

We now show that we can construct a sphere-bound-achieving coset code with volume-to-noise ratio arbitrarily close to 1 and error probability arbitrarily small, based on any lattice partition Λ/Λ' in which the top lattice Λ is sufficiently fine and the bottom lattice Λ' is sufficiently coarse. Such a sphere-bound-achieving coset code is not necessarily a lattice, and for practical purposes there is no need for it to be so. However, by choosing the code C in a certain way, it becomes a lattice. We thereby obtain a simple proof of the existence of sphere-bound-achieving lattices, as previously shown by de Buda [8] and Poltyrev [45] by use of the Minkowski-Hlawka theorem.

Fig. 6. Normalized error probability curve P(α²), and the sphere bound.

Second, the separability result of Theorem 7 shows that these results continue to hold for multilevel coding and multistage decoding using any multilevel refinement of Λ/Λ', which allows us to prove the existence of sphere-bound-achieving binary lattices based on standard partitions such as Zⁿ/2Zⁿ.

Fig. 7. Logvolume as a function of the code rate, between the logvolumes of Λ and Λ'.

A. Coset Codes

A (block) coset code is defined as follows. Let Λ/Λ' be an n-dimensional lattice partition, and let [Λ/Λ'] be a set of coset representatives for the cosets of Λ' in Λ. Let C be a block code of length N over [Λ/Λ']; i.e., a subset of [Λ/Λ']^N. Then the coset code C(Λ/Λ'; C) is the set of all elements of Λ^N that are congruent to some codeword in C modulo (Λ')^N.

The alphabet [Λ/Λ'] is a group isomorphic to Λ/Λ' under addition mod Λ'. If C is a subgroup of [Λ/Λ']^N, i.e., a group code [24], then the coset code becomes a group, i.e., a lattice. In general, however, we need not endow [Λ/Λ'] with group structure and we need not insist that C be a group code, unless we insist that C(Λ/Λ'; C) be a lattice.

Since the set of all elements of Λ^N that are congruent to a particular codeword c is the coset c + (Λ')^N, C(Λ/Λ'; C) is the union of |C| cosets of (Λ')^N. This implies that C(Λ/Λ'; C) has a well-defined volume, V(Λ')^N / |C|. Thus, writing the rate of the code in bits per two dimensions as ρ = (2/(nN)) log₂ |C|, we obtain a simple expression for the logvolume:

Lemma 11 (Logvolume Lemma): The logvolume (per two dimensions) of a coset code C(Λ/Λ'; C) based on a lattice partition Λ/Λ' and a code C of rate ρ bits per two dimensions over [Λ/Λ'] is the logvolume of Λ' less ρ. The logvolume thus lies between those of Λ and Λ' in the simple proportional manner determined by ρ that is illustrated in Fig. 7.

A coset code may alternatively be defined using a convolutional code of rate ρ over [Λ/Λ'], in which case C(Λ/Λ'; C) is a trellis

832 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 46, NO 3, MAY 2000 code If is regarded as a group isomorphic to and is a convolutional group code, then is a group trellis code, which is a kind of infinite-dimensional lattice It is shown in [18], using the concept of time-zero lattice, that a trellis coset code has a well-defined normalized volume and logvolume ( normalized redundancy ), and that Lemma 11 continues to hold in this case B Sphere-Bound-Achieving Coset Codes To construct a sphere-bound-achieving coset code, we start with any lattice partition such that is small enough that, which by the capacity separability theorem implies that is large enough that We will show that this condition implies that since otherwise would violate the sphere-bound limit We choose a code over with rate arbitrarily close to bits per two dimensions, such that on the channel the first-level error probability is arbitrarily small The existence of such a code is guaranteed by Shannon s channel coding theorem for the channel We use multistage decoding of as described in Section I, not maximum-likelihood decoding The first-level decoding determines a coset To complete the decoding of, a second stage of decoding is required to find the minimum-distance point in this coset The second-level error probability is The total decoding error probability is denoted by, and by the union bound is upperbounded by Thus is arbitrarily small if both and are arbitrarily small By the logvolume lemma, the capacity separability theorem, and our choices of and, the logvolume is given by Since have equally,we More precisely, we have shown that for any, we can choose a lattice partition and a code such that the resulting coset code satisfies Now by the sphere bound, and by Theorem 10 Therefore, we can conclude both that and that The first conclusion is our main result: Theorem 12 (Existence of Sphere-Bound-Achieving Coset Codes): For an AWGN channel with noise variance,we can choose an -dimensional 
lattice partition and a code of length such that the resulting coset code over has normalized volume arbitrarily close to and arbitrarily small Secondly, since could be any lattice with sufficiently small, we have also proved an important general relationship between and : namely, if, then must nearly meet the lower bound of Theorem 10: Theorem 13: For any lattice, if, then C Sphere-Bound-Achieving Lattices We now show that Theorem 12 may be extended to give a simple proof of the existence of sphere-bound-achieving lattices Our proof is based on the Delsarte Piret coding theorem for linear block codes over GF on the -ary symmetric channel, rather than on the Minkowski Hlawka theorem However, as Loeliger [37] has observed, the standard proof of the Minkowski Hlawka Theorem (cf [8, Appendix] or [52, pp 534 535]) is also based on mod- lattices constructed using block codes over GF over lattice partitions We use a one-dimensional lattice partition that is a scaled version of, where is a large prime The quotient is then isomorphic to the group, which when is prime is actually a finite field GF We use a random linear block code over GF as our code over (choose all components of all generators at random from GF ) Then The resulting coset code is a mod- ( generalized Construction A ) -dimensional lattice [6] By the Delsarte Piret linear coding theorem, Shannon s coding theorem and Gallager s exponential error bounds remain valid for the regular channel using random linear codes over GF The construction then goes as for Theorem 12 We choose large enough and small enough that is negligibly small; is negligibly small, which implies is a prime number
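A toy version of this mod-p (generalized Construction A) idea can be sketched as follows. This is our own illustration, not from the paper: p = 5, the length, and the generator are illustrative choices. A lattice point is any integer vector that reduces mod p to a codeword of a linear code over GF(p):

```python
import itertools

p = 5
n = 2
# toy linear code over GF(5) of length 2, generated by (1, 2)
code = {tuple((a * g) % p for g in (1, 2)) for a in range(p)}

def is_lattice_point(x):
    # membership in the mod-p lattice Lambda = C + p * Z^n
    return tuple(xi % p for xi in x) in code

# enumerate the lattice points falling in a small box
pts = [x for x in itertools.product(range(-p, p + 1), repeat=n)
       if is_lattice_point(x)]
```

Closure under addition holds because the code is linear over GF(p): the mod-p reduction of a sum of two lattice points is the sum of two codewords, hence again a codeword. With a random linear code over GF(p), this is the ensemble invoked in the proof.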

FORNEY et al: SPHERE-BOUND-ACHIEVING COSET CODES AND MULTILEVEL COSET CODES 833 Then and by the Delsarte Piret result we can find a linear code over GF of rate bits per two dimensions arbitrarily close to such that is negligibly small, while In sum: Theorem 14 (Existence of Sphere-Bound-Achieving Lattices): For an AWGN channel with noise variance per dimension, there exists an -dimensional mod- lattice such that is arbitrarily close to and is arbitrarily small For practical purposes, there is no great advantage to being a lattice However, if is a lattice, then all decision regions are congruent, so we can say that is the probability of error given an arbitrary transmitted point in, rather than just the average error probability over all points Also, if is a lattice, then we may consider mod- or channels D Discussion There is nothing difficult or exotic about these theorems They depend only on Shannon s channel coding theorem, the asymptotic properties of, and, for Theorem 14, the strong symmetry properties of the channel Theorem 12 verifies de Buda s claim that some optimal codes have structure Indeed, the coset codes of Theorem 12 are considerably more general than de Buda envisioned, as well as more attractive for practical applications In the next subsection, we will see that multilevel codes with even more structure can still achieve the sphere bound Theorem 12 would hold equally well on a mod- channel rather than a channel ie, if we used a random code over an entire fundamental region rather than over a discrete -point subset From a practical point of view, using the channel is more attractive, and the cost is negligible provided that is small enough that is negligibly small The lattices and may be versions of standard lattices in low dimensions; indeed, there is no reason not to choose versions of the integer lattice, and there is no reason not to choose From Fig 5, we see that is about 01 when is about 15 db (or is about 1/2 ), so from a practical point of 
view, does not need to be extremely small relative to On the other hand, the error probability when is about 77 db We conclude that may be chosen as an appropriately scaled version of a partition (or less; see the practical guidelines below) and still meet the above criteria in a practical sense If is taken to be a group isomorphic to and the code is a group code over, then the resulting coset code is a lattice We conjecture that the Delsarte Piret results may be generalized to show that random group codes over can achieve the capacity of a regular channel Then Theorem 14 would hold for lattices constructed from random group codes over, which would be a much more general class of sphere-bound-achieving lattices than mod- lattices We do not pursue this approach because the multilevel Constructions D and E below achieve the same effect E Sphere-Bound-Achieving Multilevel Coset Codes A practical objection to the sphere-bound-achieving coset codes of Theorem 12 could be that the order of the lattice partition is unbounded (As we have seen, in practice need not be large) We will now see that multilevel coset codes can achieve capacity using multiple small partitions, even binary partitions Suppose that is a lattice partition that can be refined into a lattice partition chain Then it follows immediately from the capacity separability theorem that: Theorem 15 (Lattice Partition Chain Capacity): If is a lattice partition chain, then Proof: From Theorem 5 The significance of this theorem is that the sphere bound may be approached just as closely by a multilevel coset code based on the lattice partition chain and multistage decoding as by a single-level coset code over Fig 8 illustrates a multilevel coset code with multistage decoding For each partition in the lattice partition chain a code over selects a sequence of coset representatives in a set of representatives for the cosets of It is helpful to think of each alphabet as being used by a different user The symbols from 
each user are summed at the channel input to form a coset representative for a coset of in, which may be reduced to a fundamental Voronoi region if desired Finally, an element is added to At the receiver, the received symbol is first reduced, which removes all dependence on lower levels, and then a decoder for the first-level code decodes the sequence of first-user symbols The decoded first-user symbols are then subtracted from the received symbols (again, this involves a delay which is not shown in Fig 8) This removes any dependence of lower levels on the first level, provided that decoding is correct The second through th levels are then decoded in the same way The last step is to decode the symbols using a decoder for Consequently, coding and decoding are performed independently at all levels By Shannon s channel coding theorem, it is possible to choose the code rate of the th-level code to be arbitrarily close to the capacity at that level and to decode with an arbitrarily small error probability, provided that decoding at previous levels has been correct By the union bound, the overall probability of incorrect decoding is upper-bounded by the sum of the probabilities of incorrect decoding at each of the levels, which can be made arbitrarily small
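The multistage procedure just described can be illustrated with an uncoded two-level sketch on the one-dimensional chain Z/2Z/4Z. This is our own toy stand-in for the per-level decoders of Fig. 8 (the noise level and trial count are illustrative): the first level slices the Z/2Z label after a mod-2 reduction, the decided symbol is subtracted, and the second level then slices the 2Z/4Z label after a mod-4 reduction.

```python
import random

random.seed(1)
sigma = 0.2  # noise standard deviation (illustrative)

def decode_two_level(y):
    # level 1: reduce mod 2 and decide the Z/2Z coset label (circular distance)
    r1 = y % 2.0
    b0 = min((0, 1), key=lambda b: min(abs(r1 - b), 2 - abs(r1 - b)))
    # level 2: cancel the decoded first level, reduce mod 4, decide 2Z/4Z
    r2 = (y - b0) % 4.0
    b1 = min((0, 1), key=lambda b: min(abs(r2 - 2 * b), 4 - abs(r2 - 2 * b)))
    return b0, b1

trials, errors = 2000, 0
for _ in range(trials):
    b0, b1 = random.randrange(2), random.randrange(2)
    y = b0 + 2 * b1 + random.gauss(0, sigma)  # transmit x = b0 + 2*b1
    errors += decode_two_level(y) != (b0, b1)
error_rate = errors / trials
```

Once the first-level decision is cancelled, the second level sees a channel with twice the decision distance, mirroring the successive structure described above.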

834 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 46, NO 3, MAY 2000 Fig 8 Multilevel coset code and multistage decoding Given a bi-infinite lattice partition chain such as, we may clearly choose a top lattice in the chain such that and a bottom lattice such that, so the conditions of Theorem 12 are satisfied In combination with Theorem 15, this yields the multilevel analog of Theorem 12 Theorem 16 (Sphere-Bound-Achieving Multilevel Coset Codes: For an AWGN channel with noise variance per dimension, for any bi-infinite -dimensional lattice partition chain, there exists a finite subchain and a multilevel coset code of length over such that the normalized volume is arbitrarily close to and is arbitrarily small A similar result for finite-constellation multilevel codes was derived in [32] and [33], and independently in [30] and [57] Those developments were based on the chain rule of information theory, whereas our development is based on our capacity separability theorem By using mod- receivers, or equivalently by thinking in terms of infinite constellations, we have eliminated edge effects and developed universal capacity curves for the types of lattices most often used in practice, thus obtaining results that are somewhat cleaner and more general Nonetheless, our results for multilevel coset codes are conceptually and qualitatively quite similar to these prior results As pointed out by a reviewer, the notion that each coding level in a multilevel code may be thought of as a different user suggests possible connections with multiuser information theory, specifically with the Gaussian multiple-access channel (see, eg, [47]) However, there are important differences between multilevel coding and multiple-access channels In the former case, the different users are tightly synchronized in both time and in amplitude/phase, whereas in the latter user synchronization is in general infeasible Moreover, in multilevel coding, as in Fig 8, all users operate independently without 
interference from other users, whereas on a multiple-access channel, even with a single receiver, the best one can do in general is successive cancellation, where each user suffers interference from as-yet-undecoded users We therefore doubt whether deep connections can be made between the two theories However, some connections may be possible; eg, multilevel coding might be used by each user in the rate-splitting idea of [47] F Sphere-Bound-Achieving Binary Lattices We now develop a multilevel analog to Theorem 14, using a Construction D lattice [1], [6] We use a one-dimensional lattice partition chain that is a scaled version of, where is an integer Then GF for all We construct a set of nested random binary linear codes as follows We randomly choose binary -tuples, and let be the code generated by the first generators We choose large enough so that for there is a code in this set with rate arbitrarily close to Using the nested binary linear codes we construct a multilevel Construction D lattice as follows: where denotes ordinary addition in It is easy to see that is in fact a lattice A binary lattice is defined in [19] as an -dimensional lattice such that is a lattice partition chain for some integer A Construction D lattice is evidently a binary lattice For our purposes has the same properties as a multilevel coset code based on and the codes,except that must be a lattice In particular Also, decoding at the th level given the decoded binary coefficients at all previous levels can be performed by decoding the coset code based on and ; the probability of error will then be The Delsarte Piret linear coding theorem holds for each random linear code, since the th-level channel is regular, and, therefore, the

FORNEY et al: SPHERE-BOUND-ACHIEVING COSET CODES AND MULTILEVEL COSET CODES 835 decoding error probability may be made arbitrarily small at each level Of course, we again choose large enough and small enough that and, so that The total rate is then arbitrarily close to, so By the union bound, the probability of any decoding error at any level (including is arbitrarily small Therefore, we have proved Theorem 17 (Existence of Sphere-Bound-Achieving Binary Lattices): For an AWGN channel with noise variance per dimension, there exists a Construction D lattice based on a chain of two-way one-dimensional lattice partitions and nested binary linear codes of length such that is arbitrarily close to and is arbitrarily small Multilevel constructions based on and binary codes are more attractive for practical applications than constructions based on and -ary codes, as in the Minkowski Hlawka theorem and de Buda s result We will see that two-level constructions based on suffice for practical purposes Finally, Theorem 17 may be extended straightforwardly to Construction E lattices [3], [6] based on higher dimensional chains such as,,, and so forth VI NORMALIZED CAPACITY CURVES In this section we study the normalized capacity and probability of error functions in more detail, for design guidance Many of these results were presented in [25] We first show that if is a sphere-bound-achieving lattice, then the normalized capacity curve must approximately attain its lower bound (6) everywhere Theorem 18: If for all is a sphere-bound-achieving lattice, then Proof: If is a sphere-bound-achieving lattice, then for, so By Theorem 13 and the monotonicity of for and of for, the normalized capacity must therefore approximately achieve its lower bound everywhere Consequently, if is a sphere-bound-achieving lattice and is a sublattice, then there is no appreciable loss of capacity if we use a channel rather than a mod- channel B First-Order Estimates For large, the first-order estimate 
derived in the Appendix has the following intriguing form, which somewhat quantifies Theorem 13 Theorem 19 (Large- Estimate): For large It appears likely that this estimate may actually be an upper bound on, but we have not been able to prove this conjecture In any case, this expression indicates how close remains to its lower bound when is small The small- first-order estimate derived in the Appendix also has an intriguing form: Theorem 20 (Small- Estimate): For small Here is the dual lattice to, defined by Since for all, Theorem 20 says roughly that This result suggests that the closeness of the curve to this lower bound is a measure of goodness of a lattice for the problem of infinite-constellation coding on the AWGN channel For a lattice with positive coding gain, we thus expect to lie between and the lower bound We have already given curves for and, the normalized capacity and error probability curves for integer lattices (Figs 5 and 6) Here we give curves of normalized capacity for the lattices (the two-dimensional hexagonal lattice) and (the four-dimensional checkerboard lattice) Although they fall below, they are not very different We also give two first-order approximations to that are valid for large and small, respectively, which are derived in the Appendix We shall see that they compare quite well to the computed curves for,, and A Normalized Capacity for Sphere-Bound-Achieving Lattices We first observe that Theorem 13 implies that sphere-boundachieving lattices must approximately achieve the lower bound (6) everywhere However, it is doubtful whether there is any deep significance to this estimate C Normalized Capacity Curves Fig 9 shows the normalized capacity compared to these large- and small- estimates We see that the large- estimate is quite accurate down to (0 db); it overestimates by about 3% at 3 db and 17% at 0 db The small- estimate is even better; at 0 db, it underestimates by less than 5%, and even at 3 db, it underestimates by less than 12% 
Fig. 10 shows similar curves for the hexagonal lattice A2. They do not differ much from those for the integer lattices. In this case the large-VNR estimate is good down to about 1 dB, while the small-VNR estimate is good up to about 1 dB, although it is now an overestimate. Finally, Fig. 11 shows similar curves for D4, which are still not very different. In this case, the large-VNR estimate is again good down to

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 46, NO. 3, MAY 2000

Fig. 9. Normalized capacity in bits per two dimensions as a function of the VNR in decibels, with large- and small-VNR first-order approximations.

Fig. 10. Normalized capacity in bits per two dimensions as a function of the VNR in decibels, with large- and small-VNR first-order approximations.

below 0 dB, while the small-VNR estimate is notably less accurate than in the previous two examples.

D. Normalized Probability of Error Curves

The union bound estimate for the normalized error probability is given in [21]; it involves the (nominal) coding gain of the lattice, the minimum squared distance between lattice points, the kissing number, and the Gaussian probability-of-error function. Substituting the normalized quantities, the union bound estimate takes the form (7). Comparing this equation with (7), we see that the curve for a general lattice may be obtained from the baseline curve by

FORNEY et al.: SPHERE-BOUND-ACHIEVING COSET CODES AND MULTILEVEL COSET CODES

Fig. 11. Normalized capacity in bits per two dimensions as a function of the VNR in decibels, with large- and small-VNR first-order approximations.

moving the latter to the left by the nominal coding gain (in decibels) and up by a factor of the kissing number (on a log scale). At very low error probabilities, the curve is nearly vertical, and the VNR required to achieve a given error probability is reduced by almost the entire nominal coding gain. However, at moderate error probabilities, the slope of the curve is nonnegligible, and the effective coding gain is reduced from the nominal coding gain by about 0.2 dB for every factor-of-2 increase in the number of nearest neighbors, if that number is not too large. For example, the nominal coding gain of D4 is 1.5 dB, but because of its large kissing number, the effective coding gain of D4 at the target error probability is only about 1.2 dB.

VII. PRACTICAL GUIDELINES

A significant contribution of Wachsmann et al. [56], [57] was to take seriously the prescriptions of information theory for the design of finite-constellation multilevel coset codes. In a similar spirit, we now discuss the implications of our results for the practical design of infinite-constellation coset codes and multilevel coset codes. The guidelines that we suggest resemble those of [56] and [57] in many respects. The kinds of questions that we are interested in are: How should we choose the top lattice and the bottom lattice? What dimension should we use? When and how should we refine into a partition chain and use multilevel coset codes?
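The rule of thumb quoted above (roughly 0.2 dB of effective-gain loss per doubling of the error coefficient) can be sketched numerically. This is our own illustration, not the paper's formula: the baseline of 4 nearest neighbors per two dimensions (that of a square lattice) and the function names are assumptions; the D4 facts used (kissing number 24, nominal coding gain of the square root of 2) are standard.

```python
import math

def effective_gain_db(nominal_gain_db, k_per_2d, k_ref_per_2d=4.0,
                      loss_db_per_doubling=0.2):
    """Rule of thumb: effective coding gain = nominal coding gain minus about
    0.2 dB for every factor-of-2 increase in nearest neighbors per two
    dimensions over an assumed baseline of 4 (square-lattice value)."""
    return nominal_gain_db - loss_db_per_doubling * math.log2(k_per_2d / k_ref_per_2d)

# D4: kissing number 24 in four dimensions -> 12 nearest neighbors per two
# dimensions; nominal coding gain 2^(1/2), i.e., about 1.5 dB.
nominal_d4 = 10 * math.log10(2 ** 0.5)
effective_d4 = effective_gain_db(nominal_d4, k_per_2d=12.0)
```

With these assumptions the sketch reproduces the figures in the text: about 1.5 dB nominal and about 1.2 dB effective coding gain for D4.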
We define a target error probability, and we aim to construct a coset code that approaches the sphere-bound limit as closely as possible while achieving it. We define the corresponding threshold VNR as the value at which the normalized error probability equals the target; then the maximum possible effective coding gain at the target error probability is the threshold VNR itself. For example, since the threshold is 7.7 dB at the tighter target, the maximum possible effective coding gain there is 7.7 dB. At the looser target, the maximum possible effective coding gain is 4.5 dB.

A. Design Guidelines for Coset Codes

For the bottom lattice, the requirement for no further coding at lower levels is that its raw error rate be below the target, which according to Theorem 13 will ensure that the overall error probability meets the target as well. If the bottom lattice is a version of the integer lattice, then this requirement is simply a condition on its VNR. More generally, we define the effective coding gain of a lattice that achieves the target error probability as the corresponding reduction in required VNR. The bottom-lattice requirement then generalizes to:

Requirement 1 (Bottom Lattice): To achieve the target error probability, the bottom lattice must satisfy the corresponding VNR condition.

Similarly, we define the effective coding gain of a coset code that achieves a given error probability in the analogous way. Since by Theorem 11 we have

it follows that the gains add, with approximate equality if and only if Requirement 1 is met with approximate equality. In words, the effective coding gain of a coset code in bits per two dimensions is not more than the effective coding gain of the bottom lattice plus the code rate in bits per two dimensions. In turn, the code rate is upper-bounded by the normalized capacity of the channel. We have seen that if the top lattice is chosen appropriately, then the effective coding gain approaches its upper limit. The basic requirement on the top lattice is thus:

Requirement 2 (Top Lattice): The VNR of the top lattice must be small enough that its normalized capacity loss is negligible. In particular, it must be less than the sphere-bound limit.

In practical applications, there are usually reasons for desiring that the top-lattice VNR not be too small, or equivalently that the density of the top lattice not be too great. For practical implementation, it is desirable that the number of constellation points not be too large. Moreover, tentative decisions are often made on elements of the top lattice for immediate feedback into tracking loops. The accuracy of these tentative decisions is determined by the raw error rate. We define the code redundancy in bits per two dimensions. Since by Lemma 11 the redundancy is lower-bounded by the capacity gap, smaller gaps require more redundancy.

To illustrate this tradeoff, assume that the top lattice is a version of the integer lattice. (We have found no reason to prefer any other top lattice.) With the first choice of VNR, the capacity gap is 0.3, so by Lemma 21 the code redundancy will be at least 0.3, and at best we can come within 0.3 b (0.9 dB) of the sphere-bound limit. If we choose a VNR of -1.5 dB, however, then the gap is 0.1, so we can come within 0.1 b (0.3 dB), but with a modestly larger code redundancy (constellation expansion) of 0.6. Choosing a VNR smaller than -1.5 dB will permit us to get only slightly closer to the sphere-bound limit (0 dB), at the cost of substantially increasing code redundancy and constellation expansion. For example, choosing -3 dB reduces the gap to 0.03, but increases the code redundancy to more than 1. Therefore, it appears that choosing the top lattice as a version of the integer lattice with a VNR of about -1.5 dB is a good
engineering choice. At this value of the VNR, the raw error rate is marginal (i.e., 8.5% per dimension), which may be marginally adequate for tentative decisions.

At the tighter target error probability, there is therefore a spread of about 9.2 dB (3.1 b) between the desired normalized values of -1.5 dB and 7.7 dB for the top and bottom lattices, assuming that they are both versions of the same lattice. This implies that two-level partitions will not suffice at this error rate. Partitions that do have an adequate spread in this sense include several standard chains (each listed with its spread), among them the standard eight-way partition used in Ungerboeck-type 2-D trellis codes [55]. In all cases, the code redundancy should be about 0.5. It follows that the top lattice is several times as dense as a version of the bottom lattice with the same log-volume. The 2-D constellation expansion ratio (CER) follows accordingly [18]. We will assume as a design objective that the CER, or equivalently the code redundancy, should not be any larger than necessary. There is a tradeoff between the spread and the redundancy that may be expressed as follows.

Lemma 21: If the top and bottom lattices are fixed, then the code redundancy is lower-bounded accordingly.

We note that the eight-state 2-D trellis code used in the V.32 modem [59] uses the eight-way partition and has a code redundancy of 1, which from this analysis is larger than necessary. At the looser target error probability, the minimum VNR is only 4.5 dB, so the desired spread is only about 6 dB (2 b), which can be met by four-way partitions, with which we can use binary linear codes. The code redundancy should still be about 0.5. Conversely, at a still tighter target error probability, a larger partition will be required.
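The spread arithmetic above can be sketched in a few lines. The conversion rests on the fact that each bit per two dimensions doubles the normalized volume, i.e., corresponds to 10*log10(2), or about 3.01 dB; the CER expression CER = 2^rho for redundancy rho in bits per two dimensions is our assumed form of the constellation-expansion relation, in the spirit of [18].

```python
import math

DB_PER_BIT_2D = 10 * math.log10(2)  # ~3.01 dB per bit per two dimensions

def spread_bits(spread_db):
    """Convert a top-to-bottom lattice spread from decibels to bits per
    two dimensions (one bit per two dimensions = one doubling = ~3.01 dB)."""
    return spread_db / DB_PER_BIT_2D

def cer_2d(redundancy_bits_per_2d):
    """Assumed 2-D constellation expansion ratio CER = 2^rho for a code
    redundancy of rho bits per two dimensions."""
    return 2.0 ** redundancy_bits_per_2d
```

This reproduces the figures in the text: a 9.2-dB spread is about 3.1 b, a 6-dB spread about 2 b, and a redundancy of 0.5 gives a CER of about 1.41.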

If instead we use versions of a denser lattice as the top and bottom lattices, the required spread is reduced by its effective coding gain, which is about 0.5 dB at the target error probability; thus the spread needs to be only about 8.7 dB (2.9 b). Partitions with this approximate spread include several chains, among them the three-way and seven-way partitions of the hexagonal lattice.

In four dimensions, there is a chain of candidate lattices, so we may choose the top and bottom lattices to be versions of either. We see no reason to prefer one over the other as the top lattice. For the bottom lattice, however, we prefer the denser lattice, to minimize the spread. If the bottom lattice is a version of the integer lattice, then the spread needs to be about 9.2 dB (3.1 b), whereas if the bottom lattice is a version of D4, then since the effective coding gain of D4 is about 1.2 dB, the spread needs to be only about 8.0 dB (2.7 b). There are two candidate partitions. The former is preferable, since the rate of a code based on the latter would have to be higher, say 2.5 instead of 2.0; the code trellis may therefore need more states to get the same performance, and will have 32-way rather than 16-way branches, so the decoding complexity will be higher. (Decoding the bottom lattice is trivial.) The only cost is that the spread is effectively 0.1 poorer, due to the fact that the effective coding gain of D4 is 0.1 less than its nominal coding gain. More generally, we can conclude that if there is a lattice with the same minimum distance but less volume (higher density), then it will be modestly preferable to code over the denser lattice. This is in fact the choice that Wei [60] and other trellis-code designers have made. This partition is actually used in the 64-state 4-D Wei code, which is the most powerful code included in the V.34 modem [61]. In general, Wei 4-D codes have a normalized redundancy of 0.5 [60], which as we have seen is a good choice from the point of view of the tradeoff between constellation expansion and
coding gain. Does the dimension of the partition matter? On the one hand, for Cartesian-product partitions, a low-dimensional code may be used several symbols at a time to code over a higher-dimensional partition, so there is no real difference. Moreover, one-dimensional partitions are attractive for implementation, and there is no great penalty to using them and letting the code length grow if necessary. On the other hand, a desired spread is more easily achieved in two or four dimensions, where the partitions are finer grained, at the cost of increasing the alphabet size. We conclude that dimension does not make very much difference, although in general it seems desirable to choose a simple low-dimensional partition. What appears to be more important is to choose the code redundancy correctly, perhaps of the order of 0.5. In traditional trellis codes such as those of Ungerboeck [55] and Wei [60], the code redundancy is fixed at 1 bit per n dimensions, which links the redundancy and the dimension; to achieve a redundancy of 0.5 with this constraint requires a 4-D code. But this constraint is unnecessary. For example, one could use a binary code of suitable dimensionless rate with the eight-way partition in order to get a redundancy of 0.5.

B. Design Guidelines for Multilevel Coset Codes

We consider standard chains of two-way lattice partitions, for which we can use binary codes at each level. This has been the approach taken in most previous work on multilevel codes, e.g., Constructions B-E [5], [6], [26], [29], [31], [35], [46], [56]. We also consider combining levels. We continue to aim for the same target error probability, and we aim for a top lattice with a VNR of about -1.5 dB, where the capacity gap is about 0.1.

1) One-Dimensional Chain: We first consider the one-dimensional chain. For simplicity, we refer to these lattices with the scaling implicit. The VNR values (in decibels) for the lattices are -1.5 dB, 4.5 dB, and 10.5 dB. The capacities are thus about 1.392 and 1.996 bits per two dimensions, which implies dimensionless code rates of about 0.7 and 1.0 on the two levels, respectively. The coding problem is thus very different on the two
levels. The bottom level does not require a very powerful code, since the raw error rate of the middle lattice is already small. In fact, this raw error rate needs only to be driven down to the target, and furthermore soft decisions can be used, so the code minimum distance can be quite small. The problem is to do this with zero redundancy! Or rather, with as little redundancy as possible. In fact, prior multilevel code designs have often used a single-parity-check code with minimum distance 2 on the lowest level, or at most an extended Hamming code with minimum distance 4. At the top level, the coding problem is to approach the capacity of the binary one-dimensional channel with aliased WGN when the capacity is of the order of 0.7 bits per dimension (b/d). Indeed, prior multilevel code designs have often used a powerful code of dimensionless

rate of the order of 0.7 b/d at the top level, and prior authors have often remarked that performance and complexity are both dominated by this level.

2) Two-Dimensional Chain: We next consider the two-dimensional chain, which just suffices to approach capacity if we set the top VNR to -1.2 dB (-0.4 b) and the bottom VNR to 7.8 dB (2.6 b). Then the VNR values are -1.2 dB, 1.8 dB, 4.8 dB, and 7.8 dB. The capacities of the three levels are thus about 0.55, 0.91, and 1.0, which translate directly to dimensionless code rates. The coding problem again differs considerably across the three levels. The bottom level again requires a weak code with as little redundancy as possible, and the top level requires a powerful code of rate near capacity. The middle level requires a high-rate code operating at a fairly high raw error rate, so its design is also challenging.

We may consider the option of combining levels. If we combine the first and second levels, then the problem is no different from coding over the one-dimensional chain. The fact that in two dimensions we need code only over a smaller alphabet at the bottom level may, however, somewhat reduce coding complexity and delay, so this approach may be attractive. If on the other hand we combine the second and third levels, then we have the problem of coding at a dimensionless code rate close to 1 when the top lattice of the combined level has a fairly high raw error rate, which appears to be no more attractive than the original middle-level problem.

3) Four-Dimensional Chain: Finally, let us consider the four-dimensional chain, which just suffices to approach capacity if we set the top VNR to -1.0 dB (-0.3 b) and the bottom VNR to 6.5 dB (2.2 b). Then the VNR values are -0.9 dB, 0.6 dB, 2.1 dB, 3.6 dB, 5.1 dB, and 6.6 dB. The capacities of the five levels are thus about 0.19, 0.41, 0.43, 0.5, and 0.5. We note that the capacity curves appear to be related such that the capacity of two adjacent steps is more or less equal. In this case it appears that it might be desirable to combine the first three levels
and to code over the eight-way partition with a code whose dimensionless rate is about 1/3. This could be a rate-1/3 binary convolutional code, such as is used in the Wei 16-state 4-D V.34 code [60], but much more powerful. Since the raw error rate at the fourth level will then be small, the second-level code can be a weak code of as high a rate as possible. A simple two-level code of this type was actually proposed for the V.34 modem [13].

C. Discussion

We may briefly summarize our conclusions as follows. To construct a coset code with performance approaching the sphere-bound limit and error probability less than some target error probability:

1) One may use standard partition chains down to the target error rates, or alternatively Cartesian products of these partitions. It can be advantageous to replace the bottom lattice by a denser superlattice with the same minimum distance.

2) The code redundancy should be of the order of 0.5 to achieve a reasonable tradeoff between coding gain and constellation expansion.

3) An attractive multilevel approach may be a two-level code over a chain, where the middle lattice is chosen to have a small raw error rate, so that the bottom-level code may be a simple high-rate code. Performance and complexity will then be dominated by the top-level code, which is simplified by being over a small symbol alphabet.

4) We do not expect codes over partitions of the hexagonal lattice to be much better (or worse) than codes over partitions of the integer lattices.

Our first conclusion is consistent with the practical Euclidean-space trellis codes designed by Ungerboeck [55] and subsequent inventors. Our second conclusion has come to be appreciated in the decade since [18], and in our judgment accounts for the current popularity of Wei-type 4-D trellis codes [60]. Our third conclusion remains to be practically exploited. Our final conclusion is consistent with limited results to date (see, e.g., [20]) on trellis-code constructions over hexagonal lattices, as well as with
mathematical constructions of dense lattices (e.g., the coexistence of binary and ternary constructions of certain dense lattices, and the modest superiority of the hexagonal-based lattices in their respective dimensions [6]).

VIII. EXPONENTIAL ERROR BOUNDS

In practice it is not possible to use a code with rate equal to capacity. Often the computational cutoff rate is taken to be the practical capacity [43], although recently turbo codes [2] and low-density parity-check codes [42] have shown that low error probabilities can be obtained at rates well above the cutoff rate. One might be concerned, particularly with multilevel codes, that the

gaps from capacity at each level might add up in such a way as to seriously degrade the practically achievable VNR of a coset code. In this section we address this concern by computing Gallager's exponential error bounds [28] for coset codes over mod-lattice channels. When the noise is small and the partition is large, we obtain Poltyrev's sphere-packing and straight-line error exponents [45]. If the partition is isomorphic to a linear group, then these coset codes can be lattices, and Poltyrev's bounds for optimal lattices may be obtained from the Delsarte-Piret linear coding theorem rather than the Minkowski-Hlawka theorem. These bounds imply that the cutoff rate is only a factor of 4/e (1.68 dB) away from the Shannon limit on a high-SNR AWGN channel, as has been observed by authors since Shannon [50]. Moreover, the cutoff rate can be straightforwardly achieved by sequential decoding of trellis coset codes over such channels [41], [44], [58]. Finally, these results support the approach suggested in the previous section, i.e., a two-level approach based on a two-level partition chain, where the top level uses a probabilistic code and the bottom a low-redundancy algebraic code.

A. Bounds for Small Noise

We compute Gallager's exponential error bounds for a mod-lattice channel, where again we assume a sufficiently large partition and sufficiently small noise. It will be convenient here to use units of nats rather than bits. Gallager's bounds are expressed in terms of the Gallager function [28], which for this channel approaches a constant for small enough noise. In summary:

Lemma 22: For any lattice partition with sufficiently small noise and a sufficiently large partition, the Gallager function takes the limiting form above.

Gallager then shows that a bound on the average probability of error holds over an ensemble of random codes of a given length and rate (in nats per symbol, or nats per two dimensions) for any parameter in the allowed range [28]. Differentiating with respect
to, we find that the exponent is maximized for where we use the fact that the equiprobable input distribution maximizes, since the channel is regular and hence symmetric [10] We evaluate this function by approximating by, since is large Then we observe that provided that and for if (nats/2d) Thus a Gaussian-shaped function with variance over these func- Therefore, the average probability density tions is, where where, substituting, (8) In summary: Theorem 23: For any -dimensional lattice partition with sufficiently small and sufficiently large, there exists a coset code based on a code of length over with VNR such that on the channel where the error exponent is defined in (8)

The error exponents are the same as the sphere-packing and straight-line error exponents found by Poltyrev [45] using the Minkowski-Hlawka theorem for lattices. The sphere-packing bound is known to be asymptotically tight. Theorem 23 is based instead on Gallager's bounds and the symmetry of mod-lattice channels; it applies to coset codes rather than lattices; and it appears to be slightly sharper in that there is no extra term in the exponent, although we have not quantified our approximations. On the other hand, we have not developed an analog to Poltyrev's improved expurgated exponent.

The key facts about the exponent are these. It is nonnegative for all VNRs above the sphere-bound limit; that is, there exist coset codes with error probability decreasing exponentially with dimension whenever the VNR exceeds the sphere-bound limit. The straight-line exponent is nonnegative down to 1.68 dB; i.e., on a mod-lattice channel with small noise and a large partition, the cutoff rate is 1.68 dB from the sphere-bound limit. The critical rate marking the boundary between the two regions occurs at 3 dB.

As a practical matter, it appears that Theorem 23 holds approximately whenever the partition is large enough and the noise is small enough. In other words, the raw error rate needs to be only moderately small, and the VNR needs to be only of the order of 1.5 to 3 dB.

If the partition is isomorphic to an elementary Abelian group, then by the results of Delsarte and Piret [10] we may use an ensemble of linear codes over a finite field in Gallager's theorem, in which case the coset codes become lattices. Then we obtain:

Corollary 24: For any lattice partition isomorphic to an elementary Abelian group, with sufficiently small noise and a sufficiently large partition, there exists a lattice based on a linear code of a given length and VNR whose error probability on the mod-lattice channel decreases exponentially as in Theorem 23.

This corollary is somewhat broader than Poltyrev's result, since the lattice partition can be of a type other than that used in the standard proof of the Minkowski-Hlawka theorem. Many other results follow from the Gallager function. For example, the Viterbi-Yudkin exponential error bounds for convolutional codes with maximum-likelihood (Viterbi-algorithm) decoding [16] are based on it, as are the bounds for sequential decoding [17].

B. Cutoff Rate for Large VNR

We now assume that the VNR is large enough that the raw error rate is small, which of course implies that the VNR at lower levels is even larger. Our aim is to develop exponential error bounds for the lower levels of a multilevel coset code. In this regime the channel is almost noiseless, in the sense that the overlaps between the densities for different inputs are small. Therefore, we expect that both the capacity and the cutoff rate will be nearly as large as they can be. Accordingly, we compute only the straight-line portion of Gallager's error bounds, which depends only on the cutoff rate. We may again approximate the transition densities by Gaussians, since the partition is large. Then, following [43], and using the symmetry of the channel, assuming the partition is large we obtain an expression which, with the union bound estimate and the usual approximations, yields the following. Substituting the cutoff rate in nats per two dimensions and the normalized error probability, we have:

Lemma 25: For any lattice partition with small raw error rate and a sufficiently large partition, the computational cutoff rate in nats per two dimensions is approximated by the expression above.

This compares interestingly to our earlier estimate of the capacity in Theorem 19, expressed in nats per two dimensions. The main point is that as soon as the VNR is large enough that the raw error rate is small, there is a negligible gap between the capacity or the cutoff rate and the maximum rate that can be transmitted over the channel. Therefore, in principle there need be no significant loss in a multilevel coset code due to code redundancy at the lower levels, once we reach a level where the raw error rate is small. There remains, of course, the practical problem of finding and implementing codes with negligible redundancy at these lower levels.

C. Discussion

These results support the two-level design approach suggested at the end of the previous section.

The top level should use a coset code based on a partition whose VNR is of the order of -1.5 dB, so that the capacity gap is about 0.1 and the raw error rate is about 8% per dimension. The top-level code should have a normalized redundancy of the order of 1/2; e.g., a binary code of suitable rate over a binary partition, or a ternary code over a ternary partition. Decoding at the top level should be "probabilistic"; e.g., Viterbi-algorithm decoding of convolutional codes [15]; sequential decoding of convolutional codes [63], which can approach within about a factor of 4/e (1.68 dB) of the Shannon limit; iterative decoding of low-density parity-check codes, which can closely approach the Shannon limit [42]; or iterative decoding of turbo codes, which can also closely approach the Shannon limit [2].

The bottom level should use a coset code based on a partition whose raw error rate is no greater than the target error probability.

We remark that coding for a mod-lattice channel is not the same as coding for an ordinary AWGN channel, particularly at low SNR. The aliased WGN density is more uniform than a Gaussian noise density and lacks tails. The aliasing is another way of expressing the nearest-neighbor multiplication effect that has long been observed with multilevel codes. While the symmetry of the channel ensures that the usual codes for symmetric channels should be perfectly appropriate, designing probabilistic decoding methods for such a non-Gaussian noise density may require a nontraditional approach.

The bottom-level code should have as low redundancy as possible, with just enough error correction to drive the raw error rate down to the target. A single-parity-check or extended Hamming code may suffice. The larger the field over which the bottom-level code is defined, the less redundancy is needed for the same error-correction capability; however, the field must match the partition. Decoding at the bottom level should probably be algebraic, since the code will have very little redundancy. Furthermore, this is the regime in which traditional algebraic coding concepts such as bounded-distance decoding and the union bound estimate are valid. Preferably, the algebraic decoding method should use soft decisions, although as pointed out in [57] hard decisions may be nearly as effective when the normalized rate is near capacity. For maximum performance, Reed-Solomon (RS) or Bose-Chaudhuri-Hocquenghem (BCH) codes with error correction, erasure-and-error correction, or generalized minimum-distance decoding are good candidates [14], [26].

The two-level probabilistic/algebraic approach suggested here has much in common with the philosophy of concatenated codes [14]. Both aim to approach capacity as closely as possible with reasonable complexity, and both address the problem with a probabilistic code on a noisy channel combined with an algebraic code on a clean channel. However, in the present setting RS codes are likely to be more powerful than needed for the second level (unless they are needed for other purposes such as burst-error correction). Previous work on multilevel coset codes has tended to be either all-algebraic (e.g., most multilevel lattice or trellis-code constructions, which maximize the
nominal coding gain by equalizing distance at all levels) or all-probabilistic (e.g., the multilevel turbo coding scheme of [56], which chooses code rates approaching capacity at all levels). We propose that a mix-and-match combination of these two approaches might be a better way of approaching capacity while optimizing the tradeoff between performance and complexity.

IX. SPHERICAL CONSTELLATIONS AND DE BUDA'S RESULT

Up to this point, this paper has focused entirely on the properties of infinite arrays. In this section we briefly consider the properties of constellations consisting of a finite number of points from such arrays when used on a power-constrained AWGN channel.

Using the Minkowski-Hlawka theorem of lattice theory, de Buda [8] showed that spherical lattice constellations bounded by a sphere of given squared radius could achieve a certain rate in bits per two dimensions with lattice decoding, which is almost the true capacity of the peak-constrained AWGN channel. Subsequently, de Buda [9] and Linder, Schlegel, and Zeger [36] showed that the true capacity could be achieved with peak-constrained lattice constellations constrained to lie in a thin spherical shell, provided that the decoder uses minimum-distance (maximum-likelihood) decoding rather than lattice (infinite-constellation) decoding. However, with this kind of decoding many of the benefits of lattice structure are lost [36]. Loeliger [38] reproved de Buda's spherical-constellation result using standard averaging arguments for linear codes over a prime field lifted to mod-p lattices, and conjectured that the lower rate is in fact the capacity with lattice decoding. On the other hand, Urbanke and Rimoldi [54] have recently shown that peak-constrained spherical lattice constellations can in fact achieve the true capacity, provided that the decoder uses minimum-distance decoding. At this point, therefore, the main open question is whether the true capacity can be achieved using lattice decoding.

In this section, we observe that the difference between the two rates corresponds to the difference between the average energy of a discrete spherical constellation and the average energy of its continuous support region. We show that if the support region asymptotically has the average energy of a sphere of the same volume, then the true capacity can be achieved with a spherical constellation based on a sphere-bound-achieving lattice and lattice decoding. However, this assumption, though plausible, remains unproven. Furthermore, our approximations are in any case too crude to distinguish between the two rates. Finally, our argument appears to conflict with the arguments of [8] and [38]. Therefore, the question of whether the true capacity can be achieved using lattice decoding remains open.

A. Argument

Consider a spherical constellation consisting of the least-energy points of a sphere-bound-achieving lattice, or a translate of one. The rate of the constellation follows from its size.

The normalized second moment of a region is defined in the usual way. The support region of the constellation is defined as the union of the minimum-distance (Voronoi) regions of its points; its volume is thus the number of points times the volume of each of these regions. The average energy per dimension of the support region is defined as the average energy per dimension of a continuous random variable whose distribution is uniform over it. Let the discrete average energy denote the average energy per dimension of an equiprobable discrete distribution over the constellation. The usual continuous approximation is that these two energies are equal [21]. The following lemma shows that this is a slight overestimate.

Lemma 26 (Improved Continuous Approximation): If a finite constellation is based on a lattice translate, then its discrete average energy equals the average energy per dimension of a uniform distribution over its support region minus the second moment per dimension of the Voronoi region.

Proof: Let a continuous random variable be uniform over the Voronoi region, and let a discrete random variable be equiprobable over the constellation; then their sum is a continuous random variable whose distribution is uniform over the support region. The result follows since the two variables are independent and the mean of the continuous one is zero.

We remark, however, that if we consider the ensemble of all translates of the constellation by a continuous random variable uniform over the Voronoi region, then the average energy of the translates is precisely the average energy of the support region, as would be predicted by the usual continuous approximation.

We now show that if the lattice is sphere-bound-achieving, then the constellation achieves the true capacity, provided that the support region approximates the average energy per dimension of a sphere of the same volume. Moreover, the difference between the continuous approximation and the improved continuous approximation is precisely the difference between the two rates.

It is well known that for a given volume, the average energy is minimized if the region is a sphere, and that the normalized second moment of a sphere approaches a limiting value as the dimension grows. We will assume that the dimension is large enough that this holds; this can be ensured if necessary by replacing the lattice by a Cartesian product of sufficiently many copies. Now we make the following unsupported but plausible assumptions:

1) The Voronoi region is sufficiently spherical.

2) The support region is sufficiently spherical.

Actually, we will see that the second
assumption implies the first, because a violation would imply that we could transmit reliably at rates above capacity. The improved continuous approximation of Lemma 26 then gives (9),
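The improved continuous approximation of Lemma 26 can be verified exactly in one dimension. This is our own worked example, not the paper's: take an M-point constellation drawn from a translate of the integer lattice (points at plus or minus 1/2, 3/2, and so on), whose support region is an interval of length M and whose Voronoi region is the unit interval with second moment 1/12.

```python
# One-dimensional check of the improved continuous approximation (Lemma 26):
# discrete average energy = (energy of uniform density over the support region)
#                           - (second moment of the Voronoi region).
M = 8
points = [(2 * i + 1 - M) / 2 for i in range(M)]      # +/-1/2, +/-3/2, ..., +/-(M-1)/2
disc_energy = sum(p * p for p in points) / M          # equiprobable discrete distribution
cont_energy = M ** 2 / 12                             # uniform density over [-M/2, M/2)
voronoi_second_moment = 1 / 12                        # Voronoi region [-1/2, 1/2)
```

Here the discrete energy is (M^2 - 1)/12 and the continuous energy is M^2/12, so the usual continuous approximation overestimates by exactly the Voronoi second moment 1/12, as the lemma states.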

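The claim that G(S_n) approaches 1/(2πe) ≈ 0.0585 can also be checked directly. For an n-ball of radius r, the per-dimension second moment of a uniform distribution is r²/(n + 2) and the volume is π^{n/2} r^n / Γ(n/2 + 1), which gives G(S_n) = Γ(n/2 + 1)^{2/n} / ((n + 2)π), independent of r. The following is our own numerical check, not code from the paper:

```python
from math import pi, e, lgamma, exp

def G_sphere(n):
    """Normalized second moment G(S_n) of an n-sphere (n-ball)."""
    # G = P(R) / V(R)^(2/n) with P = r^2/(n+2) and
    # V = pi^(n/2) r^n / Gamma(n/2 + 1); the radius r cancels.
    return exp((2.0 / n) * lgamma(n / 2.0 + 1.0)) / ((n + 2) * pi)

limit = 1.0 / (2.0 * pi * e)   # about 0.058550

for n in (1, 2, 24, 10**6):
    print(n, G_sphere(n))
# G(S_1) = 1/12 (uniform on an interval); the values decrease
# monotonically toward 1/(2*pi*e) as n grows.
```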
FORNEY et al.: SPHERE-BOUND-ACHIEVING COSET CODES AND MULTILEVEL COSET CODES 845

whereas the original continuous approximation would give only (1/2) log₂ SNR (10). The difference between P(R(A)) and P(A) is the discretization factor identified in [34]. Now if Λ is sphere-bound-achieving, then reliable transmission with lattice decoding is possible when V(Λ)^{2/n} ≈ 2πeσ². Therefore, by assumption 1), P(R_V(Λ)) = G(R_V(Λ)) V(Λ)^{2/n} ≈ σ², and therefore, by Lemma 26 and assumption 2), A achieves the capacity C = (1/2) log₂(1 + SNR) per dimension. On the other hand, if the original continuous approximation is used, then A achieves only (1/2) log₂ SNR.

B. Discussion

The argument above supports the following conjecture, which is de Buda's desired result.

Conjecture: Spherical constellations based on sphere-bound-achieving lattices with lattice decoding can achieve the capacity log₂(1 + SNR) bits per two dimensions of an AWGN channel under an average power constraint, at least when the signal-to-noise ratio is large.

However, our assumptions that the Voronoi region R_V(Λ) and the support region R(A) can have normalized second moments arbitrarily close to 1/(2πe) have not been justified. Moreover, our approximations are not precise enough to distinguish between log₂(1 + SNR) and log₂ SNR.

We note that Shannon originally proved his channel capacity theorem [49] by a soft packing of noise spheres of squared radius nσ² into an output sphere of squared radius n(P + σ²). Our construction is very similar: we use a spherical constellation with near-spherical Voronoi regions of squared radius approximately nσ² whose union is an approximate n-sphere of squared radius approximately n(P + σ²). This similarity between our construction and Shannon's probabilistic proof is encouraging.

We expect our conjecture to hold also for spherical constellations based on nonlattice coset codes and multilevel coset codes, but the fact that such arrays need not have congruent minimum-distance regions further complicates the argument.

In work reported in [64], Poltyrev has proved the long-conjectured result that there exist lattices whose Voronoi regions have normalized second moments arbitrarily close to 1/(2πe). This provides strong support for our assumption that R_V(Λ) can have this property, and some encouragement for the corresponding assumption about R(A).

The assumption that the support region is quasi-spherical may be justifiable only when the constellation is large, which would make this a large-SNR result. When the SNR is large, there is, of course, no practical difference between log₂(1 + SNR) and log₂ SNR. On the other hand, when the SNR is small, then it is well known that the Euclidean image of a binary linear code under the standard 2-PAM map can approach the capacity of a peak- or average-power-constrained AWGN channel. Such a Euclidean image is a spherical constellation based on the mod-2 lattice corresponding to the code, suitably translated. Thus spherical constellations can approach capacity in the low-SNR regime. Long, Rimoldi, and Urbanke [40] have in fact shown, using the central limit theorem, that with a large number of superimposed binary codes, the true capacity can be achieved at any SNR without explicit shaping. See also [48], an earlier paper that leads to the same conclusions.

Our argument uses an average energy constraint, whereas in most of the literature a peak constraint is used. We do not expect any difference for large n because of sphere hardening; i.e., the average energy per dimension of a uniform random variable over an n-sphere is effectively equal to the peak energy per dimension. Obviously, if there exist capacity-achieving spherical lattice constellations satisfying a peak energy constraint, then such constellations also satisfy an average energy constraint. The following simple argument, due to Urbanke [53], shows that the converse is also true asymptotically.

Lemma 27 (Equivalence of Average and Peak Energy Constraints): If spherical lattice constellations with average energy per dimension P can transmit reliably with lattice decoding at any rate up to C(P), then for any ε > 0, spherical lattice constellations with normalized peak energy per dimension P/ε can transmit reliably with lattice decoding at any rate up to C(P).

Proof: If the average energy of an n-dimensional constellation is nP, then at most a fraction ε of its points can lie outside a sphere with squared radius nP/ε. Expurgating these points reduces the rate by at most (2/n) log₂(1/(1 − ε)) bits per two dimensions, which becomes negligible as n → ∞. The probability of error of lattice decoding is not affected by expurgation.

Finally, a serious objection to our argument is that the Minkowski-Hlawka (MH) theorem and Blichfeldt's principle show that the original continuous approximation (10) is exact over the average of the MH ensemble of mod-p lattices, so that only (1/2) log₂ SNR can be achieved, not (1/2) log₂(1 + SNR) [37], [38]. More precisely, over the ensemble of randomly translated random mod-p lattices, if the support region is a sphere of squared radius nP, then the average size of the spherical lattice constellation is the ratio of the sphere's volume to V(Λ), and the average energy per dimension is exactly P. In this paper, the same ensemble has been used when we have used random linear q-ary codes over a lattice partition, and, as is well known, most codes in this ensemble are good. On the other hand, as we have remarked below Lemma 26, averaging over

846 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 46, NO. 3, MAY 2000

translates may precisely account for the difference between the continuous approximation and the improved continuous approximation. As arguments of this type tend to be quite delicate, we must leave final resolution of the question of whether (1/2) log₂(1 + SNR) can be achieved using lattice decoding for another day.

X. CONCLUSION

This paper proves the existence of sphere-bound-achieving coset codes, lattices, and multilevel coset codes, and thus markedly extends the kinds of constructions that can be used to approach capacity. In particular, rather than requiring mod-p lattices with p a large prime, as does de Buda's finite-constellation result or Poltyrev's infinite-constellation result, it encompasses conventional binary lattices based on standard lattice partition chains, as well as nonlattice coset codes and multilevel coset codes. Furthermore, it validates the choices of lattice partitions and lattice partition chains that have been found to be actually useful in practical codes. In other words, it shows that optimal codes can have the same kind of structure as the best known practical codes.

Practical trellis codes have usually been single-level coset codes. This paper, like [26] and [56], shows that multilevel coset codes based on standard binary lattice partition chains and powerful binary codes may merit more attention. Alternatively, it suggests further study of two-level coset codes, with one level chosen to be a powerful probabilistic code based on a partition whose decoder attains a modest raw error rate, and the second level a low-redundancy algebraic code aimed at reducing this raw error rate to the target.

APPENDIX
FIRST-ORDER ESTIMATES

In this appendix we develop the first-order estimates given in Theorems 19 and 20 for large and small SNR, respectively. In the large-SNR case, we begin with the one-dimensional case to illustrate the principles used in the general case. Here we will abbreviate the recurring quantities by shorter symbols. The capacity in nats per dimension is then given by an integral that is dominated by the contributions from the two nearest lattice points. We translate the origin and integrate over the region in which the two closest points in the translated lattice are this pair. We may write the integral as a sum of terms. The sum of two of these terms is equal, by translation, to the differential entropy in nats per dimension of a one-dimensional Gaussian variable with variance σ². Two terms remain, where the differential entropy in nats per dimension is taken over any fundamental region of Λ.

A. Large-SNR Approximation

When the SNR is large, the key approximation is that the Λ-aliased Gaussian probability density function is dominated by the contributions of the two nearest lattice points, which are equal under sign inversion, and thus their sum is the correction term. By changing variables so that the variable of integration is a Gaussian variable with variance σ², this becomes

For one range of the argument the integrand decays rapidly; for the other it does not. Thus the integrand is small except near the decision boundary. Therefore, the correction term is approximated by the part of the integral in which a particular lattice point is the second-nearest point. Let there be a set of Voronoi-face-defining points; then there is one such subregion for each such lattice point, and the region is the union of these subregions, which overlap only on their boundaries. Consequently, the integral in the correction term may be evaluated explicitly, and for large SNR it goes to a limiting value, so the correction term does likewise. Because a lattice is symmetric under sign inversion, the face-defining points come in opposite pairs, and two opposite subregions are congruent under sign inversion. We compute the contribution over two such subregions, or equivalently over their union, which is symmetric about the separating hyperplane midway between the two lattice points. We translate the origin to the midpoint, so that the region of integration becomes precisely the region in which the two points are the closest points in the translated lattice. In view of the standard identity involving the inner product of the two points, we use a coordinate basis in which the first coordinate axis passes through the two translated lattice points, so that the separating hyperplane consists of all points whose first coordinate is zero; thus the two lattice points have coordinates on this axis only, with all zeros in the second through nth coordinates. Here we have used the kissing number and the union bound estimate of the error probability in one dimension. In bits per two dimensions, after substitution, we obtain the one-dimensional estimate.

B. Large-SNR Approximation for General n

For the general case, we integrate in the region using the coordinate system in which the two nearest lattice points lie on the first coordinate axis. We can write the density as a product, from the separability property of Gaussian densities, over the Voronoi region, which is the region in which the origin is the closest lattice point. We partition this region into subregions, one per face-defining point, where the first factor is a one-dimensional Gaussian density and the second factor is an (n − 1)-dimensional

Gaussian density. In other words, aliasing affects only the first coordinate, to the accuracy of our two-term approximation. The discrete Fourier coefficients are defined by an integral over any fundamental region of Λ, where the Dirac delta function appears in the transform pair. The inverse Fourier transform yields the Fourier expansion, where the sum includes one point from each opposite pair in the face-defining set. Arguing as in the one-dimensional case, we can show the corresponding estimate. Note that from the basic Fourier transform pair we obtain a basic orthogonality relation, viz., (11), where the Kronecker delta function is 1 if its arguments agree and 0 otherwise. The region of integration includes the two central subregions as well as two neighboring subregions; since the union of these regions over the face-defining set covers the whole space, for small noise we have an expression in which * denotes convolution with the picket-fence function. The Fourier transform of the main term gives the differential entropy in nats of the n-dimensional Gaussian distribution. For the correction term, the coefficient is the union bound estimate of the probability of error per n dimensions. In summary, in nats per n dimensions, we obtain an estimate which, converted to bits per two dimensions, is in the form of Theorem 19.

C. Small-SNR Approximation

Since the Λ-aliased Gaussian probability density is Λ-periodic, its Fourier transform is defined on the points of the dual lattice Λ*. In general, if a function is Λ-periodic, then its Fourier transform is defined on Λ* [23]. For small SNR, the aliased density becomes nearly constant, and is well approximated by the low-frequency terms of its Fourier expansion, namely the DC term and the fundamental frequency terms corresponding to the minimum squared norm in Λ*. Let N* denote the set of points in Λ* with this minimum norm. Then

where the constant term is the limit of the estimate, and the remaining term is the fundamental frequency component, which we approximate with a Taylor series expansion. We then obtain an approximation in nats per n dimensions, where we use the orthogonality relation (11). Substituting the coding gain and converting to bits per two dimensions, we obtain the estimate. Using this approximation and the union bound estimate of (7), this small-SNR estimate may alternatively be written in the form of Theorem 20.

ACKNOWLEDGMENT

The authors would like to thank R. Fischer, R. G. Gallager, J. Huber, F. R. Kschischang, A. Vardy, the reviewers, and especially H.-A. Loeliger and R. Urbanke for helpful comments and suggestions.

REFERENCES

[1] E. S. Barnes and N. J. A. Sloane, "New lattice packings of spheres," Can. J. Math., vol. 35, pp. 117-130, 1983.
[2] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes," in Proc. 1993 ICC, Geneva, May 1993, pp. 1064-1070.
[3] A. Bos, J. H. Conway, and N. J. A. Sloane, "Further lattice packings in high dimensions," Mathematika, vol. 29, pp. 171-180, 1982.
[4] A. R. Calderbank and N. J. A. Sloane, "New trellis codes based on lattices and cosets," IEEE Trans. Inform. Theory, vol. IT-33, pp. 177-195, Mar. 1987.
[5] A. R. Calderbank, "Multilevel codes and multi-stage decoding," IEEE Trans. Commun., vol. 37, pp. 222-229, Mar. 1989.
[6] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups. New York: Springer-Verlag, 1988.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[8] R. de Buda, "The upper error bound of a new near-optimal code," IEEE Trans. Inform. Theory, vol. IT-21, pp. 441-445, July 1975.
[9] R. de Buda, "Some optimal codes have structure," IEEE J. Select. Areas Commun., vol. 7, pp. 893-899, Aug. 1989.
[10] P. Delsarte and P. Piret, "Algebraic constructions of Shannon codes for regular channels," IEEE Trans. Inform. Theory, vol. IT-28, pp. 593-599, July 1982.
[11] P. Elias, "Coding for noisy channels," in IRE Conv. Rec., vol. 3, Mar. 1955, pp. 37-46.
[12] M. V. Eyuboglu and G. D. Forney, Jr., "Trellis precoding: Combined coding, precoding and shaping for intersymbol interference channels," IEEE Trans. Inform. Theory, vol. 38, pp. 301-314, Mar. 1992.
[13] M. V. Eyuboglu, G. D. Forney, Jr., P. Dong, and G. Long, "Advanced modulation techniques for V.fast," Euro. Trans. Telecomm., vol. 4, pp. 9-22, May 1993.
[14] G. D. Forney, Jr., Concatenated Codes. Cambridge, MA: MIT Press, 1966.
[15] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268-278, 1973.
[16] G. D. Forney, Jr., "Convolutional codes II: Maximum-likelihood decoding," Inform. Contr., vol. 25, pp. 222-266, 1974.
[17] G. D. Forney, Jr., "Convolutional codes III: Sequential decoding," Inform. Contr., vol. 25, pp. 267-297, 1974.
[18] G. D. Forney, Jr., "Coset codes Part I: Introduction and geometrical classification," IEEE Trans. Inform. Theory, vol. 34, pp. 1123-1151, Sept. 1988.
[19] G. D. Forney, Jr., "Coset codes Part II: Binary lattices and related codes," IEEE Trans. Inform. Theory, vol. 34, pp. 1152-1187, Sept. 1988.
[20] G. D. Forney, Jr., "Coset codes III: Ternary codes, lattices and trellis codes," in Proc. 1988 Beijing Int. Workshop Inform. Theory, Beijing, China, July 1988, pp. E-61-E-65.
[21] G. D. Forney, Jr., and L.-F. Wei, "Multidimensional constellations Part I: Introduction, figures of merit, and generalized cross constellations," IEEE J. Select. Areas Commun., vol. 7, pp. 877-892, Aug. 1989.
[22] G. D. Forney, Jr., "Geometrically uniform codes," IEEE Trans. Inform. Theory, vol. 37, pp. 1241-1260, Sept. 1991.
[23] G. D. Forney, Jr., "Transforms and groups," in Codes, Curves and Signals: Common Threads in Communications, A. Vardy, Ed. Boston, MA: Kluwer, 1998, pp. 79-97.

[24] G. D. Forney, Jr., and M. D. Trott, "The dynamics of group codes: State spaces, trellis diagrams, and canonical encoders," IEEE Trans. Inform. Theory, vol. 39, pp. 1491-1513, Sept. 1993.
[25] G. D. Forney, Jr., M. D. Trott, and S.-Y. Chung, "On the capacity of coset codes and multilevel coset codes," in Proc. 1996 Int. Symp. Inform. Theory Appl., Victoria, BC, Canada, Sept. 1996.
[26] G. D. Forney, Jr., and A. Vardy, "Generalized minimum distance decoding of Euclidean-space codes and lattices," IEEE Trans. Inform. Theory, vol. 42, pp. 1992-2026, Nov. 1996.
[27] E. M. Gabidulin, "Bounds for the probability of decoding error using linear codes over memoryless channels," Probl. Pered. Inform., vol. 3, pp. 55-62, 1967.
[28] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[29] H. Herzberg, Y. Be'ery, and J. Snyders, "Concatenated multilevel block coded modulation," IEEE Trans. Commun., vol. 41, pp. 41-49, Jan. 1993.
[30] J. Huber and U. Wachsmann, "Capacities of equivalent channels in multilevel coding schemes," Electron. Lett., vol. 30, pp. 557-558, Mar. 1994.
[31] H. Imai and S. Hirakawa, "A new multilevel coding method using error-correcting codes," IEEE Trans. Inform. Theory, vol. IT-23, pp. 371-377, May 1977.
[32] Y. Kofman, E. Zehavi, and S. Shamai (Shitz), "Analysis of a multilevel coded modulation system," in Proc. 1990 Bilkent Int. Conf. New Trends in Communications, Control and Signal Processing, Ankara, Turkey, July 1990, pp. 376-382.
[33] Y. Kofman, E. Zehavi, and S. Shamai (Shitz), "Performance analysis of a multilevel coded modulation system," IEEE Trans. Commun., vol. 42, pp. 299-312, Feb. 1994.
[34] F. R. Kschischang and S. Pasupathy, "Optimal nonuniform signaling for Gaussian channels," IEEE Trans. Inform. Theory, vol. 39, pp. 913-929, May 1993.
[35] J. Leech and N. J. A. Sloane, "Sphere packings and error-correcting codes," Can. J. Math., vol. 23, pp. 718-745, 1971.
[36] T. Linder, C. Schlegel, and K. Zeger, "Corrected proof of de Buda's theorem," IEEE Trans. Inform. Theory, vol. 39, pp. 1735-1737, Sept. 1993.
[37] H.-A. Loeliger, "On existence proofs for asymptotically good Euclidean-space group codes," in Proc. Joint DIMACS/IEEE Workshop on Coding and Quantization, Oct. 1992, pp. 15-18.
[38] H.-A. Loeliger, "On the basic averaging arguments for linear codes," in Communications and Cryptography, R. Blahut et al., Eds. Boston, MA: Kluwer, 1994, pp. 251-261.
[39] H.-A. Loeliger, "Averaging bounds for lattices and linear codes," IEEE Trans. Inform. Theory, vol. 43, pp. 1767-1773, Nov. 1997.
[40] D. Long, B. Rimoldi, and R. Urbanke, "Approaching AWGN channel capacity without active shaping," in Proc. 1997 IEEE Int. Symp. Information Theory, Ulm, Germany, June 1997, p. 374.
[41] P. Ljungberg, "Some results on sequential decoding of high-rate trellis codes," Licentiate thesis, Dept. Inform. Theory, Lund Univ., Lund, Sweden, Mar. 1996.
[42] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low-density parity-check codes," Electron. Lett., vol. 32, pp. 1645-1646, Aug. 1996; reprinted with corrections in vol. 33, pp. 457-458, Mar. 1997.
[43] J. L. Massey, "Coding and modulation in digital communications," in Proc. 1974 Zürich Int. Sem. Digital Communication, Zürich, Switzerland, Mar. 1974.
[44] P. M. Maurer, "Sequential decoding of trellis codes through ISI channels," M.Sc. thesis, Dept. Elec. Eng. Comput. Sci., MIT, Cambridge, MA, June 1996.
[45] G. Poltyrev, "On coding without restrictions for the AWGN channel," IEEE Trans. Inform. Theory, vol. 40, pp. 409-417, Mar. 1994.
[46] G. J. Pottie and D. P. Taylor, "Multilevel codes based on partitioning," IEEE Trans. Inform. Theory, vol. 35, pp. 87-98, Jan. 1989.
[47] B. Rimoldi and R. Urbanke, "A rate-splitting approach to the Gaussian multiple-access channel," IEEE Trans. Inform. Theory, vol. 42, pp. 364-375, Mar. 1996.
[48] S. Shamai (Shitz), "On permutation-invariant multiple-access channels with application to multiple-access spread-spectrum," AEÜ (Electron. Commun.), vol. 41, pp. 347-355, Nov.-Dec. 1987.
[49] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379-423 and 623-656, 1948.
[50] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell Syst. Tech. J., vol. 38, pp. 611-656, May 1959.
[51] V. Tarokh, A. Vardy, and K. Zeger, "Universal bound on the performance of lattice codes," IEEE Trans. Inform. Theory, vol. 45, pp. 670-681, Mar. 1999.
[52] M. A. Tsfasman and S. G. Vlăduţ, Algebraic-Geometric Codes. Boston, MA: Kluwer, 1991.
[53] R. Urbanke, private communication, Sept. 1997.
[54] R. Urbanke and B. Rimoldi, "Lattice codes can achieve capacity on the AWGN channel," IEEE Trans. Inform. Theory, vol. 44, pp. 273-278, Jan. 1998.
[55] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Trans. Inform. Theory, vol. IT-28, pp. 55-67, Jan. 1982.
[56] U. Wachsmann and J. Huber, "Power and bandwidth efficient digital communication using turbo codes in multilevel codes," Euro. Trans. Telecomm., vol. 6, pp. 557-567, Sept. 1995.
[57] U. Wachsmann, R. F. H. Fischer, and J. B. Huber, "Multilevel codes: Theoretical concepts and practical design rules," IEEE Trans. Inform. Theory, vol. 45, pp. 1361-1391, July 1999.
[58] F.-Q. Wang and D. J. Costello, Jr., "Sequential decoding of trellis codes at high spectral efficiencies," IEEE Trans. Inform. Theory, vol. 43, pp. 2013-2019, Nov. 1997.
[59] L.-F. Wei, "Rotationally invariant convolutional channel coding with expanded signal space Part II: Nonlinear codes," IEEE J. Select. Areas Commun., vol. SAC-2, pp. 672-686, 1984.
[60] L.-F. Wei, "Trellis-coded modulation with multidimensional constellations," IEEE Trans. Inform. Theory, vol. IT-33, pp. 483-501, July 1987.
[61] L.-F. Wei (AT&T), "A new 4-D 64-state rate-4/5 trellis code," Contribution D19, ITU-T Study Group 14, Geneva, Switzerland, Sept. 1993.
[62] R. D. Wesel and J. M. Cioffi, "Achievable rates for Tomlinson-Harashima precoding," IEEE Trans. Inform. Theory, vol. 44, pp. 824-831, Mar. 1998.
[63] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1965.
[64] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inform. Theory, vol. 42, pp. 1152-1159, July 1996.