ECEN 5682 Theory and Practice of Error Control Codes


ECEN 5682 Theory and Practice of Error Control Codes Convolutional Codes University of Colorado Spring 2007

Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols. Long data sequences are broken up into blocks of k symbols and each block is encoded independently of all others. Convolutional encoders, on the other hand, convert an entire data sequence, regardless of its length, into a single code sequence by using convolution and multiplexing operations. In general, it is convenient to assume that both the data sequences (u_0, u_1, ...) and the code sequences (c_0, c_1, ...) are semi-infinite sequences and to express them in the form of a power series.

Definition: The power series associated with the data sequence u = (u_0, u_1, u_2, ...) is defined as

    u(D) = u_0 + u_1 D + u_2 D^2 + ... = Σ_{i=0}^{∞} u_i D^i,

where u(D) is called the data power series. Similarly, the code power series c(D) associated with the code sequence c = (c_0, c_1, c_2, ...) is defined as

    c(D) = c_0 + c_1 D + c_2 D^2 + ... = Σ_{i=0}^{∞} c_i D^i.

The indeterminate D has the meaning of delay, similar to z^{-1} in the z-transform, and D is sometimes called the delay operator.

A general rate R = k/n convolutional encoder converts k data sequences into n code sequences using a k × n transfer function matrix G(D), as shown in the following figure: a demultiplexer splits u(D) into u^(1)(D), ..., u^(k)(D), the convolutional encoder G(D) produces c^(1)(D), ..., c^(n)(D), and a multiplexer combines these into c(D). Fig.1 Block Diagram of a k-input, n-output Convolutional Encoder

The data power series u(D) is split up into k subsequences, denoted u^(1)(D), u^(2)(D), ..., u^(k)(D) in power series notation, using a demultiplexer whose details are shown in the figure below: from u = (u_0, u_1, ..., u_{k-1}, u_k, u_{k+1}, ..., u_{2k-1}, u_{2k}, u_{2k+1}, ..., u_{3k-1}, ...), subsequence u^(h) receives every k-th symbol, i.e., u^(h)_i = u_{ik+h-1}. Fig.2 Demultiplexing from u(D) into u^(1)(D), ..., u^(k)(D)

The code subsequences, denoted by c^(1)(D), c^(2)(D), ..., c^(n)(D) in power series notation, at the output of the convolutional encoder are multiplexed into a single power series c(D) for transmission over a channel, i.e., c = (c_0, c_1, ..., c_{n-1}, c_n, c_{n+1}, ..., c_{2n-1}, c_{2n}, ...) with c_{in+l-1} = c^(l)_i. Fig.3 Multiplexing of c^(1)(D), ..., c^(n)(D) into Single Output c(D)

Definition: A q-ary generator polynomial of degree m is a polynomial in D of the form

    g(D) = g_0 + g_1 D + g_2 D^2 + ... + g_m D^m = Σ_{i=0}^{m} g_i D^i,

with m + 1 q-ary coefficients g_i. The degree m is also called the memory order of g(D).

Consider computing the product (using modulo-q arithmetic) c(D) = u(D) g(D). Written out, this looks as follows:

    c_0 + c_1 D + c_2 D^2 + ...
      = (g_0 + g_1 D + g_2 D^2 + ... + g_m D^m) (u_0 + u_1 D + u_2 D^2 + ...)
      =   g_0 u_0 + g_0 u_1 D + g_0 u_2 D^2 + ... + g_0 u_m D^m + g_0 u_{m+1} D^{m+1} + g_0 u_{m+2} D^{m+2} + ...
        + g_1 u_0 D + g_1 u_1 D^2 + ... + g_1 u_{m-1} D^m + g_1 u_m D^{m+1} + g_1 u_{m+1} D^{m+2} + ...
        + g_2 u_0 D^2 + ... + g_2 u_{m-2} D^m + g_2 u_{m-1} D^{m+1} + g_2 u_m D^{m+2} + ...
        + ...
        + g_m u_0 D^m + g_m u_1 D^{m+1} + g_m u_2 D^{m+2} + ...

Thus, the coefficients of c(D) are

    c_j = Σ_{i=0}^{m} g_i u_{j-i},  j = 0, 1, 2, ...,  where u_l = 0 for l < 0,

i.e., the code sequence (c_0, c_1, c_2, ...) is the convolution of the data sequence (u_0, u_1, u_2, ...) with the generator sequence (g_0, g_1, ..., g_m).
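This single-generator convolution is easy to check in code. A minimal Python sketch (the particular sequences and the product shown in the comment are illustrative, not from the notes):

```python
def convolve(u, g, q=2):
    """Convolution c_j = sum_i g_i * u_{j-i} (mod q), with u_l = 0 for l < 0."""
    m = len(g) - 1
    return [sum(g[i] * (u[j - i] if 0 <= j - i < len(u) else 0)
                for i in range(m + 1)) % q
            for j in range(len(u) + m)]   # m extra steps flush the register

# (1 + D + D^3)(1 + D + D^2) = 1 + D^4 + D^5 over GF(2)
print(convolve([1, 1, 0, 1], [1, 1, 1]))  # -> [1, 0, 0, 0, 1, 1]
```

For a finite data sequence the output has len(u) + m symbols, since the shift register needs m further steps to empty.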

A convenient way to implement the convolution

    c_j = Σ_{i=0}^{m} g_i u_{j-i},  j = 0, 1, 2, ...,  where u_l = 0 for l < 0,

is to use a shift register with m memory cells (cleared to zero at time t = 0), as shown in the following figure: the data symbols ..., u_2, u_1, u_0 enter the register, the input and the register contents are multiplied by the taps g_0, g_1, g_2, ..., g_m, and the products are summed to form the output ..., c_2, c_1, c_0. Fig.4 Block Diagram for Convolution of u(D) with g(D)

A general k-input, n-output convolutional encoder consists of k such shift registers, each of which is connected to the outputs via n generator polynomials.

Definition: A q-ary linear and time-invariant convolutional encoder with k inputs and n outputs is specified by a k × n matrix G(D), called the transfer function matrix, which consists of generator polynomials g^(l)_h(D), h = 1, 2, ..., k, l = 1, 2, ..., n, as follows:

    G(D) = [ g^(1)_1(D)  g^(2)_1(D)  ...  g^(n)_1(D)
             g^(1)_2(D)  g^(2)_2(D)  ...  g^(n)_2(D)
             ...
             g^(1)_k(D)  g^(2)_k(D)  ...  g^(n)_k(D) ]

The generator polynomials have q-ary coefficients, degree m_{hl}, and are of the form

    g^(l)_h(D) = g^(l)_{0h} + g^(l)_{1h} D + g^(l)_{2h} D^2 + ... + g^(l)_{m_{hl} h} D^{m_{hl}}.

Define the power series vectors u(D) = [u^(1)(D), u^(2)(D), ..., u^(k)(D)] and c(D) = [c^(1)(D), c^(2)(D), ..., c^(n)(D)]. The operation of a k-input, n-output convolutional encoder can then be concisely expressed as c(D) = u(D) G(D). Each individual output sequence is obtained as

    c^(l)(D) = Σ_{h=1}^{k} u^(h)(D) g^(l)_h(D).

Note: By setting u^(h)(D) = 1 in the above equation, it is easily seen that the generator sequence (g^(l)_{0h}, g^(l)_{1h}, g^(l)_{2h}, ..., g^(l)_{m_{hl} h}) is the unit impulse response from input h to output l of the convolutional encoder.

Definition: The total memory M of a convolutional encoder is the total number of memory elements in the encoder, i.e.,

    M = Σ_{h=1}^{k} max_{1≤l≤n} m_{hl}.

Note that max_{1≤l≤n} m_{hl} is the number of memory cells, or the memory order, of the shift register for the input with index h.

Definition: The maximal memory order m of a convolutional encoder is the length of the longest input shift register, i.e.,

    m = max_{1≤h≤k} max_{1≤l≤n} m_{hl}.

Equivalently, m is equal to the highest degree of any of the generator polynomials in G(D).

Definition: The constraint length K of a convolutional encoder is the maximum number of symbols in a single output stream that can be affected by any input symbol, i.e.,

    K = 1 + m = 1 + max_{1≤h≤k} max_{1≤l≤n} m_{hl}.

Note: This definition of constraint length is not in universal use. Some authors define the constraint length to be the maximum number of symbols in all output streams that can be affected by any input symbol, which is nK in the notation used here.

Example: Encoder #1. Binary rate R = 1/2 encoder with constraint length K = 3 and transfer function matrix

    G(D) = [ g^(1)(D)  g^(2)(D) ] = [ 1 + D^2   1 + D + D^2 ].

A block diagram for this encoder is shown in the figure below: the input ..., u_2, u_1, u_0 feeds a two-cell shift register; c^(1) sums the input and the second cell, and c^(2) sums the input and both cells. Fig.5 Binary Rate 1/2 Convolutional Encoder with K = 3

At time t = 0 the contents of the two memory cells are assumed to be zero. Using this encoder, the data sequence u = (u_0, u_1, ...) = (1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, ...), for example, is encoded as follows:

    u      = 110100111010...
    u·D^2  = 001101001110...
    -----------------------
    c^(1)  = 111001110100...

    u      = 110100111010...
    u·D    = 011010011101...
    u·D^2  = 001101001110...
    -----------------------
    c^(2)  = 100011101001...

After multiplexing this becomes

    c = (c_0 c_1, c_2 c_3, c_4 c_5, ...) = (c^(1)_0 c^(2)_0, c^(1)_1 c^(2)_1, c^(1)_2 c^(2)_2, ...)
      = (11, 10, 10, 00, 01, 11, 11, 10, 01, 10, 00, 01, ...).

The pairs of code symbols that each data symbol generates are called code frames.
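This worked example can be verified mechanically. A minimal Python sketch, with the tap vectors (1, 0, 1) and (1, 1, 1) read off G(D) = [1 + D^2   1 + D + D^2]:

```python
def conv_encode_r12(u, g1=(1, 0, 1), g2=(1, 1, 1)):
    """Rate-1/2 binary encoder: frame t is (u*g1 at t, u*g2 at t)."""
    def tap(g, j):
        # convolution sum c_j = sum_i g_i u_{j-i} (mod 2), u_l = 0 for l < 0
        return sum(g[i] * u[j - i] for i in range(len(g)) if 0 <= j - i < len(u)) % 2
    return [(tap(g1, j), tap(g2, j)) for j in range(len(u))]

u = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
frames = conv_encode_r12(u)
print(["%d%d" % f for f in frames])
# -> ['11', '10', '10', '00', '01', '11', '11', '10', '01', '10', '00', '01']
```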

Definition: Consider a rate R = k/n convolutional encoder, let

    u = (u_0 u_1 ... u_{k-1}, u_k u_{k+1} ... u_{2k-1}, u_{2k} u_{2k+1} ... u_{3k-1}, ...)
      = (u^(1)_0 u^(2)_0 ... u^(k)_0, u^(1)_1 u^(2)_1 ... u^(k)_1, u^(1)_2 u^(2)_2 ... u^(k)_2, ...),

and let

    c = (c_0 c_1 ... c_{n-1}, c_n c_{n+1} ... c_{2n-1}, c_{2n} c_{2n+1} ... c_{3n-1}, ...)
      = (c^(1)_0 c^(2)_0 ... c^(n)_0, c^(1)_1 c^(2)_1 ... c^(n)_1, c^(1)_2 c^(2)_2 ... c^(n)_2, ...).

Then the set of data symbols (u_{ik} u_{ik+1} ... u_{(i+1)k-1}) is called the i-th data frame, and the corresponding set of code symbols (c_{in} c_{in+1} ... c_{(i+1)n-1}) is called the i-th code frame, for i = 0, 1, 2, ....

Example: Encoder #2. Binary rate R = 2/3 encoder with constraint length K = 2 and transfer function matrix

    G(D) = [ g^(1)_1(D)  g^(2)_1(D)  g^(3)_1(D) ]   [ 1 + D   D   1 + D ]
           [ g^(1)_2(D)  g^(2)_2(D)  g^(3)_2(D) ] = [   D     1     1   ]

A block diagram for this encoder is shown in the figure below. Fig.6 Binary Rate 2/3 Convolutional Encoder with K = 2

In this case the data sequence u = (u_0 u_1, u_2 u_3, u_4 u_5, ...) = (11, 01, 00, 11, 10, 10, ...) is first demultiplexed into u^(1) = (1, 0, 0, 1, 1, 1, ...) and u^(2) = (1, 1, 0, 1, 0, 0, ...), and then encoded as follows:

    u^(1)    = 100111...
    u^(1)·D  = 010011...
    u^(2)·D  = 011010...
    ----------------
    c^(1)    = 101110...

    u^(1)·D  = 010011...
    u^(2)    = 110100...
    ----------------
    c^(2)    = 100111...

    u^(1)    = 100111...
    u^(1)·D  = 010011...
    u^(2)    = 110100...
    ----------------
    c^(3)    = 000000...

Multiplexing the code sequences c^(1), c^(2), and c^(3) yields the single code sequence

    c = (c_0 c_1 c_2, c_3 c_4 c_5, ...) = (c^(1)_0 c^(2)_0 c^(3)_0, c^(1)_1 c^(2)_1 c^(3)_1, ...)
      = (110, 000, 100, 110, 110, 010, ...).

Because this is a rate 2/3 encoder, data frames of length 2 are encoded into code frames of length 3.
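The general k-input, n-output encoding c^(l)(D) = Σ_h u^(h)(D) g^(l)_h(D) can be sketched as follows; the coefficient lists below describe encoder #2 and reproduce the frames above:

```python
def conv_encode(U, G, q=2):
    """k-input, n-output encoder: c^(l) = sum_h u^(h) * g^(l)_h (mod q).
    U: list of k input sequences; G[h][l]: coefficient list of g^(l+1)_(h+1)."""
    k, n = len(G), len(G[0])
    L = len(U[0])
    C = []
    for l in range(n):
        c = [0] * L
        for h in range(k):
            g = G[h][l]
            for j in range(L):
                c[j] = (c[j] + sum(g[i] * U[h][j - i]
                                   for i in range(len(g)) if 0 <= j - i < L)) % q
        C.append(c)
    # multiplex: frame j is (c^(1)_j, ..., c^(n)_j)
    return ["".join(str(C[l][j]) for l in range(n)) for j in range(L)]

# Encoder #2: G(D) = [[1+D, D, 1+D], [D, 1, 1]] as coefficient lists
G2 = [[[1, 1], [0, 1], [1, 1]],
      [[0, 1], [1], [1]]]
print(conv_encode([[1, 0, 0, 1, 1, 1], [1, 1, 0, 1, 0, 0]], G2))
# -> ['110', '000', '100', '110', '110', '010']
```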

Definition: Let u = (u_0, u_1, u_2, ...) be a data sequence (before demultiplexing) and let c = (c_0, c_1, c_2, ...) be the corresponding code sequence (after multiplexing). Then, in analogy to block codes, the generator matrix G of a convolutional encoder is defined such that c = u G. Note that G for a convolutional encoder has infinitely many rows and columns. Let G(D) = [g^(l)_h(D)] be the transfer function matrix of a convolutional encoder with generator polynomials

    g^(l)_h(D) = Σ_{i=0}^{m} g^(l)_{ih} D^i,  h = 1, 2, ..., k,  l = 1, 2, ..., n,

where m is the maximal memory order of the encoder. Define the k × n matrices

    G_i = [ g^(1)_{i1}  g^(2)_{i1}  ...  g^(n)_{i1}
            g^(1)_{i2}  g^(2)_{i2}  ...  g^(n)_{i2}
            ...
            g^(1)_{ik}  g^(2)_{ik}  ...  g^(n)_{ik} ],   i = 0, 1, 2, ..., m.

In terms of these matrices, the generator matrix G can be conveniently expressed as (all entries below the diagonal are zero)

    G = [ G_0  G_1  G_2  ...  G_m
               G_0  G_1  ...  G_{m-1}  G_m
                    G_0  ...  G_{m-2}  G_{m-1}  G_m
                         ...
                              G_0      G_1      G_2  ...
                                       G_0      G_1  ...
                                                G_0  ...  ]

Note that the first row of this matrix is the unit impulse response (after multiplexing the outputs) from input stream 1, the second row is the unit impulse response (after multiplexing the outputs) from input stream 2, etc.

Example: Encoder #1 has m = 2, G_0 = [1 1], G_1 = [0 1], G_2 = [1 1], and thus generator matrix

    G = [ 11 01 11 00 00 00 ...
          00 11 01 11 00 00 ...
          00 00 11 01 11 00 ...
          00 00 00 11 01 11 ...
          .................... ]

Using this, it is easy to compute, for example, the list of (non-zero) datawords and corresponding codewords shown on the next page.
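A sketch of building a truncated portion of this generator matrix from the blocks G_0, ..., G_m and encoding via c = u G; the check value matches the second row of the table on the next page (u = 1, 1, 0, 0, 0):

```python
def generator_matrix(G_blocks, n, rows):
    """Build `rows` rows of the (truncated) generator matrix G.
    G_blocks[i] is the k x n matrix G_i; block-row t places G_i at columns (t+i)*n."""
    k = len(G_blocks[0])
    m = len(G_blocks) - 1
    cols = n * (rows // k + m)
    M = []
    for t in range(rows // k):
        for h in range(k):
            row = [0] * cols
            for i, Gi in enumerate(G_blocks):
                for l in range(n):
                    row[(t + i) * n + l] = Gi[h][l]
            M.append(row)
    return M

# Encoder #1: G_0 = [1 1], G_1 = [0 1], G_2 = [1 1]  (k = 1, n = 2)
G = generator_matrix([[[1, 1]], [[0, 1]], [[1, 1]]], n=2, rows=5)
u = [1, 1, 0, 0, 0]
c = [sum(ui * row[j] for ui, row in zip(u, G)) % 2 for j in range(len(G[0]))]
print(c[:10])  # first five code frames: 11 10 10 11 00
```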

    u = (u_0, u_1, ...)    c = (c_0 c_1, c_2 c_3, ...)
    1,0,0,0,0,...          11,01,11,00,00,00,00,...
    1,1,0,0,0,...          11,10,10,11,00,00,00,...
    1,0,1,0,0,...          11,01,00,01,11,00,00,...
    1,1,1,0,0,...          11,10,01,10,11,00,00,...
    1,0,0,1,0,...          11,01,11,11,01,11,00,...
    1,1,0,1,0,...          11,10,10,00,01,11,00,...
    1,0,1,1,0,...          11,01,00,10,10,11,00,...
    1,1,1,1,0,...          11,10,01,01,10,11,00,...

One thing that can be deduced from this list is that most likely the minimum weight of any non-zero codeword is 5, and thus, because convolutional codes are linear, the minimum distance, called the minimum free distance for convolutional codes for historical reasons, is d_free = 5.
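The value d_free = 5 can be checked by brute force over short terminated data sequences. A minimal sketch (an exhaustive spot check up to data length 8, not a proof for all lengths):

```python
from itertools import product

def encode1(u):
    """Encoder #1 (taps (1,0,1) and (1,1,1)); returns the flat terminated code sequence."""
    g1, g2 = (1, 0, 1), (1, 1, 1)
    out = []
    for j in range(len(u) + 2):            # two extra steps flush the registers
        uj = lambda i: u[i] if 0 <= i < len(u) else 0
        out.append(sum(g1[i] * uj(j - i) for i in range(3)) % 2)
        out.append(sum(g2[i] * uj(j - i) for i in range(3)) % 2)
    return out

# minimum codeword weight over all non-zero data sequences of length <= 8
d_free_est = min(sum(encode1(list(u))) for L in range(1, 9)
                 for u in product([0, 1], repeat=L) if any(u))
print(d_free_est)  # -> 5
```

The minimum is attained already by u = (1), whose codeword 11, 01, 11 has weight 5.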

Example: Encoder #2 has m = 1,

    G_0 = [ 1 0 1 ]        G_1 = [ 1 1 1 ]
          [ 0 1 1 ],             [ 1 0 0 ],

and therefore generator matrix

    G = [ 101 111 000 000 000 ...
          011 100 000 000 000 ...
          000 101 111 000 000 ...
          000 011 100 000 000 ...
          000 000 101 111 000 ...
          000 000 011 100 000 ...
          000 000 000 101 111 ...
          000 000 000 011 100 ...
          ....................... ]

The first few non-zero codewords that this encoder produces are

    u = (u_0 u_1, ...)    c = (c_0 c_1 c_2, ...)
    10,00,00,...          101,111,000,000,...
    01,00,00,...          011,100,000,000,...
    11,00,00,...          110,011,000,000,...
    10,10,00,...          101,010,111,000,...
    01,10,00,...          011,001,111,000,...
    11,10,00,...          110,110,111,000,...
    10,01,00,...          101,100,100,000,...
    01,01,00,...          011,111,100,000,...
    11,01,00,...          110,000,100,000,...
    10,11,00,...          101,001,011,000,...
    01,11,00,...          011,010,011,000,...
    11,11,00,...          110,101,011,000,...

Definition: The code generated by a q-ary convolutional encoder with transfer function matrix G(D) is the set of all vectors of semi-infinite sequences of encoded symbols c(D) = u(D) G(D), where u(D) is any vector of q-ary data sequences.

Definition: Two convolutional encoders with transfer function matrices G_1(D) and G_2(D) are said to be equivalent if they generate the same code.

Definition: A systematic convolutional encoder is a convolutional encoder whose codewords have the property that each data frame appears unaltered in the first k positions of the first code frame that it affects.

Note: When dealing with convolutional codes and encoders it is important to carefully distinguish between the properties of the code (e.g., the minimum distance of a code) and the properties of the encoder (e.g., whether an encoder is systematic or not).

Example: Neither encoder #1 nor encoder #2 is systematic. But the following binary rate 1/3 encoder, which will be called encoder #3, with constraint length K = 4 and transfer function matrix

    G(D) = [ 1   1 + D + D^3   1 + D + D^2 + D^3 ],

is a systematic convolutional encoder. Its generator matrix is

    G = [ 111 011 001 011 000 000 000 ...
          000 111 011 001 011 000 000 ...
          000 000 111 011 001 011 000 ...
          000 000 000 111 011 001 011 ...
          ............................... ]

Note that the first column of each triplet of columns has only a single 1 in it, so that the first symbol in each code frame is the corresponding data symbol from the data sequence u.

Much more interesting systematic encoders can be obtained if one allows not only FIR (finite impulse response) but also IIR (infinite impulse response) filters in the encoder. In terms of the transfer function matrix G(D), this means that the use of rational polynomial expressions instead of generator polynomials as matrix elements is allowed. The following example illustrates this.

Example: Encoder #4. Binary rate R = 1/3 systematic encoder with constraint length K = 4 and rational transfer function matrix

    G(D) = [ 1   (1 + D + D^3)/(1 + D^2 + D^3)   (1 + D + D^2 + D^3)/(1 + D^2 + D^3) ].

A block diagram of this encoder is shown in the next figure.

[Block diagram of encoder #4: c^(1)(D) = u(D), while c^(2)(D) and c^(3)(D) are formed by a feedback shift register realizing the rational entries of G(D).] Fig.7 Binary Rate 1/3 Systematic Convolutional Encoder with K = 4

Convolutional encoders have total memory M. Thus, a time-invariant q-ary encoder can be regarded as a finite state machine (FSM) with q^M states, and it can be completely described by a state transition diagram called the encoder state diagram. Such a state diagram can be used to encode a data sequence of arbitrary length. In addition, the encoder state diagram can also be used to obtain important information about the performance of a convolutional code and its associated encoder.

Example: Encoder state diagram for encoder #1. This is a binary encoder with G(D) = [1 + D^2   1 + D + D^2] that uses 2 memory cells and therefore has 2^2 = 4 states. With reference to the block diagram in Figure 5, label the encoder states as follows: S_0 = 00, S_1 = 10, S_2 = 01, S_3 = 11, where the first binary digit corresponds to the content of the first (leftmost) delay cell of the encoder, and the second digit corresponds to the content of the second delay cell. At any given time t (measured in frames), the encoder is in a particular state S^(t). The next state, S^(t+1), at time t + 1 depends on the value of the data frame at time t, which in the case of a rate R = 1/2 encoder is simply u_t. The code frame c^(1)_t c^(2)_t that the encoder outputs at time t depends only on S^(t) and u_t (and the transfer function matrix G(D), of course). Thus, the possible transitions between the states are labeled with u_t / c^(1)_t c^(2)_t. The resulting encoder state diagram for encoder #1 is shown in the following figure.

[State diagram with transitions labeled u_t / c^(1)_t c^(2)_t, e.g., 0/00 from S_0 to itself and 1/11 from S_0 to S_1.] Fig.8 Encoder State Diagram for Binary Rate 1/2 Encoder with K = 3

To encode the data sequence u = (0, 1, 0, 1, 1, 1, 0, 0, 1, ...), for instance, start in S_0 at t = 0, return to S_0 at t = 1 because u_0 = 0, then move on to S_1 at t = 2, S_2 at t = 3, S_1 at t = 4, S_3 at t = 5, S_3 at t = 6 (self-loop around S_3), S_2 at t = 7, S_0 at t = 8, and finally S_1 at t = 9. The resulting code sequence (after multiplexing) is c = (00, 11, 01, 00, 10, 01, 10, 11, 11, ...).
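The state-diagram walk above can be reproduced with a small FSM sketch, where the state (s1, s2) holds the contents of the two delay cells (so S_0 = 00, S_1 = 10, S_2 = 01, S_3 = 11):

```python
def fsm_encode(u):
    """Encoder #1 as a finite state machine; returns code frames and visited states."""
    s1 = s2 = 0
    frames, states = [], []
    for ut in u:
        frames.append((ut ^ s2, ut ^ s1 ^ s2))  # c(1) = u+u[t-2], c(2) = u+u[t-1]+u[t-2]
        s1, s2 = ut, s1                         # shift-register update
        states.append("S%d" % (s1 + 2 * s2))    # state label after the transition
    return frames, states

frames, states = fsm_encode([0, 1, 0, 1, 1, 1, 0, 0, 1])
print(["%d%d" % f for f in frames])
# -> ['00', '11', '01', '00', '10', '01', '10', '11', '11']
print(states[-1])  # -> S1 (the state reached at t = 9)
```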

Example: Encoder state diagram for encoder #2 with

    G(D) = [ 1 + D   D   1 + D ]
            [   D    1     1   ]

and block diagram as shown in Figure 6. This encoder also has M = 2, but each of the two memory cells receives its input at time t from a different data stream. The following convention is used to label the 4 possible states (the first bit corresponds to the upper memory cell in Figure 6): S_0 = 00, S_1 = 10, S_2 = 01, S_3 = 11. Because the encoder has rate R = 2/3, the transitions in the encoder state diagram from time t to time t + 1 are now labeled with u^(1)_t u^(2)_t / c^(1)_t c^(2)_t c^(3)_t. The result is shown in the next figure.

[State diagram with transitions labeled u^(1)_t u^(2)_t / c^(1)_t c^(2)_t c^(3)_t.] Fig.9 Encoder State Diagram for Binary Rate 2/3 Encoder with K = 2

Example: The figure on the next slide shows the encoder state diagram for encoder #4, whose block diagram was given in Figure 7. This encoder has rational transfer function matrix

    G(D) = [ 1   (1 + D + D^3)/(1 + D^2 + D^3)   (1 + D + D^2 + D^3)/(1 + D^2 + D^3) ],

and M = 3. The encoder states are labeled using the following convention (the leftmost bit corresponds to the leftmost memory cell in Figure 7): S_0 = 000, S_1 = 100, S_2 = 010, S_3 = 110, S_4 = 001, S_5 = 101, S_6 = 011, S_7 = 111.

[State diagram with transitions labeled u_t / c^(1)_t c^(2)_t c^(3)_t.] Fig.10 Encoder State Diagram for R = 1/3, K = 4 Systematic Encoder

Trellis Diagrams

Because the convolutional encoders considered here are time-invariant, the encoder state diagram describes their behavior for all times t. But sometimes, e.g., for decoding convolutional codes, it is convenient to show all possible states of an encoder separately for each time t (measured in frames), together with all possible transitions from states at time t to states at time t + 1. The resulting diagram is called a trellis diagram. Example: For encoder #1 with G(D) = [1 + D^2   1 + D + D^2] and M = 2 (and thus 4 states) the trellis diagram is shown in the figure on the next slide.

[Trellis for t = 0, ..., 5 over states S_0, ..., S_3; branches are labeled with the code frames, e.g., 00 and 11 leaving S_0.] Fig.11 Trellis Diagram for Binary Rate 1/2 Encoder with K = 3

Note that the trellis always starts with the all-zero state S_0 at time t = 0 as the root node. This corresponds to the convention that convolutional encoders must be initialized to the all-zero state before they are first used. The labels on the branches are the code frames that the encoder outputs when that particular transition from a state at time t to a state at time t + 1 is made in response to a data symbol u_t. The highlighted path in Figure 11, for example, corresponds to the data sequence u = (1, 1, 0, 1, 0, ...) and the resulting code sequence c = (11, 10, 10, 00, 01, ...).

In its simplest and most common form, the Viterbi algorithm is a maximum likelihood (ML) decoding algorithm for convolutional codes. Recall that a ML decoder outputs the estimate ĉ = c_i iff i is the index (or one of them, selected at random if there are several) which maximizes the expression p_{Y|X}(v | c_i) over all codewords c_0, c_1, c_2, .... The conditional pmf p_{Y|X} defines the channel model with input X and output Y which is used, and v is the received (and possibly corrupted) codeword at the output of the channel. For the important special case of memoryless channels used without feedback, the computation of p_{Y|X} can be considerably simplified and brought into a form where metrics along the branches of a trellis can be added up and then a ML decision can be obtained by comparing these sums. In a nutshell, this is what the Viterbi algorithm does.

Definition: A channel with input X and output Y is said to be memoryless if

    p(y_j | x_j, x_{j-1}, ..., x_0, y_{j-1}, ..., y_0) = p_{Y|X}(y_j | x_j).

Definition: A channel with input X and output Y is used without feedback if

    p(x_j | x_{j-1}, ..., x_0, y_{j-1}, ..., y_0) = p(x_j | x_{j-1}, ..., x_0).

Theorem: For a memoryless channel used without feedback,

    p_{Y|X}(y | x) = Π_{j=0}^{N-1} p_{Y|X}(y_j | x_j),

where N is the length of the channel input and output vectors X and Y. Proof: Left as an exercise.

Definition: The ML decoding rule at the output Y of a discrete memoryless channel (DMC) with input X, used without feedback, is: Output code sequence estimate ĉ = c_i iff i maximizes the likelihood function

    p_{Y|X}(v | c_i) = Π_{j=0}^{N-1} p_{Y|X}(v_j | c_{ij}),

over all code sequences c_i = (c_{i0}, c_{i1}, c_{i2}, ...) for i = 0, 1, 2, .... The pmf p_{Y|X} is given by specifying the transition probabilities of the DMC, and the v_j are the received symbols at the output of the channel. For block codes N is the blocklength of the code. For convolutional codes we set N = n (L + m), where L is the number of data frames that are encoded and m is the maximal memory order of the encoder.

Definition: The log likelihood function of a received sequence v at the channel output with respect to code sequence c_i is the expression

    log[ p_{Y|X}(v | c_i) ] = Σ_{j=0}^{N-1} log[ p_{Y|X}(v_j | c_{ij}) ],

where the logarithm can be taken to any base.

Definition: The path metric µ(v | c_i) for a received sequence v given a code sequence c_i is computed as

    µ(v | c_i) = Σ_{j=0}^{N-1} µ(v_j | c_{ij}),

where the symbol metrics µ(v_j | c_{ij}) are defined as

    µ(v_j | c_{ij}) = α ( log[p_{Y|X}(v_j | c_{ij})] + f(v_j) ).

Here α is any positive number and f(v_j) is a completely arbitrary real-valued function defined over the channel output alphabet B. Usually, one selects for every y ∈ B

    f(y) = −log[ min_{x∈A} p_{Y|X}(y | x) ],

where A is the channel input alphabet. In this way the smallest symbol metric will always be 0. The quantity α is then adjusted so that all nonzero metrics are (approximated by) small positive integers.

Example: A memoryless BSC with transition probability ε < 0.5 is characterized by

    p_{Y|X}(v|c)          v = 0    v = 1
    c = 0                 1 − ε      ε
    c = 1                   ε      1 − ε
    min_c p_{Y|X}(v|c)      ε        ε

Thus, setting f(v) = −log[min_c p_{Y|X}(v|c)] yields f(0) = f(1) = −log ε.

With this, the bit metrics become

    µ(v|c)   v = 0                    v = 1
    c = 0    α(log(1−ε) − log ε)      0
    c = 1    0                        α(log(1−ε) − log ε)

Now choose α as

    α = 1 / (log(1−ε) − log ε),

so that the following simple bit metrics for the BSC with ε < 0.5 are obtained:

    µ(v|c)   v = 0   v = 1
    c = 0      1       0
    c = 1      0       1

Definition: The partial path metric µ^(t)(v | c_i) at time t, t = 1, 2, ..., for a path, a received sequence v, and a given code sequence c_i, is computed as

    µ^(t)(v | c_i) = Σ_{l=0}^{t-1} µ(v^(l) | c^(l)_i) = Σ_{j=0}^{tn-1} µ(v_j | c_{ij}),

where the branch metrics µ(v^(l) | c^(l)_i) of the l-th branch, l = 0, 1, 2, ..., for v and a given c_i are defined as

    µ(v^(l) | c^(l)_i) = Σ_{j=ln}^{(l+1)n-1} µ(v_j | c_{ij}).

The Viterbi algorithm makes use of the trellis diagram to compute the partial path metrics µ^(t)(v | c_i) at times t = 1, 2, ..., for a received v, given all code sequences c_i that are candidates for a ML decision, in the following well defined and organized manner.

(1) Every node in the trellis is assigned a number that is equal to the partial path metric of the path that leads to this node. By definition, the trellis starts in state 0 at t = 0 and µ^(0)(v | c_i) = 0.

(2) For every transition from time t to time t + 1, all q^(M+k) (there are q^M states and q^k different input frames at every time t) t-th branch metrics µ(v^(t) | c^(t)_i) for v given all t-th code frames are computed.

(3) The partial path metric µ^(t+1)(v | c_i) is updated by adding the t-th branch metrics to the previous partial path metrics µ^(t)(v | c_i) and keeping only the maximum value of the partial path metric for each node in the trellis at time t + 1. The partial path that yields the maximum value at each node is called the survivor, and all other partial paths leading into the same node are eliminated from further consideration as a ML decision candidate. Ties are broken by flipping a coin.

(4) If t + 1 = L + m, i.e., all N = n(L + m) received symbols have been processed (L is the number of data frames that are encoded and m is the maximal memory order of the encoder), then there is only one survivor with maximum path metric µ(v | c_i) = µ^(L+m)(v | c_i), and thus ĉ = c_i is announced and the decoding algorithm stops. Otherwise, set t ← t + 1 and return to step 2.

Theorem: The path with maximum path metric µ(v | c_i) selected by the Viterbi decoder is the maximum likelihood path.

Proof: Suppose c_i is the ML path, but the decoder outputs ĉ = c_j ≠ c_i. This implies that at some time t the partial path metrics satisfy µ^(t)(v | c_j) ≥ µ^(t)(v | c_i) and the partial path corresponding to c_i is not a survivor. Appending the remaining portion of the path corresponding to c_i to the survivor at time t then results in a path metric at least as large as the one for c_i. But this contradicts the assumption that c_i is the ML path. QED

Example: Encoder #1 (the binary R = 1/2, K = 3 encoder with G(D) = [1 + D^2   1 + D + D^2]) was used to generate and transmit a codeword over a BSC with transition probability ε < 0.5. The following sequence was received: v = (10, 10, 00, 10, 10, 11, 01, 00, ...). To find the most likely codeword ĉ that corresponds to this v, use the Viterbi algorithm with the trellis diagram shown in Figure 12.

[Trellis for v = 10 10 00 10 10 11 01 00 with partial path metrics at each node; eliminated paths are marked with X, ties with a dot.] Fig.12 Viterbi Decoder: R = 1/2, K = 3 Encoder, Transmission over BSC

At time zero start in state S_0 with a partial path metric µ^(0)(v | c_i) = 0. Using the bit metrics for the BSC with ε < 0.5 given earlier, the branch metrics for each of the first two branches are 1. Thus, the partial path metrics at time t = 1 are µ^(1)(10 | 00) = 1 and µ^(1)(10 | 11) = 1.

Continuing to add the branch metrics µ(v^(1) | c^(1)_i), the partial path metrics µ^(2)((10,10) | (00,00)) = 2, µ^(2)((10,10) | (00,11)) = 2, µ^(2)((10,10) | (11,01)) = 1, and µ^(2)((10,10) | (11,10)) = 3 are obtained at time t = 2. At time t = 3 things become more interesting. Now two branches enter each state and only the one that results in the larger partial path metric is kept; the other is eliminated (indicated with an X). Thus, for instance, since 2 + 2 = 4 > 1 + 0 = 1, µ^(3)((10,10,00) | (00,00,00)) = 4, whereas the alternative path entering S_0 at t = 3 would only result in µ^(3)((10,10,00) | (11,01,11)) = 1. Similarly, for the two paths entering S_1 at t = 3 one finds either µ^(3)((10,10,00) | (00,00,11)) = 2 or µ^(3)((10,10,00) | (11,01,00)) = 3, and therefore the latter path and corresponding partial path metric survive. If there is a tie, e.g., as in the case of the two paths entering S_0 at time t = 4, then one of the two paths is selected as survivor at random. In Figure 12 ties are marked with a dot following the value of the partial path metric. Using the partial path metrics at time t = 8, the ML decision at this time is to choose the codeword corresponding to the path with metric 13 (highlighted in Figure 12), i.e., ĉ = (11, 10, 01, 10, 11, 11, 01, 00, ...), corresponding to û = (1, 1, 1, 0, 0, 1, 0, 1, ...).
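The hard-decision decoding of this example can be reproduced with a short Viterbi sketch, using the agreement count (number of bits where the branch label matches v) as the metric:

```python
def viterbi_bsc(v_frames):
    """Hard-decision Viterbi for encoder #1; metric = number of agreeing bits."""
    # state = (s1, s2); branch output for input u: (u + s2, u + s1 + s2) mod 2
    metrics = {(0, 0): 0}                    # start in the all-zero state
    paths = {(0, 0): []}
    for vf in v_frames:
        new_m, new_p = {}, {}
        for (s1, s2), m in metrics.items():
            for u in (0, 1):
                out = ((u + s2) % 2, (u + s1 + s2) % 2)
                nxt = (u, s1)
                score = m + (out[0] == vf[0]) + (out[1] == vf[1])
                if score > new_m.get(nxt, -1):   # keep the survivor per node
                    new_m[nxt], new_p[nxt] = score, paths[(s1, s2)] + [u]
        metrics, paths = new_m, new_p
    best = max(metrics, key=metrics.get)
    return metrics[best], paths[best]

v = [(1, 0), (1, 0), (0, 0), (1, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
metric, u_hat = viterbi_bsc(v)
print(metric)  # -> 13
print(u_hat)
```

The surviving path reaches metric 13, matching the highlighted path in Figure 12 (the decoded data sequence may differ in tie positions, since ties here are broken deterministically rather than by coin flip).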

Definition: A channel whose output alphabet is the same as its input alphabet is said to make hard decisions, whereas a channel that uses a larger alphabet at the output than at the input is said to make soft decisions. Note: In general, a channel which gives more differentiated output information is preferable to (and has more capacity than) one which has the same number of output symbols as input symbols, such as the BSC. Definition: A decoder that operates on hard decision channel outputs is called a hard decision decoder, and a decoder that operates on soft decision channel outputs is called a soft decision decoder.

Example: Use again encoder #1, but this time with a soft decision channel model with 2 inputs and 5 outputs as shown in the following figure. [Channel diagram: input X ∈ {0, 1}, output Y ∈ {0, @, ∆, !, 1}.] Fig.13 Discrete Memoryless Channel (DMC) with 2 Inputs and 5 Outputs

The symbols @ and! at the channel output represent bad 0 s and bad 1 s, respectively, whereas is called an erasure (i.e., it is uncertain whether is closer to a 0 or a 1, whereas a bad 0, for example, is closer to 0 than to 1). The transition probabilities for this channel are p Y X (v c) v = 0 v = @ v = v =! v = 1 c = 0 0.5 0.2 0.14 0.1 0.06 c = 1 0.06 0.1 0.14 0.2 0.5 After taking (base 2) logarithms log 2 [p Y X (v c)] v = 0 v = @ v = v =! v = 1 c = 0 1.00 2.32 2.84 3.32 4.06 c = 1 4.06 3.32 2.84 2.32 1.00 log 2 [min c p Y X (v c)] 4.06 3.32 2.84 3.32 4.06

Using µ(v|c) = α ( log2[p_{Y|X}(v|c)] − log2[min_c p_{Y|X}(v|c)] ) with α = 1 and rounding to the nearest integer yields the bit metrics

    µ(v|c)   v = 0   v = @   v = ∆   v = !   v = 1
    c = 0      3       1       0       0       0
    c = 1      0       0       0       1       3

The received sequence v = (11, !@, @0, ∆0, 1!, 00, 0∆, 10, ...) can now be decoded using the Viterbi algorithm as shown in Figure 14 on the next page.
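The bit-metric table can be reproduced directly from the transition probabilities (the five outputs are listed in the order 0, @, erasure, !, 1):

```python
import math

# transition probabilities of the 2-input, 5-output DMC
p = {0: [0.50, 0.20, 0.14, 0.10, 0.06],
     1: [0.06, 0.10, 0.14, 0.20, 0.50]}

# f(y) = -log2 min_c p(y|c); metric = round(log2 p(y|c) + f(y)) with alpha = 1
mins = [min(p[0][j], p[1][j]) for j in range(5)]
mu = {c: [round(math.log2(p[c][j]) - math.log2(mins[j])) for j in range(5)]
      for c in (0, 1)}
print(mu[0])  # -> [3, 1, 0, 0, 0]
print(mu[1])  # -> [0, 0, 0, 1, 3]
```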

[Trellis for v = 11 !@ @0 ∆0 1! 00 0∆ 10 with partial path metrics at each node; eliminated paths are marked with X.] Fig.14 Viterbi Decoder: R = 1/2, K = 3 Encoder, 2-Input, 5-Output DMC

Clearly, the Viterbi algorithm can be used either for hard or soft decision decoding by using appropriate bit metrics. In this example the ML decision (up to t = 8) is ĉ = (11, 01, 00, 10, 10, 00, 10, 10, ...), corresponding to û = (1, 0, 1, 1, 0, 1, 1, 0, ...).