CMOS Binary Full Adder
A Survey of Possible Implementations

Group: Eren Turgay, Aaron Daniels, Michael Bacelieri, William Berry
Table of Contents

Key Terminology
Introduction
Design Architectures
Static Ripple-Carry (SRC) Implementation
Dynamic Ripple-Carry (DRC) Implementation
Carry Look-Ahead (CLA) Implementation
Transistor sizing to optimize performance
Design Validation
Design Performance
Performance Across Corners
Speed
Power
Conclusions
References

Key Terminology

AOI    And-Or-Invert logic
BFA    Binary full-adder
CLA    Carry look-ahead
CMOS   Complementary Metal-Oxide Semiconductor (complementary usage of NMOS and PMOS transistors)
DRC    Dynamic ripple-carry
LALB   Look-ahead logic block
PFA    Partial full-adder
NAND   Negated logical AND
NMOS   n-type metal-oxide semiconductor
NOR    Negated logical OR
PMOS   p-type metal-oxide semiconductor
SRC    Static ripple-carry
XOR    Exclusive logical OR ($A \oplus B = A\overline{B} + \overline{A}B$)
Introduction

This document surveys three logic implementations of an 8-bit binary full adder: the static ripple-carry, dynamic ripple-carry, and carry look-ahead architectures. Each is first explained in detail, and its suitability for use in a [?] MHz RISC embedded processor is then evaluated with the aid of simulation data.

Design Architectures

The goal of a binary full-adder (BFA) is to implement the following truth table for each bit:

C_in  A  B  |  Sum  C_out
 0    0  0  |   0    0
 0    0  1  |   1    0
 0    1  0  |   1    0
 0    1  1  |   0    1
 1    0  0  |   1    0
 1    0  1  |   0    1
 1    1  0  |   0    1
 1    1  1  |   1    1

Table 1: Truth table for 1-bit adder slice

Logically,

$$C_{i+1} = A_i B_i + C_i (A_i + B_i)$$
$$\mathrm{Sum}_i = A_i \oplus B_i \oplus C_i$$

where $i$ is the integer bit index in an n-bit adder. Generally, n-bit adders are created by chaining together n of these 1-bit adder slices. Three (3) adder designs have been examined: static ripple-carry, dynamic ripple-carry, and carry look-ahead.

Static Ripple-Carry (SRC) Implementation

The most basic and intuitive BFA is an SRC adder. This type of adder has the benefits of simplicity and asynchronicity. Asynchronicity means that the output of the adder can be accessed at any point during a clock cycle. This allows the adder to be used in two main styles of processors: 1) those that read/calculate data on the rising clock edge and write data on the falling clock edge, and 2) those that read/calculate data during one or more full clock cycles and write data during one or more subsequent clock cycles. The largest drawback of an SRC adder is that it usually has the longest propagation time of any adder design in the same process technology.

The particular SRC adder implemented in this discussion utilizes And-Or-Invert (AOI) logic [1]. AOI logic is a technique that uses equivalent Boolean expressions to reduce the number of gates required for a particular expression. This, in turn, reduces capacitance and consequently propagation times.
For this design, AOI logic has been applied to the calculation of the Sum bit:

$$\mathrm{Sum}_i = A_i \oplus B_i \oplus C_i = A_i B_i C_i + (A_i + B_i + C_i)\,\overline{C_{i+1}}$$

Instead of using two (2) XOR gates to implement the Sum bit, the circuit takes advantage of the fact that $C_{i+1}$ is already computed and uses fewer gates to calculate the rest of the expression. The schematic of this design is shown in Figure 1. An 8-bit implementation using this design is shown in Figure 2.
Figure 1: 1-bit SRC adder schematic

Figure 2: 8-bit SRC adder schematic
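The slice equations for the SRC adder can be checked with a short behavioural model. The following is a minimal Python sketch, not the circuit from the schematics: the function names `full_adder_slice` and `ripple_carry_add` are hypothetical, and the model captures only the Boolean logic (no timing, transistor sizing, or loading).

```python
from itertools import product


def full_adder_slice(a, b, c_in):
    """One 1-bit slice: carry first, then Sum from the AOI form."""
    # Carry-out: C_out = A*B + C_in*(A + B)
    c_out = (a & b) | (c_in & (a | b))
    # AOI-style sum reuses the inverted carry-out:
    # Sum = A*B*C_in + (A + B + C_in) * NOT(C_out)
    s = (a & b & c_in) | ((a | b | c_in) & (1 - c_out))
    return s, c_out


def ripple_carry_add(a, b, c_in=0, n=8):
    """Chain n slices; the carry ripples from bit 0 up to bit n-1."""
    total, carry = 0, c_in
    for i in range(n):
        s, carry = full_adder_slice((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# Exhaustive check of the slice against the truth table.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder_slice(a, b, c)
    assert s == a ^ b ^ c
    assert 2 * c_out + s == a + b + c

# 0xFF + 0x01 overflows 8 bits: sum bits are 0 and the carry-out is 1.
print(ripple_carry_add(0xFF, 0x01))  # prints (0, 1)
```

Operands such as `0xFF + 0x01` are useful because the carry must ripple through every slice, which is the worst-case path for this architecture.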
For this particular implementation of an n-bit SRC adder, the number of gates required is defined as $G_{SRC} = 28n$.

n       2    4    8    16
G_SRC   56   112  224  448

Table 2: Gate counts for n-bit SRC adders

Dynamic Ripple-Carry (DRC) Implementation

The DRC adder is a clocked variant of the SRC adder. Utilizing a clock allows the adder to take advantage of a technique known as precharging, in which the sum and carry bits are charged to a known value (usually $V_{DD}$) before evaluation. This reduces the rise and fall time when a logic low or high is computed. The downside to this approach, however, is that the adder result is only available when the clock signal is high. Consequently, a latch is generally used to hold the data for the remainder of the clock cycle. Power consumption of the adder is also increased due to the precharging.

A processor designer has a few options when choosing a clock to work with this type of adder. Since the result can only be calculated when the clock is high, the clock period must be at least twice as long as the adder propagation time. Depending upon the needs of the processor, anywhere from one (1) to n bits could be computed in one clock cycle.

The schematic for this design is shown in Figure 3 [2]. An 8-bit implementation using this design is shown in Figure 4.
Figure 3: 1-bit DRC adder schematic

Figure 4: 8-bit DRC adder schematic
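The clocking constraint for the DRC adder can be written as a simple bound. The sketch below assumes a 50% duty-cycle clock, which is what the "twice the propagation time" statement implies, and it assumes the delay grows linearly with the number of bits evaluated per cycle. The function name `max_clock_hz` and the example delay are hypothetical and are not simulation results from this survey.

```python
def max_clock_hz(t_prop_s, bits_per_cycle=1, duty_high=0.5):
    """Upper bound on clock frequency for a precharged adder.

    Evaluation is only possible while the clock is high, so the high
    phase (duty_high * period) must cover the carry-chain delay of the
    bits computed in one cycle.
    """
    if not 0.0 < duty_high < 1.0:
        raise ValueError("duty_high must be between 0 and 1")
    t_eval = t_prop_s * bits_per_cycle   # assumed linear ripple delay
    min_period = t_eval / duty_high      # 50% duty -> 2 * t_eval
    return 1.0 / min_period


# Hypothetical per-bit delay of 0.5 ns with all 8 bits evaluated in one cycle.
f_max = max_clock_hz(0.5e-9, bits_per_cycle=8)
print(f"{f_max / 1e6:.0f} MHz")
```

Computing fewer bits per clock cycle raises the allowable clock frequency at the cost of more cycles per addition, which is the trade-off the designer faces.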
For this particular implementation of an n-bit dynamic ripple-carry adder, the number of gates is defined as $G_{DRC} = 22n$.

n       2    4    8    16
G_DRC   44   88   176  352

Table 3: Gate counts for n-bit dynamic ripple-carry adders

Carry Look-Ahead (CLA) Implementation

CLA adders take advantage of computational parallelization at the cost of increased complexity and power consumption. This parallelization yields significant decreases in propagation time. Intermediate terms, called propagate and generate bits, are used to calculate the sum and carry bits. The logic equations for the calculations are as follows:

$$P_i = A_i \oplus B_i$$
$$G_i = A_i B_i$$
$$C_{i+1} = C_i P_i + G_i$$
$$\mathrm{Sum}_i = P_i \oplus C_i$$

A CLA adder uses two fundamental logic blocks: a partial full-adder (PFA) and a look-ahead logic block (LALB). The PFA computes the propagate, generate, and sum bits. The LALB uses the propagate and generate bits from m PFAs to compute each of the carry bits $C_1$ through $C_m$, where m is the number of look-ahead bits. For maximum performance, m is equal to n. However, in practice n is usually a multiple of m, resulting in a hybrid of look-ahead and ripple logic.

The carry look-ahead adder was implemented using eight (8) 1-bit PFAs and two (2) 4-bit LALBs. A 4-bit LALB design was chosen as a balance between the smaller area and lower power of a narrower block and the speed of a full 8-bit block. The carry equations for a 4-bit LALB are as follows:

$$C_1 = C_0 P_0 + G_0$$
$$C_2 = C_1 P_1 + G_1 = C_0 P_0 P_1 + G_0 P_1 + G_1$$
$$C_3 = C_2 P_2 + G_2 = C_0 P_0 P_1 P_2 + G_0 P_1 P_2 + G_1 P_2 + G_2$$
$$C_4 = C_3 P_3 + G_3 = C_0 P_0 P_1 P_2 P_3 + G_0 P_1 P_2 P_3 + G_1 P_2 P_3 + G_2 P_3 + G_3$$

These logic functions were implemented using parallel cascading NAND gates (see Figure 6).
Figure 5: 1-bit PFA schematic

Figure 6: 4-bit LALB schematic

Figure 7: 8-bit CLA schematic
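The PFA and LALB equations can be exercised with a small behavioural model of the 8-bit hybrid arrangement (eight PFAs, two 4-bit LALBs, with the carry rippling between the two blocks). This is a minimal Python sketch under stated assumptions: the function names `pfa`, `lalb4`, and `cla_add8` are hypothetical, the buffers between blocks are omitted, and only the Boolean behaviour is modelled, not the NAND-gate implementation.

```python
def pfa(a, b, c):
    """Partial full-adder: propagate, generate and sum for one bit."""
    p = a ^ b
    g = a & b
    return p, g, p ^ c


def lalb4(p, g, c0):
    """4-bit look-ahead logic block: C1..C4 from P, G and C0 only."""
    c1 = (c0 & p[0]) | g[0]
    c2 = (c0 & p[0] & p[1]) | (g[0] & p[1]) | g[1]
    c3 = ((c0 & p[0] & p[1] & p[2]) | (g[0] & p[1] & p[2])
          | (g[1] & p[2]) | g[2])
    c4 = ((c0 & p[0] & p[1] & p[2] & p[3]) | (g[0] & p[1] & p[2] & p[3])
          | (g[1] & p[2] & p[3]) | (g[2] & p[3]) | g[3])
    return [c1, c2, c3, c4]


def cla_add8(a, b, c_in=0):
    """8-bit adder built from eight PFAs and two 4-bit LALBs."""
    abits = [(a >> i) & 1 for i in range(8)]
    bbits = [(b >> i) & 1 for i in range(8)]
    p = [x ^ y for x, y in zip(abits, bbits)]
    g = [x & y for x, y in zip(abits, bbits)]
    carries = [c_in]  # carries[i] is the carry into bit i
    for base in (0, 4):
        # The second block takes C4 from the first block as its carry-in.
        carries.extend(lalb4(p[base:base + 4], g[base:base + 4], carries[base]))
    total = 0
    for i in range(8):
        _, _, s = pfa(abits[i], bbits[i], carries[i])
        total |= s << i
    return total, carries[8]


print(cla_add8(0x0F, 0x01))  # prints (16, 0)
```

Each carry in `lalb4` depends only on the block's carry-in and the P and G bits, so within a block no carry waits on the one before it; the only ripple left is the single hand-off between the two blocks.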
While ripple-carry adders scale linearly with the number of adder bits n, carry look-ahead adders scale roughly with m n. For optimum performance, m is equal to n. However, in practice n is usually a multiple of m. For this particular implementation of an n-bit carry look-ahead adder with m-bit look-ahead logic, the number of gates is defined as follows:

G CLA = 3n + (n / m) [( + )( + 4) ] + 4(n / m)

where 3n is the number of gates in the 1-bit PFAs, n / m is the number of carry look-ahead logic blocks needed, [( + )( + 4) ] is the number of gates in a logic block, and 4(n / m) is the number of gates for the buffers between logic blocks.

n m 4 8 6 9 4 84 8 37 448 67 6 748 9 344 75
Table 4: Gate counts for various sized CLA adders

Transistor sizing to optimize performance

A number of attempts were made, particularly on the carry look-ahead adder, to optimize performance through proper transistor sizing. In general, it was found that minimizing the transistor sizes also minimized the propagation delays. However, in cases where just a few transistors were used to drive a large number of outputs, increasing the width-to-length ratios of the driving gates often increased performance. This technique was used on each of the NAND gates driving the cout pins in the look-ahead logic block. Increasing the width of the NMOS transistors in these NAND gates to double the minimum value, while keeping the PMOS transistors at minimum size, gave the best performance.

When the number of gates driven by a single pair of transistors was particularly large, a buffer was used to decrease the output delays. This proved beneficial in two cases: once when driving the P (propagate) signal in the PFA logic, and once when driving the carry input to the second look-ahead logic block. In both cases, a buffer with only two stages was used, and the sizes of the first inverters were kept to a minimum in order to minimize the input capacitance to the first stage.
The width of the second stage was made to be very nearly e times the first, in keeping with the theory for minimum buffer delay.
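The factor of e comes from the classical tapered-buffer argument: if each stage is scaled by the same ratio and self-loading is ignored, the total delay of N stages driving a load F times the input capacitance is proportional to N · F^(1/N), which is minimized when N = ln F and the stage ratio equals e. The following Python sketch illustrates that simplified model; the function name `tapered_buffer` and the example load are hypothetical, and real designs with self-loading typically favor a somewhat larger ratio.

```python
import math


def tapered_buffer(c_load_over_c_in, n_stages=None):
    """Return (n_stages, taper, relative_delay) for an inverter chain.

    Simplified model: each stage's delay is proportional to its fan-out
    (the taper) and self-loading is ignored, so the total relative
    delay is n_stages * taper.
    """
    if c_load_over_c_in <= 1:
        raise ValueError("load must be larger than the input capacitance")
    if n_stages is None:
        # Continuous optimum is N = ln(C_load / C_in); round to whole stages.
        n_stages = max(1, round(math.log(c_load_over_c_in)))
    taper = c_load_over_c_in ** (1.0 / n_stages)
    return n_stages, taper, n_stages * taper


# Hypothetical load of e^2 times the first inverter's input capacitance:
# the model picks two stages with a stage ratio of about e.
n, taper, delay = tapered_buffer(math.e ** 2)
print(n, round(taper, 3), round(delay, 3))
```

For this hypothetical load, a two-stage buffer whose second stage is about e times the first is the model's optimum, which matches the sizing described above.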
Design Validation

All of the designs outlined above were tested to confirm that they produce correct results. For both the static and dynamic implementations of the ripple-carry adder, the one-bit slices were first exhaustively tested and confirmed to produce outputs matching the truth table shown in Table 1. The full 8-bit chained adders were not exhaustively tested, however, since testing all $2^{17}$ possible input combinations (two 8-bit operands plus the carry-in) was not feasible. Instead, selected inputs and worst-case scenarios were chosen and tested. The primary concern was to ensure that all of the carry bits were connected to the next bit slice properly and that there were no loading effects, since each bit slice had already been proven correct.

The validation of the carry look-ahead adder was slightly more involved than that of the other designs. Each PFA slice was first exhaustively tested and confirmed to produce the correct propagate, generate, and sum bits. Additionally, the LALB was selectively tested to validate the four carry bits. Finally, the full 8-bit adder was tested using selected inputs and worst-case scenarios.

All three designs were successfully validated.

Design Performance

Performance Across Corners

To assess the theoretical range of conditions under which each design would function properly, the performance of each implementation was evaluated at the three primary process corners: fast-fast, typi-typi, and slow-slow.

Speed

The Shmoo plots shown in Figures 8 through 10 display the maximum allowable operating frequency of each design at the three primary process corners. At each corner, VDD is varied by 10% around its nominal value of 1.8 V (from 1.62 V to 1.98 V) and the temperature is varied from -30 °C to 150 °C. It is assumed that the target operating frequency is [?] MHz. Values below this are colored red to indicate failure to meet this condition, while green indicates success.

Figure 8: Carry look-ahead Shmoo plots (maximum frequency in MHz versus temperature in °C and VDD in V; panels: fast-fast, typi-typi, slow-slow)
Figure 9: Dynamic ripple-carry Shmoo plots (maximum frequency in MHz versus temperature in °C and VDD in V; panels: fast-fast, typi-typi, slow-slow)

Figure 10: Static ripple-carry adder Shmoo plots (maximum frequency in MHz versus temperature in °C and VDD in V; panels: fast-fast, typi-typi, slow-slow)

The Shmoo plots above demonstrate that the carry look-ahead implementation is faster than the others, since there is a significantly larger range of values over which the design exceeds the [?] MHz mark.

It is interesting to note the extremely large range of values that the maximum frequency takes on in these plots. For instance, a carry look-ahead adder with a fast-fast process, a temperature of 150 °C, and a VDD of 1.98 V has a maximum frequency of over 900 MHz, while one with a slow-slow process, a temperature of -30 °C, and a VDD of 1.62 V only has a maximum frequency of about 90 MHz. This is a difference of roughly tenfold from corner to corner. The other designs also show this wide variation.

A clearer visualization of the differences in speed between the various implementations is given by the graphs in Figure 11, Figure 12, and Figure 13, which show how the maximum frequency varies with process, temperature, and VDD.

Figure 11: Max frequency of designs for fast-fast process (surface plot of maximum frequency [Hz] versus VDD [V] and temperature [°C])

Figure 12: Max frequency of designs for typi-typi process (surface plot of maximum frequency [Hz] versus VDD [V] and temperature [°C])

Figure 13: Max frequency of designs for slow-slow process (surface plot of maximum frequency [Hz] versus VDD [V] and temperature [°C])
The surface floating far above the other two represents the carry look-ahead data, the surface in between represents the dynamic ripple-carry data, and the bottom surface represents the static ripple-carry data. These graphs show not only that the carry look-ahead is the fastest design, but also the general trend of increasing speed with increasing VDD and temperature.

Power

Another important concern is how the designs compare to one another in terms of power dissipation. Figure shows values of average power and energy per transition for (not finished)

Conclusions

Through the comparison of the three distinct adder architectures, the carry look-ahead adder was shown to be vastly superior in terms of circuit speed over virtually all testing conditions. Power analysis subsequently showed that the carry look-ahead dissipated the most power and took up the most chip area, while the dynamic ripple-carry design was the most efficient in terms of power dissipation and chip area. However, as shown by the Shmoo plots and the data presented, the carry look-ahead architecture is the only design of those presented that will consistently be able to operate at frequencies greater than [?] MHz. This requirement alone eliminates the other candidates. The carry look-ahead adder is thus the most suitable design.

References

[1] Baker, R. Jacob (2005). CMOS Circuit Design, Layout, and Simulation (Second Edition). p368
[2] Baker, R. Jacob (2005). CMOS Circuit Design, Layout, and Simulation (Second Edition). P4