An Effective Deterministic BIST Scheme for Shifter/Accumulator Pairs in Datapaths

N. KRANITIS 1, M. PSARAKIS 1, D. GIZOPOULOS 2, A. PASCHALIS 3, Y. ZORIAN 4

1 Institute of Informatics & Telecommunications, NCSR Demokritos, Athens, Greece, {nkran / mpsarak}@iit.demokritos.gr
2 Department of Informatics, University of Piraeus, Greece, dgizop@unipi.gr
3 Department of Informatics & Telecommunications, University of Athens, Greece, paschali@di.uoa.gr
4 LogicVision, San Jose, CA, USA, zorian@logicvision.com

Abstract

Effective Built-In Self-Test (BIST) schemes using deterministic sequences generated by small counters have been proposed in the past for the common multiplier/accumulator pair. In this paper we show how near complete testability can be achieved with a regular counter-generated deterministic test set for the shifter-accumulator pair (accumulation performed either by an adder or an ALU), which appears very often in embedded processor or DSP datapaths. The BIST scheme provides near complete coverage with respect to the stuck-at fault model for any datapath width, as verified by a comprehensive set of experiments. The proposed BIST scheme uses the same Test Pattern Generation (counters) and Output Data Evaluation (accumulators) resources as our earlier BIST schemes for multiplier/accumulator pairs, thus completing a deterministic counter-based datapath BIST architecture.

1. Introduction

Arithmetic and logic operations in datapaths are implemented by functional modules such as adders, subtracters, multipliers, Arithmetic Logic Units (ALUs) and shifters. Datapath architectures of microprocessors, embedded processors or specialized Digital Signal Processors usually suffer from testability problems within the arithmetic functional modules, and for this reason research in the area of datapath testing focuses on these modules. Today's requirement for many complex arithmetic modules deeply embedded in datapaths, which are further embedded in larger System-on-Chip (SoC) designs, imposes serious testability problems.

The use of efficient BIST schemes for datapath modules, such as multipliers [1], [2], [3], [4], adders [3], [5] and ALUs [3], [6], as well as for other embedded macros like RAMs [7], ROMs [8] and FIFOs [9], is a well-justified solution since it permits at-speed testing, provides very high fault coverage and drives down the overall test cost.

Datapath BIST schemes have been proposed in [10], [11], [3] and [5] to efficiently test the functional modules of datapaths. In [10], [11] testability modifications at the behavioral level are performed. These modifications are based on high-level metrics and require synthesis and stuck-at fault simulation in order to measure their effectiveness. In [3], Arithmetic BIST is proposed, a pseudorandom datapath BIST technique whose effectiveness must be estimated for each particular datapath architecture via fault simulation. During this process, the method attempts to optimize the trade-off between the fault coverage gained and the overheads imposed by its application (test application time, extra hardware and performance degradation). In [5], a deterministic BIST scheme for datapaths is proposed. It provides guaranteed very high fault coverage by following a deterministic rather than a pseudorandom strategy. It does not depend on the specific implementation of the functional modules and thus does not require fault simulation to estimate its effectiveness.

As will be shown in this paper, similarly to the case of the multiplier/adder (or multiplier/subtracter) pair, shifters can be tested in a combined fashion with adders/subtracters or with ALUs so that the overall test set size is reduced. The configuration of a shifter directly driving the inputs of an adder or ALU is a common one, as shown in Figure 1. This datapath architecture is the core of the ARM processor architecture [12].
The combined testing of shifter modules along with adders or ALUs in a single BIST session is very important, particularly in cases where no multiplier module exists in the datapath. In this paper we retain the same approach to BIST Test Pattern Generation and Output Data Evaluation as in our earlier BIST schemes for datapath modules like multipliers (of all architectures) and adders/subtracters.

Proceedings of the International Symposium on Quality Electronic Design (ISQED ) -7695-25-6/ $. 2 IEEE

[Figure 1. A typical datapath structure: register file, steering logic, multiplier, shifter, adder/ALU and control logic]

This means that BIST Test Pattern Generation (TPG) is performed by small counters and Output Data Evaluation (ODE) is performed by existing accumulators (based on either an adder or an ALU). We present for the first time a BIST scheme that resolves the specific testability problems of the shifter/adder and shifter/ALU pair combinations. First, we determine the specific testability needs of the barrel shifter. Then, based on these, we propose a regular test set that can be generated by a small counter-based TPG. The difficulties of this BIST approach come from the following two requirements: (i) testing of the shifter module must be performed by counters generating regular test patterns, as in our earlier datapath BIST schemes (counter re-usability); (ii) the second-level test patterns generated at the shifter outputs must be capable of testing the subsequent adder or ALU of the shifter/adder or shifter/ALU pair, and no extra TPG means must be necessary for them. This way the entire functional module pair is tested by a single small counter, and there is no need to insert any multiplexing logic between the shifter and the adder or ALU, which could cause performance degradation.

The deterministic BIST scheme proposed in this paper guarantees a fault coverage higher than 99% for any datapath width of the shifter/adder or shifter/ALU pairs. The efficiency of the BIST scheme is verified by a comprehensive set of experimental results.

2. Datapath Built-In Self-Test

A typical datapath consists of the functional elements that perform the arithmetic or logic calculations (ALUs, multipliers, adders, shifters, etc.), storage elements (register files, etc.) and steering logic (multiplexers, buses).
A datapath is coupled with control logic (a Finite State Machine), which provides it with control signals for the data flow and receives from it a group of status signals for subsequent program execution. A typical datapath architecture is shown in Figure 1.

Built-In Self-Test methodologies for general digital circuits are usually classified as pseudorandom or deterministic. The same classification applies to datapath testing strategies. In pseudorandom BIST, a large number of test patterns, generated by a pseudorandom generator such as a Linear Feedback Shift Register (LFSR) or a Cellular Automaton (CA), is applied to the circuit under test. Output Data Evaluation (ODE) is performed by a Multiple Input Signature Register (MISR). Another excellent pseudorandom alternative for datapath architectures is Arithmetic BIST (ABIST), proposed in [3]. In this scheme, pseudorandom test vectors are generated [13] and responses are compacted [14] by arithmetic modules (adders, subtracters). The efficiency of ABIST has been shown to be equal to that of LFSR-based pseudorandom testing, while it imposes near zero hardware overhead since BIST is performed by pre-existing modules of the datapath.

Deterministic BIST for datapath architectures has been proposed in [5]. The constant deterministic test set is generated by fixed-length count-based machines or by slight modifications of the input registers. Response compaction is performed by existing arithmetic modules such as adders or subtracters. The efficiency of the BIST architecture is the same for any datapath width and any implementation of the modules, and no new TPG must be designed when the size of the datapath changes.

We complete the deterministic datapath BIST architecture that we have proposed so far in the literature by presenting an effective BIST scheme for the common shifter/adder and shifter/ALU functional unit pairs.
The requirements we set for an effective datapath BIST scheme are: (i) BIST Test Pattern Generation should be performed by counters, so that the same TPG resources can be re-used for all the datapath functional modules (multipliers, adders, ALUs, shifters); we do not use any pseudorandom (LFSR-based) test pattern generators. (ii) BIST Output Data Evaluation should be performed by existing modules (accumulator-based compaction), which is the most efficient compaction technique in datapaths, where accumulators always exist (using adder modules or ALU modules); we do not need MISR-based compaction.

We satisfy the above requirements and provide a complete, effective BIST scheme that is fully compatible with our previously proposed datapath BIST architecture.
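As a rough illustration of accumulator-based response compaction, the sketch below folds a stream of module responses into a single signature by repeated addition. The function name `accumulate_signature`, the optional end-around (rotate) carry and the example response values are assumptions made for illustration only; they are not taken from the BIST schemes cited above.

```python
def accumulate_signature(responses, width, rotate_carry=True):
    """Compact a sequence of responses into one width-bit signature."""
    mask = (1 << width) - 1
    acc = 0
    for response in responses:
        total = acc + (response & mask)
        acc = total & mask
        if rotate_carry:
            # Assumed variant: feed the adder carry-out back into the LSB.
            acc = (acc + (total >> width)) & mask
    return acc


# Hypothetical response streams: the second differs in a single bit.
fault_free = accumulate_signature([0x3C, 0xF0, 0x81], width=8)
faulty = accumulate_signature([0x3C, 0xF1, 0x81], width=8)
print(hex(fault_free), hex(faulty))
```

A mismatch between the final signature and the expected fault-free signature flags a faulty response stream; no dedicated MISR is involved in this model.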

3. Proposed Shifter/Adder and Shifter/ALU BIST Architecture

The target configuration of functional modules is shown in Figure 2, which is part of Figure 1. The path from the output of register R2 to the input of the adder/ALU always exists in datapath architectures for the implementation of the accumulation operation. Also, the output of the shifter is in most cases directly connected to one of the two operands of the adder or the ALU to increase performance.

[Figure 2. Shifter-accumulator pair: registers R1 and R2, shifter, adder/ALU]

3.1. Barrel Shifter Design

The majority of datapath architectures contain a structure that allows the datapath words to be shifted. Efficient shifting capability is very important, since very common operations like multiplication or division by powers of 2 can be implemented by multiple shifts instead of time-consuming multiplications and divisions. Including a barrel shifter with multi-bit shift functionality increases performance, since a shift by any number of positions is executed in a single clock cycle. If the datapath does not include a barrel shifter, multiple shifts require multiple clock cycles.

We consider the testability of a classical multiplexer-based barrel shifter implementation, as found in the Synopsys DesignWare Foundation Library [15]. The 8-bit barrel shifter implementation performing Rotate Left (ROL) is shown in Figure 3. The input word is denoted X_7 ... X_0 and the output word is denoted Y_7 ... Y_0. The number of shifts is determined by 3 control signals S_2, S_1, S_0, each of which controls the select inputs of the multiplexers of one stage, i.e. S_0 controls the multiplexer select inputs of the left stage, S_1 those of the middle stage and S_2 those of the right stage. Each of the signals S_2, S_1, S_0 shown in Figure 3 is connected to all multiplexers of the same stage.

[Figure 3. 8-bit barrel shifter]

The architecture of the N-bit barrel shifter supports left rotation of an N-bit word by 0 to N-1 positions. Shifting is performed in m stages, where m = log2 N. Each shifter stage consists of a total of N 2:1 multiplexers that select the proper signal according to the number of positions denoted by the control lines S_0, S_1, ..., S_{log2 N - 1}. These m shifter stages are cascaded, and each stage i, 0 ≤ i ≤ m-1, shifts the data input by 2^i positions. For example, in the 8-bit barrel shifter implementation of Figure 3, three shifter stages are cascaded, performing a shift of one, two and four bit positions, respectively. The number of shifts is equal to

$$\sum_{i=0}^{\log_2 N - 1} S_i \cdot 2^i$$

since the arithmetic value of the word S = S_{log2 N - 1}, ..., S_1, S_0 selects the number of bit positions by which the input data are shifted, i.e. when S equals binary 110, a 6-bit shift is performed. The multiplexers in each stage either shift the data by the stage's power of 2 for S_i = 1, or pass the data through unchanged to the next stage for S_i = 0.
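The staged structure described above can be modelled in a few lines. The following is a behavioural sketch only, not the DesignWare implementation: the function names `rotate_left` and `barrel_rol` are illustrative, words are held as Python integers, and N is assumed to be a power of two.

```python
def rotate_left(word, amount, n):
    """Reference ROL of an n-bit integer by `amount` positions."""
    amount %= n
    mask = (1 << n) - 1
    return ((word << amount) | (word >> (n - amount))) & mask


def barrel_rol(word, select, n):
    """Staged ROL: stage i rotates by 2**i when select bit i is 1."""
    stages = n.bit_length() - 1  # log2(n) for n a power of two
    for i in range(stages):
        if (select >> i) & 1:
            word = rotate_left(word, 1 << i, n)
        # select bit i == 0: the stage passes the word through unchanged
    return word


# S = binary 110 rotates an 8-bit word by 6 positions.
print(bin(barrel_rol(0b00000001, 0b110, 8)))
```

Because each stage contributes 2^i positions only when its select bit is set, the cascade rotates by the arithmetic value of S, which is what the sum above expresses.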

3.2. Barrel Shifter Testability

In this paper we deal with barrel shifters performing the rotate operation (ROL or ROR). Our treatment of barrel shifter testability has two major objectives: (i) The test set to be used in the shifter/adder or shifter/ALU BIST scheme must be easily generated by a counter-based TPG, so that it is compatible with our earlier datapath BIST schemes. This way the same TPG and ODE resources are re-used for all the functional modules of the datapath. (ii) The test set to be used for the shifter must be capable of testing the subsequent adder/ALU, since the shifter outputs in our configuration are connected directly to the adder/ALU inputs. This is particularly important in the case of the ALU, which is much more complex than the adder.

For the multiplexer-based implementation of the barrel shifter shown in Figure 3, a test set provides complete single stuck-at fault testability if it applies to each of the 2:1 multiplexers the following four input combinations:

S   A0  A1
0   0   1
0   1   0
1   0   1
1   1   0

where S is the select signal of the multiplexer and A0, A1 are the two data inputs, which are selected when S=0 and S=1, respectively. It can be easily verified that these four multiplexer input combinations detect all stuck-at faults at the input and output ports of the multiplexer if the multiplexer is considered as a library cell with no information about its internal implementation. Additionally, if the multiplexer is realized internally, at the gate level, as a classical two-level AND-OR or NAND-NAND implementation, the four input combinations detect all single stuck-at faults of the gates. Thus, in either case complete testability of the shifter is guaranteed with such a test set.
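The port-level claim can be checked by exhaustive fault injection on a behavioural 2:1 multiplexer. The sketch below assumes the four combinations are the ones with complementary data inputs under each select value, and it models only stuck-at faults on the ports S, A0, A1 and the output (called `Y` here); the names `mux`, `TEST_SET`, `FAULTS` and `undetected_faults` are illustrative, and gate-level internal faults are not modelled.

```python
def mux(s, a0, a1, fault=None):
    """2:1 multiplexer with an optional single stuck-at fault.

    `fault` is a (port, value) pair with port in {"S", "A0", "A1", "Y"}.
    """
    if fault is not None:
        port, value = fault
        if port == "S":
            s = value
        elif port == "A0":
            a0 = value
        elif port == "A1":
            a1 = value
    y = a1 if s else a0          # A0 is selected for S=0, A1 for S=1
    if fault is not None and fault[0] == "Y":
        y = fault[1]
    return y


TEST_SET = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]  # (S, A0, A1)
FAULTS = [(port, value) for port in ("S", "A0", "A1", "Y") for value in (0, 1)]


def undetected_faults(test_set):
    """Faults for which every vector gives the fault-free output."""
    return [f for f in FAULTS
            if all(mux(*v) == mux(*v, fault=f) for v in test_set)]


print(undetected_faults(TEST_SET))      # no port fault escapes all four vectors
print(undetected_faults(TEST_SET[:2]))  # vectors with S=0 only leave gaps
```

Dropping either select value leaves the faults on the unselected data input undetected, which is why both values of S are required for every multiplexer.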
We can verify from the structure of the barrel shifter depicted in Figure 3 that a test set with the following property is sufficient to apply the four input combinations to all the multiplexer cells of a barrel shifter of any size:

Property 1: For each j = 0, 1, ..., N-1 and for each i = 0, 1, ..., log2 N - 1, the test set contains a test vector where X[j] ≠ X[(j + 2^i) mod N], for both values 0 and 1 of X[j] and for both values 0 and 1 of the select signal S[i] at column i.

The simple idea behind Property 1 is that input signals X[j] and X[(j + 2^i) mod N] of a barrel shifter that performs either ROR or ROL converge at level i at a multiplexer, which must receive different data input values for both values of the select signal S[i].

In this paper we present a test set consisting of only 16 test vectors that satisfies Property 1 and thus provides 100% stuck-at fault testability independently of the shifter size. This test set is constructed as follows. The first group of 8 test vectors applies S[0] = S[1] = ... = S[log2 N - 1] = 0 and the second group of 8 test vectors applies S[0] = S[1] = ... = S[log2 N - 1] = 1. For each group of 8 test vectors, the data inputs of the shifter X[0], X[1], ..., X[N-1] receive their values from the outputs c[2], c[1], c[0] of a 3-bit binary counter that goes through all its 8 states, with the following assignment (N is the word length and is an even integer):

X[j] = c[j mod 3], if 0 ≤ j ≤ N/2
X[j] = NOT c[(N-j) mod 3], if N/2+1 ≤ j ≤ N-1

We will prove that this assignment guarantees that Property 1 is satisfied or, in other words, that shifter inputs X[j] and X[(j + 2^i) mod N] receive different values for every i (both for S[i]=0 and S[i]=1). Let us consider the two possible cases: X[j] and X[(j + 2^i) mod N] either belong to different halves of the input word (one half lies between 0 and N/2 and the other between N/2+1 and N-1) or belong to the same half.
If they belong to different halves of the input word then, given the proposed assignment, they are connected either to different counter outputs, or to the same counter output, in which case they always receive complementary values. Thus, when the proposed test set is applied, the multiplexer inputs X[j] and X[(j + 2^i) mod N] belonging to different halves of the input word receive complementary values for both S[i]=0 and S[i]=1.

If they belong to the same half of the input word, they are always connected to independent counter outputs, as we prove by contradiction. If we assume that they are connected to the same counter output and belong to the first half, this implies that j mod 3 = (j + 2^i) mod 3, or j + 2^i = j + 3p for some integer p, or 2^i = 3p, which can never happen since a power of 2 is not divisible by 3. Also, if we assume that they are connected to the same counter output and belong to the second half, this implies that (N-j) mod 3 = (N-j-2^i) mod 3, or N-j = N-j-2^i + 3p, or 2^i = 3p, which again can never happen. Since the counter goes through all 8 of its states, two independent counter outputs take both complementary combinations (0,1) and (1,0). Therefore, when the proposed test set is applied, the multiplexer inputs X[j] and X[(j + 2^i) mod N] belonging to the same half of the input word receive complementary values for both S[i]=0 and S[i]=1.
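The argument above can also be checked mechanically for specific word lengths. The sketch below is an assumption-laden model rather than the authors' tooling: it builds the data word from a 3-bit counter state using the stated assignment, pushes the 16 vectors through a behavioural ROL barrel shifter, and records what every multiplexer sees. The helper names `data_word` and `all_muxes_covered` are illustrative, and N is assumed to be a power of two.

```python
def data_word(c, n):
    """Shifter data inputs X[0..n-1] derived from 3-bit counter state c."""
    def bit(k):
        return (c >> k) & 1
    return [bit(j % 3) if j <= n // 2 else 1 - bit((n - j) % 3)
            for j in range(n)]


def all_muxes_covered(n):
    """True if every 2:1 mux sees A0 != A1 in both polarities for S=0 and S=1."""
    stages = n.bit_length() - 1              # log2(n)
    seen = {(i, k): set() for i in range(stages) for k in range(n)}
    for sel in (0, 1):                       # all select lines 0, then all 1
        for c in range(8):                   # the 8 counter states
            word = data_word(c, n)
            for i in range(stages):
                shift = 1 << i
                next_word = []
                for k in range(n):
                    # Mux k of stage i: pass-through input and rotated input.
                    a0, a1 = word[k], word[(k - shift) % n]
                    seen[(i, k)].add((sel, a0, a1))
                    next_word.append(a1 if sel else a0)
                word = next_word
    required = {(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)}
    return all(required <= combos for combos in seen.values())


print([all_muxes_covered(n) for n in (8, 16, 32, 64)])
```

The check covers only the word lengths it is run on; the general statement for any N rests on the proof given in the text.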

[Figure 4. BIST TPG for 16-bit shifter-accumulator (ALU-based compaction): a 9-bit binary counter whose outputs c[8:5] drive S[3:0], c[4] drives M, c[3] drives F[3:0], and c[2], c[1], c[0] drive X[15:0]]

In conclusion, the 16-vector test set generated by a 3-bit binary counter, with the assignment to the shifter inputs described above, provides complete testability with respect to the stuck-at fault model for any word length of the barrel shifter design.

3.3. Shifter/Adder, Shifter/ALU Testability

According to our previously presented generic BIST architecture, with which we want to remain compatible, the BIST Test Pattern Generator must be based on a binary counter that feeds the control and data inputs of the shifter (in the case of the shifter/ALU pair, the control inputs of the ALU are also fed by the binary counter). For the implementation of the adders we used the ripple-carry, carry lookahead and Brent-Kung architectures, and for the implementation of the Arithmetic Logic Unit (ALU) we used an architecture that is functionally equivalent to the classical 74181 ALU [16]. In a 74181 functionally equivalent ALU, the functions are selected by one mode select input M, which distinguishes between logic and arithmetic functions, and four function select inputs denoted F[0] ... F[3].

Having constructed the sufficient test set of 16 test vectors for the barrel shifter of any size, based on the shifter testing requirements presented earlier, we can extend it so that the subsequent adder or ALU that accumulates the shifter outputs, as shown in Figure 2, is tested as well, since the 16 test vectors used to test the barrel shifter are clearly not sufficient to completely test the subsequent adder or ALU. We propose the use of a (log2 N + 3)-bit binary counter as the Test Pattern Generator when the accumulation is performed by an adder, or a (log2 N + 5)-bit binary counter when the accumulation is performed by an ALU.
In each case, 3 bits of the counter are connected to the barrel shifter data inputs according to the assignment given earlier and log2 N bits of the counter are connected to the barrel shifter select inputs, while in the case of the ALU accumulator the 2 extra bits of the counter are connected to the function select inputs of the ALU (one bit to the M input and one bit to the F[i] inputs). During BIST, the ALU operates in four of its possible functions (addition, subtraction, exclusive-OR and exclusive-NOR, selected through M and F[3:0]), which is one of the possible selections that guarantee complete testability of the ALU, as noted in [5]. In both cases (adder or ALU accumulation) the binary counter TPGs provide sufficient test patterns to achieve near complete stuck-at fault testability for the adder or ALU accumulator.

The binary counter TPG is shown in Figure 4 for the case of an ALU accumulator and a 16-bit word length (N=16). If accumulation is performed by an adder, the 2 counter bits c[4:3] are eliminated and the (log2 N + 5)-bit counter (9 bits in this case) is simplified to a (log2 N + 3)-bit counter (7 bits in this case).

When accumulation is performed by an adder, the total number of test patterns applied to an 8-bit, 16-bit and 32-bit datapath is 64, 128 and 256, respectively. When accumulation is performed by an ALU, the total number of test patterns applied to an 8-bit, 16-bit and 32-bit datapath is 256, 512 and 1024, respectively. The test set can be applied to any datapath width and provides excellent fault coverage in all cases, as we summarize in the following section.
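The counter widths and test lengths quoted above follow directly from the formulas log2 N + 3 and log2 N + 5. The short sketch below reproduces them and splits a counter state into its TPG fields; the helper names `tpg_counter_bits` and `split_counter` are illustrative, and the bit ordering (data bits lowest, then the F and M bits, shift amount highest) is an assumption generalized from the 16-bit ALU case.

```python
def tpg_counter_bits(n, alu=False):
    """Counter width: log2(n) + 3 for an adder, log2(n) + 5 for an ALU."""
    return (n.bit_length() - 1) + (5 if alu else 3)


def split_counter(value, n, alu=False):
    """Split a counter state into TPG fields (assumed bit ordering)."""
    fields = {"data": value & 0b111}   # c[2:0] drive the shifter data inputs
    value >>= 3
    if alu:
        fields["F"] = value & 1        # one bit shared by the F[i] inputs
        fields["M"] = (value >> 1) & 1
        value >>= 2
    fields["S"] = value & (n - 1)      # log2(n) shift-amount bits
    return fields


for width in (8, 16, 32):
    print(width,
          2 ** tpg_counter_bits(width),
          2 ** tpg_counter_bits(width, alu=True))
print(split_counter(0b1010_1_0_110, 16, alu=True))
```

Since the test length grows as 8N (adder) or 32N (ALU), doubling the datapath width only doubles the number of patterns.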

4. Experimental Results

We have implemented various architectures for the shifter-accumulator pairs. All designs were implemented using a 0.8 micron double-metal 5V CMOS standard cell library provided by AMS. Specifically, we implemented the following three adder architectures for the accumulator: (i) a simple ripple carry adder (rca), (ii) a carry lookahead adder (cla) and (iii) a Brent-Kung adder (bka) [17], as well as a 74181 functionally equivalent ALU [16]. Output data evaluation is performed using a cascaded accumulator-based compaction scheme, which provides nearly zero aliasing effects [3].

The BIST scheme provides a generic solution for any datapath width and achieves full coverage for the shifter and near complete fault coverage for the shifter/accumulator pair. Table 1 summarizes the results for 16-bit and 32-bit datapath widths as a sample of the experiments performed. The fault coverage results shown in Table 1 are calculated after compaction. It should be noted that the applied test set achieves full (100%) coverage for the barrel shifter, which is verified by the experiments but not indicated in Table 1.

Table 1: Experimental Results

Datapath Width   Functional Pair        Fault coverage
16               Shifter/Adder (rca)    100.0%
16               Shifter/Adder (cla)    99.6%
16               Shifter/Adder (bka)    100.0%
16               Shifter/ALU            99.9%
32               Shifter/Adder (rca)    100.0%
32               Shifter/Adder (cla)    100.0%
32               Shifter/Adder (bka)    99.5%
32               Shifter/ALU            100.0%

5. Conclusions

We presented an effective deterministic BIST scheme for shifter/adder and shifter/ALU pairs and showed how near complete testability can be achieved with a regular counter-generated deterministic test set for the shifter-accumulator pair (accumulation performed either by an adder or an ALU). The proposed scheme is compatible with our earlier datapath BIST schemes. TPG is performed by counters and ODE is performed by accumulators (using either adders or ALUs).
This way the same TPG and ODE resources are re-used for all modules.

References

[1] D. Gizopoulos, A. Paschalis, Y. Zorian, "An Effective Built-In Self-Test Scheme for Array Multipliers", IEEE Trans. on Computers, vol. 48, no. 9, pp. 936-950, September 1999.
[2] D. Gizopoulos, A. Paschalis, Y. Zorian, "An Effective Built-In Self-Test Scheme for Booth Multipliers", IEEE Design & Test of Computers, vol. 15, no. 3, pp. 5-, July-September 1998.
[3] J. Rajski, J. Tyszer, "Arithmetic Built-In Self-Test for Embedded Systems", Prentice Hall, 1997.
[4] A. Paschalis, M. Psarakis, D. Gizopoulos, N. Kranitis, Y. Zorian, "An Effective BIST Architecture for Fast Multiplier Cores", in Proc. of the IEEE Design Automation & Test in Europe, pp. 7-2, 1999.
[5] D. Gizopoulos, A. Paschalis, Y. Zorian, "An Effective BIST Scheme for Datapaths", ITC, pp. 76-85, 1996.
[6] D. Gizopoulos, A. Paschalis, Y. Zorian, M. Psarakis, "An Effective BIST Scheme for Arithmetic Logic Units", in Proc. of the IEEE International Test Conference, pp. 868-877, November 1997.
[7] B. Nadeau-Dostie, A. Silburt, V.K. Agarwal, "Serial Interfacing for Embedded Memory Testing", IEEE Design & Test of Computers, vol. 7, no. 2, pp. 52-63, April 1990.
[8] Y. Zorian, A. Ivanov, "An Effective BIST Scheme for ROMs", IEEE Trans. on Computers, vol. 41, no. 5, pp. 646-653, May 1992.
[9] Y. Zorian, A.J. Van de Goor, I. Schanstra, "An Effective BIST Scheme for Ring-Address Type FIFOs", in Proc. IEEE ITC, pp. 378-387, 1994.
[10] C. Papachristou, S. Chiu, H. Harmanani, "A Data Path Synthesis Method for Self-Testable Designs", in Proc. 28th ACM/IEEE DAC, pp. 378-384, 1991.
[11] M. Vahidi, A. Orailoglu, "Testability Metrics for Synthesis of Self-Testable Designs and Effective Test Plans", in Proc. 13th IEEE VTS, pp. 7-75, 1995.
[12] ARM9TDMI (Rev. ), Technical Reference Manual, Nov. 1998.
[13] S. Gupta, J. Rajski, J. Tyszer, "Arithmetic Additive Generators of Pseudo-Exhaustive Test Patterns", IEEE Trans. on Computers, vol. 45, no. 8, pp. 939-949, August 1996.
[14] J. Rajski, J. Tyszer, "Test Responses Compaction in Accumulators with Rotate Carry Adders", IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 4, pp. 531-539, April 1993.
[15] Synopsys, DesignWare Foundation Library Databook, vol., ver. 1999.05, February 1999.
[16] Texas Instruments, TTL Data Book, vol. 2, Dallas, Texas, 1985.
[17] R.P. Brent, H.T. Kung, "A Regular Layout for Parallel Adders", IEEE Trans. on Computers, vol. C-31, no. 3, pp. 260-264, March 1982.