Design and Implementation of Concurrent Error Detection and Data Recovery Architecture for Motion Estimation Testing Applications



Similar documents
DESIGN OF AN ERROR DETECTION AND DATA RECOVERY ARCHITECTURE FOR MOTION ESTIMATION TESTING APPLICATIONS

MOTION ESTIMATION TESTING USING AN ERROR DETECTION AND DATA RECOVERY ARCHITECTURE

Efficient Motion Estimation by Fast Three Step Search Algorithms

An Efficient Architecture for Image Compression and Lightweight Encryption using Parameterized DWT

ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT

302 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 2, FEBRUARY 2009

FPGA Design of Reconfigurable Binary Processor Using VLSI

Low-resolution Image Processing based on FPGA

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

Floating Point Fused Add-Subtract and Fused Dot-Product Units

How To Improve Performance Of The H264 Video Codec On A Video Card With A Motion Estimation Algorithm

An Effective Deterministic BIST Scheme for Shifter/Accumulator Pairs in Datapaths

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1}

Study and Implementation of Video Compression Standards (H.264/AVC and Dirac)

High Speed and Efficient 4-Tap FIR Filter Design Using Modified ETA and Multipliers

Efficient Coding Unit and Prediction Unit Decision Algorithm for Multiview Video Coding

Image Compression through DCT and Huffman Coding Technique

On the Data Reuse and Memory Bandwidth Analysis for Full-Search Block-Matching VLSI Architecture

Study and Implementation of Video Compression standards (H.264/AVC, Dirac)

Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix-2)

Design Verification and Test of Digital VLSI Circuits NPTEL Video Course. Module-VII Lecture-I Introduction to Digital VLSI Testing

Reversible Data Hiding for Security Applications

Module-I Lecture-I Introduction to Digital VLSI Design Flow

Bandwidth Adaptation for MPEG-4 Video Streaming over the Internet

Performance Analysis and Comparison of JM 15.1 and Intel IPP H.264 Encoder and Decoder

Digital Systems Design! Lecture 1 - Introduction!!

2695 P a g e. IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India

Design of a High-speed and large-capacity NAND Flash storage system based on Fiber Acquisition

Friendly Medical Image Sharing Scheme

A Secure File Transfer based on Discrete Wavelet Transformation and Audio Watermarking Techniques

Introduction to Digital System Design

How To Fix A 3 Bit Error In Data From A Data Point To A Bit Code (Data Point) With A Power Source (Data Source) And A Power Cell (Power Source)

Combating Anti-forensics of Jpeg Compression

Implementation of Full -Parallelism AES Encryption and Decryption

Efficient Data Recovery scheme in PTS-Based OFDM systems with MATRIX Formulation

Video Authentication for H.264/AVC using Digital Signature Standard and Secure Hash Algorithm

Testing of Digital System-on- Chip (SoC)

Attaining EDF Task Scheduling with O(1) Time Complexity

Invisible Image Water Marking Using Hybrid DWT Compression/Decompression Technique

Introduction to VLSI Testing

IJESRT. [Padama, 2(5): May, 2013] ISSN:

Complexity-rate-distortion Evaluation of Video Encoding for Cloud Media Computing

Innovative improvement of fundamental metrics including power dissipation and efficiency of the ALU system

Aims and Objectives. E 3.05 Digital System Design. Course Syllabus. Course Syllabus (1) Programmable Logic

Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications

A comprehensive survey on various ETC techniques for secure Data transmission

Functional-Repair-by-Transfer Regenerating Codes

Memory Testing. Memory testing.1

MAXIMIZING RESTORABLE THROUGHPUT IN MPLS NETWORKS

Improved NAND Flash Memories Storage Reliablity Using Nonlinear Multi Error Correction Codes

A Fast Signature Computation Algorithm for LFSR and MISR

CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION

Testing Low Power Designs with Power-Aware Test Manage Manufacturing Test Power Issues with DFTMAX and TetraMAX

Image Authentication Scheme using Digital Signature and Digital Watermarking

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

A System for Capturing High Resolution Images

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

Fault Modeling. Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults. Transistor faults Summary

Load Balancing in Fault Tolerant Video Server

Clock Distribution in RNS-based VLSI Systems

Design of a High Speed Communications Link Using Field Programmable Gate Arrays

Design and Analysis of Parallel AES Encryption and Decryption Algorithm for Multi Processor Arrays

Architectures and Platforms

Computation of Forward and Inverse MDCT Using Clenshaw s Recurrence Formula

路 論 Chapter 15 System-Level Physical Design

Digital Logic Design. Basics Combinational Circuits Sequential Circuits. Pu-Jen Cheng

MEDICAL IMAGE COMPRESSION USING HYBRID CODER WITH FUZZY EDGE DETECTION

AN EVALUATION OF MODEL-BASED SOFTWARE SYNTHESIS FROM SIMULINK MODELS FOR EMBEDDED VIDEO APPLICATIONS

A New Programmable RF System for System-on-Chip Applications

AUTOMATED AND ADAPTIVE DOWNLOAD SERVICE USING P2P APPROACH IN CLOUD

VLSI Design Verification and Testing

DCT-JPEG Image Coding Based on GPU

Parametric Comparison of H.264 with Existing Video Standards

LOW POWER MULTIPLEXER BASED FULL ADDER USING PASS TRANSISTOR LOGIC

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm

Design Verification & Testing Design for Testability and Scan

Implementing the Functional Model of High Accuracy Fixed Width Modified Booth Multiplier

What is a System on a Chip?

A Survey of Video Processing with Field Programmable Gate Arrays (FGPA)

WATERMARKING FOR IMAGE AUTHENTICATION

REDUCED COMPUTATION USING ADAPTIVE SEARCH WINDOW SIZE FOR H.264 MULTI-FRAME MOTION ESTIMATION

1. True or False? A voltage level in the range 0 to 2 volts is interpreted as a binary 1.

Systolic Computing. Fundamentals

Internet Video Streaming and Cloud-based Multimedia Applications. Outline

Hardware Implementation of AES Encryption and Decryption System Based on FPGA

TABLE OF CONTENTS. xiii List of Tables. xviii List of Design-for-Test Rules. xix Preface to the First Edition. xxi Preface to the Second Edition

Modeling Sequential Elements with Verilog. Prof. Chien-Nan Liu TEL: ext: Sequential Circuit

Transcription:

Design and Implementation of Concurrent Error Detection and Data Recovery Architecture for Motion Estimation Testing Applications 1 Abhilash B T, 2 Veerabhadrappa S T, 3 Anuradha M G Department of E&C, JSS Academy of Technical Education, Bengaluru -60 1 bt.abhilash@gmail.com, 2 anuarun.19@gmail.com, 3 veeru.st@gmail.com Abstract: BIST schemes generally focus on memory circuit; testing-related issues of video coding have seldom been addressed. Thus, exploring the feasibility of an embedded testing approach to detect errors and recover data of motion estimation (ME) is of worthwhile interest. Additionally, the reliability issue of numerous processing elements (PEs) in a ME can be improved by enhancing the capabilities of concurrent error detection (CED). Given the critical role of ME in a video coder, testing such a module is of priority concern. While focusing on the testing of ME in a video coding system, this work presents an error detection and data recovery (EDDR) design, based on the residue-and quotient (RQ) code, to embed into ME for video coding testing applications. An error in PEs, i.e. key components of a ME, can be detected and recovered effectively by using the proposed EDDR design [1]-[2]. In this paper, necessary modules have been implemented to make architecture to insert into motion estimation. The architecture designed and tested using exhaustive test bench using modelsim simulator. Experimental results indicate that the modules required for proposed EDDR design for ME testing can detect errors and recover data with 3 clock cycles and area of 17% LUTs, 4% Flip-flops and 12% of GCLKs. Keywords Motion Estimation, SAD Tree, Error detection and recover circuits. I. INTRODUCTION Video compression is necessary in a wide range of applications to reduce the total data amount required for transmitting or storing video data. Among the coding systems, a ME is of priority concern in exploiting the temporal redundancy between successive frames, yet also the most time consuming aspect of coding. Additionally, while performing up to 60% 90% of the computations encountered in the entire coding system, a ME is widely regarded as the most computationally intensive of a video coding system. AME generally consists of PEs with a size of 4x4. However, accelerating the computation speed depends on a large PE array, especially in high resolution devices with a large search range such as HDTV. Additionally, the visual quality and peak signal-to-noise ratio (PSNR) at a given bit rate are influenced if an error occurred in ME process. ADVANCES in semiconductors, digital signal processing, and communication technologies have made multimedia applications more flexible and reliable [3] - [4]. In the advance of VLSI technologies facilitate the integration of a large number of PEs of a ME into a chip, the logic-per-pin ratio is subsequently increased, thus decreasing the efficiency of logic testing on the chip. As a commercial chip, it is absolutely necessary for the ME to introduce design for testability (DFT) [5] - [7].DFT focuses on increasing the ease of device testing, thus guaranteeing high reliability of a system. DFT methods rely on reconfiguration of a circuit under test (CUT) to improve testability. While DFT approaches enhance the testability of circuits, advances in sub-micron technology and resulting increases in the complexity of electronic circuits and systems have meant that built-in self-test (BIST) schemes have rapidly become necessary in the digital world. BIST for the ME does not expensive test equipment, ultimately lowering test costs [8] - [10]. Moreover, BIST can generate test simulations and analyse test responses without outside support. However, increasingly complex density of circuitry requires that the BIST schemes not only detect faults but also specify their location for error correcting.. Thus, extended schemes of BIST referred to as built-in selfdiagnosis [11] and built-in self-correction [12] - [14] have been developed recently. The extended BIST schemes generally focus on memory circuit, testing-related issues of video coding have been addressed. Thus, exploring the feasibility of an embedded testing approach to detect errors and recover data of a ME is of worthwhile interest. Additionally, the reliability issue of numerous processing elements (PEs) in a ME can be improved by enhancing the capabilities of concurrent error detection (CED)[15]. The CED approach can detect errors through conflicting and undesired results generated from operations on the same operands. CED can also test the circuit at full operating speed without interrupting a system. Thus, based on the CED concept, this 1

work develops an EDDR architecture based on the RQ code to detect errors and recovery data in PEs of a ME. This paper is organized as follows. Section 2 gives the circuit design of RQ code generator. Section 3 introduces the EDDR architecture and test method. Conclusions are in section 4. II. RQ CODE GENERATION Coding approaches such as parity code, Berger code, and residue code have been considered for design applications to detect circuit errors. Residue code is generally separable arithmetic codes by estimating a residue for data and appending it to data. Error detection logic for operations is typically derived by a separate residue code, making the detection logic is simple and easily implemented. Error detection logic for operations is typically derived using a separate residue code such that detection logic is simply and easily implemented. However, only a bit error can be detected based on the residue code. Additionally, an error can t be recovered effectively by using the residue codes. Therefore, this work presents a quotient code, which is derived from the residue code, to assist the residue code in detecting multiple errors and recovering errors. the corresponding circuit design of the RQCG is easily realized by using the simple adders (ADDs). Namely, the RQ code can be generated with a low complexity and little hardware cost.. The mathematical model of RQ code is simply described as follows. Assume that binary data X is expressed as Significantly, the value of k is equal to[n/2] and the data formation of Y 0 and Y 1 are a decimal system. If the modulus m = 2 k - 1, then the residue code of modulo is given by R = X m = Y 0 +Y 1 m = Z 0 + Z 1 m = (Z 0 +Z 1 )α (5) Notably, since the value of Y 0 + Y 1 is generally greater than that of modulus m, the equations in (5) and (6) must be simplified further to replace the complex module operation with a simple addition operation by using the parameters Z 0,Z 1, α and β. The RQ code of X modulo m expressed as R= X m Q=[X/m], respectively. Notably [i] denotes the largest integer not exceeding i.according to the above RQ code expression, the corresponding circuit design of the RQCG can be realized. In order to simplify the complexity of circuit design, the implementation of the module is generally dependent on the addition operation. Additionally, based on the concept of residue code, the following definitions shown can be applied to generate the RQ code for circuit design. Definition 1: N1+N2 m = N1 m + N2 m m. (2) Definition 2: Let N j = n 1 +n 2 + + n j, then N j m = n 1 m + n 2 m + n j m m. (3 ) To accelerate the circuit design of RQCG, the binary data shown in (1) can generally be divided into two parts: Fig. 1. M modulo N operation Based on (5) and (6), the corresponding circuit design of the RQCG is easily realized by using the simple adders (ADDs). Namely, the RQ code can be generated with a low complexity and little hardware cost. III. EDDR ARCHITECTURE Fig. 2 shows the conceptual view of the proposed EDDR scheme, which comprises two major circuit designs, i.e. error detection circuit (EDC) and data recovery circuit (DRC), to 2

detect errors and recover the corresponding data in a specific CUT. The test code generator (TCG) in Fig. 1 utilizes the concepts of RQ code to generate the corresponding test codes for error detection and data recovery. In other words, the test codes from TCG and the primary output from CUT are delivered to EDC to determine whether the CUT has errors. DRC is in charge of recovering data from TCG. Fig. 2. Conceptual view of the EDDR Architecture Additionally, a selector is enabled to export error-free data or data-recovery results. Importantly, an array based computing structure, such as ME, discrete cosine transform (DCT), iterative logic array (ILA), and finite impulse filter (FIR), is feasible for the proposed EDDR scheme to detect errors and recover the corresponding data. This work adopts the systolic ME as a CUT to demonstrate the feasibility of the proposed EDDR architecture. A ME consists of many PEs incorporated in a 1- D or 2-D array for video encoding applications. A PE generally consists of two ADDs (i.e. an 8-b ADD and a 12-b ADD) and an accumulator (ACC). Next, the 8-b ADD (a pixel has 8-b data) is used to estimate the addition of the current pixel (Cur_pixel) and reference pixel (Ref_pixel). Additionally, a 12-b ADD and an ACC are required to accumulate the results from the 8-b ADD in order to determine the sum of absolute difference (SAD) value for video encoding applications. Notably, some registers and latches may exist in ME to complete the data shift and storage. Fig. 2 shows an example of the proposed EDDR circuit design for a specific PE i of a ME. The fault model definition, RQCGbased TCG design, operations of error detection and data recovery, and the overall test strategy are described carefully as follows. A. Sum of Absolute Difference Tree PEs utilizing the concept of the proposed SAD Tree Architecture (Fig. 3). Fig. 3 SAD Tree Architecture The proposed SAD Tree is a 2-D intra-level architecture and consists of a 2-D PE array and one 2-D adder tree with propagation registers Current pixels are stored in each PE, and reference pixels are stored in propagation registers for data reuse. In each cycle, current and reference pixels are inputted to PEs. Simultaneously, continuous reference pixels in a row are inputted into propagation registers to update reference pixels. In propagation registers, reference pixels are propagated in the vertical direction row by row. In SAD Tree architecture, all distortions of a searching candidate are generated in the same cycle, and by an adder tree, distortions are accumulated to derive the SAD in one cycle.. B. Fault Model The PEs are essential building blocks and are connected regularly to construct a ME. Generally, PEs are surrounded by sets of ADDs and accumulators that determine how data flows through them. PEs can thus be considered the class of circuits called ILAs, whose testing assignment can be easily achieved by using the fault model, cell fault model (CFM). Using CFM has received considerable interest due to accelerated growth in the use of high-level synthesis, as well as the parallel increase in complexity and density of integration circuits (ICs). Using CFM makes the tests independent of the adopted synthesis tool and vendor library. Arithmetic modules, like ADDs (the primary element in a PE), due to their regularity, are designed in an extremely dense configuration. Moreover, a more comprehensive fault model, i.e. the stuck-at (SA) model, must be adopted to cover actual failures in the interconnect data bus between PEs. The SA fault is a well known structural fault model, which assumes that faults cause a line in the circuit to behave as if it were permanently at logic 0 (stuck-at 0 (SA0)) or logic 1 [stuck-at 1 (SA1)]. The SA fault in a ME architecture can incur errors in computing SAD values. A distorted computational error(e) and the magnitude of (e) are assumed here to be equal to SAD - SAD, where SAD denotes the computed SAD value with SA faults. 3

C. TCG Design According to Fig 4, TCG is an important component of the proposed EDDR architecture. Notably, TCG design is based on the ability of the RQCG circuit to generate corresponding test codes in order to detect errors and recover data. The specific PE i in Fig. 3 estimates the absolute difference between the Cur_pixel of the search area and the Ref_pixel of the current macro block Thus, by utilizing PEs, SAD shown in as follows, in a macro block with size of N X N can be evaluated: Fig. 4. A specific PE i testing process of the proposed EDDR architecture Where r xij, q xij and r yij, q yij denote the corresponding RQ code of X ij and Y ij modulo m. Importantly, X ij and Y ij represent the luminance pixel value of Cur_pixel and Ref_pixel, respectively. Based on the residue code, the definitions shown in (2) and (3) can be applied to facilitate generation of the RQ code (R T and Q T ) form TCG. Namely, the circuit design of TCG can be easily achieved (see Fig. 4) by using Based on the definition of the fault model, the SAD value is influenced if either SA1 and/or SA0 errors have occurred in a specific PE. In other words, the SAD value is transformed to SAD =SAD + e, if an error occurred. Notably, the error signal is expressed as e = q e. m + r e Fig. 5. Circuit design of the TCG D. EDDR Process Error detection in a specific PE is achieved by using EDC, which is utilized to compare the outputs between TCG and RQCG in order to determine whether errors have occurred. The EDC output is then used to generate a 0/1 signal to indicate that the tested is error free errancy. This work presents a mathematical statement to verify the operations of error detection. 4

During data recovery, the circuit DRC plays a significant role in recovering RQ code from TCG. The data can be recovered by implementing the mathematical model as SAD = m * Q T + R T = (2 j -1)* Q T + R T = 2 j *Q T Q T + R T To realize the operation of data recovery in, a Barrel shift and a corrector circuits are necessary to achieve the functions of (2j*Q T ) and ( Q T + R T ) respectively. Notably, the proposed EDDR design executes the error detection and data recovery operations simultaneously. Additionally, error-free data from the tested PE i or the data recovery that results from DRC is selected by a multiplexer (MUX) to pass to the next specific PEi+1 for sub sequent testing. D. Overall Test Strategy By extending the testing processes of a specific PE i in Fig. 4, Fig. 6 illustrates the overall EDDR architecture design of a ME. First, the input data of Cur_pixel and Ref_pixel are sent simultaneously to PEs and TCGs in order to estimate the SAD values and generate the test RQ code R T and Q T. Second, the SAD value from the tested object PE i, which is selected by MUX1, is then sent to the RQCG circuit in order to generate R PE i and Q PEi codes. Meanwhile, the corresponding test codes R Ti and Q Ti from a specific TCG i are selected simultaneously by MUXs 2 and 3, respectively. Third, the RQ code from TCG i and RQCG circuits are compared in EDC to determine whether the tested object PE i have errors. The tested object PEi is error-free if and only if R PEi = R Ti and Q PEi = Q Ti. by MUX. Notably, control signal S4 is generated from EDC, indicating that the comparison result is error-free (S4 = 0) or errancy (S4 = 1). Finally, the error-free data or the datarecovery result from the tested object PE i is passed to a De- MUX, which is used to test the next specific PE i+1 ; otherwise, the final result is exported 227 226 226 225 227 226 226 226 226 226 226 225 224 225 225 224 Table 1. Cur_pixel 87 88 88 88 88 88 88 89 88 88 88 89 88 89 89 89 Table 2. Ref_pixel IV RESULTS AND DISCUSSION The architecture designed and tested using exhaustive test bench using modelsim simulator. The results comprise of 2 parts. The first part is the simulation and the synthesis results obtained. The second part is the experimental results indicate that the modules required for proposed EDDR design for ME testing can detect errors and recover data with 3 clock cycles for simulation and area of 17% LUTs,4% Flip-flops and 12% of GCLKs. The simulation result in Figure 8(a) and 8(b) indicate that the data may be corrected if it is corrupted due to the faults in the system. For Existing architecture, the 22 clock cycles were needed for simulation and the area is more compared to proposed architecture. Extensive verification of the circuit design is performed using the Verilog and then simulated by the Model Sim is satisfactory for the proposed EDDR architecture design for ME testing applications. Fig.6. EDDR Architecture design for a ME Additionally, DRC is used to recover data encoded by TCG i, i.e. the appropriate R Ti and Q Ti codes from TCGi are selected by MUXs 2 and 3, respectively, to recover data. Fourth, the error-free data or data recovery results are selected 5

[3] Y. W. Huang, B. Y. Hsieh, S. Y. Chien, S. Y. Ma, and L. G. Chen, Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC, IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp. 507 522, Apr. 2006. Fig 8(a).Simulation result of proposed EDDR [4] C. Y. Chen, S. Y. Chien, Y. W. Huang, T. C. Chen, T. C. Wang, and L. G. Chen, Analysis and architecture design of variable block-size motion estimation for H.264/AVC, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 3, pp. 578 593, Mar. 2006. [5] T. H. Wu, Y. L. Tsai, and S. J. Chang, An efficient design-for-testability scheme for motion estimation in H.264/AVC, in Proc. Int. Symp. VLSI Design, Autom. Test, Apr. 2007, pp. 1 4. [6] M. Y. Dong, S. H. Yang, and S. K. Lu, Design-fortestability techniques for motion estimation computing arrays, in Proc. Int. Conf. Commun., Circuits Syst., May 2008, pp. 1188 1191. [7] Y. S. Huang, C. J. Yang, and C. L. Hsu, C-testable motion estimation design for video coding systems, J. Electron. Sci. Technol., vol. 7, no. 4, pp. 370 374, Dec. 2009. Fig 8(b).Simulation result of proposed EDDR V. CONCLUSION This work presents EDDR architecture for detecting the errors and recovering the data of PEs in a ME. Based on the Residue Quotient code, a RQCG-based Test Code Generation design is developed to generate the corresponding test codes to detect errors and recover data. The EDDR architecture is designed and tested using exhaustive test bench using Modelsim simulator and implemented by using Verilog on FPGA. Experimental results indicate that that the proposed EDDR architecture can effectively detect errors and recover data in PEs of a ME with with 3 clock cycles for simulation and area of 17% LUTs,4% Flip-flops and 12% of GCLKs. Throughput is satisfactory for the proposed EDDR architecture design for ME testing applications.. REFERENCES [1] Advanced Video Coding for Generic Audiovisual Services, ISO/IEC 14496-10:2005 (E), Mar. 2005, ITU-T Rec. H.264 (E). [2] Information Technology-Coding of Audio-Visual Objects Part 2: Visual, ISO/IEC 14 496-2, 1999. [8] D. Li, M. Hu, and O. A. Mohamed, Built-in self-test design of motion estimation computing array, in Proc. IEEE Northeast Workshop Circuits Syst., Jun. 2004, pp. 349 352. [9] Y. S. Huang, C. K. Chen, and C. L. Hsu, Efficient built-in self-test for video coding cores: A case study on motion estimation computing array, in Proc. IEEE Asia Pacific Conf. Circuit Syst., Dec. 2008, pp. 1751 1754. [10] W. Y Liu, J. Y. Huang, J. H. Hong, and S. K. Lu, Testable design and BIST techniques for systolic motion estimators in the transform domain, in Proc. IEEE Int. Conf. Circuits Syst., Apr. 2009, pp. 1 4. [11] J. M. Portal, H. Aziza, and D. Nee, EEPROM memory: Threshold voltage built in self diagnosis, in Proc. Int. Test Conf., Sep. 2003, pp. 23 28. [12] J. F. Lin, J. C. Yeh, R. F. Hung, and C. W. Wu, A builtin self-repair design for RAMs with 2-D redundancy, IEEE Trans. Vary Large Scale Integr. (VLSI) Syst., vol. 13, no. 6, pp. 742 745, Jun. 2005. [13] C. L. Hsu, C. H. Cheng, and Y. Liu, Built-in selfdetection/correction architecture for motion estimation computing arrays, IEEE Trans. Vary Large Scale Integr. (VLSI) Systs., vol. 18, no. 2, pp. 319 324, Feb. 2010. [14] C. H. Cheng, Y. Liu, and C. L. Hsu, Low-cost BISDC design for motion estimation computing array, in Proc. IEEE Circuits Syst. Int. Conf., 2009, pp. 1 4. 6

[15] S. Bayat-Sarmadi and M. A. Hasan, On concurrent detection of errors in polynomial basis multiplication, IEEE Trans. Vary Large Scale Integr. (VLSI) Systs., vol. 15, no. 4, pp. 413 426, Apr. 2007. 7