Error Detection and Data Recovery Architecture for Systolic Motion Estimators

Error Detection and Data Recovery Architecture for Systolic Motion Estimators L. Arun Kumar #1, L. Sheela *2 # PG Scholar, * Assistant Professor, Embedded System Technologies, Regional Center of Anna University Tirunelveli, Tamilnadu, India. Abstract - Motion estimation plays a vital role in today s media applications. Hence testing of such a module is a significant concern. Even though several algorithms have been proposed in the past testing of motion estimators are seldom addressed. The proposed system describes an Error Detection and Data Recovery (EDDR) architecture that detects and recovers data in the motion estimator. The system uses the Mean Absolute Difference (MAD) method to compute the difference in the current and reference frames which overcomes some of the drawbacks in the existing techniques. The architecture comprises of an Error Detection Circuit (EDC) and a Data Recovery Circuit (DRC) to recover the original data. A Residue-Quotient code is used to compute the change in value between the error and expected values. Result show that the errors are effectively recovered with a delay of a single clock cycle. Keywords - Area overhead, data recovery, error detection, motion estimation, reliability, residue-andquotient (RQ) code. I. INTRODUCTION Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image (global motion estimation) or specific parts, such as rectangular blocks, arbitrary shaped patches or even per pixel. The motion vectors may be represented by a translational model or many other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions and zoom. Closely related to motion estimation is optical flow, where the vectors correspond to the perceived movement of pixels. In motion estimation an exact 1:1 correspondence of pixel positions is not a requirement. Applying the motion vectors to an image to synthesize the transformation to the next image is called motion compensation. The combination of motion estimation and motion compensation is a key part of video compression as used by MPEG 1, 2 and 4 as well as many other video codec. Generally, Motion Estimator (ME) is made up of Processing Elements of an array size of 16 16. Although the advance of VLSI technology allows a large number of PEs of the ME to be integrated into a chip, this increases the logic-per-pin ratio, thus significantly decreasing the efficiency of logic testing on the chip. In other words, testing of such high complexity and high density circuits becomes very difficult and expensive. Thus, architecture design of ME requires the design-for-testability (DFT), whose goal is to increase the ease with which a device can be tested, in order to guarantee high fault coverage of the system. The Processing elements (PE) are essential building blocks and are connected regularly to construct a Motion estimator (ME). Generally, PEs are surrounded by sets of Adders and accumulators that determine how data flows through them. PEs can thus be considered the class of circuits called Iterative Logic Arrays (ILA), whose testing assignment can be easily achieved by using the fault model, cell fault model (CFM). CFM has received considerable interest due to accelerated growth in the use of high-level synthesis, as well as the parallel increase in complexity and density of integration circuits (ICs). This makes the tests independent of the adopted synthesis tool and vendor library. Arithmetic modules, like Adders (the primary element in a PE), due to their regularity, are designed in an extremely dense configuration. Moreover, a more comprehensive fault model, i.e. the stuck-at (SA) model, must be adopted to cover actual failures in the interconnect data bus between PEs. The SA fault is a well known structural fault model, which assumes that faults cause a line in the circuit to behave as if it were permanently at logic 0 (stuck-at 0 (SA0)) or logic 1 stuck-at 1 (SA1). The SA fault in a ME architecture can incur errors in computing MAD values. Thus, exploring the feasibility of an embedded testing approach to detect errors and recover data of a ME is of worthwhile interest. Additionally, the ISSN: 2249-2615 http://www.ijpttjournal.org Page 262

reliability issue of numerous PEs in a ME can be improved by enhancing the capabilities of concurrent error detection (CED). The CED approach can detect errors through conflicting and undesired results generated from operations on the same operands. CED can also test the circuit at full operating speed without interrupting a system. Thus, based on the CED concept, this work develops a novel EDDR architecture based on the RQ code to detect errors and recovery data in PEs of a ME and, in doing so, further guarantee the excellent reliability for video coding testing applications. The rest of this paper is organized as follows. Section II describes the RQ code and the corresponding design of the RQ code generator (RQCG). Section III introduces the proposed EDDR architecture, fault model definition, and test method. Next, Section IV evaluates the performance in area overhead, timing penalty, throughput and reliability analysis to demonstrate the feasibility of the proposed EDDR architecture for ME testing applications. Conclusions are finally drawn in Section V. II. RQ CODE Coding approaches such as parity code, Berger code, and residue code have been considered for design applications to detect circuit errors. Residue code is generally separable arithmetic codes by estimating a residue for data and appending it to data. Error detection logic for operations is typically derived by a separate residue code, making the detection logic is simple and easily implemented. However, only a bit error can be detected based on the residue code. Additionally, an error cannot be recovered effectively by using the residue codes. There-fore, this work presents a quotient code, which is derived from the residue code, to assist the residue code in detecting multiple errors and recovering errors. III. PROPOSED ARCHITECTURE The conceptual view of the proposed EDDR scheme comprises of two major circuit designs, error detection circuit (EDC) and data recovery circuit (DRC), to detect errors and recover the corresponding data in a specific Circuit Under Test (CUT) which is the systolic array. The test code generator (TCG) in Fig.1 utilizes the concepts of RQ code to generate the corresponding test codes for error detection and data recovery. The proposed architecture comprises of the following modules, Processing Element (PE) Test Code Generation (TCG) RQ Code Generation (RQCG) Error Detection Circuit (EDC) Data Recovery Circuit (DRC) Figure 1. Conceptual view of proposed Architecture The test codes from TCG and the primary output from CUT are delivered to EDC to determine whether the CUT has errors. DRC is in- charge of recovering data from TCG. Additionally, a selector is enabled to export error-free data or data-recovery results. Importantly, an array-based computing structure, such as ME, discrete cosine transform (DCT), iterative logic array (ILA), and finite impulse filter (FIR), is feasible for the proposed EDDR scheme to detect errors and recover the corresponding data. A PE generally consists of two Adders (i.e. an 8-b ADD and a 12-b ADD) and an accumulator (ACC). Next, the 8-b ADD (a pixel has 8-b data) is used to estimate the addition of the current pixel (Cur pixel) and reference pixel (Ref_pixel). Additionally, a 12-b ADD and an ACC are required to accumulate the results from the 8-b ADD in order to determine the sum of absolute difference (SAD) value for video encoding applications. Notably, some registers and latches may exist in ME to complete the data shift and storage, encoding applications. Notably, some registers and latches may exist in ME to complete the data shift and storage. The PEs are essential building blocks and are connected regularly to construct a ME. Generally, PEs are surrounded by sets of Adders and accumulators that determine how data flows through them. PEs can thus be considered the class of circuits called ILAs, whose testing assignment can be easily achieved by using the fault model, cell fault model (CFM). Arithmetic modules, like Adderss (the primary element in a PE), due to their regularity, are designed in an extremely dense configuration. A proposed ME consists of PEs with a size of 4 x 4. However, ISSN: 2249-2615 http://www.ijpttjournal.org Page 263

accelerating the computation speed depends on a large PE array, especially in high-resolution devices with a large search range such as HDTV. The Mean Absolute Difference value of the current and reference pixel value are computed using the following equation. MAD = Xij Yij = q. m + r q. m + r (1) where r xij, q xij and r yij, q yij denote the corresponding RQ code X ij, Y ij of the current and reference values respectively and m denotes modulo operation. Importantly, X ij and Y ij represent the luminance pixel value of Cur_pixel and Ref_pixel, respectively. Based on the residue code, the definitions shown the following equations can be applied to facilitate generation of the RQ code and the TCG. A. Test Code Generator Figure 2, illustrates the structure of the TCG module in the proposed EDDR architecture. Notably, TCG design is based on the ability of the RQCG circuit to generate the corresponding test codes in order to detect errors and recover data. The specific PE i estimates the absolute difference between the Cur_pixel of the search area and the Ref_pixel of the current macroblock. Thus the Residue and Quotient value for an array size of N X N is given by equations (3) and (5). R T = X Y m (2) R T = [ r m + r m + r m + + r m] (3) Figure 2.Test Code Generator In the event of a fault, the RQ code from RQCG2 of the TCG is still equal to absolute difference value. However, R PEi and Q PEi are changed because an error e has occurred. R PEi = [ r m + r m + r m + + r m] + [ r m] Q PEi = [q + q + + q + q ] + B. Numerical Example A numerical example of the 16 pixels for a 4x4 macroblock in a specific PE i of a ME is described as follows. Fig. 5 presents an example of pixel values of the Cur_pixel and Ref_pixel. Based on (1), the MAD value of the 4x4 macroblock is, MAD = X Y = [ X Y + X Y +... + X Y ] = = 132.75 {(128-1) +(128-1)+ +(128-5)} Q T = (4) Q T = [q + q + + q ] + (5) Figure 3. Example of pixel values Notably the error signal e is expressed as, e = q e.m + r e ISSN: 2249-2615 http://www.ijpttjournal.org Page 264

IV. RESULTS AND DISCUSSION The proposed architecture was executed on Windows XP operating system at an operating frequency of 2.80GHz using ModelSim for functional verification and synthesized using Xilinx ISE simulator. Table 1 gives the synthesis report provides the device utilization which is analyzed in terms of percentage. The timing summary of the code execution is also obtained. The proposed architecture was simulated on XC3S100E device of Spartan3E family provided the following synthesis results. ====================================== Selected Device : 3s100evq100-5 Number of Slices: 651 out of 960 67% Number of Slice Flip Flops: 305 out of 1920 15% Number of 4 input LUTs: 1160 out of 1920 60% Number of IOs: 158 Number of bonded IOBs: 158 out of 66 239% IOB Flip Flops: 77 Number of MULT18X18SIOs: 4 out of 4 100% Number of GCLKs: 1 out of 24 4% Figure 4. Screenshot of the operation of the proposed system Table 1. Synthesis report The proposed architecture achieved a minimum period of 3.866ns and a maximum combinational path delay of 15.582ns.The following screenshots describes the operation of the proposed EDDR architecture. Results show that the pixel values are efficiently recovered by means of the data recovery module in the presence of a stuck at fault in the processing element. Figure 4 shows the output of the proposed system. The input pixel values and the final output are observed from the waveforms of the figure. Figure shows the operation the Data Recovery module. The CUT here is the second processing element which is given by the value of S. A stuck at fault model is used and the value of the second element is stuck at 64. The RQ code from the TCG module and the DRC module efficiently recover the original pixel which was initially 127. The error between the expected and the actual value is also obtained. Figure 5.Pixel values set to the PE array and the TCG module Figure 6. Comparison of error output with the recovered output ISSN: 2249-2615 http://www.ijpttjournal.org Page 265

V. CONCLUSION Our work presents an EDDR architecture for detecting the errors and recovering the data of PEs in a ME. Based on the RQ code, a RQCG-based TCG design is developed to generate the corresponding codes to detect errors and recover data. Experimental results obtained from ModelSim indicate that that the proposed EDDR architecture can effectively detect errors and recover data in PEs of a ME with reasonable area overhead and only a slight time penalty. Synthesis results show that the timing constraints and device utilization are minimum compared to error detection techniques available. [13] C. Y. Chen, S. Y. Chien, Y. W. Huang, T. C. Chen, T. C. Wang, and L. G. Chen, Analysis and architecture design of variable block-size motion estimation for H.264/AVC, IEEE Trans. Circuits Syst. I, Reg.Papers, vol. 53, no. 3, pp. 578 593, Mar. 2006. [14] Y. W. Huang, B. Y. Hsieh, S. Y. Chien, S. Y. Ma, and L. G. Chen, Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC, IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp. 507 522, Apr. 2006. [15] L. Breveglieri, P. Maistri, and I. Koren, A note on error detection in an RSA architecture by means of residue codes, in Proc. IEEE Int. Symp. On-Line Testing, Jul. 2006, 176 177. REFERENCES [1] Chang- Hsin Cheng, Yu Liu, and Chun-Lung Hsu, Design of an Error Detection and Data Recovery Architecture for Motion Estimation Testing Applications IEEE transactions on very large scale integration (VLSI) systems, VOL. 20, NO. 4, April 2012. [2] Chun-Lung Hsu, Chang-Hsin Cheng, and Yu Liu, Built-in Self-Detection/Correction Architecture for Motion Estimation Computing Arrays, IEEE transactions on very large scale integration (vlsi) systems, vol. 18, no. 2, February 2010. [3] Y. S. Huang, C. J. Yang, and C. L. Hsu, C-testable motion estimation design for video coding systems, J. Electron. Sci. Technol., vol. 7, no. 4, pp. 370 374, Dec. 2009. [4] Che Wun Chiou, Chin-Cheng Chang,Chiou-Yng Lee, Ting- Wei Hou, and Jim-Min Lin, Concurrent Error Detection and Correction in Gaussian Normal Basis Multiplier over GF(2m), IEEE transactions on computers, vol. 58, no. 6, June 2009. [5] W. Y Liu, J. Y. Huang, J. H. Hong, and S. K. Lu, Testable design and BIST techniques for systolic motion estimators in the transform domain, in Proc. IEEE Int. Conf. Circuits Syst., Apr. 2009, pp. 1 4. [6] S. Dhahri, A. Zitouni, H. Chaouch, and R. Tourki, Adaptive Motion Estimator Based on Variable Block Size Scheme, in World Academy of Science, Engineering and Technology Jan. 2009 [7] Yu-Sheng Huang, Chen-Kai Chen and Chun-Lung Hsu, Efficient Built-In Self-Test for Video Coding Cores: A Case Study on Motion Estimation Computing Array - Dec 2008. [8] Y. S. Huang, C. K. Chen, and C. L. Hsu, Efficient built-in self-test for video coding cores: A case study on motion estimation computing array, in Proc. IEEE Asia Pacific Conf. Circuit Syst., Dec. 2008, pp. 1751 1754. [9] M. Y. Dong, S. H. Yang, and S. K. Lu, Design-fortestability techniques for motion estimation computing arrays, in Proc. Int. Conf. Commun., Circuits Syst., May 2008, pp. 1188 1191. [10] D. K. Park, H. M. Cho, S. B. Cho, and J. H. Lee, A fast motion estimation algorithm for SAD optimization in subpixel, in Proc. Int. Symp. Integr. Circuits, Sep. 2007, pp. 528 531. [11] S. Bayat-Sarmadi and M. A. Hasan, On concurrent detection of errors in polynomial basis multiplication, IEEE Trans. Vary Large Scale Integr. (VLSI) Systs., vol. 15, no. 4, pp. 413 426, Apr. 2007. [12] T. H. Wu, Y. L. Tsai, and S. J. Chang, An efficient designfor-testability scheme for motion estimation in H.264/AVC, in Proc. Int. Symp. VLSI Design, Autom. Test, Apr. 2007, pp. 1 4. ISSN: 2249-2615 http://www.ijpttjournal.org Page 266