Error Detection and Data Recovery Architecture for Systolic Motion Estimators



Similar documents
Design and Implementation of Concurrent Error Detection and Data Recovery Architecture for Motion Estimation Testing Applications

DESIGN OF AN ERROR DETECTION AND DATA RECOVERY ARCHITECTURE FOR MOTION ESTIMATION TESTING APPLICATIONS

High Speed Error Detection and Data Recovery Architecture for Video Applications

MOTION ESTIMATION TESTING USING AN ERROR DETECTION AND DATA RECOVERY ARCHITECTURE

Efficient Motion Estimation by Fast Three Step Search Algorithms

Floating Point Fused Add-Subtract and Fused Dot-Product Units

Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix-2)

Performance Analysis and Comparison of JM 15.1 and Intel IPP H.264 Encoder and Decoder

FPGA Design of Reconfigurable Binary Processor Using VLSI

High Speed and Efficient 4-Tap FIR Filter Design Using Modified ETA and Multipliers

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm

Design and Analysis of Parallel AES Encryption and Decryption Algorithm for Multi Processor Arrays

CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT

An Efficient Architecture for Image Compression and Lightweight Encryption using Parameterized DWT

MAXIMIZING RESTORABLE THROUGHPUT IN MPLS NETWORKS

On the Data Reuse and Memory Bandwidth Analysis for Full-Search Block-Matching VLSI Architecture

A Fast Path Recovery Mechanism for MPLS Networks

AN IMPROVED DESIGN OF REVERSIBLE BINARY TO BINARY CODED DECIMAL CONVERTER FOR BINARY CODED DECIMAL MULTIPLICATION

Bandwidth Adaptation for MPEG-4 Video Streaming over the Internet

Testing Low Power Designs with Power-Aware Test Manage Manufacturing Test Power Issues with DFTMAX and TetraMAX

Detecting Multiple Selfish Attack Nodes Using Replica Allocation in Cognitive Radio Ad-Hoc Networks

Efficient Data Recovery scheme in PTS-Based OFDM systems with MATRIX Formulation


A New Programmable RF System for System-on-Chip Applications

How To Improve Performance Of The H264 Video Codec On A Video Card With A Motion Estimation Algorithm

Image Compression through DCT and Huffman Coding Technique

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

302 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 2, FEBRUARY 2009

Low-resolution Image Processing based on FPGA

Friendly Medical Image Sharing Scheme

Innovative improvement of fundamental metrics including power dissipation and efficiency of the ALU system

A Novel Low Power Fault Tolerant Full Adder for Deep Submicron Technology

Arbitrary Density Pattern (ADP) Based Reduction of Testing Time in Scan-BIST VLSI Circuits

Efficient Coding Unit and Prediction Unit Decision Algorithm for Multiview Video Coding

IJESRT. [Padama, 2(5): May, 2013] ISSN:

A comprehensive survey on various ETC techniques for secure Data transmission

An Effective Deterministic BIST Scheme for Shifter/Accumulator Pairs in Datapaths

ISSN: ISO 9001:2008 Certified International Journal of Engineering Science and Innovative Technology (IJESIT) Volume 2, Issue 3, May 2013

Architectures and Platforms

Design and analysis of flip flops for low power clocking system

International Journal of Advanced Research in Computer Science and Software Engineering

Super-resolution method based on edge feature for high resolution imaging

Performance Analysis of medical Image Using Fractal Image Compression

A Compact FPGA Implementation of Triple-DES Encryption System with IP Core Generation and On-Chip Verification

Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services

A System for Capturing High Resolution Images

Testing of Digital System-on- Chip (SoC)

CUSTOMER RELATIONSHIP MANAGEMENT SYSTEM

Efficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration

Implementation and Design of AES S-Box on FPGA

SAD computation based on online arithmetic for motion. estimation

Video Authentication for H.264/AVC using Digital Signature Standard and Secure Hash Algorithm

How To Fix A 3 Bit Error In Data From A Data Point To A Bit Code (Data Point) With A Power Source (Data Source) And A Power Cell (Power Source)

FPGA Implementation of an Advanced Traffic Light Controller using Verilog HDL

Original Research Articles

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Load Balancing in Fault Tolerant Video Server

Duobinary Modulation For Optical Systems

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7

Digital Systems Design! Lecture 1 - Introduction!!

White paper. H.264 video compression standard. New possibilities within video surveillance.

Implementation of Full -Parallelism AES Encryption and Decryption

AN EFFICIENT ALGORITHM FOR WRAPPER AND TAM CO-OPTIMIZATION TO REDUCE TEST APPLICATION TIME IN CORE BASED SOC

Index Terms Audio streams, inactive frames, steganography, Voice over Internet Protocol (VoIP), packet loss. I. Introduction

A Survey of Error Resilient Coding Schemes for Image/Video Transmission Based on Data Embedding

Programmable Logic Devices: A Test Approach for the Input/Output Blocks and Pad-to-Pin Interconnections

Design Verification & Testing Design for Testability and Scan

Parametric Comparison of H.264 with Existing Video Standards

An Isolated Multiport DC-DC Converter for Different Renewable Energy Sources

Design and FPGA Implementation of a Novel Square Root Evaluator based on Vedic Mathematics

Complexity-rate-distortion Evaluation of Video Encoding for Cloud Media Computing

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM

A Dynamic Approach to Extract Texts and Captions from Videos

A Survey of Video Processing with Field Programmable Gate Arrays (FGPA)

Attaining EDF Task Scheduling with O(1) Time Complexity

International Journal of Electronics and Computer Science Engineering 1482

Research on the UHF RFID Channel Coding Technology based on Simulink

Fault Modeling. Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults. Transistor faults Summary

Study on Differential Protection of Transmission Line Using Wireless Communication

Study and Implementation of Video Compression Standards (H.264/AVC and Dirac)

Design and Verification of Area-Optimized AES Based on FPGA Using Verilog HDL

Implementing the Functional Model of High Accuracy Fixed Width Modified Booth Multiplier

Low-resolution Character Recognition by Video-based Super-resolution

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1}

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

2695 P a g e. IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India

Implementation of emulated digital CNN-UM architecture on programmable logic devices and its applications

Reconfigurable Low Area Complexity Filter Bank Architecture for Software Defined Radio

Accurate and robust image superresolution by neural processing of local image representations

Research Article. ISSN (Print) *Corresponding author Shi-hai Zhu

FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers

Reversible Data Hiding for Security Applications

Adaptive Equalization of binary encoded signals Using LMS Algorithm

The implementation and performance/cost/power analysis of the network security accelerator on SoC applications

INTER CARRIER INTERFERENCE CANCELLATION IN HIGH SPEED OFDM SYSTEM Y. Naveena *1, K. Upendra Chowdary 2

Study and Implementation of Video Compression standards (H.264/AVC, Dirac)

WATERMARKING FOR IMAGE AUTHENTICATION

Transcription:

Error Detection and Data Recovery Architecture for Systolic Motion Estimators L. Arun Kumar #1, L. Sheela *2 # PG Scholar, * Assistant Professor, Embedded System Technologies, Regional Center of Anna University Tirunelveli, Tamilnadu, India. Abstract - Motion estimation plays a vital role in today s media applications. Hence testing of such a module is a significant concern. Even though several algorithms have been proposed in the past testing of motion estimators are seldom addressed. The proposed system describes an Error Detection and Data Recovery (EDDR) architecture that detects and recovers data in the motion estimator. The system uses the Mean Absolute Difference (MAD) method to compute the difference in the current and reference frames which overcomes some of the drawbacks in the existing techniques. The architecture comprises of an Error Detection Circuit (EDC) and a Data Recovery Circuit (DRC) to recover the original data. A Residue-Quotient code is used to compute the change in value between the error and expected values. Result show that the errors are effectively recovered with a delay of a single clock cycle. Keywords - Area overhead, data recovery, error detection, motion estimation, reliability, residue-andquotient (RQ) code. I. INTRODUCTION Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image (global motion estimation) or specific parts, such as rectangular blocks, arbitrary shaped patches or even per pixel. The motion vectors may be represented by a translational model or many other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions and zoom. Closely related to motion estimation is optical flow, where the vectors correspond to the perceived movement of pixels. In motion estimation an exact 1:1 correspondence of pixel positions is not a requirement. Applying the motion vectors to an image to synthesize the transformation to the next image is called motion compensation. The combination of motion estimation and motion compensation is a key part of video compression as used by MPEG 1, 2 and 4 as well as many other video codec. Generally, Motion Estimator (ME) is made up of Processing Elements of an array size of 16 16. Although the advance of VLSI technology allows a large number of PEs of the ME to be integrated into a chip, this increases the logic-per-pin ratio, thus significantly decreasing the efficiency of logic testing on the chip. In other words, testing of such high complexity and high density circuits becomes very difficult and expensive. Thus, architecture design of ME requires the design-for-testability (DFT), whose goal is to increase the ease with which a device can be tested, in order to guarantee high fault coverage of the system. The Processing elements (PE) are essential building blocks and are connected regularly to construct a Motion estimator (ME). Generally, PEs are surrounded by sets of Adders and accumulators that determine how data flows through them. PEs can thus be considered the class of circuits called Iterative Logic Arrays (ILA), whose testing assignment can be easily achieved by using the fault model, cell fault model (CFM). CFM has received considerable interest due to accelerated growth in the use of high-level synthesis, as well as the parallel increase in complexity and density of integration circuits (ICs). This makes the tests independent of the adopted synthesis tool and vendor library. Arithmetic modules, like Adders (the primary element in a PE), due to their regularity, are designed in an extremely dense configuration. Moreover, a more comprehensive fault model, i.e. the stuck-at (SA) model, must be adopted to cover actual failures in the interconnect data bus between PEs. The SA fault is a well known structural fault model, which assumes that faults cause a line in the circuit to behave as if it were permanently at logic 0 (stuck-at 0 (SA0)) or logic 1 stuck-at 1 (SA1). The SA fault in a ME architecture can incur errors in computing MAD values. Thus, exploring the feasibility of an embedded testing approach to detect errors and recover data of a ME is of worthwhile interest. Additionally, the ISSN: 2249-2615 http://www.ijpttjournal.org Page 262

reliability issue of numerous PEs in a ME can be improved by enhancing the capabilities of concurrent error detection (CED). The CED approach can detect errors through conflicting and undesired results generated from operations on the same operands. CED can also test the circuit at full operating speed without interrupting a system. Thus, based on the CED concept, this work develops a novel EDDR architecture based on the RQ code to detect errors and recovery data in PEs of a ME and, in doing so, further guarantee the excellent reliability for video coding testing applications. The rest of this paper is organized as follows. Section II describes the RQ code and the corresponding design of the RQ code generator (RQCG). Section III introduces the proposed EDDR architecture, fault model definition, and test method. Next, Section IV evaluates the performance in area overhead, timing penalty, throughput and reliability analysis to demonstrate the feasibility of the proposed EDDR architecture for ME testing applications. Conclusions are finally drawn in Section V. II. RQ CODE Coding approaches such as parity code, Berger code, and residue code have been considered for design applications to detect circuit errors. Residue code is generally separable arithmetic codes by estimating a residue for data and appending it to data. Error detection logic for operations is typically derived by a separate residue code, making the detection logic is simple and easily implemented. However, only a bit error can be detected based on the residue code. Additionally, an error cannot be recovered effectively by using the residue codes. There-fore, this work presents a quotient code, which is derived from the residue code, to assist the residue code in detecting multiple errors and recovering errors. III. PROPOSED ARCHITECTURE The conceptual view of the proposed EDDR scheme comprises of two major circuit designs, error detection circuit (EDC) and data recovery circuit (DRC), to detect errors and recover the corresponding data in a specific Circuit Under Test (CUT) which is the systolic array. The test code generator (TCG) in Fig.1 utilizes the concepts of RQ code to generate the corresponding test codes for error detection and data recovery. The proposed architecture comprises of the following modules, Processing Element (PE) Test Code Generation (TCG) RQ Code Generation (RQCG) Error Detection Circuit (EDC) Data Recovery Circuit (DRC) Figure 1. Conceptual view of proposed Architecture The test codes from TCG and the primary output from CUT are delivered to EDC to determine whether the CUT has errors. DRC is in- charge of recovering data from TCG. Additionally, a selector is enabled to export error-free data or data-recovery results. Importantly, an array-based computing structure, such as ME, discrete cosine transform (DCT), iterative logic array (ILA), and finite impulse filter (FIR), is feasible for the proposed EDDR scheme to detect errors and recover the corresponding data. A PE generally consists of two Adders (i.e. an 8-b ADD and a 12-b ADD) and an accumulator (ACC). Next, the 8-b ADD (a pixel has 8-b data) is used to estimate the addition of the current pixel (Cur pixel) and reference pixel (Ref_pixel). Additionally, a 12-b ADD and an ACC are required to accumulate the results from the 8-b ADD in order to determine the sum of absolute difference (SAD) value for video encoding applications. Notably, some registers and latches may exist in ME to complete the data shift and storage, encoding applications. Notably, some registers and latches may exist in ME to complete the data shift and storage. The PEs are essential building blocks and are connected regularly to construct a ME. Generally, PEs are surrounded by sets of Adders and accumulators that determine how data flows through them. PEs can thus be considered the class of circuits called ILAs, whose testing assignment can be easily achieved by using the fault model, cell fault model (CFM). Arithmetic modules, like Adderss (the primary element in a PE), due to their regularity, are designed in an extremely dense configuration. A proposed ME consists of PEs with a size of 4 x 4. However, ISSN: 2249-2615 http://www.ijpttjournal.org Page 263

accelerating the computation speed depends on a large PE array, especially in high-resolution devices with a large search range such as HDTV. The Mean Absolute Difference value of the current and reference pixel value are computed using the following equation. MAD = Xij Yij = q. m + r q. m + r (1) where r xij, q xij and r yij, q yij denote the corresponding RQ code X ij, Y ij of the current and reference values respectively and m denotes modulo operation. Importantly, X ij and Y ij represent the luminance pixel value of Cur_pixel and Ref_pixel, respectively. Based on the residue code, the definitions shown the following equations can be applied to facilitate generation of the RQ code and the TCG. A. Test Code Generator Figure 2, illustrates the structure of the TCG module in the proposed EDDR architecture. Notably, TCG design is based on the ability of the RQCG circuit to generate the corresponding test codes in order to detect errors and recover data. The specific PE i estimates the absolute difference between the Cur_pixel of the search area and the Ref_pixel of the current macroblock. Thus the Residue and Quotient value for an array size of N X N is given by equations (3) and (5). R T = X Y m (2) R T = [ r m + r m + r m + + r m] (3) Figure 2.Test Code Generator In the event of a fault, the RQ code from RQCG2 of the TCG is still equal to absolute difference value. However, R PEi and Q PEi are changed because an error e has occurred. R PEi = [ r m + r m + r m + + r m] + [ r m] Q PEi = [q + q + + q + q ] + B. Numerical Example A numerical example of the 16 pixels for a 4x4 macroblock in a specific PE i of a ME is described as follows. Fig. 5 presents an example of pixel values of the Cur_pixel and Ref_pixel. Based on (1), the MAD value of the 4x4 macroblock is, MAD = X Y = [ X Y + X Y +... + X Y ] = = 132.75 {(128-1) +(128-1)+ +(128-5)} Q T = (4) Q T = [q + q + + q ] + (5) Figure 3. Example of pixel values Notably the error signal e is expressed as, e = q e.m + r e ISSN: 2249-2615 http://www.ijpttjournal.org Page 264

IV. RESULTS AND DISCUSSION The proposed architecture was executed on Windows XP operating system at an operating frequency of 2.80GHz using ModelSim for functional verification and synthesized using Xilinx ISE simulator. Table 1 gives the synthesis report provides the device utilization which is analyzed in terms of percentage. The timing summary of the code execution is also obtained. The proposed architecture was simulated on XC3S100E device of Spartan3E family provided the following synthesis results. ====================================== Selected Device : 3s100evq100-5 Number of Slices: 651 out of 960 67% Number of Slice Flip Flops: 305 out of 1920 15% Number of 4 input LUTs: 1160 out of 1920 60% Number of IOs: 158 Number of bonded IOBs: 158 out of 66 239% IOB Flip Flops: 77 Number of MULT18X18SIOs: 4 out of 4 100% Number of GCLKs: 1 out of 24 4% Figure 4. Screenshot of the operation of the proposed system Table 1. Synthesis report The proposed architecture achieved a minimum period of 3.866ns and a maximum combinational path delay of 15.582ns.The following screenshots describes the operation of the proposed EDDR architecture. Results show that the pixel values are efficiently recovered by means of the data recovery module in the presence of a stuck at fault in the processing element. Figure 4 shows the output of the proposed system. The input pixel values and the final output are observed from the waveforms of the figure. Figure shows the operation the Data Recovery module. The CUT here is the second processing element which is given by the value of S. A stuck at fault model is used and the value of the second element is stuck at 64. The RQ code from the TCG module and the DRC module efficiently recover the original pixel which was initially 127. The error between the expected and the actual value is also obtained. Figure 5.Pixel values set to the PE array and the TCG module Figure 6. Comparison of error output with the recovered output ISSN: 2249-2615 http://www.ijpttjournal.org Page 265

V. CONCLUSION Our work presents an EDDR architecture for detecting the errors and recovering the data of PEs in a ME. Based on the RQ code, a RQCG-based TCG design is developed to generate the corresponding codes to detect errors and recover data. Experimental results obtained from ModelSim indicate that that the proposed EDDR architecture can effectively detect errors and recover data in PEs of a ME with reasonable area overhead and only a slight time penalty. Synthesis results show that the timing constraints and device utilization are minimum compared to error detection techniques available. [13] C. Y. Chen, S. Y. Chien, Y. W. Huang, T. C. Chen, T. C. Wang, and L. G. Chen, Analysis and architecture design of variable block-size motion estimation for H.264/AVC, IEEE Trans. Circuits Syst. I, Reg.Papers, vol. 53, no. 3, pp. 578 593, Mar. 2006. [14] Y. W. Huang, B. Y. Hsieh, S. Y. Chien, S. Y. Ma, and L. G. Chen, Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC, IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp. 507 522, Apr. 2006. [15] L. Breveglieri, P. Maistri, and I. Koren, A note on error detection in an RSA architecture by means of residue codes, in Proc. IEEE Int. Symp. On-Line Testing, Jul. 2006, 176 177. REFERENCES [1] Chang- Hsin Cheng, Yu Liu, and Chun-Lung Hsu, Design of an Error Detection and Data Recovery Architecture for Motion Estimation Testing Applications IEEE transactions on very large scale integration (VLSI) systems, VOL. 20, NO. 4, April 2012. [2] Chun-Lung Hsu, Chang-Hsin Cheng, and Yu Liu, Built-in Self-Detection/Correction Architecture for Motion Estimation Computing Arrays, IEEE transactions on very large scale integration (vlsi) systems, vol. 18, no. 2, February 2010. [3] Y. S. Huang, C. J. Yang, and C. L. Hsu, C-testable motion estimation design for video coding systems, J. Electron. Sci. Technol., vol. 7, no. 4, pp. 370 374, Dec. 2009. [4] Che Wun Chiou, Chin-Cheng Chang,Chiou-Yng Lee, Ting- Wei Hou, and Jim-Min Lin, Concurrent Error Detection and Correction in Gaussian Normal Basis Multiplier over GF(2m), IEEE transactions on computers, vol. 58, no. 6, June 2009. [5] W. Y Liu, J. Y. Huang, J. H. Hong, and S. K. Lu, Testable design and BIST techniques for systolic motion estimators in the transform domain, in Proc. IEEE Int. Conf. Circuits Syst., Apr. 2009, pp. 1 4. [6] S. Dhahri, A. Zitouni, H. Chaouch, and R. Tourki, Adaptive Motion Estimator Based on Variable Block Size Scheme, in World Academy of Science, Engineering and Technology Jan. 2009 [7] Yu-Sheng Huang, Chen-Kai Chen and Chun-Lung Hsu, Efficient Built-In Self-Test for Video Coding Cores: A Case Study on Motion Estimation Computing Array - Dec 2008. [8] Y. S. Huang, C. K. Chen, and C. L. Hsu, Efficient built-in self-test for video coding cores: A case study on motion estimation computing array, in Proc. IEEE Asia Pacific Conf. Circuit Syst., Dec. 2008, pp. 1751 1754. [9] M. Y. Dong, S. H. Yang, and S. K. Lu, Design-fortestability techniques for motion estimation computing arrays, in Proc. Int. Conf. Commun., Circuits Syst., May 2008, pp. 1188 1191. [10] D. K. Park, H. M. Cho, S. B. Cho, and J. H. Lee, A fast motion estimation algorithm for SAD optimization in subpixel, in Proc. Int. Symp. Integr. Circuits, Sep. 2007, pp. 528 531. [11] S. Bayat-Sarmadi and M. A. Hasan, On concurrent detection of errors in polynomial basis multiplication, IEEE Trans. Vary Large Scale Integr. (VLSI) Systs., vol. 15, no. 4, pp. 413 426, Apr. 2007. [12] T. H. Wu, Y. L. Tsai, and S. J. Chang, An efficient designfor-testability scheme for motion estimation in H.264/AVC, in Proc. Int. Symp. VLSI Design, Autom. Test, Apr. 2007, pp. 1 4. ISSN: 2249-2615 http://www.ijpttjournal.org Page 266