Memory-Efficient Decoding of LDPC Codes

Jason Kwok-San Lee
University of California, Berkeley
Email: jlee@eecs.berkeley.edu

Jeremy Thorpe
Jet Propulsion Laboratory, California Institute of Technology
Email: thorpe@caltech.edu

Abstract

We present a low-complexity quantization scheme¹ for the implementation of regular (3, 6) LDPC codes. The quantization parameters are optimized to maximize the mutual information between the source and the quantized messages. Using this nonuniform quantized belief propagation algorithm, our simulations show that an optimized 3-bit quantizer operates with 0.2 dB implementation loss relative to a floating-point decoder, and an optimized 4-bit quantizer operates with less than 0.1 dB quantization loss.

¹ The work described was funded by the IND Technology Program and performed at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.

I. INTRODUCTION

Low-Density Parity-Check (LDPC) codes [1] have recently received a lot of attention because of their excellent error-correcting capability. LDPC codes have been shown to perform close to the Shannon limit [2]. In the past decade or so, much of the research on LDPC codes has focused on the analysis and improvement of codes under decoding algorithms with floating-point precision. However, to make LDPC codes practical in the real world, the design of an efficient quantization scheme for hardware implementation is crucial.

The belief propagation algorithm is used to decode LDPC codes. The standard belief propagation algorithm defines real-valued messages passing along edges in a code graph. The standard way to simulate this algorithm is to store and update the messages in a very accurate representation such as floating-point numbers. However, for very high-speed LDPC decoders, the high complexity associated with computing and storing a very accurate representation is to be avoided if possible. Therefore, we propose a low-complexity LDPC quantization scheme to make efficient hardware implementation possible.

In this paper, we present a general quantization scheme whose parameters we have optimized to target regular (3, 6) codes. In addition to being a testbed for comparing quantized algorithms, this class of codes remains an appealing choice in rate-1/2 applications that cannot tolerate the error floors typically induced in codes highly optimized to perform close to capacity.

II. MOTIVATION

A low-complexity quantization scheme has several advantages. Let n denote the number of quantization bits per message.

1) In the belief propagation algorithm, messages passing along edges in a code graph may have to be stored. The memory needed scales with the quantization as O(n).
2) The number of interconnect wires connecting variable nodes and check nodes grows with the quantization. The complexity of interconnect routing scales at least linearly with n.
3) A smaller quantization makes it simpler for variable nodes and check nodes to update the messages. The logic complexity of variable node and check node units often grows faster than linearly with the quantization. In the worst case, an input-output lookup table has logic complexity O(2^n). Other schemes have complexity which scales as O(n^2).

[Fig. 1: Complexity versus quantization. Quantities shown: logic complexity, number of interconnects (routing complexity), and memory size.]

Recently, several research groups have developed LDPC decoders running on FPGAs [3][4]. New generations of FPGA chips, such as Xilinx Virtex-II and Virtex-4, provide a sufficient amount of on-board block memory for the memory-demanding applications of digital signal processing. However, these devices also impose a practical constraint, since the block memory is only divisible into 4-bit-wide words or into higher-resolution words of 9, 18, or 36 bits [5][6]. Therefore, to use the on-board memory efficiently, we should apply a quantization scheme compatible with the block memory division. 9-bit quantization provides very fine resolution, but can limit the size of code implementable in the device and can consume significant amounts of power. By comparison, an efficient 4-bit quantization allows larger codes to be decoded, and is especially attractive if it can achieve small quantization loss.

III. QUANTIZED BELIEF PROPAGATION ALGORITHM

In [8] a general nonuniform quantized belief propagation algorithm to decode regular LDPC codes is proposed. That scheme was a generalization of a message-passing rule described in [9]. In it, the messages representing the likelihood ratios are essentially compressed by each computation node before being transmitted to the adjacent computation nodes. The operation of each type of computation node (check and variable) occurs in a domain in which updates can be performed through simple additions and subtractions. For the variable nodes, this is essentially the log-likelihood-ratio (LLR) domain, or reliability domain. For check nodes, the domain is called the unreliability domain. Note that values in the two computational domains are typically represented by many more bits than are required to transmit and store inter-node messages.

The functions $Q_v$ and $Q_c$ quantize the messages in the reliability domain and unreliability domain, respectively, into compressed messages. Complementary to these are the functions $\phi_v$ and $\phi_c$, which restore the compressed messages into the computational domain of each node. Note that since variable nodes always send messages to check nodes and vice versa, a message which is compressed from the reliability domain will always be restored into the unreliability domain, and vice versa.

Initially, information from the channel is interpreted and quantized by a channel quantizer $Q_{ch}$, which takes real-valued log-likelihood ratios and produces a quantized representation. The function $\phi_{ch}$ takes a message produced by the channel quantizer and outputs a value to be used by the variable node.

At each iteration the variable node produces the messages $v_{i \to j}$. At iteration 0, the messages are given by

$$v_{i \to j}(0) = Q_{ch}(\mathrm{channel}_i), \quad i \in \{1, \dots, n\} \tag{1}$$

(in equations (1) to (4), n denotes the number of variable nodes and r the number of check nodes).

At the t-th iteration, the parity check phase occurs first. All r check node units read the variable-to-check messages $v_{i \to j}$ from the edge memory connecting the i-th variable node to the j-th check node in the code graph, update the message by equation (2), then write the resulting check-to-variable messages $u_{j \to i}$ back to the edge memory according to the code graph connections:

$$u_{j \to i}(t) = Q_c\Big(\sum_{i' \neq i} \phi_c\big(v_{i' \to j}(t-1)\big)\Big), \quad j \in \{1, \dots, r\} \tag{2}$$

where $i'$ ranges over all edges connected to the j-th check node excluding $i$, $Q_c$ is the quantization rule for the check-to-variable message $u_{j \to i}$, and $\phi_c$ is the reconstruction function for the variable-to-check message $v_{i \to j}$. The architecture diagram of a check node unit is shown in Fig. 2.

[Fig. 2: Check Node Unit's architecture. n-bit inputs 1 to 6, n-bit outputs 1 to 6; the MSBs of the inputs are processed separately.]

Next, the variable node phase occurs. All n variable node units read the check-to-variable messages $u_{j \to i}$ from edge memory, update the message by equation (3), then write the variable-to-check messages $v_{i \to j}$ back to edge memory according to the code graph connections:

$$v_{i \to j}(t) = Q_v\Big(\phi_{ch}\big(Q_{ch}(\mathrm{channel}_i)\big) + \sum_{j' \neq j} \phi_v\big(u_{j' \to i}(t)\big)\Big), \quad i \in \{1, \dots, n\} \tag{3}$$

where $j'$ ranges over all edges connected to the i-th variable node excluding $j$, $Q_v$ is the quantization rule for the variable-to-check message $v_{i \to j}$, $\phi_v$ is the reconstruction function for the check-to-variable message $u_{j \to i}$, and $\phi_{ch}$ is the reconstruction function for the channel message $Q_{ch}(\mathrm{channel}_i)$. The architecture diagram of a variable node unit is shown in Fig. 3.

[Fig. 3: Variable Node's architecture. Channel input and inputs 1 to 3, outputs 1 to 3.]

At the final K-th iteration, hard decisions $X_i$ are made in the variable nodes as follows:

$$X_i = \begin{cases} 0, & \sum_j u_{j \to i}(K) \geq 0 \\ 1, & \sum_j u_{j \to i}(K) < 0 \end{cases} \tag{4}$$
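The two update rules in equations (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' hardware design: the parameter values are our reading of the 3-bit Tables I and II (including the restored signs and decimal points), and the check node is assumed to combine messages by multiplying signs and adding magnitudes in the unreliability domain, which the text does not spell out. All function names are hypothetical.

```python
from bisect import bisect_left

# Reconstruction values as read from Table I (signs are an assumption).
PHI_CH = {-4: -21, -3: -15, -2: -9, -1: -3, 0: 3, 1: 9, 2: 15, 3: 21}
PHI_V = {-4: -20, -3: -12, -2: -6, -1: -2, 0: 2, 1: 6, 2: 12, 3: 20}
PHI_C = {-4: -1, -3: -2, -2: -6, -1: -26, 0: 26, 1: 6, 2: 2, 3: 1}

# Positive-side quantizer thresholds as read from Table II.
CH_THRESHOLDS = (1.1, 2.2, 3.3)
V_THRESHOLDS = (6, 12, 18)
C_THRESHOLDS = (5, 9, 26)


def quantize_reliability(value, thresholds):
    """Q_ch or Q_v: map a reliability-domain value to a level in -4..3."""
    level = bisect_left(thresholds, abs(value))
    return level if value >= 0 else -level - 1


def quantize_unreliability(sign, magnitude):
    """Q_c: a small unreliability magnitude maps to a confident level."""
    level = len(C_THRESHOLDS) - bisect_left(C_THRESHOLDS, magnitude)
    return level if sign > 0 else -level - 1


def check_node_update(incoming):
    """Equation (2): one outgoing message per edge, excluding that edge."""
    outgoing = []
    for target in range(len(incoming)):
        sign, magnitude = 1, 0
        for edge, message in enumerate(incoming):
            if edge == target:
                continue
            value = PHI_C[message]
            if value < 0:
                sign = -sign          # assumed: signs combine by parity
            magnitude += abs(value)   # assumed: unreliabilities add
        outgoing.append(quantize_unreliability(sign, magnitude))
    return outgoing


def variable_node_update(channel_level, incoming):
    """Equation (3): channel value plus all other check-to-variable messages."""
    outgoing = []
    for target in range(len(incoming)):
        total = PHI_CH[channel_level]
        total += sum(PHI_V[m] for e, m in enumerate(incoming) if e != target)
        outgoing.append(quantize_reliability(total, V_THRESHOLDS))
    return outgoing


if __name__ == "__main__":
    print(check_node_update([-4, 3, 3, 3, 3, 3]))
    print(variable_node_update(quantize_reliability(0.7, CH_THRESHOLDS), [3, 3, -1]))
```

Only small integer additions, sign flips, and table lookups appear in the inner loops, which is the property the scheme relies on for cheap hardware.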

IV. OPTIMIZING THE QUANTIZATION PARAMETERS

We targeted regular (3, 6) codes to optimize the quantization parameters. To optimize the error-correcting performance of the quantization scheme, the intuition is to maximize the mutual information between the source and the quantized message. As the binary source signal is corrupted by the Gaussian noise channel, the signal before the quantization process is a Gaussian-distributed real-valued message Y = X + noise. Therefore, the mutual information between X (binary source) and Y (Gaussian channel output) is

$$I(X; Y) = 1 - \int \Pr(y \mid x) \log_2\Big(1 + \frac{\Pr(y \mid \bar{x})}{\Pr(y \mid x)}\Big)\, dy \tag{5}$$

where $\bar{x}$ denotes the complement of the transmitted bit $x$.

Next, the quantizer maps the real-valued message Y into the appropriate quantized message Z according to the quantization parameters, $Z = Q_{ch}(Y)$ (see Fig. 4).

[Fig. 4: Quantization of channel messages. The encoder output X passes through a Gaussian channel with noise ~ N(0, σ²) to give Y; the quantizer maps Y to Z, which is passed to the decoder.]

Therefore, the mutual information between the binary source X and the quantized message Z is

$$I(X; Z) = 1 - \sum_z \Pr(z \mid x) \log_2\Big(1 + \frac{\Pr(z \mid \bar{x})}{\Pr(z \mid x)}\Big) \tag{6}$$

To maximize the mutual information, the quantization function $Q_{ch}$ is found such that

$$\arg\max_{Q_{ch}} I(X; Z) = \arg\max_{Q_{ch}} \Big\{ 1 - \sum_z \Pr(z \mid x) \log_2\Big(1 + \frac{\Pr(z \mid \bar{x})}{\Pr(z \mid x)}\Big) \Big\} \tag{7}$$

where

$$\Pr(z \mid x) = \int_{\min Q_{ch}^{-1}(z)}^{\max Q_{ch}^{-1}(z)} \Pr(y \mid x)\, dy \tag{8}$$

Once $Q_{ch}$ is determined, initial values for $\phi_{ch}$ can be found by taking the midpoints of the $Q_{ch}$ intervals, quantized to an appropriate scale. Initial values for the other four parameters can be found similarly. After initial values of the quantization parameters are determined, these values are optimized using both simulation and density evolution. Currently, a significant amount of hand-optimization is used, and we have not had time to explicate our optimization procedures in detail.

Using this strategy, we found several sets of optimized nonuniform quantization parameters, listed as follows:

Table I: Optimized 3-bit quantization rules: reconstruction functions $\phi_{ch}(x)$, $\phi_v(x)$, and $\phi_c(x)$
Table II: Optimized 3-bit quantization rules: quantizer interval values $Q_{ch}(ch)$, $Q_v(v)$, and $Q_c(c)$
Table III: Optimized 4-bit quantization rules: reconstruction functions $\phi_{ch}(x)$, $\phi_v(x)$, and $\phi_c(x)$
Table IV: Optimized 4-bit quantization rules: quantizer interval values $Q_{ch}(ch)$, $Q_v(v)$, and $Q_c(c)$

TABLE I
OPTIMIZED 3-BIT QUANTIZATION PARAMETERS: RECONSTRUCTION VALUES

  x    φ_ch(x)   φ_v(x)   φ_c(x)
 -4     -21       -20       -1
 -3     -15       -12       -2
 -2      -9        -6       -6
 -1      -3        -2      -26
  0       3         2       26
  1       9         6        6
  2      15        12        2
  3      21        20        1

TABLE II
OPTIMIZED 3-BIT QUANTIZATION PARAMETERS: QUANTIZER INTERVALS

  x    Q_ch(ch)             Q_v(v)            Q_c(c)
 -4    ch < -3.3            v < -18           -5 ≤ c < 0
 -3    -3.3 ≤ ch < -2.2     -18 ≤ v < -12     -9 ≤ c < -5
 -2    -2.2 ≤ ch < -1.1     -12 ≤ v < -6      -26 ≤ c < -9
 -1    -1.1 ≤ ch < 0        -6 ≤ v < 0        c < -26
  0    0 ≤ ch ≤ 1.1         0 ≤ v ≤ 6         c > 26
  1    1.1 < ch ≤ 2.2       6 < v ≤ 12        9 < c ≤ 26
  2    2.2 < ch ≤ 3.3       12 < v ≤ 18       5 < c ≤ 9
  3    ch > 3.3             v > 18            0 ≤ c ≤ 5

TABLE III
OPTIMIZED 4-BIT QUANTIZATION PARAMETERS: RECONSTRUCTION VALUES

  x    φ_ch(x)   φ_v(x)   φ_c(x)
 -8    -114      -114       -1
 -7     -87       -87       -4
 -6     -64       -64      -12
 -5     -48       -48      -27
 -4     -36       -36      -50
 -3     -25       -25      -88
 -2     -15       -15     -153
 -1      -5        -5     -312
  0       5         5      312
  1      15        15      153
  2      25        25       88
  3      36        36       50
  4      48        48       27
  5      64        64       12
  6      87        87        4
  7     114       114        1

TABLE IV
OPTIMIZED 4-BIT QUANTIZATION PARAMETERS: QUANTIZER INTERVALS

  x    Q_ch(ch)             Q_v(v)              Q_c(c)
 -8    ch < -5.0            v < -210            -2 ≤ c < 0
 -7    -5.0 ≤ ch < -3.7     -210 ≤ v < -115     -7 ≤ c < -2
 -6    -3.7 ≤ ch < -2.8     -115 ≤ v < -67      -18 ≤ c < -7
 -5    -2.8 ≤ ch < -2.1     -67 ≤ v < -36       -36 ≤ c < -18
 -4    -2.1 ≤ ch < -1.5     -36 ≤ v < -18       -67 ≤ c < -36
 -3    -1.5 ≤ ch < -1.0     -18 ≤ v < -7        -115 ≤ c < -67
 -2    -1.0 ≤ ch < -0.5     -7 ≤ v < -2         -210 ≤ c < -115
 -1    -0.5 ≤ ch < 0        -2 ≤ v < 0          c < -210
  0    0 ≤ ch ≤ 0.5         0 ≤ v ≤ 10          c > 210
  1    0.5 < ch ≤ 1.0       10 < v ≤ 20         115 < c ≤ 210
  2    1.0 < ch ≤ 1.5       20 < v ≤ 30         67 < c ≤ 115
  3    1.5 < ch ≤ 2.1       30 < v ≤ 42         36 < c ≤ 67
  4    2.1 < ch ≤ 2.8       42 < v ≤ 56         18 < c ≤ 36
  5    2.8 < ch ≤ 3.7       56 < v ≤ 74         7 < c ≤ 18
  6    3.7 < ch ≤ 5.0       74 < v ≤ 100        2 < c ≤ 7
  7    ch > 5.0             v > 100             0 ≤ c ≤ 2

V. SIMULATION PERFORMANCE

Using the optimized 3-bit and 4-bit quantization parameters targeted for regular (3, 6) codes, we simulated our proposed nonuniform quantization scheme on a (4096, 2048) regular code. Using the 3-bit optimized quantizer, the LDPC decoder operates with 0.2 dB implementation loss relative to a floating-point belief propagation decoder. Using the 4-bit optimized quantizer, the LDPC decoder achieves a quantization loss of less than 0.1 dB (see Fig. 5). For the sake of quantization simplicity, we adopted the optimized 3-bit quantizer in our FPGA-based structured LDPC decoder [3].

[Fig. 5: Simulation performance of various quantization schemes. BER curves for floating-point BP, 4-bit nonuniform quantization (less than 0.1 dB loss relative to floating-point BP), and 3-bit nonuniform quantization (0.2 dB loss relative to floating-point BP); simulations on a (4096, 2048) regular (3, 6) code.]

VI. COMPARING NONUNIFORM QUANTIZATION AND UNIFORM QUANTIZATION

Reference [7] examined various uniform quantization schemes in detail, including uniform quantized offset BP-based decoding algorithms. While computationally more involved, our proposed nonuniform quantization scheme outperforms its uniform quantized counterpart when constrained by stored bit width. For example, when decoding a regular (8000, 4000) LDPC code, the 5-bit uniform quantized offset BP-based algorithm of [7] suffers a degradation of 0.1 dB compared with the unquantized BP algorithm. In comparison, simulating on a regular LDPC code of similar block length, (8192, 4096), our proposed 4-bit nonuniform quantization scheme operates with less than 0.1 dB implementation loss relative to an unquantized BP decoder (see Fig. 6). Because it uses fewer quantization bits while incurring less implementation loss, nonuniform quantization may be preferable for hardware implementation of LDPC decoders, especially on an FPGA platform in which 4-bit quantization optimizes the block memory utilization.

VII. CONCLUSION

We have presented a general nonuniform low-complexity quantization scheme for the implementation of LDPC decoders, and demonstrated the 3-bit and 4-bit optimized quantization rules for regular (3, 6) LDPC decoders. Maximizing the mutual information between the binary source and the received quantized message allows the optimization of quantized LDPC decoding. As demonstrated by this work, an efficient low-complexity quantization can reduce the memory requirements and routing complexity in the hardware implementation of practical LDPC decoders.
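As an illustration of the mutual-information criterion in equations (6) to (8), the sketch below evaluates I(X; Z) for a symmetric threshold quantizer applied to channel LLRs. It rests on assumptions that are not stated in the paper: antipodal signalling (X = ±1), equiprobable inputs, noise standard deviation `sigma` as a free parameter, and LLR = 2y/σ². The grid search over a uniform step is a toy stand-in for the optimization, not the authors' procedure, which also involved density evolution and hand-tuning. All function names are hypothetical.

```python
import math


def gaussian_cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def cell_probabilities(edges, mean, sigma):
    """Equation (8): Pr(z | x) for each quantizer cell [edges[k], edges[k+1])."""
    return [
        gaussian_cdf((hi - mean) / sigma) - gaussian_cdf((lo - mean) / sigma)
        for lo, hi in zip(edges, edges[1:])
    ]


def mutual_information(llr_thresholds, sigma):
    """Equation (6) for a quantizer with thresholds at 0 and +/- llr_thresholds."""
    # Convert LLR thresholds to thresholds on y, assuming LLR = 2 y / sigma^2.
    pos = [t * sigma ** 2 / 2.0 for t in llr_thresholds]
    edges = [-math.inf] + [-t for t in reversed(pos)] + [0.0] + pos + [math.inf]
    p_given_x = cell_probabilities(edges, +1.0, sigma)     # Pr(z | x)
    p_given_xbar = cell_probabilities(edges, -1.0, sigma)  # Pr(z | complement of x)
    info = 1.0
    for p, q in zip(p_given_x, p_given_xbar):
        if p > 0.0:
            info -= p * math.log2(1.0 + q / p)
    return info


def best_uniform_step(sigma, candidate_steps, levels=3):
    """Toy version of equation (7): pick the step maximizing I(X; Z)."""
    return max(
        candidate_steps,
        key=lambda s: mutual_information([s * k for k in range(1, levels + 1)], sigma),
    )


if __name__ == "__main__":
    sigma = 0.8  # hypothetical noise level
    print(mutual_information([], sigma))               # hard decision only
    print(mutual_information([1.1, 2.2, 3.3], sigma))  # 3-bit thresholds from Table II
    print(best_uniform_step(sigma, [0.5, 1.0, 1.5, 2.0]))
```

With an empty threshold list the quantizer reduces to a hard decision, so the result equals the capacity of the corresponding binary symmetric channel; adding thresholds can only increase I(X; Z).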

[Fig. 6: 5-bit uniform quantizer vs. 4-bit nonuniform quantizer. Left panel: BER of the 5-bit uniform quantizer on an (8000, 4000) regular code, floating-point BP versus 5-bit uniform quantized offset BP, 0.1 dB gap. Right panel: BER of the 4-bit nonuniform quantizer on an (8192, 4096) regular code, floating-point BP versus 4-bit nonuniform quantized BP, less than 0.1 dB gap.]

REFERENCES

[1] R. G. Gallager, Low-Density Parity-Check Codes, MIT Press, Cambridge, MA, 1963.
[2] S. Chung, G. D. Forney, T. J. Richardson, and R. Urbanke, "On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit," IEEE Comm. Letters, vol. 5, pp. 58-60, Feb. 2001.
[3] J. Lee, B. Lee, J. Thorpe, K. Andrews, S. Dolinar, J. Hamkins, "A Scalable Architecture of a Structured LDPC Decoder," Proc. IEEE ISIT 2004, Chicago, Jun. 27 - Jul. 2, 2004, p. 292.
[4] T. Zhang and K. Parhi, "A 54 Mbps (3, 6)-Regular FPGA LDPC Decoder," Proc. IEEE SIPS 2002, San Diego, CA, Oct. 16-18, 2002, pp. 127-132.
[5] "18 Kbit Block SelectRAM Resources," Xilinx Virtex-II Platform FPGAs: Complete Data Sheet, p. 21, http://direct.xilinx.com/bvdocs/publications/ds031.pdf
[6] "Block RAM Summary," Xilinx Virtex-4 User Guide, p. 109, http://direct.xilinx.com/bvdocs/userguides/ug070.pdf
[7] J. Chen, A. Dholakia, E. Eleftheriou, M. Fossorier, and X.-Y. Hu, "Reduced-Complexity Decoding of LDPC Codes."
[8] J. Thorpe, "Low-complexity approximations to belief propagation for LDPC codes," available at http://www.systems.caltech.edu/jeremy/research/papers/research.html
[9] T. J. Richardson and R. L. Urbanke, "The Capacity of Low-Density Parity-Check Codes Under Message-Passing Decoding," IEEE Trans. on Info. Th., vol. 47, no. 2, pp. 599-618, 2001.