Asymmetric Error Correction and Flash-Memory Rewriting using Polar Codes




Eyal En Gad, Yue Li, Joerg Kliewer, Michael Langberg, Anxiao (Andrew) Jiang and Jehoshua Bruck

Abstract. We propose efficient coding schemes for two communication settings: 1. asymmetric channels, and 2. channels with an informed encoder. These settings are important in non-volatile memories, as well as optical and broadcast communication. The schemes are based on non-linear polar codes, and they build on and improve recent work on these settings. In asymmetric channels, we tackle the exponential storage requirement of previously known schemes that resulted from the use of large Boolean functions. We propose an improved scheme that achieves the capacity of asymmetric channels with polynomial computational complexity and storage requirement. The proposed non-linear scheme is then generalized to the setting of channel coding with an informed encoder, using a multicoding technique. We consider specific instances of the scheme for flash memories that incorporate error-correction capabilities together with rewriting. Since the considered codes are non-linear, they eliminate the requirement of previously known schemes (called polar write-once-memory codes) for shared randomness between the encoder and the decoder. Finally, we mention that the multicoding scheme is also useful for broadcast communication in Marton's region, improving upon previous schemes for this setting.

I. INTRODUCTION

In this paper we make several contributions to the design and analysis of error-correcting codes in two important communication settings: 1. asymmetric channel coding, and 2. channel coding with an informed encoder. Asymmetric channel coding is important for applications such as non-volatile memories, in which the electrical mechanisms are dominantly asymmetric [5]. Another important application is in optical communication, where photons may fail to be detected (1 → 0) but the creation of spurious photons (0 → 1) is impossible [15, Section IX].
Channel coding with an informed encoder is also important for non-volatile memories, since the memory state in these devices affects the fate of writing attempts. Channel coding with an informed encoder is also useful in broadcast communication, where it is used in Marton's coding scheme to achieve high communication rates (see [7, p. 210]). The focus of this paper is on polar coding techniques, as they are both highly efficient in terms of communication rate and computational complexity, and are relatively easy to analyze and understand. Polar codes were introduced by Arikan in [1], achieving the symmetric capacity of binary-input memoryless channels. The first task that we consider in this paper is that of point-to-point communication over asymmetric channels. Several polar coding schemes for asymmetric channels were proposed recently, including a pre-mapping using Gallager's scheme [11, p. 208] and a concatenation of two polar codes [30]. A more direct approach was proposed in [17], which we consider in this paper. A similar approach is also considered in [22]. The scheme in [17] achieves the capacity of asymmetric channels using non-linear polar codes, but it uses large Boolean functions that require storage space that is exponential in the block length. We propose a modification of this scheme that removes the requirement for the Boolean functions,

The material in this paper was presented in part at the IEEE Int. Symp. on Inform. Theory (ISIT), Honolulu, HI, USA, July 2014 [8]. This work was supported in part by Intellectual Ventures, NSF grants CIF-1218005, CCF-1439465, CCF-1440001 and CCF-1320785, NSF CAREER Award CCF-0747415 and the US-Israel Binational Science Foundation (BSF) under Grant No. 2010075. Eyal En Gad, Yue Li and Jehoshua Bruck are with the California Institute of Technology, Pasadena, CA 91125, {eengad, yli, bruck}@caltech.edu. Joerg Kliewer is with the New Jersey Institute of Technology, Newark, NJ 07102, jkliewer@njit.edu.
Michael Langberg is with SUNY at Buffalo, Buffalo, NY 14260 and the Open University of Israel, Raanana 43107, Israel, mikel@buffalo.edu. Anxiao (Andrew) Jiang is with Texas A&M University, College Station, TX 77840, ajiang@cse.tamu.edu.

and thus reduces the storage requirement of the encoding and decoding tasks to a linear function of the block length. The second contribution of this paper is a generalization of the non-linear polar-coding scheme to the availability of channel side information at the encoder. We call this scheme a polar multicoding scheme, and we prove that it achieves the capacity of channels with informed encoders. The capacity of such channels was characterized by Gelfand and Pinsker in [12]. This scheme is useful for non-volatile memories such as flash memories and phase-change memories, and for broadcast channels. We focus mainly on the flash memory application. A prominent characteristic of flash memories is that the response of the memory cells to a writing attempt is affected by the previous content of the memory. This complicates the design of error-correcting schemes, and thus motivates flash systems to erase the content of the cells before writing, and thereby eliminate its effect. However, the erase operation in flash memories is expensive, and therefore a simple coding scheme that does not require erasures could improve the performance of solid-state drives significantly. We show two instances of the proposed polar multicoding scheme that aim to achieve this goal.

A. Relation to Previous Work

The study of channel coding with an informed encoder was initiated by Kusnetsov and Tsybakov [19], with the channel capacity derived by Gelfand and Pinsker [12]. The informed encoding technique of Gelfand and Pinsker was used earlier by Marton to establish an inner bound for the capacity region of broadcast channels [21]. Low-complexity capacity-achieving codes were first proposed for continuous channels, using lattice codes [32]. In discrete channels, the first low-complexity capacity-achieving scheme was proposed using polar codes, for the symmetric special case of information embedding [18, Section VIII.B]. A modification of this scheme for the application of flash memory rewriting was proposed in [4], considering a model called write-once memory.
An additional scheme for the application of flash memory, based on randomness extractors, was also proposed recently [10]. Our work is concerned with a setup that is similar to those considered in [4], [10]. An important contribution of the current paper compared to [4], [10] is that our scheme achieves the capacity of a rewriting model that also includes noise, while the schemes in [4], [10] address only the noiseless case. Indeed, error correction is a crucial capability in flash memory systems. Our low-complexity achievability of the noisy capacity is obtained using a multicoding technique. Compared with [10], the current paper allows an input cost constraint, which is important in rewriting models for maximizing the sum of the code rates over multiple rewriting rounds. Compared with [4], the current paper also improves by removing the requirement for shared randomness between the encoder and the decoder, which limits the practical coding performance. The removal of the shared randomness is achieved by the use of non-linear polar codes. An additional coding scheme was proposed during the writing of this paper, which also does not require shared randomness [20]. However, the scheme in [20] considers only the noiseless case, and it is in fact a special case of the scheme in the current paper. Polar coding for channels with informed encoders was implicitly studied recently in the context of broadcast channels, as the Marton coding scheme for broadcast communication contains an informed encoding instance as an ingredient. In fact, a multicoding technique similar to the one presented in this paper was recently presented for broadcast channels, in [13]. While we were unaware of the result of [13] and developed the scheme independently, this paper also has three contributions that were not shown in [13]. First, by using the modified scheme of non-linear polar codes, we reduce the storage requirement from an exponential function in the block length to a linear function.
Second, we connect the scheme to the application of data storage and flash memory rewriting, which was not considered in the previous work. And third, the analysis in [13] holds only for channels whose capacity-achieving distribution forms a certain degraded structure. In this paper we consider a specific noisy rewriting model whose capacity-achieving distribution forms the required degraded structure, and by that we show that the scheme achieves the capacity of the considered flash-memory model.

Another recent paper on polar coding for broadcast channels was published recently by Mondelli et al. [23]. That paper proposed a method, called chaining, that allows bypassing the degraded-structure requirement. In this paper we connect the chaining method to the flash-memory rewriting application and to our new non-linear polar coding scheme, and apply it to our proposed multicoding scheme. This allows for a linear storage requirement, together with the achievability of the informed-encoder capacity and Marton's inner bound, eliminating the degraded-structure requirement. Finally, we show an important instance of the chaining scheme for a specific flash-memory model, and explain the applicability of this instance in flash-memory systems. The rest of the paper is organized as follows. Section II proposes a new non-linear polar coding scheme for asymmetric channels, which does not require an exponential storage of Boolean functions. Section III proposes a new polar multicoding scheme for channels with informed encoders, including two special cases for the rewriting of flash memories. Finally, Section IV summarizes the paper.

II. ASYMMETRIC POINT-TO-POINT CHANNELS

Notation: For positive integers m ≤ n, let [m : n] denote the set {m, m + 1, ..., n}, and let [n] denote the set [1 : n]. Given a subset A of [n], let A^c denote the complement of A with respect to [n], where n is clear from the context. Let x_[n] denote a vector of length n, and let x_A denote a vector of length |A| obtained from x_[n] by deleting the elements with indices in A^c. Throughout this section we consider only channels with binary input alphabets, since the literature on polar codes with non-binary codeword symbols is relatively immature. However, the results of this section can be extended to non-binary alphabets without much difficulty using the methods described in [24]-[29]. The main idea of polar coding is to take advantage of the polarization effect of the Hadamard transform on the entropies of random vectors.
Consider a binary-input memoryless channel model with an input random variable (RV) X ∈ {0, 1}, an output RV Y ∈ Y, and a pair of conditional probability mass functions (pmfs) p_{Y|X}(y|0), p_{Y|X}(y|1) on Y. Let n be a power of 2 that denotes the number of channel uses, also referred to as the block length. The channel capacity is the tightest upper bound on the code rate at which the probability of decoding error can be made as small as desired for large enough block length. The channel capacity is given by the mutual information of X and Y.

Theorem 1. (Channel Coding Theorem) [6, Chapter 7] The capacity of the discrete memoryless channel p(y|x) is given by

C = max_{p(x)} I(X; Y).

The Hadamard transform is a multiplication of the random vector X_[n] over the field of cardinality 2 with the matrix G_n = G^{⊗ log2 n}, where

G = ( 1 0
      1 1 )

and ⊗ denotes the Kronecker power. In other words, G_n can be described recursively for n ≥ 4 by the block matrix

G_n = ( G_{n/2}  0
        G_{n/2}  G_{n/2} ).

The matrix G_n transforms X_[n] into a random vector U_[n] = X_[n] G_n, such that the conditional entropy H(U_i | U_[i-1], Y_[n]) is polarized. That means that for a fraction of close to H(X|Y) of the indices i ∈ [n], the conditional entropy H(U_i | U_[i-1], Y_[n]) is close to 1, and for almost all the rest of the indices, H(U_i | U_[i-1], Y_[n]) is close to 0. This result was shown by Arikan in [1], [2].

Theorem 2. (Polarization Theorem) [2, Theorem 1] Let n, U_[n], X_[n], Y_[n] be defined as above. For any δ ∈ (0, 1), let

H_{X|Y} ≜ { i ∈ [n] : H(U_i | U_[i-1], Y_[n]) ∈ (1 - δ, 1) },

and

L_{X|Y} ≜ { i ∈ [n] : H(U_i | U_[i-1], Y_[n]) ∈ (0, δ) }.

Then lim_{n→∞} |H_{X|Y}|/n = H(X|Y) and lim_{n→∞} |L_{X|Y}|/n = 1 - H(X|Y).

Note that H(X|Y) denotes a conditional entropy, while H_{X|Y} denotes a subset of [n]. It is also shown in [1] that the transformation G_n is invertible with G_n^{-1} = G_n, implying X_[n] = U_[n] G_n. This polarization effect can be used quite simply for the design of a coding scheme that achieves the capacity of symmetric channels with a running time that is polynomial in the block length. The capacity of symmetric channels is achieved by a uniform distribution on the input alphabet, i.e. p(x) = 1/2 [6, Theorem 7.2.1]. Since the input alphabet in this paper is binary, the capacity-achieving distribution gives H(X) = 1, and therefore we have

lim_{n→∞} (1/n)|L_{X|Y}| = 1 - H(X|Y) = H(X) - H(X|Y) = I(X; Y) = C. (1)

Furthermore, for each index in L_{X|Y}, the conditional probability p(u_i | u_[i-1], y_[n]) must be close to either 0 or 1 (since the conditional entropy is small by the definition of the set L_{X|Y}). It follows that the RV U_i can be estimated reliably given u_[i-1] and y_[n]. This fact motivates the capacity-achieving coding scheme that follows. The encoder creates a vector u_[n] by assigning the subvector U_{L_{X|Y}} with the source message, and the subvector U_{L_{X|Y}^c} with uniformly distributed random bits that are shared with the decoder. The randomness sharing is useful for the analysis, but is in fact unnecessary for using the scheme (the proof of this fact is described in [1, Section VI]). The set U_{L_{X|Y}^c} is called the frozen set. Equation (1) implies that this coding rate approaches the channel capacity. The decoding is performed iteratively, from index 1 up to n. In each iteration, the decoder estimates the bit u_i using the shared information or using a maximum-likelihood estimation, according to the set membership of the iteration. The estimations of the bits u_i for which i is in L_{X|Y}^c are always successful, since these bits were known to the decoder in advance.
The rest of the bits are estimated correctly with high probability, leading to a successful decoding of the entire message with high probability. However, this reasoning does not translate directly to asymmetric channels. Remember that the capacity-achieving input distribution of asymmetric channels is in general not uniform (see, for example, [14]), i.e. p_X(1) ≠ 1/2. Since the Hadamard transform is bijective, it follows that the capacity-achieving distribution of the polarized vector U_[n] is non-uniform as well. The problem with this fact is that assigning uniform bits of message or shared randomness changes the distribution of U_[n], and consequently also changes the conditional entropies H(U_i | U_[i-1], Y_[n]). To manage this situation, our approach is to make sure that the change in the distribution of U_[n] is kept minor, and thus its effect on the probability of decoding error is also minor. To do this, we consider the conditional entropies H(U_i | U_[i-1]), for i ∈ [n]. Since the polarization happens regardless of the channel model, we can consider a channel for which the output Y is a deterministic variable, and conclude by Theorem 2 that the entropies H(U_i | U_[i-1]) also polarize. For this polarization, a fraction of H(X) of the indices admit a high H(U_i | U_[i-1]). To ensure a minor change in the distribution of U_[n], we restrict the assignments of uniform bits of message and shared randomness to the indices with high H(U_i | U_[i-1]). The insight of the last paragraph motivates a modified coding scheme. The locations with high entropy H(U_i | U_[i-1]) are assigned with uniformly distributed bits, while the rest of the locations are assigned with the pmf p(u_i | u_[i-1]). Note that p(u_[n], x_[n]) and H(U_[n]) refer to the capacity-achieving distribution of the channel, which does not equal the distribution that the encoding process induces. Similar to the notation of Theorem 2, we denote the set of indices with high entropy H(U_i | U_[i-1]) by H_X.
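As a quick numerical illustration of this non-uniformity (not taken from the paper; the Z-channel and the crossover value below are assumptions made for the example), a grid search over p_X(1) for a Z-channel, in which 1 → 0 errors occur with probability a but 0 → 1 errors are impossible, finds an optimal input distribution visibly different from 1/2:

```python
import numpy as np

def h2(p):
    # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mutual_information(p1, a):
    # Z-channel: input 0 is received perfectly; input 1 flips to 0 with probability a.
    # I(X;Y) = H(Y) - H(Y|X), with P(Y=1) = p1 * (1 - a) and H(Y|X=1) = h2(a).
    return h2(p1 * (1 - a)) - p1 * h2(a)

a = 0.3  # illustrative crossover probability
grid = np.linspace(0.0, 1.0, 10001)
best = max(grid, key=lambda p: mutual_information(p, a))
print(best)  # capacity-achieving p_X(1), noticeably below 0.5
```

For this channel the maximizer lands around 0.42 rather than 0.5, which is exactly the situation the modified scheme has to accommodate.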
To achieve reliable decoding, we place the message bits in the indices of H_X that can be decoded reliably, meaning that their entropies H(U_i | U_[i-1], Y_[n]) are low. So we say that we place the message bits in the intersection H_X ∩ L_{X|Y}. The locations whose indices are not in L_{X|Y} must be known by the decoder in advance for a reliable decoding. Previous work suggested to share random Boolean functions between the encoder and the decoder, drawn according to the pmf p(u_i | u_[i-1]), and to assign the indices in (H_X ∩ L_{X|Y})^c = H_X^c ∪ L_{X|Y}^c according to these functions [13], [17]. However, we note that the storage required for those Boolean functions is exponential in n, and therefore we propose an efficient alternative. To avoid the Boolean functions, we divide the complement of H_X ∩ L_{X|Y} into three disjoint sets. First, the indices in the intersection H_X ∩ L_{X|Y}^c are assigned with uniformly distributed random bits that are shared between the encoder and the decoder. As in the symmetric case, this randomness sharing will in fact not be necessary, and a deterministic frozen vector could be shared instead. The rest of the bits of U_[n] (those in the set H_X^c) are assigned randomly to a value u with probability p_{U_i|U_[i-1]}(u | u_[i-1]) (where p_{U_i|U_[i-1]} is calculated according to the pmf p_{U_[n],X_[n],Y_[n]}, the capacity-achieving distribution of the channel). The indices in H_X^c ∩ L_{X|Y} can be decoded reliably, but not those in H_X^c ∩ L_{X|Y}^c. Fortunately, the set H_X^c ∩ L_{X|Y}^c can be shown to be small (as we will show later), and thus we can transmit those locations separately with a vanishing effect on the code rate. The encoding of the vector u_[n] is illustrated in Figure 1. To see the intuition for why the code rate approaches the channel capacity, notice that the source message is placed in the indices in the intersection H_X ∩ L_{X|Y}. The asymptotic fraction of this intersection can be derived as follows:

|H_X ∩ L_{X|Y}|/n = 1 - |(H_X ∩ L_{X|Y})^c|/n = 1 - |H_X^c|/n - |L_{X|Y}^c|/n + |H_X^c ∩ L_{X|Y}^c|/n. (2)

The Polarization Theorem (Theorem 2) implies that |H_X^c|/n → 1 - H(X) and |L_{X|Y}^c|/n → H(X|Y). Since the fraction |H_X^c ∩ L_{X|Y}^c|/n vanishes for large n, we get that the asymptotic rate is |H_X ∩ L_{X|Y}|/n → H(X) - H(X|Y) = I(X; Y), achieving the channel capacity.
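The transform G_n used throughout this section is only a few lines of code. The sketch below (a minimal illustration using NumPy, not part of the paper) builds G_n by Kronecker powers and checks the stated fact that G_n is its own inverse over the field of cardinality 2:

```python
import numpy as np

def polar_transform_matrix(n):
    # G_n = G^{(Kronecker) log2(n)} with G = [[1, 0], [1, 1]]; n must be a power of 2
    G = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    Gn = np.array([[1]], dtype=np.uint8)
    while Gn.shape[0] < n:
        Gn = np.kron(G, Gn) % 2  # kron(G, G_{n/2}) gives the block matrix [[G_{n/2}, 0], [G_{n/2}, G_{n/2}]]
    return Gn

def polar_transform(x):
    # u = x * G_n over GF(2)
    x = np.asarray(x, dtype=np.uint8)
    return (x @ polar_transform_matrix(len(x))) % 2

x = [1, 0, 1, 1, 0, 0, 1, 0]
u = polar_transform(x)
assert list(polar_transform(u)) == x  # G_n is an involution mod 2
```

The recursive block structure in the text and the Kronecker-power definition produce the same matrix, which the `np.kron` step makes explicit.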
For a more precise definition of the scheme, we use the so-called Bhattacharyya parameter in the selection of subsets of U_[n], instead of the conditional entropy. The Bhattacharyya parameters are polarized in a similar manner as the entropies, and are more useful for bounding the probability of decoding error. For a discrete RV Y and a Bernoulli RV X, the Bhattacharyya parameter is defined by

Z(X|Y) ≜ 2 Σ_y √( p_{X,Y}(0, y) p_{X,Y}(1, y) ). (3)

Note that most of the polar coding literature uses a slightly different definition of the Bhattacharyya parameter, which coincides with Equation (3) when the RV X is distributed uniformly. We use the following relations between the Bhattacharyya parameter and the conditional entropy.

Proposition 3. ( [2, Proposition 2])

(Z(X|Y))^2 ≤ H(X|Y), (4)

H(X|Y) ≤ log2(1 + Z(X|Y)). (5)

Fig. 1. Encoding the vector u_[n].
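Equation (3) is straightforward to evaluate for a finite joint pmf. The snippet below (an illustration; the BSC joint pmf and its crossover probability are assumptions chosen for the example) computes Z(X|Y) and checks the bounds of Proposition 3:

```python
import math

def bhattacharyya(p_xy):
    # Z(X|Y) = 2 * sum_y sqrt(p_{X,Y}(0, y) * p_{X,Y}(1, y))  -- Equation (3)
    ys = {y for (_, y) in p_xy}
    return 2 * sum(math.sqrt(p_xy.get((0, y), 0.0) * p_xy.get((1, y), 0.0)) for y in ys)

def cond_entropy(p_xy):
    # H(X|Y) = sum_{x,y} p(x,y) * log2(p(y) / p(x,y))
    ys = {y for (_, y) in p_xy}
    py = {y: sum(p for (x2, y2), p in p_xy.items() if y2 == y) for y in ys}
    return sum(p * math.log2(py[y] / p) for (x, y), p in p_xy.items() if p > 0)

# uniform X observed through a BSC with crossover 0.11 (illustrative values)
eps = 0.11
p_xy = {(x, y): 0.5 * (eps if x != y else 1 - eps) for x in (0, 1) for y in (0, 1)}
Z, H = bhattacharyya(p_xy), cond_entropy(p_xy)
assert Z**2 <= H <= math.log2(1 + Z)  # the bounds (4) and (5)
```

Both bounds are tight at the extremes Z ∈ {0, 1}, which is why Z and H polarize together.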

We now define the sets of high and low Bhattacharyya parameters, and work with them instead of the sets H_{X|Y} and L_{X|Y}. For δ ∈ (0, 1), define

Ĥ_{X|Y} ≜ { i ∈ [n] : Z(U_i | U_[i-1], Y_[n]) ≥ 1 - 2^{-n^{1/2-δ}} },

L̂_{X|Y} ≜ { i ∈ [n] : Z(U_i | U_[i-1], Y_[n]) ≤ 2^{-n^{1/2-δ}} }.

As before, we define the sets Ĥ_X and L̂_X for the parameter Z(U_i | U_[i-1]) by letting Y_[n] be a deterministic vector. Using Proposition 3, it is shown in [17, combining Proposition 2 with Theorem 2] that Theorem 2 holds also if we replace the sets H_{X|Y} and L_{X|Y} with the sets Ĥ_{X|Y} and L̂_{X|Y}. That is, we have

lim_{n→∞} |Ĥ_{X|Y}|/n = H(X|Y) and lim_{n→∞} |L̂_{X|Y}|/n = 1 - H(X|Y). (6)

We now define our coding scheme formally. Let m_{[|Ĥ_X ∩ L̂_{X|Y}|]} ∈ {0, 1}^{|Ĥ_X ∩ L̂_{X|Y}|} be the realization of a uniformly distributed source message, and f_{[|Ĥ_X ∩ L̂_{X|Y}^c|]} ∈ {0, 1}^{|Ĥ_X ∩ L̂_{X|Y}^c|} be a deterministic frozen vector known to both the encoder and the decoder. We discuss how to find a good frozen vector in Appendix C-C. For a subset A ⊆ [n] and an index i ∈ A, we use a function r(i, A) to denote the rank of i in an ordered list of the elements of A. The probabilities p_{U_i|U_[i-1]}(u | u_[i-1]) and p_{U_i|U_[i-1],Y_[n]}(u | u_[i-1], y_[n]) can be calculated efficiently by a recursive method described in [17, Section III.B].

Construction 4.

Encoding

Input: a message m_{[|Ĥ_X ∩ L̂_{X|Y}|]} ∈ {0, 1}^{|Ĥ_X ∩ L̂_{X|Y}|}.
Output: a codeword x_[n] ∈ {0, 1}^n.

1) For i from 1 to n, successively, set

u_i = { u ∈ {0, 1} with probability p_{U_i|U_[i-1]}(u | u_[i-1])   if i ∈ Ĥ_X^c,
      { m_{r(i, Ĥ_X ∩ L̂_{X|Y})}                                   if i ∈ Ĥ_X ∩ L̂_{X|Y},
      { f_{r(i, Ĥ_X ∩ L̂_{X|Y}^c)}                                 if i ∈ Ĥ_X ∩ L̂_{X|Y}^c.

2) Transmit the codeword x_[n] = u_[n] G_n.
3) Transmit the vector u_{Ĥ_X^c ∩ L̂_{X|Y}^c} separately using a linear, non-capacity-achieving polar code with a uniform input distribution (as in [1]). In practice, other error-correcting codes could be used for this vector as well.

Decoding

Input: a noisy vector y_[n] ∈ {0, 1}^n.
Output: a message estimation m̂_{[|Ĥ_X ∩ L̂_{X|Y}|]} ∈ {0, 1}^{|Ĥ_X ∩ L̂_{X|Y}|}.

1) Estimate the vector u_{Ĥ_X^c ∩ L̂_{X|Y}^c} by û_{Ĥ_X^c ∩ L̂_{X|Y}^c}.
2) For i from 1 to n, set

û_i = { arg max_{u ∈ {0,1}} p_{U_i|U_[i-1],Y_[n]}(u | û_[i-1], y_[n])   if i ∈ L̂_{X|Y},
      { û_{r(i, Ĥ_X^c ∩ L̂_{X|Y}^c)}                                    if i ∈ Ĥ_X^c ∩ L̂_{X|Y}^c,
      { f_{r(i, Ĥ_X ∩ L̂_{X|Y}^c)}                                      if i ∈ Ĥ_X ∩ L̂_{X|Y}^c.

3) Return the estimated message m̂_{[|Ĥ_X ∩ L̂_{X|Y}|]} = û_{Ĥ_X ∩ L̂_{X|Y}}.

We say that a sequence of coding schemes achieves the channel capacity if the probability of decoding error vanishes with the block length for any rate below the capacity.
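The randomized step 1) of the encoder can be exercised end-to-end at toy block lengths, where p_{U_i|U_[i-1]} can be computed by brute force instead of the recursive method of [17]. Everything below (the target input distribution q, the particular choice of message index set, and the omission of the frozen bits and the side channel) is an illustrative assumption, not the paper's construction:

```python
import itertools
import random
import numpy as np

def polar_matrix(n):
    # G_n as the Kronecker power of [[1, 0], [1, 1]]
    G = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    Gn = np.array([[1]], dtype=np.uint8)
    while Gn.shape[0] < n:
        Gn = np.kron(G, Gn) % 2
    return Gn

def joint_pmf_u(n, q):
    # brute-force pmf of U = X G_n when X is i.i.d. Bernoulli(q); toy sizes only
    Gn = polar_matrix(n)
    pmf = {}
    for x in itertools.product((0, 1), repeat=n):
        u = tuple((np.array(x, dtype=np.uint8) @ Gn) % 2)
        px = float(np.prod([q if b else 1 - q for b in x]))
        pmf[u] = pmf.get(u, 0.0) + px
    return pmf

def cond_prob_one(pmf, prefix):
    # p(U_i = 1 | U_[i-1] = prefix), by marginalizing the brute-force pmf
    num = sum(p for u, p in pmf.items() if u[:len(prefix) + 1] == tuple(prefix) + (1,))
    den = sum(p for u, p in pmf.items() if u[:len(prefix)] == tuple(prefix))
    return num / den if den > 0 else 0.5

def encode(message_bits, message_set, n, q, rng):
    # message bits go to indices in message_set; every other u_i is sampled
    # from p(u_i | u_[i-1]), as in step 1) of Construction 4 (frozen set omitted)
    pmf = joint_pmf_u(n, q)
    u, msg = [], list(message_bits)
    for i in range(n):
        if i in message_set:
            u.append(msg.pop(0))
        else:
            u.append(1 if rng.random() < cond_prob_one(pmf, u) else 0)
    return list((np.array(u, dtype=np.uint8) @ polar_matrix(n)) % 2)

x = encode([1, 0], message_set={2, 3}, n=4, q=0.25, rng=random.Random(0))
```

The point of the sketch is the sampling rule: unlike a linear polar encoder, the non-message positions are drawn from the conditional law of the capacity-achieving distribution, which is what keeps the induced distribution of U_[n] close to the target one.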

Theorem 5. Construction 4 achieves the channel capacity (Theorem 1) with a computational complexity of O(n log n) and a probability of decoding error of 2^{-n^{1/2-δ}} for any δ > 0 and large enough n.

In the next section we show a generalized construction and prove its capacity-achieving property. Theorem 5 thus will follow as a corollary of the more general Theorem 15. We note here two differences between Construction 4 and the construction in [23, Section III.B]. First, in the encoding of Construction 4, the bits in the set Ĥ_X^c are set randomly, while in [23, Section III.B], those bits are set according to a maximum-likelihood rule. And second, the vector u_{Ĥ_X^c ∩ L̂_{X|Y}^c} is sent through a side channel in Construction 4, but not in [23, Section III.B]. These changes allow us to prove that the coding scheme achieves the channel capacity.

III. CHANNELS WITH NON-CAUSAL ENCODER STATE INFORMATION

In this section we generalize Construction 4 to the availability of channel state information at the encoder. We consider mainly the application of rewriting in flash memories, and present two special cases of the channel model for this application. In flash memory, information is stored in a set of n memory cells. We mainly focus on a flash memory type that is called Single-Level Cell (SLC), in which each cell stores a single information bit, and its value is denoted by either 0 or 1. We first note that the assumption of a memoryless channel is not exactly accurate in flash memories, due to a mechanism of cell-to-cell interference. However, we keep using this assumption, as it is nonetheless useful for the design of coding schemes with valuable practical performance. The main limitation of flash memories that we do tackle in this work is the high cost of changing a cell level from 1 to 0 (in SLC memories). To perform such a change, an expensive operation, called block erasure, is required. To avoid this block erasure operation, information is rewritten over existing memory in the sense that no cell is changed from value 1 to 0.
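The no-erase constraint is easy to state in code: a rewrite is admissible exactly when it never lowers a cell. The trivial helper below is written here only for concreteness and is not part of the paper:

```python
def wom_legal(old, new):
    # SLC rewriting without erasure: no cell may go from 1 back to 0
    return all(not (o == 1 and v == 0) for o, v in zip(old, new))

assert wom_legal([0, 1, 0, 0], [1, 1, 0, 1])      # only 0 -> 1 changes
assert not wom_legal([1, 0, 1, 0], [0, 1, 1, 0])  # first cell would need an erase
```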
We thus consider the use of the information about the previous state of the cells in the encoding process. We model the memory cells as a channel with a discrete state, and we also assume that the state is memoryless, meaning that the states of different cells are distributed independently. We assume that the state of the entire n cells is available to the writer prior to the beginning of the writing process. In communication terminology this kind of state availability is referred to as non-causal. We note that this setting is also useful in the so-called Marton-coding method for communication over broadcast channels. Therefore, the multicoding schemes that will follow serve as a contribution also in this important setting. One special case of the model which we consider is the noiseless write-once memory model. This model also serves as an ingredient for a type of codes called rank-modulation rewriting codes [9]. Therefore, the schemes proposed in this section can also be useful for the design of rank-modulation rewriting codes. We represent the channel state as a Bernoulli random variable S with parameter β, which equals the probability p(S = 1). A cell of state 1 can only be written with the value 1. Note that, intuitively, when β is high, the capacity of the memory is small, since only a few cells are available for modification in the writing process, and thus only a small amount of information can be stored. This also means that the choice of codebook has a crucial effect on the capacity of the memory in future writes. A codebook that contains many codewords of high Hamming weight (number of 1's in the codeword) would make the parameter β of future writes high, and thus the capacity of the future writes would be low. However, forcing the expected Hamming weight of the codebook to be low would reduce the capacity of the current write. To settle this trade-off, previous work suggested to optimize the sum of the code rates over multiple writes.
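A classical noiseless calculation (the standard two-write binary WOM example, included here purely as an illustration; it is not derived in this paper) shows why constraining the first-write weight pays off. Writing a fraction ε of the cells in the first round gives sum rate h(ε) + (1 - ε), maximized at ε = 1/3 for a total of log2 3 ≈ 1.585, versus 1.5 when the first write uses an unconstrained (weight-1/2) codebook:

```python
import numpy as np

def h2(p):
    # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def two_write_sum_rate(eps):
    # write 1: choose a codeword of relative weight eps      -> rate h2(eps)
    # write 2: the remaining fraction (1 - eps) of zero cells
    #          can each be freely set to 0 or 1              -> rate 1 - eps
    return h2(eps) + (1 - eps)

grid = np.linspace(0.0, 0.5, 5001)
best = max(grid, key=two_write_sum_rate)
print(best, two_write_sum_rate(best))
```

Setting the derivative log2((1 - ε)/ε) - 1 to zero gives ε = 1/3, matching the grid search.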
It was shown that in many cases, constraints on the codebook Hamming weight (henceforth just weight) strictly increase the sum rate (see, for example, [16]). Therefore, we consider an input cost constraint in the model. The most general model that we consider is a discrete memoryless channel (DMC) with a discrete memoryless (DM) state and an input cost constraint, where the state information is available non-causally at the encoder. The channel input, state and output are denoted by x, s and y, respectively, and their respective finite alphabets are denoted by X, S and Y. The random variables are denoted by X, S and Y, and the random vectors by X_[n], S_[n] and Y_[n], where n is the block length. The state is distributed according to the pmf p(s), and the conditional pmfs of the channel are denoted by p(y|x, s). The input cost function is denoted by b(x), and the input cost constraint is

Σ_{i=1}^{n} E[b(X_i)] ≤ nB,

where B is a real number representing the normalized constraint. The channel capacity with an informed encoder and an input cost constraint is given by an extension of the Gelfand-Pinsker Theorem.¹

Theorem 6. (Gelfand-Pinsker Theorem with Cost Constraint) [7, Equation (7.7) on p. 186] The capacity of a DMC with a DM state p(y|x, s)p(s) and an input cost constraint B, when the state information is available non-causally only at the encoder, is

C = max_{p(v|s), x(v,s) : E[b(X)] ≤ B} ( I(V; Y) - I(V; S) ), (7)

where V is an auxiliary random variable with a finite alphabet V, |V| ≤ min{ |X| · |S|, |Y| + |S| - 1 }.

The main coding scheme that we present in this section achieves the capacity in Theorem 6. The proof of Theorem 6 considers a virtual channel model, in which the RV V is the channel input and Y is the channel output. Similar to the previous section, we limit the treatment to the case in which the RV V is binary. In flash memory, this case would correspond to a single-level cell (SLC) type of memory. As mentioned in Section II, an extension of the scheme to a non-binary case is not difficult. The non-binary case is useful for flash memories in which each cell stores 2 or more bits of information. Such memories are called Multi-Level Cell (MLC). We also mention that the limitation to binary random variables does not apply to the channel output Y. Therefore, the cell voltage in flash memory can be read more accurately at the decoder to increase the coding performance, similarly to the soft-decoding method that is used in flash memories with LDPC codes. Another practical remark is that the binary-input model can be used in MLC memories by coding separately on the MSB and the LSB of the cells, as is in fact the coding method in current MLC flash systems.
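The objective in (7) is easy to evaluate numerically for fixed choices of p(v|s) and x(v, s). The sketch below does so for small, entirely illustrative choices; the state prior, the auxiliary design, the input map, and the toy noisy-WOM-style channel are all assumptions made up for the example, not the optimizing ones:

```python
import numpy as np

def mutual_info(p_joint):
    # I(A;B) in bits from a joint pmf array p_joint[a, b]
    pa = p_joint.sum(axis=1, keepdims=True)
    pb = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float((p_joint[mask] * np.log2(p_joint[mask] / (pa @ pb)[mask])).sum())

beta = 0.4                                  # P(S = 1), an assumption
p_s = np.array([1 - beta, beta])
p_v_given_s = np.array([[0.7, 0.3],         # row s: [P(v=0|s), P(v=1|s)], a guess
                        [0.5, 0.5]])
x_of = lambda v, s: v | s                   # deterministic input map x(v, s)
def p_y1(x, s):                             # P(Y = 1 | x, s), a toy choice
    return 0.05 if (x, s) == (0, 0) else 0.95

p_vs = np.zeros((2, 2))                     # joint pmf of (V, S)
p_vy = np.zeros((2, 2))                     # joint pmf of (V, Y)
for s in (0, 1):
    for v in (0, 1):
        pvs = p_s[s] * p_v_given_s[s, v]
        p_vs[v, s] += pvs
        p1 = p_y1(x_of(v, s), s)
        p_vy[v, 1] += pvs * p1
        p_vy[v, 0] += pvs * (1 - p1)

rate = mutual_info(p_vy) - mutual_info(p_vs)  # I(V;Y) - I(V;S), the objective of (7)
```

Maximizing this quantity over p(v|s) and x(v, s), subject to E[b(X)] ≤ B, is what the capacity expression calls for; the snippet only evaluates one candidate point.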
The scheme that achieves the capacity of Theorem 6 is called Construction 16, and it will be described in Subsection III-C. The capacity-achieving result is summarized in the following theorem, which will be proven in Subsection III-C.

Theorem 7. Construction 16 achieves the capacity of the Gelfand-Pinsker Theorem with Cost Constraint (Theorem 6) with a computational complexity of O(n log n) and a probability of decoding error of 2^{-n^{1/2-δ}} for any δ > 0 and large enough n.

Note that the setting of Theorem 7 is a generalization of the asymmetric channel-coding setting of Theorem 5, and therefore Construction 16 and Theorem 7 are in fact a generalization of Construction 4 and Theorem 5. Before we describe the code construction, we first show in Subsection III-A two special cases of the Gelfand-Pinsker model that are useful for the rewriting of flash memories. Afterwards, in Subsections III-B and III-C, we will show two versions of the construction that correspond to generalizations of the two special cases.

A. Special Cases

We start with a special case that is quite a natural model for flash memory rewriting.

Example 8. Let the sets X, S and Y all equal {0,1}, and let the state pmf be p_S(1) = β. This model corresponds to a single-level cell flash memory. We describe the cell behaviour after a bit x is attempted to

¹ The cost constraint is defined slightly differently in this reference, but the capacity is not affected by this change.

[Fig. 2. Example 8: A binary noisy WOM model.]

be written. When s = 0, the cell behaves as a binary asymmetric channel with input x, since the cell state does not interfere with the writing attempt. When s = 1, the cell behaves as if a value of 1 was attempted to be written, regardless of the actual value x attempted. However, an error might still occur, during the writing process or anytime afterwards (for example, due to charge leakage). Thus, we can say that when s = 1, the cell behaves as a binary asymmetric channel with input 1. Formally, the channel pmfs are given by

p_{Y|XS}(1|x,s) = { α_0 if (x,s) = (0,0);  1-α_1 if (x,s) = (0,1);  1-α_1 if (x,s) = (1,0);  1-α_1 if (x,s) = (1,1) }.   (8)

The error model is also presented in Figure 2. The cost constraint is given by b(x_i) = x_i, since it is desirable to limit the number of cells written to a value of 1.

Our coding-scheme construction for the setting of Theorem 6 is based on a more limited construction, which serves as a building block. We will start by describing the limited construction, and then show how to extend it to the model of Theorem 6. We will prove that the limited construction achieves the capacity of channels whose capacity-achieving distribution forms a certain stochastically degraded structure. We first recall the definition of stochastically degraded channels.

Definition 9. [7, p. 112] A discrete memoryless channel (DMC) W_1 : {0,1} → Y_1 is stochastically degraded (or simply degraded) with respect to a DMC W_2 : {0,1} → Y_2, denoted W_1 ⪯ W_2, if there exists a DMC W : Y_2 → Y_1 such that W satisfies the equation W_1(y_1|x) = \sum_{y_2 ∈ Y_2} W_2(y_2|x) W(y_1|y_2).

Next, we state the required property of channels whose capacity is achieved by the limited construction to be proposed.

Property 10. There exist functions p(v|s) and x(v,s) that maximize the Gelfand-Pinsker capacity in Theorem 6 for which the channel p(s|v) is degraded with respect to the channel p(y|v), that is, p(s|v) ⪯ p(y|v).
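The pmf in Equation (8) is small enough to tabulate directly. A minimal sketch (the function name and the parameter names a0, a1 for α₀, α₁ are ours) that builds the channel table and checks the claim that the output law under s = 1 is independent of x:

```python
def noisy_wom_channel(a0, a1):
    """Channel of Example 8: return p(y | x, s) as a nested dict.

    s = 0: binary asymmetric channel on x (up-crossover a0, down-crossover a1).
    s = 1: behaves as if x = 1 was written, regardless of the attempted x.
    """
    p1 = {(0, 0): a0,          # s = 0, x = 0: flips up with prob. a0
          (1, 0): 1 - a1,      # s = 0, x = 1: flips down with prob. a1
          (0, 1): 1 - a1,      # s = 1: input is effectively 1
          (1, 1): 1 - a1}
    return {(x, s): {1: p1[(x, s)], 0: 1 - p1[(x, s)]} for (x, s) in p1}

ch = noisy_wom_channel(0.05, 0.02)
# For s = 1 the output distribution does not depend on the attempted input x.
assert ch[(0, 1)] == ch[(1, 1)]
```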
It is an open problem whether the model of Example 8 satisfies the degradation condition of Property 10. However, we can modify the model such that it will satisfy Property 10. Specifically, we study the following model:

Example 11. Let the sets X, S and Y all equal {0,1}. The channel and state pmfs are given by p_S(1) = β and

p_{Y|XS}(1|x,s) = { α if (x,s) = (0,0);  1-α if (x,s) = (1,0);  1 if s = 1 }.   (9)

In words, if s = 1 the channel output is always 1, and if s = 0 the channel behaves as a binary symmetric channel. The cost function is given by b(x_i) = x_i. The error model is also presented in Figure 3. This model can represent a writing noise, as a cell in state s = 1 is not written on and it never suffers errors.

[Fig. 3. Example 11: A binary WOM with writing noise.]

We claim that the model of Example 11 satisfies the degradation condition of Property 10. To show this, we first need to find the functions p(v|s) and x(v,s) that maximize the Gelfand-Pinsker capacity in Theorem 6. Those functions are established in the following theorem.

Theorem 12. The capacity of the channel in Example 11 is C = (1-β)[h(ε∗α) - h(α)], where ε ≜ B/(1-β) and ε∗α ≜ ε(1-α) + (1-ε)α. The selections V = {0,1}, x(v,s) = v ∧ ¬s (where ∧ is the logical AND operation and ¬ is the logical negation), and

p_{V|S}(1|0) = ε,   p_{V|S}(1|1) = ε(1-α)/(ε∗α)   (10)

achieve this capacity.

Theorem 12 is similar to [16, Theorem 4], and its proof is described in Appendix A. Intuitively, the upper bound is obtained by assuming that the state information is available also at the decoder, and the lower bound is obtained by setting the functions x(v,s) and p(v|s) according to the statement of the theorem. The proof that the model in Example 11 satisfies the degradation condition of Property 10 is completed by the following lemma.

Lemma 13. The capacity-achieving functions of Theorem 12 for the model of Example 11 satisfy the degradation condition of Property 10. That is, the channel p(s|v) is degraded with respect to the channel p(y|v).

Lemma 13 is proven in Appendix B; consequently, the capacity of the model in Example 11 can be achieved by our limited construction. In the next subsection we describe the construction for channel models that satisfy Property 10, including the model in Example 11.

B. Multicoding Construction for Degraded Channels

Notice first that the capacity-achieving distribution of the asymmetric channel in Section II actually satisfies Property 10. In the asymmetric channel-coding case, the state can be thought of as a degenerate random variable (a RV which takes only a single value), and therefore we can choose W in Definition 9 to be degenerate as well, thereby satisfying Property 10.
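The capacity expression of Theorem 12 is straightforward to evaluate numerically; a short sketch (function names are ours):

```python
from math import log2

def h(p):
    """Binary entropy function."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1 - b) + (1 - a) * b

def writing_noise_capacity(alpha, beta, B):
    """Capacity (1-beta)[h(eps*alpha) - h(alpha)] of Example 11, eps = B/(1-beta)."""
    eps = B / (1 - beta)
    return (1 - beta) * (h(conv(eps, alpha)) - h(alpha))
```

As a sanity check, with β = 0 and B = 1/2 the cost constraint becomes inactive and the expression reduces to the familiar BSC capacity 1 - h(α).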
We will see that the construction we present in this subsection is a generalization of Construction 4. The construction has a similar structure to the achievability proof of the Gelfand-Pinsker Theorem (Theorem 6). The encoder first finds a vector v_{[n]} in a similar manner to Construction 4, where the RV X|Y in Construction 4 is replaced with V|Y, and the RV X is replaced with V|S. The vector U_{[n]} is now the polarization of the vector V_{[n]}, meaning that U_{[n]} = V_{[n]} G_n. The RV V is taken according to the pmfs p(v|s) that maximize the rate expression in Equation (7). The selection of the vector u_{[n]} is illustrated in Figure 4. After the vector u_{[n]} is chosen, each bit i ∈ [n] of the codeword x_{[n]} is calculated by the function

x_i(v_i, s_i) that maximizes Equation (7). To use the model of Example 11, one should use the functions p(v|s) and x(v,s) according to Theorem 12. The key to showing that the scheme achieves the channel capacity is that the fraction |H^c_{V|S} ∩ L^c_{V|Y}|/n can be shown to vanish for large n if the channel satisfies Property 10. Then, by the same intuition as in Equation (2) and using Equation (6), the replacements imply that the asymptotic rate of the codes is

|H_{V|S} ∩ L_{V|Y}|/n = 1 - |H^c_{V|S}|/n - |L^c_{V|Y}|/n + |H^c_{V|S} ∩ L^c_{V|Y}|/n → 1 - (1 - H(V|S)) - H(V|Y) + 0 = I(V;Y) - I(V;S),

achieving the Gelfand-Pinsker capacity of Theorem 6. We now describe the coding scheme formally. In what follows, r(i, A) denotes the rank of the index i within the set A.

Construction 14.

Encoding. Input: a message m ∈ {0,1}^{|H_{V|S} ∩ L_{V|Y}|} and a state s_{[n]} ∈ {0,1}^n. Output: a codeword x_{[n]} ∈ {0,1}^n.

1) For each i from 1 to n, assign

u_i = { u ∈ {0,1} with probability p_{U_i|U_{[i-1]},S_{[n]}}(u|u_{[i-1]}, s_{[n]})  if i ∈ H^c_{V|S};
        m_{r(i, H_{V|S} ∩ L_{V|Y})}  if i ∈ H_{V|S} ∩ L_{V|Y};
        f_{r(i, H_{V|S} ∩ L^c_{V|Y})}  if i ∈ H_{V|S} ∩ L^c_{V|Y} }.   (11)

2) Calculate v_{[n]} = u_{[n]} G_n, and for each i ∈ [n], store the value x_i(v_i, s_i).
3) Store the vector u_{H^c_{V|S} ∩ L^c_{V|Y}} separately, using a point-to-point linear non-capacity-achieving polar code with a uniform input distribution. The encoder of this side channel does not use the state information, but rather treats it as an unknown part of the channel noise.

Decoding. Input: a noisy vector y_{[n]} ∈ {0,1}^n. Output: a message estimation m̂ ∈ {0,1}^{|H_{V|S} ∩ L_{V|Y}|}.

1) Estimate the vector u_{H^c_{V|S} ∩ L^c_{V|Y}} by û_{H^c_{V|S} ∩ L^c_{V|Y}}.
2) Estimate u_{[n]} by û_{[n]}(y_{[n]}, f) as follows: For each i from 1 to n, assign

û_i = { arg max_{u ∈ {0,1}} p_{U_i|U_{[i-1]},Y_{[n]}}(u|û_{[i-1]}, y_{[n]})  if i ∈ L_{V|Y};
        û_{r(i, H^c_{V|S} ∩ L^c_{V|Y})}, taken from the side-channel estimate of Step 1,  if i ∈ H^c_{V|S} ∩ L^c_{V|Y};
        f_{r(i, H_{V|S} ∩ L^c_{V|Y})}  if i ∈ H_{V|S} ∩ L^c_{V|Y} }.   (12)

[Fig. 4. Encoding the vector u_{[n]} in Construction 14.]
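Step 2 of Construction 14 multiplies u_{[n]} by the polar transform G_n over GF(2). A minimal butterfly sketch (we omit the bit-reversal permutation that is sometimes folded into G_n; it only permutes indices and does not affect performance):

```python
def polar_transform(u):
    """Compute u * F^{(x) m} over GF(2), where F = [[1,0],[1,1]] and n = 2^m."""
    v = list(u)
    n = len(v)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                v[i] ^= v[i + step]  # upper branch of each butterfly: XOR the pair
        step *= 2
    return v
```

Since F^{⊗m} is its own inverse over GF(2), applying the transform twice recovers the input, which gives a quick self-test.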

3) Return the estimated message m̂ = û_{H_{V|S} ∩ L_{V|Y}}.

The asymptotic performance of Construction 14 is stated in the following theorem.

Theorem 15. If Property 10 holds, then Construction 14 achieves the capacity of Theorem 6 with a computational complexity of O(n log n) and a probability of decoding error of 2^{-n^{1/2-δ}} for any δ > 0 and large enough n.

The proof of Theorem 15 is given in Appendix C. The next subsection describes a method to remove the degradation requirement of Property 10. This also allows achieving the capacity of the more realistic model of Example 8.

C. Multicoding Construction without Degradation

A technique called chaining was proposed in [23] that allows achieving the capacity of models that do not exhibit the degradation condition of Property 10. The chaining idea was presented in the context of broadcast communication and point-to-point universal coding; we connect it here to the application of flash-memory rewriting through Example 8. We note also that the chaining technique that follows comes at the price of a slower convergence to the channel capacity, and thus a lower non-asymptotic code rate.

The requirement of Construction 14 for degraded channels comes from the fact that the set H^c_{V|S} ∩ L^c_{V|Y} needs to be communicated to the decoder over a side channel. If the fraction (1/n)|H^c_{V|S} ∩ L^c_{V|Y}| vanishes with n, Construction 14 achieves the channel capacity. In this subsection we deal with the case in which the fraction (1/n)|H^c_{V|S} ∩ L^c_{V|Y}| does not vanish. In this case we have

|H_{V|S} ∩ L_{V|Y}|/n = 1 - |H^c_{V|S}|/n - |L^c_{V|Y}|/n + |H^c_{V|S} ∩ L^c_{V|Y}|/n → I(V;Y) - I(V;S) + |H^c_{V|S} ∩ L^c_{V|Y}|/n.

The idea is then to store the subvector u_{H^c_{V|S} ∩ L^c_{V|Y}} in a subset of the indices H_{V|S} ∩ L_{V|Y} of an additional code block of n cells. The additional block uses the same coding technique as the original block. Therefore, it can still use a fraction of about I(V;Y) - I(V;S) of its cells to store additional message bits, and thereby approach the channel capacity.
We denote by R the subset of H_{V|S} ∩ L_{V|Y} in which we store the subvector u_{H^c_{V|S} ∩ L^c_{V|Y}} of the previous block. Note that the additional block also faces the same difficulty as the original block with the set H^c_{V|S} ∩ L^c_{V|Y}. To solve this, we use the same solution, recursively, sending a total of k blocks, each of length n. Each block can store a source-message fraction that approaches the channel capacity. The problematic bits of block k (the last block) are then stored using yet another block, but this block is coded without taking the state information into account, and thus does not face the same difficulty. The last block thus causes a rate loss, but this loss is a fraction of about 1/k of the total, which vanishes for large k. The decoding is performed backwards, starting from

[Fig. 5. The chaining construction.]

the last block and ending with the first block. The chaining construction is illustrated in Figure 5. In the following formal description of the construction we denote the index i of the j-th block of the message by m_{i,j}, and similarly for other vectors. The vectors themselves are also denoted in two dimensions, for example x_{[n],[k]}.

Construction 16. Let R be an arbitrary subset of H_{V|S} ∩ L_{V|Y} of size |H^c_{V|S} ∩ L^c_{V|Y}|.

Encoding. Input: a message m ∈ {0,1}^{k(|H_{V|S} ∩ L_{V|Y}| - |H^c_{V|S} ∩ L^c_{V|Y}|)} and a state s_{[n],[k]} ∈ {0,1}^{kn}. Output: a codeword x_{[n],[k]} ∈ {0,1}^{kn}.

1) Let u_{[n],0} ∈ {0,1}^n be an arbitrary vector. For each j from 1 to k, and for each i from 1 to n, assign

u_{i,j} = { u ∈ {0,1} with probability p_{U_i|U_{[i-1]},S_{[n]}}(u|u_{[i-1],j}, s_{[n],j})  if i ∈ H^c_{V|S};
            m_{r(i, (H_{V|S} ∩ L_{V|Y}) \ R), j}  if i ∈ (H_{V|S} ∩ L_{V|Y}) \ R;
            u_{i',j-1}, where i' is the r(i,R)-th element of H^c_{V|S} ∩ L^c_{V|Y},  if i ∈ R;
            f_{r(i, H_{V|S} ∩ L^c_{V|Y})}  if i ∈ H_{V|S} ∩ L^c_{V|Y} }.   (13)

2) For each j from 1 to k, calculate v_{[n],j} = u_{[n],j} G_n, and for each i ∈ [n], store the value x_{i,j}(v_{i,j}, s_{i,j}).
3) Store the vector u_{H^c_{V|S} ∩ L^c_{V|Y}, k} separately, using a point-to-point linear non-capacity-achieving polar code with a uniform input distribution. The encoder of this extra block does not use the state information, but rather treats it as an unknown part of the channel noise.

Decoding. Input: a noisy vector y_{[n],[k]} ∈ {0,1}^{kn}. Output: a message estimation m̂ ∈ {0,1}^{k(|H_{V|S} ∩ L_{V|Y}| - |H^c_{V|S} ∩ L^c_{V|Y}|)}.

1) Estimate the vector u_{H^c_{V|S} ∩ L^c_{V|Y}, k} by û_{H^c_{V|S} ∩ L^c_{V|Y}, k}, and let û_{R,k+1} = û_{H^c_{V|S} ∩ L^c_{V|Y}, k}.
2) Estimate u_{[n],[k]} by û_{[n],[k]}(y_{[n],[k]}, f) as follows: For each j down from k to 1, and for each i from 1 to n, assign

û_{i,j} = { arg max_{u ∈ {0,1}} p_{U_i|U_{[i-1]},Y_{[n]}}(u|û_{[i-1],j}, y_{[n],j})  if i ∈ L_{V|Y};
            û_{i'',j+1}, where i'' is the r(i, H^c_{V|S} ∩ L^c_{V|Y})-th element of R,  if i ∈ H^c_{V|S} ∩ L^c_{V|Y};
            f_{r(i, H_{V|S} ∩ L^c_{V|Y})}  if i ∈ H_{V|S} ∩ L^c_{V|Y} }.   (14)

3) Return the estimated message m̂ = û_{(H_{V|S} ∩ L_{V|Y}) \ R, [k]}.
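A back-of-the-envelope rate count for the chaining scheme: each of the k chained blocks carries |H_{V|S} ∩ L_{V|Y}| - |H^c_{V|S} ∩ L^c_{V|Y}| message bits, and one extra state-oblivious block absorbs the last problematic bits. The sketch below assumes, pessimistically, that the extra block also has length n (names are ours):

```python
def chaining_rate(message_frac, problematic_frac, k):
    """Per-cell rate of chaining with k message blocks plus one terminating block.

    message_frac     ~ |H_{V|S} ∩ L_{V|Y}| / n
    problematic_frac ~ |H^c_{V|S} ∩ L^c_{V|Y}| / n
    """
    bits_per_block = message_frac - problematic_frac  # net message bits per cell, per block
    return k * bits_per_block / (k + 1)               # k useful blocks out of k + 1
```

As k grows, the k/(k+1) factor approaches 1, so the rate approaches message_frac - problematic_frac, which in turn approaches I(V;Y) - I(V;S): the rate loss of the terminating block vanishes, as claimed in the text.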
Constructions 14 and 16 can also be used for communication over broadcast channels in Marton's region, as described in [13], [23]. Constructions 14 and 16 improve on these previous results since they provably achieve the capacity with a linear storage requirement. Construction 16 achieves the capacity of Theorem 6 with low complexity, without the degradation requirement of Property 10. This result was stated in Theorem 7. The proof of Theorem 7 follows from Theorem 15 and the fact that the rate loss vanishes with large k. Construction 16 is useful for the realistic model of flash-memory rewriting of Example 8, using the appropriate capacity-achieving functions p(v|s) and x(v,s).

IV. CONCLUSION

In this paper we proposed three capacity-achieving polar coding schemes, for the settings of asymmetric channel coding and flash memory rewriting. The scheme for asymmetric channels improves on the scheme

of [17] by reducing the exponential storage requirement to a linear one. The idea behind this reduction is to perform the encoding randomly instead of using Boolean functions, and to transmit a vanishing fraction of information on a side channel. The second proposed scheme is used for the setting of flash memory rewriting. We propose a model of flash memory rewriting with writing noise, and show that the scheme achieves its capacity. We also describe a more general class of channels whose capacity can be achieved using the scheme. The second scheme is derived from the asymmetric-channel scheme by replacing the Shannon random variables X and X|Y with the Gelfand-Pinsker random variables V|S and V|Y. The last proposed scheme achieves the capacity of any channel with non-causal state information at the encoder. We describe a model of noisy flash memory rewriting for which the scheme would be useful. The main idea in this scheme is called code chaining. Another potential application could be in information embedding (as in [3]), where the model is asymmetric.

APPENDIX A

In this appendix we prove Theorem 12. A similar result was proven in [16, Theorem 4]; we show a different proof here, which we find more intuitive. Theorem 12 states that the capacity of the channel in Example 11 is C = (1-β)[h(ε∗α) - h(α)], where ε = B/(1-β) and ε∗α ≜ ε(1-α) + (1-ε)α.

An upper bound on the capacity can be obtained by assuming that the state information is available also to the decoder. In this case, the best coding scheme would ignore the cells with s_i = 1 (about a fraction β of the cells), and the rest of the cells would be coded according to a binary symmetric channel with an input cost constraint. It is optimal to assign a channel input x_i = 0 to the cells with state s_i = 1, so that those cells, which do not convey information, do not contribute to the cost. We now focus on the capacity of the binary symmetric channel with a cost constraint.
To comply with the expected input cost constraint B of the channel of Example 11, the expected cost of the input to the binary symmetric channel (BSC) must be at most ε = B/(1-β). To complete the proof of the upper bound, we show next that the capacity of the BSC with this cost constraint equals h(α∗ε) - h(α). For this channel, we have H(Y|X) = h(α)p_X(0) + h(α)p_X(1) = h(α). We are left with maximizing the entropy H(Y) over the input pmfs with p_X(1) ≤ ε. We have

p_Y(1) = p_{Y|X}(1|0)p_X(0) + p_{Y|X}(1|1)p_X(1) = α(1 - p_X(1)) + (1-α)p_X(1) = α ∗ p_X(1).

Now since p_X(1) ≤ ε ≤ 1/2 and α ∗ p_X(1) is increasing in p_X(1) below 1/2, it follows that p_Y(1) ≤ α∗ε ≤ 1/2, and therefore also that H(Y) ≤ h(α∗ε). So we have

max_{p_X(1) ≤ ε} I(X;Y) = max_{p_X(1) ≤ ε} (H(Y) - H(Y|X)) = h(α∗ε) - h(α).

This completes the proof of the upper bound. The lower bound is obtained by considering the selections V = {0,1}, x(v,0) = v, x(v,1) = 0 and

p_{V|S}(1|0) = ε,   p_{V|S}(1|1) = ε(1-α)/(ε∗α),   (15)

and calculating the rate expression directly. Notice first that the cost constraint is met, since p_X(1) = p_{X|S}(1|0)p_S(0) = p_{V|S}(1|0)p_S(0) = ε(1-β) = B.
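The maximization over p_X(1) ≤ ε can be checked by brute force on a grid (the function name is ours); the numeric maximum agrees with the closed form h(α∗ε) - h(α), attained at the boundary p_X(1) = ε:

```python
from math import log2

def h(p):
    """Binary entropy function."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_cost_capacity_numeric(alpha, eps, grid=1000):
    """Maximize I(X;Y) over input pmfs with p_X(1) <= eps for a BSC(alpha)."""
    best = 0.0
    for t in range(grid + 1):
        p1 = eps * t / grid                          # candidate p_X(1) in [0, eps]
        py1 = alpha * (1 - p1) + (1 - alpha) * p1    # p_Y(1) = alpha * p1 (binary convolution)
        best = max(best, h(py1) - h(alpha))          # I(X;Y) = H(Y) - h(alpha)
    return best
```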

We need to show that H(V|S) - H(V|Y) = (1-β)[h(α∗ε) - h(α)]. Given the distributions p_S and p_{V|S}, the conditional entropy H(V|S) is

H(V|S) = \sum_{s ∈ {0,1}} p_S(s) H(V|S=s) = p_S(0)H(V|S=0) + p_S(1)H(V|S=1) = (1-β)h(ε) + β h( ε(1-α)/(ε∗α) ).

To compute the conditional entropy H(V|Y), we first compute the probability distribution of the memory output Y as follows:

p_Y(0) = \sum_{v ∈ {0,1}} p_{Y|VS}(0|v,0) p_{V|S}(v|0) p_S(0) = (1-β)((1-α)(1-ε) + αε) = (1-β)(α ∗ (1-ε)),
p_Y(1) = 1 - p_Y(0) = (1-β)(α∗ε) + β.

The conditional distribution p_{V|Y} is given by

p_{V|Y}(1|0) = \sum_{s ∈ {0,1}} p_{VS|Y}(1,s|0) = \sum_{s} p_{Y|VS}(0|1,s) p_{VS}(1,s) / p_Y(0) = \sum_{s} p_{Y|VS}(0|1,s) p_{V|S}(1|s) p_S(s) / p_Y(0) = αε / (α ∗ (1-ε)),

p_{V|Y}(1|1) = \sum_{s ∈ {0,1}} p_{VS|Y}(1,s|1) = \sum_{s} p_{Y|VS}(1|1,s) p_{V|S}(1|s) p_S(s) / p_Y(1) = [ (1-α)ε(1-β) + (ε(1-α)/(ε∗α))β ] / [ (1-β)(α∗ε) + β ] = ε(1-α)/(ε∗α).

Therefore we have

H(V|Y) = \sum_{y ∈ {0,1}} p_Y(y) H(V|Y=y)

= (1-β)(α ∗ (1-ε)) h( αε/(α ∗ (1-ε)) ) + (β + (1-β)(α∗ε)) h( ε(1-α)/(ε∗α) ),

and then

H(V|S) - H(V|Y) = (1-β)[ h(ε) - (α ∗ (1-ε)) h( αε/(α ∗ (1-ε)) ) - (α∗ε) h( ε(1-α)/(ε∗α) ) ]
= (1-β)[ h(ε) + αε log₂( αε/(α ∗ (1-ε)) ) + (1-α)(1-ε) log₂( (1-α)(1-ε)/(α ∗ (1-ε)) ) + α(1-ε) log₂( α(1-ε)/(ε∗α) ) + ε(1-α) log₂( ε(1-α)/(ε∗α) ) ]
= (1-β)[ h(α∗ε) + h(ε) + αε log₂(αε) + (1-α)(1-ε) log₂((1-α)(1-ε)) + α(1-ε) log₂(α(1-ε)) + ε(1-α) log₂(ε(1-α)) ]
= (1-β)[ h(α∗ε) + h(ε) - h(α) - h(ε) ]
= (1-β)[ h(α∗ε) - h(α) ].

APPENDIX B

In this appendix we prove Lemma 13. We need to show that, using the functions of Theorem 12, there exists a DMC W : {0,1} → {0,1} such that

p_{S|V}(s|v) = \sum_{y ∈ {0,1}} p_{Y|V}(y|v) W(s|y).   (16)

To define such a channel W, we first claim that

p_{Y|V,S}(1|v,0) p_{V|S}(v|0) = (ε∗α) p_{V|S}(v|1).   (17)

Equation (17) follows directly from Equation (10), since

p_{Y|V,S}(1|0,0) p_{V|S}(0|0) / p_{V|S}(0|1) = α(1-ε) / ( α(1-ε)/(ε∗α) ) = ε∗α,
p_{Y|V,S}(1|1,0) p_{V|S}(1|0) / p_{V|S}(1|1) = (1-α)ε / ( (1-α)ε/(ε∗α) ) = ε∗α.

Next, we claim that p_{V,S}(v,1)/p_{V,Y}(v,1) = β/((ε∗α)(1-β) + β) for any v ∈ {0,1}, and therefore that p_{V,S}(v,1)/p_{V,Y}(v,1) ∈ [0,1]. This follows from

p_{V,S}(v,1)/p_{V,Y}(v,1) (a)= p_{V|S}(v|1)p_S(1) / [ p_{Y,V|S}(1,v|0)p_S(0) + p_{Y,V|S}(1,v|1)p_S(1) ]
(b)= p_{V|S}(v|1)β / [ p_{Y|V,S}(1|v,0)p_{V|S}(v|0)(1-β) + p_{Y|V,S}(1|v,1)p_{V|S}(v|1)β ]
(c)= p_{V|S}(v|1)β / [ (ε∗α)p_{V|S}(v|1)(1-β) + p_{V|S}(v|1)β ]
= β / ((ε∗α)(1-β) + β),

where (a) follows from the law of total probability, (b) follows from the definition of conditional probability, and (c) follows from Equations (9) and (17).
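Equation (17) and the constant-ratio claim are finite identities, so they can be spot-checked numerically with the pmfs of Theorem 12 (the parameter values below are arbitrary test values; all names are ours):

```python
# Spot-check of Appendix B's two claims for Example 11's capacity-achieving pmfs.
alpha, beta, eps = 0.1, 0.2, 0.3
conv = eps * (1 - alpha) + (1 - eps) * alpha   # eps * alpha (binary convolution)
p_s = [1 - beta, beta]
p_v_s = {(1, 0): eps, (1, 1): eps * (1 - alpha) / conv}
p_v_s[(0, 0)] = 1 - p_v_s[(1, 0)]
p_v_s[(0, 1)] = 1 - p_v_s[(1, 1)]
# p(y=1 | v, s=0): under s = 0 the input is x = v, sent over a BSC(alpha).
p_y1_v0 = {0: alpha, 1: 1 - alpha}
for v in (0, 1):
    # Equation (17): p(y=1|v,0) p(v|0) = (eps*alpha) p(v|1).
    assert abs(p_y1_v0[v] * p_v_s[(v, 0)] - conv * p_v_s[(v, 1)]) < 1e-12
    # Constant ratio p(v, s=1) / p(v, y=1) = beta / (conv(1-beta) + beta);
    # s = 1 forces y = 1, so p(y=1 | v, s=1) = 1.
    p_vs1 = p_v_s[(v, 1)] * p_s[1]
    p_vy1 = p_y1_v0[v] * p_v_s[(v, 0)] * p_s[0] + 1.0 * p_v_s[(v, 1)] * p_s[1]
    assert abs(p_vs1 / p_vy1 - beta / (conv * (1 - beta) + beta)) < 1e-12
```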

Since p_{V,S}(v,1)/p_{V,Y}(v,1) = p_{S|V}(1|v)/p_{Y|V}(1|v) is not a function of v and lies in [0,1], we can define W as follows:

W(s|y) = { 1  if (s,y) = (0,0);
           1 - p_{S|V}(1|v)/p_{Y|V}(1|v)  if (s,y) = (0,1);
           p_{S|V}(1|v)/p_{Y|V}(1|v)  if (s,y) = (1,1);
           0  if (s,y) = (1,0) }.

We show next that Equation (16) holds for the W defined above:

\sum_{y ∈ {0,1}} p_{Y|V}(y|v) W(s|y) = p_{Y|V}(0|v)W(s|0) + p_{Y|V}(1|v)W(s|1)
= [ p_{Y|V}(0|v) + p_{Y|V}(1|v)( 1 - p_{S|V}(1|v)/p_{Y|V}(1|v) ) ] 1[s=0] + p_{S|V}(1|v) 1[s=1]
= [ 1 - p_{S|V}(1|v) ] 1[s=0] + p_{S|V}(1|v) 1[s=1]
= p_{S|V}(0|v) 1[s=0] + p_{S|V}(1|v) 1[s=1]
= p_{S|V}(s|v).

So the channel W satisfies Equation (16), and thus the lemma holds.

APPENDIX C

In this appendix we prove Theorem 15. The complexity claim of Theorem 15 is explained in [17, Section III.B]. We start with the asymptotic rate of Construction 14. We want to show that lim_{n→∞} (1/n)|H^c_{V|S} ∩ L^c_{V|Y}| = 0. Since p_{S|V} is degraded with respect to p_{Y|V}, it follows from [13, Lemma 4] that L_{V|S} ⊆ L_{V|Y}, and therefore L^c_{V|Y} ⊆ L^c_{V|S}. So we have

lim_{n→∞} (1/n)|H^c_{V|S} ∩ L^c_{V|Y}| ≤ lim_{n→∞} (1/n)|H^c_{V|S} ∩ L^c_{V|S}| = 0,

where the equality follows from the definitions of the sets.

To complete the proof of the theorem, we need to show that for some frozen vector, the input cost meets the design constraint, while the decoding error probability vanishes sub-exponentially fast in the block length. We show this result using the probabilistic method. That is, we first assume that the frozen vector is drawn uniformly at random, analyze the input cost and the error probability in this case, and then show the existence of a good vector using Markov's inequality and the union bound. Denote the uniformly distributed random vector representing the frozen vector by F. We show next that the expected input cost exceeds the design constraint B only by a vanishing amount.

A. Expected Input Cost

Define b^n(X_{[n]}) ≜ \sum_{i=1}^{n} b(X_i). For a state vector s_{[n]} and the encoding rule (11) with a uniform frozen vector F, each vector u_{[n]} appears with probability

\prod_{i ∈ H^c_{V|S}} p_{U_i|U_{[i-1]},S_{[n]}}(u_i|u_{[i-1]}, s_{[n]}) · 2^{-|H_{V|S}|}.

Recall that the joint pmf p_{S_{[n]},U_{[n]}} refers to the capacity-achieving distribution of the channel. The expected cost is expressed as

E_F[b^n(X_{[n]})] = \sum_{u_{[n]},s_{[n]}} p_{S_{[n]}}(s_{[n]}) \prod_{i ∈ H^c_{V|S}} p_{U_i|U_{[i-1]},S_{[n]}}(u_i|u_{[i-1]}, s_{[n]}) · 2^{-|H_{V|S}|} · b^n(u_{[n]} G_n).

Define the joint pmf

q_{S_{[n]},U_{[n]}}(s_{[n]}, u_{[n]}) ≜ p_{S_{[n]}}(s_{[n]}) \prod_{i ∈ H^c_{V|S}} p_{U_i|U_{[i-1]},S_{[n]}}(u_i|u_{[i-1]}, s_{[n]}) · 2^{-|H_{V|S}|}.   (18)

Intuitively, q_{S_{[n]},U_{[n]}} is the distribution imposed by the encoding procedure of Construction 14. Then we have

E_F[b^n(X_{[n]})] = E_q[b^n(U_{[n]} G_n)] ≤ E_p[b^n(U_{[n]} G_n)] + max_x b(x) · ‖p_{S_{[n]},U_{[n]}} - q_{S_{[n]},U_{[n]}}‖ = nB + max_x b(x) · ‖p_{S_{[n]},U_{[n]}} - q_{S_{[n]},U_{[n]}}‖,

where ‖·‖ is the L1 distance and the inequality follows from the triangle inequality. We will now prove that E_F[b^n(X_{[n]})] ≤ nB + 2^{-n^{1/2-δ}}, by showing that ‖p_{U_{[n]},S_{[n]}} - q_{U_{[n]},S_{[n]}}‖ ≤ 2^{-n^{1/2-δ}}. To do this, we will prove a slightly stronger relation that will be used also for the proof of the probability of decoding error. We first define the joint pmf

q_{S_{[n]},U_{[n]},Y_{[n]}}(s_{[n]}, u_{[n]}, y_{[n]}) ≜ q_{U_{[n]},S_{[n]}}(u_{[n]}, s_{[n]}) p_{Y_{[n]}|U_{[n]},S_{[n]}}(y_{[n]}|u_{[n]}, s_{[n]}).

Then we notice that

‖p_{U_{[n]},S_{[n]}} - q_{U_{[n]},S_{[n]}}‖ = \sum_{u_{[n]},s_{[n]}} |p(u_{[n]}, s_{[n]}) - q(u_{[n]}, s_{[n]})| = \sum_{u_{[n]},s_{[n]}} | \sum_{y_{[n]}} [p(s_{[n]}, u_{[n]}, y_{[n]}) - q(s_{[n]}, u_{[n]}, y_{[n]})] | ≤ \sum_{s_{[n]},u_{[n]},y_{[n]}} |p(s_{[n]}, u_{[n]}, y_{[n]}) - q(s_{[n]}, u_{[n]}, y_{[n]})| = ‖p_{S_{[n]},U_{[n]},Y_{[n]}} - q_{S_{[n]},U_{[n]},Y_{[n]}}‖,

where the inequality follows from the triangle inequality. The proof of the expected cost is completed with the following lemma, which will also be used for analyzing the probability of decoding error.

Lemma 17. ‖p_{S_{[n]},U_{[n]},Y_{[n]}} - q_{S_{[n]},U_{[n]},Y_{[n]}}‖ ≤ 2^{-n^{1/2-δ}}.   (19)

Proof: Let D(·‖·) denote the relative entropy. Then

‖p_{S_{[n]},U_{[n]},Y_{[n]}} - q_{S_{[n]},U_{[n]},Y_{[n]}}‖ = \sum_{s_{[n]},u_{[n]},y_{[n]}} |p(s_{[n]}, u_{[n]}, y_{[n]}) - q(s_{[n]}, u_{[n]}, y_{[n]})|

(a) = \sum_{s_{[n]},u_{[n]},y_{[n]}} |p(u_{[n]}|s_{[n]}) - q(u_{[n]}|s_{[n]})| p(s_{[n]}) p(y_{[n]}|u_{[n]}, s_{[n]})
(b) = \sum_{s_{[n]},u_{[n]},y_{[n]}} | \prod_{i=1}^{n} p(u_i|u_{[i-1]}, s_{[n]}) - \prod_{i=1}^{n} q(u_i|u_{[i-1]}, s_{[n]}) | p(s_{[n]}) p(y_{[n]}|u_{[n]}, s_{[n]})
(c) = \sum_{s_{[n]},u_{[n]},y_{[n]}} | \sum_{i=1}^{n} \prod_{j=1}^{i-1} p(u_j|u_{[j-1]}, s_{[n]}) [p(u_i|u_{[i-1]}, s_{[n]}) - q(u_i|u_{[i-1]}, s_{[n]})] \prod_{j=i+1}^{n} q(u_j|u_{[j-1]}, s_{[n]}) | p(s_{[n]}) p(y_{[n]}|u_{[n]}, s_{[n]})
(d) ≤ \sum_{i ∈ H_{V|S}} \sum_{s_{[n]},u_{[n]},y_{[n]}} p(s_{[n]}) \prod_{j=1}^{i-1} p(u_j|u_{[j-1]}, s_{[n]}) |p(u_i|u_{[i-1]}, s_{[n]}) - q(u_i|u_{[i-1]}, s_{[n]})| \prod_{j=i+1}^{n} q(u_j|u_{[j-1]}, s_{[n]}) p(y_{[n]}|u_{[n]}, s_{[n]})
= \sum_{i ∈ H_{V|S}} \sum_{s_{[n]},u_{[i]}} p(u_{[i-1]}, s_{[n]}) |p(u_i|u_{[i-1]}, s_{[n]}) - q(u_i|u_{[i-1]}, s_{[n]})|
(e) = \sum_{i ∈ H_{V|S}} \sum_{s_{[n]},u_{[i-1]}} p(u_{[i-1]}, s_{[n]}) ‖p_{U_i|U_{[i-1]}=u_{[i-1]},S_{[n]}=s_{[n]}} - q_{U_i|U_{[i-1]}=u_{[i-1]},S_{[n]}=s_{[n]}}‖
(f) ≤ \sum_{i ∈ H_{V|S}} \sum_{s_{[n]},u_{[i-1]}} p(u_{[i-1]}, s_{[n]}) \sqrt{ 2 ln 2 · D( p_{U_i|U_{[i-1]}=u_{[i-1]},S_{[n]}=s_{[n]}} ‖ q_{U_i|U_{[i-1]}=u_{[i-1]},S_{[n]}=s_{[n]}} ) }
(g) ≤ \sum_{i ∈ H_{V|S}} \sqrt{ (2 ln 2) \sum_{s_{[n]},u_{[i-1]}} p(u_{[i-1]}, s_{[n]}) D( p_{U_i|U_{[i-1]}=u_{[i-1]},S_{[n]}=s_{[n]}} ‖ q_{U_i|U_{[i-1]}=u_{[i-1]},S_{[n]}=s_{[n]}} ) } = \sum_{i ∈ H_{V|S}} \sqrt{ (2 ln 2) D( p_{U_i} ‖ q_{U_i} | U_{[i-1]}, S_{[n]} ) }
(h) = \sum_{i ∈ H_{V|S}} \sqrt{ (2 ln 2) [1 - H(U_i|U_{[i-1]}, S_{[n]})] },

where (a) follows from the facts that p(s_{[n]}) = q(s_{[n]}) and p(y_{[n]}|u_{[n]}, s_{[n]}) = q(y_{[n]}|u_{[n]}, s_{[n]}), (b) follows from the chain rule, (c) follows from the telescoping expansion

B_{[n]} - A_{[n]} = \sum_{i=1}^{n} A_{[i-1]} B_{[i:n]} - \sum_{i=1}^{n} A_{[i]} B_{[i+1:n]} = \sum_{i=1}^{n} (B_i - A_i) A_{[i-1]} B_{[i+1:n]},

where A_{[j:k]} and B_{[j:k]} denote the products \prod_{i=j}^{k} A_i and \prod_{i=j}^{k} B_i, respectively, (d) follows from the triangle inequality and the fact that p(u_i|u_{[i-1]}, s_{[n]}) = q(u_i|u_{[i-1]}, s_{[n]}) for all i ∈ H^c_{V|S} (according to Equation (18)), (e) follows from the chain rule again, (f) follows from Pinsker's inequality (see, e.g., [6, Lemma 11.6.1]),
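The telescoping expansion used in step (c) is an algebraic identity for any two sequences of factors; a quick numeric check (names are ours):

```python
import random
from functools import reduce

def prod(xs):
    """Product of a sequence of numbers (empty product is 1)."""
    return reduce(lambda a, b: a * b, xs, 1.0)

# Telescoping expansion of step (c):
# prod(B) - prod(A) = sum_i (B_i - A_i) * prod(A[:i]) * prod(B[i+1:])
random.seed(0)
n = 6
A = [random.random() for _ in range(n)]
B = [random.random() for _ in range(n)]
lhs = prod(B) - prod(A)
rhs = sum((B[i] - A[i]) * prod(A[:i]) * prod(B[i + 1:]) for i in range(n))
assert abs(lhs - rhs) < 1e-12
```

The identity holds because the i-th summand equals A_{[i-1]}B_{[i:n]} - A_{[i]}B_{[i+1:n]}, and these differences telescope.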

(g) follows from Jensen's inequality, and (h) follows from the fact that q(u_i|u_{[i-1]}, s_{[n]}) = 1/2 for i ∈ H_{V|S} and from [13, Lemma 10]. Now if i ∈ H_{V|S}, we have

1 - H(U_i|U_{[i-1]}, S_{[n]}) ≤ 1 - [Z(U_i|U_{[i-1]}, S_{[n]})]² ≤ 2 · 2^{-n^{1/2-δ}},   (20)

where the first inequality follows from Proposition 3, and the second inequality follows from the fact that i is in H_{V|S}. This completes the proof of the lemma.

B. Probability of Decoding Error

Let E_i be the set of pairs of vectors (u_{[n]}, y_{[n]}) such that û_{[n]} is a result of decoding y_{[n]}, and û_{[i]} satisfies both û_{[i-1]} = u_{[i-1]} and û_i ≠ u_i. The block decoding error event is given by E ≜ ∪_{i ∈ L_{V|Y}} E_i. Under the decoding rule (12) with an arbitrary tie-breaking rule, every pair (u_{[n]}, y_{[n]}) ∈ E_i satisfies

p_{U_i|U_{[i-1]},Y_{[n]}}(u_i|u_{[i-1]}, y_{[n]}) ≤ p_{U_i|U_{[i-1]},Y_{[n]}}(u_i ⊕ 1|u_{[i-1]}, y_{[n]}).   (21)

Consider the block decoding error probability p_e(F) for the random frozen vector F. For a state vector s_{[n]} and the encoding rule (11), each vector u_{[n]} appears with probability \prod_{i ∈ H^c_{V|S}} p_{U_i|U_{[i-1]},S_{[n]}}(u_i|u_{[i-1]}, s_{[n]}) · 2^{-|H_{V|S}|}. By the definition of conditional probability and the law of total probability, the expected probability of error is given by

E_F[p_e] = \sum_{u_{[n]},s_{[n]},y_{[n]}} p_{S_{[n]}}(s_{[n]}) \prod_{i ∈ H^c_{V|S}} p_{U_i|U_{[i-1]},S_{[n]}}(u_i|u_{[i-1]}, s_{[n]}) · 2^{-|H_{V|S}|} · p_{Y_{[n]}|U_{[n]},S_{[n]}}(y_{[n]}|u_{[n]}, s_{[n]}) 1[(u_{[n]}, y_{[n]}) ∈ E].

Then we have

E_F[p_e] = q_{U_{[n]},Y_{[n]}}(E) ≤ ‖q_{U_{[n]},Y_{[n]}} - p_{U_{[n]},Y_{[n]}}‖ + p_{U_{[n]},Y_{[n]}}(E) ≤ ‖q_{U_{[n]},Y_{[n]}} - p_{U_{[n]},Y_{[n]}}‖ + \sum_{i ∈ L_{V|Y}} p_{U_{[n]},Y_{[n]}}(E_i),

where the first inequality follows from the triangle inequality. Each term in the summation is bounded by

p_{U_{[n]},Y_{[n]}}(E_i) = \sum_{u_{[i]},y_{[n]}} p(u_{[i]}, y_{[n]}) 1[ p(u_i|u_{[i-1]}, y_{[n]}) ≤ p(u_i ⊕ 1|u_{[i-1]}, y_{[n]}) ]
≤ \sum_{u_{[i]},y_{[n]}} p(u_{[i-1]}, y_{[n]}) p(u_i|u_{[i-1]}, y_{[n]}) \sqrt{ p(u_i ⊕ 1|u_{[i-1]}, y_{[n]}) / p(u_i|u_{[i-1]}, y_{[n]}) }
= Z(U_i|U_{[i-1]}, Y_{[n]}) ≤ 2^{-n^{1/2-δ}},

where the last inequality follows from the fact that i belongs to the set L_{V|Y}.

To prove that E_F[p_e] ≤ 2^{-n^{1/2-δ'}} for some δ' > δ, we are left with showing that ‖p_{U_{[n]},Y_{[n]}} - q_{U_{[n]},Y_{[n]}}‖ ≤ 2^{-n^{1/2-δ}}. Notice that

‖p_{U_{[n]},Y_{[n]}} - q_{U_{[n]},Y_{[n]}}‖ = \sum_{u_{[n]},y_{[n]}} |p(u_{[n]}, y_{[n]}) - q(u_{[n]}, y_{[n]})| = \sum_{u_{[n]},y_{[n]}} | \sum_{s_{[n]}} [p(s_{[n]}, u_{[n]}, y_{[n]}) - q(s_{[n]}, u_{[n]}, y_{[n]})] | ≤ \sum_{s_{[n]},u_{[n]},y_{[n]}} |p(s_{[n]}, u_{[n]}, y_{[n]}) - q(s_{[n]}, u_{[n]}, y_{[n]})| = ‖p_{S_{[n]},U_{[n]},Y_{[n]}} - q_{S_{[n]},U_{[n]},Y_{[n]}}‖,

where the inequality follows from the triangle inequality. Lemma 17 now completes the proof that E_F[p_e] ≤ 2^{-n^{1/2-δ'}}.

C. Existence of a Good Frozen Vector

We showed that E_F[b^n(X_{[n]})] ≤ nB + 2^{-n^{1/2-δ}}. In this subsection we find a frozen vector which satisfies a slightly higher expected cost, namely E[b^n(X_{[n]})] ≤ nB + 1/n². Remember that B is a constant. We take a uniformly distributed frozen vector, and bound the probability that the expected cost exceeds nB + 1/n², using Markov's inequality. The following inequalities hold for large enough n, and we are not concerned here with the exact values required:

P( E[b^n(X_{[n]})] ≥ nB + 1/n² ) ≤ E_F[b^n(X_{[n]})] / (nB + 1/n²) ≤ (nB + 2^{-n^{1/2-δ}}) / (nB + 1/n²) ≤ (nB + n^{-3}) / (nB + n^{-2}) = 1 - (n^{-2} - n^{-3}) / (nB + n^{-2}) ≤ 1 - n^{-2.1} / (nB + n^{-2}) ≤ 1 - 1/(Bn^{3.1} + n^{0.1}) ≤ 1 - 1/(Bn^{3.2}) ≤ 1 - 1/n⁴.

We now apply Markov's inequality to the probability of decoding error:

P( p_e ≥ n⁵ · 2^{-n^{1/2-δ'}} ) ≤ E_F[p_e] / (n⁵ · 2^{-n^{1/2-δ'}}) ≤ 2^{-n^{1/2-δ'}} / (n⁵ · 2^{-n^{1/2-δ'}}) = 1/n⁵.

By the union bound, the probability that either E[b^n(X_{[n]})] ≥ nB + 1/n² or p_e ≥ n⁵ · 2^{-n^{1/2-δ'}} is at most 1 - 1/n⁴ + 1/n⁵ ≤ 1 - 1/n⁵. This implies that the probability that both E[b^n(X_{[n]})] ≤ nB + 1/n² and p_e ≤ n⁵ · 2^{-n^{1/2-δ'}} is at least 1/n⁵. So there exists at least one frozen vector for which the desired properties for both the expected

cost and the probability of error hold. Furthermore, such a vector can be found by repeatedly selecting a frozen vector uniformly at random, until the required properties hold. The properties can be verified efficiently with close approximations by the upgradation and degradation method proposed in [31]. The expected number of trials until a good vector is found is polynomial in the block length (at most n⁵). This completes the proof of Theorem 15.

REFERENCES

[1] E. Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051-3073, July 2009.
[2] ——, "Source polarization," in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), June 2010, pp. 899-903.
[3] R. Barron, B. Chen, and G. W. Wornell, "The duality between information embedding and source coding with side information and some applications," IEEE Trans. Inf. Theory, vol. 49, no. 5, pp. 1159-1180, May 2003.
[4] D. Burshtein and A. Strugatski, "Polar write once memory codes," IEEE Trans. Inf. Theory, vol. 59, no. 8, pp. 5088-5101, Aug. 2013.
[5] Y. Cassuto, M. Schwartz, V. Bohossian, and J. Bruck, "Codes for asymmetric limited-magnitude errors with application to multilevel flash memories," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1582-1595, April 2010.
[6] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006.
[7] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2012.
[8] E. En Gad, Y. Li, J. Kliewer, M. Langberg, A. Jiang, and J. Bruck, "Polar coding for noisy write-once memories," in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), June 2014, pp. 1638-1642.
[9] E. En Gad, E. Yaakobi, A. Jiang, and J. Bruck, "Rank-modulation rewrite coding for flash memories," available at http://arxiv.org/abs/1312.0972, Dec. 2013.
[10] A. Gabizon and R. Shaltiel, "Invertible zero-error dispersers and defective memory with stuck-at errors," in APPROX-RANDOM, 2012, pp. 553-564.
[11] R. G.
Gallager, Information Theory and Reliable Communication. Wiley, 1968.
[12] S. Gel'fand and M. Pinsker, "Coding for channel with random parameters," Problems of Control and Information Theory, vol. 9, no. 1, pp. 19-31, 1980.
[13] N. Goela, E. Abbe, and M. Gastpar, "Polar codes for broadcast channels," available at http://arxiv.org/abs/1301.6150, Jan. 2013.
[14] S. Golomb, "The limiting behavior of the z-channel (corresp.)," IEEE Trans. Inf. Theory, vol. 26, no. 3, p. 372, May 1980.
[15] J. Gordon, "Quantum effects in communications systems," Proceedings of the IRE, vol. 50, no. 9, pp. 1898-1908, Sept. 1962.
[16] C. Heegard, "On the capacity of permanent memory," IEEE Trans. Inf. Theory, vol. 31, no. 1, pp. 34-42, January 1985.
[17] J. Honda and H. Yamamoto, "Polar coding without alphabet extension for asymmetric models," IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7829-7838, 2013.
[18] S. Korada and R. Urbanke, "Polar codes are optimal for lossy source coding," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1751-1768, April 2010.
[19] A. Kusnetsov and B. S. Tsybakov, "Coding in a memory with defective cells," translated from Problemy Peredachi Informatsii, vol. 10, no. 2, pp. 52-60, 1974.
[20] X. Ma, "Write-once-memory codes by source polarization," available at http://arxiv.org/abs/1405.6262, May 2014.
[21] K. Marton, "A coding theorem for the discrete memoryless broadcast channel," IEEE Trans. Inf. Theory, vol. 25, no. 3, pp. 306-311, May 1979.
[22] M. Mondelli, S. H. Hassani, and R. Urbanke, "How to achieve the capacity of asymmetric channels," available at http://arxiv.org/abs/1406.7373, June 2014.
[23] M. Mondelli, S. H. Hassani, I. Sason, and R. Urbanke, "Achieving Marton's region for broadcast channels using polar codes," available at http://arxiv.org/abs/1401.6060, January 2014.
[24] R. Mori and T. Tanaka, "Channel polarization on q-ary discrete memoryless channels by arbitrary kernels," in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), June 2010, pp. 894-898.
[25] W. Park and A. Barg, "Polar codes for q-ary channels, q = 2^r," IEEE Trans. Inf.
Theory, vol. 59, no. 2, pp. 955-969, Feb. 2013.
[26] A. Sahebi and S. Pradhan, "Multilevel polarization of polar codes over arbitrary discrete memoryless channels," in Proc. 49th Annual Allerton Conference on Communication, Control, and Computing, Sept. 2011, pp. 1718-1725.
[27] E. Sasoglu, "Polar codes for discrete alphabets," in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), July 2012, pp. 2137-2141.
[28] E. Sasoglu, I. Telatar, and E. Arikan, "Polarization for arbitrary discrete memoryless channels," in Proc. IEEE Information Theory Workshop (ITW), Oct. 2009, pp. 144-148.
[29] E. Sasoglu, "Polarization and polar codes," Foundations and Trends in Communications and Information Theory, vol. 8, no. 4, pp. 259-381, 2012. [Online]. Available: http://dx.doi.org/10.1561/0100000041
[30] D. Sutter, J. Renes, F. Dupuis, and R. Renner, "Achieving the capacity of any DMC using only polar codes," in Proc. IEEE Information Theory Workshop (ITW), Sept. 2012, pp. 114-118.
[31] I. Tal and A. Vardy, "How to construct polar codes," IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6562-6582, Oct. 2013.
[32] R. Zamir, S. Shamai, and U. Erez, "Nested linear/lattice codes for structured multiterminal binning," IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1250-1276, June 2002.