Resilient Dynamic Programming Irene Finocchi, Saverio Caminiti, and Emanuele Fusco Dipartimento di Informatica, Sapienza Università di Roma via Salaria, 113-00198 Rome, Italy. {finocchi, caminiti, fusco}@di.uniroma1.it Kickoff AlgoDEEP Bertinoro, Italia. April 16-17 2010 (task C.1.1)
Outline 1 Introduction 2 A resilient framework for dynamic programming 3 Testing and experimental validation
Memories and faults Why should we care about memory faults in algorithm design? Memory faults happen: a large cluster of computers with a few gigabytes per node can experience one bit error every few minutes [Sah06]. Memory faults are harmful: undetected memory faults cause data corruption to spread; (potentially safety critical, e.g., avionics). Hardware solutions may be inadequate: fault-tolerant memory chips does not guarantee complete fault coverage; (expensive system halt upon detection of uncorrectable errors interruptions of service) [JNW08].
From liars to data corruption Algorithmic research related to memory errors has focused mainly on sorting and searching problems: late 70 s: Rényi [Rén94] and Ulam [Ula77]: twenty questions game against a liar, handling noise in binary search. Yao and Yao [YY85], and then [AU91, LM99, LMP97]: destructive faults in fault-tolerant sorting networks, comparison gates can destroy one of the input values.... [FI04] sorting in the faulty RAM model.
Faulty memories: an adversarial model Memory in a faulty-ram of word-size w is divided in three classes: a large unreliable memory: an adaptive adversary of unlimited computational power can modify up to δ memory words; O(1) safe memory words: the adversary can read but not modify this memory; O(1) private memory words: the adversary cannot even read this memory.
Local dependency dynamic programming edit distance Let e i,j be the edit distance between the prefix up to the i-th symbol of the input string X and the prefix up to the j-th symbol of the input string Y. e i,j := { ei 1,j 1 if i, j > 0 and x j = y i 1 + min {e i 1,j, e i,j 1, e i 1,j 1 } if i, j > 0 and x j y i (e 0,j = j, e i,0 = i.)
Correctness requirements Correctness of sorting and searching required only on uncorrupted values. In our setting, such a relaxed definition of correctness does not seem to be natural.
Correctness requirements Correctness of sorting and searching required only on uncorrupted values. In our setting, such a relaxed definition of correctness does not seem to be natural. We seek algorithms that correctly compute the edit distance between the two input strings, in spite of memory faults.
Tools Majority. Table decomposition. Fingerprinting.
Majority A variable can be made resilient by making 2δ + 1 copies. As at most δ of them can be altered by the adversary, the majority value is the correct value. The majority value can be read in time O(δ) and space O(1) [BM91].
Table decomposition The DP table is split in blocks of size δ δ. The boundaries of each block are written reliably in the faulty memory. δ 2 values result in roughly 5δ 2 memory words.
Fingerprinting A fingerprint for a column is computed as: ϕ k = v 1 v 2... v δ mod p where p is a prime number uniformly chosen at randomly in interval [n c 1, n c ] (where c is an appropriate constant).
Fingerprinting A fingerprint for a column is computed as: ϕ k = v 1 v 2... v δ mod p where p is a prime number uniformly chosen at randomly in interval [n c 1, n c ] (where c is an appropriate constant). Using logical shifts and Horner s rule, each fingerprint can be incrementally computed while generating the values v h : for h = 1 to δ do ϕ = ((ϕ 2 w ) + v h ) mod p end for
Block computation B i 1,j 1 B i 1,j B i,j 1 B i,j The first column of a block is computed reading reliably all values it depends from. ϕ 1
Block computation B i 1,j 1 B i 1,j B i,j 1 B i,j While computing the first column, fingerprint ϕ 1 is also computed. ϕ 1
Block computation B i 1,j 1 B i 1,j B i,j 1 B i,j While computing the first column, fingerprint ϕ 1 is also computed. ϕ 1
Block computation B i 1,j 1 B i 1,j B i,j 1 Bi,j While computing column k + 1, we produce two fingerprints, ϕ k+1 and ϕ k. ϕ k ϕ k ϕk+1
Block computation B i 1,j 1 B i 1,j B i,j 1 Bi,j Fingerprint ϕ k is then compared with ϕ k (i.e., the fingerprint produced while computing column k). ϕ k ϕ k ϕk+1
Block computation B i 1,j 1 B i 1,j B i,j 1 Bi,j If ϕ k ϕ k, the block is recomputed from scratch. ϕ k ϕ k ϕk+1
As a result...... we have: Theorem The edit distance between two strings of length n and m, with n m, can be correctly computed, with high probability, in: O(nm + αδ 2 ) time; O(nm) space, when δ is polynomial in n.
Generalizing Theorem A d-dimensional local dependency dynamic programming table M of size n d can be correctly computed, with high probability, in: O(n d + αδ d ) time; O(n d + nδ) space, when the actual number α δ of memory faults occurring during the computation is polynomial in n. (Edit distance, longest common subsequence, sequence alignment,...)
faultylib We are developing a library to test program behavior in presence of memory faults. Plugging in the library should be very easy: existing C/C++ code should require minimal changes to be tested with our library. Implementation of different (and meaningful) adversaries should be easy....
faultylib: usage FaultyUInt M[n+1u][m+1u]; // An n+1 X m+1 matrix of // faulty unsigned int... for (unsigned int i = 1; i <= n; i++) { for (unsigned int j = 1; j <= m; j++) { M[i][j] = min(1 + min(m[i-1][j], M[i][j-1]), M[i-1][j-1] + ((x[i-1]==y[j-1])? 0 : 1)); } }...
faultylib: faulty types implementation template <typename T> class Faulty : public FaultyBase {... private: T _val; T read() const { FaultyMM::getInstance()->faultBeforeRead(&_val, sizeof(t), context); return _val; } void write(t v) { _val = v; FaultyMM::getInstance()->faultAfterWrite(&_val, sizeof(t), context); } }... typedef Faulty<unsigned int> FaultyUInt;
faultylib: overriding operators... //Assignment operator template <typename Targ> Faulty & operator=(const Targ & v) { write((t)v); return *this; }... //OR template <typename Targ> bool operator (const Targ & v) const { return (read() (T)v); } }...
faultylib: adversaries implementation class REDAdversary : public Adversary {... virtual void faultafterwrite(void * location, size_t s, Context * cnt) { if ((cnt!= NULL) && (cnt->tag == EDMATRIX_TAG)) { MatrixContext * m = (MatrixContext *)cnt; unsigned int * i = (unsigned int *)location; if (m->getindex(0) == 3) if (m->getindex(1) == 7) *i = *i +3; } }...
Thanks! Thank you for your attention!
References [AU91] [BM91] [FI04] S. Assaf and E. Upfal. Fault tolerant sorting networks. SIAM J. Discrete Math., 4(4):472 480, 1991. R. S. Boyer and J. S. Moore. Mjrty: A fast majority vote algorithm. In Automated Reasoning: Essays in Honor of Woody Bledsoe, pages 105 118, 1991. Irene Finocchi and Giuseppe F. Italiano. Sorting and searching in the presence of memory faults (without redundancy). In László Babai, editor, STOC, pages 101 110. ACM, 2004. [JNW08] B. L. Jacob, S. W. Ng, and D. T. Wang. Memory Systems: Cache, DRAM, Disk. [LM99] Morgan Kaufmann, 2008. F. T. Leighton and Y. Ma. Tight bounds on the size of fault-tolerant merging and sorting networks with destructive faults. SIAM J. Comput., 29(1):258 273, 1999. [LMP97] F. T. Leighton, Y. Ma, and C. G. Plaxton. Breaking the θ(n log 2 n) barrier for sorting with faults. J. Comput. Syst. Sci., 54(2):265 304, 1997. [Rén94] [Sah06] [Ula77] [YY85] A. Rény. A diary on information theory. J. Wiley and Sons, 1994. Original publication: Napló az információelméletröl, Gondolat, Budapest, 1976. G. K. Saha. Software based fault tolerance: a survey. Ubiquity, 7(25), 2006. S. M. Ulam. Adventures of a mathematician. Charles Scribner s Sons, New York, 1977. A. C. Yao and F. F. Yao. On fault-tolerant networks for sorting. SIAM J. Comput., 14(1):120 128, 1985.