Resilient Dynamic Programming




Irene Finocchi, Saverio Caminiti, and Emanuele Fusco
Dipartimento di Informatica, Sapienza Università di Roma, via Salaria 113, 00198 Rome, Italy
{finocchi, caminiti, fusco}@di.uniroma1.it
Kickoff AlgoDEEP, Bertinoro, Italy, April 16-17, 2010 (task C.1.1)

Outline
1. Introduction
2. A resilient framework for dynamic programming
3. Testing and experimental validation

Memories and faults
Why should we care about memory faults in algorithm design?
- Memory faults happen: a large cluster of computers with a few gigabytes of memory per node can experience one bit error every few minutes [Sah06].
- Memory faults are harmful: undetected memory faults let data corruption spread, which is potentially safety critical (e.g., in avionics).
- Hardware solutions may be inadequate: fault-tolerant memory chips do not guarantee complete fault coverage, and halting the system upon detection of an uncorrectable error is expensive because it interrupts service [JNW08].

From liars to data corruption
Algorithmic research related to memory errors has focused mainly on sorting and searching problems:
- Late 1970s: Rényi [Rén94] and Ulam [Ula77]: the twenty questions game against a liar, i.e., handling noise in binary search.
- Yao and Yao [YY85], and then [AU91, LM99, LMP97]: destructive faults in fault-tolerant sorting networks, where comparison gates can destroy one of the input values.
- ...
- [FI04]: sorting in the faulty RAM model.

Faulty memories: an adversarial model
Memory in a faulty RAM of word size w is divided into three classes:
- a large unreliable memory: an adaptive adversary of unlimited computational power can modify up to δ memory words;
- O(1) safe memory words: the adversary can read but not modify this memory;
- O(1) private memory words: the adversary cannot even read this memory.

Local dependency dynamic programming: edit distance
Let $e_{i,j}$ be the edit distance between the prefix of the input string X up to its i-th symbol and the prefix of the input string Y up to its j-th symbol. Then

$$
e_{i,j} :=
\begin{cases}
e_{i-1,j-1} & \text{if } i, j > 0 \text{ and } x_i = y_j \\
1 + \min\{e_{i-1,j},\ e_{i,j-1},\ e_{i-1,j-1}\} & \text{if } i, j > 0 \text{ and } x_i \neq y_j
\end{cases}
$$

with base cases $e_{0,j} = j$ and $e_{i,0} = i$.
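The recurrence can be made concrete with a plain, non-resilient baseline. The sketch below is only an illustration: the function name `edit_distance` is an assumption, and the code ignores memory faults entirely. C++ strings are 0-indexed, so the symbol $x_i$ of the recurrence is `x[i - 1]`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Non-resilient baseline (hypothetical helper, not part of the slides):
// fills the full (n+1) x (m+1) table using the recurrence for e_{i,j}.
unsigned edit_distance(const std::string& x, const std::string& y) {
    const std::size_t n = x.size(), m = y.size();
    std::vector<std::vector<unsigned>> e(n + 1, std::vector<unsigned>(m + 1));

    // Base cases: e_{i,0} = i and e_{0,j} = j.
    for (std::size_t i = 0; i <= n; ++i) e[i][0] = static_cast<unsigned>(i);
    for (std::size_t j = 0; j <= m; ++j) e[0][j] = static_cast<unsigned>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (x[i - 1] == y[j - 1]) {
                e[i][j] = e[i - 1][j - 1];  // matching symbols: no extra cost
            } else {
                e[i][j] = 1 + std::min({e[i - 1][j], e[i][j - 1], e[i - 1][j - 1]});
            }
        }
    }
    return e[n][m];
}
```

Each cell depends only on its three neighbours above, to the left, and diagonally above-left, which is the "local dependency" property the resilient framework relies on.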


Correctness requirements
For sorting and searching, correctness is required only with respect to uncorrupted values. In our setting, such a relaxed definition of correctness does not seem natural.
We seek algorithms that correctly compute the edit distance between the two input strings, in spite of memory faults.

Tools
- Majority.
- Table decomposition.
- Fingerprinting.

Majority
A variable can be made resilient by storing 2δ + 1 copies of it. As at most δ of them can be altered by the adversary, the majority value is the correct value. The majority value can be read in time O(δ) and space O(1) [BM91].
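A minimal sketch of such a resilient variable, assuming the 2δ + 1 copies live in a `std::vector` and that the candidate and counter of the Boyer-Moore majority vote [BM91] are kept in safe memory (here, local variables). The names `resilient_write` and `resilient_read` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Store 2*delta + 1 copies of v (the copies model unreliable memory).
void resilient_write(std::vector<std::uint32_t>& copies, std::uint32_t v, std::size_t delta) {
    copies.assign(2 * delta + 1, v);
}

// Boyer-Moore majority vote: one pass over the copies, O(1) extra words.
// Because at most delta of the 2*delta + 1 copies are corrupted, a strict
// majority exists, so the surviving candidate is the value that was written.
std::uint32_t resilient_read(const std::vector<std::uint32_t>& copies) {
    assert(!copies.empty());
    std::uint32_t candidate = copies[0];
    std::size_t count = 0;
    for (std::uint32_t c : copies) {
        if (count == 0) {
            candidate = c;
            count = 1;
        } else if (c == candidate) {
            ++count;
        } else {
            --count;
        }
    }
    return candidate;
}
```

With δ = 2, for example, five copies are stored and the read still returns the written value after any two of them have been overwritten.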

Table decomposition
The DP table is split into blocks of size $\delta \times \delta$. The boundaries of each block are written reliably in the faulty memory. Overall, the $\delta^2$ values of a block result in roughly $5\delta^2$ memory words.
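One accounting that is consistent with the $5\delta^2$ figure is sketched below. It is an assumption, not a derivation stated on the slides: it supposes that the boundary of a block consists of its last row and last column ($2\delta - 1$ cells), and that "written reliably" means storing $2\delta + 1$ copies of each boundary cell, as in the majority technique.

```latex
\begin{align*}
\underbrace{\delta^2}_{\text{block values, one word each}}
\;+\;
\underbrace{(2\delta - 1)(2\delta + 1)}_{\text{boundary cells} \,\times\, \text{copies}}
\;=\; \delta^2 + 4\delta^2 - 1
\;\approx\; 5\delta^2 .
\end{align*}
```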


Fingerprinting
A fingerprint for a column is computed as

$$\varphi_k = v_1 \circ v_2 \circ \cdots \circ v_\delta \bmod p$$

where $\circ$ denotes concatenation of the $w$-bit values $v_1, \dots, v_\delta$ of column $k$, and $p$ is a prime number chosen uniformly at random in the interval $[n^{c-1}, n^c]$ (where $c$ is an appropriate constant).
Using logical shifts and Horner's rule, each fingerprint can be computed incrementally while generating the values $v_h$:

    for h = 1 to δ do
        φ = ((φ · 2^w) + v_h) mod p
    end for
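The incremental computation can be sketched as follows, assuming a word size of w = 32 bits and a modulus p below $2^{31}$ so that the intermediate value `(phi << 32) + v` fits in 64 bits. The function name `fingerprint` is hypothetical, and choosing the random prime p is left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Horner-style fingerprint of a column of 32-bit values modulo p.
// Assumption for this sketch: w = 32 and p < 2^31, which keeps
// (phi << 32) + v below 2^64 because phi < p at every step.
std::uint64_t fingerprint(const std::vector<std::uint32_t>& column, std::uint64_t p) {
    assert(p > 1 && p < (1ULL << 31));
    std::uint64_t phi = 0;
    for (std::uint32_t v : column) {
        phi = ((phi << 32) + v) % p;  // phi = (phi * 2^w + v_h) mod p
    }
    return phi;
}
```

Only the running value `phi` has to be kept between iterations, so the fingerprint fits in a constant number of words.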

Block computation
[Figure: block $B_{i,j}$ with its neighbouring blocks $B_{i-1,j-1}$, $B_{i-1,j}$, and $B_{i,j-1}$; columns are annotated with their fingerprints.]
- The first column of a block is computed by reliably reading all the values it depends on. While computing the first column, its fingerprint $\varphi_1$ is also computed.
- While computing column $k+1$, we produce two fingerprints: $\varphi_{k+1}$, for the values being written, and $\bar{\varphi}_k$, for the values read from column $k$.
- Fingerprint $\bar{\varphi}_k$ is then compared with $\varphi_k$ (i.e., the fingerprint produced while computing column $k$).
- If $\bar{\varphi}_k \neq \varphi_k$, the block is recomputed from scratch.
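The column-by-column check can be sketched as below. This is a simplified illustration under several assumptions, not the authors' implementation: w = 32 and p < $2^{31}$; the boundary values `top` and `left` are taken as already read reliably; the running values `up`, `diag` and the fingerprints stand for safe memory; the last column is treated as a reliably written block boundary and is never corrupted. All names (`compute_block`, `FaultHook`, `horner`) are hypothetical, and the `corrupt` callback plays the adversary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Column = std::vector<std::uint32_t>;
// Hypothetical adversary hook: may alter column k right after it is written.
using FaultHook = std::function<void(Column&, std::size_t k, int attempt)>;

// One Horner step, assuming w = 32 and p < 2^31.
static std::uint64_t horner(std::uint64_t phi, std::uint32_t v, std::uint64_t p) {
    return ((phi << 32) + v) % p;
}

// Computes the edit-distance block for row symbols x and column symbols y.
// top[0] is the corner value, top[k + 1] the value above column k,
// left[h] the value to the left of row h.
std::vector<Column> compute_block(const std::string& x, const std::string& y,
                                  const Column& top, const Column& left,
                                  std::uint64_t p, const FaultHook& corrupt,
                                  int& attempts) {
    const std::size_t rows = x.size(), cols = y.size();
    assert(top.size() == cols + 1 && left.size() == rows && p < (1ULL << 31));
    for (attempts = 1;; ++attempts) {
        std::vector<Column> block(cols, Column(rows));
        std::uint64_t phi_written = 0;  // fingerprint of the previous column as written
        bool ok = true;
        for (std::size_t k = 0; k < cols && ok; ++k) {
            std::uint64_t phi_read = 0, phi_new = 0;
            std::uint32_t diag = top[k], up = top[k + 1];
            for (std::size_t h = 0; h < rows; ++h) {
                // Each value of the previous column is read exactly once.
                const std::uint32_t lft = (k == 0) ? left[h] : block[k - 1][h];
                phi_read = horner(phi_read, lft, p);
                const std::uint32_t v =
                    (x[h] == y[k]) ? diag : 1 + std::min({up, lft, diag});
                block[k][h] = v;
                phi_new = horner(phi_new, v, p);
                diag = lft;
                up = v;
            }
            // Compare the read-side fingerprint of column k-1 with the one
            // produced when that column was written.
            if (k > 0 && phi_read != phi_written) ok = false;
            phi_written = phi_new;
            if (ok && k + 1 < cols) corrupt(block[k], k, attempts);
        }
        if (ok) return block;  // otherwise: recompute the block from scratch
    }
}
```

In this sketch a mismatch discards the whole block, mirroring the "recomputed from scratch" step; since the adversary has a bounded number of faults, the loop eventually completes a clean pass.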

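As a result...
... we have:
Theorem. The edit distance between two strings of length n and m, with $n \geq m$, can be correctly computed, with high probability, in
- $O(nm + \alpha\delta^2)$ time;
- $O(nm)$ space,
when δ is polynomial in n. Here $\alpha \leq \delta$ is the actual number of memory faults occurring during the computation.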

Generalizing
Theorem. A d-dimensional local dependency dynamic programming table M of size $n^d$ can be correctly computed, with high probability, in
- $O(n^d + \alpha\delta^d)$ time;
- $O(n^d + n\delta)$ space,
when the actual number $\alpha \leq \delta$ of memory faults occurring during the computation is polynomial in n.
(Edit distance, longest common subsequence, sequence alignment, ...)

faultylib
We are developing a library to test program behavior in the presence of memory faults.
- Plugging in the library should be very easy: existing C/C++ code should require minimal changes to be tested with our library.
- Implementing different (and meaningful) adversaries should be easy.
- ...

faultylib: usage

    FaultyUInt M[n+1u][m+1u]; // An (n+1) x (m+1) matrix of faulty unsigned int
    ...
    for (unsigned int i = 1; i <= n; i++) {
        for (unsigned int j = 1; j <= m; j++) {
            M[i][j] = min(1 + min(M[i-1][j], M[i][j-1]),
                          M[i-1][j-1] + ((x[i-1] == y[j-1]) ? 0 : 1));
        }
    }
    ...

faultylib: faulty types implementation

    template <typename T>
    class Faulty : public FaultyBase {
        ...
    private:
        T _val;

        // Give the adversary a chance to corrupt the value before it is read.
        T read() const {
            FaultyMM::getInstance()->faultBeforeRead(&_val, sizeof(T), context);
            return _val;
        }

        // Store the value, then give the adversary a chance to corrupt it.
        void write(T v) {
            _val = v;
            FaultyMM::getInstance()->faultAfterWrite(&_val, sizeof(T), context);
        }
    };
    ...
    typedef Faulty<unsigned int> FaultyUInt;

faultylib: overriding operators

    ...
    // Assignment operator
    template <typename Targ>
    Faulty & operator=(const Targ & v) {
        write((T)v);
        return *this;
    }
    ...
    // OR
    template <typename Targ>
    bool operator||(const Targ & v) const {
        return (read() || (T)v);
    }
    };
    ...

faultylib: adversaries implementation

    class REDAdversary : public Adversary {
        ...
        // Add 3 to the value just written when the matrix context
        // reports indices 3 and 7.
        virtual void faultAfterWrite(void * location, size_t s, Context * cnt) {
            if ((cnt != NULL) && (cnt->tag == EDMATRIX_TAG)) {
                MatrixContext * m = (MatrixContext *)cnt;
                unsigned int * i = (unsigned int *)location;
                if (m->getindex(0) == 3)
                    if (m->getindex(1) == 7)
                        *i = *i + 3;
            }
        }
        ...
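The general idea of routing every write through an adversary hook can be illustrated with a self-contained toy. It is not faultylib's API: `FaultyValue`, `WriteHook`, and `write_hook` are hypothetical names chosen for this sketch, and the hook is a plain `std::function` rather than an adversary class hierarchy.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical global hook, invoked after every write to a FaultyValue.
// An empty hook means "no adversary installed".
using WriteHook = std::function<void(void* location, std::size_t size)>;

inline WriteHook& write_hook() {
    static WriteHook hook;
    return hook;
}

// Toy wrapper type: behaves like T, but lets the hook tamper with each write.
template <typename T>
class FaultyValue {
public:
    FaultyValue() : val_() {}

    FaultyValue& operator=(const T& v) {
        val_ = v;
        if (write_hook()) write_hook()(&val_, sizeof(T));  // adversary's turn
        return *this;
    }

    operator T() const { return val_; }

private:
    T val_;
};
```

A test harness could install a hook that counts writes and corrupts, say, only the second one, then check how the algorithm under test reacts. Overloading assignment and conversion keeps the changes to existing code small, which matches the stated design goal of the library.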

Thanks! Thank you for your attention!

References
[AU91] S. Assaf and E. Upfal. Fault tolerant sorting networks. SIAM J. Discrete Math., 4(4):472-480, 1991.
[BM91] R. S. Boyer and J. S. Moore. MJRTY: A fast majority vote algorithm. In Automated Reasoning: Essays in Honor of Woody Bledsoe, pages 105-118, 1991.
[FI04] I. Finocchi and G. F. Italiano. Sorting and searching in the presence of memory faults (without redundancy). In László Babai, editor, STOC, pages 101-110. ACM, 2004.
[JNW08] B. L. Jacob, S. W. Ng, and D. T. Wang. Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann, 2008.
[LM99] F. T. Leighton and Y. Ma. Tight bounds on the size of fault-tolerant merging and sorting networks with destructive faults. SIAM J. Comput., 29(1):258-273, 1999.
[LMP97] F. T. Leighton, Y. Ma, and C. G. Plaxton. Breaking the $\Theta(n \log^2 n)$ barrier for sorting with faults. J. Comput. Syst. Sci., 54(2):265-304, 1997.
[Rén94] A. Rényi. A diary on information theory. J. Wiley and Sons, 1994. Original publication: Napló az információelméletröl, Gondolat, Budapest, 1976.
[Sah06] G. K. Saha. Software based fault tolerance: a survey. Ubiquity, 7(25), 2006.
[Ula77] S. M. Ulam. Adventures of a mathematician. Charles Scribner's Sons, New York, 1977.
[YY85] A. C. Yao and F. F. Yao. On fault-tolerant networks for sorting. SIAM J. Comput., 14(1):120-128, 1985.