Dynamic Programming in faulty memory hierarchies (cache-obliviously)

Saverio Caminiti, Irene Finocchi, Emanuele G. Fusco (Sapienza University of Rome) and Francesco Silvestri (University of Padua)

Memory fault
One or more bits are read differently from how they were last written.
Due to: hardware problems, transient electronic noise.
Impact: machine crashes, unpredictable output, security vulnerabilities.
Meeting AlgoDEEP - Rome - July 14-15, 2011

Faulty RAM model
Based on the unit-cost RAM model.
Adversary: adaptive, with unbounded computational power; can corrupt up to δ words (at any time).
Small safe memory (program code, registers, ...).
Small private memory (random primes and derivatives).
[Finocchi & Italiano, 2004] [Brodal, Jørgensen & Mølhave, 2009: faulty external memory model]

Known results
Sorting [it]; dictionaries [it+dk]; priority queues [dk]; counting [dk+de]; k-d trees [de]; dynamic data structures [us(mit)].
Correctness is typically relaxed: correct (only) on uncorrupted data.
Local-dependency dynamic programming [it].

Dynamic programming
Local dependency problems: edit distance, longest common subsequence, etc.
[Figure: DP table]
E.g., edit distance (correct w.h.p.) in $O(n^2 + \delta^{2+\varepsilon})$ time.
Supports well-known optimization techniques.
[Caminiti, Finocchi, Fusco, 2011]
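To make "local dependency" concrete, here is a minimal sketch of the standard, non-resilient edit distance recurrence, in which each table cell depends only on its left, upper, and upper-left neighbours. It is the textbook baseline, not the fault-tolerant algorithm of the talk; the function name `edit_distance` and the unit edit costs are assumptions made for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Textbook (non-resilient) edit distance with unit costs.

    Only two rows of the DP table are kept, since cell (i, j) depends
    only on cells (i-1, j), (i, j-1) and (i-1, j-1).
    """
    prev = list(range(len(y) + 1))  # row 0: distance from "" to y[:j]
    for i in range(1, len(x) + 1):
        curr = [i] + [0] * len(y)   # column 0: distance from x[:i] to ""
        for j in range(1, len(y) + 1):
            substitution = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                 # delete x[i-1]
                curr[j - 1] + 1,             # insert y[j-1]
                prev[j - 1] + substitution,  # match or substitute
            )
        prev = curr
    return prev[len(y)]
```

For two strings of length n this fills n^2 cells, which is the fault-free term in the running time quoted above.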

Latest results
Extend the class of DP problems to triply nested loops (Gaussian Elimination Paradigm, GEP): all-pairs shortest paths (Floyd-Warshall), matrix multiplication, Gaussian elimination, LU decomposition without pivoting, etc.
Local dependency DP and FFT.
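The triply nested loop can be sketched as follows, using the general GEP form from Chowdhury & Ramachandran: an update function `f` applied to `c[i][j]`, `c[i][k]`, `c[k][j]` and `c[k][k]` for every triple in an update set. This is the plain iterative, non-resilient form; the names `gep`, `in_sigma` and `floyd_warshall` are illustrative assumptions, and Floyd-Warshall is shown as one instantiation.

```python
def gep(c, f, in_sigma=lambda i, j, k: True):
    """Generic GEP loop: update c[i][j] in place for every (i, j, k) in the update set."""
    n = len(c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if in_sigma(i, j, k):
                    c[i][j] = f(c[i][j], c[i][k], c[k][j], c[k][k])
    return c


def floyd_warshall(dist):
    """All-pairs shortest paths as a GEP instance: f(x, u, v, w) = min(x, u + v)."""
    c = [row[:] for row in dist]  # work on a copy of the input matrix
    return gep(c, lambda x, u, v, w: min(x, u + v))


INF = float("inf")
example = [
    [0, 3, INF],
    [INF, 0, 1],
    [2, INF, 0],
]
shortest = floyd_warshall(example)
```

Other problems on the slide fit the same loop by changing `f` and the update set.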

A recursive approach
Semi-resilient variables with decreasing resiliency levels.
Insert and extract operations.
[Figure: δ-resilient, δ/2-resilient and δ/4-resilient levels linked by extract operations, with write fingerprints ϕ and read fingerprints ρ]
Based on Chowdhury & Ramachandran, SODA 2006.
Irene Finocchi ADS - Bertinoro, June 2011

Karp-Rabin fingerprints
Let $A = \langle a_0, a_1, \ldots, a_{n-1} \rangle$ be a vector, $p$ a random prime number, and $w$ the memory word size.
The fingerprint can be computed incrementally while writing or reading $A$, using $O(1)$ private memory.
A fingerprint mismatch triggers recomputation.
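A minimal sketch of an incrementally computed fingerprint follows. The slide's formula did not survive transcription, so this assumes the common Karp-Rabin form: the words of A, read as digits in base 2^w, reduced modulo p. The class name, the fixed Mersenne prime (the model draws p at random), and the Horner-style update are all illustrative assumptions.

```python
MERSENNE_61 = (1 << 61) - 1  # a known prime, fixed here only for reproducibility


class IncrementalFingerprint:
    """Keeps a single running value, mirroring the O(1) private memory requirement."""

    def __init__(self, p=MERSENNE_61, w=64):
        self.p = p
        self.shift = pow(2, w, p)  # 2^w mod p, computed once
        self.value = 0

    def feed(self, word):
        # Horner-style update: shift the running value by one word, then add the new word.
        self.value = (self.value * self.shift + word) % self.p


def fingerprint(words, p=MERSENNE_61, w=64):
    """Fingerprint of a whole sequence, fed one word at a time."""
    fp = IncrementalFingerprint(p, w)
    for word in words:
        fp.feed(word)
    return fp.value


written = [1, 2, 3]
corrupted = [1, 2, 4]  # one word altered after being written
mismatch = fingerprint(written) != fingerprint(corrupted)
```

In this sketch a mismatch between the fingerprint taken while writing and the one taken while reading signals that the data must be recomputed.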

Recursion in faulty RAM
The adversary can corrupt the recursion stack unless it is maintained in safe/private memory, hence the recursion depth must be bounded.
λ × λ matrix decomposition: resiliency decreases by a factor of λ at each call.
Subproblems are solved in Z-order.
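The Z-order visit of a recursively decomposed matrix can be sketched as below. The function names and the requirement that the side length be a power of λ are assumptions for illustration; the sketch only numbers the cells in visit order and does not model resiliency levels or the bounded recursion stack.

```python
def z_order(row, col, size, lam, visit):
    """Visit a size x size block by recursing on lam x lam sub-blocks, row by row."""
    if size == 1:
        visit(row, col)
        return
    if size % lam != 0:
        raise ValueError("side length must be a power of lam")
    step = size // lam
    for block_row in range(lam):
        for block_col in range(lam):
            z_order(row + block_row * step, col + block_col * step, step, lam, visit)


def z_order_grid(n, lam=2):
    """Return an n x n grid whose entries give the order in which cells are visited."""
    grid = [[0] * n for _ in range(n)]
    counter = 0

    def visit(r, c):
        nonlocal counter
        counter += 1
        grid[r][c] = counter

    z_order(0, 0, n, lam, visit)
    return grid
```

With `lam=2` and `n=4` the grid rows come out as 1 2 5 6, 3 4 7 8, 9 10 13 14, 11 12 15 16. With `lam` equal to `n` the recursion has depth one and the visit degenerates to row-major order, which illustrates how a larger λ trades recursion depth for a wider decomposition.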

Implications
The λ × λ matrix decomposition has non-trivial implications for fault detection:
1. Lazy fault detection on extract operations: recursion may proceed on corrupted data.
2. Out-of-order fingerprints: read and write data access patterns are different but regular; maintain O(1) amortized update time (involves exponentiations).
[Figure: grids of cells numbered 1 to 16 illustrating the different read and write access orders]
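One way to see why fingerprints can be maintained when data is written and read in different orders is to weight each word by a power of 2^w that depends on its logical position, so the sum modulo p does not depend on the order of accumulation. The sketch below assumes this positional form and uses Python's three-argument `pow` for the exponentiations; it does not reproduce the O(1) amortized update scheme mentioned above, and all names are hypothetical.

```python
P = (1 << 61) - 1  # a known prime, fixed here only for reproducibility


def positional_fingerprint(cells, p=P, w=64):
    """Fingerprint of (position, word) pairs that is independent of the visiting order."""
    base = pow(2, w, p)
    fp = 0
    for position, word in cells:
        # Each word contributes word * base^position, so the order of the terms is irrelevant.
        fp = (fp + word * pow(base, position, p)) % p
    return fp


values = [7, 1, 8, 2]
write_order = [(pos, values[pos]) for pos in (0, 1, 2, 3)]
read_order = [(pos, values[pos]) for pos in (2, 0, 3, 1)]
faulty_read = [(2, 9), (0, 7), (3, 2), (1, 1)]  # the word at position 2 was corrupted

same = positional_fingerprint(write_order) == positional_fingerprint(read_order)
detected = positional_fingerprint(write_order) != positional_fingerprint(faulty_read)
```

Computing each power from scratch costs a logarithmic number of multiplications, which is why regular access patterns matter for keeping the amortized update cost constant.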

Local dependency DP bounds
Running time: $O(n^2 + \delta n f)$
Cache misses: $O(n^2/(MB) + \delta n f/B)$
Lower bound: $\Omega(n^2/(MB) + \delta n/B)$
Here $f = \log n$ if the private memory is $\Omega(\log n)$ and $f = n^{\varepsilon}$ otherwise; $M$ is the cache size and $B$ the cache line size.
The algorithm is cache-oblivious.
Slide annotations: non-resilient lower bound; resilient input and output.

GEP bounds
Running time: $O(n^3 + \delta n^2 f)$
Cache misses: $O(n^3/(B\sqrt{M}) + \delta n^2 f/B)$
Lower bound: $\Omega(n^3/(B\sqrt{M}) + \delta n^2/B)$
Here $f = \log n$ if the private memory is $\Omega(\log n)$ and $f = n^{\varepsilon}$ otherwise; $M$ is the cache size and $B$ the cache line size.
The algorithm is cache-oblivious.
Slide annotations: non-resilient lower bound; resilient input and output.

The End

Hidden details
Traceback computation is less regular than the forward computation.
Data is possibly not accessed during traceback: force unnecessary reads to update fingerprints correctly.
If $\lambda = \omega(1)$, not all horizontal and vertical boundaries can be stored (poor spatial locality): recycle space appropriately and obtain missing data, when needed, by repeating forward computations.
Data accessed more than once (even $\omega(1)$ times) should appear in fingerprints with different exponents: amplified fingerprints.

Hierarchical faulty RAM
The main faulty memory is organized as usual in hierarchical memory models (the cache-oblivious model considers only two levels).
The safe/private memory is either single-level (if it is small enough) or hierarchical and tied to the main memory. In the latter case the private cache line size should be reasonable (e.g., asymptotically comparable with B), otherwise private cache misses would dominate.