Merkle Hash Trees for Distributed Audit Logs



Similar documents
Physical Data Organization

Energy Efficiency in Secure and Dynamic Cloud Storage

Using the Bitcoin Blockchain for secure, independently verifiable, electronic votes. Pierre Noizat - July 2014

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

Security of Cloud Storage: - Deduplication vs. Privacy

SECURITY ENHANCEMENT OF GROUP SHARING AND PUBLIC AUDITING FOR DATA STORAGE IN CLOUD

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

File Management. Chapter 12

International Journal of Advanced Research in Computer Science and Software Engineering

Full and Complete Binary Trees

Binary Search Trees 3/20/14

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Chapter 8: Structures for Files. Truong Quynh Chi Spring- 2013

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

P2P Storage Systems. Prof. Chun-Hsin Wu Dept. Computer Science & Info. Eng. National University of Kaohsiung

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Merkle Hash Tree based Techniques for Data Integrity of Outsourced Data

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Secure cloud access system using JAR ABSTRACT:

Data storage Tree indexes

The Advantages and Disadvantages of Network Computing Nodes

How To Ensure Correctness Of Data In The Cloud

Electronic Contract Signing without Using Trusted Third Party

MANAGED FILE TRANSFER: 10 STEPS TO SOX COMPLIANCE

Digital Evidence Search Kit

Database Design Patterns. Winter Lecture 24

Preserving Data Privacy in Third Party Cloud Audit

Hash-based Digital Signature Schemes

Towards a compliance audit of SLAs for data replication in Cloud storage

Identifying Data Integrity in the Cloud Storage

Lecture 6: Binary Search Trees CSCI Algorithms I. Andrew Rosenberg

Hushmail Express Password Encryption in Hushmail. Brian Smith Hush Communications

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems

HTTPS Configuration for SAP Connector

Analysis of Algorithms I: Binary Search Trees

DATABASE DESIGN - 1DL400

DarkFS - An Encrypted File System

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

Converting a Number from Decimal to Binary

Algorithms and Data Structures

BlackBerry Enterprise Service 10. Secure Work Space for ios and Android Version: Security Note

EMC Documentum Repository Services for Microsoft SharePoint

Databases and Information Systems 1 Part 3: Storage Structures and Indices

root node level: internal node edge leaf node Data Structures & Algorithms McQuain

File Management. Chapter 12

Binary Coded Web Access Pattern Tree in Education Domain

Configuring Security Features of Session Recording

Original-page small file oriented EXT3 file storage system

Enhanced certificate transparency and end-to-end encrypted mail

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1)

Key Components of WAN Optimization Controller Functionality

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Implementing a Tamper-evident Database System

Client Side Binding of Dynamic Drop Downs

How To Ensure Correctness Of Data In The Cloud

RIGOROUS PUBLIC AUDITING SUPPORT ON SHARED DATA STORED IN THE CLOUD BY PRIVACY-PRESERVING MECHANISM

Symbol Tables. Introduction

Data Integrity Check using Hash Functions in Cloud environment

Making Bitcoin Exchanges Transparent

Secure Authentication and Session. State Management for Web Services

Design and Analysis of Methods for Signing Electronic Documents Using Mobile Phones

Fixity Checks: Checksums, Message Digests and Digital Signatures Audrey Novak, ILTS Digital Preservation Committee November 2006

Sorting Hierarchical Data in External Memory for Archiving

Verifiable Delegation of Computation over Large Datasets

Energy Optimal Cloud Storage and Access Methods for Temporal Cloud Databases

SQL Injection January 23, 2013

Entrust Certificate Services. Java Code Signing. User Guide. Date of Issue: December Document issue: 2.0

Data Storage Security in Cloud Computing for Ensuring Effective and Flexible Distributed System

Persistent Binary Search Trees

BIRT Application and BIRT Report Deployment Functional Specification

Naming vs. Locating Entities

File System Management

CyberSource Payment Security. with PCI DSS Tokenization Guidelines

Using NXLog with Elasticsearch and Kibana. Using NXLog with Elasticsearch and Kibana

Windows 7 Security Event Log Format

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

PowerChute TM Network Shutdown Security Features & Deployment

.INFO Agreement Appendix 1 Data Escrow Specification (22 August 2013)

Big Data and Scripting. Part 4: Memory Hierarchies

Recovery System C H A P T E R16. Practice Exercises

AsicBoost A Speedup for Bitcoin Mining

Qlik REST Connector Installation and User Guide

CMSS An Improved Merkle Signature Scheme

Index Terms: Data integrity, dependable distributed storage, Cloud Computing

ZooKeeper. Table of contents

Transcription:

Merkle Hash Trees for Distributed Audit Logs Subject proposed by Karthikeyan Bhargavan Karthikeyan.Bhargavan@inria.fr April 7, 2015 Modern distributed systems spread their databases across a large number of partially untrusted hosts, to improve their availability to users located across the Internet. In many of these scenarios, the users accessing the data do not fully trust the hosts from which they download data, but they still desire strong integrity guarantees, that is, they want to be able to detect tampering. In this project, we are concerned with a particular kind of distributed database called an audit log. Audit logs are used to log security events in a distributed system, such as the granting of access to a sensitive resource. For example, when a hacker breaks into an enterprise network, his activities will be logged on machines across the network. To escape detection, the hacker may try to erase or tamper with the log, and hence the integrity of this log is security-critical. More generally, audit logs are used to maintain a shared history of events between distributed parties, where this history can only be appended to, but not modified. Popular examples of this pattern include the Bitcoin block chain 1 and the Certificate Transparency (CT) global certificate log. 2 How can a user verify the integrity of a part of a log that it has retrieved from an untrusted server? One solution is for some trusted servers to publish a cryptographically strong hash of the full log. So a user can download the log from different servers and then verify that its hash matches the expected value (retrieved from a trusted server). If the hash function is collision resistant, no malicious host can tamper with any part of the log without it being detected. Audit logs can get very large, and waiting to download the full log to verify the integrity of the few events a user may be interested in would consume a lot of redundant time and bandwidth. Hence, distributed auditing systems use an efficient data structure called Merkle trees to compute the hash of a log with n events such that the contents of an individual event can be verified in O(log n) time. The Merkle hash tree for a log with 8 events {e 1,..e 8 } is depicted below. In general, this construction works similarly for any database with n records: 1 http://bitcoin.org 2 http://certificate-transparency.org 1

h 1..8 = hash(h 1..4 h 5..8) h 5..8 = hash(h 5..6 h 7..8) h 5..6 = hash(h 5 h 6) h 7..8 = hash(h 7 h 8) h 1 = hash(e 1) h 2 = hash(e 2) h 5 = hash(e 5) h 6 = hash(e 6) h 7 = hash(e 7) h 8 = hash(e 8) The leaves are the hashes of each individual event e 1,..., e 8. These hashes are incrementally combined with the hashes of their neighbors as we travel up the tree. The root hash h 1..8 is effectively a hash of the whole database, computed using the hashes of the left and right subtrees. Audit Paths If it is used only to compute a hash of the full log, the Merkle tree is not particularly efficient, since it requires roughly 2n hashes to be computed, whereas computing a sequential hash over the database only requires n hashes (or more accurately, compressions) to be applied. The true advantage of the Merkle tree is when we want to prove that the i-th record has not been tampered with. Suppose a user has obtained the root hash h 1..8 from a trusted server and then retrieved e 4 from an untrusted host. To verify the integrity of e 4, the user needs the untrusted host to also send it the hashes [h 3, h 1..2, h 5..8 ], so that it can reconstruct the path leading from e 4 to the root hash, as depicted below: h 1..8 = hash(h 1..4 h 5..8) h 5..8 = hash(h 5..6 h 7..8) Hence, verifying that any given record r is in fact the 4th event in the log only requires transmitting and computing 3 hashes, not the full tree. Verifying any event in a log of n events only takes log n hashes. (Can you prove it?) This sequence of hashes is called an audit path Consistency Proofs Logs grow over time as new events occur. However, they are append-only, that is, events may not be modified, deleted, or inserted in the middle. Once an event has occured, its presence in the log must be preserved. Suppose a user already has the root hash of the log after n events, and now the log has been extended to m > n events. How can we prove to the user that the new log has only appended events to the old log, without the user needing to check the full Merkle trees? Consider the Merkle tree below corresponding to an earlier version of the log with n = 6 events. 2

h 1..6 = hash(h 1..4 h 5..6) h 5..6 = hash(h 5 h 6) h 5 = hash(e 5) h 6 = hash(e 6) h 1 = hash(e 1) h 2 = hash(e 2) To prove that the root hash h 1..8 for the log with m = 8 elements strictly extends the log with n = 6 elements, it is enough to provide the user with the three hashes (h 1..4, h 5..6, h 7..8 ). Using these hashes, the user can reconstruct both root hashes h 1..6 and h 1..8 and be assured that the second corresponds to a tree that extends the first. This sequence of hashes is called a consistency proof. Programming Tasks Our goal is to program an audit log server and an auditor who can check the log on behalf of the user. We treat logs as simple text files where each line corresponds to one event. The tasks below are written with Java in mind, but the project could just as easily be coded in OCaml. The concrete design presented below is taken from Certificate Transparency - RFC 6962. 3 Merkle Hash Tree Define a Java class that represents Merkle Trees. Each tree object represents one node in the tree. It has fields containing the hash of the current tree, pointers to the left and right subtrees, and two fields denoting the beginning and ending index of the range of the log they cover (e.g. h 5..8 covers records 5 to 8). The constructor for a leaf takes a string containing a single event of the log, converts it to bytes, prepends it by the single byte 0x00 and computes its hash. h i = SHA 256(0 x00 u t f 8 (e i ) ) The constructor for an internal node takes two Merkle trees for the left and right subtrees. It concatenates the hashes of its two subtrees, prepends them with 0x01, and hashes the result. h 1..8 = SHA 256(0 x01 h 1..4 h 5..8 ) The use of different byte-tags in the two hashes offers protection against some kinds of hash collisions. We can use the function SHA256 from java. security.messagedigest to compute each individual hash; for example, for a byte array ( leaf ) the hash can be computed as: MessageDigest d i g e s t = MessageDigest. g e t I n s t a n c e ( SHA 256 ) ; byte [ ] hash = d i g e s t. d i g e s t ( l e a f ) ; 3 https://tools.ietf.org/html/rfc6962 3

To convert a string (text) to a byte array, we can use text.getbytes( UTF 8 ). Write a function that takes an input text file and computes its Merkle tree (treating each line as a new event). Test your code on large (>1GB) text files. Log Server Implement a Log Server class that uses a Merkle tree to implement an event log. Define a constructor that reads a full log from a text file to construct the tree. Define a method that returns the current root hash. Define methods to append events to the log one at a time, or as a batch of events appended at once. Compare and discuss the complexity of these operations. Implement a method genpath that given the current Merkle tree of size n and any index i <= n returns the audit path for the i-th event, that is, the hashes needed to verify e i. For example, in the Merkle tree for 1..8 above, genpath(4) for e 4 would return the array of hashes [h 3, h 1..2, h 5..8 ]. Implement a method genproof that given the current Merkle tree of size n and any previous tree size m <= n returns the consistency proof that the current tree extends the previous tree. For example, in the Merkle tree for 1..8 above, genpoof(6) would return the array of hashes [h 7..8, h 5..6, h 1..4 ]. Auditor Implement an Auditor class that calls the Log Server and verifies its behavior on various inputs. Define a constructor that takes an instance of the Log Server as input and retrieves the current size and root hash for the log. Implement a method ismember that takes an event (as text) and verifies that it exists in the current log by calling genpath to obtain an audit path and verifying the hashes on the audit path: the path should begin with a hash of the event and end with the root hash, and all hashes on the path should be correctly computed. Note that this method does not use any knowledge of the full Merkle tree beyond its size and root hash. Test your code by verifying membership for various events. Implement a method isconsistent that retrieves a new size and root hash from the Log Server and verifies that the new log is consistent with the previous one. If the log has changed, the method calls genproof to get a consistency proof and verifies it using only the previous root hash and size and the current root hash and size. Evaluation The project will be evaluated on the basis of the clean design and implementation, the careful analysis of the complexity of various methods, and on the quality of tests that have been used to verify the solution. Both the report and the presentation should answer questions about the design choices made by the student. For example: What is the complexity of appending events, audit path generation and verification? How much memory or time does it take to verify k records? Can you experimentally confirm the time taken for various operations? Does this depend on the type of file, on the size of each event? At what size of the database does it become more efficient to use Merkle trees as opposed to a flat list of hashes? 4

For extra credit, can you deploy your Auditor and Log Server as independent processes communicating over the network? Can you use your Auditor to check that some public CT log is consistent? You can get the latest root hash in JSON format for the Log Server ct.googleapis.com/aviator from https://ct. googleapis.com/aviator/ct/v1/get-sth. You can get the consistency proof betwen the hash trees of size m and n by accessing https://ct.googleapis. com/aviator/ct/v1/get-sth-consistency?first=m&second=n. For other CT Log Servers, see http://www.certificate-transparency. org/known-logs. For other CT JSON APIs, see https://tools.ietf.org/ html/rfc6962#section-4. 5