From bsdtar to tarsnap

Similar documents

Crowdsourcing security

Peer-to-peer Cooperative Backup System

1 Step 1: Select... Files to Encrypt 2 Step 2: Confirm... Name of Archive 3 Step 3: Define... Pass Phrase

Chapter 9 Key Management 9.1 Distribution of Public Keys Public Announcement of Public Keys Publicly Available Directory

Using BroadSAFE TM Technology 07/18/05

File Systems Management and Examples

SENSE Security overview 2014

Complying with PCI Data Security

Acronis Backup & Recovery: Events in Application Event Log of Windows

New Technologies File System (NTFS) Priscilla Oppenheimer. Copyright 2008 Priscilla Oppenheimer

File System Forensics FAT and NTFS. Copyright Priscilla Oppenheimer 1

Key Management Interoperability Protocol (KMIP)

FL EDI SECURE FTP CONNECTIVITY TROUBLESHOOTING GUIDE. SSL/FTP (File Transfer Protocol over Secure Sockets Layer)

Analyzing the Security Schemes of Various Cloud Storage Services

Client Server Registration Protocol

Out of Harms Reach -A Whitepaper on Online Backup

Gladinet Cloud Backup V3.0 User Guide

RFG Secure FTP. Web Interface

CrashPlan Security SECURITY CONTEXT TECHNOLOGY

Zmanda Cloud Backup Frequently Asked Questions

CS 356 Lecture 25 and 26 Operating System Security. Spring 2013

DarkFS - An Encrypted File System

Talk With Someone Live Now: (760) One Stop Data & Networking Solutions PREVENT DATA LOSS WITH REMOTE ONLINE BACKUP SERVICE

SecureDoc Disk Encryption Cryptographic Engine

Data Reduction: Deduplication and Compression. Danny Harnik IBM Haifa Research Labs

Network Security - ISA 656 Security

Bit Chat: A Peer-to-Peer Instant Messenger

DRAFT Standard Statement Encryption

bup: the git-based backup system Avery Pennarun

Communication Security for Applications

COS 318: Operating Systems

efolder White Paper: The Truth about Data Integrity: 5 Questions to ask your Online Backup Provider

commodity grid computing with Amazon s S3 and EC2

File System Design and Implementation

Connected from everywhere. Cryptelo completely protects your data. Data transmitted to the server. Data sharing (both files and directory structure)

Introweb Remote Backup Client for Mac OS X User Manual. Version 3.20

OPENID AUTHENTICATION SECURITY

Overview. SSL Cryptography Overview CHAPTER 1

12/3/08. Security in Wireless LANs and Mobile Networks. Wireless Magnifies Exposure Vulnerability. Mobility Makes it Difficult to Establish Trust

CRYPTOGRAPHY IN NETWORK SECURITY

HOW ENCRYPTION WORKS. Introduction to BackupEDGE Data Encryption. Technology Overview. Strong Encryption BackupEDGE

1. Product Information

Online Backup Client User Manual Linux

BookKeeper overview. Table of contents

Online Backup Client User Manual

Einführung in SSL mit Wireshark

Google File System. Web and scalability

IPsec Details 1 / 43. IPsec Details

Secure Shell SSH provides support for secure remote login, secure file transfer, and secure TCP/IP and X11 forwarding. It can automatically encrypt,

Security Engineering Part III Network Security. Security Protocols (I): SSL/TLS

SSL A discussion of the Secure Socket Layer

SAS Data Set Encryption Options

7 Network Security. 7.1 Introduction 7.2 Improving the Security 7.3 Internet Security Framework. 7.5 Absolute Security?

Web Application Hacking (Penetration Testing) 5-day Hands-On Course

Secure Sockets Layer (SSL ) / Transport Layer Security (TLS) Network Security Products S31213

Message Authentication Codes

Criteria for web application security check. Version

RecoveryVault Express Client User Manual

Using etoken for SSL Web Authentication. SSL V3.0 Overview

Hypertable Architecture Overview

Semantic based Web Application Firewall (SWAF V 1.6) Operations and User Manual. Document Version 1.0

GNUTLS. a Transport Layer Security Library This is a Draft document Applies to GnuTLS by Nikos Mavroyanopoulos

Network Security Essentials Chapter 5

Network Security. Abusayeed Saifullah. CS 5600 Computer Networks. These slides are adapted from Kurose and Ross 8-1

!!!! Memeo C1 Security !!!!!!!!!!! Bret Savage, CTO. October Memeo Inc. All rights reserved Memeo Inc. All rights reserved.

How SafeVelocity Improves Network Transfer of Files

Secure Programming and Source-Code Reviews - In the UNIX Environment. Thomas Biege <thomas@suse.de>

SecureCom Mobile s mission is to help people keep their private communication private.

Online Backup Linux Client User Manual

Online Backup Client User Manual

: Network Security. Name of Staff: Anusha Linda Kostka Department : MSc SE/CT/IT

BlackBerry Enterprise Service 10. Secure Work Space for ios and Android Version: Security Note

How To Protect Your Data From Being Hacked On Security Cloud

WS_FTP Professional 12. Security Guide

SECURE SOCKETS LAYER (SSL) SECURE SOCKETS LAYER (SSL) SSL ARCHITECTURE SSL/TLS DIFFERENCES SSL ARCHITECTURE. INFS 766 Internet Security Protocols

Back2Cloud: A highly Scalable and Available Remote Backup Service

Chapter 7 Transport-Level Security

A Deduplication File System & Course Review

Overview of Public-Key Cryptography

Simple Storage Service (S3)

Authentication applications Kerberos X.509 Authentication services E mail security IP security Web security

CRYPTOGRAPHY AS A SERVICE

Reliable log data transfer

Distributed File Systems

Spreed Keeps Online Meetings Secure. Online meeting controls and security mechanism.

3.2: Transport Layer: SSL/TLS Secure Socket Layer (SSL) Transport Layer Security (TLS) Protocol

Project #2: Secure System Due: Tues, November 29 th in class

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Web Security Considerations

libchop,alibraryfordistributed storage&databackup

FL EDI SECURE FTP CONNECTIVITY TROUBLESHOOTING GUIDE. SFTP (Secure File Transfer Protocol)

INTERNET SECURITY: FIREWALLS AND BEYOND. Mehernosh H. Amroli

Transcription:

Building an online backup service Colin Percival Tarsnap Backup Inc. cperciva@tarsnap.com September 28, 2013

Who am I? A computer scientist. Publications in computational mathematics, numerical analysis, data compression, and cryptography. A FreeBSD developer (since 2004). FreeBSD Security Officer from August 2005 to May 2012. Maintainer of the FreeBSD/EC2 platform. Author of some small UNIX utilities: bsdiff, freebsd-update, portsnap, scrypt, spiped, and kivaloo. The Tarsnap guy. This is my day job.

What is Tarsnap? Tarsnap is online backups for the truly paranoid. Data is encrypted and signed with keys held at the client side. You have the source code audit it, please! Tarsnap is online backup for UNIX-like operating systems, with a tar-line command line: # tarsnap -cf home-2013-09-28 --exclude *.core /home Tarsnap is snapshotted deduplicated storage. Tarsnap is a pay-as-you-go service: 300 picodollars / byte-month of storage. 300 picodollars / byte of bandwidth.

How to build tarsnap (overview) 1. Start with bsdtar...... and libarchive, which does most of the work. 2. Deduplicate archive data. 2.1 Split data into blocks. 2.2 Reference-count the blocks. 3. Add cryptography. 3.1 Encrypt all the blocks. 3.2 Sign the archive. 3.3 Sign all the blocks. 4. Upload the new data...... while the archive is being created. 5. Have a server which puts the data somewhere safe...... and can find it for you again when you need it!

1. Start with bsdtar In February 2004, Tim Kientzle imported libarchive into the FreeBSD src tree. Two months later, he added bsdtar. libarchive:gnutar :: llvm:gcc libarchive is clean, well-organized, designed as a reusable library, and BSD licensed. bsdtar is a front-end to libarchive 5% of the size, and mostly concerned with parsing command-line options. Until October 2008, all libarchive development happened in the FreeBSD src tree.

Working with libarchive Set archive format: archive write set format pax Read and write headers: archive write header, archive read next header Read and write file data: archive write data, archive read data Skip file data: archive read data skip Create a new archive: archive write open(... open callback, write callback, close callback) Read an archive: archive read open2(..., open callback, read callback, skip callback, close callback)

2.1. Split data into blocks For deduplication in a filesystem, splitting data into fixed-size blocks works well: Theq uick brow nfox jump sove rthe lazy dog. Theq uick brow nfox jump sove rthe EDIT dog. Even if writes aren t aligned it isn t too bad: Theq uick brow nfox jump sedi Tthe lazy dog. With archives you need to handle files being added or deleted in the middle however: Theq uick foxj umps over thel azyd og. Solution: Use variable-sized blocks constructed by splitting at content-dependent points: Th quickbrow nf oxjumpsov erth elazy dog. Th quickf oxjumpsov erth elazy dog.

Data-dependent splitting Simple method: Split before byte x n if H(x n k,x n k+1,...x n 2,x n 1 ) = 0 for a convenient hash function H and a fixed k. If you use the rolling hash function H(x 0...x k 1 ) = i x i α i mod p the H() values can be computed in two modular multiplications and two modular additions per byte processed. This yields an equal probability of splitting at each position and an exponential distribution of block sizes.

Exponential block size distribution Relative probability 1 0.8 0.6 0.4 0.2 0 0 1 2 3 Block size / average block size

Data-dependent splitting Improved method: Split before byte x n if H(x n k,x n k+1,...x n 2,x n 1 ) = 0 for a convenient hash function H and any value of k. Trick: Use the same function H(x 0...x k 1 ) = i x i α i mod p, compute the values H(x 0...x n 1 ), and store them in a hash table. Split if we find values such that H(x 0...x m 1 ) = H(x 0...x n 1 ).

Data-dependent splitting α n H(x 0...x m 1 ) = H(x 0...x n 1 ) x i α i = x i α i i=0...m 1 i=n...m 1 i=n...m 1 i=n...m 1 x i α i = 0 x i α i n = 0 x i α i n = 0 H(x n...x m 1 ) = 0 i=0...n 1

Improved block size distribution Relative probability 1 0.8 0.6 0.4 0.2 0 0 1 2 3 Block size / average block size

2.2 Reference-count the blocks The splitting process converts an archive into a sequence of blocks plus an index. Blocks are identified by their HMAC-SHA256 values. The index is identified by the HMAC-SHA256 of the archive name. The client keeps a cache directory locally which contains block HMACs and reference counts. Required for creating or deleting archives, but not for reads. Updated atomically when an archive creation or deletion is committed. Can be regenerated by a fsck operation which reads the block indexes for each archive. The server has no idea how many archives use a particular block or when a new archive is re-using an existing block.

3. Cryptography File data, tar headers, the block index, archive metadata should all be kept confidential. Everything making up an archive (including its name) is part of an encrypted block with an ID generated via HMAC-SHA256. Don t necessarily want someone who can write archives to be able to read them. Blocks are encrypted with an ephemeral 256-bit AES key, which is itself encrypted using a 2048-bit RSA key. Don t want anyone to be able to tamper with archives (even blindly). The block index contains HMACs of blocks, the archive metadata contains the hash of the index, and the metadata is signed with a 2048-bit RSA key.

3. Cryptography Encrypting blocks and signing archives is sufficient for cryptographic security, but not sufficient for real-world security. Block data is compressed using the deflate algorithm. The HMACs used to verify blocks cover the uncompressed data. If a block is tampered with, we wouldn t know until after decompression. zlib has a long history of vulnerabilities! To protect against an attacker who has a zlib exploit and can tamper with our backups, we append a physical HMAC to the end of each block.

4. Networking Tarsnap uses a custom request-response protocol: Read block, write block, delete block. List blocks. Start a transaction and cancel any ongoing transaction. Commit transaction if it has not been committed or cancelled. Transactions are necessary to handle client crashes without being left with orphaned data. All operations are idempotent running them twice has the same effect as running them once. This makes it safe for the client to retry any failed requests. This allows the server to drop connections arbitrarily and rely on the client handling the failure.

4. Networking Requests and responses are signed with read, write, or delete HMAC keys. Using separate keys makes it possible to have a server which can upload data but cannot tamper with existing backups. These are the only client keys possessed by the server. Everything is wrapped inside a cryptographic transport layer which is similar to SSL using an RSA DH certificate, except without all the certificate authority nonsense. The server public key is distributed with the tarsnap source code.

4. Networking As tarsnap archives are generated, they are streamed to the server. Obvious strategy: Have one thread generating the archive and another thread doing networking. Anyone who s done serious work with concurrent systems knows that they are actively malicious. Robert Watson Instead, I decided to use non-blocking networking. Using a framework like libevent would probably have been a good idea, but instead I wrote my own. Sprinkled throughout the tar code I have go do non-blocking networking stuff calls. If too many requests are pending, we block until the networking catches up with the tar code.

5. Server Tarsnap stores data in Amazon S3 by synthesizing a log-structured filesystem. Amazon advertises that S3 is designed to provide 99.999999999% durability. Incoming requests are logged to S3 before responses are sent back to the tarsnap client. The Tarsnap server code runs in Amazon EC2 and keeps a local cache of metadata. (machine X, block Y) (S3 object A, offset B, length C) S3 is the ultimate source of truth. The Tarsnap server runs a continuous cleaning process which reads data from S3 and writes it back minus any deleted blocks. This is why Tarsnap uses EC2: S3 EC2 bandwidth is free, and cleaning uses a lot of bandwidth.

Availability Tarsnap source code (not open source licensed): http://www.tarsnap.com Open source spin-off work: scrypt key derivation function: http://www.tarsnap.com/scrypt.html spiped secure pipe daemon: http://www.tarsnap.com/spiped.html kivaloo data store: http://www.tarsnap.com/kivaloo.html FreeBSD/EC2: http://www.daemonology.net/freebsd-on-ec2/ Questions?