libchop,alibraryfordistributed storage&databackup Ludovic Courtès <ludo@gnu.org> GNU Hackers Meeting, Den Haag, 24 July 2010
The Basics - inception & rationale - storing to remote, untrusted sites - library & tools for data storage The Mechanics The Plan
inception& rationale PhD in 2004 2007, LAAS-CNRS, France cooperative backup for the people! flexible & self-managed distributed storage storage to remote (untrusted) sites
storing to remote, untrusted sites dissemination of data fragments guarantees for data integrity, confidentiality, & authenticity
library& tools for data storage content-addressable storage ("deduplication") encryption via content-hash keys file chopping compression
from the command line $ chop-archiver --archive holiday-pic.jpg tree_indexer:hash_block_fetcher:hash_index_handle:64:sha1:onx5ee $ chop-archiver --restore \ tree_indexer:hash_block_fetcher:hash_index_handle:64:sha1:onx5ee \ > restored-pic.jpg
fromc chop_indexer_t *indexer; chop_chopper_t *chopper; chop_block_indexer_t *block_indexer; chop_block_store_t *store; chop_index_handle_t *index; /* Instantiate these things */ err = chop_indexer_index_blocks (indexer, chopper, block_indexer, store, store, index);
from Guile Scheme (define (archive-file file store) ;; Chop FILE and write the blocks to STORE. ;; Return an ASCII string denoting its index. (let* ((f (file-stream-open file)) (c (anchor-based-chopper-open f)) (bi (hash-block-indexer-open hash-method/sha1)) (i (tree-indexer-open 100))) (let ((index (indexer-index-blocks i c bi store store))) (index-handle-ascii-serialize index))))
The Basics The Mechanics - the storage pipeline - block stores - fixed-size chopper - content-based chopper - filters - tradeoffs The Plan
the storage pipeline block indexer data stream filter chopper stream indexer filter block store
block stores put (key, data); ring, can get (key); a byte string, can be anything
block store implementations gdbm, tdb, bdb, etc. file system (à la Git) easily transferred over HTTP, FTP, rsync, etc. remote (chop-block-server) ONC RPC, over TLS discovery using Avahi (chop-store-discover)
block indexer: content-addressable storage $ chop-archiver -i hash_block_indexer -I sha1 \ --archive britney.mp3 tree_indexer:hash_block_fetcher:hash_index_handle:64:sha1:7c4 x7e/42 sub-classes of block_fetcher and index_handle hash algo to address blocks & check integrity root block hash
block indexer: content-hash keys clear text aes256 cipher text sha256 H $ chop-archiver -i chk_block_indexer \ -I aes256,cbc,sha256,sha1 \ --archive naked.jpg :chk_index_handle:64:aes256,cbc,sha1:7c4 x7e,et4 vek/8a sha256 aes256 cipher algo & integrity check H sha1(cipher-text) key = sha256(clear-text)
content-addressable storage& compression single-instance storage single-instance
fixed-size chopper The quick brown fox jumps over the quick lazy The quick brown wildebeest jumps over the quick la
content-based chopper The quick brown fox jumps over the quick lazy The quick brown wildebeest jumps over the quick
filters block indexer data stream zlib bzip2 lzo GPG(?) chopper stream indexer zlib bzip2 lzo GPG(?) block store
tradeoffs 0.3 (for content-addressed source code revisions) compression ratio 0.2 0.1 content-defined blocks + zlib d blocks content-defined blocks + no compression 0.0 throughput
The Basics The Mechanics The Plan - (built-in) support for directories & versioning - a cooperative/p2p backup daemon - storage in GNU
(built-in) support for directories& versioning root root previous date 1188917920 previous date 1189430254 (head) /src/chbouib.c /src/chbouib.c /src/backup.scm /src/backup.scm /new-file.txt R 0 R 1 I 0 I 1 I 2 D 0 D 1 D 2 D 3 D 4
a cooperative/p2p backup daemon Backup directories versioning libchop Opportunistic Replication Service Discovery Avahi Storage Contribution put (key, data) Authentication GnuTLS Cooperation & Replication Policies File System Networking put
your idea here
storage in GNU tar cpio RCS? GNUnet Sharutils
Summary libchop is cool!
Summary mechanisms, not policy content-addressable storage, content-hash keys, etc. supports disseminated data blocks build distributed backup/storage tools!
Thanks! ludo@gnu.org http://www.nongnu.org/libchop/
Copyright 2010 Ludovic Courtès You may copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is available at http://www.gnu.org/licenses/gfdl.html.