YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud
Quanqing Xu
Quanqing.Xu@nicta.com.au
Outline
- Motivation
- YuruBackup's Architecture
  - Backup Client: file scan, data de-duplication and data transmission
  - Metadata Server: communication with clients, global fingerprint lookup and store, and a highly scalable cluster of metadata servers
- Demo
- Preliminary experimental results
- Development status
Motivation
- Yuruware needs incremental backup in the cloud
- Cloud storage providers
  - High reliability and scalability at low cost
  - Ultra large-scale storage space: 905 billion objects in Amazon S3 as of Q1/2012 [1]
- Customers
  - Back up and restore progressive data within a short time
  - Back up to petabytes of data in total
- To build a large-scale cloud backup system
  - System scalability
  - Storage efficiency
  - Backup and restoration performance
[1] http://aws.typepad.com/aws/2012/04/amazon-s3-905-billion-objects-and-650000-requestssecond.html
NICTA Copyright 2010
The Architecture of YuruBackup
- Goals
  - Increase scalability to accommodate PB-scale data
  - Improve space efficiency to reduce costs
  - Save bandwidth to adapt to the low bandwidth of WANs
- Components (architecture diagram)
  - Backup Agent with source-side de-duplication
  - Metadata Agent
  - A cluster of metadata servers (a write master and read slaves) holding the metadata of PB-scale data, with target-side de-duplication
  - Cloud storage providing PB-scale space for snapshots
- Techniques: RPC, parallel transmission, data/metadata separation
Storage Hierarchy
- Snapshot: a virtual file
- A snapshot consists of collections, a collection consists of blocks, and a block consists of chunks
- (Figure: two snapshots, A and B, sharing collections, blocks and chunks)
Mapping Blocks from Memory to Disk
- A block is identified by the tuple <collectionuuid, blockno, checksum, start, length>
- Components: Memory Block, Block Proxy and TAR Store
  - In memory: memory blocks, fronted by the Block Proxy
  - On disk: the TAR store, organized as collections
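The mapping above can be sketched in a few lines. The names `BlockKey` and `TarStoreReader` are illustrative stand-ins, not YuruBackup's actual classes; the point is that the 5-tuple alone is enough for a proxy to locate a block's bytes inside an on-disk collection.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockKey:
    """The 5-tuple from the slide that maps an in-memory block
    to its on-disk location inside a collection's TAR store."""
    collection_uuid: str
    block_no: int
    checksum: str
    start: int    # byte offset of the block inside the collection
    length: int   # block length in bytes

class TarStoreReader:
    """Toy stand-in for the Block Proxy: resolves a BlockKey to bytes
    by slicing the collection blob at [start, start + length)."""
    def __init__(self, collections):
        self.collections = collections  # collection_uuid -> bytes

    def read(self, key):
        blob = self.collections[key.collection_uuid]
        return blob[key.start:key.start + key.length]
```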
The Flow Chart of the Backup Process
1. Create a DB connection to the metadata catalog
2. Initialize the TAR store T
3. Initialize the Metadata Manager
4. Scan a directory to get a file list
5. While the file list is not empty:
   - Remove a file from the list and write its incremental backup into T
   - If T's size >= a given size, write T to disk and clear it
6. Release the Metadata Manager
7. Release the TAR store
8. Close the DB connection to the metadata catalog
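The drain-and-flush loop at the heart of the flow chart can be sketched as follows. This is a minimal in-memory model, not YuruBackup's implementation: the `TarStore` class, the flush threshold and the `(name, payload)` file representation are all assumptions for illustration, and the catalog/metadata steps are omitted.

```python
import io
import tarfile

class TarStore:
    """Toy stand-in for the TAR store T: buffers files in an in-memory
    tar archive and 'flushes' completed archives to a list."""
    def __init__(self, flush_size):
        self.flush_size = flush_size
        self.archives = []   # stands in for archives written to disk
        self._reset()

    def _reset(self):
        self.buf = io.BytesIO()
        self.tar = tarfile.open(fileobj=self.buf, mode="w")
        self.count = 0

    def add(self, name, payload):
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        self.tar.addfile(info, io.BytesIO(payload))
        self.count += 1

    def size(self):
        return self.buf.tell()

    def flush(self):
        if self.count == 0:
            return
        self.tar.close()
        self.archives.append(self.buf.getvalue())
        self._reset()

def backup(files, store):
    """Drain the file list as in the flow chart."""
    files = list(files)
    while files:                    # "The file list is empty?" -> No
        name, payload = files.pop() # remove a file, back it up into T
        store.add(name, payload)
        if store.size() >= store.flush_size:
            store.flush()           # write T into disk and clear it
    store.flush()                   # final partial archive, if any
```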
Backup Client
It provides a functional interface to users: backup and restoration.
- Read/write buffer to reduce I/O requests
- Compressed Bloom filter and Berkeley DB to locate items
- Source-side de-duplication with content-defined chunking
- Transmission: batched RPC and parallel uploading
Source-side De-duplication
- Rabin's fingerprinting
  - Given a string A = a_m a_{m-1} ... a_1, a k-bit Rabin fingerprint is computed as follows:
  - Let A(t) = a_m t^{m-1} + a_{m-1} t^{m-2} + ... + a_1
  - Choose an irreducible polynomial P(t) = p_k t^k + p_{k-1} t^{k-1} + ... + p_0
  - Compute Rabin's fingerprint f(A) = A(t) mod P(t)
- Content-defined chunking (SOSP '01)
  - Slide a window of width w over the data; declare a chunk boundary wherever low_order(f, k) = c, yielding chunks C_1, C_2, C_3, ...
[1] Muthitacharoen A., Chen B., Mazières D. A low-bandwidth network file system. In: Proc. of the 18th ACM Symp. on Operating Systems Principles (SOSP 2001). New York: ACM Press, 2001. 174-187.
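The chunking rule can be sketched as below. This is an illustrative model, not YuruBackup's code: a simple multiplicative rolling hash stands in for a true irreducible-polynomial Rabin fingerprint, and the window, mask and size limits are assumed parameters. Because boundaries depend only on local content, inserting bytes early in a file shifts chunk boundaries only locally, which is what makes this chunking de-duplication-friendly.

```python
def cdc_chunks(data, window=48, mask_bits=11, min_size=256, max_size=4096):
    """Content-defined chunking sketch (after LBFS, SOSP '01): declare a
    chunk boundary wherever the low-order `mask_bits` bits of a rolling
    hash over the last `window` bytes equal a fixed constant."""
    MOD = (1 << 61) - 1          # large prime modulus
    BASE = 257                   # polynomial base of the rolling hash
    pow_w = pow(BASE, window, MOD)
    mask = (1 << mask_bits) - 1  # expected chunk size ~ 2**mask_bits

    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD                      # shift new byte in
        if i >= window:
            h = (h - data[i - window] * pow_w) % MOD  # drop oldest byte
        size = i - start + 1
        at_boundary = (h & mask) == mask              # low_order(f, k) == c
        if size >= max_size or (size >= min_size and at_boundary):
            chunks.append(bytes(data[start:i + 1]))
            start = i + 1
    if start < len(data):
        chunks.append(bytes(data[start:]))
    return chunks
```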
Duplication Detection Based on a Bloom Filter
- Observations
  - Most files are never changed after their creation (ATC '04)
  - Over 2/3 of files have not been modified (FAST '07)
- Index summary based on a compressed Bloom filter (ACM '70, PODC '01)
  - Approximate set membership problem
  - Trade-off between space and false positive probability
- Three functions
  1) Initialize(initElementCount, desiredfpp)
  2) Insert(fingerprint)
  3) Lookup(fingerprint)
[1] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 1970.
[2] Mitzenmacher M. Compressed Bloom filters. In: Twentieth ACM Symposium on Principles of Distributed Computing (PODC 2001), August 2001.
Metadata Server
- Communication with clients
  - A single, batched and asynchronous lookup RPC for n fingerprints
  - The callback function enqueues the updated request
- Global fingerprint lookup and store
  - Global index summary
  - Global target-side de-duplication
  - FP Lookup and FP Store
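How the client-side Bloom filter and the batched server lookup fit together can be sketched as one round; plain `set` objects stand in for the compressed Bloom filter and the server's fingerprint store, and the function is an assumed simplification of the asynchronous RPC.

```python
def dedup_round(local_bf, server_store, fingerprints):
    """One de-duplication round, sketched. Fingerprints that hit the
    client's Bloom filter may still be new (false positives), so the
    client batches all candidates into a single lookup RPC; the server
    replies which ones it genuinely stores (FP Lookup), records the new
    ones (FP Store), and the client uploads only the truly new chunks."""
    maybe_dup = [fp for fp in fingerprints if fp in local_bf]
    definitely_new = [fp for fp in fingerprints if fp not in local_bf]
    # One batched RPC covers all n candidate duplicates at once.
    confirmed = {fp for fp in maybe_dup if fp in server_store}
    to_upload = definitely_new + [fp for fp in maybe_dup
                                  if fp not in confirmed]
    server_store.update(to_upload)  # FP Store for newly seen fingerprints
    return to_upload
```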
Highly Scalable Cluster of MDSs
- Data replication (MySQL replication): makes reads scalable; provides failover
- Data partitioning (MySQL Cluster): makes writes scalable
- Load balancing: aware of which nodes are readable and which are writable
- Topology (diagram): YuruBackup clients -> load balancer -> master SQL nodes with NDB (writes) and slave SQL nodes with NDB+InnoDB (reads), replicating through the data nodes
Demo of YuruBackup
- Chunk partition
- Duplication detection
An Example of a Snapshot (5 New Blocks)
(Figure: blocks B_1, B_3, B_5, B_7 and B_12 are new in this snapshot)
An Example of Incremental Backup
(Figure: incremental backup from emacs-23.2a to emacs-23.3a)
Comparison

ReducedRatio = (#BytesSentByRsync - #BytesOfData - #BytesOfMetadata) / #BytesSentByRsync

Table 1. rsync vs. YuruBackup. The parenthesized value is the total size of the newer dataset version; YuruBackup's transfer is split into data and metadata.

Dataset    | Non-overlap   | rsync transferred | YuruBackup transferred |        |         | ReducedRatio
           | data (MB)     | data (MB)         | Data (MB) | Metadata (MB) | # old chunks | # new chunks | (%)
Emacs      | 140.2  | 155.7 (155.9) | 60.4  | 1.6  | 15,731 | 11,484 | 61.23
Eclipse    | 234.4  | 233.0 (234.9) | 220.3 | 1.1  | 277    | 84,317 | 5.53
GCC        | 107.8  | 94.7 (428.6)  | 37.8  | 23.8 | 12,386 | 9,659  | 60.05
Hadoop-src | 93.0   | 210.8 (214.1) | 57.5  | 2.4  | 5,365  | 15,420 | 72.73
Hadoop-bin | 37.2   | 110.1 (110.5) | 27.8  | 0.2  | 656    | 10,489 | 74.71
Lucene-src | 17.1   | 14.8 (64.8)   | 6.2   | 1.2  | 296    | 1,590  | 58.02
Lucene-bin | 143.1  | 153.9 (156.4) | 132.6 | 2.8  | 2,191  | 26,200 | 13.79
Hive-src   | 94.9   | 94.1 (144.0)  | 48.0  | 3.0  | 11,072 | 7,885  | 48.94
Hive-bin   | 5.7    | 7.2 (21.7)    | 5.5   | 0.1  | 0      | 1,660  | 23.62
HBase      | 48.0   | 68.7 (97.5)   | 28.0  | 1.3  | 3,462  | 4,375  | 59.25
Average    | 92.14  | 114.3 (162.8) | 62.41 | 3.75 | 5,144  | 17,308 | 47.79
Others
- YuruBackup is deployed atop Amazon S3
  - Metadata servers are running in EC2
  - Will be deployed on other cloud platforms
- Performance evaluation
  - De-duplication efficiency
  - De-duplication overhead
  - Scalability
  - Backup window
  - Fine-granularity restoration, etc.
Current Development Status
Program directories (~12,000 LOC in total) include:
- include: header files, ~1,200 LOC
- src: source files, ~5,200 LOC
Thank you! Q&A
Dataset

OverlapRatio = OverlapDataSize / TransferredDataSize

Dataset    | Version    | # Files | Data size (MB) | # Overlap files | Overlap data size (MB) (%)
Emacs      | 23.2a      | 4,321  | 155.4 |        |
           | 23.3a      | 4,331  | 155.9 | 957    | 15.7 (10.09)
Eclipse    | galileo    | 2,587  | 225.9 |        |
           | Helios-SR2 | 2,754  | 234.9 | 33     | 0.5 (0.21)
GCC        | 4.6.0      | 71,103 | 427.2 |        |
           | 4.6.1      | 71,376 | 428.6 | 70,545 | 320.8 (74.86)
Hadoop-src | 0.20.204.0 | 5,811  | 208.0 |        |
           | 0.20.205.0 | 6,004  | 214.1 | 3,246  | 121.1 (56.56)
Hadoop-bin | 0.20.204.0 | 507    | 105.0 |        |
           | 0.20.205.0 | 538    | 110.5 | 429    | 73.3 (66.36)
Dataset (continued)

Dataset    | Version | # Files | Data size (MB) | # Overlap files | Overlap data size (MB) (%)
Lucene-src | 3.3.0  | 2,644 | 62.4  |       |
           | 3.4.0  | 2,956 | 64.8  | 2,226 | 47.7 (73.58)
Lucene-bin | 3.3.0  | 6,520 | 136.9 |       |
           | 3.4.0  | 7,150 | 156.4 | 208   | 13.3 (8.51)
Hive-src   | 0.7.0  | 7,934 | 143.7 |       |
           | 0.7.1  | 7,976 | 144.0 | 3,720 | 49.1 (34.10)
Hive-bin   | 0.7.0  | 280   | 21.6  |       |
           | 0.7.1  | 295   | 21.7  | 257   | 16.0 (73.88)
HBase      | 0.90.3 | 3,428 | 97.2  |       |
           | 0.90.4 | 3,444 | 97.5  | 1,477 | 49.6 (50.81)

Overlap measured with the Linux shell command: diff -urNas v1 v2
The rsync Algorithm
A holds f.old; B holds f.new.
1. A computes the checksum of each block S_i in file f.old.
2. A sends the checksums to B.
3. B searches the file f.new and finds the differences between f.old and f.new.
4. B tells A how to construct file f.new from f.old and the literal data.
Each checksum consists of a rolling 32-bit checksum (an Adler-32-style checksum) and a 128-bit MD4 checksum.
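The rolling property of the weak checksum is what makes step 3 efficient: sliding the block window one byte is an O(1) update rather than a full recomputation. A sketch of the Adler-style weak checksum follows (the strong MD4 checksum is omitted; function names are illustrative).

```python
M = 1 << 16  # both halves of the checksum are kept mod 2^16

def weak_checksum(block):
    """rsync-style weak checksum: a = sum of bytes, b = sum of running
    partial sums, packed as (b << 16) | a."""
    L = len(block)
    a = sum(block) % M
    b = sum((L - i) * x for i, x in enumerate(block)) % M
    return (b << 16) | a

def roll(checksum, out_byte, in_byte, block_len):
    """Slide the window one byte to the right in O(1):
    a' = a - out + in,  b' = b - L*out + a'."""
    a = checksum & 0xFFFF
    b = checksum >> 16
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return (b << 16) | a
```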