Deduplication with Block-Level Content-Aware Chunking for Solid State Drives (SSDs)




2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing

Deduplication with Block-Level Content-Aware Chunking for Solid State Drives (SSDs)

Jin-Yong Ha, Young-Sik Lee, and Jin-Soo Kim
College of Information and Communication Engineering, Sungkyunkwan University, Suwon, South Korea
Email: jinyongha@csl.skku.edu and jinsookim@skku.edu
Department of Computer Science, KAIST, Daejeon, South Korea
Email: yslee@calab.kaist.ac.kr

Abstract — A critical weak point of Solid State Drives (SSDs) is the limited lifespan which stems from the characteristics of NAND flash memory. In this paper, we propose a new deduplication scheme called block-level content-aware chunking to extend the lifetime of SSDs. The proposed scheme divides the data within a fixed-size block into a set of variable-sized chunks based on its contents and avoids storing duplicate copies of the same chunk. Our evaluations on a real SSD platform show that the proposed scheme improves the average deduplication rate by 77% compared to the previous block-level fixed-size chunking scheme. Additional optimizations reduce the average memory consumption by 39% with a 1.4% gain in the average deduplication rate.

I. INTRODUCTION

Solid State Drives (SSDs) based on NAND flash memory are quickly gaining popularity. In contrast to hard disk drives (HDDs), SSDs have several advantages such as high performance, shock resistance, and low power consumption. However, one of the critical weak points of SSDs is the limited lifespan. NAND flash memory requires erase operations to overwrite data in an already programmed area. As NAND flash memory cells are gradually worn out by repeated program/erase cycles, the life expectancy of SSDs is limited by write traffic, i.e., the amount of data written to NAND flash memory. The lifespan of SSDs is getting worse as the NAND technology moves from MLC (Multi-Level Cell) to TLC (Triple-Level Cell) to reduce the cost per bit [1].
Deduplication is a technique which reduces write traffic by dividing data into small pieces, or chunks, and then storing only the unique chunks. Many data deduplication mechanisms have been proposed to extend the lifetime of SSDs [2], [3]. However, they commonly partition the incoming data into fixed-size chunks. This block-level fixed-size chunking suffers from a low deduplication rate, especially when a portion of a file is inserted or deleted; in this case, all the chunks after the modified position become different from the original versions and thus cannot be deduplicated. We define the deduplication rate as the aggregated reduction in storage requirements gained from deduplication technology [4]; thus, a higher deduplication rate uses less storage space. To improve the deduplication rate further, this paper proposes a new deduplication scheme called block-level content-aware chunking. The proposed scheme divides the data within a fixed-size block into a set of variable-sized chunks based on its contents and avoids storing duplicate copies of the same chunk. We also explore several optimizations which consider (1) chunks sliced at the block boundary, (2) chunks consisting of all zeroes, and (3) chunks of small size. To examine the impact of each optimization on deduplication rate and memory footprint, we have performed comprehensive experiments on a real SSD platform. Our evaluations with four different workloads show that the proposed scheme improves the average deduplication rate by 77% over the block-level fixed-size chunking scheme. After applying all the optimizations, the average memory consumption is reduced by 39% with an additional gain of 1.4% in the average deduplication rate. This paper makes the following contributions. The idea of content-aware chunking is not new in the deduplication domain. However, it has not been used for SSDs, as file-level information is not available at the device level. We propose a new approach called block-level content-aware chunking which applies content-aware chunking within each block.
Even if the content-aware chunking is performed for each fixed-size block (e.g., 4KB), we find that a significant improvement in deduplication rate can be achieved over the traditional block-level fixed-size chunking. For block-level content-aware chunking, we suggest several optimizations to further improve deduplication rate and memory footprint. We design the firmware architecture and the major data structures required to implement the proposed scheme in SSDs. In particular, we have developed a prototype based on a real SSD platform to demonstrate the feasibility of our approach. We perform extensive experiments with real workloads to quantitatively evaluate the performance of the block-level content-aware chunking and the effect of each optimization. The rest of this paper is organized as follows. Section II provides background information. In Section III, the proposed block-level content-aware chunking scheme is presented with several optimizations. Section IV shows our evaluation results and Section V summarizes the related work. Section VI

concludes the paper.

II. BACKGROUND

A. Flash Translation Layer (FTL)

NAND flash memory is physically organized in flash blocks, and each flash block contains 64–256 pages. The read and write operations take place on a page basis, while the erase operation cleans all the pages within a flash block. The internal NAND flash memory of an SSD is managed by special firmware called the Flash Translation Layer (FTL). Since in-place update is not allowed, the FTL writes each piece of incoming data into clean pages and maintains mapping information between the logical page number (LPN) from the host and the physical page number (PPN) in NAND flash memory. As new data is written, the previous version becomes obsolete, and those obsolete pages are later reclaimed by the garbage collection (GC) procedure. During GC, the FTL selects a victim flash block and converts it into a clean flash block. Before erasing the victim flash block, any valid pages still in the victim block must be copied to other clean pages.

B. Extending the Lifespan of SSDs

SSDs have a limited lifespan as NAND flash memory wears out due to repeated program/erase operations. When the number of erase operations performed on a flash block goes beyond the number guaranteed by the manufacturer, it is not recommended to use the flash block anymore. Even worse, the guaranteed number for each flash block is getting smaller as the manufacturing technology shrinks in size and moves towards more than 2-bit MLC. Thus, extending the lifespan of SSDs is one of the hottest issues in SSDs. Several different approaches have been studied to extend the lifespan of SSDs. The first approach is to develop sophisticated NAND flash management algorithms for the FTL. In many cases, the additional writes caused by GC occupy a large portion of the total write traffic. Therefore, various mapping schemes and GC policies have been proposed to improve the performance and lifetime of SSDs [5], [6]. The second approach is to add a novel wear-leveling algorithm to the FTL.
Wear-leveling algorithms attempt to balance the erase counts over the entire set of flash blocks to prolong the time until one of the flash blocks fails [7], [8]. The final approach is to reduce the amount of actual data written to SSDs via compression [9] or deduplication [2], [3].

C. Deduplication

Deduplication has long been used in backup systems where many copies of very similar or even identical files are stored. In such an environment, a file is divided into a set of chunks and each chunk is compared with already stored chunks. To make chunk comparisons faster, the fingerprint of a chunk, generated by a cryptographic hash function such as MD5 [10] and SHA-1 [11], is used. The probability of having an identical fingerprint for two different chunks is very small, even less than that of hardware bit errors [12]. Although fingerprint collision is not a serious issue in deduplication, the collision can be completely eliminated by a byte-to-byte comparison of chunk contents, as is done in [13].

Fig. 1: Examples of deduplication with various chunking schemes: (a) file-level content-aware chunking (FILE-CAC), (b) block-level fixed-size chunking (BLK-FSC), and (c) block-level content-aware chunking (BLK-CAC).

One of the most important issues in deduplication is the chunking scheme. To save storage space and network bandwidth, many backup systems and network file systems use a chunking scheme such as Rabin fingerprinting [14], where a chunk boundary is declared only when the contents satisfy a specific condition [15], [2]. We call this file-level content-aware chunking (FILE-CAC). An example of FILE-CAC is depicted in Figure 1(a), in which the file A is updated to A' as new data is inserted in the middle of the file. Since the chunks are determined by their contents, only the chunk a'4 is affected by this insertion and the remaining chunks are eligible for deduplication.
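As an illustration, content-defined chunking of the kind used by FILE-CAC can be sketched as follows. This is a minimal sketch, not the paper's implementation: an MD5 hash of a sliding window stands in for a true Rabin fingerprint, and the window size and divisor are assumed values chosen to target roughly 512-byte chunks.

```python
import hashlib

WINDOW = 32    # sliding-window width in bytes (assumed)
DIVISOR = 512  # targets a ~512-byte average chunk size (assumed)

def chunk_block(block: bytes):
    """Split one block into content-defined chunks.

    A chunk ends when a hash of the last WINDOW bytes satisfies a
    divisibility condition, or at the block boundary. MD5 over the
    window stands in for Rabin fingerprinting; it is illustrative,
    not efficient. Because a boundary depends only on the window's
    local contents, boundaries after an insertion realign with the
    original data, which is the property deduplication relies on."""
    chunks, start = [], 0
    for i in range(WINDOW, len(block) + 1):
        window = block[i - WINDOW:i]
        h = int.from_bytes(hashlib.md5(window).digest()[:4], "big")
        if h % DIVISOR == 0:
            chunks.append(block[start:i])
            start = i
    if start < len(block):
        chunks.append(block[start:])  # final chunk ends at the block boundary
    return chunks
```

Concatenating the returned chunks always reproduces the input block, so the scheme changes only how data is indexed, never what is stored.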
Unfortunately, FILE-CAC cannot be directly used for SSDs, as storage devices have no information on which sequence of data blocks belongs to the same file. This is why several deduplication schemes recently proposed for SSDs, such as CAFTL [2] and CA-SSD [3], as well as a commercial SSD controller known to support deduplication [16], all use block-level fixed-size chunking (BLK-FSC), as shown in Figure 1(b). In the same situation as in Figure 1(a), BLK-FSC is able to deduplicate only the first chunk b1, since the original data is shifted by the newly inserted data, making the chunks b'2 and b'3 differ from b2 and b3, respectively. Therefore, it is very difficult for BLK-FSC to achieve a high deduplication rate. To increase the deduplication rate further in SSDs, this paper proposes a novel chunking scheme called block-level content-aware chunking (BLK-CAC), as depicted in Figure 1(c). In the following section, we describe the design and implementation of the proposed scheme in detail.

Fig. 2: Deduplication process in BLK-CAC.

Fig. 3: Mapping and indexing data structures.

III. DESIGN

A. Block-Level Content-Aware Chunking (BLK-CAC)

In the block-level content-aware chunking scheme (BLK-CAC), each block is divided into several chunks according to its contents, as in FILE-CAC. However, since each block is treated separately, similar to BLK-FSC, a chunk cannot span across two different blocks. Therefore, each chunk ends either when its contents satisfy a specific condition or when it meets the block boundary. In Figure 1(c), assume that the file C, whose contents are the same as the file A, is stored in three different blocks of the storage device. Note that the chunk a3 (or a'6) in Figure 1(a) has been split into two chunks c3 and c4 (or c'7 and c'8) in Figure 1(c) due to the block boundary. Even if the new data is inserted into c'5, we can see that the chunks c4, c'6 and c'9 can still be deduplicated after the insertion. In this way, BLK-CAC can deduplicate more chunks than BLK-FSC. BLK-CAC is especially effective when a file is large, as BLK-FSC usually fails to deduplicate the chunks located after the modified region of the file. One downside of BLK-CAC is that the number of chunks is increased compared to BLK-FSC. The larger the number of chunks, the more memory is required, since we need to maintain a fingerprint for each chunk. In Section III-D, we describe several optimizations designed to reduce the number of chunks without a significant impact on the deduplication rate.
B. Deduplication Process

Figure 2 illustrates the overall deduplication process in BLK-CAC, which consists of three major phases: chunking, fingerprinting, and matching. When a write request is issued from the host, its contents are first divided into fixed-size blocks. We use a block size of 4KB, which is identical to the block size of the Ext4 file system. In the chunking phase, each block is split into several variable-sized chunks based on its contents. We use Rabin fingerprinting for the content-aware chunking. If the block boundary is met while applying Rabin fingerprinting, the chunking phase is finished. In the fingerprinting phase, a 160-bit fingerprint is generated for each chunk using the SHA-1 hash function. In the matching phase, the generated fingerprint is compared with the other fingerprints stored in the fingerprint store. If there is a matching fingerprint, the physical address of the matching chunk is retrieved from the fingerprint store and the mapping is adjusted accordingly. Otherwise, the chunk is written to NAND flash memory and the corresponding entry is added to the fingerprint store. As the chunk size is variable and typically smaller than the flash page size, we use a write buffer to coalesce multiple chunks before writing them to NAND flash memory.

C. Data Structures for Deduplication

Figure 3 depicts the major data structures needed to implement deduplication with BLK-CAC. The mapping table for translating logical addresses to physical addresses is composed of the L2P table and L2P elements. When the write buffer is flushed, the number of chunks and the pointer to an L2P element are stored in the corresponding entries of the L2P table. Each L2P element contains the physical flash page number (PPN), the starting offset, the size of each chunk, and pointers to link all the L2P elements that point to the same physical data. For example, Figure 3 shows that LPN #0 has two chunks. From the associated L2P element, we can see that the first chunk is 1024 bytes in size and is located at byte offset 0 of PPN #9.
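The fingerprinting and matching phases can be sketched as follows. This is a minimal in-memory model: SHA-1 fingerprints as in the paper, but the fingerprint store and the "physical" location are simple Python stand-ins for the firmware structures, and the linear allocator is an assumption for illustration only.

```python
import hashlib

fingerprint_store = {}  # SHA-1 digest -> location (stand-in for PPN/offset)
_next_loc = 0           # toy linear space allocator (assumed)

def store_chunk(chunk: bytes):
    """Return (location, was_duplicate) for one chunk.

    A duplicate chunk only adjusts the mapping to the existing copy;
    a new chunk is 'written' (here: assigned the next free location)
    and its 160-bit fingerprint is inserted into the store."""
    global _next_loc
    fp = hashlib.sha1(chunk).digest()
    if fp in fingerprint_store:
        return fingerprint_store[fp], True   # duplicate: no new write
    loc = _next_loc
    _next_loc += len(chunk)
    fingerprint_store[fp] = loc
    return loc, False                        # unique: written to flash
```

Writing the same chunk twice consumes space only once, which is exactly the write-traffic reduction the scheme targets.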
The second chunk of LPN #0 and the first chunk of LPN #2 have been deduplicated, sharing the same data stored at byte offset 0 of PPN #5. In our design shown in Figure 3, the memory space for L2P elements must be allocated at runtime, since the number of chunks in a block is determined only after the chunking phase. In addition, dynamic memory allocation for variable-sized L2P elements has large overheads to manage the additional metadata such as memory offset and size. Therefore, we limit the maximum number of chunks in a block to n (cf. Table I) for ease of memory management. When the number of chunks reaches n while chunking a block, the remaining portion of the block is regarded as one chunk.
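The mapping structures of Figure 3 might be modeled as below. The field names, methods, and the fixed element-array size are illustrative assumptions; the paper pre-allocates space for at most n chunks per block (n from Table I) precisely to avoid dynamic allocation, which the fixed-size list mimics here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_CHUNKS = 12  # assumed value of n; pre-allocation avoids dynamic allocation

@dataclass
class L2PElement:
    """Mapping entry for one chunk (field names are illustrative)."""
    ppn: int      # physical flash page number holding the chunk
    offset: int   # starting byte offset within that page
    size: int     # chunk size in bytes

@dataclass
class L2PEntry:
    """Per-LPN entry: chunk count plus a pre-allocated element array."""
    n_chunks: int = 0
    elements: List[Optional[L2PElement]] = field(
        default_factory=lambda: [None] * MAX_CHUNKS)

    def add_chunk(self, elem: L2PElement) -> None:
        # Once the limit is reached, the firmware would treat the rest
        # of the block as a single chunk; this sketch simply refuses.
        assert self.n_chunks < MAX_CHUNKS
        self.elements[self.n_chunks] = elem
        self.n_chunks += 1
```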

We allocate the memory space for L2P elements in advance and eliminate the overhead of dynamic memory management. The fingerprint store provides the mapping information from fingerprints to their physical addresses. Each entry consists of the fingerprint of a chunk, the number of L2P elements pointing to the chunk, and pointers to link those L2P elements. In Figure 3, the first entry of the fingerprint store has the fingerprint value of 0xABCABC and is shared by two L2P elements. By following the pointers, we can identify that the corresponding chunk is stored in PPN #5 and its size is 3072 bytes. The P2L table and the associated P2L elements are used to find the logical address for a given physical address during GC. Figure 3 depicts an example where PPN #9 holds three chunks and the first chunk is mapped to the first chunk of LPN #0.

D. Optimizations

This subsection presents several optimization techniques for BLK-CAC that can further reduce the number of entries in the fingerprint store and the write traffic to NAND flash memory.

1) Chunks in the Block Boundary: As each unique chunk occupies an entry in the fingerprint store, it is important to reduce the number of entries kept in the fingerprint store to save memory. One possible way is to remove the chunks sliced at the block boundary (called boundary chunks) from the fingerprint store. This is because boundary chunks are formed not based on their contents but due to the presence of a block boundary. Hence, they are very likely to be changed when the contents of a file are shifted by newly inserted or deleted data. For example, we can see that the boundary chunks c'7 and c'8 become different from the chunks c7 and c8, respectively, due to the data inserted into the chunk c'5 in Figure 1(c). This optimization is more effective for large files, since many boundary chunks are affected by a change in the file contents. We should be careful not to discard the first boundary chunk of a file, as this chunk is not affected by changes in the later part of the file.
Because file-level information is not available in the storage device, it is normally impossible to identify whether a boundary chunk is the first chunk of a file. However, there is a way to detect those chunks when the file size is smaller than the block size, using the fact that the file system initializes the unused space of a block with zeroes. More specifically, we use a heuristic that the boundary chunk at the front of a block is not discarded if the last chunk consists of all zeroes.

2) Chunks Consisting of All Zeroes: The chunks consisting of all zeroes (called zero chunks) need not be written to NAND flash, since read operations can be served by filling the buffer with zeroes. There are many such zero chunks, because the unused space within a data block after the end of the file is filled with zeroes in most file systems including Ext4. Thus, if there is a region in the block which is composed of successive zeroes, we treat it as a separate zero chunk. The contents of a zero chunk are neither deduplicated nor written to NAND flash memory. Instead, we simply mark a flag in the corresponding entry of the L2P element.

TABLE I: Parameters used in experiments
  Block size (b): 4096 bytes
  Average chunk size (c): 512 bytes (fixed to b for BLK-FSC)
  Min. size of zero chunks (z): 512 bytes
  Min. size of non-zero chunks (x): 128 bytes
  Max. number of chunks per block (n): 12

One side effect of having zero chunks is that the number of chunks can be increased. If a chunk has a series of zeroes in the middle of it, the chunk is divided into three different chunks, one of which is the zero chunk. Although the zero chunk itself does not consume any entry in the fingerprint store, the fingerprints of the other two chunks must be maintained. We can reduce this side effect by regulating the minimum size of zero chunks. Because small zero chunks do not increase the deduplication rate significantly, we simply discard zero chunks whose sizes are less than a predefined value z (cf. Table I).
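The zero-chunk carving described above can be sketched as follows, with the minimum zero-chunk size z as a parameter. This is a stand-alone illustration: the firmware marks a flag in the L2P element instead of materializing zero bytes, whereas this sketch simply labels each piece.

```python
def split_zero_runs(chunk: bytes, z_min: int = 512):
    """Carve runs of >= z_min zero bytes out of a chunk.

    Returns a list of (is_zero, data) pieces whose concatenation is
    the original chunk. Zero runs shorter than z_min stay embedded in
    their surrounding non-zero piece, mirroring the paper's rule that
    small zero chunks are not worth a separate entry."""
    pieces, i, start, n = [], 0, 0, len(chunk)
    while i < n:
        if chunk[i] == 0:
            j = i
            while j < n and chunk[j] == 0:
                j += 1
            if j - i >= z_min:               # long enough: a zero chunk
                if i > start:
                    pieces.append((False, chunk[start:i]))
                pieces.append((True, chunk[i:j]))
                start = j
            i = j
        else:
            i += 1
    if start < n:
        pieces.append((False, chunk[start:n]))
    return pieces
```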
Due to this special handling of zero chunks, the amount of write traffic to NAND flash memory is reduced noticeably.

3) Chunks of Small Size: When the chunk size is small, the benefit of deduplication is not significant compared to the overhead of keeping the corresponding entry in the fingerprint store. Therefore, we avoid storing the fingerprints of chunks whose sizes are smaller than the predefined threshold x (cf. Table I).

IV. EVALUATION

A. Methodology

We perform experiments on a PC with an Intel Core i7-3550 3.4GHz CPU and 8GB RAM, which is connected to the Jasmine OpenSSD platform [17] via the SATA2 interface. The Jasmine platform is an SSD development board which consists of an Indilinx Barefoot SSD controller, 64MB SDRAM, and two 32GB NAND flash memory modules. For ease of prototyping, we have modified the firmware of the Jasmine platform so that it exposes the native flash read, write, and erase operations, and implemented the BLK-FSC and BLK-CAC schemes as a kernel module in the host system running Linux 2.6.32. The deduplication rate of FILE-CAC is measured using a file-level simulator. The average chunk size in FILE-CAC and BLK-CAC is set to 512 bytes by default. Note that a larger average chunk size results in a lower deduplication rate, while a smaller size causes a larger memory footprint. For BLK-FSC, the chunk size is set to 4KB, which is the same as the block size of the Ext4 file system. To accelerate the experiments, the total capacity of the SSD is configured to 4GB. The parameters and their default values used in our experiments are summarized in Table I.

Fig. 4: A comparison of deduplication rates for different chunking schemes.

Fig. 5: The impact of various optimizations on the deduplication rate.

We use four workloads: APACHE2 [18], EMACS [19], KERNEL [20], and WIKI [21]. APACHE2, EMACS, and KERNEL consist of five (from 2.2.18 to 2.2.22), six (from 23.1 to 24.2), and two (2.6.32.6 and 2.6.34.4) different versions of the source tar files, respectively, which are extracted on the Ext4 file system one by one from the lowest version to the highest version. In WIKI, two large files, which are snapshots of all Wikimedia wikis taken in Jan. and Feb. 2012, are copied to the file system in chronological order.

B. Deduplication Rate

Figure 4 compares the deduplication rate of each chunking scheme. Note that the theoretical bound for the deduplication rate of KERNEL and WIKI is 50%, as only two versions are written to the SSD in these workloads. In Figure 4, we can see that the difference in deduplication rates between FILE-CAC and BLK-CAC is 1%–2% in APACHE2, EMACS, and WIKI. Since FILE-CAC utilizes the file-level information for content-aware chunking, the deduplication rate of FILE-CAC is normally better than that of BLK-FSC or BLK-CAC. In spite of this, it is surprising that BLK-CAC shows a slightly better deduplication rate for KERNEL. This is because there is a large number of zero chunks in KERNEL, and BLK-CAC effectively deduplicates such zero chunks as presented in Section III-D2. Overall, BLK-CAC reduces the total write traffic by 77% on average compared to BLK-FSC. Although almost 90% of the two snapshots of WIKI have the same contents, BLK-FSC fails to deduplicate them, showing a deduplication rate of only 0.4%.
This is because the contents of many fixed-size chunks are affected by small insertions or deletions under BLK-FSC. In the same situation, BLK-CAC achieves a deduplication rate of 23%. Compared to FILE-CAC, the decrease in the deduplication rate of BLK-CAC is due to boundary chunks which are not deduplicated. The results of BLK-CAC include all the optimizations presented in Section III-D with the default parameter values shown in Table I. The effect of each optimization under different parameter values is discussed in Section IV-E.

Fig. 6: The number of entries in the fingerprint store after applying various optimizations (normalized to No-opt).

C. Impact of Each Optimization

To investigate the impact of various optimizations on the deduplication rate, we apply each optimization incrementally. In Figure 5, No-opt represents the baseline BLK-CAC scheme where no optimization is performed. In Boundary, we eliminate boundary chunks from the fingerprint store. Additionally, zero chunks are handled as presented in Section III-D2 in Boundary+Zero. Boundary+Zero+Small represents the case where all the optimizations described in Section III-D are applied. Figure 5 shows that the average deduplication rate drops from 45% to 39% when deduplication is not tried for boundary chunks. However, the special handling of zero chunks recovers the average deduplication rate to 46%. The impact of handling zero chunks is noticeable except for WIKI, because APACHE2, EMACS, and KERNEL generate a lot of zero chunks due to small files. The additional elimination of small chunks hardly affects the deduplication rate. Each optimization has no noticeable impact when the file size is large, as in WIKI. By applying all the optimizations, the deduplication rate is increased by 1.4% on average.

TABLE II: Summary of BLK-CAC results. (F: the total file size; B: the number of 4KB blocks; S_b (or S_a): the total SSD space used before (or after) deduplication; C_b (or C_a): the total number of chunks in the fingerprint store before (or after) optimizations.)

          F (MB)   B        S_b (MB)   S_a (MB)   C_b        C_a
APACHE2   52       49,567   94         76,4       62,45
EMACS     56       56,266   6          278        394,27     25,97
KERNEL    73       232,584  99         58         849,899    585,723
WIKI      2,44     5964     2,58,652   2,836,243,734,652

Fig. 7: The cumulative distribution of the number of zero chunks.

Fig. 8: The cumulative distribution of the amount of space occupied by zero chunks.

D. Memory Footprint

Figure 6 depicts the number of entries in the fingerprint store normalized to the value of No-opt. A smaller number of fingerprints means that less memory is needed for the fingerprint store. We can see that the impact of removing boundary chunks is quite substantial in reducing the required size of the fingerprint store. Getting rid of zero chunks and small chunks from the fingerprint store also helps to save memory, but the effect is rather limited. The overall memory consumption of the fingerprint store is reduced by 39% on average when we apply all the optimizations. Table II summarizes the results of the proposed BLK-CAC scheme with all optimizations enabled. In Table II, the final deduplication rate is given by (S_b − S_a) / S_b.

E. Effect of Parameter Values

The proposed BLK-CAC scheme relies on three parameters, namely, the minimum size of zero chunks (z), the minimum size of non-zero chunks (x), and the maximum number of chunks in a block (n) (cf. Table I). The optimal values of these parameters are important to get a better deduplication rate and a lower memory footprint. This subsection investigates the impact of these parameters on the overall performance.
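The final deduplication rate follows directly from the space used before and after deduplication; a one-line helper makes the arithmetic explicit (the sizes below are hypothetical, not the paper's measurements):

```python
def dedup_rate(s_before: float, s_after: float) -> float:
    """Deduplication rate = (S_b - S_a) / S_b, as defined for Table II."""
    return (s_before - s_after) / s_before

# Hypothetical example: 1000 MB written, 550 MB stored -> a 45% rate.
```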
1) Minimum Size of Zero Chunks (z): When handling zero chunks, BLK-CAC regards a series of consecutive zeroes as a zero chunk only if it is equal to or larger than the minimum zero chunk size, z. (The block size b is usually set to the file system block size. We fix the average chunk size c to 512 bytes, which balances deduplication rate and memory footprint.) A larger value of z results in fewer zero chunks, but it also reduces the deduplication rate. Thus, we examine the relationship between the value of z and the deduplication rate.

Figures 7 and 8 depict the cumulative distributions of the number of chunks and the total amount of space occupied by zero chunks, respectively. We measure the results for APACHE2, EMACS, and KERNEL, as WIKI has only a few zero chunks. From Figure 7, we can see that 97% of zero chunks are smaller than 512 bytes. However, these small zero chunks take only 3% of the total bytes of zeroes, as illustrated in Figure 8. Thus, if we remove small zero chunks, we can eliminate a large number of entries in the fingerprint store while minimizing the impact on the deduplication rate.

Figure 9 presents the changes in the deduplication rate when we vary the value of z from 1 byte to 2048 bytes. The results are normalized to the case where the optimization for zero chunks is not enabled. Until the value of z reaches 512 bytes, the deduplication rate is almost the same as in the case which regards all zeroes as zero chunks (the leftmost point in each line). However, the deduplication rate of APACHE2 starts to fall evidently when the minimum zero chunk size becomes 1024 bytes. Based on Figure 9, we set the value of z to 512 bytes to avoid a decrease in the deduplication rate. In this case, we can increase the deduplication rate significantly while removing a large number of entries corresponding to zero chunks from the fingerprint store.
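The zero-chunk policy above can be sketched as a scan for runs of zero bytes of at least z bytes; shorter runs stay inside ordinary chunks. This is a minimal illustration of the rule as described, not the paper's implementation.

```python
Z_MIN = 512  # minimum zero-chunk size z, the value chosen in this section


def find_zero_chunks(block: bytes, z: int = Z_MIN):
    """Return (offset, length) for each run of zero bytes at least z long.

    Runs of length >= z become zero chunks, which need no entry in the
    fingerprint store because they can be recognized and mapped specially;
    runs shorter than z remain part of ordinary chunks.
    """
    runs, start = [], None
    for i, b in enumerate(block):
        if b == 0:
            if start is None:
                start = i          # a zero run begins here
        elif start is not None:
            if i - start >= z:     # run ended; keep it only if long enough
                runs.append((start, i - start))
            start = None
    if start is not None and len(block) - start >= z:
        runs.append((start, len(block) - start))   # run reaching end of block
    return runs
```

With z = 512, the 97% of zero runs shorter than the threshold are left alone, matching the observation that they account for only a small fraction of the zeroed space.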

Fig. 9: Changes in the deduplication rate according to the minimum zero chunk size (normalized to the results which do not consider zero chunks)

Fig. 10: Changes in the deduplication rate according to the minimum non-zero chunk size (normalized to the results which do not consider small chunks)

2) Minimum Size of Non-Zero Chunks (x): As explained in Section III-D, small chunks make little contribution to the overall deduplication rate. To eliminate as many of these small chunks as possible, we measure the deduplication rate while varying the minimum size of non-zero chunks (x) from 1 byte to 512 bytes. Figure 10 depicts the changes in the deduplication rate, normalized to the case where small chunks are not specially handled. Figure 10 shows that the deduplication rate remains the same until the value of x reaches 128 bytes. When we increase the value of x beyond 128 bytes, the deduplication rate decreases sharply. This means that eliminating small chunks whose sizes are less than 128 bytes from the fingerprint store does not harm the deduplication rate. Therefore, we choose 128 bytes as the default value of x.

3) Maximum Number of Chunks in a Block (n): The total size of the mapping table depends on the maximum number of chunks in a block (n). The cumulative distribution of the number of chunks in a block is depicted in Figure 11. We can observe that the contents of most blocks are divided into fewer than 12 chunks.

Fig. 11: The cumulative distribution of the number of chunks in a block

Fig. 12: Changes in the deduplication rate according to the maximum number of chunks in a block (normalized to the results when there is no such limitation)

Figure 12 illustrates the changes in the deduplication rate while we vary the maximum number of chunks in a block from 16 down to 6. The results are normalized to the case where there is no such limitation. As the value of n decreases, the deduplication rate also diminishes, because the last chunk of a block is not properly deduplicated. There is only about a 2% drop in the deduplication rate as long as the value of n is larger than 12. When we decrease the value of n below 12, the deduplication rate starts to be affected severely. According to these results, it is reasonable to set the default value of n to 12.

V. RELATED WORK

As deduplication can improve the performance and lifespan of SSDs, there have been several approaches to implementing deduplication in SSDs. Berman et al. [22] propose an integrated deduplication-and-write module in the SSD controller. However, they only present the basic idea and the expected benefit through a simple analytical model. Chen et al. [2] suggest CAFTL (Content-Aware Flash Translation Layer) to enhance the endurance of SSDs by removing duplicated writes. To reduce the overhead of garbage collection, they utilize 2-level indirect mapping. They also introduce several techniques to accelerate fingerprinting with a small buffer. Gupta et al. [3] propose CA-SSD (Content-Addressable SSD), which employs deduplication on SSDs to exploit value locality, i.e., the observation that certain contents of data are likely to be accessed preferentially. Thus, CA-SSD improves performance by caching a small number of fingerprints. Kim et al. [23] develop a mathematical model to calculate the efficiency of deduplication in SSDs and implement a prototype on real SSDs. They analyze the trade-offs among design choices for implementing deduplication using various combinations of hardware and software techniques. Lee et al. [24] combine several schemes, such as deduplication, compression, and performance throttling, to maximize the lifetime of SSDs. However, all of the aforementioned previous work employs the block-level fixed-size chunking scheme for deduplication in SSDs. We believe that our proposed block-level content-aware chunking promises a higher deduplication rate and a longer SSD lifespan compared to the conventional approach.

VI. CONCLUSION

We propose a novel deduplication scheme called block-level content-aware chunking to enhance the deduplication rate in SSDs. The baseline version of the proposed scheme enhances the average deduplication rate by 77% compared to the traditional block-level fixed-size chunking, and in some cases even shows a deduplication rate similar to that of file-level content-aware chunking. We also suggest several optimizations to reduce the memory footprint and write traffic. When we apply all the optimizations, the memory consumption required for the fingerprint store is reduced by 39% on average, with a slight increase in the average deduplication rate compared to the baseline version. In particular, we have been very successful in reducing the number of entries kept in the fingerprint store through these optimizations.
Our future work includes studying specialized hardware logic [23] and applying more sophisticated techniques such as pre-hashing and sampling [2] to accelerate the fingerprinting and fingerprint-matching process. We also plan to investigate a more efficient mechanism for managing the mapping information and the fingerprint store for block-level content-aware chunking.

ACKNOWLEDGMENT

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIP) (No. 23RA2AA644). This work was also supported by the IT R&D program of MKE/KEIT (No. 4244, SmartTV 2.0 Software Platform).

REFERENCES

[1] L. Grupp, J. Davis, and S. Swanson, "The bleak future of NAND flash memory," in Proc. USENIX Conference on File and Storage Technologies, 2012.
[2] F. Chen, T. Luo, and X. Zhang, "CAFTL: A content-aware flash translation layer enhancing the life span of flash memory based solid state drives," in Proc. USENIX Conference on File and Storage Technologies, 2011.
[3] A. Gupta, R. Pisolkar, B. Urgaonkar, and A. Sivasubramaniam, "Leveraging value locality in optimizing NAND flash-based SSDs," in Proc. USENIX Conference on File and Storage Technologies, 2011.
[4] "EMC glossary: Data deduplication rate," http://www.emc.com/corporate/glossary/data-deduplication-rate.htm.
[5] S. Lee, D. Park, T. Chung, D. Lee, S. Park, and H. Song, "A log buffer-based flash translation layer using fully-associative sector translation," ACM Transactions on Embedded Computing Systems, vol. 6, no. 3, July 2007.
[6] A. Gupta, Y. Kim, and B. Urgaonkar, "DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings," in Proc. International Conference on Architectural Support for Programming Languages and Operating Systems, 2009, pp. 229-240.
[7] L. Chang, "On efficient wear leveling for large-scale flash-memory storage systems," in Proc. ACM Symposium on Applied Computing, 2007, pp. 1126-1130.
[8] Y. Chang, J. Hsieh, and T. Kuo, "Improving flash wear-leveling by proactively moving static data," IEEE Transactions on Computers, vol. 59, January 2010.
[9] Y. Park and J. Kim, "zFTL: Power-efficient data compression support for NAND flash-based consumer electronics devices," IEEE Transactions on Consumer Electronics, vol. 57, no. 3, pp. 1148-1156, August 2011.
[10] R. Rivest, "The MD5 message-digest algorithm," in Proc. Cryptology Conference on Advances in Cryptology, 1990, pp. 303-311.
[11] "Secure hash standard," Federal Information Processing Standards Publication 180-1, 1995.
[12] A. Muthitacharoen, B. Chen, and D. Mazieres, "A low-bandwidth network file system," in Proc. ACM Symposium on Operating Systems Principles, 2001, pp. 174-187.
[13] L. Freeman, "How safe is deduplication?" http://media.netapp.com/documents/tot68.pdf, NetApp, Tech. Rep., 2008.
[14] M. Rabin, "Fingerprinting by random polynomials," Harvard University, Tech. Rep. TR-15-81, 1981.
[15] I. Papapanagiotou, R. Callaway, and M. Devetsikiotis, "Chunk and object level deduplication for web optimization: A hybrid approach," in Proc. International Communications Conference, 2012.
[16] "SandForce SSDs break TPC-C records," http://semiaccurate.com/2/5/3/sandforce-ssds-break-tpc-records/.
[17] The OpenSSD Project, http://www.openssd-project.org/.
[18] Ubuntu Source Code Repository for Apache2 (versions 2.2.8, 2.2.4, 2.2.7, 2.2.2, and 2.2.22), http://ftp.ubuntu.com/ubuntu/pool/main/a/apache2/.
[19] Ubuntu Source Code Repository for Emacs (versions 23.1, 23.2, 23.3, 23.4, 24.1, and 24.2), http://ftp.ubuntu.com/ubuntu/pool/main/e/emacs23/, http://ftp.ubuntu.com/ubuntu/pool/main/e/emacs24/.
[20] Linux Source Code Repository (versions 2.6.32.6 and 2.6.34.4), http://www.kernel.org/pub/linux/kernel/.
[21] Wikipedia database dumps, http://dumps.wikimedia.org/.
[22] A. Berman, "Integrating de-duplication and write for increased performance and endurance of solid-state drives," in Proc. IEEE Convention of Electrical and Electronics Engineers, 2010, pp. 821-823.
[23] J. Kim, C. Lee, S. Lee, I. Son, J. Choi, S. Yoon, H. Lee, S. Kang, Y. Won, and J. Cha, "Deduplication in SSDs: Model and quantitative analysis," in Proc. International Conference on Mass Storage Systems and Technologies, 2012, pp. 1-12.
[24] S. Lee, T. Kim, J. Park, and J. Kim, "An integrated approach for managing the lifetime of flash-based SSDs," in Proc. Conference on Design, Automation and Test in Europe, 2013, pp. 1522-1525.