I/O Deduplication: Utilizing Content Similarity to Improve I/O Performance




Ricardo Koller                    Raju Rangaswami
rkoll001@cs.fiu.edu               raju@cs.fiu.edu
School of Computing and Information Sciences, Florida International University

Abstract

Duplication of data in storage systems is becoming increasingly common. We introduce I/O Deduplication, a storage optimization that utilizes content similarity for improving I/O performance by eliminating I/O operations and reducing the mechanical delays during I/O operations. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval, and selective duplication. Each of these techniques is motivated by our observations with I/O workload traces obtained from actively-used production storage systems, all of which revealed surprisingly high levels of content similarity for both stored and accessed data. Evaluation of a prototype implementation using these workloads revealed an overall improvement in disk I/O performance of 28-47% across these workloads. Further breakdown also showed that each of the three techniques contributed significantly to the overall performance improvement.

1 Introduction

Duplication of data in primary storage systems is quite common due to the technological trends that have been driving storage capacity consolidation. The elimination of duplicate content at both the file and block levels for improving storage space utilization is an active area of research [7, 17, 19, 22, 30, 31, 41]. Indeed, eliminating most duplicate content is inevitable in capacity-sensitive applications such as archival storage for cost-effectiveness. On the other hand, there exist systems with a moderate degree of content similarity in their primary storage, such as mail servers, virtualized servers, and NAS devices running file and version control servers. In the case of mail servers, mailing lists, circulated attachments, and SPAM can lead to duplication. Virtual machines may run similar software and thus create co-located duplicate content across their virtual disks. Finally, file and version control servers of collaborative groups often store copies of the same documents, sources, and executables. In such systems, if the degree of content similarity is not overwhelming, eliminating duplicate data may not be a primary concern.

Gray and Shenoy have pointed out that, given the technology trends for price-capacity and price-performance of memory/disk sizes and disk accesses respectively, disk data must cool at the rate of 10X per decade [11]. They suggest data replication as a means to this end. An instantiation of this suggestion is the intrinsic replication of data created due to consolidation, as seen now in many storage systems, including the ones illustrated earlier. Here, we refer to intrinsic (or application/user generated) data replication as opposed to forced (system generated) redundancy such as in a RAID-1 storage system. In such systems, capacity constraints are invariably secondary to I/O performance.

We analyzed on-disk duplication of content and I/O traces obtained from three varied production systems at FIU that included a virtualized host running two department web-servers, the department mail server, and a file server for our research group. We made three observations from the analysis of these traces. First, our analysis revealed significant levels of both disk static similarity and workload static similarity within each of these systems. Disk static similarity is an indicator of the amount of duplicate content in the storage medium, while workload static similarity indicates the degree of on-disk duplicate content accessed by the I/O workload. We define these similarity measures formally in Section 2. Second, we discovered a consistent and marked discrepancy between reuse distances [23] for sector and content in the I/O accesses on these systems, indicating that content is reused more frequently than sectors. Third, there is significant overlap in content accessed over successive intervals of longer time-frames such as days or weeks.

Based on these observations, we explore the premise that intrinsic content similarity in storage systems and access to replicated content within I/O workloads can both be utilized to improve I/O performance. In doing so, we design and evaluate I/O Deduplication, a storage optimization that utilizes content similarity to either eliminate I/O operations altogether or optimize the resulting disk head movement within the storage system. I/O Deduplication comprises three key techniques: (i) content-based caching, which uses the popularity of data content rather than data location of I/O accesses in making caching decisions, (ii) dynamic replica retrieval, which, upon a cache miss for a read operation, dynamically chooses to retrieve a content replica that minimizes disk head movement, and (iii) selective duplication, which dynamically replicates frequently accessed content in scratch space distributed over the entire storage medium to increase the effectiveness of dynamic replica retrieval.

We evaluated a Linux implementation of the I/O Deduplication techniques for workloads from the three systems described earlier. Performance improvements, measured as the reduction in total disk busy time, in the range 28-47% were observed across these workloads. We measured the influence of each technique of I/O Deduplication separately and found that each technique contributed substantially to the overall performance improvement. Particularly, content-based caching increased memory caching effectiveness by at least 10% and by as much as 4X in cache hit rate for read operations. Head-position aware dynamic replica retrieval directed I/O operations to alternate locations on-the-fly and additionally reduced average I/O times by 10-20%. And finally, selective duplication created additional replicas of popular content during periods of low foreground I/O activity to further improve the effectiveness of dynamic replica retrieval, leading to a reduction in average I/O times by 23-35%. We also measured the memory and CPU overheads of I/O Deduplication and found these to be nominal.

In Section 2, we make the case for I/O Deduplication. We elaborate on a specific design and implementation of its three techniques in Section 3. We perform a detailed evaluation of improvements and overheads for three different workloads in Section 4. We discuss related research in Section 5, discuss salient design and deployment alternatives in Section 6, and finally conclude with directions for future work.

2 Motivation and Rationale

In this section, we investigate the nature of content similarity and access to duplicate content using workloads from three production systems that are in active, daily use at the FIU Computer Science department. We collected I/O traces downstream of an active page cache from each system for a duration of three weeks. These systems have different I/O workloads that consist of a virtual machine running two web-servers (web-vm workload), an email server (mail workload), and a file server (homes workload). The web-vm workload is collected from a virtualized system that hosts two CS department web-servers, one hosting the department's online course management system and the other hosting the department's web-based email access portal; the local virtual disks that were traced only hosted root partitions containing the OS distribution, while the http data for these web-servers resides on network-attached storage. The mail workload serves user INBOXes for the entire Computer Science department at FIU. Finally, the homes workload is that of an NFS server that serves the home directories of our small-sized research group; activities represent those of a typical researcher, consisting of software development, testing, and experimentation, the use of graph-plotting software, and technical document preparation.

Key statistics related to these workloads are summarized in Table 1.

Workload   FS size   Memory      Reads [GB]                   Writes [GB]                   FS
type       [GB]      size [GB]   Total    Sectors   Content   Total     Sectors   Content   accessed
web-vm     70        2           3.40     1.27      1.09      11.46     0.86      4.85      2.8%
mail       500       16          62.00    29.24     28.82     482.10    4.18      34.02     6.27%
homes      470       8           5.79     2.40      1.99      148.86    4.33      33.68     1.44%

Table 1: Summary statistics of one week of I/O workload traces obtained from three different systems.

The mail server is a heavily used system and generates a highly-intensive I/O workload in comparison to the other two. However, some uniform trends can be observed across these workloads. A fairly small percentage of the total file system data is accessed during the entire week (1.44-6.27% across the workloads), representing small working sets. Further, these are write-intensive workloads. While it is therefore important to optimize write I/O operations, we also note that most writes are committed to persistent storage in the background and do not affect user-perceived performance directly. Optimizing read operations, on the other hand, has a direct impact on user-perceived performance and system throughput because it reduces the waiting time for blocked foreground I/O operations. For read I/Os, we observe that in each workload, the unique content accessed is less than the unique locations that are accessed on the storage device. These observations directly motivate the three techniques of our approach, as we elaborate next.
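To make the distinction between the "Sectors" and "Content" columns of Table 1 concrete, the sketch below shows one way such per-workload statistics could be derived from a block-level trace. The three-field trace record it assumes (operation, sector, content hash per 4 KB block) is a simplification for illustration, not the format of the traces used in this paper.

from collections import defaultdict

BLOCK_SIZE = 4096  # bytes per file-system-aligned block

def summarize(trace):
    """trace: iterable of (op, sector, content_hash), op in {'R', 'W'}."""
    totals = defaultdict(int)                  # bytes transferred per op
    sectors = {'R': set(), 'W': set()}         # unique locations touched
    contents = {'R': set(), 'W': set()}        # unique content accessed
    for op, sector, chash in trace:
        totals[op] += BLOCK_SIZE
        sectors[op].add(sector)
        contents[op].add(chash)
    gb = 2.0 ** 30
    for op in ('R', 'W'):
        print(op, "total %.2f GB" % (totals[op] / gb),
              "sectors %.2f GB" % (len(sectors[op]) * BLOCK_SIZE / gb),
              "content %.2f GB" % (len(contents[op]) * BLOCK_SIZE / gb))

# Example with a toy trace: two reads of different sectors holding identical
# content count once in the "content" column but twice in the "sectors" column.
summarize([('R', 100, 'h1'), ('R', 200, 'h1'), ('W', 300, 'h2')])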

2.1 Content-based cache

The systems of interest in our work are those in which there are patterns of work shared across more than one mechanism within a single system. A mechanism represents any active entity, such as a single thread or process or an entire virtual machine. Such duplicated mechanisms also lead to intrinsic duplication in the content accessed within the respective mechanisms' I/O operations. Duplicate content, however, may be independently managed by each mechanism and stored in distinct locations on a persistent store. In such systems, traditional storage-location (sector) addressed caching can lead to content duplication in the cache, thus reducing the effectiveness of the cache.

Figure 1: Page cache hits for the web-vm (top), mail (middle), and homes (bottom) workloads. A single day trace was used with an infinite cache assumption.

Figure 1 shows that the cache hit ratio (for read requests) can be improved substantially by using a content-addressed cache instead of a sector-addressed one. While write I/Os leading to content hits could also be eliminated for improved performance, we do not explore this in this paper. The greater number of sector hits with write I/Os is due to journaling writes by the file system, which repeatedly overwrite locations within a circular journal space.

For further analysis, we define the average sector reuse distance for a workload as the average number of requests between successive requests to the same sector. The average content reuse distance is defined similarly over accesses to the same content. Figure 2 shows that the average reuse distance for content is smaller than for sector in each of the three workloads that we studied, for both read and write requests. For such workloads, data addressed by content can be cache-resident for less time yet be more effective for servicing read requests than if the same cached data is addressed by location. Write requests, on the other hand, do not depend on cache hits, since data is flushed to, rather than requested from, the storage system. These observations and those from Figure 1 motivate content-based caching in I/O Deduplication.

Figure 2: Contrasting content and sector reuse distances for the web-vm (top), mail (middle), and homes (bottom) workloads.
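The sector and content reuse distances defined above can be computed with a single pass over a request stream. The short sketch below illustrates the computation on a made-up stream of (sector, content hash) requests; it illustrates the definition only and is not the analysis tooling behind Figure 2.

def average_reuse_distance(stream, key_index):
    last_seen, gaps = {}, []
    for position, request in enumerate(stream):
        key = request[key_index]
        if key in last_seen:
            gaps.append(position - last_seen[key] - 1)  # requests in between
        last_seen[key] = position
    return sum(gaps) / len(gaps) if gaps else float('inf')

# Two sectors (10 and 20) hold the same content 'A': the content is revisited
# sooner than either individual sector, so its reuse distance is smaller.
stream = [(10, 'A'), (30, 'B'), (20, 'A'), (40, 'C'), (10, 'A')]
print("sector reuse distance :", average_reuse_distance(stream, 0))
print("content reuse distance:", average_reuse_distance(stream, 1))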

2.2 Dynamic replica retrieval

Systems with intrinsic duplication of mechanisms may also operate on duplicate data stored in the persistent stores managed by each mechanism. Such intrinsic content duplication creates opportunities for optimizing I/O operations.

We define the disk static similarity as the average number of copies per filesystem-aligned block of content, typically of size 4KB, as a formal measure of content similarity in the storage system. The disk static similarity is calculated as (all - zeros)/(unique - 1), where all is the total number of blocks, zeros is the number of zeroed (never-used) blocks, and unique is the number of blocks with unique content (after eliminating duplicates). This static similarity measure includes blocks that are not currently in use by the file system; we include such blocks because they were previously used and may therefore contain the same content as in-use data blocks. Table 2 summarizes static similarity values for each of the three workloads.

                          web-vm   mail   homes
Unique pages (millions)   1.9      27     62
Total pages (millions)    5.2      73     183
Static similarity         2.67     2.64   2.94

Table 2: Disk static similarity. Total pages excludes zero pages; Unique pages excludes repeated pages in addition to zero pages.

We notice that there is substantial duplication of content on the disks used by each of these workloads. In the case of the mail workload, one might expect a higher level of content similarity due to mailing-list emails and circulated attachments appearing in many INBOXes. However, we point out that all mails within a user's INBOX are managed as a single large file by the mail server, and therefore individual mails are less likely to be aligned to the filesystem block size, impacting the disk static similarity measure. Nevertheless, the level of content similarity in these systems is high.

While the presence of substantial duplicate content on each of these systems is promising, it is possible that duplicate content is not accessed frequently in the actual I/O workload. We measured the average number of copies in the storage system for all the blocks read within each of these workloads. We refer to this measure as the workload static similarity. By considering only the on-disk duplicate content pertinent to the workload, we can better estimate the impact of optimizations based on content similarity. To improve the accuracy of our measure, we limit the number of copies of target content; this allows us to prevent a small set of highly replicated content from inflating the workload static similarity value. As shown in Figure 3, the workload static similarity limited to content not repeated more than 1000 times is 2.5. While, on average, more than one copy of each block read is present in the storage system, we note that the disk static similarity values (in Table 2) do overestimate the performance improvement potential.

Figure 3: Workload static similarity. One-day traces were used. The x axis limits the static similarity consideration to blocks which have at most x copies on disk.

Based on these observations, we can hypothesize that for each of these workloads, accesses to data that is duplicated on the storage device can be optimally redirected to the location that minimizes the mechanical overhead of disk I/O operations. This motivates dynamic replica retrieval in our approach.
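As a concrete reading of this definition, the sketch below computes disk static similarity from a list of per-block content hashes. Treating the "- 1" in the denominator as removing the zero-content entry from the unique-content count is our interpretation, and the block hashes are invented for illustration.

def disk_static_similarity(block_hashes, zero_hash='ZERO'):
    all_blocks = len(block_hashes)
    zeros = sum(1 for h in block_hashes if h == zero_hash)
    unique = len(set(block_hashes))          # includes the zero block, if any
    return (all_blocks - zeros) / (unique - 1)

# Eight blocks, two of them zeroed; the six remaining blocks carry only three
# distinct contents, so on average each content has two on-disk copies.
blocks = ['a', 'a', 'b', 'b', 'c', 'c', 'ZERO', 'ZERO']
print(disk_static_similarity(blocks))  # -> 2.0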
2.3 Selective Duplication

A third property of workloads is repeated access to the same content. Here, we refer to accesses to specific content, which is a different measure than repeated access to the same block address. To illustrate this difference, accesses to two copies of the same executable stored within two virtual disks owned by distinct virtual machines do not lead to repeated access to the same block, but do result in repeated access to the same content.

Figure 4: Content working-sets for the three-week traces. The trace duration is divided into seven 3-day intervals, and the read content overlap of each interval with all content from the previous interval is presented.

In Figure 4, we illustrate the overlap in content being accessed across time for each of the workloads using traces over a longer, three-week duration. More specifically, we divide the three-week trace duration into seven 3-day intervals and measure the overlap in content read (thus, we exclude writes) within each interval with all data accessed (both read and written) in the previous interval. The first 3-day interval uses self-similarity and therefore represents a 100% content overlap. For the remaining intervals we observe high levels of overlap in the content being read within each interval with all data accessed during the previous interval; average overlaps are 45%, 85%, and 60% for the mail, web-vm, and homes workloads respectively. Based on these observations, we can assume that if data accessed in the recent past were replicated in locations dispersed across the disk area, the choice in access provided by such replicas for future I/O operations can help reduce disk arm movement and improve I/O performance. Complementary findings about diurnal patterns in I/O workloads with alternating periods of low and high storage activity [8, 20] suggest that such selective duplication, if performed opportunistically during night-time, may result in negligible impact on foreground I/O activity.
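The overlap measure behind Figure 4 is a straightforward set computation over content hashes; a small sketch with made-up interval data follows.

def content_overlap(read_now, accessed_before):
    return 100.0 * len(read_now & accessed_before) / len(read_now)

intervals = [
    {'read': {'a', 'b', 'c'}, 'written': {'x'}},
    {'read': {'a', 'b', 'd'}, 'written': {'y'}},
    {'read': {'a', 'd', 'y'}, 'written': set()},
]
for prev, cur in zip(intervals, intervals[1:]):
    accessed_prev = prev['read'] | prev['written']
    print("%.0f%% of content read was accessed in the previous interval"
          % content_overlap(cur['read'], accessed_prev))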

3 System Design

I/O Deduplication systematically explores the use of content similarity within storage systems to reduce the mechanical delays incurred in I/O operations and/or to eliminate I/O operations altogether. In this section, we start with an overview of the system architecture and then present the various design choices and rationale behind constructing each of the three mechanisms that constitute I/O Deduplication.

3.1 Architectural Overview

An optimization based on content similarity can be built at various layers of the storage stack, with varying degrees of access and control over storage devices and the I/O workload. Prior research has argued for building storage optimizations in the block layer of the storage stack [12]. We choose the block layer for several reasons. First, the block interface is a generic abstraction that is available in a variety of environments including operating system block device implementations, software RAID drivers, hardware RAID controllers, SAN (e.g., iSCSI) storage devices, and the increasingly popular storage virtualization solutions (e.g., IBM SVC [16], EMC Invista [9], NetApp V-Series [28]). Consequently, optimizations based on the block abstraction can potentially be ported and deployed across these varied platforms. In the rest of the paper, we develop an operating system block device oriented design and implementation of I/O Deduplication. Second, the simple semantics of the block layer interface allow easy I/O interception, manipulation, and redirection. Third, by operating at the block layer, the optimization becomes independent of the file system implementation, and can support multiple instances and types of file systems. Fourth, this layer enables simplified control over system devices at the block device abstraction, allowing an elegantly simple implementation of the selective duplication that we describe later. Finally, additional I/Os generated by I/O Deduplication can leverage I/O scheduling services, thereby automatically addressing the complexities of block request merging and reordering.

Figure 5 presents the architecture of I/O Deduplication for a block device in relation to the storage stack within an operating system.

Figure 5: I/O Deduplication System Architecture. The I/O Deduplication layer and its three new components (content-based cache, dynamic replica retriever, and selective duplicator) sit between the file system (e.g., EXT3, JFS) and the I/O scheduler in the storage stack of Applications, VFS, Page Cache, File System, I/O Scheduler, and Device Driver.

We augment the storage stack's block layer with additional functionality, which we term the I/O Deduplication layer, to implement the three major mechanisms: the content-based cache, the dynamic replica retriever, and the selective duplicator. The content-based cache is the first mechanism encountered by the I/O workload, which filters the I/O stream based on hits in a content-addressed cache. The dynamic replica retriever subsequently and optionally redirects the unfiltered read I/O requests to alternate locations on the disk to avail the best access latencies to requests. The selective duplicator is composed of a kernel sub-component that tracks content accesses to create a candidate list of content for replication, and a user-space process that runs during periods of low disk activity and populates replica content in scratch space distributed across the entire disk. Thus, while the kernel components run continuously, the user-space component runs sporadically. Separating out the actual replication process into a user-level thread allows greater user/administrator control over the timing and resource consumption of the replication process, an I/O resource-intensive operation. Next, we elaborate on the design of each of the three mechanisms within I/O Deduplication.
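The overall read path through this layer can be summarized in a few lines. The sketch below is a deliberately simplified user-space model of the control flow just described (cache lookup, then replica choice relative to an estimated head position, then dispatch); every name in it is hypothetical, and the redirection policy is refined in Section 3.3.

sector_cache = {}                 # sector -> cached block data
replica_map = {100: [5000, 9000]} # sector -> alternate on-disk replicas
head_position = 4800              # last completed read position (estimate)

def dispatch_read(sector):
    """Stand-in for queuing a request at the I/O scheduler and reading disk."""
    return "data@%d" % sector

def dedup_read(sector):
    if sector in sector_cache:                      # I/O eliminated entirely
        return sector_cache[sector]
    candidates = [sector] + replica_map.get(sector, [])
    target = min(candidates, key=lambda s: abs(s - head_position))
    data = dispatch_read(target)                    # dynamic replica retrieval
    sector_cache[sector] = data                     # populate cache on return
    return data

print(dedup_read(100))   # served from the replica at sector 5000
print(dedup_read(100))   # second access hits the cache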
3.2 Content based caching

Building a content based cache at the block layer creates an additional buffer cache separate from the virtual file system (VFS) cache. Requests to the VFS cache are sector-based while those to the I/O Deduplication cache are both sector- and content-based. The I/O Deduplication layer only sees the read requests for sector misses in the VFS cache. We discuss exclusivity across these caches shortly.

In the I/O Deduplication layer, read requests identified by sector locations are queried against a dual sector- and content-addressed cache for hits before entering the I/O scheduler queue or being merged with an existing request by the I/O scheduler. Population of the content-based cache occurs along both the read and write paths. In case of a cache miss during a read operation, the I/O completion handler for the read request is intercepted and modified to additionally insert the data read into the content-addressed cache after I/O completion, only if it is not already present in the cache and is important enough in the LRU list to be cached. A write request to a sector which had contained duplicate data is simply removed from the corresponding duplicate sector list to ensure data consistency for future accesses. The new data contained within write requests is optionally inserted into the content-addressed cache (if it is sufficiently important) in the onward path, before entering the request into the I/O scheduler queue, to keep the content cache up-to-date with important data.

The in-memory data structure implementing the content-based cache supports look-up based on both sector and content-hash, to address read and write requests respectively. Entries indexed by content-hash values contain a sector-list (the list of sectors in which the content is replicated) and the corresponding data if it was entered into the cache and not replaced. Cache replacement only replaces the content field and retains the sector-list in the in-memory content-cache data structure. For read requests, a sector-based lookup is first performed to determine if there is a cache hit. For write requests, a content-hash based look-up is performed to determine a hit, and the sector information from the write request is added to the sector-list. Figure 6 describes the data structure used to manage the content-based cache. On a write to a sector that is present in a sector-list indexed by content-hash, the sector is simply removed from that sector-list and inserted into a new list based on the sector's new content hash. It is important to also point out that our design uses a write-through cache to preserve the semantics of the block layer.

Figure 6: Data structure for the content-based cache. Sector-to-hash and digest-to-hash functions map sectors to entries (vc_entry: {sector, digest, state}) and MD5 digests to pages (vc_page: {data, refs count}). The cache is addressable by both sector and content hash. vc_entrys are unique per sector. Solid lines between vc_entrys indicate that they may have the same content (they may not, in case of hash function collisions). Dotted lines form a link between a sector (vc_entry) and a given page (vc_page). Note that some vc_entrys do not point to any page; there is no cached content for these. However, this indicates that the linked vc_entrys have the same data on disk. This happens when some of the pages are evicted from the cache. Additionally, pages form an LRU list.

Next, we discuss some practical considerations for our design. Since the content cache is a second-level cache placed below the file system page cache or, in the case of a virtualized environment, within the virtualization mechanism, recency patterns typically observed in first-level caches are lost at this caching layer. An appropriate replacement algorithm for this cache level is therefore one that captures frequency as well. We propose using Adaptive Replacement Cache (ARC) [24] or CLOCK-Pro [18] as good candidates for a second-level content-based cache and evaluate our system with ARC and LRU for contrast. Another concern is that there can be a substantial amount of duplicated content across the cache levels. There are two ways to address this. Ideally, the content-based cache should be integrated into a higher level cache (e.g., VFS page cache) implementation if possible. However, this might not be feasible in virtualized environments where page caches are managed independently within individual virtual machines. In such cases, techniques that help make in-memory cache content across cache levels exclusive, such as cache hints [21], demotions [38], and promotions [10], may be used. An alternate approach is to employ memory deduplication techniques such as those proposed in the VMware ESX server [36], Difference Engine [13], and Satori [25]. In these solutions, duplicate pages within and across virtual machines are made to point to the same machine frame with the use of an extra level of indirection such as the shadow page tables. In-memory duplicate content across multiple levels of caches is indeed an orthogonal problem and any of the referenced techniques could be used as a solution directly within I/O Deduplication.
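A minimal user-space model of such a dual-indexed, write-through cache is sketched below. The class and field names are ours and the structure is far simpler than the in-kernel vc_entry/vc_page design (no replacement policy, no hash-bucket sizing); it only illustrates the sector/content indexing and the handling of writes described above.

import hashlib

class ContentCache:
    def __init__(self):
        self.by_sector = {}   # sector -> digest
        self.by_digest = {}   # digest -> {'sectors': set(), 'data': bytes or None}

    def _digest(self, data):
        return hashlib.md5(data).hexdigest()

    def read_lookup(self, sector):
        """Return cached data for a sector, or None on a miss."""
        entry = self.by_digest.get(self.by_sector.get(sector))
        return entry['data'] if entry else None

    def insert_after_read(self, sector, data):
        """Populate the cache from a completed read."""
        digest = self._digest(data)
        self.by_sector[sector] = digest
        entry = self.by_digest.setdefault(digest, {'sectors': set(), 'data': None})
        entry['sectors'].add(sector)
        entry['data'] = data

    def on_write(self, sector, data):
        """Write-through update: move the sector to its new content's list."""
        old = self.by_sector.get(sector)
        if old is not None:
            self.by_digest[old]['sectors'].discard(sector)
        self.insert_after_read(sector, data)

cache = ContentCache()
cache.insert_after_read(100, b'hello')
cache.on_write(200, b'hello')              # same content at another sector
print(cache.read_lookup(200))              # b'hello' served without touching disk
print(sorted(cache.by_digest[cache._digest(b'hello')]['sectors']))   # [100, 200]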
3.3 Dynamic replica retrieval

The design of dynamic replica retrieval is based on the rationale that better I/O schedules can be constructed with more options for servicing I/O requests. A storage system with high disk static similarity (i.e., duplicated content) creates such options naturally. With dynamic replica retrieval in such a system, read I/O requests are optionally indirected to alternate locations before entering the I/O scheduler queue. Choosing alternate locations for write requests is complicated due to the need for ensuring up-to-date block content; while we do not consider this possibility further in our work, investigating alternate mechanisms for optimizing write operations to utilize content similarity is certainly a promising area of future work. The content-addressed cache data structure that we explored earlier supports look-up based on sector (contained within a read request) and returns a sector-list that contains the replicas of the requested content, thus providing alternate locations to retrieve the data from.

To help decide if and to where a read I/O request should be redirected, the dynamic replica retriever continuously maintains an estimate of the disk head position by monitoring I/O completion events. For estimating head position, we use read I/O completion events only and ignore I/O completion events for write requests, since writes may be reported as complete as soon as they are written to the disk cache. Consequently, the head position as computed by the dynamic replica retriever is an approximation, since background write flushes inside the disk are not accounted for. To implement the head-position estimator, the last head position is updated during the execution of the I/O completion handler of each read request. Additionally, the direction of the disk arm managed by the scheduler is also maintained for elevator-based I/O schedulers.

One complication with redirection of an I/O request before a possible merge operation (done by the I/O scheduler later) is that this optimization can reduce the chances of merging the request with another request already awaiting service in the I/O scheduler queue. For each of the workloads we experimented with, we did indeed observe a reduction in merging that negatively affected performance when using redirection based purely on current head-position estimates. Request merging should gain priority over any other operation since it eliminates mechanical overhead altogether. One means to prioritize request merging is performing the indirection of requests below the I/O scheduler, which performs merging within its mechanisms. Although this is an acceptable and correct solution, it is substantially more complex compared to an implementation at the block layer above the I/O scheduler, because there are typically multiple dispatch points for I/O scheduler implementations inside the operating system. The second option, and the one used in our system, is to evaluate whether or not to redirect the I/O request to a more opportune location based on an actively maintained digest of outstanding requests at the I/O scheduler; these are requests that have been dispatched to the I/O scheduler but not yet reported as completed by the device. If an outstanding request to a location adjacent to the current request exists in the digest, redirection is avoided to allow for merging.
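The sketch below illustrates this merge-aware redirection decision: keep the original location when an adjacent outstanding request could be merged, otherwise pick the candidate closest to the estimated head position, with a simple penalty for targets lying against the elevator direction. The cost model and the adjacency threshold are our own simplifications, not the in-kernel policy.

def choose_target(sector, replicas, head_pos, direction, outstanding,
                  adjacency=8):
    # 1. Preserve merging: an adjacent outstanding request wins outright.
    if any(abs(o - sector) <= adjacency for o in outstanding):
        return sector
    # 2. Otherwise estimate seek cost for the original location and replicas,
    #    penalizing targets that lie against the elevator direction.
    def seek_cost(target):
        distance = (target - head_pos) * direction
        return distance if distance >= 0 else abs(distance) * 2
    return min([sector] + replicas, key=seek_cost)

outstanding = {1204}          # dispatched but not yet completed
print(choose_target(1200, [9000], head_pos=8900, direction=+1,
                    outstanding=outstanding))   # 1200: keep the merge
print(choose_target(1200, [9000], head_pos=8900, direction=+1,
                    outstanding=set()))         # 9000: closest to the head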
Our analysis of th workloads rvald that th contnt ovrlap btwn th most frquntly usd contnt of th prviousdayswasfoundtobagoodprdictoroffutur accsss to contnt. Th slctiv duplicator krnl componnt calculats th list of frquntly usd contnt across multipl days by xtnding th ARC rplacmnt algorithm usd for th contnt-addrssd cach. Alistofsctorstoduplicatisthnforwarddtoth usr-spac rplicator procss which crats th actual rplicas during priods of low activity. Th priodic natur of this procss nsurs that th most rlvant contnt is rplicatd in th scratch spac whil oldr rplicasofcontntthathavithrbnovrwrittnorarno longr important ar discardd. To mak th rplication procss samlss to fil systm, w implmntd trans- 7

To make the replication process seamless to the file system, we implemented transparent replica management, which implements the scratch space used to store replicas transparently. The scratch space is provisioned by creating additional physical storage volumes/partitions interspersed within the file system data. Figure 7 depicts the transparent replica management wherein the storage is interspersed with five scratch space volumes placed between file system mapped space. For file system transparency, a single logically contiguous volume is presented to the file system by the I/O Deduplication extension. The scratch space is used to create one or more replicas of data in the exported space. Since the I/O operations issued during the selective duplication process are themselves routed via the in-kernel I/O Deduplication components, the additional content similarity information due to replication is automatically recorded into the content cache.

Figure 7: Transparent replica management for selective duplication. The read request to the solid block in the exported space can either be retrieved from its original location in the mapped space or from any of the replicas in the scratch space that reduce head movement.

3.5 Persistence of metadata

A final issue is the persistence of the in-memory data structure, so that the system can retain intelligence about content similarity across system restart operations. Persistence is important for retaining the locations of on-disk intrinsic and artificially created duplicate content so that this information can be restored and used immediately upon a system restart event. We note that while persistence is useful to retain intelligence that is acquired over a period of time, continuous persistence of metadata in I/O Deduplication is not necessary to guarantee the reliability of the system, unlike other systems such as the eager writing disk array [40] or doubly distorted mirroring [29]. In this sense, selective duplication is similar to the opportunistic replication performed by FS2 [15] because it tracks updates to replicated data in memory and only guarantees that the primary copy of data blocks is up-to-date at any time. While persistence of the in-memory data is not implemented in our prototype yet, guaranteeing such persistence is relatively straightforward. Before the I/O Deduplication kernel module is unloaded (occurring at the same time the managed file system is unmounted), all in-memory data structure entries can be written to a reserved location of the managed scratch space. These can then be read back to populate the in-memory metadata upon a system restart operation, when the kernel module is loaded into the operating system.
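This save-and-restore step can be summarized in a few lines. The sketch below uses an ordinary file and JSON serialization as a stand-in for writing the sector-to-digest entries to the reserved scratch-space location; the format and names are our assumptions, since the prototype does not yet implement persistence.

import json

def save_metadata(entries, path='dedup_metadata.json'):
    """entries: dict mapping sector -> content digest."""
    with open(path, 'w') as f:
        json.dump({str(sector): digest for sector, digest in entries.items()}, f)

def load_metadata(path='dedup_metadata.json'):
    with open(path) as f:
        return {int(sector): digest for sector, digest in json.load(f).items()}

entries = {100: 'aabb', 5000: 'aabb', 9000: 'ccdd'}
save_metadata(entries)
print(load_metadata() == entries)   # True: the similarity map survives a restart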
4 Experimental Evaluation

In this section, we evaluate each mechanism in I/O Deduplication separately first and then evaluate their cumulative performance impact. We also evaluate the CPU and memory overheads incurred by an I/O Deduplication system. We used the block level traces for the three systems that were described in detail in Section 2 for our evaluation. The traces were replayed as block traces in a similar way as done by blktrace [2]. Blktrace could not be used as-is since it does not record content information; we used a custom Linux kernel module to record content hashes for each block read/written, in addition to other attributes of each I/O request. Additionally, the blktrace tool btreplay was modified to include traces in our format and replay them using the provided content. Replay was performed at a maximum acceleration of 100x, with care being taken in each case to ensure that block access patterns were not modified as a result of the speedup. Measurements of actual disk I/O times were obtained with per-request block-level I/O tracing using blktrace and the results reported by it. Finally, all trace playback experiments were performed on a single Intel(R) Pentium(R) 4 CPU 2.00GHz machine with 1GB of memory and a Western Digital disk WD5000AAKB-00YSA0, running Ubuntu Linux 8.04 with kernel 2.6.20.

4.1 Content based cache

In our first experiment, we evaluated the effectiveness of a content-addressed cache against a sector-addressed one. The primary difference in implementation between the two is that for the sector-addressed cache, the same content for two distinct sectors will be stored twice. We fixed the cache size in both variants to one of two different sizes, 1000 pages (4MB) and 50000 pages (200MB). We replayed two weeks of the traces for each of the three workloads; the first week warmed up the cache and measurements were taken during the second week.

Figure 8: Per-day page cache hit ratio for content- and sector-addressed caches for read operations. The total numbers of pages read are 0.18, 2.3, and 0.23 million respectively for the web-vm, mail, and homes workloads. The numbers in the legend next to each type of addressing represent the cache size.

Figure 8 shows the average per-day cache hit counts for read I/O operations during the second week when using an adaptive replacement cache (ARC) in two modes, content and sector addressed. This experiment shows that there is a large increase in per-day cache hit counts for the web-vm and the homes workloads when a content-addressed cache is used (relative to a sector-addressed cache).

The first observation is that improvement trends are consistent across the two cache sizes. Both cache implementations benefit substantially from a larger cache size except for the mail workload, indicating that mail is not a cache-friendly workload, as validated by its substantially larger working set and workload I/O intensity (as observed in Section 2). The web-vm workload shows the biggest increase, with an almost 10X increase in cache hits with a cache of 200MB, compared to the homes workload which has an increase of 4X. The mail workload has the least improvement of approximately 10%.

Figure 9: Comparison of ARC and LRU content based caches for page read only (top) and page read/write operations (bottom). A single day trace (0.18 million page reads and 2.09 million page reads/writes) of the web workload was used as the workload.

We performed additional experiments to compare an LRU implementation with the ARC cache implementation (used in the previous experiments) using a single day trace of the web-vm workload. Figure 9 provides a performance comparison of both replacement algorithms when used for a content-addressed cache. For small and large cache sizes, we observe that ARC is either as good as or more effective than LRU, with ARC's improvement over LRU increasing substantially for write operations at small to moderate cache sizes. More generally, this experiment suggests that the performance improvements for a content-addressed cache are sensitive to the cache replacement mechanism, which should be chosen with care.

4.2 Dynamic replica retrieval

To evaluate the effectiveness of dynamic replica retrieval, we replayed a one week trace for each workload with and without using I/O Deduplication. When using I/O Deduplication, prior to replaying the trace workload, information about duplicates was loaded into the kernel module's data structures, as would have been accumulated by I/O Deduplication over the lifetime of all data on the disk. Content-based caching and selective duplication were turned off. In each case, we measured the per-request disk I/O time. A lower per-request disk I/O time indicates a more efficient storage system.

Figure 10: Improvement in disk read I/O times with dynamic replica retrieval. Box and whisker plots depicting median and quartile values of the per-request disk I/O times are shown. For each workload, the values to the left represent the vanilla system and those on the right are with dynamic replica retrieval.

Figure 10 shows the results of this experiment. For all the workloads there is a decrease in median per-request disk I/O time of at least 10%, and of up to 20% for the homes workload. These findings indicate that there is room for optimizing I/O operations simply by using pre-existing duplicate content on the storage system.

4.3 Selective duplication

Given the improvements offered by dynamic replica retrieval, we now evaluate the impact of selective duplication, a mechanism whose goal is to further increase the opportunities for dynamic replica retrieval. The workloads and metric used for this experiment were the same as the ones in the previous experiment. To perform selective duplication, for each workload, ten copies of the predicted popular content were created on scratch space distributed across the entire disk drive. The set of popular data blocks to replicate is determined by the kernel module during the day and exported to user space after a time threshold is reached. A user space program logs the information about the popular content that is a candidate for selective duplication and creates the copies on disk, based on the information gathered, during periods of little or no disk activity. As in the previous experiment, prior to replaying the trace workload, all the information about duplicates on disk was loaded into the kernel module's data structures.

Figure 11: Improvement in disk read I/O times with selective duplication and dynamic replica retrieval optimizations. Other details are the same as in Figure 10.

Figure 11 (when compared with the numbers in Figure 10) shows how selective duplication improves upon the previous results that used pure dynamic replica retrieval. Figure 4 showed that the web workload had more than 80% content reuse overlap, and the effect of duplicating this information can be observed immediately. Overall, the reduction in per-request disk I/O time improved substantially for the web-vm and homes workloads, and to a lesser extent for the mail workload, when using this additional technique compared to using dynamic replica retrieval alone. Overall reductions in median disk I/O times when compared to the vanilla system were 33% for the web workload, 35% for the homes workload, and 23% for mail.

4.4 Putting it all together

We now examine the impact of using all three mechanisms of I/O Deduplication at once for each workload. We use a sector-addressed cache for the baseline vanilla system and a content-addressed one for I/O Deduplication. We set the cache size to 200MB in both cases. Since sector- or content-based caching is the first mechanism encountered by the I/O request stream, the results of the caching mechanism remain unaffected by the other two, and the cache hit counts remain as with the independent measurements reported in Section 4.1. However, cache hits do modify the request stream presented to the remaining two optimizations. While there is a reduction in the improvements to per-request disk read I/O times with all three mechanisms (not shown) when compared to using the combination of dynamic replica retrieval and selective duplication alone, the total number of I/O requests is different in each case. Thus the average disk I/O time is not a robust metric to measure relative performance improvement. The total disk read I/O time for a given I/O workload, on the other hand, provides an accurate comparative evaluation by taking into account both the reduced number of read I/O operations due to content-based caching and the improvements in disk latencies of the latter two optimizations, and effectively measures the true increase in disk I/O efficiency.

Workload   Vanilla (sec)   I/O Dedup (sec)   Improvement
web-vm     3098.61         1641.90           47%
mail       4877.49         3467.30           28%
homes      1904.63         1160.40           39%

Table 3: Reduction in total disk read I/O times.

When comparing total disk read I/O times for these three workloads, substantial reductions were observed when compared to a vanilla system, as shown in Table 3. These uniformly large improvements (28-47% across the three workloads) are a clear indication of the effectiveness of I/O Deduplication in improving I/O performance for a range of different storage workloads.

4.5 Evaluating Overhead

While the gains due to I/O Deduplication are promising, it incurs resource overhead. Specifically, the implementation uses content- and sector-addressed hash tables to simplify lookup and insert operations into the content based cache. We evaluate the CPU overhead of insert/lookup operations and the memory overhead required for managing hash-table metadata in I/O Deduplication.

4.5.1 CPU Overhead

To evaluate the overhead of I/O Deduplication, we measured the average number of CPU cycles required for lookup/insert operations as we vary the number of unique pages (i.e., the size) in the content-based cache for a day of the web workload.

Figure 12: Overhead of content and sector lookup operations with increasing size of the content-based cache.

Figure 12 depicts these overheads for two cache configurations, one configured with 2^25 buckets in the hash tables and the other with 2^5 buckets. Read operations perform a sector lookup and additionally a content lookup, in case of a miss, for insertion.

Write operations always perform a sector and a content lookup due to our write-through cache design. Content lookups need to first compute the hash for the page contents, which takes around 100000 CPU cycles for MD5. With few buckets (2^5), lookup times approach O(N), where N is the size of the hash table. However, given enough hash-table buckets (2^25), lookup times are O(1).

Figure 13: Overhead of sector and content lookup operations with increasing hash-table bucket entries.

Next, we examined the sensitivity to the hash-table bucket entries. As the number of buckets is increased, the lookup times decrease as expected due to the reduction in collisions, but beyond 2^20 buckets, there is an increase. We attribute this to L2 cache and TLB misses due to memory fragmentation, underscoring that hash-table bucket sizes should be configured with care. In the sweet spot of bucket entries, the lookup overhead for both sector and content reduces to 1K CPU cycles, or less than 1 µs for our 2GHz machine. Note that the content lookup operation includes a hash computation, which inflates its cycle requirement by at least 100000.

4.6 Memory Overhead

The management of I/O Deduplication's content-based cache introduces memory overhead for managing metadata for the content-based cache. Specifically, the memory overhead is dictated by the size of the cache measured in pages (P), the degree of workload static similarity (WSS), and the configured number of buckets in the hash tables (HTB), which also determines the lookup time as we saw earlier. In our current unoptimized implementation, the memory overhead in bytes (assuming 4-byte pointers and 4096-byte pages) is:

    mem(P, WSS, HTB) = 13*P + 36*P*WSS + 8*HTB    (1)

These overheads include 13 bytes per page to store the metadata for a specific page content (vc_page), 36 bytes per page per duplicated entry (vc_entry), and 8 bytes per hash-table entry for the corresponding linked list. For a 1GB content cache (256K pages), a static similarity of 4, and a hash table of 1 million entries, the metadata overhead is 48MB, or approximately 4.6%.
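As a quick check of Equation (1), the sketch below plugs in the configuration quoted above; the helper name is ours.

def metadata_overhead_bytes(P, WSS, HTB):
    return 13 * P + 36 * P * WSS + 8 * HTB

P, WSS, HTB = 256 * 1024, 4, 1_000_000     # 1GB cache of 4KB pages, WSS 4, 1M buckets
overhead = metadata_overhead_bytes(P, WSS, HTB)
cache_bytes = P * 4096
print("%.1f MB, %.1f%% of the cache" %
      (overhead / 1e6, 100.0 * overhead / cache_bytes))
# Prints roughly 49 MB and 4.6%, close to the 48MB / 4.6% quoted in the text.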
5 Related Work

In this section, we examine research literature related to workload-based I/O performance optimization and research related to the use of content similarity in memory and storage systems. While there is substantial work along both these directions, they are for the most part explored as orthogonal techniques in the literature, with the latter primarily being used for optimizing storage capacity utilization via data deduplication.

5.1 I/O performance optimization

Workload-based I/O performance optimization has a long history. The first class of optimizations is based on creating optimized layouts for storage system data. It includes the early works of Wong [37], Vongsathorn et al. [35], and Ruemmler and Wilkes [32], which argued for shuffling on-disk data based on data access frequency. Later, Akyurek and Salem [1] argued for copying over shuffling of data, with the observation that original layouts are often useful and data popularity and access patterns can be temporary. More recently, ALIS [14] and BORG [3] have employed a dedicated, reorganized area on the disk to improve both locality and sequentiality of I/O access.

The second class of work is based on replicating data and creating opportunities for reducing disk head movement by increasing the number of choices for retrieving data. These include the large body of work on mirroring systems [4]. The work on doubly distorted mirrors [33] creates multiple replicas on master and slave disks to increase both write performance (using initial write-anywhere and background updating of original locations) and read performance by dispatching read requests to the nearest free arm. Zhang et al.'s work on eager writing [40] extended this approach to mirrored/striped RAID configurations, primarily for database OLTP workloads (which are characterized by little locality or sequentiality). Yu et al. [39] propose an alternate approach for trading disk capacity for performance in a RAID system, by storing several rotational replicas of each block and using a rotational-latency-sensitive disk scheduler. FS2 [15] proposed replication in file system free space based on block-access frequency and the use of such selective duplication of content to optimize head movement during subsequent retrieval of replicated data. Quite obviously, selective duplication is motivated by the above works, but is different in two respects: (i) it targets identifying replication candidates based on content popularity, rather than block address popularity, and (ii) duplication is performed in pre-configured dedicated space transparently to the file system and/or other managers of the storage system. To the best of our knowledge, the only work to use content-based optimization of I/O is the work of Tolia et al. [34], where the authors use content hashes to perform dynamic replica retrieval, choosing between multiple hosts in an extrinsically-duplicated distributed storage system. Our work, on the other hand, uses intrinsic duplication within a single storage system.

5.2 Data deduplication

Content similarity in both memory and archival storage has been investigated in the literature. Memory deduplication has been explored before in the VMware ESX server [36], Difference Engine [13], and Satori [25], each aiming to eliminate duplicate in-memory content both within and across virtual machines sharing a physical host. Of these, Satori has apparent similarities to our work because it identifies candidates for in-memory deduplication as data is read from storage. Satori runs in two modes: content-based sharing and copy-on-write disk sharing. For content-based sharing, Satori uses content hashes to track page contents in memory read from disk. Since its goal is not I/O performance optimization, it does not track duplicate sectors on disk and therefore does not eliminate duplicated I/Os that would read the same content from multiple locations. In copy-on-write disk sharing, the disk is already configured to be copy-on-write, enabling the sharing of multiple VM disk images on storage. In this mode, duplicated I/Os due to multiple VMs retrieving the same sectors on the shared physical disk would be eliminated in the same way as a regular sector-addressed cache would do. In contrast, our work targets I/O performance optimization by either eliminating an I/O if it were to retrieve duplicate content, irrespective of where that content may reside on storage, or by reducing head movement otherwise. Thus, the contributions of Satori are complementary to our work and can be used simultaneously.

Data deduplication in archival storage has also gained importance in both the research and industry communities. Current research on data deduplication uses several techniques to optimize the I/O overheads incurred due to data duplication. Venti [30], proposed by Quinlan and Dorward, was the first to propose the use of content-addressed storage for performing data deduplication in an archival system. The authors suggested the use of an in-memory content-addressed index of data to speed up lookups for duplicate content. Similar content-addressed caches were used in data backup solutions such as Peabody [26] and Foundation [31]. Content-based caching in I/O Deduplication is inspired by these works. Recent work by Zhu and his colleagues [41] suggests new approaches to alleviate the disk bottleneck via the use of Bloom filters [5] and by further accounting for locality in the content stream. The Foundation work suggests additional optimizations using batched retrieval and flushing of index entries and a log-based approach to writing data and index entries to utilize temporal locality [31]. The work on sparse indexing [22] suggests improvements to Zhu et al.'s general approach by exploiting locality in the chunk index lookup operations to further mitigate the disk I/O bottleneck. I/O Deduplication addresses an orthogonal problem, that of improving I/O performance for foreground I/O workloads based on the use of duplicates, rather than their elimination. Nevertheless, the above approaches do suggest interesting techniques to optimize the management of a content-addressed index and cache in main memory that are complementary to, and can be used directly within, I/O Deduplication.

6 Discussion

Several aspects of I/O Deduplication from design, implementation, and deployment standpoints warrant further discussion. Some of these also suggest avenues for future work.

Multi-disk deployment. In previous sections, we designed and evaluated a single disk implementation of I/O Deduplication. Multi-disk storage deployments in the form of RAID or more complex NAS appliances are common in enterprise data centers. One might question both the utility and effectiveness of the single disk head movement optimizations central to I/O Deduplication in such systems.
We believe that head movement optimizations based on content similarity are viable and can enable complementary optimizations by minimizing the unavoidable mechanical delays in any disk-based storage system. The dynamic replica retrieval and selective duplication sub-techniques require further consideration for multi-disk systems. First, these optimizations must be implemented where information about individual disk head positions is available. Such information is available inside the driver for software RAID, in the RAID controller for hardware RAID, and inside the firmware/OS or internal hardware controllers for NAS appliances. Digest information about the outstanding requests and I/O completion events at each disk can then be utilized as in the single disk design. While the optimal location within each disk for each I/O request can thus be compiled, the complementary issue of load balancing across multiple disks must also be addressed. Apart from the well-known queue-depth based techniques for load balancing, alternate solutions such as simultaneous dispatching to multiple disks combined with just-in-time I/O cancellation can also be envisioned where applicable.

Hash collisions. Our design and implementation of I/O Deduplication makes the assumption that MD5 (128 bits) is collision free. Specifically, this assumption is made when the content-hash entry for a new page being written is registered. A similar assumption, for SHA-1, is made for deduplication in archival storage [30] and low-bandwidth network file transfers [27]. While this assumption may be reasonable in several settings, delivering absolute correctness guarantees requires that this assumption be removed.

Systems like Foundation [31] additionally include the provision to perform a byte-wise comparison following a hit in the content cache, by reading the target location which potentially contains the duplicate data. This, of course, requires an additional I/O operation. The use of a specific hash function or the method of determining duplicate content is not decisive in our design, and these alternatives can be employed if found necessary within the target deployment scenario.

Variable-sized chunks. Our implementation of I/O Deduplication uses fixed size blocks as the basic data unit for determining content similarity. This choice was motivated by our goal of simplified deployment on a variety of block storage systems. Using variable size chunks as units has been demonstrated to be more effective for similarity detection for mostly similar content and similar content at different offsets within a file [6, 27]. This capability is especially important for archival storage, where a single backup file is composed of multiple data files stored at different offsets and possibly with partial modifications. We believe that for online storage systems this may be of lesser concern, except for very specific applications (e.g., a mail server where entire user INBOXes or folders are managed as a single file). Nevertheless, the use of variable sized chunks for I/O deduplication provides an interesting avenue of future work.

7 Conclusions and Future work

System and storage consolidation trends are driving increased duplication of data within storage systems. Past efforts have been primarily directed towards the elimination of such duplication for improving storage capacity utilization. With I/O Deduplication, we take a contrary view that intrinsic duplication in a class of systems which are not capacity-bound can be effectively utilized to improve I/O performance, the traditional Achilles' heel for storage systems. Three techniques contained within I/O Deduplication work together to either optimize I/O operations or eliminate them altogether. An in-depth evaluation of these mechanisms revealed that together they reduced average disk I/O times by 28-47%, a large improvement, all of which can directly impact the overall application-level performance of disk I/O bound systems. The content-based caching mechanism increased memory caching effectiveness by increasing cache hit rates by 10% to 4X for read operations when compared to traditional sector-based caching. Head-position aware dynamic replica retrieval directed I/O operations to alternate locations on-the-fly and additionally reduced I/O times by 10-20%. And selective duplication created additional replicas of popular content during periods of low foreground I/O activity and further improved the effectiveness of dynamic replica retrieval by 23-35%.

I/O Deduplication opens up several directions for future work. One avenue is to explore content-based optimizations for write I/O operations. A possible future direction is to optionally coalesce, or even eliminate altogether, write I/O operations for content that is already duplicated elsewhere on the disk, or alternatively to direct such writes to alternate locations in the scratch space. While the first option might seem similar to data deduplication at a high level, we suggest a primary focus on the performance implications of such optimizations rather than capacity improvements. Any optimization for writes affects the read-side optimizations of I/O Deduplication, and a careful analysis and evaluation of the trade-off points in this design space is important.

Acknowledgments

We thank the anonymous reviewers and our shepherd Ajay Gulati for excellent feedback which improved this paper substantially. We thank Eric Johnson for his help with production server traces at FIU.
7 Conclusions and Future Work

System and storage consolidation trends are driving increased duplication of data within storage systems. Past efforts have been directed primarily towards eliminating such duplication to improve storage capacity utilization. With I/O Deduplication, we take the contrary view that intrinsic duplication, in a class of systems that are not capacity-bound, can be utilized effectively to improve I/O performance, the traditional Achilles' heel of storage systems. The three techniques contained within I/O Deduplication work together to either optimize I/O operations or eliminate them altogether. An in-depth evaluation of these mechanisms revealed that together they reduced average disk I/O times by 28-47%, a large improvement, all of which can directly impact the overall application-level performance of disk-I/O-bound systems. The content-based caching mechanism increased memory caching effectiveness, raising cache hit rates by 10% to 4x for read operations when compared to traditional sector-based caching. Head-position-aware dynamic replica retrieval directed I/O operations to alternate locations on the fly and additionally reduced I/O times by 10-20%. Finally, selective duplication created additional replicas of popular content during periods of low foreground I/O activity and further improved the effectiveness of dynamic replica retrieval by 23-35%.

I/O Deduplication opens up several directions for future work. One avenue is to explore content-based optimizations for write I/O operations: writes of content that is already duplicated elsewhere on the disk could optionally be coalesced or even eliminated altogether, or alternatively directed to alternate locations in the scratch space. While the first option might seem similar to data deduplication at a high level, we suggest a primary focus on the performance implications of such optimizations rather than on capacity improvements. Any optimization for writes affects the read-side optimizations of I/O Deduplication, and a careful analysis and evaluation of the trade-off points in this design space is important.

Acknowledgments

We thank the anonymous reviewers and our shepherd, Ajay Gulati, for excellent feedback which improved this paper substantially. We thank Eric Johnson for his help with production server traces at FIU. This work was supported by NSF grants CNS-0747038 and IIS-0534530 and by DoE grant DE-FG02-06ER25739.

References

[1] Sedat Akyurek and Kenneth Salem. Adaptive Block Rearrangement. Computer Systems, 13(2):89-121, 1995.
[2] Jens Axboe. blktrace User Guide, February 2007.
[3] Medha Bhadkamkar, Jorge Guerra, Luis Useche, Sam Burnett, Jason Liptak, Raju Rangaswami, and Vagelis Hristidis. BORG: Block-reORGanization for Self-optimizing Storage Systems. In Proc. of the USENIX Conference on File and Storage Technologies, February 2009.
[4] Dina Bitton and Jim Gray. Disk Shadowing. In Proc. of the International Conference on Very Large Data Bases, 1988.
[5] Burton H. Bloom. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM, 13(7):422-426, 1970.
[6] Sergey Brin, James Davis, and Hector Garcia-Molina. Copy Detection Mechanisms for Digital Documents. In Proc. of ACM SIGMOD, May 1995.
[7] Austin Clements, Irfan Ahmad, Murali Vilayannur, and Jinyuan Li. Decentralized Deduplication in SAN Cluster File Systems. In Proc. of the USENIX Annual Technical Conference, June 2009.
[8] Daniel Ellard, Jonathan Ledlie, Pia Malkani, and Margo Seltzer. Passive NFS Tracing of Email and Research Workloads. In Proc. of the USENIX Conference on File and Storage Technologies, March 2003.
[9] EMC Corporation. EMC Invista. http://www.emc.com/products/software/invista/invista.jsp.
[10] Binny S. Gill. On Multi-level Exclusive Caching: Offline Optimality and Why Promotions Are Better Than Demotions. In Proc. of the USENIX Conference on File and Storage Technologies, February 2008.
[11] Jim Gray and Prashant Shenoy. Rules of Thumb in Data Engineering. In Proc. of the IEEE International Conference on Data Engineering, February 2000.
[12] Jorge Guerra, Luis Useche, Medha Bhadkamkar, Ricardo Koller, and Raju Rangaswami. The Case for Active Block Layer Extensions. ACM Operating Systems Review, 42(6), October 2008.
[13] Diwaker Gupta, Sangmin Lee, Michael Vrable, Stefan Savage, Alex C. Snoeren, George Varghese, Geoffrey Voelker, and Amin Vahdat. Difference Engine: Harnessing Memory Redundancy in Virtual Machines. In Proc. of the USENIX OSDI, December 2008.
[14] Windsor W. Hsu, Alan Jay Smith, and Honesty C. Young. The Automatic Improvement of Locality in Storage Systems. ACM Transactions on Computer Systems, 23(4):424-473, November 2005.
[15] Hai Huang, Wanda Hung, and Kang G. Shin. FS2: Dynamic Data Replication in Free Disk Space for Improving Disk Performance and Energy Consumption. In Proc. of the ACM SOSP, October 2005.
[16] IBM Corporation. IBM System Storage SAN Volume Controller. http://www-03.ibm.com/systems/storage/software/virtualization/svc/.
[17] N. Jain, M. Dahlin, and R. Tewari. TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization. In Proc. of the USENIX Conference on File and Storage Technologies, 2005.
[18] Song Jiang, Feng Chen, and Xiaodong Zhang. CLOCK-Pro: An Effective Improvement of the CLOCK Replacement. In Proc. of the USENIX Annual Technical Conference, April 2005.
[19] P. Kulkarni, F. Douglis, J. D. LaVoie, and J. M. Tracey. Redundancy Elimination Within Large Collections of Files. In Proc. of the USENIX Annual Technical Conference, 2004.
[20] Andrew Leung, Shankar Pasupathy, Garth Goodson, and Ethan Miller. Measurement and Analysis of Large-Scale Network File System Workloads. In Proc. of the USENIX Annual Technical Conference, June 2008.
[21] Xuhui Li, Ashraf Aboulnaga, Kenneth Salem, Aamer Sachedina, and Shaobo Gao. Second-Tier Cache Management Using Write Hints. In Proc. of the USENIX Conference on File and Storage Technologies, 2005.
[22] Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble. Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality. In Proc. of the USENIX Conference on File and Storage Technologies, February 2009.
[23] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation Techniques for Storage Hierarchies. IBM Systems Journal, 9(2):78-117, 1970.
[24] Nimrod Megiddo and D. S. Modha. ARC: A Self-Tuning, Low Overhead Replacement Cache. In Proc. of the USENIX Conference on File and Storage Technologies, 2003.
[25] G. Milos, D. G. Murray, S. Hand, and M. Fetterman. Satori: Enlightened Page Sharing. In Proc. of the USENIX Annual Technical Conference, June 2009.
[26] Charles B. Morrey III and Dirk Grunwald. Peabody: The Time Travelling Disk. In Proc. of the IEEE/NASA MSST, 2003.
[27] Athicha Muthitacharoen, Benjie Chen, and David Mazières. A Low-Bandwidth Network File System. In Proc. of the ACM SOSP, October 2001.
[28] Network Appliance, Inc. NetApp V-Series of Heterogeneous Storage Environments. http://media.netapp.com/documents/v-series.pdf.
[29] Cyril U. Orji and Jon A. Solworth. Doubly Distorted Mirrors. In Proc. of the ACM SIGMOD, 1993.
[30] S. Quinlan and S. Dorward. Venti: A New Approach to Archival Storage. In Proc. of the USENIX Conference on File and Storage Technologies, January 2002.
[31] Sean Rhea, Russ Cox, and Alex Pesterev. Fast, Inexpensive Content-Addressed Storage in Foundation. In Proc. of the USENIX Annual Technical Conference, June 2008.
[32] C. Ruemmler and J. Wilkes. Disk Shuffling. Technical Report HPL-CSP-91-30, Hewlett-Packard Laboratories, October 1991.
[33] Jon A. Solworth and Cyril U. Orji. Distorted Mirrors. In Proc. of PDIS, 1991.
[34] Niraj Tolia, Michael Kozuch, Mahadev Satyanarayanan, Brad Karp, and Thomas Bressoud. Opportunistic Use of Content Addressable Storage for Distributed File Systems. In Proc. of the USENIX Annual Technical Conference, 2003.
[35] Paul Vongsathorn and Scott D. Carson. A System for Adaptive Disk Rearrangement. Software: Practice and Experience, 20(3):225-242, 1990.
[36] Carl A. Waldspurger. Memory Resource Management in VMware ESX Server. In Proc. of the USENIX OSDI, 2002.
[37] C. K. Wong. Minimizing Expected Head Movement in One-Dimensional and Two-Dimensional Mass Storage Systems. ACM Computing Surveys, 12(2):167-178, 1980.
[38] Theodore M. Wong and John Wilkes. My Cache or Yours? Making Storage More Exclusive. In Proc. of the USENIX Annual Technical Conference, 2002.
[39] X. Yu, B. Gum, Y. Chen, R. Y. Wang, K. Li, A. Krishnamurthy, and T. E. Anderson. Trading Capacity for Performance in a Disk Array. In Proc. of the USENIX OSDI, 2000.
[40] C. Zhang, X. Yu, A. Krishnamurthy, and R. Y. Wang. Configuring and Scheduling an Eager-Writing Disk Array for a Transaction Processing Workload. In Proc. of the USENIX Conference on File and Storage Technologies, January 2002.
[41] Benjamin Zhu, Kai Li, and Hugo Patterson. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. In Proc. of the USENIX Conference on File and Storage Technologies, February 2008.