Using SSD-Assisted Scalable Elasticity to Improve Inline Data Deduplication Storage Systems


Yufeng Wang, Zhengyu Yang, Ningfang Mi, Chiu C Tan

Yufeng Wang and Zhengyu Yang share equal credit for this paper. Yufeng Wang and Chiu C Tan are with the Department of Computer and Information Sciences at Temple University. Zhengyu Yang and Ningfang Mi are with the Department of Electrical and Computer Engineering at Northeastern University.

Abstract

Elasticity is the ability to scale computing resources such as memory on demand, and is one of the main advantages of utilizing cloud computing services. With the increasing popularity of cloud-based storage, it is natural that more deduplication-based storage systems will be migrated to the cloud. Existing deduplication systems, however, do not adequately take advantage of elasticity. In this paper, we first present an SSD (Solid State Drive)-based multi-tier storage architecture to improve the caching capacity of current deduplication systems. With such an enhanced system, we attempt to optimize deduplication approaches to achieve higher memory utilization efficiency. Then, we illustrate how to use elasticity to improve deduplication-based systems, and propose EAD (elasticity-aware deduplication), an indexing algorithm that uses the ability to dynamically increase memory and SSD resources to improve overall deduplication performance. Our experimental results indicate that EAD is able to detect more than 98% of all duplicate data while consuming less than 5% of the expected memory space. Meanwhile, it achieves four times the deduplication efficiency of the state-of-the-art sampling technique while costing less than half the amount of memory. We further propose an online scaling up algorithm that takes advantage of the elasticity of cloud computing to dynamically trigger the scaling up operation. Our algorithm also offers a complete guideline for its large-scale deployment. The experimental results show that our design saves at least 74% of the overall I/O access cost compared to the traditional design.

Index Terms: Deduplication, Flash-based SSD, Scaling Up, Migration, Cloud Computing, Cloud Storage Systems, Fusion Disk

1 INTRODUCTION

Data deduplication is a technique used to reduce storage and transmission overhead by identifying and eliminating redundant data segments. It splits files into multiple data chunks, each uniquely identified by a fingerprint (FP) that is usually a hash signature of the data chunk. Redundant data chunks in a file are replaced by pointers. Data deduplication has been an essential and critical component in cloud backup, synchronization and archiving storage systems. It not only reduces storage space requirements, but also improves the throughput of backup and archiving systems by eliminating the network transmission of redundant data, and reduces energy consumption by deploying fewer disks. Therefore, data deduplication plays an important role in existing storage systems [1], and its importance will continue to grow as the amount of data increases (the growth of data is estimated to reach 35 zettabytes in the year 2020) in cloud backup, synchronization and archiving storage systems.

There are two types of deduplication systems: inline and offline. Clients in the former first transmit metadata to the server to detect duplicates, and only the new data is then sent to the server. Clients in the latter transfer all data to the server, which then conducts the deduplication process. In this paper we use the inline deduplication system [2], [3]. However, in the inline deduplication system, with the ever increasing amount of new data, searching for the existence of chunks in the huge dataset stored on the slow disk (usually a magnetic disk, MD) is a time-consuming process, which is the main bottleneck of deduplication implementations. To improve overall speed, memory (RAM) is used in practice to cache the metadata of hot chunks (e.g. the index table).
Unfortunately, deduplication systems currently need to scale to tens of terabytes to petabytes of data volume, and the ever increasing amount of new data will eventually flush the existing useful data cached in memory. The penalty of this flushing outweighs the benefit brought by the performance gap between in-memory searching and disk lookups. Motivated by this disk I/O speed bottleneck, we address it in three different directions.

Our first direction is to improve the performance of the cache system. Since we cannot easily increase the size of RAM or the access speed of MD at low cost, a feasible way is to introduce an extra storage tier between RAM and MD. This storage tier should be faster than MD and cheaper than RAM. We find that NAND-flash based solid-state disks (SSDs) meet these two conditions and are a good candidate for the middle tier. We thus adopt a three-tier caching system, consisting of RAM, SSD and MD, to increase the speed of accessing the huge dataset in the MD and to enlarge the cache size for storing hot data.

Our second direction is to make the best use of the cache through downsampling algorithms. Since cache systems cannot be infinitely large, we need to filter non-critical data out of the cache, and decide what to load from the MD to SSD and RAM. Existing research [4] [5] [6] in this area proposed different sampling algorithms to index more data using less memory. The key feature of our solution is that our deduplication algorithm is compatible with current deduplication techniques such as sampling to take advantage of locality [5], [6], and content-based chunking [7]-[9].

Our last direction is to dynamically scale up the storage system during runtime. In real cases, due to a customer's high desired deduplication degree or special incoming data types that naturally have a low deduplication ratio (like videos and encrypted files), an insufficient RAM (including SSD) caching system will eventually be flushed by the incoming data stream. This issue cannot be thoroughly solved by the multi-tier caching system and downsampling algorithms. Fortunately, the key property of cloud computing, elasticity, can be used to improve deduplication systems by allowing deduplication storage systems to dynamically adjust the amount of memory resources as needed to detect a sufficient amount of duplicate data. In industrial environments, the flexibility and cost advantages of cloud computing providers such as Azure [10], Amazon [11], etc. make deploying new resources in the cloud during runtime a possible option. Based on these facts, we propose an elasticity-aware deduplication (EAD) algorithm to dynamically assign new resources. EAD solves the two main problems in this topic: when to trigger the scaling up operation, and how much new resource is enough for the future.

In summary, in this paper we design a multi-tier storage architecture to enhance the caching capacity of the deduplication system.
We optimize the approach to achieve higher RAM utilization efficiency and make the best use of the caching system by using downsampling-based algorithms. Finally, we propose an elasticity-aware deduplication (EAD) algorithm that takes advantage of the elasticity of cloud computing to dynamically trigger the scaling up operation. Furthermore, we present a detailed analysis of our algorithm, as well as an evaluation using extensive experiments on real datasets.

The rest of the paper is organized as follows: Section 2 explores the background of the deduplication system bottleneck and approaches. Section 3 describes the details of the EAD design, including the scaling up strategy and the storage hierarchy. Section 4 evaluates our solution, and Section 5 covers related work. Section 6 concludes the paper.

2 BACKGROUND

This section discusses the background of deduplication, including its bottleneck, and explores some alternatives to improve deduplication performance.

2.1 Data Deduplication Basics

As shown in Figure 1, in a typical cloud-based deduplication system, a DedupServer is in charge of processing deduplication requests. The DedupServer itself does not store any chunk contents; it only keeps the entire metadata library on its disk, which tracks the fingerprint of each chunk and its address in the storage pool. In addition, the DedupServer's memory caches the hot index table entries to accelerate overall access speed.

The basic deduplication process first divides the incoming data stream into (fixed- or variable-size) chunks, and segments (groups of chunks based on their locality). Next, duplicate chunks are identified by their hash fingerprints (FP), calculated on their contents. The server then needs to look up each chunk hash in an index it maintains for all chunks seen so far for that storage location (dataset) instance. If there is a match, the incoming chunk contains redundant data and can be deduplicated; if not, the (new) chunk needs to be added to the system and its hash and metadata need to be inserted into the index.
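As a minimal sketch of the process just described, the following Python fragment splits a byte stream into fixed-size chunks, fingerprints each chunk with SHA-1, and consults an in-memory index. The function names, the 8-byte demonstration chunk size, and the use of insertion order as a chunk "address" are illustrative assumptions rather than part of the system described here; segments and the disk-resident index are omitted.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Fingerprint of a chunk: the SHA-1 hex digest of its content."""
    return hashlib.sha1(chunk).hexdigest()


def split_fixed(data: bytes, chunk_size: int) -> list:
    """Split a byte stream into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def deduplicate(data: bytes, index: dict, chunk_size: int = 8):
    """Deduplicate `data` against `index` (fingerprint -> chunk address).

    Returns (recipe, stored_bytes): the recipe lists one address per chunk,
    and stored_bytes counts only the bytes of chunks that were new.
    """
    recipe = []
    stored_bytes = 0
    for chunk in split_fixed(data, chunk_size):
        fp = fingerprint(chunk)
        if fp not in index:
            # New chunk: assign a hypothetical address (insertion order).
            index[fp] = len(index)
            stored_bytes += len(chunk)
        recipe.append(index[fp])
    return recipe, stored_bytes


index = {}
recipe, stored_bytes = deduplicate(b"AAAAAAAABBBBBBBBAAAAAAAA", index)
```

In this toy run the third chunk repeats the first, so only two of the three chunks are stored and the recipe refers to the first address twice.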
The detailed algorithm is shown in Algorithm 1, where a memory-disk deduplication architecture is applied.

Fig. 1: A typical cloud-based deduplication system (clients 1 to 3; a DedupServer with cache memory and a disk holding the entire metadata library; a storage pool holding the entire chunk content library).

Algorithm 1 Basic deduplication strategy with memory
    The incoming segment S_n:
    Deduplication Phase I: Identify duplicate chunks
    for all x ∈ S_n do
        Client: Send FP_x to Server
        Server: Search FP_x in IndexTable cached in Memory
        if Found then
            Mark x as dup (x_dup)
        else
            Server: Search FP_x in IndexTable on Disk
            if Found then
                Mark x as dup (x_dup)
                Load FP_x from Disk to cached IndexTable in Memory
            else
                Mark x as unq (x_unq)
    Deduplication Phase II: Data transmission
    for all x ∈ S_n do
        Transmit x_unq along with only metadata of x_dup
    System finishes processing S_n

2.2 Understanding SSDs

As discussed before, the slow access speed of disk is the bottleneck of a deduplication system. To address this issue, we consider using SSDs as the middle storage tier between RAM and MD. Table 1 shows a comparison between RAM, SSD and MD.

TABLE 1: Comparison between RAM, SSD and MD (updated in Sept 2014, US)

    Storage        Speed            Storage capacity    Price per GB    Power outage impact
    RAM (Memory)   6 GB/s           1-8 GB              … USD/GB        Data lost
    SSD (Disk)     up to 600 MB/s   … GB, up to 2 TB    0.45 USD/GB     Data stored
    MD (Disk)      up to 140 MB/s   up to 10 TB         0.05 USD/GB     Data stored

Specifically, flash-based SSDs have the following characteristics which help to improve the efficiency of the deduplication design:

1) High access speed: Unlike MDs, flash-based SSDs are made of silicon memory chips and have no moving parts. Thus both read and write response times of SSDs are significantly better than those of MDs. As shown in Table 1, SSDs are almost 4.29 times faster than MDs (600 MB/s versus 140 MB/s). In consumer products, the maximum transfer rate typically ranges from about 100 MB/s to 600 MB/s, depending on disk type, while in the enterprise market, vendors offer devices with multi-GB/s throughput.

2) Data retention after power outage: Unlike RAM, an SSD preserves data when a power outage happens, which is a bonus for reliability, because a deduplication system cannot recover data losslessly without the entire recipe.

3) Large size: Since 2014, SSDs with sizes up to 2 TB have become available, while 128 to 512 GB drives are more common. Although an SSD cannot easily reach the same size as an MD, it has much larger capacity than RAM.

4) Affordable expense: The price (e.g. cost per gigabyte) of SSDs changes rapidly, and has kept dropping in recent years. Today the cost per gigabyte of SSDs is about 1/68 that of RAM.

These benefits enable SSDs to be widely used in almost every part of modern computing systems, from low-end PCs to high-end servers in supercomputing, thus making SSD-based storage systems increasingly attractive to both academia and industry.
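To make the role of the middle tier concrete, the lookup path through RAM, SSD and MD can be sketched as below. This is only an illustration under stated assumptions: the class name `TieredIndex`, the slot limits, and the least-recently-used demotion policy are hypothetical choices, and all three tiers are simulated with in-memory dictionaries rather than real devices.

```python
from collections import OrderedDict


class TieredIndex:
    """Fingerprint index searched in the order RAM, SSD, MD (slowest last)."""

    def __init__(self, ram_slots: int, ssd_slots: int):
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.ram = OrderedDict()   # hottest entries
        self.ssd = OrderedDict()   # entries demoted from RAM
        self.md = {}               # complete metadata library
        self.hits = {"ram": 0, "ssd": 0, "md": 0, "miss": 0}

    def _cache(self, fp, addr):
        """Put an entry in RAM; demote the least recently used one if full."""
        self.ssd.pop(fp, None)
        self.ram[fp] = addr
        self.ram.move_to_end(fp)
        if len(self.ram) > self.ram_slots:
            old_fp, old_addr = self.ram.popitem(last=False)
            self.ssd[old_fp] = old_addr
            if len(self.ssd) > self.ssd_slots:
                # Dropped from SSD, but still recoverable from MD.
                self.ssd.popitem(last=False)

    def lookup(self, fp):
        """Return the chunk address, or None if the fingerprint is new."""
        for name, tier in (("ram", self.ram), ("ssd", self.ssd), ("md", self.md)):
            if fp in tier:
                self.hits[name] += 1
                addr = tier[fp]
                self._cache(fp, addr)   # promote the entry to RAM
                return addr
        self.hits["miss"] += 1
        return None

    def insert(self, fp, addr):
        """Record a new fingerprint in MD and cache it in RAM."""
        self.md[fp] = addr
        self._cache(fp, addr)


tiers = TieredIndex(ram_slots=1, ssd_slots=1)
for address, fp in enumerate(["a", "b", "c"]):
    tiers.insert(fp, address)
results = [tiers.lookup(fp) for fp in ["c", "b", "a", "z"]]
```

With one slot per tier, the four lookups are served by RAM, SSD and MD in turn, and the last one is a miss, which shows how a larger middle tier would absorb lookups that otherwise fall through to the MD.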
2.3 Understanding Downsampling

In this section, we first investigate why we cannot find a best memory size and fix it in advance, and then give an overview of downsampling algorithms, which can be used to reduce the size of the data set cached in memory.

2.3.1 Why Not Choose The Best Memory Size?

One intuitive alternative is to try to estimate the appropriate amount of memory needed prior to deploying the deduplication system. A straightforward approach is to perform simple profiling on a sample of data and compute the expected memory requirements based on the results. To illustrate why it is difficult to choose the right amount of RAM in practice, we conducted a simple experiment that represents a storage system used to archive virtual machine (VM) images (this is a common workload used in deduplication evaluations [6], [12]). We want to maximize the deduplication ratio to conserve bandwidth and storage costs. For simplicity, we assume that all VMs are running the same OS, and are the same size. A simple way to estimate memory requirements is to first estimate the index size for a single VM, and then use that to estimate the total RAM necessary for all users. Thus, given n users each storing a VM of the same size, we estimate that an amount m of RAM is needed to index one user, so that our backup system will need n × m of RAM. We can derive m via experiments.

Figure 2 shows the results for two VMs. As far as we know, VM2 contains more text files while VM1 has more video files. The number of index entry slots indicates how much information about already stored data the system can provide for duplicate detection. We set a fixed number of index entries for duplicate detection and gradually increase it. We see that when the number of index entry slots increases to 270 thousand, both VMs exhibit the same amount of duplicate data. As we increase the index size further, VM1 shows limited improvement, while VM2 shows much better performance. If we had used VM1 to estimate m, it would have led to much less bandwidth savings, especially if a significant number of VMs resemble VM2.
Buying too much memory is wasteful if most of the data resemble VM1.

Fig. 2: Intuitive test of the amount of duplicate data detected on two equal-sized (4.7 GB) VMs, VM1 and VM2, using equal-size indexes (horizontal axis: number of entries in index, ×10^5; vertical axis: duplicate data detected, MB).

2.3.2 Using Locality And Downsampling

We hereby provide an overview of the basic sampling approaches of [5], [6], followed by a discussion of the advantages of sampling and its limitations. Storage systems that make use of data deduplication generally operate at chunk level, and in order to quickly determine potential duplicate chunks, an index of existing chunks needs to be maintained in memory. For example, 100 TB of data will need about 800 GB of RAM for the index under standard deduplication parameters [13], which makes keeping the entire index in memory challenging. Typical deduplication parameters which have been experimentally shown to give good performance [13] are given in Table 2. We see that in order to support 12.5 billion (100 TB / 8 KB) chunks, we need 800 GB of RAM for the index. As the data size increases to 300 TB, we need to support all 37.5 billion chunks, and 2400 GB of RAM is needed only for indexing ($C \cdot E / c_k = 2400$ GB). These estimated figures are naturally conservative, since the actual amount of replicated chunks is unknown at run time.

The principle of locality is used to design sampling algorithms that use a smaller index while providing good performance [5]. The locality principle suggests that if chunk X was observed to be surrounded by chunks Y, Z, W in the past, then the next time chunk X appears, there is a high probability that chunks Y, Z, W will also appear. In sampling-based deduplication, the data is first divided into larger segments, each of which contains thousands of chunks. Deduplication is executed on these segments by checking whether the fingerprints of their sampled chunks exist in the index. If a chunk's fingerprint is found in the index, the corresponding segment which contains that chunk is located, and the fingerprint information of all the other chunks in this segment is pre-fetched from disk to the chunk cache in memory.

The downsampling algorithm [6] works as an optimized sampling approach, taking advantage of the locality principle. The difference is that the sampling rate is initialized as 1, i.e., it picks all the chunks in a segment as its sampled chunks. As the amount of incoming data increases, this value gradually decreases by dropping half of the index entries. Thus the indexing capacity doubles by only accepting a part of the chunks' fingerprints as samples to represent each segment.
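A hedged sketch of index downsampling follows. It assumes that fingerprints are treated as integers and that an entry is kept when its fingerprint is divisible by 2^k, so that raising k by one halves the sampling rate and, for uniformly distributed fingerprints, drops about half of the entries. This modulo rule, the class name `DownsampledIndex` and the `capacity` parameter are assumptions chosen for illustration; the algorithm of [6] may select its samples differently.

```python
class DownsampledIndex:
    """In-memory index whose sampling rate halves whenever it overflows."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.k = 0          # current sampling rate is 1 / 2**k
        self.entries = {}   # integer fingerprint -> segment id

    @property
    def sampling_rate(self) -> float:
        return 1.0 / (1 << self.k)

    def _sampled(self, fp: int) -> bool:
        """A fingerprint is a sample if it is divisible by 2**k."""
        return fp % (1 << self.k) == 0

    def add(self, fp: int, segment_id: int) -> None:
        """Index fp if it is a sample, downsampling while over capacity."""
        if not self._sampled(fp):
            return
        self.entries[fp] = segment_id
        while len(self.entries) > self.capacity:
            self.downsample()

    def downsample(self) -> None:
        """Halve the sampling rate and drop entries that no longer qualify."""
        self.k += 1
        self.entries = {
            fp: seg for fp, seg in self.entries.items() if self._sampled(fp)
        }


index = DownsampledIndex(capacity=4)
for fp in range(8):
    index.add(fp, segment_id=0)
```

Here the fifth insertion overflows the four available slots, so the rate drops from 1 to 1/2 and only even fingerprints remain indexed afterwards.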
In other words, instead of indexing chunks X, Y, Z, and W in RAM, the downsampling algorithm will only index chunk X (or another one of the four) in RAM after two adjustments, and keep the rest on disk.

The above sampling-based approaches have two main drawbacks. The first (obvious) drawback is that not all data exhibit locality [14], and thus sampling algorithms do not work well with these datasets. The second drawback is that even for data that exhibits locality, it is difficult to select the correct sampling rate or to decide how to adjust it, due to the large variance in possible deduplication ratio [15], [16].

TABLE 2: An example of a cloud-based backup system configuration

    Terminology                    Value
    Chunk size c_k                 c_k = 8 KB
    Segment size S                 S = 16 MB
    Physical storage capacity C    C = 300 TB
    Number of chunks N             N = C/S
    Index entry size E             E = 64 B

2.4 Understanding Elasticity Awareness And Scaling Up

Our last direction is to dynamically scale up the storage system during runtime. After several downsampling operations, the assigned RAM may still not be large enough to process the incoming data stream. We then need to trigger the online scaling up operation, which takes advantage of the elasticity of cloud computing. The scaling up operation can improve deduplication systems by allowing deduplication storage systems to dynamically adjust the amount of memory resources as needed to detect a sufficient amount of duplicate data. This is especially useful because the index used for deduplication is often kept within memory to avoid the performance bottleneck of disk I/O operations. There are two problems in elasticity awareness and the scaling up operation:

1) When to trigger the scaling up operation? The main guide is to trigger the scaling up operation only when the result can benefit performance. We need to distinguish whether poor deduplication performance is due to overly aggressive downsampling (caused by the user's high expected deduplication degree) or is inherent in the dataset type (dedup-unfriendly datasets that have a low duplication ratio).
To solve that, we need to answer these three questions: (i) if the RAM (for the cached index) is close to the preset limit, how does the system detect and evict some non-critical entries? (ii) when does the system need to trigger the downsampling process? (iii) after how many rounds of downsampling does the system finally request to scale up?

2) How to assign new resources? Both the pricing costs and the resource utilization ratios should be considered when assigning new resources. A straightforward strategy is to simply double the current size (or multiply it by a preset factor n) when scaling up. However, even regardless of the high expense, this exponential expansion brings a big waste, since the later incoming stream may not need so much space. An optimized approach is required to obtain higher RAM utilization efficiency; it should expand the RAM size based on the occurrences of downsampling, which may predict the dataset's future.

3 EAD: ELASTICITY-AWARE DEDUPLICATION

Storage deduplication services in the cloud often run in virtual machines (VMs). Unlike a conventional OS which runs directly on physical hardware, the OS in a VM runs on top of a hypervisor or virtual machine monitor, which in turn communicates with the underlying physical hardware. The hypervisor is responsible for increasing the RAM resources of the VM dynamically. This can be done in two generic ways. The first is to use a ballooning algorithm to reclaim memory from other VMs running on the same physical machine (PM) [17]. This is a relatively lightweight process that relies on the OS's memory management algorithm, but can only increase memory by relatively small amounts. The second is VM migration: deduplication systems that require increasingly larger amounts of memory need to run a VM migration algorithm [18], [19]. In VM migration, the hypervisor migrates the RAM contents from one PM to another with sufficient memory resources [18]. Regardless of the migration algorithm used, some downtime inevitably occurs when switching over to a new VM [19].

A naive approach towards incorporating elasticity is to increase the memory size once the index is close to being full. This naive approach does not perform well, since frequent scaling up or migration induces a high overhead. Furthermore, the naive approach always retains the entire old index during each scaling up or migration, even those index entries that do not fingerprint many chunks. Such poorly performing index entries take up valuable index space without providing much benefit.

Our approach combines the benefits of downsampling [6] and VM scaling up to allow users to maintain a satisfactory level of performance by adjusting the sampling rate and memory size accordingly. Our system design consists of two components: an EAD client that is responsible for file chunking, fingerprint computation and sampling, and an EAD server which controls index management and other memory management operations. The EAD client runs on the client side, for instance, at the gateway server of a large company.
The EAD server can be executed by the cloud provider. The entire system design is shown in Figure 3. Only unique data is supposed to be stored in Physical Storage. The File Manager is responsible for data retrieval and maintenance; how it works is out of this paper's scope.

Fig. 3: EAD infrastructure.

3.1 EAD Algorithm

Different types of users have different deduplication requirements. Some users will be willing to tolerate worse deduplication performance in exchange for lower costs, while others will not. To accommodate different requirements, EAD is designed to allow a user to specify a scaling up (or migration) trigger, Γ (Γ ∈ (0, 1)), which specifies the level of deduplication performance the user is willing to accept. Deduplication performance is usually measured by reduction ratio [20], [21], which is the size of the original dataset divided by the size of the dataset after deduplication. To help the user select the migration trigger, we define the Deduplication Ratio (DR) as

\[ \text{Deduplication Ratio} = 1 - \frac{\text{Size after deduplication}}{\text{Size of original data}} \quad (1) \]

Intuitively, we would like to first apply downsampling algorithms until the deduplication performance becomes unsatisfactory, and then migrate the index to larger memory in order to obtain better performance. EAD will migrate to larger RAM only when migration will result in deduplication performance better than Γ. This has an important but subtle implication. EAD will not always migrate when deduplication performance falls under Γ, but only when migration will improve performance. This is important because, given a dataset that inherently exhibits poor deduplication characteristics [22], adding more RAM will incur the migration overhead without improving deduplication performance. This means that EAD cannot simply compare the measured DR against Γ, because the measured DR may not necessarily reflect the amount of duplication that exists. To illustrate, let us assume that the deduplication system measures its DR and it is less than Γ. There are two possibilities. The first is that the system has performed overly aggressive downsampling, and can benefit from increasing RAM. The second possibility is that the dataset itself has poor deduplication performance, e.g. data in multimedia or encrypted files. In this case, increasing RAM does not result in better performance.

How our EAD algorithm determines when to migrate to more RAM resources can be found in Alg. 2. It executes in two phases, as generic inline deduplication systems do. We use S_n and x to denote the incoming segment and the chunks inside it. FP_x represents the fingerprint of chunk x. In Phase I, the EAD Client sends the fingerprints (FP_x) of all chunks (∀x ∈ S_n) in each segment S_n to the EAD Server, labeling the fingerprints of the sampled chunks used for estimation and for duplicate detection as FP_x^est and FP_x^dedup. The EAD Server then searches the index table T and the chunk cache for duplicate identification, and updates the estimation base B. Each chunk x is then marked as dup or unq, indicating that it is a duplicate or a unique chunk. In Phase II, the EAD Client only transmits unique data chunks along with the metadata of duplicate ones to the EAD Server, saving bandwidth and storage space. In the meantime, the current sampling rate R_0 is subject to change to R based on deduplication performance. Details on the features of the EAD algorithm are presented next.

3.2 Estimating Possible Deduplication Performance

One of the key features of EAD is that the algorithm is able to determine whether migration will be beneficial. In order to distinguish whether poor deduplication performance is due to overly aggressive downsampling or is inherent in the dataset, we first need to be able to estimate the potential DR of the dataset. Obtaining the actual DR is impractical since it requires performing the entire deduplication process. Prior work [23] provided an algorithm to estimate the deduplication performance for static, fixed-size data sets. Their algorithm requires the actual data to be available in order to perform random sampling and comparisons. However, in our problem, the dataset can be viewed as a stream of data. There is no prior knowledge of the size or characteristics of the data to be stored. We also cannot perform back-and-forth scanning of the complete dataset for estimation.

In our EAD algorithm, we let the EAD Server maintain an estimation base B. The EAD Client randomly selects κ fingerprints from each segment and sends them to the EAD Server to be stored in B. Suppose n_s segments have come in; there will then be κ·n_s samples, a number which increases along with the amount of incoming data. Each entry slot in B includes a fingerprint as well as two counters, x_c1 and x_c2, where counter x_c1 records the number of occurrences of fingerprint FP_x in B, and x_c2 records the number of occurrences of fingerprint FP_x among all the chunks uploaded. We integrate our estimation process into the regular deduplication operations so as to avoid the separate sampling and scanning phases of [23].
While the client sends the samples for duplicate searching to the storage server, the samples for estimation are transmitted at the same time for updating B. During the fingerprint comparison of incoming chunks against those in the chunk cache, we update B again, incrementing the counter x_c2 by one every time its corresponding fingerprint appears. Thus, there is no extra overhead for our estimation purpose. Using B, we can compute the estimated deduplication ratio, EDR, as

\[ EDR = 1 - \frac{1}{\kappa\, n_s} \sum_{x \in B} \frac{x_{c1}}{x_{c2}} \quad (2) \]

The computation of EDR happens when the index size is approaching the memory limit. Only in the case that DR is smaller than Γ·EDR is there a potential performance improvement from migration, and EAD will migrate the index to larger RAM. Otherwise, EAD will apply downsampling on the index in exchange for larger indexing capacity.

Algorithm 2 Elastic deduplication strategy
    The incoming segment S_n:
    Deduplication Phase I: Identify duplicate chunks
    ∀x ∈ S_n: EAD Client sends FP_x to EAD Server
    for all FP_x^dedup do
        if FP_x^dedup ∈ T then
            Locate its correspondent segments S_dup
            ∀x_j ∈ S_dup: Fetch information of x_j (FP_xj) and place x_j in chunk cache
        else
            Add FP_x^dedup to T
    for all FP_x^est do
        if FP_x^est ∈ B then
            x_c1 = x_c1 + 1
        else
            Add FP_x^est to B
            Set x_c1 = x_c2 = 0
    for all x ∈ S_n do
        ∀x_k ∈ chunk cache: Compare FP_x with FP_xk
        if FP_x = FP_xk then
            Mark x as dup (x_dup)
        else
            Mark x as unq (x_unq)
        ∀x_l ∈ B: Compare FP_x with FP_xl
        if FP_x = FP_xl then
            x_c2 = x_c2 + 1
    Deduplication Phase II: Data transmission
    for all x ∈ S_n do
        Transmit x_unq along with only metadata of x_dup
    EAD finishes processing S_n
    if Index is approaching the RAM limit then
        if DR < Γ·EDR then
            if R_0 = 1 then
                EAD sets Γ = DR/EDR
            else
                EAD triggers migration, raising the sampling rate R above R_0
        else
            EAD downsamples, setting R = R_0/2

3.3 EAD Scaling Up Algorithm

The performance of the EAD algorithm can be further improved by observing additional information obtained during run time and then adjusting the parameters of the algorithm.

(1) Adjusting Γ. The parameter Γ is specified by the user, and indicates the user's desired level of deduplication performance. However, the user may sometimes be unaware of the underlying potential deduplication performance of the data, and set an excessively high Γ value, resulting in unnecessary migration over time. We adjust the user's Γ value to DR/EDR after each migration, and also in the case that DR has not reached acceptable performance even when the sampling rate is one, so that it represents the current system's maximum deduplication ability. In this way, EAD is able to elastically adapt to variations in incoming data.
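The estimate of Equation (2) and the decision taken when the index approaches the RAM limit can be sketched as follows. This is a simplified illustration, not the EAD implementation: the function names are hypothetical, the counters are computed in one pass over in-memory lists rather than updated per segment, every sampled fingerprint is assumed to occur in the uploaded stream, and each counter is counted from its first observation so that the ratio x_c1/x_c2 is always defined.

```python
from collections import Counter


def estimated_dedup_ratio(samples, uploaded):
    """EDR = 1 - (1 / len(samples)) * sum of c1/c2 over distinct samples.

    samples:  fingerprints stored in the estimation base B
    uploaded: fingerprints of all chunks uploaded so far
    """
    c1 = Counter(samples)                              # occurrences in B
    c2 = Counter(fp for fp in uploaded if fp in c1)    # occurrences overall
    total = sum(c1[fp] / c2[fp] for fp in c1)
    return 1.0 - total / len(samples)


def decide(dr, edr, gamma, rate):
    """Return (action, gamma) once the index approaches the RAM limit."""
    if dr < gamma * edr:
        if rate == 1:
            # Even full sampling cannot reach the target: lower the target.
            return "adjust_gamma", dr / edr
        return "migrate", gamma
    return "downsample", gamma


uploaded = list("AAAABBCD")   # 8 chunks, 4 of them distinct
samples = list("ABCA")        # fingerprints drawn into B
edr = estimated_dedup_ratio(samples, uploaded)
action, gamma = decide(dr=0.2, edr=edr, gamma=0.8, rate=0.25)
```

In this toy stream half of the chunks are duplicates, and the estimate from four samples agrees with that. Because the measured DR of 0.2 lies below Γ·EDR while the sampling rate is already below one, the sketch chooses migration.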

(2) Amount of RAM Post Migration. A simple way to compute the amount of RAM to allocate after migration is to use a fixed factor, e.g. doubling the RAM each time (a factor of 2). We then reset the sampling rate back to 1, and start all over again. Another way is based on the observation that the sampling rate before the latest downsampling operation was able to support satisfactory performance, so EAD adjusts the current sampling rate to the one used before the latest downsampling operation. Thus, a specific amount of incoming data will require different amounts of index space depending on the adjusted sampling rate, implying that there exist subtle relations among the sampling rate R_0 and the size of RAM. When deduplication performance is not satisfactory, EAD will migrate to more RAM as well as apply a higher sampling rate for future deduplication. A simple approach could be for RAM to increase at the same rate of change as the sampling rate. We can improve on this process by observing the next-to-last sampling rate used prior to migration. This rate is the last known sampling rate that produced acceptable deduplication performance. This is valid because if it had not produced acceptable performance, EAD would have already triggered migration. We hereby propose a more conservative index RAM increment policy based on the above analysis. We introduce a parameter d (initialized as zero) to record occurrences of downsampling; every time downsampling happens, d increases by one. We set the New Index Size (RAM_new) after migration following the rule:

\[ RAM_{new} = \begin{cases} RAM_{org}, & d = 1 \\ \left[\, 1 \;\ldots\; \sum_{i=1}^{d-1} 1 \;\ldots \right] RAM_{org}, & d \geq 2 \end{cases} \quad (3) \]

where RAM_org represents the original index size before migration. As the number of downsampling operations increases, EAD requires a smaller amount of RAM for the index table after migration. Compared with always requiring a fixed multiple of the original RAM, such an optimized approach is able to achieve higher memory utilization efficiency.

(3) Managing Size of B. One concern with our estimation scheme is that the size of B may become too large.
If we need a large amount of RAM to store B, we will be wasting RAM resources that could be used for the index. In practice, the size of B is relatively modest. Each entry in B consists of a fingerprint and two counters. Using SHA-1 to compute the fingerprint results in a 20-byte fingerprint. An additional four bytes are used for each counter. Thus, each B entry is 28 bytes (20 + 4 + 4), indicating that the total size of B would be at most approximately … MB to support 1 TB of data. In our experiment, it only requires 4.32 MB for estimating a … GB dataset.

3.4 Scaling Up Strategy

Once the scaling up (migration) has finished, we are left with the original index (copied over) and space for the new index. The system will also apply a new sampling rate, which is higher than the original one, in order to keep satisfactory deduplication performance. At this stage, EAD compensates for the poor deduplication performance caused by the previously too sparse sampling rate in two steps:

1) Search through the original index table and re-detect duplicate chunks in already stored segments. Note that the read/write operations may bring unexpected cost, so EAD only processes a limited number of segments which are likely to yield duplicate chunks. The detailed mechanism is explained later.

2) It is possible that not all index entries in the old index are useful, meaning that some entries contain fingerprints for chunks that are unlikely to be encountered again. Keeping these entries in the new merged index would waste index slots. Therefore, after migration and duplicate re-detection, these entries are removed from the index table.

Identifying segments that contain undetected duplicate chunks is a nontrivial task. In a sampling approach, data segments are only represented by their sampled chunks, whose fingerprints are stored in the index. Thus EAD has to identify those segments only by searching through the index table.
We propose to inject additional information into the index to assist with this task: a counter (count_FP, initialized to zero for newly added entries) is kept for each index entry to record its number of hits, which we call the hitting rate. Every time an entry is matched, this counter is incremented by 1. Therefore, the larger the counter is, the more duplicate chunks this entry can detect. Among the segments hooked by FPs with a low hitting rate, there exist evicted duplicate chunks. This conclusion is derived from the following analysis of FPs in the index:

Entries with high hitting rate. These FPs indicate that segments have found matches and that many chunks near the sampled chunks are identical, which is the natural result of chunk locality. Theoretically, the higher their hitting rate and the more entries with such a high hitting rate, the more space will be saved.

Entries with low hitting rate. There are two explanations for them: some segments share few chunks with stored ones, resulting in a lower index matching rate; others share many chunks with stored segments but are not hooked by the right FPs because of the sparse sampling rate, so that no or not enough matches are found for their sampled chunks' fingerprints in the index.

The majority of evicted duplicate chunks could be eliminated from segments that are indexed by FPs with a low hitting rate. Compared with reprocessing all the segments in storage, EAD is able to select only part of them to detect the majority of duplicate evictions based on their hitting rate: it uses entries with a low hitting rate to track their corresponding segments and detect evicted duplicate chunks.

The threshold for labeling a hitting rate as high or low is not arbitrary. Suppose that $n$ chunks come in for a backup process and the measured Deduplication Ratio is $f_{mr}$ ($f_{mr} < \Gamma\,\mathrm{EDR}$). Meanwhile, the counters take the values $\{0, 1, \dots, c, \dots, m\}$, and the corresponding numbers of entries are $\{n_0, n_1, \dots, n_c, \dots, n_m\}$ (i.e., there are $n_0$ entries whose counter value is zero, etc.). The total number of evicted chunks ($n_{evt}$) that are expected to be found duplicate on the cloud is then calculated as:

$$n_{evt} = n \cdot (\Gamma\,\mathrm{EDR} - f_{mr}) \qquad (4)$$

Assume that the original sampling rate is $R_0$; the minimum number of index entries to be selected is then $R_0 \cdot n_{evt}$. Based on this calculation, EAD starts by picking the index entries whose counter value is zero ($n_0$ of them). If $n_0 < R_0 \cdot n_{evt}$, EAD also picks the entries with a hitting rate of one, and so on, until:

$$\sum_{i=0}^{c} n_i \ge R_0 \cdot n_{evt} \quad (0 \le c \le m) \qquad (5)$$

In this way, the segments with the greatest potential for improving deduplication performance are identified. We then locate the segments indexed by these FPs, choose a new set of samples, and detect duplicate chunks in them. We illustrate this process in detail in Algorithm 3. Choosing a new set of samples produces extra FPs, which are put into the additional RAM as part of the new index table. In addition, based on the preceding analysis, the entries that lead to poor performance are removed from the original index table, which yields additional RAM savings and makes our solution more memory efficient. Furthermore, to avoid adding them back to the index table, a Bloom Filter (BL) [24] is used on the cloud server to record hash information of the removed FPs. As a result, the old index is not kept in its entirety, and valuable index space is released for future use. When scaling up finishes, EAD merges the old and new indexes and calculates the updated Deduplication Ratio. If the Deduplication Ratio is still lower than Γ EDR after duplication re-detection, EAD resets the value of Γ so that Γ EDR equals the current Deduplication Ratio. Thus the requirement on deduplication performance will not exceed the system's ability.
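The selection rule of Equations 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based index, and the rounding of R0 · n_evt up to a whole number of entries are assumptions made for the example.

```python
import math
from collections import Counter


def entries_needed(n_chunks, gamma_edr, f_mr, r0):
    """Minimum number of index entries to select, R0 * n_evt (Eq. 4).

    Rounding up to a whole entry is an assumption of this sketch.
    """
    n_evt = n_chunks * (gamma_edr - f_mr)
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(r0 * n_evt, 9))


def select_low_hit_entries(hit_counts, n_chunks, gamma_edr, f_mr, r0):
    """Pick entries in increasing counter order until Eq. 5 holds.

    hit_counts maps a fingerprint to its hit counter (count_FP).
    Returns (selected fingerprints, cut-off counter value c).
    """
    target = entries_needed(n_chunks, gamma_edr, f_mr, r0)
    if target <= 0 or not hit_counts:
        return [], None  # measured ratio already meets the requirement

    histogram = Counter(hit_counts.values())  # n_i for each counter value i
    cumulative = 0
    cutoff = None
    for value in sorted(histogram):
        cumulative += histogram[value]
        cutoff = value
        if cumulative >= target:  # Eq. 5 satisfied
            break

    selected = [fp for fp, count in hit_counts.items() if count <= cutoff]
    return selected, cutoff
```

Because entries are taken one counter value at a time, the selection may contain more than R0 · n_evt entries; Equation 5 only requires that the cumulative count reaches that bound.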
Algorithm 3 Elastic scaling up strategy
  The index entry x (with fingerprint FP_x) has been chosen
  Locate its corresponding segment Seg_x
  if Seg_x has not been processed then
    Select new sample chunks from Seg_x based on the current sampling rate
    for each newly sampled chunk (with fingerprint FP_y) do
      if FP_y finds a matching record in the new index (FP_y = FP_j, FP_j ∈ FP_newindex) then
        Locate Seg_j and pull out its FPs to the chunk cache for duplication re-detection
      else
        FP_y is added into the new index as a new entry slot
    for α = 1 : total number of chunks in Seg_x do
      Compare FP_α with those in Seg_j
      if FP_α finds a match then
        Chunk α is duplicate
  Entry x is removed from the old index; the Bloom Filter records information of FP_x

3.5 Storage Hierarchy

The basic design is a classical, simple two-level hierarchy: RAM for IndexTable (along with other metadata tables such as EstimateBase and ContainerRecord) and ChunkCache, and MD for SegmentChunkHash. In detail, at the RAM level, we divide the RAM into several partitions (note that the RAM mentioned here means only the user-accessible part, which excludes the part occupied by the operating system and other background applications). The first partition caches IndexTable and some counters, which occupy more and more space in RAM during runtime. Therefore, the larger the RAM assigned to it, the higher the hit ratio the system can achieve. On the other hand, the second RAM partition is an isolated temporary loading area, which stores ChunkCache (metadata of all chunks from a certain segment); the data there is no longer useful after the comparison process is finished. In other words, ChunkCache does not keep occupying more and more space in RAM. At the MD level, the entire database of SegmentChunkHash is stored. When there is a miss in the RAM, the existing chunks' metadata are loaded from MD to RAM, and new segments' and chunks' data are written to MD.
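As a rough illustration of the two-level hierarchy, the sketch below models the IndexTable partition in RAM as a bounded cache in front of the full metadata store on MD. The class and method names are hypothetical, and the LRU eviction used for the RAM partition is an assumption; the text does not specify which entries are evicted when RAM fills.

```python
from collections import OrderedDict


class TwoLevelIndex:
    """RAM-resident IndexTable partition backed by a full store on MD."""

    def __init__(self, ram_slots):
        self.ram_slots = ram_slots
        self.ram = OrderedDict()  # cached index entries, oldest first
        self.md = {}              # stands in for the on-disk metadata store
        self.ram_hits = 0
        self.md_reads = 0

    def _cache(self, fp, meta):
        # Place the entry in RAM and evict the least recently used one
        # if the partition is over capacity (assumed policy).
        self.ram[fp] = meta
        self.ram.move_to_end(fp)
        if len(self.ram) > self.ram_slots:
            self.ram.popitem(last=False)

    def lookup(self, fp):
        if fp in self.ram:
            self.ram_hits += 1
            self.ram.move_to_end(fp)
            return self.ram[fp]
        # RAM miss: fall back to MD and load the entry into RAM if present.
        self.md_reads += 1
        meta = self.md.get(fp)
        if meta is not None:
            self._cache(fp, meta)
        return meta

    def insert(self, fp, meta):
        # New metadata is written to MD and cached in RAM.
        self.md[fp] = meta
        self._cache(fp, meta)
```

With a small `ram_slots`, repeated lookups of evicted fingerprints drive up `md_reads`, which is the first limitation the basic design runs into.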
However, there are two main limitations of this design: (i) a small RAM size results in frequent eviction of cached IndexTable entries and lowers the hit ratio; and (ii) MD's low access speed severely decreases the overall speed. Thus, to improve the deduplication speed, we need to increase the cache hit ratio in RAM's IndexTable partition and speed up MD access. Before we design a new hierarchy of RAM, SSD and MD, we need to consider two questions: (i) when does it make economic sense to make a piece of data resident in RAM? and (ii) when does it make sense to have it resident on disk? The answer is that RAM keeps the most frequently used IndexTable entries so that hot data can be accessed from the higher-speed storage, RAM. In other words, RAM (a higher-speed storage level) is a typical cache of MD (a lower-speed storage level). A feasible solution (as shown in Figure 4) is to use the SSD as the cache of the MD, which unifies a high-speed SSD and a large-capacity hard drive. We name it the Fusion Disk (FD) design. Basically, this design focuses on improving the disk set's overall access speed, so we do not need to change any behavior between RAM and the disk set in our deduplication system. RAM stores HotData, while SSD stores both HotData (as a copy of RAM) and WarmData.

Fig. 4: Structure of fusion disk design

The last question is: what caching algorithm should be applied on the SSD? The key concerns for caching algorithms are: (i) when and what to put in the cache or slow storage when facing new incoming data; (ii) when and what to admit from the slow storage to the cache; and (iii) when and what to evict from the cache to slow storage. In the baseline EAD design, we use the LRU algorithm, which discards the least recently used items first. In general, no single algorithm fits all traces. Thus, we need to conduct a trace-driven simulation test to analyze different algorithms.

4 EVALUATION

In our evaluations, we collected a dataset consisting of VMs that all run the Ubuntu OS, but each VM has different types of software and utilities installed and contains different types of application data, the majority of which comes from Wikimedia Archives [25] and OpenfMRI [26]. The total size of our dataset is around GB. We evaluate our solution, denoted as Elastic in the figures, against two alternative approaches. The first alternative, denoted as FullIndex, represents an ideal situation where there is unlimited RAM available. This serves as an upper bound on the total amount of space savings. The other alternative, denoted as DownSample, is a recent approach [6] that dynamically adjusts the sampling rate to deal with insufficient RAM.

4.1 Deduplication Ratio

We compare our algorithm with a generic deduplication mechanism without sampling and a state-of-the-art high-performance deduplication strategy with a down-sampling mechanism [6]. Before deploying the deduplication process, we allocate a specific amount of RAM for the index under each strategy. Table 3 shows the amount of RAM allocated for different deduplication strategies.
We set the size of each entry slot in the index as 64 bytes, which consists of three parts: FP, chunk metadata (storage address, chunk length, etc.) and counter, which are 20 bytes (SHA-1 hash signature [27]), 40 bytes and 4 bytes, respectively. These sizes may vary under different hash functions or addressing policies, but they will not differ much.

Strategy            # of index entries    Size of index (MB)
Full index
With down-sample
EAD
TABLE 3: RAM deployment for index under different deduplication strategies. We set the down-sampling trigger as 0.85, which means that when the storage approaches 85% of its current limit, the index is down-sampled (half of its entries are removed, e.g., by deleting index FPs with FP mod 2 = 0).

We assume that the capacity is 75 GB. Nearly 10 million index entries are needed to index all the unique data if we do not use any sampling strategy. Under the down-sample strategy with the minimum sampling rate of 0.05, we need 500K index entries for 75 GB of unique data. EAD always picks a much more conservative index size, specifically only 100K entry slots in this case. We use the Normalized Deduplication Ratio as the metric for deduplication ratio comparison. It is defined as the ratio of the measured Deduplication Ratio to the Deduplication Ratio of FullIndex deduplication. Note that FullIndex detects all the duplicate data chunks and can claim the highest deduplication ratio. Thus, such a metric is meaningful because it indicates how close the measured deduplication ratio is to the ideal deduplication ratio achievable in the system. Figure 5a shows the Normalized Deduplication Ratio of the above deduplication strategies. Down-sampling and computation of EDR happen when the usage of the index approaches 85% of its capacity. The down-sample strategy has a ratio higher than 99.5%, showing the benefits of taking advantage of locality. EAD does not claim an equally high ratio; however, the gap is less than 2%.
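The entry counts quoted in this section can be checked with simple arithmetic. The sketch below assumes 8 KB chunks (the chunk size stated later for the SSD experiments; this section does not state it explicitly) and binary units. The values it produces are derived here for illustration and are not the entries of Table 3.

```python
ENTRY_BYTES = 20 + 40 + 4        # FP + chunk metadata + counter, as stated in the text
CHUNK_BYTES = 8 * 1024           # ASSUMPTION: 8 KB chunks
CAPACITY_BYTES = 75 * 1024 ** 3  # 75 GB of unique data, binary units assumed


def index_entries(capacity_bytes, chunk_bytes, sampling_rate=1.0):
    """Number of index entries needed at a given sampling rate."""
    return round(capacity_bytes // chunk_bytes * sampling_rate)


def index_size_mib(entries, entry_bytes=ENTRY_BYTES):
    """RAM taken by the index, in MiB."""
    return entries * entry_bytes / 1024 ** 2


def normalized_dedup_ratio(measured_ratio, full_index_ratio):
    """Measured Deduplication Ratio relative to that of FullIndex."""
    return measured_ratio / full_index_ratio


full = index_entries(CAPACITY_BYTES, CHUNK_BYTES)           # no sampling
sampled = index_entries(CAPACITY_BYTES, CHUNK_BYTES, 0.05)  # minimum sampling rate
```

Under these assumptions `full` is 9,830,400 and `sampled` is 491,520, consistent with the "nearly 10 million" and "500K" figures in the text.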
Also consider that the performance requirement for EAD is defined by Γ, which is 0.95 in this case; the performance of Elastic is always higher than 98%, better than what is required. Notice that the ratio fluctuates, which indicates inconsistency in data content among different VMs. When about 5% of the data has been processed, there is a sharp performance improvement, which does not appear in DownSample. This can be explained as follows: because of the trivial size of the initial index, Elastic cannot detect enough duplicate chunks, leading to poor performance, which triggers the migration and boosts its performance. Such elastic behavior is a unique feature that cannot be observed in the other two approaches. However, comparing the deduplication ratio alone is not a fair way to evaluate their performance, since these three strategies spend different amounts of RAM on the index from the start. Figure 5b shows how the sampling rate and the number of index slots used vary in the above cases. Clearly, deduplication without sampling incurs too much memory cost. We notice that both DownSample and Elastic have comparatively very low memory cost (a small number of index entry slots). We can also observe that when about 5% of the data has been processed, the sampling rate in Elastic increases, reflecting its feature of elasticity. The above results show that EAD is able to use less RAM space to achieve a satisfactory deduplication ratio, which is only slightly lower than that of the other two. Next, we derive a more meaningful metric, Deduplication Efficiency, as a single utility measure that encompasses both deduplication ratio and RAM cost, to make a fairer comparison among these three strategies.

Fig. 5: (a) Deduplication ratio comparison. (b) Index usage comparison. [Axes: Normalized Deduplication Ratio (%), Sampling Rate, and # of Index Slots versus Amount of data processed (%), for FullIndex, DownSample and Elastic.] The sampling rate is 1 for all of them at the start of backup. The index migration in EAD is triggered when the normalized deduplication ratio drops below 95% (Γ=0.95); after that, the sampling rate doubles ( =2).

4.2 Deduplication Efficiency

As discussed in Section 4.1, neither deduplication ratio nor memory cost alone can fully represent the system performance. Therefore we define:

$$\text{Dedup Efficiency} = \frac{\text{Duplicate Data Detected}}{\text{Index Entry Slots}} \qquad (6)$$

as a more advanced performance evaluation criterion. Using this criterion, we make a fairer comparison between EAD and the other two solutions, as shown in Figure 6. It shows that Elastic outperforms both DownSample and FullIndex on efficiency. Notice that Elastic always yields a higher efficiency, almost 4 times that of DownSample and 30 times that of FullIndex. This is because its elastic feature enables it to use as little memory space as possible to detect as much duplicate data as required, avoiding the memory waste of the other two.

Fig. 6: Deduplication efficiency performance. [Axes: Deduplication Efficiency (MB/Slot) versus Amount of data processed (%), for FullIndex, DownSample and Elastic.]

4.3 SSD-based Fusion Disk Evaluation

In this section, we first illustrate what is stored in RAM, SSD and MD, and then introduce a set of performance metrics used for evaluating a caching system.
Following that, we investigate several caching algorithms and analyze their impact on our new EAD infrastructure through extensive simulations. Sensitivity analysis on different RAM and SSD sizes is also conducted.

4.3.1 Storage Partitions

(1) What is stored in RAM? The memory of each server stores two parts of cached data: IndexTable (along with other meta tables) and ChunkCache. Note that in practice, we only adjust the size of the IndexTable partition (referred to as the RAM size), and ignore the SegmentChunkHash part since it is a tiny temporary loading area used simply for comparing cached and newly arriving fingerprints. Moreover, we do not count write operations of real chunk content (to the storage pool), but only focus on index table access (on the server's disk). To simplify the problem, we fix the size of one chunk's metadata to be 4 KB, which is enough for encoding and assembling other necessary metadata in a real deduplication system. In other words, the basic unit of I/O access is 4 KB in this paper.

(2) What is stored in SSD and MD? The server's disk contains an entire metadata library which maps chunks' fingerprints (including other metadata) to their physical addresses in the storage pool. In our FD design, the SSD lies on top of MD (which holds the entire metadata library) and plays the role of a cache under the write-back caching policy. For example, pages from the RAM are evicted to the SSD, and when the SSD is full, a victim page is then evicted to the MD.

4.3.2 Performance Metrics

Recall that in our new EAD infrastructure, we adopt SSDs as a middle storage tier between RAM and disks. Different caching algorithms such as LRU (Least Recently Used), CLOCK [28], ARC [29], and CAR [30] can be used to manage the data set cached in the new middle tier. To explore the performance of EAD under these different caching algorithms, we introduce two important performance metrics: I/O hit ratio and I/O operation cost. We consider a combination of these two metrics as a criterion to investigate the impact of these caching algorithms.

(1) I/O Hit Ratio: The I/O hit ratio is defined as the fraction of I/O requests that are served by Flash. Although the SSD's access unit is also 4 KB, an I/O request might still span more than one page. Therefore, we regard an I/O request as a hit only when all of its associated pages are cached in the SSD. A higher I/O hit ratio means that more I/Os can be served from Flash directly, which accelerates the overall I/O performance. Thus, one of our primary goals is to increase the I/O hit ratio to improve SSD utilization.

(2) I/O Operation Cost: I/O operation cost can be represented as I/O response time or I/O throughput (e.g., IOPS). In this paper, we use I/O response time to evaluate the cost of using MD and FD, as shown in Equation 7, where $C_{IOAccess}$ and $C_{Flashupdate}$ represent the I/O access cost and the Flash contents updating cost, respectively. The N terms indicate the numbers of SSD reads ($N_{SSDr}$), SSD writes ($N_{SSDw}$), MD reads ($N_{MDr}$), and MD writes ($N_{MDw}$), while the T terms (e.g., $T_{SSDr}$ and $T_{MDr}$) give the corresponding average I/O latency of each operation.

$$\begin{aligned} C_{MD} &= C_{IOAccess} = N_{MDr} T_{MDr} + N_{MDw} T_{MDw} \\ C_{FD} &= C_{IOAccess} + C_{Flashupdate} = N_{SSDr} T_{SSDr} + N_{SSDw} T_{SSDw} + N_{MDr} T_{MDr} + N_{MDw} T_{MDw} \end{aligned} \qquad (7)$$

The main difference between the I/O cost calculations of the MD and FD designs is that the I/O cost of FD consists of two parts: I/O access cost and Flash contents updating cost. Table 4(a) further presents the related I/O operation costs for MD and FD in four different scenarios, i.e., read hit, read miss, write hit, and write miss. Except for read misses, our FD design always redirects I/Os from the slow MD to the fast SSD, which significantly reduces the total I/O response time. Table 4(b) shows the Flash updating cost, which applies only to the FD design.
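Equation 7 and the all-pages hit rule can be written down directly in code. The following is a minimal sketch with hypothetical names; any latency values passed in are supplied by the caller and are not the measured values of Table 4(c).

```python
from dataclasses import dataclass


@dataclass
class Latency:
    """Average per-operation latencies (same time unit for all fields)."""
    ssd_read: float
    ssd_write: float
    md_read: float
    md_write: float


def cost_md(n_md_read, n_md_write, t):
    """C_MD of Eq. 7: all I/Os are served by the MD."""
    return n_md_read * t.md_read + n_md_write * t.md_write


def cost_fd(n_ssd_read, n_ssd_write, n_md_read, n_md_write, t):
    """C_FD of Eq. 7: the counts cover both I/O access and Flash updates."""
    return (n_ssd_read * t.ssd_read + n_ssd_write * t.ssd_write
            + n_md_read * t.md_read + n_md_write * t.md_write)


def request_is_hit(pages, cached_pages):
    """A request is a hit only if every page it touches is cached in SSD."""
    return all(page in cached_pages for page in pages)
```

Dividing `cost_fd` by `cost_md` for the same workload gives a normalized cost of the kind used when the FD design is compared against the MD baseline.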
For example, when newly accessed pages are admitted but the SSD is full, extra time is needed to flush (or evict) the dirty page(s) to MD. We consider such data movements between SSD and MD as Flash contents updating cost and include it in the overall I/O cost. Table 4(c) further shows the actual average I/O response times (in microseconds) of various types of I/O operations on both the SSD and MD devices. These results were measured on an Intel DC S3500 Series SSD with a capacity of 80 GB and a Western Digital WD20EURS-63S48Y0 MD with 2 TB and 5400 RPM. Note that the tested basic I/O size is 4 KB, according to the spatial granularity.

TABLE 4: Costs Calculation Inside Fusion Disk (SSD and MD)

(a) Operations for MD and FD I/O access costs
Case   Read hit   Read miss             Write hit   Write miss
MD     MD read    MD read               MD write    MD write
FD     SSD read   MD read + SSD write   SSD write   SSD write

(b) Operations for inner FD Flash update cost
Case               Cost
Evict dirty page   SSD read + MD write

(c) Measured average I/O response times (µs) of SSD and MD
Latency     T_SSDr   T_SSDw   T_MDr   T_MDw
4K Random

TABLE 5: I/O hit ratios (%) of different caching algorithms under the 10k-index-entries RAM size case.
Hit ratio   500MB   1GB   2GB   3GB   4GB
LRU
CLOCK
ARC
CAR
CART

4.3.3 Evaluation on Different SSD Sizes

In this section, we evaluate the effectiveness of the FD design by conducting trace-driven simulations. The actual RAM size is fixed to 10k index entries, with a deduplication chunk size of 8 KB and a segment size of 16 MB, while the SSD size varies from 500 MB to 4 GB. Different caching algorithms (e.g., LRU, CLOCK) are used to manage pages in SSDs. Table 5 shows I/O hit ratios under different SSD sizes when RAM is set to store at most 10k index entries. We first observe that all these algorithms achieve similar I/O hit ratios. Sophisticated algorithms like ARC, CAR and CART have slightly higher hit ratios than naive algorithms like LRU and CLOCK, because of their enhanced methods to avoid being flushed by I/O spikes. In general, the larger the SSD is, the higher the hit ratio it will obtain.
However, as long as the SSD has sufficient capacity to hold the active working sets of all traces, the improvement in I/O hit ratio becomes negligible. Similarly, Table 6 shows the normalized I/O operation costs under different SSD sizes (RAM's upper bound is 10k index entries), where the cost under the MD design is used as the baseline. We can see that our FD design is able to save almost 75% of I/O operation costs compared to the MD design. We interpret this benefit by observing that the FD design directs a large amount of I/Os to the SSDs, which store hot data, and thus reduces the I/Os to the MD, and consequently

TABLE 6: Total I/O response time costs normalized to the no-SSD structure design (%) under the 10k-index-entries RAM size case.
I/O Cost   500MB   1GB   2GB   3GB   4GB
LRU
CLOCK
ARC
CAR
CART
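A trace-driven measurement of the I/O hit ratio under LRU can be sketched as follows. This is a simplified illustration, not the simulator used in the evaluation: the trace format (a list of requests, each a list of page identifiers), the function name, and the choice to admit every touched page into the cache are assumptions.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, capacity_pages):
    """Fraction of requests whose pages are all cached, under LRU.

    trace is a list of requests; each request is a list of page ids.
    capacity_pages is the SSD cache size in 4 KB pages.
    """
    cache = OrderedDict()  # page id -> True, least recently used first
    hits = 0
    for request in trace:
        # A request counts as a hit only if every page is already cached.
        if all(page in cache for page in request):
            hits += 1
        for page in request:
            if page in cache:
                cache.move_to_end(page)        # refresh recency
            else:
                cache[page] = True             # admit the page
                if len(cache) > capacity_pages:
                    cache.popitem(last=False)  # evict the LRU victim
    return hits / len(trace) if trace else 0.0
```

Running the same trace with increasing `capacity_pages` shows the hit ratio rising until the cache can hold the working set, after which extra capacity brings no further gain.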


More information

HP Mission-Critical Services

HP Mission-Critical Services HP Msson-Crtcal Servces Delverng busness value to IT Jelena Bratc Zarko Subotc TS Support tm Mart 2012, Podgorca 2010 Hewlett-Packard Development Company, L.P. The nformaton contaned heren s subject to

More information
