A Fast Dual-level Fingerprinting Scheme for Data Deduplication

1 Jiansheng Wei, *1 Ke Zhou, 1,2 Lei Tian, 1 Hua Wang, 1 Dan Feng
*1 Corresponding author
1 Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China, k.zhou@hust.edu.cn
2 Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA

Abstract

Data deduplication has attracted considerable interest in the research community. Several approaches have been proposed that eliminate duplicate data first at the file level and then at the chunk level to reduce the duplicate-lookup complexity. To meet high-throughput requirements, this paper proposes a fast dual-level fingerprinting (FDF) scheme that can fingerprint a dataset both at the file level and at the chunk level in a single scan of the contents. FDF breaks the fingerprinting process into task segments and further leverages the computing resources of modern multi-core CPUs to pipeline the time-consuming operations. The proposed FDF scheme has been evaluated in an experimental data backup network with real-world datasets and compared with an alternative two-stage approach. Experimental results show that FDF can maintain over 100MB/s fingerprinting throughput, matching the bandwidth of a gigabit network adapter, while being fully pipelined.

Keywords: Fingerprinting, Content-defined chunking, Deduplication, Pipelining, Data backup

1. Introduction

Content-based fingerprinting is an important technology that has been widely used in data backup [1] and storage systems [2] [3] to identify and eliminate duplicate data objects for the purpose of saving network bandwidth and/or storage resources. In order to achieve a high duplicate elimination ratio, many approaches employ content-defined chunking (CDC) algorithms [4] to divide large files into small variable-sized chunks at KB granularity. However, identifying duplicates among massive numbers of data chunks can incur significant query overhead and thus challenge deduplication performance [5]. To this end, several recent approaches, such as SAM [6], MAD2 [7], and Extreme Binning [8], eliminate duplicates first at the file level and then at the chunk level. If a duplicate file is detected, all the chunks belonging to that file can be directly skipped. After checking both file fingerprints and (where necessary) chunk fingerprints with the storage server, only non-duplicate contents need to be actually transferred and stored. These approaches can effectively avoid the duplicate-lookup bottleneck and significantly improve deduplication performance.

On the other hand, a new challenge emerges when fingerprinting file sets both at the file level and at the chunk level. Considering that the reliable hash algorithms (such as MD5 and SHA-1) used for generating fingerprints are compute-intensive, reading and fingerprinting a dataset twice at different levels consumes significant I/O and computation time, which can in turn bottleneck the deduplication throughput.

To overcome this challenge, we propose FDF, a Fast Dual-level Fingerprinting scheme that fingerprints datasets both at the file level and at the chunk level in a single scan of the contents. FDF breaks the dual-level fingerprinting process into task segments and further employs three techniques to optimize the performance.
First, FDF utilizes Rabin's fingerprinting algorithm [9] to divide files into variable-sized chunks while simultaneously capturing and eliminating hot zero-chunks by judiciously selecting chunk boundaries. Second, FDF employs the SHA-1 algorithm to generate fingerprints for files and chunks and further defines a hash context to preserve the intermediate state of the hash algorithm, so that the file-level hashing and the chunk-level hashing can be performed in parallel over the same shared data cache. Third, FDF resolves cache conflicts between different task segments and further pipelines the fingerprinting process by leveraging the computing resources of modern multi-core CPUs, so that the time overhead can be greatly reduced.

The proposed FDF scheme has been prototyped in a data backup network and evaluated with real-world datasets to measure its efficiency. Experimental results reveal that the FDF scheme outperforms the two-stage approach in both fingerprinting performance and memory consumption.

2. Motivation and background

This section presents the necessary background information about data deduplication approaches and content-defined chunking (CDC) methods to further motivate our research.

2.1. Research advances in data deduplication

The purpose of data deduplication is to save network bandwidth and/or improve storage efficiency by identifying and eliminating duplicate data objects in data streams and/or data stores. Deduplication approaches usually work at KB or larger granularity, which distinguishes them from traditional sequential data compression algorithms and delta encoding methods that eliminate redundancy at the byte level. Benefiting from the emergence of highly reliable hash algorithms and the improving performance of modern computers, data objects can be identified and compared through their fingerprints (i.e., hashes that are usually generated by MD5 or SHA-1). Existing data deduplication approaches can operate at three granularities of data objects, namely whole files, fixed-size blocks, or variable-sized chunks generated by a CDC algorithm. Previous research [4] reveals that variable-sized chunk-level deduplication is more effective than whole-file hashing at detecting duplicates among files that are similar but not identical, and it also handles the boundary-shifting problem [10] better than the fixed-size blocking approach. Another important observation is that the space efficiency of variable-sized chunk-level deduplication is highly dependent on the average chunk size: more duplicate information can be detected among smaller chunks for a given dataset [4] [11]. On the other hand, a smaller average chunk size means that more chunks will be generated for a given dataset. For a large storage system, the chunk index can exceed the available RAM capacity and force the query process to access an on-disk index.

To avoid the duplicate-lookup disk bottleneck, DDFS [5] uses a Bloom filter as a fast in-memory index and exploits data locality to accelerate the duplicate detection process. Extreme Binning [8], a distributed fuzzy deduplication approach, groups similar files into bins and eliminates duplicates first at the file level and then at the chunk level. MAD2 [7] distributes file recipes and chunk contents among clustered storage nodes while maintaining data locality, and it further eliminates all duplicates both at the file level and at the chunk level in each node. SAM [6] performs global file-level deduplication and local chunk-level deduplication for datasets belonging to different users in a cloud backup environment. Since file-level deduplication can filter out a large amount of duplicate content and reduce the number of chunk-level duplicate lookups, these dual-level approaches can achieve higher performance and better scalability than DDFS. However, reading and fingerprinting a file set twice at different levels can consume significant I/O bandwidth and computing time, which can in turn bottleneck the deduplication throughput.
Therefore, a highly efficient dual-level fingerprinting scheme is urgently needed.

2.2. Content-defined chunking

Consider two files α and β, where β is derived from α by inserting a data segment X into its contents, which is a common modification in real-world datasets. Obviously, β is very similar to α, and the two files have to be divided into small pieces to detect the duplicate contents. A straightforward method is to break both files into fixed-size blocks, as shown in Figure 1-(a). However, since segment X overwrites part of block C and pushes all of the successive blocks forward, the corresponding block boundaries are shifted. As a result, there is little chance of detecting duplicate contents after block B. In contrast, a CDC method selects dividing points according to the binary contents observed by a sliding window. As Figure 1-(b) shows, a w-byte window slides over the contents of both α and β, and a simple fingerprint is generated from the covered bytes each time the window moves forward. If the fingerprint matches a predefined pattern, the position of the window is chosen as a chunk boundary. In Figure 1-(b), the boundaries of chunk e are located again, so the successive duplicate chunks can still be detected.
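
To make the sliding-window procedure concrete, the following is a minimal sketch of a content-defined chunker in C. The simple rolling hash used here only stands in for the Rabin fingerprint that is formalized in the remainder of this subsection, and the boundary rule (fingerprint mod D = C) together with the size thresholds L and U is likewise explained below; all parameter values are illustrative rather than FDF's actual configuration.

    /* Minimal content-defined chunker: a sliding window with a simple
     * rolling hash standing in for the Rabin fingerprint.  Boundaries
     * are declared where (hash mod D) == C, subject to the lower and
     * upper chunk-size thresholds L and U described in the text.      */
    #include <stddef.h>
    #include <stdint.h>

    #define WIN   48            /* sliding window width w in bytes     */
    #define DIV   4096          /* divisor D: expected avg. chunk size */
    #define PAT   0             /* pattern code C (0 also flags zeros) */
    #define LMIN  1024          /* lower chunk-size threshold L        */
    #define UMAX  65536         /* upper chunk-size threshold U        */

    /* Return the length of the next chunk starting at data[0].        */
    size_t next_chunk(const unsigned char *data, size_t len)
    {
        uint64_t hash = 0;
        size_t i;

        if (len < LMIN)                    /* short tail of the stream */
            return len;

        for (i = 0; i < len; i++) {
            /* slide the window by one byte; a real Rabin fingerprint
             * would use the precomputed-table update described below  */
            hash = (hash << 1) + data[i];
            if (i >= WIN)                  /* drop the byte leaving    */
                hash -= (uint64_t)data[i - WIN] << WIN;

            if (i + 1 < LMIN)              /* chunk still too small    */
                continue;
            if (hash % DIV == PAT)         /* content-defined boundary */
                return i + 1;
            if (i + 1 >= UMAX)             /* hard break-point         */
                return i + 1;
        }
        return len;                        /* no boundary before end   */
    }

A caller would invoke next_chunk() repeatedly, advancing by the returned length and handing each chunk to the hashing stage described in Section 3.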

Figure 1. Comparison between fixed-size blocking and CDC: (a) fixed-size blocking; (b) content-defined chunking (CDC).

Since the fingerprint has to be updated each time the window slides by one byte, the chunking performance can be seriously bottlenecked by the computational overhead. Existing CDC methods usually employ Rabin's fingerprinting algorithm [9] [12] to improve the computational efficiency. Let S = {b_1, b_2, ..., b_m} be a byte string and let w = 48. The Rabin fingerprint of the first 48-byte substring can be defined over the Galois field Z_2 = {0, 1} as

f_Rabin^(1) = (b_1 * p^47 + b_2 * p^46 + ... + b_48) mod M,

where p = 2^8 and M is a predefined constant generated from an irreducible polynomial of degree k over Z_2. If the window slides one byte forward, the fingerprint can be easily updated by removing the first monomial and appending the next byte, so that

f_Rabin^(2) = (b_2 * p^47 + b_3 * p^46 + ... + b_49) mod M = ((f_Rabin^(1) - b_1 * p^47) * p + b_49) mod M.

If the possible values of b_i * p^47 for all 256 byte values are pre-calculated and stored in a cached table, the computational efficiency can be greatly improved. Note that the basic operations on polynomials can be simplified in Z_2: addition is equivalent to bitwise XOR, and multiplication by p can be implemented by shifting left by 8 bits. Defining a divisor D << M and a pattern code C < D, a simple boundary selection method is to choose the positions of the window at which the corresponding fingerprints satisfy f_Rabin mod D = C. Since the fingerprint changes essentially randomly as the window slides forward, approximately one fingerprint out of every D candidates matches the condition. As a result, the expected distance between two selected boundaries is D bytes, which is reflected in the average chunk size. However, there can be extreme cases in which a candidate chunk is too small or too large compared with the average chunk size. To avoid such abnormal chunks, it is necessary to impose two thresholds L and U on the chunk size. A matching boundary is not chosen until the corresponding chunk size reaches the lower threshold L, and a hard break-point is created if the chunk size reaches the upper threshold U without a valid boundary being detected. In practice, D is usually defined between 2^12 and 2^16, and L and U satisfy L < D < U, so that the average chunk size ranges from 4KB to 64KB accordingly.

3. The FDF scheme

The fast dual-level fingerprinting (FDF) scheme is designed to fingerprint datasets both at the file level and at the chunk level in a single scan of the contents by efficiently utilizing system resources and scheduling task segments.

Figure 2 presents a general framework of dual-level deduplication. In a practical deduplication system, raw files can be fingerprinted either (1) on the client end or (2) on the server end. In the former case, the whole-file fingerprints will first be computed and sent to the remote server to distinguish unique files from duplicates.

Then only non-duplicate files will be further chunked and fingerprinted, and a file recipe containing the necessary reconstruction information will be built for each file. During the chunk-level deduplication, only fresh chunk contents need to be actually transferred and stored, which saves both network bandwidth and storage capacity. In some approaches such as Extreme Binning [8], chunk fingerprints have to be generated before the elimination of duplicate files, so that all files have to be fingerprinted at the chunk level beforehand.

Figure 2. A dual-level deduplication framework.

In the latter case, raw files on a client machine will be directly sent to the storage server without considering bandwidth saving. To achieve inline deduplication, the storage server is required to perform dual-level fingerprinting with high performance so as to match the (100MB/s) throughput of a gigabit network adapter. The straightforward solution of buffering files in RAM or on local disks for two-stage fingerprinting can increase the local I/O overhead and bottleneck the deduplication throughput. There are mainly four kinds of data inside the deduplication storage, i.e., the file fingerprint index along with file recipes, and the chunk fingerprint index along with chunk contents, where file recipes record the mapping between file fingerprints and chunk fingerprints to facilitate file reconstruction and chunk retrieval.

3.1. Segmenting the fingerprinting process

In this paper, we mainly focus on optimizing the fingerprinting performance in scenarios where both file fingerprints and chunk fingerprints have to be generated before the deduplication procedure. To find potential optimizations for the fingerprinting process, we first look into the straightforward two-stage fingerprinting approach. As shown in Figure 3, the fingerprinting process can be divided into sequentially executed segments. To fingerprint a file on a client machine, the file is first read and hashed in stage 1 to generate the whole-file hash. In stage 2, the file contents are scanned again and further divided into small chunks, which are then hashed to generate a list of chunk fingerprints. Suppose the throughputs of reading (on-disk files), chunking, and hashing (in-memory data) are TP_r, TP_c, and TP_h, respectively; then the total time overhead of fingerprinting a large file of size S is

T_client-l = T_stage1 + T_stage2 = (S/TP_r + S/TP_h) + (S/TP_r + S/TP_c + S/TP_h) = 2S/TP_r + S/TP_c + 2S/TP_h.

For a small file that can be fully buffered in RAM, the fingerprinting time overhead is reduced to

T_client-s = T_stage1 + T_stage2 = (S/TP_r + S/TP_h) + (S/TP_c + S/TP_h) = S/TP_r + S/TP_c + 2S/TP_h.
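
To put these formulas into perspective, consider a purely hypothetical client whose disk reads at TP_r = 100MB/s, whose chunking runs at TP_c = 400MB/s, and whose hashing runs at TP_h = 250MB/s (illustrative figures only, not measurements from this paper). For a large file of S = 1024MB, the formulas above give

    T_client-l = 2*(1024/100) + 1024/400 + 2*(1024/250) ≈ 20.5 + 2.6 + 8.2 ≈ 31.3 seconds
    T_client-s =   (1024/100) + 1024/400 + 2*(1024/250) ≈ 10.2 + 2.6 + 8.2 ≈ 21.0 seconds

The roughly 10-second gap between the two cases is exactly the cost of reading the file from disk a second time, which is the overhead that a single-scan scheme aims to eliminate.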

To fingerprint files on the server end, data has to be received from the network and buffered locally. In particular, a file has to be written to disk if it is too large for RAM to hold in its entirety. Suppose the throughputs of transferring and writing (file data) are TP_t and TP_w, respectively; then the total time overhead of fingerprinting a large file from the network is

T_server-l = T_stage1 + T_stage2 = (S/TP_t + S/TP_h + S/TP_w) + (S/TP_r + S/TP_c + S/TP_h) = S/TP_t + S/TP_w + S/TP_r + S/TP_c + 2S/TP_h.

For a small file that can be fully buffered in RAM, it is possible for the fingerprinting process to copy data directly from the buffer, which consumes negligible time, and the corresponding total time overhead is

T_server-s = T_stage1 + T_stage2 = (S/TP_t + S/TP_h) + (S/TP_c + S/TP_h) = S/TP_t + S/TP_c + 2S/TP_h.

Obviously, T_server-s is smaller than T_server-l because fingerprinting a small file avoids expensive disk accesses.

Figure 3. Segmentation of the fingerprinting process.

3.2. Capturing extremely hot zero-chunks

Among the fingerprinting task segments, the chunking module is critically important for detecting duplicates among similar files. In particular, previous studies [13] [14] reveal that zero-byte strings may widely exist in different types of files, such as .vmdk files (virtual machine disk images), .iso files (CD/DVD images), and so on. Quickly identifying such zero-byte strings is helpful for eliminating duplicates among dissimilar files and improving the deduplication throughput. We present here a straightforward method to capture extremely hot zero-chunks in data contents. Recall from the content-defined chunking algorithm in Section 2.2 that the Rabin fingerprint of a 48-byte substring starting at offset i is

f_Rabin^(i) = (b_i * p^47 + b_{i+1} * p^46 + ... + b_{i+47}) mod M.

Obviously, f_Rabin^(i) = 0 if all 48 bytes are zeros. Conversely, if f_Rabin^(i) = 0, we can expect all 48 bytes to be zeros with a certain probability. Further, if f_Rabin remains 0 while the window slides over L bytes, where L is the lower threshold of the chunk size, we can expect with high probability that these L bytes form a zero-chunk, and a chunk boundary can be determined. Clearly, it is a reasonable choice to define the pattern code C = 0 to facilitate this chunk boundary selection. In practice, we define L = 2^10 and pre-calculate the SHA-1 sum of 1024 zero bytes to capture and confirm hot zero-chunks. Note that identifying zero-chunks while selecting chunk boundaries is more efficient than the straightforward method that detects and counts zero bytes in a separate pass.
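
As a concrete illustration of this shortcut, the sketch below assumes that the chunker (cf. the sketch in Section 2.2) emits 1024-byte candidate chunks for long zero runs when C = 0, and uses OpenSSL's one-shot SHA1() routine (link with -lcrypto) to pre-compute the digest of 1024 zero bytes. The byte-wise confirmation shown here is merely one simple way to validate a candidate zero-chunk; the paper does not prescribe this exact check.

    /* Capturing hot zero-chunks during fingerprinting: the digest of a
     * 1024-byte zero-chunk is computed once and reused, so confirmed
     * zero-chunks are never hashed again.  (Illustrative sketch only.) */
    #include <stddef.h>
    #include <string.h>
    #include <openssl/sha.h>

    #define ZERO_CHUNK_LEN 1024                 /* L = 2^10             */

    static unsigned char zero_digest[SHA_DIGEST_LENGTH];

    void init_zero_digest(void)
    {
        static const unsigned char zeros[ZERO_CHUNK_LEN];   /* all 0s   */
        SHA1(zeros, ZERO_CHUNK_LEN, zero_digest);
    }

    /* Fingerprint one chunk; returns 1 if it was a captured zero-chunk. */
    int fingerprint_chunk(const unsigned char *chunk, size_t len,
                          unsigned char digest[SHA_DIGEST_LENGTH])
    {
        if (len == ZERO_CHUNK_LEN) {
            size_t i = 0;
            while (i < len && chunk[i] == 0)
                i++;
            if (i == len) {                     /* confirmed zero-chunk */
                memcpy(digest, zero_digest, SHA_DIGEST_LENGTH);
                return 1;
            }
        }
        SHA1(chunk, len, digest);               /* ordinary chunk       */
        return 0;
    }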

3.3. Enabling parallel hashing

As described in Section 3.1, a file has to be read from disk one more time in the chunk-level fingerprinting stage if it is too large to be cached in RAM in its entirety. To avoid such redundant disk accesses, we propose to use a shared data cache for both the file-level fingerprinting and the chunk-level fingerprinting, and to further enable parallel hashing. In the SHA-1 algorithm, which is employed in our fingerprinting scheme, a given data object is appended with a bit 1, k zero bits (0 ≤ k < 512), and a 64-bit big-endian integer recording the length of the data object, so that the resulting data can be divided into 512-bit blocks with no irregular fragments. The 160-bit hash of the data object is first initialized as five 32-bit words, i.e., h0 = 0x67452301, h1 = 0xEFCDAB89, h2 = 0x98BADCFE, h3 = 0x10325476, and h4 = 0xC3D2E1F0. The hash is then updated by a group of complex functions in 80 rounds every time a 512-bit block is input. In the fingerprinting process, it is possible that a data object (file or chunk) has not been fully loaded into the data cache, e.g., the boundary of a chunk is not detected until the end of the cached data is reached. To resolve this problem, we record the intermediate state of the hashing process using a hash context, which is defined as { unsigned long long counter; unsigned long hash_sum[5]; unsigned char buffer[64]; }, where counter records the number of bytes processed, hash_sum records the intermediate SHA-1 sum, and buffer holds the incomplete data block that needs to be made up to 512 bits after fresh data is loaded. By introducing the hash context, the hashing process can be performed in an incremental manner. Most importantly, it becomes possible to perform the file-level hashing and the chunk-level hashing in parallel using a shared data cache, so the I/O overhead can be reduced. Furthermore, if two CPU cores are available to execute the dual-level hashing tasks, the computational efficiency can be greatly improved.
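
To illustrate how a preserved hash state enables dual-level hashing over one shared cache, the following sketch uses OpenSSL's incremental SHA_CTX interface (SHA1_Init / SHA1_Update / SHA1_Final, link with -lcrypto) as a stand-in for the hash context defined above; the boundary array and the emit_chunk_fp() callback are hypothetical placeholders rather than FDF's actual interfaces.

    /* Dual-level hashing over a shared, read-only data cache.  file_ctx
     * and chunk_ctx persist across calls, so files and chunks may span
     * several cache fills.  boundaries[] holds the chunk boundaries
     * (offsets within the cache) found by the chunking stage.          */
    #include <stddef.h>
    #include <openssl/sha.h>

    void emit_chunk_fp(const unsigned char fp[SHA_DIGEST_LENGTH]);

    void dual_level_hash(SHA_CTX *file_ctx, SHA_CTX *chunk_ctx,
                         const unsigned char *cache, size_t cached,
                         const size_t *boundaries, size_t nboundaries)
    {
        unsigned char fp[SHA_DIGEST_LENGTH];
        size_t off = 0, i;

        /* file-level hashing consumes the whole cache in one pass; this
         * call can run on another core concurrently with the loop below,
         * since both only read the shared cache                          */
        SHA1_Update(file_ctx, cache, cached);

        /* chunk-level hashing walks the boundaries found by the chunker  */
        for (i = 0; i < nboundaries; i++) {
            SHA1_Update(chunk_ctx, cache + off, boundaries[i] - off);
            SHA1_Final(fp, chunk_ctx);          /* chunk complete          */
            emit_chunk_fp(fp);
            SHA1_Init(chunk_ctx);               /* start the next chunk    */
            off = boundaries[i];
        }
        /* the tail after the last boundary stays in chunk_ctx and is
         * continued when the next cache is handed over                    */
        if (off < cached)
            SHA1_Update(chunk_ctx, cache + off, cached - off);
    }

Because both levels only read the cache, the two parts above can be dispatched to two different CPU cores without copying or locking the cached data, which is what makes the shared-cache design viable.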
3.4. Pipelining fingerprinting task segments

As a further optimization, apart from improving the chunking algorithm and parallelizing the hashing processes, this subsection focuses on leveraging the computing resources of modern multi-core CPUs to pipeline the fingerprinting task segments and improve the overall fingerprinting performance. As analyzed in Section 3.1, the task segments of reading/receiving, chunking, and hashing are responsible for most of the time overhead and are the most likely to constitute a bottleneck. On the other hand, the rapid development of modern multi-core processors and well-designed OpenMP (Open Multi-Processing) libraries provide an opportunity to execute the task segments in parallel on different CPU cores. However, there are data dependencies that make it a natural choice to run the task segments sequentially. For example, the cached data cannot be fingerprinted at the chunk level until the corresponding chunk boundaries have been determined. To avoid such data dependencies, we propose to use a cache group instead of a single cache for accommodating both data contents and chunk boundaries. As Figure 4 shows, the fingerprinting process is reorganized into three stages, i.e., the data preparation stage, the chunking stage, and the dual-level hashing stage, where each stage can be assigned an independent data cache as well as a boundary cache. As a result, it becomes possible to pipeline the time-consuming task segments and improve the overall fingerprinting performance, as sketched below. For example, the chunking stage can parse data in cache B while the data preparation stage simultaneously reads fresh data into cache C. When data cache C is fully filled, data cache A can be reused for accommodating fresh data if its contents have already been chunked and hashed. A data cache (together with its corresponding boundary cache) is switched and handed over to the next stage once its contents have been processed. In particular, the file-level hashing and the chunk-level hashing can be performed in parallel because they share the same data cache in a read-only manner and have about the same computational complexity. Clearly, at least four CPU cores are required to pipeline and parallelize all the time-consuming task segments.
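
The sketch below shows one way such a cache-group pipeline could be organized with OpenMP sections (compile with -fopenmp): each round runs the three stages concurrently on different members of a three-slot cache group and then rotates the slots, a simplified barrier-per-round approximation of the hand-over described above. The slot structure and the prepare()/chunk_stage()/hash_stage() helpers are illustrative placeholders, and the dual-level hashing stage would internally split the file-level and chunk-level hashing across two further cores as in Section 3.3.

    /* Rotating three-slot cache group pipelining the preparation,
     * chunking and dual-level hashing stages.  At round r the slot
     * filled in round r is prepared, the slot filled in round r-1 is
     * chunked, and the slot filled in round r-2 is hashed.             */
    #include <stddef.h>

    #define SLOTS     3
    #define CACHE_CAP (4 * 1024 * 1024)          /* 4MB per data cache  */

    struct slot {
        unsigned char data[CACHE_CAP];           /* data cache          */
        size_t        len;                       /* bytes cached        */
        size_t        bounds[CACHE_CAP / 1024];  /* boundary cache      */
        size_t        nbounds;
    };

    size_t prepare(struct slot *s);              /* read/receive data   */
    void   chunk_stage(struct slot *s);          /* fill s->bounds      */
    void   hash_stage(const struct slot *s);     /* dual-level hashing  */

    void pipeline(size_t nrounds)                /* nrounds cache fills */
    {
        static struct slot group[SLOTS];
        size_t round;

        /* run nrounds + 2 rounds so the last caches drain through the
         * chunking and hashing stages                                   */
        for (round = 0; round < nrounds + 2; round++) {
            struct slot *pre = &group[round % SLOTS];
            struct slot *chk = &group[(round + SLOTS - 1) % SLOTS];
            struct slot *hsh = &group[(round + SLOTS - 2) % SLOTS];

            #pragma omp parallel sections
            {
                #pragma omp section
                { if (round < nrounds) pre->len = prepare(pre); }
                #pragma omp section
                { if (round >= 1 && round <= nrounds) chunk_stage(chk); }
                #pragma omp section
                { if (round >= 2) hash_stage(hsh); }
            }   /* implicit barrier: all stages finish before rotating   */
        }
    }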

On a client machine, the time overhead of fingerprinting one file that belongs to a large dataset can be expected to be

T_client = max(T_preparation, T_chunking, T_hashing) = max(S/TP_r, S/TP_c, S/TP_h),

where each term is defined as in Section 3.1. Similarly, the expected time overhead of fingerprinting a file on a remote server is

T_server = max(T_preparation, T_chunking, T_hashing) = max(S/TP_t, S/TP_c, S/TP_h).

Obviously, the time overhead can be significantly reduced compared with that of the two-stage approach when dealing with large datasets. We will illustrate how to schedule the fingerprinting task segments on modern quad-core or even dual-core CPUs in Section 4.

Figure 4. Parallelism of fingerprinting task segments.

4. Evaluation and analysis

We evaluate FDF through a prototype running on B-Cloud, a research-oriented distributed system that provides network backup services for user files and other binary data. The B-Cloud system [7] consists of backup clients, front-end backup servers, metadata servers, and back-end storage servers. Specifically, a backup client scans and transfers user-specified datasets to remote backup servers according to predefined backup policies. A group of backup servers cooperatively provides backup services for the purpose of load balancing, and a new backup job is always dispatched to the backup server with the lowest workload level. A backup server splits metadata from file contents while receiving user data. Backup job information as well as the related file metadata is sent to the metadata servers, and file contents are delivered to the storage servers. In particular, file contents are fingerprinted both at the file level and at the chunk level before being actually transferred to storage servers, and duplicate contents are then eliminated through the MAD2 [7] deduplication approach. We implement the FDF scheme in backup clients and backup servers respectively to evaluate its performance, and the duplicate elimination ratios achieved on real-world datasets are also measured and reported. Further, we discuss some implementation issues at the end of this section.

4.1. Experimental setup

The hardware configuration of our experimental backup clients includes a dual-core CPU running at 2.5 GHz, 4GB RAM, a 500GB hard disk, and one gigabit network adapter.

The experimental backup servers are configured as follows: a quad-core CPU running at 2.0 GHz, 4 x 2GB RAM, one RAID controller card with 128MB cache, 8 x 1TB hard disks organized as a RAID-5 partition, and two gigabit network interface cards.

We have collected two real-world datasets from different groups of users. The first dataset was contributed by 15 students in an engineering group and is referred to as the Workgroup set. Each student backs up data from a desktop PC or workstation over a span of 31 days. There are 12.1 million files with a total size of 6.0TB in the Workgroup set. The second dataset was collected from 26 users on a campus network, including file transfer site managers, small website maintainers, and other individuals. Every user runs full or incremental backup jobs independently over a span of 31 days. This dataset is called the Campus set and contains 15.4 million files that amount to a total of 4.7TB of data.

4.2. Fingerprinting performance

This subsection evaluates the performance of our FDF scheme on a dual-core backup client and a quad-core backup server, respectively. We have also implemented the two-stage fingerprinting approach in both environments for comparison. Figure 5-(a) shows the scheduling process of deploying FDF on a dual-core client machine. The reading tasks and the chunking tasks are pipelined on different CPU cores. When a cached data segment has been chunked, the data contents are hashed in parallel to update the corresponding file fingerprint and generate chunk fingerprints. Note that the chunking tasks and the hashing tasks are not pipelined due to the limited number of CPU cores. Clearly, using two data caches associated with two boundary caches is sufficient for deploying the FDF scheme. During the evaluation, the experimental backup client fingerprints a local dataset of 302,550 files that amount to a total of GB data. For the stability of results, we measure the average fingerprinting performance by tracing the total time overhead of processing the whole dataset. It has also been measured that the local disk can achieve an average linear read throughput of 92.1MB/s and an average random read throughput of 84.5MB/s when transferring 1MB data blocks.

Figure 5. Scheduling FDF on (a) a dual-core client machine and (b) a quad-core backup server.

Figure 6. The fingerprinting performance of FDF on (a) a dual-core client machine and (b) a quad-core backup server (series: FDF, single-thread two-stage, and multi-thread two-stage approaches, with the measured disk/network throughput shown for reference; x-axis: the capacity of each cache in MB, logarithmic scale; y-axis: average fingerprinting performance in MB/s).

Figure 6-(a) presents the results under different configurations of cache capacity. As the cache size increases from 1MB to 256MB, the average fingerprinting performance of the FDF scheme fluctuates between 64.7MB/s and 67.0MB/s.

In comparison, the average performance of the single-thread two-stage fingerprinting process increases from 31.6MB/s to 42.1MB/s. It has been observed that over 90% of the file data can be fully buffered in RAM and thus directly reused during the chunk-level fingerprinting stage once the cache capacity reaches 256MB (see Section 3.1). For a fair comparison, we have also implemented the two-stage fingerprinting approach using two concurrent threads to fully utilize the available CPU cores. As shown in Figure 6-(a), the two-thread two-stage fingerprinting approach delivers a throughput of only 23.7MB/s when using 1MB data caches. This is because the disk throughput drops to around 48.8MB/s when small pieces of data are randomly read by two concurrent threads. As the cache capacity increases, the benefit of using two concurrent threads shows up, and the average fingerprinting performance finally grows to 53.5MB/s with 256MB data caches. The results indicate that the fingerprinting performance of the FDF scheme is not as sensitive to cache capacity as that of the two-stage approach, and the FDF scheme can outperform the two-stage approach while using only a few megabytes of RAM.

Figure 5-(b) shows the scheduling process of deploying FDF on a quad-core backup server. The fingerprinting tasks of receiving (data from the network), chunking, and hashing are pipelined and distributed across all the CPU cores. In particular, the file-level hashing and the chunk-level hashing are executed in parallel while sharing the same data cache. Three data caches along with three boundary caches are used for deploying the FDF scheme in this environment. We use another backup server as a client to supply the source data and avoid the disk-access bottleneck. It has been measured that the RAID-5-based storage subsystem can achieve an average linear read throughput of 477.6MB/s and an average linear write throughput of 465.0MB/s when transferring 16MB data blocks. On the other hand, the gigabit network adapter shows an average throughput of 107.4MB/s. As shown in Figure 6-(b), the average throughput of the FDF scheme fluctuates between 102.1MB/s and 105.2MB/s as the cache capacity increases from 1MB to 256MB. The results suggest that deploying the FDF scheme on a quad-core server can further accelerate the fingerprinting performance and achieve a high throughput of over 95% of the available network bandwidth. In comparison, the single-thread two-stage approach, which buffers large files on local disk for the chunk-level fingerprinting, only achieves a throughput ranging from 34.5MB/s to 41.6MB/s, far below the performance of the FDF scheme. For a fair comparison, a four-thread two-stage approach has also been implemented to fully utilize the available CPU cores. The four-thread two-stage approach achieves a throughput ranging from 73.5MB/s to 89.2MB/s as the cache capacity increases. Clearly, the FDF scheme still delivers better fingerprinting performance than the four-thread two-stage approach; note that the latter can consume more RAM and even additional disk space than the former when dealing with large files.

4.3. Duplicate elimination efficiency

As previously discussed, chunk-level deduplication can detect duplicate information between similar files and thus achieve high space efficiency. On the other hand, file-level deduplication can detect and eliminate duplicate files and thus reduce the duplicate-lookup complexity at the chunk level.
Figure 7-(a) reports the number of fingerprints of our experimental datasets at the different levels to show the duplicate-lookup complexities. For the Workgroup set, fingerprints are generated and would have to be deduplicated at the chunk level. By introducing file-level deduplication, the original files are deduplicated into unique files that contain nonzero-chunks and zero-chunks, where the zero-chunks can be directly filtered out by our FDF scheme. Finally, the Workgroup set is deduplicated into unique chunks. It should be noted that a zero-chunk has a fixed size of 1KB in our implementation, while the nonzero-chunks have a much larger average size of around 4KB. Obviously, the dual-level deduplication only needs to process the file fingerprints and the fingerprints of nonzero-chunks, which amounts to only 6.7% of the lookup complexity of pure chunk-level deduplication that examines all the chunk fingerprints. Moreover, capturing zero-chunks in the fingerprinting process can significantly reduce the computational overhead of the deduplication approach. The Campus set initially contains files that can be further divided into chunks. The file-level deduplication detects unique files containing nonzero-chunks and zero-chunks, and unique chunks are finally obtained after the chunk-level deduplication. Similar to the Workgroup set, the duplicate-lookup complexity is greatly reduced by employing dual-level deduplication and capturing zero-chunks in the fingerprinting process.

Figure 7. Duplicate elimination efficiency: (a) the number of fingerprints at different levels (logarithmic scale); (b) data sizes and duplicate elimination ratios.

Figure 7-(b) presents the data sizes and the duplicate elimination ratios at the different deduplication levels for both experimental datasets. The duplicate elimination ratio (DER) is calculated as the original data size divided by the data size after deduplication. At the beginning, there are 6,151.88GB of data in the Workgroup set and 4,778.15GB of data in the Campus set, respectively. After the file-level deduplication, the unique files of the Workgroup set achieve a DER of 10.31, and those of the Campus set result in a DER of 8.0. By further eliminating duplicates at the chunk level, the unique chunks finally produced for the Workgroup set correspond to a further improved chunk-level DER. For the Campus set, the unique chunks correspond to a chunk-level DER that is about 2.3 times higher than the file-level DER. Moreover, Figure 7-(b) also reports the sizes of the zero-chunks contained in the unique files of both datasets. Our experimental results based on real-world datasets reveal that file-level deduplication can eliminate most duplicate data and significantly reduce the duplicate-lookup complexity at the chunk level. Moreover, chunk-level deduplication can detect more duplicate information between similar files and further improve the duplicate elimination ratio. As a result, dual-level fingerprinting as well as dual-level deduplication is recommended when designing a practical data backup/archiving system.

4.4. Discussion

The FDF scheme is designed to fingerprint a dataset both at the file level and at the chunk level in a single scan of the contents. However, as described at the beginning of Section 3, some dual-level deduplication approaches may only want to fingerprint non-duplicate files at the chunk level. In such a case, the FDF scheme can incur unnecessary computational overhead by chunking and hashing contents belonging to duplicate files. If this computational overhead outweighs the benefit of fingerprinting files in a single scan, it becomes a better choice to perform the chunk-level fingerprinting as a second stage after the file-level deduplication. We argue that the principle of our FDF scheme, i.e., pipelining time-consuming task segments by resolving the cache conflicts between them, is still applicable for optimizing and accelerating the overall fingerprinting process.

5. Related work

Chunking methods have been well studied in many previous works. The two-threshold two-divisor (TTTD) chunking approach [15] avoids cutting abnormally large chunks at hard break-points (which behave like fixed-size blocks) by introducing a backup divisor to restrict the actual chunk sizes. Specifically, if the size of the current chunk reaches the predefined upper threshold without finding a boundary match, the chunking process switches to a smaller divisor (see Section 2.2) and tries again to find a content-defined boundary as an alternative to creating a hard break-point.
ADMAD [16] exploits certain metadata information (e.g., file type and file format) to divide files into variable-sized logical units, and further eliminates duplicate units to achieve good space efficiency in archival storage. However, this approach requires the fingerprinting module to recognize the file format and to maintain many application-specific chunking libraries.

Recently, a novel bimodal content-defined chunking approach [17] has been proposed to increase the average chunk size while maintaining a comparable duplicate elimination ratio. Specifically, the bimodal chunking algorithm generates small chunks in the limited regions of transition from duplicate to non-duplicate data and generates large chunks elsewhere, according to the existence of the candidate chunks. Similarly, VS-SWC [18] uses small chunks only at the junction regions between duplicate data and unique data, so that the duplicate elimination ratio can be improved while the number of chunks is kept down. Apart from CDC algorithms, a frequency-based chunking (FBC) algorithm [19] has also been proposed. FBC first samples and identifies frequent fixed-size blocks over a data stream, and then coarsely divides the data stream into large content-defined chunks. If a coarse-grained chunk contains any high-frequency block, it is further divided into fine-grained chunks. As a result, FBC is able to generate fewer chunks than the baseline CDC algorithm while maintaining the duplicate elimination ratio. Compared with the above approaches, which focus on optimizing the chunking algorithm and improving the deduplication ratio, our FDF scheme aims to efficiently fingerprint file sets both at the file level and at the chunk level to improve the deduplication throughput. Apart from the most relevant deduplication approaches introduced in Section 2.1, many other excellent works that eliminate duplicate data in different environments are surveyed in the MAD2 paper [7] and the DCBA paper [20].

6. Conclusion

This paper presents FDF, a fast dual-level fingerprinting scheme that can fingerprint a dataset both at the file level and at the chunk level with high performance in a single scan of the data contents. Experimental results reveal that our FDF scheme can significantly outperform the two-stage fingerprinting approach while using only a small fraction of the memory resources of the latter. Most importantly, the FDF scheme can generally match the throughput of a gigabit network adapter while being fully pipelined. Deduplication results based on real-world datasets show that eliminating duplicate files can greatly reduce the duplicate-lookup complexity at the chunk level. Further, millions or even tens of millions of hot zero-chunks have been captured and pre-eliminated while processing data contents belonging to non-duplicate files.

7. Acknowledgement

This work is supported in part by the National Basic Research Program (973 Program) of China under Grant No. 2011CB and the National High Technology Research and Development Program (863 Program) of China under Grant No. 2009AA01A.

References

[1] Tianming Yang, Dan Feng, Zhongying Niu, Yaping Wan, "Scalable high performance de-duplication backup via hash join", Journal of Zhejiang University-SCIENCE C, vol. 11, no. 5.
[2] Lawrence L. You, Kristal T. Pollack, Darrell D. E. Long, "Deep Store: An Archival Storage System Architecture", In Proceedings of the 21st International Conference on Data Engineering (ICDE).
[3] Jingli Zhou, Ke Liu, Leihua Qin, Xuejun Nie, "Block-Ranking: Content Similarity Retrieval Based on Data Partition in Network Storage Environment", JDCTA: International Journal of Digital Content Technology and its Applications, vol. 4, no. 3, pp.85-94.
[4] Calicrates Policroniades, Ian Pratt, "Alternatives for Detecting Redundancy in Storage Systems Data", In Proceedings of the General Track: 2004 USENIX Annual Technical Conference, pp.73-86.
[5] Benjamin Zhu, Kai Li, Hugo Patterson, "Avoiding the Disk Bottleneck in the Data Domain Deduplication File System", In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST).
[6] Yujuan Tan, Hong Jiang, Dan Feng, Lei Tian, Zhichao Yan, Guohui Zhou, "SAM: A Semantic-Aware Multi-Tiered Source De-duplication Framework for Cloud Backup", In Proceedings of the 39th International Conference on Parallel Processing (ICPP).
[7] Jiansheng Wei, Hong Jiang, Ke Zhou, Dan Feng, "MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services", In Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST).
[8] Deepavali Bhagwat, Kave Eshghi, Darrell D. E. Long, Mark Lillibridge, "Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup", In Proceedings of the 17th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS).
[9] Michael O. Rabin, "Fingerprinting by random polynomials", Technical Report No. TR-15-81, Center for Research in Computing Technology, Harvard University, Cambridge, MA, USA.
[10] Udi Manber, "Finding Similar Files in a Large File System", In Proceedings of the Winter 1994 USENIX Technical Conference, pp.1-10.
[11] Purushottam Kulkarni, Fred Douglis, Jason LaVoie, John M. Tracey, "Redundancy Elimination within Large Collections of Files", In Proceedings of the General Track: 2004 USENIX Annual Technical Conference, pp.59-72.
[12] Andrei Z. Broder, "Some applications of Rabin's fingerprinting method", Sequences II: Methods in Communications, Security, and Computer Science, Springer-Verlag, New York, USA.
[13] Keren Jin, Ethan L. Miller, "The Effectiveness of Deduplication on Virtual Machine Disk Images", In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference.
[14] Dirk Meister, André Brinkmann, "Multi-Level Comparison of Data Deduplication in a Backup Scenario", In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference.
[15] Kave Eshghi, Hsiu Khuern Tang, "A Framework for Analyzing and Improving Content-Based Chunking Algorithms", Technical Report No. HPL (R.1), Hewlett-Packard Laboratories, Palo Alto, CA, USA.
[16] Chuanyi Liu, Yingping Lu, Chunhui Shi, Guanlin Lu, David H. C. Du, Dongsheng Wang, "ADMAD: Application-Driven Metadata Aware De-duplication Archival Storage System", In Proceedings of the 5th IEEE International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI), pp.29-35.
[17] Erik Kruus, Cristian Ungureanu, Cezary Dubnicki, "Bimodal Content Defined Chunking for Backup Streams", In Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST).
[18] Can Wang, Zhiguang Qin, Lei Yang, Peng Nie, "Improved Deduplication Method based on Variable-Size Sliding Window", JDCTA: International Journal of Digital Content Technology and its Applications, vol. 5, no. 9, pp.80-87.
[19] Guanlin Lu, Yu Jin, David H. C. Du, "Frequency Based Chunking for Data De-Duplication", In Proceedings of the 18th Annual IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS).
[20] Jiansheng Wei, Hong Jiang, Ke Zhou, Dan Feng, Hua Wang, "Detecting Duplicates over Sliding Windows with RAM-Efficient Detached Counting Bloom Filter Arrays", In Proceedings of the 6th IEEE International Conference on Networking, Architecture, and Storage (NAS).


A Policy-based De-duplication Mechanism for Securing Cloud Storage

A Policy-based De-duplication Mechanism for Securing Cloud Storage International Journal of Electronics and Information Engineering, Vol.2, No.2, PP.95-102, June 2015 95 A Policy-based De-duplication Mechanism for Securing Cloud Storage Zhen-Yu Wang 1, Yang Lu 1, Guo-Zi

More information

A Efficient Hybrid Inline and Out-of-line Deduplication for Backup Storage

A Efficient Hybrid Inline and Out-of-line Deduplication for Backup Storage A Efficient Hybrid Inline and Out-of-line Deduplication for Backup Storage YAN-KIT Li, MIN XU, CHUN-HO NG, and PATRICK P. C. LEE The Chinese University of Hong Kong Backup storage systems often remove

More information

Data Deduplication and Tivoli Storage Manager

Data Deduplication and Tivoli Storage Manager Data Deduplication and Tivoli Storage Manager Dave Cannon Tivoli Storage Manager rchitect Oxford University TSM Symposium September 2007 Disclaimer This presentation describes potential future enhancements

More information

Using Synology SSD Technology to Enhance System Performance. Based on DSM 5.2

Using Synology SSD Technology to Enhance System Performance. Based on DSM 5.2 Using Synology SSD Technology to Enhance System Performance Based on DSM 5.2 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD Cache as Solution...

More information

CURRENTLY, the enterprise data centers manage PB or

CURRENTLY, the enterprise data centers manage PB or IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 61, NO. 11, JANUARY 21 1 : Distributed Deduplication for Big Storage in the Cloud Shengmei Luo, Guangyan Zhang, Chengwen Wu, Samee U. Khan, Senior Member, IEEE,

More information

Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets

Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets Young Jin Nam School of Computer and Information Technology Daegu University Gyeongsan, Gyeongbuk, KOREA 7-7 Email:

More information

Multi-level Metadata Management Scheme for Cloud Storage System

Multi-level Metadata Management Scheme for Cloud Storage System , pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1

More information

A Survey on Deduplication Strategies and Storage Systems

A Survey on Deduplication Strategies and Storage Systems A Survey on Deduplication Strategies and Storage Systems Guljar Shaikh ((Information Technology,B.V.C.O.E.P/ B.V.C.O.E.P, INDIA) Abstract : Now a day there is raising demands for systems which provide

More information

Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality

Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble HP Labs UC Santa Cruz HP

More information

Online Remote Data Backup for iscsi-based Storage Systems

Online Remote Data Backup for iscsi-based Storage Systems Online Remote Data Backup for iscsi-based Storage Systems Dan Zhou, Li Ou, Xubin (Ben) He Department of Electrical and Computer Engineering Tennessee Technological University Cookeville, TN 38505, USA

More information

An Efficient Deduplication File System for Virtual Machine in Cloud

An Efficient Deduplication File System for Virtual Machine in Cloud An Efficient Deduplication File System for Virtual Machine in Cloud Bhuvaneshwari D M.E. computer science and engineering IndraGanesan college of Engineering,Trichy. Abstract Virtualization is widely deployed

More information

AA-Dedupe: An Application-Aware Source Deduplication Approach for Cloud Backup Services in the Personal Computing Environment

AA-Dedupe: An Application-Aware Source Deduplication Approach for Cloud Backup Services in the Personal Computing Environment !000111111 IIIEEEEEEEEE IIInnnttteeerrrnnnaaatttiiiooonnnaaalll CCCooonnnfffeeerrreeennnccceee ooonnn CCCllluuusssttteeerrr CCCooommmpppuuutttiiinnnggg AA-Dedupe: An Application-Aware Source Deduplication

More information

A Method of Deduplication for Data Remote Backup

A Method of Deduplication for Data Remote Backup A Method of Deduplication for Data Remote Backup Jingyu Liu 1,2, Yu-an Tan 1, Yuanzhang Li 1, Xuelan Zhang 1, Zexiang Zhou 3 1 School of Computer Science and Technology, Beijing Institute of Technology,

More information

ABSTRACT 1 INTRODUCTION

ABSTRACT 1 INTRODUCTION DEDUPLICATION IN YAFFS Karthik Narayan {knarayan@cs.wisc.edu}, Pavithra Seshadri Vijayakrishnan{pavithra@cs.wisc.edu} Department of Computer Sciences, University of Wisconsin Madison ABSTRACT NAND flash

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

FAWN - a Fast Array of Wimpy Nodes

FAWN - a Fast Array of Wimpy Nodes University of Warsaw January 12, 2011 Outline Introduction 1 Introduction 2 3 4 5 Key issues Introduction Growing CPU vs. I/O gap Contemporary systems must serve millions of users Electricity consumed

More information

DEDUPLICATION has become a key component in modern

DEDUPLICATION has become a key component in modern IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 3, MARCH 2016 855 Reducing Fragmentation for In-line Deduplication Backup Storage via Exploiting Backup History and Cache Knowledge Min

More information

A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm

A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm GUIPING WANG 1, SHUYU CHEN 2*, AND JUN LIU 1 1 College of Computer Science Chongqing University No.

More information

Online De-duplication in a Log-Structured File System for Primary Storage

Online De-duplication in a Log-Structured File System for Primary Storage Online De-duplication in a Log-Structured File System for Primary Storage Technical Report UCSC-SSRC-11-03 May 2011 Stephanie N. Jones snjones@cs.ucsc.edu Storage Systems Research Center Baskin School

More information

Efficiently Storing Virtual Machine Backups

Efficiently Storing Virtual Machine Backups Efficiently Storing Virtual Machine Backups Stephen Smaldone, Grant Wallace, and Windsor Hsu Backup Recovery Systems Division EMC Corporation Abstract Physical level backups offer increased performance

More information

A Method of Deduplication for Data Remote Backup

A Method of Deduplication for Data Remote Backup A Method of Deduplication for Data Remote Backup Jingyu Liu 1,2, Yu-an Tan 1, Yuanzhang Li 1, Xuelan Zhang 1, and Zexiang Zhou 3 1 School of Computer Science and Technology, Beijing Institute of Technology,

More information

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1 Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System

More information

BALANCING FOR DISTRIBUTED BACKUP

BALANCING FOR DISTRIBUTED BACKUP CONTENT-AWARE LOAD BALANCING FOR DISTRIBUTED BACKUP Fred Douglis 1, Deepti Bhardwaj 1, Hangwei Qian 2, and Philip Shilane 1 1 EMC 2 Case Western Reserve University 1 Starting Point Deduplicating disk-based

More information

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Design and Implementation of a Storage Repository Using Commonality Factoring IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Axion Overview Potentially infinite historic versioning for rollback and

More information

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Quanqing XU Quanqing.Xu@nicta.com.au YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Outline Motivation YuruBackup s Architecture Backup Client File Scan, Data

More information

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Oracle Database Scalability in VMware ESX VMware ESX 3.5 Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises

More information

Turnkey Deduplication Solution for the Enterprise

Turnkey Deduplication Solution for the Enterprise Symantec NetBackup 5000 Appliance Turnkey Deduplication Solution for the Enterprise Mayur Dewaikar Sr. Product Manager, Information Management Group White Paper: A Deduplication Appliance Solution for

More information

Trends in Enterprise Backup Deduplication

Trends in Enterprise Backup Deduplication Trends in Enterprise Backup Deduplication Shankar Balasubramanian Architect, EMC 1 Outline Protection Storage Deduplication Basics CPU-centric Deduplication: SISL (Stream-Informed Segment Layout) Data

More information

Hardware/Software Guidelines

Hardware/Software Guidelines There are many things to consider when preparing for a TRAVERSE v11 installation. The number of users, application modules and transactional volume are only a few. Reliable performance of the system is

More information

ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory

ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory Biplob Debnath Sudipta Sengupta Jin Li Microsoft Research, Redmond, WA, USA University of Minnesota, Twin Cities, USA Abstract Storage

More information

VM-Centric Snapshot Deduplication for Cloud Data Backup

VM-Centric Snapshot Deduplication for Cloud Data Backup -Centric Snapshot Deduplication for Cloud Data Backup Wei Zhang, Daniel Agun, Tao Yang, Rich Wolski, Hong Tang University of California at Santa Barbara Pure Storage Inc. Alibaba Inc. Email: wei@purestorage.com,

More information

PARALLELS CLOUD STORAGE

PARALLELS CLOUD STORAGE PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...

More information

Read Performance Enhancement In Data Deduplication For Secondary Storage

Read Performance Enhancement In Data Deduplication For Secondary Storage Read Performance Enhancement In Data Deduplication For Secondary Storage A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Pradeep Ganesan IN PARTIAL FULFILLMENT

More information

Reducing Replication Bandwidth for Distributed Document Databases

Reducing Replication Bandwidth for Distributed Document Databases Reducing Replication Bandwidth for Distributed Document Databases Lianghong Xu 1, Andy Pavlo 1, Sudipta Sengupta 2 Jin Li 2, Greg Ganger 1 Carnegie Mellon University 1, Microsoft Research 2 #1 You can

More information

A DESIGN OF METADATA SERVER CLUSTER IN LARGE DISTRIBUTED OBJECT-BASED STORAGE

A DESIGN OF METADATA SERVER CLUSTER IN LARGE DISTRIBUTED OBJECT-BASED STORAGE A DESIGN OF METADATA SERVER CLUSTER IN LARGE DISTRIBUTED OBJECT-BASED STORAGE Jie Yan, Yao-Long Zhu, Hui Xiong, Renuga Kanagavelu, Feng Zhou, So LihWeon Data Storage Institute, DSI building, 5 Engineering

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

CDStore: Toward Reliable, Secure, and Cost- Efficient Cloud Storage via Convergent Dispersal

CDStore: Toward Reliable, Secure, and Cost- Efficient Cloud Storage via Convergent Dispersal CDStore: Toward Reliable, Secure, and Cost- Efficient Cloud Storage via Convergent Dispersal Mingqiang Li, Chuan Qin, and Patrick P. C. Lee, The Chinese University of Hong Kong https://www.usenix.org/conference/atc15/technical-session/presentation/li-mingqiang

More information

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January 2012. Permabit Technology Corporation

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January 2012. Permabit Technology Corporation WHITE PAPER Permabit Albireo Data Optimization Software Benefits of Albireo for Virtual Servers January 2012 Permabit Technology Corporation Ten Canal Park Cambridge, MA 02141 USA Phone: 617.252.9600 FAX:

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

Primary Data Deduplication Large Scale Study and System Design

Primary Data Deduplication Large Scale Study and System Design Primary Data Deduplication Large Scale Study and System Design Ahmed El-Shimi Ran Kalach Ankit Kumar Adi Oltean Jin Li Sudipta Sengupta Microsoft Corporation, Redmond, WA, USA Abstract We present a large

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Comprehensive study of data de-duplication

Comprehensive study of data de-duplication International Conference on Cloud, ig Data and Trust 2013, Nov 13-15, RGPV Comprehensive study of data de-duplication Deepak Mishra School of Information Technology, RGPV hopal, India Dr. Sanjeev Sharma

More information

Windows Server Performance Monitoring

Windows Server Performance Monitoring Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

More information

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based

More information

Fragmentation in in-line. deduplication backup systems

Fragmentation in in-line. deduplication backup systems Fragmentation in in-line 5/6/2013 deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin Barczynski, Wojciech Kilian, Cezary Dubnicki.

More information

Implementation and Evaluation of a Popularity-Based Reconstruction Optimization Algorithm in Availability-Oriented Disk Arrays

Implementation and Evaluation of a Popularity-Based Reconstruction Optimization Algorithm in Availability-Oriented Disk Arrays Implementation and Evaluation of a Popularity-Based Reconstruction Optimization Algorithm in Availability-Oriented Disk Arrays Lei Tian ltian@hust.edu.cn Hong Jiang jiang@cse.unl.edu Dan Feng dfeng@hust.edu.cn

More information

Understanding Data Locality in VMware Virtual SAN

Understanding Data Locality in VMware Virtual SAN Understanding Data Locality in VMware Virtual SAN July 2014 Edition T E C H N I C A L M A R K E T I N G D O C U M E N T A T I O N Table of Contents Introduction... 2 Virtual SAN Design Goals... 3 Data

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Building a High Performance Deduplication System Fanglu Guo and Petros Efstathopoulos

Building a High Performance Deduplication System Fanglu Guo and Petros Efstathopoulos Building a High Performance Deduplication System Fanglu Guo and Petros Efstathopoulos Symantec Research Labs Symantec FY 2013 (4/1/2012 to 3/31/2013) Revenue: $ 6.9 billion Segment Revenue Example Business

More information

A SCALABLE DEDUPLICATION AND GARBAGE COLLECTION ENGINE FOR INCREMENTAL BACKUP

A SCALABLE DEDUPLICATION AND GARBAGE COLLECTION ENGINE FOR INCREMENTAL BACKUP A SCALABLE DEDUPLICATION AND GARBAGE COLLECTION ENGINE FOR INCREMENTAL BACKUP Dilip N Simha (Stony Brook University, NY & ITRI, Taiwan) Maohua Lu (IBM Almaden Research Labs, CA) Tzi-cker Chiueh (Stony

More information

2009 Oracle Corporation 1

2009 Oracle Corporation 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

Best Practices for Optimizing Your Linux VPS and Cloud Server Infrastructure

Best Practices for Optimizing Your Linux VPS and Cloud Server Infrastructure Best Practices for Optimizing Your Linux VPS and Cloud Server Infrastructure Q1 2012 Maximizing Revenue per Server with Parallels Containers for Linux www.parallels.com Table of Contents Overview... 3

More information