System & Network Engineering
Large Installation Administration

Comparing open source deduplication performance for virtual machines

Authors: bhoekstra@os3.nl, dwagenaar@os3.nl
Supervisor: Jaap van Ginkel
University of Amsterdam
Abstract

The research presented in this paper shows in which situations the most efficient use of storage resources can be achieved using the open source deduplication solutions LessFS and ZFS, while maintaining an acceptable level of disk performance. The study specifically targets the disk performance of virtual guests whose disks are stored on the deduplicated file systems. The results show that optimal performance is achieved without any data deduplication solution, especially when virtual guests need to perform heavy write actions. However, when large amounts of storage need to be saved, only a small performance penalty is paid when ZFS is used as the deduplication solution. Of the solutions tested, ZFS is the best choice for open source deduplication, as it performs best in most of the test cases while still achieving a high level of storage consolidation. Finally, the results show that both deduplication solutions are effective at storage consolidation, with LessFS being the most efficient at this task.
Contents

1 Introduction .............................. 2
    1.1 Data deduplication .................. 2
2 Research .................................. 5
    2.1 Research questions .................. 5
    2.2 Literature review ................... 5
    2.3 Open source solutions ............... 6
    2.4 Performance measurement tools ....... 7
    2.5 Disk performance determining factors  8
3 Experiments ............................... 9
    3.1 Lab setup ........................... 9
    3.2 Storage measurements ................ 10
4 Results ................................... 15
    4.1 Sequential write .................... 15
    4.2 Sequential read ..................... 15
    4.3 Random rewrite ...................... 16
    4.4 Random seeks ........................ 16
    4.5 Storage consolidation ............... 17
5 Conclusion ................................ 20
6 Future work ............................... 21
A Server hardware ........................... 22
    A.1 Storage server ...................... 22
    A.2 Host server ......................... 22
    A.3 Virtual guest server ................ 23
1 Introduction

In an ever growing and competitive hosting market, keeping costs down is one of the more important factors. It is therefore important to make the most efficient use of resources while maintaining the same level of performance. With increasing bandwidth speeds, larger data files and growing storage requirements, a solution for efficient and fast data storage is needed. To this end, data deduplication [1] has gained enormous popularity as an efficient way of storing data.

1.1 Data deduplication

Storage systems are often dedicated to storing large amounts of data, which they store on their file systems in the form of blocks of a certain size. For instance, take a system administrator who has configured nightly backup jobs for all the desktop machines in his organization. These backups are stored on the central backup storage system. Most likely, each backup will contain many similar data files, each of which requires the same amount of storage space every time it is stored on the storage system containing all the backups of the organization.

An example of efficient resource utilization is virtualization [2]. Many bare-metal systems that provide only a single service, e.g. a DNS or mail server, often do not fully utilize the available resources. With virtualization, one bare-metal server can run multiple virtual operating systems. From a user point of view, each OS appears isolated and standalone. Effectively, one physical system can host multiple virtual systems, increasing the utilization of the bare-metal server's resources. However, each virtualized system still requires storage for its operating system files, which is often placed on a central storage repository accessible over the network. In scenarios where the same OS is virtualized more than once, the same files are likely to be stored more than once as well. According to Keren Jin and Ethan L. Miller [3], making use of data deduplication for storing virtual disks helps considerably in consolidating storage. However, practical data deduplication implementations and virtual guest performance have not been researched.

Data deduplication is a method to eliminate duplicate copies of the same data blocks, which can be used to reduce the required amount of space in storage systems such as those described in the examples above. There are different types of data deduplication, which we discuss further in this paper. When data deduplication techniques are used on a storage system, the data stored on it is checked for duplicate blocks. When identical data blocks are found, the system creates a pointer to the data block that is already present on the system. This approach only requires the storage space for one copy of the data, together with some overhead to store the pointers.
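The pointer idea can be mimicked at the file level with standard tools: when two files hash identically, one can be replaced by a hard link to the other, so the data blocks exist only once on disk. This is a simplified sketch with illustrative paths; real deduplication solutions work on blocks and maintain reference counts internally.

```shell
# Simplified file-level illustration of the pointer idea (paths are
# illustrative): replace one of two identical files by a hard link,
# so the underlying data blocks are stored only once.
rm -rf /tmp/ptr-demo && mkdir -p /tmp/ptr-demo && cd /tmp/ptr-demo
echo "identical payload" > a.txt
cp a.txt b.txt

# If the contents hash the same, point b.txt at a.txt's data.
if [ "$(sha1sum < a.txt)" = "$(sha1sum < b.txt)" ]; then
    ln -f a.txt b.txt
fi
stat -c %h a.txt    # link count is now 2: one copy of the data, two names
```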
1.1.1 Types of deduplication

Data deduplication differs from regular compression algorithms in the sense that it works on the block level of a storage device instead of on the files themselves. As there are many more data blocks than data files, it is much more likely that blocks are identical than that whole files are. This potentially allows more duplicated data to be eliminated. A downside is that the process itself is CPU intensive, as new data needs to be matched against previously stored data.

There are different forms of deduplication that can be used in various scenarios. Which form to use depends on the storage and network setup and requirements. We can distinguish between the location, the time and the method of deduplication.

Location
The location where deduplication is performed is an important factor to take into account. For instance, when only a limited amount of bandwidth is available, a backup will take more time. The choice of where deduplication takes place therefore depends on the amount of bandwidth available and on how the storage infrastructure is designed. Deduplication can be applied at the source, where the data originates, or at the target, where it is stored. In the backup example above, applying source deduplication can shorten the backup time, as there is less data to transfer to the backup system. With target deduplication, more bandwidth is required, and the target storage system initially requires more disk space than with source deduplication.

Time
A storage system with a high system load is undesirable at times when the system is actively used. The time at which deduplication is performed can influence the performance of a storage system and can be noticed by its direct or indirect users. Depending on the function of the storage system, it may be wise to let the deduplication algorithms run when the system is not actively used, for instance at night.

Method
The method of deduplication defines what type of deduplication technique is performed on the data to reduce the amount of space it consumes.

1.1.2 Deduplication methods

There are different deduplication methods that determine how the actual data is stored and reduced in size. The most common methods are described below.

In-line deduplication is a form of source deduplication where the deduplication mechanism examines the data as it is received (e.g. via the network). If a duplicate block of data is detected, a new pointer to the existing block is created. Besides the pointer, no data is generally written to the storage system. A downside is that this slows down the system when a large amount of data is being received [4].

Out-of-line deduplication is, as the name suggests, the opposite of in-line deduplication: deduplication is performed at the target data storage. This method is also known as post-process deduplication. The data is first written to disk and deduplicated at a later time, for example to avoid a loss in performance during business hours. With this form of deduplication, more storage space is required to store the initial data before it is deduplicated [4].

File-based deduplication, commonly known as Content Addressable Storage (CAS) [5], is a deduplication method that works on the file level instead of the block level.

The methods described above each have their own pros and cons. Which one to implement depends on the requirements and utilization of a storage system. Deduplication impacts the write performance of a system, so this is an important aspect to take into account when choosing the most suitable method.
2 Research

Our research focuses on the use of different open source data deduplication solutions as storage platforms for virtual disks and the amount of storage that can be consolidated. Furthermore, we want to gain insight into the impact on the performance of virtual guests whose virtual disks are stored on a deduplicated storage platform.

2.1 Research questions

Based on the description above, we defined the following research question.

What is the impact on virtual guest disk performance when using open source data deduplication solutions?

The following subquestions will help to answer the main research question.

- What is the amount of storage saved when a deduplication mechanism is applied?
- How do increasing amounts of storage consolidation influence the virtual disk performance?

2.2 Literature review

Prior to the start of this study, we looked into existing related research on the subject. Keren Jin and Ethan L. Miller have studied the effectiveness of data deduplication of virtual disks in their paper The Effectiveness of Deduplication on Virtual Machine Disk Images [3]. They found that using data deduplication on virtual disks can save about 80% of storage. However, their methods are purely conceptual and proof of concept: they treat each virtual disk as a byte stream and separate the stream into chunks of a specific size. For each chunk they calculate a SHA1 hash, and chunks with the same SHA1 hash are considered duplicates of each other. Whether the same percentage of storage saving can be achieved in practice is not clear.

A more practical approach to the effectiveness of data deduplication is taken by Dutch T. Meyer and William J. Bolosky in A Study of Practical Deduplication [6]. Their results show that data deduplication is very effective within a large commercial company. However, they do not look at the use of data deduplication in virtualized environments and only consider Windows-based desktop machines. Their research shows the effectiveness of data deduplication on a large set of different data across different systems, which is relevant for our research.
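The chunk-and-hash approach of Jin and Miller [3] can be mimicked with standard tools. The following toy sketch (chunk size and file names are our choice, not theirs) splits an image into fixed 4 KB chunks, hashes each with SHA1, and counts chunks that share a hash as duplicates:

```shell
# Toy version of the chunk-and-hash method from [3]: fixed-size
# chunks, SHA1 per chunk, chunks with equal hashes counted once.
rm -rf /tmp/dedup-demo && mkdir -p /tmp/dedup-demo && cd /tmp/dedup-demo

# Build a toy "disk image": one 4 KB block repeated 8 times,
# followed by two distinct random blocks.
head -c 4096 /dev/zero > block.a
for i in 1 2 3 4 5 6 7 8; do cat block.a; done > disk.img
head -c 8192 /dev/urandom >> disk.img

# Chunk the image and count unique versus total chunks.
split -b 4096 -d disk.img chunk.
total=$(ls chunk.* | wc -l)
unique=$(sha1sum chunk.* | awk '{print $1}' | sort -u | wc -l)
echo "unique=$unique total=$total"   # → unique=3 total=10
```

On real virtual disk images, the ratio of unique to total chunks gives the kind of savings estimate the paper reports.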
2.3 Open source solutions

Companies such as NetApp [7] and EMC [8] provide storage solutions which are often very costly to implement in a business. Such solutions are not affordable for every company, as they are proprietary and also require trained engineers or a support contract. Although the latter two can also be desirable when using an open source solution, an open source solution can be a more viable option. In this study, we specifically look into open source implementations of deduplication mechanisms. Below, a few of the available open source deduplication solutions are discussed.

SDFS is an open source deduplication solution by OpenDedup. It is capable of performing in-line deduplication [9] on the files stored in its file system. It is a cross-platform solution with support for multiple, distributed storage nodes. OpenDedup provides support for in-line and out-of-line (batch) deduplication. They claim to be an open source enterprise deduplication platform that is able to perform deduplication at line speeds of 1 Gigabyte per second or faster. The official homepage contains well documented guides on how to set up an OpenDedup file system on a Linux system. OpenDedup could be an interesting choice for this research, as it provides an easy setup and promises great performance.

LessFS provides in-line data deduplication using FUSE [10]. LessFS is a userspace deduplication solution with support for in-line deduplication, compression and encryption. The official LessFS website focuses on usability by providing easy to follow tutorials on how to set up a simple LessFS-based file system on a Linux system using common tools. The suggested usability of LessFS makes it a very viable candidate for our research.

s3ql is a source-based deduplication solution that deduplicates data before sending it to Amazon S3 storage buckets in the cloud [11]. A similar approach is taken by s3fs [12], but it does not provide local deduplication. Both solutions sound very interesting, especially from a cost-saving point of view. However, cost-saving is not the prime focus of this research, so we did not investigate them further.

ZFS is a well-known file system with built-in data deduplication. ZFS is a robust and reliable file system originally introduced with the OpenSolaris operating system in 2005. It supports large storage volumes of up to 16 Exabytes, file and folder snapshots, and continuous file integrity checking [13] to prevent data corruption. More recently, native support for ZFS file systems was included in the FreeBSD operating system [14]. We think ZFS is also interesting to include in our research, as it is a popular file system with built-in support for deduplication.
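To give an impression of how ZFS deduplication is enabled in practice, the following sketch creates a pool with deduplication and compression turned on. The pool name and device are hypothetical; the property names are standard ZFS.

```shell
# Hypothetical sketch: a ZFS pool with in-line deduplication and
# LZJB compression (pool name "tank" and disk "da1" are assumptions).
zpool create tank da1
zfs set dedup=on tank
zfs set compression=lzjb tank

# Inspect the settings and the achieved deduplication ratio.
zfs get dedup,compression tank
zpool get dedupratio tank
```

Note that both properties only affect data written after they are set, and that ZFS deduplication keeps its block table in memory, so RAM sizing matters on larger pools.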
2.4 Performance measurement tools

To perform the actual measurements, we researched existing tools that can aid us in the measurements.

iozone is a file system benchmark tool [15]. It can simulate various workloads (or IO operations) on file systems to measure their real-life performance [16].

iostat is a tool similar to iozone. It is mostly used on local storage or NFS shares to monitor and measure IO performance [17].

hdparm is a utility to view and change hard drive parameters to gain optimal drive performance. It can also be used to perform disk throughput tests to measure disk performance [18].

bonnie++ is essentially a rewrite, in the C++ programming language, of a similar tool called bonnie. It has many features to perform the disk performance measurements we require.

fio is a popular tool to measure storage performance [19]. It offers a comprehensive feature set, such as 13 different IO engines, multiple IO priorities (for newer Linux kernels) and multi-threading.

iperf can measure the maximum performance and bandwidth of TCP and UDP based connections [20]. A client-server model is used to perform measurements on the network performance. For a specified amount of time, a traffic stream is generated between an iperf client and server, and the network throughput is reported upon completion. Although this tool can't be used to measure IOPS, it is still useful to measure raw network throughput. A similar tool is Netperf [21].
2.5 Disk performance determining factors

How a disk performs in a storage system depends on certain factors. The hardware used in the system is obviously the most important factor. While hardware has its limitations, software can improve the performance of the system. The most important factors that can influence disk performance are the following.

- CPU (clock speed, L3 cache size)
- Storage device (SSD, HDD, PCI)
- Storage configuration (RAID level)
- Caching mechanisms (RAM)
- File system (block size, journaling)
- IO scheduler (type of algorithm and configuration)
- Kernel (version, optimizations)
3 Experiments

We have created a test plan to be able to get consistent test results. The defined test procedures were applied to the different lab setups using different configuration parameters. This method of consistent testing results in comparable measurements.

3.1 Lab setup

To perform measurements, we set up a lab environment. This test environment consists of a total of six Dell servers which will be used during the experiments. The specific models and hardware specifications are described in Appendix A. Out of the six available machines, five will be used as the hosts which will run multiple virtual guest machines. These virtual guests are installed with a default x64 installation of Debian Squeeze. On top of this, the virtualization software Xen [22] will be used, which enables us to run multiple operating systems simultaneously. We chose to use Xen because it is easy to set up, widely used, has a large community and is an enterprise-ready virtualization solution. Xen is installed from the Ubuntu repository using the xen-hypervisor-4.1-amd64 package. The sixth machine will be used as a shared storage server on which we will install the deduplication software. This server will share its storage over the network with the five host machines.

3.1.1 Initial network test

First, we want to test the performance of the network to confirm its reliability, stability and speed. This is important so we can be sure that the network is not a limiting factor in our results. Accessing a shared storage resource over a network results in a slight performance overhead. Our initial network test is therefore done with the benchmark tool iperf, which we described in Section 2.4. The iperf test was performed without any tweaks to the TCP/IP stack. The test performed is shown below.

iperf command.

hosta:~# iperf -s
hostb:~# iperf -c hosta

An average network speed of 939 Mbit/s is measured using the shown method to set up a server and client connection. The test generates traffic from memory, so disk performance is not measured or taken into account during this test. The results show a near line-speed network bandwidth, which indicates a stable network.

3.2 Storage measurements

3.2.1 Storage setup

As we want to perform measurements using a shared storage system, we have to set up a server that provides these services. We chose the iSCSI protocol to share the file system on the shared storage server with the different virtual host machines. For every virtual guest in the lab setup, we create a separate sparse file containing the operating system. Normally, when storing files on disk, the size of a file can be requested by issuing a specific system call to the operating system. Sparse files differ from normal files in the sense that empty parts of the file are not actually stored on disk, but are represented using metadata, thus saving valuable storage space on the file system [23].

Each sparse file belonging to a separate virtual guest is represented as an iSCSI volume (LUN), which is mounted on the host where the virtual guest runs. Each of the five hosts mounts the shared iSCSI LUNs to provide the hosted virtual guests with disk space. Each virtual guest has a separate physical hard disk of 5 Gigabyte configured in its Xen configuration file, which represents the actual iSCSI LUN mounted on the host. This setup enables the virtual guest to boot directly from the iSCSI LUN located on the shared storage server. The setup is visualized in Figure 1.

The first step is to perform measurements on the lab setup without any deduplication solution running on the shared storage server. The storage server is installed with the EXT4 file system. We chose EXT4 as the base file system for the storage server as it is a modern file system with great performance, as benchmarked in [24].
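The space saving of sparse allocation is easy to observe with standard tools. A minimal sketch (the file name is illustrative, the size matches our 5 Gigabyte guest disks):

```shell
# Create a 5 GB sparse file, like the per-guest disk images: seeking
# past the end without writing any data allocates no blocks on disk.
dd if=/dev/zero of=/tmp/guest01.img bs=1 count=0 seek=5G 2>/dev/null

ls -l /tmp/guest01.img   # apparent size: 5368709120 bytes
du -k /tmp/guest01.img   # allocated size: (close to) 0 KB
```

The apparent size reported by ls comes from the file's metadata, while du counts only the blocks actually allocated, which is why the two differ for sparse files.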
By performing measurements on the EXT4 file system, we are able to gain insight into the basic performance of the lab setup. This creates a performance baseline, which in turn enables us to compare it against the deduplication solutions. To compare the baseline performance of the lab setup with the deduplicated setup, we installed two of the open source deduplication solutions discussed in Section 2.3.
Figure 1: Lab setup (DomU guests on Dom0 hosts, connected over iSCSI to the file system and disks of the storage backend).

Initially we selected OpenDedup's SDFS and the ZFS file system. However, we were unsuccessful in maintaining stability when performing measurements on the SDFS solution under high loads. As time was a limiting factor for this project, we chose to use LessFS and, in a separate installation of FreeBSD 9, ZFS instead of SDFS. LessFS and ZFS appear to be two of the most popular open source deduplication solutions that perform in-line deduplication. LessFS runs in user space, while ZFS runs in kernel space (a native implementation) in the FreeBSD 9 installation. Currently, ZFS on FreeBSD is the only native file system with deduplication capabilities. While both solutions fundamentally differ in the way they are run, we believe it is still very interesting to compare them. The drawback of running an application in user space is that more context switching is required to switch between user and kernel space. Some might argue that comparing a user space solution to a kernel space solution is unfair. While this may be true in essence, we believe the comparison is still valid, as real-world practice shows that, mostly due to management constraints, using FreeBSD in a strict Linux environment is often not a possibility. This results in the need for a different solution, which currently means a user space implementation is the only option.
3.2.2 Lab measurements

The measurements are conducted on the lab setup while simultaneously running 10, 20 and 30 virtual guests respectively. By using increasing numbers of virtual guests, we are able to compare the differences in load across the measurements. In turn, the results can be used to draw conclusions on which solution performs best and on what to take into account when deploying deduplication in a similar way. To be able to calculate average values, each measurement is performed a total of three times using the bonnie++ tool, which we briefly discussed in Section 2.4. We use bonnie++ for benchmarking because it is a widely used tool for measuring disk performance that is able to run all of the benchmarks we require for this research. We used version 1.96 of bonnie++.

To efficiently perform measurements on the lab setup, we had to automate the deployment and bootstrapping of the virtual guests on the host machines over iSCSI. Scripts were created to automate this. The automation consists of two parts. The first script acts as a central control script on the storage server, which in turn contacts a second script on the hosts using SSH public keys for password-less login. This second script dynamically mounts the iSCSI LUNs required for the given number of virtual guests we want to boot on the lab setup. To start or stop a virtual environment consisting of 30 virtual guests running on our lab setup, we would issue the following commands on the control server:

Control commands to boot and stop the virtual lab environment.

control_server:~# ./lab-control.sh start 30
control_server:~# ./lab-control.sh stop 30

The 30 virtual guests are equally distributed over the five host servers. The control script contacts all five hosts to dynamically mount the iSCSI LUNs located on the storage server and boots the Xen virtual guests on each respective host.
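To give an impression of the control logic, a minimal sketch of the central script is shown below. Host names and the per-host script path are assumptions, not our actual deployment; the SSH invocation is commented out so the distribution logic can be run stand-alone.

```shell
#!/bin/sh
# lab-control.sh -- hypothetical sketch of the central control script.
# Host names and the per-host script path are assumptions. The
# defaults allow the sketch to run without arguments.
ACTION=${1:-start}        # start | stop
TOTAL=${2:-30}            # total number of virtual guests
HOSTS="host1 host2 host3 host4 host5"
PER_HOST=$((TOTAL / 5))   # guests are distributed evenly over the hosts

for h in $HOSTS; do
    echo "$h: $ACTION $PER_HOST guests"
    # The real script contacts each host over password-less SSH to
    # mount the iSCSI LUNs and boot or stop the Xen guests, e.g.:
    # ssh root@"$h" /root/guest-control.sh "$ACTION" "$PER_HOST"
done
```

Invoked as `./lab-control.sh start 30`, the sketch would report six guests per host, matching the even distribution described above.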
The scripts are available on request.

3.2.3 Measuring the performance

There are multiple tools available for Linux distributions that are suitable for performance measurements [25]. In Section 2.4, we looked into several of the available tools that can test hard drive and system performance [26].
Out of these tools, we found bonnie++ to be the most suitable for our experiments. Bonnie++ performs four types of tests, which can be divided into three categories: sequential output, sequential input and random seeks. Some of these tests require data to be present on disk before they can be performed. For each of these tests, bonnie++ first writes a file with random data to disk.

Sequential output
The sequential output category consists of two tests. The first test writes blocks of data sequentially to disk. The second test in this category is the rewrite test. This test seeks, 8000 times in parallel, to a part of the generated file and reads it into memory. In addition, in 10% of the cases, the read data is changed and written back to disk. Both of these tests are aimed at measuring how fast files can be written to disk.

Sequential input
In the sequential input category, the test performed is the sequential reading of data from disk into memory. Effectively, this test measures how fast files can be read from disk into memory so they can be used by the user or arbitrary applications.

Random seeks
Finally, the random seeks category contains a test for randomly reading data from the disk in parallel. This test focuses on measuring how fast data can be read from different parts of the disk simultaneously [26].
We used the following bonnie++ command in combination with standard GNU tools to get the desired output.

Bonnie++ command.

$ /usr/sbin/bonnie++ -n 0 -u 0 -r `free -m | grep Mem: | awk '{print $2}'` \
      -s $(echo "scale=0; `free -m | grep Mem: | awk '{print $2}'`*2" | bc -l) \
      -f -b -d /tmp/ > /tmp/bonnie.output

The command does the following:

Command / Argument        Description
/usr/sbin/bonnie++        Run the bonnie++ tool.
-n 0                      Disable the file creation tests.
-u 0                      Set the UID to 0 (root).
-r `free -m ...`          Specify the amount of memory in Megabytes.
-s $(echo ...)            File size for the IO performance tests in MB: twice the memory size.
-f                        Skip the per-character IO tests.
-b                        Don't use write buffering; fsync() after every write.
-d                        The directory to use for the tests.

Table 1: Description of the bonnie++ arguments used.

When running bonnie++ in parallel with the parameters mentioned in Table 1, it writes small files, similar to a real-world scenario. We skip the per-character IO writes as we want to perform sequential writes to the storage system. By using files twice as large as the total amount of memory, in combination with disabled write buffering, we completely bypass the operating system's caching mechanisms.
4 RESULTS 15 4 Results 4.1 Sequential write Figure 2 shows the average performance of a virtual guest when 10, 20 or 30 virtual guests are writing to disk simultaneously. These results are as expected since writing to an EXT4 file system requires less actions to be taken before the data can be stored on disk. In the case where a deduplication solution is used, overhead in the form of deduplication and compression is introduced which makes the writing slower compared to EXT4. 14000 12000 10000 KByte/s 8000 6000 4000 2000 EXT4 ZFS LessFS 0 10VM 20VM 30VM Amount of VMs Figure 2: Average write performance. 4.2 Sequential read For the results of the read tests, we can take a look at figure 3. The results gathered are not as expected and show inconsistent behavior when these tests are performed with different amounts of virtual guests. However, interestingly both deduplication solutions gradually perform less in contrast to EXT4 when more virtual guests are being used. The data that needs to be read is stored in a compressed state on disk and therefore is read faster into memory since less data needs to be read. The decompression process that follows is needed to get the complete set of data. This process is solely a CPU intensive task. However, when 20 or 30 virtual guests are used, we can see a rapid decline in the performance of the deduplication solutions. This is because the CPU of the storage backend now becomes the bottleneck for these same decompression actions that need to be taken. We therefore believe that the reason for the better performance of the deduplication
4 RESULTS 16 solutions when tests are run for 10 virtual guests is because of the compression employed by both deduplication solutions. KByte/s 40000 35000 30000 25000 20000 15000 10000 5000 0 10VM 20VM 30VM Amount of VMs EXT4 ZFS LessFS Figure 3: Average read performance. 4.3 Random rewrite The results for the rewrite tests are shown in figure 4. In this graph, you can immediately see that LessFS is outperformed by ZFS and EXT4. Furthermore, ZFS either performs almost as good as EXT4 or even better. The reason ZFS actually performs better when using 20 or 30 virtual guests can be contributed to the use of dynamic block sizes of up to 128 Kilobytes [13]. When this test is conducted with 10 virtual guests, the size of the blocks used in both EXT4 and ZFS have no real impact as enough disk resources are available to accommodate all the disk access requests by the virtual guests. However, for 20 and 30 virtual guests, this is no longer the case. But because ZFS uses a dynamic block size, smaller blocks can actually be read, changed and written faster compared to the fixed size blocks of EXT4 and the even larger blocks of LessFS. 4.4 Random seeks As mentioned in section 3.2.3, the seek test actually tries to seek a part of the test file in parallel. It does this 8000 times. And in 10% of the cases, the read data is changed and written back. The results for this test can be seen in figure 5. The results are as expected because reading random data from disk, and sometimes writing data back to disk is directly affected by the number of actions needed to complete these tasks. We believe that mainly the
4 RESULTS 17 KByte/s 8000 7000 6000 5000 4000 3000 2000 1000 0 10VM 20VM 30VM Amount of VMs EXT4 ZFS LessFS Figure 4: Average rewrite performance. decompression action is the bottleneck, as trying to decompress something 8000 times in parallel is a very CPU intensive task. Also interesting to see is that LessFS outperforms ZFS structurally for the first time, albeit not by much.this is most likely due to the fact that the compression algorithm used by LessFS is actually a little faster in decompressing data than the compression algorithm used by ZFS. LessFS uses the QuickLZ algorithm to apply additional compression on top of the deduplication. QuickLZ developers claim to be the world s fastest compression library and that they can reach over 358 Megabytes per second in decompression throughput [27]. The ZFS pool we configured uses the LZJB compression algorithm, which we believe is a slower decompression algorithm when compared to QLZ. We base this belief on [28] in which LZJB is compared against LZO, the latter being significantly slower than QuickLZ. It is shown that decompression rates of LZJB are relatively poor, especially for smaller blocks of data. 4.5 Storage consolidation Figures 6, 7 and 8 show the reported usage and the actual file system usage. As discussed in Section 3.2.1, sparse files have been used and get interpreted correctly by the operating system in case of the EXT4 file system. Therefore, we can see the same size for both the reported and actual size in the case of the EXT4 file system. The most interesting thing about these results is the amount of reported usage by LessFS. This is due to the fact that LessFS does not take into account the sparse properties of the files and therefore reports the full size for each file. ZFS
Figure 5: Average seek performance (seeks/s for 10, 20 and 30 VMs; EXT4, ZFS and LessFS).

on the other hand does take the sparse properties into account and, in addition, also accounts for the compression used. This results in its reported size being just below the size reported by the EXT4 file system.

Figure 6: Used space for 10 virtual guests (reported vs. real usage in MBytes; EXT4, LessFS and ZFS).

For a closer look at the amounts of space reported, used and saved, we can turn to Table 2. The table shows that LessFS actually consolidates more storage than ZFS and EXT4.
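The sparse-file behaviour discussed above is easy to reproduce. A file's apparent size (st_size) and the space actually allocated for it on disk (st_blocks, counted in 512-byte units) can differ by orders of magnitude on a sparse-aware file system such as EXT4. The sketch below is illustrative only (it is not part of our test suite, and the 100 MB size is an arbitrary choice):

```python
import os
import tempfile

def sparse_sizes(apparent_mb=100):
    """Create a sparse file and return (apparent bytes, allocated bytes)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        # Seek far into the file and write a single byte; everything
        # before it is a "hole" that consumes no disk blocks.
        f.seek(apparent_mb * 1024 * 1024 - 1)
        f.write(b"\0")
        path = f.name
    st = os.stat(path)
    os.unlink(path)
    # st_size is the apparent size; st_blocks counts 512-byte units
    # actually allocated on disk.
    return st.st_size, st.st_blocks * 512

apparent, allocated = sparse_sizes()
print(apparent, allocated)  # allocated is far smaller on sparse-aware filesystems
```

A tool that copies or accounts for such a file by its apparent size, as LessFS does, will report the full 100 MB even though only a single block is actually stored.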
Figure 7: Used space for 20 virtual guests (reported vs. real usage in MBytes; EXT4, LessFS and ZFS).

Figure 8: Used space for 30 virtual guests (reported vs. real usage in MBytes; EXT4, LessFS and ZFS).

            10 VM                   20 VM                    30 VM
            Real       Reported     Real        Reported     Real       Reported
EXT4        4594.604   4594.604     21188.584   21188.584    36024.68   36024.68
LessFS      251.052    48828.127    252.26      97656.252    252.812    146484.377
ZFS         323        3295.62      434         6827.363328  525        9989.535

Table 2: Detailed storage consumption (in MB).
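A convenient single figure for comparing the file systems is the consolidation ratio: reported size divided by real on-disk size. The short sketch below recomputes it from the Table 2 values (in MB); note that the ratio of 1.0 for EXT4 simply reflects that it neither deduplicates nor compresses, and that the ZFS ratios understate its savings somewhat because its reported size already accounts for sparseness:

```python
# (Real, Reported) usage in MB per file system and VM count, from Table 2.
table = {
    "EXT4":   {10: (4594.604, 4594.604), 20: (21188.584, 21188.584), 30: (36024.68, 36024.68)},
    "LessFS": {10: (251.052, 48828.127), 20: (252.26, 97656.252),    30: (252.812, 146484.377)},
    "ZFS":    {10: (323.0, 3295.62),     20: (434.0, 6827.363328),   30: (525.0, 9989.535)},
}

def consolidation_ratio(fs, vms):
    """Reported size divided by the space actually consumed on disk."""
    real, reported = table[fs][vms]
    return reported / real

for fs in table:
    line = ", ".join("%d VMs: %.1fx" % (v, consolidation_ratio(fs, v)) for v in (10, 20, 30))
    print("%-6s %s" % (fs, line))
```

Run over the table, this shows LessFS storing 30 nearly identical guests in roughly the space of one (a ratio of almost 580x), while ZFS reaches about 19x.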
5 Conclusion

In general, the best performance is obtained by not using any data deduplication solution, especially when virtual guests need to perform heavy write actions. In some specific cases, however, it can actually prove more fruitful to use the deduplication features of ZFS. In cases where large numbers of virtual guests need to read data, change it and write it back, ZFS can deliver better performance than, for instance, EXT4.

In most cases ZFS outperforms LessFS, as can be seen in our results. We believe this is mostly because ZFS is a native file system implemented in kernel space, whereas LessFS is implemented in user space and requires the FUSE libraries. This extra layer adds overhead, which in turn results in slower performance.

The results clearly show that using a deduplication solution is very effective in saving storage: whether 10 or 30 copies of the same data are saved has hardly any effect on the actual amount of storage used. If we correlate the test results with the amount of storage space saved, the performance of all the file systems drops considerably when more virtual guests run concurrently, which is expected. However, our results show no tangible evidence that the amount of storage consolidation itself influences the rate at which the performance of the deduplication solutions decreases.

We argue that in an environment with virtual guests serving mixed functions, where you want to save as much space as possible without losing too much performance, ZFS is the best choice: it saves more space than EXT4 and performs better than LessFS. Finally, in situations where disk performance matters less than storage consolidation, LessFS is the better choice, as it is the most efficient deduplication solution for saving space.
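For reference, a ZFS dataset with the features used in this study (deduplication plus LZJB compression) can be configured along the following lines. The pool name, dataset name and device path are illustrative, not the ones from our setup:

```shell
# Create a pool on a spare disk (device name is illustrative).
zpool create tank /dev/ada1

# Create a dataset for the virtual machine images and enable
# deduplication and LZJB compression on it.
zfs create tank/vmstore
zfs set dedup=on tank/vmstore
zfs set compression=lzjb tank/vmstore

# Verify the properties took effect.
zfs get dedup,compression tank/vmstore
```

Note that dedup=on keeps a deduplication table in memory, so RAM sizing matters for larger pools.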
6 Future work

Due to time limitations, not every aspect could be researched extensively. This section discusses some ideas for further research in this area.

Different hypervisors
In this study we chose Xen as the hypervisor to perform the tests with. An interesting extension of this research would be a comparison with different hypervisors, such as VMware or KVM.

Replicated performance
In cases like failover clusters, where multiple virtual machines need to stay in sync and thus write the same data to disk, different results could be achieved. It might be interesting to see the measurements when identical blocks of data are written to and read from the disks of the different virtual machines.

Different deduplication methods
It might be interesting to see the performance of other deduplication solutions, such as Nexenta [29]. We were not able to continue the measurements using OpenDedup because of recurring crashes; perhaps this particular solution will be more stable in the future.

Multiple datasets
Using a number of different operating systems for the virtual machines will most likely result in different space usage across the file systems. Disk performance might also be affected by the different datasets that need to be kept in the deduplication databases.

High-end hardware
The hardware used in our setup is far from modern. It might be interesting to study the impact of modern, high-end hardware on the results.

ZFS in user mode
A comparison between LessFS, which runs in user mode, and ZFS running in user mode. Although impractical, this would make the two better comparable.
A Server hardware

In the lab setups described in Section 3.1, the following machines were used. Their hardware specifications are listed below.

A.1 Storage server

One server acting as a storage backend, using data deduplication on the shared storage volume.

Brand: Dell
Model: PowerEdge R210
CPU: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz
Memory: 8GB
Hard disk: Western Digital WD5002ABYS-18B1B0 500GB (x2)
NIC: Embedded Broadcom NetXtreme II BCM5716 Gigabit Ethernet (x2)
Operating system: Ubuntu 11.10 x64 & FreeBSD 9.0-RELEASE
Kernel: Linux 3.0.0-15-generic x86_64 & FreeBSD 9.0-RELEASE #0

A.2 Host server

A total of five servers acting as Xen hosts for the DomU virtual machines.

Brand: Dell
Model: PowerEdge 850
CPU: Intel(R) Pentium(R) D CPU @ 3.00GHz
Memory: 2GB
Hard disk: Seagate ST3808110AS 80GB (x2)
NIC: Embedded Broadcom NetXtreme BCM5721 Gigabit Ethernet (x2)
Operating system: Debian 6 (Squeeze) x64
Kernel: Linux 2.6.32-5-xen-amd64 x86_64
A.3 Virtual guest server

A maximum of 30 virtual guests were booted to perform measurements on.

Brand: Xen
Model: Version 4.1
vCPU: Intel(R) Pentium(R) D CPU @ 3.00GHz
Memory: 128MB
Hard disk: 5GB iSCSI mount
NIC: Xen, routed from Dom0
Operating system: Debian 6 (Squeeze) x64
Kernel: Linux 2.6.32-5-xen-amd64 x86_64
References

[1] Nagapramod Mandagere, Pin Zhou, Mark A. Smith, and Sandeep Uttamchandani. Demystifying data deduplication. In Proceedings of the ACM/IFIP/USENIX Middleware '08 Conference Companion, pages 12-17, New York, NY, USA, 2008. ACM.
[2] G. Goth. Virtualization: Old technology offers huge new potential. IEEE Distributed Systems Online, 8(2):3, February 2007.
[3] Keren Jin and Ethan L. Miller. The effectiveness of deduplication on virtual machine disk images. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, pages 7:1-7:12, New York, NY, USA, 2009. ACM.
[4] Data Domain: In-line deduplication. http://www.datadomain.com/pdf/datadomain-techbrief-inline-deduplication.pdf. [Online; consulted on March 9, 2012].
[5] SearchStorage: How data deduplication works. http://searchstorage.techtarget.com/feature/How-data-deduplication-works. [Online; consulted on March 9, 2012].
[6] Dutch T. Meyer and William J. Bolosky. A study of practical deduplication. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST '11), 2011.
[7] NetApp official homepage. http://www.netapp.com. [Online; consulted on March 13, 2012].
[8] EMC: Leading cloud computing, big data, and trusted IT solutions. http://www.emc.com. [Online; consulted on March 13, 2012].
[9] OpenDedup official website. http://www.opendedup.org. [Online; consulted on February 21, 2012].
[10] Mark Ruijter. Lessfs: Open source data de-duplication. http://www.lessfs.com. [Online; consulted on February 21, 2012].
[11] S3QL: A full-featured file system for online data storage. http://code.google.com/p/s3ql/. [Online; consulted on March 6, 2012].
[12] s3fs: FUSE-based file system backed by Amazon S3. http://code.google.com/p/s3fs/. [Online; consulted on February 21, 2012].
[13] Wikipedia: ZFS. http://en.wikipedia.org/wiki/zfs. [Online; consulted on March 7, 2012].
[14] FreeBSD wiki: ZFS. http://wiki.freebsd.org/zfs. [Online; consulted on March 20, 2012].
[15] IOzone documentation. http://www.iozone.org/docs/iozone_msword_98.pdf. [Online; consulted on March 6, 2012].
[16] nixCraft: How to measure Linux filesystem I/O performance with iozone. http://www.cyberciti.biz/tips/linux-filesystem-benchmarking-with-iozone.html. [Online; consulted on March 6, 2012].
[17] iostat(1) Linux man page. http://linux.die.net/man/1/iostat. [Online; consulted on March 6, 2012].
[18] hdparm: Get/set ATA/SATA drive parameters under Linux. http://sourceforge.net/projects/hdparm/. [Online; consulted on March 6, 2012].
[19] fio website. http://freshmeat.net/projects/fio/. [Online; consulted on March 6, 2012].
[20] Iperf website. http://sourceforge.net/projects/iperf/. [Online; consulted on March 6, 2012].
[21] Netperf official homepage. http://www.netperf.org/netperf/. [Online; consulted on March 21, 2012].
[22] Xen official homepage. http://www.xen.org. [Online; consulted on March 20, 2012].
[23] Wikipedia: Sparse files. http://en.wikipedia.org/wiki/Sparse_files. [Online; consulted on March 30, 2012].
[24] Kernel.org: ext4 benchmark (OLS 2007). http://kernel.org/doc/ols/2007/ols2007v2-pages-21-34.pdf. [Online; consulted on March 22, 2012].
[25] Linux Benchmark Suite homepage. http://lbs.sourceforge.net/. [Online; consulted on March 13, 2012].
[26] Bonnie++ official homepage. http://www.coker.com.au/bonnie++/. [Online; consulted on March 13, 2012].
[27] QuickLZ official website. http://www.quicklz.com. [Online; consulted on March 22, 2012].
[28] LZO vs. LZJB in ZFS. http://denisy.dyndns.org/lzo_vs_lzjb/. [Online; consulted on March 23, 2012].
[29] Nexenta: Enterprise class storage for everyone. http://www.nexenta.com/corp. [Online; consulted on March 9, 2012].