Decentralized Deduplication in SAN Cluster File Systems
1 Decentralized Deduplication in SAN Cluster File Systems Austin T. Clements (MIT CSAIL), Irfan Ahmad, Murali Vilayannur, Jinyuan Li (VMware, Inc.). Transcript of a talk given on Wednesday, June 17, at the 2009 USENIX Annual Technical Conference.
2 Storage Area Networks Storage area networks form the backbone of most virtualized data centers today.
3 Storage Area Networks These provide multiple hosts with high-performance, direct, block-level SCSI access to shared disks simultaneously, and this enables us to run high-performance shared file systems such as VMware's VMFS.
4 Storage Area Networks This, in turn, lets us run lots and lots of virtual machines off our shared file system.
5 Storage Area Networks This, unfortunately, causes a problem, because now that we've got lots and lots of virtual machines on our shared file system, we have lots and lots of duplicate data. This is particularly bad if you're running something like, say, a virtual desktop infrastructure, where you're serving maybe hundreds of very similar desktop virtual machines out of the same file system. In fact, we measured just how bad this was by taking 113 virtual machines from a live, corporate virtual desktop infrastructure cluster and deduplicating them.
6 Storage Area Networks 1.3 TB We started with 1.3 terabytes of disk data...
7 Storage Area Networks 237 GB 1.3 TB We wound up with 237 gigabytes.
8 Storage Area Networks 237 GB 5X reduction 1.3 TB That's a 5X reduction in space. Clearly there's a problem here.
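The reduction factor quoted above is easy to sanity-check (a quick sketch; decimal units, 1 TB = 1000 GB, are an assumption):

```python
# Sanity check of the "5X reduction" from 1.3 TB down to 237 GB.
original_gb = 1.3 * 1000       # 1.3 TB expressed in GB (decimal units assumed)
deduplicated_gb = 237
ratio = original_gb / deduplicated_gb
print(round(ratio, 1))         # ~5.5, roughly the "5X" on the slide
```

With binary units (1 TB = 1024 GB) the ratio comes out slightly higher, about 5.6, so "5X" is a conservative round number either way.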
9 Deduplication So, what are we going to do about it? Well, we're going to deduplicate the file system. We're going to apply the classic trick of using content hashes to identify duplicate data.
10 Deduplication As you can imagine, we can break up our virtual machines into 4 kilobyte blocks and...
11 Deduplication whenever a virtual machine does a write to one of its blocks, then the host that's running that virtual machine...
12 Deduplication is going to compute the hash of that block's new content...
13 Deduplication and it's going to go out to some shared index that's stored on the shared disk. It's going to look up that hash in the index and...
14 Deduplication if it finds that it's already in the index, then it's simply going to reuse the block that's already on the disk.
15 Deduplication Otherwise, it's going to have to go and allocate space on the disk, write the block to it...
16 Deduplication and update the index to reflect that block's presence. Now, this works great if you've got a centralized host with exclusive access to the file system, but it turns out it doesn't work at all in our decentralized setting, and there are a couple of reasons for this.
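The in-band write path just described can be sketched as a simple loop (a minimal sketch only; the in-memory `index` dict and `storage` list are hypothetical stand-ins for the on-disk index and allocator):

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KB blocks, as in the talk

def write_block(index, storage, data):
    """Classic in-band dedup: hash the block, reuse an existing copy if
    the hash is already indexed, otherwise allocate space and index it."""
    h = hashlib.sha1(data).hexdigest()
    if h in index:
        return index[h]            # reuse the block already on disk
    addr = len(storage)            # "allocate" the next free slot
    storage.append(data)           # write the block
    index[h] = addr                # update the index
    return addr

index, storage = {}, []
a = write_block(index, storage, b"x" * BLOCK_SIZE)
b = write_block(index, storage, b"x" * BLOCK_SIZE)  # duplicate content
assert a == b and len(storage) == 1                 # only one copy stored
```

Note that the index lookup and the possible allocation both sit directly on the write path, which is exactly the problem the next slides dwell on.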
17 The Problem Slow Lots of IO in the write path. Can't cache the index. One is that it's just plain slow. We've now introduced all sorts of new IO into the write path. It used to be that when a virtual machine did a write, it went pretty directly to the underlying disk; there wasn't much translation involved. Now, we've introduced all of this extra random-access IO to this complex on-disk index structure that needs to happen on every single write. Most systems mitigate this problem by aggressively caching the index on the host, but we can't even do that, because in our decentralized setting you run into all sorts of terrible cache coherence problems when you try. And, actually...
18 The Problem Slow Lots of IO in the write path. Can't cache the index. Very Slow Writes require allocation and thus coordination. No hope of disk locality. It's worse than that. It's very slow. Every single write that actually makes it to disk requires allocation, and with our shared disk we have to coordinate allocation to prevent hosts from allocating the same space at the same time, and coordination is expensive. Furthermore, when you're writing all over your disk, you lose all hope of maintaining disk locality.
19 The Problem Slow Lots of IO in the write path. Can't cache the index. Very Slow Writes require allocation and thus coordination. No hope of disk locality. Hopelessly Slow Multi-host lock contention on shared index. And, actually, it's just hopelessly slow. We've got this shared index structure, which means we need to coordinate all of our access to it, which means we're going to need locks, which means we're going to have lots and lots of horrible lock contention between all these hosts that are trying to update the index. In our particular environment, that's really bad, because locking is really expensive, which means that all of your writes become really expensive. Clearly this is just not going to work.
20 Three-Stage Deduplication Decentralized Deduplication To address these problems, we developed a decentralized deduplication system for SAN cluster file systems.
21 Three-Stage Deduplication DeDe Or just DeDe for short.
22 Three-Stage Deduplication DeDe DeDe breaks up the deduplication process into three separate stages: write monitoring, local deduplication, and cross-host deduplication. Breaking the problem up like this gives us all sorts of really nice properties.
23 Three-Stage Deduplication DeDe Out-of-band deduplication of live, primary storage Process duplicates efficiently, in large batches Minimize contention on the index Resilient to stale index information Unique blocks remain mutable and sequential = No overhead for blocks that don't benefit from deduplication For one, we can do this on live, primary storage. Remember, this is not an archival system; this is a live file system. We can actually do the deduplication process out-of-band, which means that we can basically do our writes now and worry about deduplication later. Because we're doing all of our deduplication out of band anyway, we can just batch it up into really large, periodic batches, which means that these last two stages can happen just every once in a while, take care of all of the possible duplicates that have accumulated in the meantime, and process them very efficiently in one large batch. Because we're doing this only periodically, we minimize contention on the index, which minimizes the communication between the hosts. Furthermore, it lets us be resilient to stale index information. Our index can actually be wrong. It can be out of date, because we're only updating it every once in a while, which means that we can deal with things like hosts failing when they haven't reported writes to the file system, and all sorts of other good things.
24 Three-Stage Deduplication DeDe Out-of-band deduplication of live, primary storage Process duplicates efficiently, in large batches Minimize contention on the index Resilient to stale index information Unique blocks remain mutable and sequential = No overhead for blocks that don't benefit from deduplication We also get the extra neat property that our unique blocks (that is, blocks that have just a single reference to them) can remain mutable (we can update them in place when we want to), which means that they can also remain sequential on disk. This has the nice implication that precisely the blocks that are not benefiting from deduplication anyway (the unique blocks) have no additional IO overhead, because they remain mutable and sequential.
25 Three-Stage Deduplication DeDe These three stages form a pipeline.
26 Three-Stage Deduplication DeDe Every single host in the cluster runs all three of these stages independently.
27 Three-Stage Deduplication DeDe The stages communicate with each other both within the hosts and between the hosts via regular files stored in the shared file system itself.
28 Three-Stage Deduplication DeDe In the first stage, write monitoring, every host independently monitors all of its writes to the file system. It computes the hashes of the blocks as they go out to disk and buffers them up in memory and...
29 Three-Stage Deduplication DeDe periodically flushes them out to on-disk, per-file write logs. Note that there's no coordination going on here; there's no contention between the hosts either, because they're all monitoring their own writes and they're all modifying separate files that they each have locked. Now, when a given host's write logs grow to be large enough (that is, it has made enough modifications to the file system since it last deduplicated its changes), then...
30 Three-Stage Deduplication DeDe it fires up the local deduplication stage.
31 Three-Stage Deduplication DeDe This first goes out and locks the shared, on-disk index of all of the blocks in the file system.
32 Three-Stage Deduplication DeDe Then it picks up all the information from the write logs (that is, all the hashes of the recently modified blocks) and...
33 Three-Stage Deduplication DeDe it updates the index to reflect all these recent modifications to the file system. In the process, if it finds that some recently modified block now has a hash that's already in the index, that means that it's found a duplicate...
34 Three-Stage Deduplication DeDe and it can eliminate it. Likewise, when this host is done performing local deduplication, it gives up its lock on the index and...
35 Three-Stage Deduplication DeDe some other host can come along and lock the index, consume its own write logs, and remove any duplicates in files that it has exclusive access to.
36 Three-Stage Deduplication DeDe Of course, there's a catch to this. Local deduplication can only modify the metadata of files that the host running the stage has access to; that is, files it holds an exclusive lock on.
37 Three-Stage Deduplication DeDe This means that, when we first find two duplicate blocks (this is the first time that we identify two blocks with the same content), they might be in files that are locked by two different hosts. This is a decentralized setting; we can't just go mucking around in the metadata of files that we don't have locks on, which means that we need some other approach.
38 Three-Stage Deduplication DeDe When this does happen, the local deduplication stage produces a merge request. These are stored, like the write logs, along with each individual file in the file system. A merge request basically says, "I've got a block, you've got a block; I think they're the same. You deal with it."
39 Three-Stage Deduplication DeDe And then, later on, each host will periodically run a cross-host deduplication stage, which picks up all the merge requests for the files that it has exclusive access to, verifies that the blocks that claim to be duplicates actually are duplicates, and deduplicates them.
40 Three-Stage Deduplication DeDe When we put all three of these stages together, we can deduplicate the entire shared file system with very little coordination between the hosts. In fact, the only time that they really communicate at all is with these merge requests at the very end, and even that is completely asynchronous.
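The cross-host stage just described can be sketched as follows (a hedged sketch only; the dict-based `file_blocks` and `arena` structures are hypothetical stand-ins for file metadata and the shared storage):

```python
def cross_host_dedup(locked_files, requests, file_blocks, arena):
    """Consume merge requests for files this host has exclusively locked,
    re-verify each claimed duplicate, and repoint it at the shared copy.
    Requests for other hosts' files are left for their owners."""
    remaining = []
    for fname, offset, arena_off in requests:
        if fname not in locked_files:
            remaining.append((fname, offset, arena_off))    # another host's file
        elif file_blocks[fname][offset] == arena[arena_off]:
            file_blocks[fname][offset] = ("shared", arena_off)  # repoint, COW
        # else: the block changed since the request was posted; it's stale,
        # so we simply drop it -- correctness never depends on the request.
    return remaining

arena = {0: b"dup"}
files = {"a.vmdk": {0: b"dup", 1: b"changed"}}
reqs = [("a.vmdk", 0, 0), ("a.vmdk", 1, 0), ("b.vmdk", 0, 0)]
left = cross_host_dedup({"a.vmdk"}, reqs, files, arena)
assert files["a.vmdk"][0] == ("shared", 0)   # verified and deduplicated
assert files["a.vmdk"][1] == b"changed"      # stale request dropped
assert left == [("b.vmdk", 0, 0)]            # deferred to that file's owner
```

The re-verification step is what makes it safe for merge requests (and the index entries behind them) to be stale.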
41 Implementation We actually built this system. It runs on top of VMware ESX Server.
42 Implementation We use VMFS as our cluster file system. We had to make a few minor modifications to it, which I'll describe in a moment.
43 Implementation On top of VMFS, we run a write monitoring kernel module.
44 Implementation We also have our shared on-disk index structure.
45 Implementation And, finally, we have a deduplication daemon that runs on top of the whole thing and coordinates all of these stages.
46 VMFS Compare-and-share VMFS already supports copy-on-write blocks, but it doesn't present the interface that we need for deduplication, so we added a new compare-and-share operation to it. The basic idea here is that we provide it with two files that we have exclusively locked (that we have open), we point to the two blocks in those files, and we tell the file system, "We think these are duplicates; go deal with it."
47 VMFS Compare-and-share The file system is then responsible for atomically comparing the content of these two blocks and, if they are in fact identical...
48 VMFS Compare-and-share updating the block pointers in these two files to point to one block, sharing it copy-on-write, and eliminating the block that we no longer need.
49 VMFS Compare-and-share DeDe finds duplicates. VMFS eliminates them. This operation means that it's the responsibility of DeDe to find potential duplicates across the entire shared file system and then hand them off to VMFS. It's VMFS's responsibility to actually eliminate the blocks. This divides up the process very nicely. Now we've got a way to eliminate blocks, but we need their hashes in order to find them.
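The semantics of compare-and-share can be sketched against a toy in-memory model (the `ToyFS` class and its field names are hypothetical; the real operation lives inside VMFS and is atomic with respect to other accesses):

```python
class ToyFS:
    """Minimal in-memory stand-in for VMFS block pointers."""
    def __init__(self):
        self.blocks = {}             # block address -> bytes
        self.ptrs = {}               # (file, offset) -> (addr, cow flag)
        self.next_addr = 0

    def write(self, file, off, data):
        addr, self.next_addr = self.next_addr, self.next_addr + 1
        self.blocks[addr] = data
        self.ptrs[(file, off)] = (addr, False)

    def compare_and_share(self, fa, oa, fb, ob):
        """Compare two blocks; if identical, point both files at one
        copy-on-write block and free the other. True on success."""
        addr_a, _ = self.ptrs[(fa, oa)]
        addr_b, _ = self.ptrs[(fb, ob)]
        if self.blocks[addr_a] != self.blocks[addr_b]:
            return False             # caller's hint was stale; do nothing
        self.ptrs[(fa, oa)] = (addr_a, True)   # mark the kept block COW
        self.ptrs[(fb, ob)] = (addr_a, True)
        del self.blocks[addr_b]      # reclaim the duplicate
        return True

fs = ToyFS()
fs.write("vm1.vmdk", 0, b"same")
fs.write("vm2.vmdk", 7, b"same")
assert fs.compare_and_share("vm1.vmdk", 0, "vm2.vmdk", 7)
assert len(fs.blocks) == 1           # only one physical copy survives
```

The "compare" half is essential: because DeDe's hints can be stale, the file system must re-check the contents before committing to the share.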
50 Write Monitor That's the goal of the write monitor.
51 Write Monitor A lightweight kernel module monitors writes, computes hashes The write monitor is just a very lightweight kernel module that sits in between the virtual machines and VMFS. It monitors every write to the file system and computes the hashes of these blocks as they're going out to disk.
52 Write Monitor A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk It buffers up these hashes in memory...
53 Write Monitor A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk and periodically flushes them out to disk, separating them into separate write logs for each file.
54 Write Monitor A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk Safe to buffer the log because index is resilient It's safe to buffer these up precisely because our index is resilient to stale information. It's actually all right if this host just crashes and loses all of its buffer, or if the host is so overloaded that it has to shed work and can't compute these hashes. That's okay. We'll miss deduplication opportunities, but the system will still be correct.
55 Write Monitor A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk Safe to buffer the log because index is resilient 150 MB of regular writes 1 MB sequential log write Because we're buffering, we also have the nice property that this involves very little I/O. We can actually absorb about 150 megabytes' worth of writes to regular files in the file system before we have to do even one megabyte of sequential writing to the logs, so this introduces very little I/O overhead. Now we've got a way to gather the hashes of the blocks as we modify them. We need a way to find blocks with identical hashes.
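The buffer-then-flush structure of the write monitor can be sketched like this (a sketch only; the real monitor is an in-kernel module, and the class and method names here are hypothetical):

```python
import hashlib
from collections import defaultdict

class WriteMonitor:
    """Hash each outgoing block in memory, then flush the hashes to
    per-file write logs in one sequential append. Losing the buffer on
    a crash only costs dedup opportunities, never correctness."""
    def __init__(self):
        self.buffer = defaultdict(list)   # file -> [(offset, hash), ...]

    def on_write(self, file, offset, data):
        # Hash the block as it goes out to disk; the data itself is
        # written by the normal I/O path, not by the monitor.
        self.buffer[file].append((offset, hashlib.sha1(data).hexdigest()))

    def flush(self, logs):
        # One sequential append per per-file log, then drop the buffer.
        for file, entries in self.buffer.items():
            logs.setdefault(file, []).extend(entries)
        self.buffer.clear()

wm = WriteMonitor()
logs = {}
wm.on_write("a.vmdk", 0, b"data")
wm.flush(logs)
assert logs["a.vmdk"][0][0] == 0 and not wm.buffer
```

The 150:1 ratio quoted above is plausible at this granularity: 150 MB of 4 KB writes is about 38,400 blocks, and at a few tens of bytes per log entry (offset plus hash) that comes to roughly 1 MB of sequential log.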
56 The Index Map from hashes to block locators, list sorted by hash That's what the index is for. This is just a map from content hashes to block locations, just like you'd expect. It's actually stored on the shared file system as a simple sequential list, sorted by hash. This sounds overly simple, but because we're always updating it in really large batches, and these updates are scattered all over the index, it turns out this is actually substantially more efficient than using a much more complicated random-access index structure.
57 The Index Unique Shared Map from hashes to block locators, list sorted by hash As I mentioned earlier, we actually distinguish between unique blocks and shared blocks so that we can keep unique blocks mutable, and this fact is reflected in the index. Each index entry indicates whether it's a unique block or a shared block, and we point to these two different types of blocks differently.
58 The Index Unique Shared Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable For unique blocks, we simply store the i-number of some existing file and the offset of the block within the file.
59 The Index Unique Shared Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable This is, of course, backed by some actual block stored somewhere on the disk, but it's entirely up to the file system to resolve this block offset into a block address on the disk, just like it would for any regular file system access.
60 The Index Unique Shared Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable Shared blocks are somewhat more complicated. We can't just use the block address of the block on disk, because that causes all sorts of really hairy problems. And we don't really want to point to all of the files that contain the block, because those could change at any time without us knowing, which means that we'd have to go hunting for the shared block whenever we needed it.
61 The Index Unique Shared Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable A virtual arena stores COW references to all shared blocks So, instead, we introduce a new file. We call it a virtual arena. This virtual arena is just a regular file, like any other file in the file system, except that...
62 The Index Unique Shared Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable A virtual arena stores COW references to all shared blocks it consists exclusively of copy-on-write references to blocks that already exist in other files. Thus, in some sense, it contains all of the shared data in the file system without actually consuming any space beyond the file system metadata that it needs.
63 The Index Unique Shared Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable A virtual arena stores COW references to all shared blocks Now, whenever we need to refer to a shared block, it's actually pretty simple: we just refer to it by its offset in the virtual arena. Again, the file system is responsible for resolving that offset into an actual block address. We never use the block addresses themselves. So, now we've got a way to gather the hashes of the blocks that we've modified, and we've got a way to locate identical blocks.
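The two locator kinds described above can be sketched as plain records (field names are hypothetical; the on-disk encoding is of course more compact):

```python
from dataclasses import dataclass

@dataclass
class UniqueLocator:
    """Unique block: addressed through some file that contains it.
    The block stays mutable, so this entry is only a hint and must
    be re-verified before it is used."""
    inumber: int      # i-number of a file known to contain the block
    offset: int       # block offset within that file

@dataclass
class SharedLocator:
    """Shared block: addressed through the virtual arena file, whose
    entries are COW references. The FS resolves the offset to a disk
    address; raw block addresses are never stored in the index."""
    arena_offset: int

u = UniqueLocator(inumber=42, offset=7)
s = SharedLocator(arena_offset=3)
assert u.offset == 7 and s.arena_offset == 3
```

Keeping both locators file-relative is the design choice that lets the file system move blocks around without invalidating the index.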
64 Indexing and Duplicate Elimination Now we need a way to actually combine this all together and perform deduplication.
65 Indexing and Duplicate Elimination Here we have some host that's running some virtual machine, and that means...
66 Indexing and Duplicate Elimination that it has an exclusive lock on that virtual machine's files. At some point it's going to decide that it has made enough modifications to the file system and it's time to go forth and deduplicate.
67 Indexing and Duplicate Elimination At that point it picks up the write logs for the files that it has open for this particular virtual machine, which contain the hashes of all recently modified blocks.
68 Indexing and Duplicate Elimination It sorts all those entries by hash. Now it's going to simply walk down the index structure, walk down this sorted list of updates, and merge the two structures together.
69 Indexing and Duplicate Elimination So we start walking down and we see that our first hash occurs right at the beginning. And, actually, it's not in the index yet, which means that it's a new, unique block. There's only a single reference to it.
70 Indexing and Duplicate Elimination So we make some space in the index for it...
71 Indexing and Duplicate Elimination slip it in...
72 Indexing and Duplicate Elimination and add a pointer to the block in our file. That's all we have to do: just that single record append to the index. We don't have to modify any metadata or anything. Furthermore, this block remains mutable; we can update it in place. However, that also means that our unique index entries are just hints. This block can be modified at any time, and the host doesn't have to update the index to reflect the fact that it has a new hash, which means that every time we use one of these entries, we have to verify that it's actually correct. This is how our index is resilient to stale information.
73 Indexing and Duplicate Elimination Now we can continue moving down the index. We see that our next hash fits in here, and it's already in the index. In fact, it's in the index as a unique block, which means that this is our first duplicate of this block.
74 Indexing and Duplicate Elimination Because it's a unique block in the index, it just points to some existing file. The problem is that that file might be locked by some other host, which means that, again, we can't go mucking around with its metadata.
75 Indexing and Duplicate Elimination However, we know that these two blocks are duplicates, which means that we have to do something about it. Now, we do know that we've got an exclusive lock on the file we're currently deduplicating, which means that we're free to muck around with its metadata.
76 Indexing and Duplicate Elimination We can merge its block into the virtual arena, sharing it copy-on-write.
77 Indexing and Duplicate Elimination Then we can update the index to point to the virtual arena. Now we can post a merge request, telling the other host, "When you get around to it, I think I've got a duplicate block for you. Check it out and deal with it yourself." At some point in the near future, presumably, that second host is going to check for any merge requests for files that it has exclusive locks on...
78 Indexing and Duplicate Elimination pick them up, and verify that the block is, in fact, still a duplicate (remember that it's unique, so it could change). If it is still a duplicate, it will rewrite the pointer to point to the shared block. We're still scanning this index, so let's go back to that.
79 Indexing and Duplicate Elimination We keep moving down the index and we see that our last hash fits in here. Again, it's already in the index, but this time as a shared block. This makes things a lot easier.
80 Indexing and Duplicate Elimination We've already got it in the virtual arena. We already know that it's copy-on-write.
81 Indexing and Duplicate Elimination We just have to take the current block that we've got and...
82 Indexing and Duplicate Elimination update the metadata of that file to point to the existing shared block.
83 Indexing and Duplicate Elimination And we're done. And look! I've just saved a bunch of space on my storage array.
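The three cases walked through above (new hash, hit on a unique entry, hit on a shared entry) fit a single sorted-merge pass. Here is a hedged sketch under assumed list-of-tuples structures; the real index is an on-disk sequential file and the "promotion" step also copies the block into the virtual arena:

```python
def merge_into_index(index, updates):
    """index: sorted list of (hash, (kind, locator)); updates: sorted
    (hash, locator) pairs from this host's write logs. A miss appends a
    new unique entry; a hit on a unique entry posts a cross-host merge
    request; a hit on a shared entry can be deduplicated locally now."""
    merged, requests, local_dedups = [], [], []
    i = 0
    for h, loc in updates:
        # copy over index entries that sort before this update
        while i < len(index) and index[i][0] < h:
            merged.append(index[i])
            i += 1
        if i < len(index) and index[i][0] == h:
            kind, existing = index[i][1]
            if kind == "unique":
                # first duplicate of a unique block: post a merge request
                # and promote the entry toward the shared (arena) form
                requests.append((h, existing, loc))
                merged.append((h, ("shared", existing)))
            else:
                # already shared: point our block at the existing copy
                local_dedups.append((h, loc))
                merged.append(index[i])
            i += 1
        else:
            # brand-new content: a single record append, no metadata changes
            merged.append((h, ("unique", loc)))
    merged.extend(index[i:])
    return merged, requests, local_dedups

index = [("aa", ("unique", ("f1", 0))), ("cc", ("shared", 5))]
updates = [("bb", ("f2", 1)), ("cc", ("f2", 2))]
merged, requests, local_dedups = merge_into_index(index, updates)
assert merged[1] == ("bb", ("unique", ("f2", 1)))   # new unique entry
assert local_dedups == [("cc", ("f2", 2))]          # shared hit: dedup now
assert requests == []
```

Because both inputs are sorted, the whole update batch is absorbed in one sequential sweep over the index, which is why the simple sorted list beats a fancier random-access structure here.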
84 Phew. Okay. So what does this all get us?
85 Evaluation There are a couple of important questions we have to ask.
86 Evaluation How much space does DeDe save? One is: How much space does this actually save us?
87 Evaluation How much space does DeDe save? How much overhead does DeDe introduce? Also, how much overhead does all this complexity that we just introduced add?
88 Evaluation How much space does DeDe save? How much overhead does DeDe introduce? How fast can DeDe deduplicate? And, finally, how fast can we actually deduplicate? We've got to be able to keep up with the file system, or we're going to start missing deduplication opportunities.
89 Space Savings: VDI Cluster Corporate Virtual Desktop Infrastructure cluster Desktop XP VMs 6-12 months of active use Originally cloned from small number of base images... In order to measure the space savings, we took, as I said, a corporate virtual desktop infrastructure. This consisted of lots of XP desktop virtual machines that had been in active use for six to twelve months, which means that they actually have quite a bit of data in them. However, they were all originally cloned from a small set of base images, which means that there's still a lot of similarity between them. We believe this is actually pretty representative of how people usually run clusters like this.
90 Space Savings: VDI Cluster 1.9 TB 113 VMs... In total, we gathered up 113 virtual machines, which came to 1.9 terabytes of disk data. We treat zero blocks specially because there are just so darn many of them...
91 Space Savings: VDI Cluster 1.3 TB 113 VMs... which means that we're actually interested in deduplicating just 1.3 terabytes of data.
92 Space Savings: VDI Cluster 1.3 TB I've already kind of spoiled the punch line, but for anybody who wasn't paying attention, we started with this 1.3 terabytes of data...
93 Space Savings: VDI Cluster 237 GB 1.3 TB and reduced it to 237 gigabytes of data. But now we can look at what's actually left.
94 Space Savings: VDI Cluster 237 GB 1.3 TB 173 GB unique 61 GB shared If we zoom in on this, we see that only 61 gigabytes of this is actually shared data. That means that all of the space we just eliminated was just repetitions of those 61 gigabytes of data. In fact, some of these blocks are repeated as many as 200,000 times. That's 800 megabytes wasted on a single 4 kilobyte block. It's ridiculous. And then we have 173 gigabytes of unique data left over. Now, you're probably wondering: we've added this index and this virtual arena and all of this extra stuff. What's the overhead? How much space did we add? Did we actually save any space in the end? Actually, the overhead is already on the slide; it was just a little too small for me to label.
95 Space Savings: VDI Cluster 237 GB 1.3 TB 2.7 GB 173 GB unique 61 GB shared It's this 2.7 gigabyte blip over here.
96 Space Savings: VDI Cluster 237 GB 1.3 TB 2.7 GB 173 GB unique 61 GB shared 1.3 GB index file 194 MB v. arena 1.1 GB FS metadata If we zoom in on that, we see that 1.3 gigs of it is the index file. That's all it takes to index all of the blocks in the file system. 194 megs of it is the virtual arena, because it needs file system metadata to point to all of these blocks. And 1.1 gigs of it is additional, new file system metadata, because we had to break all of our blocks into 4 kilobyte blocks. You can see the paper for more details on that. So, in terms of space savings, this is actually really great, and the space overhead is very small. What about the runtime overhead?
97 Runtime Effects Write monitoring Disk array caching We very intentionally designed DeDe to do as little as possible in-line, precisely to reduce its runtime overhead. But we still have to worry about the write monitor, which happens in-line. We also have other runtime effects from things like disk array caching. We did a number of benchmarks to measure the runtime overheads.
98 Runtime Effects Write monitoring Disk array caching We did all of these from within a virtual machine (so this is all VM I/O, just like everything else), running on top of a host that was...
99 Runtime Effects Write monitoring Disk array caching EMC CLARiiON CX3-40 attached to a gigantic EMC CLARiiON CX3-40 storage area network.
100 Runtime Overhead: Write Monitoring Worst-case benchmark: 100% sequential write IO, No computation To measure the overhead of the write monitor, we did an absolute worst-case benchmark. This is running inside a virtual machine, performing 100% purely sequential write I/O to the virtual disk with as little computation as we could possibly muster. Absolute worst-case for the write monitor.
101 Runtime Overhead: Write Monitoring Worst-case benchmark: 100% sequential write IO, No computation Baseline Write Monitor CPU 33% 220% And, actually, it's not so great. We do consume a lot of CPU computing hashes. The good news is that this is a totally unrealistic workload. Any realistic workload will also be performing reads, which don't have any overhead, and it'll be performing computation, which means that the gap will actually be much smaller. So this is really the worst case.
102 Runtime Overhead: Write Monitoring Worst-case benchmark: 100% sequential write IO, No computation Baseline Write Monitor CPU 33% 220% Bandwidth (MB/s) Latency (ms) The other good news is that we actually didn't have any impact on I/O. Whether or not we're running the write monitor, we're still getting 233 megs per second of bandwidth to the disk and 8.6 milliseconds of latency.
103 Runtime Overhead: Write Monitoring Worst-case benchmark: 100% sequential write IO, No computation Baseline Write Monitor CPU 33% 220% Bandwidth (MB/s) Latency (ms) (See paper for database application benchmark) And, again, the overheads that we do have here are much less for a realistic application. You can see the paper for our analysis of a database application benchmark that we ran with the write monitor.
104 Runtime Gains: Disk Array Caching Reduced storage footprint Better caching Less IO On the flip side, we can actually improve runtime performance through better disk array caching. The basic idea here is that we've just substantially reduced our storage footprint, which means that we get much better cache locality, which means that we have to go out to the disk platters much less often.
105 Runtime Gains: Disk Array Caching Reduced storage footprint Better caching Less IO Average boot time (secs) VDI "boot storm" # VMs booting concurrently To measure this, we ran a VDI boot storm benchmark, which is where you run between one and twenty identical virtual machines, boot them all up at the same time, and see how long they take to boot on average.
106 Runtime Gains: Disk Array Caching Reduced storage footprint Better caching Less IO Average boot time (secs) VDI "boot storm" Full copies # VMs booting concurrently When we do this with fully copied virtual machines, you see more or less what you'd expect. This is a very I/O-bound workload, which means that when you're running 20 virtual machines, you're doing 20 times as much I/O, which means it takes substantially longer to boot them all up than one virtual machine.
107 Runtime Gains: Disk Array Caching Reduced storage footprint Better caching Less IO Average boot time (secs) VDI "boot storm" Full copies Deduplicated # VMs booting concurrently If we run deduplicated virtual machines, we see that the line is virtually flat. That's because, all the way across the board, whether you're booting one VM or twenty VMs, you're doing basically one VM's worth of I/O. So, for particular workloads, such as VDI boot storms, this is actually really great. We can really improve performance.
108 Out-of-band Deduplication Rate Index scan COW sharing The other thing that we have to worry about is the out-of-band deduplication rate. We have to be able to keep up with the file system if we expect to deduplicate it. Remember that, because we're resilient to stale index entries, we can shed load if we really need to, but it's best if we don't. There are two steps in the deduplication process whose overhead we really have to worry about: one is scanning the index, and the other is, when we actually find a duplicate block, sharing it copy-on-write.
109 Out-of-band Deduplication Rate Index scan COW sharing 6.6 GB/sec Virtually no cost for unique blocks For the index scan process, we can process 6.6 gigabytes worth of blocks per second, because this is just a purely sequential scan through the index. It's very fast. The nice thing is that this is the only thing we have to do for unique blocks. This means that, if you have new or modified unique blocks, we can deal with 6.6 gigabytes worth of those blocks per second. Virtually no overhead for unique blocks.
110 Out-of-band Deduplication Rate Index scan 6.6 GB/sec COW sharing 2.6 MB/sec Virtually no cost for unique blocks COW sharing, on the other hand, is not quite so fast. That we can do at only 2.6 megabytes per second, so this is clearly our bottleneck.
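The fast/slow split between these two steps can be sketched in a few lines. This is a hypothetical Python sketch, not DeDe's actual code: a miss in the index is just an insert (the unique-block fast path), while a hit has to be queued for the slow copy-on-write sharing path:

```python
def scan_index(hash_log, index):
    """Out-of-band step: merge a host's logged (addr, digest) pairs
    into the shared index. A unique block costs only an index insert;
    a duplicate is queued for the much slower COW-sharing step."""
    cow_queue = []
    for addr, digest in hash_log:
        if digest in index:
            # Duplicate found: remap addr to share the indexed block.
            cow_queue.append((addr, index[digest]))
        else:
            # First sighting: just record it. This is the fast path.
            index[digest] = addr
    return cow_queue

index = {}
dupes = scan_index([(1, b"h1"), (2, b"h2"), (3, b"h1")], index)
# Only block 3, whose digest matches block 1, needs COW sharing.
```

Because unique blocks never enter the COW queue, the 2.6 MB/sec bottleneck below applies only to newly duplicated data, not to all writes.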
111 Out-of-band Deduplication Rate Index scan 6.6 GB/sec COW sharing 2.6 MB/sec (It's a prototype!) Virtually no cost for unique blocks But I'll remind you that this is just a prototype. We can substantially improve this number, and we already know a number of ways we could do that. Think of this as a lower bound.
112 Out-of-band Deduplication Rate Index scan 6.6 GB/sec COW sharing 2.6 MB/sec (It's a prototype!) Virtually no cost for unique blocks 9 GB of new shared blocks per hour (And provisioning can be special-cased) However, even if we can only share 2.6 megabytes per second, that still means we can deal with about 9 gigabytes of newly shared blocks per hour. In something like a virtual desktop infrastructure, which is really what we're targeting, that's probably sufficient, actually, because you don't get new shared blocks all that often. The most common source of lots and lots of duplicate blocks in a very short time is provisioning operations, and those we can special-case and do very quickly. So, to wrap all of this up: deduplication is very effective at reducing space. It has some overheads, although for some workloads it actually improves performance. And we believe the deduplication rate, even though this is just a prototype, is acceptable for realistic workloads.
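The back-of-the-envelope arithmetic behind the 9 GB/hour figure is easy to check:

```python
# 2.6 MB/s of COW sharing, sustained for an hour (using 1 GB = 1024 MB):
mb_per_second = 2.6
gb_per_hour = mb_per_second * 3600 / 1024
# ~9.1 GB of newly shared blocks per hour, matching the ~9 GB claim.
```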
113 Related Work Centralized archival Venti Data Domain Foundation Centralized primary storage NetApp ASIS Microsoft Single Instance Store Distributed Farsite SAN with Coordinator DDE There's quite a bit of related work out there. Deduplication's been around for a long time, although most of it is actually archival systems: secondary, backup systems. Other than the idea of using content hashes and indexes to identify duplicate blocks, live file systems are really a different beast. More recently, a lot of work has appeared dealing with live file systems; however, it all requires either a central host and/or exclusive access to the file system, whereas, on the other hand...
114 Conclusion Decentralized, out-of-band, live file system deduplication. DeDe is a completely decentralized system. It performs all deduplication out of band for performance, and it operates on a live, decentralized file system.
115 Conclusion Decentralized, out-of-band, live file system deduplication. Deduplication is effective. In conclusion, deduplication is really effective, especially in environments like this.
116 Conclusion Decentralized, out-of-band, live file system deduplication. Deduplication is effective. Deduplication is hard. Unfortunately, in environments like this, deduplication is also really hard.
117 Conclusion Decentralized, out-of-band, live file system deduplication. Deduplication is effective. Deduplication is hard. Three-stage deduplication has only modest performance overhead. However, using this three-stage approach where we split out write monitoring from local deduplication and from cross-host deduplication, we can actually perform deduplication in this difficult environment, with only modest performance overhead.
118 Conclusion Decentralized, out-of-band, live file system deduplication. Deduplication is effective. Deduplication is hard. Three-stage deduplication has only modest performance overhead. Thank you. Thank you, and I'll take any questions now.
More informationThe New Rules for Choosing Physical Appliances
content provided by sponsored by The New Rules for Choosing Physical Appliances The increased presence of virtualization in organizations hasn t lessened the need for physical backup appliances. Follow
More informationMicrosoft Hyper-V chose a Primary Server Virtualization Platform
Roger Shupert, Integration Specialist } Lake Michigan College has been using Microsoft Hyper-V as it s primary server virtualization platform since 2008, in this presentation we will discuss the following;
More informationFull and Para Virtualization
Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels
More informationOptimized Storage Solution for Enterprise Scale Hyper-V Deployments
Optimized Storage Solution for Enterprise Scale Hyper-V Deployments End-to-End Storage Solution Enabled by Sanbolic Melio FS and LaScala Software and EMC SAN Solutions Proof of Concept Published: March
More informationVIDEO SURVEILLANCE WITH SURVEILLUS VMS AND EMC ISILON STORAGE ARRAYS
VIDEO SURVEILLANCE WITH SURVEILLUS VMS AND EMC ISILON STORAGE ARRAYS Successfully configure all solution components Use VMS at the required bandwidth for NAS storage Meet the bandwidth demands of a 2,200
More information