An In-Depth Look at Deduplication Technologies

White Paper
Juan Orlandini, Datalink
Mike Spindler, Datalink
August 2008

Abstract: Deduplication is all the rage today, with a myriad of vendors offering technologies that provide deduplication capabilities. However, it is extremely confusing and time consuming for organizations to determine if the benefits of deduplication are compelling enough to consider implementing the technology. There are a number of factors to weigh. This white paper identifies what those factors are, provides an independent assessment of the types of deduplication, and identifies additional characteristics to look at when considering the deduplication solutions in the market.

Table of Contents

Dedupe overview
    Deduplication is on everyone's mind
    What is deduplication?
    How the process works
    Savings are substantial
Types of deduplication
    File-based
    Block-based
Options for deduplication
    Application
    Source
    Target
    Network
Inline versus post process
    Inline
    Post process
    Hybrid solutions
Additional factors to weigh
    CPU-bound versus disk-bound
    File aware (format aware)
    Replication
    Realistic compression ratios
Future considerations
    Integration with backup software ISVs
    Enterprise fit
Deciding what is right for you

Dedupe overview

Deduplication is on everyone's mind

Among today's data storage technologies, deduplication is generating a significant buzz and is definitely at the top of the "what's hot" list. The major storage vendors have all introduced deduplication products within the last one to two years. And while still a relatively new technology, it has garnered interest from enterprise organizations spanning all industries.

This hype comes with good reason. Deduplication provides compelling benefits and offers potential rewards rarely seen in today's IT environments, particularly as it relates to disk-based backup technologies. Disk-based backup continues to improve the speed, reliability, and availability of backups. Deduplication technology lowers the cost of overall storage and makes disk-based backup more economically feasible. Disk-based backup technologies, coupled with deduplication, are transforming how organizations back up and recover their data, and the role that tape plays in modern backup and archive architectures.

As with most emerging technologies, there are a plethora of alternatives and no clear answers about if, when, and where a solution should be implemented. Before an organization integrates any type of deduplication technology, it must first assess the different types of solutions and how each fits into its backup, recovery, and archive goals. Furthermore, the organization needs to ascertain the most effective path to integrate these technologies into its current operations. Unfortunately, the information necessary to make an educated decision is difficult to find. Each vendor markets its technology as the best approach. Even worse, some vendors have competing products in their own lineups, with each being touted as the best. This white paper demystifies the options and provides a clear, unbiased view of the deduplication market.

What is deduplication?

Simply put, deduplication identifies redundant information and stores it in a highly efficient format while maintaining the integrity of the original content. The data is stored only once, no matter how many copies are made. Vendors have chosen to call this technology many different things: dedupe, data reduction, single instance storage, global data single instance storage, capacity optimized storage, and even molecular sequence reduction. Although there are differences between each of these terms and associated technologies, at the core the concept is the same: find duplicated sets of data and store them only once.

The primary benefit of deduplication is that it greatly reduces storage capacity requirements. This drives several other advantages as well, including lower power consumption, lower cooling requirements, longer disk-based retention of data (and thus faster recoveries), and, with some vendors, simplified disaster recovery (DR) via optimized replication.

How the process works

Deduplication technology looks at data at either a block (sub-file) or file level. When a chunk of data comes across a dedupe engine, the data is broken into smaller blocks or segments. Each of these smaller blocks is given a unique identifier, created by one of several hashing algorithms or even a bit-by-bit comparison of the block. Common algorithms used for this are MD5 and SHA-1. Some vendors also have content-aware logic, which considers the source of the data (e.g., a NetBackup backup data stream) to determine block sizes and the boundaries of the resulting blocks.

As the dedupe engine processes data, it compares the data to the blocks already identified and stored in its database. If a block already exists in the database, the new redundant data is discarded and a reference to the existing data is inserted into the repository. If the block contains new, unique data, then the block is inserted into the data store (filesystem), and a reference to that block is added to the dedupe database.

During the backup operation, the backup application sees the file or the backup stream as it normally would. The size of the data is greatly reduced, though, since the blocks of data are replaced by reference pointers. Depending on where the data is being deduplicated, the backup application may or may not be aware that deduplication is occurring. With source-based deduplication, the application is intimately aware of the process. A target-based deduplication process, meanwhile, is generally transparent to the backup application. With this approach, the application still reads and transfers the same amount of data from clients to the deduplication device; however, the device appears as a vanilla VTL or disk share, and the reduction in capacity is essentially hidden from the backup application. During recovery operations, the same process runs in reverse: the backup application reads the data from the device without any concern for, or knowledge of, the deduplication device's operations.
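To make the mechanics concrete, here is a minimal, illustrative sketch of the hash-index-and-reference cycle described above. The fixed block size and in-memory dictionary are simplifying assumptions; production engines use variable block boundaries and persistent, disk-backed indexes.

```python
import hashlib
import random

BLOCK_SIZE = 4096   # fixed-size blocks for simplicity; real engines vary boundaries

class DedupeStore:
    """Toy dedupe engine: index blocks by hash, keep each unique block once."""
    def __init__(self):
        self.blocks = {}        # digest -> block data (the data store)
        self.logical = 0        # bytes the backup application wrote
        self.physical = 0       # bytes actually kept on disk

    def ingest(self, stream: bytes) -> list:
        """Return the stream as a list of references into the block store."""
        refs = []
        for i in range(0, len(stream), BLOCK_SIZE):
            block = stream[i:i + BLOCK_SIZE]
            digest = hashlib.sha1(block).hexdigest()   # SHA-1, as noted above
            if digest not in self.blocks:              # new, unique block
                self.blocks[digest] = block
                self.physical += len(block)
            self.logical += len(block)                 # a duplicate costs only a pointer
            refs.append(digest)
        return refs

    def restore(self, refs: list) -> bytes:
        """Recovery path: rebuild the original stream from the references."""
        return b"".join(self.blocks[r] for r in refs)

random.seed(42)
store = DedupeStore()
monday = bytes(random.randrange(256) for _ in range(50 * BLOCK_SIZE))
refs = store.ingest(monday)
store.ingest(monday)                      # the next full backup: all duplicates
assert store.restore(refs) == monday      # the application gets its data back intact
print(f"dedupe ratio {store.logical / store.physical:.1f}:1")   # 2.0:1 after two fulls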

Savings are substantial

Deduplication technology can provide significant savings. If an enterprise performs a full backup weekly (as well as incremental daily backups), the potential for redundant data in the backup is enormous. For instance, if only five percent of the data being backed up by an organization changes on a daily basis, then most of the data in the full backup is redundant from backup to backup. Compounding this, it's being saved over and over, week after week. In particular, database or mail backups, which generally are performed as full backups every day, typically change by only a small percentage.

Deduplication can also achieve significant savings for incremental backups. A good way to think about this is through a common daily workflow. Most people do not generate net new data each day. They tend to open a document or spreadsheet that they were working on the day before, change it, and then save it. The net changes are relatively small compared to the whole file. However, when the backup application identifies these files for an incremental backup, the entire file is moved. When the dedupe engine sees this file, it will recognize that most of the data has been seen before and will only store the net new data. It's not uncommon to see very high (more than 10x) deduplication rates on incremental backups.

Additionally, deduplication eliminates the data redundancy between servers, since the dedupe process examines blocks as opposed to a file or backup stream. For instance, the same system files likely exist on many Windows servers, desktops, and laptops. Without any type of deduplication process, those files would be backed up and stored multiple times, once for each backup client that contains a copy. With deduplication, only a single copy would be stored for all locations.

Types of deduplication

Deduplication technologies largely fall into either a file-based or block-based category. File-based deduplication compares the content of entire files. Block-level approaches take a more granular, sub-file level approach. Some vendors implement their sub-file level algorithms in what they term stream approaches.

File-based

Regardless of the specific vendor implementation, file-based deduplication is essentially designed to identify files that are exactly the same and store them only once. This approach can be very beneficial in environments where users or applications re-create or save the same file in multiple locations. For example, a common scenario is for a user to send out a spreadsheet or a presentation to multiple co-workers via email, with each co-worker saving the file to his or her local network share, workstation, or laptop. Sometimes, the users save the files with a different name without changing the content of the file. Backup applications typically have no way to know that the content of those files is identical and, therefore, each file is backed up individually.
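A toy sketch of the file-level approach (the whole-file hashing and in-memory index here are illustrative assumptions): identical content is detected no matter what the file is named, but even a one-byte edit makes the whole file "new."

```python
import hashlib

store = {}   # digest -> file content; each unique file kept once

def save(name: str, data: bytes) -> str:
    """File-based single instancing: hash the whole file; if the digest is
    already known, keep only metadata plus a reference to the first copy."""
    digest = hashlib.sha1(data).hexdigest()
    store.setdefault(digest, data)
    return digest                          # reference recorded for this filename

deck = b"Q3 sales deck " * 1000
save("alice/deck.ppt", deck)
save("bob/deck_copy.ppt", deck)            # renamed copy: same digest, stored once
save("carol/deck.ppt", deck + b"!")        # one-byte edit: stored again in full
print(len(store), "files physically stored for 3 logical copies")   # prints 2
```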

The costs for the backup infrastructure increase from the duplication of effort associated with these files. To alleviate this problem, many organizations have implemented archive solutions backed by deduplication devices, which can identify files that are the same and store only the initial copy. The system stores only the relevant metadata (filename, age, size, etc.) of the redundant files and creates a reference to the first copy. From the perspective of the user or application accessing the file, each file is its own independent copy. The storage subsystem handles the details transparently.

These technologies often achieve deduplication even when the applications are accessing different storage platforms. In the above example, the initial copy of the data resided on the first user's hard disk. It was then attached to an email message and sent to a mail server, which forwarded a copy to each user, who in turn stored copies on their systems. This process created multiple copies on different file systems as well as a copy on the email system. A file-based archive solution coupled with the right backup and archive software identifies that all of the copies are identical on both the file systems and the email system, and stores only a single copy.

The leading vendors in the file-based deduplication space are EMC with its Centera products, Hitachi Data Systems with its Content Archiving Platform (HCAP) solutions, and CommVault with its dedupe solution for backups and archive. Interestingly, until the advent of block or sub-file level deduplication, the file-based vendors did not hype this capability much. However, given the recent popularity of the technology and the buzz around deduplication, they have jumped on the bandwagon and are now much more actively promoting it.

Block-based

Block-based solutions take the concept of storing unique data to the next level. Rather than focusing on individual files, this method identifies common patterns of data regardless of where the data exists. An example of this would be if a user attached a PowerPoint presentation to an email sent to multiple users, with the users each making a few minor changes to the file and saving the presentation to their laptops. At this point, file-based solutions would consider each of these files unique, even though most of the data is identical in each file and only a few bytes of information have changed. Block-based approaches, on the other hand, identify the common data regardless of the slight changes. At a very high level, this process selects a chunk of data (block, segment, variable size segment, etc.) and computes a uniquely identifying value (aka a hash). The system then compares this value against a database to identify whether it has seen the data before. If it has not, the data is written to the storage, and the computed value for the data, along with a pointer (reference) to the data, is inserted into the dedupe database. The next time that unique value is seen, the system knows that it has a copy of the data and only needs to store another reference, not the data itself.
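The catch with naive fixed-size blocking, and the boundary-shift problem the next section describes, can be seen in a few lines. The second function below sketches the kind of remedy commonly published as content-defined chunking with a rolling hash. Both are illustrative assumptions, not any particular vendor's algorithm:

```python
import hashlib
import random

def fixed_chunks(data: bytes, size: int = 4096):
    """Naive approach: cut every `size` bytes, counted from the start."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_chunks(data: bytes, window: int = 48, mask: int = 0x0FFF):
    """Content-defined chunking: cut wherever a cheap hash of recent bytes
    hits a fixed pattern, so boundaries travel with the content itself."""
    out, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF       # toy rolling-style hash
        if i - start >= window and (h & mask) == 0:
            out.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out

def digests(chunks):
    return {hashlib.sha1(c).hexdigest() for c in chunks}

random.seed(0)
doc = bytes(random.randrange(256) for _ in range(64_000))
edited = doc[:100] + b"XYZ" + doc[100:]       # 3 bytes inserted near the front

fixed = digests(fixed_chunks(doc)) & digests(fixed_chunks(edited))
cdc = digests(content_chunks(doc)) & digests(content_chunks(edited))
print(f"fixed-size blocks shared after the edit: {len(fixed)}")   # 0: all shifted
print(f"content-defined chunks shared: {len(cdc)} of {len(content_chunks(edited))}")
```

Only the chunk containing the insertion changes under content-defined chunking; the boundaries downstream of the edit land on the same content and resynchronize.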

Generally, this approach provides the ability to identify the same data regardless of its location in the file. However, when users change a file, they essentially shift all of the bits that represent the file. This means that when the system identifies the chunks of data, the chunks will fall under different boundaries because they have shifted by a few bits. As a result, the system will see each chunk as being different, because it did not create the chunks in the same locations for each file. Fortunately, most vendors that implement sub-file approaches have technologies that address this point specifically. By and large, their approach to solving this problem is what differentiates them from a strict block-based approach, as well as from each other. Additionally, some vendors in this space differentiate themselves further in how they approach the computation and verification of the uniqueness of their chunks.

Options for deduplication

Within the various block- and file-based deduplication technologies, there are several methods of implementation. A significant difference between solutions has to do with where the process occurs.

Application

One of the first places where deduplication occurs is within applications. A dedupe application is typically complementary to another application that has a tendency to store large amounts of redundant data. The most common type of this application is email archival. Email server applications like Microsoft Exchange or Lotus Notes manage email distribution to users, with the majority of the data being stored on servers or at an email user's client application. The size of the mail store is largely driven by the attachments to messages. Email archival applications help manage the storage of both the server and the associated client applications by finding identical attachments on the server and moving single instances of the attachments to a common repository. The files are replaced at the server by a link to the file in the repository. Several software applications provide this archival service. Two of the largest and most common are EMC EmailXtender and Symantec Enterprise Vault. Additional examples of this deduplication approach are certain file-level archive solutions and file virtualization products.

Source

Source-based (also referred to as client-based) deduplication processes the data at its origination. This method still utilizes storage, or storage with an appliance; however, the CPU cycles for the deduplication process are spent at the client. The greatest benefit of this approach is that only net new data is sent from the client to the backup devices. But because the computational load is carried by the client, it imposes a very high CPU load during backups, and in many cases the backup performance is slower than with traditional approaches.
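A minimal sketch of that client/repository exchange (the two-round-trip protocol and in-memory repository are illustrative assumptions): the client hashes locally and ships only the blocks the repository has never seen.

```python
import hashlib

BLOCK = 4096

class Repository:
    """Server side: stores unique blocks and answers 'which of these are new?'"""
    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        # The client sends digests only -- a few bytes per 4 KB block.
        return [d for d in digests if d not in self.blocks]

    def put(self, new_blocks):
        self.blocks.update(new_blocks)

def client_backup(data: bytes, repo: Repository) -> int:
    """Client side: hash every block locally, then ship only what the
    repository lacks. Returns the payload bytes actually sent."""
    blocks = {hashlib.sha1(data[i:i + BLOCK]).hexdigest(): data[i:i + BLOCK]
              for i in range(0, len(data), BLOCK)}
    need = repo.missing(list(blocks))           # round trip 1: digests only
    repo.put({d: blocks[d] for d in need})      # round trip 2: new blocks only
    return sum(len(blocks[d]) for d in need)

repo = Repository()
laptop = b"quarterly report, v1 " * 10_000      # ~210 KB of client data
print(client_backup(laptop, repo), "payload bytes on the first backup")
print(client_backup(laptop + b" small edit", repo), "on the next day's backup")
```

The second run ships only the final, changed block, which is exactly why the approach suits thin links to desktops, laptops, and remote offices, at the cost of client CPU.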

With source-based deduplication, the traditional backup software is replaced by the dedupe client code. The backup process at the client looks at the data to be backed up at a block level. Through various hashing and bit-comparison techniques, the process determines the changed or new blocks of data that need to be sent to the repository. In some cases, the dedupe storage appliance provides recent block lists from its database to the client to help offload some of the cycles required for the process. This speeds up the deduplication at the client because it does not need to talk to the repository to compare each block it processes.

Another approach to source deduplication takes a snapshot of the data to be protected. The original snapshot of the data is transferred across the network to a common storage device. Subsequent point-in-time snapshots are then taken and compared to previous snapshots, with only the changes sent to the common storage location. This type of deduplication ensures that only small, incremental amounts of data are sent to the storage device. These types of source deduplication are well suited for backing up user desktops, remote office locations, or mobile users. Leaders in source deduplication include EMC Avamar, NetApp Open Systems SnapVault (OSSV), and Symantec PureDisk.

Lastly, an emerging variation of the source deduplication process is integrated into some of the latest releases of major enterprise backup software from EMC and Symantec. This design allows the media server to take data from its normal backup clients and perform the deduplication before sending it to the dedupe storage appliance. The deduplication function is performed by the media server (the NetBackup approach) or storage node (the EMC approach) before the data is sent to the storage appliance. This type of implementation allows an organization to integrate source-based deduplication with enterprise backup, using the deduplicated storage for the enterprise backup infrastructure. With this method, performance is based on the abilities of the media server. Companies looking at this technology should consider upgrading or adding media servers to their infrastructure to accommodate the additional processor cycles needed.

Target

In direct contrast to source deduplication, the process for target deduplication occurs within an appliance at the storage level. The appliance either has integrated storage or functions as a gateway to an existing disk array. This method applies CPU and I/O resources at the destination of the deduplicated data and is currently designed primarily to address deduplication for backup and recovery processes and long-term archive of reference data. Applications of the technology include virtual tape (VTL), disk to disk to tape (D2D2T), primary storage, and content addressable storage (CAS).

D2D2T

In a D2D2T environment, the dedupe process runs on a network-attached storage (NAS) or storage area network (SAN) appliance that either has integrated storage or external storage attached. The appliance breaks the backup stream down and performs the deduplication process. The backup software writes to an NFS or CIFS share on the network, or to a LUN on the SAN. With this method, the deduplication occurs as the data is received (inline). Alternatively, some vendors give the user a choice of doing the deduplication after the backup has completed, which is commonly referred to as post process. There has been an explosion of products in this area from a host of storage vendors, including Data Domain, EMC, NetApp, and Quantum.

VTL

The VTL approach to deduplication is similar to the D2D method. The difference is that the target is a virtual tape as opposed to a CIFS/NFS file. The deduplication occurs within an existing VTL appliance or an additional appliance that deduplicates to the VTL. When a VTL appliance creates a virtual tape, the data is written to a LUN on the storage array integrated with the VTL. The general concern of many VTL technology users is that the deduplication process creates too much overhead on the VTL, thus slowing down the backups. Many of these solutions offer both post process and inline deduplication to help counter this concern. Still others offer the best of both worlds, with the ability to switch from inline to post process if the overhead of the process is affecting the VTL backup performance. Vendors with solutions in this area include Data Domain, EMC, FalconStor, IBM, NetApp, Quantum, SEPATON, and Sun.

Additionally, some vendors offer smart deduplication, where the technology is backup software-aware. This means that the dedupe engine can gain intelligence on how and where to create the blocks from the backup stream and thereby further optimize the deduplication process. The technology provides the ability to ignore components of the backup stream that don't need to be processed (backup file marks, stream information, etc.). Consequently, the deduplication ratio increases if metadata from the backup applications is removed from the blocks being deduplicated. However, with this intelligence also come additional management factors. If software vendors change their backup software or modify the format they use when writing to tape, then the deduplication vendors have to update their code to remain compatible.

Primary Storage

Where the aforementioned methods of deduplication focus on the backup and recovery space, primary storage deduplication optimizes the data on the storage being accessed by users. This creates the ability to provide larger storage capacities with less physical disk and to reduce the storage cost per gigabyte. In current implementations, the dedupe process runs as a scheduled background task. Note that primary storage should not be confused with tier 1 storage.

Although deduplication can be performed on any type of disk (SATA, Fibre Channel, etc.), applications needing this space-management ability generally require relatively slower, but higher-capacity, solutions. Vendors who offer these solutions are NetApp and Compellent.

Content Addressable Storage

Content addressable storage (CAS) is another form of storage that offers deduplication abilities. With these solution types, an appliance is integrated with the storage or front-ends the disk array, performing file-level deduplication on the data it manages. Generally, a product-specific application programming interface (API) interacts with this type of storage. Advantages include redundancy of files in the repository across different nodes within the subsystem, or between subsystems for remote replication solutions. Hitachi Data Systems and EMC offer CAS solutions.

Network

Deduplication technology has also moved to the network itself. Enterprises continue to struggle with the bandwidth or performance of the network between headquarters and remote locations. Traditional methods address this issue through file caching or quality of service (QoS) applications. A newer approach uses appliance-based solutions, usually deployed at the headquarters and remote offices, or between two data centers. Through block deduplication, the technology manages duplication repositories at each of the appliances. The repository consists of blocks from the data stream that represent file sharing, application communication, web-based traffic, or email. Instead of transferring all the data across the WAN, the appliance replaces the blocks with references to the blocks in the repository. This can dramatically lower the bandwidth requirements and improve performance. Leaders in this sector include Cisco and Riverbed.

Inline versus post process

Another key difference with deduplication technologies has to do with whether the process occurs inline (real-time in the data stream) or post process (as a background process after the backup has finished and data has been written to the storage system). These approaches were alluded to previously in the VTL and D2D2T sections of this white paper.

Inline

An inline deduplication process allows the data to be deduplicated in real time as the backup data is received at the front end of the VTL or D2D device. The stream is passed to the dedupe engine, which breaks the data into blocks, calculates a hash value for each block, determines whether the block is new or existing, and replaces the block in the stream with a pointer to the block in the repository. If the block is new to the repository, it is compressed and written to disk. The original data stream now consists of pointers to the blocks in the repository and is written to disk.
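A sketch of that inline pipeline, under the same simplifying assumptions as the earlier engine (in-memory index, fixed incoming blocks); the one addition is that new unique blocks are compressed before they are written:

```python
import hashlib
import zlib

class InlineEngine:
    """Inline pipeline: each arriving block is hashed, stored (compressed) only
    if new, and replaced in the outgoing stream by a pointer."""
    def __init__(self):
        self.disk = {}                          # digest -> compressed unique block

    def receive(self, block: bytes) -> str:
        digest = hashlib.sha1(block).hexdigest()
        if digest not in self.disk:             # new unique block
            self.disk[digest] = zlib.compress(block)
        return digest                           # the stream keeps only the pointer

engine = InlineEngine()
stream = [b"os-image " * 500, b"user-data " * 500, b"os-image " * 500]
pointers = [engine.receive(b) for b in stream]
stored = sum(len(v) for v in engine.disk.values())
print(f"{sum(map(len, stream))} bytes received, {stored} bytes written to disk")
```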

The deduplication process is highly CPU intensive and, depending on the implementation, I/O intensive. Because of the computing resources needed, the current maximum performance for an inline engine is about 400 MB per second. Over the last few years, the processors used in these engines have gone from single-core to dual-core, and most recently to quad-core designs. Target-mode deduplication vendors that rely on CPU resources as the primary driver of their deduplication engine currently utilize dual quad-core processors to maximize performance in their fastest models. Other vendors rely on fast disk when I/O resources are needed for deduplication. As disk speeds (rotational speed, access time, and latency) have improved on the high-capacity drives typically used, performance has also improved.

Post process

With the post process method, deduplication is performed after the backup and post-backup processing have completed. High-end VTL or D2D solutions can typically ingest backup data at up to 1200 MB per second. Since the backups occur before deduplication, there is less at-risk time for an organization during which a backup has not yet completed. However, these solutions also require additional disk space to hold the backup before it is deduplicated. When implementing these solutions, it is necessary to size the landing space to accommodate not just the space required for one backup set, but potentially the next one as well. This is because if the ingest or backup speed is very high, there is a good chance that the deduplication process won't complete before the next backup starts.
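A back-of-the-envelope sizing sketch using the rates above (the 20 TB nightly backup set is an assumed workload, not a figure from the text):

```python
# Sizing the post-process "landing space": room for the set being deduped
# plus the next one arriving, since fast ingest can overlap the two.
ingest_mb_s = 1200           # high-end ingest rate cited above
dedupe_mb_s = 400            # engine rate cited above
backup_set_tb = 20           # assumed nightly backup size

backup_hours = backup_set_tb * 1024 * 1024 / ingest_mb_s / 3600
dedupe_hours = backup_set_tb * 1024 * 1024 / dedupe_mb_s / 3600
print(f"ingest takes {backup_hours:.1f} h, dedupe takes {dedupe_hours:.1f} h")
# Dedupe runs ~3x longer than ingest here, so tonight's landing area may still
# be draining when tomorrow's backup starts: size for at least two sets.
print(f"suggested landing space: {2 * backup_set_tb} TB")
```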

The post process deduplication method is otherwise similar to inline. Data is read from the disk where it is temporarily stored, and the dedupe process occurs. After deduplication is complete, the original pre-dedupe space is reclaimed.

Hybrid solutions

Some target deduplication solutions can do both inline and post process dedupe. Inline deduplication is performed by default. If the backup stream exceeds the performance abilities of the dedupe engine, the inline approach is suspended, and the backup stream is written to disk. After the backup is finished, a post process dedupe is performed on the backup data stream that was written during the backup.

Additional factors to weigh

In addition to the pros and cons identified with the various methods of deduplication, organizations should examine several other factors when considering a deduplication solution.

CPU-bound versus disk-bound

A major difference between inline deduplication vendors is their approach to verifying the reliability of data. One major camp has focused on the reliability of the cryptographically secure hash functions used to calculate the unique identifiers. The assumption is that the chance of two different pieces of data computing to the same identifier (in computer science terms, a "collision") is so small that, for all intents and purposes, it is virtually impossible. The other camp maintains that any chance of collision, no matter how small, is not acceptable, and therefore performs a two-step process. Technologies in this arena first compute a relatively easy-to-calculate hash-based identifier and then check for collisions by comparing each block against what's on disk. The rationale is that by verifying every block against what's on disk, there is 100 percent certainty that no data will be accidentally deleted.

Each of these camps offers some very compelling arguments for why its approach is the fastest and most reliable. The net result is that both methods solve the problem for real-world situations. In reality, the chance of either method being the cause of data loss is extremely low, much lower in fact than the likelihood of other subsystems failing and causing data loss. For example, the hash-based approaches use algorithms with a collision rate in the neighborhood of 1 in 2^80 (roughly 1 in 10^24), while hard drives have an undetected, unrecoverable error rate in the neighborhood of 1 in 10^14. The chance of a collision is still much lower than the unrecoverable error rate, even with RAID protection. For the disk-based approaches, the same math comes into play: there is a much higher likelihood that other elements of the subsystem will fail, rendering all the effort spent verifying disk blocks moot.
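For the skeptical reader, the birthday-bound arithmetic behind the hash camp's claim, with illustrative figures (the block count and error rate below are assumptions for the sake of the comparison):

```latex
% Birthday bound: probability that any two of n unique blocks collide under
% a b-bit hash, versus a drive's unrecoverable bit-error rate (UBER).
\[
  P_{\text{collision}} \approx \frac{n^{2}}{2^{\,b+1}}
\]
\[
  \text{e.g. } n = 10^{12}\ \text{blocks},\ b = 160\ \text{(SHA-1)}:\qquad
  P \approx \frac{(10^{12})^{2}}{2^{161}} \approx 3.4\times10^{-25}
  \ \ll\ 10^{-14}\ \text{(typical UBER)}
\]
```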

These methods have varying side effects from a performance perspective. Vendors that rely on purely hash-based algorithms are, by and large, CPU bound. Their performance is primarily dictated by the pure number-crunching ability of the CPUs they utilize. To get better performance, these subsystems merely have to wait for faster CPUs. Disk-bound systems are primarily bound by the number of disk drives that can perform the read-verification requests in parallel. Their primary method of scaling is adding more, and faster, disk drives. They have CPU limitations as well, but the primary gating factor is the raw number of IOPS that the storage can deliver. Interestingly, today both types of systems achieve roughly the same maximum throughput, about 400 MB per second. However, vendors of both methods are rapidly evolving their technologies and promising even higher throughput numbers. Hash-bound vendors claim they will soon provide clustering capabilities that will let them scale their solutions in a linear fashion, and disk-bound vendors predict they will soon double their throughput rate.

File aware (format aware)

An interesting twist to the deduplication space is that the technologies need to understand the format that the backup applications use to write to tape or disk. By being tape-format aware, they strip away the information that the backup application puts into the data stream and perform deduplication only on the real data. This is appealing because the tape formats used by the backup vendors insert markers into the data streams. If the deduplication technology is not aware of these markers, the markers appear to be part of the data that the backup client generated. This is a problem because each marker shifts the data bits and causes the data chunking process to align in different places. As far as the dedupe engine is concerned, each of these blocks is then unique. When evaluating or implementing deduplication solutions, it's important to be aware of this file/format support. It's crucial to either test the solutions against your backup product or have the vendor provide you detailed information about their support for specific formats.

Replication

Deduplication is not only changing the way that backup is done at primary sites, but also the way backups are done for remote sites, and even for disaster recovery. For the first time, it is now possible to replicate backup data with very high efficiency. Prior to deduplication, a 1TB backup to disk (or tape) would consume at least 1TB of bandwidth to replicate. If the backup window was relatively small (less than eight hours) and the bandwidth relatively low (less than a few T1s), the backups would never be replicated in time. However, with deduplication, the 1TB backup could now consume five percent (or even less) of the space. The bandwidth requirements to move this data are reduced by the same amount.
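Working the 1TB example through (a single T1 at 1.544 Mbps is the assumed link):

```python
# Hours to replicate a 1 TB backup over a single T1 line, raw versus deduped.
t1_mb_per_s = 1.544 / 8            # 1.544 Mbit/s T1 ~= 0.193 MB/s
backup_mb = 1024 * 1024            # 1 TB expressed in MB

def hours(mb):
    return mb / t1_mb_per_s / 3600

print(f"raw:     {hours(backup_mb):7,.0f} h")           # ~1,509 hours
print(f"deduped: {hours(backup_mb * 0.05):7,.0f} h")    # 5% unique: ~75 hours
# Even deduped, one T1 cannot move 1 TB nightly -- but the requirement drops
# twenty-fold, and global dedupe across sites shrinks the deltas further.
```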

Surprisingly, not all deduplication vendors have implemented replication. However, because of the game-changing ability this technology offers, the vendors that currently lack replication solutions have announced that it is imminent. Of course, the reality in some cases is that it may take several years for some vendors to deliver this capability.

For those vendors that have implemented replication with deduplication capabilities, there are a few key differentiators. First is the ability to have many-to-one relationships. Most vendors are able to set up a single primary site to which many remote sites can be replicated. The maximum number of remote sites that can be replicated varies among these technologies: for some, the maximum count is as little as 10, while others go as high as 60. The second differentiator is whether the systems can do multi-hop replication. In these scenarios, Site A is replicated to Site B, and Site B is replicated to Site C (and possibly back to Site A). Very few vendors have this capability.

A third, more subtle but important, differentiator is whether the replication is done to a global namespace. For some vendors, the replication relationships are created by partitioning the receiving unit into discrete, non-interacting storage pools. The deduplication engine is only able to handle common data between the two sites in a single pairing. For example, if Site A and Site B replicate to Site C, Site C has to divide its storage into three pools: one for Site A, another for Site B, and a third for the backups occurring at Site C. The deduplication engine treats each of those storage pools as a separate entity and deduplicates only within that storage pool. This can be potentially troublesome from a resources standpoint. In the above example, we could assume that Site C is a central corporate facility, and sites A and B are remote offices. Chances are good that there is significant common data between sites A and B and the corporate office. But because the technology treats each storage pool as unique data, the commonality would not be identified. This increases the requirements both for bandwidth and for storage at each of the appliances. Vendors that have a global namespace do not have to partition their storage, and thus the deduplication is done globally. In the above example, data that has already been seen by Site C's backups would not have to be replicated when Site A and Site B do their backups; only references (aka identifiers) would need to be sent. With this capability, bandwidth requirements can be reduced even more than the raw deduplication rate at each remote office.

Realistic compression ratios

There is no hard and fast compression number that exists for each deduplication technology. The various deduplication vendors tout redundant storage reduction ratios ranging from conservative (5:1) to lofty (500:1). Just as with tape drive compression, the numbers can vary significantly. Similar to compression ratios, deduplication ratios are sometimes computed differently. Primarily, the level of optimization that an organization will achieve is driven by the type of data being backed up as well as the backup processes.
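Since "computed differently" causes real confusion, one concrete convention (an illustrative sketch, not an industry standard): the reduction ratio is logical bytes written divided by physical bytes stored, and the same result can also be quoted as a percentage saved.

```python
# Two ways the same result gets quoted: a reduction ratio versus a
# percentage saved. 10:1 and "90% less storage" are the same number.
logical_tb = 100    # data the backup application wrote
physical_tb = 10    # what actually landed on disk after dedupe

ratio = logical_tb / physical_tb
savings = 1 - physical_tb / logical_tb
print(f"{ratio:.0f}:1 reduction = {savings:.0%} space saved")
```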

Additionally, the size of the chunks of data processed, and the intelligence with which the deduplication engine decides where a block starts and stops, can affect the level of deduplication achieved. For instance, if an organization runs full backups every day, the ratio of redundant data will be high. Conversely, if incremental-forever backups are conducted, the ratio of reduction will be significantly lower. Also, many enterprise backup environments already use different approaches to shorten the backup window. With digital images, for example, weekly full backups may be skipped because this data is static by nature. For Exchange environments, an organization may already use archival software to deduplicate attachments. Factors such as these will lessen the overall deduplication rates.

The limits of deduplication in a backup environment are also affected by the SLAs around data. If an organization keeps data for only 30 days, its compression ratios will be significantly lower than if it keeps backups for six months. This is because there will be less redundant data being backed up and stored. Lastly, the overall deduplication rate will increase over time: if the repository of deduplicated data spans months as opposed to weeks, the longer retention of this data will result in a greater deduplication rate.

Future considerations

Because deduplication technology is still somewhat in its infancy, it will continue to evolve and provide additional capabilities as time goes by. Other areas that organizations will need to keep an eye on include how deduplication will integrate with independent software vendors (ISVs) and how it will fit into large environments.

Integration with backup software ISVs

When most of the deduplication and virtual tape products were introduced, the best practice was to perform duplication and tape offloading via the backup software. The reasoning was that unless the backup software performed this operation, the backup application would be unaware of the second copy. The downside is that if a backup is 100 GB, making the copy means the backup server has to read and write that 100 GB a second and third time (read 100 GB and write another 100 GB). Even worse, the CPU and I/O cycles required by this process are delivered by the backup servers, which typically are already saturated.

Symantec has recently created an API that enables utilization of advanced storage functions like deduplication. The API allows storage vendors to integrate with the backup software, enabling the storage solution to perform the replication process and, at the same time, update the backup software with knowledge of the second copy of the backup. The process happens more quickly because the data is deduped and only the delta changes (new unique blocks) need to be transmitted between the storage appliances. Also, the CPU cycles once needed at the backup server can be used for other backup processes, like restores. Currently, Symantec NetBackup is the only product to offer this ability. However, the other major backup ISVs will likely follow.
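The control flow, in a deliberately hypothetical sketch (the class and method names below are invented for illustration; the actual Symantec API differs): the backup catalog learns about the second copy without the data ever passing through the media server.

```python
class Appliance:
    """Hypothetical dedupe appliance that can replicate an image to a peer itself."""
    def __init__(self, name):
        self.name, self.images = name, {}

    def replicate(self, image_id, target):
        # In a real system only new unique blocks cross the wire; the point
        # here is simply that the media server is not in the data path.
        target.images[image_id] = self.images[image_id]
        return f"{image_id}@{target.name}"

def duplicate_backup(catalog, image_id, src, dst):
    copy_ref = src.replicate(image_id, dst)    # appliance-to-appliance copy
    catalog[image_id].append(copy_ref)         # backup software learns of copy #2

site_a, site_b = Appliance("siteA"), Appliance("siteB")
site_a.images["img-0042"] = ["blk-1", "blk-2", "blk-3"]
catalog = {"img-0042": ["img-0042@siteA"]}
duplicate_backup(catalog, "img-0042", site_a, site_b)
print(catalog["img-0042"])    # both copies now visible to the backup software
```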

Enterprise fit

By and large, the majority of the vendors that have implemented deduplication have done so in relatively small appliances (less than 100TB of raw, pre-dedupe space). Furthermore, they are only able to achieve relatively moderate deduplication throughput rates (less than 400 MB per second). Large enterprises with high data volumes (hundreds of terabytes to multiple petabytes) and small backup windows need to be particularly aware of these limitations. VTL and other disk-to-disk solutions that don't deduplicate can achieve much higher capacities and throughputs (more than 1PB and more than 1.2GB per second). Even though the capacity problem is mitigated by the dedupe reductions, the relatively low raw data capacity of the current solutions could pose a problem for larger enterprises. Most vendors today approach the larger enterprises by utilizing multiple appliances. However, these appliances are managed independently, and the backup applications see them as unique storage pools.

Potentially more worrisome for the enterprise is that none of these solutions offers true controller redundancy. If a single, large dedupe appliance controller goes down, the entire capacity of that array is unavailable until the controller is fixed or replaced. All of the current dedupe vendors are working on addressing this issue. Some are approaching it through the use of traditional clustering technologies (active/passive or active/active pairs), and others are pursuing a distributed computing model (multiple active controllers presenting themselves as a unified solution). In Datalink's view, the best short-term bet is the traditional approach, but for long-term growth and scalability, the distributed model is probably going to be the most tenable solution.

Deciding what is right for you

Deduplication can be a confusing topic. With so many vendors offering different types of solutions, some not even referred to as deduplication, it can be difficult to discern which solution types add value and whether the overall benefits outweigh the risks associated with implementing a new technology. Several considerations come into play when organizations assess whether deduplication provides a good fit for their environment. These range from looking at the types of applications and data that are best suited for deduplication, to weighing the many varying characteristics of the deduplication technologies available in today's market (e.g., post process or inline, VTL or D2D2T).

Even after conducting in-depth analysis, it can still be confusing to know which technology to implement. Datalink helps organizations sort through this process. As a leading information storage architect, Datalink helps organizations store, manage, and protect their information. We work with companies to maximize the value that IT delivers to their business. Datalink has worked with a number of enterprises to help define and implement solutions that utilize deduplication capabilities. Our independence allows us to recommend hardware and software technologies that provide the optimal fit for organizations' environments and enable them to effectively and efficiently meet their business initiatives.

Datalink has extensive field experience with a wide range of technologies. This, combined with the knowledge we glean from in-depth testing conducted in our interoperability labs, provides us with invaluable insight that we can pass on to our clients as we design and implement storage solutions.

For more information, contact Datalink at (800) or visit


ExaGrid Product Description. Cost-Effective Disk-Based Backup with Data Deduplication ExaGrid Product Description Cost-Effective Disk-Based Backup with Data Deduplication 1 Contents Introduction... 3 Considerations When Examining Disk-Based Backup Approaches... 3 ExaGrid A Disk-Based Backup

More information

HP StoreOnce D2D. Understanding the challenges associated with NetApp s deduplication. Business white paper

HP StoreOnce D2D. Understanding the challenges associated with NetApp s deduplication. Business white paper HP StoreOnce D2D Understanding the challenges associated with NetApp s deduplication Business white paper Table of contents Challenge #1: Primary deduplication: Understanding the tradeoffs...4 Not all

More information

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything BlueArc unified network storage systems 7th TF-Storage Meeting Scale Bigger, Store Smarter, Accelerate Everything BlueArc s Heritage Private Company, founded in 1998 Headquarters in San Jose, CA Highest

More information

Deduplication Demystified: How to determine the right approach for your business

Deduplication Demystified: How to determine the right approach for your business Deduplication Demystified: How to determine the right approach for your business Presented by Charles Keiper Senior Product Manager, Data Protection Quest Software Session Objective: To answer burning

More information

Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem

Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem Advanced Storage Products Group Table of Contents 1 - Introduction 2 Data Deduplication 3

More information

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION? WHAT IS FALCONSTOR? FalconStor Optimized Backup and Deduplication is the industry s market-leading virtual tape and LAN-based deduplication solution, unmatched in performance and scalability. With virtual

More information

EMC DATA DOMAIN OPERATING SYSTEM

EMC DATA DOMAIN OPERATING SYSTEM ESSENTIALS HIGH-SPEED, SCALABLE DEDUPLICATION Up to 58.7 TB/hr performance Reduces protection storage requirements by 10 to 30x CPU-centric scalability DATA INVULNERABILITY ARCHITECTURE Inline write/read

More information

Future-Proofed Backup For A Virtualized World!

Future-Proofed Backup For A Virtualized World! ! Future-Proofed Backup For A Virtualized World! Prepared by: Colm Keegan, Senior Analyst! Prepared: January 2014 Future-Proofed Backup For A Virtualized World Like death and taxes, growing backup windows

More information

Data Domain Overview. Jason Schaaf Senior Account Executive. Troy Schuler Systems Engineer. Copyright 2009 EMC Corporation. All rights reserved.

Data Domain Overview. Jason Schaaf Senior Account Executive. Troy Schuler Systems Engineer. Copyright 2009 EMC Corporation. All rights reserved. Data Domain Overview Jason Schaaf Senior Account Executive Troy Schuler Systems Engineer 1 Data Domain: Leadership and Innovation Deduplication storage systems > 10,000 systems installed > 3,700 customers

More information

EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE: MEETING NEEDS FOR LONG-TERM RETENTION OF BACKUP DATA ON EMC DATA DOMAIN SYSTEMS

EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE: MEETING NEEDS FOR LONG-TERM RETENTION OF BACKUP DATA ON EMC DATA DOMAIN SYSTEMS SOLUTION PROFILE EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE: MEETING NEEDS FOR LONG-TERM RETENTION OF BACKUP DATA ON EMC DATA DOMAIN SYSTEMS MAY 2012 Backups are essential for short-term data recovery

More information

LDA, the new family of Lortu Data Appliances

LDA, the new family of Lortu Data Appliances LDA, the new family of Lortu Data Appliances Based on Lortu Byte-Level Deduplication Technology February, 2011 Copyright Lortu Software, S.L. 2011 1 Index Executive Summary 3 Lortu deduplication technology

More information

Backup and Recovery 1

Backup and Recovery 1 Backup and Recovery What is a Backup? Backup is an additional copy of data that can be used for restore and recovery purposes. The Backup copy is used when the primary copy is lost or corrupted. This Backup

More information

Redefining Microsoft SQL Server Data Management. PAS Specification

Redefining Microsoft SQL Server Data Management. PAS Specification Redefining Microsoft SQL Server Data Management APRIL Actifio 11, 2013 PAS Specification Table of Contents Introduction.... 3 Background.... 3 Virtualizing Microsoft SQL Server Data Management.... 4 Virtualizing

More information

EMC DATA DOMAIN OPERATING SYSTEM

EMC DATA DOMAIN OPERATING SYSTEM EMC DATA DOMAIN OPERATING SYSTEM Powering EMC Protection Storage ESSENTIALS High-Speed, Scalable Deduplication Up to 58.7 TB/hr performance Reduces requirements for backup storage by 10 to 30x and archive

More information

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation : Backup to Tape, Disk and Beyond Michael Fishman, EMC Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use

More information

EMC DATA PROTECTION. Backup ed Archivio su cui fare affidamento

EMC DATA PROTECTION. Backup ed Archivio su cui fare affidamento EMC DATA PROTECTION Backup ed Archivio su cui fare affidamento 1 Challenges with Traditional Tape Tightening backup windows Lengthy restores Reliability, security and management issues Inability to meet

More information

Redefining Backup for VMware Environment. Copyright 2009 EMC Corporation. All rights reserved.

Redefining Backup for VMware Environment. Copyright 2009 EMC Corporation. All rights reserved. Redefining Backup for VMware Environment 1 Agenda VMware infrastructure backup and recovery challenges Introduction to EMC Avamar Avamar solutions for VMware infrastructure Key takeaways Copyright 2009

More information

ExaGrid - A Backup and Data Deduplication appliance

ExaGrid - A Backup and Data Deduplication appliance Detailed Product Description ExaGrid Backup Storage Appliances with Deduplication 2014 ExaGrid Systems, Inc. All rights reserved. Table of Contents Executive Summary...2 ExaGrid Basic Concept...2 ExaGrid

More information

EMC PERSPECTIVE. An EMC Perspective on Data De-Duplication for Backup

EMC PERSPECTIVE. An EMC Perspective on Data De-Duplication for Backup EMC PERSPECTIVE An EMC Perspective on Data De-Duplication for Backup Abstract This paper explores the factors that are driving the need for de-duplication and the benefits of data de-duplication as a feature

More information

EMC AVAMAR. a reason for Cloud. Deduplication backup software Replication for Disaster Recovery

EMC AVAMAR. a reason for Cloud. Deduplication backup software Replication for Disaster Recovery EMC AVAMAR a reason for Cloud Deduplication backup software Replication for Disaster Recovery Bogdan Stefanescu (Bogs) EMC Data Protection Solutions bogdan.stefanescu@emc.com 1 BUSINESS DRIVERS Increase

More information

Seriously: Tape Only Backup Systems are Dead, Dead, Dead!

Seriously: Tape Only Backup Systems are Dead, Dead, Dead! Seriously: Tape Only Backup Systems are Dead, Dead, Dead! Agenda Overview Tape backup rule #1 So what s the problem? Intelligent disk targets Disk-based backup software Overview We re still talking disk

More information

Protect Data... in the Cloud

Protect Data... in the Cloud QUASICOM Private Cloud Backups with ExaGrid Deduplication Disk Arrays Martin Lui Senior Solution Consultant Quasicom Systems Limited Protect Data...... in the Cloud 1 Mobile Computing Users work with their

More information

WHITE PAPER BRENT WELCH NOVEMBER

WHITE PAPER BRENT WELCH NOVEMBER BACKUP WHITE PAPER BRENT WELCH NOVEMBER 2006 WHITE PAPER: BACKUP TABLE OF CONTENTS Backup Overview 3 Background on Backup Applications 3 Backup Illustration 4 Media Agents & Keeping Tape Drives Busy 5

More information

EMC Disk Library with EMC Data Domain Deployment Scenario

EMC Disk Library with EMC Data Domain Deployment Scenario EMC Disk Library with EMC Data Domain Deployment Scenario Best Practices Planning Abstract This white paper is an overview of the EMC Disk Library with EMC Data Domain deduplication storage system deployment

More information

EMC NetWorker Rounds Out Deduplication Support with EMC Data Domain Boost. Analyst: Michael Fisch

EMC NetWorker Rounds Out Deduplication Support with EMC Data Domain Boost. Analyst: Michael Fisch EMC NetWorker Rounds Out Deduplication Support with EMC Data Domain Boost THE CLIPPER GROUP NavigatorTM Published Since 1993 Report #TCG2010046 October 4, 2010 EMC NetWorker Rounds Out Deduplication Support

More information

ESG REPORT. Data Deduplication Diversity: Evaluating Software- vs. Hardware-Based Approaches. By Lauren Whitehouse. April, 2009

ESG REPORT. Data Deduplication Diversity: Evaluating Software- vs. Hardware-Based Approaches. By Lauren Whitehouse. April, 2009 ESG REPORT : Evaluating Software- vs. Hardware-Based Approaches By Lauren Whitehouse April, 2009 Table of Contents ESG REPORT Table of Contents... i Introduction... 1 External Forces Contribute to IT Challenges...

More information

Overcoming Backup & Recovery Challenges in Enterprise VMware Environments

Overcoming Backup & Recovery Challenges in Enterprise VMware Environments Overcoming Backup & Recovery Challenges in Enterprise VMware Environments Daniel Budiansky Enterprise Applications Technologist Data Domain Dan Lewis Manager, Network Services USC Marshall School of Business

More information

Symantec NetBackup 5220

Symantec NetBackup 5220 A single-vendor enterprise backup appliance that installs in minutes Data Sheet: Data Protection Overview is a single-vendor enterprise backup appliance that installs in minutes, with expandable storage

More information

Understanding EMC Avamar with EMC Data Protection Advisor

Understanding EMC Avamar with EMC Data Protection Advisor Understanding EMC Avamar with EMC Data Protection Advisor Applied Technology Abstract EMC Data Protection Advisor provides a comprehensive set of features that reduce the complexity of managing data protection

More information

Symantec NetBackup deduplication general deployment guidelines

Symantec NetBackup deduplication general deployment guidelines TECHNICAL BRIEF: SYMANTEC NETBACKUP DEDUPLICATION GENERAL......... DEPLOYMENT............. GUIDELINES.................. Symantec NetBackup deduplication general deployment guidelines Who should read this

More information

HP StoreOnce: reinventing data deduplication

HP StoreOnce: reinventing data deduplication HP : reinventing data deduplication Reduce the impact of explosive data growth with HP StorageWorks D2D Backup Systems Technical white paper Table of contents Executive summary... 2 Introduction to data

More information

Protect Microsoft Exchange databases, achieve long-term data retention

Protect Microsoft Exchange databases, achieve long-term data retention Technical white paper Protect Microsoft Exchange databases, achieve long-term data retention HP StoreOnce Backup systems, HP StoreOnce Catalyst, and Symantec NetBackup OpenStorage Table of contents Introduction...

More information

Understanding the HP Data Deduplication Strategy

Understanding the HP Data Deduplication Strategy Understanding the HP Data Deduplication Strategy Why one size doesn t fit everyone Table of contents Executive Summary... 2 Introduction... 4 A word of caution... 5 Customer Benefits of Data Deduplication...

More information

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard UNDERSTANDING DATA DEDUPLICATION Tom Sas Hewlett-Packard SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material

More information

Disk Backup Appliances: The Next Generation

Disk Backup Appliances: The Next Generation Disk Backup Appliances: The Next Generation Prepared by TechRepublic exclusively for CONTENTS: Introduction... 3 Tape: 60 Years of History... 3 Why Tape Fails Us... 5 Disk: Tape s Replacement... 6 Disaster

More information

CIGRE 2014: Udaljena zaštita podataka

CIGRE 2014: Udaljena zaštita podataka CIGRE 2014: Udaljena zaštita podataka Žarko Stupar Product Manager zstupar@mds.rs "" 1 Agenda Udaljena zaštita podataka - pristup Replikacija podataka između data centara Napredna backup rešenja Replikacija

More information

CISCO WIDE AREA APPLICATION SERVICES (WAAS) OPTIMIZATIONS FOR EMC AVAMAR

CISCO WIDE AREA APPLICATION SERVICES (WAAS) OPTIMIZATIONS FOR EMC AVAMAR PERFORMANCE BRIEF CISCO WIDE AREA APPLICATION SERVICES (WAAS) OPTIMIZATIONS FOR EMC AVAMAR INTRODUCTION Enterprise organizations face numerous challenges when delivering applications and protecting critical

More information

Data Reduction Methodologies: Comparing ExaGrid s Byte-Level-Delta Data Reduction to Data De-duplication. February 2007

Data Reduction Methodologies: Comparing ExaGrid s Byte-Level-Delta Data Reduction to Data De-duplication. February 2007 Data Reduction Methodologies: Comparing ExaGrid s Byte-Level-Delta Data Reduction to Data De-duplication February 2007 Though data reduction technologies have been around for years, there is a renewed

More information

Quantum DXi6500 Family of Network-Attached Disk Backup Appliances with Deduplication

Quantum DXi6500 Family of Network-Attached Disk Backup Appliances with Deduplication PRODUCT BRIEF Quantum DXi6500 Family of Network-Attached Disk Backup Appliances with Deduplication NOTICE This Product Brief contains proprietary information protected by copyright. Information in this

More information

Technology Fueling the Next Phase of Storage Optimization

Technology Fueling the Next Phase of Storage Optimization White Paper HP StoreOnce Deduplication Software Technology Fueling the Next Phase of Storage Optimization By Lauren Whitehouse June, 2010 This ESG White Paper was commissioned by Hewlett-Packard and is

More information

SPECIAL REPORT. Data Deduplication. Deep Dive. Put your backups on a diet. Copyright InfoWorld Media Group. All rights reserved.

SPECIAL REPORT. Data Deduplication. Deep Dive. Put your backups on a diet. Copyright InfoWorld Media Group. All rights reserved. SPECIAL REPORT Data Deduplication Deep Dive Put your backups on a diet Copyright InfoWorld Media Group. All rights reserved. Sponsored by 2 Data deduplication explained How to reduce backup overhead and

More information

Maximize Your Virtual Environment Investment with EMC Avamar. Rob Emsley Senior Director, Product Marketing

Maximize Your Virtual Environment Investment with EMC Avamar. Rob Emsley Senior Director, Product Marketing 1 Maximize Your Virtual Environment Investment with EMC Avamar Rob Emsley Senior Director, Product Marketing 2 Private Cloud is the Vision Virtualized Data Center Internal Cloud Trusted Flexible Control

More information

Backup and Recovery: The Benefits of Multiple Deduplication Policies

Backup and Recovery: The Benefits of Multiple Deduplication Policies Backup and Recovery: The Benefits of Multiple Deduplication Policies NOTICE This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change

More information

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation Introduction to Data Protection: Backup to Tape, Disk and Beyond Michael Fishman, EMC Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies

More information

Quantum StorNext. Product Brief: Distributed LAN Client

Quantum StorNext. Product Brief: Distributed LAN Client Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without

More information

INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT

INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT UNPRECEDENTED OBSERVABILITY, COST-SAVING PERFORMANCE ACCELERATION, AND SUPERIOR DATA PROTECTION KEY FEATURES Unprecedented observability

More information

Protecting enterprise servers with StoreOnce and CommVault Simpana

Protecting enterprise servers with StoreOnce and CommVault Simpana Technical white paper Protecting enterprise servers with StoreOnce and CommVault Simpana HP StoreOnce Backup systems Table of contents Introduction 2 Technology overview 2 HP StoreOnce Backup systems key

More information

We look beyond IT. Cloud Offerings

We look beyond IT. Cloud Offerings Cloud Offerings cstor Cloud Offerings As today s fast-moving businesses deal with increasing demands for IT services and decreasing IT budgets, the onset of cloud-ready solutions has provided a forward-thinking

More information

Data Deduplication and Corporate PC Backup

Data Deduplication and Corporate PC Backup A Druva White Paper Data Deduplication and Corporate PC Backup This Whitepaper explains source based deduplication technology and how it is used by Druva s insync product to save storage bandwidth and

More information

EMC BACKUP AND RECOVERY SOLUTIONS

EMC BACKUP AND RECOVERY SOLUTIONS EMC BACKUP AND RECOVERY SOLUTIONS Backup to the future BRS PARTNER UPDATE Sofia, March 14 th, 2011 horia.constantinescu@emc.com dumitru.taraianu@emc.com 1 Agenda EMC backup and recovery solutions Backup

More information

Barracuda Backup Deduplication. White Paper

Barracuda Backup Deduplication. White Paper Barracuda Backup Deduplication White Paper Abstract Data protection technologies play a critical role in organizations of all sizes, but they present a number of challenges in optimizing their operation.

More information

Redefining Microsoft Exchange Data Management

Redefining Microsoft Exchange Data Management Redefining Microsoft Exchange Data Management FEBBRUARY, 2013 Actifio PAS Specification Table of Contents Introduction.... 3 Background.... 3 Virtualizing Microsoft Exchange Data Management.... 3 Virtualizing

More information

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data Actifio Big Data Director Virtual Data Pipeline for Unstructured Data Contact Actifio Support As an Actifio customer, you can get support for all Actifio products through the Support Portal at http://support.actifio.com/.

More information

Hardware Configuration Guide

Hardware Configuration Guide Hardware Configuration Guide Contents Contents... 1 Annotation... 1 Factors to consider... 2 Machine Count... 2 Data Size... 2 Data Size Total... 2 Daily Backup Data Size... 2 Unique Data Percentage...

More information