Tiered Data Protection Strategy
Data Deduplication
Thomas Störr, Sales Director Central Europe
November 8, 2007
Overland Storage: Tiered Data Protection
[Matrix: each product rated Good, Better, or Best against each customer problem]
Products:
- NEO / ARCvault
- REO w/ expansion
- ULTAMUS RAID 1200
- ULTAMUS RAID 4800
- REO w/ compression
- REO 9500D
- REO DM (coming)
Customer problems:
- Backups take too long
- Backups fail too often
- Too much human error in backup
- Ensuring remote site backups
- Reliable backups to RAID using backup software
- Managing data growth
- Improving speed of data recovery
- Looking for longer data retention/archive
- Looking for longer data retention/archive on disk
- Offsite storage / better Disaster Recovery
Why Deduplicate Data?
- To keep more backup data nearline by reducing the amount of physical disk space required to store multiple full and incremental backups
- To enable remote data strategies by reducing the amount of data that has to be transmitted across a WAN
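The WAN benefit can be put into rough numbers. The sketch below is illustrative only: the function name, the 500 GB data set, the 10 Mbit/s link, and the 20:1 reduction ratio are all assumptions chosen for the example, and the model ignores protocol overhead and latency.

```python
def transfer_hours(data_gb, link_mbit_s, reduction_ratio=1.0):
    """Hours needed to send data_gb over a link of link_mbit_s.

    reduction_ratio is the assumed data reduction (20.0 means 20:1).
    Decimal units are used: 1 GB = 8e9 bits, 1 Mbit/s = 1e6 bits/s.
    """
    bits_to_send = data_gb * 8e9 / reduction_ratio
    seconds = bits_to_send / (link_mbit_s * 1e6)
    return seconds / 3600


# Hypothetical remote site: 500 GB of backup data over a 10 Mbit/s WAN link.
without_dedup = transfer_hours(500, 10)
with_dedup = transfer_hours(500, 10, reduction_ratio=20.0)
print(f"without deduplication: {without_dedup:.1f} h")
print(f"with 20:1 reduction:   {with_dedup:.1f} h")
```

Under these assumptions the transfer drops from several days to a few hours, which is why reduction is treated as a precondition for remote backup over a WAN.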
The Problems Deduplication Solves (and Doesn't Solve)
Use deduplication:
- To increase the number of full and incremental backups available nearline
- To reduce the amount of physical storage required to hold backups
- For greener data centers
- As part of a Tiered Data Protection strategy
- As part of a WAN acceleration and DR strategy
Deduplication will not:
- Always speed up backups
- Always speed up full volume restores
- Eliminate backup rotation and data expiration strategies
- Replace tape as the ultimate archive media
How Data Deduplication Performs Data Reduction
Step 1: Provide a fine-resolution means to map a large disk repository into as small an index footprint (data dictionary) as possible. This is a technologically challenging but important dimension of data deduplication.
Step 2: Analyze new incoming data sent to the disk repository against the index, to determine whether elements of the new data already exist in the repository.
Step 3: Store only data that does not already exist in the repository, referencing the identical elements that are already there.
Step 4: Update the index with all new elements of data stored in the repository, so that the new elements can be the source of referencing for subsequent data streams.
Overland's REO 9500D employs a unique deduplication process to accomplish these steps, thus achieving a combination of superior data deduplication ratio, performance, capacity and 100% data integrity.
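The four steps can be illustrated with a toy repository. This is a minimal sketch and not Overland's method: it assumes fixed-size chunks and a SHA-256 fingerprint index, whereas the deck describes the REO 9500D technique as proprietary and non-hash-based. The class and method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy repository: fixed-size chunks indexed by SHA-256 fingerprint."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.index = {}    # step 1: fingerprint -> position in self.chunks
        self.chunks = []   # unique chunk payloads actually stored

    def write(self, data):
        """Ingest a byte stream and return its list of chunk references."""
        recipe = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).digest()
            pos = self.index.get(fingerprint)    # step 2: look up the index
            if pos is None:
                pos = len(self.chunks)
                self.chunks.append(chunk)        # step 3: store only new data
                self.index[fingerprint] = pos    # step 4: update the index
            recipe.append(pos)                   # existing data is referenced
        return recipe

    def read(self, recipe):
        """Rebuild the original stream from its chunk references."""
        return b"".join(self.chunks[pos] for pos in recipe)

    def physical_bytes(self):
        return sum(len(chunk) for chunk in self.chunks)


store = DedupStore(chunk_size=4)
first = store.write(b"AAAABBBBAAAA")   # third chunk repeats the first
second = store.write(b"BBBBCCCC")      # only "CCCC" is new to the repository
print(first, second, store.physical_bytes())
```

A second stream that shares chunks with the first adds only its new chunks to physical storage, while `read` can still rebuild either stream in full from its references.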
REO 9500D Deduplication Ratio
Typically 12:1 to 25:1, but highly dependent on environmental factors:
1. Rate of data change: more change to data means less deduplication can occur.
2. Backup policy: backing up more similar systems means more deduplication can occur.
3. Retention: keeping more backups means greater opportunity for deduplication.
Promising a specific deduplication ratio is risky business!
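A back-of-the-envelope model shows why change rate and retention move the ratio so much. It is a deliberate simplification, not a description of the REO 9500D: it assumes only full backups are retained, that each new full differs from what is already stored by a fixed fraction, and that unchanged data deduplicates perfectly. It does not model the backup-policy factor (similarity across systems). The function and parameter names are hypothetical.

```python
def dedup_ratio(full_size_gb, retained_fulls, change_rate):
    """Logical-to-physical ratio under a simple full-backup model.

    The first full is stored whole; every later full adds only
    change_rate * full_size_gb of new data to the repository.
    """
    logical = full_size_gb * retained_fulls
    physical = full_size_gb * (1 + change_rate * (retained_fulls - 1))
    return logical / physical


baseline = dedup_ratio(1000, retained_fulls=12, change_rate=0.02)
more_retention = dedup_ratio(1000, retained_fulls=52, change_rate=0.02)
more_change = dedup_ratio(1000, retained_fulls=12, change_rate=0.10)
print(f"12 fulls, 2% change:  {baseline:.1f}:1")
print(f"52 fulls, 2% change:  {more_retention:.1f}:1")
print(f"12 fulls, 10% change: {more_change:.1f}:1")
```

Even in this idealized model the ratio swings widely with the inputs, which is consistent with the warning against promising a specific figure.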
How Deduplication Reduces the Amount of Data Stored or Transferred
Description                                           Typical compression ratios
First full backup                                     3-10 to 1
Incremental backups                                   6-7 to 1
Subsequent full backups                               50-60 to 1
Aggregate over time (weekly full, daily incremental)  20 to 1
Aggregate over time (daily full backups)              60 to 1
[Chart: virtual capacity vs. physical capacity in GB (0 to 20,000) across time periods 1 to 12]
Source: ESG Lab Report on Deduplication
Deduplication Architecture 1: In-line
- Data is deduplicated as it is written to disk.
- Requires the most processing power, but makes the most efficient use of disk space.
- Backup speed depends on processing power.
- Ideal for applications in which disk space reduction is the primary consideration.
Deduplication Architecture 2: Post-Processing
- Data is first written to disk without deduplication.
- Deduplication then runs as a post-process, and the deduplicated data is written to disk.
- Requires less processing power because it can run off-peak, but uses disk space less efficiently.
- Backups run at full speed.
- Ideal for less wall-clock-intensive applications.
File- or Block-Based Deduplication
- File-level deduplication resembles incremental backup: a small difference in the data causes the entire file to be treated as unique.
- Block-level deduplication adds sub-file granularity to data deduplication.
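The difference in granularity is easy to see on two versions of a file that differ by a single byte. The sketch below is an illustration under stated assumptions: fixed 4 KB blocks, SHA-256 fingerprints, and hypothetical function names. A fixed-block scheme like this one copes with in-place changes but not with insertions that shift later blocks.

```python
import hashlib


def file_level_stored(files):
    """Bytes stored when whole files are the unit of deduplication."""
    seen, stored = set(), 0
    for data in files:
        fingerprint = hashlib.sha256(data).digest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            stored += len(data)
    return stored


def block_level_stored(files, block_size=4096):
    """Bytes stored when fixed-size blocks are the unit of deduplication."""
    seen, stored = set(), 0
    for data in files:
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            fingerprint = hashlib.sha256(block).digest()
            if fingerprint not in seen:
                seen.add(fingerprint)
                stored += len(block)
    return stored


# Version 1: four distinct 4 KB blocks. Version 2: the last byte is changed.
v1 = b"".join(bytes([i]) * 4096 for i in range(4))
v2 = v1[:-1] + b"\xff"

print(file_level_stored([v1, v2]))    # 32768: both versions stored whole
print(block_level_stored([v1, v2]))   # 20480: v1 plus one changed block
```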
REO 9500D Data Deduplicating VTL Long-Term Data Retention Made Simple
REO 9500D Deduplicating VTL Appliance
- Long-term data retention on disk
- Ease of implementation
- Flexible configurations
- Up to 12 VTLs, 64 virtual tape drives, and 3,000 cartridges
- Dual 4 Gb FC to host
- Quad hot-swap power supply
- Redundant cooling
- Available in 3.75 TB and 7.5 TB configurations
REO 9500D Deduplication
- The 9500D performs in-line deduplication using a proprietary, non-hash-based technique.
- In-line deduplication ensures the most effective use of available storage.
- The entire index (data dictionary) is stored in memory, avoiding the delays associated with disk access to the index.
Deduplication Implementations
Deduplication as a VTL or NAS: what's the difference?
- VTL: A dedicated deduplicating VTL offers ease of implementation with backup software, and performance advantages over NAS similar to those of a traditional VTL.
- NAS: Requires either a VTL front end (like Data Domain) or backup to NAS folders. Has the performance and scalability limitations typical of non-deduplicating NAS. The catalogs of some backup software offerings (like Symantec NetBackup) do not integrate seamlessly with NAS Disk Storage Units.
REO 9500D is a VTL
- The REO 9500D is a deduplicating VTL, not NAS folders or NAS with a VTL front end.
- The REO 9500D arrives ready to be configured as a VTL. No additional integration steps are required.
- A VTL configuration wizard guides the user through setup.
- REO 9500D configuration is user-definable.
Deduplication Approaches
Deduplication at the application server or VTL: which is better?
- Host-based: An agent is installed on all application servers. The agent looks for duplicate data and sends only unique data to the backup server. Requires fundamental changes to the backup infrastructure: the backup server and software must be replaced with a deduplicating backup application. Example: Avamar.
- VTL-based: Drops into existing backup infrastructures to start seeing savings within days. The most practical approach for enterprises that can't tear down their entire backup network to implement deduplication.
REO 9500D
- The REO 9500D performs deduplication at the VTL.
- The REO 9500D requires no changes to the backup software, no agent installation, and no reconfiguration of application servers.
- Can reduce required storage by ~25:1.
- Data reduction ratio is dependent on user data patterns and backup policies.
REO 9500D Models and Pricing
The 9500D requires no add-on accessories or licensing.

Model         Base Capacity  Est. Capacity @12:1  Est. Capacity @25:1  MSRP      MSRP per TB of effective capacity*
OV-REO101094  3.75 TB        45 TB                93.75 TB             $65,400   $1,453/TB (12:1) or $696/TB (25:1)
OV-REO101095  7.5 TB         90 TB                187.5 TB             $108,300  $1,200/TB (12:1) or $575/TB (25:1)

*Effective capacity is data and usage dependent
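The per-TB figures follow from dividing MSRP by effective capacity (base capacity times the assumed reduction ratio). The sketch below reproduces that arithmetic with the model numbers and prices from the slide; the function names are hypothetical, and the unrounded results can differ from the slide's figures by a few dollars because the slide appears to round.

```python
def effective_capacity_tb(base_tb, ratio):
    """Effective capacity at an assumed reduction ratio (12 means 12:1)."""
    return base_tb * ratio


def price_per_effective_tb(msrp_usd, base_tb, ratio):
    """MSRP divided by effective capacity, in USD per TB."""
    return msrp_usd / effective_capacity_tb(base_tb, ratio)


# Model, base capacity (TB), and MSRP (USD) as listed on the slide.
models = [
    ("OV-REO101094", 3.75, 65_400),
    ("OV-REO101095", 7.5, 108_300),
]

for name, base_tb, msrp in models:
    for ratio in (12, 25):
        capacity = effective_capacity_tb(base_tb, ratio)
        cost = price_per_effective_tb(msrp, base_tb, ratio)
        print(f"{name} @ {ratio}:1  {capacity:7.2f} TB  ${cost:,.0f}/TB")
```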
REO 9500D Pricing Detail

Part number   Raw (TB)  Usable at 12:1 (TB)  Usable at 25:1 (TB)  MSRP      US Disti  SSV      SSV-Par  Estimated Street $
OV-REO101094  3.75      45                   94                   $65,432   $44,711   $44,711  $31,300  $40,690
OV-REO101095  7.5       90                   188                  $108,286  $73,995   $73,995  $51,800  $67,340
OV-REO101096  11.25     135                  281                  $140,062  $95,707   $95,707  $67,000  $87,100

SSV-Par per TB of effective capacity

Part number   Usable at 12:1 (TB)  Usable at 25:1 (TB)  SSV-Par  SSV-Par $/TB at 12:1  SSV-Par $/TB at 25:1
OV-REO101094  45                   94                   $31,300  $696                  $334
OV-REO101095  90                   188                  $51,800  $576                  $276
OV-REO101096  135                  281                  $67,000  $496                  $238
Tiered Data Protection
Deduplication enhances REO by greatly improving data retention time on disk.
[Chart: data retention time against backup and restore performance for Tape, REO 9500D, REO w/ Compression, and REO VTL]
Choosing the Right REO VTL
Factors that determine the right product to fit the need:
- Amount of data to protect
- Backup service levels / policy
- Retention objective
REO VTL Appliances with Hardware Data Compression
REO 4500c, REO 9100c
Overland REO patent-pending Dynamic Virtual Tape (thin provisioning for virtual tape cartridges) soundly defeats Quantum in disk storage space efficiency among hardware-compressed VTLs.
Making Tiered Data Protection Easy - Today
[Diagram: datacenter tiers]
- High-speed backup and immediate recovery: REO 9100 (fast backup/recovery)
- Archive: REO 9500D (deduplicating VTL archive) and ARCvault (physical tape archive)
Right Sizing
- Determine the capacity requirement: how much data in fulls and incrementals, and what retention and rotation strategy?
- Determine the performance requirement: how much data in fulls, in what backup window?
- Match capacity and performance to the appropriate model.
- In some cases, the user can meet objectives with a non-deduplicating REO VTL appliance.
- In some cases, the user can best meet objectives by combining both traditional and deduplicating REO VTLs.
- Use the Overland Pre-Sales resources to help fit the right solution.
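The two sizing questions reduce to simple arithmetic. The sketch below is a rough planning aid under stated assumptions, not an Overland sizing tool: the function names and the example workload (2,000 GB weekly fulls kept for 4 weeks, 100 GB daily incrementals with 24 retained, an 8-hour backup window, a 12:1 reduction ratio) are hypothetical, and decimal units are used throughout.

```python
def logical_capacity_gb(full_gb, fulls_kept, incr_gb, incrs_kept):
    """Total backup data retained before any reduction, in GB."""
    return full_gb * fulls_kept + incr_gb * incrs_kept


def physical_capacity_tb(logical_gb, reduction_ratio=1.0):
    """Disk needed after an assumed reduction ratio, in TB."""
    return logical_gb / reduction_ratio / 1000


def required_throughput_mb_s(full_gb, window_hours):
    """Sustained rate needed to finish a full backup inside the window."""
    return full_gb * 1000 / (window_hours * 3600)


logical = logical_capacity_gb(full_gb=2000, fulls_kept=4,
                              incr_gb=100, incrs_kept=24)
print(f"retained backup data:  {logical} GB")
print(f"disk without dedup:    {physical_capacity_tb(logical):.2f} TB")
print(f"disk at assumed 12:1:  {physical_capacity_tb(logical, 12):.2f} TB")
print(f"throughput for window: {required_throughput_mb_s(2000, 8):.1f} MB/s")
```

Comparing the resulting capacity and throughput against each model's specifications indicates whether a non-deduplicating appliance suffices or whether a deduplicating or combined configuration fits better.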
Match Capacity, Performance and Price per TB to Your Customers' Needs
The most comprehensive family of backup to disk products on the planet
[Chart: product families positioned by capacity range (axis marked 10TB to 110TB) and cost per TB, each annotated with the customer problems it is Best, Better, or Good for]
Product family                                    Cost per TB
ARCvault / NEO                                    $250/TB
REO 9500D                                         $500/TB
ULTAMUS RAID 1200 & 4800                          $750/TB
REO with Compression (REO 4500c, REO 9100c)       $1,100/TB
REO VTL without Compression (REO 1500/4500/9100)  $1,300/TB
Capacity ranges shown on the chart: 5TB-1600TB, 52TB-250TB, 12TB-114TB, 3TB-72TB, 3TB-57TB
REO 9500D Futures
Expansion / scalability
- Use of expansion arrays is in development.
Replication
- A policy-based replication mechanism for remote site consolidation and DR is nearing completion of development.
- A device-to-device replication solution for DR is currently in development.
Tiered Data Protection Roadmap
2007
- REO VTL appliance with Deduplication
- REO VTL appliance with Compression
2008
- REO with Data Mobility, for remote backup consolidation & DR
Making Tiered Data Protection Easy - Tomorrow
[Diagram: two remote offices (REO 1500, REO DM 100) connected over WAN links to REO DM 500 data movers; a datacenter with REO 9100 (fast backup/recovery), REO 9500D (de-dupe disk archive) and ARCvault (tape archive); and a disaster recovery site]
Validation
- Lab validation by launch
- Customer success by launch
- Customer success by launch
- Analyst quote by launch
- IDC white paper by launch
Comments Questions