With Verified Erasure Coding
Superior Availability, Integrity, Performance, Economy
Critical Big Data requirements are:
  High Data Availability (survive multiple drive failures even during rebuild)
  Perfect Data Integrity (eliminate silent data corruption even during rebuild)
  High Performance (minimize RAID rebuild time impacts)
  High Economy (both money and time savings)
Big Parity meets these requirements:
  Extends traditional RAID5 (N+1) and RAID6 (N+2) into N+(3:127)
  Eliminates Silent Data Corruption and Silent Data Corruption Amplification
  7-30x the performance of Open Source Erasure Coding libraries
  Extends traditional Erasure Coding with Verified Erasure Coding
Big Parity is a C language Erasure Coding library
  Compatible with Linux (GCC), Mac OS X (GCC) and Windows (Intel)
  Implements Verified Erasure Coding
  Fully tested, verified and supported with multiple pending patents
Early adopters of Erasure Coding systems include:
  Oracle/Sun Dell/Compellent ZFS: 3 parity drives
  NEC HydraStor: 3 parity drives
  EMC/Isilon: 4 parity drives
  Amplidata: 4 parity drives
  CleverSafe: 6 parity drives
Big Parity offers 7-30x the performance of any competing Erasure Coding system
Big Parity is the only library to offer Verified Erasure Coding (patent pending)
Big Parity is the only library to eliminate Silent Data Corruption Amplification
Compared with leading Open Source solutions
  Published results are from highly respected academic sources
  Comparisons include Jerasure, Luby, Zooko and Cleversafe
  Multiple runs to verify stability of results
Results
  Big Parity offers a 7-30x performance advantage
[Figure: Erasure Coding Relative Performance. Decoding throughput in MB/sec (axis 0 to 4000) by parity configuration (data drives, parity drives): 14,2; 12,4; 10,6. Series: Big Parity, Jerasure CRS, Jerasure RS, Luby, Cleversafe, Zooko.]
Each additional Parity drive increases data availability by an order of magnitude
  System can survive additional failures seamlessly
Much more reliable than hot spares
  Unlike hot spares, pre-computed Parity drives are not contingent on successful reconstruction
Parity drive requirements increase as a log function of data drives, not a linear function
  Bigger RAID groups have fewer components and are more reliable than multiple small groups
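The availability claim above can be explored with a simple model. The sketch below is an illustration, not a figure from the Big Parity results: it assumes every drive fails independently with the same probability during one exposure window, and that data is lost only when more drives fail than there are parity drives. The function name and the parameter values are hypothetical.

```python
from math import comb

def loss_probability(n_data, n_parity, p_fail):
    """Probability that more than n_parity of the drives fail in one window.

    Assumes independent, identically likely drive failures (a simplification).
    """
    n = n_data + n_parity
    # Sum the binomial tail directly to avoid cancellation in 1 - P(survive).
    return sum(
        comb(n, k) * p_fail**k * (1.0 - p_fail) ** (n - k)
        for k in range(n_parity + 1, n + 1)
    )

# Example: 14 data drives, 1% per-drive failure probability per window (assumed).
for parity in (2, 3, 4):
    print(parity, "parity drives:", loss_probability(14, parity, 0.01))
```

Under these assumed numbers each added parity drive lowers the modeled loss probability by more than a factor of ten; the exact factor depends on the group size and the per-drive failure probability.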
Silent Data Corruption is a well documented occurrence in Big Data Systems
  Network Appliance found ~1% of disk drives had Silent Data Corruption
  http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf
Additional Parity drives mean Silent Data Corruption can be eliminated
  Mathematical sums are used to validate data and correct errors
  100% reliable detection and correction
  Even during reconstruction
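How parity sums can both detect and correct a silent error is easiest to see with the textbook RAID6-style pair of sums, P and Q, over GF(2^8). The sketch below is that generic construction, not Big Parity's patent-pending method. It assumes one byte per drive, the common polynomial 0x11d with generator 2, and at most one corrupted data byte; all function names are hypothetical.

```python
# GF(2^8) log/antilog tables for polynomial 0x11d, generator 2 (assumed choice).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def make_parity(data):
    """Return (P, Q): P is the XOR of the bytes, Q is the XOR of g^i * D_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q

def verify_and_correct(data, p, q):
    """Detect and repair at most one silently corrupted data byte.

    Returns (corrected_data, index_of_bad_drive or None).
    """
    p_now, q_now = make_parity(data)
    sp, sq = p ^ p_now, q ^ q_now          # syndromes: both zero means clean
    if sp == 0 and sq == 0:
        return list(data), None
    if sp == 0 or sq == 0:
        raise ValueError("a parity byte, not a data byte, is inconsistent")
    j = (LOG[sq] - LOG[sp]) % 255          # sq = g^j * sp locates the bad drive
    if j >= len(data):
        raise ValueError("more than one byte is corrupt")
    fixed = list(data)
    fixed[j] ^= sp                         # sp is exactly the error value
    return fixed, j
```

With only two sums, locating the bad drive uses up both of them, so nothing is left over to cover a drive failure at the same time. That is the reason the deck budgets separate parity drives for integrity and for availability.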
Assume a RAID6 system has two failed drives that are reconstructing
  Any SDC Error from any drive will be amplified and recorded on BOTH reconstructing drives permanently, without the possibility of detection or recovery
Big Parity eliminates both Silent Data Corruption and Silent Data Corruption Amplification
  Patent pending technique extends Erasure Coding into Verified Erasure Coding
  All other Erasure Coding Systems suffer from Silent Data Corruption Amplification
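The amplification scenario described above can be reproduced with a generic P+Q code. This is a minimal sketch of a textbook RAID6-style rebuild, not any vendor's implementation: it assumes one byte per drive, GF(2^8) with polynomial 0x11d and generator 2, and hypothetical function names. One corrupted survivor makes both rebuilt bytes wrong, and the resulting stripe still matches P and Q, so a later parity check cannot see the damage.

```python
# GF(2^8) log/antilog tables for polynomial 0x11d, generator 2 (assumed choice).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]

def make_parity(data):
    """Return (P, Q) with P = XOR of D_i and Q = XOR of g^i * D_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q

def rebuild_two(data, x, y, p, q):
    """Rebuild lost bytes data[x] and data[y] from the survivors plus P and Q."""
    a, b = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            a ^= d                          # a ends as Dx ^ Dy
            b ^= gf_mul(EXP[i], d)          # b ends as g^x*Dx ^ g^y*Dy
    dx = gf_div(b ^ gf_mul(EXP[y], a), EXP[x] ^ EXP[y])
    dy = a ^ dx
    out = list(data)
    out[x], out[y] = dx, dy
    return out

original = [0x11, 0x22, 0x33, 0x44]
P, Q = make_parity(original)

# Drives 0 and 1 are lost; drive 2 carries a silent error during the rebuild.
survivors = [0, 0, 0x33 ^ 0x5A, 0x44]
rebuilt = rebuild_two(survivors, 0, 1, P, Q)
print("rebuilt drives 0 and 1:", rebuilt[0], rebuilt[1])
print("parity still consistent:", make_parity(rebuilt) == (P, Q))
```

Catching this case takes at least one parity sum that is not consumed by the rebuild itself, which is the role the deck assigns to the additional parity drives.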
Using additional parity drives means reconstruction can be deferred
  No need to load the system with reconstruction while critical applications are running
  N extra parity drives means 1/(N+1) as many reconstructions required
  For example, 1 extra parity drive means ½ as many reconstructions
Using additional Parity drives means delays can be eliminated
  Drives often have long delays during recovery operations, which can accumulate over time and delay applications
  Additional Parity drives can eliminate those delays by reconstructing delayed data
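The 1/(N+1) figure above follows from batching. As a small worked sketch, assume a reconstruction pass is deferred until N+1 drives have failed and then rebuilds all of them at once; the function name and the failure count are illustrative only.

```python
def reconstruction_passes(failures, extra_parity):
    """Number of rebuild passes when each pass is deferred until
    extra_parity + 1 failed drives have accumulated (assumed policy)."""
    return failures // (extra_parity + 1)

# 12 drive failures over some period (an arbitrary illustrative count).
for n in (0, 1, 3):
    print(n, "extra parity drives:", reconstruction_passes(12, n), "passes")
```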
Larger RAID groups mean fewer total disk components
  Saves power, packaging, cooling and interconnect for each saved disk
Additional parity disks mean fewer service events
  N extra parity drives means 1/(N+1) as many disk service events
  For example, 1 extra parity drive means ½ the number of disk service events
Authored by a veteran RAID designer specifically as a Verified Erasure Coding solution for RAID systems
  Not an academic work, a meticulously engineered production level solution
  Simultaneously supports older (Vandermonde) based Erasure Codes as well as newer (Lagrange) based Erasure Codes
Easy to extend existing RAID6 (N+2) systems into N+3
  No need to rewrite any existing data or parity information
  Seamless upgrade in place for older customers needing additional protection
Full support to migrate in place RAID6 (N+2) into N+(3:127)
  Single library can read and verify old codes and then write new codes without disrupting user data
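Why an N+2 stripe can grow to N+3 without touching existing data or parity can be seen in a Vandermonde-style layout, where each parity drive is an independent sum over the same data. The sketch below is a generic illustration, not Big Parity's code or API. It assumes one byte per drive, GF(2^8) with polynomial 0x11d and generator 2, and parity k defined as the sum of g^(k*i) * D_i; the names `parity_row` and `upgrade_stripe` are hypothetical.

```python
# GF(2^8) log/antilog tables for polynomial 0x11d, generator 2 (assumed choice).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def parity_row(data, k):
    """Parity number k: XOR over i of g^(k*i) * D_i (k=0 is P, k=1 is Q)."""
    acc = 0
    for i, d in enumerate(data):
        acc ^= gf_mul(EXP[(k * i) % 255], d)
    return acc

def upgrade_stripe(data, parities):
    """Append one more parity to an existing stripe.

    The existing parity values are copied through unchanged; only the new
    row is computed, and it reads the data without rewriting it.
    """
    return list(parities) + [parity_row(data, len(parities))]

data = [0x11, 0x22, 0x33, 0x44]
raid6 = [parity_row(data, 0), parity_row(data, 1)]   # existing P and Q
n_plus_3 = upgrade_stripe(data, raid6)               # P, Q and a new third parity
print(n_plus_3[:2] == raid6)
```

This only shows that the older parity stays valid after the upgrade. Whether a given set of rows tolerates every combination of failures depends on the code construction, which this sketch does not address.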
Higher data availability
  Orders of magnitude more protection than existing RAID5/6 strategies
Increased data integrity with Verified Erasure Coding
  Elimination of Silent Data Corruption in all forms
Improved performance
  Fewer reconstructions required
  Errant drive latencies eliminated
Increased Economy
  Fewer total components
  Fewer service events
Backwards compatible
2 Parity Drives for Data Integrity
  Used to eliminate Silent Data Corruption, especially during reconstruction
1 Parity Drive for Unrecoverable Read Errors
  Likely to occur in large systems with large drives
1 Parity Drive for Performance
  Large systems are likely to have drives with high latencies
3 Parity Drives to reduce Reconstruction Events
  Fewer service events required
7 Parity Drives in Total
  Guaranteed data integrity, predictable performance and economical service
  More parity drives decrease service requirements even further
  Which costs more, an additional disk drive or N additional service events?
Software or Hardware solution
  Both are highly accelerated and mathematically proven correct
C language solution for all major Operating Systems
  Linux, Windows, OS X and Solaris
  Kernel or User level
C and Verilog solution for FPGA/ASIC
Backwards compatible with older RAID6 codes
  Allows update in place of older codes to newer codes
Very simple interface: only 4 total functions required
  Solve, Generate, Regenerate, Update
Developed by world leading mathematicians and engineers over a period of 4 years
  Formal mathematical proofs of correctness
  Extensive test validation matrix
Up to 127 Data drives and 127 Parity drives
  Larger versions in development
Multi-Gigabyte/Second performance with near linear scaling using standard X86 cores
  Patent Pending technology tested at >25x performance of ZFS RAIDZ3 and >30x performance of Jerasure
Integration and support resources available
Both object and source code licensing available
  Per Unit, Per Site, Per Year or One Time Fee
Reduce your Time to Market
  Leverage 4 calendar years and >50 man years of development
  Offer an important reliability upgrade to your existing customers
  Be first to deliver best of breed data protection to your customers
Eliminate risk to your development engineering schedule
  License this fully tested solution with source code and documentation
  Mathematical proofs of correctness vetted by world class professors
  Experienced engineering talent ready to support integration
  Ongoing support, improvements and product updates
Arm your salespeople with the latest technology
  RAID5 and RAID6 are already being displaced
Contact us Today
  Sales: dmcdonell@streamscale.com
  Engineering: manderson@streamscale.com