OceanStor 9000 InfoProtector Technical White Paper

Issue: 01
Date: 2014-02-13

HUAWEI TECHNOLOGIES CO., LTD.
Copyright Huawei Technologies Co., Ltd. 2014. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China
Website: http://support.huawei.com/enterprise
Email: ChinaEnterprise_TAC@huawei.com

2014-6-24 Huawei Confidential Page i
Contents

1 Overview of NAS System Data Protection
  1.1 Existing NAS System Data Protection Technologies
  1.2 Data Protection Overview
2 InfoProtector Overview
  2.1 Working Principles
  2.2 InfoProtector Implementation
3 InfoProtector Highlights
4 Acronyms and Abbreviations
1 Overview of NAS System Data Protection

1.1 Existing NAS System Data Protection Technologies

Data volume is growing rapidly around the globe. According to an IDC survey, the global data volume generated by 2020 will be eight times larger than that generated in 2012. Data is critical to enterprises: losing critical data can be devastating and may even lead to bankruptcy. Enterprises therefore need highly reliable storage systems, especially for critical applications.

NAS systems are widely adopted by enterprises for data storage. At present, NAS systems support three data protection modes: RAID, replication, and RAID + replication.

RAID

Many NAS systems do not have their own data protection mechanisms and depend entirely on the RAID protection of the underlying SAN devices. However, traditional RAID technology has inherent defects, such as low disk utilization and limited fault tolerance (for example, RAID 5 tolerates the failure of only one disk, and RAID 6 and RAID-DP tolerate the failure of two disks). As disk capacity grows, RAID reconstruction takes longer and the probability of a further disk failure during reconstruction rises, which significantly increases the risk of data loss.

Replication

Some NAS systems protect data by replication, generating multiple copies of the data as it is written. Specifically, when data is written to a file system, one or more additional copies are written to other disks. If one copy is damaged, the other copies continue to provide services. The disadvantage of this method is low disk utilization: even if only one additional copy is kept, disk utilization is at most 50%.

RAID + Replication

Some NAS systems perform replication at the file system level and create RAID groups on the underlying SAN, providing multi-level data protection. This mode ensures solid data reliability and tolerates the simultaneous failure of multiple disks without data loss, but it leads to low disk utilization.
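The utilization figures behind these modes follow from simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes equally sized disks and ignores spares and metadata.

```python
def replication_utilization(total_copies: int) -> float:
    """Usable fraction of raw capacity when every block is stored total_copies times."""
    if total_copies < 1:
        raise ValueError("total_copies must be at least 1")
    return 1 / total_copies


def raid_utilization(disks: int, parity_disks: int) -> float:
    """Usable fraction of a RAID group that spends parity_disks worth of capacity on parity."""
    if not 0 <= parity_disks < disks:
        raise ValueError("parity_disks must be in the range [0, disks)")
    return (disks - parity_disks) / disks


if __name__ == "__main__":
    # One original plus one additional copy: half of the raw capacity is usable.
    print(f"2 copies:       {replication_utilization(2):.0%}")
    # Assumed group sizes, chosen only for illustration.
    print(f"RAID 5, 5 disks: {raid_utilization(5, 1):.0%}")
    print(f"RAID 6, 6 disks: {raid_utilization(6, 2):.0%}")
```

Layering replication on top of RAID multiplies the two fractions, which is why the combined mode has the lowest utilization of the three.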
1.2 Data Protection Overview

HUAWEI OceanStor 9000 is a fully symmetrical, distributed storage system designed for big data. With its Share-Nothing architecture, the OceanStor 9000 delivers an ultra-large single file system and extensive scale-out capability, providing storage and sharing for unstructured data resources. The OceanStor 9000 is applicable to a variety of big data application scenarios such as broadcasting & television, satellite mapping, DNA sequencing, energy exploration, and research & education.

The OceanStor 9000 uses an erasure code (EC) policy for data protection and the Reed-Solomon algorithm to compute parity data. Up to four disks can fail at the same time without affecting system operation. Users can configure N+M data redundancy modes (M ranges from 1 to 4) based on site requirements. Different configurations lead to different disk utilization. The OceanStor 9000 achieves up to 95% disk utilization.
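The relationship between an N+M configuration and disk utilization can be written as N / (N + M). The following sketch assumes equally sized data and parity blocks; the function name is hypothetical.

```python
def ec_utilization(n: int, m: int) -> float:
    """Usable fraction of raw capacity for an N+M erasure-coded stripe."""
    if n < 1 or m < 0:
        raise ValueError("n must be >= 1 and m must be >= 0")
    return n / (n + m)


if __name__ == "__main__":
    # A few N+M combinations; M is also the number of block losses a stripe tolerates.
    for n, m in [(4, 2), (8, 2), (12, 4), (16, 1)]:
        print(f"{n}+{m}: utilization {ec_utilization(n, m):.1%}, tolerates {m} lost blocks")
```

The widest stripe with the least redundancy, 16+1, gives 16/17, which is roughly 94% and close to the maximum utilization quoted above.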
2 InfoProtector Overview

2.1 Working Principles

Using virtualization technology, HUAWEI InfoProtector divides file data into strips of the same size and computes parity for the striped data before writing it to the system. File data and parity data are written to disks on different nodes. When a disk or a node fails, the system recovers the data using the parity data, avoiding data loss.

Administrators can configure different redundancy levels based on file importance. The highest redundancy level is +4: data remains accessible even when four disks holding blocks of the same stripe fail at the same time. InfoProtector supports N+1 to N+4 data protection policies, where N is the number of data strips in a stripe and M is the number of parity strips, as shown in the following figure:

Figure 2-1 OceanStor 9000 N+M data protection
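The striping and placement described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names, the zero padding of short strips, and the rotating placement rule are all assumptions made for the example.

```python
from typing import List


def split_into_strips(data: bytes, strip_size: int) -> List[bytes]:
    """Cut data into fixed-size strips, zero-padding the last strip (assumed padding rule)."""
    if strip_size <= 0:
        raise ValueError("strip_size must be positive")
    strips = [data[i:i + strip_size] for i in range(0, len(data), strip_size)]
    if strips and len(strips[-1]) < strip_size:
        strips[-1] = strips[-1].ljust(strip_size, b"\0")
    return strips


def build_stripes(strips: List[bytes], n: int) -> List[List[bytes]]:
    """Group strips into stripes of n data strips, padding the last stripe with zero strips."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not strips:
        return []
    zero_strip = bytes(len(strips[0]))
    padded = strips + [zero_strip] * (-len(strips) % n)
    return [padded[i:i + n] for i in range(0, len(padded), n)]


def place_blocks(stripe_index: int, n: int, m: int, nodes: List[str]) -> List[str]:
    """Assign the n+m blocks of one stripe to distinct nodes, rotating the start per stripe."""
    if len(nodes) < n + m:
        raise ValueError("plain N+M placement needs at least n+m nodes")
    return [nodes[(stripe_index + i) % len(nodes)] for i in range(n + m)]


if __name__ == "__main__":
    strips = split_into_strips(b"abcdefghij", strip_size=4)
    stripes = build_stripes(strips, n=2)
    print(stripes)
    print(place_blocks(0, n=2, m=1, nodes=["node1", "node2", "node3"]))
```

Because every block of a stripe lands on a different node, the loss of one node removes at most one block from each stripe.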
If the number of nodes is smaller than N+M, the N+M:B protection mode can be configured. In this mode, the failure of M disks or B nodes does not affect services. This mode is helpful when the number of nodes is small. The N+M:B data protection technology is depicted in the following figure:

Figure 2-2 OceanStor 9000 N+M:B data protection

Users only need to specify +M or +M:B for directories. The OceanStor 9000 automatically selects the most appropriate N (the maximum value of N is 16) based on the current number of nodes. Users can change the redundancy configuration based on site requirements. The new configuration is valid only for newly created directories.

The following table lists the N+M or N+M:B protection modes that correspond to different configurations and numbers of nodes. The percentage in brackets indicates the disk utilization of each configuration.

Table 2-1 OceanStor 9000 N+M

Nodes  +1             +2             +3             +4             +2:1           +3:1
3      2+1 (66%)      4+2:1 (66%)    4+3:1 (57%)    6+4:1 (60%)    4+2:1 (66%)    4+3:1 (57%)
4      3+1 (75%)      4+2:1 (66%)    4+3:1 (57%)    6+4:1 (60%)    6+2:1 (66%)    8+3:1 (72%)
5      4+1 (80%)      4+2:1 (66%)    4+3:1 (57%)    6+4:1 (60%)    8+2:1 (80%)    12+3:1 (80%)
6      4+1 (80%)      4+2 (66%)      4+3:1 (57%)    6+4:1 (60%)    12+2:1 (85%)   12+3:1 (80%)
7      4+1 (80%)      4+2 (66%)      4+3:1 (57%)    6+4:1 (60%)    12+2:1 (85%)
8      4+1 (80%)      4+2 (66%)      4+3:1 (57%)    6+4:1 (60%)    12+2:1 (85%)
9      4+1 (80%)      4+2 (66%)      4+3 (57%)      6+4:1 (60%)    16+2:1
10     6+1 (85%)      6+2 (75%)      6+3 (66%)      6+4 (60%)      16+2:1
11     6+1 (85%)      6+2 (75%)      6+3 (66%)      6+4 (60%)      16+2:1
12     8+1            8+2 (80%)      8+3 (72%)      8+4 (66%)      16+2:1
13     8+1            8+2 (80%)      8+3 (72%)      8+4 (66%)      16+2:1
14     8+1            8+2 (80%)      8+3 (72%)      8+4 (66%)      16+2:1
15     8+1            8+2 (80%)      8+3 (72%)      8+4 (66%)      16+2:1
16     12+1 (92%)     12+2 (85%)     12+3 (80%)     12+4 (75%)     16+2:1
17     12+1 (92%)     12+2 (85%)     12+3 (80%)     12+4 (75%)     16+2:1
18     12+1 (92%)     12+2 (85%)     12+3 (80%)     12+4 (75%)     16+2:1
19     12+1 (92%)     12+2 (85%)     12+3 (80%)     12+4 (75%)     16+2:1
20     16+1 (95%)     16+2           16+3           16+4 (80%)     16+2:1

2.2 InfoProtector Implementation

The OceanStor 9000 supports three types of disks: SSD, SAS, and SATA. Disks of the same type form a storage pool. Administrators can direct file data to different storage pools according to the access speed and capacity requirements of the files.

File data is divided into fixed-size strips. N data strips form a stripe, and the Reed-Solomon algorithm is applied to the stripe to compute M parity strips.
1. Data blocks and parity blocks are written to different disks. A write operation completes only when at least N of the N+M blocks have been written successfully. Blocks that were not written are rebuilt later by the data recovery process.
2. If a disk or node fails, the system triggers the data recovery process, uses EC to compute the blocks that need to be recovered, and writes them to other healthy disks.

The highest redundancy level (+4) stores four parity blocks for each stripe. This means that data is not lost even if four disks fail at the same time.
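The write rule and the recovery step above can be illustrated with a deliberately simplified example. The sketch below uses byte-wise XOR parity, which covers only the M = 1 case; it is not the Reed-Solomon code the OceanStor 9000 uses, which generalizes the same idea to M up to 4. All function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_completed(acknowledged_blocks: int, n: int) -> bool:
    """A stripe write counts as complete once at least n of its n+m blocks are on disk."""
    return acknowledged_blocks >= n


def recover_block(surviving_blocks: List[bytes]) -> bytes:
    """With single XOR parity, the one missing block is the XOR of all surviving blocks."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]   # N = 3 data blocks
    parity = xor_blocks(data)                         # M = 1 parity block

    # Three of the four blocks were acknowledged, so the write completes.
    print(write_completed(acknowledged_blocks=3, n=3))

    # Simulate losing data[1] and rebuild it from the other blocks plus parity.
    rebuilt = recover_block([data[0], data[2], parity])
    print(rebuilt == data[1])
```

XOR is used here only because it fits in a few lines. Tolerating more than one simultaneous loss requires a code such as Reed-Solomon, typically taken from an existing erasure coding library rather than written by hand.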
3 InfoProtector Highlights

High Efficiency

InfoProtector divides data into fixed-size strips. N data strips form a stripe, and the EC algorithm is applied to the stripe to compute M parity strips. The N+M blocks can be written to different DSs concurrently, increasing write speed. If a disk or node fails, the damaged data must be recovered. During recovery, the OceanStor 9000 divides the data on the failed disk into data objects and recovers these objects to different disks, so recovery runs concurrently. The reconstruction time is greatly shortened, and data is recovered at a speed of 1 TB/hour.

Robust Reliability

Using patented EC technology, InfoProtector ensures that no data is lost even if four disks are damaged at the same time. A variety of redundancy configurations are available to match data importance and performance requirements.

High Utilization

Based on the number of nodes and the number of parity strips (M), InfoProtector determines the stripe width N, ensuring high reliability while increasing system space utilization. InfoProtector provides up to 95% space utilization (see Section 1.2 and Table 2-1), far higher than systems based on other data protection policies such as RAID or replication.
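Whether a given stripe width is safe on a small cluster, as in the N+M:B modes of Section 2.1, can be checked with a short calculation. The sketch below assumes that the N+M blocks of a stripe are spread as evenly as possible across the nodes; that assumption and the function names are illustrative and are not taken from the product.

```python
def worst_case_blocks_lost(n: int, m: int, nodes: int, failed_nodes: int) -> int:
    """Blocks of one stripe lost when the fullest failed_nodes nodes fail (even spread assumed)."""
    if nodes < 1 or not 0 <= failed_nodes <= nodes:
        raise ValueError("failed_nodes must be between 0 and nodes")
    base, extra = divmod(n + m, nodes)
    # 'extra' nodes hold base + 1 blocks each, and the remaining nodes hold base blocks.
    heavy = min(failed_nodes, extra)
    return heavy * (base + 1) + (failed_nodes - heavy) * base


def tolerates_node_failures(n: int, m: int, nodes: int, b: int) -> bool:
    """True if losing any b nodes removes at most m blocks from a stripe."""
    return worst_case_blocks_lost(n, m, nodes, b) <= m


if __name__ == "__main__":
    # 4+2:1 on three nodes: six blocks, two per node, so one node failure costs two blocks.
    print(tolerates_node_failures(4, 2, nodes=3, b=1))
    # The same layout cannot survive two simultaneous node failures.
    print(tolerates_node_failures(4, 2, nodes=3, b=2))
```

A wider stripe raises utilization but places more blocks on each node, so the widest N that still passes this check is the natural choice for a given node count.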
4 Acronyms and Abbreviations

NAS    Network Attached Storage
SAN    Storage Area Network
RAID   Redundant Array of Independent Disks
SSD    Solid-State Disk
SAS    Serial Attached SCSI
SATA   Serial Advanced Technology Attachment
EC     Erasure Code