Doc. Code OceanStor VTL6900 Technical White Paper Issue 1.1 Date 2012-07-30 Huawei Technologies Co., Ltd.
2012. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

[Logo] and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China
Website: http://www.huawei.com
Email: support@huawei.com
Tel: 0755-28560000 4008302118
Fax: 0755-28560111
Contents

1 Executive Summary
2 Introduction
3 Solution
3.1 Deduplication
3.1.1 Introduction to Deduplication
3.1.2 Deduplication Principle of the VTL6900
3.2 High Availability Cluster
3.2.1 Introduction to the High Availability Cluster
3.2.2 High Availability of the VTL6900
3.3 IP Replication
3.4 Tape Caching
3.4.1 Data Migration Policies
3.4.2 Space Reclamation Policy
3.5 Tape Encryption
3.6 Energy Saving and Consumption Reduction
3.6.1 Energy Saving by Deduplication
3.6.2 Energy Saving by Disk Spin-Down
4 Experience
4.1 Typical Application and Benefit to Customers
4.1.1 VTL Backup System
5 Conclusion
6 Acronyms and Abbreviations
1 Executive Summary

This document describes the problems that mid-range and high-end customers face when they select virtual tape library (VTL) products. By analyzing the key technical features, application scenarios, solutions, and customer value of the VTL6900 virtual tape library (the VTL6900 for short), this document concludes that the VTL6900 can properly solve the problems facing high-end, mid-range, and low-end customers.
2 Introduction

With the explosive growth of data, the backup and recovery speeds of traditional tape backup systems can no longer meet the backup requirements of customers. As disk technologies develop rapidly, the capacity of disk media has increased greatly, and the price per unit capacity of disk storage devices has decreased significantly. Backup systems that use Serial Advanced Technology Attachment (SATA) disks are widely deployed in customer IT environments and are favored for their high backup and recovery performance. VTL products are representative of these backup systems. They inherit the advantages of disk devices, such as high performance, easy maintenance, and mature media management. Therefore, they have developed well since their launch and have bright market prospects.

For mid-range and high-end customers, the amount of data involved in a single full backup may reach tens of terabytes. Provided that the backup window is eight hours, the backup system must provide a backup rate of at least 1000 Mbit/s. Provided that incremental backup is performed once a day, full backup is performed once a week, and the storage period is three months, the storage capacity of backup devices must reach hundreds of terabytes. The construction, operation, and management of such a large-scale disk backup system require large investments in storage, power consumption, and management. In addition, mid-range and high-end customers have high requirements for the availability of both production systems and backup systems. Common single-engine VTL products cannot meet these high availability requirements.

To sum up, the problems that mid-range and high-end customers face when they select VTL products include low performance, insufficient capacity, high power consumption, and low availability of backup systems.
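The sizing reasoning above can be checked with a short calculation. The following is a minimal sketch, not a sizing tool: the function names, the 10 TB full backup, the 5% daily change rate, and the 13-week retention are illustrative assumptions, and decimal units (1 TB = 10^12 bytes) are assumed throughout.

```python
def required_rate_mbit_s(full_backup_tb: float, window_hours: float) -> float:
    """Sustained rate (Mbit/s) needed to finish a full backup inside the window."""
    bits = full_backup_tb * 1e12 * 8           # decimal terabytes to bits
    return bits / (window_hours * 3600) / 1e6  # bits per second to Mbit/s


def retained_capacity_tb(full_backup_tb: float, incremental_ratio: float,
                         weeks: int, incrementals_per_week: int = 6) -> float:
    """Capacity (TB) held when one full backup per week and daily incrementals
    are kept for the whole retention period, without deduplication."""
    fulls = weeks * full_backup_tb
    incrementals = weeks * incrementals_per_week * full_backup_tb * incremental_ratio
    return fulls + incrementals


if __name__ == "__main__":
    # Hypothetical customer: 10 TB full backup, 8-hour window,
    # 5% daily change, about three months (13 weeks) of retention.
    print(f"rate: {required_rate_mbit_s(10, 8):.0f} Mbit/s")
    print(f"capacity: {retained_capacity_tb(10, 0.05, 13):.0f} TB")
```

Under these assumptions, even the low end of "tens of terabytes" already needs a multi-gigabit sustained rate and well over one hundred terabytes of retained capacity, which is consistent with the requirements stated above.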
3 Solution

The VTL6900 is oriented at mid-range and high-end customers. It uses deduplication, high availability clusters, and disk spin-down to solve the problems facing these customers. This chapter describes the following features of the VTL6900: deduplication, high availability clusters, IP replication, tape caching, and energy saving and consumption reduction.

3.1 Deduplication

3.1.1 Introduction to Deduplication

Deduplication eliminates repetitive data through hardware or software to reduce the occupied storage space. In a backup system, data is transferred from a backup client (source end) to a backup device (destination end) under the control of the backup server.

Based on the location where deduplication is performed, deduplication is divided into deduplication at the source end and deduplication at the destination end. In deduplication at the source end, a backup client processes data before transferring it to a backup device, so that repetitive data is eliminated and only unique data is sent to the backup device. In deduplication at the destination end, a backup client transfers data to a backup device without processing it. After receiving the data, the backup device performs deduplication.

Based on the time when deduplication is performed, deduplication at the destination end is divided into in-line deduplication and post processing deduplication. In in-line deduplication, a backup device performs deduplication while receiving backup data. The deduplication is complete when the backup data reception is complete. In post processing deduplication, a backup device performs deduplication after the backup is complete. That is, the backup device receives all backup data and then deduplicates the received data at a specified time point.
With the common purpose of eliminating repetitive data, all types of deduplication have to compare new data with existing data to determine whether the new data is repetitive. Repetitive data can be identified using two methods: index-based comparison and content-based comparison. The former identifies repetitive data by comparing data indexes, while the latter identifies repetitive data by directly comparing the data itself. In index-based comparison, the system divides data into fixed-length or variable-length data blocks. A unique value is calculated for each data block based on the specified algorithm, and this value is the index of the corresponding data block. The index storage space is much smaller than the data block storage space. Therefore, index-based comparison can be performed directly in memory, which makes it more efficient than content-based comparison. At present, index-based comparison is widely applied in deduplication technologies.

3.1.2 Deduplication Principle of the VTL6900

The VTL6900 supports both post processing deduplication and in-line deduplication. In this section, SIR stands for single instance repository.

Post Processing Deduplication

Index-based comparison is applied to the post processing deduplication supported by the VTL6900.

Figure 3-1 Principle of post processing deduplication

In post processing deduplication, the VTL6900 software consists of two modules: the VTL module and the SIR module. The storage space of the VTL6900 is logically divided into two parts, which are occupied by the VTL module and the SIR module respectively. The VTL storage space is also called the cache, and the SIR storage space is also called the repository.

After receiving the backup data, the VTL6900 stores it in the VTL storage space. The SIR module obtains the backup data at a specified time point, when the backup is complete, or when the storage level reaches the specified value, and compares the backup data with the existing data blocks in the SIR storage space. The SIR module divides the original backup data into kilobyte-size data blocks and calculates a hash value (also called an index) for each data block based on the secure hash algorithm 1 (SHA-1). By comparing the hash values of new data blocks with those of existing data blocks, the SIR module identifies repetitive data blocks. Repetitive data blocks are discarded, and only pointers to the existing data blocks are retained. Unique new data blocks are stored in the SIR storage space. See Figure 3-1.

On the VTL6900, the physical machine that runs VTL software module applications and the VTL software module applications constitute the VTL engine.
The physical machine that runs SIR software module applications and the SIR software module applications constitute the SIR engine. The VTL engine must be configured to provide external VTL services. The SIR engine is optional and is used to provide the deduplication function. The VTL engine and SIR engine can be integrated into one engine or deployed independently. When independent SIR engines are configured, the VTL6900 supports an SIR cluster of up to three SIR engines that work in 2+1 redundancy mode.
Figure 3-2 Original virtual tape data on the VTL6900

Before the SIR engine performs deduplication, backup data is stored on virtual tapes in the VTL storage space, as shown in Figure 3-2. After the deduplication is complete, the data stored on the virtual tapes is replaced with pointers, and the virtual tapes are called virtual index tapes (VITs). Each pointer points to a single-instance data block in the SIR storage space. The released VTL storage space is used to store new backup data. See Figure 3-3.

Figure 3-3 Data distribution on the VTL6900 after the deduplication

Figure 3-4 Usage of VTL6900 storage space in post processing deduplication

As described above, in post processing deduplication the storage space of the VTL6900 is logically divided into the VTL storage space and the SIR storage space, that is, the cache and the repository. The SIR storage space is further divided into the SIR data disk and the SIR index disk. See Figure 3-4.

The SIR data disk stores the unique data blocks that remain after deduplication. New backup data stored in the cache is compared with the data blocks on the SIR data disk to identify repetitive data. The SIR index disk stores the indexes (namely, SHA-1 hash values) of all data blocks on the SIR data disk. The capacity of the SIR index disk increases with the capacity of the SIR data disk. The SIR module reads all indexes (the index table) from the SIR index disk into the memory of the SIR engine, so that the index table can be searched quickly. The required memory capacity of the SIR engine increases with the capacity of the SIR index disk.

In-line Deduplication

Index-based comparison is also applied to the in-line deduplication supported by the VTL6900. Figure 3-5 shows the in-line deduplication process.

Figure 3-5 Principle of in-line deduplication

In in-line deduplication, the VTL6900 software consists of two modules: the VTL module and the SIR module. When the VTL6900 receives backup data, the In-line Parser divides the original backup data into kilobyte-size data blocks and calculates a hash value (also called an index) for each data block by using the SHA-1 algorithm. Meanwhile, the SIR module compares the hash values of new data blocks with those of existing data blocks to identify repetitive data blocks. Repetitive data blocks are discarded, and only pointers to the existing data blocks are retained. Unique new data blocks are stored in the SIR storage space.
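The index-based comparison described in this section can be illustrated with a minimal sketch. It is not the VTL6900 implementation: the class name, the fixed 8 KB block size (the text says only "kilobyte-size"), and the use of one in-memory dictionary in place of the separate SIR index disk and SIR data disk are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # assumed block size; the source does not state the exact value


class SingleInstanceRepository:
    """Toy repository: maps each SHA-1 index to one stored copy of a block."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}  # SHA-1 digest -> unique data block

    def store(self, data: bytes) -> list[bytes]:
        """Deduplicate data and return the pointer list that replaces it
        (the role played by a virtual index tape)."""
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            index = hashlib.sha1(block).digest()  # 20-byte index of the block
            if index not in self.blocks:          # unique block: keep one copy
                self.blocks[index] = block
            pointers.append(index)                # repetitive block: pointer only
        return pointers

    def restore(self, pointers: list[bytes]) -> bytes:
        """Rebuild the original data by following the pointers."""
        return b"".join(self.blocks[p] for p in pointers)


if __name__ == "__main__":
    repo = SingleInstanceRepository()
    backup = b"A" * (3 * BLOCK_SIZE) + b"B" * BLOCK_SIZE
    tape = repo.store(backup)
    print(len(tape), "pointers,", len(repo.blocks), "unique blocks stored")
    assert repo.restore(tape) == backup
```

Because each index is only 20 bytes while each block is several kilobytes, the lookup table stays small enough to search in memory, which is the efficiency argument made for index-based comparison.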
Figure 3-6 Usage of VTL6900 storage space in in-line deduplication

If errors occur during a backup with in-line deduplication, most backup software writes the backup data to a new tape. If errors occur at the beginning of a backup with in-line deduplication, post processing deduplication is triggered instead.

3.2 High Availability Cluster

3.2.1 Introduction to the High Availability Cluster

If a node in a high availability cluster becomes faulty and cannot work properly, another node in the cluster takes over the work of the faulty node. A high availability cluster consists of an active node and standby nodes. The active node is the node that is executing tasks. A standby node is a backup of the active node. When the active node becomes faulty, a standby node takes over its work.

A high availability cluster is implemented based on resource switchover. The resource is the collection of information about the work that a standby node takes over when the active node becomes faulty. After taking over the resource of the faulty node, the standby node operates properly, which minimizes the impact on the client.

The monitoring and takeover of the resource are implemented by high availability software. Many operating systems include high availability software, and most vendors also develop their own. High availability software enables standby nodes to monitor the status of the active node. Once a fault is detected, a standby node forcibly takes over the work of the active node and continues to provide services.
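The monitor-and-take-over behavior of high availability software can be sketched as follows. This is a simplified illustration under stated assumptions, not the logic of any particular product: the class name, the heartbeat mechanism with a fixed timeout, and the explicit timestamps passed in by the caller are all hypothetical.

```python
class StandbyMonitor:
    """Standby node that takes over when the active node stops sending heartbeats."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s    # assumed fault-detection threshold
        self.last_heartbeat = None    # time of the latest heartbeat, if any
        self.taken_over = False       # True once the standby runs the workload

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the active node."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Return True if the standby node is (now) serving the workload."""
        if self.taken_over:
            return True
        silent = self.last_heartbeat is not None and \
            now - self.last_heartbeat > self.timeout_s
        if silent:
            self.taken_over = True    # fault detected: take over the resource
        return self.taken_over


if __name__ == "__main__":
    monitor = StandbyMonitor(timeout_s=5.0)
    monitor.on_heartbeat(now=0.0)
    print(monitor.check(now=3.0))   # active node still considered healthy
    print(monitor.check(now=6.0))   # heartbeat missed for longer than the timeout
```

Passing timestamps explicitly keeps the sketch deterministic; a real monitor would read a clock and would also need a failback path for when the faulty node recovers.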
Note: Section 3.2.1 is prepared by referring to the Storage Overview.

3.2.2 High Availability of the VTL6900

Both the VTL and SIR software modules of the VTL6900 support the high availability feature. That is, the high availability design is applied to both the VTL engine and the SIR engine.

High Availability of the VTL Engine

The VTL6900 supports two VTL engines, which can work in bidirectional active-standby mode (bidirectional failover).

Figure 3-7 High availability configuration of the VTL engines

As shown in Figure 3-7, when VTL A and VTL B operate properly, both of them provide external VTL services. In unidirectional failover mode, provided that VTL A is the active node and VTL B is the standby node, VTL B monitors the status of VTL A and takes over the work of VTL A if VTL A becomes faulty. However, if VTL B becomes faulty, VTL A does not take over the work of VTL B. In bidirectional failover mode, VTL A and VTL B monitor each other. If either of them becomes faulty, the other takes over its work and provides external VTL services.

As shown in Figure 3-7, VTL A consists of the VTL A software application and the VTL A physical machine, and VTL B consists of the VTL B software application and the VTL B physical machine. If the work of VTL A is taken over by VTL B, the VTL A software application, which ran on the VTL A physical machine before the takeover, runs on the VTL B physical machine after the takeover. After the takeover, the VTL A software application accesses its original VTL storage space through the shared storage and continues to provide VTL services for its original backup server through the standby host port.

The shared storage functions as follows: Both the VTL A physical machine and the VTL B physical machine are connected to the storage unit of the VTL6900 through physical channels. When running on the VTL B physical machine, the VTL A software application can still access its original storage space by using the physical channel of the VTL B physical machine.

The standby host port functions as follows (taking Fibre Channel [FC] as an example): VTL A provides external VTL services by using FC port Target wwpn1. Correspondingly, VTL B provides FC port Standby wwpn3 as a standby port for Target wwpn1. In actual configurations, both FC ports Target wwpn1 and Standby wwpn3 are connected to the backup server of VTL A through FC channels. When running on the VTL B physical machine, the VTL A software application can still provide VTL services for its original backup server by using port Standby wwpn3.

In bidirectional failover mode, the VTL A and VTL B software applications monitor the status of each other through the heartbeat network. If either of them cannot provide external VTL services properly due to a software fault, a hardware fault, or a fault on the channel connected to the storage unit, failover is triggered automatically and the normal node takes over the work of the faulty node. The takeover process takes about four minutes. After the faulty node recovers, failback is triggered automatically (or manually), and the takeover terminates.

High Availability of the SIR Engine

The VTL6900 supports three SIR engines. Two of the SIR engines function as active nodes, and the remaining one functions as the standby node.
Figure 3-8 High availability configuration of the SIR engines

As shown in Figure 3-8, the high availability configuration of SIR engines differs from that of VTL engines in that a dedicated standby SIR engine is configured. The SIR software application on the standby SIR engine monitors the status of the active SIR engines through the heartbeat network. When SIR 1 (or SIR 2) becomes faulty, for example, when it cannot provide deduplication services due to a software fault, a hardware fault, or a fault on the physical channel connected to the SIR storage space, failover is triggered automatically. Taking a fault on SIR 1 as an example, the standby SIR engine powers off SIR 1 through an Intelligent Platform Management Interface (IPMI) instruction and takes over the work of SIR 1. The SIR 1 software application then runs on the standby SIR physical machine. After SIR 1 recovers and is powered on, it automatically functions as the standby SIR engine.

In the high availability configuration of SIR engines, all SIR engines can access the SIR storage space through physical channels. Each SIR engine can access its own storage unit and the storage units of the other SIR engines. This ensures that every SIR software application can access its own storage unit regardless of which SIR physical machine it runs on. In addition, all SIR engines are interconnected with the VTL engines through physical channels. Therefore, SIR engines can read original backup data from the VTL storage space for deduplication, and VTL engines can read data from the SIR storage space for recovery. Generally, the physical connections of all SIR engines are the same, which allows free switchover between the active and standby nodes.

Moreover, the SIR software modules also provide a high performance feature. In high availability configurations, the three SIR engines of the VTL6900 work in 2+1 active-standby mode. During actual operation, the two active SIR engines perform deduplication together. They constitute a high performance cluster, which improves deduplication performance and efficiency.

The VTL6900 also supports configurations with one SIR engine or two SIR engines. In these cases, the SIR engines do not provide the high availability feature. When two SIR engines are configured, they constitute a high performance cluster.

3.3 IP Replication

Replication is a common technology used for disaster recovery. Data replication refers to copying data from one medium onto another medium to generate a data copy by using data replication software. Traditional disaster recovery generally relies on physical transportation: the backup software copies data onto tapes in a physical tape library, and the tapes are transported to a remote site for preservation.
During transportation, tapes may be lost or damaged, so the effect of disaster recovery cannot be ensured.

With IP replication, the local VTL6900 copies data on virtual tapes to the remote VTL6900 over an IP network. In this way, the VTL6900 uses the convenience and high speed of the network and saves the transportation cost. The local VTL6900 encrypts the tape data before transferring it, and the remote VTL6900 decrypts the data after receiving it. As a result, data security during transfer is ensured.

The VTL6900 provides four options for IP replication: Remote Copy, Auto Replication, IP Replication, and Replication upon De-duplication. Among the four options, three support automatic replication and one supports manual replication. Table 3-1 lists the four options of IP replication.
Table 3-1 Four options of IP replication

Option | Type | Description
Auto Replication | Automatic | When a virtual tape is exported from the VTL, the system automatically copies the data on the virtual tape to another VTL6900.
Remote Copy | Manual | The data on a virtual tape is copied to another VTL as required.
IP Replication | Automatic | Within the specified interval and according to the user-defined policy, the changed data on the primary virtual tape is copied to the same or another VTL.
Replication upon De-duplication | Automatic | When the de-duplication function is enabled, the deletion policy is integrated with the replication policy. The changed data is copied to another VTL6900 according to the replication policy.

These four options differ mainly in the replication triggering mechanism.

Auto Replication is triggered by the backup software. If the VTL is set to Auto Replication, the replication of a virtual tape is triggered when the VTL receives the eject command from the backup software. (For a physical tape library, the eject command ejects the tape from the library. For a virtual tape library, the command moves the virtual tape into the virtual vault.)

Remote Copy is triggered manually. The user copies the data on the selected virtual tape to the VTL6900 in the disaster recovery center. The VTL6900 in the disaster recovery center allocates space equal to that of the source tape to the target tape and sets the same barcode. When the copy is complete, the system automatically promotes the target tape to the virtual vault of the remote VTL6900 for future use. Through the Remote Copy function, a whole virtual tape can be copied to the remote VTL6900 without creating a new virtual tape on the remote VTL6900 in advance. Before the copy, no virtual tape in the remote VTL6900 may have the same name as a virtual tape in the local VTL6900.

IP Replication is triggered based on a policy. The policy can be either of the following:
Data increment-based replication. The VTL6900 can identify the amount of data backed up to the tape each time. If the data increment exceeds the preset threshold, the replication is automatically triggered after the copy.
Time point-based replication. The user can specify the time point of the first replication and the replication interval for each virtual tape. The data on the virtual tape is then copied according to the specified schedule.
A remote virtual tape that is created through IP Replication must be promoted manually before use.

Replication upon De-duplication is triggered automatically based on a policy. The triggering condition can be a specific date or time point, or the completion of the backup operation. The local VTL6900 transfers the deduplicated data to the remote VTL6900 over an IP network. Because only the data blocks that remain after de-duplication are transferred, rather than the full backup data, the bandwidth occupation decreases and the transfer efficiency increases. As a result, remote data-level disaster recovery can be implemented with low cost, easy deployment, and high efficiency.

Remote IP replication applies to the following scenarios:

One VTL6900 copies data to the remote VTL6900.

Figure 3-9 Networking of one-to-one remote disaster recovery

Multiple VTL6900s copy data to the remote VTL6900.
Figure 3-10 Networking of many-to-one remote disaster recovery

3.4 Tape Caching

Tape Caching is an advanced function of the VTL6900. This function uses the VTL6900 as a high-speed cache for the physical tape library. The backup data is written to the VTL6900 first. After the backup operation is complete, the VTL6900 migrates the backup data to the physical tape library according to the preset policy. In this way, a hierarchical storage architecture is formed.

The VTL6900 can shorten the backup window and recover data quickly, while physical tape libraries are suitable for large-capacity offline data. Therefore, the VTL6900 can be combined with physical tape libraries to implement hierarchical storage. The principles of hierarchical storage include:
The data that needs to be archived for a long time is stored on the physical tape libraries.
The frequently used data is stored in the VTL.
The VTL takes over the physical tape libraries.

Physical tape libraries back up data slowly, and disks are ill-suited to storing seldom-accessed data for a long time. Hierarchical storage avoids the shortcomings of both. Figure 3-11 shows the networking of the hierarchical storage.
Figure 3-11 Networking of the hierarchical storage

Data can be recovered directly from the VTL or from the physical tape library. To fully utilize the high-speed cache, the VTL6900 provides various migration triggering policies and space reclamation policies.

3.4.1 Data Migration Policies

Tape Caching provides two policies for triggering data migration between the VTL6900 and the physical tape library: 1) time-based migration; 2) intelligent migration. Table 3-2 and Table 3-3 list the two policies.

Table 3-2 Time-based migration policy

Policy Name | Description
Certain time point each day | Migration is performed in a one-day cycle. The VTL6900 starts data migration at the specified time point each day.
Certain time point each week | Migration is performed in a one-week cycle. The VTL6900 starts data migration at the specified time point each day from Monday to Saturday.

Table 3-3 Intelligent migration policy

Policy Name | Description
And/Or | Conjunction/disjunction of the intelligent policy. And means that migration is triggered only when all conditions are met. Or means that migration is triggered when any condition is met.
Data period storage | Migration is triggered when the backup data has been stored on the VTL6900 for a specified period.
Watermark | Migration is triggered when the usage of the disk space of the VTL6900 reaches 90%.
After backup (tape space used out) | Migration is triggered after each backup. "Tape space used out" is an additional policy for "after backup". If both options are chosen, the VTL6900 checks the usage of a virtual tape when the tape is ejected from the tape drive. If the space of the tape is used up, migration is triggered.
Postponed to a certain time point | Migration is postponed to a specific time point after the condition is met. This policy must be used together with the preceding three policies. When the condition of any preceding policy is met, migration can be postponed to a specific time point.

The time-based migration policy and the intelligent migration policy cannot be used simultaneously.
For the time-based migration policy, "Certain time point each day" and "Certain time point each week" cannot be used at the same time. The user can select only one of them as the condition for triggering migration.
Multiple options of the intelligent policy can be chosen simultaneously. The options can be combined to meet different migration requirements.

3.4.2 Space Reclamation Policy

To fully utilize the cache, the VTL6900 provides two space reclamation policies to ensure space utilization: 1) intelligent reclamation; 2) reclamation upon de-duplication. Table 3-4 lists the reclamation methods.

Table 3-4 Reclamation methods

Policy Name | Description
Intelligent reclamation | The space occupied by the virtual tapes of the VTL6900 used as the cache is reclaimed. That is, the data on these virtual tapes is deleted and only the indexes to the physical tapes are retained.
Reclamation upon de-duplication | Through the de-duplication algorithm, the duplicate data is deleted to release the storage space of the VTL6900.

Table 3-5 lists the methods of triggering space reclamation.
Table 3-5 Methods of triggering space reclamation

Policy Name | Description
Immediate reclamation | After the migration is complete, the space originally occupied by the migrated data is reclaimed.
Watermark | When the remaining disk space accounts for less than 10% of the total space, the space originally occupied by the migrated data is reclaimed. This trigger method is available only under intelligent reclamation.

Users do not need to worry about data loss. The VTL6900 reclaims only the space originally occupied by the migrated data. The space occupied by other data is not reclaimed. Thus, data security and consistency are ensured.

3.5 Tape Encryption

To ensure the security of the data stored on tapes, the VTL6900 encrypts tapes when data is transferred to physical tape libraries.

Figure 3-12 Tape encryption

The tape encryption function of the VTL6900 uses the 256-bit Advanced Encryption Standard (AES) encryption algorithm. The user can create one or more tape keys to encrypt the data exported to physical tapes and to decrypt the data imported to virtual tapes. The data on the tapes is inaccessible unless the correct key is used to decrypt it. Moreover, the user can set a password for each key. Only when the correct password is provided can the key name, password, and password hint be changed, and can the key be deleted or exported.

When data is being exported to a physical tape library or transferred through IP replication, the user can use a created key to encrypt the data, thus ensuring the security of the tape data. Even if tapes are lost or stolen, or data packets are intercepted during transfer, the user does not need to worry about data security. Without the correct key, the data on the tapes is totally inaccessible.
3.6 Energy Saving and Consumption Reduction

According to statistics published by the International Energy Agency (IEA) in 2008, global energy consumption increased by 73% from 1973 to 2006. Since the 1970s, the price of energy has been increasing, putting cost pressure on industry and manufacturing. Therefore, society as a whole attaches more importance to energy saving.

In this context, users pay more attention to the energy saving effect of storage products. At present, many organizations and enterprises consider energy saving performance an indispensable factor in selecting storage products. To a certain extent, energy saving performance determines the success or failure of a product.

3.6.1 Energy Saving by Deduplication

Deduplication significantly reduces the storage investment, management costs, and power consumption of storage systems. With deduplication, users can store more data in the same storage capacity. Equivalently, the same amount of data requires less storage capacity. At the same storage density, a smaller required capacity means that fewer storage devices are needed, which reduces power consumption.

3.6.2 Energy Saving by Disk Spin-Down

In addition to deduplication, disk spin-down can also reduce the power consumption of storage systems. The operating principle of disk spin-down is as follows: Disks that are not accessed for a long time enter the spin-down or even power-off state, to save energy and extend the service life of the disks. On storage devices that use the disk spin-down technology, disks without read/write operations are in the spin-down state, and disks with read/write operations are in the running state. If read/write operations are performed on disks in the spin-down state, the disks are spun up and enter the running state.
After the read/write operations are complete, disks in the running state enter different levels of spin-down states as required. Therefore, the disk spin-down technology suits large-capacity near-line storage devices and tiered storage devices whose data is accessed infrequently and does not need to be instantly available. It especially suits data backup and archiving devices oriented at data recovery, for example, VTL devices. On a VTL device that supports disk spin-down, in the extreme case all disks are in the running state only during the daily backup window and are in the spin-down state for the rest of the day.

The VTL6900 supports disk spin-down as follows: Disks in the storage units of the VTL6900 are divided into several redundant array of independent disks (RAID) groups. Users set a disk spin-down policy, for example, scheduled spin-down or spin-down in a specified time period, for each RAID group as required. If no read/write operation is performed on a disk in a RAID group configured with disk spin-down, the disk is spun down or powered off based on the policy. When the backup software (for example, NetBackup) sends a read/write request to a logical unit number (LUN) in a spun-down RAID group, the powered-off disks in the RAID group are automatically powered on, and the RAID group allows normal read/write access.

By using the disk spin-down technology, the VTL6900 reduces the power consumption of the storage units by 40%. The storage units account for 40% of the overall power consumption of the VTL6900. Therefore, the disk spin-down technology helps the VTL6900 reduce its overall power consumption by 16%.
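The 16% figure follows from multiplying the two percentages quoted in the text: the share of total power drawn by the storage units and the saving achieved within those units. A minimal sketch of that arithmetic, with a hypothetical function name and parameter names chosen for illustration:

```python
def overall_power_saving(storage_share, storage_saving):
    """Fraction of total system power saved when only the storage units save power."""
    return storage_share * storage_saving


# Both inputs are the figures quoted in the paper: storage units draw 40% of
# total power, and spin-down cuts their consumption by 40%.
saving = overall_power_saving(storage_share=0.40, storage_saving=0.40)
print(f"Overall power reduction: {saving:.0%}")  # prints "Overall power reduction: 16%"
```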
4 Experience

The VTL6900 supports clustered VTL and SIR engines, providing a backup performance of 8600 MB/s and a raw storage capacity of 2304 TB. In addition, the VTL6900 provides deduplication and disk spin-down, meeting the requirements of mid-range and high-end users for high performance, large capacity, energy saving and consumption reduction, and high availability.

This chapter describes the VTL6900 solution and benefit to customers in two typical application scenarios: VTL backup system and remote backup system. For details about the solution and benefit to customers in other application scenarios such as tiered backup and remote disaster recovery, see the Technical White Paper for the VTL6900.

4.1 Typical Application and Benefit to Customers

4.1.1 VTL Backup System

Application Scenario

The VTL backup system applies to the following scenarios:
- A backup system needs to be built: No backup system is available or the existing backup system needs to be improved, and therefore the customer needs to select new backup devices.
- The VTL6900 replaces the existing physical tape library in the backup system of a customer: The customer has built a backup system using a physical tape library. However, the physical tape library needs to be replaced with a new backup device for superior performance, reliability, and management.

Solution

The VTL6900 is used as a backup device.
The VTL6900 is connected to the backup server through a Fibre Channel (FC) storage area network (SAN) or an IP SAN.

Benefit to Customers

The VTL backup system provides the following benefits for customers:
- High performance, meeting customers' requirements for the backup window: When only one VTL engine is configured, the VTL6900 provides a backup performance of 9 TB/hr and can back up 63 TB of data within eight hours. When two engines are configured, the VTL6900 provides a backup performance of 31 TB/hr and can back up 239 TB of data within eight hours. The VTL6900 meets the backup window requirements of customers whose data amount in one backup does not exceed 239 TB.
- Large capacity, meeting customers' requirements for the storage capacity: When two VTL engines are configured, the VTL6900 provides a maximum raw capacity of 2304 TB, with an available capacity of 1690 TB. When the deduplication function is configured, the VTL6900 provides a maximum capacity of 220 TB for storing deduplicated data. At a deduplication ratio of 20:1, the VTL6900 can store 4 PB of backup data. This meets the backup capacity requirements of mid-range and high-end customers.
- Deduplication, reducing power consumption and investment in storage: The VTL6900 supports deduplication and disk spin-down. This significantly reduces the disk storage capacity required by the backup system and therefore reduces the power consumption and investment in storage.
- High availability clustering configuration, meeting high availability requirements of customers: When two VTL engines are configured for the VTL6900, the two VTL engines normally work independently. When one VTL engine becomes faulty and cannot provide backup services, the other VTL engine takes over the work of the faulty VTL engine. The takeover process takes four minutes. The VTL6900 operates properly after the takeover. This meets customers' high availability requirements for the backup system.
When three SIR engines are configured for the VTL6900, the SIR engines work in 2+1 active-standby mode. When one of the active SIR engines becomes faulty and cannot provide deduplication services, the standby SIR engine takes over the work of the faulty SIR engine. The takeover process requires four minutes. The VTL6900 operates properly after the takeover. This meets customers' high availability requirements for the backup system.
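The backup window and capacity figures in the benefits above can be sanity-checked with simple arithmetic. The sketch below uses hypothetical helper names and assumes decimal units (1 PB = 1000 TB). It computes the nominal upper bounds implied by the quoted rates; the paper's own eight-hour figures (63 TB and 239 TB) are somewhat lower than these products, and the paper does not state how they were derived.

```python
def data_in_window_tb(rate_tb_per_hr, window_hr):
    """Nominal amount of data (TB) that fits in a backup window at a constant rate."""
    return rate_tb_per_hr * window_hr


def logical_capacity_tb(dedup_store_tb, dedup_ratio):
    """Backup data (TB) representable by a deduplicated store at a given ratio."""
    return dedup_store_tb * dedup_ratio


one_engine = data_in_window_tb(9, 8)     # 72 TB nominal; the paper quotes 63 TB
two_engines = data_in_window_tb(31, 8)   # 248 TB nominal; the paper quotes 239 TB
capacity = logical_capacity_tb(220, 20)  # 4400 TB, which the paper rounds to 4 PB

print(one_engine, two_engines, capacity / 1000)
```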
Remote Backup System

Application Scenario

The remote backup system applies to the following scenarios:
- Network bandwidth resources of a customer are insufficient.
- Besides the data stored at the data center, data on multiple remote branch nodes of a customer needs to be backed up.
- No backup system is available for a customer. Alternatively, the customer has a backup system, but the bandwidth available between the branch nodes and the data center is insufficient. As a result, the branch nodes provide low data backup performance and cannot meet the customer's requirements for the backup window.

Solution

Figure 4-1 shows the networking for the remote backup system.

Figure 4-1 Remote backup system

An integrated or All In One VTL6900 device that supports deduplication is deployed on each branch node. The VTL6900 that supports deduplication is configured in high availability mode at the data center. For details, see the VTL6900 Product Description. Data on the branch nodes is backed up to the local integrated or All In One VTL6900 device and then replicated to the VTL6900 at the data center by using the deduplication-based remote replication function. Data at the data center is backed up to the local VTL6900.

Benefit to Customers

The remote backup system provides the following benefits for customers:
- Local VTL backup, meeting customers' backup window requirements for the branch nodes: Data on the branch nodes is backed up to the local integrated or All In One VTL6900 device. This is an application scenario of the VTL backup system. For details about the benefits to customers, see section 4.1.1 "VTL Backup System." VTL products provide high performance: for example, the integrated VTL6900 device provides a maximum backup rate of 2.34 TB/hr, and the All In One VTL6900 device provides a maximum backup rate of 9 TB/hr. Therefore, even if the amount of backup data on a branch node reaches 63 TB, the backup time on an All In One device does not exceed eight hours. This meets the customers' backup window requirements for the branch nodes.
- Deduplication-based remote replication, significantly reducing customers' requirements for and investment in network bandwidth: Backup data on the branch nodes is deduplicated on the local VTL and then replicated to the VTL at the data center through the wide area network (WAN). During the replication, only deduplicated data blocks not already stored at the data center are transferred. Compared with transferring backup data without deduplication, this significantly reduces the bandwidth required for the replication, and therefore the customers' investment in network bandwidth.
- Global deduplication, further reducing customers' investment in storage: The VTL6900 supports global deduplication. During the replication, only deduplicated data blocks not already stored at the data center are transferred. In this way, duplicate data across the VTLs of the branch nodes, and between the VTLs of the branch nodes and the VTL at the data center, is eliminated. Compared with the deduplication applied in the VTL backup system (for details, see section 4.1.1 "VTL Backup System"), global deduplication is more efficient. This further reduces customers' investment in storage.
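The replication behavior described in the benefits above, where only deduplicated blocks missing at the data center cross the WAN, can be sketched as follows. This is an illustration under stated assumptions, not the VTL6900 algorithm: the fixed 4-byte block size, the SHA-256 fingerprints, and the names DataCenterStore and replicate are all hypothetical, since the paper does not specify block sizes, hash functions, or interfaces.

```python
import hashlib

BLOCK_SIZE = 4  # tiny fixed-size blocks for illustration; real block sizes are not stated


def split_blocks(data, size=BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block):
    return hashlib.sha256(block).hexdigest()


class DataCenterStore:
    """Hypothetical global block store shared by all branch nodes."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block

    def missing(self, fingerprints):
        return {fp for fp in fingerprints if fp not in self.blocks}

    def receive(self, blocks_by_fp):
        self.blocks.update(blocks_by_fp)


def replicate(backup, store):
    """Send only blocks the data center lacks; return the bytes transferred."""
    local = {fingerprint(b): b for b in split_blocks(backup)}  # local deduplication
    needed = store.missing(local.keys())
    payload = {fp: local[fp] for fp in needed}
    store.receive(payload)
    return sum(len(b) for b in payload.values())


center = DataCenterStore()
sent_a = replicate(b"AAAABBBBCCCCAAAA", center)  # branch A: three unique blocks
sent_b = replicate(b"BBBBCCCCDDDDDDDD", center)  # branch B: only DDDD is new
print(sent_a, sent_b)  # prints "12 4"
```

Branch B transfers 4 bytes instead of 16 because the data center already holds two of its blocks from branch A, which is the effect the text attributes to global deduplication.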
5 Conclusion

The VTL6900 supports deduplication, disk spin-down, and high availability clustering of VTL engines and SIR engines. It therefore addresses the problems that mid-range and high-end customers face with backup systems: low performance, insufficient capacity, high power consumption, and low availability.

The VTL6900 meets the backup window requirements of customers whose data amount in one backup does not exceed 239 TB. At a deduplication ratio of 20:1, the VTL6900 can store 4 PB of backup data. This meets customers' requirements for the backup capacity. By using deduplication and disk spin-down, the VTL6900 reduces customers' power consumption by over 50%. The VTL6900 supports high availability clustering of VTL engines and SIR engines, with a failover time of four minutes. This meets customers' high availability requirements for the backup system.
6 Acronyms and Abbreviations

Table 6-1 Acronyms and abbreviations related to the VTL6900

Acronym or Abbreviation   Full Name
VTL                       Virtual tape library
SATA                      Serial Advanced Technology Attachment
SIR                       Single instance repository
FC                        Fibre Channel
IPMI                      Intelligent Platform Management Interface
LUN                       Logical unit number
RAID                      Redundant array of independent disks