Journal of Information & Computational Science 7: 12 (2010) 2403-2409
Available at http://www.joics.com

The Planning Method for Disaster Recovery Program of Digital Business Based on HSM-III Model

Shenghui Zhao
School of Information Management, Wuhan University, Wuhan 430072, China

Abstract

Based on an analysis of the objectives, hierarchy and cost of disaster recovery systems for digital business, this paper proposes the HSM-III model, which draws on the Hierarchical Storage Management models HSM-I and HSM-II. The main idea of HSM-III is hierarchical disaster recovery management: business disaster recovery requirements at different levels should be matched with disaster recovery programs of the appropriate level, so as to obtain the highest performance at the lowest cost. An example is provided to demonstrate the application of the model. The model can be used by various organizations to support strategic thinking about their digital business disaster recovery systems.

Keywords: Disaster Recovery Program; Hierarchical Storage Management; HSM-III

1 Introduction

Digital business refers to the management activities or business processes within an organization that must be carried out by means of computers, networks and other modern information technologies. In the field of business continuity, the word "disaster" has a special meaning. As defined by the Disaster Recovery Journal (DRJ), a disaster is an incident that prevents all or part of an organization's critical business functions from being provided at the scheduled time [1].
Disaster recovery refers to the combination of technical means and management measures that ensures specific data, information systems or business processes are restored as soon as possible after a disaster occurs. Disaster recovery is the last layer of the defense system: through rapid recovery of business data or business systems, it maintains the continuous operation of the digital business supported by information systems. How to obtain the best outcome at the lowest cost is an issue every organization should consider, and it requires a great deal of effort to plan the business disaster recovery program at the strategic level.

Corresponding author. Email address: foolbirdzsh@gmail.com (Shenghui Zhao).
1548-7741 / Copyright 2010 Binary Information Press, December 2010
2 Basic Principles of Digital Business Disaster Recovery Planning

The objectives of a disaster recovery system are usually measured by two indicators [2]: RTO (Recovery Time Objective) and RPO (Recovery Point Objective). RTO is the tolerable time from the point at which a business function stops to the point at which it is recovered (the speed requirement of the recovery activity). RPO is the point in time to which the system must be recovered; it measures the tolerable volume of data loss after a disaster (the integrity requirement of the recovery). Depending on the business, RTO and RPO can be set at the level of seconds, minutes, hours, days or weeks, and different levels are realized in different ways [3].

According to the international standard SHARE 78, the recovery ability of a disaster recovery program can be divided into seven levels [4] (shown in Fig. 1):

(1) Tier0 (no off-site data).
(2) Tier1 (Pickup Truck Access Method).
(3) Tier2 (PTAM + Hot Site).
(4) Tier3 (Electronic Vaulting).
(5) Tier4 (Active Secondary Site).
(6) Tier5 (Two-Site, Two-Phase Commit).
(7) Tier6 (Zero Data Loss).

Fig. 1: The relationship between disaster recovery programs and cost (vertical axis: Cost, with tiers Tier0 to Tier6; horizontal axis: RTO/RPO, marked T1 to T7)

Digital business disaster recovery is a complex issue: the program with the highest cost does not always deliver the best performance. A reasonable program must seek a balance among disaster recovery objectives, technology choices and overall cost (TCO). For a specific organization, the best disaster recovery program lies at the intersection of the cost curve and the requirement curve (the loss caused by business interruption), as shown in Fig. 2.
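The balance between the cost curve and the requirement curve can be illustrated with a small numerical sketch. All figures below (the RTO and cost attached to each tier, the linear loss rate, and the names TierOption, interruption_loss and balance_tier) are hypothetical placeholders chosen for illustration; they are not taken from the paper or from SHARE 78. The sketch picks the tier at which recovery cost and interruption loss are closest, a discrete stand-in for the intersection of the two curves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierOption:
    name: str          # SHARE 78 tier name
    rto_hours: float   # hypothetical achievable RTO for this tier
    cost: float        # hypothetical recovery cost, arbitrary units


# Hypothetical figures for illustration only: cost rises as RTO shrinks.
TIERS = [
    TierOption("Tier0", 96.0, 1.0),
    TierOption("Tier1", 48.0, 2.0),
    TierOption("Tier2", 24.0, 4.0),
    TierOption("Tier3", 12.0, 8.0),
    TierOption("Tier4", 1.0, 16.0),
    TierOption("Tier5", 0.5, 32.0),
    TierOption("Tier6", 0.1, 64.0),
]


def interruption_loss(rto_hours: float, loss_per_hour: float) -> float:
    """Requirement curve: loss grows with interruption time (assumed linear)."""
    return rto_hours * loss_per_hour


def balance_tier(tiers, loss_per_hour):
    """Return the tier where recovery cost and interruption loss are closest."""
    return min(
        tiers,
        key=lambda t: abs(t.cost - interruption_loss(t.rto_hours, loss_per_hour)),
    )


if __name__ == "__main__":
    # A business losing little per hour balances at a cheaper tier
    # than one losing a lot per hour.
    print(balance_tier(TIERS, 0.5).name)
    print(balance_tier(TIERS, 20.0).name)
```

Under these assumed numbers, a higher hourly loss pushes the balance point toward a higher tier, which is the qualitative behaviour the two curves describe.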
Fig. 2: The balance of disaster recovery costs and requirements (vertical axis: Value; horizontal axis: RTO/RPO, marked Tm, Tk, Tn; curves: Recovery cost and Recovery requirement)

So, the basic principle of digital business disaster recovery planning is to analyze the characteristics of the various types of digital business and, according to their disaster recovery requirements, select a reasonable portfolio of disaster recovery levels (Tiers) that achieves the business disaster recovery objectives (RTO/RPO) at the lowest total cost of the disaster recovery system [5] [6].

3 Proposition and Principle of HSM-III Model

3.1 Proposition of HSM-III Model

HSM stands for hierarchical storage management. Its basic idea is the unified management of organizational storage resources: storage requirements at different levels are satisfied by storage resources of the appropriate level, so that the overall system achieves the best performance at the lowest cost. Research on hierarchical storage management has focused on two levels. The first is the computer storage system level, which aims to maximize computing performance at the lowest hardware cost [7]. The second is organizational business data storage, which plans storage programs across the data life cycle according to how frequently the business data is used [8]. To distinguish them, this paper calls the former HSM-I theory and the latter HSM-II theory.

Inspired by HSM-I and HSM-II, this paper introduces the idea of hierarchical storage management into the study of digital business disaster recovery and builds a disaster recovery management model on the hierarchical management philosophy. The goal is to achieve the greatest disaster recovery effect at the lowest cost; the core objective is to improve overall performance. For consistency with the earlier theories, hierarchical disaster recovery management is called the HSM-III model in this paper.
3.2 Construction of HSM-III Model

According to the idea of HSM-III, digital business is composed of a large number of sub-businesses whose characteristics differ. In this paper they are classified along two dimensions: functional value and IT dependence (or manual substitutability).
Firstly, according to its core function, a business can be classified into the following categories:

(1) Critical Business: the core business functions of the organization. If they are interrupted, the core value of the organization cannot be realized, resulting in great loss.
(2) Major Business: important functions of the organization. Their interruption affects the realization of the core value of the organization's business and damages the organization to a certain extent.
(3) General Business: the routine functions of the organization. Their interruption has a modest effect on the running of core business functions and causes minor damage to the organization [9].

Secondly, according to IT dependence, businesses can also be classified into three categories (shown in Fig. 3):

(1) Cannot be substituted by manual operations. The business cannot be performed manually under any circumstances; it can continue only if another business process with the same function is found to replace the damaged one. This type of business therefore has very low tolerance for interruption.
(2) Can be substituted by manual operations in the short term. The business normally cannot be performed manually, but after an interruption it can be handled manually for a short period. If the system is not recovered within that period, the disaster loss continues to increase.
(3) Can be substituted by manual operations. The business can be performed manually in all cases, so during a digital disaster it can be handled manually. However, manual processing is less efficient and less accurate than digital processing, so the organization still suffers some loss.
Fig. 3: Business classification based on IT dependence [4] (vertical axis: Loss; horizontal axis: Time of interruption; one curve for each of the three IT dependence categories)

Thirdly, the two dimensions can be combined into a more detailed business classification matrix, shown in Fig. 4. According to the classification matrix, business disaster recovery requirements can be divided into three levels [10]:

(1) Businesses that must run continually. This level includes Critical Business with Can't Manual Substituted (I) and Major Business with Can't Manual Substituted (IV). These businesses have very low disaster tolerance, and an interruption will in most cases cause great loss.
(2) Businesses that require rapid recovery. This level includes Critical Business with Short-term Manual Substituted (II) and Major Business with Short-term Manual Substituted (V). Such businesses can be handled manually for a certain period, but the loss increases once that period is exceeded, so they require rapid recovery.
(3) Businesses that can tolerate interruption. This level includes Critical Business with Can Manual Substituted (III), Major Business with Can Manual Substituted (VI), General Business with Can't Manual Substituted (VII), General Business with Short-term Manual Substituted (VIII) and General Business with Can Manual Substituted (IX). The impact of a disaster on these businesses is relatively small, and they have sufficient time for system recovery. The prerequisite, however, is that they are able to recover at all, for example because the necessary business data has been backed up.

                    Can't Manual    Short-term Manual    Can Manual
                    Substituted     Substituted          Substituted
Critical Business   I               II                   III
Major Business      IV              V                    VI
General Business    VII             VIII                 IX

Fig. 4: Classification matrix of digital business

3.3 Principle of HSM-III Model

As shown in Fig. 5, RTO and RPO show a linear positive correlation in a disaster recovery system: the higher the RTO is set, the higher the corresponding RPO should be set. The recovery ability can therefore be measured by one of them, usually the RTO [10]. With reference to the disaster recovery white paper issued by the China Information Center of IBM, the usual values for the three levels of a disaster recovery system are as follows (Fig. 5):

(1) The first level: businesses that must run continually. 0 < RTO < 1 hour; the tiers that can meet this requirement are Tier6, Tier5 and Tier4.
(2) The second level: businesses that require rapid recovery.
0 < RTO < 24 hours; the tiers that can meet this requirement are Tier3 and Tier2.
(3) The third level: businesses that can tolerate interruption. 0 < RTO < 48 hours; the tiers that can meet this requirement are Tier1 and Tier0.

Fig. 5: The principle of HSM-III model [4] (vertical axis: Cost, with tiers Tier0 to Tier6; horizontal axis: RTO, marked 1 hour, 24 hours, 48 hours; regions: Must running continually, Require rapid recovery, Interruption tolerance)

4 Example of HSM-III Model Application

Telecommunications services include traditional voice services, internet access services, corporate connectivity and remote access, and other data services. To manage these varied services effectively, telecom companies have developed information systems, relying on computers and networks, to support them [1]. Thus, once a digital disaster occurs, critical business data faces the risk of being lost and customers' communications services cannot be guaranteed, which is bound to cause enormous economic loss and a decline in credibility. In accordance with the principles of disaster recovery, the digital business disaster recovery planning process based on the HSM-III model is as follows:

Step 1. Digital business identification.

Step 2. Business feature profiling. The main task of this phase is to evaluate businesses according to their functional importance and IT dependence.

Step 3. Disaster recovery requirement analysis. The digital business classification matrix in Fig. 4 is used to obtain the disaster recovery requirement of each type of business, as shown in Table 1.
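Steps 2 and 3, together with the tier ranges of Section 3.3, can be sketched as a pair of lookups. This is a minimal sketch rather than the author's implementation: the enum, dictionary and function names are hypothetical, while the matrix cells follow Fig. 4 and the requirement levels, RTO bounds and candidate tiers follow Sections 3.2 and 3.3.

```python
from enum import Enum


class Importance(Enum):
    """Functional importance (rows of the Fig. 4 matrix)."""
    CRITICAL = 0
    MAJOR = 1
    GENERAL = 2


class Substitutability(Enum):
    """IT dependence (columns of the Fig. 4 matrix)."""
    CANNOT = 0       # Can't Manual Substituted
    SHORT_TERM = 1   # Short-term Manual Substituted
    CAN = 2          # Can Manual Substituted


ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

MUST_RUN = "Must Running Continually"
RAPID = "Require Rapid Recovery"
TOLERANT = "Interruption Tolerance"

# Requirement level of each matrix cell (Section 3.2).
LEVEL_BY_CELL = {
    "I": MUST_RUN, "IV": MUST_RUN,
    "II": RAPID, "V": RAPID,
    "III": TOLERANT, "VI": TOLERANT,
    "VII": TOLERANT, "VIII": TOLERANT, "IX": TOLERANT,
}

# Level -> (upper RTO bound in hours, candidate tiers), from Section 3.3.
PROGRAM_BY_LEVEL = {
    MUST_RUN: (1, ["Tier4", "Tier5", "Tier6"]),
    RAPID: (24, ["Tier2", "Tier3"]),
    TOLERANT: (48, ["Tier0", "Tier1"]),
}


def matrix_cell(importance, substitutability):
    """Return the Fig. 4 cell (I to IX) for a business profile."""
    return ROMAN[3 * importance.value + substitutability.value]


def plan(importance, substitutability):
    """Return (cell, requirement level, RTO bound in hours, candidate tiers)."""
    cell = matrix_cell(importance, substitutability)
    level = LEVEL_BY_CELL[cell]
    rto_hours, tiers = PROGRAM_BY_LEVEL[level]
    return cell, level, rto_hours, tiers


if __name__ == "__main__":
    # Profile of a critical business that can be handled manually
    # for a short term only.
    print(plan(Importance.CRITICAL, Substitutability.SHORT_TERM))
```

Keeping the matrix and the tier ranges in two separate tables mirrors the two-stage reasoning of the model: the business profile determines the requirement level, and the requirement level determines the candidate tiers.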
Table 1: Telecom Digital Business Disaster Recovery Requirements Analysis

Code  Name of Business          Business Functional Importance  Business IT Dependence         Disaster Recovery Requirement
T1    Billing                   Critical Business               Can't Manual Substituted       Must Running Continually
T2    Accounting Treatment      Critical Business               Can't Manual Substituted       Must Running Continually
T3    Customer Services         Critical Business               Short-term Manual Substituted  Require Rapid Recovery
T4    Comprehensive Settlement  Major Business                  Can't Manual Substituted       Must Running Continually
T5    Statistical Analysis      General Business                Short-term Manual Substituted  Interruption Tolerance
T6    Accounting Management     Critical Business               Can't Manual Substituted       Must Running Continually
T7    Credit Control            Critical Business               Can't Manual Substituted       Must Running Continually
T8    Systems Management        Major Business                  Can't Manual Substituted       Must Running Continually
T9    Acquisition               Critical Business               Can't Manual Substituted       Must Running Continually
T10   Online Instruction        Critical Business               Can't Manual Substituted       Must Running Continually
T11   Scheduling                General Business                Can Manual Substituted         Interruption Tolerance

Step 4. Classification of disaster recovery requirements for telecom businesses. Based on the conclusions drawn from Table 1 and Fig. 4, the telecom businesses can be divided into three groups, shown in Table 2.

Table 2: Digital business disaster recovery program

Level  Name                      Business Codes                   Disaster Recovery Objective  Program
S1     Must Running Continually  T1, T2, T4, T6, T7, T8, T9, T10  0 < RTO < 1 hour             Tier4 or Tier5 or Tier6
S2     Require Rapid Recovery    T3                               0 < RTO < 24 hours           Tier2 or Tier3
S3     Interruption Tolerance    T5, T11                          0 < RTO < 48 hours           Tier0 or Tier1

5 Conclusion

In essence, disaster recovery planning pursues the best possible overall balance between disaster recovery requirements and disaster recovery ability. The idea is, for specific business disaster recovery requirements, to select a reasonable portfolio of disaster recovery levels (Tiers) that meets the RTO/RPO at the lowest total cost (TCO). Based on the principles of digital business disaster recovery, this paper constructs a hierarchical disaster recovery management model (HSM-III) that draws on HSM-I and HSM-II. Its advantage is that it takes the perspective of the digital business, bypasses the difficulty and uncertainty of disaster risk analysis, and considers the relationship between total cost and disaster recovery requirements at the same time, so as to achieve optimal performance of the overall disaster recovery program.

Acknowledgement

This work is supported by Burg Information Technology Research Institute of Xi'an Burg International Data Group Co. Ltd of China.

References

[1] LI Man et al., Theory and application of information systems disaster survivability, People's Posts and Telecom Press, Beijing, 2007
[2] [America] John William Toigo, LIAN Yi-feng, PANG Nan translated, Disaster recovery planning, Electronic Industry Press, Beijing, 2004
[3] [America] Roopendra Jeet Sandhu, ZHANG Rui-ping translated, Information disaster recovery planning, Tsinghua University Press, Beijing, 2004
[4] IBM China Information Support Center, Disaster recovery white paper, Accessed at: http://bbs.51cto.com/thread-21639-1.html, August 15, 2010
[5] LIU Hongfa, TANG Hong, Network storage and disaster recovery technology, Electronic Industry Press, Beijing, 2008
[6] State Council Information Office of China, Important Information Systems Disaster Recovery Guide, April, 2005
[7] ZHANG Jiwen, Principles of computer composition, Tsinghua University Press, Beijing, 2004
[8] LIU Jiazhen, Electronic record management theory and practice, Science Press, Beijing, 2003
[9] [America] Michael Miller, JIANG Jinlei translated, Cloud computing, Mechanical Industry Press, Beijing, 2009
[10] LIN Xiaoming, Balance of RTO and RPO, China Information (Newspaper), May 23, 2009