Disaster Recovery Backup System for P2P Based VoIP Application




Journal of Computational Information Systems 9: 20 (2013) 8099-8109
Available at http://www.jofcis.com

Disaster Recovery System for P2P Based VoIP Application

Kai SHUANG, Jing XIE
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

Using a P2P network as the distributed data storage network of a VoIP system is a hot research topic on the Internet. How to realize a data backup strategy that copes with data disasters in a P2P-based VoIP system while still satisfying carrier-grade demands has not been addressed in previous research. This paper analyzes the characteristics of VoIP systems, designs a data backup strategy suited to data disasters, and describes the design of the data disaster recovery backup system in detail. Finally, experiments are presented to assess the performance of the strategy.

Keywords: Data Storage and Backup; VoIP; P2P; DHT Algorithm

1 Introduction

Over the past few years, many distributed storage systems have emerged. However, no existing storage system or strategy uses user nodes as serving nodes to provide storage. Since a user node usually has a shorter online time and a higher probability of crashing or exiting abnormally than nodes deployed by a service provider, existing systems are not suitable for this scenario. The disaster recovery backup strategy discussed in this paper provides the basic functions, such as reliable storage and rapid recovery, in a storage network whose serving nodes include user nodes. Beyond that, our strategy ensures high availability and reliability of data and can meet the five-nines carrier-grade demand.

This paper is organized as follows. After the introduction, related work is reviewed. The third part describes the analysis and design of the storage strategy, including background analysis, performance objectives and the control scheme for data storage. The control scheme is introduced from three aspects: data storage and backup, data recovery and data update. Next comes the performance assessment of the strategy; the main factors considered are reliability, recovery latency and bandwidth consumption. Finally, we summarize the paper and give a conclusion.

Corresponding author. Email address: shuangk@bupt.edu.cn (Kai SHUANG).
1553-9105 / Copyright 2013 Binary Information Press. DOI: 10.12733/jcis6851. October 15, 2013.

2 Related Works

There are many distributed storage systems at present, such as GFS [1], BitVault [2], Dynamo [3] and SandStone [4]. GFS handles metadata with a primary server and realizes read/write operations on block data servers. BitVault is a content-addressable retention platform for large volumes of reference data, i.e. seldom-changing information that needs to be retained for a long time; it uses smart bricks as the building block to lower hardware cost. Dynamo is a highly available key-value storage system that some of Amazon's core services use to provide an always-on experience; it manages the state of services that have very high reliability requirements and need tight control over the trade-offs between availability, consistency, cost-effectiveness and performance. SandStone is another reliable storage system with traffic localization, strong consistency, high availability and scalability.

These storage systems all adopt DHT algorithms with key-based routing. The first three only guarantee the basic performance demanded by Internet applications, whereas SandStone treats carrier-grade demands as an important factor. However, SandStone is mainly used in the distributed core network of a telecommunication operator, where the storage nodes are stable and all deployed by the service provider. Data storage mechanisms have been widely discussed, but no single strategy is applicable to all scenarios. Therefore, we put forward a new data storage and backup strategy for our specific P2P-based VoIP system.

3 Design of Storage and Backup Strategy

3.1 Applicable scenario

The disaster recovery backup strategy studied in this paper mainly applies to P2P-based VoIP applications. All data are stored in a distributed network. In this storage network, the serving nodes include not only peers but also clients that have been upgraded to peers. A peer is supposed to be a stable node deployed by the service provider, such as a large server or an ordinary PC. A client, by contrast, is a common user node in the VoIP system, which usually consumes service rather than provides it. Such users rarely stay online as long as nodes deployed by the service provider. Besides, since an upgraded peer is actually a user node, it is still subject to many constraints of its network environment, and crashes or abnormal exits are more likely for upgraded peers. Thus, we design a new system for data storage.

Considering the geographic distribution pattern of call sessions and the service provider's full knowledge of its IP networks, we believe a layered DHT [5, 6] is a simple but effective structure to adopt. The overlay of our system is typically deployed as a two-layered DHT, including a global DHT and several regional DHTs. For each region, at least one boundary peer (BP) must be chosen to route update messages between regions. BPs must be deployed by the service provider and pre-configured. Every peer in the same region must be aware of its BP, while the BP is also aware of the current situation of every peer in its region.
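
As a rough illustration of this two-layer organization, the C++ sketch below models ordinary peers, upgraded clients, regions and their boundary peers, and shows an intra-region update being forwarded across the backbone exactly once. All type and function names here are our own assumptions, not identifiers from the authors' implementation.

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical node roles in the two-layer overlay.
enum class Role { Peer, UpgradedClient, BoundaryPeer };

struct Node {
    uint64_t id;       // DHT identifier
    Role     role;
    int      regionId;
};

struct Region {
    int  regionId;
    Node boundaryPeer;           // pre-configured by the service provider
    std::vector<Node> members;   // every member knows this region's BP
};

// An update produced inside a region is delivered to the local members;
// only the boundary peer forwards it to the other regions' BPs, so each
// update event crosses the backbone once.
void propagateUpdate(const Region& origin, const std::vector<Region>& all,
                     const std::string& event) {
    for (const Node& n : origin.members)
        std::cout << "region " << origin.regionId << " -> node " << n.id
                  << ": " << event << '\n';
    for (const Region& r : all)
        if (r.regionId != origin.regionId)
            std::cout << "BP " << origin.boundaryPeer.id << " -> BP "
                      << r.boundaryPeer.id << ": " << event << '\n';
}

int main() {
    Node a{1, Role::Peer, 0}, b{2, Role::UpgradedClient, 0};
    Node bp0{10, Role::BoundaryPeer, 0}, bp1{11, Role::BoundaryPeer, 1};
    Region r0{0, bp0, {a, b}}, r1{1, bp1, {}};
    propagateUpdate(r0, {r0, r1}, "peer 2 joined");
}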

3.2 System features

The disaster recovery backup strategy used in a P2P-based VoIP application differs from both traditional telecom applications and common P2P applications. It combines the features of both, but is distinguished from them in the following two aspects.

On the one hand, the serving nodes in the network are distributed, eliminating centralized control. Compared with a traditional telecom network, the network of the system in this paper is organized in a structured P2P mode and gets rid of the HSS core storage server. In this way, the investment in high-performance equipment can be reduced and the bottleneck effect of the core server is removed, making the serving network more flexible and easier to expand.

On the other hand, the logical role of a client changes dynamically. Compared with applications on a flat P2P network, a client in our system may be upgraded to a storage node that provides service, which lowers capital investment and improves network adaptation. But an upgraded peer has no guarantee on online time or network connectivity. Therefore, there should be an upper limit on the ratio of the number of upgraded peers to the number of all serving peers. This requirement avoids network disruption, which would otherwise raise the cost of maintaining network stability.

Considering these two characteristics, the fault model of the disaster recovery backup system is established as follows to ensure reliable data storage. There are three kinds of failure model: the single node failure model, the nonlocal batch nodes failure model and the local batch nodes failure model.

For the single node failure model, an exponential distribution, which is commonly used for reliability analysis, is adopted. Suppose the failure density function is exponentially distributed and let P_11 denote the failure rate, which equals 1/MTBF. China's standard requires the MTBF of a PC to be 4000 hours; we assume an MTBF of 1000 hours to leave a sufficient margin for system upgrades and bug fixes in the actual deployment. Thus the failure rate is

P_11 = 1/MTBF = 1/1000 = 0.1%

The other case of the single node failure model is an upgraded peer exiting abnormally. To estimate the probability of abnormal offline, the average online time of upgraded peers and their proportion in the network need to be considered. Measurement studies show that an ordinary P2P user's online time is about 2.9 hours [7]. Letting P_12 denote the probability of abnormal offline for an upgraded peer,

P_12 = 1/2.9 ≈ 34.48%

Suppose the upper limit on the ratio of the number of upgraded peers (UP) to the number of all serving peers (SP) is UP/SP. The single node failure rate is then

P_1 = 1 - (1 - P_11) · [1 - (UP/SP) · P_12]

We subjectively assume that nonlocal batch nodes failures are also exponentially distributed, with an MTBF of 20000 hours. Letting P_2 denote the nonlocal batch nodes failure rate,

P_2 = 1/MTBF = 1/20000 = 0.005%

which shows that this situation is extremely rare. Local batch nodes failures often happen in the case of an earthquake or human misconfiguration; in these cases, closed-form analysis has little significance. At present, Huawei's HSS uses an offsite backup method, placing backups both in the local province and in another province. This method is adopted in our paper.
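
To make the fault model concrete, the short C++ calculation below evaluates P_1 (and the combined rate P_1 + P_2) from the constants above for several upgraded-peer ratios; it only restates the formulas of this subsection, and the variable names are ours.

#include <cstdio>

int main() {
    const double P11 = 1.0 / 1000.0;    // single node hardware failure rate, MTBF = 1000 h
    const double P12 = 1.0 / 2.9;       // abnormal-offline probability of an upgraded peer
    const double P2  = 1.0 / 20000.0;   // nonlocal batch nodes failure rate, MTBF = 20000 h

    const double ratios[] = {0.15, 0.30, 0.45, 0.60, 0.75};   // example UP/SP values
    for (double r : ratios) {
        double P1 = 1.0 - (1.0 - P11) * (1.0 - r * P12);      // single node failure rate
        std::printf("UP/SP = %2.0f%%  P1 = %.4f  P1 + P2 = %.4f\n",
                    r * 100.0, P1, P1 + P2);
    }
    return 0;
}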

3.3 Performance goals

This disaster recovery backup strategy should provide the basic functions of data storage. In addition, there are four performance requirements, as follows.

(1) High availability: Although the system is used in a distributed VoIP system, it still has to meet the carrier-grade five-nines requirement, i.e. the availability should reach 99.999%. Our system achieves this using ordinary PCs.

(2) Scalability: The system should be easy to expand. As the amount of VoIP user data grows, the system needs to scale smoothly.

(3) Cost-effectiveness: The input cost of this system must be lower than that of current telecommunication systems and other P2P-based VoIP systems while providing the same performance. The system uses ordinary PCs to provide service and allows user nodes to upgrade to serving nodes, which achieves lower investment and higher dynamic scalability.

(4) Self-recovery: When an exception occurs, the self-recovery function of the system takes over. Topology recovery is implemented by the DHT algorithm; data recovery relies on the data recovery strategy in this paper.

3.4 Strategy design

For high data availability, network coding is used to encode the source data and distribute the encoded fragments together with the original data pieces [8], and we use a data storage strategy that ensures reliable storage and rapid, effective recovery. The key technologies comprise three parts: the data storage and backup strategy, the data recovery mechanism and the data update mechanism. The following subsections describe these three aspects.

3.4.1 Data replica analysis

The primary means of data disaster recovery is data redundancy. To ensure reliable data storage, we explain the data storage and backup strategy from two sides: the number and the placement of the data replicas.

First, consider the number of data replicas. Assume that the failure rate is P. According to the node failure model discussed in Section 3.2, we get the following equation:

P = P_1 + P_2 = 1 - (1 - P_11) · [1 - (UP/SP) · P_12] + P_2    (1)

where P_11 = 0.1%, P_12 = 34.48%, P_2 = 0.005%, and UP and SP are variables with 0 < UP/SP < 1. This formula gives the total probability of single node failure and nonlocal batch nodes failure. For these two cases, assume the number of data replicas is x. Considering the reliability requirement discussed in Section 3.3, it should satisfy the following inequality:

1 - P^x > 99.999%    (2)

For the first two fault models, the number of replicas x can be calculated from the parameter values and ranges in formula (1) and inequality (2). In addition, for the case that local batch nodes crash, we add one backup outside the domain. To sum up, the desired total number of data replicas is x + 1.
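
As a worked example of inequality (2), the sketch below computes the smallest integer x with 1 - P^x > 99.999% for the failure rate P given by formula (1). The 15% upgraded-peer ratio used here is one of the ratios evaluated later in Section 4.1, and the helper name is our own.

#include <cmath>
#include <cstdio>

// Smallest x such that 1 - P^x > target, i.e. P^x < 1 - target.
int minReplicas(double P, double target = 0.99999) {
    return static_cast<int>(std::ceil(std::log(1.0 - target) / std::log(P)));
}

int main() {
    // Failure rate from formula (1) with UP/SP = 15%.
    double P = 1.0 - (1.0 - 0.001) * (1.0 - 0.15 * 0.3448) + 0.00005;   // about 0.0527
    int x = minReplicas(P);   // replicas against single node and nonlocal batch failures: 4 here
    std::printf("P = %.4f  x = %d  total replicas (with the extra-domain copy) = %d\n",
                P, x, x + 1);
    return 0;
}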

The placement of the data replicas is discussed below. In our system, data is stored in key-value form to facilitate the storage and retrieval of user data. The specific placement is as follows: according to the rules of structured P2P resource storage, the primary storage node stores a complete copy of the data, x - 1 redundant copies are stored within the domain, and one more copy is placed in another domain. The redundant replicas within the domain are placed randomly to deal with the simultaneous failure of several consecutive nodes; the replica in another domain deals with local large-scale node failures.

3.4.2 Data recovery mechanism

To achieve fast and reliable data recovery, this system uses a parallel recovery mechanism, under which the data of the primary storage node can be recovered from multiple backup nodes at the same time. Recovery is accelerated because multiple backup nodes simultaneously transmit parts of the data to the primary storage node to restore the complete data.

Dynamo stores key-value pairs in a distributed network with a ring structure: a coordinating node places a copy of the data on the primary storage node and its N - 1 consecutive successor nodes. To reduce the influence of limited recovery resources, a Dynamo node can join the network several times as virtual nodes, but this brings the problem of integrating the recovered data traffic. The SandStone system proposed by Huawei uses an N:N replica placement policy [4], where N is the parallel recovery factor. With a preconfigured number of backups B (e.g. 2), each primary peer divides its stored data into S = N/B slices, and every slice is backed up by B different peers. The primary peer determines the B backup positions for its i-th slice (i starting from 1) by the following criterion:

A_slice + H_size · ((i - 1) · B + j)/(N + 1),  i = 1, ..., S,  j = 1, ..., B    (3)

where A_slice is the start address of the slice and H_size is the size of the hash space. The backup replicas are stored on the successors of each position. Each backup node is responsible for providing only part of the data; the N backup slices restore the data of the primary storage node simultaneously, unlike the Dynamo system, in which all data is provided by a single backup node.

Our system uses the same mechanism as SandStone for backup data placement to achieve good parallel recovery efficiency. This placement increases the efficiency of parallel recovery and avoids wasting bandwidth on integrating backup data. Compared with 1:1 replica placement, every node providing recovery data in the N:N placement only needs 1/N of the capacity required by the original approach. In this way, we achieve the cost-effectiveness described in Section 3.3.
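
The C++ sketch below shows how a primary peer could derive the backup positions from criterion (3); the parameter values (N = 4, B = 2, a 32-bit hash space) are only examples, and the function name is our own rather than anything from SandStone's code.

#include <cstdint>
#include <cstdio>
#include <vector>

// Backup positions for slice i (1-based) per criterion (3):
// A_slice + H_size * ((i-1)*B + j) / (N+1), for j = 1..B.
std::vector<uint64_t> backupPositions(uint64_t aSlice, uint64_t hSize,
                                      int i, int N, int B) {
    std::vector<uint64_t> pos;
    for (int j = 1; j <= B; ++j)
        pos.push_back(aSlice + hSize * (uint64_t)((i - 1) * B + j) / (N + 1));
    return pos;
}

int main() {
    const uint64_t hSize = 1ull << 32;   // example hash-space size
    const int N = 4, B = 2, S = N / B;   // parallel recovery factor, backups per slice, slices
    for (int i = 1; i <= S; ++i) {
        std::vector<uint64_t> p = backupPositions(/*aSlice=*/0, hSize, i, N, B);
        for (int j = 0; j < B; ++j)
            std::printf("slice %d, backup %d -> position %llu\n",
                        i, j + 1, (unsigned long long)p[j]);
    }
    return 0;
}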

3.4.3 Data update mechanism

The data update mechanism is critical for maintaining data consistency. Updates are triggered in two ways: backup nodes within the region are updated in real time, while the backup node outside the region is updated periodically. An efficient remote data synchronization algorithm is used to compare the remote replica with the authoritative data and then update the remote replica using the bit difference [9]. Suppose the parallel recovery factor is L and the total replica number is x + 1. The specific update strategy, including the index keep-alive and data update flows, is shown in Fig. 1 to Fig. 3.

[Fig. 1: The primary storage node keeps alive with the backup nodes regularly]

[Fig. 2: The storage and backup procedure for new or updated data]

[Fig. 3: The periodic backup procedure for new or updated data in the other region]

As shown above, the primary storage node stores an index identity for every backup data replica and keeps alive with all its index nodes by heartbeat. When an index node fails, the keep-alive timer expires; the primary storage node then selects a new index node and transfers the backup data to it. After that, it updates the index information about the backup replicas and keeps alive with the new index node by heartbeat. When the data on the primary storage node is updated, the primary storage node transmits the updated data to the backup nodes to ensure data consistency. Since it is unlikely that all the nodes in one region crash at the same time, the backup replica in another region is updated by a timer rather than in real time, and only the changed data is transmitted.
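
A minimal sketch of this keep-alive and fail-over behavior is given below; the IndexEntry record, the expiry threshold and the stub helpers are our own assumptions and are not taken from the authors' implementation.

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical index record kept by the primary storage node for each backup replica.
struct IndexEntry {
    uint64_t indexNodeId;        // node currently holding this backup replica
    Clock::time_point lastAck;   // last answered heartbeat (Ping)
};

constexpr auto kTimeout = std::chrono::seconds(90);   // assumed: three missed 0.5-minute heartbeats

uint64_t pickNewIndexNode() { return 42; }   // stub: e.g. a random peer in the same region
void transferBackup(uint64_t /*toNode*/) {}  // stub: re-send the backup data to the new node

// Called periodically by the primary storage node.
void checkIndexNodes(std::vector<IndexEntry>& index) {
    const auto now = Clock::now();
    for (IndexEntry& e : index) {
        if (now - e.lastAck > kTimeout) {         // keep-alive timer is up
            e.indexNodeId = pickNewIndexNode();   // select a new index node
            transferBackup(e.indexNodeId);        // move the backup data to it
            e.lastAck = now;                      // resume heartbeating with the new node
            std::printf("re-indexed backup to node %llu\n",
                        (unsigned long long)e.indexNodeId);
        }
    }
}

int main() {
    std::vector<IndexEntry> index{{7, Clock::now() - std::chrono::minutes(10)}};
    checkIndexNodes(index);   // the stale entry is detected and re-indexed
}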

3.5 Routing

To meet the real-time response requirement, the O(log N) lookup performance of a traditional DHT must be revisited. Our system uses an improved one-hop routing method. One-hop routing is based on the DHT algorithm and has great advantages for fast queries, but maintaining its routing tables is expensive. Each peer must maintain two tables: one is the ordinary DHT routing table (Leafset, FingerTable); the other is the WRT (Whole Routing Table), which covers all peers in the network. Obviously, the main problem is how to update the WRT. Since the WRT grows larger and larger as the network expands, updating it is costly. Thus, we make some enhancements to the original method. For the neighboring successors, sequential monitoring and notification has proved an efficient tactic. Apart from that, our system determines which remote peers should be notified, namely those that have the failed peer in their finger tables.

Meanwhile, regarding the geographic distribution pattern of call sessions, we use a two-layered DHT overlay [10]. BPs play an important role in this overlay: they divide the whole overlay into several parts, and update messages are exchanged only within each region. The BPs communicate with each other to obtain information about other regions, and every peer in a region knows the BP of its region. If any peer joins the overlay or its direct successor crashes, the peer notifies the BP in its region. The BP notification mechanism tremendously reduces the maintenance traffic on the backbone network because one update event is transferred only once over backbone links. BPs do not handle actual application requests, so they are unlikely to become the performance bottleneck even as application requests increase. In this way, the cost of WRT updates is reduced.
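
The following C++ sketch illustrates, under our own assumed types, how a BP could apply a membership change locally and forward it exactly once per backbone link so that every peer's WRT stays current; it is an illustration of the notification idea, not the authors' code.

#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical WRT: maps a peer's DHT id to its address.
using WRT = std::map<uint64_t, std::string>;

struct UpdateEvent {          // a peer joined, or its direct successor crashed
    uint64_t    peerId;
    bool        joined;       // false = removal
    std::string address;
};

struct BoundaryPeer {
    int regionId;
    std::vector<BoundaryPeer*> otherBPs;   // one backbone link per remote region
    std::vector<WRT*> regionalWRTs;        // WRTs of the peers in this region

    // A peer in this region reports an event: apply it locally, then forward it
    // exactly once over each backbone link.
    void onLocalEvent(const UpdateEvent& ev) {
        applyLocally(ev);
        for (BoundaryPeer* bp : otherBPs) bp->onRemoteEvent(ev);
    }
    // An event received from another region is applied locally and never
    // forwarded again, so one update event crosses the backbone only once.
    void onRemoteEvent(const UpdateEvent& ev) { applyLocally(ev); }

    void applyLocally(const UpdateEvent& ev) {
        for (WRT* t : regionalWRTs) {
            if (ev.joined) (*t)[ev.peerId] = ev.address;
            else t->erase(ev.peerId);
        }
    }
};

int main() {
    WRT w1, w2;
    BoundaryPeer bpA{0, {}, {&w1}}, bpB{1, {}, {&w2}};
    bpA.otherBPs.push_back(&bpB);
    bpB.otherBPs.push_back(&bpA);
    bpA.onLocalEvent({42, true, "10.0.0.5:5060"});   // both regions' WRTs now list peer 42
}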

4 Simulation Experiments and Assessment

We implemented our disaster recovery backup system in C++, atop OMNeT++ and OverSim [11], an extensible, modular, component-based C++ simulation library and framework primarily intended for building network simulators. In the simulation, each node mainly consists of three parts: the data storage controller, one-hop routing and TLS transmission for security assurance. The experimental topology includes 4 regions and several servers, such as globleobserver and MessageObserver, that monitor the number and flow of messages in the system; some of these servers collect statistics used to estimate the system's performance. In the process of building and deploying the system, we experienced a variety of issues, some operational and some technical. We built a prototype of 600 peers located in four separate regions, with one toppeer in each region. Since the system supports client upgrading, the real number of serving nodes exceeds 600: we configured 400 upgraded peers, so the total number of serving peers is actually 1000. In the system we use the enhanced one-hop overlay algorithm.

[Fig. 4: Time for data recovery]

4.1 Reliability

The system considers the three kinds of fault model described above, and the experimental data are used to evaluate the data recovery success rate. Suppose the ratio of the number of upgraded peers to all serving peers is UP/SP and the system needs to meet a recovery success rate of 99.999% or more. According to formula (1) and inequality (2) in Section 3.4.1, the theoretical minimum number of data replicas for each ratio of upgraded peers to serving peers is: 4 for 15%, 5 for 30%, 6 for 45%, 8 for 60% and 9 for 75%. The number of data replicas grows with the ratio, but maintaining data consistency then becomes costly. Considering cost-effectiveness, we fix the number of intra-region data replicas at four: one replica on the main storage node, three backups on intra-region nodes, and one backup on an extra-region node, i.e. five data replicas in total. We set the replica number to five in our experiments to examine reliability.

The experiments were carried out under the three failure models discussed above. In addition, three application models were set for each failure model: first, one write and 10 reads per node per second; then, 10 writes and 50 reads per node per second; and finally, 50 writes and 100 reads per node per second. For each application model, the ratio of upgraded peers to serving peers was set to 15%, 30%, 45%, 60% and 75%. The results show that the recovery success rate always exceeds 99.5% when the replica number is capped as above, so this is an acceptable way to achieve our reliability goal.

4.2 Recovery time

The recovery time discussed in this paper begins when the main storage node fails and ends when all data has been completely restored on the new main storage node, including the time it takes the backup nodes to detect the main storage node's failure by heartbeat and to transmit the data to the new main storage node. The precondition of the experiment is that each node stores approximately 10,000 user data records and the heartbeat interval is set to 0.5 minutes. The experimental results are shown in Fig. 4.

From Fig. 4, the recovery time becomes shorter as the parallel recovery factor grows; more recovery nodes bring a shorter recovery time, and this level of recovery time is entirely acceptable in a carrier-grade network. The recovery time depends on the parallel factor and the processing capacity of the main storage node. Although the parallel factor can be made large, the processing capacity of a peer is limited, so the recovery time does not decline linearly with the parallel factor. As the figure shows, the recovery time declines slowly after a parallel factor of 6 and hardly changes at all beyond 10; this appears to be a bound imposed by the peer's processing capacity.

4.3 Bandwidth consumption

The disaster recovery backup system in this paper achieves high data reliability and recoverability at the cost of bandwidth. Bandwidth is consumed by the main storage node sending heartbeat messages to the node that stores the first backup segment in the same region and to the node that holds the backup data in another region; in this way the main storage node can detect backup node failures and keep the data highly available and quickly recoverable. The other main consumption is data recovery itself. Since the bandwidth consumption of heartbeats is very small, we focus on the bandwidth consumption of data recovery. The experimental data are shown in Table 1.

Table 1: Bandwidth consumption for a single node failure under different parallel recovery factors

Parallel recovery factor    Bandwidth consumption (KB/s)
4                           1.64
5                           1.69
6                           1.76
7                           1.81
8                           1.85

From Table 1, the bandwidth consumption increases with the parallel recovery factor, but only slightly. It varies only because the number of parallel recovery nodes differs: since each recovery message carries a lightweight header, more parallel recovery nodes mean more messages are transmitted and therefore somewhat more bandwidth. The change is almost linear. From this point of view alone, the parallel recovery factor should be as small as possible.

Considering Fig. 4 and Table 1 together, the recovery time has a lower bound set by the processing capacity of the serving node, while the bandwidth consumption grows slightly with the parallel recovery factor. We define the benefit as the decline of the recovery time minus the increase of the bandwidth consumption as the parallel recovery factor increases; the resulting yield curve is shown in Fig. 5. From Fig. 5, the benefit reaches its maximum when the parallel recovery factor is six, so the best parallel recovery factor in our simulation environment is six. Table 2 shows the bandwidth consumption in the different failure models with the parallel recovery factor set to six. The bandwidth consumption in all three failure models is very small; even in the nonlocal batch nodes failure model it is only 861.39 KB/s, which is almost negligible. The simulation experiments above show that our disaster recovery backup strategy is feasible and that the targets listed in Section 3.3 can be achieved.

[Fig. 5: The yield with the change of the parallel recovery factor]

Table 2: Bandwidth consumption in different failure models (parallel recovery factor = 6)

Failure model                         Failure ratio    Bandwidth consumption (KB/s)
single node failure model             1%               1.76
                                      5%               35.64
                                      10%              76.87
nonlocal batch nodes failure model    50 fails         95.76
                                      100 fails        218.31
                                      200 fails        464.52
                                      500 fails        861.39
local batch nodes failure model       100%             14.95

5 Conclusion

This paper describes a disaster recovery backup system for VoIP services over a P2P network. The system implements reliable storage and rapid recovery of VoIP user data with high availability, scalability and carrier-class guarantees. After analyzing the characteristics of the specific VoIP business, namely the flexible role of the common client in the system and the small, frequently changing amount of data per user, we put forward the design of the storage system, including the core data storage controller strategy, the routing mechanism and the TLS transmission mode. Its innovations lie in the determination of the number of data replicas, the N:N data placement strategy and the multi-node parallel recovery mechanism. Finally, we simulated and ran the prototype in an experimental environment with thousands of nodes. The results show that the design of the system can meet the expected targets. The design and implementation in this paper have practical significance.

Acknowledgements

Important National Science & Technology Specific Projects: Next-Generation Broadband Wireless Mobile Communications Network (2010ZX03004-001-01); Innovative Research Groups of the National Natural Science Foundation of China (61121061); National Key Basic Research Program of China (973 Program) (2009CB320504).

References

[1] Ghemawat, S., Gobioff, H., and Leung. The Google File System. In Proc. of ACM SOSP, 2003.
[2] Zheng Zhang, Qiao Lian, Shiding Lin et al. BitVault: A Highly Reliable Distributed Data Retention Platform [J]. Operating Systems Review, 2007, 41(2): 27-36. DOI: 10.1145/1243418.1243423.
[3] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani et al. Dynamo: Amazon's Highly Available Key-value Store [J]. Operating Systems Review, 2007, 41(6): 205-220. DOI: 10.1145/1323293.1294281.
[4] Guangyu Shi, Jian Chen, Hao Gong et al. SandStone: A DHT Based Carrier Grade Distributed Storage System [C]. 2009 38th International Conference on Parallel Processing (ICPP 2009), 2009: 420-428.
[5] Cheng Lan, Luo Jian. Active Super Node Based Layered DHT P2P Model Research [J]. Aeronautical Computing Technique, 2012, 42(5): 131-134.
[6] Yu Zhang, Yuanda Cao, Baodong Cheng. A Layered P2P Network Topology Based on Physical Network Topology [C]. The 4th International Conference on Wireless Communications, Networking and Mobile Computing, 2008: 1-4.
[7] S. Saroiu, P. K. Gummadi, and S. D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems [C]. Proceedings of Multimedia Computing and Networking (MMCN), San Jose, January 2002.
[8] Zeng Rongfei, Jiang Yixin, Lin Chuang et al. A Distributed Fault/Intrusion-Tolerant Sensor Data Storage Scheme Based on Network Coding and Homomorphic Fingerprinting [J]. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(10): 1819-1830.
[9] Zhonghua Li. Network Data Disaster Recovery Scheme for Bulk Data of E-Commerce and E-Government [J]. Journal of Computational Information Systems, 2009, 5(2): 839-84.
[10] Ye Ping, Li Yizhong, Xia Qin. Strategy Research on Delay Optimizing Oriented Overlay Routing [J]. Chinese Journal of Computers, 2010, 33(1): 36-44. DOI: 10.3724/SP.J.1016.2010.00036.
[11] Cui Jianqun, Lai Mincai, Jiang Wenbin. OverSim: A Scalable Application Layer Multicast Network Simulation Framework [J]. Computer Engineering and Science, 2012, 34(10): 1-5. DOI: 10.3969/j.issn.1007-130X.2012.10.001.