Mathematical Framework for A Novel Database Replication Algorithm

I.J. Modern Education and Computer Science, 2013, 9, 1-10
Published Online October 2013 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijmecs.2013.09.01
Copyright 2013 MECS

Sanjay Kumar Yadav
Dept. of Computer Science & Information Technology, Sam Higginbottom Institute of Agriculture, Technology & Sciences - Deemed University, Allahabad, India
Email: yadav_sk@rediffmail.com

Gurmit Singh
Dept. of Computer Science & Information Technology, Sam Higginbottom Institute of Agriculture, Technology & Sciences - Deemed University, Allahabad, India
Email: gurmitsingh3@rediffmail.com

Divakar Singh Yadav
Department of Computer Science & Engineering, Institute of Engineering and Technology, Lucknow, India
Email: divakar_yadav@rediffmail.com

Abstract
In this paper, a detailed overview of database replication is presented. Thereafter, the recently published PDDRA (Pre-fetching based dynamic data replication algorithm) is detailed. Modifications to this algorithm are then suggested to minimize the delay in data replication. Finally, a mathematical framework is presented to evaluate the mean waiting time before data can be replicated on the requested site.

Index Terms: database replication, throughput, average delay

I. INTRODUCTION
A database system is one of the computer systems that offer efficient data storage facilities to applications. A database system is used to control a collection of data items. Database systems play a vital role in contemporary applications, such as administration, social sites, search engines, and banking systems. Database systems offer abstractions, data consistency and concurrent data access; because of these, database systems have had huge success in real-world applications. A database system [1] 1) provides an interface which can be used to solve the problems of data storage and retrieval; 2) allows concurrent data access while maintaining data integrity; 3) survives server crashes or power failures without corrupting data. Scalability and performance are the key problems as the database system gets bigger.
When a database system grows from a smaller system to a larger one, performance degrades, and at some point performance can become a bottleneck in the database system. Because of this, much research has been done in these areas of database systems [1]. Replication is one of the good ways to increase the performance of a database system, by spreading the database over different servers. The workload on a single server can be decreased by maintaining different database servers [2]. Replication is an efficient method to achieve optimized access to data and high performance in distributed environments [3]. Replication has been used in distributed computing for a long time [4]. Replication creates several copies of the original file (called replicas) and distributes them to multiple sites. This provides remarkably higher access speeds than having just a single copy of each file. Besides, it can effectively enhance data availability, fault tolerance, reliability, system scalability and load balancing by creating replicas and dispersing them among multiple sites [4, 5]. The three fundamental questions any replication strategy has to answer are [6]: When should the replicas be created? Which files should be replicated? Where should the replicas be placed? Depending on the answers, different replication strategies are born.

Ming Tang et al. suggested two replication algorithms in [7]: Simple Bottom Up (SBU) and Aggregate Bottom Up (ABU) for multi-tier data sites. The basic idea of these algorithms is to create the replicas as close as possible to the clients that request the data files at high rates exceeding a pre-defined threshold. In [8] a Popularity Based Replica Placement (PBRP) algorithm was proposed. This algorithm tries to decrease data access time by dynamically creating replicas for popular data files. Ruay-Shiung Chang et al. proposed a dynamic data replication mechanism in [9], called Latest Access Largest Weight (LALW). The design of the architecture is based on centralized data replication management. LALW selects a popular file for replication and calculates a suitable number of copies and grid sites for replication. In [10] a dynamic data replication strategy called FIRE was proposed. In this method each site maintains a file access table to keep track of its local file access history. In another paper [11] a new replication algorithm named Modified BHR was proposed. The proposed algorithm is based on network-level locality. It tries to replicate files within a region and stores the replica at a site where the file has been accessed frequently, on the assumption that it may be required there in the future.

As the related work detailed above shows, data throughput and average fetching delay are the two important parameters in data replication. In this work an existing PDDRA algorithm [12] is discussed and modifications are suggested to improve its performance. Finally a mathematical model is presented to obtain throughput and average delay.

The remainder of this paper is organized as follows: Section 1 introduces distributed databases and different replication strategies; Section 2 presents the overview of database replication. The concept and context of database replication and the design issues in distributed real-time replicated database systems are detailed in Section 3. In Section 4, the existing PDDRA algorithm is detailed, with its limitations. Section 5 explains our proposed scheme and the mathematical framework for a novel database replication algorithm. Conclusion and future work are given in the final section.

II. OVERVIEW OF DATABASE REPLICATION
Replication is the method of sharing information so as to ensure consistency between redundant resources, such as hardware or software components, to enhance reliability, fault tolerance, or accessibility. It is data replication if the same data is stored on multiple storage devices. Replication is the mechanism that automatically copies directory data from one Directory Server to another. Fig. 1 shows the basic replication model.
In this model the user or client does not know that multiple physical copies of the data exist. Data replication combines database and distributed-system techniques. Database replication can be defined as the process of creating and maintaining duplicate copies of database objects in a distributed database system [13]. Using replication, any directory tree or sub-tree (stored in its own database) can be copied between servers. The Directory Server holding the master copy of the information automatically copies every update to all replicas. In computation replication, by contrast, the same computing job is executed several times.

Fig. 1: basic data replication model

A computational job is typically replicated in space, i.e. executed on different devices, or it can be replicated in time, if it is executed again and again on a single device. Access to a replicated entity is generally uniform with access to a single, non-replicated entity. The replication itself should be transparent to an external user. The main features of database replication are as follows [14, 15]:

1) Database Locality
This feature of database replication maintains the database locally so that geographically distant users can access data at high speed. These users can access data from local servers instead of distant servers, because access to a local server is much faster than access over a long-distance network. Providing the database as close to the user as possible contributes to higher system performance.

2) Performance
Database replication typically focuses on improving either read performance or write performance; improving both simultaneously is a more challenging task. When an application is widely used across a large network but the database is stored at a single server, the database server can become the bottleneck of the system and the whole system slows down, i.e. slow response time and low request throughput. Multiple replicas allow the system to serve the data in parallel.
3) Availability and Fault Tolerance
High availability of a database requires low downtime of the database system. Database systems experience two kinds of downtime: planned and unplanned. Planned downtime is incurred during maintenance operations on the software and hardware. Unplanned downtime can strike at any time and is due to predictable or unpredictable failures such as hardware failures, software bugs, human error, etc. Downtime is usually the primary optimization area of database replication to increase database availability. If a database item is stored at a single server and that server does not respond, is down, or has crashed, the item becomes unavailable. Database replication solves this problem and provides a fault-tolerant database system.

2.1 Types of Database Replication
A replica of a database server can provide the data item to users during a server failure. The replica can also be used for restoring the data of failed servers. In this way database replication increases data availability and forms a fault-tolerant system [14]. There are three different ways of database replication:

1) Snapshot Replication
Data on one database server is plainly copied to another database server, or to another database on the same server (Fig. 2). The snapshot replication method functions by periodically sending data in bulk format. It is usually used when the subscribing servers can function in a read-only environment and can function for some time without updated data. The period for which a subscriber functions with data that has not been updated is referred to as latency. Snapshot replication works by reading the published database and creating files in the working folder on the distributor. These files are called snapshot files and contain the data from the published database as well as some additional information that helps create the initial copy on the subscription server [16].

Fig. 2: schematic of snapshot replication

2) Merge Replication
Data from two or more databases is combined into a single database. Merge replication is the process of distributing data from Publisher to Subscribers, allowing the Publisher as well as the Subscribers to make updates while connected or disconnected, and then merging the updates between sites when they are connected. Merge replication allows distinct sites to work autonomously and at a later time merge updates into a single, uniform result. Merge replication includes default and custom choices for conflict resolution that you can define as you configure a merge publication.
When a conflict happens, a resolver is invoked by the Merge Agent and determines which data will be accepted and propagated to other sites.

3) Transaction Replication
Users obtain complete initial copies of the database and then obtain periodic updates as data changes. In transactional replication, each committed transaction is replicated to the subscriber as it takes place. You can control the replication process so that it either accumulates transactions and sends them at timed intervals, or transmits all changes as they occur. Transactional replication is used in environments that require lower latency and have higher-bandwidth connections. It requires a reliable and continuous connection, because the Transaction Log grows quickly, and if the server is unable to connect for replication the log might become unmanageable. Transactional replication begins with a snapshot, which sets up the initial copy. That copy is later updated by the copied transactions. You can choose how often to update the snapshot, or choose not to update the snapshot after the very first copy. Once the initial snapshot has been copied, transactional replication, using the Log Reader agent, reads the Transaction Log of the published database and stores new transactions in the distribution database. The transactions are then transferred from the publisher to the subscriber by the Distribution agent.

III. CONCEPT OF REPLICATED DATABASES
To better understand the method behind database replication we start with the term replication, which represents the process of sharing information to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, accessibility or fault tolerance. It is data replication if the same data is stored on multiple storage devices, or computation replication if the same computing task is executed many times [17]. The availability of certain replication databases can be improved by using database mirroring. Support for combining transactional replication with database mirroring depends on which replication database is being considered.
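
As a rough illustration of the transactional replication flow described earlier (an initial snapshot, after which committed transactions are read from the publisher's log, queued, and applied at the subscriber), the following Python sketch may help. It is not tied to any particular database product: the names `Publisher`, `Subscriber`, `take_snapshot`, `log_reader` and `distribution_agent` are hypothetical, databases are modelled as plain dictionaries, and the distribution database is modelled as a list.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # committed transactions, in commit order

    def commit(self, key, value):
        # Apply the change and append it to the transaction log.
        self.data[key] = value
        self.log.append((key, value))


@dataclass
class Subscriber:
    data: dict = field(default_factory=dict)


def take_snapshot(pub, sub):
    """Set up the initial copy; return the log position the snapshot covers."""
    sub.data = dict(pub.data)
    return len(pub.log)


def log_reader(pub, position, distribution_db):
    """Copy transactions committed after `position` into the distribution store."""
    distribution_db.extend(pub.log[position:])
    return len(pub.log)


def distribution_agent(distribution_db, sub):
    """Apply queued transactions to the subscriber in commit order."""
    while distribution_db:
        key, value = distribution_db.pop(0)
        sub.data[key] = value


pub, sub, dist = Publisher(), Subscriber(), []
pub.commit("a", 1)
pos = take_snapshot(pub, sub)      # initial copy
pub.commit("a", 2)
pub.commit("b", 3)
pos = log_reader(pub, pos, dist)   # pick up the two new transactions
distribution_agent(dist, sub)      # subscriber now matches the publisher
```

Tracking the log position returned by the snapshot is what keeps transactions already contained in the initial copy from being applied a second time.
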
Peer-to-peer replication in combination with database mirroring is not supported. The replication agents that connect to the publication database can automatically fail over to the mirrored publication database. In the event of a failure, the agents that connect to the publication database automatically reconnect to the new principal database.

The source, in a replication building block, is generally a database that contains data to be replicated. One database can be the source for several replication building blocks. Further, the source database can also serve as the target for another replication building block. The following example makes this clearer: in the Master-Master Replication pattern, the same pair of data stores swap roles (source becomes target, and target becomes source) for a common movement set that is updateable in either data store.

A computational task is typically replicated in space, i.e. executed on different devices, or it can be replicated in time, if it is executed again and again on an individual device. Access to a replicated entity is typically uniform with access to a single, non-replicated entity. The replication itself should be transparent to an external user. Additionally, in a failure scenario, a failover of replicas is concealed as much as possible.

Replication is the key characteristic in improving the availability of data in distributed real-time systems. Replicated data is stored at multiple server sites so that it can be accessed by the user even when some of the copies are not available due to server or site failures [18]. A major restriction on using replication is that replicated copies must behave like a single copy, i.e. internal consistency as well as mutual consistency must be preserved. Synchronization techniques for replicated data in distributed database systems have been studied in order to increase the degree of consistency and to reduce the possibility of transaction rollback [19].

In replicated database systems, copies of the data items can be stored at multiple servers and in a number of places. The potential of data replication for high data availability and improved read performance is crucial to distributed real-time database systems (DRTDBS). However, data replication brings its own complications. Access to a data item is no longer controlled exclusively by a single server; rather, access control is distributed across the servers that each store a copy of the data item. It is essential to ensure mutual consistency of the replicated data; the system must fulfill the ACID properties of a database.

It is common to talk about active and passive replication in systems that replicate data or services. In active replication, the same request is processed at every replica, while in passive replication, each request is processed on a single replica and its state is then transferred to the other replicas. If at any time one master replica is designated to process all the requests, we are talking about the primary-backup scheme (master-slave scheme) predominant in high-availability clusters.
On the other hand, if any replica can process a request and then distribute a new state, this is a multi-primary scheme (called multi-master in the database field). In the multi-primary scheme, it is necessary to use some form of distributed concurrency control, such as a distributed lock manager. Load balancing is different from task replication: it distributes a load of different (not the same) computations across machines, and allows a single computation to be dropped in case of a failure. Load balancing, however, sometimes uses data replication internally, especially in multi-user settings, to distribute its data among machines.

To cope with the complexity of replication, the notions of group (of servers) and group communication primitives have been introduced [20]. The notion of a group, acting as a logical addressing mechanism, allows the client to ignore the degree of replication and the identity of the individual server processes of a replicated service. Group communication primitives provide one-to-many communication with various powerful semantics. These semantics hide much of the complexity of maintaining the consistency of replicated servers. The two main group communication primitives are Atomic Broadcast (ABCAST) and View Synchronous Broadcast (VSCAST). We give here an informal definition of these primitives. A more formal definition of ABCAST and of VSCAST can be found in [21] and [22] respectively (see also [23, 24]). Group communication properties can also feature FIFO order guarantees.

Even though data replication is used to create instances of the same data, or of parts of the same data, it must not be confused with backup: replicas are frequently updated and quickly lose any historical state, whereas a backup saves a copy of data unchanged for a long period of time.

Active replication, also called the state machine approach [25], is a non-centralized replication technique. Its key concept is that all replicas receive and process the same sequence of client requests.
Consistency is guaranteed by assuming that, when supplied with the same input in the same order, the replicas produce the same output. This assumption implies that servers process requests in a deterministic way. Clients do not contact one specific server, but address servers as a group. In order for servers to receive the same input in the same order, an Atomic Broadcast can be used to propagate the client requests to the servers. Weaker communication primitives can also be used if semantic information about the operation is known (e.g., two requests that commute do not have to be delivered at all servers in the same order). The main advantages of active replication are its simplicity (e.g., same code everywhere) and failure transparency. Failures are fully concealed from the clients, because if a replica fails, the requests are still processed by the other replicas. The major drawback of this approach is the determinism constraint.

The basic principle of passive replication, also called primary-backup replication, is that clients send their requests to a primary, which executes the requests and then sends update messages to the backups. The backups do not execute the invocations; they apply the changes produced by the invocation execution at the primary, that is, the updates. As a result, no determinism constraint is necessary on the execution of invocations, which removes the main disadvantage of active replication. Communication between the primary and the backups has to guarantee that updates are processed in the same sequence, which is the case if primary-backup communication is based on FIFO channels. However, FIFO channels alone are not enough to ensure correct execution if the primary fails. For example, consider that the primary fails before all backups receive the updates for a given request, and another replica takes over as the new primary. Some mechanism has to ensure that updates sent by the new primary are properly ordered with regard to the updates sent by the faulty primary.
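
The primary-backup update flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes `Primary` and `Backup` are hypothetical, per-update sequence numbers stand in for a FIFO channel, and the sketch deliberately does not handle the primary-failure case discussed in the text.

```python
class Backup:
    """Applies updates from the primary strictly in sequence order."""

    def __init__(self):
        self.state = {}
        self.next_seq = 0
        self.pending = {}  # updates that arrived ahead of their turn

    def receive(self, seq, key, value):
        self.pending[seq] = (key, value)
        # Apply every update whose predecessors have all been applied.
        while self.next_seq in self.pending:
            k, v = self.pending.pop(self.next_seq)
            self.state[k] = v
            self.next_seq += 1


class Primary:
    """Executes requests itself and ships the resulting updates."""

    def __init__(self):
        self.state = {}
        self.seq = 0

    def execute(self, key, fn):
        # The request (fn) runs only here; backups receive the resulting
        # value, so fn does not need to be deterministic.
        value = fn(self.state.get(key, 0))
        self.state[key] = value
        update = (self.seq, key, value)
        self.seq += 1
        return update


primary, backup = Primary(), Backup()
u0 = primary.execute("x", lambda v: v + 5)
u1 = primary.execute("x", lambda v: v * 2)
# Deliver the updates out of order; the backup buffers u1 until u0 arrives.
backup.receive(*u1)
backup.receive(*u0)
```

Because the backup applies values rather than re-running requests, it converges to the primary's state even when the request logic is non-deterministic, which is the property the text attributes to passive replication.
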
VSCAST is a mechanism that guarantees these ordering constraints when the primary fails, and can usually be used to implement the primary-backup replication technique [26]. Passive replication can tolerate non-deterministic servers (e.g., multi-threaded servers) and uses little processing power when compared to other replication techniques. However, when the primary fails, passive replication suffers from a high reconfiguration cost.

3.1 Context of Database Replication
Replication Techniques in Distributed Systems organizes and surveys the spectrum of replication protocols and systems that achieve high availability by replicating entities in failure-prone distributed computing environments. Transaction-level data can be duplicated to the replica database. The result is greater data integrity and availability. However, the increased availability depends on how independent the database replica is from the primary database. Replica independence must be considered in terms of disk spindles, disk controller, system, power supplies, room, building and city. While data copying can provide users with local and much quicker data access, the problem is to provide these copies to users so that the overall system operates with the same integrity and management capacity that is available within a centralized model. Managing replicated data is significantly more complicated than running against a single-location database. It involves all of the design and implementation issues of a single location and, additionally, the complexity of distribution, remote administration and network latency.

3.2 Issues in Distributed Real-Time Replicated Database Systems: Design Issues

1) Replication Set Size
Decide whether to replicate a subset of a table, an entire table, or data from more than one table. This is a trade-off among the amount of data that changes, the complexity of the link and the overall table size.

2) Transmission Volume
The right amount of data to transmit should be chosen. The decision between sending all changes for any single row, or just the net effect of all the changes, is a key one.
3) Replication Set Data Changes at the Target
If these have to occur and the source wants to view the changes, then try to make the changes naturally non-conflicting to avoid the need for conflict detection and resolution.

4) Replication Frequency
Decide the appropriate timing of the replication for the requirements and optimize the use of computing resources.

5) Replication Unit
As explained earlier, a replication set consists of a group of replication units. Identify the unit of data that will be transmitted from the source to the target. In the most demanding case, this is a transaction as it was executed on the source. An easier to achieve but less precise option is to move a changed row. For environments with a high risk of conflicts, it can also be a single change in a cell within a record.

6) Initiator
Decide whether the target pulls the data or the source pushes it, and make sure that throughout your replication topology these decisions do not cause later replication links to have problems meeting their operational requirements.

7) Locking Issues
Verify that you can accept the locking impact of the replication on the source. If not, verify that a small decrease in consistency at a point in time is acceptable for the targets so you can avoid lock conflicts.

8) Replication Topology
The players, their roles and the overall integrity must be identified.

9) Security
Ensure that the replicated data is treated with the right level of security at the target given the source security conditions. In addition, verify that your replication link is secure enough for the overall topology requirements.

10) Key Updates
Verify whether the source allows updates to the key of records belonging to the replication set. If so, take special care to replicate such operations consistently. Key updates are SQL updates to the columns of the physical key within a replication set. The replication must handle such key updates specially.
11) Referential Integrity
Verify whether the target has implemented referential integrity. If so, you need rules to prevent changes from the replication link being applied twice if the change triggers a target change in another replicated table.

IV. RELATED WORKS

PDDRA: Pre-fetching Based Dynamic Data Replication Algorithm [12]

Replication is an efficient method to achieve optimized access to data and high performance in distributed environments. Replication has been used in distributed computing for a long time. This technique is clearly applicable to data distribution problems such as those of the High Energy Physics community, where several thousand physicists want to access the terabytes and even petabytes of data produced every year. Making copies or replicas of the dataset and storing these replicas among multiple sites is a reasonable approach because of the geographic distribution of the corporation in a data grid. Replication creates several copies of the original file (called replicas) in the data grid and distributes them to multiple grid sites. This provides remarkably higher access speeds than having just a single copy of each file. Besides, it can effectively enhance data availability, fault tolerance, reliability, system scalability and load balancing by creating replicas and dispersing them among multiple sites.

Data replication not only reduces data access costs but also increases data availability in many applications. If the required files are replicated at the site where the job is executed, the job can process data without communication delay. However, if the required files are not stored locally, they are fetched from remote sites. This fetching takes a long time due to the large size of the files and the limited network bandwidth between sites. Therefore it is better to pre-fetch and pre-replicate the files that are likely to be requested in the near future. This increases data availability.

In this section an algorithm for data replication is presented; this algorithm is based on pre-fetching [12]. To increase system performance and reduce response time and bandwidth consumption, it is better to pre-fetch some replicas for the requester grid site. These replicas will be requested in the near future with high probability, so it is better to replicate them to the requester node; the next time the grid site needs them, it accesses them locally, decreasing access latency and response time. The architecture of the algorithm is illustrated in Fig. 3. As shown in Fig. 3, the grid sites are located at the lowest level of the architecture. These grid sites consist of Storage and/or Computing Elements.

Fig. 3: existing pre-fetching based dynamic data replication algorithm (PDDRA)
Multiple grid sites constitute a Virtual Organization (VO). There is a Local Server (LS) for every VO, and the Replica Catalog (RC) is located at the Local Server. It is worth mentioning that the available bandwidth between the sites within a VO is higher than the bandwidth between Virtual Organizations; hence accessing a file located in the current VO is faster than accessing one located in another VO. In the upper layer there is a Regional Server (RS), and each RS consists of one or more VOs. Regional Servers are connected through the internet, so transferring files between them takes a long time. There is also a Replica Catalog located at each RS that is a directory of all the files stored in that region. Whenever a file that is not stored in the current VO is required, the RC of the RS is asked to determine which VOs have the requested file.

Suppose that grid site A requests a file that is not stored locally. It therefore asks the RC to determine which sites have the requested file. To reduce access latency, bandwidth consumption and response time, it is better to pre-fetch replicas that are likely to be requested by the requester grid site in the near future. When a required file is not in the current VO and is stored in other VOs, a request is sent to the RS. The RS then searches its Replica Catalog table and determines the locations of the requested file in other VOs. In such situations only the required file is replicated: because of the low bandwidth between VOs, the high propagation delay and the consequently high replication cost, pre-fetching is not advantageous and is not done. In addition, in [7] the authors assume that members of a VO have similar interests in files, so the file access patterns of different VOs differ. Consequently, a file from a different VO should not be pre-fetched for the requester grid site, because their requirements and access patterns are different. So only the required file is replicated and pre-fetching is not performed.
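The lookup order described above (requester site, then the Replica Catalog of the VO, then the Replica Catalog of the RS) can be sketched in Python as follows. This is only an illustrative sketch: the names `VirtualOrganization`, `RegionalServer` and `locate` are hypothetical and do not come from the PDDRA paper, and returning "no pre-fetch" for a file that is already local is an assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganization:
    """A VO whose Local Server holds a Replica Catalog (file -> grid sites)."""
    name: str
    replica_catalog: dict = field(default_factory=dict)

    def sites_holding(self, file_name):
        # Sorted list of grid sites in this VO that store the file.
        return sorted(self.replica_catalog.get(file_name, ()))


@dataclass
class RegionalServer:
    """An RS whose Replica Catalog covers every VO in its region."""
    vos: list

    def vos_holding(self, file_name, exclude):
        # Names of the other VOs in the region that store the file.
        return [vo.name for vo in self.vos
                if vo.name != exclude and vo.sites_holding(file_name)]


def locate(file_name, site, home_vo, rs):
    """Return (scope, locations, prefetch) for a request issued by `site`."""
    sites = home_vo.sites_holding(file_name)
    if site in sites:
        # Already stored at the requester (assumption: nothing to pre-fetch).
        return ("local", [site], False)
    if sites:
        # High bandwidth inside the VO: replicate and pre-fetch.
        return ("same-vo", sites, True)
    remote = rs.vos_holding(file_name, exclude=home_vo.name)
    if remote:
        # Low bandwidth between VOs: replicate only the required file.
        return ("other-vo", remote, False)
    return ("not-found", [], False)
```

The third return value captures the rule stated in the text: pre-fetching is considered only when the file is served from inside the requester's own VO.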
The algorithm is built on one assumption: members of a VO have similar interests in files. To predict future accesses, the past sequence of accesses must be stored. Files that will be accessed in the near future can then be predicted by mining the past file access patterns. PDDRA consists of three phases:

1) Phase 1: Storing file access patterns. In this phase, file access sequences and data access patterns are stored in a database.
2) Phase 2: Requesting a file and performing replication and pre-fetching. In this phase a grid site asks for a file and replication is carried out for it, if it is beneficial. Adjacent files of the requested file are also pre-fetched for the requester grid site in this phase.
3) Phase 3: Replacement. If there is enough space in the storage element for a new replica, it is stored; otherwise an existing file must be selected for replacement.
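The three phases can be illustrated with a small Python sketch. It rests on several assumptions that are not taken from the PDDRA paper: "adjacent files" are approximated by counting which file followed which in the access sequence, replication is always treated as beneficial, and the replacement victim is chosen by least-recently-used order. The class name `PddraSite` and its parameters are hypothetical.

```python
from collections import Counter, OrderedDict, defaultdict


class PddraSite:
    """Toy model of one grid site going through the three phases."""

    def __init__(self, capacity, prefetch_count=1):
        if prefetch_count >= capacity:
            raise ValueError("prefetch_count must be smaller than capacity")
        self.capacity = capacity                 # maximum number of replicas held
        self.prefetch_count = prefetch_count     # adjacent files fetched per request
        self.storage = OrderedDict()             # least recently used file first
        self.successors = defaultdict(Counter)   # Phase 1 "database" of patterns
        self.last_file = None

    def _record_access(self, file_name):
        # Phase 1: remember which file followed the previously requested one.
        if self.last_file is not None and self.last_file != file_name:
            self.successors[self.last_file][file_name] += 1
        self.last_file = file_name

    def _store(self, file_name):
        # Phase 3: store the replica, evicting the LRU file if space runs out.
        if file_name in self.storage:
            self.storage.move_to_end(file_name)
            return
        if len(self.storage) >= self.capacity:
            self.storage.popitem(last=False)
        self.storage[file_name] = None

    def request(self, file_name):
        # Phase 2: replicate the requested file and pre-fetch likely successors.
        self._record_access(file_name)
        hit = file_name in self.storage
        self._store(file_name)
        ranked = self.successors[file_name].most_common(self.prefetch_count)
        predicted = [name for name, _ in ranked]
        for name in predicted:
            self._store(name)
        return hit, predicted
```

Calling `request` returns whether the file was already held locally and which files were pre-fetched alongside it; a real implementation would replace the successor counter with the pattern mining used by PDDRA and add the benefit check of Phase 2.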

Limitations of Existing PDDRA

1) The PDDRA algorithm tries to minimize the access time using a pre-fetching mechanism. However, because of the limited bandwidth of the access network, it may sometimes not be possible to fetch the data at will; the request then waits in a queue, which leads to further waiting and in turn increases the replication time.
2) Members of a VO may have different interests.

V. PROPOSED SCHEME

1. In the proposed scheme the internet cloud is considered the master node, as it can be assumed that the data is available on the internet for replication.
2. If any VO searches for data, it first searches in the RS and then on the internet. If the data is locally available at any RS, it is replicated from there and there is no need to connect through the master node.
3. There is a possibility that the data is not available at the RS; hence, a simultaneous request is sent to both the RS and the master node. If access to the master node is queued for, say, a time $t_q$, then the local search at the RS is done for a time $t_s < t_q$.
4. The three phases of PDDRA are implemented as explained above.
5. Simplified Mathematical Framework

The replicated data is available either locally or globally. Therefore, some of the generated requests are fulfilled locally and the remaining requests are fetched from the internet (master node). In this section a mathematical framework is developed to estimate the average response time of all the transactions.

Table 1: List of Parameters

Parameter          Meaning
$n$                Number of sites
$i$                Transaction type
$\zeta_i$          Percentage of transactions of type $i$
$\lambda_i$        Transaction arrival rate
$\mu_i$            Mean service time for the $i$-th transaction type
$p$                Probability of local transaction execution
$t_{send}^{i}$     Mean time to send a transaction of type $i$
$t_{return}^{i}$   Mean time to return a query result

Transaction Processing and Arrival Rates

The arrival of update and query transactions is random in nature. However, in most cases it is a rare event for a particular node.
Hence, the arrival of update and query transactions from every node is assumed to be a Poisson process, and their sum is then also Poisson. Where the arrival of update and query transactions is frequent, however, a Bernoulli model can be used. Update transactions are assumed to be propagated asynchronously to the secondary copies. Furthermore, transactions are assumed to be executed at a single site, either the local site or a remote site.

The performance of replicated databases can be improved if the requirement of mutual consistency among the replicas of a logical data item is relaxed. Various concepts of relaxed coherency can be denoted by coherency conditions, which allow calculating a coherency index $k \in [0,1]$ as a measure of the degree of allowed divergence. Small values of $k$ express high relaxation, $k = 0$ models suspended update propagation, and for $k = 1$ updates are propagated immediately.

Taking locality, update propagation, and relaxed coherency into account, the total arrival rate $\lambda_i^T$ of transactions of type $i$ at a single site amounts to

$$\lambda_i^T = p\,\lambda_i + (n-1)\,(1-p)\,\lambda_i \cdot \frac{1}{n-1} \qquad (1)$$

Fig. 4: Modified pre-fetching based dynamic data replication algorithm (PDDRA)

The first term, $p\,\lambda_i$, describes the share of the incoming transactions that can be executed locally, whereas the remaining $(1-p)\,\lambda_i$ transactions are forwarded to nodes where appropriate data is available. The other $n-1$ nodes also forward $(1-p)$ of their $\lambda_i$ transactions, which are received by each of the remaining databases

with equal probability $1/(n-1)$. The above formula simplifies to

$$\lambda_i^T = \lambda_i, \qquad \lambda^{Tot} = \sum_i \lambda_i^T = \sum_i \lambda_i \qquad (2)$$

The mean waiting time $W$ at a local database is found to be

$$W = \frac{\sum_i \lambda_i^T\,\mu_i^2}{1 - \sum_i \lambda_i^T\,\mu_i} \qquad (3)$$

The mean waiting time at a local database site is the time that a user or transaction spends in a queue waiting to be serviced. The response time, in contrast, is the total time that a job spends in the queuing system; in other words, it is the sum of the waiting time and the service time. On average, a transaction needs to wait for $W$ seconds at a database node to receive a service of $\mu_i$ seconds. Additionally, with probability $(1-p)$ a transaction needs to be forwarded to a remote node, which takes $W_C$ seconds of waiting plus the time to be sent and returned. Thus, the response time is given by

$$R_i = W + \mu_i + (1-p)\,\left(W_C + t_{send}^{i} + t_{return}^{i}\right) \qquad (4)$$

and the average response time over all transaction types results in

$$\bar{R} = \sum_i \zeta_i\,R_i \qquad (5)$$

VI. CONCLUSION

In this paper, the recently published PDDRA (Pre-fetching based Dynamic Data Replication Algorithm) [12] is modified. A mathematical framework is then presented to evaluate the mean waiting time before data can be replicated at the requesting site. In future work, simulation results will be presented to obtain the mean waiting time and throughput.

REFERENCES

[1] R. Elmasri and S. B. Navathe. Fundamentals of Database Systems [B]. The Benjamin/Cummings Publishing Company, Inc., 1994.
[2] Fredrik Nilsson, Patrik Olsson. A survey on reliable communication and replication techniques for distributed databases [B].
[3] A. Dogan, A study on performance of dynamic file replication algorithms for real-time file access in data grids, [J] Future Generation Computer Systems 2009, 25 (8): 829-839.
[4] R.-S. Chang, P.-H. Chen, Complete and fragmented selection and retrieval in data grids, [J] Future Generation Computer Systems, 2007, 23: 536-546.
[5] Yair Amir, Alec Peterson, and David Shaw.
Seamlessly Selecting the Best Copy from Internet-Wide Replicated Web Servers [C]. Proceedings of the International Symposium on Distributed Computing (Disc98), LNCS 1499, pages 22-33, Andros, Greece, September 1998.
[6] I. Foster, K. Ranganathan, Design and evaluation of dynamic replication strategies for a high performance Data Grid, [C] in: Proceedings of International Conference on Computing in High Energy and Nuclear Physics, China, September 2001.
[7] M. Tang, B.S. Lee, C.K. Yao, X.Y Tang, Dynamic replication algorithm for the multi-tier data grid, [J] Future Generation Computer Systems 2005, 21 (5): 775-790.
[8] M. Shorfuzzaman, P. Graham, R. Eskicioglu, Popularity-driven dynamic replica placement in hierarchical data grids, [C] in: Proceedings of Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies, 2008, 524-531.
[9] R.-S. Chang, H.-P. Chang, Y.-T. Wang, A dynamic weighted data replication strategy in data grids, [J] The Journal of Supercomputing, 2008, 45 (3): 277-295.
[10] A.R. Abdurrab, T. Xie, FIRE: a file reunion data replication strategy for data grids, [C] in: 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010, 215-223.
[11] K. Sashi, A.S. Thanamani, Dynamic replication in a data grid using a modified BHR region based algorithm, [J] Future Generation Computer Systems 2011, 27: 202-210.
[12] N. Saadat and A.M. Rahmani. PDDRA: A new pre-fetching based dynamic data replication algorithm in data grids. [J] Springer: Future Generation Computer Systems, 2012, 28: 666-681.
[13] Salman Abdul Moiz, Sailaja P., Venkataswamy G., Supriya N. Pal. Database Replication: A Survey of Open Source and Commercial Tools. [J] International Journal of Computer Applications (0975-8887) 2011, 13(6), 1-8.
[14] Heinz Stockinger. Data Replication in Distributed Database Systems, 1999 [B].
[15] Marius Cristian MAZILU, "Database Replication", [J] Database Systems Journal 2010, 1(2), 33-38.
[16] Mark A. Linsenbardt, Shane Stigler. McGraw-Hill/Osborne Media Book SQL Server 2000 Administration, Chap. 10, Replication [B].

[17] Microsoft MSDN Library - http://msdn.microsoft.com [W]
[18] B. Kemme, F. Pedone, G. Alonso, and A. Schiper. Processing transactions over optimistic atomic broadcast protocols. [C] In Proceedings of the International Conference on Distributed Computing Systems, Austin, Texas, June 1999.
[19] M. Raynal, G. Thia-Kime, and M. Ahamad. From serializable to causal transactions for collaborative applications. [R] Technical Report 983, Institut de Recherche en Informatique et Systèmes Aléatoires, Feb. 1996.
[20] K. P. Birman. The process group approach to reliable distributed computing. [J] Communications of the ACM, 1993, 36(12): 37-53.
[21] V. Hadzilacos and S. Toueg. Fault-tolerant broadcasts and related problems. [B] In S. Mullender, editor, Distributed Systems, chapter 5. Addison-Wesley, second edition, 1993.
[22] Sanjay Kumar Tiwari et al. Distributed Real Time Replicated Database: Concept and Design. [J] International Journal of Engineering Science and Technology (IJEST), ISSN: 0975-5462, 2011, 3(6): 4839-4849.
[23] K. P. Birman and T. A. Joseph. Exploiting virtual synchrony in distributed systems. [C] In Proceedings of the 11th ACM Symposium on OS Principles, pages 123-138, Austin, TX, USA, Nov. 1987. ACM SIGOPS, ACM.
[24] K. P. Birman, A. Schiper, and P. Stephenson. Lightweight causal and atomic group multicast. [J] ACM Transactions on Computer Systems, 1991, 9(3): 272-314.
[25] F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. [J] ACM Computing Surveys, 1990, 22(4): 299-319.
[26] R. Guerraoui and A. Schiper. Software-based replication for fault tolerance. [J] IEEE Computer, 1997, 30(4): 68-74.

Authors' Profiles

Sanjay Kumar Yadav is Assistant Professor of Computer Science in the Dept. of Computer Science & Information Technology at Sam Higginbottom Institute of Agriculture, Technology & Sciences (formerly Allahabad Agricultural Institute) (Deemed-to-be-University), Allahabad.
He obtained a bachelor's degree, B.Sc. (Maths), from the University of Allahabad, an MCA degree from the Institute of Engineering and Technology, Lucknow, and an M.Tech. in Software Engineering from Motilal Nehru National Institute of Technology, Allahabad. He is pursuing his Ph.D. in Computer Science & IT at Sam Higginbottom Institute of Agriculture, Technology & Sciences (formerly Allahabad Agricultural Institute) (Deemed-to-be-University), Allahabad. His research interests include distributed systems and mobile ad-hoc networks.

Prof. Gurmit Singh is Emeritus Professor of Computer Science in the Dept. of Computer Science & Information Technology at Sam Higginbottom Institute of Agriculture, Technology & Sciences (formerly Allahabad Agricultural Institute) (Deemed-to-be-University), Allahabad. He served the department as Professor and Head for several years and retired in 2012. He also served the University as Dean, Shepherd School of Engineering & Technology, and is on the program committees of the University. He is the author/coauthor of several publications in technical journals and conferences. His research interests include distributed systems, mobile ad-hoc networks, wireless sensor networks and evolutionary computing.

Prof. Divakar Singh Yadav is Professor of Computer Science at the Institute of Engineering and Technology, Lucknow. He obtained a B.Tech in Computer Science & Engineering, an M.Tech in Computer Science from IIT Kharagpur, and a Ph.D from the University of Southampton, U.K. Before joining Gautam Buddh Technical University, Lucknow, as Pro-Vice Chancellor, he was at South Asian University, New Delhi, an international university established by the South Asian Association for Regional Cooperation (SAARC) nations, where he was Chairperson of the Department of Computer Science in the Faculty of Mathematics and Computer Science. Dr. Yadav has more than 20 years of experience in academics and research in India and abroad. Besides serving as a member of several expert committees of U. P. Technical University, Lucknow, AICTE, New Delhi, and the Govt.
of Uttar Pradesh, he has also served as a member of advisory boards and technical program committees and as a reviewer for several international conferences, workshops and journals. He has long-standing academic interests in database systems and distributed computing. His primary research interests are in formal methods, refinement of distributed systems using Event-B, verification of critical properties of business-critical systems, and reasoning about distributed database systems. He also participated in the prestigious Dagstuhl seminar at Schloss Dagstuhl - Leibniz Center for Informatics, Germany, in 2006, in addition to

an invitation to the Commonwealth Scholarship Commission, U.K., seminar held at the University of the West of England, Bristol, in 2007. Dr. Yadav is the author of four (04) books in the area of computers and information technology, including the best seller Foundations of Information Technology, published in 200. His research contributions in the area of computer science and information technologies have appeared in international journals and refereed conference proceedings published by Springer-Verlag, Elsevier and IEEE.