Replication and Consistency


Introduction to Replication

Replication is
- replication of data: the maintenance of copies of data at multiple computers
- object replication: the maintenance of copies of whole server objects at multiple computers

Replication can provide performance enhancement and scalability. Remember caching: performance is improved by storing data locally, but the data are incomplete. Additionally, several web servers can share the same DNS name; the servers are selected by DNS in turn to share the load. Replication of read-only data is simple, but replication of frequently changing data causes overhead in providing current data.

When to do Replication?

- Increased availability. Sometimes a service needs to be available nearly 100% of the time. In case of server failures, simply contact another server holding the same data items. Network partitions and disconnected operation: data remain available if the connection to a server is lost. But: after re-establishing the connection, (conflicting) data updates have to be resolved.
- Fault-tolerant services. Guarantee correct behaviour in spite of certain faults (which can include timeliness). If f in a group of f+1 servers crash, then 1 remains to supply the service. If f in a group of 2f+1 servers have Byzantine faults, the group can still supply a correct service.

Replication at service initialisation: try to estimate the number of servers needed from a customer's specification regarding performance, availability and fault tolerance, and choose places to deposit the data or objects. Example: root servers in DNS.

Replication 'on demand': when failures or performance bottlenecks occur, make a new replica, possibly placed at a new location in the network (example: web server of an online shop). Or, to improve local access operations, place a copy near a client (example: DNS caching, disconnected operation).

Why Object Replication?

For all of these application areas, one requirement holds: the client should not be aware of using a group of computers! It is useful to consider objects instead of only data. Objects have the benefit of encapsulating data and the operations on that data, so object-specific operation requests can be distributed. But: now one has to consider internal object states! This topic is related to mobile agents, but it becomes more complicated when internal states have to be kept consistent.

Requirements for Replicated Data

- Replication transparency: clients do not work on multiple physical copies; they only see one logical object which they ask to perform an action, and they expect a single result, not one result per copy. Needed: propagation of updates.
- Consistency: how to ensure that all data copies are consistent? The required degree depends on the application: temporary inconsistencies may be tolerated, but when multiple clients are connected via different copies, they should get consistent results.
- Performance, availability and fault tolerance versus consistency and up-to-date information. Scalability?

Management of Replicated Data

Needed: replication transparency and consistency. Client requests are handled by front ends. A front end provides replication transparency: through the front ends, clients see a service that gives them access to logical objects, which are in fact replicated at several replica managers, which in turn guarantee consistency. Client request operations:
- read-only requests: calls without making updates
- update requests: calls executing write operations (but possibly also read operations)
Basic architecture: clients send requests and receive replies via front ends, which communicate with the service's replica managers.

System Model

Each logical object is implemented by a collection of physical copies called replicas (the replicas are not necessarily consistent all the time; some may have received updates not yet delivered to the others). Assumption: an asynchronous system where processes fail only by crashing, and generally no network partitions occur. Replica managers:
- contain replicas on a computer and access them directly
- apply operations to replicas recoverably, i.e. they do not leave inconsistent results if they crash
- static systems are based on a fixed set of replica managers; in a dynamic system, replica managers may join or leave (e.g. when they crash)
- a replica manager can be a state machine, which has the following properties:
  a) operations are applied atomically
  b) the current state is a deterministic function of the initial state and the operations applied
  c) all replicas start identical and carry out the same operations
  d) the operations must not be affected by clock readings etc.
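A minimal sketch of a replica manager as a deterministic state machine, illustrating properties a)-d) above. The names (ReplicaManager, Operation, apply) and the bank-account operations are illustrative assumptions, not part of the lecture material.

```python
# Minimal sketch: a replica manager as a deterministic state machine.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class Operation:
    name: str      # e.g. "setBalance"
    args: tuple    # operation arguments, e.g. ("x", 1)

@dataclass
class ReplicaManager:
    state: Dict[str, int] = field(default_factory=dict)   # application state (account -> balance)
    log: List[Operation] = field(default_factory=list)    # operations applied so far

    def apply(self, op: Operation) -> None:
        """Apply an operation atomically; the result depends only on the initial
        state and the sequence of operations (no clocks, no randomness)."""
        if op.name == "setBalance":
            key, value = op.args
            self.state[key] = value
        elif op.name == "deposit":
            key, amount = op.args
            self.state[key] = self.state.get(key, 0) + amount
        else:
            raise ValueError(f"unknown operation {op.name}")
        self.log.append(op)

# Two replicas that start identical and apply the same operations in the
# same order end up in the same state:
ops = [Operation("setBalance", ("x", 1)), Operation("deposit", ("x", 5))]
a, b = ReplicaManager(), ReplicaManager()
for op in ops:
    a.apply(op)
    b.apply(op)
assert a.state == b.state == {"x": 6}
```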

Performing a Request

In general there are five phases in performing a request on replicated data:
1. Issue request. The front end either sends the request to a single replica manager, which passes it on to the others, or multicasts the request to all of the replica managers.
2. Coordination. For consistent execution, the replica managers decide whether to apply the request (e.g. in the presence of failures) and how to order the request relative to other requests (according to FIFO, causal or total ordering).
3. Execution. The replica managers execute the request (sometimes tentatively).
4. Agreement. The replica managers agree on the effect of the request, e.g. whether to perform it 'lazily' or immediately.
5. Response. One or more replica managers reply to the front end, which combines the results: for high availability, give the first response to the client; to tolerate Byzantine faults, take a vote.

Group Communication

Strong requirement: replication needs multicast (group) communication to distribute requests to the several servers holding the data and to manage the replicated data. Required: dynamic membership in groups. That means establishing a group membership service which manages dynamic group membership, in addition to multicast communication. (Figure: group address expansion, group send, multicast communication, a process group with join/leave/fail events, group membership management.) A client sending from outside the group does not have to know the group membership.

Group Membership Service

The group membership service provides (see the sketch after this section):
a) An interface for group membership changes: create and destroy process groups, add or remove members (a process can generally belong to several groups).
b) A failure detector: monitor members for failures (process crashes or communication failures) and exclude a process from membership when it is unreachable.
c) Notification of members when changes in membership occur.
d) Expansion of group addresses: expand a group identifier to the current group membership, and coordinate delivery while membership is changing.
Note: IP multicast allows members to join/leave and performs address expansion, but does not support the other features.

View-synchronous Group Communication

A full membership service maintains group views, which are lists of the current group members. A new view is generated each time a process is added or excluded.
View delivery eases the life of the programmer: one does not have to handle consistent operation while group membership changes; instead, group members are notified whenever a change occurs. In the ideal case, all processes get the same change information in the same order.
View-synchronous group communication uses reliable multicast and view delivery:
- messages are delivered reliably to all group members
- all processes agree on the ordering of messages and membership changes
- a joining process can safely get state from another member; or, if one crashes, another will know which operations it had already performed
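A hedged sketch of the group membership service interface described above (points a-d). All names (GroupMembershipService, View, on_view_change) are illustrative assumptions; a real service would also detect failures itself rather than being told about them.

```python
# Sketch of a group membership service: membership changes, view delivery,
# and group address expansion.
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass(frozen=True)
class View:
    view_id: int
    members: frozenset   # current group membership

class GroupMembershipService:
    def __init__(self):
        self._members: Set[str] = set()
        self._view_id = 0
        self._listeners: List[Callable[[View], None]] = []

    def _deliver_view(self) -> None:
        view = View(self._view_id, frozenset(self._members))
        for notify in self._listeners:          # c) notify members of membership changes
            notify(view)

    def join(self, process: str) -> None:       # a) membership change interface
        self._members.add(process)
        self._view_id += 1
        self._deliver_view()

    def leave_or_fail(self, process: str) -> None:   # b) exclude unreachable members
        self._members.discard(process)
        self._view_id += 1
        self._deliver_view()

    def expand(self) -> frozenset:              # d) group address expansion
        return frozenset(self._members)

    def on_view_change(self, callback: Callable[[View], None]) -> None:
        self._listeners.append(callback)

# Usage: a replica manager registers for view changes and multicasts to expand().
gms = GroupMembershipService()
gms.on_view_change(lambda v: print("new view", v.view_id, sorted(v.members)))
gms.join("RM1"); gms.join("RM2"); gms.leave_or_fail("RM1")
```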

Replication Example

How to provide fault-tolerant services, i.e. services that behave correctly even if some processes fail? Simple replication system: two replica managers A and B, each managing replicas of two accounts x and y. Clients use the local replica manager if possible; after responding to a client, the update is transmitted to the other replica manager.

Client 1: setBalance_B(x,1); setBalance_A(y,2)
Client 2: getBalance_A(y) -> 2; getBalance_A(x) -> 0

The initial balance of x and y is $0. Client 1 first updates x at B (its local manager). When updating y it finds that B has failed, so it uses A for the next operation. Client 2 reads the balances at A (its local manager), but because B had failed, the update of x was never propagated to A: x has amount 0. Be careful when designing replication algorithms! You need a consistency model.

Strict Consistency

There are several correctness criteria for replication regarding consistency. The real-world ideal is strict consistency: any read on a data item x returns a value corresponding to the result of the most recent write on x. Problem with strict consistency: it relies on absolute global time.

Linearisability

The strictest practical criterion for a replication system is linearisability. Consider a replicated service with two clients that perform read and update operations o_1i and o_2j respectively. Communication is synchronous, i.e. a client waits for one operation to complete before issuing another. With a single server, the operations are serialised by interleaving, e.g. o_20, o_21, o_10, o_22, o_11, o_12. In replication the interleaving is virtual: a replicated shared service is linearisable if for any execution there is some interleaving of the series of operations issued by all the clients such that
- the interleaved sequence of operations meets the specification of a (single) correct copy of the objects, and
- the order of operations in the interleaving is consistent with the real times at which they occurred in the actual execution.
Linearisability concerns only the interleaving of individual operations; it is not intended to be transactional.

Sequential Consistency

Problem with linearisability: the real-time requirement is practically impossible to fulfil. A weaker correctness criterion is sequential consistency. A replicated shared service is sequentially consistent if for any execution there is some interleaving of the series of operations issued by all the clients such that
- the interleaved sequence of operations meets the specification of a (single) correct copy of the objects, and
- the order of operations in the interleaving is consistent with the program order in which each individual client executed them.
Every linearisable service is sequentially consistent, but not vice versa:

Client 1: setBalance_B(x,1); setBalance_A(y,2)
Client 2: getBalance_A(y) -> 0; getBalance_A(x) -> 0

This is possible under a naive replication strategy: the update at B has not yet been propagated to A when client 2 reads x. The real-time criterion for linearisability is not satisfied, because the reading of x occurs after its writing. But both criteria for sequential consistency are satisfied with the ordering: getBalance_A(y) -> 0; getBalance_A(x) -> 0; setBalance_B(x,1); setBalance_A(y,2).
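A hedged sketch of the sequential-consistency definition applied to the second example: brute-force search for an interleaving that respects each client's program order and the single-copy specification. The operation encoding and helper names are assumptions for illustration only.

```python
# Brute-force sequential consistency check for the bank-account example.
from itertools import permutations

# Each operation: (client, kind, account, value). Reads record the value observed.
client1 = [("C1", "write", "x", 1), ("C1", "write", "y", 2)]
client2 = [("C2", "read", "y", 0), ("C2", "read", "x", 0)]

def respects_program_order(interleaving, program):
    # The client's own operations must appear in their original order.
    client = program[0][0]
    return [op for op in interleaving if op[0] == client] == program

def meets_single_copy_spec(interleaving):
    # Replay against one logical copy; every read must return the stored value.
    store = {"x": 0, "y": 0}
    for _, kind, acct, value in interleaving:
        if kind == "write":
            store[acct] = value
        elif store[acct] != value:
            return False
    return True

def sequentially_consistent(*programs):
    ops = [op for p in programs for op in p]
    return any(
        all(respects_program_order(perm, p) for p in programs)
        and meets_single_copy_spec(perm)
        for perm in permutations(ops)
    )

print(sequentially_consistent(client1, client2))   # True: a valid interleaving exists
```

The valid interleaving found is exactly the one given above (both reads before both writes), which satisfies program order but not real-time order.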

More Consistency Models

Sequential consistency is widely used but has poor performance. Thus there are models relaxing the consistency guarantees:

Causal consistency (a weakening of sequential consistency). Distinguish between events that are causally related and those that are not. Example: read(x); write(y) in one process are causally related, because the value written to y can depend on the value of x read before. Consistency criterion: writes that are potentially causally related must be seen by all processes in the same order; concurrent writes may be seen in a different order on different machines. Necessary: keeping track of which processes have seen which writes (vector timestamps).

FIFO consistency (a weakening of causal consistency). Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes. Easy to implement.

Further relaxations: weak consistency, release consistency, entry consistency.

Summary of Consistency Models

- Strict: absolute time ordering of all shared accesses matters.
- Linearisability: all processes must see all shared accesses in the same order; accesses are furthermore ordered according to a global timestamp.
- Sequential: all processes see all shared accesses in the same order; accesses are not ordered in time.
- Causal: all processes see causally-related shared accesses in the same order.
- FIFO: all processes see the writes of each single process in the order they were issued; writes from different processes may not always be seen in that order.

Replication Approaches

In the following: replication for fault tolerance, for highly available services, and in transactions.

Fault Tolerance: Passive (Primary-Backup) Replication

Replication for fault tolerance: the passive or primary-backup model. There is at any time a single primary replica manager and one or more secondary replica managers (backups, slaves). Front ends communicate only with the primary replica manager, which executes the operations and sends copies of the updated data to the backups. If the primary fails, one of the backups is promoted to act as the primary.
This system implements linearisability, since the primary sequences all the operations on the shared objects. If the primary fails, the system remains linearisable if a single backup takes over exactly where the primary left off; view-synchronous group communication can achieve this.
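A small sketch of the vector-timestamp bookkeeping mentioned for causal consistency: causally related writes are ordered, concurrent writes are not. The function names and the fixed process count are assumptions for illustration.

```python
# Vector timestamps: ordering causally related writes, detecting concurrent ones.
N = 3  # number of processes / replica managers

def new_clock():
    return [0] * N

def merge(a, b):
    """Component-wise maximum: the combined knowledge of both clocks."""
    return [max(x, y) for x, y in zip(a, b)]

def happened_before(a, b):
    """a -> b iff a <= b component-wise and a != b (potential causal relation)."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)

# Process 0 writes; process 1 reads that write and then writes itself;
# process 2 writes independently.
w0 = new_clock(); w0[0] += 1                 # write at process 0: [1,0,0]
w1 = merge(w0, new_clock()); w1[1] += 1      # causally later write at process 1: [1,1,0]
w2 = new_clock(); w2[2] += 1                 # independent write at process 2: [0,0,1]

assert happened_before(w0, w1)   # must be seen in this order everywhere
assert concurrent(w1, w2)        # may be seen in different orders
```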

Execution in Passive Replication

There are five phases in performing a client request:
1. Request: a front end issues the request, containing a unique identifier, to the primary replica manager.
2. Coordination: the primary performs each request atomically, in the order in which it receives it. It checks the unique identifier to see whether it has already executed the request; if so, it simply resends the response.
3. Execution: the primary executes the request and stores the response.
4. Agreement: if the request is an update, the primary sends the updated state, the response and the unique identifier to all the backups. The backups send an acknowledgement.
5. Response: the primary responds to the front end, which hands the response back to the client.

Properties of Passive Replication

+ Simple principle.
+ To survive f process crashes, f+1 replica managers are required.
+ The front end can be designed with little functionality.
- But: relatively large overhead from using view-synchronous communication: several rounds of communication, and latency after a primary crash because a new view must be agreed.
Variation: clients can read from the backups. This reduces the work load for the primary and achieves sequential consistency, but not linearisability.
The Sun Network Information Service (NIS) uses passive replication with weaker guarantees than sequential consistency: it achieves high availability and good performance. The master receives updates and propagates them to the slaves using one-to-one communication; information retrieval can use either the master or a slave server.

Active Replication

The replica managers are state machines, all playing the same role and organised as a group:
- A front end multicasts each request to the group of replica managers.
- All replica managers start in the same state and perform the same operations in the same order so that their states remain identical (note: a totally ordered reliable multicast is needed to guarantee the identical execution order!).
- If a replica manager crashes, this has no effect on the performance of the service, because the others continue as normal.
- Byzantine failures can be tolerated because the front end can collect and compare the replies it receives.

Execution in Active Replication

The phases in performing a client request are:
1. Request: the front end attaches a unique identifier to the request and uses totally ordered reliable multicast to send it to all replica managers.
2. Coordination: the group communication system delivers the request to all the replica managers in the same (total) order.
3. Execution: every replica manager executes the request. Since they are state machines and receive requests in the same order, the effects for correct replica managers are identical.
(Agreement: no agreement phase is required, because all managers execute the same operations in the same order, due to the properties of the totally ordered multicast.)
4. Response: the front end collects responses from the managers (and combines them, if necessary).
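A hedged sketch of the passive-replication request path just described: duplicate suppression via the unique identifier, state push to the backups, and the stored response. Class and method names are illustrative; a real system would add view changes and failover.

```python
# Primary-backup request handling with duplicate suppression.
class Backup:
    def __init__(self):
        self.state = {}

    def apply_update(self, state_copy, request_id):
        # Agreement phase: the backup installs the primary's updated state and acks.
        self.state = dict(state_copy)
        return ("ack", request_id)

class Primary:
    def __init__(self, backups):
        self.state = {}
        self.responses = {}          # request_id -> response (duplicate suppression)
        self.backups = backups

    def handle(self, request_id, op, key, value=None):
        # Coordination: requests are handled one at a time; duplicates are re-answered.
        if request_id in self.responses:
            return self.responses[request_id]
        # Execution
        if op == "write":
            self.state[key] = value
            response = "ok"
            # Agreement: push the updated state to all backups before replying.
            for b in self.backups:
                b.apply_update(self.state, request_id)
        else:
            response = self.state.get(key)
        # Response (stored so a retransmitted request gets the same answer)
        self.responses[request_id] = response
        return response

primary = Primary([Backup(), Backup()])
print(primary.handle("r1", "write", "x", 1))   # ok
print(primary.handle("r1", "write", "x", 1))   # ok (duplicate, not re-executed)
print(primary.handle("r2", "read", "x"))       # 1
```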

Properties of Active Replication

As the replica managers are state machines, we obtain sequential consistency but not linearisability (because we do not have perfectly accurate synchronisation algorithms). It is possible to deal with Byzantine failures: use 2f+1 replica managers to overrule up to f faulty ones. For this, a front end simply has to wait until it receives f+1 identical responses from replica managers; the replica managers have to digitally sign their responses.
Variations:
- allow a sequence of read operations to be executed at different replica managers in any order
- or allow the front end to send a read-only request to only one replica manager (if this manager fails, the front end contacts the next one)
- allow write operations on different data items to be executed in any order

Highly Available Services

How to provide services with high availability, i.e. how to use replication to achieve an availability of (ideally) 100%? Reasonable response times for as much of the time as possible, even if some results do not conform to sequential consistency. For example, a disconnected user may accept temporarily inconsistent results if he can continue to work and fix the inconsistencies later.
Difference to fault-tolerant systems: for fault tolerance, all replica managers have to agree on a result ('eager' evaluation), which is not acceptable when reasonable response times are required. Instead, for high availability:
- only contact the minimum number of replica managers necessary to reach an acceptable level of service
- clients should be tied up for a minimum time while managers coordinate their actions
- weaker consistency generally requires less agreement and makes data more available; updates are propagated 'lazily'

The Gossip Architecture

The gossip architecture is a framework for implementing highly available services:
- data is replicated close to the location of clients
- replica managers periodically exchange gossip messages containing the updates they have received from clients
Two basic types of operations are provided: queries (read-only operations) and updates (which modify but do not read the state). Front ends send queries and updates to any replica manager, chosen by availability and response time.
Two guarantees are made (even if managers are temporarily unable to communicate with one another):
- Each client gets a consistent service over time, i.e. the data reflects at least the updates seen by the client so far, even if it uses different replica managers. Vector timestamps are used, with one entry per manager.
- Relaxed consistency between replicas: all replica managers eventually receive all updates and apply them with ordering guarantees that suit the needs of the application (generally causal ordering). A client may observe stale data.

Query and Update Operations

The service consists of a collection of replica managers that exchange gossip messages. Queries and updates are sent by a client via a front end to any replica manager. (Figure: a query carries the timestamp prev and returns a value together with the timestamp new; an update carries prev and returns an update id; the replica managers exchange gossip messages.)
The front end converts operations that contain both reading and writing into two separate calls. For ordering operations, the front end sends a timestamp prev with each request to denote the latest state it has seen; a new timestamp new is passed back in a read operation to mark the data state the client has now seen.
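A hedged sketch of a gossip front end keeping the vector timestamp prev across queries and updates. The tiny single-manager stub and all names are illustrative assumptions; a real manager would hold operations back until the ordering constraints are met.

```python
# Gossip front end tracking prev; the replica manager here is only a stub.
def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

class TinyReplicaManager:
    """Stand-in for one gossip replica manager (index i in the group)."""
    def __init__(self, i, group_size):
        self.i = i
        self.value = {}
        self.value_ts = [0] * group_size

    def query(self, key, prev):
        # A real manager would wait until value_ts covers prev before answering.
        return self.value.get(key), list(self.value_ts)

    def update(self, key, val, prev):
        self.value_ts[self.i] += 1          # accept the update locally
        self.value[key] = val
        return list(self.value_ts)

class FrontEnd:
    def __init__(self, managers):
        self.managers = managers
        self.prev = [0] * len(managers)      # latest state seen by this client

    def query(self, key):
        value, new = self.managers[0].query(key, self.prev)
        self.prev = merge(self.prev, new)    # the read reflects this state
        return value

    def update(self, key, val):
        ts = self.managers[0].update(key, val, self.prev)
        self.prev = merge(self.prev, ts)     # later operations depend on this update
        return ts

fe = FrontEnd([TinyReplicaManager(0, 2), TinyReplicaManager(1, 2)])
fe.update("x", 42)
print(fe.query("x"), fe.prev)                # 42 [1, 0]
```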

Timestamps

Each front end keeps a vector timestamp prev that reflects the latest data value it has seen. Clients can communicate with each other, which can lead to causal relationships between client operations that have to be respected by the replicated system. Therefore such communication is made via the front ends, which exchange vector timestamps as well, allowing them to reflect the causal ordering in their own timestamps.

Processing of Queries and Updates

Phases in performing a client request:
1. Request: front ends normally use the same replica manager for all operations; front ends may be blocked on queries.
2. Update response: the replica manager replies as soon as it has received the update.
3. Coordination: the replica manager receiving a request waits to process it until the ordering constraints are met. This may involve receiving updates from other replica managers in gossip messages.
4. Execution: the replica manager executes the request.
5. Query response: if the request is a query, the replica manager now replies.
6. Agreement: replica managers update one another by exchanging gossip messages containing the most recently received updates. This does not have to be done for each update separately.

Gossip Replica Manager

Main state components of a replica manager (each manager is a state machine):
- Value: reflects the application state as maintained by the replica manager. It begins with a defined initial value and has the update operations applied to it.
- Value timestamp: the vector timestamp that reflects the updates applied to obtain the saved value.
- Executed operation table: prevents an operation from being applied twice, e.g. when it is received from other replica managers as well as from the front end.
- Timestamp table: contains a vector timestamp for each other replica manager, extracted from gossip messages.
- Update log: all update operations are recorded immediately when they are received. An operation is held back until the ordering constraints allow it to be applied.
- Replica timestamp: a vector timestamp indicating the updates accepted by the manager, i.e. placed in the log (it differs from the value timestamp if some updates are not yet stable).
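The state components above can be written down as a plain data structure; this is only a sketch, with field names mirroring the lecture's terms and types chosen as illustrative assumptions.

```python
# Sketch of the gossip replica manager state components as a data structure.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

@dataclass
class LogRecord:
    manager_id: int          # i, the manager that assigned the timestamp
    ts: List[int]            # timestamp assigned to the update
    operation: Any           # the update operation itself
    prev: List[int]          # front-end timestamp sent with the request
    op_id: str               # unique operation identifier

@dataclass
class GossipReplicaState:
    value: Dict[str, Any] = field(default_factory=dict)           # application state
    value_ts: List[int] = field(default_factory=list)             # updates applied to value
    update_log: List[LogRecord] = field(default_factory=list)     # held-back updates
    replica_ts: List[int] = field(default_factory=list)           # updates accepted into the log
    executed: Set[str] = field(default_factory=set)               # executed operation table (op ids)
    ts_table: Dict[int, List[int]] = field(default_factory=dict)  # per-peer replica timestamps
```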

Query and Update Operations

A query includes a timestamp. The operation is marked as pending until the manager's value timestamp covers the operation's timestamp. Update operations are processed in causal order. A front end can send an update operation containing a timestamp and an identifier id to one or several replica managers:
- When replica manager i receives an update request, it checks whether it is new, by looking for the id in its executed operation table and its log.
- If it is new, the replica manager increments the i-th element of its replica timestamp by 1, assigns the result as the new timestamp ts of the update, and stores the update in its log.
- The replica manager returns ts to the front end, which merges it with its own vector timestamp. (Note: by sending a request to several managers, the front end gets back several timestamps, which have to be merged.)
- Depending on the timestamp of an operation, it may be delayed until gossip messages from other replica managers arrive. When the timestamp allows, the replica manager applies the operation and makes an entry in the executed operation table.

Gossip Messages

The timestamp table contains a vector timestamp for each other replica manager, collected from gossip messages. A replica manager uses the entries in this table to estimate which updates another manager has not yet received; this information is sent in a gossip message. A gossip message m contains the log m.log and the replica timestamp m.ts. A manager receiving a gossip message m has the following main tasks:
- merge the arriving log with its own
- apply, in causal order, updates that are new and have become stable
- remove redundant entries from the log and the executed operation table when it is known that they have been applied by all replica managers
- merge its replica timestamp with m.ts, so that it reflects the additions to the log
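A hedged sketch of a replica manager accepting an update as described above: duplicate check, incrementing its own entry of the replica timestamp, logging, and applying updates once their prev is covered by the value timestamp. Method names and the simplified stability test are illustrative assumptions.

```python
# Gossip replica manager: accepting an update and applying stable updates.
def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

class GossipManager:
    def __init__(self, i, n):
        self.i, self.n = i, n
        self.value = {}
        self.value_ts = [0] * n       # updates applied to value
        self.replica_ts = [0] * n     # updates accepted into the log
        self.log = []                 # pending (not yet applied) updates
        self.executed = set()         # executed operation table

    def receive_update(self, op_id, op, prev):
        if op_id in self.executed or any(r[0] == op_id for r in self.log):
            return None                               # duplicate, ignore
        self.replica_ts[self.i] += 1                  # accept the update
        ts = list(prev)
        ts[self.i] = self.replica_ts[self.i]          # unique timestamp for this update
        self.log.append((op_id, op, prev, ts))
        self.apply_stable()
        return ts                                     # returned to the front end

    def apply_stable(self):
        # Apply logged updates whose prev is covered by the value timestamp.
        progress = True
        while progress:
            progress = False
            for rec in list(self.log):
                op_id, (key, val), prev, ts = rec
                if leq(prev, self.value_ts):
                    self.value[key] = val
                    self.value_ts = merge(self.value_ts, ts)
                    self.executed.add(op_id)
                    self.log.remove(rec)
                    progress = True

m = GossipManager(0, 2)
print(m.receive_update("u1", ("x", 7), [0, 0]))   # [1, 0]
print(m.value, m.value_ts)                        # {'x': 7} [1, 0]
```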
Update Propagation

The gossip architecture does not specify when to exchange gossip messages. To design a robust system in which each update is propagated in reasonable time, an exchange strategy is needed. The time required for all replica managers to receive a given update depends on three factors:
1. The frequency and duration of network partitions. This is beyond the system's control.
2. The frequency with which replica managers send gossip messages. This may be tuned to the application.
3. The policy for choosing a partner with which to exchange gossip messages. Random policies choose a partner randomly, but with weighted probabilities so as to favour some partners over others. Deterministic policies give fixed communication partners. Topological policies arrange the replica managers into a fixed graph (mesh, circle, tree, ...) and messages are passed to the neighbours. Other strategies consider transmission latencies, fault probabilities, etc.

Properties of the Gossip Architecture

The gossip architecture is designed to provide a highly available service.
+ Clients with access to a single replica manager can work even when other managers are inaccessible.
- But this is not suitable for data such as bank accounts, and it is inappropriate for updating replicas in real time.
Scalability: as the number of replica managers grows, so does the number of gossip messages. For R managers, collecting G updates into each gossip message, the number of messages per request is 2 + (R-1)/G (see the worked example below). Variation: increase G to reduce the number of gossip messages, at the cost of worse latency.
For applications where queries are much more frequent than updates, use read-only replicas placed near the clients, which are updated only by gossip messages.
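A small worked example for the message-count formula; the chosen values of R and G are assumptions for illustration.

```python
# Messages per update request: one request + one reply = 2, plus the update's
# share of the gossip messages needed to reach the other R-1 managers,
# with G updates batched into each gossip message.
def messages_per_request(R, G):
    return 2 + (R - 1) / G

print(messages_per_request(R=5, G=1))    # 6.0 -> every update gossiped individually
print(messages_per_request(R=5, G=10))   # 2.4 -> batching lowers traffic, raises latency
```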

Operational Transformation Approach

The Bayou system provides data replication for high availability with weaker guarantees than sequential consistency. Bayou replica managers cope with variable connectivity by exchanging updates in pairs (as in the gossip architecture), but Bayou adopts a markedly different approach in that it enables domain-specific conflict detection and resolution. All updates are applied and recorded at whatever replica manager they reach. Replica managers detect and resolve conflicts when exchanging information, using domain-specific policies. The effect of undoing or altering conflicting operations in order to resolve them is called operational transformation. A Bayou update is a special case of a transaction: it is carried out with the ACID guarantees, and Bayou may undo and redo updates to the database as execution proceeds.
The Bayou guarantee is that, eventually, every replica manager receives the same set of updates and applies those updates in such a way that the replica managers' databases are identical. In practice there may be a continuous stream of updates, and the databases may never become identical; but they would become identical if the updates ceased.

Committed and Tentative Updates

Updates are marked as tentative when they are first applied to the database. While they are in this state, they can be undone and re-applied if necessary. Tentative updates are eventually placed in a canonical order and marked as committed. The committed order can be achieved by designating some replica manager as the primary replica manager, which decides the order, e.g. by the time of receipt. A tentative update t_i becomes the next committed update and is inserted after the last committed update c_N:
committed: c_0 c_1 c_2 ... c_N | tentative: t_0 t_1 t_2 ... t_i t_i+1 ...
All tentative updates t_0 to t_i-1 have to be reapplied after it.

Dependency Checks and Merge Procedures

Every Bayou update contains a dependency check and a merge procedure in addition to the operation's specification, because an update may conflict with another operation that has already been applied:
- A replica manager calls the dependency check procedure before applying the operation. It checks whether a conflict would occur if the update were applied, and it may examine any part of the database to do so.
- If the dependency check indicates a conflict, Bayou invokes the operation's merge procedure. That procedure alters the operation that will be applied so that it achieves something similar to the intended effect but avoids the conflict. The merge procedure may fail to find a suitable alteration, in which case the system indicates an error.
The effect of a merge procedure is deterministic, so Bayou replica managers remain state machines.

Problems with Bayou

Bayou differs from other approaches in that it makes replication non-transparent to the application. Increased complexity for the application programmer: dependency checks and merge procedures have to be provided. Increased complexity for the user: tentative data may be observed, and user-specified operations may be altered. The operational transformation approach used by Bayou appears in systems for computer-supported cooperative work (CSCW). In practice the approach is limited to situations where only few conflicts arise, where users can deal with tentative data, and where the data semantics are simple.
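A hedged sketch of a Bayou-style update carrying a dependency check and a merge procedure. The meeting-room-booking domain, the fixed list of alternative slots, and all names are illustrative assumptions, not the system's actual API.

```python
# A Bayou-style update: operation + dependency check + merge procedure.
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Database = Dict[str, str]   # slot -> person holding the reservation

@dataclass
class BayouUpdate:
    operation: Callable[[Database], None]
    dep_check: Callable[[Database], bool]            # True means conflict
    merge: Callable[[Database], Optional[Callable[[Database], None]]]

def book(slot, person):
    def operation(db):                    # intended effect: take this slot
        db[slot] = person
    def dep_check(db):                    # conflict if the slot is already taken
        return slot in db
    def merge(db):                        # fall back to the first free slot, if any
        for alt in ("9:00", "10:00", "11:00"):
            if alt not in db:
                return lambda d: d.__setitem__(alt, person)
        return None                       # no suitable alteration found
    return BayouUpdate(operation, dep_check, merge)

def apply_update(db: Database, u: BayouUpdate) -> None:
    if not u.dep_check(db):
        u.operation(db)
    else:
        alt = u.merge(db)
        if alt is None:
            raise RuntimeError("merge procedure could not resolve the conflict")
        alt(db)

db: Database = {}
apply_update(db, book("10:00", "alice"))
apply_update(db, book("10:00", "bob"))    # conflict -> merged into the 9:00 slot
print(db)                                  # {'10:00': 'alice', '9:00': 'bob'}
```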

Transactions with Replicated Data

Until now we have considered only single operations on a set of replicas. But objects in transactional systems can also be replicated to enhance availability and performance. How to deal with atomic sequences of operations? The effect of transactions on replicated objects should be the same as if they had been performed one at a time on a single set of objects. This property is called one-copy serialisability. Each replica manager provides concurrency control and recovery for its own objects. Assumption for the methods presented here: two-phase locking is used. Replication makes recovery more complicated: when a replica manager recovers, it restores its objects using information from other managers.

Architectures for Replicated Transactions

Assumption: a front end sends requests to one of a group of replica managers. In the primary copy approach, all front ends communicate with the primary replica manager, which propagates updates to the backups. In other schemes, front ends may communicate with any replica manager, and coordination between the managers has to take place; the manager receiving a request is responsible for this cooperation. (Note: the rules as to how many managers are involved vary with the replication scheme.)
Question: propagate requests immediately, or at the end of a transaction? In the primary copy scheme we can wait until the end of the transaction (concurrency control is applied at the primary). If transactions access the same objects at different managers, requests need to be propagated early so that concurrency control can be applied.
The two-phase commit protocol becomes a two-level nested 2PC: if a coordinator or worker is a replica manager, it communicates with the other managers to which it passed requests during the transaction.

Simple Scheme: Read One/Write All (ROWA)

One replica manager is required for a read request, all replica managers for a write request:
- every write operation must be performed at all managers, each of which applies a write lock
- each read operation is performed by a single manager, which sets a read lock
Consider pairs of operations by different transactions on the same object: any pair of write operations will require conflicting locks at all of the managers, and a read operation and a write operation will require conflicting locks at a single manager. By this, one-copy serialisability is achieved. (Figure: transaction T issues getBalance(A) and transaction U issues deposit(B,3), each via a client and front end to the replica managers holding copies of A and B.)

Available Copies Replication

The simple read one/write all scheme is not realistic: it cannot be carried out if some of the replica managers are unavailable, either because they have crashed or because of a communication failure. The available copies replication scheme is designed to allow some managers to be temporarily unavailable:
- a read request can be performed by any available replica manager
- write requests are performed by the receiving manager and all other available managers in the group
As long as the set of available managers does not change, local concurrency control achieves one-copy serialisability in the same way as read one/write all. Problems occur if a manager fails or recovers during the progress of conflicting transactions.
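A hedged sketch of read one/write all lock acquisition: reads lock a single manager, writes must lock every manager. The lock manager is deliberately simplified (no blocking, no deadlock handling) and all names are illustrative.

```python
# Read one/write all (ROWA) with simple shared/exclusive locks.
class ReplicaManagerRM:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.locks = {}          # object -> ("read", {txids}) or ("write", txid)

    def lock(self, obj, mode, tx):
        held = self.locks.get(obj)
        if held is None:
            self.locks[obj] = (mode, {tx} if mode == "read" else tx)
            return True
        if mode == "read" and held[0] == "read":
            held[1].add(tx)      # shared read locks are compatible
            return True
        return False             # conflicting lock: the transaction must wait or abort

managers = [ReplicaManagerRM(n) for n in ("X", "Y", "Z")]

def read(obj, tx):
    return managers[0].lock(obj, "read", tx)           # read one

def write(obj, value, tx):
    if not all(rm.lock(obj, "write", tx) for rm in managers):   # write all
        return False
    for rm in managers:
        rm.data[obj] = value
    return True

print(read("A", "T"))          # True: read lock at a single manager
print(write("A", 5, "U"))      # False: U's write conflicts with T's read at X
```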

Available Copies: Read One/Write All Available (ROWA-A)

Example (replica managers X and Y hold copies of A; M, N and P hold copies of B): T's getBalance(A) is performed by X, whereas T's deposit(B,3) is performed by M, N and P. At X, T has read A and has locked it, therefore U's deposit(A,3) is delayed until T finishes. Local concurrency control achieves one-copy serialisability provided the set of replica managers does not change; but managers do fail and recover.

Replica Manager Failure

A replica manager can fail by crashing and is replaced by a new process, which restores its state from a recovery file. Front ends use timeouts to detect a manager failure; in case of a timeout, another manager is tried. While a replica manager is recovering it is not yet up to date and rejects requests (and the front end tries another replica manager). For one-copy serialisability, failures and recoveries have to be serialised with respect to transactions:
- a transaction observes when a failure occurs
- one-copy serialisability is not achieved if different transactions make conflicting failure observations
- in addition to local concurrency control, some global concurrency control is required to prevent inconsistent results between a read in one transaction and a write in another

Replica Manager Failure: Read/Write Conflict

Replica manager X fails just after T has performed getBalance(A); replica manager N fails just after U has performed getBalance(B). Both managers fail before T and U have performed their deposit operations:
- therefore T's deposit(B,3) will be performed at replica managers M and P (all available)
- and U's deposit(A,3) will be performed at manager Y (all available)
Concurrency control at X does not prevent U from updating A at Y, and concurrency control at M does not prevent T from updating B at M and P.

Additional Concurrency Control: Local Validation

Goal: ensure that no failure or recovery occurs during a transaction's progress. Before a transaction commits, it checks for failures and recoveries of the replica managers it has contacted:
- T would check whether N is still unavailable and whether X, M and P are still available. If this is the case, T can commit. This implies that X failed after T validated and before U validated; we obtain the order: N fails, T commits, X fails, U validates.
- U checks whether N is still available (no) and whether X is still unavailable. Therefore U must abort.
In case of a failure or recovery: when a transaction has observed a failure, its local validation communicates with the failed manager to ensure that it has not yet recovered. The check whether replica managers have failed since a transaction started can be combined with the 2PC protocol.
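A hedged sketch of the local-validation check before commit: the transaction re-checks, at commit time, that every manager it observed as available or failed is still in that state. The availability oracle `alive` and all names are illustrative stand-ins.

```python
# Local validation: commit only if the observed failures/recoveries still hold.
def validate(observed_available, observed_failed, alive):
    still_available = all(alive(rm) for rm in observed_available)
    still_failed = all(not alive(rm) for rm in observed_failed)
    return still_available and still_failed

# Scenario above: N fails, T commits, X fails, U validates.
# T validates while X is still up and N is down, so T can commit:
print(validate({"X", "M", "P"}, {"N"}, lambda rm: rm != "N"))        # True
# U validates later: it read B at N (observed available) and observed X as failed,
# but by now N has also failed, so U must abort:
alive_later = lambda rm: rm not in {"N", "X"}
print(validate({"N", "Y"}, {"X"}, alive_later))                      # False
```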

Network Partitions

Part of the network fails, creating sub-groups which cannot communicate with one another. Replication schemes assume that partitions will be repaired, so operations performed during a partition must not cause inconsistency:
- optimistic schemes (e.g. available copies with validation) allow all operations and resolve inconsistencies when a partition is repaired
- pessimistic schemes (e.g. quorum consensus) prevent inconsistency, e.g. by limiting availability in all but one sub-group
(Figure: during a network partition, transaction T issues withdraw(B,4) in one sub-group while transaction U issues deposit(B,3) in the other.)

Available Copies with Validation

Optimistic approach: the available copies algorithm is applied within each partition. This maintains the normal level of availability for read operations, even during partitions. When a partition is repaired, the possibly conflicting transactions that took place in the separate partitions are validated; if the validation fails, some steps must be taken to overcome the inconsistencies. Had there been no partition, one of a pair of transactions with conflicting operations would have been delayed or aborted; since there has been a partition, pairs of conflicting transactions have been allowed to commit in different partitions, and the only choice after the event is to abort one of them. This requires making changes to the objects and, in some cases, compensating actions in the real world. The optimistic approach is therefore only feasible for applications where such compensating actions can be taken.

Quorum Consensus Methods

To prevent transactions in different partitions from producing inconsistent results, make a rule that operations can be performed in only one of the partitions. Replica managers in different partitions cannot communicate, so each sub-group has to decide independently whether it may perform operations. A quorum is a sub-group of replica managers whose size gives it the right to perform operations; the right could, for instance, be given by holding the majority of the replica managers in the partition. In quorum consensus schemes, update operations may be performed by a subset of the replica managers forming a quorum:
- the other replica managers have out-of-date copies
- version numbers or timestamps are used to determine which copies are up to date
- operations are applied only to copies with the current version number

Gifford's Quorum Consensus

Assign a number of votes to each data copy at a replica manager. A vote is a weighting giving the desirability of using a particular copy; thus groups of replica managers can be configured to reflect different performance or reliability characteristics. Each read operation must obtain a read quorum of R votes before it can read from any up-to-date copy, and each write operation must obtain a write quorum of W votes before it can perform an update. R and W are set for a group of replica managers such that
- W > half the total votes
- R + W > total number of votes for the group
This ensures that any pair of quorums (a read quorum and a write quorum, or two write quorums) contains common copies, so in case of a partition it is not possible to perform conflicting operations on the same data item in different partitions. The performance and reliability of write (respectively read) operations increase as W (respectively R) is decreased. Main disadvantage: the performance of read operations is degraded compared with read one/write all.
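A hedged sketch of the quorum constraints and of quorum collection within a partition. The vote assignment and helper names are assumptions chosen for illustration.

```python
# Gifford's quorum constraints and quorum collection.
from itertools import combinations

votes = {"RM1": 2, "RM2": 1, "RM3": 1}   # assumed vote assignment
total = sum(votes.values())

def valid_configuration(R, W):
    # W > half the total votes, and R + W > total votes.
    return W > total / 2 and R + W > total

def collect_quorum(needed, available):
    """Return some subset of the available managers whose votes reach `needed`."""
    for size in range(1, len(available) + 1):
        for subset in combinations(available, size):
            if sum(votes[rm] for rm in subset) >= needed:
                return set(subset)
    return None     # quorum cannot be obtained in this partition

R, W = 2, 3
print(valid_configuration(R, W))                    # True
print(collect_quorum(W, ["RM1", "RM2", "RM3"]))     # e.g. {'RM1', 'RM2'} with 3 votes
print(collect_quorum(W, ["RM2", "RM3"]))            # None: writes blocked in this partition
```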

Quorums

Three examples of the voting algorithm:
a) a correct choice of read and write sets
b) a choice that may lead to write-write conflicts
c) a correct choice that comes down to ROWA (read one, write all)

Before a read operation, a read quorum is collected: make enquiries at replica managers to find a set of copies the sum of whose votes is not less than R (not all of these copies need be up to date). As each read quorum overlaps with every write quorum, every read quorum is certain to include at least one current copy; the read operation may then be applied to any up-to-date copy.
Before a write operation, a write quorum is collected: make enquiries at replica managers to find a set of up-to-date copies the sum of whose votes is not less than W. If there are insufficient up-to-date copies, an out-of-date file is replaced with a current one, to enable the quorum to be established. The write operation is then applied by each replica manager in the write quorum, the version number is incremented, and completion is reported to the client. The files at the remaining available managers are then updated in the background. Two-phase read/write locking is used for concurrency control.

Gifford's Quorum Consensus Examples

(Table omitted: for each of three example configurations it lists, per replica, the access latency in milliseconds and the assigned votes, the resulting quorum sizes R and W, and the derived performance of the file suite, i.e. read and write latency and blocking probability. The blocking probability is the probability that a quorum cannot be obtained, assuming a probability of 0.01 that any single replica manager is unavailable.)

Example 1 is configured for a file with a high read-to-write ratio, with several weak representatives and a single replica manager. Replication is used for performance, not reliability: the replica manager can be accessed in 75 ms, while the two clients can access their weak representatives on local discs in 65 ms, resulting in lower latency and less network traffic.
Example 2 is configured for a file with a moderate read-to-write ratio which is accessed mainly from one local network. The local replica manager has 2 votes and the remote managers 1 vote each: reads can be satisfied at the local replica manager, but writes must access the local manager and one remote manager. If the local manager fails, only reads are allowed.
Example 3 is configured for a file with a very high read-to-write ratio: reads can be done at any replica manager and the probability of the file being unavailable is small, but writes must access all managers.
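A hedged sketch of how the blocking probability mentioned for the omitted table could be derived: enumerate all failure patterns of the replicas, each manager being unavailable independently with probability 0.01, and sum the probabilities of the patterns in which the required quorum cannot be formed. The vote assignment below is an assumption for illustration, not the lecture's original configuration.

```python
# Blocking probability: probability that no quorum of `needed` votes can be formed.
from itertools import product

def blocking_probability(votes, needed, p_unavail=0.01):
    prob = 0.0
    for up in product([True, False], repeat=len(votes)):
        weight = 1.0
        for alive in up:
            weight *= (1 - p_unavail) if alive else p_unavail
        if sum(v for v, alive in zip(votes, up) if alive) < needed:
            prob += weight                      # no quorum in this failure pattern
    return prob

votes = [1, 1, 1]                               # three equally weighted replicas (assumed)
print(blocking_probability(votes, needed=1))    # read quorum R=1: ~1e-6
print(blocking_probability(votes, needed=3))    # write quorum W=3 (ROWA-like): ~0.03
```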


More information

Virtual Full Replication for Scalable. Distributed Real-Time Databases

Virtual Full Replication for Scalable. Distributed Real-Time Databases Virtual Full Replication for Scalable Distributed Real-Time Databases Thesis Proposal Technical Report HS-IKI-TR-06-006 Gunnar Mathiason gunnar.mathiason@his.se University of Skövde June, 2006 1 Abstract

More information

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. 23 Concurrency Control Part -4 In the last lecture, we have

More information

Chapter 15: Recovery System

Chapter 15: Recovery System Chapter 15: Recovery System Failure Classification Storage Structure Recovery and Atomicity Log-Based Recovery Shadow Paging Recovery With Concurrent Transactions Buffer Management Failure with Loss of

More information

G22.3250-001. Porcupine. Robert Grimm New York University

G22.3250-001. Porcupine. Robert Grimm New York University G22.3250-001 Porcupine Robert Grimm New York University Altogether Now: The Three Questions! What is the problem?! What is new or different?! What are the contributions and limitations? Porcupine from

More information

Dependability in Web Services

Dependability in Web Services Dependability in Web Services Christian Mikalsen chrismi@ifi.uio.no INF5360, Spring 2008 1 Agenda Introduction to Web Services. Extensible Web Services Architecture for Notification in Large- Scale Systems.

More information

An Overview of Distributed Databases

An Overview of Distributed Databases International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 2 (2014), pp. 207-214 International Research Publications House http://www. irphouse.com /ijict.htm An Overview

More information

Recovery System C H A P T E R16. Practice Exercises

Recovery System C H A P T E R16. Practice Exercises C H A P T E R16 Recovery System Practice Exercises 16.1 Explain why log records for transactions on the undo-list must be processed in reverse order, whereas redo is performed in a forward direction. Answer:

More information

THE BCS PROFESSIONAL EXAMINATIONS BCS Level 6 Professional Graduate Diploma in IT. April 2009 EXAMINERS' REPORT. Network Information Systems

THE BCS PROFESSIONAL EXAMINATIONS BCS Level 6 Professional Graduate Diploma in IT. April 2009 EXAMINERS' REPORT. Network Information Systems THE BCS PROFESSIONAL EXAMINATIONS BCS Level 6 Professional Graduate Diploma in IT April 2009 EXAMINERS' REPORT Network Information Systems General Comments Last year examiners report a good pass rate with

More information

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper Connectivity Alliance 7.0 Recovery Information Paper Table of Contents Preface... 3 1 Overview... 4 2 Resiliency Concepts... 6 2.1 Loss Business Impact... 6 2.2 Recovery Tools... 8 3 Manual Recovery Method...

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

IT Service Management

IT Service Management IT Service Management Service Continuity Methods (Disaster Recovery Planning) White Paper Prepared by: Rick Leopoldi May 25, 2002 Copyright 2001. All rights reserved. Duplication of this document or extraction

More information

Installing and Configuring a SQL Server 2014 Multi-Subnet Cluster on Windows Server 2012 R2

Installing and Configuring a SQL Server 2014 Multi-Subnet Cluster on Windows Server 2012 R2 Installing and Configuring a SQL Server 2014 Multi-Subnet Cluster on Windows Server 2012 R2 Edwin Sarmiento, Microsoft SQL Server MVP, Microsoft Certified Master Contents Introduction... 3 Assumptions...

More information

Lecture 7: Concurrency control. Rasmus Pagh

Lecture 7: Concurrency control. Rasmus Pagh Lecture 7: Concurrency control Rasmus Pagh 1 Today s lecture Concurrency control basics Conflicts and serializability Locking Isolation levels in SQL Optimistic concurrency control Transaction tuning Transaction

More information

USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES

USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES 1 ALIREZA POORDAVOODI, 2 MOHAMMADREZA KHAYYAMBASHI, 3 JAFAR HAMIN 1, 3 M.Sc. Student, Computer Department, University of Sheikhbahaee,

More information

Scalability. We can measure growth in almost any terms. But there are three particularly interesting things to look at:

Scalability. We can measure growth in almost any terms. But there are three particularly interesting things to look at: Scalability The ability of a system, network, or process, to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth. We can measure growth in almost

More information

Microsoft SQL Server 2005 Database Mirroring

Microsoft SQL Server 2005 Database Mirroring Microsoft SQL Server 2005 Database Mirroring Applied Technology Guide Abstract This document reviews the features and usage of SQL Server 2005, Database Mirroring. May 2007 Copyright 2007 EMC Corporation.

More information

! Volatile storage: ! Nonvolatile storage:

! Volatile storage: ! Nonvolatile storage: Chapter 17: Recovery System Failure Classification! Failure Classification! Storage Structure! Recovery and Atomicity! Log-Based Recovery! Shadow Paging! Recovery With Concurrent Transactions! Buffer Management!

More information

A Dell Technical White Paper Dell Storage Engineering

A Dell Technical White Paper Dell Storage Engineering Networking Best Practices for Dell DX Object Storage A Dell Technical White Paper Dell Storage Engineering THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND

More information

Tunnel Broker System Using IPv4 Anycast

Tunnel Broker System Using IPv4 Anycast Tunnel Broker System Using IPv4 Anycast Xin Liu Department of Electronic Engineering Tsinghua Univ. lx@ns.6test.edu.cn Xing Li Department of Electronic Engineering Tsinghua Univ. xing@cernet.edu.cn ABSTRACT

More information

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES By: Edward Whalen Performance Tuning Corporation INTRODUCTION There are a number of clustering products available on the market today, and clustering has become

More information

Chapter 16: Recovery System

Chapter 16: Recovery System Chapter 16: Recovery System Failure Classification Failure Classification Transaction failure : Logical errors: transaction cannot complete due to some internal error condition System errors: the database

More information

ZooKeeper. Table of contents

ZooKeeper. Table of contents by Table of contents 1 ZooKeeper: A Distributed Coordination Service for Distributed Applications... 2 1.1 Design Goals...2 1.2 Data model and the hierarchical namespace...3 1.3 Nodes and ephemeral nodes...

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 4. Basic Principles Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ NoSQL Overview Main objective:

More information

Transactions and ACID in MongoDB

Transactions and ACID in MongoDB Transactions and ACID in MongoDB Kevin Swingler Contents Recap of ACID transactions in RDBMSs Transactions and ACID in MongoDB 1 Concurrency Databases are almost always accessed by multiple users concurrently

More information

A new Cluster Resource Manager for heartbeat

A new Cluster Resource Manager for heartbeat A new Cluster Resource Manager for heartbeat Lars Marowsky-Brée lmb@suse.de January 2004 1 Overview This paper outlines the key features and design of a clustered resource manager to be running on top

More information

Ph.D. Thesis Proposal Database Replication in Wide Area Networks

Ph.D. Thesis Proposal Database Replication in Wide Area Networks Ph.D. Thesis Proposal Database Replication in Wide Area Networks Yi Lin Abstract In recent years it has been shown that database replication is promising in improving performance and fault tolerance of

More information

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1)

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1) Naming vs. Locating Entities Till now: resources with fixed locations (hierarchical, caching,...) Problem: some entity may change its location frequently Simple solution: record aliases for the new address

More information

D. SamKnows Methodology 20 Each deployed Whitebox performs the following tests: Primary measure(s)

D. SamKnows Methodology 20 Each deployed Whitebox performs the following tests: Primary measure(s) v. Test Node Selection Having a geographically diverse set of test nodes would be of little use if the Whiteboxes running the test did not have a suitable mechanism to determine which node was the best

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Tushar Joshi Turtle Networks Ltd

Tushar Joshi Turtle Networks Ltd MySQL Database for High Availability Web Applications Tushar Joshi Turtle Networks Ltd www.turtle.net Overview What is High Availability? Web/Network Architecture Applications MySQL Replication MySQL Clustering

More information

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...

More information

The EMSX Platform. A Modular, Scalable, Efficient, Adaptable Platform to Manage Multi-technology Networks. A White Paper.

The EMSX Platform. A Modular, Scalable, Efficient, Adaptable Platform to Manage Multi-technology Networks. A White Paper. The EMSX Platform A Modular, Scalable, Efficient, Adaptable Platform to Manage Multi-technology Networks A White Paper November 2002 Abstract: The EMSX Platform is a set of components that together provide

More information

Software Life-Cycle Management

Software Life-Cycle Management Ingo Arnold Department Computer Science University of Basel Theory Software Life-Cycle Management Architecture Styles Overview An Architecture Style expresses a fundamental structural organization schema

More information

Textbook and References

Textbook and References Transactions Qin Xu 4-323A Life Science Building, Shanghai Jiao Tong University Email: xuqin523@sjtu.edu.cn Tel: 34204573(O) Webpage: http://cbb.sjtu.edu.cn/~qinxu/ Webpage for DBMS Textbook and References

More information

Operating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師

Operating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師 Lecture 7: Distributed Operating Systems A Distributed System 7.2 Resource sharing Motivation sharing and printing files at remote sites processing information in a distributed database using remote specialized

More information

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES Constantin Brâncuşi University of Târgu Jiu ENGINEERING FACULTY SCIENTIFIC CONFERENCE 13 th edition with international participation November 07-08, 2008 Târgu Jiu TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED

More information

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 5: GFS & HDFS!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind

More information

Availability Digest. MySQL Clusters Go Active/Active. December 2006

Availability Digest. MySQL Clusters Go Active/Active. December 2006 the Availability Digest MySQL Clusters Go Active/Active December 2006 Introduction MySQL (www.mysql.com) is without a doubt the most popular open source database in use today. Developed by MySQL AB of

More information

Outline. Failure Types

Outline. Failure Types Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 11 1 2 Conclusion Acknowledgements: The slides are provided by Nikolaus Augsten

More information

Module 15: Network Structures

Module 15: Network Structures Module 15: Network Structures Background Topology Network Types Communication Communication Protocol Robustness Design Strategies 15.1 A Distributed System 15.2 Motivation Resource sharing sharing and

More information

Data Consistency on Private Cloud Storage System

Data Consistency on Private Cloud Storage System Volume, Issue, May-June 202 ISS 2278-6856 Data Consistency on Private Cloud Storage System Yin yein Aye University of Computer Studies,Yangon yinnyeinaye.ptn@email.com Abstract: Cloud computing paradigm

More information

Network Attached Storage. Jinfeng Yang Oct/19/2015

Network Attached Storage. Jinfeng Yang Oct/19/2015 Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability

More information

DCaaS: Data Consistency as a Service for Managing Data Uncertainty on the Clouds

DCaaS: Data Consistency as a Service for Managing Data Uncertainty on the Clouds DCaaS: Data Consistency as a Service for Managing Data Uncertainty on the Clouds Islam Elgedawy Computer Engineering Department, Middle East Technical University, Northern Cyprus Campus, Guzelyurt, Mersin

More information

1 Storage Devices Summary

1 Storage Devices Summary Chapter 1 Storage Devices Summary Dependability is vital Suitable measures Latency how long to the first bit arrives Bandwidth/throughput how fast does stuff come through after the latency period Obvious

More information