HAT not CAP: Highly Available Transactions

1 HAT not CAP: Highly Available Transactions
Talk at Dagstuhl Seminar 13081, February
Draft Paper at
Peter Bailis (UC Berkeley), Alan Fekete (U of Sydney), Ali Ghodsi (UC Berkeley, KTH), Joseph M. Hellerstein (UC Berkeley), Ion Stoica (UC Berkeley)
Presented by Alan Fekete (University of Sydney)

2 Internet-Scale Data Storage
Many early systems* offered scalability and availability but lacked functionality expected of traditional database management platforms (=> NoSQL):
- Access by id/key only [no content-based access, no joins]
- Operations may see stale data
- No all-or-nothing combination of operations across items
*e.g. BigTable, PNUTS, S3, Dynamo, MongoDB, Cassandra, SimpleDB, Riak

3 Wouldn't it be nice if
More recent papers and systems for internet-scale data offer extra features beyond the early NoSQL approaches, including some familiar from DBMSs:
- (Choice of) more consistency in an operation
- Richer operations
- Grouping operations on multiple items
Our focus is transactions: ways to group operations on multiple items.

4 Returning stale data
Allowing weak consistency (returning stale data) in accesses was justified by the CAP result:
- For single-item access, you can't offer strongly consistent reads and writes that are always available if partitions are possible in the system
- Conjectured by Brewer (2000), proved by Gilbert and Lynch (2002)

5 Not supporting transactions?
- A system can't provide serializable transactions that are always available if the system can partition
- This was known long before Brewer; see Davidson et al. (ACM Computing Surveys, 1985)

6 ACID Transactions with weaker I
- Serializability is the ideal for isolation of transactions, but most transactions on (conventional, single-site) DBMSs don't run serializably
- Read Committed is often the default level

7 HAT
We propose that a useful model for programmers is to offer transactions that can be an arbitrary collection of accesses to arbitrary sets of read/write objects, with semantics chosen to be as strong as is feasible to implement while remaining available even when partitioned.

8 Available?
- Clearly not possible if the client is partitioned away from its data
- However, we should tolerate partitions between item replicas within the data store
- So we ask for: IF the client can reach one replica of each item it asks for, THEN the transaction can eventually commit (or it aborts voluntarily)
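
The availability condition on this slide can be read as a simple predicate. The following is a minimal sketch, not taken from the talk: the function name `can_eventually_commit`, the `replicas` map (item to the set of sites holding a copy), and the `reachable` set (sites the client can currently contact) are all hypothetical names introduced for illustration.

```python
from typing import Dict, Iterable, Set


def can_eventually_commit(items: Iterable[str],
                          replicas: Dict[str, Set[str]],
                          reachable: Set[str]) -> bool:
    """Return True if the client can contact at least one replica of every item.

    Assumption: `replicas` and `reachable` are illustrative inputs, not part
    of any real system's API. An item with no known replicas counts as
    unreachable.
    """
    return all(replicas.get(item, set()) & reachable for item in items)


# Hypothetical placement: x lives on s1 and s2, y lives on s2 and s3.
placement = {"x": {"s1", "s2"}, "y": {"s2", "s3"}}

# The client reaches s1 and s3: one replica of each item, so the condition holds.
ok = can_eventually_commit(["x", "y"], placement, {"s1", "s3"})

# The client reaches only s1: no replica of y, so the condition fails.
blocked = can_eventually_commit(["x", "y"], placement, {"s1"})
```

Note that the replicas of `x` and `y` reached by the client need not be the same site, and the sites themselves may be partitioned from one another; the condition only concerns what the client can reach.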

9 Semantics for HAT
We show you can offer available transactions that have:
- All-or-nothing atomicity
- Causal consistency (including read-your-writes (RYW), monotonic reads, writes follow reads)
- An isolation level like read committed and repeatable read*
But reads may not always see the most recent committed changes.
*in the absence of predicate reads [which are not an issue for a key-value store]

10 Defining semantics
We are trying an approach inspired by Adya's MIT PhD thesis:
- A graph showing edges between operations
- Types of edges: wr, ww, rw, and also happens-before
- Restrictions on the sorts of cycles that can occur
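
To make the graph-based style of definition concrete, here is a minimal sketch of checking for cycles restricted to chosen edge types. The slide does not say which cycles the HAT model forbids, so the edge-type sets used below are illustrative only, borrowed from Adya's isolation levels, where a cycle of only ww/wr edges is ruled out by read committed and a cycle that also uses rw edges is ruled out by serializability. The function name `has_cycle` and the example history are assumptions.

```python
from collections import defaultdict


def has_cycle(edges, kinds):
    """Detect a directed cycle using only edges whose label is in `kinds`.

    `edges` is an iterable of (src, dst, kind) triples, where kind is a label
    such as "ww", "wr", "rw", or "hb" (happens-before).
    """
    graph = defaultdict(set)
    for src, dst, kind in edges:
        if kind in kinds:
            graph[src].add(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            if color[nxt] == GREY:  # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


# Hypothetical history: T1's write is overwritten by T2 (ww), and T2 read a
# version that T1 later overwrote (rw anti-dependency).
history = [("T1", "T2", "ww"), ("T2", "T1", "rw")]

cycle_without_rw = has_cycle(history, {"ww", "wr"})
cycle_with_rw = has_cycle(history, {"ww", "wr", "rw"})
```

In this example the history has no cycle over ww/wr edges alone but does have one once rw edges are included, which is the kind of distinction that separates weaker isolation levels from serializability in this framework.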

11 Implementation proposal
- We can sketch an implementation in which the client buffers operations and propagates them asynchronously (as a group at commit) to sites, tracking causal precedence etc.
- This is mainly to prove the existence of an available implementation
- Lots of engineering will be needed to get decent performance
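
The buffering idea can be illustrated with a toy model. This is a sketch under strong assumptions, not the authors' design: the classes `Replica` and `BufferingClient`, the `outbox`, and the per-client version counter are all hypothetical, there is a single client, and causal-precedence tracking is omitted entirely.

```python
class Replica:
    """A toy site holding key -> (version, value); the highest version wins."""

    def __init__(self):
        self.store = {}

    def apply(self, writes, version):
        # Install a whole committed write group, skipping stale versions.
        for key, value in writes.items():
            current = self.store.get(key)
            if current is None or current[0] < version:
                self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


class BufferingClient:
    """Buffers writes locally and ships them as one group after commit."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.buffer = {}       # uncommitted writes of the current transaction
        self.outbox = []       # (replica, (version, writes)) awaiting delivery
        self.next_version = 1  # assumption: a simple per-client counter

    def write(self, key, value):
        self.buffer[key] = value

    def read(self, key, replica):
        # The transaction sees its own buffered writes first.
        if key in self.buffer:
            return self.buffer[key]
        return replica.read(key)

    def abort(self):
        self.buffer.clear()

    def commit(self):
        # Commit never waits on the network: it only queues the group.
        group = (self.next_version, dict(self.buffer))
        self.next_version += 1
        self.buffer.clear()
        for replica in self.replicas:
            self.outbox.append((replica, group))

    def flush(self, reachable):
        # Asynchronous propagation: deliver to whichever sites are reachable.
        pending = []
        for replica, (version, writes) in self.outbox:
            if replica in reachable:
                replica.apply(writes, version)
            else:
                pending.append((replica, (version, writes)))
        self.outbox = pending


r1, r2 = Replica(), Replica()
client = BufferingClient([r1, r2])
client.write("x", 1)
client.write("y", 2)
client.commit()
client.flush({r1})       # r2 is partitioned away for now
client.flush({r1, r2})   # the partition heals; r2 catches up
```

Because nothing reaches a site before commit, an aborted transaction leaves no trace, and each site receives the writes of a transaction together as a single group. A real design would also need the causal metadata mentioned on the slide and a way to order writes from multiple clients.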

12 Related work with Availability
- Restricted form of transactions
  - Operate on a set of items that are colocated, e.g. Google Megastore entity group, UCSB G-Store
  - Multiple gets or multiple puts, not get with put, e.g. Princeton COPS-GT, Eiger
- Restricted data types
  - Only allow commutative operations, e.g. INRIA CRDTs, Berkeley Bloom^L
- Weak semantics
  - Without isolation properties, e.g. ETH Consistency Rationing (some choices)

13 Related work without Availability
Systems that support general (read committed, SI-like, or even serializable) transactions but use 2PC, Paxos, a master replica for ordering, etc.
E.g. Google Megastore (across entity groups), ETH Consistency Rationing (some choices), Google Spanner, MSR Walter, UCSB Paxos-CP, Yale Calvin, Berkeley Planet (formerly MDCC)

14 Conclusion
We advocate that internet-scale data systems offer clients:
- Unrestricted sets of operations on arbitrary multiple items as a transaction
- Semantics as strong as possible while remaining available during partitions
We show this can be atomic, causal, repeatable read (with perhaps stale data).
Open questions: What further properties can one offer? Can one get good performance? How should one design an application so that it works properly with this isolation model?