CAP and Cloud Data Management

Raghu Ramakrishnan, Yahoo

Novel systems that scale out on demand, relying on replicated data and massively distributed architectures with clusters of thousands of machines, particularly those designed for real-time data serving and update workloads, amply illustrate the realities of the CAP theorem.

The relative simplicity of common requests in Web data management applications has led to data-serving systems that trade off some of the query and transaction functionality found in traditional database systems to efficiently support such features as scalability, elasticity, and high availability. The perspective described here is informed by my experience with Yahoo's PNUTS (Platform for Nimble Universal Table Storage) data-serving platform, which has been in production use for several years. As of 2011, PNUTS hosted more than 100 applications that support major Yahoo properties, running on thousands of servers spread over 18 datacenters worldwide, with adoption and usage growing rapidly.2

The PNUTS design was shaped by the reality of georeplication (accessing a copy across a continent is much slower than accessing it locally), and we had to face the tradeoff between availability and consistent data access in the presence of partitions. It is worth noting, however, that the realities of slow access lead programmers to favor local copies even when there are no partitions. Thus, while the CAP theorem limits the consistency guarantees programmers can offer during partitions, they often make do with weaker guarantees even during normal operation, especially on reads.

BACKGROUND: ACID AND CONSISTENCY

Database systems support the concept of a transaction, which is, informally, an execution of a program. While such systems execute multiple programs concurrently in interleaved fashion for high performance, they guarantee that the execution's result leaves the database in the same state as some serial execution of the same transactions. The term ACID denotes that a transaction is atomic in that the system executes it completely or not at all; consistent in that it leaves the database in a consistent state; isolated in that the effects of its incomplete execution are not exposed; and durable in that results from completed transactions survive failures.

The transaction abstraction is one of the great achievements of database management systems, freeing programmers from concern about other concurrently executing programs or failures: they simply must ensure that their program keeps the database consistent when run by itself to completion. The database system usually implements this abstraction by obtaining locks when a transaction reads or writes a shared object, typically according to a two-phase locking regimen that ensures the resulting executions are equivalent to some serial execution of all transactions. The system first durably records all changes to a write-ahead log, which allows it to undo incomplete transactions, if need be, and to restore completed transactions after failures.

In a distributed database, if a transaction modifies objects stored at multiple servers, it must obtain and hold locks across those servers. While this is costly even if the servers are collocated, it is more costly if the servers are in different datacenters. When data is replicated, everything becomes even more complex because it is necessary to ensure that the surviving nodes in a failure scenario can determine the actions of both completed transactions (which must be restored) and incomplete transactions (which must be undone). Typically, the system can achieve this by using a majority protocol (in which writes are applied to most of the copies, or quorum, and a quorum member serves the reads). In addition to the added costs incurred during normal execution, these measures can force a block during failures that involve network partitions, compromising availability, as the CAP theorem describes.3,4
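To make the majority protocol concrete, here is a minimal, hypothetical Python sketch (in-memory replicas stand in for servers, and the version counter is centralized purely for brevity). Writes are applied to a majority of the copies, and reads consult a majority and return the freshest copy; because any two majorities intersect, a read always observes the most recent acknowledged write to the object.

    import random

    class Replica:
        """One server's copy of a single object, stored as (version, value)."""
        def __init__(self):
            self.version, self.value = 0, None

    class QuorumStore:
        """Majority protocol for one object: writes and reads each touch a majority of N copies."""
        def __init__(self, n=5):
            self.replicas = [Replica() for _ in range(n)]
            self.majority = n // 2 + 1
            self.next_version = 1  # centralized here only to keep the sketch short

        def write(self, value):
            # Apply the write to a majority of copies; the remaining copies may lag
            # (for example, because they sit on the far side of a partition).
            version, self.next_version = self.next_version, self.next_version + 1
            for replica in random.sample(self.replicas, self.majority):
                replica.version, replica.value = version, value
            return version

        def read(self):
            # Consult a majority and return the freshest copy. Any two majorities
            # intersect, so the latest acknowledged write is always among them.
            return max(random.sample(self.replicas, self.majority),
                       key=lambda r: r.version).value

    store = QuorumStore(n=5)
    store.write("draft")
    store.write("published")
    assert store.read() == "published"

Blocking is implicit in the protocol rather than in this sketch: if a partition leaves fewer than a majority of copies reachable, neither operation can complete, which is exactly the availability cost described above.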
Both the database and distributed systems literature offer many alternative proposals for the semantics of concurrent operations. Although the database notions of consistency apply to a distributed setting (even though they can be more expensive to enforce and might introduce availability tradeoffs), they were originally designed to allow interleaving of programs against a centralized database. Thus, the goal was to provide a simple programming abstraction to cope with concurrent executions, rather than to address the challenges of a distributed setting. These differences in setting have influenced how both communities have approached the problem, but the following two differences in perspective are worth emphasizing:

- Unit of consistency. The database perspective, as exemplified by the notion of ACID transactions, focuses on changes to the entire database, spanning multiple objects (typically, records in a relational database). The distributed systems literature generally focuses on changes to a single object.5
- Client- versus data-centric semantics. The database community's approach to defining semantics is usually through formalizing the effect of concurrent accesses on the database; again, the definition of ACID transactions exemplifies this approach: the effect of interleaved execution on the database must be equivalent to that of some serial execution of the same transactions. The distributed systems community often takes a client-centric approach instead, defining consistency levels in terms of what a client that issues reads and writes (potentially against a distributed data store) sees in the presence of other concurrently executing clients.

The notions of consistency proposed in the distributed systems literature focus on a single object and are client-centric definitions. Strong consistency means that once a write request returns successfully to the client, all subsequent reads of the object, by any client, see the effect of the write, regardless of replication, failures, partitions, and so on. Observe that strong consistency does not ensure ACID transactions. For example, client A could read object X once, and then read it again later and see the effects of another client's intervening write; this is not equivalent to a serial execution of the two clients' programs. That said, implementing ACID transactions ensures strong consistency.

The term weak consistency describes any alternative that does not guarantee strong consistency for changes to individual objects. A notable instance of weak consistency is eventual consistency, which is supported by Amazon's Dynamo system,6 among others.1,5 Intuitively, if an object has multiple copies at different servers, updates are first applied to the local copy and then propagated out; the guarantee offered is that every update is eventually applied to all copies. However, there is no assurance of the order in which the system will apply the updates; in fact, it might apply the updates in different orders on different copies. Unless the nature of the updates makes the ordering immaterial (for example, commutative and associative updates), two copies of the same object could differ in ways that are hard for a programmer to identify.

Researchers have proposed several versions of weak consistency,5 including read-your-writes (a client always sees the effect of its own writes), monotonic read (a client that has read a particular value of an object will not see previous values on subsequent accesses), and monotonic write (all writes a client issues are applied serially in the order issued). Each of these versions can help strengthen eventual consistency in terms of the guarantees offered to a client.

CLOUD DATA MANAGEMENT

Web applications, a major motivator for the development of cloud systems, have grown rapidly in popularity and must be able to scale on demand. Systems must serve requests with low latency (tens of milliseconds) to users worldwide, throughput is high (tens of thousands of reads and writes per second), and applications must be highly available, all at minimal ongoing operational costs. Fortunately, full transactional support typically is not required, and separate systems perform complex analysis tasks (for example, map-reduce platforms such as Hadoop; hadoop.apache.org). For many applications, requests are quite simple compared to traditional data management settings: the data might be user session data, with all user actions on a webpage written to and read from a single record, or it might be social, with social activities written to a single user record and a user's friends' activities read from a small number of other user records.

These challenges have led to the development of a new generation of analytic and serving systems based on massively distributed architectures that involve clusters of thousands of machines. All data is routinely replicated within a datacenter for fault tolerance; sometimes the data is even georeplicated across multiple datacenters for low-latency reads. Massively distributed architectures lend themselves to adding capacity incrementally and on demand, which in turn opens the door to building multitenanted, hosted systems with several applications sharing underlying resources. These cloud systems need not be massively distributed, but many current offerings are, such as those from Amazon,6 Google,7,8 Microsoft,9 and Yahoo,1 and the Cassandra (cassandra.apache.org) and HBase (hbase.apache.org) open source systems.

Although Web data management provided the original motivation for massively distributed cloud architectures, these systems are also making rapid inroads in enterprise data management. Furthermore, the rapid growth in mobile devices with considerable storage and computing power is leading to systems in which the number of nodes is on the order of hundreds of millions, and disconnectivity is no longer a rare event. This new class of massively distributed systems is likely to push the limits of how current cloud systems handle the challenges highlighted by the CAP theorem.

The belief that applications do not need the greater functionality of traditional database systems is a fallacy. Even for Web applications, greater functionality simplifies the application developer's task, and better support for data consistency is valuable: depending on the application, eventual consistency is often inadequate, and sometimes nothing less than ACID will suffice. As the ideas underlying cloud data-serving systems find their way into enterprise-oriented data management systems, the fraction of applications that benefit from (indeed, require) higher levels of consistency and functionality will rise sharply.

Although it is likely that some fundamental tradeoffs will remain, we are witnessing an ongoing evolution from the first generation of cloud data-serving systems to increasingly more complete systems. Bigtable and HBase write synchronously to all replicas, ensuring that all copies are always up to date. Dynamo and Cassandra require that writes succeed on a quorum of servers before returning success to the client; they maintain record availability during network partitions, but at the cost of consistency, because they do not insist on reading from the write quorum. Megastore comes closer to the consistency of a traditional DBMS, supporting ACID transactions (meant to be used on records within the same group); it uses Paxos for synchronous replication across regions. (Note that because new systems are being announced in this space at a rapid pace, this is not meant to be a comprehensive survey.)

PNUTS: A CASE STUDY

Yahoo has 680 million customers and numerous internal platforms with stringent latency requirements (fewer than 10 ms is common). Servers can fail, and individual datacenters can suffer network partitions or general shutdown due to disaster, but data must remain available under any failure conditions, which is achieved via replication across datacenters.

We developed PNUTS to support CRUD (create, retrieve, update, delete) workloads in this setting. Many applications have moved to PNUTS either from pure LAMP (Linux, Apache, MySQL, PHP) stacks or from other legacy key-value stores. Illustrative applications include Yahoo's user location, user-generated content, and social directory platforms; the Yahoo Mail address book; the Yahoo Answers, Movies, Travel, Weather, and Maps applications; and user profiles for ad and content personalization. The reasons for adopting PNUTS include flexible records and schema evolution; the ability to efficiently retrieve small ranges of records in order (for example, comments by time per commented-upon article); notifications of changes to a table; hosted multidatacenter storage; and, above all, reliable, global, low-latency access.

At Yahoo, the experience with PNUTS led to several findings:

- The cloud model of hosted, on-demand storage with low-latency access and highly available multidatacenter replication has proved to be very popular.
- For many applications, users are willing to compromise on features such as complex queries and ACID transactions.
- Additional features greatly increase the range of applications that are easily developed using PNUTS for data management and, not surprisingly, its adoption for those applications. In particular, support for ordered tables (which allows arranging tables according to a composite key and enables efficient range scans) sparked a big increase in adoption, and we expect support for selective replication and secondary indexes to have a similar effect.
- Users have pushed for more options in the level of consistency.

In the context of this article, the most relevant features are those that involve consistency.

Relaxed consistency

PNUTS was one of the earliest systems to natively support geographic replication, using asynchronous replication to avoid long write latencies. Systems that make copies within the same datacenter have the option of synchronous replication, which ensures strong consistency. This is not viable for cross-datacenter replication, which requires various forms of weak consistency.

However, eventual consistency does not always suffice for supporting the semantics natural to an application. For example, suppose we want to maintain the state of a user logged in to Yahoo who wants to chat. Copies of this state might be maintained in multiple georegions and must be updated when the user decides to chat or goes offline. Consider what happens when two regions are disconnected because of a link failure: when the link is restored, it is not sufficient for both copies to eventually converge to the same state; rather, the copies must converge to the state most recently declared by the user in the region where the user was most recently active. It is possible to update the copies via ACID transactions, but supporting ACID transactions in such a setting is a daunting challenge. Fortunately, most Web applications tend to write a single record at a time (such as changing a user's chat status or home location in the profile record), and it is acceptable if subsequent reads of the record (for example, by a friend or the user) do not immediately see the write. This observation is at the heart of the solution we adopted in PNUTS, called timeline consistency.

Timeline consistency

In timeline consistency, an object and its replicas need not be synchronously maintained, but all copies must follow the same state timeline (possibly with some replicas skipping forward across some states). PNUTS does not allow a copy to go backward in time or to take on a state that does not appear in the object's timeline. The approach essentially is primary copy replication.1 At any given time, every object has exactly one master copy (in PNUTS, each record in a table is an object in this sense), and updates are applied at this master and then propagated to other copies, thereby ensuring a unique ordering of all updates to a record. Protocols for automatically recognizing master failures and transferring mastership to a surviving copy ensure high availability and support automated load-balancing strategies that transfer mastership to the location where a record is most often updated.
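The following minimal Python sketch (hypothetical; it is not PNUTS code, and propagation is modeled as an in-process loop rather than a reliable asynchronous channel) illustrates the core of per-record timeline consistency: each record has a single master region that assigns sequence numbers to its updates, and every other copy applies updates only in increasing sequence order, so a copy may lag but can never diverge from the master's timeline.

    from collections import defaultdict

    class Region:
        """One georegion's copies of a table: record key -> (sequence number, value)."""
        def __init__(self, name):
            self.name = name
            self.copies = {}

        def apply(self, key, seq, value):
            # Apply updates strictly in timeline order: a lagging copy may skip
            # forward, but it never moves backward or off the master's timeline.
            current_seq, _ = self.copies.get(key, (0, None))
            if seq > current_seq:
                self.copies[key] = (seq, value)

    class TimelineTable:
        """Per-record mastership: writes to a record are ordered by its master region
        and then shipped (asynchronously, in a real system) to the other regions."""
        def __init__(self, region_names):
            self.regions = {name: Region(name) for name in region_names}
            self.master_of = {}              # record key -> name of its master region
            self.last_seq = defaultdict(int)

        def write(self, key, value, origin):
            master = self.master_of.setdefault(key, origin)  # first writer becomes master
            self.last_seq[key] += 1
            seq = self.last_seq[key]
            self.regions[master].apply(key, seq, value)       # applied at the master first
            for name, region in self.regions.items():         # then propagated to the rest
                if name != master:
                    region.apply(key, seq, value)
            return seq

        def read_any(self, key, region_name):
            # May return a stale version, but never one outside the record's timeline.
            return self.regions[region_name].copies.get(key, (0, None))[1]

    table = TimelineTable(["us-west", "eu-central", "apac"])
    table.write("alice:status", "chatting", origin="us-west")
    table.write("alice:status", "offline", origin="us-west")
    assert table.read_any("alice:status", "eu-central") == "offline"

Mastership transfer and failure handling are omitted; the point is only that a single per-record ordering point rules out the conflicting update orders that are possible under pure eventual consistency.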
To understand the motivation behind this design, consider latency. Data must be globally replicated, and synchronous replication increases latency unacceptably. Database systems support auxiliary data structures such as secondary indexes and materialized views, but maintaining such structures synchronously in a massively distributed environment further increases latency. The requirement of low-latency access therefore leads to asynchronous replication, which inherently compromises consistency.10 However, because object timelines provide a foundation that can support many read variants, applications that can tolerate some staleness can trade off consistency for performance, whereas those that require consistent data can rely on object timelines.

Timeline consistency compromises availability, but only in those rare cases where the master copy fails and there is a partition or a failure in the messaging system that causes the automated protocol for transferring mastership to block. Additionally, timeline consistency weakens the notion of consistency because clients can choose to read older versions of objects even during normal operation. Again, this reflects a fundamental concern for minimizing latency, and it is in the spirit of Daniel Abadi's observation that the CAP theorem overlooks an important aspect of large-scale distributed systems, namely latency (L). According to Abadi's proposed reformulation,11 CAP should really be PACELC: if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does it trade off latency (L) and consistency (C)?

Selective record replication

Many Yahoo applications have a truly global user base and replicate to many more regions than needed for fault tolerance. But while an application might be global, its records could actually be local. A PNUTS record that contains a user's profile is likely only ever written and read in one or a few geographic regions, where that user and his or her friends live. Legal issues also arise at Yahoo that limit where records can be replicated; this pattern typically follows user locality as well. To address this concern, we added per-record selective replication to PNUTS.12
Regions that do not have a full copy of a record still have a stub version with enough metadata to know which regions contain full copies for forwarding requests. A stub is only updated at record creation or deletion, or when the record's replica locations change. Normal data updates are sent only to regions containing full copies of the record, saving bandwidth and disk space.

THE CASE FOR A CONSISTENCY SPECTRUM

Cloud data management systems designed for real-time data serving and update workloads amply illustrate the realities of the CAP theorem: such systems cannot support strong consistency with availability in the presence of partitions. Indeed, such massively distributed systems might settle for weaker consistency guarantees to improve latency, especially when data is georeplicated. In practice, a programmer using such a system must be able to explicitly make tradeoffs among consistency, latency, and availability in the face of various failures, including partitions. Fortunately, several consistency models allow for such tradeoffs, suggesting that programmers should be allowed to mix and match them to meet an application's needs. We organize the discussion to highlight two independent dimensions: the unit of data that is considered in defining consistency, and the spectrum of strong to weak consistency guarantees for a given choice of unit.

Unit of consistency

While the database literature commonly defines consistency in terms of changes to the entire database, the distributed systems literature typically considers changes to each object, independent of changes to other objects. These are not the only alternatives; intuitively, any collection of objects to which we can ensure atomic access, and that the system replicates as a unit, can be made the unit of consistency. For example, any collection of objects collocated on a single server can be a reasonable choice as the unit of consistency (from the standpoint of ensuring good performance), even in the presence of failures.

One widely recognized case in which multirecord transactions are useful is an entity group, which comprises an entity and all its associated records. As an example, consider a user (the "entity") together with all user-posted comments and photos, and user counters such as the number of comments. It is frequently useful to update the records in an entity group together, for example, by inserting a comment and updating the number-of-comments counter. Usually, an entity group's size is modest, and a single server can accommodate one copy of the entire set of records. Google's App Engine provides a way to define entity groups and operate on them transactionally; Microsoft's Azure has a similar feature that allows record grouping via a partition key, as well as transactional updates to records in a partition.

The basic approach to implementing transactions over entity groups is straightforward and relies on controlling how records are partitioned across nodes to ensure that all records in an entity group reside on a single node. The system can then invoke a conventional database transaction manager without using cross-server locks or other expensive mechanisms. This model has two basic restrictions: first, the entity group must be small enough to fit on a single node; indeed, for effective load balancing, the size must allow many groups to fit on a single node. Second, the definition of an entity group is static and typically specifies a composite key over the records' attributes. A recent proposal considers how to relax the second restriction and allow defining entity groups more generally and dynamically.13
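A minimal, hypothetical Python sketch of this co-location approach (not how App Engine or Azure implement it): records are partitioned by an entity-group key, so a multi-record update such as "insert a comment and bump the comment counter" runs as an ordinary local transaction on the one node that owns the group, protected only by that node's local lock.

    import threading

    class Node:
        """One storage node: it holds every record of the entity groups assigned to it."""
        def __init__(self):
            self.records = {}              # (group_key, record_key) -> value
            self.lock = threading.Lock()   # purely local; no cross-server locking needed

        def transact(self, updates):
            # Apply a multi-record update atomically. Because all records of the
            # group live here, a conventional single-node transaction suffices.
            with self.lock:
                for key, update in updates:
                    self.records[key] = update(self.records.get(key))

    class EntityGroupStore:
        """Partition records by entity-group key so that each group fits on one node."""
        def __init__(self, num_nodes=4):
            self.nodes = [Node() for _ in range(num_nodes)]

        def node_for(self, group_key):
            return self.nodes[hash(group_key) % len(self.nodes)]

        def add_comment(self, user, comment_id, text):
            # Insert the comment and bump the user's counter in one local transaction
            # on the node that owns the "user" entity group.
            self.node_for(user).transact([
                ((user, "comment:%d" % comment_id), lambda _old: text),
                ((user, "num_comments"), lambda old: (old or 0) + 1),
            ])

    store = EntityGroupStore()
    store.add_comment("alice", 1, "CAP is about partitions")
    store.add_comment("alice", 2, "latency matters too")
    assert store.node_for("alice").records[("alice", "num_comments")] == 2

The two restrictions discussed above show up directly in the sketch: the group must fit on its node, and the grouping key (here, the user) is fixed up front.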
A consistency spectrum

We begin by discussing a spectrum of consistency across copies of a single object and then discuss how to generalize these ideas to handle other units of consistency.

Consistency models for individual objects. Timeline consistency offers a simple programming model: copies of a record might lag the master copy, but the system applies all updates to every copy in the same order as the master. Note that this is a data-centric guarantee. From the client's perspective, monotonic writes are guaranteed, so an object timeline (a timestamp generated at the master copy that identifies each state and its position on the object's timeline) can support several variants of the read operation, each with different guarantees:

- Read-any. Any copy of the object can be returned, so if a client issues this call twice, the second call might actually see an older version of the object, even if the master copy is available and timeline consistency is enforced. Intuitively, the client reads a local copy that later becomes unavailable, and the second read is served from another (nonmaster) copy that is more stale.
- Critical-read. Also known as monotonic read, critical-read ensures that the copy read is no older than any version the client has previously seen. By remembering the last client-issued write, the critical-read operation can be extended to support read-your-writes, although to make this more efficient it might be necessary to additionally cache a client's writes locally.
- Read-up-to-date. To get the current version of the object, read-up-to-date accesses the master copy.
- Test-and-set. Widely used in PNUTS, test-and-set is a conditional write that is applied only if the version at the master copy, at the time the write is applied, is unchanged from the version previously read by the client issuing the write. It is sufficient to implement single-object ACID transactions.
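To make these operations concrete, here is a minimal, hypothetical Python sketch (not PNUTS code) of a client-side view over a timeline-consistent record: the client remembers the highest version it has observed, critical-read refuses to return anything older, read-up-to-date goes to the master, and test-and-set is a conditional write at the master keyed on the version the client last saw.

    class ReplicaSet:
        """Versioned copies of one record; index 0 is the master, the others may lag."""
        def __init__(self, n=3):
            self.copies = [(0, None)] * n       # (version, value) per copy

        def master(self):
            return self.copies[0]

        def write_at_master(self, value):
            version = self.copies[0][0] + 1
            self.copies[0] = (version, value)   # asynchronous propagation omitted
            return version

    class Client:
        """Client-side read variants over a single timeline-consistent record."""
        def __init__(self, replicas):
            self.replicas = replicas
            self.seen = 0                       # highest version this client has observed

        def read_any(self, copy_index):
            # Any copy will do; it may be older than something we read earlier.
            return self.replicas.copies[copy_index][1]

        def critical_read(self):
            # Monotonic read: prefer a non-master copy that is at least as fresh as
            # anything we have already seen; otherwise fall back to the master.
            for version, value in self.replicas.copies[1:]:
                if version >= self.seen:
                    self.seen = version
                    return value
            return self.read_up_to_date()

        def read_up_to_date(self):
            version, value = self.replicas.master()
            self.seen = max(self.seen, version)
            return value

        def test_and_set(self, new_value):
            # Conditional write: succeeds only if no one wrote since our last read.
            if self.replicas.master()[0] != self.seen:
                return False                    # lost the race; caller should re-read
            self.seen = self.replicas.write_at_master(new_value)
            return True

    record = ReplicaSet()
    client = Client(record)
    client.read_up_to_date()
    assert client.test_and_set("online")        # succeeds: version unchanged since read
    assert client.test_and_set("offline")       # our own write advanced the seen version

Read-your-writes follows the same pattern: the client's own writes also advance the seen version (as test_and_set does here), so a subsequent critical read cannot return a copy that predates them.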
Timeline consistency over entity groups. A natural generalization of timeline consistency and entity groups is to consider timelines of entity groups rather than of individual records. There is a master copy of each entity group, rather than of each record, and transactional updates to an entity group (possibly affecting multiple records) are applied at the master copy, just as individual updates to a single record are applied at the record's master under timeline consistency. The transaction sequence is then logged, asynchronously shipped to the sites with copies of the entity group, and reapplied at each such site. Although this generalization should be supportable with performance and availability characteristics comparable to timeline consistency and entity group consistency, I am not aware of any systems that (yet) do so. It seems an attractive option on the consistency spectrum, and it covers many common applications that would otherwise require full ACID transactions.

Offering consistency choices

Geographic replication makes all records always available for read from anywhere. However, anytime a distributed system is partitioned due to failures, it is impossible to preserve both write consistency and availability. One alternative is to support multiple consistency models and let the application programmer decide whether and how to degrade in case of failure.

Eric Brewer suggests14 thinking in terms of a partition mode: how a client enters and exits this mode, and what it does while in partition mode and upon exit. Intuitively, a client enters partition mode (due to a failure of some kind, triggered by a mechanism such as a timeout) when it cannot complete a read or write operation with the desired level of consistency. The client must then operate with the recognition that it is seeing a version of the database that is not strongly consistent, and when emerging from partition mode (when the system resolves the underlying failure and signals this state in some way), the client must reconcile inconsistencies between the objects it has accessed.

In the PNUTS implementation of timeline consistency, a client enters partition mode when it attempts to write an object but the write is blocked because the master copy is unreachable and the mastership transfer protocol also is blocked, typically because of a partition or site failure. At this point, the client should be able to choose to degrade from timeline consistency to eventual consistency and continue by writing another copy. However, the system is now in a mode in which the given object does not have a unique master. At some future point, the client must explicitly reconcile different versions of the object (perhaps using system-provided version vectors for the object) if the weaker guarantees of eventual consistency do not suffice for this object. PNUTS is not this flexible: it requires the programmer to choose between timeline consistency and eventual consistency at the level of a table of records. It treats each record in a table as an object with the associated consistency semantics. Inserts and updates can be made to any region at any time if eventual consistency is selected. For programmers who understand and can accept the eventual consistency model, the performance benefits are great: the system can perform all writes locally at the client, greatly improving write latencies. Given a server node failure, another (remote) node will always be available to accept writes. PNUTS can also enter partition mode if a read request requires access to the master copy; again, the client has the choice of waiting for the system to restore access or proceeding by reading an available copy. Furthermore, a client can choose to read any available copy if a slight lag from the master copy is acceptable, even during normal operation.

Although massively distributed systems provide multiple abstractions to cope with consistency, programmers need to be able to mix and match these abstractions. The discussion of per-object timeline consistency highlights how data- and client-centric approaches to defining consistency are complementary. The discussion of how to build on per-object timeline consistency to support different client-side consistency guarantees carries over to timeline consistency over entity groups. Indeed, this is a useful way to approach consistency in distributed systems in general:

- decide on the units of consistency that the system is to support;
- decide on the consistency guarantees the system supports from a data-centric perspective;
- decide on the consistency guarantees the system supports from a client-centric perspective; and
- expose these available choices to programmers through variations of create-read-write operations, so they can make the tradeoffs among availability, consistency, and latency as appropriate for their application.

For example, we could allow:

- the type of consistency desired (timeline or eventual) to be a property of a collection of objects;
- alternative forms of reading an object that support various client-centric consistency semantics;
- flexible definition of groups of objects to be treated as one object for consistency purposes; and
- specification of how to degrade gracefully to weaker forms of consistency upon failure.

Determining the right abstractions and the right granularities for expressing these choices requires further research and evaluation. It is time to think about programming abstractions that allow specifying massively distributed transactions simply and in a manner that reflects what the system can implement efficiently, given the underlying realities of latency in wide-area networks and the CAP theorem.

The tradeoff between consistency on one hand and availability/performance on the other has become a key factor in the design of large-scale data management systems. Although Web-oriented systems led the break from traditional relational database systems, the ideas have begun to enter the database mainstream, and over the next several years, cloud data management for enterprises will offer database administrators some of the same design choices. The key observation is that the choice is not just between ACID transactions with full RDBMS capabilities on one side and NoSQL systems offering no consistency guarantees and minimal query and update capabilities on the other. We will see systems that are somewhere in the middle of this spectrum, striving to provide as much functionality as possible while satisfying the availability and performance demands of diverse application settings. Designing abstractions that cleanly package these choices, developing architectures that robustly support them, and optimizing and autotuning these systems will in turn provide research challenges for the next decade.

Acknowledgments

In writing this article, I was strongly influenced by the experience gained in designing, implementing, and deploying the PNUTS system (also known as Sherpa within Yahoo), and I thank the many people who contributed to it and to the many papers we wrote jointly. PNUTS is a collaboration between the systems research group and the cloud platform group at Yahoo. I also thank the anonymous referees and Daniel Abadi for useful feedback that improved the article, and Eric Brewer for sharing a preprint of his article that appears elsewhere in this issue.

References

1. B.F. Cooper et al., "PNUTS: Yahoo!'s Hosted Data Serving Platform," Proc. VLDB Endowment (VLDB 08), ACM, 2008.
2. A. Silberstein et al., "PNUTS in Flight: Web-Scale Data Serving at Yahoo," IEEE Internet Computing, vol. 16, no. 1, 2012.
3. E. Brewer, "Towards Robust Distributed Systems," Proc. 19th Ann. ACM Symp. Principles of Distributed Computing (PODC 00), ACM, 2000.
4. S. Gilbert and N. Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services," ACM SIGACT News, June 2002.
5. W. Vogels, "Eventually Consistent," ACM Queue, vol. 6, no. 6, 2008.
6. G. DeCandia et al., "Dynamo: Amazon's Highly Available Key-Value Store," Proc. 21st ACM SIGOPS Symp. Operating Systems Principles (SOSP 07), ACM, 2007.
7. F. Chang et al., "Bigtable: A Distributed Storage System for Structured Data," ACM Trans. Computer Systems, June 2008, article no. 4.
8. J. Baker et al., "Megastore: Providing Scalable, Highly Available Storage for Interactive Services," Proc. Conf. Innovative Database Research (CIDR 11), 2011.
9. P.A. Bernstein et al., "Adapting Microsoft SQL Server for Cloud Computing," Proc. IEEE 27th Int'l Conf. Data Eng. (ICDE 11), IEEE, 2011.
10. P. Agrawal et al., "Asynchronous View Maintenance for VLSD Databases," Proc. 35th SIGMOD Int'l Conf. Management of Data, ACM, 2009.
11. D.J. Abadi, "Consistency Tradeoffs in Modern Distributed Database System Design," Computer, Feb. 2012.
12. S. Kadambi et al., "Where in the World Is My Data?" Proc. VLDB Endowment (VLDB 2011), ACM, 2011.
13. S. Das, D. Agrawal, and A.E. Abbadi, "G-Store: A Scalable Data Store for Transactional Multi Key Access in the Cloud," Proc. ACM Symp. Cloud Computing (SoCC 10), ACM, 2010.
14. E. Brewer, "Pushing the CAP: Strategies for Consistency and Availability," Computer, Feb. 2012.

Raghu Ramakrishnan heads the Web Information Management Research group at Yahoo and also serves as chief scientist for cloud computing and search. Ramakrishnan is an ACM and IEEE Fellow and has received the ACM SIGKDD Innovations Award, the ACM SIGMOD Contributions Award, a Packard Foundation Fellowship, and the Distinguished Alumnus Award from IIT Madras. Contact him at scyllawi@yahoo.com.


DATABASE REPLICATION A TALE OF RESEARCH ACROSS COMMUNITIES

DATABASE REPLICATION A TALE OF RESEARCH ACROSS COMMUNITIES DATABASE REPLICATION A TALE OF RESEARCH ACROSS COMMUNITIES Bettina Kemme Dept. of Computer Science McGill University Montreal, Canada Gustavo Alonso Systems Group Dept. of Computer Science ETH Zurich,

More information

Web Email DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing)

Web Email DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing) 1 1 Distributed Systems What are distributed systems? How would you characterize them? Components of the system are located at networked computers Cooperate to provide some service No shared memory Communication

More information

G-Store: A Scalable Data Store for Transactional Multi key Access in the Cloud

G-Store: A Scalable Data Store for Transactional Multi key Access in the Cloud G-Store: A Scalable Data Store for Transactional Multi key Access in the Cloud Sudipto Das Divyakant Agrawal Amr El Abbadi Department of Computer Science University of California, Santa Barbara Santa Barbara,

More information

Consistency Management in Cloud Storage Systems

Consistency Management in Cloud Storage Systems Consistency Management in Cloud Storage Systems Houssem-Eddine Chihoub, Shadi Ibrahim, Gabriel Antoniu, María S. Pérez INRIA Rennes - Bretagne Atlantique Rennes, 35000, France {houssem-eddine.chihoub,

More information

Database Scalabilty, Elasticity, and Autonomic Control in the Cloud

Database Scalabilty, Elasticity, and Autonomic Control in the Cloud Database Scalabilty, Elasticity, and Autonomic Control in the Cloud Divy Agrawal Department of Computer Science University of California at Santa Barbara Collaborators: Amr El Abbadi, Sudipto Das, Aaron

More information

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

1. Comments on reviews a. Need to avoid just summarizing web page asks you for: 1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING

ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING Enzo Unified Extends SQL Server to Simplify Application Design and Reduce ETL Processing CHALLENGES SQL Server does not scale out

More information

Big Data & Scripting storage networks and distributed file systems

Big Data & Scripting storage networks and distributed file systems Big Data & Scripting storage networks and distributed file systems 1, 2, adaptivity: Cut-and-Paste 1 distribute blocks to [0, 1] using hash function start with n nodes: n equal parts of [0, 1] [0, 1] N

More information

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &

More information

Benchmarking and Analysis of NoSQL Technologies

Benchmarking and Analysis of NoSQL Technologies Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The

More information

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure

More information

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Analysis and Research of Cloud Computing System to Comparison of

More information

Improved Aggressive Update Propagation Technique in Cloud Data Storage

Improved Aggressive Update Propagation Technique in Cloud Data Storage Improved Aggressive Update Propagation Technique in Cloud Data Storage Mohammed Radi Computer science department, Faculty of applied science, Alaqsa University Gaza Abstract: Recently, cloud computing

More information

Design Patterns for Distributed Non-Relational Databases

Design Patterns for Distributed Non-Relational Databases Design Patterns for Distributed Non-Relational Databases aka Just Enough Distributed Systems To Be Dangerous (in 40 minutes) Todd Lipcon (@tlipcon) Cloudera June 11, 2009 Introduction Common Underlying

More information

SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD

SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol-1, Iss.-3, JUNE 2014, 54-58 IIST SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE

More information

Guide to Scaling OpenLDAP

Guide to Scaling OpenLDAP Guide to Scaling OpenLDAP MySQL Cluster as Data Store for OpenLDAP Directories An OpenLDAP Whitepaper by Symas Corporation Copyright 2009, Symas Corporation Table of Contents 1 INTRODUCTION...3 2 TRADITIONAL

More information

TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models

TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models 1 THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY TABLE OF CONTENTS 3 Introduction 14 Examining Third-Party Replication Models 4 Understanding Sharepoint High Availability Challenges With Sharepoint

More information

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems Ismail Hababeh School of Computer Engineering and Information Technology, German-Jordanian University Amman, Jordan Abstract-

More information

Data Management in the Cloud

Data Management in the Cloud Data Management in the Cloud Ryan Stern stern@cs.colostate.edu : Advanced Topics in Distributed Systems Department of Computer Science Colorado State University Outline Today Microsoft Cloud SQL Server

More information

Sherpa: Cloud Computing of the Third Kind

Sherpa: Cloud Computing of the Third Kind Sherpa: Cloud Computing of the Third Kind Raghu Ramakrishnan Yahoo! and Platform Engineering Team What s in a Name? Data Intensive Super Scalable Computing Grid Computing Super Computing Cloud Computing

More information

Cassandra A Decentralized, Structured Storage System

Cassandra A Decentralized, Structured Storage System Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922

More information

Introducing DocumentDB

Introducing DocumentDB David Chappell Introducing DocumentDB A NoSQL Database for Microsoft Azure Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Why DocumentDB?... 3 The DocumentDB Data Model...

More information

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS WHITEPAPER BASHO DATA PLATFORM BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS INTRODUCTION Big Data applications and the Internet of Things (IoT) are changing and often improving our

More information

A survey of big data architectures for handling massive data

A survey of big data architectures for handling massive data CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - jordydomingos@gmail.com Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context

More information