Report for the seminar Algorithms for Database Systems F1: A Distributed SQL Database That Scales

Bogdan Aurel Vancea
May

1 Introduction

F1 [1] is a distributed relational database developed by Google and used mainly for the Google AdWords business. F1 combines the scalability of NoSQL systems with the consistency offered by SQL databases. The name F1 is an abbreviation of Filial 1 Hybrid, which in biology stands for the first generation of the offspring of two very different parent species. The name is meant to symbolize the fact that F1 is the result of the combination of NoSQL and SQL databases.

One of the most important design aspects of F1 is that it is built on top of a distributed key-value store. This key-value store is called Spanner [2]. It is a NoSQL database created by Google that provides synchronous cross-datacenter replication and strong consistency. This design choice results in a relatively high commit latency for transactions, which F1 mitigates through various design heuristics. As a result, the latency of applications using F1 is similar to the latency of the previous database solution used by the Google AdWords product. However, F1 also provides better scalability, reliability and availability.

2 Goals

The goal of this report is to analyze the approach taken by F1 for designing scalable SQL databases. For the purpose of this report, a scalable SQL database will be defined as a database that:

- provides strong consistency semantics. This means that the system should always present a consistent state. A strongly consistent state is the basis for ACID transactions; however, this report will not go into details about this type of transaction.
- is scalable both geographically and in terms of data storage and request load. The system should be able to scale transparently, just by adding nodes, in order to handle more data or an increasing number of requests per second.

The architecture proposed by F1 builds such a scalable SQL database by adding a layer of SQL processing over a distributed key-value store. The distributed key-value store should fulfill the scalability and consistency requirements for simple operations on key-value pairs, while relational abstractions like tables, SQL processing and ACID transactions are implemented by an additional database middleware.

The main issue that appears in such a system is the additional network latency added by the middleware server layer. Considering that distributed databases can be deployed over multiple data centers, this additional network latency could add a significant penalty to the read and write latencies. In an age of Big Data, databases with high latencies and potentially low throughput are not acceptable.

In this context, this report will present the architecture of the multilayer system proposed earlier and analyze the impact of the additional layer on the read and write latencies. Additionally, the report will present some solutions to mitigate the extra latency added by the additional layer.

3 System Architecture

Figure 1: The architecture of F1

This section will detail the distributed database architecture proposed by F1. The architecture of the system is presented in Figure 1. In this model, the data is stored on the distributed key-value store servers, while the query processing is performed by the middleware servers. In the case of the F1 database, the key-value store servers are Spanner servers, while the middleware servers are F1 servers.

The Spanner key-value store servers use a distributed file system implemented by Google, called Colossus [3], the second generation of the Google File System [4].

Conceptually, the relational data is stored as rows in tables; however, at the level of the key-value store servers, each table row is stored as multiple key-value pairs. This implementation detail is abstracted by the middleware servers, which convert SQL queries operating on table rows into low-level operations that operate on key-value pairs.

The database is accessed using SQL queries sent through a client library to one of the middleware servers. The middleware servers process the SQL queries and produce a list of low-level operations on key-value pairs. These low-level operations are then forwarded by the middleware server to one or more of the key-value store nodes that hold the data affected by them. Because of the strong consistency requirement, a consistent view of the data must be kept at all times, and the middleware server can consider a write operation completed only after it has received an acknowledgement from the key-value store servers signifying that the write finished successfully.

A common way to increase the availability and fault tolerance of a distributed database system is to replicate the stored data across multiple nodes. In such a case, to maintain the strong consistency requirement, the middleware node would also have to obtain an acknowledgement from all of the replicas of the node that holds the data to be written before considering the write completed.

The main advantage of this multilayer architecture is that the data processing components are physically separated from the data storage components. Because the data is only stored on the key-value store nodes, the data storage capacity of the system can be scaled independently of the query processing capacity. Therefore, to increase the query processing power of the system, one only needs to add more middleware nodes. Because these middleware nodes do not store any relational data, this operation does not incur a data redistribution cost. To increase the amount of data that can be stored by the system, more key-value store nodes need to be added. It is important to note that adding new storage nodes brings a data redistribution cost. If the new node is a replica of an existing node, the new node needs to load the state of the node that it replicates. If the new node is a non-replica node, the existing data stored by the system needs to be redistributed across the nodes, including the new one.

The disadvantage of this architecture is that all data access operations need at least two network round-trips in addition to the disk operation. In the case of read operations, a network request is made between the client and the middleware server, followed by an additional network request made by the middleware to the key-value store server holding the requested data. In this case, having multiple replicas of key-value store nodes allows requests from different clients for the same data to be sent to different nodes, thus mitigating some of the extra latency.

In the case of write operations, in addition to the network request between the middleware and the key-value store server, network requests need to be made to all of the replicas of the node storing the data to be updated. Unlike for read operations, the presence of replica nodes influences the write latency negatively. The next section will analyze the impact of replication on write requests in a strongly consistent system.
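The conversion of a row-level update into key-value operations, as performed by the middleware, can be illustrated with a minimal sketch. The key layout (`table/primary_key/column`), the function names and the hash-based node selection are all assumptions made for illustration; the report does not specify how F1 or Spanner actually encode rows or place data.

```python
from zlib import crc32


def row_to_kv(table, primary_key, row):
    """Split one table row into one key-value pair per column (hypothetical layout)."""
    return {f"{table}/{primary_key}/{column}": value for column, value in row.items()}


def update_to_kv_ops(table, primary_key, changes):
    """Translate a single-row UPDATE into a list of low-level write operations."""
    return [("put", key, value) for key, value in row_to_kv(table, primary_key, changes).items()]


def node_for(key, nodes):
    """Pick a storage node by hashing the row prefix of the key (an assumption)."""
    row_prefix = key.rsplit("/", 1)[0]
    return nodes[crc32(row_prefix.encode()) % len(nodes)]


# Roughly: UPDATE Phone SET name = 'S4', price = 499 WHERE id = 42
ops = update_to_kv_ops("Phone", 42, {"name": "S4", "price": 499})
storage_nodes = ["kv-node-0", "kv-node-1", "kv-node-2"]
targets = {node_for(key, storage_nodes) for _, key, _ in ops}
```

Because the node is chosen from the row prefix, all key-value pairs of one row land on the same storage node in this sketch, so `targets` contains a single node and the middleware needs to contact only one server for this update.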

4 Synchronous Replication

Figure 2: Replication of a write operation using the Paxos algorithm

There are multiple models for the replication of writes to replicas, each ensuring a certain consistency model. The synchronous replication model ensures that all write requests are atomically performed on all replicas. In this replication model, the node that contains the main copy of the data is called the leader node, while the nodes containing copies of the data are simply called replicas. The replication process is initiated by the leader and is finished once all of the replicas have performed the write on the data.

If multiple writes need to be replicated by a middleware server, it is possible to initiate the replication of each write on a different node to increase parallelism. In such a case, a consensus algorithm needs to be used for replication. F1 uses the Paxos consensus algorithm for the replication process. This algorithm ensures that the replication will finish successfully even in the presence of multiple leaders.

Figure 2 shows the requests made during a replication round in the Paxos algorithm. First, an SQL query is sent from the client to one of the middleware servers. This update query is then converted to a single key-value operation that is sent to the key-value store servers. The best-case scenario for the algorithm is the following:

- The key-value store server receiving the update initiates the replication process and sends propose messages with the new value to the replicas.
- The replicas can accept the newly proposed value and send an acknowledge message to the leader.
- The leader counts the acknowledge messages from the replicas as votes. If a majority of the replicas have accepted the proposed update, the leader can send a commit message to the replicas.
- It is only after the commit has been performed by all of the replicas that the replication process is finished.
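The best-case round listed above can be sketched as follows. This is a deliberately simplified illustration, not a full Paxos implementation: there are no proposal numbers, no prepare phase and no failure handling, and plain method calls stand in for network messages. The class and function names are hypothetical, and each phase is counted as one round-trip between the leader and the replicas, assuming the messages of a phase are sent in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    accepted: dict = field(default_factory=dict)   # proposed values, not yet committed
    committed: dict = field(default_factory=dict)  # durable state of the replica

    def on_propose(self, key, value):
        """Accept the proposed value and acknowledge it (best case: always accepts)."""
        self.accepted[key] = value
        return True

    def on_commit(self, key):
        """Make the previously accepted value part of the committed state."""
        self.committed[key] = self.accepted.pop(key)
        return True


def replicate_write(replicas, key, value):
    """Run one best-case replication round; return (success, round_trips)."""
    round_trips = 0

    # Phase 1: propose the new value and count acknowledgements as votes.
    acks = sum(replica.on_propose(key, value) for replica in replicas)
    round_trips += 1
    if acks <= len(replicas) // 2:
        return False, round_trips  # no majority, the write is not committed

    # Phase 2: commit; the round ends once every replica has applied the write.
    done = all([replica.on_commit(key) for replica in replicas])
    round_trips += 1
    return done, round_trips


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
success, round_trips = replicate_write(replicas, "Phone/42/price", 499)
```

In this sketch a successful write always costs two leader-to-replica round-trips, one for the propose phase and one for the commit phase, on top of the client-to-middleware and middleware-to-leader requests.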

If multiple updates are initiated at the same time, a single leader is chosen to perform all the updates in a valid order. This case will not be covered in the report. However, one can see that even in the simplest case, when a single write is replicated successfully, the algorithm needs 2 network round-trips between the leader and each replica: one for the propose message and another for the commit message. These additional network round-trips increase the latency of write operations.

There are several ways in which a high write latency can be mitigated in a database system. The following sections will analyze the optimizations proposed by F1 to deal with the high write latency.

5 Data Model

F1 proposes using a hierarchical data model to reduce the number of writes required for update operations. The data model used by F1 is very similar to the data model used by modern relational databases. F1 stores data as rows in tables; however, the internal storage is slightly different from that of traditional databases. F1 provides some extensions to the traditional data model: explicit table hierarchies and column support for Protocol Buffers.

From the logical point of view, in the clustered hierarchical model the tables are organized in a hierarchy. In this hierarchy, each table can be the parent table of one or more child tables. A table that has no parent is called a root table. From the physical point of view, all of the child tables are stored clustered with their parent tables. This means that the rows of the parent and child tables are interleaved.

The remainder of this section will present the differences between the traditional, normalized relational model and the hierarchical clustered schema model proposed by F1, using as an example a database that holds data about mobile manufacturers.

Figure 3: A possible normalized relational schema for mobile manufacturers

An example relational schema for a mobile manufacturer database is illustrated in Figure 3. There are 4 tables: Manufacturer, Phone, Tablet, and an additional SIM Support table that tracks the types of SIM cards that can be associated with each mobile phone. In this traditional, normalized relational model, all rows that belong to the same table are usually stored in the same file on disk.

Figure 4: A possible hierarchy for a mobile manufacturer database

Figure 4 shows an example table hierarchy for the mobile database. In this hierarchy, Manufacturer is the root table, while the Tablet and Phone tables are its child tables. The SIM Support table is a child table of the Phone table, as the SIM information is only related to phones.

The storage of rows on disk for a clustered hierarchical schema is different from the storage layout of the relational schema. While in the relational schema the rows of each table are stored one after the other on disk, in the clustered hierarchical schema the child rows are stored interleaved with the parent rows. The storage layout for the mobile manufacturer hierarchy is shown in Figure 5. In the example, the rows of the Phone and Tablet tables are stored right after the rows of the corresponding Manufacturer entries, and the rows of the SIM Support table are stored right after the corresponding rows of the Phone table.

Figure 5: The storage layout for the hierarchical schema for mobile manufacturers

An additional storage constraint set by the hierarchical clustered schema is that all the rows associated with a root row must be stored on the same node. This includes not only the direct children, but also the children's children and so on.

The main advantage of such a hierarchical schema is that all the rows belonging to a single root row are accessible using a range scan starting from that root row. For example, updating attributes belonging to all tablets or phones manufactured by Samsung can be done in a single scan starting from the manufacturer row of Samsung. Because of the constraint stating that all child rows of a root row are stored on the same node, if a transaction needs to apply multiple updates to a root row hierarchy, all the writes are directed to a single node. This is important because multiple updates belonging to the same transaction can be batched in a single network message. This request batching will be described in the following section.

The disadvantage of this hierarchical model is that the domain data needs to exhibit a certain hierarchy. If the tables cannot be grouped into such hierarchies, the schema can degenerate into a traditional normalized schema, where all the tables are root tables and no table has any child tables. Such a schema cannot benefit from the advantages of a hierarchical schema. Moreover, the fact that all of the child rows of a root row need to be stored on the same node limits the maximum number of rows in a root row hierarchy to what fits in the storage space available on that node. This could pose a problem for hierarchies in which root rows have very many child rows.

6 Request Batching

This section will detail the request batching proposed by F1 to mitigate the high write latency. In traditional SQL databases, where data storage and processing are done on the same node, the write latency is mainly caused by disk latency. The disk latency, in turn, is determined by the write capacity of the device and the contention of database processes for the IO device. In the case of the multilayer architecture, the duration of network messages makes up an important component of the write latency. This means that the write latency can be mitigated by batching multiple write commands in a single network message.

For example, if a transaction contains multiple write operations that are applied to data belonging to a single root row, all of these write commands can be batched in a single network message from the query processing node to the key-value store node. Moreover, this batch of updates can be replicated at the same time.

Another example is updates that are applied to different root rows residing on the same node. This case is illustrated in Figure 6, where 2 separate updates need to be applied to root rows stored on the same node. The SQL updates are translated into 2 write commands operating on key-value pairs, and these 2 commands can be batched in a single network message sent from the F1 middleware server to the key-value store server.

Figure 6: Illustration of the request batching process

7 Drawbacks and Alternatives

The system proposed in the F1 paper manages to provide both scalability and strong consistency. However, this comes at a certain cost:

- Higher single read and write latencies. In this system, read and write commands operating on single rows have a high latency. The authors have reported that the latency of the system for these operations was larger than the latency of the previous database system used for AdWords. However, in the proposed system, reads or writes to the full row hierarchy of a single root row can be done with a single network request as well.
- Higher resource cost. This architecture requires more physical nodes because SQL query processing and data storage are done on different machines. This means that at least one middleware node is needed for the query processing, in addition to the key-value store nodes.

- Need for hierarchical structure in data. The clustered hierarchical data model is a key concept used to reduce the number of write requests associated with each transaction. If the data stored by the database cannot be grouped into an appropriate hierarchy, the reduced latency offered by this storage optimization will not be achieved.

In the architecture proposed by F1, the data is remote from the nodes that perform the query processing. An alternative architecture is to keep the data on the same nodes that perform the query processing. The authors of [5] have identified the main bottlenecks of traditional relational databases to be write-ahead logging, two-phase locking, data structure latching and buffer management. The database VoltDB [6] was implemented in the spirit of these ideas and proposed a distributed in-memory architecture. In this system, nodes are single-threaded, eliminating the need for locking and latching, while the fully in-memory architecture simplifies the buffer management process.

8 Conclusions

This report has analyzed F1, a distributed SQL database. The authors successfully combine the advantages of SQL and NoSQL systems in a system that provides transparent scalability, strong consistency and very high availability. This is done using a multilayer architecture, in which the query processing components are physically separated from the data storage components. Such an architecture provides good scalability and availability, but the additional physical layer impacts the write latency negatively. This additional network latency is mitigated using a clustered hierarchical schema instead of a traditional relational schema. Request batching is also used to group multiple commands into a single network request in order to reduce the impact of network latency. The authors report that the system has been successfully used in production and that the user-facing latency of their application is on par with the latency when using the previous database system.

References

[1] Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, and Himani Apte. F1: A distributed SQL database that scales. In VLDB.

[2] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson C. Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google's globally distributed database. ACM Trans. Comput. Syst., 31(3):8.

[3] Andrew Fikes. Storage architecture and challenges. googleusercontent.com/media/research.google.com/en//university/relations/facultysummit2010/storage_architecture_and_challenges.pdf, July.

[4] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. SIGOPS Oper. Syst. Rev., 37(5):29-43, October.

[5] Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker. OLTP Through the Looking Glass, and What We Found There. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD '08, New York, NY, USA. ACM.

[6] Michael Stonebraker and Ariel Weisberg. The VoltDB main memory DBMS.

City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015

City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Data-Intensive Computing Course Code: CS4480

More information

A Taxonomy of Partitioned Replicated Cloud-based Database Systems

A Taxonomy of Partitioned Replicated Cloud-based Database Systems A Taxonomy of Partitioned Replicated Cloud-based Database Divy Agrawal University of California Santa Barbara Kenneth Salem University of Waterloo Amr El Abbadi University of California Santa Barbara Abstract

More information

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords

More information

F1: A Distributed SQL Database That Scales

F1: A Distributed SQL Database That Scales F1: A Distributed SQL Database That Scales Jeff Shute Radek Vingralek Bart Samwel Ben Handy Chad Whipkey Eric Rollins Mircea Oancea Kyle Littlefield David Menestrina Stephan Ellner John Cieslewicz Ian

More information

How To Build Cloud Storage On Google.Com

How To Build Cloud Storage On Google.Com Building Scalable Cloud Storage Alex Kesselman alx@google.com Agenda Desired System Characteristics Scalability Challenges Google Cloud Storage What does a customer want from a cloud service? Reliability

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.

More information

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,

More information

TECHNICAL OVERVIEW HIGH PERFORMANCE, SCALE-OUT RDBMS FOR FAST DATA APPS RE- QUIRING REAL-TIME ANALYTICS WITH TRANSACTIONS.

TECHNICAL OVERVIEW HIGH PERFORMANCE, SCALE-OUT RDBMS FOR FAST DATA APPS RE- QUIRING REAL-TIME ANALYTICS WITH TRANSACTIONS. HIGH PERFORMANCE, SCALE-OUT RDBMS FOR FAST DATA APPS RE- QUIRING REAL-TIME ANALYTICS WITH TRANSACTIONS Overview VoltDB is a fast in-memory relational database system (RDBMS) for high-throughput, operational

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

Hosting Transaction Based Applications on Cloud

Hosting Transaction Based Applications on Cloud Proc. of Int. Conf. on Multimedia Processing, Communication& Info. Tech., MPCIT Hosting Transaction Based Applications on Cloud A.N.Diggikar 1, Dr. D.H.Rao 2 1 Jain College of Engineering, Belgaum, India

More information

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.

More information

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

Megastore: Providing Scalable, Highly Available Storage for Interactive Services Megastore: Providing Scalable, Highly Available Storage for Interactive Services J. Baker, C. Bond, J.C. Corbett, JJ Furman, A. Khorlin, J. Larson, J-M Léon, Y. Li, A. Lloyd, V. Yushprakh Google Inc. Originally

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin N. Silva ysilva@asu.edu Suzanne W. Dietrich dietrich@asu.edu Lisa M. Tsosie lmtsosi1@asu.edu Jason M. Reed jmreed3@asu.edu ABSTRACT An important

More information

Data Management Course Syllabus

Data Management Course Syllabus Data Management Course Syllabus Data Management: This course is designed to give students a broad understanding of modern storage systems, data management techniques, and how these systems are used to

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

The Google File System

The Google File System The Google File System By Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Presented at SOSP 2003) Introduction Google search engine. Applications process lots of data. Need good file system. Solution:

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

Performance of Scalable Data Stores in Cloud

Performance of Scalable Data Stores in Cloud Performance of Scalable Data Stores in Cloud Pankaj Deep Kaur, Gitanjali Sharma Abstract Cloud computing has pervasively transformed the way applications utilized underlying infrastructure like systems

More information

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems Ismail Hababeh School of Computer Engineering and Information Technology, German-Jordanian University Amman, Jordan Abstract-

More information

Low-Latency Multi-Datacenter Databases using Replicated Commit

Low-Latency Multi-Datacenter Databases using Replicated Commit Low-Latency Multi-Datacenter Databases using Replicated Commit Hatem Mahmoud, Faisal Nawab, Alexander Pucher, Divyakant Agrawal, Amr El Abbadi University of California Santa Barbara, CA, USA {hatem,nawab,pucher,agrawal,amr}@cs.ucsb.edu

More information

Modularity and Scalability in Calvin

Modularity and Scalability in Calvin Modularity and Scalability in Calvin Alexander Thomson Google agt@google.com Daniel J. Abadi Yale University dna@cs.yale.edu Abstract Calvin is a transaction scheduling and replication management layer

More information

Online, Asynchronous Schema Change in F1

Online, Asynchronous Schema Change in F1 Online, Asynchronous Schema Change in F1 Ian Rae University of Wisconsin Madison ian@cs.wisc.edu Eric Rollins Google, Inc. erollins@google.com Jeff Shute Google, Inc. jshute@google.com ABSTRACT Sukhdeep

More information

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

More information

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able

More information

White Paper. Optimizing the Performance Of MySQL Cluster

White Paper. Optimizing the Performance Of MySQL Cluster White Paper Optimizing the Performance Of MySQL Cluster Table of Contents Introduction and Background Information... 2 Optimal Applications for MySQL Cluster... 3 Identifying the Performance Issues.....

More information

Data Distribution with SQL Server Replication

Data Distribution with SQL Server Replication Data Distribution with SQL Server Replication Introduction Ensuring that data is in the right place at the right time is increasingly critical as the database has become the linchpin in corporate technology

More information
