NoSQL Databases. Database and Database Management System. Relational Databases Management Systems (RDMBSs) Notes. Notes. Notes

Similar documents
Big Table A Distributed Storage System For Data

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Distributed storage for structured data

Lecture Data Warehouse Systems

Cloud Computing at Google. Architecture

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Distributed Data Stores

A programming model in Cloud: MapReduce

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Cloud Scale Distributed Data Storage. Jürmo Mehine

Xiaowe Xiaow i e Wan Wa g Jingxin Fen Fe g n Mar 7th, 2011

NoSQL Databases. Nikos Parlavantzas

Design Patterns for Distributed Non-Relational Databases

Hypertable Architecture Overview

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

Distributed Systems. Tutorial 12 Cassandra

Cassandra A Decentralized, Structured Storage System

Can the Elephants Handle the NoSQL Onslaught?

Practical Cassandra. Vitalii

Bigtable: A Distributed Storage System for Structured Data

Cassandra. Jonathan Ellis

Challenges for Data Driven Systems

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

Advanced Data Management Technologies

Dynamo: Amazon s Highly Available Key-value Store

Do Relational Databases Belong in the Cloud? Michael Stiefel

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011

Big Data Analytics. Lucas Rego Drumond

Transactions and ACID in MongoDB

NoSQL Data Base Basics

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Introduction to NOSQL

Preparing Your Data For Cloud

NoSQL. Thomas Neumann 1 / 22

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

these three NoSQL databases because I wanted to see a the two different sides of the CAP

Open source large scale distributed data management with Google s MapReduce and Bigtable

Big Data With Hadoop

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

NoSQL Database Options

HBase Schema Design. NoSQL Ma4ers, Cologne, April Lars George Director EMEA Services

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

Data Management in the Cloud

NoSQL Systems for Big Data Management

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

09 Cloud Storage. NoSQL Databases. Dynamo: Amazon s Highly Available Key-value Store. Christof Strauch

Integrating Big Data into the Computing Curricula

Applications for Big Data Analytics

A Survey of Distributed Database Management Systems

Structured Data Storage

nosql and Non Relational Databases

NoSQL, Big Data, and all that

Big Data Analytics. Rasoul Karimi

A survey of big data architectures for handling massive data

NoSQL Database Systems and their Security Challenges

In Memory Accelerator for MongoDB

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Cassandra vs MySQL. SQL vs NoSQL database comparison

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

Introduction to Apache Cassandra

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

Big Systems, Big Data

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce

How To Scale Out Of A Nosql Database

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Big Data Management and NoSQL Databases

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Databases : Lecture 11 : Beyond ACID/Relational databases Timothy G. Griffin Lent Term Apologies to Martin Fowler ( NoSQL Distilled )

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Comparing SQL and NOSQL databases

Scalability. We can measure growth in almost any terms. But there are three particularly interesting things to look at:

How To Improve Performance In A Database

NOSQL DATABASES AND CASSANDRA

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

NoSQL in der Cloud Why? Andreas Hartmann

Dynamo: Amazon s Highly Available Key-value Store

The CAP theorem and the design of large scale distributed systems: Part I

Parallel & Distributed Data Management

bigdata Managing Scale in Ontological Systems

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Introduction to NoSQL

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

MongoDB Developer and Administrator Certification Course Agenda

Domain driven design, NoSQL and multi-model databases

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013


Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Cloud Computing with Microsoft Azure

High Throughput Computing on P2P Networks. Carlos Pérez Miguel

Database Scalability and Oracle 12c

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Distributed Storage Systems part 2. Marko Vukolić Distributed Systems and Cloud Computing

NoSQL: Going Beyond Structured Data and RDBMS

Transcription:

NoSQL Databases Amir H. Payberah amir@sics.se KTH Royal Institute of Technology Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 1 / 96 Database and Database Management System Database: an organized collection of data. Database Management System (DBMS): a software that interacts with users, other applications, and the database itself to capture and analyze data. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 2 / 96 Relational Databases Management Systems (RDMBSs) RDMBSs: the dominant technology for storing structured data in web and business applications. SQL is good Rich language and toolset Easy to use and integrate Many vendors They promise: ACID Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 3 / 96

ACID Properties Atomicity All included statements in a transaction are either executed or the whole transaction is aborted without affecting the database. Consistency A database is in a consistent state before and after a transaction. Isolation Transactions can not see uncommitted changes in the database. Durability Changes are written to a disk before a database commits a transaction so that committed data cannot be lost through a power failure. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 4 / 96 RDBMS Challenges Web-based applications caused spikes. Internet-scale data size High read-write rates Frequent schema changes Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 5 / 96 Let s Scale RDBMSs RDBMS were not designed to be distributed. Possible solutions: Replication Sharding Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 6 / 96

Let s Scale RDBMSs - Replication Master/Slave architecture Scales read operations Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 7 / 96 Let s Scale RDBMSs - Sharding Dividing the database across many machines. It scales read and write operations. Cannot execute transactions across shards (partitions). Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 8 / 96 Scaling RDBMSs is Expensive and Inefficient [http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/nosqlwhitepaper.pdf] Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 9 / 96

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 10 / 96 NoSQL Avoidance of unneeded complexity High throughput Horizontal scalability and running on commodity hardware Compromising reliability for better performance Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 11 / 96 NoSQL Cost and Performance [http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/nosqlwhitepaper.pdf] Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 12 / 96

RDBMS vs. NoSQL [http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/nosqlwhitepaper.pdf] Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 13 / 96 NoSQL Data Models Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 14 / 96 NoSQL Data Models [http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques] Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 15 / 96

Key-Value Data Model Collection of key/value pairs. Ordered Key-Value: processing over key ranges. Dynamo, Scalaris, Voldemort, Riak,... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 16 / 96 Column-Oriented Data Model Similar to a key/value store, but the value can have multiple attributes (Columns). Column: a set of data values of a particular type. Store and process data by column instead of row. BigTable, Hbase, Cassandra,... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96 Document Data Model { Similar to a column-oriented store, but values can have complex documents, instead of fixed format. Flexible schema. XML, YAML, JSON, and BSON. CouchDB, MongoDB,... } FirstName: "Bob", Address: "5 Oak St.", Hobby: "sailing" { FirstName: "Jonathan", Address: "15 Wanamassa Point Road", Children: [ {Name: "Michael", Age: 10}, {Name: "Jennifer", Age: 8}, ] } Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 18 / 96

Graph Data Model Uses graph structures with nodes, edges, and properties to represent and store data. Neo4J, InfoGrid,... [http://en.wikipedia.org/wiki/graph database] Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 19 / 96 Consistency Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 20 / 96 Consistency Strong consistency After an update completes, any subsequent access will return the updated value. Eventual consistency Does not guarantee that subsequent accesses will return the updated value. Inconsistency window. If no new updates are made to the object, eventually all accesses will return the last updated value. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 21 / 96

Quorum Model N: the number of nodes to which a data item is replicated. R: the number of nodes a value has to be read from to be accepted. W: the number of nodes a new value has to be written to before the write operation is finished. To enforce strong consistency: R + W > N Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 22 / 96 Consistency vs. Availability The large-scale applications have to be reliable: availability + redundancy These properties are difficult to achieve with ACID properties. The BASE approach forfeits the ACID properties of consistency and isolation in favor of availability, graceful degradation, and performance. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 23 / 96 BASE Properties Basic Availability Possibilities of faults but not a fault of the whole system. Soft-state Copies of a data item may be inconsistent Eventually consistent Copies becomes consistent at some later time if there are no more updates to that data item Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 24 / 96

CAP Theorem Consistency Consistent state of data after the execution of an operation. Availability Clients can always read and write data. Partition Tolerance Continue the operation in the presence of network partitions. You can choose only two! Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 25 / 96 Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 26 / 96 Dyanmo Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 27 / 96

Dynamo Distributed key/value storage system Scalable and Highly available CAP: it sacrifices strong consistency for availability: always writable Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 28 / 96 Data Model Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 29 / 96 Data Model [http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques] Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 30 / 96

Partitioning Key/value, where values are stored as objects. If size of data exceeds the capacity of a single machine: partitioning Consistent hashing is one form of sharding (partitioning). Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 31 / 96 Consistent Hashing Hash both data and nodes using the same hash function in a same id space. partition = hash(d) mod n, d: data, n: number of nodes hash("fatemeh") = 12 hash("ahmad") = 2 hash("seif") = 9 hash("jim") = 14 hash("sverker") = 4 Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 32 / 96 Load Imbalance (1/4) Consistent hashing may lead to imbalance. Node identifiers may not be balanced. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 33 / 96

Load Imbalance (2/4) Consistent hashing may lead to imbalance. Data identifiers may not be balanced. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 34 / 96 Load Imbalance (3/4) Consistent hashing may lead to imbalance. Hot spots. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 35 / 96 Load Imbalance (4/4) Consistent hashing may lead to imbalance. Heterogeneous nodes. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 36 / 96

Load Balancing via Virtual Nodes Each physical node picks multiple random identifiers. Each identifier represents a virtual node. Each node runs multiple virtual nodes. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 37 / 96 Replication To achieve high availability and durability, data should be replicates on multiple nodes. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 38 / 96 Data Consistency Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 39 / 96

Data Consistency Eventual consistency: updates are propagated asynchronously. Each update/modification of an item results in a new and immutable version of the data. Multiple versions of an object may exist. Replicas eventually become consistent. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 40 / 96 Data Versioning (1/2) Use vector clocks for capturing causality, in the form of (node, counter) If causal: older version can be forgotten If concurrent: conflict exists, requiring reconciliation Version branching can happen due to node/network failures. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 41 / 96 Data Versioning (2/2) Client C1 writes new object via Sx. C1 updates the object via Sx. C1 updates the object via Sy. C2 reads D2 and updates the object via Sz. C3 reads D3 and D4 via Sx. The read context is a summary of the clocks of D3 and D4: [(Sx, 2), (Sy, 1), (Sz, 1)]. Reconciliation Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 42 / 96

Dynamo API Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 43 / 96 Dynamo API get(key) Return single object or list of objects with conflicting version and context. put(key, context, object) Store object and context under key. Context encodes system metadata, e.g., version number. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 44 / 96 put Operation Coordinator generates new vector clock and writes the new version locally. Send to N nodes. Wait for response from W nodes. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 45 / 96

get Operation Coordinator requests existing versions from N. Wait for response from R nodes. If multiple versions, return all versions that are causally unrelated. Divergent versions are then reconciled. Reconciled version written back. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 46 / 96 Sloppy Quorum Due to partitions, quorums might not exist. Sloppy quorum. Create transient replicas: N healthy nodes from the preference list. Reconcile after partition heals. Say A is unreachable. put will use D. Later, D detects A is alive. Sends the replica to A Removes the replica. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 47 / 96 Membership Management Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 48 / 96

Membership Management Administrator explicitly adds and removes nodes. Gossiping to propagate membership changes. Eventually consistent view. O(1) hop overlay. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 49 / 96 Adding and Removing Nodes A new node X added to system. X is assigned key ranges w.r.t. its virtual servers. For each key range, it transfers the data items. Removing a node: reallocation of keys is a reverse process of adding nodes. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 50 / 96 Failure Detection (1/2) Passive failure detection. Use pings only for detection from failed to alive. In the absence of client requests, node A doesn t need to know if node B is alive. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 51 / 96

Failure Detection (2/2) Anti-entropy for replica synchronization. Use Merkle trees for fast inconsistency detection and minimum transfer of data. Nodes maintain Merkle tree of each key range. Exchange root of Merkle tree to check if the key ranges are updated. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 52 / 96 BigTable Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 53 / 96 Motivation Lots of (semi-)structured data at Google. URLs, TextGreenper-user data, geographical locations,... Big data Billions of URLs, hundreds of millions of users, 100+TB of satellite image data,... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 54 / 96

BigTable Distributed multi-level map Fault-tolerant Scalable and self-managing CAP: strong consistency and partition tolerance Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 55 / 96 Data Model Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 56 / 96 Data Model [http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques] Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 57 / 96

Column-Oriented Data Model (1/2) Similar to a key/value store, but the value can have multiple attributes (Columns). Column: a set of data values of a particular type. Store and process data by column instead of row. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 58 / 96 Columns-Oriented Data Model (2/2) In many analytical databases queries, few attributes are needed. Column values are stored contiguously on disk: reduces I/O. [Lars George, Hbase: The Definitive Guide, O Reilly, 2011] Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 59 / 96 BigTable Data Model (1/5) Table Distributed multi-dimensional sparse map Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 60 / 96

BigTable Data Model (2/5) Rows Every read or write in a row is atomic. Rows sorted in lexicographical order. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 61 / 96 BigTable Data Model (3/5) Column The basic unit of data access. Column families: group of (the same type) column keys. Column key naming: family:qualifier Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 62 / 96 BigTable Data Model (4/5) Timestamp Each column value may contain multiple versions. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 63 / 96

BigTable Data Model (5/5) Tablet: contiguous ranges of rows stored together. Tables are split by the system when they become too large. Auto-Sharding Each tablet is served by exactly one tablet server. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 64 / 96 Bigtable API Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 65 / 96 The Bigtable API Metadata operations Create/delete tables, column families, change metadata Writes: single-row, atomic write/delete cells in a row, delete all cells in a row Reads: read arbitrary cells in a Bigtable table Each row read is atomic. One row, all or specific columns, certain timestamps, and... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 66 / 96

Writing Example // Open the table Table *T = OpenOrDie("/bigtable/web/webtable"); // Write a new anchor and delete an old anchor RowMutation r1(t, "com.cnn.www"); r1.set("anchor:www.c-span.org", "CNN"); r1.delete("anchor:www.abc.com"); Operation op; Apply(&op, &r1); Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 67 / 96 Reading Example Scanner scanner(t); scanner.lookup("com.cnn.www"); ScanStream *stream; stream = scanner.fetchcolumnfamily("anchor"); stream->setreturnallversions(); for (;!stream->done(); stream->next()) { printf("%s %s %lld %s\n", scanner.rowname(), stream->columnname(), stream->microtimestamp(), stream->value()); } Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 68 / 96 BigTable Architecture Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 69 / 96

BigTable Cell Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 70 / 96 Main Components Master server Tablet server Client library Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 71 / 96 Master Server One master server. Assigns tablets to tablet server. Balances tablet server load. Garbage collection of unneeded files in GFS. Handles schema changes, e.g., table and column family creations Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 72 / 96

Tablet Server Many tablet servers. Can be added or removed dynamically. Each manages a set of tablets (typically 10-1000 tablets/server). Handles read/write requests to tablets. Splits tablets when too large. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 73 / 96 Client Library Library that is linked into every client. Client data does not move though the master. Clients communicate directly with tablet servers for reads/writes. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 74 / 96 Building Blocks The building blocks for the BigTable are: Google File System (GFS): raw storage Chubby: distributed lock manager Scheduler: schedules jobs onto machines Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 75 / 96

Google File System (GFS) Large-scale distributed file system. Store log and data files. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 76 / 96 Chubby Lock Service Ensure there is only one active master. Store bootstrap location of BigTable data. Discover tablet servers. Store BigTable schema information. Store access control lists. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 77 / 96 Master Startup The master executes the following steps at startup: Grabs a unique master lock in Chubby, which prevents concurrent master instantiations. Scans the servers directory in Chubby to find the live servers. Communicates with every live tablet server to discover what tablets are already assigned to each server. Scans the METADATA table to learn the set of tablets. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 78 / 96

Tablet Assignment 1 tablet 1 tablet server. Master uses Chubby to keep tracks of set of live tablet serves and unassigned tablets. When a tablet server starts, it creates and acquires an exclusive lock in Chubby. Master detects the status of the lock of each tablet server by checking periodically. Master is responsible for finding when tablet server is no longer serving its tablets and reassigning those tablets as soon as possible. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 79 / 96 Table Serving Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 80 / 96 Finding a Tablet Three-level hierarchy. Root tablet contains location of all tablets in a special METADATA table. METADATA table contains location of each tablet under a row. The client library caches tablet locations. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 81 / 96

SSTable (1/2) SSTable file format used internally to store Bigtable data. Immutable, sorted file of key-value pairs. Each SSTable is stored in a GFS file. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 82 / 96 SSTable (2/2) Chunks of data plus a block index. A block index is used to locate blocks. The index is loaded into memory when the SSTable is opened. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 83 / 96 Tablet Serving (1/2) Updates committed to a commit log. Recently committed updates are stored in memory - memtable Older updates are stored in a sequence of SSTables. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 84 / 96

Tablet Serving (2/2) Strong consistency Only one tablet server is responsible for a given piece of data. Replication is handled on the GFS layer. Tradeoff with availability If a tablet server fails, its portion of data is temporarily unavailable until a new server is assigned. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 85 / 96 Loading Tablets To load a tablet, a tablet server does the following: Finds locaton of tablet through its METADATA. Metadata for a tablet includes list of SSTables and set of redo points. Read SSTables index blocks into memory. Read the commit log since the redo point and reconstructs the memtable. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 86 / 96 Compaction Minor compaction Convert the memtable into an SSTable. Merging compaction Reads the contents of a few SSTables and the memtable, and writes out a new SSTable. Major compaction Merging compaction that results in only one SSTable. No deleted records, only sensitive live data. Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 87 / 96

Cassandra Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 88 / 96 Cassandra Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 89 / 96 From Dynamo I Symmetric P2P architecture I Gossip based discovery and error detection I Distributed key-value store: partitioning and topology discovery I Eventual consistency Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 90 / 96

From BigTable Sparse Column oriented sparse array SSTable disk storage Append-only commit log Memtable (buffering and sorting) Immutable sstable files Compaction Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 91 / 96 Summary Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 92 / 96 Summary NoSQL data models: key-value, column-oriented, documentoriented, graph-based Sharding and consistent hashing ACID vs. BASE CAP (Consistency vs. Availability) Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 93 / 96

Summary Dynamo: key/value storage: put and get Data partitioning: consistent hashing Load balancing: virtual server Replication: several nodes, preference list Data versioning: vector clock, resolve conflict at read time by the application Membership management: join/leave by admin, gossip-based to update the nodes views, ping to detect failure Handling transient failure: sloppy quorum Handling permanent failure: Merkle tree Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 94 / 96 Summary BigTable Column-oriented Main components: master, tablet server, client library Basic components: GFS, chubby, SSTable Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 95 / 96 Questions? Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 96 / 96