Case study: CASSANDRA

Course Notes in Transparency Format
Cloud Computing MIRI (CLC-MIRI)
UPC Master in Innovation & Research in Informatics, Spring 2013
Jordi Torres, UPC - BSC
www.jorditorres.eu

Cassandra: main features
Cassandra does not support relationships between column families ("tables"): it has no foreign keys and no join operations. Knowing this, the best practice when designing a data model is to keep related data in the same column family. In this section we review only the main features of Cassandra, as an example.

Architecture
The architecture of Cassandra is completely decentralized and peer-to-peer, meaning all nodes in a Cassandra cluster are equivalent and provide the same functionality: receive read and write requests, or forward them to other nodes.
- Peer-to-peer, distributed system
- All nodes the same
- Data partitioned
- Custom data replication

Partitioning
Cassandra implements automatic partitioning and replication mechanisms to decide which nodes are in charge of each replica.
How? With the PARTITIONER:
- Divides the data across the nodes in the cluster
- Each node is responsible for a range of the overall data
Source: Juan Luis Pérez, researcher at BSC (EEDC 2012 master course)

Partitioning
[Figure: cluster of four nodes: Node A, Node B, Node C, Node D]
Source: Juan Luis Pérez, researcher at BSC (EEDC 2012 master course)

Partitioning
Row Key determines node placement
Row key   name    pass   url
raiser    john    ****   icann.org
trucker   james   ****   w3.org
dumper    maria   ****
biker     linda   ****

Partitioning
Range of MD5 hash:
[000..1 400..0]
[400..1 800..0]
[800..1 c00..0]
[c00..1 000..0]

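Partitioning
Row key   MD5 hash
raiser    65236c...
trucker   a113f4...
dumper    d4ab26...
biker     864058...
Node ranges in the figure: [000..1 400..0] [800..1 c00..0] [400..1 800..0] [c00..1 000..0]
Each row is stored on the node whose range contains the MD5 hash of its row key; for example, 65236c... falls in [400..1 800..0].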
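The lookup from row key to node can be sketched in a few lines of Python. This is a minimal sketch, not Cassandra's implementation. It assumes that four nodes named Node A to Node D own the four MD5 ranges in increasing order (the slides do not state which node owns which range), and the function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

# Assumption: the nodes own the four ranges in increasing token order.
NODES = ["Node A", "Node B", "Node C", "Node D"]
# Inclusive upper bounds of the first three ranges: 400..0, 800..0, c00..0.
UPPER_BOUNDS = [0x4 << 124, 0x8 << 124, 0xC << 124]


def md5_token(row_key: str) -> int:
    """Return the 128-bit MD5 hash of the row key as an integer."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


def node_for_token(token: int) -> str:
    """Return the node whose range contains the token."""
    if token == 0:
        # 000..0 closes the last range, [c00..1 000..0], which wraps around.
        return NODES[-1]
    # bisect_left finds the first range whose upper bound is >= token;
    # tokens above c00..0 get index 3, the wrapping last range.
    return NODES[bisect_left(UPPER_BOUNDS, token)]


def node_for_key(row_key: str) -> str:
    """Hash the row key and look up the node responsible for it."""
    return node_for_token(md5_token(row_key))
```

Under the assumed layout, a token that starts with 65236c lies in [400..1 800..0], so `node_for_token(0x65236c << 104)` returns "Node B".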

Replication
Remember: Cassandra implements automatic partitioning and replication mechanisms to decide which nodes are in charge of each replica → the user only needs to configure the number of replicas, and the system assigns each replica to a node in the cluster.

Replication
- Cassandra stores multiple copies of rows on multiple nodes
- Replication factor = number of replicas
- Replica placement strategy:
  - SimpleStrategy (default)
  - NetworkTopologyStrategy
- Both are configurable: replication factor and placement strategy

Replication
SimpleStrategy
- First replica determined by the partitioner
- Additional replicas of the row are placed on the next nodes clockwise in the ring
[Figure: original row "raiser" and copies of row "raiser" on the ring]
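The clockwise placement rule can be illustrated with a short Python sketch. It assumes the ring is given as a list of node names in clockwise order and that the partitioner has already chosen the node for the first replica; the function name is hypothetical.

```python
def simple_strategy_replicas(ring, primary, replication_factor):
    """Return the primary node followed by the next nodes clockwise in the ring."""
    if not 1 <= replication_factor <= len(ring):
        raise ValueError("replication factor must be between 1 and the ring size")
    start = ring.index(primary)
    # Walk clockwise from the primary, wrapping around the end of the list.
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]
```

With the ring `["A", "B", "C", "D"]`, primary `"C"` and a replication factor of 3, the replicas are placed on C, D and A.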

Replication
NetworkTopologyStrategy
- Allows replication between different racks
- Racks in a data center or in multiple data centers
- Reliability & performance
- Others
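The idea of spreading replicas over racks can be sketched as follows. This is a deliberately simplified illustration, not the actual NetworkTopologyStrategy algorithm: it assumes a single data center, a ring given in clockwise order, and a hypothetical `racks` mapping from node name to rack name.

```python
def rack_aware_replicas(ring, racks, primary, replication_factor):
    """Pick replicas clockwise from the primary, preferring racks not used yet."""
    n = len(ring)
    if not 1 <= replication_factor <= n:
        raise ValueError("replication factor must be between 1 and the ring size")
    start = ring.index(primary)
    clockwise = [ring[(start + i) % n] for i in range(n)]

    replicas, used_racks = [], set()
    # First pass: take at most one node per rack, walking clockwise.
    for node in clockwise:
        if len(replicas) == replication_factor:
            break
        if racks[node] not in used_racks:
            replicas.append(node)
            used_racks.add(racks[node])
    # Second pass: if there are fewer racks than replicas, fill up clockwise.
    for node in clockwise:
        if len(replicas) == replication_factor:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas
```

With nodes A and B on one rack and C and D on another, two replicas starting at A land on A and C, so losing a single rack does not lose every copy of the row.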

Consistency
The goal of current distributed key-value stores such as Cassandra is to serve read and write operations on data, exactly as any database system does. However, while traditional databases provide strong consistency guarantees for replicated data by controlling the concurrent execution of transactions, Cassandra provides tunable consistency in order to favour scalability and availability.

Consistency
Data consistency is tunable by the user when queries are performed, so depending on the desired level of consistency, operations can either return as soon as possible or wait until a majority or all nodes respond.
- Tunable data consistency
- Choose between strong and eventual consistency
- Consistency per operation (reads & writes)

Strategy for Read

Strategy for Writes

Strong/Weak consistency?
As can be derived from their description, strong consistency can only be achieved when using the (Quorum and) All consistency levels. Operations that use weaker consistency levels, such as Zero, Any and One, aren't guaranteed to read the most recent data. However, this weaker consistency provides some flexibility for applications that can benefit from better performance and don't have strong consistency needs. Imagine your Facebook wall!
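A common way to reason about this is the overlap rule: a read is guaranteed to see the latest write when the replicas contacted by the read and those acknowledged by the write must overlap, that is, when R + W > N for replication factor N. The Python sketch below applies that rule to the One, Quorum and All levels only; the function names are hypothetical and the mapping of levels to replica counts is stated in the comments.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > N)."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor
```

With three replicas, Quorum reads combined with Quorum writes satisfy the rule (2 + 2 > 3), while One reads combined with One writes do not (1 + 1 ≤ 3).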

Caching
- Data is first written to a commit log for durability
  - Local to the node (for disaster recovery purposes)
- Then written to an in-memory structure (memtable)
  - On the node that stores the row
- And then to disk (SSTable) once the memtable is full
- Data durability is assured
[Figure: memtable, commit log, SSTable]
Source: Juan Luis Pérez, researcher at BSC (EEDC 2012 master course)
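The order of these steps can be illustrated with a toy, in-memory Python model. It is only a sketch: the class name, the `memtable_limit` parameter and the use of plain lists in place of on-disk files are assumptions made for illustration, and discarding the commit log after a flush is a simplification.

```python
class ToyNode:
    """Toy model of the write path: commit log, then memtable, then SSTable."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # stands in for the append-only log on disk
        self.memtable = {}    # in-memory structure holding recent writes
        self.sstables = []    # each flush produces one sorted, immutable table

    def write(self, key, value):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((key, value))
        # 2. Then update the in-memory memtable.
        self.memtable[key] = value
        # 3. Flush to an SSTable once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Store the memtable contents sorted by key, then start afresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Simplification: flushed data no longer needs the log entries.
        self.commit_log.clear()
```

After two writes to a node created with `ToyNode(memtable_limit=2)`, the memtable is empty again and both rows sit in a single SSTable.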