Real-Time Big Data in practice with Cassandra. Michaël Figuière @mfiguiere

Similar documents
Introduction to Cassandra

Practical Cassandra. Vitalii

Apache Cassandra for Big Data Applications

Distributed Systems. Tutorial 12 Cassandra

Case study: CASSANDRA

Evaluation of NoSQL databases for large-scale decentralized microblogging

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

So What s the Big Deal?

No-SQL Databases for High Volume Data

Big Data Development CASSANDRA NoSQL Training - Workshop. March 13 to am to 5 pm HOTEL DUBAI GRAND DUBAI

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

NoSQL Databases. Nikos Parlavantzas

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra

Going Native With Apache Cassandra. QCon London, 2014

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014

Introduction to Apache Cassandra

Apache Cassandra Present and Future. Jonathan Ellis

Simba Apache Cassandra ODBC Driver

Cassandra A Decentralized, Structured Storage System

NoSQL and Hadoop Technologies On Oracle Cloud

Hands-on Cassandra. OSCON July 20, Eric

Cloud Computing at Google. Architecture

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

How To Scale Out Of A Nosql Database

NoSQL Database Options

Big Data: Beyond the Hype

INTRODUCTION TO CASSANDRA

TITAN BIG GRAPH DATA WITH CASSANDRA #TITANDB #CASSANDRA12

Cassandra A Decentralized Structured Storage System

CQL for Cassandra 2.0 & 2.1

How To Use Big Data For Telco (For A Telco)

Going Native With Apache Cassandra. NoSQL Matters, Cologne, 2014

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Apache Cassandra Query Language (CQL)

The Modern Online Application for the Internet Economy: 5 Key Requirements that Ensure Success

Time series IoT data ingestion into Cassandra using Kaa

Enabling SOX Compliance on DataStax Enterprise

A Distributed Network Security Analysis System Based on Apache Hadoop-Related Technologies. Jeff Springer, Mehmet Gunes, George Bebis

Case Study: Real-time Analytics With Druid. Salil Kalia, Tech Lead, TO THE NEW Digital

LARGE-SCALE DATA STORAGE APPLICATIONS

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

StratioDeep. An integration layer between Cassandra and Spark. Álvaro Agea Herradón Antonio Alcocer Falcón

Bringing Big Data into the Enterprise

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS)

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May Santa Clara, CA

Joining Cassandra. Luiz Fernando M. Schlindwein Computer Science Department University of Crete Heraklion, Greece

CQL for Cassandra 2.2 & later

NoSQL: Going Beyond Structured Data and RDBMS

CitusDB Architecture for Real-Time Big Data

Preparing Your Data For Cloud

Move Data from Oracle to Hadoop and Gain New Business Insights

Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER

Cassandra. Jonathan Ellis

Choosing The Right Big Data Tools For The Job A Polyglot Approach

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

Future-Proofing MySQL for the Worldwide Data Revolution

Understanding Neo4j Scalability

Rainbird: Real-time

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER

Big Data: Beyond the Hype. Why Big Data Matters to You. White Paper

Apache HBase. Crazy dances on the elephant back

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

Big Data: Beyond the Hype

Databases 2 (VU) ( )

Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise

HBase Schema Design. NoSQL Ma4ers, Cologne, April Lars George Director EMEA Services

Big Data Technology Core Hadoop: HDFS-YARN Internals

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

May 6, DataStax Cassandra South Bay Meetup. Cassandra Modeling. Best Practices and Examples. Jay Patel Architect, Platform

NOSQL DATABASES AND CASSANDRA

Practical Guidelines for Selecting NoSQL vs. an RDBMS Deployment Considerations Conclusion About DataStax

An Approach to Implement Map Reduce with NoSQL Databases

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

Xiaoming Gao Hui Li Thilina Gunarathne

[Hadoop, Storm and Couchbase: Faster Big Data]

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Apache Cassandra 1.2 Documentation

Why Migrate from MySQL to Cassandra?

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William

Time-Series Databases and Machine Learning

BIG DATA What it is and how to use?

Replicating to everything

CASSANDRA. Arash Akhlaghi, Badrinath Jayakumar, Wa el Belkasim. Instructor: Dr. Rajshekhar Sunderraman. CSC 8711 Project Report

HadoopRDF : A Scalable RDF Data Analysis System

Oracle Database 12c Plug In. Switch On. Get SMART.

Transcription:

Real-Time Big Data in practice with Cassandra Michaël Figuière @mfiguiere

Speaker Michaël Figuière @mfiguiere 2

Ring Architecture Cassandra 3

Ring Architecture Replica Replica Replica 4

Linear Scalability Client Writes/s by Count - Replication Factor = 3 5

Client / Server Communication Client? Replica Client Client Replica Client Replica 6

Client / Server Communication Client Replica Client Client Replica Client Replica Coordinator node: Forwards all R/W requests to corresponding replicas 7

Tunable Consistency Time A A A 3 replicas 8

Tunable Consistency Time A A A B A A Write B Write and wait for acknowledge from one node 9

Tunable Consistency R + W < N Time A A A B A A B A A Read waiting for one node to answer Write and wait for acknowledge from one node 10

Tunable Consistency R + W = N Time A A A B B A B B A Read waiting for one node to answer Write and wait for acknowledges from two nodes 11

Tunable Consistency R + W > N Time A A A B B A B B A Read waiting for two nodes to answer Write and wait for acknowledges from two nodes 12

Tunable Consistency R = W = QUORUM Time A A A B B A B B A QUORUM = (N / 2) + 1 13

Request Path Client 1 2 Replica Client Client Client 4 3 3 2 3 2 Replica Replica Coordinator node 14

Column Family Data Model jbellis name Jonathan email jb@ds.com address 123 main state TX dhutch name Daria email dh@ds.com address 45 2nd st state CA egilmore name Eric email eg@ds.com Row Key Columns 15

Column Family Data Model jbellis dhutch dhutch egilmore datastax mzcassie egilmore egilmore datastax mzcassie Row Key Columns 16

CQL3 Data Model Timeline Table user_id gmason tweet_id 1765 author phenry body Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree Partition Key Remaining Key 17

CQL3 Data Model Timeline Table user_id tweet_id author body gmason 1765 phenry Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree CQL CREATE TABLE timeline ( user_id varchar, tweet_id uuid, author varchar, body varchar, PRIMARY KEY (user_id, tweet_id)); 18

CQL3 Data Model Timeline Table user_id tweet_id author body gmason 1765 phenry Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree Timeline Physical Layout gmason [1742, author] gwashington [1742, body] I chopped down the... [1765, author] phenry [1765, body] Give me liberty or give... ahamilton [1742, author] gwashington [1742, body] I chopped down the... [1797, author] jadams [1797, body] A government of laws... 19

Denormalized Data Model Data duplicated over several tables 20

Real-Time Analytics Google Analytics gives you immediate statistics about your website traffic 21

Web Analytics Data Model Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL CREATE TABLE analytics ( url varchar, time timestamp, views counter, from_search counter, direct counter, from_referrer counter, PRIMARY KEY (url, time)); 22

Web Analytics Data Model Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL UPDATE analytics SET views = views + 1, from_search = from_search + 1 WHERE url = '/index.html' AND time = '2012-10-06 12:00'; 23

Web Analytics Data Model Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL SELECT * FROM analytics WHERE url = '/index.html' 24

Online Business Intelligence Storage for application in production Distributed batch processing Application Cassandra Hadoop Using results in production Storage for results 25

Stay Tuned! blog.datastax.com @mfiguiere