Real-Time Big Data in practice with Cassandra Michaël Figuière @mfiguiere
Speaker Michaël Figuière @mfiguiere 2
Ring Architecture Cassandra 3
Ring Architecture Replica Replica Replica 4
Linear Scalability Client Writes/s by Count - Replication Factor = 3 5
Client / Server Communication Client? Replica Client Client Replica Client Replica 6
Client / Server Communication Client Replica Client Client Replica Client Replica Coordinator node: Forwards all R/W requests to corresponding replicas 7
Tunable Consistency Time A A A 3 replicas 8
Tunable Consistency Time A A A B A A Write B Write and wait for acknowledge from one node 9
Tunable Consistency R + W < N Time A A A B A A B A A Read waiting for one node to answer Write and wait for acknowledge from one node 10
Tunable Consistency R + W = N Time A A A B B A B B A Read waiting for one node to answer Write and wait for acknowledges from two nodes 11
Tunable Consistency R + W > N Time A A A B B A B B A Read waiting for two nodes to answer Write and wait for acknowledges from two nodes 12
Tunable Consistency R = W = QUORUM Time A A A B B A B B A QUORUM = (N / 2) + 1 13
Request Path Client 1 2 Replica Client Client Client 4 3 3 2 3 2 Replica Replica Coordinator node 14
Column Family Data Model jbellis name Jonathan email jb@ds.com address 123 main state TX dhutch name Daria email dh@ds.com address 45 2nd st state CA egilmore name Eric email eg@ds.com Row Key Columns 15
Column Family Data Model jbellis dhutch dhutch egilmore datastax mzcassie egilmore egilmore datastax mzcassie Row Key Columns 16
CQL3 Data Model Timeline Table user_id gmason tweet_id 1765 author phenry body Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree Partition Key Remaining Key 17
CQL3 Data Model Timeline Table user_id tweet_id author body gmason 1765 phenry Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree CQL CREATE TABLE timeline ( user_id varchar, tweet_id uuid, author varchar, body varchar, PRIMARY KEY (user_id, tweet_id)); 18
CQL3 Data Model Timeline Table user_id tweet_id author body gmason 1765 phenry Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree Timeline Physical Layout gmason [1742, author] gwashington [1742, body] I chopped down the... [1765, author] phenry [1765, body] Give me liberty or give... ahamilton [1742, author] gwashington [1742, body] I chopped down the... [1797, author] jadams [1797, body] A government of laws... 19
Denormalized Data Model Data duplicated over several tables 20
Real-Time Analytics Google Analytics gives you immediate statistics about your website traffic 21
Web Analytics Data Model Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL CREATE TABLE analytics ( url varchar, time timestamp, views counter, from_search counter, direct counter, from_referrer counter, PRIMARY KEY (url, time)); 22
Web Analytics Data Model Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL UPDATE analytics SET views = views + 1, from_search = from_search + 1 WHERE url = '/index.html' AND time = '2012-10-06 12:00'; 23
Web Analytics Data Model Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL SELECT * FROM analytics WHERE url = '/index.html' 24
Online Business Intelligence Storage for application in production Distributed batch processing Application Cassandra Hadoop Using results in production Storage for results 25
Stay Tuned! blog.datastax.com @mfiguiere