Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn tivv00@gmail.com @tivv00
RDBMS problems
Sometimes you reach the point where a single server can't cope. The relational options:
- Replication: not write-scalable; data is not instantly visible on replicas
- Sharding: no foreign keys or joins across shards; no cross-shard transactions; reduced reliability (multiple servers); schema updates are a pain
Cassandra NoSQL
- Master-master replication + sharding in one bottle
- Peer-to-peer architecture (no SPOF)
- Easy cluster reconfiguration
- Eventual consistency as a standard
- All data in one record, so no need to join
- Flexible schema
Our data
We have an intelligent Internet cache. Intelligent means we don't cache everything, or we would need Google's DC. It's still hundreds of millions of sites and tens of TB of packed data, randomly updated. Analysis must be able to process all of this in terms of hours.
Cassandra ring - server - client
Ring partitioner types
Order Preserving:
- Each server serves a key range
- Range queries possible
- Read/write/disk-space hot spots possible
- Key ranges are complex to fix
Random:
- Data is smoothly distributed across servers
- No range queries
- No hot spots
- Fixed key range
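The random partitioner's behavior can be sketched in a few lines of plain Python (this is a simplification, not Cassandra's actual ring code, although Cassandra's RandomPartitioner really does hash keys with MD5): hashing gives a fixed token range and smooth distribution, but destroys key ordering, which is why range queries become impossible.

```python
import hashlib

def token(key):
    """Map a key into a fixed token range (0 .. 2**128 - 1) via MD5."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def node_for(key, n_nodes):
    """Simplified placement: real rings assign each node a token range."""
    return token(key) % n_nodes

# Keys that are adjacent alphabetically land on unrelated nodes:
placement = {k: node_for(k, 4) for k in ("alpha", "beta", "gamma", "delta")}
```

An order-preserving partitioner would instead use the key itself as the token, keeping range scans possible at the price of hot spots.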
Runtime CAP-solving
The whole thing is about replication. CAP: Consistency, Availability, Partition tolerance: choose two. With Cassandra you can choose at runtime.
Runtime CAP-solving
Per operation you pick a consistency level: quorum read/write for consistency, or lower levels for fast writes and fast reads: fast, but less consistent.
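The runtime trade-off boils down to one inequality. With replication factor N, a read is guaranteed to see the latest acknowledged write whenever the read and write replica counts overlap, i.e. R + W > N. A minimal sketch (plain Python, not a Cassandra client):

```python
def is_strongly_consistent(n_replicas, write_level, read_level):
    """True if every read overlaps at least one replica of every acknowledged write."""
    return read_level + write_level > n_replicas

N = 3
assert is_strongly_consistent(N, write_level=2, read_level=2)      # QUORUM/QUORUM
assert is_strongly_consistent(N, write_level=1, read_level=3)      # fast writes, slow reads
assert not is_strongly_consistent(N, write_level=1, read_level=1)  # fast both ways, eventual only
```

Because the levels are chosen per operation, the same cluster can serve consistent reads for one query and fast, eventually-consistent reads for another.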
Data model
- Keyspaces: much like a database in an RDBMS
- Column Families: the storage element, like tables in an RDBMS
- Columns: you can have millions per row and names are flexible, but they are still much like columns in an RDBMS
- Super Column: a column with structured content, superseded by composite columns
Example
Twitter DB -> Twitter Keyspace:
- Users table (ID, Name, Birthday) -> Users CF: Key = User ID; columns Name(Str), Birthday(Str)
- Tweets table (UserID, TweetID, TweetContent) -> Timeline CF: Key = User ID; columns <TweetID>(TweetContent)
Example (alternative)
Twitter DB -> Twitter Keyspace:
- Users table (ID, Name, Birthday) + Tweets table (UserID, TweetID, TweetContent) -> a single Data CF: Key = User ID; columns Name(Str), Birthday(Str), <TweetID>(TweetContent)
Example (data)
Users table:
  ID | Name
  1  | Tom
  2  | John
Tweets table:
  UserID | TweetID | Text
  1      | 1       | Hello
  1      | 2       | See me?
  2      | 3       | See you!
Data CF:
  Key | Columns
  1   | Name = Tom, T_1 = Hello, T_2 = See me?
  2   | Name = John, T_3 = See you!
Data model
- You can have the same key in multiple column families
- You can have a different set of columns for different keys in the same column family
- You can query a range of columns for a key (columns are sorted), with pagination
- You can have (and it's useful to have) columns without values
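The sorted-columns property is what makes column-range queries and pagination work. A sketch in plain Python (dicts and bisect stand in for a Cassandra row; the column names here are invented for illustration):

```python
import bisect

# A "row" is a set of columns kept sorted by name.
row = {"Name": "Tom", "T_1": "Hello", "T_2": "See me?", "liked_by_John": ""}  # valueless column
names = sorted(row)

def column_slice(start, limit):
    """Return up to `limit` (name, value) pairs with name >= start, in sorted order."""
    i = bisect.bisect_left(names, start)
    return [(n, row[n]) for n in names[i:i + limit]]

page1 = column_slice("T_", 1)                  # first tweet column
page2 = column_slice(page1[-1][0] + "\0", 1)   # next page starts just after the last name seen
```

Paging by "last column name seen" is exactly how a client walks a wide row one slice at a time.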
ACID vs BASE
Super heroes are good, but not scalable. So, what do we lose?
No Atomicity
You've got no transactions and no rollback. The maximum you have is an atomic update to a single row. A failed operation MAY still have been applied (that's why counters are not reliable).
Eventual Consistency
Cassandra has no central governor. This means no bottleneck. It also means no one knows whether the database as a whole is consistent. Regular repair is your friend!
No Isolation
All mutations are timestamped to restore order from their chaotic arrival. You MUST keep your clocks synchronized: the timestamps are how operations are applied on the server :)
Controlled Durability
Cassandra uses a commit log to ensure durability on a single server. Durability of the whole database depends on both the total number of replicas and the write consistency level. Remember: with 99% uptime per server, the chance that all 100 servers of a cluster are up at once is only 36.6% (0.99^100). Most of the time you've got at least one server down!
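The slide's number is just independent-probability arithmetic: with per-server uptime p, the chance that every one of n servers is up simultaneously is p**n.

```python
p, n = 0.99, 100
all_up = p ** n          # probability that all 100 servers are up at the same moment
# 0.99 ** 100 is roughly 0.366, i.e. ~36.6%: usually at least one server is down,
# which is why durability must come from replication, not from any single machine.
```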
Data querying
With SQL you simply ask:
- You can easily scan the whole DB
- Indexes may help
- Any calculation is repeated each time
- This can be slow on read
Data querying
With NoSQL you can't efficiently scan the whole DB:
- No GROUP BY or ORDER BY
- You must prepare your data beforehand
- You keep multiple copies of the data
- You must recalculate when application logic changes
- But the precalculated reads are fast
Think about your queries in advance!
There is no "I'll simply add an index and some hints and my query will become fast". Any index is created and maintained from application code. Cassandra now has secondary indexes, but they are much inferior to custom ones.
What's wrong with secondary indexes
- They work on fixed column names
- They are kept consistent with the data
- This means they live next to the data they index
- This means they are distributed between nodes by row key, not by indexed column value
- This means you need to ask every node to get a single value
What's wrong with secondary indexes
Node 1: A: phone=1, B: phone=3; phone index: 1=A, 3=B
Node 2: C: phone=3, D: phone=5; phone index: 3=C, 5=D
Node 3: E: phone=1, F: phone=5; phone index: 1=E, 5=F
Node 4: G: phone=3, H: phone=7; phone index: 3=G, 7=H
(A query for phone=3 must ask every node: the matches live in the local index of Nodes 1, 2 and 4.)
Index example
Column family people:
  Key: Fred [phone=2223355, phone2=4445566, fax=9998877]
  Key: John [phone=4445566, mobile=099123456]
Column family phone_directory:
  Key: 2223355 [Fred]
  Key: 4445566 [Fred, John]
  Key: 9998877 [Fred]
  Key: 099123456 [John]
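Maintaining such a custom index from application code can be sketched with plain dicts (this models the pattern, not a Cassandra client; in Cassandra the index row's members would be valueless columns): every write to `people` must also update the `phone_directory` row keyed by the number, including cleanup of the old entry.

```python
people = {}
phone_directory = {}  # number -> set of people keys

def set_number(person, column, number):
    """Write a number column for a person and keep the reverse index in sync."""
    old = people.setdefault(person, {}).get(column)
    if old is not None:
        phone_directory[old].discard(person)  # remove the stale index entry
    people[person][column] = number
    phone_directory.setdefault(number, set()).add(person)

set_number("Fred", "phone2", "4445566")
set_number("John", "phone", "4445566")
```

Looking up everyone with number 4445566 is now a single-row read of `phone_directory["4445566"]`, served by one node.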
Join example
Column family customer:
  Key: Boeing [email: boeing@boing.com]
  Key: Oracle [skype: java]
Column family orders:
  Key: 1 [customer: Boeing, total: 200m]
  Key: 2 [customer: Oracle, total: 300m]
  Key: 3 [customer: Boeing, total: 500m]
Column family customer_order_totals:
  Key: Boeing [1: 200m, 3: 500m]
  Key: Oracle [2: 300m]
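The "join" here is precomputed at write time: whenever an order is stored, the application also appends it to the customer's row in `customer_order_totals`, so reading all of a customer's orders is a single-row lookup instead of a scan. A sketch with plain dicts standing in for the column families:

```python
orders = {}
customer_order_totals = {}  # customer -> {order_id: total}

def add_order(order_id, customer, total):
    """Write the order and denormalize it into the per-customer row."""
    orders[order_id] = {"customer": customer, "total": total}
    customer_order_totals.setdefault(customer, {})[order_id] = total

add_order(1, "Boeing", "200m")
add_order(2, "Oracle", "300m")
add_order(3, "Boeing", "500m")
```

The cost is the usual one from the earlier slides: extra copies of the data, and a recalculation job if the application logic ever changes.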
Peer-to-peer replication
- Your operation can return OK even if it was not written to every replica; hinted handoff will try to repair that later
- Even if your operation has failed, it may have been written to some replicas, and this inconsistency won't be repaired automatically
- These are the drawbacks of a no-master architecture
- You need to repair regularly!
Tombstones and Repair
Delete events are recorded as tombstones, so that data written before the delete but arriving after it won't be resurrected. Regular repair not only makes sure your data is replicated, but also that your deletes are replicated. If you don't repair, beware of ghosts!
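A sketch of why deletes must be tombstones rather than plain removals (plain Python, not Cassandra internals): the delete is just another timestamped cell, so a stale replica's write that arrives after the delete still loses the timestamp comparison.

```python
TOMBSTONE = object()  # a delete is a marker cell, not an absence

def resolve(stored, incoming):
    """Each cell is (value, timestamp); the newest timestamp wins, deletes included."""
    return incoming if incoming[1] > stored[1] else stored

cell = ("See me?", 100)
cell = resolve(cell, (TOMBSTONE, 200))   # delete at t=200
cell = resolve(cell, ("See me?", 150))   # late write from a stale replica: stays deleted
assert cell[0] is TOMBSTONE
```

If the tombstone were purged before every replica had seen the delete, that late write would win, and the "ghost" row would come back, which is exactly what regular repair prevents.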
Resources & Environment Disk space requirements Memory requirements Native plugins & configuration
Disk estimations
Say we've got 1 TB of data. Replication factor 3 makes it 3 TB. On-disk data duplication makes it 12 TB. Tombstone/repair space makes it 24 TB. Backups make it 36 TB.
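The slide's arithmetic as code; the multipliers are the talk's rules of thumb for this workload, not universal constants.

```python
raw_tb = 1
after_replication = raw_tb * 3                         # replication factor 3 -> 3 TB
after_duplication = after_replication * 4              # on-disk data duplication -> 12 TB
with_repair_space = after_duplication * 2              # tombstones / repair headroom -> 24 TB
with_backups = with_repair_space + after_duplication   # one backup copy of the data -> 36 TB
```

The point survives even if your factors differ: provision disk per raw TB at a large multiple, not one-to-one.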
Memory estimations
Cassandra has certain in-memory structures that grow linearly with data volume. Key and row caches are configured at the column-family level; change the defaults if you've got a lot of CFs. Bloom filters and the key-sample cache are configured globally in the latest versions. Estimate a minimum of ~0.5% of your data volume in RAM.
Native specifics
Cassandra (like many other large things) likes JNA; please install it. Cassandra maps files to memory, so the Cassandra process's virtual and resident memory size will grow because of mmap. Default heap sizes are large; tame them if Cassandra is not the only task on the host.
Q&A Author: Vitalii Tymchyshyn tivv00@gmail.com @tivv00