Introduction to Cassandra
|
|
|
- Willa Dawson
- 10 years ago
- Views:
Transcription
1 Introduction to Cassandra DuyHai DOAN, Technical Advocate
2 Agenda! Architecture cluster replication Data model last write win (LWW), CQL basics (CRUD, DDL, collections, clustering column) lightweight transactions 2
3 Who Am I?! Duy Hai DOAN Cassandra technical advocate talks, meetups, confs open-source devs (Achilles, ) OSS Cassandra point of contact [email protected] 3
4 Datastax! Founded in April 2010 We contribute a lot to Apache Cassandra 400+ customers (25 of the Fortune 100), 200+ employees Headquarter in San Francisco Bay area EU headquarter in London, offices in France and Germany Datastax Enterprise = OSS Cassandra + extra features 4
5 Cassandra 5 key facts! Key fact 1: linear scalability C* YOU C* C* NetcoSports 3 nodes, 3GB 1k+ nodes, PB+ 5
6 Cassandra 5 key facts! Key fact 2: continuous availability ( 100% up-time) resilient architecture (Dynamo) 6
7 Cassandra 5 key facts! Key fact 3: multi-data centers out-of-the-box (config only) AWS conf for multi-region DCs GCE/CloudStack support Microsoft Azure 7
8 Cassandra 5 key facts! Key fact 4: operational simplicity 1 node = 1 process + 2 config file (main + IP) deployment automation OpsCenter for monitoring 8
9 Cassandra 5 key facts! 9
10 Cassandra 5 key facts! Key fact 5: analytics combo Cassandra + Spark = awesome! realtime streaming/
11 Cassandra architecture! Cluster Replication
12 Cassandra architecture! Cluster layer Amazon DynamoDB paper masterless architecture Data-store layer Google Big Table paper Columns/columns family 12
13 Data distribution! Random: hash of #partition token = hash(#p) Hash: ]-X, X] X = huge number (2 64 /2) n 1 n 2 n 3 n 4 n 5 n 8 n 6 n 7 13
14 Token Ranges! A: ]0, X/8] B: ] X/8, 2X/8] C: ] 2X/8, 3X/8] D: ] 3X/8, 4X/8] B n 2 nc 3 D n 4 E: ] 4X/8, 5X/8] F: ] 5X/8, 6X/8] G: ] 6X/8, 7X/8] A n 1 ne 5 H: ] 7X/8, X] H n 8 nf 6 G n 7 14
15 Distributed Table! nc 3 B n 2 D n 4 user_id 1 user_id 2 user_id 3 A n 1 ne 5 user_id 4 user_id 5 H n 8 nf 6 G n 7 15
16 Distributed Table! nc 3 user_id 1 B n 2 D n 4 user_id 3 A n 1 ne 5 user_id 5 user_id 4 H n 8 nf 6 user_id 2 G n 7 16
17 Linear scalability! 8 nodes n 3 n 2 n 4 n 1 n 5 n 8 n 6 n 7 17
18 Linear scalability! 10 nodes n 3 n 4 n 2 n 5 n 1 n 6 n 10 n 7 n 9 n 8 18
19 Failure tolerance! Replication Factor (RF) = 3 n 3 n 2 n 4 n 1 n 5 n 8 n 6 n 7 19
20 Coordinator node! Incoming requests (read/write) Coordinator node handles the request n 2 n 3 n 4 n 1 n 5 Every node can be coordinator à masterless n 8 n 7 n 6
21 Consistency! Tunable at runtime ONE QUORUM (strict majority w.r.t. RF) ALL Apply both to read & write 21
22 Consistency in action! RF = 3, Write ONE, Read ONE Write ONE: B B A A data replication in progress B A A Read ONE: A 22
23 Consistency in action! RF = 3, Write ONE, Read QUORUM Write ONE: B B A A data replication in progress B A A Read QUORUM: A 23
24 Consistency in action! RF = 3, Write ONE, Read ALL Write ONE: B B A A data replication in progress B A A Read ALL: B 24
25 Consistency in action! RF = 3, Write QUORUM, Read ONE Write QUORUM: B B B A data replication in progress B B A Read ONE: A 25
26 Consistency in action! RF = 3, Write QUORUM, Read QUORUM Write QUORUM: B B B A data replication in progress B B A Read QUORUM: B 26
27 Consistency trade-off! 27
28 Consistency level! ONE Fast, may not read latest written value 28
29 Consistency level! QUORUM Strict majority w.r.t. Replication Factor Good balance 29
30 Consistency level! ALL Paranoid Slow, no high availability 30
31 Consistency summary! ONE Read + ONE Write available for read/write even (N-1) replicas down QUORUM Read + QUORUM Write available for read/write even 1+ replica down 31
32 ! "! Q & R
33 Data model! Last Write Win! CQL basics! Clustered tables! Lightweight transactions!
34 Last Write Win (LWW)! INSERT INTO users(login, name, age) VALUES( jdoe, John DOE, 33); #partition jdoe age name 33 John DOE 34
35 Last Write Win (LWW)! INSERT INTO users(login, name, age) VALUES( jdoe, jdoe age (t 1 ) name (t 1 ) 33 John DOE 35
36 Last Write Win (LWW)! UPDATE users SET age = 34 WHERE login = jdoe; SSTable 1 SSTable 2 jdoe age (t 1 ) name (t 1 ) 33 John DOE jdoe age (t 2 ) 34 36
37 Last Write Win (LWW)! DELETE age FROM users WHERE login = jdoe; tombstone SSTable 1 SSTable 2 SSTable 3 jdoe age (t 1 ) name (t 1 ) 33 John DOE jdoe age (t 2 ) 34 jdoe age (t 3 ) ý 37
38 Last Write Win (LWW)! SELECT age FROM users WHERE login = jdoe;??? SSTable 1 SSTable 2 SSTable 3 jdoe age (t 1 ) name (t 1 ) 33 John DOE jdoe age (t 2 ) 34 jdoe age (t 3 ) ý 38
39 Last Write Win (LWW)! SELECT age FROM users WHERE login = jdoe; SSTable 1 SSTable 2 SSTable 3 jdoe age (t 1 ) name (t 1 ) 33 John DOE jdoe age (t 2 ) 34 jdoe age (t 3 ) ý 39
40 Compaction! SSTable 1 SSTable 2 SSTable 3 jdoe age (t 1 ) name (t 1 ) 33 John DOE jdoe age (t 2 ) 34 jdoe age (t 3 ) ý New SSTable jdoe age (t 3 ) name (t 1 ) ý John DOE 40
41 Historical data! You want to keep data history? do not use internal generated timestamp!!! time-series data modeling history SSTable 1 SSTable 2 id date 1(t 1 ) date 2 (t 2 ) date 9 (t 9 ) id date 10(t 10 ) date 11 (t 11 ) 41
42 CRUD operations! INSERT INTO users(login, name, age) VALUES( jdoe, John DOE, 33); UPDATE users SET age = 34 WHERE login = jdoe; DELETE age FROM users WHERE login = jdoe; SELECT age FROM users WHERE login = jdoe; 42
43 Simple Table! CREATE TABLE users ( login text, name text, age int, PRIMARY KEY(login)); partition key (#partition) 43
44 Clustered table (1 N)! CREATE TABLE mailbox ( login text, message_id timeuuid, interlocutor text, message text, PRIMARY KEY((login), message_id)); partition key clustering column (sorted) unicity 44
45 On disk layout SSTable 1 jdoe hsue message_id 1 message_id 2 message_id 104 message_id 1 message_id 2 message_id 78 SSTable 2 jdoe message_id 105 message_id 106 message_id
46 Queries! Get message by user and message_id (date) 46
47 Queries! Get message by message_id only (#partition not provided) SELECT * FROM mailbox WHERE message_id = :00:00 ; Get message by date interval (#partition not provided) SELECT * FROM mailbox WHERE and message_id <= :00:00 and message_id >= :00:00 ; 47
48 Without #partition No #partition no token where are my data????????? 48
49 The importance of #partition In RDBMS, no primary key full table scan 49
50 The importance of #partition With Cassandra, no partition key full CLUSTER scan 50
51 Queries! Get message by user range (range query on #partition) SELECT * FROM mailbox WHERE login >= hsue and login <= jdoe; Get message by user pattern (non exact match on #partition) SELECT * FROM mailbox WHERE login like %doe% ; 51
52 Clustering order! CREATE TABLE mailbox ( login text, message_id timeuuid, interlocutor text, message text, PRIMARY KEY((login), message_id)) WITH CLUSTERING ORDER BY (message_id DESC) ; 52
53 Reverse on disk layout! SSTable 1 jdoe message_id 169 message_id 168 message_id 105 SSTable 2 jdoe message_id 104 message_id 103 message_id 1 53
54 WHERE clause restrictions! All queries (INSERT/UPDATE/DELETE/SELECT) must provide #partition Only exact match (=) on #partition, range queries (<,, >, ) not allowed full cluster scan On clustering columns, only range queries (<,, >, ) and exact match WHERE clause only possible on columns defined
55 WHERE clause restrictions! What if I want to perform «arbitrary» WHERE clause? search form scenario, dynamic search fields 55
56 WHERE clause restrictions! What if I want to perform «arbitrary» WHERE clause? search form scenario, dynamic search fields DO NOT RE-INVENT THE WHEEL! Apache Solr (Lucene) integration (Datastax Enterprise) Same JVM, 1-cluster-2-products (Solr & Cassandra) 56
57 WHERE clause restrictions! What if I want to perform «arbitrary» WHERE clause? search form scenario, dynamic search fields DO NOT RE-INVENT THE WHEEL! Apache Solr (Lucene) integration (Datastax Enterprise) Same JVM, 1-cluster-2-products (Solr & Cassandra) SELECT * FROM users WHERE solr_query = age:[33 TO *] AND gender:male ; SELECT * FROM users WHERE solr_query = lastname:*schwei?er ; 57
58 Collections & maps! CREATE TABLE users ( login text, name text, age int, friends set<text>, hobbies list<text>, languages map<int, text>, PRIMARY KEY(login)); Keep cardinality low! ( 1000) 58
59 From SQL to CQL! Normalized User CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author_id text, // typical join id content text, PRIMARY KEY((article_id), comment_id)); 1 n Comment 59
60 From SQL to CQL! De-normalized User CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author_json text, // de-normalize content text, PRIMARY KEY((article_id), comment_id)); 1 n Comment 60
61 Data modeling best practices! Start by queries identify core functional read paths 1 read scenario 1 SELECT 61
62 Data modeling best practices! Start by queries identify core functional read paths 1 read scenario 1 SELECT Denormalize wisely, only duplicate necessary & immutable data functional/technical trade-off 62
63 Data modeling best practices! Person JSON - firstname/lastname - date of birth - gender - mood - location 63
64 Data modeling best practices! John DOE, male birthdate: 21/02/1981 subscribed since 03/06/2011 San Mateo, CA Impossible is not John DOE Full detail read from User table on click 64
65 Lightweight Transaction (LWT)! What? make operations linearizable Why? solve a class of race conditions in Cassandra that would require installing an external lock manager 65
66 Lightweight Transaction (LWT)! SELECT * FROM account WHERE id= jdoe ; (0 rows) INSERT INTO account (id, ) VALUES ( jdoe, [email protected] ); winner SELECT * FROM account WHERE id= jdoe ; (0 rows) INSERT INTO account (id, ) VALUES ( jdoe, [email protected] ); 66
67 Lightweight Transaction (LWT)! How? implementing Paxos protocol on Cassandra Syntax? INSERT INTO account (id, ) VALUES ( jdoe, [email protected] ) IF NOT EXISTS; UPDATE account SET = [email protected] IF = [email protected] WHERE id= jdoe ; 67
68 Lightweight Transaction (LWT)! Recommendations insert with LWT delete must use LWT INSERT INTO my_table IF NOT EXISTS DELETE FROM my_table IF EXISTS 68
69 Lightweight Transaction (LWT)! Recommendations LWT expensive (4 round-trips), do not abuse only for 1% 5% use cases 69
70 Lightweight Transaction (LWT)! 1 Queue-in 3 Consensus 2 Compare 4 Swap / Learn 70
71 ! "! Q & R
72 Thank You
HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg
HDB++: HIGH AVAILABILITY WITH Page 1 OVERVIEW What is Cassandra (C*)? Who is using C*? CQL C* architecture Request Coordination Consistency Monitoring tool HDB++ Page 2 OVERVIEW What is Cassandra (C*)?
Apache Cassandra for Big Data Applications
Apache Cassandra for Big Data Applications Christof Roduner COO and co-founder [email protected] Java User Group Switzerland January 7, 2014 2 AGENDA Cassandra origins and use How we use Cassandra Data
Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014
Highly available, scalable and secure data with Cassandra and DataStax Enterprise GOTO Berlin 27 th February 2014 About Us Steve van den Berg Johnny Miller Solutions Architect Regional Director Western
Apache Zeppelin, the missing component for your BigData ecosystem
Apache Zeppelin, the missing component for your BigData ecosystem DuyHai DOAN, Cassandra Technical Advocate Who Am I?! Duy Hai DOAN Cassandra technical advocate talks, meetups, confs open-source devs (Achilles,
Real-Time Big Data in practice with Cassandra. Michaël Figuière @mfiguiere
Real-Time Big Data in practice with Cassandra Michaël Figuière @mfiguiere Speaker Michaël Figuière @mfiguiere 2 Ring Architecture Cassandra 3 Ring Architecture Replica Replica Replica 4 Linear Scalability
Distributed Systems. Tutorial 12 Cassandra
Distributed Systems Tutorial 12 Cassandra written by Alex Libov Based on FOSDEM 2010 presentation winter semester, 2013-2014 Cassandra In Greek mythology, Cassandra had the power of prophecy and the curse
Introduction to Apache Cassandra
Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating
Practical Cassandra. Vitalii Tymchyshyn [email protected] @tivv00
Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn
Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise
Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise White Paper BY DATASTAX CORPORATION October 2013 1 Table of Contents Abstract 3 Introduction 3 The Growth in Multiple
Cassandra vs MySQL. SQL vs NoSQL database comparison
Cassandra vs MySQL SQL vs NoSQL database comparison 19 th of November, 2015 Maxim Zakharenkov Maxim Zakharenkov Riga, Latvia Java Developer/Architect Company Goals Explore some differences of SQL and NoSQL
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
No-SQL Databases for High Volume Data
Target Conference 2014 No-SQL Databases for High Volume Data Edward Wijnen 3 November 2014 The New Connected World Needs a Revolutionary New DBMS Today The Internet of Things 1990 s Mobile 1970 s Mainfram
Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER
Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER By DataStax Corporation August 2012 Contents Introduction...3 The Growth in Multiple Data Centers...3 Why
NetflixOSS A Cloud Native Architecture
NetflixOSS A Cloud Native Architecture LASER Session 5 Availability September 2013 Adrian Cockcroft @adrianco @NetflixOSS http://www.linkedin.com/in/adriancockcroft Failure Modes and Effects Failure Mode
CQL for Cassandra 2.2 & later
CQL for Cassandra 2.2 & later Documentation January 21, 2016 Apache, Apache Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation 2016 DataStax, Inc. All rights
Evaluation of NoSQL databases for large-scale decentralized microblogging
Evaluation of NoSQL databases for large-scale decentralized microblogging Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Decentralized Systems - 2nd semester 2012/2013 Universitat Politècnica
Apache Cassandra Present and Future. Jonathan Ellis
Apache Cassandra Present and Future Jonathan Ellis History Bigtable, 2006 Dynamo, 2007 OSS, 2008 Incubator, 2009 TLP, 2010 1.0, October 2011 Why people choose Cassandra Multi-master, multi-dc Linearly
Apache HBase. Crazy dances on the elephant back
Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage
these three NoSQL databases because I wanted to see a the two different sides of the CAP
Michael Sharp Big Data CS401r Lab 3 For this paper I decided to do research on MongoDB, Cassandra, and Dynamo. I chose these three NoSQL databases because I wanted to see a the two different sides of the
Going Native With Apache Cassandra. QCon London, 2014 www.datastax.com @DataStaxEMEA
Going Native With Apache Cassandra QCon London, 2014 www.datastax.com @DataStaxEMEA About Me Johnny Miller Solutions Architect www.datastax.com @DataStaxEU [email protected] @CyanMiller https://www.linkedin.com/in/johnnymiller
Apache Cassandra and DataStax. DataStax EMEA
Apache Cassandra and DataStax DataStax EMEA Agenda 1. Apache Cassandra 2. Cassandra Query Language 3. Sensor/Time Data-Modeling 4. DataStax Enterprise 5. Realtime Analytics 6. What s New 2 About me Christian
Preparing Your Data For Cloud
Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability
NoSQL: Going Beyond Structured Data and RDBMS
NoSQL: Going Beyond Structured Data and RDBMS Scenario Size of data >> disk or memory space on a single machine Store data across many machines Retrieve data from many machines Machine = Commodity machine
Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS)
Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) White Paper BY DATASTAX CORPORATION August 2013 1 Table of Contents Abstract 3 Introduction 3 Overview of HDFS 4
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra
Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra A Quick Reference Configuration Guide Kris Applegate [email protected] Solution Architect Dell Solution Centers Dave
Enabling SOX Compliance on DataStax Enterprise
Enabling SOX Compliance on DataStax Enterprise Table of Contents Table of Contents... 2 Introduction... 3 SOX Compliance and Requirements... 3 Who Must Comply with SOX?... 3 SOX Goals and Objectives...
NoSQL Databases. Nikos Parlavantzas
!!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!
LARGE-SCALE DATA STORAGE APPLICATIONS
BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS Wei Sun and Alexander Pokluda December 2, 2013 Outline Goal and Motivation Overview of Cassandra and Voldemort
Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election
Apache Cassandra Query Language (CQL)
REFERENCE GUIDE - P.1 ALTER KEYSPACE ALTER TABLE ALTER TYPE ALTER USER ALTER ( KEYSPACE SCHEMA ) keyspace_name WITH REPLICATION = map ( WITH DURABLE_WRITES = ( true false )) AND ( DURABLE_WRITES = ( true
Big Data Development CASSANDRA NoSQL Training - Workshop. March 13 to 17-2016 9 am to 5 pm HOTEL DUBAI GRAND DUBAI
Big Data Development CASSANDRA NoSQL Training - Workshop March 13 to 17-2016 9 am to 5 pm HOTEL DUBAI GRAND DUBAI ISIDUS TECH TEAM FZE PO Box 121109 Dubai UAE, email training-coordinator@isidusnet M: +97150
Cassandra A Decentralized, Structured Storage System
Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922
Simba Apache Cassandra ODBC Driver
Simba Apache Cassandra ODBC Driver with SQL Connector 2.2.0 Released 2015-11-13 These release notes provide details of enhancements, features, and known issues in Simba Apache Cassandra ODBC Driver with
Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER
Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER By DataStax Corporation September 2012 Contents Introduction... 3 Overview of HDFS... 4 The Benefits
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases aka Just Enough Distributed Systems To Be Dangerous (in 40 minutes) Todd Lipcon (@tlipcon) Cloudera June 11, 2009 Introduction Common Underlying
The Modern Online Application for the Internet Economy: 5 Key Requirements that Ensure Success
The Modern Online Application for the Internet Economy: 5 Key Requirements that Ensure Success 1 Table of Contents Abstract... 3 Introduction... 3 Requirement #1 Smarter Customer Interactions... 4 Requirement
Comparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF
Non-Stop for Apache HBase: -active region server clusters TECHNICAL BRIEF Technical Brief: -active region server clusters -active region server clusters HBase is a non-relational database that provides
Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015
NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,
Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014
Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability
DBA'S GUIDE TO NOSQL APACHE CASSANDRA
DBA'S GUIDE TO NOSQL APACHE CASSANDRA THE ENLIGHTENED DBA Smashwords Edition Copyright 2014 The Enlightened DBA This ebook is licensed for your personal enjoyment only. This ebook may not be re-sold or
Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
MakeMyTrip CUSTOMER SUCCESS STORY
MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip is the leading travel site in India that is running two ClustrixDB clusters as multi-master in two regions. It removed single point of failure. MakeMyTrip frequently
Big Data Analytics with Cassandra, Spark & MLLib
Big Data Analytics with Cassandra, Spark & MLLib Matthias Niehoff AGENDA Spark Basics In A Cluster Cassandra Spark Connector Use Cases Spark Streaming Spark SQL Spark MLLib Live Demo CQL QUERYING LANGUAGE
Epimorphics Linked Data Publishing Platform
Epimorphics Linked Data Publishing Platform Epimorphics Services for G-Cloud Version 1.2 15 th December 2014 Authors: Contributors: Review: Andy Seaborne, Martin Merry Dave Reynolds Epimorphics Ltd, 2013
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Cassandra. Jonathan Ellis
Cassandra Jonathan Ellis Motivation Scaling reads to a relational database is hard Scaling writes to a relational database is virtually impossible and when you do, it usually isn't relational anymore The
HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367
HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive
Case study: CASSANDRA
Case study: CASSANDRA Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu Cassandra:
High Throughput Computing on P2P Networks. Carlos Pérez Miguel [email protected]
High Throughput Computing on P2P Networks Carlos Pérez Miguel [email protected] Overview High Throughput Computing Motivation All things distributed: Peer-to-peer Non structured overlays Structured
How To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
Going Native With Apache Cassandra. NoSQL Matters, Cologne, 2014 www.datastax.com @DataStaxEU
Going Native With Apache Cassandra NoSQL Matters, Cologne, 2014 www.datastax.com @DataStaxEU About Me Johnny Miller Solutions Architect @CyanMiller www.linkedin.com/in/johnnymiller We are hiring www.datastax.com/careers
NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB
bankmark UG (haftungsbeschränkt) Bahnhofstraße 1 9432 Passau Germany www.bankmark.de [email protected] T +49 851 25 49 49 F +49 851 25 49 499 NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB,
DataStax Enterprise Reference Architecture
DataStax Enterprise Reference Architecture DataStax Enterprise Reference Architecture 7.8.15 1 Table of Contents ABSTRACT... 3 INTRODUCTION... 3 DATASTAX ENTERPRISE... 3 ARCHITECTURE... 3 OPSCENTER: EASY-
JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra
JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra January 2014 Legal Notices Apache Cassandra, Spark and Solr and their respective logos are trademarks or registered trademarks
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15 2014 Amazon.com, Inc. and its affiliates. All rights
Referential Integrity in Cloud NoSQL Databases
Referential Integrity in Cloud NoSQL Databases by Harsha Raja A thesis submitted to the Victoria University of Wellington in partial fulfilment of the requirements for the degree of Master of Engineering
Xiaoming Gao Hui Li Thilina Gunarathne
Xiaoming Gao Hui Li Thilina Gunarathne Outline HBase and Bigtable Storage HBase Use Cases HBase vs RDBMS Hands-on: Load CSV file to Hbase table with MapReduce Motivation Lots of Semi structured data Horizontal
Cloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
Time series IoT data ingestion into Cassandra using Kaa
Time series IoT data ingestion into Cassandra using Kaa Andrew Shvayka [email protected] Agenda Data ingestion challenges Why Kaa? Why Cassandra? Reference architecture overview Hands-on Sandbox
Data Modeling in the New World with Apache Cassandra TM. Jonathan Ellis CTO, DataStax Project chair, Apache Cassandra
Data Modeling in the New World with Apache Cassandra TM Jonathan Ellis CTO, DataStax Project chair, Apache Cassandra Download & install Cassandra http://planetcassandra.org/cassandra/ 2014 DataStax. Do
Cloud Computing with Microsoft Azure
Cloud Computing with Microsoft Azure Michael Stiefel www.reliablesoftware.com [email protected] http://www.reliablesoftware.com/dasblog/default.aspx Azure's Three Flavors Azure Operating
Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
MariaDB Cassandra interoperability
MariaDB Cassandra interoperability Cassandra Storage Engine in MariaDB Sergei Petrunia Colin Charles Who are we Sergei Petrunia Principal developer of CassandraSE, optimizer developer, formerly from MySQL
Practical Guidelines for Selecting NoSQL vs. an RDBMS Deployment Considerations Conclusion About DataStax
TABLE OF CONTENTS Introduction Why NoSQL? NoSQL 101 Types of NoSQL Databases What are the Advantages of NoSQL Over an RDBMS? A NoSQL Example Apache Cassandra What Makes Cassandra Ideal for Modern Online
Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
CQL for Cassandra 2.0 & 2.1
CQL for Cassandra 2.0 & 2.1 Documentation January 21, 2016 Apache, Apache Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation 2016 DataStax, Inc. All rights
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
Introduction to Hbase Gkavresis Giorgos 1470
Introduction to Hbase Gkavresis Giorgos 1470 Agenda What is Hbase Installation About RDBMS Overview of Hbase Why Hbase instead of RDBMS Architecture of Hbase Hbase interface Summarise What is Hbase Hbase
May 6, 2013. DataStax Cassandra South Bay Meetup. Cassandra Modeling. Best Practices and Examples. Jay Patel Architect, Platform Systems @pateljay3001
May 6, 2013 DataStax Cassandra South Bay Meetup Cassandra Modeling Best Practices and Examples Jay Patel Architect, Platform Systems @pateljay3001 That s me Technical Architect @ ebay Passion for building
Axibase Time Series Database
Axibase Time Series Database Axibase Time Series Database Axibase Time-Series Database (ATSD) is a clustered non-relational database for the storage of various information coming out of the IT infrastructure.
www.basho.com Technical Overview Simple, Scalable, Object Storage Software
www.basho.com Technical Overview Simple, Scalable, Object Storage Software Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces...
SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS
Enterprise Data Problems in Investment Banks BigData History and Trend Driven by Google CAP Theorem for Distributed Computer System Open Source Building Blocks: Hadoop, Solr, Storm.. 3548 Hypothetical
Integration of Apache Hive and HBase
Integration of Apache Hive and HBase Enis Soztutar enis [at] apache [dot] org @enissoz Page 1 About Me User and committer of Hadoop since 2007 Contributor to Apache Hadoop, HBase, Hive and Gora Joined
Cassandra A Decentralized Structured Storage System
Cassandra A Decentralized Structured Storage System Avinash Lakshman, Prashant Malik LADIS 2009 Anand Iyer CS 294-110, Fall 2015 Historic Context Early & mid 2000: Web applicaoons grow at tremendous rates
Rakam: Distributed Analytics API
Rakam: Distributed Analytics API Burak Emre Kabakcı May 30, 2014 Abstract Today, most of the big data applications needs to compute data in real-time since the Internet develops quite fast and the users
Evaluating Apache Cassandra as a Cloud Database WHITE PAPER
Evaluating Apache Cassandra as a Cloud Database WHITE PAPER By DataStax Corporation March 2012 Contents Introduction... 3 Why Move to a Cloud Database?... 3 The Cloud Promises Transparent Elasticity...
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
low-level storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
Mark Bennett. Search and the Virtual Machine
Mark Bennett Search and the Virtual Machine Agenda Intro / Business Drivers What to do with Search + Virtual What Makes Search Fast (or Slow!) Virtual Platforms Test Results Trends / Wrap Up / Q & A Business
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...
Comparing Oracle with Cassandra / DataStax Enterprise
Comparing Oracle with Cassandra / DataStax Enterprise Table of Contents Table of Contents... 2 Abstract... 3 Introduction... 3 Oracle and Today s Online Applications... 3 Architectural Limitations... 3
Apache Cassandra 1.2 Documentation
Apache Cassandra 1.2 Documentation January 13, 2013 2013 DataStax. All rights reserved. Contents Apache Cassandra 1.2 Documentation 1 What's new in Apache Cassandra 1.2 1 Key Improvements 1 Concurrent
