Going Native With Apache Cassandra. QCon London, 2014 www.datastax.com @DataStaxEMEA

Similar documents
Going Native With Apache Cassandra. NoSQL Matters, Cologne, 2014

Data Modeling in the New World with Apache Cassandra TM. Jonathan Ellis CTO, DataStax Project chair, Apache Cassandra

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

Apache Cassandra Query Language (CQL)

Python Driver 1.0 for Apache Cassandra

Simba Apache Cassandra ODBC Driver

Introduction to Cassandra

No-SQL Databases for High Volume Data

Big Data Development CASSANDRA NoSQL Training - Workshop. March 13 to am to 5 pm HOTEL DUBAI GRAND DUBAI

DD2471: Modern Database Systems and Their Applications Distributed data management using Apache Cassandra

CQL for Cassandra 2.2 & later

Service Integration course. Cassandra

Informatica Cloud & Redshift Getting Started User Guide

Cassandra in Action ApacheCon NA 2013

CQL for Cassandra 2.0 & 2.1

Java Driver 2.0 for Apache Cassandra

Apache Cassandra for Big Data Applications

Apache Cassandra and DataStax. DataStax EMEA

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Evaluation of NoSQL databases for large-scale decentralized microblogging

SnapLogic Salesforce Snap Reference

Real-Time Big Data in practice with Cassandra. Michaël

Introduction to Apache Cassandra

Getting Started with SandStorm NoSQL Benchmark

Services. Relational. Databases & JDBC. Today. Relational. Databases SQL JDBC. Next Time. Services. Relational. Databases & JDBC. Today.

CASSANDRA. Arash Akhlaghi, Badrinath Jayakumar, Wa el Belkasim. Instructor: Dr. Rajshekhar Sunderraman. CSC 8711 Project Report

Integration of Apache Hive and HBase

Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise

Apache Zeppelin, the missing component for your BigData ecosystem

MariaDB Cassandra interoperability

Use Your MySQL Knowledge to Become an Instant Cassandra Guru

Multimedia im Netz (Online Multimedia) Wintersemester 2014/15. Übung 08 (Hauptfach)

May 6, DataStax Cassandra South Bay Meetup. Cassandra Modeling. Best Practices and Examples. Jay Patel Architect, Platform

Apache Cassandra 1.2 Documentation

Open Source DBMS CUBRID 2008 & Community Activities. Byung Joo Chung bjchung@cubrid.com

Easy-Cassandra User Guide

Monitor Your Key Performance Indicators using WSO2 Business Activity Monitor

So What s the Big Deal?

How graph databases started the multi-model revolution

Enabling SOX Compliance on DataStax Enterprise

Axway API Gateway. Version 7.4.1

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Chapter 9 Java and SQL. Wang Yang wyang@njnet.edu.cn

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

Cloudsandra on Brisk. Isidorey

Unified Big Data Processing with Apache Spark. Matei

Windows Geo-Clustering: SQL Server

Open Source Technologies on Microsoft Azure

Getting Started with User Profile Data Modeling. White Paper

Spark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1

Real-time reporting at 10,000 inserts per second. Wesley Biggs CTO 25 October 2011 Percona Live

Apache Cassandra 1.2

Solving Large-Scale Database Administration with Tungsten

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Time series IoT data ingestion into Cassandra using Kaa

Weaving Stored Procedures into Java at Zalando

HareDB HBase Client Web Version USER MANUAL HAREDB TEAM

Volta Log Library user manual

TESTING TOOLS COMP220/285 University of Liverpool slide 1

SQL Injection for newbie

Delivering Intelligence to Publishers Through Big Data

The Internet of Things and Big Data: Intro

Sharding with postgres_fdw

Big Data Analytics with Cassandra, Spark & MLLib

Java DB Performance. Olav Sandstå Sun Microsystems, Trondheim, Norway Submission ID: 860

Plug-In for Informatica Guide

NoSQL Databases. Nikos Parlavantzas

Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER

let cassandra identify your data

Big Data with Component Based Software

Apache Cassandra 2.0

Integrating Apache Spark with an Enterprise Data Warehouse

New Features in Neuron ESB 2.6

INTEGRATING MICROSOFT DYNAMICS CRM WITH SIMEGO DS3

Software Life-Cycle Management

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

Practical Cassandra. Vitalii

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

NoSQL: Going Beyond Structured Data and RDBMS

18.2 user guide No Magic, Inc. 2015

In This Lecture. Security and Integrity. Database Security. DBMS Security Support. Privileges in SQL. Permissions and Privilege.

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

How, What, and Where of Data Warehouses for MySQL

Spring Data JDBC Extensions Reference Documentation

MADOCA II Data Logging System Using NoSQL Database for SPring-8

CitusDB Architecture for Real-Time Big Data

FAQs. This material is built based on. Lambda Architecture. Scaling with a queue. 8/27/2015 Sangmi Pallickara

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, Anthony Yeh, Software Engineer, YouTube.

Spark Application Carousel. Spark Summit East 2015

Ankush Cluster Manager - Cassandra Technology User Guide

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS)

Design and Evolution of the Apache Hadoop File System(HDFS)

Apache Cassandra Present and Future. Jonathan Ellis

Comparing SQL and NOSQL databases

Database Programming. Week *Some of the slides in this lecture are created by Prof. Ian Horrocks from University of Oxford

Hive Interview Questions

f...-. I enterprise Amazon SimpIeDB Developer Guide Scale your application's database on the cloud using Amazon SimpIeDB Prabhakar Chaganti Rich Helms

Transcription:

Going Native With Apache Cassandra QCon London, 2014 www.datastax.com @DataStaxEMEA

About Me Johnny Miller Solutions Architect www.datastax.com @DataStaxEU jmiller@datastax.com @CyanMiller https://www.linkedin.com/in/johnnymiller 2014 DataStax. Do not distribute without consent. 3

DataStax Founded in April 2010 We drive Apache Cassandra 400+ customers (24 of the Fortune 100) 220+ employees Contribute approximately 80% of the code to Cassandra Home to Apache Cassandra Chair & most committers Headquartered in San Francisco Bay area European headquarters established in London We are hiring www.datastax.com/careers 2014 DataStax. Do not distribute without consent. 4

What do I mean by going native? Traditionally, Cassandra clients (Hector, Astynax 1 etc..) were developed using Thrift With Cassandra 1.2 (Jan 2013) and the introduction of CQL3 and the CQL native protocol a new easier way of using Cassandra was introduced. Why? Easier to develop and model Best practices for building modern distributed applications Integrated tools and experience Enable Cassandra to evolve easier and support new features 1 Astynax is being updated to include the native driver: https://github.com/netflix/astyanax/wiki/astyanax-over-java-driver 2014 DataStax. Do not distribute without consent. 5

CQL Cassandra Query Language CQL is intended to provide a common, simpler and easier to use interface into Cassandra - and you probably already know it! e.g. SELECT * FROM users Usual statements CREATE / DROP / ALTER TABLE / SELECT 2014 DataStax. Do not distribute without consent. 6

Creating A Keyspace CREATE KEYSPACE johnny WITH REPLICATION = { class : NetworkTopologyStrategy, USA :3, EU : 2}; DC: USA DC: EU Node 1 1 st copy Node 1 1 st copy Node 5 Node 2 2 nd copy Node 5 Node 2 2 nd copy Node 4 Node 3 3 rd copy Node 4 Node 3 2014 DataStax. Do not distribute without consent. 7

CQL Basics CREATE TABLE sporty_league ( team_name varchar, player_name varchar, jersey int, PRIMARY KEY (team_name, player_name) ); SELECT * FROM sporty_league WHERE team_name = Mighty Mutts and player_name = Lucky ; INSERT INTO sporty_league (team_name, player_name, jersey) VALUES ('Mighty Mutts', Felix, 90); Predicates On the partition key: = and IN On the cluster columns: <, <=, =, >=, >, IN 2014 DataStax. Do not distribute without consent. 8

Collections Data Type CQL supports having columns that contain collections of data. The collection types include: Set, List and Map. CREATE TABLE collections_example ( id int PRIMARY KEY, set_example set<text>, list_example list<text>, map_example map<int, text> ); Some performance considerations around collections. Sometimes more efficient to denormalise further rather than use collections if intending to store lots of data. Favour sets over list more performant Watch out for collection indexing in Cassandra 2.1! 2014 DataStax. Do not distribute without consent. 9

Query Tracing You can turn tracing on or off for queries with the TRACING ON OFF command. This can help you understand what Cassandra is doing and identify any performance problems. Worth reading: http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2 2014 DataStax. Do not distribute without consent. 10

Plus much, much more Light Weight Transactions INSERT INTO customer_account (customerid, customer_email) VALUES ( LauraS, lauras@gmail.com ) IF NOT EXISTS; UPDATE customer_account SET customer_email= laurass@gmail.com IF customer_email= lauras@gmail.com ; Counters UPDATE UserActions SET total = total + 2 WHERE user = 123 AND action = xyz'; Time to live (TTL) INSERT INTO users (id, first, last) VALUES ( abc123, abe, lincoln ) USING TTL 3600; Batch Statements BEGIN BATCH INSERT INTO users (userid, password, name) VALUES ('user2', 'ch@ngem3b', 'second user') UPDATE users SET password = 'ps22dhds' WHERE userid = 'user2' INSERT INTO users (userid, password) VALUES ('user3', 'ch@ngem3c') DELETE name FROM users WHERE userid = 'user2 APPLY BATCH; New CQL features coming in Cassandra 2.0.6 http://www.datastax.com/dev/blog/cql-in-2-0-6 2014 DataStax. Do not distribute without consent. 11

CQL Native Protocol

Request Pipelining With it: Without it: 2014 DataStax. Do not distribute without consent. 13

Notifications Polling Notifications are for technical events only: Topology changes Node status changes, Schema changes Without Notifications Pushing With Notifications 2014 DataStax. Do not distribute without consent. 14

Asynchronous Architecture Client Thread 6 Client Thread 1 Driver 5 4 2 3 Client Thread 1 http://netty.io/ 2014 DataStax. Do not distribute without consent. 15

Native Drivers

Native Drivers Get them here: http://www.datastax.com/download Java C# Python C++ (beta) ODBC (beta) Clojure Erlang Node.js Ruby Plus many, many more. 2014 DataStax. Do not distribute without consent. 17

Connect and Write Cluster cluster = Cluster.builder().addContactPoints("10.158.02.40", "10.158.02.44").build(); Session session = cluster.connect("akeyspace"); session.execute( "INSERT INTO user (username, password) + "VALUES( johnny, password1234 )" ); Note: Clusters and Sessions should be long-lived and re-used. 2014 DataStax. Do not distribute without consent. 18

Read from a table ResultSet rs = session.execute("select * FROM user"); List<Row> rows = rs.all(); for (Row row : rows) { String username = row.getstring("username"); String password = row.getstring("password"); } 2014 DataStax. Do not distribute without consent. 19

Asynchronous Read ResultSetFuture future = session.executeasync( "SELECT * FROM user"); for (Row row : future.get()) { String username = row.getstring("username"); } String password = row.getstring("password"); Note: The future returned implements Guava's ListenableFuture interface. This means you can use all Guava's Futures 1 methods! 1 http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/futures.html 2014 DataStax. Do not distribute without consent. 20

Read with Callbacks final ResultSetFuture future = session.executeasync("select * FROM user"); future.addlistener(new Runnable() { public void run() { for (Row row : future.get()) { String username = row.getstring("username"); String password = row.getstring("password"); } } }, executor); 2014 DataStax. Do not distribute without consent. 21

Parallelize Calls int querycount = 99; List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>(); for (int i=0; i<querycount; i++) { futures.add( session.executeasync("select * FROM user " +"WHERE username = '"+i+"'")); } for(resultsetfuture future : futures) { for (Row row : future.getuninterruptibly()) { //do something } } 2014 DataStax. Do not distribute without consent. 22

Tip If you need to do a lot of work, it s often better to make many small queries concurrently than to make one big query. executeasync and Futures makes this really easy! Big queries can put a high load on one coordinator Big queries can skew your 99 th percentile latencies for other queries If one small query fails you can easily retry, if a big query than you have to retry the whole thing 2014 DataStax. Do not distribute without consent. 23

Prepared Statements PreparedStatement statement = session.prepare( "INSERT INTO user (username, password) " + "VALUES (?,?)"); BoundStatement bs = statement.bind(); bs.setstring("username", "johnny"); bs.setstring("password", "password1234"); session.execute(bs); 2014 DataStax. Do not distribute without consent. 24

Query Builder Query query = QueryBuilder.select().all().from("akeyspace", "user").where(eq("username", "johnny")); query.setconsistencylevel(consistencylevel.one); ResultSet rs = session.execute(query); 2014 DataStax. Do not distribute without consent. 25

Multi Data Center Load Balancing Local nodes are queried first, if none are available the request will be sent to a remote data center Cluster cluster = Cluster.builder().addContactPoints("10.158.02.40", "10.158.02.44").withLoadBalancingPolicy( new DCAwareRoundRobinPolicy("DC1")).build(); Name of the local DC 2014 DataStax. Do not distribute without consent. 26

Token Aware Load Balancing Nodes that own a replica of the data being read or written by the query will be contacted first Cluster cluster = Cluster.builder().addContactPoints("10.158.02.40", "10.158.02.44").withLoadBalancingPolicy( new TokenAwarePolicy( new DCAwareRoundRobinPolicy("DC1"))).build(); http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/policies/tokenawarepolicy.html 2014 DataStax. Do not distribute without consent. 27

Retry Policies This defined the behavior to adopt when a request returns a timeout or is unavailable. Cluster cluster = Cluster.builder().addContactPoints("10.158.02.40", "10.158.02.44").withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE).withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("DC1"))).build(); DefaultRetryPolicy DowngradingConsistencyRetryPolicy FallthroughRetryPolicy LoggingRetryPolicy http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/policies/retrypolicy.html 2014 DataStax. Do not distribute without consent. 28

Reconnection Policies Policy that decides how often the reconnection to a dead node is attempted. Cluster cluster = Cluster.builder().addContactPoints("10.158.02.40", "10.158.02.44 ).withretrypolicy(downgradingconsistencyretrypolicy.instance).withreconnectionpolicy(new ConstantReconnectionPolicy(1000)).withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("DC1"))).build(); ConstantReconnectionPolicy ExponentialReconnectionPolicy 2014 DataStax. Do not distribute without consent. 29

Automatic Paging This was new in Cassandra 2.0 Previously you would select data in batches 2014 DataStax. Do not distribute without consent. 30

Query Tracing Tracing is enabled on a per-query basis. Query query = QueryBuilder.select().all().from("akeyspace", "user").where(eq("username", "johnny")).enabletracing(); ResultSet rs = session.execute(query); ExecutionInfo executioninfo = rs.getexecutioninfo(); QueryTrace querytrace = executioninfo.getquerytrace(); 2014 DataStax. Do not distribute without consent. 31

DevCenter Desktop app friendly, familiar, productive Free http://www.datastax.com/devcenter 2014 DataStax. Do not distribute without consent. 32

Find Out More DataStax: http://www.datastax.com Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html Training: http://www.datatstax.com/training Downloads: http://www.datastax.com/download Documentation: http://www.datastax.com/docs Developer Blog: http://www.datastax.com/dev/blog Community Site: http://planetcassandra.org Webinars: http://planetcassandra.org/learn/cassandracommunitywebinars 2014 DataStax. Do not distribute without consent. 33

2014 DataStax. Do not distribute without consent. 34