Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014



Similar documents
No-SQL Databases for High Volume Data

Introduction to Apache Cassandra

Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise

Enabling SOX Compliance on DataStax Enterprise

The Modern Online Application for the Internet Economy: 5 Key Requirements that Ensure Success

Going Native With Apache Cassandra. QCon London, 2014

Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS)

So What s the Big Deal?

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

Complying with Payment Card Industry (PCI-DSS) Requirements with DataStax and Vormetric

Distributed Systems. Tutorial 12 Cassandra

Introduction to Cassandra

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER

Data Management in the Cloud

Comparing Oracle with Cassandra / DataStax Enterprise

Evaluating Apache Cassandra as a Cloud Database White Paper

Evaluating Apache Cassandra as a Cloud Database WHITE PAPER

Big Data: Beyond the Hype. Why Big Data Matters to You. White Paper

Evaluating Apache Cassandra as a Cloud Database WHITE PAPER

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

Big Data: Beyond the Hype

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Apache Cassandra and DataStax. DataStax EMEA

DataStax Enterprise Reference Architecture

Dominik Wagenknecht Accenture

Big Data: Beyond the Hype

INTRODUCTION TO CASSANDRA

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Windows Geo-Clustering: SQL Server

Apache Cassandra for Big Data Applications

LARGE-SCALE DATA STORAGE APPLICATIONS

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

DBA'S GUIDE TO NOSQL APACHE CASSANDRA

GigaSpaces Real-Time Analytics for Big Data

nwstor Storage Security Solution 1. Executive Summary 2. Need for Data Security 3. Solution: nwstor isav Storage Security Appliances 4.

The Future of Data Management

Implementing Search in Web, Mobile, and IOT Applications An Overview of DataStax Enterprise Search

Real Time Big Data Processing

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

DataStax Enterprise Reference Architecture. White Paper

Case study: CASSANDRA

Practical Cassandra. Vitalii

Simplifying Database Management with DataStax OpsCenter

Implement Hadoop jobs to extract business value from large and varied data sets

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

Oracle Database 12c Plug In. Switch On. Get SMART.

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

High Availability of VistA EHR in Cloud. ViSolve Inc. White Paper February

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

CDH AND BUSINESS CONTINUITY:

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Apache HBase. Crazy dances on the elephant back

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra

Scalable Architecture on Amazon AWS Cloud

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Integrating a Big Data Platform into Government:

A Survey of Distributed Database Management Systems

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Expand Your Infrastructure with the Elastic Cloud. Mark Ryland Chief Solutions Architect Jenn Steele Product Marketing Manager

CSE-E5430 Scalable Cloud Computing Lecture 2

Search and Real-Time Analytics on Big Data

CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level. -ORACLE TIMESTEN 11gR1

How To Scale Out Of A Nosql Database

Practical Guidelines for Selecting NoSQL vs. an RDBMS Deployment Considerations Conclusion About DataStax

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Cassandra A Decentralized, Structured Storage System

SAP HANA Cloud Platform Frequently Asked Questions - Business

NoSQL and Hadoop Technologies On Oracle Cloud

bigdata Managing Scale in Ontological Systems

High Throughput Computing on P2P Networks. Carlos Pérez Miguel

AppSense Environment Manager. Enterprise Design Guide

Apache Hadoop: Past, Present, and Future

Hadoop & Spark Using Amazon EMR

Mark Bennett. Search and the Virtual Machine

HDFS Users Guide. Table of contents

How to Turn the Promise of the Cloud into an Operational Reality

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

NoSQL Data Base Basics

BIG DATA TOOLS. Top 10 open source technologies for Big Data

How to Hadoop Without the Worry: Protecting Big Data at Scale

Cloudwick. CLOUDWICK LABS Big Data Research Paper. Nebula: Powering Enterprise Private & Hybrid Cloud for DataStax Big Data

Big Data Management and Security

From Spark to Ignition:

Big Data Trends and HDFS Evolution

EMC IRODS RESOURCE DRIVERS

Transcription:

Highly available, scalable and secure data with Cassandra and DataStax Enterprise GOTO Berlin 27 th February 2014

About Us Steve van den Berg Johnny Miller Solutions Architect Regional Director Western Europe www.datastax.com @DataStaxEU @CyanMiller https://www.linkedin.com/in/johnnymiller svandenberg@datastax.com 2014 DataStax Confidential. Do not distribute without consent. jmiller@datastax.com 2

DataStax Founded in April 2010 We drive Apache Cassandra 400+ customers (24 of the Fortune 100) 220+ employees Contribute approximately 80% of the code to Cassandra Home to Apache Cassandra Chair & most committers Headquartered in San Francisco Bay area European headquarters established in London Our Goal To be the first and best database choice for online applications 2014 DataStax Confidential. Do not distribute without consent. 3

Training Checkout the DataStax academy for free online virtual training! http://www.datastax.com/virtual-training Public courses http://www.datastax.com/course-catalog On-site training http://www.datastax.com/training 2014 DataStax Confidential. Do not distribute without consent. 4

DataStax Enterprise for start-ups DataStax gives qualifying start-ups access to DataStax Enterprise for free! For more information: http://www.datastax.com/startup 2014 DataStax Confidential. Do not distribute without consent. 5

DataStax DataStax supports both the open source community and enterprises. Open Source/Community Apache Cassandra (employ Cassandra chair and 80+% of the committers) DataStax Community Edition DataStax OpsCenter DataStax DevCenter DataStax Drivers/Connectors Online Documentation Online Training Mailing lists and forums Enterprise Software DataStax Enterprise Edition Certified Cassandra Built-in Analytics Built-in Enterprise Search Enterprise Security DataStax OpsCenter Expert Support Consultative Help Professional Training 2014 DataStax Confidential. Do not distribute without consent. 6

DataStax Enterprise DataStax Enterprise: LOB* applications, with analytics and search for online/real-time application data. Hadoop: data warehouse applications with analytics and search for data warehouse. Transactions: LOB Style Tunable consistency Analytics: MapReduce Hive Pig Mahout Search Solr LOB App NoSQL LOB App NoSQL LOB App NoSQL Transactions: None Analytics: MapReduce Hive Pig Mahout Search Solr Data Warehouse Hadoop *Line of business 2014 DataStax Confidential. Do not distribute without consent. 7

Availability and Speed Matters for online apps! UK retailers lost 8.5 billion last year to slow web sites, which is 1 million for every 10 million in online sales Over half of all web users expect a response time of 2 seconds or less A 1 second delay causes a nearly 10% reduction in customer interactions A 1 second decrease in Amazon page load time costs the company $1.6 billion in sales 2014 DataStax Confidential. Do not distribute without consent. 8

Apache Cassandra Apache Cassandra is a massively scalable, open source, NoSQL, distributed database built for modern, mission-critical online applications. Written in Java and is a hybrid of Amazon Dynamo and Google BigTable Masterless with no single point of failure Distributed and data centre aware 100% uptime Predictable scaling Dynamo CASSANDRA" BigTable BigTable: http://research.google.com/archive/bigtable-osdi06.pdf Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf 2014 DataStax Confidential. Do not distribute without consent. 9

Cassandra Core Values Ease of use Massive scalability High performance Always Available 2014 DataStax Confidential. Do not distribute without consent. 10

Cassandra Performance and Scale In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput. Solving Big Data Challenges for Enterprise Application Performance Management, Tilman Rable, et al., August 2012. Benchmark paper presented at the Very Large Database Conference, 2012. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf Netflix Cloud Benchmark End Point Independent NoSQL Benchmark Highest in throughput Lowest in latency 2014 DataStax Confidential. Do not distribute without consent. 11 http://techblog.netflix.com/2011/11/benchmarkingcassandra-scalability-on.html http://www.datastax.com/wp-content/uploads/2013/02/wp- Benchmarking-Top-NoSQL-Databases.pdf

Cassandra - Performance and Scale Cassandra works for small to huge deployments. Cassandra Footprint @ Netflix 80+ Clusters 2500+ nodes 4 Data Centres (Amazon Regions) > 1 Trillion transactions per day http://planetcassandra.org/functional-use-cases/ 2014 DataStax Confidential. Do not distribute without consent. 12

Cassandra Overview Cassandra was designed with the understanding that system/hardware failures can and do occur Peer-to-peer, distributed system All nodes the same Data partitioned among all nodes in the cluster Custom data replication to ensure fault tolerance Read/Write-anywhere and across data centres Node 5 Node 1 Node 2 Node 4 Node 3 2014 DataStax Confidential. Do not distribute without consent. 13

Cassandra More Than One Server All nodes participate in a cluster Add or remove as needed All nodes the same masterless with no single point of failure Each node communicates with each other through the Gossip protocol, which exchanges information across the cluster every second Data partitioned among all nodes in the cluster A commit log is used on each node to capture write activity. Data durability is assured Data also written to an in-memory structure (memtable) and then to disk once the memory structure is full (an SStable) More capacity? Add a server! More throughput? Add a server! Node 5 Node 4 Node 1 Node 3 Node 2 2014 DataStax Confidential. Do not distribute without consent. 14

Cassandra Locally Distributed Client reads or writes to any node Node coordinates with others Data read or replicated in parallel Node 1 1 st copy Replication factor (RF): How many copies of your data? RF = 3 in this example Each node is storing 60% of the clusters total data i.e. 3/5 Node 5 Node 2 2 nd copy Handy Calculator: http://www.ecyrd.com/cassandracalculator/ Node 4 Node 3 3 rd copy 2014 DataStax Confidential. Do not distribute without consent. 15

Cassandra Rack Aware Cassandra is aware of which rack (or availability zone) each node resides in. It will attempt to place each data copy in a different rack. RF = 3 in this example Rack 3 Rack 1 Node 1 1 st copy Rack 1 Node 5 3 rd copy Node 2 Rack 2 Rack 2 Node 4 Node 3 2 nd copy 2014 DataStax Confidential. Do not distribute without consent. 16

Cassandra Data Centre Aware Active Everywhere reads/writes in multiple data centres DC: USA DC: EUROPE Client writes local Data syncs across WAN Replication Factor per DC Node 1 1 st copy Node 1 1 st copy Different number of nodes per data centre Node 5 Node 2 2 nd copy Node 5 Node 2 2 nd copy Node 4 Node 3 3 rd copy Node 4 Node 3 3 rd copy 2014 DataStax Confidential. Do not distribute without consent. 17

Cassandra Tunable Consistency Consistency Level (CL) Client specifies per read or write Handles multi-data center operations 12 μs ack 5 μs ack Node 1 1 st copy ALL = All replicas ack QUORUM = > 51% of replicas ack LOCAL_QUORUM = > 51% in local DC ack Write CL=QUORUM Node 5 Parallel Write Node 2 2 nd copy ONE = Only one replica acks Plus more. (see docs) 12 μs ack Blog: Eventual Consistency!= Hopeful Consistency http://planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistencyhopeful-consistency-by-christos-kalantzis/ Node 4 500 μs ack Node 3 3 rd copy 2014 DataStax Confidential. Do not distribute without consent. 18

Node Failure A single node failure shouldn t bring failure. Replication Factor + Consistency Level = Success This example: RF = 3 CL = QUORUM Write CL=QUORUM 12 μs ack Node 5 5 μs ack Parallel Write Node 1 1 st copy Node 2 2 nd copy >51% ack so request is a success 12 μs ack Node 4 Node 3 3 rd copy 2014 DataStax Confidential. Do not distribute without consent. 19

Node Recovery When a write is performed and a replica node for the row is unavailable the coordinator will store a hint locally. When the node recovers, the coordinator replays the missed writes. Note: a hinted write does not count towards the consistency level Note: you should still run repairs across your cluster Node 5 Node 1 1 st copy Node 2 2 nd copy Stores Hints while Node 3 is offline Node 4 Node 3 3 rd copy http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/ dml_about_hh_c.html 2014 DataStax Confidential. Do not distribute without consent. 20

Rack Failure Cassandra will place the data in as many different racks or availability zones as it can. This example: RF = 3 CL = QUORUM Rack 2 fails Data copies still available in Node 1 and Node 5 Quorum can be honored i.e. > 51% ack Rack 3 Node 5 3 rd copy request is a success Rack 1 Node 1 1 st copy Rack 1 Node 2 Rack 2 Rack 2 Node 4 Node 3 2 nd copy 2014 DataStax Confidential. Do not distribute without consent. 21

Don t be afraid of Weak Consistency More tolerant to failure Consistency Level of 1 is the most popular (I think) If you want stronger consistency go for LOCAL_QUORUM i.e. quorum is honored in the local data centre. If you go stronger than LOCAL_QUORUM understand what this means and why you are doing it. Remember you can have different consistency levels for reads and writes e.g. write with CL:1, read with CL:LOCAL_QUORUM 2014 DataStax Confidential. Do not distribute without consent. 22

Example Application Web Tier Web Tier Web Tier Active-Active-Active App Tier Service based DNS routing Web Tier App Tier App Tier DC: USA DC: Europe DC: Asia Cassandra Replication Cassandra Replication 2014 DataStax Confidential. Do not distribute without consent. 23

Example Application - Uptime Web Tier Web Tier Web Tier Normal service maintenance Application is unaware App Tier Web Tier App Tier App Tier DC: USA DC: Europe DC: Asia Cassandra Replication Cassandra Replication 2014 DataStax Confidential. Do not distribute without consent. 24

Example Application DC Failure Web Tier Web Tier Web Tier Data is safe. Route Traffic App Tier Web Tier App Tier App Tier DC: USA DC: Europe DC: Asia Cassandra Replication Cassandra Replication 2014 DataStax Confidential. Do not distribute without consent. 25

Tier Failure App Tier is aware of the other DC Switches to access remote DC automatically Web Tier Web Tier Web Tier App Tier Web Tier App Tier App Tier DC: USA DC: Europe DC: Asia Cassandra Replication 2014 DataStax Confidential. Do not distribute without consent. 26

WAN Failure Web Tier Web Tier Web Tier App Tier Web Tier App Tier Consistency level? App Tier DC: USA DC: Europe DC: Asia Cassandra Replication Cassandra Replication 2014 DataStax Confidential. Do not distribute without consent. 27

Cassandra Clients - Native Driver Clients that use the native driver also have access to various policies that enable the client to intelligently route requests as required. This includes: Load Balancing Data Centre Aware Latency Aware Token Aware Reconnection policies Retry policies Downgrading Consistency Plus others.. http://www.datastax.com/download/clientdrivers 2014 DataStax Confidential. Do not distribute without consent. 28

Quotes Cassandra, our distributed cloud persistence store which is distributed across all zones and regions, dealt with the loss of one third of its regional nodes without any loss of data or availability. http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html During Hurricane Sandy, we lost an entire data center. Completely. Lost. It. Our application fail-over resulted in us losing just a few moments of serving requests for a particular region of the country, but our data in Cassandra never went offline. http://planetcassandra.org/blog/post/outbrain-touches-over-80-of-all-us-online-users-with-help-from-cassandra/ 2014 DataStax Confidential. Do not distribute without consent. 29

Ring-fenced resources If you need to isolate resources for different uses, Cassandra is a great fit. You can create separate virtual data centres optimised as required different workloads, hardware, availability etc.. Cassandra will replicate the data for you no ETL is necessary Customer Facing Cassandra Replication Analytics 2014 DataStax Confidential. Do not distribute without consent. 30

Hybrid Cloud DataStax Enterprise and Cassandra are multi-data centre and cloud capable Data written to any node is automatically and transparently written to all other nodes in multiple data centres i.e. no etl Public Cloud Data Centre 1 Data Centre2 2014 DataStax Confidential. Do not distribute without consent. 31

Security in Cassandra FEATURES BENEFITS Internal Authentication Manages login IDs and passwords inside the database + Ensures only authorized users can access a database system using internal validation + Simple to implement and easy to understand + No learning curve from the relational world Object Permission Management controls who has access to what and who can do what in the database + Provides granular based control over who can add/change/ delete/read data + Uses familiar GRANT/REVOKE from relational systems + No learning curve Client to Node Encryption protects data in flight to and from a database cluster + Ensures data cannot be captured/stolen in route to a server + Data is safe both in flight from/ to a database and on the database; complete coverage is ensured

Advanced Security in DataStax Enterprise FEATURES BENEFITS External Authentication uses external security software packages to control security + Only authorized users have access to a database system using external validation + Uses most trusted external security packages (Kerberos), mainstays in government and finance + Single sign on to all data domains Transparent Data Encryption encrypts data at rest + Protects sensitive data at rest from theft and from being read at the file system level + No changes needed at application level Data Auditing provides trail of who did and looked at what/when + Supplies admins with an audit trail of all accesses and changes + Granular control to audit only what s needed + Uses log4j interface to ensure performance and efficient audit operations

Data Replication Security in Cassandra A popular feature from a data security perspective is the ability to control at a keyspace/schema level which data centres data should be replicated to. What this means is that in a multi-data centre (both physical and virtual) cluster you can ensure that data is not shipped anywhere it shouldn t be and access to that data can be controlled. This is very simple to set-up and is extremely useful when you need to share some of your data, but not all of you data or if you have requirements around where your data is permitted to reside. Shared Data DC 1 DC 2

DataStax Enterprise 4.0, OpsCentre 4.1 DataStax Enterprise 4.0 New in-memory option. Brings all of the goodness of Cassandra to an in-memory database Production-certified version of Apache Cassandra (2.0) Enterprise search enhancements OpsCenter 4.1 Capacity planning updates Better insight into node performance More information: http://www.datastax.com/wp-content/uploads/2014/02/wp-whatsnewdse40.pdf http://www.datastax.com/download 2014 DataStax Confidential. Do not distribute without consent. 35

How does in-memory work? Developers can create new tables to be in-memory or alter existing tables to be in-memory Writes are durable 10-100x improvement 2014 DataStax Confidential. Do not distribute without consent. 36

Find Out More DataStax: http://www.datastax.com Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html Training: http://www.datatstax.com/training Downloads: http://www.datastax.com/download Documentation: http://www.datastax.com/docs Developer Blog: http://www.datastax.com/dev/blog Community Site: http://planetcassandra.org Webinars: http://planetcassandra.org/learn/cassandracommunitywebinars Summit Talks: http://planetcassandra.org/learn/cassandrasummit 2014 DataStax Confidential. Do not distribute without consent. 37

Thank You We power the big data apps that transform business. 2014 DataStax Confidential. Do not distribute without consent. 38