No-SQL Databases for High Volume Data

Target Conference 2014 No-SQL Databases for High Volume Data Edward Wijnen 3 November 2014

The New Connected World Needs a Revolutionary New DBMS Today
- 1970s: Mainframe (Isolated)
- 1990s: Client-Server (Semi-Connected)
- 2014: Cloud, Mobile, Social, the Internet of Things (Radically Connected)
DataStax Confidential. Do not distribute without consent.

Businesses Must Close the Gap, and Fast
[Diagram: Your Business / Your Customers]

If You Started With This: Connected Employees, Connected Partners, Connected Products, Connected Customers, Connected Devices

You Would End With This: Distributed Transactional Database; Connected Employees, Connected Partners, Connected Products, Connected Customers, Connected Devices

Apache Cassandra
- Apache Cassandra is a massively scalable, open source, NoSQL, distributed database built for modern, mission-critical online applications
- Written in Java; a hybrid of Amazon Dynamo and Google BigTable
- Masterless, with no single point of failure
- Distributed and data centre aware
- 100% uptime
- Predictable scaling
[Diagram: Dynamo, Cassandra, BigTable]
BigTable: http://research.google.com/archive/bigtable-osdi06.pdf
Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

Apache Cassandra You are already using it!

Distributed Transactional Database Advantages The Hague

Distributed Transactional Database Advantages The Hague London

Use Case: Recommendations / Personalization
- Delivers 150+ billion content recommendations per month
- Serves content for the largest media brands in the world: Reuters, Wall St Journal, USA Today
- Needed a massively scalable data store
- High velocity of data, with 58,000 links to content per second
- Always-on data architecture

Distributed Transactional Database Advantages The Hague London Groningen

Use Case: Personalization
Netflix Delights Customers with Personal Recommendations
- World's leading streaming media provider, with digital revenue of $1.5BN+
- Tailors content delivery based on viewing preference data captured in Cassandra
- Increased market cap by 600% since 2012
- Introduction of Profiles drove throughput to over 10M transactions per second
- Replaced Oracle in six data centers worldwide, 100% in the cloud

Cassandra: Always On, No Matter What
[Map: The Hague, London, Groningen]
- Transactional Backbone
- Industry Leading Performance
- Predictable Scalability
- Operational Simplicity
- Business Flexibility

Cassandra Operational Simplicity

CAP Theorem Oracle vs Cassandra

Cassandra Tunable Consistency
- Consistency Level (CL): the client specifies it per read or write
- Handles multi-data center operations
- ALL = all replicas ack
- QUORUM = a majority (more than half) of replicas ack
- LOCAL_QUORUM = a majority of the replicas in the local DC ack
- ONE = only one replica acks
- Plus more (see docs)
[Diagram: a write at CL=QUORUM is sent in parallel to Node 1 (1st copy), Node 2 (2nd copy) and Node 3 (3rd copy) among Nodes 1 to 5; ack times shown are 5 μs, 12 μs, 12 μs and 500 μs.]
Blog: Eventual Consistency != Hopeful Consistency http://planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistencyby-christos-kalantzis/
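
To make the consistency levels on this slide concrete, here is a minimal Python sketch of how many replica acknowledgements each level waits for, and what that means for the latency of a single write. It is an illustration, not Cassandra's implementation: the names `Level`, `required_acks` and `write_latency_us` are hypothetical, the replication factor of 3 is an assumption based on the three copies in the diagram, and the ack times are three of the figures shown on the slide, used only as sample inputs.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical stand-in for the consistency levels named on the slide."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    LOCAL_QUORUM = "LOCAL_QUORUM"
    ALL = "ALL"


def quorum(replicas: int) -> int:
    """Smallest strict majority of `replicas` (more than half)."""
    return replicas // 2 + 1


def required_acks(level: Level, total_replicas: int, local_replicas=None) -> int:
    """Number of replica acks the coordinator waits for at a given level."""
    if level is Level.ONE:
        return 1
    if level is Level.QUORUM:
        return quorum(total_replicas)
    if level is Level.LOCAL_QUORUM:
        if local_replicas is None:
            raise ValueError("LOCAL_QUORUM needs the local replica count")
        return quorum(local_replicas)
    if level is Level.ALL:
        return total_replicas
    raise ValueError(f"unsupported level: {level}")


def write_latency_us(ack_times_us, needed: int) -> float:
    """Time until the `needed`-th fastest ack arrives (parallel writes)."""
    if not 1 <= needed <= len(ack_times_us):
        raise ValueError("needed acks must be between 1 and the replica count")
    return sorted(ack_times_us)[needed - 1]


# Assumed scenario: replication factor 3, ack times taken from the slide.
acks = [5, 12, 500]
for level in (Level.ONE, Level.QUORUM, Level.ALL):
    needed = required_acks(level, total_replicas=len(acks))
    print(level.value, needed, write_latency_us(acks, needed))
```

With these inputs QUORUM needs 2 of 3 acks, so the write completes after 12 μs and does not wait for the 500 μs replica, whereas ALL would. This is the trade-off the client tunes per request.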

Common Use Cases
- Product Catalogs and Playlists
- Fraud Detection
- Internet of Things / Sensor Data
- Recommendation / Personalization
- Messaging

Product Catalogs and Playlists
A product catalog is an organized collection of products or services. Playlists are user-defined queues of songs, movies, games and lessons. Examples: shopping carts, gift registries, media playlists.
Challenges
- Rigidity of relational databases
- Increase in volume and diversity of data
- Application must have zero downtime
- Predictable scalability is hard
- Desire to operate in the cloud
Why DataStax?
- Real-time database infrastructure
- Rich analytics for flexible access to information
- Fast search and indexing of data
- Add new features while the application is online
- Multiple data centers to ensure applications and data have 100% uptime
Customers: [logos not captured in the transcription]

Recommendation Engine
Recommendation and personalization engines understand each person's unique habits and preferences and bring to light products and items that a user may be unaware of and not looking for. Examples: news sites, shopping carts.
Challenges
- Large volumes of user data make accuracy challenging
- Merging real-time and historical information
- Cross-product information
- Response times need to be fast
- Predictable scalability is hard
Why DataStax?
- Rich query language and enterprise search to store, search and analyze user activity data
- Integrations with data lakes allow for the merging of real-time and historical data
- Multi-data center replication ensures applications and data suffer no downtime
- Linear scalability is predictable
Customers: [logos not captured in the transcription]

Fraud Detection
Fraud detection solutions identify out-of-the-ordinary patterns to prevent malicious attacks on digital and physical assets from unauthorized applications and individuals. Examples: credit card monitoring, application infiltration.
Challenges
- Increasing volume of fraudulent attacks across all industries
- Technology sophistication
- Limited historical and trend information
- Information is stored across multiple channels
- The customer can be the first to spot the fraud
Why DataStax?
- Easy management of high data volumes
- Real-time monitoring across channels, sites and data centers
- Integrations with data lakes allow for the merging of real-time and historical data
- Ease of use in managing and monitoring data
- Multi-data center replication ensures applications and data suffer no downtime
Customers: [logos not captured in the transcription]

Internet of Things / Sensor Data
IoT refers to the growing number of internet-connected devices that can network and communicate with each other.
Challenges
- Vast and diverse amounts of unstructured data from internet-enabled devices
- Volume of sensors is increasing exponentially
- Fast-changing technology
- Support multiple channels with varying data types
- Predictable scalability is hard
Why DataStax?
- Easy management of high data volumes
- Rich query language and enterprise search to store, search and analyze data
- Dynamic database schema
- Linear scalability offers predictability
Customers: [logos not captured in the transcription]

Messaging
Messaging facilitates communication, interaction and collaboration between diverse user groups and applications via social networks, cloud services and more. Examples: SMS, email and instant messaging.
Challenges
- Managing large data volumes at a reasonable cost
- Real-time updates and information, getting detailed alerts and notifications
- Predictable scalability is hard
- Information is stored across multiple platforms and systems
- Agility
Why DataStax?
- Easy management of high data volumes
- Real-time monitoring across channels, sites and data centers
- Multi-data center replication ensures applications and data suffer no downtime
- Ease of use in managing and monitoring data
- Dynamic database schema
Customers: [logos not captured in the transcription]

The Weather Channel on Learning to Use Cassandra
"If you had a look in the past, you may have found Cassandra had a high learning curve and a fair amount of complexity. CQL3, the native drivers, and virtual nodes have changed the game entirely, making Cassandra a much more accessible and friendly platform. While I have years of experience using Cassandra, my team was mostly new to it; CQL made their transition essentially painless. But where Cassandra really shines is in speed and operational simplicity, and I would say those two points were critical."
Robbie Strickland, Software Dev Manager

APIs and Drivers
Application drivers and connectors for all popular developer languages exist for Cassandra and DataStax Enterprise. CQL (Cassandra Query Language) is the primary API.
Drivers/connectors include:
- Java
- C++
- Python
- Ruby
- PHP
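
As a rough illustration of what using CQL through one of these drivers looks like, the sketch below runs a single query with the DataStax Python driver (the `cassandra-driver` package). The function name `fetch_release_version`, the contact point `127.0.0.1` and the choice of query are assumptions made for this example, not something the slides specify; it needs the driver installed and a reachable Cassandra node to actually return a value.

```python
# Assumes: pip install cassandra-driver, and a Cassandra node on the
# given contact point (127.0.0.1 is only a placeholder for a local node).
QUERY = "SELECT release_version FROM system.local"


def fetch_release_version(contact_points=("127.0.0.1",)):
    """Run one CQL query and return the node's Cassandra version string."""
    # Imported here so the module loads even without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(contact_points=list(contact_points))
    try:
        session = cluster.connect()
        # The consistency level is set per statement, as on the tunable
        # consistency slide; ONE is enough for a node-local system table.
        statement = SimpleStatement(QUERY, consistency_level=ConsistencyLevel.ONE)
        row = session.execute(statement).one()
        return row.release_version
    finally:
        cluster.shutdown()
```

Calling `fetch_release_version()` against a running node returns its version string. The drivers for the other languages listed follow the same connect, prepare or build a statement, execute pattern.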

DataStax Enabling The Future Connected Employees Connected Partners Connected Products Distributed Transactional Database Connected Customers Connected Devices

Thank you! edward.wijnen@datastax.com @edwardwijnen