Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

FOSSCOMM 2016

Background
- 14 years in databases and system engineering
- NoSQL DBA @ ObjectRocket by Rackspace
- Passionate about MongoDB & Cassandra

What is Polyglot Persistence?
A set of applications that use several core database technologies.
[Diagram: an application layer on top of a relational database, a key/value store and a column-family database]

What is Polyglot Persistence? Using the right tool for the right use case

Let there be RDBMS
Monoglot persistence was (and still is) fine for simple applications with one type of workload. But applications become complex. A simple e-commerce platform must have:
- Session data (Add to Basket)
- Search engine (Search for products)
- Recommendation engine (Customers Who Bought This Item Also Bought)
- Payment platform
- Geolocation service

Applications are growing rapidly

An RDBMS tale
In the good old days, monoglot meant RDBMS. Once upon a time there was an RDBMS with performance issues, and the usual remedies were:
- Vertical scaling
- Secondary indexes
- Partitioning
- Denormalization
- Read-only slaves
- Sharding
- Separating workloads
But the more we scale, the more features we have to give up.
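Sharding, the last scaling step before separating workloads, can be sketched as hash-based routing. This is a minimal illustration under stated assumptions: the shard names, the in-memory dicts standing in for servers and the use of MD5 as a stable hash are all hypothetical, not the setup described in the talk. It also shows one of the features we give up: a query that is not keyed has to visit every shard.

```python
import hashlib

# Hypothetical shard names; in a real deployment each would be a separate server.
SHARDS = ["shard0", "shard1", "shard2"]

# In-memory dicts stand in for the per-shard databases.
stores = {name: {} for name in SHARDS}


def shard_for(key: str) -> str:
    """Route a key to a shard with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def put(key: str, value: dict) -> None:
    stores[shard_for(key)][key] = value


def get(key: str):
    # A keyed lookup touches exactly one shard.
    return stores[shard_for(key)].get(key)


def scan_all(predicate):
    # A non-keyed query has to fan out to every shard and merge the results.
    return [v for store in stores.values() for v in store.values() if predicate(v)]


put("customer:42", {"name": "Alice", "city": "Athens"})
put("customer:7", {"name": "Bob", "city": "London"})
print(get("customer:42"))
print(scan_all(lambda v: v["city"] == "London"))
```

Plain modulo routing remaps most keys whenever a shard is added; consistent hashing or range-based chunks (as MongoDB uses) reduce that cost.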

The 3V Era
- Volume: amount of data
- Velocity: speed of data processing
- Variety: number of types of data
New databases were introduced for Big Data. A general-purpose DB is no longer on-trend; per-use-case datastores are becoming more popular. The rise of Polyglot Persistence.

CAP theorem

Picking the right tools: Data Structure
- Relational databases (Oracle, MySQL)
- Key-value stores (Redis, Riak)
- Column-family stores (Cassandra, HBase)
- Document databases (MongoDB, CouchDB)
- Graph databases (Neo4j)

Relational Databases
- Based on the relational model (Codd)
- Data are organized in tables (rows and columns)
- Queried using SQL (Structured Query Language)

Relational Databases
Use when:
- Your dataset is relational
- Strong consistency is needed
- Access patterns are unknown
But: doesn't scale well horizontally.
Use cases:
- Everywhere, due to early adoption
- Payment systems
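The "strong consistency" and "payment systems" points meet in ACID transactions. A minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational database; the accounts table and the transfer function are hypothetical. The credit is applied before the debit on purpose, so that a failing debit shows the whole transaction being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 20)],
)
conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move money between accounts: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # If this debit violates the CHECK constraint, the credit above is undone too.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # alice -> bob, succeeds
failed = transfer(conn, 2, 1, 500)  # bob cannot go negative, so it is rolled back
balances = conn.execute("SELECT owner, balance FROM accounts ORDER BY id").fetchall()
print(ok, failed, balances)  # True False [('alice', 70), ('bob', 50)]
```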

Key/Value stores
A big hash map (associative array):
- Very simple: one key <-> one value
- Very fast reads/writes
- No secondary indexes
Example: key = VRN (vehicle registration number), value = car facts
YYY0000 => make#ford model#fiesta year#2010
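A minimal sketch of the key/value contract, using a plain Python dict as the "big hash map"; the class and method names are hypothetical and do not mirror the Redis or Riak APIs. It makes the trade-off visible: reads and writes by key are O(1) on average, while finding keys by something inside the value needs a full scan.

```python
class KeyValueStore:
    """Toy in-memory key/value store: one key maps to one opaque value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def keys_where(self, predicate):
        # No secondary indexes: any lookup that is not by key must scan every value.
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
# The value is an opaque string, encoded like the car-facts example above.
store.put("YYY0000", "make#ford model#fiesta year#2010")
print(store.get("YYY0000"))
print(store.keys_where(lambda v: "model#fiesta" in v))  # ['YYY0000']
```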

Key/Value stores
Use when:
- Operations are based on the key
- Data is not highly related
- Basic CRUD is needed
But: complex queries are painful.
Use cases:
- Session data
- User profiles/preferences
- Shopping carts

Document Databases
Nested structures of keys and their values:
- Very flexible schema (JSON, XML)
- One key, one value, but the value is visible to queries
- Supports hierarchical data
- Supports secondary indexes
Example: id = VRN, document = car facts
YYY0000 => { make: Ford, model: Fiesta, year: 2010 }
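What "the value is visible to queries" buys over a key/value store can be sketched with nested dicts and a secondary index on a field. This is a hypothetical toy, not MongoDB's or CouchDB's API: the class, the dotted-path helper and the second car key are made up for illustration.

```python
from collections import defaultdict


def get_path(doc, path):
    """Follow a dotted path such as 'engine.fuel' into nested dicts; None if absent."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


class DocumentCollection:
    """Toy document collection: values are nested dicts the store can query into."""

    def __init__(self):
        self._docs = {}
        self._indexes = {}  # field path -> {field value -> set of document ids}

    def create_index(self, path):
        # Assumes the indexed field holds hashable (scalar) values.
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            index[get_path(doc, path)].add(doc_id)
        self._indexes[path] = index

    def insert(self, doc_id, doc):
        old = self._docs.get(doc_id)
        for path, index in self._indexes.items():
            if old is not None:
                index[get_path(old, path)].discard(doc_id)  # drop stale index entries
            index[get_path(doc, path)].add(doc_id)
        self._docs[doc_id] = doc

    def find(self, path, value):
        if path in self._indexes:
            ids = self._indexes[path].get(value, set())  # secondary index lookup
        else:
            ids = {i for i, d in self._docs.items() if get_path(d, path) == value}  # full scan
        return sorted(ids)


cars = DocumentCollection()
cars.insert("YYY0000", {"make": "Ford", "model": "Fiesta", "year": 2010})
# Flexible schema: this document carries a nested field the first one lacks.
cars.insert("ZZZ1111", {"make": "Ford", "model": "Focus", "engine": {"fuel": "diesel"}})
cars.create_index("make")
print(cars.find("make", "Ford"))           # ['YYY0000', 'ZZZ1111']
print(cars.find("engine.fuel", "diesel"))  # ['ZZZ1111']
```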

Document Databases
Use when:
- You don't know much about the schema
- Data is unstructured and heterogeneous
But: joins and references are tricky, and denormalization requires more space.
Use cases:
- Product catalog
- CMS
- Event logging from different sources

Column family
In a table, data of the same column is stored together (a K-V store whose V is itself a set of K-Vs):
- Data organized as columns
- Great for sparse tables
- Very fast column operations, including aggregation
Example: id = VRN, value = column families (car facts)
YYY0000 => { car: { make: Ford, model: Fiesta }, parts: { }, service: { } }
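The "K-V whose V is K-V" layout can be sketched as row key -> column family -> column -> value, with cells grouped per family so a scan over one family never reads the others. This is a hypothetical in-memory model of the idea; Cassandra and HBase persist sorted on-disk structures, not Python dicts, and the second row key and the service "cost" column are made up.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy column-family table addressed as row key -> family -> column -> value.

    Cells are grouped per column family, so reading one family never touches the
    others, and columns a row does not have take no space (sparse rows).
    """

    def __init__(self, families):
        # family -> row key -> {column: value}
        self._families = {f: defaultdict(dict) for f in families}

    def put(self, row_key, family, column, value):
        self._families[family][row_key][column] = value

    def get_row(self, row_key):
        return {
            family: dict(rows[row_key])
            for family, rows in self._families.items()
            if row_key in rows
        }

    def column_values(self, family, column):
        # Scan one column across all rows, e.g. as input to an aggregation.
        return [cols[column] for cols in self._families[family].values() if column in cols]


table = ColumnFamilyTable(["car", "parts", "service"])
table.put("YYY0000", "car", "make", "Ford")
table.put("YYY0000", "car", "model", "Fiesta")
table.put("YYY0000", "service", "cost", 120)
table.put("XXX1234", "service", "cost", 80)
print(table.get_row("YYY0000"))  # {'car': {'make': 'Ford', 'model': 'Fiesta'}, 'service': {'cost': 120}}
print(sum(table.column_values("service", "cost")))  # 200
```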

Column family
Use when:
- Big Data (huge write volumes)
- Versioning (time-series data)
But: you must know your queries in advance, and schema design is not trivial.
Use cases:
- Time-series data
- Bidding platforms
- Playlists

Graph Databases
- Inspired by graph theory
- Nodes hold data/entities
- Each node connects to others using attribute(s)

Graph Databases
Use when:
- Data is highly interconnected
- You define explicit relationships and need traversal queries
But: doesn't scale well horizontally.
Use cases:
- Where 3rd-degree (or higher) relationships are needed
- Social media
- Queries like "friend of a friend"
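A "friend of a friend" query is a breadth-first traversal bounded by depth. The sketch below keeps the graph as an adjacency list in a dict; the people and friendships are made up, and a graph database such as Neo4j would express this as a declarative query rather than hand-written traversal code.

```python
from collections import deque


def within_degree(graph, start, max_depth):
    """Breadth-first traversal: map each node reachable within max_depth hops to its hop count."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the requested degree
        for neighbour in graph.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def friends_of_friends(graph, person):
    """People exactly two hops away (not direct friends, not the person)."""
    depth = within_degree(graph, person, 2)
    return sorted(n for n, d in depth.items() if d == 2)


friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan", "eve"],
    "dan": ["bob", "cat", "fay"],
    "eve": ["cat"],
    "fay": ["dan"],
}
print(friends_of_friends(friends, "ann"))       # ['dan', 'eve']
print(within_degree(friends, "ann", 3)["fay"])  # 3
```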

What about Hadoop?
A framework for distributed storage and processing of large data sets. Use it for:
- ETL: read raw data, apply filters, create a structured summary
- Exploration engine / data discovery
- Data archive / massive storage
But: do not use Hadoop as a database replacement.
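The ETL use (read raw data, apply filters, create a structured summary) follows the MapReduce model that Hadoop popularised. The sketch below imitates the map, shuffle and reduce phases in a single Python process; it does not use Hadoop's API, and the comma-separated sales records are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Parse one raw record into (product, amount) pairs; malformed lines are filtered out."""
    parts = line.split(",")
    if len(parts) != 3:
        return []
    _timestamp, product, amount = parts
    try:
        return [(product.strip(), float(amount))]
    except ValueError:
        return []


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, {"orders": len(values), "revenue": sum(values)}


def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


raw = [
    "2016-04-09T10:00,book,12.50",
    "2016-04-09T10:05,lamp,30.00",
    "garbage line",
    "2016-04-09T10:07,book,7.50",
]
print(run_job(raw))  # {'book': {'orders': 2, 'revenue': 20.0}, 'lamp': {'orders': 1, 'revenue': 30.0}}
```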

Picking the right tools: Questions
Data structure:
- Does it have a natural structure? Is it unstructured?
- How is it connected?
- How is it distributed?
- How much data is there?
Access patterns:
- What is the read/write ratio?
- Is access uniform or random?
- What is more important?

Picking the right tools: Questions
Organization needs:
- Do I need authentication? What type?
- Do I need encryption?
- Do I need backups?
- Do I need a disaster recovery site?
- What level of monitoring?
- Which drivers and languages?
Tools:
- Third-party tools
- Add-ons/plugins

Picking the right tools: Questions
Maturity:
- How long has it been on the market?
Documentation:
- Books, tutorials
- Training
Type of support:
- Community
- Commercial support

Challenges
Define the architecture:
- Decide which datastore will be used to store which data
- A wrong decision can lead to painful migration(s)
Deployment complexity:
- Provision different types of HW, OS, patches
- Backup/restore
- Control configuration changes
- Monitor all the different components

Challenges
Application complexity:
- A different connection per datastore
- Handle different types of errors
- Map different results on the application layer
- Keep datastores in sync (cross-database consistency):
  - Active/Passive topology
  - Active/Active topology
Training for Devs and Ops:
- Develop new skills for your teams
- Support for a period of time

Don'ts and anti-patterns
Over-engineering:
- Keep it simple
- Remove pieces that don't add value
Conforming to stereotypes:
- Use cases are general guidelines
- Benchmarks are indicators
Staying static:
- Try new technologies
- Be on-trend but cautious

Use Case: MongoDB and Elasticsearch
MongoDB uses B-tree indexes for everything. B-trees are great, but they do not excel at every use case, and we started hitting hard limits for certain use cases such as full-text search. The Lucene engine is a better option for full-text search (it uses inverted indexes). We already had Elasticsearch in our portfolio, so we simply connected the two products.
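Why an inverted index suits full-text search better than a B-tree: a B-tree orders whole field values, so it helps with exact or prefix matches on a value, while an inverted index maps every word to the documents that contain it. A minimal sketch with a deliberately naive tokenizer; Lucene's analyzers, relevance scoring and on-disk segments are far richer, and the sample documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case and split on non-alphanumerics (much simpler than a Lucene analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, text):
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(result)


idx = InvertedIndex()
idx.add(1, "Polyglot persistence with MongoDB")
idx.add(2, "Full-text search with Elasticsearch")
idx.add(3, "MongoDB full-text search limits")
print(idx.search("full text search"))  # [2, 3]
print(idx.search("mongodb search"))    # [3]
```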

Use Case: MongoDB and Elasticsearch
- The connector uses the Active/Passive model
- Writes must go through MongoDB and are then propagated to Elasticsearch
- Flexible: the user can pick what to propagate:
  - Database(s)
  - Collection(s)
  - Field mapping
  - Indexing/analyzing

Use Case: MongoDB and Elasticsearch
- An initial sync is needed (bulk API)
- The connector uses a tailable cursor that reads the MongoDB oplog and propagates the changes
- Similar to Extract, Transform, Load (ETL)
- The application is responsible for directing read requests to the most suitable datastore
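The Active/Passive flow (bulk initial sync, then replaying oplog changes for the selected collection and fields) can be sketched with plain data structures. This is a simplified model under loud assumptions: the entry shape below (integer `ts`, and `o` holding the full document on updates) is an illustration, not MongoDB's exact oplog format; a Python list stands in for the tailable cursor and a dict for the Elasticsearch index.

```python
# Simplified oplog entries (an assumption, not MongoDB's exact format):
#   ts: integer position, ns: "db.collection", op: "i" insert / "u" update / "d" delete,
#   o: the document (for "u" we assume the full new document, for "d" just its _id).


def project(doc, fields):
    """Keep only the fields selected for the search index (the 'field mapping')."""
    return {f: doc[f] for f in fields if f in doc}


def initial_sync(source_docs, target, fields):
    """Bulk-copy the existing documents before tailing starts."""
    for doc in source_docs:
        target[doc["_id"]] = project(doc, fields)


def apply_oplog(entries, target, namespace, fields, last_ts=0):
    """Replay entries newer than last_ts for one namespace; return the new resume point."""
    for entry in entries:
        if entry["ts"] <= last_ts:
            continue  # already applied
        last_ts = entry["ts"]
        if entry["ns"] != namespace:
            continue  # only the selected collection is propagated
        if entry["op"] in ("i", "u"):
            target[entry["o"]["_id"]] = project(entry["o"], fields)
        elif entry["op"] == "d":
            target.pop(entry["o"]["_id"], None)
    return last_ts


products = [{"_id": 1, "title": "Fiesta manual", "price": 10}]
search_index = {}  # stands in for the Elasticsearch index
initial_sync(products, search_index, ["title"])

oplog = [
    {"ts": 1, "ns": "shop.products", "op": "i", "o": {"_id": 2, "title": "Focus manual", "price": 12}},
    {"ts": 2, "ns": "shop.users", "op": "i", "o": {"_id": 7, "name": "ann"}},
    {"ts": 3, "ns": "shop.products", "op": "d", "o": {"_id": 1}},
]
resume = apply_oplog(oplog, search_index, "shop.products", ["title"])
print(search_index, resume)  # {2: {'title': 'Focus manual'}} 3
```

Returning the last position makes the replay resumable. A common approach is to record the oplog position before the bulk copy and replay from there, so writes that land during the initial sync are not lost.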

Call me Polyglot, Call me Multi-Model
MongoDB (version 3 or higher) storage engines:
- MMAPv1
- WiredTiger
- In-Memory
Percona Server for MongoDB storage engines:
- MMAPv1
- WiredTiger
- RocksDB
- PerconaFT

The future(?)
Multi-model databases either support different models within the engine, or offer different layers on top of the engine:
- OrientDB supports graph, document and key/value models; relationships are managed as in graph databases, with direct connections between records
- FoundationDB features layers on top of a key-value store
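The "layers on top of a key-value store" idea can be sketched by encoding a document model into an ordered key space: each field becomes one key of the form (collection, document id, field), so a whole document occupies one contiguous key range. This is a hypothetical illustration of the concept only; it is not FoundationDB's API, and a real layer would also rely on the store's transactions and tuple encoding. All key components are kept as strings so tuples compare cleanly.

```python
import bisect


class OrderedKV:
    """Minimal ordered key-value store: tuple keys kept in sorted order."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def clear(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def range_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with the given tuple prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:  # iterate a copy, so callers may clear keys
            if key[:len(prefix)] != prefix:
                break
            yield key, self._values[key]


class DocumentLayer:
    """Hypothetical document model on OrderedKV: one KV pair per top-level field."""

    def __init__(self, kv):
        self.kv = kv

    def put(self, collection, doc_id, doc):
        self.delete(collection, doc_id)  # replace the whole document
        for field, value in doc.items():
            self.kv.set((collection, doc_id, field), value)

    def get(self, collection, doc_id):
        doc = {key[2]: value for key, value in self.kv.range_prefix((collection, doc_id))}
        return doc or None

    def delete(self, collection, doc_id):
        for key, _ in list(self.kv.range_prefix((collection, doc_id))):
            self.kv.clear(key)


kv = OrderedKV()
docs = DocumentLayer(kv)
docs.put("cars", "YYY0000", {"make": "Ford", "model": "Fiesta", "year": 2010})
print(docs.get("cars", "YYY0000"))  # {'make': 'Ford', 'model': 'Fiesta', 'year': 2010}
print(list(kv.range_prefix(("cars",))))  # the raw key/value pairs underneath
```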

Keep in touch: iamantonios, antonios.giannopoulos@rackspace.co.uk
We are hiring!!! Data Engineers, DevOps, DBAs and more
http://objectrocket.com/careers
https://www.rackspace.com/talent/

Questions? Thank you!!!