Domain driven design, NoSQL and multi-model databases




Java Meetup New York, 10 November 2014
Max Neunhöffer
www.arangodb.com

Max Neunhöffer
- I am a mathematician.
- Earlier life: research in Computer Algebra (Computational Group Theory).
- Always juggled with big data.
- Now: working in database development, NoSQL, ArangoDB.
- I like: research, hacking, teaching, tickling the highest performance out of computer systems.

A typical Project: a Web Shop
- The Specification Workshop (need recommendation engine, need statistics, etc.)
- The Developers get to work... (tables, relations, normalisation, schemas, queries, front-ends, etc.)
- HANDOVER (Why can I not...? This is unusable!)

Solution: Agile Approach and Domain Driven Design
These days, many use (or try to use) agile methods (Scrum, sprints, rapid prototyping) with continuous feedback from product owners to developers, promising fewer surprises in deployment and high flexibility.
Domain Driven Design (Eric Evans, 2004):
- identify a Domain (the area in which the software is applied),
- make a Model (an abstract description of the situation),
- use a Ubiquitous Language (that all team members speak),
- clearly define the Context in which the model applies.
Model your data as close to the domain as possible. Example: object oriented programming.

Fundamental Problem: need a ubiquitous Language
Listening to team members, you hear completely different things.
Product Managers talk about:
- customers browsing through the shop,
- powerful search for products (with the good ones on top),
- useful recommendations.
Developers talk about:
- tables, normalisation, queries and joins,
- secondary indexes, front-end pages,
- object oriented, model view controller, responsive design.
=> both groups think the others are morons

The problem is rooted very deeply
- functionality not gathered methodically
- obvious functions are missing
- no common language
- misunderstandings about details

NoSQL: Richer Data Models are closer to the Domain
Some terms used by Evans as part of the ubiquitous language:
- Entity: has an identity and mutable state (e.g. a person)
- Value object: is identified by its attributes and immutable (e.g. an address)
- Aggregate: is a combination of entities and value objects into one transactional unit (e.g. a customer with its orders)
- Association: is a relation between entities and value objects, can have attributes, usually immutable
Consequences:
- These terms coming from the Domain must be present in the Design.
- The whole team must understand the same thing when talking about them.
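To make the vocabulary above concrete, here is a minimal Python sketch of a value object, two entities and an aggregate. The class and attribute names (Address, Order, Customer, place_order) are assumptions chosen for illustration; they do not come from the talk or from Evans' book.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Address:
    """Value object: identified by its attributes and immutable."""
    street: str
    city: str


@dataclass
class Order:
    """Entity: identified by order_id, its state may change."""
    order_id: str
    # Excluded from comparison so that equality means "same identity".
    items: list = field(default_factory=list, compare=False)


@dataclass
class Customer:
    """Aggregate: a customer together with its orders, handled as one unit."""
    customer_id: str
    name: str = field(compare=False)
    address: Address = field(compare=False)
    orders: list = field(default_factory=list, compare=False)

    def place_order(self, order: Order) -> None:
        # All changes to the orders go through the aggregate.
        self.orders.append(order)
```

Two Address objects with the same attributes compare equal, while two Customer objects compare equal only when their customer_id matches, whatever their current state is.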

Polyglot Persistence
Idea: Use the right data model for each part of a system.
For an application, persist
- an object or structured data as a JSON document,
- a hash table in a key/value store,
- relations between objects in a graph database,
- a homogeneous array in a relational DBMS.
If the table has many empty cells or inhomogeneous rows, use a column-based database.
Take scalability needs into account!

Document and Key/Value Stores
Document store: A document store stores sets of documents, which usually means JSON data; these sets are called collections. The database has access to the contents of the documents.
- each document in the collection has a unique key
- secondary indexes possible, leading to more powerful queries
- different documents in the same collection: structure can vary
- no schema is required for a collection
- database normalisation can be relaxed
Key/value store: Opaque values, only key lookup without secondary indexes
=> high performance and perfect scalability
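The difference between the two store types can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not how any real product is implemented: the class names KeyValueStore and DocumentCollection and their methods are hypothetical, and indexed attribute values are assumed to be hashable.

```python
class KeyValueStore:
    """Opaque values: the store can only look a value up by its key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class DocumentCollection:
    """Schema-free documents with unique keys and optional secondary indexes."""

    def __init__(self):
        self._docs = {}      # key -> document (a dict)
        self._indexes = {}   # attribute -> {value -> set of keys}

    def ensure_index(self, attribute):
        # Build the index over all documents already in the collection.
        index = {}
        for key, doc in self._docs.items():
            if attribute in doc:
                index.setdefault(doc[attribute], set()).add(key)
        self._indexes[attribute] = index

    def insert(self, key, doc):
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = doc
        # Keep every existing index up to date.
        for attribute, index in self._indexes.items():
            if attribute in doc:
                index.setdefault(doc[attribute], set()).add(key)

    def find_by(self, attribute, value):
        if attribute in self._indexes:
            keys = self._indexes[attribute].get(value, set())
            return [self._docs[k] for k in sorted(keys)]
        # No index: the store can still look inside documents, via a full scan.
        return [d for d in self._docs.values() if d.get(attribute) == value]
```

The document collection can answer find_by because it sees the contents of its documents; the key/value store cannot, which is what keeps it simple to distribute.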

Graph Databases
Graph database: A graph database stores a labelled graph. Vertices and edges are documents. Graphs are good to model relations.
- graphs often describe data very naturally (e.g. the Facebook friendship graph)
- graphs can be stored using tables; however, graph queries notoriously lead to expensive joins
- there are interesting and useful graph algorithms like shortest path or neighbourhood
- need a good query language to reap the benefits
- horizontal scalability is troublesome
- graph databases vary widely in scope and usage, no standard
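As an illustration of the shortest path algorithm mentioned above, the following Python sketch runs a breadth-first search over a list of edge documents. The edge attribute names "from" and "to" and the function name shortest_path are assumptions for this example; edges are treated as undirected and unweighted, as in a friendship graph.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return a shortest path from start to goal as a list of vertex keys,
    or None if goal cannot be reached."""
    # Build an adjacency list from the edge documents.
    neighbours = {}
    for edge in edges:
        neighbours.setdefault(edge["from"], []).append(edge["to"])
        neighbours.setdefault(edge["to"], []).append(edge["from"])

    previous = {start: None}   # vertex -> vertex we reached it from
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk back along the recorded predecessors.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbour in neighbours.get(vertex, []):
            if neighbour not in previous:
                previous[neighbour] = vertex
                queue.append(neighbour)
    return None
```

Each step of the search follows edges directly from a vertex to its neighbours, which is the operation that turns into repeated joins when the same graph is stored in tables.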

A typical Use Case: an Online Shop
We need to hold
- customer data: usually homogeneous, but still variations => use a document store
- product data: even for a specialised business quite inhomogeneous => use a document store
- shopping carts: need very fast lookup by session key => use a key/value store
- order and sales data: relate customers and products => use a document store
- recommendation engine data: links between different entities => use a graph database

Polyglot Persistence is nice, but...
Consequence: One needs multiple database systems in the persistence layer of a single project!
Polyglot persistence introduces some friction through
- data synchronisation,
- data conversion,
- increased installation and administration effort,
- more training needs.
Wouldn't it be nice... to enjoy the benefits without the disadvantages?

The Multi-Model Approach
Multi-model database: A multi-model database combines a document store with a graph database and a key/value store. Vertices are documents in a vertex collection, edges are documents in an edge collection.
- a single, common query language for all three data models
- is able to compete with specialised products on their turf
- allows for polyglot persistence using a single database
- queries can mix the different data models
- can replace an RDBMS in many cases
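The idea that vertices and edges are both documents, and that one query can mix the models, can be sketched in plain Python. Everything here is a hypothetical in-memory stand-in: the collection names, the function products_bought_in, and the "collection/key" references in the _from and _to attributes are assumptions for illustration, modelled loosely on how edge documents reference vertex documents.

```python
def products_bought_in(city, customers, products, purchases):
    """Names of products bought by customers living in the given city."""
    names = set()
    for edge in purchases:
        customer_key = edge["_from"].split("/", 1)[1]
        product_key = edge["_to"].split("/", 1)[1]
        # Document model: filter on an attribute of the customer document.
        if customers[customer_key].get("city") == city:
            # Graph model: follow the edge to the product vertex,
            # then read the product document by key.
            names.add(products[product_key]["name"])
    return sorted(names)


# Vertex collections: key -> document
customers = {
    "c1": {"name": "Alice", "city": "New York"},
    "c2": {"name": "Bob", "city": "Boston"},
}
products = {
    "p1": {"name": "Laptop"},
    "p2": {"name": "Novel"},
}
# Edge collection: each edge is itself a document and can carry attributes.
purchases = [
    {"_from": "customers/c1", "_to": "products/p1", "quantity": 1},
    {"_from": "customers/c1", "_to": "products/p2", "quantity": 2},
    {"_from": "customers/c2", "_to": "products/p2", "quantity": 1},
]
```

With separate specialised databases, the attribute filter and the edge traversal would run in different systems and the application would have to combine the results itself.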

A Map of the NoSQL Landscape
[Figure: map of the NoSQL landscape. Labels: Operational DBs, Analytic DBs, Complex queries, Map/reduce, Extensibility, Column Stores, Structured Data, Documents, Massively distributed, Key/Value, Graphs.]

ArangoDB
- is a multi-model database (document store & graph database),
- is open source and free (Apache 2 license),
- offers convenient queries (via HTTP/REST and AQL), including joins between different collections, and strong consistency guarantees using transactions,
- is memory efficient by shape detection,
- uses JavaScript throughout (Google's V8 built into server),
- has an API that is extensible by JavaScript code in the Foxx framework,
- offers many drivers for a wide range of languages,
- is easy to use with web front end and good documentation, and
- enjoys good community as well as professional support.


The ArangoDB Territory
[Figure: the NoSQL landscape map with the ArangoDB territory marked. Labels: Operational DBs, Analytic DBs, Complex queries, Map/reduce, Extensibility, Column Stores, Structured Data, Documents, Massively distributed, Key/Value, Graphs.]

Strong Consistency
ArangoDB offers
- atomic and isolated CRUD operations for single documents,
- transactions spanning multiple documents and multiple collections,
- snapshot semantics for complex queries,
- very secure durable storage using append-only writes and storing multiple revisions,
- all this for documents as well as for graphs.
In the (near) future, ArangoDB will
- offer the same ACID semantics even with sharding,
- implement complete MVCC semantics to allow for lock-free concurrent transactions.

Replication and Sharding: horizontal scalability
Right now, ArangoDB provides
- easy setup of (asynchronous) replication, which allows read access parallelisation (master/slaves setup),
- sharding with automatic data distribution to multiple servers.
Very soon, ArangoDB will feature
- fault tolerance by automatic failover and synchronous replication in cluster mode,
- zero administration by a self-repairing and self-balancing cluster architecture.
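Automatic data distribution is commonly done by hashing the document key. The Python sketch below shows that general technique only; it is an assumption for illustration and not a description of ArangoDB's actual sharding scheme. The function name shard_for and the choice of SHA-256 are hypothetical.

```python
import hashlib


def shard_for(key: str, number_of_shards: int) -> int:
    """Map a document key to a shard number in range(number_of_shards)."""
    # A stable hash is needed: Python's built-in hash() for strings
    # changes between processes, so every server would disagree.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_of_shards
```

Because the shard depends only on the key, any server can compute where a document lives without consulting a central directory. The price of this simple modulo scheme is that changing the number of shards moves most keys.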

Powerful query language: AQL
The built-in Arango Query Language (AQL) allows
- complex, powerful and convenient queries,
- with transaction semantics,
- allowing joins,
- with user-definable functions (in JavaScript).
AQL is independent of the driver used and offers protection against injections by design.
For Version 2.3, we are reengineering the AQL query engine:
- use a C++ implementation for high performance,
- optimise distributed queries in the cluster.
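A small example may help to show what a join in AQL looks like and how bind parameters keep user input out of the query text. The Python sketch below only builds the JSON body that a client could send to the HTTP cursor endpoint; it does not contact a server. The collection names (customers, orders), their attributes, and the helper build_cursor_request are assumptions for illustration.

```python
import json

# AQL join: for each customer with the given name, return their orders.
# @name is a bind parameter; the value travels separately from the query.
ORDERS_OF_CUSTOMER = """
FOR c IN customers
  FILTER c.name == @name
  FOR o IN orders
    FILTER o.customer == c._key
    RETURN { customer: c.name, total: o.total }
"""


def build_cursor_request(name: str) -> str:
    """Build a JSON request body with the query and its bind variables."""
    return json.dumps({
        "query": ORDERS_OF_CUSTOMER,
        "bindVars": {"name": name},
    })
```

Whatever string is passed as name, the query text stays the same, so input such as `" || true` is compared as a plain value rather than parsed as part of the query.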