How graph databases started the multi-model revolution




How graph databases started the multi-model revolution
Luca Garulli, Author and CEO @OrientDB
QCon Sao Paulo - March 26, 2015

Welcome to Big Data
"90% of the data in the world today has been created in the last two years alone." - IBM

Just Data
Jill (Customer), Commodore Amiga 1200 (Product), Order #134 (Order), Luca (Provider), Bruno (Provider), Monitor 40 (Product), Mouse (Product)

Just Data
Data by itself has little value; it's the relationship between data that gives it incredible value.

Relationships give data meaning
[Diagram: the same records, now connected. Jill (Customer) -Makes-> Order #134 (Order); Order #134 -Has-> products; the providers Luca and Bruno -Sells-> products (Commodore Amiga 1200, Monitor 40, Mouse)]

Top NoSQL categories
- Key/Value Databases
- Document Databases
- Column Databases
- Graph Databases


Why do most NoSQL products avoid managing relationships?

Joins are Evil

Customer          CustomerAddress     Address
ID  Name          ID  Address         ID  Location
10  John          10  24              24  Milan
11  John          10  33              33  London
24  Mike          32  44              18  Paris
28  Mike                              18  Madrid
                                      44  Moscow

Is this familiar?

Why is the join so slow?

Index Lookup: how does it work?
Imagine an address book where we want to find Luca's phone number.
Index algorithms are all similar and based on balanced trees. Each level of the tree splits the range further:
Level 1: A-Z
Level 2: A-L | M-Z
Level 3: A-D | E-L | M-R | S-Z
Level 4: A-B | C-D | E-G | H-L
Level 5: E-F | G | H-J | K-L

Index Lookup: how does it work?
Path followed: A-Z -> A-L -> E-L -> H-L -> K-L -> Luca
Found! This lookup took 5 steps. With millions of indexed records the tree gets deeper, so every lookup takes more steps.
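The address-book lookup can be approximated with a plain binary search over sorted keys. This is only a sketch: real database indexes are usually B-trees with many keys per node, and the function name `lookup_steps` is made up for illustration. It counts how many comparisons a lookup needs, which grows with the logarithm of the number of keys.

```python
import math


def lookup_steps(sorted_keys, target):
    """Binary-search sorted_keys for target; return (found, steps taken)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return True, steps
        if sorted_keys[mid] < target:
            lo = mid + 1   # target is in the upper half
        else:
            hi = mid - 1   # target is in the lower half
    return False, steps


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        found, steps = lookup_steps(list(range(n)), n - 1)
        # The step count never exceeds ceil(log2(n + 1)).
        print(n, found, steps, math.ceil(math.log2(n + 1)))
```

The step count rises slowly as the key set grows, but it is never zero, and the cost is paid on every lookup.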

Joins Kill Performance
Tables: Customer (ID, Name), CustomerAddress (ID, Address), Address (ID, Location)
Joins are executed every time you cross relationships.
Querying millions of records while joining 3-4 tables could generate billions of combinations.

This is why database query performance suffers as the database increases in size: each index lookup costs O(log N).
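As a rough illustration of where the O(log N) cost comes from, the sketch below resolves a customer's locations by joining through sorted key lists, using a subset of the rows from the example tables. The sorted lists stand in for indexes, and the helper name `locations_for` is hypothetical.

```python
from bisect import bisect_left

# Subset of the example rows, kept sorted by key the way an index would be.
customer_address = [(10, 24), (10, 33), (32, 44)]          # (customer_id, address_id)
address = [(24, "Milan"), (33, "London"), (44, "Moscow")]  # (address_id, location)

# Key-only views that play the role of the indexes.
ca_keys = [customer_id for customer_id, _ in customer_address]
addr_keys = [address_id for address_id, _ in address]


def locations_for(customer_id):
    """Join CustomerAddress -> Address for one customer via index lookups."""
    result = []
    i = bisect_left(ca_keys, customer_id)       # O(log N) lookup in the first index
    while i < len(customer_address) and customer_address[i][0] == customer_id:
        address_id = customer_address[i][1]
        j = bisect_left(addr_keys, address_id)  # another O(log N) lookup per matched row
        if j < len(address) and address[j][0] == address_id:
            result.append(address[j][1])
        i += 1
    return result


if __name__ == "__main__":
    print(locations_for(10))
```

Every relationship crossed triggers at least one more index search, so the work per query grows as the tables grow.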

RDBMS performance on traversal
[Chart: performance plotted against database size]

In a world that's becoming more connected, we need a better way to store data and manage relationships.
Read: data is important, but relationships are even more fundamental today.

A graph database is any storage system that provides index-free adjacency - Marko Rodriguez (author of TinkerPop Blueprints)

Every developer knows the Relational Model, but who knows the Graph one?

Back to school: Graph Theory crash course

Basic Graph
Luca -Visited-> Sao Paulo

Property Graph Model*
Edges are directed. Vertices and edges can have properties.
Luca (company: OrientTechnologies) -Visited (on: 2015)-> Sao Paulo (people: 12,000,000)
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

1-N and N-M Relationships
An edge connects only 2 vertices. Use multiple edges to represent 1-N and N-M relationships.
Luca -Visited (on: 2015)-> Sao Paulo
Luca -Worked (on: 2015)-> Sao Paulo
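The property graph model fits in a few lines of code. The sketch below is illustrative only: the class and field names (`Vertex`, `Edge`, `out_vertex`, `in_vertex`) are assumptions and not the Blueprints API. It models the Luca and Sao Paulo example, with two separate edges between the same pair of vertices.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # vertex the edge starts from
    in_vertex: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)


luca = Vertex("Luca", {"company": "OrientTechnologies"})
sao_paulo = Vertex("Sao Paulo", {"people": 12_000_000})

# Each edge connects exactly two vertices; multiple relationships need multiple edges.
edges = [
    Edge("Visited", luca, sao_paulo, {"on": 2015}),
    Edge("Worked", luca, sao_paulo, {"on": 2015}),
]

labels = [e.label for e in edges if e.out_vertex is luca and e.in_vertex is sao_paulo]

if __name__ == "__main__":
    print(labels)
```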

Congrats! This is your diploma in «Graph Theory»

Graph theory is so simple, yet so powerful

How does a true* Graph Database manage relationships?
* A graph layer on top of a DBMS doesn't qualify as a true GraphDB

Each element in the graph has its own immutable Record ID
#13:55 Luca (Vertex)
#22:11 Visited, on: 2015 (Edge)
#15:99 Sao Paulo (Vertex)

Connections use persistent pointers
#13:55 Luca (Vertex): out = #22:11
#22:11 Visited, on: 2015 (Edge): out = #13:55, in = #15:99
#15:99 Sao Paulo (Vertex): in = #22:11


A Graph Database creates the relationship just once, when the edge is created.
An RDBMS computes the relationship every time you query the database.
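A minimal sketch of index-free adjacency, using the Record IDs from the example. The dictionary stands in for direct record addressing by Record ID and is an assumption of this sketch, not a description of OrientDB's storage format. The helper name `neighbours` is hypothetical.

```python
# Records addressed directly by Record ID. Each vertex stores pointers to its
# edges, and each edge stores pointers to its two vertices.
records = {
    "#13:55": {"type": "vertex", "name": "Luca", "out": ["#22:11"], "in": []},
    "#22:11": {"type": "edge", "label": "Visited", "on": 2015,
               "out": "#13:55", "in": "#15:99"},
    "#15:99": {"type": "vertex", "name": "Sao Paulo", "out": [], "in": ["#22:11"]},
}


def neighbours(rid, label):
    """Follow outgoing edges with the given label; no index search involved."""
    result = []
    for edge_rid in records[rid]["out"]:   # pointers stored on the vertex
        edge = records[edge_rid]
        if edge["label"] == label:
            target = records[edge["in"]]   # pointer stored on the edge
            result.append(target["name"])
    return result


if __name__ == "__main__":
    print(neighbours("#13:55", "Visited"))
```

The traversal touches only the records it passes through, so its cost depends on the number of edges followed rather than on the total number of records.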

When you move from an RDBMS to a Graph Database, the cost of crossing a relationship drops from O(log N) to near O(1).
With a Graph Database, the time to traverse a relationship is not affected by database size! This is huge in the Big Data age.

Graph Databases Easily Manage Complex Relationships
[Diagram: a graph linking John, Thriller, Comedy, Pulp Fiction, Mr Bean, Theater A, Theater B, Theater C, NYC and San José through edges such as Lives in and Likes]
No cost to traverse relationships:
- Recommendation engines
- Social applications
- Spatial apps
- Master Data Management
- Information clustering

GraphDB Database Quadrant
[Chart: Key Value, Column, Document, Relational and Graph databases plotted along two axes, Data Complexity and Relationships Complexity]
These were 1st generation NoSQL products, where each tool was only good at a few use cases.

1st Generation NoSQL: Scenario
[Diagram: an application using Oracle (RDBMS) as primary DB, with ETL feeding Redis or Memcache (Key/Value), Neo4j (GraphDB) and MongoDB (DocDB)]

1st Generation NoSQL: Fact
In more than 90% of use cases, NoSQL products are used as a second DBMS.

1st Generation NoSQL: Problems
- No standard between NoSQL products
- Multiple vendors = multiple skills
- ETL + synchronization code is costly to write and maintain
- Performance and reliability are hard to predict

2nd Generation NoSQL is Multi-Model

What's a Multi-Model DBMS?
Multi-Model represents the intersection of multiple models (Key/Value, Document, Graph, Object) in just one product:
- Just one product to learn and maintain
- Just one vendor relationship to manage
- No ETL, no synchronization required
- Performance and reliability are easy to test from the beginning

Relationships give data meaning
[Diagram: Jill (Customer) -Makes-> Order #134 (Order); Order #134 -Has-> products; the providers Luca and Bruno -Sells-> products (Commodore Amiga 1200, Monitor 40, 3 Wheel Mouse)]

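Multi-Model domain schema
Vertices:
- Actor (name: string, surname: string)
- Customer (inherits from Actor)
- Provider (inherits from Actor)
- Order (number: int, date: datetime)
- Product (name: string, qty: int)
Edges:
- Makes (Customer -> Order)
- Has (Order -> Product; price: decimal)
- Sells (Provider -> Product; price: decimal)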

Vertices and Edges are Documents
Jill -Makes-> Order

    {
      "@rid": "12:382",
      "@class": "Customer",
      "name": "Jill",
      "surname": "Raggio",
      "phone": "+39 33123212",
      "details": {
        "city": "London",
        "tags": "millennial"
      }
    }

General purpose solution:
- JSON
- Schema-less, schema-full, schema-hybrid
- Nested documents
- Rich indexing and querying
- Developer friendly

Polymorphic queries
SELECT * FROM Customer -> Jill (Customer)
SELECT * FROM Provider -> Bruno (Provider), Luca (Provider)
SELECT * FROM Actor -> Bruno (Provider), Jill (Customer), Luca (Provider)
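The behaviour of these polymorphic queries mirrors class inheritance in an object-oriented language. The sketch below is an analogy in Python rather than OrientDB code, and `select_from` is a made-up helper. Selecting from a superclass returns the records of all its subclasses.

```python
class Actor:
    def __init__(self, name):
        self.name = name


class Customer(Actor):
    pass


class Provider(Actor):
    pass


records = [Customer("Jill"), Provider("Bruno"), Provider("Luca")]


def select_from(cls):
    """Return the names of all records whose class is cls or a subclass of it."""
    return sorted(r.name for r in records if isinstance(r, cls))


if __name__ == "__main__":
    print(select_from(Customer))
    print(select_from(Provider))
    print(select_from(Actor))
```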

Multi-Model complex domains schema
[Schema diagram: vertex classes MusicTaste, Account, Band, Genre and Location; edge classes Likes, Performs and Plays; some classes are linked by inheritance]

Multi-Model complex domains
[Diagram: Jill (Account) and Luca (Account) connected by Likes edges to Indie (Genre), Rock (Genre) and Snow Patrol (Band); Plays and Performs edges connect the band to a genre and to 123, 1st Street, Austin, TX (Location), with the performance on April 7, 2015, 9pm-11.30pm]

Multi-Model Database Quadrant
[Chart: Key Value, Column, Document, Relational, Graph and Multi-Model databases plotted along two axes, Data Complexity and Relationships Complexity]

Multi-Model Solutions

There are a few DBMSs that claim to be Multi-Model, but they do not have a true Graph Engine. The Graph is only a layer on top of the engine. Under the hood they do JOINs, which means traversal time is affected by database size.

Meet OrientDB
The First Ever Multi-Model Database, Combining the Flexibility of Documents with the Connectedness of Graphs

With a true Graph, Document, Key/Value and Object Oriented engine

OrientDB features
Columns: OrientDB | MongoDB | Neo4j | MySQL (RDBMS)
- Operational Database: X X X
- Graph Database: X X
- Document Database: X X
- Object-Oriented Concepts: X
- Schema-full, Schema-less, Schema mix: X
- User and Role & Record Level Security: X
- Record Level Locking: X X X
- SQL: X X
- ACID Transaction: X X X
- Relationships (Linked Documents): X X X
- Custom Data Types: X X X
- Embedded Documents: X X
- Multi-Master Zero Configuration Replication: X
- Sharding: X X
- Server Side Functions: X X X
- Native HTTP REST/JSON: X X
- Embeddable with No Restrictions: X

DEMO

API & Standards
- Support for the TinkerPop standard for graph DBs: Gremlin language and Blueprints API
- SQL + extensions for graphs
- JDBC driver to connect any BI tool
- HTTP/JSON support
- Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more

Availability and Integrity
[Diagram: clients (C) connected to master nodes linked by multi-master replication]
Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions

Scalability and Performance
[Diagram: clients (C) connected to master nodes, plus an auto-discovered node]
Multi-master replication, sharding and auto-discovery to simplify ops
200k+ TPS on commodity hardware

Some numbers
- 70+ committers contributing to the product
- 1000s of users, from SMBs to Fortune 10 companies
- 50,000 downloads per month from 200+ countries
- 17+ years of research have been put into the product

A Bright Future
- Graph DBMSs increased their popularity by 500% within the last 2 years
- Document DBMSs are the 3rd fastest growing category

Some of Our Customers

Get Started for Free
- OrientDB Community Edition is FREE for any purpose (Apache 2 license)
- OrientDB Enterprise is free for development
- Udemy Getting Started Training is free
http://www.orientechnologies.com/getting-started

Thank you.
Ask your questions on Twitter for the Big Data Panel using #QCONBIGDATA
Luca Garulli, @lgarulli, www.orientdb.com