How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015
Welcome to Big Data: "90% of the data in the world today has been created in the last two years alone." - IBM
Just Data Jill (Customer) Commodore Amiga 1200 (Product) Order #134 (Order) Luca (Provider) Bruno (Provider) Monitor 40 (Product) Mouse (Product)
Just Data: Data by itself has little value; it's the relationship between data that gives it incredible value. Jill (Customer) Commodore Amiga 1200 (Product) Order #134 (Order) Luca (Provider) Bruno (Provider) Monitor 40 (Product) Mouse (Product)
Relationships give data meaning Jill (Customer) Commodore Amiga 1200 (Product) (Makes) (Has) Order #134 (Order) (Has) (Sells) Luca (Provider) Bruno (Provider) (Sells) (Has) Monitor 40 (Product) (Sells) Mouse (Product)
Top NoSQL categories Key/Value Databases Document Databases Column Databases Graph Databases
Why do most NoSQL products avoid managing relationships?
Joins are Evil. Is this familiar? [Tables: Customer (ID, Name), CustomerAddress (ID, Address), Address (ID, Location); sample rows include customers John and Mike and addresses in Milan, London, Paris, Madrid, and Moscow]
Why is the join so slow?
Index Lookup: how does it work? Imagine an address book where we want to find Luca's phone number. [Index tree: A-Z splits into A-L and M-Z]
Index Lookup: how does it work? Index algorithms are all similar and based on balanced trees. [Index tree: A-Z splits into A-L and M-Z; A-L into A-D and E-L; M-Z into M-R and S-Z]
Index Lookup: how does it work? Found! This lookup took 5 steps (A-Z, A-L, E-L, H-L, K-L) to reach Luca. With millions of indexed records the tree gets deeper, so every lookup takes more steps.
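The address-book lookup can be sketched in a few lines. This is a minimal illustration, assuming a sorted in-memory list stands in for the balanced-tree index; the function name `index_lookup` and the generated keys are hypothetical and not taken from any database product. It counts how many comparisons one lookup needs as the index grows.

```python
def index_lookup(sorted_keys, target):
    """Binary search over a sorted index; returns (position, steps)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1                      # one step = one level of the tree
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1                # continue in the upper half
        else:
            hi = mid - 1                # continue in the lower half
    return -1, steps                    # key is not in the index


# Hypothetical address books of growing size; keys are zero-padded numbers.
for size in (1_000, 1_000_000):
    keys = [f"{i:07d}" for i in range(size)]
    position, steps = index_lookup(keys, keys[-1])
    print(f"{size} entries: found at {position} in {steps} steps")
```

The step count grows with the logarithm of the index size, and that cost is paid again on every lookup.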
Joins Kill Performance: Joins are executed every time you cross relationships. Querying millions of records while joining 3-4 tables could generate billions of combinations. [Tables: Customer (ID, Name), CustomerAddress (ID, Address), Address (ID, Location)]
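A rough sketch of what crossing the Customer, CustomerAddress and Address tables involves. The rows below are hypothetical sample data (the slide's exact rows are not reproduced), the function `locations_of` is invented for illustration, and a sorted key column searched with `bisect` stands in for a real index.

```python
from bisect import bisect_left

# Hypothetical sample rows, loosely modeled on the slide's three tables.
customers = [(10, "John"), (24, "Mike")]                      # (ID, Name)
customer_address = [(10, 24), (10, 33), (24, 44)]             # (customer ID, address ID)
addresses = [(24, "Milan"), (33, "London"), (44, "Moscow")]   # (ID, Location), sorted by ID

address_ids = [row[0] for row in addresses]  # sorted key column acting as the index


def locations_of(customer_id):
    """Resolve Customer -> CustomerAddress -> Address, probing the index on every hop."""
    found = []
    for cust_id, addr_id in customer_address:       # scan the join table
        if cust_id != customer_id:
            continue
        pos = bisect_left(address_ids, addr_id)     # O(log N) index probe per matching row
        if pos < len(address_ids) and address_ids[pos] == addr_id:
            found.append(addresses[pos][1])
    return found


for cust_id, name in customers:
    print(name, locations_of(cust_id))
```

Nothing is remembered between calls: the same probes are repeated each time the relationship is crossed.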
This is why database query performance suffers as the database increases in size: every index lookup costs O(log N)
RDBMS performance on traversal [Chart: performance plotted against database size]
In a world that's becoming more connected, we need a better way to store data and manage relationships. Read: data is important, but relationships are even more fundamental today.
"A graph database is any storage system that provides index-free adjacency" - Marko Rodriguez (author of TinkerPop Blueprints)
Every developer knows the Relational Model, but who knows the Graph one?
Back to school: Graph Theory crash course
Basic Graph: Luca -(Visited)-> Sao Paulo
Property Graph Model*: Edges are directed. Vertices and Edges can have properties. Luca (company: OrientTechnologies) -(Visited, on: 2015)-> Sao Paulo (people: 12,000,000). * https://github.com/tinkerpop/blueprints/wiki/property-Graph-Model
1-N and N-M Relationships: An Edge connects only 2 vertices. Use multiple edges to represent 1-N and N-M relationships. Edges between Luca and Sao Paulo: Visited (on: 2015), Worked (on: 2015)
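The property graph model can be written down as two small record types. This is a minimal sketch: the class names `Vertex` and `Edge`, the field names, and the helper `neighbours` are hypothetical, and the direction of the two edges (from Luca to Sao Paulo) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex   # vertex the edge starts from
    in_vertex: Vertex    # vertex the edge points to
    properties: dict = field(default_factory=dict)


luca = Vertex("Luca", {"company": "OrientTechnologies"})
sao_paulo = Vertex("Sao Paulo", {"people": 12_000_000})

# Each edge connects exactly two vertices; several relationships between
# the same pair are simply several edges (direction assumed here).
edges = [
    Edge("Visited", luca, sao_paulo, {"on": 2015}),
    Edge("Worked", luca, sao_paulo, {"on": 2015}),
]


def neighbours(vertex, label):
    """Names of the vertices reached from `vertex` over edges with this label."""
    return [e.in_vertex.name for e in edges
            if e.out_vertex is vertex and e.label == label]


print(neighbours(luca, "Visited"))
```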
Congrats! This is your diploma in «Graph Theory»
Graph theory is so simple, yet so powerful
How does a true* Graph Database manage relationships? *A Graph layer on top of a DBMS doesn't qualify as a true GraphDB
Each element in the Graph has its own immutable Record ID: #13:55 Luca (Vertex); #22:11 Visited, on: 2015 (Edge); #15:99 Sao Paulo (Vertex)
Connections use persistent pointers: #13:55 Luca (Vertex) has out = #22:11; #22:11 Visited, on: 2015 (Edge) has out = #13:55 and in = #15:99; #15:99 Sao Paulo (Vertex) has in = #22:11
A Graph Database creates the relationship just once (when the edge is created), whereas an RDBMS computes the relationship every time you query the database
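The record IDs and pointers from the Luca / Visited / Sao Paulo example can be mimicked with a toy record store. This is only a sketch under stated assumptions: a Python dict keyed by record ID stands in for jumping directly to a record's position, and the helper names `create_vertex`, `create_edge` and `traverse_out` are hypothetical, not a real database API.

```python
# Toy record store: record ID -> record.
records = {}


def create_vertex(rid, name):
    records[rid] = {"type": "Vertex", "name": name, "out": [], "in": []}


def create_edge(rid, label, out_rid, in_rid, **props):
    """Link two vertices once, at creation time, by storing record IDs on both sides."""
    records[rid] = {"type": "Edge", "label": label, "out": out_rid, "in": in_rid, **props}
    records[out_rid]["out"].append(rid)   # the source vertex points to the edge
    records[in_rid]["in"].append(rid)     # the target vertex points back to the edge


def traverse_out(rid, label):
    """Follow outgoing edges by dereferencing stored record IDs: no index search, no join."""
    result = []
    for edge_rid in records[rid]["out"]:
        edge = records[edge_rid]
        if edge["label"] == label:
            result.append(records[edge["in"]]["name"])
    return result


create_vertex("#13:55", "Luca")
create_vertex("#15:99", "Sao Paulo")
create_edge("#22:11", "Visited", "#13:55", "#15:99", on=2015)

print(traverse_out("#13:55", "Visited"))
```

The cost of one hop depends on how many edges the vertex has, not on how many records the store holds.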
When you move from an RDBMS to a Graph Database, you jump from O(log N) per relationship lookup to near O(1). With a Graph Database, traversal time is not affected by database size! This is huge in the Big Data age.
Graph Databases Easily Manage Complex Relationships. No costs to traverse relationships: Recommendation engines, Social Applications, Spatial Apps, Master Data Management, Information Clustering. [Diagram: John linked by "Lives in" and "Likes" edges to NYC, San José, Thriller, Comedy, Pulp Fiction, Mr Bean, Theater A, Theater B and Theater C]
GraphDB Database Quadrant [Chart: Relationships Complexity against Data Complexity, positioning Key Value, Column, Relational, Document and Graph databases]
GraphDB Database Quadrant: These were 1st generation NoSQL products, where each tool was only good at a few use cases. [Chart: Relationships Complexity against Data Complexity, positioning Key Value, Column, Relational, Document and Graph databases]
1st Generation NoSQL: Scenario Redis or Memcache (Key/Value) Application Neo4j (GraphDB) Primary DB Oracle (RDBMS) ETL MongoDB (DocDB)
1st Generation NoSQL: Fact. In > 90% of use cases, NoSQL products are used as a second DBMS
1st Generation NoSQL: Problems
- No standard between NoSQL products
- Multiple vendors = multiple skills
- ETL + synchronization code is costly to write and maintain
- Performance and Reliability are hard to predict
2nd Generation NoSQL is Multi-Model
What's a Multi-Model DBMS? Multi-Model represents the intersection of multiple models (Key/Value, Document, Graph, Object) in just one product
What's a Multi-Model DBMS? Multi-Model represents the intersection of multiple models (Key/Value, Document, Graph, Object) in just one product
- Just one product to learn and maintain
- Just one vendor relationship to manage
- No ETL, no synchronization required
- Performance and Reliability are easy to test from the beginning
Relationships give data meaning Jill (Customer) Commodore Amiga 1200 (Product) (Makes) (Has) Order #134 (Order) (Has) (Sells) Luca (Provider) Bruno (Provider) (Sells) (Has) Monitor 40 (Product) (Sells) 3 Wheel Mouse (Product)
Multi-Model domain schema. Vertex classes: Actor (name: string, surname: string), with Customer and Provider inheriting from Actor; Order (number: int, date: datetime); Product (name: string, qty: int). Edge classes: Makes; Sells (price: decimal); Has (price: decimal). Legend: Vertex, Edge, Inherits
Vertices and Edges are Documents. Jill -(Makes)-> Order, where Jill is stored as: { "@rid": "12:382", "@class": "Customer", "name": "Jill", "surname": "Raggio", "phone": "+39 33123212", "details": { "city": "London", "tags": "millennial" } }
General purpose solution:
- JSON
- Schema-less, Schema-full, Schema-hybrid
- Nested documents
- Rich indexing and querying
- Developer friendly
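To make the "vertex as document" idea concrete, here is a small sketch that loads Jill's record as JSON and reads a nested field. The `REQUIRED` rule and the `validate` helper are hypothetical; they only illustrate one possible reading of "schema-hybrid" (a few mandatory typed fields, everything else free-form), not how any particular product enforces it.

```python
import json

jill_json = """
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {"city": "London", "tags": "millennial"}
}
"""
jill = json.loads(jill_json)

# Hypothetical schema-hybrid rule: these fields must exist with these types,
# any other field (phone, details, ...) is allowed without declaration.
REQUIRED = {"name": str, "surname": str}


def validate(doc, required=REQUIRED):
    """True when every mandatory field is present with the expected type."""
    return all(isinstance(doc.get(key), kind) for key, kind in required.items())


print(jill["details"]["city"], validate(jill))
```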
Polymorphic queries
SELECT * FROM Customer returns Jill (Customer)
SELECT * FROM Provider returns Bruno (Provider), Luca (Provider)
SELECT * FROM Actor returns Bruno (Provider), Jill (Customer), Luca (Provider)
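The behavior of these three queries can be imitated with ordinary class inheritance. This is an analogy only: the `select_from` helper and the in-memory `database` list are hypothetical, while the classes mirror the Actor / Customer / Provider hierarchy of the domain schema.

```python
class Actor:
    def __init__(self, name):
        self.name = name


class Customer(Actor):
    pass


class Provider(Actor):
    pass


# Hypothetical in-memory "database" holding the three records from the slide.
database = [Provider("Bruno"), Customer("Jill"), Provider("Luca")]


def select_from(cls):
    """Rough analogue of SELECT * FROM <class>: subclasses are included."""
    return [f"{record.name} ({type(record).__name__})"
            for record in database if isinstance(record, cls)]


print(select_from(Customer))
print(select_from(Provider))
print(select_from(Actor))
```

Querying the superclass returns records of every subclass, which is what makes the query polymorphic.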
Multi-Model complex domains schema. Vertex classes: Account, MusicTaste, Band, Genre, Location. Edge classes: Likes, Plays, Performs. Legend: Vertex, Edge, Inherits
Multi-Model complex domains Jill (Account) (Likes) Indie (Genre) (Plays) (Likes) Snow Patrol (Band) (Likes) Luca (Account) (Likes) Rock (Genre) 123, 1st Street Austin, TX (Location) (Performs) April 7, 2015 9pm-11.30pm
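A traversal over this domain might look like the sketch below, which finds concerts for an account by following Likes, Plays and Performs edges. The concrete edges are assumptions (the diagram's exact connections are not recoverable from the slide text), and the dictionaries and the `concerts_for` function are hypothetical names chosen for illustration.

```python
# Outgoing adjacency lists keyed by (vertex, edge label). Assumed edges.
out_edges = {
    ("Jill", "Likes"): ["Indie"],
    ("Luca", "Likes"): ["Rock", "Snow Patrol"],
    ("Snow Patrol", "Plays"): ["Indie"],
    ("Snow Patrol", "Performs"): ["123, 1st Street Austin, TX"],
}

# Incoming side of the Plays edges, so a genre can reach its bands directly.
in_edges = {
    ("Indie", "Plays"): ["Snow Patrol"],
}


def concerts_for(account):
    """Venues where bands perform that the account likes, directly or via a liked genre."""
    bands = set()
    for liked in out_edges.get((account, "Likes"), []):
        if (liked, "Performs") in out_edges:              # a band liked directly
            bands.add(liked)
        bands.update(in_edges.get((liked, "Plays"), []))  # bands playing a liked genre
    return sorted((band, venue)
                  for band in bands
                  for venue in out_edges.get((band, "Performs"), []))


print(concerts_for("Jill"))
```

Each hop is a direct lookup on the current vertex, so the query never scans accounts, bands or locations that are not connected to the starting point.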
Multi-Model Database Quadrant [Chart: Relationships Complexity against Data Complexity, positioning Key Value, Column, Relational, Document, Graph and Multi-Model databases]
Multi-Model Solutions
There are a few DBMSs that claim to be Multi-Model, but they do not have a true Graph Engine. The Graph is only a layer on top of the engine. Under the hood they do JOINs, which means traversal time is affected by database size.
Meet OrientDB The First Ever Multi-Model Database Combining Flexibility of Documents with Connectedness of Graphs
With a true Graph, Document, Key/Value and Object Oriented engine
OrientDB features
FEATURES: ORIENTDB, MONGODB, NEO4J, MYSQL (RDBMS)
Operational Database: X X X
Graph Database: X X
Document Database: X X
Object-Oriented Concepts: X
Schema-full, Schema-less, Schema mix: X
User and Role & Record Level Security: X
Record Level Locking: X X X
SQL: X X
ACID Transaction: X X X
Relationships (Linked Documents): X X X
Custom Data Types: X X X
Embedded Documents: X X
Multi-Master Zero Configuration Replication: X
Sharding: X X
Server Side Functions: X X X
Native HTTP REST/JSON: X X
Embeddable with No Restrictions: X
DEMO
API & Standards
- Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API
- SQL + extensions for graphs
- JDBC driver to connect any BI tool
- HTTP/JSON support
- Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more
Availability and Integrity: Multi-master Replication. Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions. [Diagram: nodes labeled C connected to two Master Nodes]
Scalability and Performance: Multi-Master Replication, Sharding and Auto-Discovery to Simplify Ops. +200k TPS on Commodity Hardware. [Diagram: nodes labeled C, two Master Nodes and an Auto-Discovered Node]
Some numbers
- 70+ Committers contributing to the product
- 1000s of Users, from SMBs to Fortune 10 Companies
- 50,000 Downloads per Month from 200+ countries
- 17+ Years of Research have been put into the product
A Bright Future: Graph DBMS increased their popularity by 500% within the last 2 years. Document DBMS are the 3rd fastest growing category.
Some of Our Customers
Get Started for Free: OrientDB Community Edition is FREE for any purpose (Apache 2 license). OrientDB Enterprise is Free for Development. Udemy Getting Started Training is Free. http://www.orientechnologies.com/getting-started
Thank you. Ask your questions on Twitter for the Big Data Panel using #QCONBIGDATA Luca Garulli @lgarulli www.orientdb.com