GRAPH DATABASE SYSTEMS. h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe 2016 1




Use Case: Route Finding (figure; source: Neo Technology, Inc.)

Use Case: Logistics (figure; source: Neo Technology, Inc.)

Use Case: Social Network (figure; source: Neo Technology, Inc.)

Use Case: Recommendations (figure; source: Neo Technology, Inc.)

Graph Database: Example (figure; source: Neo Technology, Inc.)

Graph Database Systems: Definitions
- "A graph database is a database whose data model conforms to some form of graph (...) structure. The graph data model usually consists of nodes (...) and (...) edges (...), where the nodes represent concepts (...) and the edges represent relationships (...) between these concepts (...)." (Encyclopedia of Database Systems, Springer)
- "A graph database management system (henceforth, a graph database) is an online database management system with Create, Read, Update, and Delete (CRUD) methods that expose a graph data model." (Robinson/Webber/Eifrem)
- "A graph database is any storage system that provides index-free adjacency." (Rodriguez)

Relational vs. Graph (figures; source: Neo Technology, Inc.)

Relational vs. Graph (Cont.)
Example: Social network "path exists" performance
- A sample social graph with ~1,000 persons
- On average 50 friends per person
- pathexists(a,b) limited to depth 4
- Caches warmed up to eliminate disk I/O

System            # persons    Query time
Relational DBMS   1,000        2,000 ms
Neo4j             1,000        2 ms
Neo4j             1,000,000    2 ms

Source: Neo Technology, Inc.
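To make the depth-limited "path exists" query concrete, here is a minimal sketch in Python. It assumes the friendship graph is held as an in-memory adjacency dictionary and uses a breadth-first search; the function name path_exists, the max_depth parameter and the sample data are illustrative assumptions, not the benchmark code behind the figures above.

```python
from collections import deque


def path_exists(adjacency, start, goal, max_depth=4):
    """Return True if goal is reachable from start in at most max_depth hops."""
    if start == goal:
        return True
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor == goal:
                return True
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return False


# Hypothetical friendship chain: a - b - c - d - e - f
friends = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}

within_four_hops = path_exists(friends, "a", "e")  # e is 4 hops from a
beyond_four_hops = path_exists(friends, "a", "f")  # f is 5 hops from a
```

The search only ever follows the neighbor lists of nodes it has reached, so its cost depends on the part of the graph that is explored rather than on the total number of persons stored.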

Graph Databases
Are good for
- Highly connected data (social networks)
- Recommendations (e-commerce)
- Path finding (how do I know you?)
- A* (least-cost path)
If you've ever
- Joined more than 7 tables together
- Modeled a graph in a table
- Tried to write some crazy stored procedure with multiple recursive self and inner joins
... you should use a graph database.

Graph Database Management Systems
- First graph database: R&M (1975)
- Until 2005: development of different graph models, (research) graph libraries and graph databases
- Since 2005: development of new graph databases (commercial and open source)
  - AllegroGraph (2005)
  - DEX (now Sparksee), Neo4j (2006)
  - TinkerPop Graph Processing Stack (2007)
  - VertexDB, Pregel (Google) (2008)
  - HyperGraphDB, InfiniteGraph, (sones Graph DB), Filament, Horton (Microsoft), ... (2009)
  - CloudGraph, Trinity, FlockDB (Twitter), OrientDB, ... (2010)
  - Titan, ArangoDB, Sqrrl, Giraph (Apache) (2012)

Graph Database Management Systems (Cont.)
Graph database management systems differ strongly in
- Supported graph data structures
- Data storing features
- Operation and manipulation features
- Query features
- Schema and integrity constraints
- Support for essential graph queries
Further reading: Renzo Angles: A Comparison of Current Graph Database Models, 3rd Int. Workshop on Graph Data Management: Techniques and Applications (GDM 2012), 5 April, Washington DC, USA

Graph Database Management Systems (Cont.)
Popularity (chart; source: http://db-engines.com/en/ranking_trend/graph+dbms)

Graph Database Systems: Topics of Interest
- Graph Data Models
- Graph Query Languages (not discussed here)
- Graph Storage and Indexing
- Scalability, Availability and Consistency

Literature
- Edlich, S., Friedland, A., Hampe, J., Brauer, B., Brückner, M.: NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken, Carl Hanser Verlag, 2011 (2nd ed.)
- Robinson, I., Webber, J., Eifrem, E.: Graph Databases, O'Reilly, 2015 (2nd ed.), free e-book on http://graphdatabases.com/

Graph Database Systems
- Introduction
- Graph Data Models
- Graph Storage and Indexing
- Scalability, Availability and Consistency

Graph Data Models
Orthogonal graph characteristics
- Directed vs. undirected
- Simple vs. multi
- Weighted vs. unweighted
- Unlabeled vs. edge-labeled vs. vertex-labeled
Property Graph Model
- Most popular graph data model in graph databases today (TinkerPop, InfiniteGraph, InfoGrid, Neo4j etc.)
- Directed labeled multigraph
- Edge properties: key/value pairs

Property Graph Data Model (figure; source: Neo Technology, Inc.)

Property Graph Data Model
- Nodes: entities
- Relationships: connect entities and structure the domain
- Properties: attributes and metadata
- (Labels): group nodes by role
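The four building blocks listed above can be sketched as plain Python data structures. This is only an illustrative in-memory model under the assumption of a directed, labeled multigraph; the class and method names (Node, Relationship, PropertyGraph, relate, outgoing) and the sample data are hypothetical and do not correspond to the API of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)         # group nodes by role
    properties: dict = field(default_factory=dict)   # key/value pairs


@dataclass
class Relationship:
    rel_type: str   # the edge label
    start: Node     # direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        # A list (not a set keyed by endpoints) permits parallel edges,
        # which is what makes this a multigraph.
        self.relationships = []

    def add_node(self, node_id, labels=(), **properties):
        node = Node(node_id, set(labels), dict(properties))
        self.nodes[node_id] = node
        return node

    def relate(self, start_id, rel_type, end_id, **properties):
        rel = Relationship(rel_type, self.nodes[start_id],
                           self.nodes[end_id], dict(properties))
        self.relationships.append(rel)
        return rel

    def nodes_with_label(self, label):
        return [n for n in self.nodes.values() if label in n.labels]

    def outgoing(self, node_id, rel_type=None):
        return [r for r in self.relationships
                if r.start.node_id == node_id
                and (rel_type is None or r.rel_type == rel_type)]


# Hypothetical sample data
graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Alice")
graph.add_node(2, ["Person"], name="Bob")
graph.add_node(3, ["Book"], title="Graph Databases")
graph.relate(1, "KNOWS", 2, since=2010)
graph.relate(1, "KNOWS", 2, since=2015)   # parallel edge, allowed
graph.relate(2, "READ", 3)

people = graph.nodes_with_label("Person")
alice_knows = graph.outgoing(1, "KNOWS")
```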

Property Graph Data Model: Nodes (figure; source: Neo Technology, Inc.)

Property Graph Data Model: Relationships (figures; source: Neo Technology, Inc.)

Property Graph Data Model: Labels (figure; source: Neo Technology, Inc.)

Property Graph Data Model (figure; source: Neo Technology, Inc.)

Graph Data Modeling
- Easy to design and model: the graph is a direct representation of the model
- Whiteboard friendliness
Source: Neo Technology, Inc.

Graph Data Modeling (Cont.) (figures; source: Neo Technology, Inc.)

Graph Data Modeling: Best Practice
- Nodes for things
- Relationships for structure
- Represent complex value types as nodes
- Iterative and incremental development
- Test-driven data model development

Graph Data Modelling: Cross-Domain Models (figures showing the theatrical domain, the literary domain and the geospatial domain; source: Robinson/Webber/Eifrem: 2013)

Graph Databases and Schema
- Most graph databases are schema-less
- However, the first graph databases have started to introduce schema constructs
- Example: Neo4j 2.0
  - Necessary requirement: labels
  - Unique constraints: a unique constraint does not mean that all nodes must have a unique value for the property; nodes without the property are not subject to this rule
  - CREATE CONSTRAINT / DROP CONSTRAINT
    CREATE CONSTRAINT ON (book:book) ASSERT book.isbn IS UNIQUE
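The semantics of the unique constraint described above (only nodes that carry the label and the property are checked) can be illustrated with a small Python sketch. It is a simplified in-memory model, not Neo4j's implementation; the class names UniqueConstraint and UniqueConstraintViolation and the sample ISBN values are hypothetical.

```python
class UniqueConstraintViolation(Exception):
    """Raised when two labeled nodes share a value for the constrained key."""


class UniqueConstraint:
    def __init__(self, label, key):
        self.label = label
        self.key = key
        self._owner_by_value = {}  # property value -> id of the node holding it

    def check_and_register(self, node_id, labels, properties):
        # Nodes without the label or without the property are not
        # subject to the constraint at all.
        if self.label not in labels or self.key not in properties:
            return
        value = properties[self.key]
        owner = self._owner_by_value.get(value)
        if owner is not None and owner != node_id:
            raise UniqueConstraintViolation(
                f"{self.label}.{self.key} = {value!r} already used by node {owner}")
        self._owner_by_value[value] = node_id


constraint = UniqueConstraint("book", "isbn")
constraint.check_and_register(1, {"book"}, {"isbn": "isbn-0001"})
constraint.check_and_register(2, {"book"}, {"title": "No ISBN yet"})     # accepted
constraint.check_and_register(3, {"book"}, {"title": "Also no ISBN"})    # accepted
constraint.check_and_register(4, {"person"}, {"isbn": "isbn-0001"})      # other label

try:
    constraint.check_and_register(5, {"book"}, {"isbn": "isbn-0001"})
    duplicate_rejected = False
except UniqueConstraintViolation:
    duplicate_rejected = True
```

Two books without an isbn property coexist without conflict, while a second book with an already used isbn is rejected.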

Graph Data Models
Further graph data model approaches
- Hypergraphs with hyperedges (e.g. HyperGraphDB, sones GraphDB)
Related approaches
- RDF
  - Triples (subject, predicate, object)
  - Standardized (W3C)
  - Optimized for reasoning
  - However, some systems combine both, e.g. AllegroGraph: RDF store and graph database
Further reading for graph data models: Edlich, S., Friedland, A., Hampe, J., Brauer, B., Brückner, M.: NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken, Carl Hanser Verlag, 2011 (2nd ed.)
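As a rough illustration of the RDF triple model mentioned above, the following Python sketch stores (subject, predicate, object) tuples in a list and answers simple triple patterns, where None acts as a wildcard. The function name match and the sample identifiers are assumptions made for this example; real RDF stores use IRIs and a query language such as SPARQL.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that agree with every non-None pattern component."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Hypothetical sample data
triples = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("alice", "livesIn", "Darmstadt"),
]

alice_knows = match(triples, subject="alice", predicate="knows")
who_knows_carol = [s for (s, _, _) in match(triples, predicate="knows", obj="carol")]
```

In contrast to the property graph model, an attribute such as livesIn is just another triple rather than a key/value pair attached to a node.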

Graph Databases
- Introduction
- Graph Data Models
- Graph Storage and Indexing
- Scalability, Availability and Consistency

Graph Storage
- Adjacency matrix vs. adjacency list
- Advantages? Disadvantages?
Source: Edlich et al.: 2011
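One way to approach the question of advantages and disadvantages is to build both representations for the same small directed graph. The sketch below is a minimal Python illustration; the helper names and the sample edges are assumptions made for this example.

```python
def to_adjacency_matrix(node_count, edges):
    """n x n matrix; cell [s][t] is 1 if there is an edge s -> t."""
    matrix = [[0] * node_count for _ in range(node_count)]
    for source, target in edges:
        matrix[source][target] = 1
    return matrix


def to_adjacency_list(node_count, edges):
    """One list of successors per node."""
    adjacency = [[] for _ in range(node_count)]
    for source, target in edges:
        adjacency[source].append(target)
    return adjacency


# Hypothetical sample graph with 4 nodes and 3 directed edges
edges = [(0, 1), (0, 2), (2, 3)]
matrix = to_adjacency_matrix(4, edges)
adjacency = to_adjacency_list(4, edges)

# Edge test: constant time in the matrix, a scan of one list otherwise.
has_edge_matrix = matrix[0][2] == 1
has_edge_list = 2 in adjacency[0]

# Space: the matrix always stores n * n cells, the list one entry per edge.
matrix_cells = sum(len(row) for row in matrix)
list_entries = sum(len(successors) for successors in adjacency)
```

For sparse graphs the matrix wastes most of its cells, whereas the adjacency list stores only existing edges and enumerates the neighbors of a node without touching the rest of the graph.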

Graph Storage: Example
Concrete example: Neo4j architecture (figure; source: Robinson/Webber/Eifrem: 2015)

Graph Storage: Example (Cont.)
Concrete example: Neo4j physical storage of a graph (figure; source: Robinson/Webber/Eifrem: 2015)

Graph Storage: Example (Cont.)
Concrete example: Neo4j file record structure (figure; source: Robinson/Webber/Eifrem: 2015)

Graph Indexing
- Do we really need indexes in graph databases?
- Graphs are their own indexes!
- But sometimes we want short-cuts to well-known nodes
  - Indexes for efficient lookup of specific properties of nodes or relationships
- AND: indexes for uniqueness constraints (on properties of nodes or relationships)!

Graph Indexing (Cont.)
- Index data structures
  - B-Trees
  - Hash indexes
  - Quadtrees
  - R-Trees
- Some graph databases use search engines like Apache Lucene as index backend
  - Supports exact and regex-based matching
  - Supports scoring: number of hits in the index for a given item
  - Great for recommendations!
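The idea of an index as a short-cut to well-known nodes can be sketched with a hash index over a single property. The Python example below is a simplified in-memory illustration built on a dictionary; the class name PropertyIndex and the sample data are hypothetical, and production systems would use one of the index structures listed above or a backend such as Apache Lucene.

```python
from collections import defaultdict


class PropertyIndex:
    """Hash index from a property value to the ids of the nodes carrying it."""

    def __init__(self, key):
        self.key = key
        self._node_ids_by_value = defaultdict(set)

    def add(self, node_id, properties):
        # Only nodes that actually have the indexed property are entered.
        if self.key in properties:
            self._node_ids_by_value[properties[self.key]].add(node_id)

    def lookup(self, value):
        # .get avoids creating empty entries for values that were never indexed.
        return set(self._node_ids_by_value.get(value, ()))


# Hypothetical sample data
name_index = PropertyIndex("name")
name_index.add(1, {"name": "Alice"})
name_index.add(2, {"name": "Bob"})
name_index.add(3, {"name": "Alice", "city": "Darmstadt"})
name_index.add(4, {"city": "Berlin"})

start_nodes = name_index.lookup("Alice")
```

A query would use such a lookup only to find its starting nodes and then continue by traversing relationships, without consulting the index again.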

Graph Databases
- Introduction
- Graph Data Models
- Graph Storage and Indexing
- Scalability, Availability and Consistency

Graph Databases: Scalability, Availability and Consistency
Example: Neo4j
- Transactions
  - ACID
  - Write-ahead logging
- Replication
  - Master-slave replication (asynchronous!)
  - Neo4j also supports writing through slaves:
    - the slave first ensures that it is consistent with the master
    - thereafter, the write is synchronously transacted across both instances
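The principle behind write-ahead logging can be shown with a toy key/value store: every change is appended to a log file and forced to disk before it is applied in memory, so the state can be rebuilt by replaying the log after a crash. The Python sketch below is a deliberately simplified assumption-laden illustration (one JSON record per line, no transactions, no handling of torn writes); the class name WriteAheadStore and the file name are hypothetical and do not reflect how Neo4j structures its log.

```python
import json
import os
import tempfile


class WriteAheadStore:
    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # recovery: rebuild the state from the log

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it to the in-memory state.
        self.data[key] = value


with tempfile.TemporaryDirectory() as directory:
    log_path = os.path.join(directory, "graph.wal")
    store = WriteAheadStore(log_path)
    store.put("node:1", {"name": "Alice"})
    store.put("node:1", {"name": "Alice B."})
    store.put("node:2", {"name": "Bob"})
    # Simulate a restart: a new instance sees only what is in the log.
    recovered = WriteAheadStore(log_path)
```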

Graph Database Systems: Conclusion
Graph Database Systems: Pros & Cons
- Strengths
  - Fast for connected data
  - Whiteboard friendly, agile development
- Weaknesses
  - Global queries / number crunching
  - Binary data / BLOBs
  - Requires conceptual shift
  - No standardization yet

Graph Database Systems: Outlook
- Standardization?!
- Integration of graph database features and native graph storage in RDBMS?
  - RDF already supported in IBM DB2 and Oracle
  - What will be next?
- Polyglot persistence?!

Big Data Technologies
- Introduction
- NoSQL Database Systems
- Column Store Database Systems
- In-Memory Database Systems
- Graph Database Systems
- Conclusion & Outlook