Hacettepe University, Department of Computer Engineering. BBM467 Data Intensive Applications. Dr. Fuat Akal, akal@hacettepe.edu.tr
Why Graphs? Why now? Big Data is the trend! NOSQL is the answer.
Everyone is Talking About Graphs: Facebook Open Graph, Google Knowledge Graph, Twitter Interest Graph. How things are connected!
What is a Graph? Formally, a graph is just a collection of vertices and edges. Or, in less intimidating language, a set of nodes and the relationships that connect them. Graphs represent entities as nodes and the ways in which those entities relate to the world as relationships.
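To make the formal definition concrete, here is a minimal Python sketch of a directed graph as a vertex set V plus an edge set E of (source, target) pairs. The names Alice, Bob and Carol and the helper `neighbours` are hypothetical illustrations, not part of any graph database API.

```python
# A directed graph as G = (V, E), where E is a set of (source, target)
# pairs drawn from V x V. All names here are hypothetical examples.
vertices = {"Alice", "Bob", "Carol"}
edges = {("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol")}

# Sanity check: every edge connects two known vertices.
assert all(src in vertices and dst in vertices for src, dst in edges)


def neighbours(node: str) -> set[str]:
    """Return the vertices reachable from `node` over one outgoing edge."""
    return {dst for src, dst in edges if src == node}


print(sorted(neighbours("Alice")))  # ['Bob', 'Carol']
```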
Kinds of Graphs
Graphs are Everywhere: Graphs are extremely useful in understanding a wide diversity of datasets in fields such as science, government, and business. For example, Twitter's data is easily represented as a graph.
Relational Databases Lack Relationships: Foreign key constraints add additional development and maintenance overhead just to make the database work. Several expensive joins are needed just to discover what a customer bought. Reciprocal queries are even more costly: "What products did a customer buy?" is relatively cheap compared to "Which customers bought this product?", which is the basis of recommendation systems.
Recording Relationships in RDBMSs
Bob's friends:
    SELECT p1.person
    FROM Person p1
    JOIN PersonFriend ON PersonFriend.FriendID = p1.id
    JOIN Person p2 ON PersonFriend.PersonID = p2.id
    WHERE p2.person = 'Bob'
Who is friends with Bob?
    SELECT p1.person
    FROM Person p1
    JOIN PersonFriend ON PersonFriend.PersonID = p1.id
    JOIN Person p2 ON PersonFriend.FriendID = p2.id
    WHERE p2.person = 'Bob'
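The two queries above can be tried against SQLite with Python's built-in sqlite3 module. The table contents below are hypothetical toy data. The point is that a friendship is stored as a row in a join table, so either question means joining Person to PersonFriend and back to Person, and the reciprocal question needs the two join conditions swapped.

```python
import sqlite3

# Hypothetical toy data: Bob lists Alice and Zach as friends, while Alice
# and Carol list Bob. The relationship is directed, so the two questions
# return different people.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, person TEXT);
CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Zach'), (4, 'Carol');
INSERT INTO PersonFriend VALUES (2, 1), (2, 3), (1, 2), (4, 2);
""")

BOBS_FRIENDS = """
SELECT p1.person FROM Person p1
JOIN PersonFriend ON PersonFriend.FriendID = p1.id
JOIN Person p2 ON PersonFriend.PersonID = p2.id
WHERE p2.person = 'Bob'
"""

FRIENDS_WITH_BOB = """
SELECT p1.person FROM Person p1
JOIN PersonFriend ON PersonFriend.PersonID = p1.id
JOIN Person p2 ON PersonFriend.FriendID = p2.id
WHERE p2.person = 'Bob'
"""

bobs_friends = sorted(row[0] for row in conn.execute(BOBS_FRIENDS))
friends_with_bob = sorted(row[0] for row in conn.execute(FRIENDS_WITH_BOB))
print(bobs_friends)      # ['Alice', 'Zach']
print(friends_with_bob)  # ['Alice', 'Carol']
```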
Recording Relationships in RDBMSs: Things might get complicated!
Alice's friends-of-friends:
    SELECT p1.person AS PERSON, p2.person AS FRIEND_OF_FRIEND
    FROM PersonFriend pf1
    JOIN Person p1 ON pf1.personid = p1.id
    JOIN PersonFriend pf2 ON pf2.personid = pf1.friendid
    JOIN Person p2 ON pf2.friendid = p2.id
    WHERE p1.person = 'Alice' AND pf2.friendid <> p1.id
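A sketch running the friends-of-friends query on hypothetical toy data with sqlite3. Each additional hop would add two more joins (another PersonFriend and another Person). Note that the query only excludes Alice herself, not her direct friends, so Zach appears as a friend-of-friend even though he is also Alice's direct friend.

```python
import sqlite3

# Hypothetical directed friendships:
# Alice -> Bob, Alice -> Zach, Bob -> Zach, Bob -> Alice, Zach -> Carol.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, person TEXT);
CREATE TABLE PersonFriend (personid INTEGER, friendid INTEGER);
INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Zach'), (4, 'Carol');
INSERT INTO PersonFriend VALUES (1, 2), (1, 3), (2, 3), (2, 1), (3, 4);
""")

FRIENDS_OF_FRIENDS = """
SELECT p1.person AS PERSON, p2.person AS FRIEND_OF_FRIEND
FROM PersonFriend pf1
JOIN Person p1 ON pf1.personid = p1.id
JOIN PersonFriend pf2 ON pf2.personid = pf1.friendid
JOIN Person p2 ON pf2.friendid = p2.id
WHERE p1.person = 'Alice' AND pf2.friendid <> p1.id
"""

# Sort for a deterministic order; SQL gives no ordering guarantee here.
rows = sorted(conn.execute(FRIENDS_OF_FRIENDS).fetchall())
print(rows)  # [('Alice', 'Carol'), ('Alice', 'Zach')]
```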
Graphs Embrace Relationships: SQL/NOSQL examples have dealt with implicitly connected data. Users infer semantic dependencies between entities, but the data models and the databases themselves are blind to these connections. We want a cohesive picture of the whole, including the connections between elements. In contrast to the SQL/NOSQL data stores we looked at before, in the graph world, connected data is stored as connected data.
What is a Graph Database? A graph database is NOT for charts & diagrams, or vector artwork. It is for storing data that is structured as a graph. A relational database may tell you the average age of everyone in this place, but a graph database will tell you who is most likely to buy you a beer.
Neo4j: Neo4j is a NOSQL graph database. Reliable, with ACID transactions. Scalable: 32 billion nodes. http://neo4j.org
A Graph Contains Nodes and Relationships: A Graph records data in Nodes, which have Properties.
A Graph Contains Nodes and Relationships: Nodes are organized by Relationships, which also have Properties. Relationships organize Nodes into arbitrary structures, allowing a Graph to resemble a List, a Tree, a Map, or a compound Entity, any of which can be combined into yet more complex, richly interconnected structures.
A Graph Contains Nodes and Relationships: Nodes are grouped by Labels into Sets. Labels are a means of grouping the nodes in the graph. They can be used to restrict queries to subsets of the graph, as well as to enable optional model constraints and indexing rules.
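The vocabulary of the last three slides (nodes with properties, relationships with a type and properties, labels that group nodes into sets) can be sketched as a tiny in-memory model. This is an illustrative assumption, not how Neo4j actually stores data; `Node`, `Relationship`, `nodes_with_label` and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set[str]  # labels group nodes into sets
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str  # e.g. KNOWS, WORKS_AT
    end: Node
    properties: dict[str, object] = field(default_factory=dict)


# Hypothetical sample graph.
alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})
acme = Node(3, {"Company"}, {"name": "Acme"})
nodes = [alice, bob, acme]
relationships = [
    Relationship(alice, "KNOWS", bob, {"since": 2010}),
    Relationship(bob, "WORKS_AT", acme),
]


def nodes_with_label(label: str) -> list[Node]:
    """Restrict a query to the subset of nodes carrying `label`."""
    return [n for n in nodes if label in n.labels]


print([n.properties["name"] for n in nodes_with_label("Person")])  # ['Alice', 'Bob']
```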
Query a Graph with Traversal: A Traversal navigates a Graph; it identifies Paths, which order Nodes. A Traversal is how you query a Graph, navigating from starting Nodes to related Nodes according to an algorithm, finding answers to questions like "what music do my friends like that I don't yet own?" or "if this power supply goes down, what web services are affected?"
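A traversal for the first example question, sketched in Python over a hypothetical adjacency map keyed by (node, relationship type). The relationship types FRIEND, LIKES and OWNS and all the data are assumptions for illustration: start at one node, expand one hop along FRIEND, one more along LIKES, then filter by what the start node OWNS.

```python
# Hypothetical graph: outgoing relationships indexed by (node, type).
graph = {
    ("me", "FRIEND"): ["ann", "raj"],
    ("ann", "LIKES"): ["Blue Train", "Kind of Blue"],
    ("raj", "LIKES"): ["Kind of Blue", "Abbey Road"],
    ("me", "OWNS"): ["Abbey Road"],
}


def out(node: str, rel_type: str) -> list[str]:
    """Follow outgoing relationships of one type from `node`."""
    return graph.get((node, rel_type), [])


def music_friends_like_that_i_dont_own(me: str) -> set[str]:
    owned = set(out(me, "OWNS"))
    # Two hops: me -FRIEND-> friend -LIKES-> album.
    liked = {album for friend in out(me, "FRIEND") for album in out(friend, "LIKES")}
    return liked - owned


print(sorted(music_friends_like_that_i_dont_own("me")))  # ['Blue Train', 'Kind of Blue']
```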
Indexes Look Up Nodes or Relationships: An Index maps from Properties to either Nodes or Relationships. Often, you want to find a specific Node or Relationship according to a Property it has. Rather than traversing the entire graph, use an Index to perform a look-up, for questions like "find the Account for username master-of-graphs".
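A minimal sketch of what such an index does, assuming a plain Python dict keyed by (label, property, value) as the index structure. Real databases use their own on-disk index structures, which are not modeled here; the node data and the `find` helper are hypothetical.

```python
# Hypothetical nodes: id -> properties (the label is stored as a property
# here only to keep the sketch short).
nodes = {
    1: {"label": "Account", "username": "master-of-graphs"},
    2: {"label": "Account", "username": "casual-user"},
    3: {"label": "Person", "name": "Alice"},
}

# Build the index once: (label, property, value) -> set of node ids.
index: dict[tuple[str, str, object], set[int]] = {}
for node_id, props in nodes.items():
    for key, value in props.items():
        if key != "label":
            index.setdefault((props["label"], key, value), set()).add(node_id)


def find(label: str, key: str, value: object) -> set[int]:
    """Average constant-time dict look-up instead of scanning every node."""
    return index.get((label, key, value), set())


print(find("Account", "username", "master-of-graphs"))  # {1}
```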
A Graph Database Stores Data in a Graph: A Graph Database manages a Graph and also manages related Indexes. It is optimized for graph structures instead of tables. Your application gets all the expressiveness of a graph, with all the dependability you expect out of a database.
Graph Databases Embrace Relationships: In this social network, the connections between entities don't exhibit uniformity across the domain. A social network is a popular example of a densely connected, semi-structured network. It resists being captured by a one-size-fits-all schema. The flexibility of the graph model has allowed us to add new nodes and new relationships without compromising the existing network or migrating data.
Graph Databases Embrace Relationships: The graph offers a much richer picture of the network. We can see who LOVES whom (and whether that love is requited). We can see who is a COLLEAGUE_OF whom, and who is BOSS_OF them all. We can see who's off the market, because they're MARRIED_TO someone else.
Graph vs. Relational: An RDBMS is optimized for aggregated data. A graph database is optimized for highly connected data.
Graph vs. Key-Value Store: A Key-Value model is great for lookups of simple values or lists. When the values are themselves interconnected, you've got a graph. K* represents a key, V* a value. Note that some keys point to other keys as well as plain values.
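The key-value picture can be sketched as follows, assuming a hypothetical layout in which each value carries a list of other keys it refers to. The store itself knows nothing about these references, so the application chases them with one lookup per hop, which amounts to a graph traversal performed outside the database.

```python
from collections import deque

# Hypothetical key-value data: K1 points to K2 and K3, K2 points to K3.
store = {
    "K1": {"value": "V1", "refs": ["K2", "K3"]},
    "K2": {"value": "V2", "refs": ["K3"]},
    "K3": {"value": "V3", "refs": []},
}


def reachable_values(start: str) -> list[str]:
    """Breadth-first walk over key references, one store lookup per hop."""
    seen = {start}
    queue = deque([start])
    values = []
    while queue:
        key = queue.popleft()
        entry = store[key]  # one key-value lookup
        values.append(entry["value"])
        for ref in entry["refs"]:
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return values


print(reachable_values("K1"))  # ['V1', 'V2', 'V3']
```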
Graph vs. Document Store: The container hierarchy of a document database accommodates nice, schema-free data that can easily be represented as a tree, which is of course a graph. Refer to other documents (or document elements) within that tree and you have a more expressive representation of the same data. In a graph database, those relationships are easily navigable. D = Document, S = Subdocument, V = Value, D2/S2 = reference to subdocument in (other) document.
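A sketch of the D2/S2 reference idea using hypothetical nested dicts. Resolving a reference means walking the other document's tree from its root in application code; in a graph database the same link would simply be a relationship to follow.

```python
# Hypothetical documents: D1 embeds subdocument S1 and refers to
# subdocument S2 of document D2 via the path string "D2/S2".
documents = {
    "D1": {"S1": {"V": 1}, "ref": "D2/S2"},
    "D2": {"S2": {"V": 2}},
}


def resolve(ref: str) -> dict:
    """Resolve a 'Document/Subdocument' reference by walking the tree."""
    node = documents
    for part in ref.split("/"):
        node = node[part]
    return node


print(resolve(documents["D1"]["ref"]))  # {'V': 2}
```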
How to Query the Graph Database? Cypher: a graph query language. Pattern-matching query language; declarative grammar with clauses (like SQL); aggregation, ordering, limits; create, read, update, delete.
Example Query: Who are John's friends-of-friends?
    MATCH (john {name: 'John'})-[:friend]->()-[:friend]->(fof)
    RETURN john, fof
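For readers more at home in a general-purpose language, here is a hand-written equivalent of the pattern above over a hypothetical list of directed `friend` relationships. Each `for ... if` pair plays the role of one `-[:friend]->` hop; the data and the function name are assumptions, and Cypher's own matching engine works differently internally.

```python
# Hypothetical directed :friend relationships as (from, to) pairs.
friend = [("John", "Sara"), ("John", "Joe"), ("Sara", "Maria"), ("Joe", "Steve")]


def friends_of_friends(name: str) -> list[tuple[str, str]]:
    """Hand-written equivalent of (john)-[:friend]->()-[:friend]->(fof)."""
    return [
        (a, c)
        for a, b in friend
        if a == name  # first hop, starting at the named person
        for b2, c in friend
        if b2 == b  # second hop, starting at the anonymous middle node
    ]


print(friends_of_friends("John"))  # [('John', 'Maria'), ('John', 'Steve')]
```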
Entity-Relationship Model for TV Show Database
Create the Graph
    // Nodes for the show, a season, an episode, a character and an actor
    CREATE (himym:tvshow { name: "How I Met Your Mother" })
    CREATE (himym_s1:season { name: "HIMYM Season 1" })
    CREATE (himym_s1_e1:episode { name: "Pilot" })
    CREATE (ted:character { name: "Ted Mosby" })
    CREATE (joshradnor:actor { name: "Josh Radnor" })
    // MERGE creates each relationship only if it does not exist yet;
    // it replaces CREATE UNIQUE, which newer Neo4j versions no longer support
    MERGE (joshradnor)-[:played_character]->(ted)
    MERGE (himym)-[:has_season]->(himym_s1)
    MERGE (himym_s1)-[:has_episode]->(himym_s1_e1)
    MERGE (himym_s1_e1)-[:featured_character]->(ted)
    // A review written by a user and attached to the episode
    CREATE (himym_s1_e1_review1 { title: "Meet Me At The Bar In 15 Minutes & Suit Up", content: "It was awesome" })
    CREATE (wakenpayne:user { name: "WakenPayne" })
    CREATE (wakenpayne)-[:wrote_review]->(himym_s1_e1_review1)<-[:HAS_REVIEW]-(himym_s1_e1)
Inside the Graph Database
Performance of Dealing with Connected Data: Relationships in a graph naturally form paths. Querying or traversing the graph involves following paths. Because of the fundamentally path-oriented nature of the data model, the majority of path-based graph database operations are highly aligned with the way in which the data is laid out, making them extremely efficient.
Performance of Dealing with Connected Data: Partner and Vukotic's experiment (as reported in Graph Databases, O'Reilly) seeks to find friends-of-friends in a social network, to a maximum depth of five. Given any two persons chosen at random, is there a path that connects them that is at most five relationships long? The social network contains 1,000,000 people, each with approximately 50 friends.
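The question in the experiment is a bounded-depth reachability test. Below is a minimal breadth-first-search sketch of that question over hypothetical data; it is not the code used in the experiment. In a relational schema like the PersonFriend table earlier, every extra hop corresponds to another self-join, whereas here each hop just follows the neighbour list of one node.

```python
from collections import deque


def path_within(graph: dict[str, list[str]], src: str, dst: str, max_depth: int = 5) -> bool:
    """Breadth-first search that gives up beyond `max_depth` relationships."""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])  # (node, number of relationships from src)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # expanding further would exceed the depth limit
        for nxt in graph.get(node, []):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


# Hypothetical chain a -> b -> c -> d -> e -> f -> g.
chain = {x: [y] for x, y in zip("abcdef", "bcdefg")}
print(path_within(chain, "a", "f"))  # True: five relationships
print(path_within(chain, "a", "g"))  # False: six relationships
```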
When to Use Graph Databases? There are two properties of graph databases you should consider when investigating graph database technologies. The underlying storage: some graph databases use native graph storage that is optimized and designed for storing and managing graphs, while others serialize the graph data into a relational database, an object-oriented database, or some other general-purpose data store. The processing engine: some definitions require that a graph database use index-free adjacency, meaning that connected nodes physically "point" to each other in the database.
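A sketch contrasting the two styles of adjacency, with hypothetical `Node` objects. In the pointer version each node holds direct references to its neighbours, so following a relationship never consults a global structure; in the index version every hop goes through a shared map. Python dicts make both fast here, so the sketch only shows the structural difference, not the performance gap on large, disk-resident data.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical node that stores direct references to its neighbours."""
    name: str
    out: list["Node"] = field(default_factory=list)


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.out.append(bob)
bob.out.append(carol)

# For contrast: adjacency kept in one shared map keyed by node name.
edge_index = {"alice": ["bob"], "bob": ["carol"]}


def two_hops_pointer(n: Node) -> list[str]:
    """Index-free adjacency: each hop dereferences the node's own list."""
    return [m.name for friend in n.out for m in friend.out]


def two_hops_index(name: str) -> list[str]:
    """Index-based adjacency: each hop is a look-up in the shared map."""
    return [m for friend in edge_index.get(name, []) for m in edge_index.get(friend, [])]


print(two_hops_pointer(alice), two_hops_index("alice"))  # ['carol'] ['carol']
```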
Acknowledgement - 1: The course material used for this class is mostly taken and/or adapted* from the course materials of the Big Data class given by Nesime Tatbul and Donald Kossmann at ETH Zurich (http://www.systems.ethz.ch/). (*) The original course material was condensed to fit the needs of BBM467; therefore, the original slides were not used as they are.
Acknowledgement - 2: Some material used for this lecture is taken and/or adapted from http://docs.neo4j.org/chunked/milestone/index.html and from Max De Marzi and Michael Hunger of Neo Technology.