BBM467 Data Intensive Applications




Hacettepe University, Department of Computer Engineering
BBM467 Data Intensive Applications
Dr. Fuat Akal
akal@hacettepe.edu.tr

Why Graphs? Why now? Big Data is the trend! NOSQL is the answer.

Everyone is Talking About Graphs
Facebook Open Graph
Google Knowledge Graph
Twitter Interest Graph
How things are connected!

What is a Graph?
Formally, a graph is just a collection of vertices and edges. Or, in less intimidating language, a set of nodes and the relationships that connect them. Graphs represent entities as nodes and the ways in which those entities relate to the world as relationships.
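As a minimal sketch of this definition, a graph can be written down as a set of nodes plus a set of edges. The names and the FOLLOWS relationship below are hypothetical example data, not taken from the slides.

```python
# Hypothetical example data: three nodes and three directed FOLLOWS edges.
nodes = {"alice", "bob", "carol"}
edges = {
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "alice"),
}


def neighbours(node, edges):
    """Return the nodes reachable from `node` over one outgoing edge."""
    return sorted(target for source, _, target in edges if source == node)


print(neighbours("alice", edges))
```

Each edge is a (source, relationship, target) triple, so the relationship itself is a first-class piece of data rather than something inferred from matching keys.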

Kinds of Graphs

Graphs are Everywhere
Graphs are extremely useful in understanding a wide diversity of datasets in fields such as science, government, and business. For example, Twitter's data is easily represented as a graph.

Relational Databases Lack Relationships
Foreign key constraints add development and maintenance overhead just to make the database work. Several expensive joins are needed just to discover what a customer bought. Reciprocal queries are even more costly: "What products did a customer buy?" is relatively cheap compared to "Which customers bought this product?", which is the basis of recommendation systems.

Recording Relationships in RDBMSs

Bob's friends:
SELECT p1.person
FROM Person p1
JOIN PersonFriend ON PersonFriend.FriendID = p1.id
JOIN Person p2 ON PersonFriend.PersonID = p2.id
WHERE p2.person = 'Bob'

Who is friends with Bob?
SELECT p1.person
FROM Person p1
JOIN PersonFriend ON PersonFriend.PersonID = p1.id
JOIN Person p2 ON PersonFriend.FriendID = p2.id
WHERE p2.person = 'Bob'
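The two queries above can be tried against a small in-memory SQLite database. This is only a sketch: the table and column names follow the queries, while the sample rows and the added ORDER BY clause are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the slide's queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, person TEXT);
CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Zach');
-- A row (PersonID, FriendID) means PersonID lists FriendID as a friend.
INSERT INTO PersonFriend VALUES (1, 2), (2, 1), (2, 3);
""")

BOBS_FRIENDS = """
SELECT p1.person FROM Person p1
JOIN PersonFriend ON PersonFriend.FriendID = p1.id
JOIN Person p2 ON PersonFriend.PersonID = p2.id
WHERE p2.person = 'Bob' ORDER BY p1.person
"""

FRIENDS_WITH_BOB = """
SELECT p1.person FROM Person p1
JOIN PersonFriend ON PersonFriend.PersonID = p1.id
JOIN Person p2 ON PersonFriend.FriendID = p2.id
WHERE p2.person = 'Bob' ORDER BY p1.person
"""


def names(query):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(query)]


bobs_friends = names(BOBS_FRIENDS)
friends_with_bob = names(FRIENDS_WITH_BOB)
print(bobs_friends, friends_with_bob)
```

With this data the two directions give different answers, because Bob lists Zach as a friend but Zach does not list Bob. Answering either question already requires two joins through the PersonFriend table.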

Recording Relationships in RDBMSs
Things might get complicated!

Alice's friends-of-friends:
SELECT p1.person AS PERSON, p2.person AS FRIEND_OF_FRIEND
FROM PersonFriend pf1
JOIN Person p1 ON pf1.personid = p1.id
JOIN PersonFriend pf2 ON pf2.personid = pf1.friendid
JOIN Person p2 ON pf2.friendid = p2.id
WHERE p1.person = 'Alice' AND pf2.friendid <> p1.id
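A runnable sketch of the friends-of-friends query, again using SQLite with hypothetical sample rows (only the table and column names come from the query above):

```python
import sqlite3

# Hypothetical sample data: Alice -> Bob, Bob -> Alice, Bob -> Zach.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, person TEXT);
CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Zach');
INSERT INTO PersonFriend VALUES (1, 2), (2, 1), (2, 3);
""")

FRIENDS_OF_FRIENDS = """
SELECT p1.person AS PERSON, p2.person AS FRIEND_OF_FRIEND
FROM PersonFriend pf1
JOIN Person p1 ON pf1.personid = p1.id
JOIN PersonFriend pf2 ON pf2.personid = pf1.friendid
JOIN Person p2 ON pf2.friendid = p2.id
WHERE p1.person = 'Alice' AND pf2.friendid <> p1.id
"""

# The join table is used twice (pf1, pf2): once per hop in the friendship chain.
rows = conn.execute(FRIENDS_OF_FRIENDS).fetchall()
print(rows)
```

The final condition, pf2.friendid <> p1.id, filters out the path that leads from Alice through Bob back to Alice herself. Every additional hop would need another self-join on PersonFriend.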

Graphs Embrace Relationships
SQL/NOSQL examples have dealt with implicitly connected data. Users infer semantic dependencies between entities, but the data models and the databases themselves are blind to these connections. We want a cohesive picture of the whole, including the connections between elements. In contrast to the SQL/NOSQL data stores we looked at before, in the graph world, connected data is stored as connected data.

What is a Graph Database?
A graph database is NOT for charts and diagrams, or vector artwork :) It is for storing data that is structured as a graph. A relational database may tell you the average age of everyone in this place, but a graph database will tell you who is most likely to buy you a beer.

Neo4J
Neo4J is a NOSQL graph database.
Reliable, with ACID transactions
Scalable: 32 billion nodes
http://neo4j.org

A Graph Contains Nodes and Relationships
A Graph records data in Nodes which have Properties.

A Graph Contains Nodes and Relationships
Nodes are organized by Relationships which also have Properties.
Relationships organize Nodes into arbitrary structures, allowing a Graph to resemble a List, a Tree, a Map, or a compound Entity, any of which can be combined into yet more complex, richly interconnected structures.

A Graph Contains Nodes and Relationships
Nodes are grouped by Labels into Sets.
Labels are a means of grouping the nodes in the graph. They can be used to restrict queries to subsets of the graph, as well as enabling optional model constraints and indexing rules.
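The property graph model described on the last three slides can be sketched with two small Python classes. This illustrates the model only, not how Neo4j stores data; the class names, the helper function and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                     # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # e.g. {"name": "Alice"}


@dataclass
class Relationship:
    start: Node
    type: str                                       # e.g. "FRIEND"
    end: Node
    properties: dict = field(default_factory=dict)  # relationships carry data too


# Hypothetical sample graph.
alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
acme = Node({"Company"}, {"name": "Acme"})
nodes = [alice, bob, acme]
relationships = [
    Relationship(alice, "FRIEND", bob, {"since": 2012}),
    Relationship(alice, "WORKS_AT", acme),
]


def nodes_with_label(nodes, label):
    """Restrict a query to the subset of nodes that carry `label`."""
    return [node for node in nodes if label in node.labels]


people = [node.properties["name"] for node in nodes_with_label(nodes, "Person")]
print(people)
```

Labels play the role of the sets mentioned above: filtering by "Person" narrows a query to part of the graph without requiring every node to share one schema.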

Query a Graph with Traversal
A Traversal navigates a Graph; it identifies Paths which order Nodes.
A Traversal is how you query a Graph, navigating from starting Nodes to related Nodes according to an algorithm, finding answers to questions like "What music do my friends like that I don't yet own?" or "If this power supply goes down, what web services are affected?"
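A minimal sketch of such a traversal for the music question, written over plain Python dictionaries. The people, albums and the three relationship maps are hypothetical example data.

```python
# Hypothetical example data: three kinds of relationships, one dict each.
friends = {"me": ["ann", "ben"], "ann": ["me"], "ben": ["me"]}
likes = {"ann": {"Album A", "Album B"}, "ben": {"Album B", "Album C"}}
owns = {"me": {"Album B"}}


def recommend(person):
    """Start at `person`, follow FRIEND then LIKES, and drop what is already owned."""
    liked_by_friends = set()
    for friend in friends.get(person, []):
        liked_by_friends |= likes.get(friend, set())
    return sorted(liked_by_friends - owns.get(person, set()))


recommendations = recommend("me")
print(recommendations)
```

The traversal only touches the starting node, its friends and the albums they like, so its cost depends on the size of that neighbourhood rather than on the size of the whole graph.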

Indexes Look Up Nodes or Relationships
An Index maps from Properties to either Nodes or Relationships.
Often, you want to find a specific Node or Relationship according to a Property it has. Rather than traversing the entire graph, use an Index to perform a look-up, for questions like "find the Account for username master-of-graphs."
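The idea of an index as a map from a property value to a node can be sketched with a dictionary. The account records below are hypothetical; only the username master-of-graphs comes from the slide.

```python
# Hypothetical account nodes, each a dict of properties.
accounts = [
    {"username": "master-of-graphs", "plan": "pro"},
    {"username": "casual-user", "plan": "free"},
]

# The index maps a property value (username) directly to its node.
username_index = {account["username"]: account for account in accounts}


def find_account(username):
    """Look up one account through the index instead of scanning every node."""
    return username_index.get(username)


account = find_account("master-of-graphs")
print(account)
```

A typical pattern is to use the index once to find the starting node and then continue with a traversal from there.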

A Graph Database Stores Data in a Graph
A Graph Database manages a Graph and also manages related Indexes.
Optimized for graph structures instead of tables. Your application gets all the expressiveness of a graph, with all the dependability you expect out of a database.

Graph Databases Embrace Relationships
In this social network, the connections between entities don't exhibit uniformity across the domain. A social network is a popular example of a densely connected, semi-structured network. It resists being captured by a one-size-fits-all schema. The flexibility of the graph model has allowed us to add new nodes and new relationships without compromising the existing network or migrating data.

Graph Databases Embrace Relationships
The graph offers a much richer picture of the network. We can see who LOVES whom (and whether that love is requited). We can see who is a COLLEAGUE_OF whom, and who is BOSS_OF them all. We can see who's off the market, because they're MARRIED_TO someone else.

Graph vs. Relational
An RDBMS is optimized for aggregated data. A graph database is optimized for highly connected data.

Graph vs. Key-Value Store
A Key-Value model is great for lookups of simple values or lists. When the values are themselves interconnected, you've got a graph.
K* represents a key, V* a value. Note that some keys point to other keys as well as plain values.
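To illustrate keys that point to other keys, the sketch below treats a dictionary as a key-value store and follows key references as if they were edges. The store contents and the function name are hypothetical.

```python
# Hypothetical store: each key maps to a list of plain values and other keys.
store = {
    "K1": ["V1", "K2"],
    "K2": ["V2", "K3"],
    "K3": ["V3", "K1"],  # points back to K1, so the references form a cycle
}


def reachable_values(key, store):
    """Collect every plain value reachable from `key` by following key references."""
    seen_keys, values, stack = set(), [], [key]
    while stack:
        current = stack.pop()
        if current in seen_keys:
            continue  # already expanded; avoids looping forever on cycles
        seen_keys.add(current)
        for item in store[current]:
            if item in store:
                stack.append(item)   # the item is itself a key: follow it
            else:
                values.append(item)  # the item is a plain value
    return sorted(values)


values = reachable_values("K1", store)
print(values)
```

The store itself knows nothing about these references; the application has to implement the traversal and the cycle handling, which is the work a graph database takes over.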

Graph vs. Document Store
The container hierarchy of a document database accommodates nice, schema-free data that can easily be represented as a tree, which is of course a graph. Refer to other documents (or document elements) within that tree and you have a more expressive representation of the same data. In a graph database, those relationships are easily navigable.
D = Document, S = Subdocument, V = Value, D2/S2 = reference to subdocument in (other) document.

How to Query the Graph Database?
Cypher: a graph query language
Pattern-matching query language
Declarative grammar with clauses (like SQL)
Aggregation, ordering, limits
Create, read, update, delete

Example Query
Who are John's friends-of-friends?
MATCH (john {name: 'John'})-[:friend]->()-[:friend]->(fof)
RETURN john, fof
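To make the pattern concrete, here is a rough Python equivalent of the two-hop match. The pattern takes one friend step to an anonymous node and a second friend step to fof, so it returns friends of friends rather than direct friends. The friendship data is a hypothetical example and the function is only a sketch of what the pattern expresses, not of how Cypher executes it.

```python
# Hypothetical directed friend relationships.
friend = {
    "John": ["Sara", "Joe"],
    "Sara": ["Maria"],
    "Joe": ["Steve"],
}


def friends_of_friends(name):
    """Follow exactly two friend relationships, one result per matched path."""
    return sorted(
        fof
        for middle in friend.get(name, [])    # (john)-[:friend]->()
        for fof in friend.get(middle, [])     # ()-[:friend]->(fof)
    )


result = friends_of_friends("John")
print(result)
```

John's direct friends, Sara and Joe, only appear as the anonymous middle node and are not part of the result.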

Entity-Relationship Model for TV Show Database

Create the Graph
// Nodes: each has a label and a name property
CREATE (himym:tvshow { name: "How I Met Your Mother" })
CREATE (himym_s1:season { name: "HIMYM Season 1" })
CREATE (himym_s1_e1:episode { name: "Pilot" })
CREATE (ted:character { name: "Ted Mosby" })
CREATE (joshradnor:actor { name: "Josh Radnor" })
// Relationships between the nodes created above
CREATE UNIQUE (joshradnor)-[:played_character]->(ted)
CREATE UNIQUE (himym)-[:has_season]->(himym_s1)
CREATE UNIQUE (himym_s1)-[:has_episode]->(himym_s1_e1)
CREATE UNIQUE (himym_s1_e1)-[:featured_character]->(ted)
// A review node, its author, and the relationships that tie both to the episode
CREATE (himym_s1_e1_review1 { title: "Meet Me At The Bar In 15 Minutes & Suit Up", content: "It was awesome" })
CREATE (wakenpayne:user { name: "WakenPayne" })
CREATE (wakenpayne)-[:wrote_review]->(himym_s1_e1_review1)<-[:HAS_REVIEW]-(himym_s1_e1)

Inside the Graph Database

Performance of Dealing with Connected Data
Relationships in a graph naturally form paths. Querying or traversing the graph involves following paths. Because of the fundamentally path-oriented nature of the data model, the majority of path-based graph database operations are highly aligned with the way in which the data is laid out, making them extremely efficient.

Performance of Dealing with Connected Data
Partner and Vukotic's experiment (described in Graph Databases, O'Reilly) seeks to find friends-of-friends in a social network, to a maximum depth of five. Given any two persons chosen at random, is there a path that connects them that is at most five relationships long? The social network contains 1,000,000 people, each with approximately 50 friends.
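The question asked in the experiment, whether two people are connected by a path of at most five relationships, can be sketched as a depth-limited breadth-first search. This is not the code used in the experiment; the function name and the tiny chain-shaped network are assumptions for illustration.

```python
from collections import deque


def connected_within(graph, start, goal, max_depth=5):
    """Return True if `goal` is reachable from `start` over at most `max_depth` edges."""
    if start == goal:
        return True
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend == goal:
                return True
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, depth + 1))
    return False


# Hypothetical network: a chain a - b - c - d - e - f - g of mutual friendships.
chain = ["a", "b", "c", "d", "e", "f", "g"]
graph = {}
for left, right in zip(chain, chain[1:]):
    graph.setdefault(left, []).append(right)
    graph.setdefault(right, []).append(left)

print(connected_within(graph, "a", "f"), connected_within(graph, "a", "g"))
```

In the chain, f is five relationships away from a and g is six, so only the first check succeeds. In the relational model each additional level of depth corresponds to another self-join, as in the friends-of-friends SQL query earlier in the lecture.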

When to Use Graph Databases?
There are two properties of graph databases you should consider when investigating graph database technologies:
The underlying storage: Some graph databases use native graph storage that is optimized and designed for storing and managing graphs. Some serialize the graph data into a relational database, an object-oriented database, or some other general-purpose data store.
The processing engine: Some definitions require that a graph database use index-free adjacency, meaning that connected nodes physically point to each other in the database.
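The difference between index-free adjacency and looking relationships up in a shared structure can be sketched as follows. This is a conceptual illustration in Python, not a description of any particular database engine; the class, function names and data are hypothetical.

```python
class Node:
    """A node that holds direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []


def friends_via_adjacency(node):
    """Follow the references stored on the node itself; no global lookup is needed."""
    return [neighbour.name for neighbour in node.neighbours]


# The same relationships kept in one global edge table instead.
edge_table = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]


def friends_via_scan(name):
    """Find neighbours by scanning (or indexing) the global edge table."""
    return [target for source, target in edge_table if source == name]


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.neighbours = [bob, carol]
bob.neighbours = [carol]

print(friends_via_adjacency(alice), friends_via_scan("alice"))
```

Both functions return the same neighbours, but the first does work proportional to the number of neighbours of one node, while the second depends on the size of the edge table or of an index built over it.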

Acknowledgement - 1
The course material used for this class is mostly taken and/or adapted* from the course materials of the Big Data class given by Nesime Tatbul and Donald Kossmann at ETH Zurich (http://www.systems.ethz.ch/).
(*) The original course material has been reduced somewhat to fit the needs of BBM467. Therefore, the original slides were not used as they are.

Acknowledgement - 2
Some material used for this lecture is taken and/or adapted from:
http://docs.neo4j.org/chunked/milestone/index.html
Max De Marzi
Michael Hunger from Neo Technology