NoSQL Drill-Down: So What s a Graph Database? NoCOUG Aug 2013. Philip Rathle Sr. Director of Products for Neo4j philip@neotechnology.



Similar documents
Graph Databases Use Cases

The World s Leading Graph Database

NOSQL, BIG DATA AND GRAPHS. Technology Choices for Today s Mission- Critical Applications

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

GRAPH DATABASE SYSTEMS. h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe

Oracle Big Data SQL Technical Update

Introduction to NOSQL

Lecture Data Warehouse Systems

REAL-TIME BIG DATA ANALYTICS

NoSQL Databases. Polyglot Persistence

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

InfiniteGraph: The Distributed Graph Database

Study concluded that success rate for penetration from outside threats higher in corporate data centers

How graph databases started the multi-model revolution

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL Databases. Nikos Parlavantzas

Big Data Analytics. Rasoul Karimi

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 30,

Choosing The Right Big Data Tools For The Job A Polyglot Approach

Sentimental Analysis using Hadoop Phase 2: Week 2

Databases : Lecture 11 : Beyond ACID/Relational databases Timothy G. Griffin Lent Term Apologies to Martin Fowler ( NoSQL Distilled )

So What s the Big Deal?

NoSQL and Graph Database

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

BIG DATA TOOLS. Top 10 open source technologies for Big Data

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

NoSQL Database Options

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 2,

Cloud Scale Distributed Data Storage. Jürmo Mehine

Data Services Advisory

Kaseya Traverse. Kaseya Product Brief. Predictive SLA Management and Monitoring. Kaseya Traverse. Service Containers and Views

Open Source Technologies on Microsoft Azure

Structured Data Storage

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Querying MongoDB without programming using FUNQL

Introduction in Graph Databases and Neo4j

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

MySQL Enterprise Monitor

How To Scale Out Of A Nosql Database

Enterprise Operational SQL on Hadoop Trafodion Overview

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 16,

White Paper: Big Data and the hype around IoT

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

How To Handle Big Data With A Data Scientist

Data Warehousing in the Age of Big Data

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

DCIM Software and IT Service Management - Perfect Together DCIM: The Physical Heart of ITSM

Informatica Best Practice Guide for Salesforce Wave Integration: Building a Global View of Top Customers

Oracle Big Data Strategy Simplified Infrastrcuture

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Cisco Prime Optical. Overview

Introduction to Big Data Training

Challenges for Data Driven Systems

Performance and Scalability Overview

Understanding Neo4j Scalability

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Eliminating Complexity to Ensure Fastest Time to Big Data Value

MEAP Edition Manning Early Access Program Neo4j in Action MEAP version 3

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

NoSQL Data Base Basics

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

SAP HANA SPS 09 - What s New? HANA IM Services: SDI and SDQ

Eliminating Complexity to Ensure Fastest Time to Big Data Value

End to End Microsoft BI with SQL 2008 R2 and SharePoint 2010

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel Jaykrushna Patel

Data Integration Checklist

The Quest for Extreme Scalability

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

Solution Guide for Citrix NetScaler and Cisco APIC EM

Speed, scale, query: can NoSQL give us all three? Arun Matthew Couchbase

Master in Information Technology for Business Intelligence. Subject: Advanced Databases. Project Name: (NEO4J)-[:IS A]->( GRAPH DATABASE)

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

MongoDB Developer and Administrator Certification Course Agenda

GigaSpaces Real-Time Analytics for Big Data

MDM and Data Warehousing Complement Each Other

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015

Best Practices from Deployments of Oracle Enterprise Operations Monitor

NoSQL: Going Beyond Structured Data and RDBMS

Oracle Data Integrator 11g New Features & OBIEE Integration. Presented by: Arun K. Chaturvedi Business Intelligence Consultant/Architect

SAP IT Infrastructure Management

2015 Analyst and Advisor Summit. Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

How To Write A Database Program

The Impact of PaaS on Business Transformation

XpoLog Center Suite Data Sheet

Big Data Analytics in LinkedIn. Danielle Aring & William Merritt

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

Cloudera Enterprise Data Hub in Telecom:

Luncheon Webinar Series May 13, 2013

Big Data and the Cloud Trends, Applications, and Training

Ganzheitliches Datenmanagement

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

Transcription:

NoSQL Drill-Down: So What s a Graph Database? NoCOUG Aug 2013 Philip Rathle Sr. Director of Products for Neo4j philip@neotechnology.com @prathle

143 Philip 143 326 326 725 Big Data Fremont Neo4j San Francisco 143 725 143 981 981 NoCOUG Member Member_Group Group

143 Philip 143 326 326 725 Big Data Fremont Neo4j San Francisco 143 725 143 981 981 NoCOUG

A Property Graph Nodes member uid: FRE where: Big Data Fremont uid: PPR name: Philip member uid: SFO where: Neo4j San Francisco member uid: BOS where: NoCOUG Relationships

Domain-Driven Design Perspective Put forth by Martin Fowler... Aggregate oriented Graph mongo Source: NoSQL Distilled 7

NOSQL - 4 Categories ABK name:andreas, UID: ABK, Photo: B75DD108A893A MEMBER since: 2011 NI name:neo4j India, UID: PNQ, Photo: 218758D88E901 Graph DB Neo4j 0x235C 0xCD21 0x235C 0xCD21 {name:andreas, UID: ABK, Groups: [PNQ,SFO,BOS]} {name:neo4j India, UID: PNQ, Members:[ABK,RB,NL], where:{city:pune, Country: India}} Name UID Members Groups Photo Andreas ABK PNQ, SFO, BOS Neo4j India NI ABK,RB, NL B75DD108A893A 218758D88E901 Document DB MongoDB CouchDB Column Family HBase Cassandra 0x235C 0xCD21 0x2014 0x3821 0x3890 Andreas Neo4j India [ABK,RB,NL] [PNQ, SFO, BOS] B75DD108A Oracle NoSQL Riak Redis Key-Value

Graph data model name: James age: 32 twitter: @spam LIVES WITH LOVES LOVES name: Mary age: 35 OWNS property type: car DRIVES brand: Volvo model: V70

Graph data model type: Mobile model: iphone 5 IMEI: 99 000107 765315 1 AUTHORIZES AUTHENTICATES TRANSMITS_DATA type: BTS Height_m: 9.8 Power: 400A Backup_Generator: Y ASSIGNED_TO device type: Test CONNECTS_TO type: Trunk mbps_capacity:5000 type: Central Office CLLI Code: PTLEORTEDS0

RECRUITED Morpheus RECRUITED Neo Trinity MATCH (a) -[:RECRUITED]-> (b) WHERE a.name = Morpheus RETURN b.name AS recruit Neo Technology, Inc Confidential +---------+ recruit +---------+ Neo Trinity +---------+ 2 rows

Facebook Graph Search Example

MATCH (me:person)-[:is_friend_of]->(friend), (friend)-[:likes]->(restaurant), (restaurant)-[:located_in]->(city:location), (restaurant)-[:serves]->(cuisine:cuisine) WHERE me.name = 'Philip' AND city.location='new York' AND cuisine.cuisine='sushi' RETURN restaurant.name http://maxdemarzi.com/?s=facebook * Cypher query language example

MATCH (me:person)-[:is_friend_of]->(friend)-[:likes]->(restaurant)- [:LOCATED_IN]->(city:Location), (restaurant)-[:serves]->(cuisine:cuisine) WHERE me.name = 'Philip' AND city.location='new York' AND cuisine.cuisine='sushi' RETURN restaurant.name http://maxdemarzi.com/?s=facebook * Cypher query language example

A Few Uses of Graphs in Industry Organizational Hierarchy (Actual Neo4j Graphs) Social Networks Customer & Employee Product Subscriptions CMDB (Network Inventory) Neo Technology, Inc Confidential

A Few Uses of Graphs in Industry Entitlements & Identity Management (Actual Neo4j Graphs) Insurance Risk Analysis Geo Routing (Public Transport) Network Cell Analysis Network Asset Management BioInformatics Neo Technology, Inc Confidential

What is a Graph Database A graph database... is an online database management system with CRUD methods that expose a graph data model 1 Two important properties: Native graph storage engine: written from the ground up to manage graph data Native graph processing, including index-free adjacency to facilitate traversals 1] Robinson, Webber, Eifrem. Graph Databases. O Reilly, 2013. p. 5. ISBN-10: 1449356265

Neo4j: Key Features 1. Property Graph Model Nodes & relationships each comprise a set of key-value pairs In this respect, relationships & nodes have equal status 2. Supports Native Graph Processing Supports Index-Free Adjacency, meaning that traversals are done through pointer chasing vs. indexed lookups Has its own native graph query language 3. Uses Native Graph Storage Built from ground up to support graph 4. Fully ACID*. All writes use transactions. XA support *Tunable consistency across cluster instances (eventual to strong) 5. High Availability Features incl. Clustering & Online Backups

Top Reasons People Use Graph Databases 1. Lots of Joins. 2. Continuously evolving data set (often involves wide and sparse tables) 3. The Shape of the Domain is naturally a graph 4. Open-ended business requirements necessitating fast, iterative development. Neo Technology, Inc Confidential

Graph Databases are Designed to: 1. Store inter-connected data 2. Make it easy to make sense of that data 3. Enable extreme-performance operations for: Discovery of connected data patterns Relatedness queries > depth 1 Relatedness queries of arbitrary length 4. Make it easy to evolve the database Neo Technology, Inc Confidential

Graph Database Deployment End User Ad-hoc visual navigation & discovery Graph- Dashboards & Ad-hoc Analysis Reporting Graph Visualization Application Other Databases Bulk Analytic Infrastructure (e.g. Graph Compute Engine) Graph Mining & Aggregation ETL Ad-Hoc Analysis Graph Database Cluster Data Storage & Business Rules Execution ETL Data Scientist

Network Management Example

Industry: Communications Use case: Network Management Paris, France Background Second largest communications company in France Part of Vivendi Group, partnering with Vodafone Router Router DEPENDS_ON LINKED DEPENDS_ON Switch DEPENDS_ON Service LINKED DEPENDS_ON Switch DEPENDS_ON DEPENDS_ON DEPENDS_ON DEPENDS_ON LINKED DEPENDS_ON DEPENDS_ON Fiber Link Fiber Link Fiber Link Oceanfloor Cable DEPENDS_ON Business problem Infrastructure maintenance took one full week to plan, because of the need to model network impacts Needed rapid, automated what if analysis to ensure resilience during unplanned network outages Identify weaknesses in the network to uncover the need for additional redundancy Network information spread across > 30 systems, with daily changes to network infrastructure Business needs sometimes changed very rapidly Solution & Benefits Flexible network inventory management system, to support modeling, aggregation & troubleshooting Single source of truth (Neo4j) representing the entire network Dynamic system loads data from 30+ systems, and allows new applications to access network data Modeling efforts greatly reduced because of the near 1:1 mapping between the real world and the graph Flexible schema highly adaptable to changing business requirements Neo Technology, Inc Confidential

Industry: Web/ISV, Communications Use case: Network Management Global (U.S., France) Background World s largest provider of IT infrastructure, software & services HP s Unified Correlation Analyzer (UCA) application is a key application inside HP s OSS Assurance portfolio Carrier-class resource & service management, problem determination, root cause & service impact analysis Helps communications operators manage large, complex and fast changing networks Business problem Use network topology information to identify root problems causes on the network Simplify alarm handling by human operators Automate handling of certain types of alarms Help operators respond rapidly to network issues Filter/group/eliminate redundant Network Management System alarms by event correlation Solution & Benefits Accelerated product development time Extremely fast querying of network topology Graph representation a perfect domain fit 24x7 carrier-grade reliability with Neo4j HA clustering Met objective in under 6 months

Practical Cypher Network Management - Create CREATE! (crm {name:"crm"}),! (dbvm {name:"database VM"}),! (www {name:"public Website"}),! (wwwvm {name:"webserver VM"}),! (srv1 {name:"server 1"}),! (san {name:"san"}),! (srv2 {name:"server 2"}),! (crm)-[:depends_on]->(dbvm),! (dbvm)-[:depends_on]->(srv2),! (srv2)-[:depends_on]->(san),! (www)-[:depends_on]->(dbvm),! (www)-[:depends_on]->(wwwvm),! (wwwvm)-[:depends_on]->(srv1),! (srv1)-[:depends_on]->(san) Neo Technology, Inc Confidential

Practical Cypher Network Management - Impact Analysis // Server 1 Outage MATCH (n)<-[:depends_on*]-(upstream) WHERE n.name = "Server 1" RETURN upstream upstream {name:"webserver VM"} {name:"public Website"} Neo Technology, Inc Confidential

Practical Cypher Network Management - Dependency Analysis // Public website dependencies MATCH (n)-[:depends_on*]->(downstream) WHERE n.name = "Public Website" RETURN downstream downstream {name:"database VM"} {name:"server 2"} {name:"san"} {name:"webserver VM"} {name:"server 1"} Neo Technology, Inc Confidential

Practical Cypher Network Management - Statistics // Most depended on component MATCH (n)<-[:depends_on*]-(dependent) RETURN n, count(distinct dependent) AS dependents ORDER BY dependents DESC LIMIT 1 n dependents {name:"san"} 6 Neo Technology, Inc Confidential

Social Example

Practical Cypher Social Graph - Create CREATE! (joe:person {name:"joe"}),! (bob:person {name:"bob"}),! (sally:person {name:"sally"}),! (anna:person {name:"anna"}),! (jim:person {name:"jim"}),! (mike:person {name:"mike"}),! (billy:person {name:"billy"}),!! (joe)-[:knows]->(bob),! (joe)-[:knows]->(sally),! (bob)-[:knows]->(sally),! (sally)-[:knows]->(anna),! (anna)-[:knows]->(jim),! (anna)-[:knows]->(mike),! (jim)-[:knows]->(mike),! (jim)-[:knows]->(billy) Neo Technology, Inc Confidential

Practical Cypher Social Graph - Shortest Path MATCH path = shortestpath( (person1)-[:knows*..6]-(person2) ) WHERE person1.name = "Joe"! AND person2.name = "Billy" RETURN path path {start:"13759", nodes:["13759","13757","13756","13755","13753"], length:4, relationships:["101407","101409","101410","101413"], end:"13753"} Neo Technology, Inc Confidential

Practical Cypher Social Graph - Friends of Joe's Friends MATCH (person)-[:knows]-(friend), (friend)-[:knows]-(foaf) WHERE person.name = "Joe" AND NOT(person-[:KNOWS]-foaf) RETURN foaf foaf {name:"anna"} Neo Technology, Inc Confidential

Practical Cypher Social Graph - Common Friends MATCH (person1)-[:knows]-(friend), (person2)-[:knows]-(friend) WHERE person1.name = "Joe" AND person2.name = "Sally" RETURN friend friend {name:"bob"} Neo Technology, Inc Confidential

Industry: Online Job Search Use case: Social / Recommendations Sausalito, CA Background Online jobs and career community, providing anonymized inside information to job seekers KNOWS Person KNOWS WORKS_AT Company Person KNOWS Person WORKS_AT Company Business problem Wanted to leverage known fact that most jobs are found through personal & professional connections Needed to rely on an existing source of social network data. Facebook was the ideal choice. End users needed to get instant gratification Aiming to have the best job search service, in a very competitive market Solution & Benefits First-to-market with a product that let users find jobs through their network of Facebook friends Job recommendations served real-time from Neo4j Individual Facebook graphs imported real-time into Neo4j Glassdoor now stores > 50% of the entire Facebook social graph Neo4j cluster has grown seamlessly, with new instances being brought online as graph size and load have increased Neo Technology Confidential

Industry: Professional Social Network Use case: Social, Recommendations Silicon Valley & France Background World s second-largest professional network (after LinkedIn) 50M members. 30K+ new members daily. Over 400 staff with offices in 12 countries Business problem Business imperative for real-time recommendations: to attract new users and retain existing ones Key differentiator: show members how they are connected to any other member Real-time traversals of social graph not feasible with MySQL cluster. Batch precompute meant stale data. Process taking longer & longer: > 1 week! Solution & Benefits Neo4j solution implemented in 8 weeks with 3 parttime programmers Able to move from batch to real-time: improved responsiveness with up-to-date data. Viadeo (at the time) had 8M members and 35M relationships. Neo4j cluster now sits at the heart of Viadeo s professional network, connecting 50M+ professionals Neo Technology, Inc Confidential

Recommended Reading & Next Steps for Learning About Graphs... Use Promo Code NOCOUG for 50% Off!! Innovate. Share. Connect. San Francisco October 3-4 www.graphconnect.com (graphs)-[:are]->(everywhere) Get the free ebook! www.graphdatabases.com www.neo4j.org