Graph Database Performance: An Oracle Perspective



Similar documents
Mining Big Data with RDF Graph Technology:

Oracle Spatial and Graph

How To Use An Orgode Database With A Graph Graph (Robert Kramer)

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

Oracle Graph: Graph Features of Oracle Database

Application of OASIS Integrated Collaboration Object Model (ICOM) with Oracle Database 11g Semantic Technologies

Geospatial Technology Innovations and Convergence

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Smart Cities require Geospatial Data Providing services to citizens, enterprises, visitors...

Introduction to Ontologies

Geospatial Platforms For Enabling Workflows

Geospatial Platforms For Enabling Workflows

Oracle Big Data SQL Technical Update

Big Data for Official Statistics Processing Big and Fast Data Optimizing Results with a Multi-Model Database

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

Semantic and Data Mining Technologies. Simon See, Ph.D.,

bigdata Managing Scale in Ontological Systems

Oracle Spatial and Graph. Jayant Sharma Director, Product Management

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

A collaborative platform for knowledge management

Comparison of Triple Stores

Deploying a Geospatial Cloud

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Bigdata : Enabling the Semantic Web at Web Scale

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution

Oracle Database 12c Plug In. Switch On. Get SMART.

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

Big Data Technologies Compared June 2014

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management

Benchmarking the Performance of Storage Systems that expose SPARQL Endpoints

Oracle Database Public Cloud Services

The Semantic Web for Application Developers. Oracle New England Development Center Zhe Wu, Ph.D. 1

Safe Harbor Statement

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

TUT NoSQL Seminar (Oracle) Big Data

LDIF - Linked Data Integration Framework

Oracle Database 11g Comparison Chart

Oracle Database In-Memory The Next Big Thing

Introduction to Oracle Business Intelligence Standard Edition One. Mike Donohue Senior Manager, Product Management Oracle Business Intelligence

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

<Insert Picture Here> Oracle Database Directions Fred Louis Principal Sales Consultant Ohio Valley Region

Federated Query Processing over Linked Data

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Publishing Linked Data Requires More than Just Using a Tool

Oracle Big Data Building A Big Data Management System

SUN ORACLE EXADATA STORAGE SERVER

Oracle Business Intelligence EE. Prab h akar A lu ri

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

HadoopRDF : A Scalable RDF Data Analysis System


Cost-Effective Business Intelligence with Red Hat and Open Source

Inge Os Sales Consulting Manager Oracle Norway

Ampersand and the Semantic Web

Data Store Interface Design and Implementation

Semantic Stored Procedures Programming Environment and performance analysis

2009 Oracle Corporation 1

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015

City Data Pipeline. A System for Making Open Data Useful for Cities. stefan.bischof@tuwien.ac.at

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

An Oracle White Paper February Managing Unstructured Data with Oracle Database 11g

Safe Harbor Statement

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

Semantic Modeling with RDF. DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo

SQL Server 2012 Business Intelligence Boot Camp

! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I)

Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

1st Threat and Risk Information Exchange Day

Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich

Main Memory Data Warehouses

Getting Started with Oracle Data Miner 11g R2. Brendan Tierney

The Ontological Approach for SIEM Data Repository

Is there any alternative to Exadata X5? March 2015

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

RDF Support in Oracle Oracle USA Inc.

SQL Server 2005 Features Comparison

Introducing Oracle Exalytics In-Memory Machine

ORACLE DATABASE 10G ENTERPRISE EDITION

Linked Open Data Infrastructure for Public Sector Information: Example from Serbia

Integrating Open Sources and Relational Data with SPARQL

<Insert Picture Here> Oracle Retail Data Model Overview

SUN ORACLE DATABASE MACHINE

Using OBIEE for Location-Aware Predictive Analytics

Emerging Geospatial Trends The Convergence of Technologies. Jim Steiner Vice President, Product Management

Database Scalability and Oracle 12c

Accelerating and Simplifying Apache

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

<Insert Picture Here> Oracle SQL Developer 3.0: Overview and New Features

Constructing a Data Lake: Hadoop and Oracle Database United!

Transcription:

Graph Database Performance: An Oracle Perspective Xavier Lopez, Ph.D. Senior Director, Product Management 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Program Agenda Broad Perspective on Performance Graph Technology Enhancements at Oracle Performance: Database 11g Concluding Topics / Discussion 2 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

A Broad Perspective on Performance & Features Hardware: - Microprocessor-specific graph optimizations - Disc based storage Database: - RDF and NDM graph models, SPARQL language, optimizer, query engine, text search Big Data Appliance: - RDF for NoSQL; HBase Middleware: - Jena,Sesame adapters; Protégé plug-in, Cytoscape plug-in, graph API Tools / Applications: - Oracle Business Intelligence, BPMN 3 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Oracle Spatial and Graph option Two Graph Data Models Network Data Model graph Manages logical / spatial networks in database Persists link/node structure, connectivity and direction Supports constraints at link and node level Logically partitioning network graphs for scalability RDF Semantic graph (triple store) Enterprise class RDF Graph Database Scales to petabytes of triples by exploiting Exadata, RAC, SQL*Loader, Parallelism, Label Security W3C standards support SQL, PL/SQL APIs and Java APIs (Jena/Sesame) 4 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

RDF Graph (Triple Store) Use Cases Semantic Metadata Layer Text Mining & Entity Analytics Social Media Analysis Unified content metadata for federated resources Validate semantic and structural consistency Find related content & relations by navigating connected entities Reason across entities Analyze social relations using curated metadata - Blogs, wikis, video - Calendars, IM, voice 5 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Metadata driving Federation & Integration Domain applications Domain Vocabularies (integrated graph metadata) Index Data Servers Lab/clinical Research Content Mgmt Billings/Claims Reporting/BI Data Sources / Data Types Social Media Medical Devices Lab Information Systems Subscription Services Legacy Records 6 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Industries Have Already Adopted the Concept Industries Life Sciences Finance Media / Publishing Networks & Communications Defense & Intelligence Public Sector Hutchinson 3G Austria Thomson Reuters 7 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

RDF Semantic Graph Technologies Partners: Integrated Tools and Solution Providers Ontology Engineering Reasoners TrOWL NLP Entity Extractors Open Source Frameworks Standards Joseki Sesame Applications & Tools SI / Consulting 8 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

RDF DATABASE FEATURES 9 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Oracle Database 11g RDF Triple Store Scalable to billions of triples RAC & Exadata scalability Compression & partitioning SQL*Loader direct path load Parallel load, inference, query High Availability Triple-level label security Choice of SPARQL or SQL Native inference engine Growing ecosystem of 3 rd party tools Load / Storage Query Reasoning Key Capabilities: Native RDF graph data store Manages billions of triples Optimized storage architecture SPARQL-Jena/Joseki, Sesame SQL/graph query, b-tree indexing Ontology assisted SQL query RDFS, OWL2 RL, EL+, SKOS User-defined SWRL-like rules Incremental, parallel reasoning Plug-in architecture 10 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

"THE FOLLOWING IS INTENDED TO OUTLINE OUR GENERAL PRODUCT DIRECTION. IT IS INTENDED FOR INFORMATION PURPOSES ONLY, AND MAY NOT BE INCORPORATED INTO ANY CONTRACT. IT IS NOT A COMMITMENT TO DELIVER ANY MATERIAL, CODE, OR FUNCTIONALITY, AND SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISION. THE DEVELOPMENT, RELEASE, AND TIMING OF ANY FEATURES OR FUNCTIONALITY DESCRIBED FOR ORACLE'S PRODUCTS REMAINS AT THE SOLE DISCRETION OF ORACLE." 11 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

New functions in Oracle Database Release 12.1 Native SPARQL 1.1 query support 40+ new query functions/operators: IF, COALESCE, STRBEFORE, REPLACE, ABS, Aggregates: COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, SAMPLE Sub-queries Value Assignment: BIND, GROUP BY Expressions, SELECT Expressions Negation: NOT EXISTS, MINUS Improved Path Searching with Property Paths GeoSPARQL Support - Leverages native spatial database feature in Oracle - Provide foundation for qualitative spatial reasoning 12 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

New functions in Oracle Database Release 12.1 RDF views on relational tables (through W3C RDB2RDF) RDF views can be created on a set of relational tables and/or views SPARQL queries access data from both a relational and RDF store Allows filtering of data in a relational store based upon ontology Support RDF view creation using Direct Mapping: simple and straightforward to use R2RML Mapping: customizations allowed 13 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

New functions in Oracle Database Release 12.1 Inference Native OWL 2 EL inference support Useful for expressing large biomedical ontologies (SNOMED CT) User defined inferencing Allows generation of new RDF resources Temporal reasoning, Spatial reasoning Ladder Based Inference Fine grained security for inference graph Performance optimization for user defined rules Integration with TrOWL*, an external OWL 2 reasoner TrOWL is a transformation based, tractable reasoner for OWL 2 Pellet was supported in 11g 14 Copyright 2012, Oracle and/or its affiliates. All rights reserved. * http://trowl.eu/

RDF & SPARQL for Oracle NoSQL Database RDF Graph Feature for NoSQL RDF support in Oracle NoSQL Database Enterprise Edition High performance Key Value store Standard access to graph data: SPARQL 1.1 Jena & Joseki SPARQL endpoint Web Services Massive horizontal scalability petabytes of triples Support for World Wide Web Consortium (W3C) Semantic Web standards 15 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

When to Consider a NoSQL Database For horizontal scalability, lower query latency/cost, ease of install & management RDF Graph Feature for NoSQL Scale-out requirements High volume, simple queries Queries aggregating over most of the graph (e.g. what are the hobbies of the 100 most popular people in the network) Frequent, large-scale updates Open Linked Data applications 16 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

LUBM PERFORMANCE ORACLE DATABASE 11G - LOAD - INFERENCE - QUERY 17 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Oracle Spatial and Graph - LUBM 200K on 3-Node RAC Sun Server X2-4 Load Performance Data Set Quads Loaded Time Degrees of Parallelism LUBM200K Load into Staging Table: Load into the RDF graph: 27.4 billion Quads (with duplicates) 26.6 billion Quads (unique quads) 2 hrs 6 min. 22 hrs 23 min. DOP = 66 DOP = 80 Data loading included de-duplication and building of two indexes on the quads. A significant portion (11 hrs 18 minutes) of the total load time was spent in building the two indexes. Loading from the 198 compressed N-Quad formatted files was done by defining an External Table (with gunzip preprocessor) on those files and then using sem_apis.load_into_staging_table Load flags => parse mbv_method=shadow parallel=80 parallel_create_index DEL_BATCH_DUPS=USE_INSERT Setup: Hardware: Sun Server X2-4, 3-node RAC - Each node configured with 1TB RAM, 4 CPU 2.4GHz 10-Core Intel E7-4870) - Storage: Dual Node 7420, both heads configured as: Sun ZFS Storage 7420 4 CPU 2.00GHz 8-Core (Intel E7-4820) 256G Memory 4x SSD SATA2 512G (READZ) 2x SATA 500G 10K. Four disk trays with 20 x 900GB disks @10Krpm, 4x SSD 73GB (WRITEZ) Software: Oracle Database 11.2.0.3.0, SGA_TARGET=750G and PGA_AGGREGATE_TARGET=200G Note: Only one node in this RAC was used for performance test. Test performed in April 2013. 18 Copyright 2012, 2011, Oracle and/or its affiliates. All rights reserved. reserved. Insert Information Protection Policy Classification from Slide 8

Oracle Spatial and Graph - LUBM 200K on 3-Node RAC Sun Server X2-4 Inference Performance Data Set (# quads) Quads Inferred Time Degrees of Parallelism LUBM 200K (27.4B) 21.4 billion 17 hrs 56 min. DOP = 80 Inference included building 2 indexes on the inferred triples that took a little over 5 hrs. Inference Semantics: OWLPrime + the following components: INTERSECT, INTERSECTSCOH, SVFH, THINGH, THINGSAM, UNION Inference Options: RAW8=T, Dynamic Sampling level 1 Setup: Hardware: Sun Server X2-4, 3-node RAC - Each node configured with 1TB RAM, 4 CPU 2.4GHz 10-Core Intel E7-4870) - Storage: Dual Node 7420, both heads configured as: Sun ZFS Storage 7420 4 CPU 2.00GHz 8-Core (Intel E7-4820) 256G Memory 4x SSD SATA2 512G (READZ) 2x SATA 500G 10K. Four disk trays with 20 x 900GB disks @10Krpm, 4x SSD 73GB (WRITEZ) Software: Oracle Database 11.2.0.3.0, SGA_TARGET=850G and PGA_AGGREGATE_TARGET=150G Note: Only one node in this RAC was used for performance test. Test performed in April 2013. 19 Copyright 2012, 2011, Oracle and/or its affiliates. All rights reserved. reserved. Insert Information Protection Policy Classification from Slide 8

Oracle Spatial and Graph - LUBM 200K on 3-Node RAC Sun Server X2-4 Query Performance Ontology LUBM 200K 48B quads 27.4 billion asserted quads 26.6 billion inferred quads LUBM Benchmark Queries Query Q1 Q2 Q3 Q4 Q5 Q6 Q7 OWLPrime & new inference components # answers 4 494.5M 6 34 719 2.067B 67 Time (sec) 0.01 1160 0.01 609.22 0.04 1105.07 712.48 Query Q8 Q9 Q10 Q11 Q12 Q13 Q14 # answers 7790 53.86M 4 224 15 926088 1.568B Time (sec) 1228.95 3139.28 0.01 0.01 1.2 208.88 946.01 DOP = 40, Dynamic sampling level = 6. 4.18 Billion answers generated in 2.53 hrs on a single node. Setup: Hardware: Sun Server X2-4, 3-node RAC - Each node configured with 1TB RAM, 4 CPU 2.4GHz 10-Core Intel E7-4870) - Storage: Dual Node 7420, both heads configured as: Sun ZFS Storage 7420 4 CPU 2.00GHz 8-Core (Intel E7-4820) 256G Memory 4x SSD SATA2 512G (READZ) 2x SATA 500G 10K. Four disk trays with 20 x 900GB disks @10Krpm, 4x SSD 73GB (WRITEZ) Software: Oracle Database 11.2.0.3.0, SGA_TARGET=850G and PGA_AGGREGATE_TARGET=150G Note: Only one node in this RAC was used for performance test. Test performed in April 2013. 20 Copyright 2012, 2011, Oracle and/or its affiliates. All rights reserved. reserved. Insert Information Protection Policy Classification from Slide 8

USING RDF GRAPHS FOR MINING SOCIAL MEDIA 21 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Oracle In-Database Analytics Platform R Enterprise Data Mining Spatial Analytics Graph Analytics Text Analytics Temporal Analytics Parallel Processing Engine XML Relational OLAP Spatial Data Layer Graph RDF Media 22 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Tools: Discovery & Predictive Analysis Oracle Data Mining Anomaly Detection Problem Classification Sample Problem Given demographic data about a set of customers, identify customer purchasing behavior that is significantly different from the norm Association Rules Find the items that tend to be purchased together and specify their relationship market basket analysis Clustering Segment demographic data into clusters and rank the probability that an individual will belong to a given cluster Feature Extraction Given demographic data about a set of customers, group the attributes into general characteristics of the customers F1 F2 F3 F4 23 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Finance Data: Visualizing RDF in OBIEE 24 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Charting RDF data: Oracle R Graphics 25 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Charting RDF data: Oracle R Graphics (2) 26 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

27 Copyright 2012, Oracle and/or its affiliates. All rights reserved. CONCLUDING DISCUSSION TOPICS

Some topics to consider Excellent work identifying customer RDF pain-points!! Challenge: translating to repeatable database benchmarks Pre-processing, loading, inferencing, querying Keep options open for explanatory benchmarks Hardware, database, middleware, applications Better definition of graph models LDBC is evaluating RDF and graph models. Please define each carefully Distinguishing the two graphs via best practices and use cases might be useful 28 Copyright 2012, Oracle and/or its affiliates. All rights reserved.