Taming Big Data Variety with Semantic Graph Databases. Evren Sirin, CTO, Complexible




About Complexible: semantic technology leader since 2006 (née Clark & Parsia); software and consulting; W3C leadership; offices in DC and Boston; launched Stardog 1.0 in 2012; currently raising an A round.

Big Data Vs: Volume, Velocity, Variety, Veracity, Volatility, Value.

Data diversity is the real challenge, based on a Paradigm4 survey of more than 100 data scientists (http://www.paradigm4.com/infographic2014/).

Data variety: syntax (formats), structure (schemas), identity (entities). Image: https://www.flickr.com/photos/designmilk/8552219138

In large and complex enterprises with lots of data, most analytic challenges can be reduced to data integration challenges.

Data integration approaches (chart of integrated data against integration effort, with labels for data warehouses, data lakes, and a sweet spot).

What is a unified data model? A global, coherent view over heterogeneous data that is flexible and composable, sits at the right level of abstraction, and enables automated processing and analysis.

Data models: tables, trees, graphs. Image: https://commons.wikimedia.org/wiki/file:social_network_analysis_visualization.png

Graphs are everywhere: Knowledge Graph, Open Graph, Linked Open Data.

Why graphs? A generic data representation model; it utilizes the connectedness of the data; it is flexible and extensible, and easy to compose and connect; and there is an increasing number of graph database offerings.
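The composability point can be made concrete with a minimal sketch in plain Python (no RDF library). It assumes two hypothetical sources that already use the same IRIs for the same entities. Under that assumption, combining them is just a set union of triples, and a query can then hop across both. All IRIs here are illustrative, including the worksFor property. A real RDF merge must also rename blank nodes apart, which this sketch ignores.

```python
# Illustrative IRIs (assumptions, not from any real dataset)
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
WORKS_FOR = "http://example.org/worksFor"  # hypothetical property
BOB = "http://example.org/bob#me"
ACME = "http://example.org/acme"

# Two hypothetical sources, each a set of (subject, predicate, object) triples
crm = {(BOB, FOAF_NAME, "Bob"), (BOB, WORKS_FOR, ACME)}
hr = {(ACME, FOAF_NAME, "ACME"), (BOB, FOAF_NAME, "Bob")}  # one duplicate fact

# Composing graphs is set union: shared IRIs connect the sources,
# and the duplicate triple collapses automatically.
merged = crm | hr


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Two-hop query across both sources: the name of Bob's employer
employer_names = {
    name
    for org in objects(merged, BOB, WORKS_FOR)   # fact from crm
    for name in objects(merged, org, FOAF_NAME)  # fact from hr
}
print(len(merged))     # 3
print(employer_names)  # {'ACME'}
```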


Semantic graphs = RDF graphs: meaning is defined in an explicit, machine-processable way.

Abstract graph (figure source: http://www.w3.org/tr/rdf11-primer/)

RDF graph (figure source: http://www.w3.org/tr/rdf11-primer/)

RDF serialization (Turtle, example from http://www.w3.org/tr/rdf11-primer/):

    BASE <http://example.org/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX schema: <http://schema.org/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX wd: <http://www.wikidata.org/entity/>

    <bob#me>
        a foaf:Person ;
        foaf:knows <alice#me> ;
        schema:birthDate "1990-07-04"^^xsd:date ;
        foaf:topic_interest wd:Q12418 .

    wd:Q12418
        dcterms:title "Mona Lisa" ;
        dcterms:creator <http://dbpedia.org/resource/Leonardo_da_Vinci> .

    <http://data.europeana.eu/item/04802/243fa8618938f4117025f17a8b813c5f9aa4d619>
        dcterms:subject wd:Q12418 .

SPARQL query (people born before 1991 who are interested in a work by Leonardo da Vinci):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX schema: <http://schema.org/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX dbpedia: <http://dbpedia.org/resource/>

    SELECT ?person ?title
    WHERE {
      ?person a foaf:Person ;
              schema:birthDate ?birthdate ;
              foaf:topic_interest ?interest .
      ?interest dcterms:title ?title ;
                dcterms:creator dbpedia:Leonardo_da_Vinci .
      FILTER (?birthdate < "1991-01-01"^^xsd:date)
    }
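To show what a SPARQL engine does with the WHERE clause above, here is a minimal plain-Python sketch of basic graph pattern matching. It evaluates the query against the example data. Each triple pattern is matched against every triple, and variable bindings are joined left to right in a naive nested-loop join. The FILTER is then applied to the surviving bindings. Some simplifications are assumptions of this sketch: prefixed names and IRIs are plain strings, literals are not typed (except dates), and `a` is written out as `rdf:type`. Real engines use indexes and a query planner instead.

```python
from datetime import date

# The Turtle example as (subject, predicate, object) tuples.
# Prefixed names are kept as plain strings (a simplification).
graph = [
    ("<bob#me>", "rdf:type", "foaf:Person"),
    ("<bob#me>", "foaf:knows", "<alice#me>"),
    ("<bob#me>", "schema:birthDate", date(1990, 7, 4)),
    ("<bob#me>", "foaf:topic_interest", "wd:Q12418"),
    ("wd:Q12418", "dcterms:title", "Mona Lisa"),
    ("wd:Q12418", "dcterms:creator", "dbpedia:Leonardo_da_Vinci"),
    ("<europeana-item>", "dcterms:subject", "wd:Q12418"),
]

# The query's triple patterns; terms starting with "?" are variables
patterns = [
    ("?person", "rdf:type", "foaf:Person"),  # "a" is shorthand for rdf:type
    ("?person", "schema:birthDate", "?birthdate"),
    ("?person", "foaf:topic_interest", "?interest"),
    ("?interest", "dcterms:title", "?title"),
    ("?interest", "dcterms:creator", "dbpedia:Leonardo_da_Vinci"),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to something else
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate(patterns, graph):
    """Join triple patterns left to right with a nested-loop join."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                extended = match(pattern, triple, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


results = [
    (b["?person"], b["?title"])
    for b in evaluate(patterns, graph)
    if b["?birthdate"] < date(1991, 1, 1)  # the FILTER clause
]
print(results)  # [('<bob#me>', 'Mona Lisa')]
```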

Schema / ontology (diagram): Person rdfs:subClassOf Agent and Organization rdfs:subClassOf Agent; worksFor owl:inverseOf hasEmployee; an rdfs:range axiom; rdf:type links from the instances to the classes; instance data: Bob worksFor ACME, ACME hasEmployee Bob.
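The ontology slide implies that some facts need not be stored because they follow from the schema. Bob is an Agent because Person is a subclass of Agent. ACME hasEmployee Bob because hasEmployee is the inverse of worksFor. The sketch below is a naive forward-chaining reasoner over plain tuples that applies these rules until nothing new appears. It rests on two assumptions. First, it reads the diagram's range axiom as `worksFor rdfs:range Organization`. Second, the unprefixed names are used purely for illustration. Production reasoners may work differently, for example by rewriting queries at query time instead of materializing triples.

```python
# Schema from the diagram; the rdfs:range axiom's target is an assumption
schema = {
    ("Person", "rdfs:subClassOf", "Agent"),
    ("Organization", "rdfs:subClassOf", "Agent"),
    ("worksFor", "owl:inverseOf", "hasEmployee"),
    ("worksFor", "rdfs:range", "Organization"),  # assumed reading of the diagram
}
# Asserted instance data
data = {
    ("Bob", "rdf:type", "Person"),
    ("Bob", "worksFor", "ACME"),
}


def infer(triples):
    """Apply subClassOf, inverseOf, and range rules to a fixpoint."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # x type C, C subClassOf D  =>  x type D
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # x P y, P inverseOf Q  =>  y Q x (and the reverse direction)
                if p2 == "owl:inverseOf" and p == s2:
                    new.add((o, o2, s))
                if p2 == "owl:inverseOf" and p == o2:
                    new.add((o, s2, s))
                # x P y, P range C  =>  y type C
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= new


inferred = infer(schema | data) - schema - data
for triple in sorted(inferred):
    print(triple)
```

With these inputs the inferred triples are ACME hasEmployee Bob, ACME type Organization, ACME type Agent, and Bob type Agent.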

Tables to graphs with R2RML, the W3C RDB-to-RDF Mapping Language (figure source: http://www.w3.org/tr/rdb2rdf-ucr/)
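The core idea behind an R2RML mapping is that each table row becomes a subject IRI built from a template. Columns become either literal-valued or IRI-valued properties of that subject. The sketch below illustrates this in plain Python; it is not an R2RML processor. The EMP table, its columns, the mapping dictionary, and all IRIs are hypothetical. The sketch also skips the IRI-safe percent-encoding of column values that R2RML applies to IRI templates.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping for a hypothetical EMP table, loosely modeled on an
# R2RML triples map: subject template, class, literal columns, IRI links.
MAPPING = {
    "subject_template": "http://data.example.com/employee/{EMPNO}",
    "class": "http://example.com/ns#Employee",
    "literals": [("http://example.com/ns#name", "ENAME")],
    "links": [("http://example.com/ns#department",
               "http://data.example.com/department/{DEPTNO}")],
}


def expand_template(template, row):
    """Replace each {COLUMN} placeholder with that column's value (no escaping)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def map_rows(rows, mapping):
    """Turn each table row into a set of triples about one subject IRI."""
    triples = []
    for row in rows:
        subject = expand_template(mapping["subject_template"], row)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for predicate, column in mapping["literals"]:
            triples.append((subject, predicate, row[column]))
        for predicate, template in mapping["links"]:
            triples.append((subject, predicate, expand_template(template, row)))
    return triples


rows = [{"EMPNO": 7369, "ENAME": "SMITH", "DEPTNO": 10}]
for triple in map_rows(rows, MAPPING):
    print(triple)
```

Because the department link is itself an IRI, triples mapped from a separate DEPT table would join with these automatically, provided they use the same template.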

RDF models are interoperable (no vendor lock-in), actionable (you can run queries against them), expressive (multiple views over the same data), and reusable (by different apps in other domains).

Model-based data integration

Stardog: semantic graph database. The leading RDF database; pure Java (any JVM language, full REST bindings); client-server, embedded, and middleware modes. Rich feature set: ACID transactions, high availability, hot backup/restore, JMX server monitoring, access and audit logging, RBAC security model, LDAP integration, SPARQL 1.1 queries, OWL 2 reasoning, proof trees, integrity constraints, full-text search, geospatial support, virtual graphs, provenance support. Supports property graphs (TinkerPop).

Single-node scalability: scale up to 50B triples on modest hardware (32 cores, 256 GB RAM, 2 × 7200 RPM HDDs, < $10K cost). Load rates up to 500k triples/second: that's 100M triples in 3 min, 1B in 30 min, and 20B in 20 hours. Best-of-breed query answering performance: querying 100M triples at a throughput of 3M+ queries/hour, 1B at 500k queries/hour, and 10B at 20k queries/hour (Berlin SPARQL Benchmark, BSBM, 64 clients).
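A back-of-envelope check of the load figures: at a constant 500k triples/s, 100M triples take about 3.3 minutes and 1B about 33 minutes. Both match the slide after rounding. 20B triples would take about 11 hours at that rate. The slide's 20 hours implies an average of roughly 278k triples/s, which suggests 500k is a peak ("up to") rate rather than one sustained across the largest loads. The function names below are hypothetical.

```python
PEAK_RATE = 500_000  # triples per second, the slide's "up to" load rate


def load_seconds(num_triples, rate=PEAK_RATE):
    """Time to load num_triples at a constant rate."""
    return num_triples / rate


def implied_rate(num_triples, seconds):
    """Average load rate implied by loading num_triples in the given time."""
    return num_triples / seconds


for label, n in [("100M", 100e6), ("1B", 1e9), ("20B", 20e9)]:
    s = load_seconds(n)
    print(f"{label}: {s / 60:,.1f} min ({s / 3600:,.1f} h)")
# 100M: 3.3 min (0.1 h)
# 1B: 33.3 min (0.6 h)
# 20B: 666.7 min (11.1 h)

print(f"{implied_rate(20e9, 20 * 3600):,.0f} triples/s")  # 277,778 triples/s
```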

Stardog for big data (coming in version 5, 2016): HDFS-backed storage; horizontal partitioning of data; advanced query planner and optimization; parallel query execution with async messaging.
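The slide does not say how data will be partitioned. One common generic technique for horizontally partitioning RDF is hashing each triple's subject to choose a node. All triples about the same subject then land together, so star-shaped patterns like `?person a foaf:Person ; schema:birthDate ?d` can be answered locally. Joins on objects still require exchanging data between nodes. The sketch below shows subject-hash partitioning in plain Python; it is an illustration, not Stardog's design. The choice of SHA-1 and all names are assumptions.

```python
import hashlib
from collections import defaultdict


def partition_for(subject, num_partitions):
    """Map a subject IRI to a partition index with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    fixed digest (SHA-1 here) is used to get the same answer on every node.
    """
    digest = hashlib.sha1(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition(triples, num_partitions):
    """Group triples by the partition of their subject."""
    parts = defaultdict(list)
    for triple in triples:
        parts[partition_for(triple[0], num_partitions)].append(triple)
    return parts


# Hypothetical triples
triples = [
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:acme", "ex:hasEmployee", "ex:bob"),
]
parts = partition(triples, num_partitions=4)
for index in sorted(parts):
    print(index, parts[index])
```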

Big data computations: integration with Apache Spark. Run Spark jobs (PageRank, machine learning algorithms, ...) over an integrated view of the data. There are different ways to expose RDF data in Spark: RDD[Triple], RDD[SPARQLResults].
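PageRank is a natural fit for the RDD[Triple] view because it only needs the edges for one predicate. The sketch below runs PageRank over triples in plain Python rather than Spark, so it shows the computation but not the distributed API. A Spark job would express the same per-iteration steps as transformations over a collection of triples. The foaf:knows data is hypothetical. The damping factor of 0.85 and the fixed 50 iterations are conventional choices assumed here.

```python
def pagerank(triples, predicate, damping=0.85, iterations=50):
    """PageRank over the edges (s -> o) of triples with the given predicate."""
    edges = [(s, o) for s, p, o in triples if p == predicate]
    nodes = sorted({n for edge in edges for n in edge})
    out = {v: [] for v in nodes}
    for s, o in edges:
        out[s].append(o)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


# Hypothetical social graph: everyone knows dave, dave knows alice
knows = [
    ("alice", "foaf:knows", "dave"),
    ("bob", "foaf:knows", "dave"),
    ("carol", "foaf:knows", "dave"),
    ("dave", "foaf:knows", "alice"),
]
ranks = pagerank(knows, "foaf:knows")
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

Filtering by predicate first is the design choice that matters here: the same triple collection can feed different graph algorithms depending on which relationship is treated as an edge.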

Thanks!