excellent graph matching capabilities with global graph analytic operations, via an interface that researchers can use to plug in their own



Similar documents
Introduction to urika. Multithreading. urika Appliance. SPARQL Database. Use Cases

Introduc)on to urika. Mul)threading. SPARQL Database. urika Appliance. XMT- 2 Programming. Use Cases

Network Graph Databases, RDF, SPARQL, and SNA

Taming Big Data Variety with Semantic Graph Databases. Evren Sirin CTO Complexible

Semantic Web Success Story

Semantic Modeling with RDF. DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo

Cray: Enabling Real-Time Discovery in Big Data

Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

Benchmarking the Performance of Storage Systems that expose SPARQL Endpoints

13 RDFS and SPARQL. Internet Technology. MSc in Communication Sciences Program in Technologies for Human Communication.

DISTRIBUTED RDF QUERY PROCESSING AND REASONING FOR BIG DATA / LINKED DATA. A THESIS IN Computer Science

We have big data, but we need big knowledge

Drupal.

The Ontology and Architecture for an Academic Social Network

12 The Semantic Web and RDF

VIVO Dashboard A Drupal-based tool for harvesting and executing sophisticated queries against data from a VIVO instance

HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering

RDF y SPARQL: Dos componentes básicos para la Web de datos

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Semantic Interoperability

An Efficient and Scalable Management of Ontology

Comparison of Triple Stores

Integrating Open Sources and Relational Data with SPARQL

Using RDF Metadata To Enable Access Control on the Social Semantic Web

LDIF - Linked Data Integration Framework

OWL: Path to Massive Deployment. Dean Allemang Chief Scien0st, TopQuadrant Inc.

Semantic Stored Procedures Programming Environment and performance analysis

Reputation Network Analysis for Filtering

YarcData urika Technical White Paper

Semantic Web Standard in Cloud Computing

Fast Innovation requires Fast IT

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

Graph Database Performance: An Oracle Perspective

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

Client Overview. Engagement Situation. Key Requirements

LiDDM: A Data Mining System for Linked Data

Federated Data Management and Query Optimization for Linked Open Data

JOURNAL OF COMPUTER SCIENCE AND ENGINEERING

SPARQL UniProt.RDF. Get these slides! Tutorial plan. Everyone has had some introduction slash knowledge of RDF.

Data Store Interface Design and Implementation

Standards for Big Data in the Cloud

DISCOVERING RESUME INFORMATION USING LINKED DATA

ON DEMAND ACCESS TO BIG DATA. Peter Haase fluid Operations AG

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1

Federated Query Processing over Linked Data

Introduction to Ontologies

Developing Web 3.0. Nova Spivak & Lew Tucker Tim Boudreau

An Enhanced Visualization Service based on Geospatial and Statistical Linked Open Data

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

Data Publishing with DaPaaS

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management

Serendipity a platform to discover and visualize Open OER Data from OpenCourseWare repositories Abstract Keywords Introduction

Additional mechanisms for rewriting on-the-fly SPARQL queries proxy

Semantic Web. Prof. Dr. Steffen Staab Dipl.-Med.Inf. Bernhard Tausch. Steffen Staab ISWeb Lecture Semantic Web

SPARQL By Example: The Cheat Sheet

Experiences from a Large Scale Ontology-Based Application Development

Grids, Logs, and the Resource Description Framework

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

SPARQL: Un Lenguaje de Consulta para la Web

Extending the Linked Data API with RDFa

Bigdata Model And Components Of Smalldata Structure

Industry 4.0 and Big Data

A Big Picture for Big Data

MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

Modelling «Base Bibliotek» as Linked Data

A Survey on: Efficient and Customizable Data Partitioning for Distributed Big RDF Data Processing using hadoop in Cloud.

How To Make Sense Of Data With Altilia

Six Days in the Network Security Trenches at SC14. A Cray Graph Analytics Case Study

! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I)

Scalable and Reactive Programming for Semantic Web Developers

bigdata Managing Scale in Ontological Systems

Publishing Relational Databases as Linked Data

High-Performance, Massively Scalable Distributed Systems using the MapReduce Software Framework: The SHARD Triple-Store

Using Big Data in Healthcare

A generic approach for data integration using RDF, OWL and XML

Transcription:

Steve Reinhardt 2

The urika developers are extending SPARQL s excellent graph matching capabilities with global graph analytic operations, via an interface that researchers can use to plug in their own algorithms. 3

How I viewed graph analytics while working on KDT How I view graph analytics while working on urika Extending SPARQL for global operations 4

Former Mental Model for Graph Analytics Knowledge Discovery Workflow 1. Cull relevant data 2. Build input graph 3. Analyze input graph Graph Analysis is the core of the workflow Input methods and formats are ad hoc 4. Visualize result graph Modest heterogeneity of data Complex algorithms are high value memory Data filtering technologies KDT Graph viz engine

Current Mental Model for Graph Analytics 1. Extract and convert data to RDF 2. Ingest into database Extracting, representing, and converting data as a graph is hard The data (both vertices and edges) in the graph will be highly heterogeneous 3. Complex algorithms are high value 4. Reasoning Query about data is an Visualize important database capability results The RDF/SPARQL community is investing heavily to make those technologies do steps 1, 2, and part of 3 very well How can we build on that investment to make graph analytics more widely usable? 6

RDF / SPARQL RDF/SPARQL Interpreter SPARQL query engine ( endpoint ) x86 Service Nodes Threadstorm processors 7

Resource Description Framework (RDF) Designed to enable a) semantic web searching and b) integration of disparate data sources W3C standard formats Every datum represented as subject/predicate/object(/graph) Ideally with each of those expressed with a URI Standard ontologies in some domains e.g., SnoMED/CT for clinical medical terms Example: <ncbitax:ncbitaxon_8401> rdf:type owl:class. <ncbitax:ncbitaxon_1954> rdfs:subclassof <ncbitax:ncbitaxon_185881>. <ncbitax:ncbitaxon_8681> rdfs:label "Characiformes sp. BOLD:AAG5151"@en. <http://cs.org/pdns#2> cs:hasquerydomain <http://cs.org/domain#ns1.secure.net>. nf:netflowrecord_2 nf:hasdstaddr <http://netflow.org/nf#ipaddress_255.255.23.18>. 8

Student type SPARQL Protocol and RDF Query Language (SPARQL) Enables matching of graph patterns advisor in the semantic DB Reminiscent of SQL teacherof Minimal ability to do structural or global operations on a graph Faculty # Lehigh University BenchMark (LUBM) Query 9 PREFIX == shorthand for a URI PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#> SELECT?X,?Y,?Z variables to be returned from WHERE the query {?X rdf:type ub:student.?y rdf:type ub:faculty.?z rdf:type ub:course. find each student who took a?x ub:advisor?y. course from her advisor?y ub:teacherof?z.?x ub:takescourse?z. } X type Y takescourse Z type Course 9

Choose only vertices representing people, and edges between those vertices representing phone calls and text messages during the last hour Calculate BC for the vertices in the graph created by those vertices and edges # in KDT def vfilter(self, vtypes): return self.type in vtypes def efilter(self, etypes, stime, etime): return (self.type in etypes) and ((self.stime > stime) and (self.etime < etime)) wantedvtypes = (People) wantedetypes = (PhoneCall, TextMessage) start = dt.now() dt.timedelta(hours=1) end = dt.now() bc = G.centrality( BC,filter=((vfilter, wantedvtypes),(efilter, wantedetypes, start, end))) 10

# in SPARQL PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX comm: <http://www.comm.org#> # a fictional communication ontology SELECT?p1,?p2 WHERE {?p1 rdf:type foaf:person.?p2 rdf:type foaf:person. {?call rdf:type comm:phonecall. } UNION {?call rdf:type comm:textmessage.}?call comm:hascaller?p1.?call comm:hascallee?p2.?call comm:hastime?time FILTER (?time > xsd:datetime("2012-07-11t09:30:00z") ) } # # Here, need to do betweenness centrality, but no good interface # 11

Choose people via a more complicated pattern, e.g. only Students from LUBM Query 9 # in KDT <<No way to do this today>> def vfilter(self, vtypes): return self.type in vtypes def efilter(self, etypes, stime, etime): return (self.type in etypes) and ((self.stime > stime) and (self.etime < etime)) wantedvtypes = (People) wantedetypes = (PhoneCall, TextMessage) start = dt.now() dt.timedelta(hours=1) end = dt.now() bc = G.centrality( BC,filter=((vfilter, wantedvtypes),(efilter, wantedetypes, start, end))) 12

# in SPARQL PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX comm: <http://www.comm.org#> # a fictional communication ontology SELECT?x,?call WHERE {?X rdf:type ub:student.?y rdf:type ub:faculty.?z rdf:type ub:course.?x ub:advisor?y.?y ub:teacherof?z.?x ub:takescourse?z. {?call rdf:type comm:phonecall. } UNION {?call rdf:type comm:textmessage.}?call comm:hascaller?x.?call comm:hascallee?p2.?call comm:hastime?time FILTER (?time > xsd:datetime("2012-07-11t09:30:00z") ) } # # Here, need to do betweenness centrality, but no good interface # 13

SPARQL already has the idea of global graph operations SELECT?var1?var2 WHERE { } ORDER BY?var?var2 or SELECT?var1?var2 WHERE { } GROUP BY?var?var2 we just need to denote the right function, inputs, and outputs One possibility: property functions Can be defined externally just as any other RDF object (Obviously) needs to point to a function that can be executed within the SPARQL endpoint Syntax (?in1?in2 ) <function-name> (?out1?out2 ) 14

# in SPARQL PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX comm: <http://www.comm.org#> # a fictional communication ontology PREFIX ga: <http://www.graphanalytics.org#> # graph-analytic operation identifiers SELECT?X,?call,?bc WHERE {?X rdf:type ub:student.?y rdf:type ub:faculty.?z rdf:type ub:course.?x ub:advisor?y.?y ub:teacherof?z.?x ub:takescourse?z. {?call rdf:type comm:phonecall. } UNION {?call rdf:type comm:textmessage.} {?call comm:hascaller?x. } UNION {?call comm:hascallee?x. } } ORDER BY (?call?x) ga:bc (?bc?x) 15

The set of functions will be extensible (Still need to point to function(s) that can execute within the SPARQL endpoint!) APIs will be provided so that researchers can implement their own functions Think CombBLAS or Mex SPARQL extended with global graph ops provides a mechanism for graph analytic researchers to test their algorithms on real data at scale If you re interested in using this interface, please contact me 16

RDF / SPARQL Extension functions RDF/SPARQL Interpreter SPARQL query engine ( endpoint ) x86 Service Nodes Threadstorm processors 17

A challenge to focus attention on big data graph analytics with societal value Solutions via RDF and SPARQL Submissions open now until Sep 15, 2012 Criteria: importance and complexity of the problem, scalability and performance of solution, innovativeness of solution $70K first prize, $13K, $8K, $3K, $3K, $3K http://yarcdata.com/graph analytic challenge.html 18

The urika developers are extending SPARQL s excellent graph matching capabilities with global graph analytic operations, via an interface that researchers can use to plug in their own algorithms. 19

SPARQL by Example tutorial, by Lee Feigenbaum, http://www.cambridgesemantics.com/2008/09/sparql by example/ Search RDF data with SPARQL, by Philip McCarthy http://www.ibm.com/developerworks/xml/library/j sparql/ Semantic Web for the Working Ontologist, by Dean Allemang and James Hendler, ISBN 978 0123859655 Learning SPARQL, by Bob DuCharme, O Reilly, ISBN 978 1 449 30659 5 RDF: www.w3.org/rdf/ SPARQL: www.w3.org/tr/rdf sparql query/ 20