Introduction to Ontologies



Transcription:

Technological challenges
Introduction to Ontologies
Combining relational databases and ontologies
Author: Marc Lieber
Date: 21-Jan-2014
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

AGENDA
1. Introduction to Semantic Web
2. Graph databases / Triple Stores overview
   Oracle Graph databases
   Franz AllegroGraph
3. Use cases
   Novartis
   Fraud detection

Semantic technologies
1. Semantic technologies generally refer to a broad spectrum of techniques for finding signal in large or complex data sources:
   Link analysis
   Distance
   Pattern
   Detect anomalies
   Complex search

Ontology Editing and Engineering
TopQuadrant TopBraid

Semantic Web in Use
1. Industries include:
   Life Science, Health care and Pharma
   Energy sector, Oil & Gas
   Google, Facebook, LinkedIn
   Financial services
   Digital libraries
   Libraries & museums
   Defense & Intelligence Service
   E-government
   Media, Sport (BBC, NFL)
   Networks & Communication
   Department Stores (Walmart)

W3C Semantic Web technologies
Go back a few years now
Large set of specifications for many application domains: RDF, RDFS, OWL, SKOS, SNOMED, etc.
Google's schema.org initiative to federate the definition of ontologies
Ontologies: FOAF (Friend of a Friend)
Serialisation in N3 triples, RDF/XML, Turtle or RDFa (XHTML)

Graph DBs
1. Graph databases can be split into:
   W3C Semantic Web databases, also called triple stores or RDF graph databases
   General graph databases; property graphs and hypergraphs are the two main types
2. Triple stores store the relationships between nodes and their properties as triples or quads
3. Property graphs store the relationships between nodes and the properties of each node separately
4. Some databases, such as AllegroGraph, can be considered both a W3C Semantic Web database and a property graph database, since they support graph traversals and the W3C SPARQL query language
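The difference between the two storage styles can be sketched in a few lines of Python. This is only an illustration: the account identifiers, the property keys, and the intermediate "payment1" node are assumptions made for the example, not the data model of any particular product. It shows one common way to carry edge properties over to triples, namely by introducing a node that stands for the relationship itself.

```python
# Illustrative sketch only. All identifiers and property names are assumptions.

# Property graph: properties are attached directly to nodes and to edges.
nodes = {
    "acct:A": {"email": "a@example.org"},
    "acct:B": {"email": "b@example.org"},
}
edges = [
    {"src": "acct:A", "label": "pays", "dst": "acct:B",
     "props": {"amount": 2000, "time": "2013-01-01T12:12:12"}},
]


def to_triples(nodes, edges):
    """Flatten a property graph into (subject, predicate, object) triples."""
    triples = []
    # Node properties become one triple each.
    for node_id, props in nodes.items():
        for key, value in props.items():
            triples.append((node_id, key, value))
    # An edge with properties is represented by an intermediate node
    # (hypothetical naming scheme: payment1, payment2, ...).
    for i, edge in enumerate(edges, start=1):
        stmt = f"payment{i}"
        triples.append((stmt, "from", edge["src"]))
        triples.append((stmt, "to", edge["dst"]))
        triples.append((stmt, "type", edge["label"]))
        for key, value in edge["props"].items():
            triples.append((stmt, key, value))
    return triples


triples = to_triples(nodes, edges)
for t in triples:
    print(t)
```

Named graphs (quads) are another way to attach information to a statement; the intermediate node is used here only because it needs nothing beyond plain triples.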

Property graphs and hypergraphs
1. In a property graph both nodes and links can have properties
[Figure: property graph of accounts connected by "pays" links. Example properties shown: time 2013-01-01T12:12:12, lat/long 37.30 121.90, email ja@franz.com, account# 96777543, amount 2000]

Resource Description Framework Graphs
URIs are used to identify resources, entities, relationships, concepts
Creates Subject-Property-Object triples
Properties of subjects are triples
Standards defined by W3C and OGC (Open Geospatial Consortium)

RDF Triples
RDF as core data format
Uniform structure to represent data (triples):
   [subject] [predicate] [object]: JFK, president of, the United States
   [resource] [property] [value]: JFK, PresidentOf, The United States
quad = triple + named graph, quint = quad + technical ID (rowid)
Use of namespaces to differentiate terms
Some are predefined, but you can create your own namespaces
<http://www.world.org/celibrity#jfk> <http://www.w3.org/2000/01/rdf-schema#label> "John Fitzgerald Kennedy"^^<http://www.w3.org/2001/XMLSchema#string>.
<http://www.world.org/airport#jfk> <http://www.world.org/airport#islocatedin> New York City.
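As a minimal sketch of the triple and quad structure, the following Python models a statement as a small record and writes it out in an N-Triples-like line format. The class names, the escaping rules (only backslash and double quote are handled), and the graph IRI are assumptions for illustration; a real serializer covers more cases.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """A literal value with an optional datatype IRI."""
    value: str
    datatype: Optional[str] = None


class Quad(NamedTuple):
    """A triple plus an optional named graph (graph=None means a plain triple)."""
    subject: str
    predicate: str
    obj: object            # an IRI string or a Literal
    graph: Optional[str] = None


def term(t):
    """Render one term: literals in quotes, everything else as <IRI>."""
    if isinstance(t, Literal):
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
        suffix = f"^^<{t.datatype}>" if t.datatype else ""
        return f'"{escaped}"{suffix}'
    return f"<{t}>"


def serialize(q):
    """Render a Quad as one line; the graph term is appended only if present."""
    parts = [term(q.subject), term(q.predicate), term(q.obj)]
    if q.graph:
        parts.append(term(q.graph))
    return " ".join(parts) + " ."


jfk_label = Quad(
    "http://www.world.org/celibrity#jfk",
    "http://www.w3.org/2000/01/rdf-schema#label",
    Literal("John Fitzgerald Kennedy", "http://www.w3.org/2001/XMLSchema#string"),
)
# Hypothetical named graph, added to turn the triple into a quad.
jfk_quad = jfk_label._replace(graph="http://www.world.org/graph#people")

print(serialize(jfk_label))
print(serialize(jfk_quad))
```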

Data migration: Where do triples come from?
1. Relational storage

   ID   | Name  | Hiredate   | Job     | Salary | Deptno
   7982 | Scott | 12-02-1998 | Clerk   | 4800   | 30
   7855 | Adams | 27-09-2001 | Manager | 7500   | 30

2. Equivalent in triples

   Subject       | Predicate       | Object
   <...emp:7982> | rdfs:label      | Scott xsd:string
   <...emp:7982> | <..HR#Hiredate> | 12-02-1998 xsd:date
   <...emp:7982> | <..HR#hasJob>   | Clerk xsd:string
   <...emp:7982> | <..HR#HasSalary>| 4800 xsd:int
   <...emp:7982> | <..HR#worksIn>  | <...dept:30>
   <...dept:30>  | rdfs:label      | Sales xsd:string
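The row-to-triples mapping above can be sketched as follows. The slide elides the real namespace prefixes, so the `http://example.org/...` prefixes below are placeholders chosen for this example, and the column-to-predicate table is an assumption that simply mirrors the predicates shown on the slide.

```python
# Placeholder prefixes: the original prefixes are elided on the slide.
EMP = "http://example.org/emp/"
DEPT = "http://example.org/dept/"
HR = "http://example.org/HR#"

# Column name -> (predicate, datatype), mirroring the slide's predicates.
COLUMN_MAP = {
    "Hiredate": (HR + "Hiredate", "xsd:date"),
    "Job": (HR + "hasJob", "xsd:string"),
    "Salary": (HR + "HasSalary", "xsd:int"),
}


def row_to_triples(row):
    """Turn one employee row (a dict) into a list of triples.

    Literal objects are (value, datatype) pairs; resource objects are IRIs.
    """
    subject = f"{EMP}{row['ID']}"
    triples = [(subject, "rdfs:label", (row["Name"], "xsd:string"))]
    for column, (predicate, datatype) in COLUMN_MAP.items():
        triples.append((subject, predicate, (row[column], datatype)))
    # The foreign key becomes a link to another resource, not a literal.
    triples.append((subject, HR + "worksIn", f"{DEPT}{row['Deptno']}"))
    return triples


scott = {"ID": 7982, "Name": "Scott", "Hiredate": "12-02-1998",
         "Job": "Clerk", "Salary": 4800, "Deptno": 30}
scott_triples = row_to_triples(scott)
for t in scott_triples:
    print(t)
```

The point of the sketch is the treatment of `Deptno`: the foreign key turns into an edge between two resources, which is what makes the result a graph and not just a re-encoded table.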

Databases Market Overview
The database world is changing rapidly
NoSQL databases are often used in conjunction with Big Data
Graph databases can be split into W3C Semantic Web databases and others

Triple stores comparison

Triple Store | Scalability (Billion Triples) | Query | Reasoning support | Full text Search support | Programming
Jena (TDB) | up to 1.7 BT | SPARQL 1.1 | OWL, RDFS | Yes (Lucene integration) | Java
Sesame | Millions of triples | SPARQL 1.1 | RDFS | Yes (through Lucene SAIL) | Java
OpenLink Virtuoso | 15.4 BT | SPARQL 1.1 | RDFS, subsets of OWL | Yes | Java
Oracle | >500 billion triples | SPARQL 1.0 (11g), SPARQL 1.1 (12c), SEM_MATCH, SEM_RELATED | RDFS, OWL, OWLIM, SKOS, SNOMED | Yes (Oracle Text) | Java, SQL, PL/SQL
OWLIM | 20 BT | SPARQL 1.1 | RDFS, OWL, OWLIM | Yes | Java
AllegroGraph | >500 billion triples | SPARQL 1.1, Prolog | RDFS, Prolog rules | Yes | Java, LISP, Python, Ruby, C#
4Store | 15 BT | SPARQL 1.1 | RDFS | Yes | Java
BigData | over 10 BT | SPARQL 1.1 | RDFS, OWL Lite | Internal, external through Lucene | Java
Urika (YarcData) | Trillions | SPARQL 1.1 | RDFS | Yes | Java, Python
Anzo Cambridge | unknown | SPARQL 1.1 | RDFS, OWL | Yes (Information Mining) | Java, SOAP

SPARQL Protocol and RDF Query Language
Latest version: 1.1
SELECT returns all, or a subset of, the variables bound in a query pattern match
CONSTRUCT returns triples
ASK returns a boolean
DESCRIBE asks for triples that describe a particular resource
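To make the difference between the query forms concrete, here is a toy pattern matcher in Python over an in-memory list of triples. It is not a SPARQL engine and does not parse SPARQL syntax; the function names and the sample data are assumptions for illustration. It only shows that SELECT, ASK and CONSTRUCT share the same pattern-matching step and differ in what they return.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Terms starting with '?' are variables. Returns the extended binding,
    or None if the pattern does not match.
    """
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def solve(patterns, data):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in data
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


def select(variables, patterns, data):
    """SELECT-like: rows of values for the requested variables."""
    return [tuple(b[v] for v in variables) for b in solve(patterns, data)]


def ask(patterns, data):
    """ASK-like: True if at least one solution exists."""
    return bool(solve(patterns, data))


def construct(template, patterns, data):
    """CONSTRUCT-like: new triples built from a template per solution."""
    return {tuple(b.get(x, x) for x in template) for b in solve(patterns, data)}


data = [
    ("jfk", "presidentOf", "USA"),
    ("jfk", "bornIn", "Brookline"),
    ("lincoln", "presidentOf", "USA"),
]

presidents = select(["?who"], [("?who", "presidentOf", "USA")], data)
has_birthplace = ask([("jfk", "bornIn", "?place")], data)
inverse = construct(("USA", "hadPresident", "?who"),
                    [("?who", "presidentOf", "USA")], data)
print(presidents, has_birthplace, inverse)
```

DESCRIBE is left out because what counts as a description of a resource is up to the implementation.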

SPARQL compared to SQL
A SPARQL query of this type would be quite difficult to translate into SQL queries:

Inferencing / Reasoning
Inferencing is the ability to make logical deductions based on ontology rules.
The reasoning tools use the rules defined in the RDF model (RDFS, OWL, SKOS, etc.) to detect new properties and new relationships.
The ability to draw inferences from existing data using the precision and rigor of mathematical logic is probably the most important property that distinguishes semantic data from other data.
Example of use: LinkedIn or Facebook discovering new links between people

Reasoning example
Graph representation and data modeling
Reasoning builds the missing relation
Can take time
Some DBs do it on the fly, others materialize the generated triples
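A minimal sketch of materialization, assuming only two RDFS-style rules (transitivity of `rdfs:subClassOf`, and inheritance of `rdf:type` along `rdfs:subClassOf`): the rules are applied repeatedly until no new triple appears. The class and instance names are made up for the example, and a real reasoner supports many more rules and uses far better indexing than this nested loop.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Forward-chain two RDFS-style rules to a fixed point.

    Returns only the newly inferred triples.
    """
    known = set(triples)
    while True:
        new = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    # A subClassOf B, B subClassOf C  =>  A subClassOf C
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # x type A, A subClassOf B  =>  x type B
                    new.add((s, TYPE, o2))
        if new <= known:
            return known - set(triples)
        known |= new


asserted = [
    ("Manager", SUBCLASS, "Employee"),
    ("Employee", SUBCLASS, "Person"),
    ("adams", TYPE, "Manager"),
]
inferred = materialize(asserted)
for t in sorted(inferred):
    print(t)
```

Storing `inferred` next to the asserted triples corresponds to the materialized approach; computing it at query time corresponds to reasoning on the fly.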

LOD: Linked Open Data Initiative

Semantic Web query federation
Searching multiple datasets with one query

Semantic Web in relation to Big Data, or how to transform Big Data into Smart Data
Sample vs. All
Clean vs. Dirty
Many Undiscovered
Causation (Why) vs. Correlation
Table vs. Graph
Planned Path vs. Discovery

Data Science example using R and SPARQL
1. Extracts data from http://spatial.linkedscience.org and represents the result as a graph:

Linked Data in Enterprise
Access & Presentation Layer
Semantic Graph model (W3C RDF Metadata Model)
Index
Data Servers: Event Server, Hadoop Appliance, Content Mgmt, BI Server, Data Warehouse
Data Sources / Types: Machine Generated Data, Social Media, Human Sourced Information, Subscription Services, Transaction Systems

Franz Corp. AllegroGraph
1. AllegroGraph is licensed under a proprietary commercial license
2. Focuses on high scalability
3. Development languages: Java, Python or LISP
4. Alternative to SPARQL queries: Prolog
5. RESTful HTTP protocol to maintain triples in the DB
6. Graphical tool: GRUFF

Oracle Spatial & Graph
1. The Oracle RDF triple store is embedded in the relational database
   Schema MDSYS contains the RDF_LINK$ and RDF_VALUE$ tables
   SPARQL 1.1 supported in 12c
   Native support of most of the W3C rules
   Use of named graphs (quads) since 11.2.0.3
   Scales up to 100s of billions of triples
   Oracle-specific adapters available for Jena, Sesame, TopBraid, Protégé and Cytoscape

Oracle Spatial & Graph: other features
1. Support of temporal reasoning and spatial reasoning
2. Fine-grained security at triple level and for inferred graphs
3. The Oracle reasoner persists the inferred triples in the DB. As an alternative, integration with Pellet or TrOWL as an external OWL 2 reasoner
4. Jena and Sesame adapters
   1. To build SPARQL endpoints
   2. Bulk load triples from Java
   3. Develop applications in Java
5. Integration with OBIEE, RDF browser

SPARQL and SPARQL in SQL Architecture
HTTP: Standard SPARQL endpoint, enhanced with query management control
Java: Jena API with Jena Adapter, Sesame API with Sesame Adapter
SPARQL-to-SQL translation logic
SQL: SEM_MATCH rewritable table function

ORACLE Database
RDF Query engine
Can be joined with any other relational table or view

RDB2RDF & R2RML: Modeling Relational Data as a Graph
Relational to RDF modeling: W3C R2RML
Oracle Spatial and Graph 12c can represent a relational schema as a graph view
Integrate content from distributed sources
Federate distributed databases
Apply SPARQL queries on tables, views, SQL query results
No duplication of data and storage

Graph Support on Oracle NoSQL
Available on Oracle NoSQL Database (Enterprise Edition)
Graph Feature for NoSQL:
   RDF Graph support in Oracle NoSQL Database Enterprise Edition
   High-performance key-value store
   Standard access to graph data: SPARQL 1.1, Jena & Joseki SPARQL endpoint, Web Services
   Massive horizontal scalability of triples (petabytes)
   Support for World Wide Web Consortium (W3C) Semantic Web standards

Novartis Institutes for BioMedical Research (NIBR)
Use case: project Metastore
NIBR is the global pharmaceutical organization for Novartis, committed to discovering innovative medicines to treat diseases with high unmet medical need
6000+ scientists, physicians, business professionals worldwide
METASTORE is a scientific knowledge portal used by many applications to search over ontology-oriented data
   Organized around scientific concept types: genes, proteins, indications, anatomy, diseases, taxonomy, etc.
   Can be hierarchically organized and classified
   Builds a semantic network of scientific concepts

Solution implemented: Oracle Spatial & Graph
1. Accessible through a dedicated service layer and reusable widgets
   Integrated application to visualize all Metastore content

Use case: Fraud detection

A real world fraud detection example
Find any circle of payments between accounts that all happened within 10 miles of San Jose within the last day and where the payments > $1000
Requires:
   Graph analytics
   Temporal reasoning
   Geospatial reasoning
   Social Network Analysis
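The query above combines three filters (amount, time, location) with a graph search for cycles. A minimal Python sketch of that combination follows; it is not how a triple store evaluates the query. The payment record layout, the San Jose coordinates, and the sample data are assumptions for illustration. Distance uses the haversine formula, and cycles are found with a simple depth-first search that reports each simple cycle once, starting from its alphabetically smallest account.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

SAN_JOSE = (37.34, -121.89)   # approximate coordinates (assumption)
EARTH_RADIUS_MILES = 3958.8


def miles_between(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def qualifies(payment, now):
    """Amount > $1000, within the last day, within 10 miles of San Jose."""
    age = now - payment["time"]
    return (payment["amount"] > 1000
            and timedelta(0) <= age <= timedelta(days=1)
            and miles_between(payment["where"], SAN_JOSE) <= 10)


def payment_cycles(payments, now):
    """Return simple cycles among accounts linked by qualifying payments."""
    graph = {}
    for p in payments:
        if qualifies(p, now):
            graph.setdefault(p["src"], set()).add(p["dst"])

    cycles = []

    def walk(start, node, path):
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                cycles.append(path)
            elif nxt > start and nxt not in path:
                # Only visit accounts larger than `start`, so every cycle
                # is reported exactly once, from its smallest account.
                walk(start, nxt, path + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return cycles


now = datetime(2013, 1, 2, 0, 0, 0)
t = datetime(2013, 1, 1, 12, 12, 12)
local, far = (37.30, -121.90), (40.71, -74.01)
payments = [
    {"src": "A", "dst": "B", "amount": 2000, "time": t, "where": local},
    {"src": "B", "dst": "C", "amount": 1500, "time": t, "where": local},
    {"src": "C", "dst": "A", "amount": 3000, "time": t, "where": local},
    {"src": "C", "dst": "D", "amount": 5000, "time": t, "where": far},
    {"src": "D", "dst": "A", "amount": 50, "time": t, "where": local},
]
cycles = payment_cycles(payments, now)
print(cycles)
```

Filtering the edges before searching keeps the cycle search small, which is the same reason a database would want temporal and geospatial indexes alongside the graph.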

Social Network Analysis answers 4 questions
How far is P1 from P2, and how strong is the relation?
To what groups does this person belong (ego groups, cliques)?
How important is this person in the group?
Does this group have a leader, and how cohesive is it?
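Three of these questions can be sketched with standard graph routines in Python: breadth-first search for distance, a depth-limited neighborhood for the ego group, and degree centrality as one simple measure of importance. The "knows" graph below is invented for the example, and degree centrality is an assumption made here; a graph database may use other centrality measures.

```python
from collections import deque


def distances_from(graph, start):
    """Number of hops from `start` to every reachable person (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for friend in graph.get(node, ()):
            if friend not in dist:
                dist[friend] = dist[node] + 1
                queue.append(friend)
    return dist


def ego_group(graph, person, depth):
    """Everyone within `depth` hops of `person`, including the person."""
    return {p for p, d in distances_from(graph, person).items() if d <= depth}


def degree_centrality(graph, group):
    """Share of the other group members each member is directly linked to."""
    if len(group) < 2:
        return {p: 0.0 for p in group}
    return {p: len(set(graph.get(p, ())) & (group - {p})) / (len(group) - 1)
            for p in group}


# Hypothetical undirected "knows" relation, stored in both directions.
knows = {
    "jans": {"ann", "bob"},
    "ann": {"jans", "bob", "cat"},
    "bob": {"jans", "ann"},
    "cat": {"ann", "dan"},
    "dan": {"cat"},
}

hops = distances_from(knows, "jans")
group = ego_group(knows, "jans", 2)          # friends and friends of friends
centrality = degree_centrality(knows, group)
most_important = max(centrality, key=centrality.get)
print(hops, group, most_important)
```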

Activity recognition
Find all meetings that happened in November within 5 miles of Berkeley that were attended by the most important person in Jans' friends and friends of friends.

(select (?x)
  (ego-group person:jans knows ?group 2)             SNA
  (actor-centrality-members ?group knows ?x ?num)    SNA
  (q ?event fr:actor ?x)                             DB Lookup
  (qs ?event rdf:type fr:meeting)                    RDFS
  (interval-during ?event 2008-11-01 2008-11-06)     Temporal
  (geo-box-around geoname:berkeley ?event 5 miles)   Spatial
  !)

Fraud detection example using SPARQL
Find any circle of payments between accounts that all happened within 10 miles of San Jose within the last day and where the payments > $1000
Find the circle
Inspect the property graph
Temporal
Geo

Conclusion: Why should you choose Semantic Web?
1. You want a flexible, adaptable, transparent information architecture
2. The project requires complex structures and a large number of relations between classes as well as properties
3. The project requires integration of data from different sources
4. Heterogeneous sets of metadata and vocabulary concepts, originating from multiple sources
5. Need for semantic annotations using controlled vocabularies and thesauri such as FOAF, OWL, SKOS, etc.
6. There is a need for making logical deductions based on rules defined by these controlled vocabularies

THANK YOU. Marc Lieber Marc.lieber@trivadis.com www.trivadis.com