From relational databases to linked data: R for the semantic web. Jose Quesada, Max Planck Institute, Berlin




Who this talk targets
You have big data; you use a database
You have an evolving schema definition, sometimes changing at runtime
You are interested in alternative ways to present your data
You would thrive by using data out there, if only they were more accessible

Semantic web

THE TWO TOWERS
Credit: Jim Hendler

The Semantic Web: ontology as Barad-dur (Sauron's tower)
Extremely powerful
Patrolled by Orcs
OWL: decidable logic basis
Let one little hobbit (an inconsistency) in, and the whole thing could come crashing down

Inconsistency

The Semantic Web: the Tower of Babel
We will build a tower to reach the sky
We only need a little ontological agreement
Who cares if we all speak different languages?
This is RDFS
Statistics matter here
Web-scale
Lots of data; finding anything in the mess can be a win

Approaches to data representation
Objects
Tables (relational databases)
Non-relational databases
Tables (data.frame)
Graphs

What one can do with semantic web data, now
People who died in Nazi Germany and, if possible, any notable works they created:
  SELECT * WHERE {
    ?subject dbpprop:deathplace <http://dbpedia.org/resource/nazi_germany> .
    OPTIONAL { ?subject dbpedia-owl:notableworks ?works }
  }

subject                                               works
:Anne_Frank                                           :The_Diary_of_a_Young_Girl
:Martin_Bormann                                       -
:Ir%C3%A8ne_N%C3%A9mirovsky                           -
:Erich_Fellgiebel                                     -
:Friedrich_Ferdinand%2C_Duke_of_Schleswig-Holstein    -
:Friedrich_Olbricht                                   -
:Ludwig_Beck                                          -
:Erwin_Rommel                                         -
:Maurice_Bavaud                                       -
:Early_Years_of_Adolf_Hitler                          -
:Emil_Zegad%C5%82owicz                                -
:Friedrich_Fromm                                      -
:Helmuth_James_Graf_von_Moltk                         -

Scale to the entire web
Do reasoning with the open world assumption
Use cases:
Real-time city
Cancer monographs for WHO
Gene expression finding
Retrieval in real time
Go beyond logic

RDF is a graph
We have lots of interesting statistics that run on graphs
In many Semantic Web (SW) domains a tremendous number of statements (expressed as triples) might be true but, in a given domain, only a small number of statements is known to be true or can be inferred to be true. It thus makes sense to attempt to estimate the truth values of statements by exploring regularities in the SW data with machine learning.
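To make "RDF is a graph" concrete, here is a minimal base-R sketch that stores a few triples in a data.frame and derives an adjacency matrix from them. The triples and the ex: identifiers are invented for illustration and are not taken from DBpedia.

```r
# Illustrative triples only; the ex: identifiers are hypothetical.
triples <- data.frame(
  subject   = c("ex:Anne_Frank", "ex:Anne_Frank", "ex:Erwin_Rommel"),
  predicate = c("ex:deathPlace", "ex:notableWork", "ex:deathPlace"),
  object    = c("ex:Nazi_Germany", "ex:The_Diary_of_a_Young_Girl", "ex:Nazi_Germany"),
  stringsAsFactors = FALSE
)

# Every distinct subject or object becomes a node of the graph.
nodes <- sort(unique(c(triples$subject, triples$object)))

# Each triple is a directed edge subject -> object.
# The predicate (edge label) is dropped in this simple matrix form.
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(triples$subject, triples$object)] <- 1

# In-degree: how many statements point at each node.
in_degree <- colSums(adj)
```

Once the triples are in matrix form, ordinary graph statistics (degree, centrality, pagerank) apply directly, at the cost of losing the predicate labels.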

Scale
You cannot use the entire thing at once: subsetting
Are there patterns in knowledge structures that we can use for subsetting?

Idea
Graph theory applied to subsetting large graphs
Developing Semantic Web applications requires handling the RDF data model in a programming language
Problem: current software is developed in the object-oriented paradigm, whereas programming with RDF is currently triple-based.
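The slides do not say which subsetting method is meant, so the following is only one simple possibility: keep the nodes within k hops of a seed node. The graph, the node names, and the helper `neighbourhood` are all hypothetical.

```r
# Hypothetical undirected graph as an adjacency matrix (edges for illustration only).
nodes <- c("a", "b", "c", "d", "e", "f")
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
edge_list <- rbind(c("a", "b"), c("b", "c"), c("c", "d"), c("d", "e"), c("e", "f"))
adj[edge_list] <- 1
adj[edge_list[, 2:1]] <- 1   # add the reverse edges to make the graph symmetric

# Return the nodes reachable from `seed` in at most `k` steps (breadth-first).
neighbourhood <- function(adj, seed, k) {
  reached <- seed
  frontier <- seed
  for (step in seq_len(k)) {
    # Neighbours of the current frontier that have not been reached yet.
    nxt <- colnames(adj)[colSums(adj[frontier, , drop = FALSE]) > 0]
    frontier <- setdiff(nxt, reached)
    if (length(frontier) == 0) break
    reached <- c(reached, frontier)
  }
  reached
}

# Subset the graph to the 1-hop neighbourhood of node "c".
keep <- neighbourhood(adj, seed = "c", k = 1)
sub_adj <- adj[keep, keep]
```

A dense matrix is used here only for brevity; at web scale a sparse representation would be needed.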

Data
IMDB is a big graph:
  1.4 M movies
  1.7 M actors
  11 M connections
Movies have votes
Bipartite network
Packages:
  igraph
    Nice functions that you cannot find anywhere else
    Uses sparse matrices
    Implemented in C
    Some support for bipartite networks
  RMySQL, Matrix (sparse matrices)
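As a small illustration of what "bipartite network" means for actor and movie data, the base-R sketch below builds an incidence matrix from an edge list and projects it onto the actors. The edge list is invented; it is not IMDB data, and this is not necessarily how the talk's analysis was done.

```r
# Hypothetical actor-movie edge list (names invented for illustration).
edges <- data.frame(
  actor = c("actor_a", "actor_a", "actor_b", "actor_b", "actor_c"),
  movie = c("movie_1", "movie_2", "movie_1", "movie_3", "movie_3"),
  stringsAsFactors = FALSE
)

# Incidence matrix of the bipartite network: rows are actors, columns are movies.
incidence <- unclass(table(edges$actor, edges$movie))

# Degree of each node on either side of the bipartite graph.
movies_per_actor <- rowSums(incidence)
actors_per_movie <- colSums(incidence)

# One-mode projection: entry [i, j] counts the movies actors i and j share.
co_appearance <- incidence %*% t(incidence)
diag(co_appearance) <- 0
```

With millions of movies and actors a dense matrix like this would not fit in memory, which is why the slide lists sparse-matrix packages.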

Centrality


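Pagerank
The pagerank vector is the stationary distribution of a Markov chain on the link matrix
Some assumptions are needed to warrant convergence
The typical value of the damping factor d is 0.85
  # a: link matrix, assumed row-normalized; Re() drops the zero imaginary part eigen() may return
  norm <- function(x) x / sum(x)
  norm(Re(eigen(0.15 / nvertices + 0.85 * t(a))$vectors[, 1]))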
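The eigenvector formula can be checked on a toy graph. The sketch below uses a hypothetical 4-node directed graph (the edges are invented, not those of the slide's figure), assumes every node has at least one outgoing link, and compares the eigenvector solution with plain power iteration.

```r
# Hypothetical 4-node directed graph; adj[i, j] = 1 means a link i -> j.
adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 0,
                1, 0, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE)
n <- nrow(adj)
d <- 0.85  # damping factor

# Row-normalize so each row of `a` sums to 1 (assumes no node without out-links).
a <- adj / rowSums(adj)

# Column-stochastic matrix of the random surfer: teleport with probability 1 - d.
g <- (1 - d) / n + d * t(a)

normalize <- function(x) x / sum(x)

# Pagerank as the leading eigenvector, rescaled to sum to 1.
pr_eigen <- normalize(Re(eigen(g)$vectors[, 1]))

# The same vector by power iteration, starting from the uniform distribution.
pr_power <- rep(1 / n, n)
for (i in 1:200) pr_power <- as.vector(g %*% pr_power)
```

Power iteration only needs matrix-vector products, so it is the variant that remains practical on large sparse graphs, whereas a full eigen() decomposition does not scale.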

Top movies by pagerank in the actor->movie network
degree       pagerank       cluster  imdbid   title                                   rank   votes
0.000243688  1298252192870  0        822609   Around the World in Eighty Days (1956)  40031  6134
0.000103540  313862390464   0        76352    "Beyond Our Control" (1968)             0      0
0.000091669  2910099912811  0        993780   Gone to Earth (1950)                    7.0    291
0.000089025  2855923652847  0        915626   Deadlands 2: Trapped (2008)             39971  15
0.000083882  424328163772   0        1282574  Stuck on You (2003)                     6.0    19709
0.000080824  6291101098043  0        622100   "Shortland Street" (1992)               39850  225

Problems
Graphs have advantages over RDBMS/tables [1], but we are used to thinking in tables
There is no direct way to handle RDF in R. Worth an R package?

Linked data are out there, up for grabs
We need to start thinking in terms of graphs, and slowly move away from tables
Thanks for your attention
Jose Quesada, quesada@workingcogs.com, http://josequesada.name
Twitter: @Quesada