Best practices for Linked Data




Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Avda. Montepríncipe s/n, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
asun@fi.upm.es
Phone: 34.91.3367417, Fax: 34.91.3524819
Acknowledgements: M. Poveda, V. Rodríguez-Doncel, D. Vila
BabeLData: TIN2010-17550

Asunción Gómez-Pérez, W3C @ Spain 2013, Madrid, 18th December

Linked Data: why is it important?
It facilitates data integration:
- From heterogeneous sources
- In different formats
- At different granularity
- In different languages
- From different countries
Slide adapted from "5min Introduction to Linked Data", Olaf Hartig

Data Integration
Databases: BNE, VIAF, AEMET, IGN, Prisa, DBpedia
[Figure: facts about the same entities held in different databases and joined by "same as" links. Labels in the figure: El Quijote; Autor (author); Año de Publicación (year of publication); 1605; M. Cervantes; birthplace; Alcalá de Henares; Ubicado en (located in); Don Quixote; creator; year of publication; 1960; translated into Hebrew; located; guía Tapas (tapas guide); Siglo de Oro; Temperatura (temperature) 20º.]

Foundations
- RDF(S) models
- Unique identifiers: a URI identifies or names a resource
- Equivalence links to other datasets (same as)
- Data navigation

Classes:
- Person: http://iflastandards.info/ns/fr/frbr/frbrer/c1005
- Work: http://iflastandards.info/ns/fr/frbr/frbrer/c1001
Resources:
- Cervantes (is a Person): http://datos.bne.es/resource/xx1718747
- El Quijote (is a Work): http://datos.bne.es/resource/xx3383563
- Cervantes is creator of El Quijote
Same as:
- Cervantes: http://viaf.org/viaf/17220427
- Cervantes: http://dbpedia.org/resource/miguel_de_cervantes
Source: http://www.w3.org/designissues/linkeddata.html
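A minimal sketch of how equivalence links can be used for data integration: URIs connected by "same as" are grouped so that facts stated about any of them can be read together. The triples reuse the URIs from the slide; that both links start from the BNE resource, the plain-tuple representation and the name `same_as_groups` are assumptions made for illustration.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

# URIs from the slide. Assumption: both "same as" links start at the BNE resource.
BNE_CERVANTES = "http://datos.bne.es/resource/xx1718747"
BNE_QUIJOTE = "http://datos.bne.es/resource/xx3383563"
triples = [
    (BNE_CERVANTES, SAME_AS, "http://viaf.org/viaf/17220427"),
    (BNE_CERVANTES, SAME_AS, "http://dbpedia.org/resource/miguel_de_cervantes"),
    (BNE_CERVANTES, "is creator of", BNE_QUIJOTE),
]


def same_as_groups(triples):
    """Group URIs that are connected, directly or indirectly, by same-as links."""
    parent = {}

    def find(uri):
        # Follow parent pointers to the representative, compressing the path.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            root_s, root_o = find(subject), find(obj)
            if root_s != root_o:
                parent[root_o] = root_s

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


groups = same_as_groups(triples)
```

With the three triples above, `groups` holds a single set containing the BNE, VIAF and DBpedia identifiers for Cervantes; El Quijote is left out because "is creator of" is not an equivalence link.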

The model (ontology) and the data, for humans
Ontology: Work, Person, Place, Library, Idiom, Year; relations: is creator of, birthplace, located at, has subject, translation, publication date
Data (BNE): El Quijote, Cervantes, Alcalá de Henares, Vida de Cervantes, Catalán, 1960; relations: is creator of, birthplace, located in, has subject, translation, publication date

The model and the data, for machines
Ontology:
- Language: http://iflastandards.info/ns/fr/frbr/frbrer/c1002
- Work: http://iflastandards.info/ns/fr/frbr/frbrer/c1001
- Person: http://iflastandards.info/ns/fr/frbr/frbrer/c1005
- Municipality: http://geo.linkeddata.es/ontology/municipio
- Biblioteca (library): http://xmlns.com/foaf/0.1/organization
- Relations and other labels: translation, is creator of, Año (year), publication date, birthplace, located in, has subject
Data:
- Catalán: http://datos.bne.es/resource/xx1924295
- Alcalá de Henares: http://geo.linkeddata.es/resource/alcalá de Henares
- Don Quijote de la Mancha: http://datos.bne.es/resource/xx3383563
- Cervantes Saavedra, Miguel de: http://datos.bne.es/resource/xx1718747
- BNE: http://datos.bne.es/#
- Vida de Miguel de Cervantes Saavedra: http://datos.bne.es/resource/bimo0002045496
- Relations and values: translation, publication date 1960, Es autor (is author of), birthplace, has subject, located in

Linked Data is to be processed by machines

The generation process: providers, domains, sources, languages

The Linked Data Generation Process
Phases: Specification, Modelling, Generation, Linking, Publication, Exploitation, plus Data Curation
There is no One-Size-Fits-All Formula

A lot of data in many domains: music, online activities, e-government, cross-domain, publications, geographic, life sciences

I want to use Linked Open Data
- Who generated the LD dataset?
- When was the LD dataset created?
- How was the LD dataset created?
- Is this the latest version of the LD dataset?
- Is the license information clearly stated in the LD dataset?
- How are the LD licenses offered?
- Is the LD dataset monolingual or multilingual?

LOD observations
How does the LD generation process influence the use of the data by third parties?
- Vocabularies
- Licenses
- Language
- Provenance

How to prevent GIGO (garbage in, garbage out)?

Vocabularies

Cervantes at the data level
[Figure: five URIs that all name something called "Cervantes", some of them joined by "same as" links.]
- http://www.server1.org/resource/cervantes
- http://d-nb.info/gnd/11851993x
- http://datos.bne.es/resource/xx1718747
- http://www.server2.es/resource/cervantes
- http://geo.linkeddata.es/page/resource/municipio/cervantes
Properties and values in the figure: Author: D. Quijote; Phone: 914 296 093; Date of Birth: 1547; #People: 1547; Size: 276,4 km²

Cervantes and a bit of semantics
[Figure: the same five URIs, now each with an rdf:type.]
- http://www.server1.org/resource/cervantes
- http://d-nb.info/gnd/11851993x, Cervantes (Person)
- http://datos.bne.es/resource/xx1718747, Cervantes (Person)
- http://www.server2.es/resource/cervantes
- http://geo.linkeddata.es/page/resource/municipio/cervantes
rdf:type values in the figure: Person, Restaurant, Street, Municipality
Properties and values in the figure: Author: D. Quijote; Date of Birth: 1547
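As a rough illustration of why the type statements matter, the sketch below selects only the "Cervantes" resources typed as Person. The URIs come from the slide; the pairing of Restaurant and Street with the two example servers is an assumption (the transcript does not say which is which), and `resources_of_type` is a hypothetical helper.

```python
RDF_TYPE = "rdf:type"

# Type statements for the five "Cervantes" URIs on the slide.
triples = [
    ("http://d-nb.info/gnd/11851993x", RDF_TYPE, "Person"),
    ("http://datos.bne.es/resource/xx1718747", RDF_TYPE, "Person"),
    ("http://geo.linkeddata.es/page/resource/municipio/cervantes", RDF_TYPE, "Municipality"),
    # Assumed pairings, for illustration only:
    ("http://www.server1.org/resource/cervantes", RDF_TYPE, "Restaurant"),
    ("http://www.server2.es/resource/cervantes", RDF_TYPE, "Street"),
]


def resources_of_type(triples, wanted_type):
    """Return the sorted subjects that are declared to be of wanted_type."""
    return sorted(
        subject
        for subject, predicate, obj in triples
        if predicate == RDF_TYPE and obj == wanted_type
    )


people = resources_of_type(triples, "Person")
```

Without the type statements, a search for "Cervantes" would return all five resources; with them, a consumer looking for the author keeps only the two Person URIs.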

Cervantes and FOAF
[Figure: the FOAF vocabulary and an instance that uses it.]
Classes: owl:Thing, foaf:Agent, foaf:Group, foaf:Organization, foaf:Document, foaf:Person, foaf:Image
Properties: foaf:publications, foaf:mbox, foaf:firstName, foaf:surname, foaf:birthday, foaf:img, foaf:knows, foaf:depiction, foaf:homepage
Instance bibliothek:cervantes:
- foaf:firstName, foaf:surname: Miguel de Cervantes Saavedra
- foaf:birthday: 29-09
- foaf:img: http://.../authors/cervantes.png
- foaf:depiction: http://www.bibliothekberlin/ /images/quixote.tif
- foaf:publications: http://www.bibliothekberlin.com/.../3-538-06892-5

License Information

How Open is the Open Linked Data Cloud? LOD observations: Licenses

An example: the British National Bibliography

License Information is not up to date

Metadata information without license information

License information provided as XML

Linked Data Rights pattern http://oeg-dev.dia.fi.upm.es/licensius/static/ldr/
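The license observations above (missing, outdated, or not machine-readable license information) suggest a simple check a data consumer might run before reusing a dataset. The sketch below is not the Linked Data Rights pattern itself; it only assumes that a dataset description is available as a dictionary of properties and looks for `dcterms:license` or `dcterms:rights`. The function name, the status strings and the two sample descriptions are hypothetical.

```python
LICENSE_PROPERTIES = ("dcterms:license", "dcterms:rights")


def license_status(description):
    """Classify how a dataset description states its license.

    description: dict mapping property names to string values.
    """
    for prop in LICENSE_PROPERTIES:
        value = description.get(prop)
        if value:
            # A URI can be dereferenced and compared by machines;
            # free text has to be read by a person.
            if value.startswith(("http://", "https://")):
                return "license as URI"
            return "license as free text"
    return "no license information"


# Hypothetical dataset descriptions, for illustration only.
with_uri = {
    "dcterms:title": "Example dataset A",
    "dcterms:license": "http://creativecommons.org/publicdomain/zero/1.0/",
}
without_license = {"dcterms:title": "Example dataset B"}

status_a = license_status(with_uri)
status_b = license_status(without_license)
```

A check of this kind only detects whether license metadata is present and machine-readable; whether it is up to date still has to be verified against the publisher's own statements.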

Language

Rationale: LOD is dominated by the English language (2007, 2009, 2013)
Questions:
1. Searching resources in a particular language
2. Distribution of natural languages across RDF datasets?
3. Usage of language tags to indicate the natural language of RDF tags?
   1. Distribution of usage of language tags
   2. Distribution of literals tagged as English vs other languages
   3. Distribution of literals tagged in languages other than English

Example of a multilingual library resource
The dataset publisher does not tag the language of the content of the different fields.
Example: Ernest Hemingway and "El viejo y el mar" (MARC 21 records)

Multilingualism and the Linked Data Process

How to represent language information for datasets?

    # VoID description
    :bne a void:Dataset ;
        dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .

    # DCAT description
    :bne a dcat:Dataset ;
        dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .

How to represent language information in Linked Data?

Traditional annotation properties for most cases:

    dbpedia:miguel_de_cervantes rdfs:label "Miguel de Cervantes"@es ,
        "ミゲル デ セルバンテス"@ja ,
        " "@ko .

Richer models for more demanding applications:

    # LEMON
    isbd:t1001 lemon:isReferenceOf [ lemon:isSenseOf :cartographic ] .
    :cartographic a lemon:LexicalEntry ;
        lemon:form [ lemon:writtenRep "cartográfico"@es ;
                     isocat:grammaticalgender isocat:masculine ] ;
        lemon:form [ lemon:writtenRep "cartográfica"@es ;
                     isocat:grammaticalgender isocat:feminine ] .
    isocat:grammaticalgender rdfs:subPropertyOf lemon:property .
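To show what language tags buy a consumer, here is a small sketch that picks a label by language preference. Labels are modelled as `(text, language_tag)` pairs, with `None` standing for an untagged literal; the two tagged labels come from the slide, while the untagged one and the function name `pick_label` are assumptions added for illustration.

```python
# Labels as (text, language_tag) pairs; None means the literal has no tag.
labels = [
    ("Miguel de Cervantes", "es"),
    ("ミゲル デ セルバンテス", "ja"),
    ("Cervantes", None),  # hypothetical untagged literal
]


def pick_label(labels, preferred_languages):
    """Return the first label whose tag matches the preference order.

    Falls back to an untagged label, and to None when nothing is usable.
    """
    for language in preferred_languages:
        for text, tag in labels:
            if tag == language:
                return text
    for text, tag in labels:
        if tag is None:
            return text
    return None


japanese_first = pick_label(labels, ["ja", "es"])
english_first = pick_label(labels, ["en", "es"])
french_only = pick_label(labels, ["fr"])
```

When a publisher leaves literals untagged, every request ends in the fallback branch, which is the situation the multilingual library example describes.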

Implementation of the recording of data and metadata provenance
- Generation process: PROV-O @ W3C
- Resource provenance: DC
[Figure: File.txt carries DC metadata (creator: John; creation date: 12-2-1900; rights: GPL). A revision process uses File.txt and generates Filev1.txt. The provenance model is expressed in RDF(S) and kept in an RDF store.]

Conclusions
The use of:
- curated data
- widely known vocabularies
- license metadata in RDF
- language metadata in RDF
- provenance metadata in RDF
will influence the use of the linked data by third parties.

Thanks for your attention!
Asunción Gómez-Pérez
Guidelines for Multilingual Linked Data. WIMS 2013, Madrid, 12-14 June

There is no One-Size-Fits-All Formula
Columns: Phase, BNE, IGN, AEMET, PRISA, INE
- Modeling: DC, hydrontology, WGS84, time, SSN ontology, SIOC, SCOVO, Data Cube
- RDF generation: MARiMbA, geometry2rdf, NOR2O, CSV parser
- Links generation: DNB, VIAF, LIBRIS, DBpedia, Silk, Geolinkeddata.es, Geonames, NOR2O
- Publication: Pubby, sitemap4rdf
- Exploitation: map4rdf, SPARQL
Source: http://oa.upm.es/14465/1/2.formulald.pdf

The multilingual Web of Data: current state

1. Number of monolingual and multilingual datasets
                                         January 2012     June 2012   December 2012
   Monolingual datasets                         1,906         2,201           1,984
   Multilingual datasets                          349           635             676

2. Current usage of language tagging capabilities in RDF
                                         January 2012     June 2012   December 2012
   RDF literals without language tag       10,250,936    10,594,338      12,272,806
   RDF literals with language tag           2,567,324     3,154,779       3,365,930

3. English tags versus other languages' tags
                                         January 2012     June 2012   December 2012
   RDF literals with English tag            2,135,664     2,751,065       2,808,145
   RDF literals with other language tag       431,660       403,714         557,785

4. Evolution of top-10 languages
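A short worked calculation on the language-tag counts above. It assumes that the series 2,567,324 / 3,154,779 / 3,365,930 is the number of literals with a language tag (each value equals the sum of the English-tagged and other-tagged counts for the same month) and that 10,250,936 / 10,594,338 / 12,272,806 is the number of untagged literals. The helper name `tagged_share` is hypothetical.

```python
# (literals with language tag, literals without language tag) per snapshot.
snapshots = {
    "January 2012": (2_567_324, 10_250_936),
    "June 2012": (3_154_779, 10_594_338),
    "December 2012": (3_365_930, 12_272_806),
}


def tagged_share(with_tag, without_tag):
    """Fraction of all literals that carry a language tag."""
    return with_tag / (with_tag + without_tag)


shares = {
    month: tagged_share(with_tag, without_tag)
    for month, (with_tag, without_tag) in snapshots.items()
}

for month, share in shares.items():
    print(f"{month}: {share:.1%} of literals carry a language tag")
```

Under these assumptions, roughly one literal in five carries a language tag in each of the three snapshots.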