Publishing Linked Data: There is no One-Size-Fits-All Formula
Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
asun@fi.upm.es
Acknowledgements: O. Corcho, D. Garijo, D. Vila, L. Vilches, B. Villazón
Our partners at: BNE, IGN,
Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
LOV SYMPOSIUM: LINKING AND OPENING VOCABULARIES. 18th June, 2012
Table of contents
1. The concept
2. Foundations
3. The process
4. Examples
   Libraries: http://datos.bne.es
   Geo: http://geo.linkeddata.es/
   Meteorology: http://aemet.linkeddata.es/
   Travelling: http://webenemasuno.linkeddata.es/
Complex queries using data from heterogeneous Web pages
Scenario: a Cervantes enthusiast from Germany is visiting Madrid and wants to know more about Cervantes's work and life.
Web pages involved:
  http://www.bne.es/
  http://elviajero.elpais.com/
  http://www.viaf.org/
  http://www.aemet
*Picture attribution: http://commons.wikimedia.org/wiki/user:gugerell
Data integration (diagram)
Source databases: BNE, VIAF, AEMET, IGN, Prisa, DBpedia
Labels recoverable from the diagram:
  El Quijote: year of publication 1605; author M. Cervantes
  Don Quixote: creator M. Cervantes; year of publication 1960 (translated into Hebrew)
  M. Cervantes: birthplace Alcalá de Henares; the M. Cervantes entries are connected by Same As links
  Alcalá de Henares: "located in" relations; Golden Age (Siglo de Oro); tapas guide; temperature 20º
Linked Data: why is it important?
It facilitates data integration:
  From heterogeneous sources
  In different formats
  At different granularity
  In different languages
  From different countries
Slide adapted from "5min Introduction to Linked Data" by Olaf Hartig
Foundations
  RDF(S) models
  Unique identifiers: a URI identifies or names a resource
  Equivalence links to other datasets (Same As)
  Data navigation
Example:
  Person (http://iflastandards.info/ns/fr/frbr/frbrer/c1005) "is creator of" Work (http://iflastandards.info/ns/fr/frbr/frbrer/c1001)
  Cervantes (http://datos.bne.es/resource/xx1718747) is a Person
  El Quijote (http://datos.bne.es/resource/xx3383563) is a Work
  Cervantes is creator of El Quijote
  Cervantes Same As http://viaf.org/viaf/17220427
  Cervantes Same As http://dbpedia.org/resource/miguel_de_cervantes
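As a rough illustration of how equivalence links support data navigation, the sketch below stores the statements from this slide as plain tuples and follows Same As links in both directions. The predicate labels `sameAs` and `isCreatorOf` and the helper names `equivalents` and `describe` are assumptions made for the example; a real deployment would use an RDF library and a triple store.

```python
from collections import defaultdict

SAME_AS = "sameAs"  # illustrative label standing in for the Same As link

BNE = "http://datos.bne.es/resource/xx1718747"
VIAF = "http://viaf.org/viaf/17220427"
DBPEDIA = "http://dbpedia.org/resource/miguel_de_cervantes"
QUIJOTE = "http://datos.bne.es/resource/xx3383563"

# Statements read from the slide, written as (subject, predicate, object).
TRIPLES = [
    (BNE, "isCreatorOf", QUIJOTE),
    (BNE, SAME_AS, VIAF),
    (BNE, SAME_AS, DBPEDIA),
]


def equivalents(uri, triples):
    """Return every URI reachable from `uri` through Same As links."""
    links = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            # Treat the link as symmetric so navigation works both ways.
            links[subject].add(obj)
            links[obj].add(subject)
    seen, pending = {uri}, [uri]
    while pending:
        for other in links[pending.pop()]:
            if other not in seen:
                seen.add(other)
                pending.append(other)
    return seen


def describe(uri, triples):
    """Collect non-equivalence statements about `uri` or any of its equivalents."""
    names = equivalents(uri, triples)
    return [
        (s, p, o)
        for s, p, o in triples
        if p != SAME_AS and (s in names or o in names)
    ]
```

Starting from the DBpedia identifier, `describe(DBPEDIA, TRIPLES)` reaches the BNE statement that Cervantes is the creator of El Quijote, which is the kind of cross-dataset navigation the slide describes.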
Foundations: aligning models with OWL
Person classes linked by EquivalentClass:
  http://schema.org/person
  http://iflastandards.info/ns/fr/frbr/frbrer/c1005
  http://xmlns.com/foaf/0.1/person
Person has birthplace Municipality
Municipality classes linked by EquivalentClass:
  Municipality: http://dbpedia.org/resource/municipalities_of_spain
  Municipio: http://geo.linkeddata.es/ontology/municipio
Instances (each "is a" member of its class) linked by Same As:
  Alcalá de Henares: http://dbpedia.org/page/alcal%c3%a1_de_henares
  Alcalá de Henares: http://geo.linkeddata.es/resource/alcalá de Henares
Lessons learnt
  1. Reuse existing models.
  2. Align the data and the concepts.
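To make the effect of class alignment concrete, here is a minimal sketch that groups equivalent classes with a small union-find structure and then lists instances across datasets. The pairing of the three Person classes is read loosely from the slide, and the function names (`find`, `build_alignment`, `instances_of`) are hypothetical; an OWL reasoner would do this in practice.

```python
# EquivalentClass statements as read from the slide (pairing is an assumption).
EQUIVALENT_CLASS = [
    ("http://schema.org/person",
     "http://iflastandards.info/ns/fr/frbr/frbrer/c1005"),
    ("http://iflastandards.info/ns/fr/frbr/frbrer/c1005",
     "http://xmlns.com/foaf/0.1/person"),
    ("http://dbpedia.org/resource/municipalities_of_spain",
     "http://geo.linkeddata.es/ontology/municipio"),
]

# "Is a" statements for the two Alcalá de Henares resources on the slide.
INSTANCE_OF = {
    "http://dbpedia.org/page/alcal%c3%a1_de_henares":
        "http://dbpedia.org/resource/municipalities_of_spain",
    "http://geo.linkeddata.es/resource/alcalá de Henares":
        "http://geo.linkeddata.es/ontology/municipio",
}


def find(parent, cls):
    """Return the representative of the equivalence group containing `cls`."""
    parent.setdefault(cls, cls)
    while parent[cls] != cls:
        parent[cls] = parent[parent[cls]]  # path halving keeps chains short
        cls = parent[cls]
    return cls


def build_alignment(pairs):
    """Merge every pair of equivalent classes into one group."""
    parent = {}
    for left, right in pairs:
        root_left, root_right = find(parent, left), find(parent, right)
        if root_left != root_right:
            parent[root_left] = root_right
    return parent


def instances_of(cls, parent, instance_of):
    """List instances typed with `cls` or with any class aligned to it."""
    root = find(parent, cls)
    return sorted(i for i, c in instance_of.items() if find(parent, c) == root)


ALIGNMENT = build_alignment(EQUIVALENT_CLASS)
```

With the alignment in place, asking for instances of `http://geo.linkeddata.es/ontology/municipio` also returns the DBpedia resource, even though it was typed with a different class.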
Methodology
  Data sources analysis
  URI design
  License definition
Identification and selection of data sources
  Spanish National Geographic Institute (IGN)
  Spanish National Statistics Institute (INE)
  National Library of Spain (BNE)
  Spanish Meteorological Office (AEMET)
1. Identification and selection of the data sources
Spanish National Geographic Institute
  Multilingual (Spanish, Basque, Galician, Catalan)
  Conceptualization mismatches
  Granularity (scale concept)
  Domain vocabulary:
    Hydrographic information: embalse (reservoir), albufera (lagoon), río (river), etc.
    Transport: vía desdoblada (dual carriageway), ferrocarril (railway)
    Administrative units: municipio (municipality)
  Particularities: longitude and latitude
Spanish National Statistics Institute
  Monolingual
  Numerical information
  Particularities: geographic (at the textual level) and temporal
1. Identification and selection of the data sources: Geographical information IGN-E
1. Identification and selection of the data sources: Statistical information
Records in the MARC 21 format
  3.9 million bibliographic records
  4.2 million authority records
  Version: November, 2011
URI design
Meaningful URIs versus opaque URIs
Separate the TBox (ontology model) from the ABox (data)
Base URIs:
  http://linkeddata.es/
  http://datos.bne.es/
  http://geo.linkeddata.es/
  http://otalex.linkeddata.es/
Ontology (TBox URIs):
  http://iflastandards.info/ns/fr/frbr/frbrer/c1005
  http://phenomenontology.linkeddata.es/ontology/{concept property}
  http://phenomenontology.linkeddata.es/ontology/municipio
  We use the Data Cube Vocabulary and/or other vocabularies
Data (ABox URIs):
  http://datos.bne.es/resource/xx1718747
  http://geo.linkeddata.es/resource/{resource type}/{resource name}
  http://geo.linkeddata.es/resource/municipio/badajoz
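The URI patterns above can be turned into a small minting helper. The sketch below is only one possible reading: the normalisation rule (trim, lowercase, spaces to hyphens, percent-encoding of non-ASCII characters) and the function names are assumptions, not something the slides specify.

```python
from urllib.parse import quote

GEO_BASE = "http://geo.linkeddata.es/"


def slug(text):
    """Normalise a label for use in a URI path segment (assumed rule)."""
    words = text.strip().lower().split()
    return quote("-".join(words), safe="-")


def mint_resource_uri(resource_type, resource_name, base=GEO_BASE):
    """ABox pattern: {base}resource/{resource type}/{resource name}."""
    return f"{base}resource/{slug(resource_type)}/{slug(resource_name)}"


def mint_ontology_uri(term, base=GEO_BASE):
    """TBox pattern: {base}ontology/{concept or property}."""
    return f"{base}ontology/{slug(term)}"
```

For instance, `mint_resource_uri("Municipio", "Badajoz")` reproduces the example URI on the slide, while ontology terms and data resources stay in separate `ontology/` and `resource/` paths, which keeps the TBox and ABox apart.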
Ontology
Ontologies:
  A set of terms
  A set of explicit assumptions regarding the intended meaning of the terms
    Almost always including concepts and their classification
    Almost always including properties between concepts
  A shared understanding of a domain of interest
Ontologies are expressed in OWL or RDF(S), both based on RDF
The NeOn methodology helps to build ontologies
2. Vocabulary development
Features
  Lightweight: taxonomies and a few properties
  Agreed-upon vocabularies, to avoid mapping problems
  Multilingual: linked data are multilingual
The NeOn methodology can help to
  Re-engineer non-ontological resources into ontologies
    Pros: uses domain terminology already agreed upon by domain experts
  Remove from heavyweight ontologies the features that you do not need
  Reuse existing vocabularies
The Ontology for BNE: based on IFLA vocabularies
Geolinkeddata ontology (diagram)
Following the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation, the ontology reuses hydrOntology, SCOVO, FAO Geopolitical, WGS84, GML, and Time:
  hydrOntology: hydrographical phenomena (rivers, lakes, etc.)
  WGS84: W3C vocabulary for WGS84 geo positioning
  SCOVO: statistics ontology (scv:dimension, scv:item, scv:dataset)
  GML: ontology for the OGC Geography Markup Language
  FAO Geopolitical ontology: names and international code systems for territories and groups
  Time: W3C Time Ontology, a vocabulary for instants, intervals, durations, etc.
Linking properties shown: haslat/long, hasstatisticaldata, hasgeometry, haslocation/islocated
Other resources shown: UNESCO, EGM / ERM, GeoNames
Legend: Classes, Object Properties, Data Properties, Thesaurus reused (figures shown: 33, 44, 318)
3. RDF generation
From the data sources:
  Geographic information (databases)
  Statistical information (.xls)
  Geospatial information
  Bibliographic information (MARC 21)
Different technologies for RDF generation:
  NOR2O (from Excel, XML, text files, ...)
  R2O and ODEMapster (from databases)
  geometry2rdf and SPh2 (for geo data)
  Marimba (for libraries)
Libraries: Marimba uses the ontology to generate RDF from the BNE records
Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
BNE resource: http://datos.bne.es/resource/xx1718747
Same As:
  DNB: http://d-nb.info/gnd/11851993x
  VIAF: http://viaf.org/viaf/17220427
  DBpedia: http://dbpedia.org/resource/miguel_de_cervantes
  SUDOC: http://www.idref.fr/026774771/id
  LIBRIS: http://libris.kb.se/resource/auth/45369
Publication
  Data publication
  Metadata publication using VoID, to facilitate discovery
  Register your dataset in CKAN
  Use sitemap4rdf to generate the sitemap
  Upload the sitemap to Google and Sindice
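A VoID description is itself a small RDF document. The sketch below builds a minimal one in Turtle as a string; the dataset URI, title, and endpoint values passed in the example are placeholders chosen for illustration, not the metadata actually published for any of the datasets in this talk.

```python
def turtle_string(text):
    """Quote a Python string as a Turtle literal (minimal escaping only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def void_description(dataset_uri, title, uri_space, sparql_endpoint=None):
    """Return a minimal VoID dataset description in Turtle."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f"    dcterms:title {turtle_string(title)} ;",
    ]
    if sparql_endpoint is not None:
        lines.append(f"    void:sparqlEndpoint <{sparql_endpoint}> ;")
    lines.append(f"    void:uriSpace {turtle_string(uri_space)} .")
    return "\n".join(lines)


# Hypothetical values: only the uriSpace prefix follows the URIs shown in the slides.
EXAMPLE = void_description(
    dataset_uri="http://example.org/dataset/bne",
    title="Example library dataset",
    uri_space="http://datos.bne.es/resource/",
    sparql_endpoint="http://example.org/sparql",
)
```

Publishing a file like this alongside the data gives crawlers and catalogues a machine-readable entry point, which complements the CKAN registration and the sitemap.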
Specification
Web interface generation
SPARQL queries, for example counting the works of which Cervantes is the author:
  SELECT (COUNT(DISTINCT ?obras) AS ?total)
  WHERE {
    <http://datos.bne.es/resource/xx1718747>               # URI of Cervantes
    <http://iflastandards.info/ns/fr/frbr/frbrer/p2010>    # "is author"
    ?obras
  }
Demo: http://linkeddata3.dia.fi.upm.es/bne-demo
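A count query like the one on this slide is typically sent to a SPARQL endpoint as an HTTP GET request with a `query` parameter. The sketch below only builds the query text and the request URL; the endpoint address is a placeholder, since the slides do not give the endpoint for datos.bne.es, and the helper names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

CERVANTES = "http://datos.bne.es/resource/xx1718747"
IS_AUTHOR = "http://iflastandards.info/ns/fr/frbr/frbrer/p2010"


def count_works_query(author_uri, property_uri=IS_AUTHOR):
    """Build a SPARQL query counting the distinct works linked to an author."""
    return (
        "SELECT (COUNT(DISTINCT ?obras) AS ?total) WHERE { "
        f"<{author_uri}> <{property_uri}> ?obras }}"
    )


def sparql_get_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint, used only to show the shape of the request URL.
QUERY = count_works_query(CERVANTES)
URL = sparql_get_url("http://example.org/sparql", QUERY)
```

The resulting URL can be fetched with any HTTP client; asking for a JSON result format through the `Accept` header is a common choice, though what a given endpoint supports should be checked against its documentation.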
Screenshot: weather station observations shown on a map together with an El Viajero layer
Station: MADRID, RETIRO, 21:40, 26/5/2011
  Mean wind direction: 276 degrees
  Wind run: 13 Hm
  Mean wind speed: 2.2 m/s
  Direction of maximum wind speed: 251 degrees
  Air temperature: 18.5 degrees C
  Relative humidity: 75 %
  Dew point temperature: 13.9 degrees C
  Maximum wind speed: 4.7 m/s
  Precipitation: 0 litres/m2
  Pressure: 938.4 hPa
  Pressure reduced to sea level: 1013.6 hPa
Map layers: El Viajero (no photos available). Articles listed: "Las chicas de Antón Martín", "Reflejos versallescos en" (title truncated), "Un paseo por Madrid", "Visitando El Escorial".
There is no One-Size-Fits-All Formula
Datasets compared: BNE, IGN, AEMET, PRISA, INE
Resources used per phase, listed in the order of the original table:
  Modeling: DC, hydrOntology, WGS84, Time, SSN ontology, SIOC, SCOVO, Data Cube
  RDF generation: MARiMbA, geometry2rdf, NOR2O, CSV parser, CSV parser, NOR2O
  Link generation: DNB, VIAF, LIBRIS, DBpedia, Silk, Silk, Silk, DBpedia, DBpedia, Geolinkeddata.es, GeoNames, Geolinkeddata.es, NOR2O, Geolinkeddata.es
  Publication: Pubby, sitemap4rdf, map4rdf, SPARQL
Lessons learnt
URI
  Follow existing design guidelines for new URIs
  Reuse existing URIs from authoritative sources
Models
  Reuse existing models when available
  Create new models from authoritative sources
  Do not forget to align your model with existing models
RDF generation
  Vertical domains usually require specific tools for generation
Link
  Generic link discovery tools perform well in vertical domains
  Link to other data sets using:
    Equivalence links (sameAs), e.g. bne:cervantes sameAs Dbpedia:cervantes
    Typed links, e.g. a Person has birthplace a Municipality
Discovery
  Use sitemap4rdf to allow search engines to find your data
Use an iterative-incremental life cycle in your development
Learn about Linked Data with UPM official courses in one week