Linked Data Management and INSPIRE Compliant Data Export Dimitris Kotzinos FORTH-ICS & University of Cergy - Pontoise Yannis Roussakis, Kyriakos Kritikos, Athina Kritsotaki, Martin Doerr FORTH-ICS
Problems to tackle Data integration Data retrieval in a uniform way Data export to other formats November 26, 2013 2
The Web of databecomes a reality by connecting so far isolated islands or data silos such as Enterprise silos (Disparate DBMS Engines & Development Frameworks) Social Media silos (Social Networking Services, Discussion Forums, Blogs, Wikis etc.) Scientific silos (in biology, physics, earth sciences, etc.) Linked Open Data(LOD) is a way of publishing data on the (Semantic) Web that: Encourages reuse Promotes its (real & potential) inter-connectedness Enables network effects to add value to data Information Integration November 26, 2013 Linked Open Geodata Source: flickr http://www.flickr.com/photos/docsearls/5500714140/ 3
Linked Open Data as Service Extensible Application Pool Rel DB Visualization Data Integration Layer Rel DB Collaboration A P I Query Update Import Query Engine Linked Data Excel files External Resources Cross Data sets' Querying Export XML files Abstraction layer for data access abstract the applications fromthe specific setup of the datamanagement service (such as localvs. remote, federation, and distribution) Beyond Data Access November 26, 2013 Enabling automation of discovery, composition, and use of datasets Data Markets Online Visualization Services Data Publishing Solutions Data Aggregators BI / Analytics as a Service 4
Data Exports Query results viewed as: RDF/XML JSON GML KML Shapefiles Transformation services to user provided conceptualizations Export to INSPIRE compliant XML formats November 26, 2013 5
Introduction on Available Datasets in the project In this project we deal with scientific data described in different formats and notions from diverse areas ground waters, boreholes, chemical analyses, etc. GEUS, EKBAA, BRGM earthquakes, recordings, stations, etc. EPPO Landslides, susceptibility maps, etc. GEO-ZS, EKBAA Purpose: integrate, describe and query such heterogeneous data in a uniform way November 26, 2013 6
Available Datasets in the project Approach Creation of a Conceptual Model to integrate and cover all the thematic fields Map the source relational data into RDF data compliant to the Conceptual Model Provide unique URIs for data Rely on a scalable RDF Triple Store to enforce the mappings and enable the storage and query of the RDF data Provide an API for the RDF data management November 26, 2013 7
Models, Mappings and Integration Providers Data results transform GSOM mappings query tsv/csv xml json GeoJSON KML Shapefile. export xml trig trix n3 formats Inspire compliant InGeoCloudS Workshop, Brussels 21 November 2013 8 November 26, 2013
Conceptual Model Our model: needs to be a generic, extendable and interoperable, which relies not only to geographic data Integrates different datasets created from sources w.r.t. their semantics developed for the documentation of scientific observation measurement and sampling An event-centric model that captures complex contexts where time, place and participants are associated. Capable to describe complex relationships to integrate or connect rich information November 26, 2013 9
Basic Concepts E73 Information Object P67B is referred to by E1 CRM Entity P2 has type P1 is identified by E55 Type S21 Spatial Distribution Model P3 has note E41 Appellation E62 String S19 Observable Entity E52 Time-span S18 Map P4 has time-span E39 Actor E53 Place E42 Place Appellation P14 carried out by E7 Activity P7 took place at P16 used specific object P12B was present at E70 Thing P101 had as general use dc:coverage dc:date dc:subject dc:format dc:creator dc:locationperiodorjurisdiction rdf:literal LEGEND dc:mediatypeorextent Proposed (New) Concepts Reused Concepts (CIDOC) dc:agent Reused Concepts (DC, RDF) November 26, 2013 10
The model in details -Observation, Measurement, Sample Taking LEGEND Proposed (New) Concepts Reused Concepts (CIDOC) Reused Concepts (DC, RDF) November 26, 2013 11
Sample Taking Measurement November 26, 2013 12
A General Scientific Observation Model November 26, 2013 13
Data Mappings to Conceptual Model The data providers relational data were informally mapped into our Conceptual Model Such mappings can be formally expressed using R2RML R2RML is a W3C Recommendation http://www.w3.org/tr/r2rml/ Enables the automatic creation and synchronization of the Linked Data w.r.t. the Relational Data Some mappings have already been expressed using R2RML November 26, 2013 14
GEUS data -> Conceptual Model Borehole and Location E55.Type Depth P2F.has_type E54.Dimension DRILLDEPTH P90F.has_value E60.Number E42.Identifier BOREHOLENO E41.Appelation BOREHOLENAME P1F.is_identified_by S16.Borehole P43F.has_ Dimension O9.contains_or_ confines S15.Acquifer_Concept INTAKE P1F.is_identified_ by E42.Identifier INTAKENO O7.consists_of E47.Spatial_Coordinates XUTM S17.Borehole_Collar P87F.is_identified_by E47.Spatial_Coordinates YUTM P2F.has_type E55.Type EPSG:32632 E47.Spatial_Coordinates LATITUDE E47.Spatial_Coordinates ELEVATION E47.Spatial_Coordinates LONGITUDE P2F.has_type E55.Type EPSG:4326 P87F.is_identified_by P87F.is_identified_by E47.Spatial_Coordinates XUTM32EUREF89 E47.Spatial_Coordinates YUTM32EUREF89 E47.Spatial_Coordinates ZDVR90 P2F.has_type P2F.has_type E55.Type UTM32EUREF89 E55.Type DVR90 November 26, 2013 15
GEUS data -> Conceptual Model Ground Water Sample E55.Type GrWater_Sampling P2F.has_type O9.contains_or_ confines E52.Time-Span SAMPLEDATE P4F.has_ time-span S2.Sample_Taking P16F.used_specific_ object S15.Acquifer_Concept INTAKE S16.Borehole O5.removed P1F.is_identified_ by E42.Identifier INTAKENO O9.contains_or_ confines P1F.is_identified_by O4.sampled_at P53F_has_former_or_ current_location E42.Identifier SAMPLEID S13.Sample E53.Place O12.upper_vertical_limit E47.Spatial_Coordinates TOP O13.lower_vertical_limit E47.Spatial_Coordinates BOTTOM November 26, 2013 16
GEUS data -> Conceptual Model Water Level Measurement E39.Actor CATEGORY E55.Type QUALITY P2F.has_type P1F.is_identified_by E42.Identifier WATLEVELNO E55.Type METHOD P14F.carried_out_by E16.Measurement P7F.took_place_at S16.Borehole P32F.used_general_technique P4F.has_time-span P16F.used_specific_ object S15.Acquifer_Concept INTAKE O9.contains_or_ confines P40F.observed_dimension E52.Time-Span TIMEOFMEAS E54.Dimension WATERLEVEL E54.Dimension WATLEVMSL E54.Dimension WATLEVGRSU E54.Dimension WATLEVMP P90F.has_value P90F.has_value P90F.has_value P90F.has_value E60.Number E60.Number E60.Number E60.Number P2F.has_type P2F.has_type P2F.has_type P2F.has_type E55.Type "Waterlevel E55.Type " Watlevmsl E55.Type " Watlevgrsu E55.Type " Watlevmp November 26, 2013 17
GEUS data -> Conceptual Model Ground Water Chemical Analysis P1F.is_identified_by E42.Identifier ANALYSISNO E55.Type Chemical_Analysis P2F.has_type E16.Measurement S13.Sample P39F.measured P40F.observed_ dimension E42.Identifier COMPOUNDNO E42.Identifier EUNO E42.Identifier CASNO P91F.has_unit E54.Dimension P1F.is_identified_by P90F.has_value P1F.is_identified_by P2F.has_type P1F.is_identified_by O15.is_bounded_by E58.Measurement_Unit UNIT E60.Number AMOUNT E55.Type Comp_Descr E60.Number DETECTIONLIMIT November 26, 2013 18
Conceptual Model and INSPIRE INSPIRE (Infrastructure for Spatial Information in Europe) covers 34 Spatial Data Themes. It uses ISO 19100 series standards; oriented to environmental and spatial data The Generic conceptual schema is not event-oriented (especially the schema that uses the observationmeasurement specification) Our data and INSPIRE Data are not INSPIRE-compliant but there is a specific INSPIRE-to-model mapping that enables making queries and presenting accordingly the results Exported Data are made INSPIRE-compliant. November 26, 2013 19
Why do not use INSPIRE directly? INSPIRE is not made for information integration INSPIRE lacks semantic clarity of concepts in various places, i.e. An observation is an act that results in the estimation of the value of a feature property Confusion of the action (event) aka observation with the result The observation itself is also a feature, since it has properties and identity. Observation is the value of a feature property and also a feature November 26, 2013 20
Geology core: borehole Why do not use INSPIRE directly? INSPIRE schemas Geosciml: borehole Borehole is used by 2 schemas structures and it has different attributes in each one
INSPIRE schemas How many observations definitions (structures) INSPIRE includes? 1) ISO 19156 Observations and Measurements/OM Observation 2) GML Observation 3) Observation OGC-WaterML:Observation Process, MeasurementTimeseries `and TimeseriesObservation: What is their relation with the other observation schemas(iso 19156)? 1) ISO 19156 2) GML 3) WaterML
Conceptual Model and INSPIRE November 26, 2013 23
Conceptual Model and INSPIRE Model Mapped to INSPIRE November 26, 2013 24
Linked Data Storage -Virtuoso Virtuoso Universal Service is a hybrid of DB engine & middleware combining the func. of a RDBMS, ODBMS, and a RDFStore Supports various Internet & Web protocols: HTTP(s), WebDav, SOAP, UDDI, SPARQL Implemented various data access APIs: ODBC, JDBC, OLE DB, ADO. NET Provides a Web-based DB Administration UI Provides an open-source version Can create a clustered-based system of RDFStores+ a non-free version available in the Cloud November 26, 2013 25
Linked Data Storage -Virtuoso November 26, 2013 26
Linked Data Storage -Virtuoso Reasons for selection: Scalable to handle millions of triples + exhibits a very good performance Offers API for RDF data management Supports SPARQL and SPARUL Supports 2-D features and offers spatial operators Supports R2RML and provides the respective relational-to- RDF data mapping mechanism Offers a cloud-based version November 26, 2013 27
Linked Data Storage -Virtuoso Scalability / Elasticity Planned at least 2 instances of Elastic LD Storage to be available in the platform to handle the query load (crossprovider queries are more demanding) The elastic capabilities of Amazon Cloud are exploited to increase/decrease the number of instances when the load surpasses or goes below a specific threshold Stress tests are currently conducted to determine these upper and lower thresholds November 26, 2013 28
Linked Data Storage and Querying Generated Linked Data for ground waters (EKBAA, BRGM, GEUS) Successfully inserted ~1B triples into Virtuoso Created specific queries For each partner s dataset Combining similar notions from cross provider datasets Relative small execution times depending on the number of fetched results + query complexity At the moment using only 1 instance in the Amazon Cloud Created the R2RML mappings for ground water datasets November 26, 2013 29
Conceptual Data Integration & Linking API Methods for: Querying data/metadata via SPARQL (both provider-specific & cross-provider queries are supported) Importing data/metadata in various formats (both blocking and non-blocking depending on the size of meta-data to be imported) Direct through respective API method Data provider should be responsible of synchronizing his/her relational and RDF data Indirect through the API method for the definition of relational-to-rdf mappings Apart from the definition of mappings, the data provider does not need to perform any kind of synchronization, as the system will be responsible for performing such a synchronization on his/her behalf Exporting data/metadata in various formats (inline in the response or available in a specific URL) Updating data/metadata via SPARUL Inserting/Defining relational-to-rdf data mappings via R2RML language November 26, 2013 30
Conceptual Data Integration & Linking API Different datasets A P I Query Update Import Export Different formats, models e.g. INSPIRE Virtuoso Data Integration Layer Query Engine Linked Data Conclusions Easy integration Easy querying Hide complexities Link your data Open your data Data for all Rel DB Rel DB Excel files External Resources XML files November 26, 2013 31
Demo Example Cross query over datasets Find the boreholes and their locations in all datasets if ground water samples where taken from them after the date 01/01/2005. November 26, 2013 32
Demo Example select?bhole?date?latitude?longitude where { { select?bhole?date?latitude?longitude where {?st sci:o5.removed?sample; crm:p4f.has_time-span?date; sci:o4.sampled_at?place; crm:p2f.has_type "GRWATER_SAMPLING".?bhole sci:o9.contains_or_confines?place; a sci:s16.borehole; sci:o7.consists_of?bcollar.?bcollar crm:p87f.is_identified_by?latitude; crm:p87f.is_identified_by?longitude. filter (regex(?latitude, "Latitude")). filter (regex(?longitude, "Longitude")). filter(?date > "2005-01-01"^^xsd:date). } limit 100 } UNION { select?bhole?date?latitude?longitude where {?anal crm:p2f.has_type "QUALITY_MEASUREMENT"; crm:p4f.has_time-span?date; sci:o4.sampled_at?bhole;?bhole sci:o7.consists_of?bcollar.?bcollar crm:p87f.is_identified_by?latitude; crm:p87f.is_identified_by?longitude. filter (regex(?latitude, "Latitude")). filter (regex(?longitude, "Longitude")). filter(?date > "2005-01-01"^^xsd:date). } limit 100 } Cont d November 26, 2013 UNION { select?bhole?date?latitude?longitude where {?st crm:p4f.has_time-span?date; sci:o4.sampled_at?bhole.?bhole a sci:s16.borehole; sci:o7.consists_of?bcollar.?bcollar crm:p87f.is_identified_by?latitude; crm:p87f.is_identified_by?longitude. filter (regex(?latitude, "Latitude")). filter (regex(?longitude, "Longitude")). filter(?date > "2005-01-01"^^xsd:date). } limit 100 } } 33
Demo Example November 26, 2013 34
Demo Example November 26, 2013 35
Demo Session How to publish your data as Linked Open Data using GSOM? How to query your data? How to export in different formats? How to export your data as INSPIRE compliant data? 17:15 (45min) Room E Next Door Yannis Roussakis November 26, 2013 36
November 26, 2013 37 This content by the InGeoCloudS consortium members is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://www.ingeoclouds.eu/.