Linked Data Management and INSPIRE Compliant Data Export



Similar documents
Cloud based platforms for Linking and Managing Geodata

Cloud-based Linked Data Geoprocessing: Implementing Kriging as WPS on the Cloud

Cloud-based Infrastructures. Serving INSPIRE needs

Return on Experience on Cloud Compu2ng Issues a stairway to clouds. Experts Workshop Nov. 21st, 2013

GetLOD - Linked Open Data and Spatial Data Infrastructures

Publishing Linked Data Requires More than Just Using a Tool

InGeoCloudS: open Cloud-based services for Geospatial Data management in an INSPIRE context

Lift your data hands on session

Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Portal Version 1 - User Manual

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

Semantic Interoperability

Standards for Big Data in the Cloud

GIS Databases With focused on ArcSDE

dati.culturaitalia.it a Pilot Project of CulturaItalia dedicated to Linked Open Data

STAR Semantic Technologies for Archaeological Resources.

STAR Semantic Technologies for Archaeological Resources.

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

Following a guiding STAR? Latest EH work with, and plans for, Semantic Technologies

The ORIENTGATE data platform

GeoKettle: A powerful open source spatial ETL tool

Fraunhofer FOKUS. Fraunhofer Institute for Open Communication Systems Kaiserin-Augusta-Allee Berlin, Germany.

MarkLogic Enterprise Data Layer

Data Integration Checklist

A Generic Database Web Service

Smart Cities require Geospatial Data Providing services to citizens, enterprises, visitors...

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

XIII. Service Oriented Computing. Laurea Triennale in Informatica Corso di Ingegneria del Software I A.A. 2006/2007 Andrea Polini

There are various ways to find data using the Hennepin County GIS Open Data site:

Linked Open Data Infrastructure for Public Sector Information: Example from Serbia

Linked Statistical Data Analysis

Core Enterprise Services, SOA, and Semantic Technologies: Supporting Semantic Interoperability

A Web services solution for Work Management Operations. Venu Kanaparthy Dr. Charles O Hara, Ph. D. Abstract

Types and Annotations for CIDOC CRM Properties

RDF Dataset Management Framework for Data.go.th

Network Graph Databases, RDF, SPARQL, and SNA

GeoNetwork, The Open Source Solution for the interoperable management of geospatial metadata

How To Build A Cloud Based Intelligence System

Sisense. Product Highlights.

Service Oriented Architecture

Call for experts for INSPIRE maintenance & implementation

Concept for an Ontology Based Web GIS Information System for HiMAT

Introduction to Service Oriented Architectures (SOA)

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1

E-Business Technologies for the Future

technische universiteit eindhoven WIS & Engineering Geert-Jan Houben

Deploying a Geospatial Cloud

Quattra s Cloud Vision & Framework Value

Evaluating SPARQL-to-SQL translation in ontop

ON DEMAND ACCESS TO BIG DATA. Peter Haase fluid Operations AG

City Data Pipeline. A System for Making Open Data Useful for Cities. stefan.bischof@tuwien.ac.at

CDI/THREDDS Interoperability: the SeaDataNet developments. P. Mazzetti 1,2, S. Nativi 1,2, 1. CNR-IMAA; 2. PIN-UNIFI

BIG DATA AGGREGATOR STASINOS KONSTANTOPOULOS NCSR DEMOKRITOS, GREECE. Big Data Europe

Integration of location based services for Field support in CRM systems

Resource Oriented Architecture and REST

Geospatial Platforms For Enabling Workflows

ELIS Multimedia Lab. Linked Open Data. Sam Coppens MMLab IBBT - UGent

Fast Innovation requires Fast IT

Geospatial Platforms For Enabling Workflows

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

DISCOVERING RESUME INFORMATION USING LINKED DATA

Standardized data sharing through an open-source Spatial Data Infrastructure: the Afromaison project

Chapter 5. Learning Objectives. DW Development and ETL

Integrating data from The Perseus Project and Arachne using the CIDOC CRM An Examination from a Software Developer s Perspective

CST6445: Web Services Development with Java and XML Lesson 1 Introduction To Web Services Skilltop Technology Limited. All rights reserved.

Model-Driven Data Warehousing

How To Write A Blog Post On Globus

XpoLog Competitive Comparison Sheet

CRM dig : A generic digital provenance model for scientific observation

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management

Data Publishing with DaPaaS

Pan-European infrastructure for management of marine and ocean geological and geophysical data

ReportPortal Web Reporting for Microsoft SQL Server Analysis Services

Sextant. Spatial Data Infrastructure for Marine Environment. C. Satra Le Bris, E. Quimbert, M. Treguer

GEOGRAPHIC INFORMATION SYSTEMS CERTIFICATION

How To Write An Inspire Directive

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

The Ontological Approach for SIEM Data Repository

Geodatabase Programming with SQL

Tayloring The CRM to Archaeological Requirements

Ernesto Ongaro BI Consultant February 19, The 5 Levels of Embedded BI

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

Towards a Web of Sensors built with Linked Data and REST

Integrating Open Sources and Relational Data with SPARQL

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Transcription:

Linked Data Management and INSPIRE Compliant Data Export Dimitris Kotzinos FORTH-ICS & University of Cergy - Pontoise Yannis Roussakis, Kyriakos Kritikos, Athina Kritsotaki, Martin Doerr FORTH-ICS

Problems to tackle Data integration Data retrieval in a uniform way Data export to other formats November 26, 2013 2

The Web of databecomes a reality by connecting so far isolated islands or data silos such as Enterprise silos (Disparate DBMS Engines & Development Frameworks) Social Media silos (Social Networking Services, Discussion Forums, Blogs, Wikis etc.) Scientific silos (in biology, physics, earth sciences, etc.) Linked Open Data(LOD) is a way of publishing data on the (Semantic) Web that: Encourages reuse Promotes its (real & potential) inter-connectedness Enables network effects to add value to data Information Integration November 26, 2013 Linked Open Geodata Source: flickr http://www.flickr.com/photos/docsearls/5500714140/ 3

Linked Open Data as Service Extensible Application Pool Rel DB Visualization Data Integration Layer Rel DB Collaboration A P I Query Update Import Query Engine Linked Data Excel files External Resources Cross Data sets' Querying Export XML files Abstraction layer for data access abstract the applications fromthe specific setup of the datamanagement service (such as localvs. remote, federation, and distribution) Beyond Data Access November 26, 2013 Enabling automation of discovery, composition, and use of datasets Data Markets Online Visualization Services Data Publishing Solutions Data Aggregators BI / Analytics as a Service 4

Data Exports Query results viewed as: RDF/XML JSON GML KML Shapefiles Transformation services to user provided conceptualizations Export to INSPIRE compliant XML formats November 26, 2013 5

Introduction on Available Datasets in the project In this project we deal with scientific data described in different formats and notions from diverse areas ground waters, boreholes, chemical analyses, etc. GEUS, EKBAA, BRGM earthquakes, recordings, stations, etc. EPPO Landslides, susceptibility maps, etc. GEO-ZS, EKBAA Purpose: integrate, describe and query such heterogeneous data in a uniform way November 26, 2013 6

Available Datasets in the project Approach Creation of a Conceptual Model to integrate and cover all the thematic fields Map the source relational data into RDF data compliant to the Conceptual Model Provide unique URIs for data Rely on a scalable RDF Triple Store to enforce the mappings and enable the storage and query of the RDF data Provide an API for the RDF data management November 26, 2013 7

Models, Mappings and Integration Providers Data results transform GSOM mappings query tsv/csv xml json GeoJSON KML Shapefile. export xml trig trix n3 formats Inspire compliant InGeoCloudS Workshop, Brussels 21 November 2013 8 November 26, 2013

Conceptual Model Our model: needs to be a generic, extendable and interoperable, which relies not only to geographic data Integrates different datasets created from sources w.r.t. their semantics developed for the documentation of scientific observation measurement and sampling An event-centric model that captures complex contexts where time, place and participants are associated. Capable to describe complex relationships to integrate or connect rich information November 26, 2013 9

Basic Concepts E73 Information Object P67B is referred to by E1 CRM Entity P2 has type P1 is identified by E55 Type S21 Spatial Distribution Model P3 has note E41 Appellation E62 String S19 Observable Entity E52 Time-span S18 Map P4 has time-span E39 Actor E53 Place E42 Place Appellation P14 carried out by E7 Activity P7 took place at P16 used specific object P12B was present at E70 Thing P101 had as general use dc:coverage dc:date dc:subject dc:format dc:creator dc:locationperiodorjurisdiction rdf:literal LEGEND dc:mediatypeorextent Proposed (New) Concepts Reused Concepts (CIDOC) dc:agent Reused Concepts (DC, RDF) November 26, 2013 10

The model in details -Observation, Measurement, Sample Taking LEGEND Proposed (New) Concepts Reused Concepts (CIDOC) Reused Concepts (DC, RDF) November 26, 2013 11

Sample Taking Measurement November 26, 2013 12

A General Scientific Observation Model November 26, 2013 13

Data Mappings to Conceptual Model The data providers relational data were informally mapped into our Conceptual Model Such mappings can be formally expressed using R2RML R2RML is a W3C Recommendation http://www.w3.org/tr/r2rml/ Enables the automatic creation and synchronization of the Linked Data w.r.t. the Relational Data Some mappings have already been expressed using R2RML November 26, 2013 14

GEUS data -> Conceptual Model Borehole and Location E55.Type Depth P2F.has_type E54.Dimension DRILLDEPTH P90F.has_value E60.Number E42.Identifier BOREHOLENO E41.Appelation BOREHOLENAME P1F.is_identified_by S16.Borehole P43F.has_ Dimension O9.contains_or_ confines S15.Acquifer_Concept INTAKE P1F.is_identified_ by E42.Identifier INTAKENO O7.consists_of E47.Spatial_Coordinates XUTM S17.Borehole_Collar P87F.is_identified_by E47.Spatial_Coordinates YUTM P2F.has_type E55.Type EPSG:32632 E47.Spatial_Coordinates LATITUDE E47.Spatial_Coordinates ELEVATION E47.Spatial_Coordinates LONGITUDE P2F.has_type E55.Type EPSG:4326 P87F.is_identified_by P87F.is_identified_by E47.Spatial_Coordinates XUTM32EUREF89 E47.Spatial_Coordinates YUTM32EUREF89 E47.Spatial_Coordinates ZDVR90 P2F.has_type P2F.has_type E55.Type UTM32EUREF89 E55.Type DVR90 November 26, 2013 15

GEUS data -> Conceptual Model Ground Water Sample E55.Type GrWater_Sampling P2F.has_type O9.contains_or_ confines E52.Time-Span SAMPLEDATE P4F.has_ time-span S2.Sample_Taking P16F.used_specific_ object S15.Acquifer_Concept INTAKE S16.Borehole O5.removed P1F.is_identified_ by E42.Identifier INTAKENO O9.contains_or_ confines P1F.is_identified_by O4.sampled_at P53F_has_former_or_ current_location E42.Identifier SAMPLEID S13.Sample E53.Place O12.upper_vertical_limit E47.Spatial_Coordinates TOP O13.lower_vertical_limit E47.Spatial_Coordinates BOTTOM November 26, 2013 16

GEUS data -> Conceptual Model Water Level Measurement E39.Actor CATEGORY E55.Type QUALITY P2F.has_type P1F.is_identified_by E42.Identifier WATLEVELNO E55.Type METHOD P14F.carried_out_by E16.Measurement P7F.took_place_at S16.Borehole P32F.used_general_technique P4F.has_time-span P16F.used_specific_ object S15.Acquifer_Concept INTAKE O9.contains_or_ confines P40F.observed_dimension E52.Time-Span TIMEOFMEAS E54.Dimension WATERLEVEL E54.Dimension WATLEVMSL E54.Dimension WATLEVGRSU E54.Dimension WATLEVMP P90F.has_value P90F.has_value P90F.has_value P90F.has_value E60.Number E60.Number E60.Number E60.Number P2F.has_type P2F.has_type P2F.has_type P2F.has_type E55.Type "Waterlevel E55.Type " Watlevmsl E55.Type " Watlevgrsu E55.Type " Watlevmp November 26, 2013 17

GEUS data -> Conceptual Model Ground Water Chemical Analysis P1F.is_identified_by E42.Identifier ANALYSISNO E55.Type Chemical_Analysis P2F.has_type E16.Measurement S13.Sample P39F.measured P40F.observed_ dimension E42.Identifier COMPOUNDNO E42.Identifier EUNO E42.Identifier CASNO P91F.has_unit E54.Dimension P1F.is_identified_by P90F.has_value P1F.is_identified_by P2F.has_type P1F.is_identified_by O15.is_bounded_by E58.Measurement_Unit UNIT E60.Number AMOUNT E55.Type Comp_Descr E60.Number DETECTIONLIMIT November 26, 2013 18

Conceptual Model and INSPIRE INSPIRE (Infrastructure for Spatial Information in Europe) covers 34 Spatial Data Themes. It uses ISO 19100 series standards; oriented to environmental and spatial data The Generic conceptual schema is not event-oriented (especially the schema that uses the observationmeasurement specification) Our data and INSPIRE Data are not INSPIRE-compliant but there is a specific INSPIRE-to-model mapping that enables making queries and presenting accordingly the results Exported Data are made INSPIRE-compliant. November 26, 2013 19

Why do not use INSPIRE directly? INSPIRE is not made for information integration INSPIRE lacks semantic clarity of concepts in various places, i.e. An observation is an act that results in the estimation of the value of a feature property Confusion of the action (event) aka observation with the result The observation itself is also a feature, since it has properties and identity. Observation is the value of a feature property and also a feature November 26, 2013 20

Geology core: borehole Why do not use INSPIRE directly? INSPIRE schemas Geosciml: borehole Borehole is used by 2 schemas structures and it has different attributes in each one

INSPIRE schemas How many observations definitions (structures) INSPIRE includes? 1) ISO 19156 Observations and Measurements/OM Observation 2) GML Observation 3) Observation OGC-WaterML:Observation Process, MeasurementTimeseries `and TimeseriesObservation: What is their relation with the other observation schemas(iso 19156)? 1) ISO 19156 2) GML 3) WaterML

Conceptual Model and INSPIRE November 26, 2013 23

Conceptual Model and INSPIRE Model Mapped to INSPIRE November 26, 2013 24

Linked Data Storage -Virtuoso Virtuoso Universal Service is a hybrid of DB engine & middleware combining the func. of a RDBMS, ODBMS, and a RDFStore Supports various Internet & Web protocols: HTTP(s), WebDav, SOAP, UDDI, SPARQL Implemented various data access APIs: ODBC, JDBC, OLE DB, ADO. NET Provides a Web-based DB Administration UI Provides an open-source version Can create a clustered-based system of RDFStores+ a non-free version available in the Cloud November 26, 2013 25

Linked Data Storage -Virtuoso November 26, 2013 26

Linked Data Storage -Virtuoso Reasons for selection: Scalable to handle millions of triples + exhibits a very good performance Offers API for RDF data management Supports SPARQL and SPARUL Supports 2-D features and offers spatial operators Supports R2RML and provides the respective relational-to- RDF data mapping mechanism Offers a cloud-based version November 26, 2013 27

Linked Data Storage -Virtuoso Scalability / Elasticity Planned at least 2 instances of Elastic LD Storage to be available in the platform to handle the query load (crossprovider queries are more demanding) The elastic capabilities of Amazon Cloud are exploited to increase/decrease the number of instances when the load surpasses or goes below a specific threshold Stress tests are currently conducted to determine these upper and lower thresholds November 26, 2013 28

Linked Data Storage and Querying Generated Linked Data for ground waters (EKBAA, BRGM, GEUS) Successfully inserted ~1B triples into Virtuoso Created specific queries For each partner s dataset Combining similar notions from cross provider datasets Relative small execution times depending on the number of fetched results + query complexity At the moment using only 1 instance in the Amazon Cloud Created the R2RML mappings for ground water datasets November 26, 2013 29

Conceptual Data Integration & Linking API Methods for: Querying data/metadata via SPARQL (both provider-specific & cross-provider queries are supported) Importing data/metadata in various formats (both blocking and non-blocking depending on the size of meta-data to be imported) Direct through respective API method Data provider should be responsible of synchronizing his/her relational and RDF data Indirect through the API method for the definition of relational-to-rdf mappings Apart from the definition of mappings, the data provider does not need to perform any kind of synchronization, as the system will be responsible for performing such a synchronization on his/her behalf Exporting data/metadata in various formats (inline in the response or available in a specific URL) Updating data/metadata via SPARUL Inserting/Defining relational-to-rdf data mappings via R2RML language November 26, 2013 30

Conceptual Data Integration & Linking API Different datasets A P I Query Update Import Export Different formats, models e.g. INSPIRE Virtuoso Data Integration Layer Query Engine Linked Data Conclusions Easy integration Easy querying Hide complexities Link your data Open your data Data for all Rel DB Rel DB Excel files External Resources XML files November 26, 2013 31

Demo Example Cross query over datasets Find the boreholes and their locations in all datasets if ground water samples where taken from them after the date 01/01/2005. November 26, 2013 32

Demo Example select?bhole?date?latitude?longitude where { { select?bhole?date?latitude?longitude where {?st sci:o5.removed?sample; crm:p4f.has_time-span?date; sci:o4.sampled_at?place; crm:p2f.has_type "GRWATER_SAMPLING".?bhole sci:o9.contains_or_confines?place; a sci:s16.borehole; sci:o7.consists_of?bcollar.?bcollar crm:p87f.is_identified_by?latitude; crm:p87f.is_identified_by?longitude. filter (regex(?latitude, "Latitude")). filter (regex(?longitude, "Longitude")). filter(?date > "2005-01-01"^^xsd:date). } limit 100 } UNION { select?bhole?date?latitude?longitude where {?anal crm:p2f.has_type "QUALITY_MEASUREMENT"; crm:p4f.has_time-span?date; sci:o4.sampled_at?bhole;?bhole sci:o7.consists_of?bcollar.?bcollar crm:p87f.is_identified_by?latitude; crm:p87f.is_identified_by?longitude. filter (regex(?latitude, "Latitude")). filter (regex(?longitude, "Longitude")). filter(?date > "2005-01-01"^^xsd:date). } limit 100 } Cont d November 26, 2013 UNION { select?bhole?date?latitude?longitude where {?st crm:p4f.has_time-span?date; sci:o4.sampled_at?bhole.?bhole a sci:s16.borehole; sci:o7.consists_of?bcollar.?bcollar crm:p87f.is_identified_by?latitude; crm:p87f.is_identified_by?longitude. filter (regex(?latitude, "Latitude")). filter (regex(?longitude, "Longitude")). filter(?date > "2005-01-01"^^xsd:date). } limit 100 } } 33

Demo Example November 26, 2013 34

Demo Example November 26, 2013 35

Demo Session How to publish your data as Linked Open Data using GSOM? How to query your data? How to export in different formats? How to export your data as INSPIRE compliant data? 17:15 (45min) Room E Next Door Yannis Roussakis November 26, 2013 36

November 26, 2013 37 This content by the InGeoCloudS consortium members is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://www.ingeoclouds.eu/.