Cloud based platforms for Linking and Managing Geodata




Cloud based platforms for Linking and Managing Geodata
Dimitris Kotzinos
ETIS Lab. (ENSEA/UCP/CNRS UMR 8051) & Department of Computer Science, University of Cergy-Pontoise

21st Century Science
[Diagram linking Theory, Data, Computation, and Experiment & Observation]
- Theory is augmented and explored through Computation
- Theory suggests hypotheses that are verified through Experiment
- Hypotheses are discovered in Data and drive Theory
- Computations generate Data
- Experiments generate Data
- Computations inform the design of Experiments
- Data Integration & Linking, Data Storage, Data Mining, (Cloud) Computing

Key Challenges of Data Intensive Science(s)
- Large volumes
- High throughput
- Relating and Linking
- Heterogeneity
- Complexity

Inspired GeoData Cloud Services
The project wants to demonstrate that a Cloud infrastructure can be used by public organisations to provide more efficient, scalable and flexible services for creating, sharing and disseminating spatial environmental data.

ID-Card of the Project: Who?
- 5 Geological Surveys bringing in 6 initial Use Cases (datasets and applications)
  - Ground Water Management
  - Geo-Hazards: Landslides, Earthquakes
  - GeoPublication and Web Mapping made easy
- 3 ICT organizations bringing key expertise
  - Cloud Computing
  - GIS
  - Semantic Web and Linked Data
  - Software architecture and integration
- EC Support

Why InGeoCloudS
Some common characteristics of Geo-Applications usage:
- The Digitized Earth: avalanche of data generated by various institutions all over Europe in numerous domains: Geography, Earth observation, Geology, Public Administration, Private Geo-Companies
- Data-intensity scenarios: storage, distribution, etc.
- Computing-intensity scenarios: geo-processing on demand, procurement and release of resources
- Access-intensity scenarios: world-wide and/or concurrent access to information in case of particular events

Why InGeoCloudS Considering INSPIRE?

Why consider the Cloud?
Promises of the Cloud:
- Easy procurement (on-demand, self-service)
- TCO simplification and reduction: power supply, physical space, hardware, cables, network devices
- "Unlimited" provisioning of resources (up to your wallet depth)
- Rapidity versus classical ordering processes
- Technical staff now available for tasks with more added value
- Security, Availability, Quality of Service, Performance
- High-performance (virtual) hardware available
- Broad / ubiquitous network access
- Up-to-date software pieces, OS
- Possible economies of scale through pooling and sharing with peers (consolidating different types of resources such as database servers, application servers)
- Easier uptake of standards

ID-Card of the Project: Achievements
- Fundamental scalable/elastic services for data management: Database Server, File Server, Linked Data Store
- Geo Data publication services (SaaS mode)
- A publicly available API: RESTful Web Services on a loosely coupled architecture
- Data providers' data and applications in the cloud
- Generic Services, Integration Portal and Management Tools

Pilot Description: A Portal

Pilot Description: Integrated Geo Applications

The <anything>aaS Trend
- IaaS Elements: Servers, Network, Data storage, Administration APIs
- DaaS Elements: GSOM, LOD APIs, Data storage, import, sync, Meta-data description
- PaaS Elements: Authentication, Roles implementation, Workspaces, Integration Portal, RESTful APIs, Basic Mapping Framework
- SaaS Elements: Geo applications (partner Use Cases), Geo Publication, Catalogues Management, OpenSearch Queries, Geo-Processing

Pilot2 Description: The Whole Picture

Linked Data Management

Problems to tackle
- Data integration
- Data retrieval in a uniform way
- Data export to other formats
November 25, 2014

Information Integration: Linked Open Geodata
The Web of data becomes a reality by connecting so far isolated islands or data silos such as:
- Enterprise silos (Disparate DBMS Engines & Development Frameworks)
- Social Media silos (Social Networking Services, Discussion Forums, Blogs, Wikis, etc.)
- Scientific silos (in biology, physics, earth sciences, etc.)
Linked Open Data (LOD) is a way of publishing data on the (Semantic) Web that:
- Encourages reuse
- Promotes its (real & potential) inter-connectedness
- Enables network effects to add value to data
Image source: flickr, http://www.flickr.com/photos/docsearls/5500714140/

Linked Open Data as Service
[Architecture diagram. Recoverable labels: Extensible Application Pool; API (Visualization, Collaboration, Query, Update, Import, Export); Data Integration Layer; Query Engine; Cross Data sets' Querying; sources: Rel DB, Rel DB, Excel files, XML files, Linked Data, External Resources]
- Abstraction layer for data access: abstract the applications from the specific setup of the data management service (such as local vs. remote, federation, and distribution)
- Beyond Data Access: enabling automation of discovery, composition, and use of datasets
  - Data Markets
  - Online Visualization Services
  - Data Publishing Solutions
  - Data Aggregators
  - BI / Analytics as a Service

Available Datasets
Scientific data described in different formats and notions from diverse areas:
- Ground waters, boreholes, chemical analyses, etc. (GEUS, EKBAA, BRGM)
- Earthquakes, recordings, stations, etc. (EPPO)
- Landslides, susceptibility maps, etc. (GEO-ZS, EKBAA)
Purpose: integrate, describe and query such heterogeneous data in a uniform way

Work Approach
- Creation of a Conceptual Model to integrate and cover all the thematic fields
- Map the source relational data into RDF data compliant with the Conceptual Model
- Provide unique URIs for data
- Rely on a scalable RDF Triple Store to enforce the mappings and enable the storage and query of the RDF data
- Provide an API for the RDF data management
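The mapping and URI-minting steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base URI and the `sci` namespace are hypothetical placeholders, and `rdfs:label` is used here as a stand-in for the model's own naming properties. The column names BOREHOLENO and BOREHOLENAME come from the GEUS mapping slide.

```python
from urllib.parse import quote

BASE = "http://example.org/ingeoclouds/"   # hypothetical base URI
SCI = "http://example.org/sci#"            # hypothetical namespace for the model
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def borehole_uri(provider, borehole_no):
    """Mint a stable, unique URI from the provider name and the source key."""
    return f"{BASE}{quote(provider)}/borehole/{quote(str(borehole_no))}"


def row_to_ntriples(provider, row):
    """Turn one relational row (a dict) into a list of N-Triples lines."""
    subject = borehole_uri(provider, row["BOREHOLENO"])
    # Escape backslashes and quotes so the literal stays valid N-Triples.
    name = str(row["BOREHOLENAME"]).replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"<{subject}> <{RDF_TYPE}> <{SCI}S16.Borehole> .",
        f'<{subject}> <{RDFS_LABEL}> "{name}" .',
    ]


if __name__ == "__main__":
    for line in row_to_ntriples("GEUS", {"BOREHOLENO": "12.345", "BOREHOLENAME": "Well A"}):
        print(line)
```

Deriving the URI from the provider and the source primary key keeps it stable across re-imports, which is what makes later synchronization with the relational source feasible.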

Models, Mappings and Integration
[Diagram. Recoverable labels: Providers' Data; transform; mappings; GSOM; query; results; export; INSPIRE compliant]
- Formats shown: tsv/csv, xml, json, GeoJSON, KML, Shapefile
- Formats shown: xml, trig, trix, n3

Conceptual Model
Our model:
- Needs to be generic, extensible and interoperable, relying not only on geographic data
- Integrates different datasets created from sources w.r.t. their semantics
- Was developed for the documentation of scientific observation, measurement and sampling
- Is an event-centric model that captures complex contexts where time, place and participants are associated
- Is capable of describing complex relationships to integrate or connect rich information

GEUS data -> Conceptual Model: Borehole and Location
[Figure: mapping of GEUS borehole fields to the Conceptual Model. Recoverable labels:]
- S16.Borehole
  - P1F.is_identified_by: E42.Identifier (BOREHOLENO), E41.Appellation (BOREHOLENAME)
  - P43F.has_Dimension: E54.Dimension, with P2F.has_type E55.Type (Depth) and P90F.has_value E60.Number (DRILLDEPTH)
  - O9.contains_or_confines: S15.Acquifer_Concept (INTAKE), with P1F.is_identified_by E42.Identifier (INTAKENO)
  - O7.consists_of: S17.Borehole_Collar
- S17.Borehole_Collar
  - P87F.is_identified_by E47.Spatial_Coordinates: XUTM, YUTM, LATITUDE, LONGITUDE, ELEVATION, XUTM32EUREF89, YUTM32EUREF89, ZDVR90
  - P2F.has_type E55.Type values: EPSG:32632, EPSG:4326, UTM32EUREF89, DVR90

Data Mappings to Conceptual Model
- The data providers' relational data were informally mapped into our Conceptual Model
- Such mappings can be formally expressed using R2RML
  - R2RML is a W3C Recommendation: http://www.w3.org/TR/r2rml/
  - Enables the automatic creation and synchronization of the Linked Data w.r.t. the Relational Data
- Some mappings have already been expressed using R2RML
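To make the idea of a formal mapping concrete, the sketch below generates a small R2RML document (in Turtle) from Python. It is not one of the project's actual mappings: the table name `BOREHOLE`, the base URI and the `sci` namespace are assumptions made for illustration, while the `rr:` terms (`rr:logicalTable`, `rr:tableName`, `rr:subjectMap`, `rr:template`, `rr:class`, `rr:predicateObjectMap`, `rr:objectMap`, `rr:column`) are standard R2RML vocabulary.

```python
R2RML_TEMPLATE = """\
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sci: <{sci_ns}> .

<#{map_name}>
    rr:logicalTable [ rr:tableName "{table}" ] ;
    rr:subjectMap [
        rr:template "{base}borehole/{{{key_column}}}" ;
        rr:class sci:S16.Borehole
    ] ;
    rr:predicateObjectMap [
        rr:predicate rdfs:label ;
        rr:objectMap [ rr:column "{label_column}" ]
    ] .
"""


def build_mapping(table, key_column, label_column,
                  base="http://example.org/ingeoclouds/",   # hypothetical
                  sci_ns="http://example.org/sci#",         # hypothetical
                  map_name="BoreholeMap"):
    """Return an R2RML TriplesMap (Turtle text) for one relational table."""
    return R2RML_TEMPLATE.format(
        sci_ns=sci_ns, map_name=map_name, table=table,
        base=base, key_column=key_column, label_column=label_column,
    )


if __name__ == "__main__":
    print(build_mapping("BOREHOLE", "BOREHOLENO", "BOREHOLENAME"))
```

An R2RML processor reads such a document and produces one subject per row from the template, so the triples can be regenerated whenever the relational data change.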

Conceptual Model and INSPIRE
INSPIRE (Infrastructure for Spatial Information in Europe):
- Covers 34 Spatial Data Themes
- Uses ISO 19100 series standards; oriented to environmental and spatial data
- The Generic Conceptual Schema is not event-oriented (especially the schema that uses the observation-measurement specification)
Our data and INSPIRE:
- Data are not INSPIRE-compliant, but there is a specific INSPIRE-to-model mapping that enables making queries and presenting the results accordingly
- Exported data are made INSPIRE-compliant

Why not use INSPIRE directly?
- INSPIRE is not made for information integration
- INSPIRE lacks semantic clarity of concepts in various places, e.g.:
  - "An observation is an act that results in the estimation of the value of a feature property": confusion of the action (event), aka observation, with the result
  - "The observation itself is also a feature, since it has properties and identity": Observation is the value of a feature property and also a feature

INSPIRE schemas
How many observation definitions (structures) does INSPIRE include?
1) ISO 19156 Observations and Measurements / OM Observation
2) GML Observation
3) OGC WaterML: ObservationProcess, MeasurementTimeseries and TimeseriesObservation
What is their relation with the other observation schemas (ISO 19156)?
1) ISO 19156
2) GML
3) WaterML

Conceptual Model and INSPIRE

Data Exports
- Query results viewed as: RDF/XML, JSON, GML, KML, Shapefiles
- Transformation services to user-provided conceptualizations
- Export to INSPIRE-compliant XML formats
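As an illustration of one such export, the sketch below converts query results in the standard SPARQL 1.1 JSON results layout into a minimal KML document. It assumes, purely for the example, that the `latitude` and `longitude` variables are bound to plain numeric literals; the variable names and this function are not part of the InGeoCloudS API.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def bindings_to_kml(results, name_var="bhole", lat_var="latitude", lon_var="longitude"):
    """Build a KML string with one Placemark per result row that has coordinates."""
    kml = ET.Element("kml", {"xmlns": KML_NS})
    doc = ET.SubElement(kml, "Document")
    for binding in results["results"]["bindings"]:
        try:
            lat = float(binding[lat_var]["value"])
            lon = float(binding[lon_var]["value"])
        except (KeyError, ValueError):
            continue  # skip rows without usable numeric coordinates
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = binding.get(name_var, {}).get("value", "")
        point = ET.SubElement(placemark, "Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, "coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    sample = {"results": {"bindings": [{
        "bhole": {"type": "uri", "value": "http://example.org/borehole/1"},
        "latitude": {"type": "literal", "value": "55.7"},
        "longitude": {"type": "literal", "value": "12.5"},
    }]}}
    print(bindings_to_kml(sample))
```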

Linked Data Storage - Virtuoso
Virtuoso Universal Server is a hybrid of DB engine & middleware combining the functionality of an RDBMS, an ODBMS, and an RDF store:
- Supports various Internet & Web protocols: HTTP(S), WebDAV, SOAP, UDDI, SPARQL
- Implements various data access APIs: ODBC, JDBC, OLE DB, ADO.NET
- Provides a Web-based DB administration UI
- Provides an open-source version
- Can create a cluster-based system of RDF stores; a non-free version is available in the Cloud

Linked Data Storage - Virtuoso
Reasons for selection:
- Scalable to handle millions of triples, and exhibits very good performance
- Offers an API for RDF data management
- Supports SPARQL and SPARUL
- Supports 2-D features and offers spatial operators
- Supports R2RML and provides the respective relational-to-RDF data mapping mechanism
- Offers a cloud-based version
Scalability / Elasticity:
- At least 2 instances of Elastic LD Storage are planned to be available in the platform to handle the query load (cross-provider queries are more demanding)
- The elastic capabilities of the Amazon Cloud are exploited to increase/decrease the number of instances when the load surpasses or goes below a specific threshold
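The threshold rule described above can be sketched as a small decision function. The threshold values, the upper bound, and the notion of "load per instance" as a 0 to 1 utilisation figure are assumptions made for the example; only the minimum of 2 instances comes from the slide.

```python
def desired_instances(current, load_per_instance,
                      scale_up_at=0.75,    # assumed upper threshold
                      scale_down_at=0.25,  # assumed lower threshold
                      min_instances=2,     # "at least 2 instances" per the slide
                      max_instances=8):    # assumed budget cap
    """Return how many storage instances should run, given the current load."""
    if load_per_instance > scale_up_at:
        target = current + 1      # load surpasses the threshold: add an instance
    elif load_per_instance < scale_down_at:
        target = current - 1      # load drops below the threshold: remove one
    else:
        target = current          # inside the band: leave the pool unchanged
    # Clamp to the allowed range so the pool never shrinks below the minimum.
    return max(min_instances, min(max_instances, target))


if __name__ == "__main__":
    print(desired_instances(current=2, load_per_instance=0.9))
```

Keeping a gap between the two thresholds avoids adding and removing instances repeatedly when the load hovers around a single value.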

Linked Data Storage and Querying
- Generated Linked Data for ground waters (EKBAA, BRGM, GEUS), earthquakes, landslides
- Successfully inserted ~2B triples into Virtuoso
- Created specific queries:
  - For each partner's dataset
  - Combining similar notions from cross-provider datasets
- Relatively small execution times, depending on the number of fetched results and the query complexity
- At the moment using only 1 instance in the Amazon Cloud
- Created the R2RML mappings for ground water datasets

Conceptual Data Integration & Linking API
Methods for:
- Querying data/metadata via SPARQL (both provider-specific & cross-provider queries are supported)
- Importing data/metadata in various formats (both blocking and non-blocking, depending on the size of the data/metadata to be imported)
  - Direct, through the respective API method: the data provider is responsible for synchronizing his/her relational and RDF data
  - Indirect, through the API method for the definition of relational-to-RDF mappings: apart from defining the mappings, the data provider does not need to perform any synchronization, as the system performs it on his/her behalf
- Exporting data/metadata in various formats (inline in the response or available at a specific URL)
- Updating data/metadata via SPARUL
- Inserting/defining relational-to-RDF data mappings via the R2RML language
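For the querying method, a client-side call might look like the sketch below. It follows the generic W3C SPARQL Protocol (a `query` parameter on a GET request, JSON results requested through the `Accept` header) rather than the documented InGeoCloudS API, whose exact paths and parameter names are not given in the slides; the endpoint URL is a hypothetical placeholder. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/ldquery"  # hypothetical endpoint URL


def build_query_request(sparql, endpoint=ENDPOINT):
    """Prepare a SPARQL Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': sparql})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_query_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request), then json.load() the response.
```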

Example: Cross query over datasets
Find the boreholes and their locations in all datasets if ground water samples were taken from them after the date 01/01/2005.

    # Prefixes sci:, crm: and xsd: are assumed to be declared for the endpoint.
    select ?bhole ?date ?latitude ?longitude
    where {
      {
        select ?bhole ?date ?latitude ?longitude
        where {
          ?st sci:o5.removed ?sample ;
              crm:p4f.has_time-span ?date ;
              sci:o4.sampled_at ?place ;
              crm:p2f.has_type "GRWATER_SAMPLING" .
          ?bhole sci:o9.contains_or_confines ?place ;
                 a sci:s16.borehole ;
                 sci:o7.consists_of ?bcollar .
          ?bcollar crm:p87f.is_identified_by ?latitude ;
                   crm:p87f.is_identified_by ?longitude .
          filter (regex(?latitude, "Latitude")) .
          filter (regex(?longitude, "Longitude")) .
          filter (?date > "2005-01-01"^^xsd:date) .
        } limit 100
      }
      UNION
      {
        select ?bhole ?date ?latitude ?longitude
        where {
          ?anal crm:p2f.has_type "QUALITY_MEASUREMENT" ;
                crm:p4f.has_time-span ?date ;
                sci:o4.sampled_at ?bhole .
          ?bhole sci:o7.consists_of ?bcollar .
          ?bcollar crm:p87f.is_identified_by ?latitude ;
                   crm:p87f.is_identified_by ?longitude .
          filter (regex(?latitude, "Latitude")) .
          filter (regex(?longitude, "Longitude")) .
          filter (?date > "2005-01-01"^^xsd:date) .
        } limit 100
      }
      UNION
      {
        select ?bhole ?date ?latitude ?longitude
        where {
          ?st crm:p4f.has_time-span ?date ;
              sci:o4.sampled_at ?bhole .
          ?bhole a sci:s16.borehole ;
                 sci:o7.consists_of ?bcollar .
          ?bcollar crm:p87f.is_identified_by ?latitude ;
                   crm:p87f.is_identified_by ?longitude .
          filter (regex(?latitude, "Latitude")) .
          filter (regex(?longitude, "Longitude")) .
          filter (?date > "2005-01-01"^^xsd:date) .
        } limit 100
      }
    }

Example

Querying Linked Data Store
- Geoprocessing implemented as a WPS service: refers to ordinary kriging interpolation
- The WPS uses ElasticWebServer and uses data:
  - Given by the user
  - Fetched from the InGeoCloudS RDF triplestore
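For readers unfamiliar with ordinary kriging, the following pure-Python sketch shows the core computation such a service performs: it solves the kriging system for the weights (which are constrained to sum to 1) and returns the weighted estimate. It is not the project's WPS implementation; the linear variogram model and the sample points are assumptions chosen to keep the example short.

```python
import math


def variogram(h, slope=1.0):
    """Linear variogram gamma(h) = slope * h (an assumed model)."""
    return slope * h


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def ordinary_kriging(points, values, target):
    """Return (estimate, weights) at `target` from scattered observations."""
    n = len(points)
    # Kriging matrix: pairwise variogram values, bordered by the
    # unbiasedness constraint (row and column of ones, zero in the corner).
    a = [[variogram(distance(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(distance(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    weights = solution[:n]          # the last entry is the Lagrange multiplier
    estimate = sum(w * v for w, v in zip(weights, values))
    return estimate, weights


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    vals = [1.0, 2.0, 3.0, 4.0]
    print(ordinary_kriging(pts, vals, (0.5, 0.5)))
```

In a production setting the variogram would be fitted to the data and the linear system solved with a numerical library; the structure of the system stays the same.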

Data Integration & Linking Prototype

Prototype
- Show benefits of LD technology
- Link in a meaningful and straightforward way internal and external data obtained from web services: automatically by the prototype & manually via the UI
- Develop a complete application which exhibits query, visualization & linking functionality
- Indicate that such an application can be easily built by exploiting the Linked Data Management API as well as web-based technologies (HTML and JavaScript)

Prototype Overview
- Available at: http://portal.ingeoclouds.eu/sitools/client-user/DataProviderToolkit/project-index.html
- Two themes are considered:
  - Earthquake management
  - Landslide management
- External sources exploited:
  - Earthquakes: Earthquake Data Portal (http://www.seismicportal.eu/servicesdoc/search-event.html)
  - Weather info: Weather Underground (www.wunderground.com)
  - Points of interest: LinkedGeoData portal (http://linkedgeodata.org/sparql)
- External services are drawn on either through SPARQL queries (case of points of interest) or through calls to REST-based web services (case of earthquakes and weather information)

Technical Details
- HTML & JavaScript used for the implementation, plus particular methods of the Linked Data Management API (ldquery, earthquakes, landslides, ldupdate)
- Google Maps API (JavaScript-based) exploited for map visualization and loading of KML results

Main Functionality
- Visualization of web form & map
- User fills in the fields for a particular theme
  - Default values are provided
  - User is corrected when wrong values are given
- User presses the Submit button
- Depending on the user's choice of output format, either the map is updated with markers denoting the different types of (related) data/features, or a specific text area is filled with the resulting XML or JSON file
- By selecting two markers, the user can then link them by pressing the Link Resources button
  - Link properties supported: rdfs:seeAlso (see additional info), owl:sameAs (markers represent equivalent information), and sci:o21_triggers (one has triggered the other, e.g., severe weather conditions triggering a landslide)
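The linking step amounts to adding one triple between the two selected resources. A minimal sketch of how a client could prepare that update is shown below, using the SPARQL 1.1 Update `INSERT DATA` form; whether the `ldupdate` method accepts exactly this form is an assumption, and the `sci` namespace URI and resource URIs are hypothetical placeholders.

```python
LINK_PROPERTIES = {
    "seeAlso": "http://www.w3.org/2000/01/rdf-schema#seeAlso",
    "sameAs": "http://www.w3.org/2002/07/owl#sameAs",
    "triggers": "http://example.org/sci#O21_triggers",  # hypothetical namespace
}

FORBIDDEN_IN_IRI = '<>" {}|\\^`'


def build_link_update(source_uri, link, target_uri):
    """Return an INSERT DATA statement linking two resources."""
    if link not in LINK_PROPERTIES:
        raise ValueError(f"unsupported link property: {link}")
    for uri in (source_uri, target_uri):
        # Reject characters that are not allowed inside <...> in SPARQL.
        if any(ch in uri for ch in FORBIDDEN_IN_IRI):
            raise ValueError(f"invalid character in URI: {uri}")
    return f"INSERT DATA {{ <{source_uri}> <{LINK_PROPERTIES[link]}> <{target_uri}> . }}"


if __name__ == "__main__":
    print(build_link_update("http://example.org/weather/1", "triggers",
                            "http://example.org/landslide/7"))
```

Restricting the property to a fixed whitelist mirrors the three link types offered in the UI and keeps user input out of the predicate position.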

Concluding remarks

ID-Card of the Project: (Big?) Data and the 5 Vs
- Volume: 1 billion triples in the RDF triplestore, published as Linked Open Data (LOD)
- Velocity: datasets from 3 different scientific fields
  - Ground water (information and chemical analyses) (slow change)
  - Landslides (fast change)
  - Earthquakes (real-time measurements)
- Variety: datasets from 4 different countries
  - Denmark (Relational Data)
  - France (Relational Data)
  - Greece (text/xls data, XML data)
  - Slovenia (Relational Data)
- Veracity: X
- Value: ?

Monitoring and Costs: More Big Data?
- A key need in order to master one's Cloud-based system
- Two main levels in InGeoCloudS that help in better managing the cloud:
  - Applications (analytics about usage)
  - Resources (supervision of indicators, alerts and problem management)

Challenges related to Big Data and the Cloud: Scientific Challenges
- Big Data storage
  - No native RDF triplestore that exploits the Cloud capabilities
  - Data security / privacy / control: also a technical issue, especially when dealing with big data that cannot be monitored precisely
- Querying Big Data
  - Fast querying for big data (many times queries are simple but the volume is big)
  - Support for geospatial queries
  - (Fast) indexing for (RDF) geospatial data
- Updating Big Data
  - Consistency? (CAP theorem, eventually consistent databases)
  - Geospatial support in NoSQL databases?
- Exchanging Big Data
  - Data providers want to keep their infrastructures & synchronize data between them & the cloud!
  - Issues on synchronization & exchange of big data
- Monitoring and Management on the Cloud
  - More big data to manage?

Challenges related to Big Data and the Cloud: Political Challenges
- User adoption
  - Users are happy with what they have and they patiently wait until the problem surfaces
  - Users with weak or no infrastructure are more receptive to turning to the Cloud and/or Linked Data paradigms than others
  - Users see the cloud as a platform with more resources (memory, storage, processing), but they still want their applications to run there unchanged
- Security, privacy and trust
  - Public vs. Private Clouds
  - Publicly owned vs. Company-owned Clouds

Acknowledgements
- FORTH-ICS: Yannis Roussakis, Kyriakos Kritikos, Athina Kritsotaki, Martin Doerr
- AKKA Technologies: Benoit Baurens
- EKBAA: Makis, Atzemoglou, Elias Grinias
