Publishing Census Data as Linked Open Data




Monica Scannapieco, R. M. Aracri, S. De Francisci, A. Pagano, L. Tosco, L. Valentino
Istituto Nazionale di Statistica (ISTAT)

Official Statistics & Data Dissemination

"Official statistics provide an indispensable element in the information system of a democratic society, serving the Government, the economy and the public with data about the economic, demographic, social and environmental situation." [UN Statistical Division, Fundamental Principles of Official Statistics, Principle 1]

Data dissemination is a fundamental phase of statistical production processes.

Monica Scannapieco, LOD, Rome, 20-21/02/2014

Data Dissemination: Models

Data and metadata standardization in the statistical domain:
- Neuchâtel model: ten years of work on a common language and a common perception of the structure of classifications and the links between them
- GSIM (Generic Statistical Information Model): reference framework of internationally agreed definitions, attributes and relationships that describe the pieces of information used in the production of official statistics (information objects)
- SDMX (Statistical Data and Metadata eXchange): ISO international standard, based on XML, available since 2001
- DDI (Data Documentation Initiative): based on XML, supports the entire research data life cycle (SDMX is mainly oriented to data dissemination)

Istat Data Dissemination

Istat dissemination architecture based on SDMX:
- Compliant with the Eurostat SDMX Reference Infrastructure
- SDMX download of the data available on the Web warehouse I.Stat (http://dati.istat.it)
- SEP (Single Exit Point) for SDMX-based machine-to-machine communication

Need to broaden dissemination to non-statistical, non-SDMX users.

In 2012, the IS-LOD (Istat LOD) project started!

ICT Directorate

The IS-LOD Project

Phases:
- Experimental Projects [2012]
- Production Projects Design [Jan-June 2013]
- Production Projects Implementation [July 2013 - ongoing]

Production projects:
- SDMX-to-DataCubeVocabulary Translator, to be integrated with SEP under a Eurostat grant
- Official Classifications in LOD, jointly with the Italian Agency for IT (Agenzia per l'Italia Digitale)
- Census LOD: Population Census Data in LOD

Census-LOD: Data Description

- Censpop dataset: describes the population Census indicators at the territorial level of the Census section. Published in the past as CSV or XLS files (http://www.istat.it/it/archivio/104317)
- Territory dataset: describes the Italian territorial features from both administrative and geographical perspectives
- Street dataset: describes streets with their names, civic numbers, etc.

Census-LOD: Data Example

street

COD_REG  COD_PROVINCIA  COD_COMUNE  PRO_COM  SEZ2001      ID   ID_INDIRIZZO  DENOM_TIPO_DUG  TOPONIMO               CIVICO  ESPONENTE  DENOM COMUNE  DENOM REGIONE
1        5              5           5005     50050000001  1    27729         Corso           VITTORIO ALFIERI       238     A SNC      Asti          PIEMONTE - VALLE D'AOSTA
1        5              5           5005     50050000001  1    26278         Corso           VITTORIO ALFIERI       240                Asti          PIEMONTE - VALLE D'AOSTA
1        5              5           5005     50050000001  1    27730         Galleria        DEI MERCANTI           0       SNC        Asti          PIEMONTE - VALLE D'AOSTA
1        5              5           5005     50050000001  1    27731         Galleria        DEI MERCANTI           0       SNC 1      Asti          PIEMONTE - VALLE D'AOSTA
1        5              5           5005     50050000343  343  28            Strada          ABAZIA DEGLI APOSTOLI  7                  Asti          PIEMONTE - VALLE D'AOSTA
1        5              5           5005     50050000001  1    12492         Piazza          ITALIA                 44                 Asti          PIEMONTE - VALLE D'AOSTA
1        5              5           5005     50050000001  1    27237         Piazza          MILENA                 0       SNC        Asti          PIEMONTE - VALLE D'AOSTA

territory

COD_REG  COD_PRO  COD_ISTAT  PRO_COM  NOME                ALTITUDINE MINIMA  ALTITUDINE MASSIMA
1        5        1005005    5005     Asti                110                295
3        13       3013004    13004    Albese con Cassano  370                1270
5        26       5026052    26052    Ormelle             11                 22
3        97       3097001    97001    Abbadia Lariana     199                1700
8        99       8099019    99019    Torriana            78                 455

censpop

COD_PRO  COD_COM  PRO_COM  SEZ2001      SEZIONE  P1   P2   P3   P4   P5   P6  P7
5        1        5001     50010000005  5        9    6    3    3    4    0   2
5        5        5005     50050000343  343      34   17   17   12   15   2   5
5        118      5118     51180000013  13       13   7    6    5    5    1   1
5        120      5120     51200000001  1        292  141  151  104  133  7   45
5        121      5121     51210000037  37       23   11   12   10   8    0   4
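The three datasets share their keys, which is what makes linking them possible. Judging only from the sample rows shown here, the codes appear to nest positionally: PRO_COM looks like the province code followed by a 3-digit municipality code, COD_ISTAT like the region code followed by a 6-digit PRO_COM, and SEZ2001 like PRO_COM followed by a 7-digit section number. The sketch below encodes that reading. It is an inference from the sample rows, not a documented Istat rule, and the function names are hypothetical.

```python
def pro_com(cod_pro: int, cod_com: int) -> int:
    """Municipality key: province code followed by a 3-digit municipality code."""
    return cod_pro * 1000 + cod_com


def cod_istat(cod_reg: int, cod_pro: int, cod_com: int) -> int:
    """Region code followed by the 6-digit (zero-padded) PRO_COM."""
    return cod_reg * 1_000_000 + pro_com(cod_pro, cod_com)


def sez2001(pro_com_code: int, section: int) -> int:
    """Census section key: PRO_COM followed by a 7-digit section number."""
    return pro_com_code * 10_000_000 + section


def split_sez2001(code: int) -> tuple:
    """Inverse of sez2001: returns (PRO_COM, section number)."""
    return divmod(code, 10_000_000)


# Joining a street row to the territory and censpop samples through the keys.
territory = {5005: "Asti"}              # PRO_COM -> NOME
censpop_p1 = {50050000343: 34}          # SEZ2001 -> P1
street_row = {"SEZ2001": 50050000343, "TOPONIMO": "ABAZIA DEGLI APOSTOLI"}

municipality_code, section = split_sez2001(street_row["SEZ2001"])
print(territory[municipality_code], section, censpop_p1[street_row["SEZ2001"]])
```

If the positional reading holds for the full datasets, a join between street, territory and censpop needs no lookup table beyond the keys themselves.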

Census-LOD: Data Size

How much data is involved?
- 402,903 Census Sections
- 74,482 Localities
- 2,200 Census Areas
- 3,631 Geomorphological entities
- And other classes

43 indicators for each entity, including:
- Resident Population, Males
- Resident Population, age > 74 years
- Foreigners and stateless persons resident in Italy, Males

Census-LOD: Test Workflow

A test project was carried out as a first step. It was implemented in Datalift (http://datalift.org/), a platform that includes several tools supporting the whole dataset publication process.

The workflow produced in this phase followed (part of) the process that the platform prescribes, namely:
1. Loading the datasets from CSV files into the platform
2. Loading the ontologies, modeled as OWL ontologies, into the platform
3. Direct mapping
4. URI policy design
5. RDF triples generation
6. Linking among datasets
7. Publishing
8. Applications and visualization
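Steps 1 and 3 to 5 can be pictured with a small, self-contained sketch: each CSV row becomes a subject URI built from its key column, and every other column becomes a predicate with a plain literal as object. This is written in the spirit of a direct mapping and is not Datalift's implementation. The base URI, the `section/` and `def#` path segments and the function names are assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/censpop/"  # hypothetical URI policy, not Istat's

# Two rows taken from the censpop sample (only P1 and P2 kept for brevity).
SAMPLE_CSV = "SEZ2001,P1,P2\n50050000343,34,17\n51180000013,13,7\n"


def nt_literal(value: str) -> str:
    """Render a plain N-Triples literal, escaping the characters that need it."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_ntriples(row: dict, key_column: str, base: str = BASE) -> list:
    """One subject per row (from the key column), one predicate per other column."""
    subject = f"<{base}section/{quote(row[key_column], safe='')}>"
    triples = []
    for column, value in row.items():
        if column == key_column or value == "":
            continue  # the key is already in the subject URI; skip empty cells
        predicate = f"<{base}def#{quote(column, safe='')}>"
        triples.append(f"{subject} {predicate} {nt_literal(value)} .")
    return triples


def csv_to_ntriples(text: str, key_column: str) -> list:
    """Load CSV text and generate N-Triples lines for every row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_ntriples(row, key_column))
    return lines


ntriples = csv_to_ntriples(SAMPLE_CSV, "SEZ2001")
print("\n".join(ntriples))
```

A direct mapping of this kind yields untyped literals and column-named predicates, which is why the later steps (ontology alignment and linking) are still needed before publication.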

Census-LOD: Implementation Issues

Issues:
- Large amount of data
- Complex ontology
- Annotations required for all variables (Dissemination Database)

Activities in progress:
- Definition of a new platform with an RDF graph store that scales up to billions of triples and supports bulk and incremental load
- Use of a general-purpose mapping language: R2RML (RDB to RDF Mapping Language)

Census-LOD: Production Workflow

[Workflow diagram. Labels: Ontologies Design; .csv; Ontologies; RDBMS; Mapping (R2RML); Publish; Reasoning & Inferencing; GUI Design and Implementation]

Mapping Examples

Example D2RQ mapping

    @prefix map: <#> .
    @prefix ter: <http://rdf.istat.it/ter/> .
    @prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .

    map:zonaincontestazione a d2rq:ClassMap ;
        d2rq:dataStorage map:database ;
        d2rq:uriPattern "ter/zonaincontestazione/@@zone_in_contestazione.cod_zona_c|urlify@@" ;
        d2rq:class ter:zonaincontestazione ;
        d2rq:class ter:areaspeciale ;
        d2rq:classDefinitionLabel "Zone in contestazione" .

    map:contestatoda a d2rq:PropertyBridge ;
        d2rq:belongsToClassMap map:zonaincontestazione ;
        d2rq:property ter:contestatoda ;
        d2rq:propertyDefinitionLabel "Codice Comune contestatario" ;
        d2rq:column "ZONE_IN_CONTESTAZIONE.PRO_COM" .

Example R2RML mapping

    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    @prefix ex: <http://example.com/ns#> .
    @prefix ter: <http://rdf.istat.it/ter/> .

    <#TriplesMapZonaInContestazione>
        rr:logicalTable [ rr:tableName "ZONE_IN_CONTESTAZIONE" ] ;
        rr:subjectMap [
            rr:template "http://dati.istat.it/ter/zonaincontestazione/{cod_zona_c}" ;
            rr:class ter:zonaincontestazione ;
            rr:class ter:areaspeciale
        ] ;
        rr:predicateObjectMap [
            rr:predicate ter:contestatoda ;
            rr:objectMap [ rr:column "PRO_COM" ]
        ] .

Result (Turtle)

    <http://dati.istat.it/ter/zonaincontestazione/5>
        a ter:zonaincontestazione, ter:areaspeciale ;
        ter:contestatoda "96001", "2066" ;
        ter:nomeareaspeciale "Regione Folla" .

Mapping of «Area in Dispute» to the corresponding subject with predicate «DisputedBy» and object «Municipality»
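To make the effect of the R2RML mapping concrete, the following sketch imitates by hand what a mapping processor does with the subject template and the column-valued object map. It is an illustration, not an R2RML engine. The two input rows are assumed (they are reverse-engineered from the Turtle result, which lists two contesting municipalities for zone 5), and `expand_template` and `triples_for` are hypothetical helper names.

```python
import re
from urllib.parse import quote

TEMPLATE = "http://dati.istat.it/ter/zonaincontestazione/{cod_zona_c}"
TER = "http://rdf.istat.it/ter/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed content of ZONE_IN_CONTESTAZIONE: one row per contesting municipality.
ROWS = [
    {"cod_zona_c": 5, "PRO_COM": "96001"},
    {"cod_zona_c": 5, "PRO_COM": "2066"},
]


def expand_template(template: str, row: dict) -> str:
    """Replace each {column} with the percent-encoded value from the row."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda match: quote(str(row[match.group(1)]), safe=""),
        template,
    )


def triples_for(rows: list) -> list:
    """Apply the subject map and the predicate-object map to every row."""
    graph = set()  # an RDF graph is a set: repeated triples collapse
    for row in rows:
        subject = expand_template(TEMPLATE, row)
        graph.add((subject, RDF_TYPE, TER + "zonaincontestazione"))
        graph.add((subject, RDF_TYPE, TER + "areaspeciale"))
        graph.add((subject, TER + "contestatoda", '"%s"' % row["PRO_COM"]))
    return sorted(graph)


triples = triples_for(ROWS)
for s, p, o in triples:
    print(s, p, o)
```

Both rows expand to the same subject URI, so the two class assertions are emitted once and only the `ter:contestatoda` values differ, which matches the shape of the Turtle result.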

Ontologies (1)

Two distinct ontologies (so far):
- Territorial Ontology
- Census Data Ontology

Common features:
- OWL ontologies
- Use of meta-ontologies:
  - SKOS: skos:Concept
  - ADMS: adms:AssetRepository
  - Data Cube Vocabulary: qb:DataSet, qb:Observation
  - PROV: prov:wasGeneratedBy
  - GeoNames: gn:name, gn:countryCode, gn:parentCountry

Ontologies (2)

Territorial Ontology: description of the principal classes of the domain, such as:
- Administrative: Region, Province, Municipality
- Geographical-Statistical: Location, Census Section
- Special Areas: Contested Zone, Administrative Island
- Special Units: Abbey, Hospital, Climatic Colony

Ontologies (3)

Census Data Ontology: uses the RDF Data Cube Vocabulary, which allows multi-dimensional data to be published.

DIMENSIONS: Sex, Age, Marital Status
MEASURE: Resident Population, Number of dwellings
DIMENSIONS: Construction Period, Intended Use, Number of floors
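As a rough illustration of what a Data Cube observation looks like, the sketch below renders one `qb:Observation` in Turtle with two dimensions and one measure. The `qb:` namespace is the real Data Cube one. Everything under the `ex:` prefix (property names, concept names, the dataset name) is hypothetical, the pairing of Sex and Age with Resident Population is an assumption, and the value is a placeholder rather than a published figure.

```python
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/census/> .\n"  # hypothetical namespace
)


def observation_turtle(obs_id, dataset, dimensions, measure, value):
    """Render one qb:Observation: dataset link, dimension values, measure value."""
    lines = [
        f"ex:{obs_id} a qb:Observation ;",
        f"    qb:dataSet ex:{dataset} ;",
    ]
    for prop, concept in dimensions.items():
        lines.append(f"    ex:{prop} ex:{concept} ;")
    # A bare integer in Turtle is read as an xsd:integer literal.
    lines.append(f"    ex:{measure} {int(value)} .")
    return "\n".join(lines)


doc = PREFIXES + "\n" + observation_turtle(
    "obs1",
    "censpop",
    {"sex": "male", "age": "over74"},  # hypothetical dimension properties
    "residentPopulation",              # hypothetical measure property
    17,                                # placeholder value
)
print(doc)
```

In a complete cube the dimension and measure properties would also be declared in a `qb:DataStructureDefinition`, and a territorial dimension would tie each observation to a Census section.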

Certifying Istat Data

Istat data are the result of established methodological procedures: "Official Statistics" has a precise meaning in terms of the quality of, and trust in, the statistical information product.

We used the W3C PROV Ontology as a structured description of the provenance of the data we intend to publish:
- Where the data come from
- Official data sources according to European and national regulation
- Domain standard conformance (e.g., variant and version of a statistical classification)
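A provenance description of this kind could, for instance, be reduced to a handful of PROV-O statements per dataset. The sketch below is one possible shape, not the project's actual model: the four properties are genuine PROV-O terms, while the `Provenance` class and all `example.org` URIs are hypothetical.

```python
from dataclasses import dataclass

PROV = "http://www.w3.org/ns/prov#"


@dataclass(frozen=True)
class Provenance:
    """Minimal provenance record for one published dataset."""
    dataset: str   # the published entity
    activity: str  # the process that produced it
    agent: str     # the organisation responsible
    source: str    # the entity it was derived from

    def to_ntriples(self) -> list:
        return [
            f"<{self.dataset}> <{PROV}wasGeneratedBy> <{self.activity}> .",
            f"<{self.dataset}> <{PROV}wasAttributedTo> <{self.agent}> .",
            f"<{self.dataset}> <{PROV}wasDerivedFrom> <{self.source}> .",
            f"<{self.activity}> <{PROV}wasAssociatedWith> <{self.agent}> .",
        ]


record = Provenance(
    dataset="http://example.org/dataset/censpop",       # hypothetical
    activity="http://example.org/activity/census",      # hypothetical
    agent="http://example.org/agent/istat",             # hypothetical
    source="http://example.org/source/census-returns",  # hypothetical
)
statements = record.to_ntriples()
print("\n".join(statements))
```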

Platform Requirements

Requirement                         Oracle                    D2RQ                                          Virtuoso Open Source edition           DataLift + Sesame
Ontology Data Mapping               YES (R2RML)               YES (proprietary & R2RML)                     YES (proprietary & part of R2RML)      Yes (direct mapping)
Storing RDF Triples                 Yes (billions of triples) NO (mapping on-demand with relational db)     Yes                                    Yes (small triplestore)
Querying/Reasoning                  YES                       YES                                           YES                                    YES
SPARQL Endpoint                     NO                        YES                                           YES                                    YES
Scalability                         YES                       Depends on the used db                        ?                                      NO
Integration with Istat Environment  YES                       NO                                            NO                                     NO

Concluding Remarks

Census-LOD is the first production process that deploys Istat data on an Istat SPARQL endpoint:
- 2014: publication of CensPop and Territory
- 2015: Addresses

LOD-based data dissemination will allow:
- Machine-to-machine data provisioning by Istat (currently only SDMX datasets via SEP)
- Widening the range of Istat data users
- Improving the efficiency of data exchange flows with Italian administrations
- and much more!
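Machine-to-machine access through a SPARQL endpoint amounts to sending a query over HTTP. The sketch below only builds such a request, following the SPARQL Protocol convention of a GET with a `query` parameter, and does not send it. The endpoint URL is a placeholder, since the address of the Istat endpoint is not given here, and the query reuses the `ter:` terms from the mapping example.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # placeholder, not the Istat endpoint

QUERY = """
PREFIX ter: <http://rdf.istat.it/ter/>
SELECT ?zone ?municipality WHERE {
  ?zone a ter:zonaincontestazione ;
        ter:contestatoda ?municipality .
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request), against a real endpoint.
```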