THE EUROPEAN DATA PORTAL

Similar documents
Portal Version 1 - User Manual

INSPIRE Dashboard. Technical scenario

What s new in Carmenta Server 4.2

Standards for Big Data in the Cloud

Client Overview. Engagement Situation. Key Requirements

TENDER SPECIFICATIONS. Deployment of an EU Open Data core platform: implementation of the pan-european Open Data portal and related services

Sisense. Product Highlights.

An Esri White Paper June 2011 ArcGIS for INSPIRE

PROJECT WEBSITE PROJECT WEBSITE

Final Report - HydrometDB Belize s Climatic Database Management System. Executive Summary

D3.8 ADVANCED BENCHMARKING, REPORTING AND ANALYTICAL FEATURE DESCRIPTION

Fraunhofer FOKUS. Fraunhofer Institute for Open Communication Systems Kaiserin-Augusta-Allee Berlin, Germany.

Mobile app for ios Version 1.10.x, August 2014

Corporate Bill Analyzer

TERMS OF REFERENCE. Revamping of GSS Website. GSS Information Technology Directorate Application and Database Section

Developing Microsoft SharePoint Server 2013 Advanced Solutions

Analytics: Pharma Analytics (Siebel 7.8) Student Guide

ITP 342 Mobile App Development. APIs

Enabling Better Business Intelligence and Information Architecture With SAP PowerDesigner Software

How can public institutions benefit from automated translation infrastructure?

WHAT'S NEW IN SHAREPOINT 2013 WEB CONTENT MANAGEMENT

Linked Open Data Infrastructure for Public Sector Information: Example from Serbia

Mobile app for ios Version 1.11.x, December 2015

Developing Microsoft SharePoint Server 2013 Advanced Solutions

Functional Requirements for Digital Asset Management Project version /30/2006

USING THE INTERNET TO MANAGE AND DISTRIBUTE GEOSPATIAL SUBMARINE CABLE DATA

ArcGIS Data Models Practical Templates for Implementing GIS Projects

elearning Content Management Middleware

API Architecture. for the Data Interoperability at OSU initiative

SQL Server 2012 Business Intelligence Boot Camp

Building native mobile apps for Digital Factory

Course 20489B: Developing Microsoft SharePoint Server 2013 Advanced Solutions OVERVIEW

Chapter 2 Database System Concepts and Architecture

Open Government Data Initiative. Dejan Cvetkovic Regional Technology Officer, Microsoft CEE Athens, Greece, December 1 st, 2011

Ecuadorian Geospatial Linked Data

OpenText Information Hub (ihub) 3.1 and 3.1.1

Department of the Interior Open Data FY14 Plan

Advanced Visualizations Tools for CERN Institutional Data

PLSAP CONNECTOR FOR TALEND USER MANUAL

Enabling Better Business Intelligence and Information Architecture With SAP Sybase PowerDesigner Software

SAP HANA Core Data Services (CDS) Reference

Figure 2: System Flow Diagram for Workflow Management

CDI/THREDDS Interoperability: the SeaDataNet developments. P. Mazzetti 1,2, S. Nativi 1,2, 1. CNR-IMAA; 2. PIN-UNIFI

Online Data Services. Security Guidelines. Online Data Services by Esri UK. Security Best Practice

HOW CAN I TRANSFORM AN ONLINE SERVICE TO TRULY MULTILINGUAL?

The truth about Drupal

WatchDox SharePoint Beta Guide. Application Version 1.0.0

GeoNetwork User Manual

Spectrum Technology Platform. Version 9.0. Administration Guide

Rotorcraft Health Management System (RHMS)

Delivering secure, real-time business insights for the Industrial world

Qlik Sense Enabling the New Enterprise

Elgg 1.8 Social Networking

HydroDesktop Overview

D5.5 Initial EDSA Data Management Plan

Web to Print Knowledge Experience. A Case Study of the Government of Hessen, Germany s Half-Time Report

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

SYNCSHIELD FEATURES. Preset a certain task to be executed. specific time.

Report. Architecture specification for data. workbench

Digital Collections as Big Data. Leslie Johnston, Library of Congress Digital Preservation 2012

Performance Management Platform

Case Study EPA. Agency-Wide Governance of Reusable Components

Institute of Computational Modeling SB RAS

IBM WebSphere ILOG Rules for.net

Product Navigator User Guide

How To Help The European Single Market With Data And Information Technology

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

EMC Documentum Composer

EXPLORING AND SHARING GEOSPATIAL INFORMATION THROUGH MYGDI EXPLORER

Using standards for ocean data

SAP Data Services 4.X. An Enterprise Information management Solution

Technical. Overview. ~ a ~ irods version 4.x

Acronym: Data without Boundaries. Deliverable D12.1 (Database supporting the full metadata model)

OpenAIRE Research Data Management Briefing paper

PDOK Kaart, the Dutch Mapping API

MS 10978A Introduction to Azure for Developers

Website Usage Monitoring and Evaluation

Budget Event Management Design Document

SOA REFERENCE ARCHITECTURE: WEB TIER

ORACLE MOBILE SUITE. Complete Mobile Development Solution. Cross Device Solution. Shared Services Infrastructure for Mobility

SAS IT Resource Management 3.2

Social Sentiment Analysis Financial IndeXes ICT Grant: D3.1 Data Requirement Analysis and Data Management Plan V1

ObserveIT Service Desk Integration Guide

Assignment 5: Visualization

ORACLE MOBILE APPLICATION FRAMEWORK DATA SHEET

Enterprise Service Bus

Open EMS Suite. O&M Agent. Functional Overview Version 1.2. Nokia Siemens Networks 1 (18)

ANSYS EKM Overview. What is EKM?

Migrating to vcloud Automation Center 6.1

SAP Business One mobile app for ios. Version 1.9.x September 2013

European Public Sector Information Platform Topic Report No / 3. The amendment of the PSI directive: where are we heading?

PORTAL ADMINISTRATION

Embed BA into Web Applications

Course Summary. Prerequisites

COURSE SYLLABUS COURSE TITLE:

MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Sophos Mobile Control Technical guide

GeoNetwork, The Open Source Solution for the interoperable management of geospatial metadata

ArcGIS. Server. A Complete and Integrated Server GIS

The Recipe for Sarbanes-Oxley Compliance using Microsoft s SharePoint 2010 platform

Transcription:

European Public Sector Information Platform Topic Report No. 2016/03 UNDERSTANDING THE EUROPEAN DATA PORTAL Published: February 2016 1

Table of Contents Keywords... 3 Abstract/ Executive Summary... 3 Introduction... 4 The Architecture... 5 Access to the Portal... 5 Searching the Portal... 6 Geo spatial data... 6 Harvesting Data... 6 Enhancing quality... 7 Summary of Components... 8 About the Consortium... 10 Copyright information... 11 2

Keywords EDP; European Data Portal; pan-european; portal; open data; reuse Abstract/ Executive Summary This report presents the technical arquitecture of the European Data Portal. This portal, available since 16 November 2015, aims at being the new one-stop shop on PSI re-use in Europe, taking over all the promotion duties of epsi Platform. The European Data Portal includes information, tools, and a multilingual data catalog where all the open datasets from the EU Member States are being federated to create a single entry point for European-wide stakeholders who want to find and reuse PSI in Europe. 3

Introduction More and more volumes of data are published every day. The amount of data across the world is increasing exponentially. A substantial amount of this data is collected by the public sector. But for the data to be re-used, it needs to be accessible. The Beta version of the European Data Portal is available since 16 November 2015. The Portal harvests the metadata available on public data and geospatial portals across European countries. Portals can be national, regional, local or domain specific. They cover the 28 EU Member States, EFTA countries and countries involved in the EU's neighbourhood policy. But how does the Portal function? This factsheet provides a summary of the architecture of the European Data Portal. 4

The Architecture For a better understanding of the integration of the components into the overall architecture, each components functionality as well as interactions from different perspectives (user/system) are described below. The following figure provides a high-level overview diagram of the European Data Portal architecture. Third party portals / Experts Users / Experts API GUI Proxy Graphical Portal API / Portal Portal Portal Visualisation Usage tools (access, search, etc. API / Portal statistics - PIWIK) and caching Graphical Pre-processor API / Portal Portal API / Portal MQA Helpdesk (JIRA) map.apps backend FME Drupal SOLR CKAN Gazetteer Harvester (Transformer) Licensing Assistant Sync SPARQL Manager DCAT-AP Virtuoso Access to the Portal The access to the Portal is provided in two ways: a machine-readable API and a human readable web site Graphic User Interface or GUI. The API enables its users to search, create, modify and delete metadata on the portal. The GUI is basically built on two components: CKAN and DRUPAL. CKAN manages and provides metadata content (datasets) in a central repository. DRUPAL provides the Portal s Home Page with editorial content (e.g. Portal s objectives, articles, news, events, tweets, etc.) and links to an Adapt Framework-based training platform. In addition it offers extended functionalities to registered users via user login. Both systems are used in a side-by-side architecture. A proxy is responsible for delivering the 5

web pages requested by the user. Both systems are equally themed with the same Look&Feel so that the user is not aware on which system he/she is currently browsing. Searching the Portal The portal uses the SOLR search engine in order to separately search for editorial content in DRUPAL and for datasets in the CKAN repository. The GUI includes a Licensing Assistant component that supports the user by providing legal information on the usage of a specific dataset in terms of licenses that apply to the dataset. The SPARQL Manager component allows the user to enter and run SPARQL queries on the Virtuoso linked data repository. It also allows the logged-in user to store and re-run SPARQL queries and notifies the user when a query has finished running. Geo spatial data Using the map.apps backend application, Geo spatial data is visualised on geographical maps. The application is a proprietary solution that comes with different tooling and thematic focus, a graphical configuration interface, supports responsive web-design and internationalisation files. The application also implements the OSGI specification on the client side (in JavaScript) allowing sharing and re-usage of the bundled application logic as well as a straightforward maintenance. Statistical data that is linked to datasets can be visualised in tabular (tables) and graphical (charts) form using a D3.js library. Harvesting Data On the Harvesting side, the portal follows a two-fold architecture too. CKAN is used as the central metadata repository for storing, browsing and searching datasets in a POSTGRES relational database. In order to also support a linked data functionality the CKAN metadata is replicated into a Virtuoso quad store repository via a CKAN synchronisation extension, in order to ensure that both repositories have the same set of metadata. 6

The Harvester is a separate component that is able to harvest data from multiple data sources with different formats and APIs. The harvester is acting as a single point of entry for all metadata that gets harvested, transformed into the CKAN JSON schema and pushed into the CKAN repository. The Gazetteer component is used by the Harvester to enhance the metadata with geo-spatial data and information (geo-coordinates, names, places, etc.). The Gazetteer is mainly used to improve the search functionality. It uses the FME component as a universal spatial ETL (Extract- Transform-Load) tool that supports the accessing, processing and outputting of all spatial file/database formats and that is used for harvesting the sources for geographical names. Enhancing quality The Portal architecture includes three additional components to enhance the quality of the metadata and the portal. A Helpdesk handles user support requests and feedback. The Metadata Quality Assistant (MQA) periodically generates reports on the quality of the harvested metadata and creates tickets for the Helpdesk in case of harvesting issues. The third component is the monitoring component based on PIWIK and located at the Proxy in the architecture. In the full respect of data privacy, it records requests and user interactions on the portal in order to generate anonymised user traffic statistics that will help enhancing the usage of the Portal. 7

Summary of Components COMPONENT API Access GUI Access CKAN DRUPAL ECAS Adapt Framework Proxy SOLR Portal Search (editorial data) SOLR Dataset Search (metadata) Licensing Assistant SPARQL Manager Virtuoso map.apps: Geospatial Data Visualisation Graphical Data Visualisation Pre-processor Harvester Gazetteer FME smart.finder Helpdesk Metadata Quality Assistant (MQA) DESCRIPTION Machine-readable (SOAP / REST) API Portal website graphical user interface Portal s central metadata (dataset) repository Portal s Home Page managing editorial content European Commission Authentication System used for user registration and login in order to provide extended functionalities of the Portal Online platform used for Portal Training Modules (available in EN + FR) Routing of (HTTP(S)) user requests to corresponding components Search engine used for searching portal editorial content Search engine used for searching and filtering datasets in the CKAN metadata repository Component to provide legal information on (re-)usage of specific datasets SPARQL query editor allowing to run SPARQL queries on linked data in the Virtuoso repository Linked data quadruple store that is synchronized with the CKAN repository Proprietary application to visualize geo-spatial data and information using geo-maps. It comes with different tooling and thematic focus, a graphical configuration interface, supports responsive web-design, i18n internationalization files, client side implementation of the OSGI specification (JavaScript) Recline.js/D3.js JavaScript libraries to visualize (statistical) data in tables and graphical charts RESTFUL web API, running on a Node.js server which analyzes and transforms XSL/XLSX files into CSV format in order to be used from the Visualisation tool Single entry point component for harvesting data from multiple data sources in different formats and from different APIs Component providing geo-spatial data and information Component used by the Gazetteer as a universal spatial ETL tool (Extract- Transform-Load) that supports the accessing, processing and outputting of all spatial file / database formats and that is used for harvesting the sources for geographical names Component used by the Gazetteer and simplifying searches for spatial data, services and documents. It enables fast and structured access to extensive, distributed and heterogeneous data stores Portal offers a user request/feedback form via the JIRA-API and generates JIRA tickets for follow-up by helpdesk Component to report on the quality of the harvested metadata and to alert helpdesk in case of issues 8

Monitoring Multilingual Support MT@EC PIWIK component that provides Traffic Analytics of portal usage Web pages + core editorial content + dataset descriptions available in all 24 official EU languages Machine Translation Services of the European Commission used for translation of the metadata into all of the supported languages by the portal For more information, please visit the European Data Portal or contact via email. The European Data Portal initial content has been collected by harvesting national public data and geospatial portals. Progressively, the portal will harvest additional metadata collected from regional, local and domain specific portals. Do you want your portal or website to be harvested by the European Data Portal? Read the requirements. Share your story about how you make use of Open Data. Are you an entrepreneur? A nongovernmental organisation? A civil servant responsible for publishing data? A local authority? Tell us your story! The purpose of the collection of use cases is to assemble interesting European stories about the benefits and efficiency gains that result from the use of Open Data. 9

About the Consortium www.capgemini-consulting.com www.opendatainstitute.org www.intrasoft-intl.com www.timelex.eu www.sogeti.com www.southampton.ac.uk www.conterra.de www.fokus.fraunhofer.de 10

Copyright information 2016 European PSI Platform This document and all material therein has been compiled with great care. However, the author, editor and/or publisher and/or any party within the European PSI Platform or its predecessor projects the epsiplus Network project or epsinet consortium cannot be held liable in any way for the consequences of using the content of this document and/or any material referenced therein. This report has been published under the auspices of the European Public Sector information Platform. The report may be reproduced providing acknowledgement is made to the European Public Sector Information (PSI) Platform. 11