Epimorphics Linked Data Publishing Platform




Epimorphics Services for G-Cloud
Version 1.2, 15th December 2014
Authors, Contributors, Review: Andy Seaborne, Martin Merry, Dave Reynolds
Epimorphics Ltd, 2013

1 Overview

The Epimorphics Linked Data Publishing Platform is a resilient, scalable, cloud-based solution for publishing linked data. It is widely used for publishing linked data on data.gov.uk, including data at environment.data.gov.uk, location.data.gov.uk, landregistry.data.gov.uk, and many others.

We offer the platform as a fully hosted and managed service for publishing linked data; in addition, we can install the platform on a client's own infrastructure. The prices quoted in this document assume we are providing a hosted service on top of Amazon Web Services. An instance of the platform runs on a cluster of dedicated machines for each client.

The platform includes:
- A Linked Data API engine, providing access to the data in a number of developer-friendly formats as well as human-readable web pages
- Customisable text search
- A triple store for storing data as RDF
- A fully SPARQL 1.1-compliant endpoint
- A scale-out, fault-tolerant runtime platform
- An upload manager, to enable clients to load their own data

Optionally, we can provide additional upload mechanisms that integrate with clients' existing workflows to support business-as-usual publication of linked data.

The platform is customisable and can also be used to host applications running on top of the data. We offer consultancy and application development services to support the development of such applications; see our G-Cloud services Linked Data Modelling and Consultancy and Linked Data Application Development for further details. We also offer training courses for people wishing to develop their own skills in linked data publishing; see our G-Cloud service Linked Data Training.

In addition to the full platform, we offer an entry-level system for people wishing to start linked data publication. The entry-level platform runs on a single dedicated machine, so it is neither fault-tolerant nor scalable, and it will need to be taken out of service during scheduled maintenance. It provides only limited management information.

Our hosting service includes support during UK business hours.
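Because the platform exposes a standard SPARQL 1.1 endpoint, any HTTP client can query it using the SPARQL 1.1 Protocol. The sketch below uses only the Python standard library and assumes a hypothetical endpoint URL (`https://example.org/sparql`); it builds, but does not send, a GET request that carries the query in the `query` parameter and asks for results in the SPARQL JSON results format.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL (an assumption); a real deployment's address will differ.
ENDPOINT = "https://example.org/sparql"

QUERY = """
SELECT ?s ?label WHERE {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label
} LIMIT 10
"""

def build_query_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # urlencode percent-encodes characters such as '?', '<', '>' and '#'.
    url = endpoint + "?" + urlencode({"query": query})
    # Request the SPARQL 1.1 Query Results JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")

req = build_query_request(ENDPOINT, QUERY)
print(req.full_url)
# To execute against a live endpoint: urllib.request.urlopen(req), then json.load() the body.
```

GET keeps queries cacheable and bookmarkable; very long queries can instead be sent by POST, which the protocol also allows.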

2 Platform Details

The Epimorphics Linked Data Platform is used by the Environment Agency and Land Registry, as well as by commercial customers, for linked data publication. It consists of:
- A Linked Data API engine, provided by ELDA, Epimorphics' implementation of the Linked Data API (LDA)
- Text search, provided by Apache Solr 4
- A fully compliant SPARQL 1.1 engine, provided by Apache Jena ARQ
- A scale-out, fault-tolerant runtime platform, hosted in Amazon Web Services
- An update controller for managing coordinated updates to replicated services

The platform architecture has three main tiers: load balancing and routing, application services, and storage.

[Figure: Platform Architecture]

Linked Data API Engine

The Linked Data API is a specification commissioned by the Cabinet Office and co-developed by Epimorphics to provide web-developer-friendly access to linked data (http://code.google.com/p/linked-data-api/wiki/specification). It enables developers to consume linked data in a variety of formats without having to learn the details of SPARQL and RDF. Our platform uses ELDA, our own widely used open source implementation of the Linked Data API. ELDA can also incorporate text search as an additional facility when defining web-developer APIs to access the data. ELDA optionally uses SPARQL 1.1 features (particularly sub-queries) to improve responsiveness.
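To illustrate why sub-queries help responsiveness, the sketch below shows one way an LDA-style list request could be compiled into a single SPARQL 1.1 query: an inner SELECT picks one page of item URIs (ordered, with LIMIT and OFFSET derived from page and page-size parameters in the spirit of the LDA's `_page` and `_pageSize`), and the outer CONSTRUCT fetches only the requested properties of those items in the same round trip. This is a hypothetical illustration, not ELDA's actual query generator; the function name, zero-based page numbering and example class URI are assumptions.

```python
def compile_list_query(item_pattern: str, view_properties: list[str],
                       page: int = 0, page_size: int = 10) -> str:
    """Compile a hypothetical LDA-style list request into one SPARQL 1.1 query.

    item_pattern is a graph pattern binding ?item; it is assumed to come from
    API configuration, not from untrusted user input.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    # One template triple and one OPTIONAL block per requested view property,
    # so items lacking a property are still returned.
    construct = "\n".join(f"  ?item <{p}> ?v{i} ." for i, p in enumerate(view_properties))
    optionals = "\n".join(f"  OPTIONAL {{ ?item <{p}> ?v{i} }}" for i, p in enumerate(view_properties))
    # The sub-query selects exactly one page; ORDER BY keeps paging stable.
    return f"""CONSTRUCT {{
{construct}
}} WHERE {{
  {{ SELECT ?item WHERE {{ {item_pattern} }}
    ORDER BY ?item
    LIMIT {page_size} OFFSET {page * page_size} }}
{optionals}
}}"""

q = compile_list_query(
    "?item a <http://example.org/def/Thing>",  # hypothetical class URI
    ["http://www.w3.org/2000/01/rdf-schema#label"],
    page=2,
    page_size=10,
)
print(q)
```

Without the sub-query, a client would typically issue one query to select the page of items and a second to describe them; nesting lets the engine do both at once.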

Text search

Text search indexing is provided by Apache Solr. It can be accessed via the Linked Data API or directly within SPARQL queries. The indexed data model is based on the conceptual entities within the data, rather than on raw indexing of triples.

SPARQL 1.1 Engine

Our platform is based on Apache Jena, including TDB and Fuseki. This includes the ARQ query engine, which passes the complete SPARQL 1.1 test suite for query, update and protocol. In addition, the engine is capable of combining free text search with SPARQL queries.

Runtime Platform

The runtime platform can be deployed with a number of different cloud service providers, as well as on a client's own infrastructure. In this document we assume that the deployment will be within AWS.

The platform achieves scalability and fault tolerance by running a number of identical replicas across different AWS availability zones. Data is kept within the EU for data protection jurisdiction. The replicas combine application services with a local copy of the SPARQL database and, separately, Solr text indexing. An Amazon load balancer tracks active nodes and routes traffic based on the current load and availability of service machines. The number of replicas is adjustable to meet the expected load on the system and the desired responsiveness within the available budget.

The ELDA and SPARQL services reside on the same machine because the ELDA implementation uses the triple store for all its data. Text search may have different scalability requirements and is scaled independently of the triple store.

The platform logs all incoming requests, including the originating IP address, enabling clients to understand and mine the log information to determine usage patterns as desired.
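The Text search subsection above notes that the index can be queried directly from SPARQL. As a sketch only, assuming the index is exposed through Apache Jena's jena-text `text:query` property function (this document does not say which mechanism the platform uses), a query that combines a free-text match with an ordinary graph pattern might be built as follows. The helper names are hypothetical; the search term is escaped as a SPARQL string literal so that quotes in user input cannot break the query.

```python
# jena-text namespace (assumption: the platform exposes search via jena-text).
TEXT_NS = "http://jena.apache.org/text#"

def sparql_string(value: str) -> str:
    """Escape a Python string as a double-quoted SPARQL string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'

def text_search_query(term: str, limit: int = 20) -> str:
    """Free-text match on rdfs:label, then ordinary triple patterns on the hits."""
    return f"""PREFIX text: <{TEXT_NS}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {{
  ?item text:query (rdfs:label {sparql_string(term)} {limit}) .
  ?item rdfs:label ?label .
}}"""

q = text_search_query('river "Thames"')
print(q)
```

Further triple patterns can be added after the `text:query` line to filter or enrich the matched entities, which is the combination of text search and SPARQL that the section describes.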

[Figure: Deployment View]

Update Controller

Changes to the published data are performed by a secured controller. The controller is responsible for determining the necessary changes to the replicated triple store and replicated text index. It can be used both through a user interface and by scripted processes. The controller also provides SPARQL Update for management of the triple stores, such as corrections to published data. The public interface exposed to data consumers does not include the SPARQL Update service, which is available only via the secured controller.

Entry level platform

For the entry-level system, the runtime platform is limited to a single dedicated machine (there is no replication and no load balancing). There is no direct access to the logs of incoming requests. Apart from this, the details are the same as those described under Runtime Platform above.
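Corrections to published data go through the secured Update Controller as SPARQL 1.1 Update. As a sketch only, assuming the controller accepts a standard SPARQL 1.1 Protocol update request at a hypothetical URL protected by HTTP Basic authentication (neither the address nor the authentication scheme is specified in this document), correcting a mistyped label might look like this. The request is built but not sent.

```python
import base64
from urllib.request import Request

# Hypothetical controller endpoint (assumption); the public endpoint
# does not accept updates, so corrections must target the controller.
UPDATE_ENDPOINT = "https://controller.example.org/update"

# Replace a misspelt label wherever it occurs.
CORRECTION = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DELETE { ?s rdfs:label "Rver Thames" }
INSERT { ?s rdfs:label "River Thames" }
WHERE  { ?s rdfs:label "Rver Thames" }
"""

def build_update_request(endpoint: str, update: str, user: str, password: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol update request via direct POST."""
    # Basic authentication is an assumption made for this sketch.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={
            # The protocol allows the update string as the raw POST body.
            "Content-Type": "application/sparql-update; charset=utf-8",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

req = build_update_request(UPDATE_ENDPOINT, CORRECTION, "publisher", "secret")
```

Routing every change through one controller means it can apply the same update to each replicated triple store and refresh the text index consistently, rather than letting clients write to individual replicas.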

3 Service Details

As a hosted service, our platform is accredited to store and process IL0 information only.

All data loaded onto the platform is backed up at the time it is loaded, so the backup is always an accurate reflection of the data in the system. The replicated nature of the platform means that a hardware failure will not cause data from the running system to be lost. In the event of a catastrophic infrastructure failure that takes out all the replicated instances, the data will be restored from backup as quickly as possible.

On-boarding: if no customisation of the web interfaces etc. is required, we will provide the client with access to the upload manager so that they can have data loaded onto and published by the platform within 5 business days after contracts have been signed. We can provide expedited on-boarding at extra cost if desired.

Off-boarding: no user data is collected by the system; the only data stored on the publishing platform is data supplied by the client. On termination of the contract, all client data will be securely deleted. During the life of the contract, clients can request access to a copy of all the data stored on the system.

As the system is fully replicated, routine maintenance can be carried out without taking the system offline; there is no need for scheduled maintenance windows when the system is out of service. We aim for the availability of the system to be 100%. Details of our support services are given in the next section.

We do not offer a trial service, though we do offer an entry-level offering for fewer than 10M triples; see our separate pricing document for details.
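The on-boarding commitment is counted in business days. As a sketch of that arithmetic, assuming business days run Monday to Friday excluding public holidays (the convention the next section uses for support hours) and that counting starts on the day after signature, the target date can be computed as below. The function name and the holiday set are illustrative; the caller supplies the relevant public holidays.

```python
from datetime import date, timedelta

def add_business_days(start: date, days: int,
                      holidays: frozenset[date] = frozenset()) -> date:
    """Return the date `days` business days after `start`.

    Saturdays and Sundays (weekday() 5 and 6) and any dates in `holidays`
    are skipped; the start date itself is not counted (an assumption).
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current

# Example: contract signed on Friday 19 December 2014, with the 2014 UK
# Christmas Day and Boxing Day bank holidays supplied as holidays.
uk_holidays = frozenset({date(2014, 12, 25), date(2014, 12, 26)})
print(add_business_days(date(2014, 12, 19), 5, uk_holidays))  # prints 2014-12-30
```

Over a holiday period the 5 business days can span more than a calendar week and a half, which is worth allowing for when planning a launch.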

4 Support

Our hosting support for the full system includes all regular maintenance, monitoring and backups. We will provide reports to clients on the usage of the system; the precise details of the data reported will be agreed with the client during the setup phase.

We also provide an incident reporting service. The basic service is available during normal business hours (09:00 to 17:30, Monday to Friday, excluding public holidays). We provide an email address for incident reporting and will respond to any notification within 4 hours. If an incident results in loss of service, we will restore the service within 1 business day; in all other cases we will use reasonable efforts to resolve the incident as quickly as possible. Additional support options are available at extra cost, including telephone support and faster response times. For such additional support services, we offer service credits in the event of failing to meet targets.

We note that the replicated nature of our architecture means that we do not need to take the system down in order to perform regular maintenance and system updates. The production system we run for the Environment Agency went live in April 2012 and has since been available 100% of the time.

For the entry-level system, running on a single dedicated machine, we will still provide an email address for incident reporting and will respond to any notification within 4 hours. However, if an incident results in a loss of service, we will use reasonable efforts to restore the service as quickly as possible, but we do not guarantee to restore it within 1 business day.

5 Use of Open Source Software

Our platform is based on open source software, notably:
- Apache Jena, including ARQ, TDB and Fuseki
- Apache HTTP Server
- Apache Solr
- ELDA, Epimorphics' open source implementation of the Linked Data API
- Apache Tomcat
- Apache Lucene

6 Compliance with Open Standards

Linked data is crucially dependent on the correct implementation of the relevant open standards. Our platform is fully compliant with all the relevant standards, notably:
- RDF syntaxes: RDF/XML, Turtle, N-Triples
- RDF 1.1 Turtle
- SPARQL 1.1 Query
- SPARQL 1.1 result set formats (XML, JSON, CSV, TSV)
- SPARQL 1.1 Update
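Of the result set formats listed above, the SPARQL 1.1 Query Results JSON format is often the most convenient for web developers. A minimal sketch of flattening such a response into plain Python dictionaries follows; it keeps only each binding's `value` (dropping `type`, `datatype` and `xml:lang` for brevity), and the hand-written sample response is illustrative rather than real platform output.

```python
import json

def flatten_bindings(results_json: str) -> list[dict[str, str]]:
    """Turn a SPARQL 1.1 JSON results document into a list of {var: value} rows.

    Variables left unbound in a solution are simply absent from that row.
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows

# Illustrative response: the second solution has no ?label binding.
SAMPLE = """{
  "head": {"vars": ["item", "label"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://example.org/id/1"},
     "label": {"type": "literal", "value": "Example one", "xml:lang": "en"}},
    {"item": {"type": "uri", "value": "http://example.org/id/2"}}
  ]}
}"""

for row in flatten_bindings(SAMPLE):
    print(row)
```

Responses to ASK queries carry a top-level `boolean` member instead of `results`, so a fuller client would check for that case before reading bindings.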