LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model




22 October 2014
Tony Hammond, Michele Pasin

1. Background
About Macmillan and what we are doing

Macmillan Science and Education Group brands and businesses

MS&E Current Trends
Developing a richer graph of objects

Change drivers
  - Digital-first workflow
      - print becomes secondary
      - support for multiple workflows
  - User-centric design
      - things, not data
      - focus on user experience
  - Deeply integrated datasets
      - standard naming convention
      - common metadata model
      - flexible schema management
      - rich dataset descriptions

NPG Linked Data Platform (2012)
data.nature.com

Deliverables (2012-2014)
  - Prototype for external use
  - Two RDF dataset releases in 2012
      - April 2012 (22m triples)
      - July 2012 (270m triples)
  - Live updates to query endpoint
  - SPARQL query service (decommissioned)
Current work (2014-)
  - Focus on internal use cases
  - Publish ontology pages
  - Periodic data snapshots

NPG Core Ontology (2014)
Things: assets, documents, events, types

Features
  - Classes: ~65
  - Properties: ~200
  - Named graphs (per class)
Namespaces
  - npg: => http://ns.nature.com/terms/
  - npgg: => http://ns.nature.com/graphs/
Approach
  - Incremental formalization (RDF, RDFS, OWL-DL)
  - Shared metamodel vs. automatic inference
  - Minimal commitment to external vocabs
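As an illustration of how the two namespaces and the per-class named graphs might be used together, the following is a minimal sketch that expands prefixed names and builds a SPARQL query scoped to one graph. The class name "Article" and the graph name "articles" are hypothetical placeholders, not names taken from the NPG ontology, and the helper functions are invented for this example.

```python
# Namespace prefixes as listed on the slide.
PREFIXES = {
    "npg": "http://ns.nature.com/terms/",
    "npgg": "http://ns.nature.com/graphs/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'npg:Article' into a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


def class_query(class_name: str, graph_name: str, limit: int = 10) -> str:
    """Build a SPARQL query listing instances of one class in one named graph.

    Both arguments are hypothetical local names; the slides do not list
    the actual class or graph names.
    """
    prefix_lines = "\n".join(
        f"PREFIX {p}: <{iri}>" for p, iri in sorted(PREFIXES.items())
    )
    return (
        f"{prefix_lines}\n"
        f"SELECT ?s WHERE {{\n"
        f"  GRAPH npgg:{graph_name} {{ ?s a npg:{class_name} }}\n"
        f"}} LIMIT {limit}"
    )


if __name__ == "__main__":
    print(expand("npg:Article"))
    print(class_query("Article", "articles"))
```

Keeping one named graph per class means a query for a single class can be restricted to a single graph, which is presumably one motivation for that layout.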

NPG Subject Pages (2014)
Topical access to content

Features
  - Based on SKOS taxonomy
      - >2500 scientific terms
      - content inherited via SKOS tree
  - Dynamically generated
      - one webpage per subject term
      - secondary pages for article types
  - Various formats, e.g. e-alerts, feeds
      - allows people to follow a subject
  - Customized related content
      - ads, jobs, events, etc.
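The idea of content being "inherited via SKOS tree" can be sketched as a roll-up: an article tagged with a narrow term also appears on the pages of every broader term above it. The sketch below assumes a simple child-to-parent mapping in the spirit of skos:broader; the term names and article ids are toy data invented for illustration.

```python
from collections import defaultdict

# Toy taxonomy: narrower term -> broader term (in the spirit of skos:broader).
BROADER = {
    "genetics": "biological-sciences",
    "epigenetics": "genetics",
    "ecology": "biological-sciences",
}

# Toy tagging: article id -> subject terms it is directly tagged with.
ARTICLE_SUBJECTS = {
    "article-1": ["epigenetics"],
    "article-2": ["genetics"],
    "article-3": ["ecology"],
}


def ancestors(term):
    """Yield the term itself, then each broader term up to the root."""
    seen = set()
    while term is not None and term not in seen:  # guard against cycles
        seen.add(term)
        yield term
        term = BROADER.get(term)


def content_by_subject():
    """Map every subject term to the articles it shows, including inherited ones."""
    pages = defaultdict(set)
    for article, terms in ARTICLE_SUBJECTS.items():
        for term in terms:
            for subject in ancestors(term):
                pages[subject].add(article)
    return pages


if __name__ == "__main__":
    for subject, articles in sorted(content_by_subject().items()):
        print(subject, sorted(articles))
```

Precomputing this mapping is one way to generate a page per subject term without walking the tree on every request; the slides do not say whether the production system computes it ahead of time or at query time.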

2. Data Storage and Query
Achieving speed by means of a hybrid architecture

Content Hub
Managed content warehouse for data discovery

Capabilities
  - Discovery
  - Graph Storage
  - Content Repos
Features
  - Hybrid RDF + XML architecture
  - MarkLogic for XML, RDF/XML
  - Triplestore (TDB) for RDF validation
  - Repos for binary assets
Datasets
  - Documents (large; >1m)
  - Ontologies (small; <10k)

System Architecture
[Figure: system architecture diagram (Hub content)]

Content Discovery Principles
Readying the API for applications

Generations
  - 1st: Generic linked data API (RDF/*)
  - 2nd: Specific page model API (JSON)
Concerns
  - Speed (20ms single object; 200ms filtered object)
  - Simplicity (data construction)
  - Stability (backup, clustering, security, transactions)
Principles
  - Chunky not chatty, all data in a single response
  - Data as consumed, rather than as stored
  - Support common use cases in simple, obvious ways
  - Ensure a guaranteed, consistent speed of response for more complex queries
  - Build on foundation of standard, pragmatic REST (collections, items)
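The "chunky not chatty" and "data as consumed" principles can be illustrated with a small sketch of a page-model response: one call returns everything a subject page needs, shaped for the page rather than for the store. The in-memory stores, field names, and the subject_page function below are all hypothetical; the slides do not show the actual JSON page model.

```python
import json

# Hypothetical in-memory stores standing in for the content hub.
SUBJECTS = {"genetics": {"id": "genetics", "label": "Genetics"}}
ARTICLES = {
    "a1": {"id": "a1", "title": "Example article one", "subjects": ["genetics"]},
    "a2": {"id": "a2", "title": "Example article two", "subjects": ["genetics"]},
}


def subject_page(subject_id: str) -> str:
    """Return one JSON document holding all data needed to render a subject page.

    The client makes a single request ("chunky") instead of fetching the
    subject and then each article separately ("chatty").
    """
    subject = SUBJECTS[subject_id]
    # Project each article down to the fields the page consumes.
    articles = [
        {"id": a["id"], "title": a["title"]}
        for a in ARTICLES.values()
        if subject_id in a["subjects"]
    ]
    page = {
        "subject": subject,
        "articleCount": len(articles),
        "articles": articles,
    }
    return json.dumps(page, sort_keys=True)


if __name__ == "__main__":
    print(subject_page("genetics"))
```

In REST terms, a response like this would correspond to a single item in a collection of subject pages.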

Content Discovery Optimization
Tuning the API for performance

Approaches
  - TDB + Fuseki SPARQL
  - MarkLogic Semantics SPARQL
  - MarkLogic XQuery
  - MarkLogic (Optimized) XQuery
Techniques
  - Partitioning: RDF/XML objects
  - Streaming: serialization
  - Hashing: dictionary lookup
  - Caching: Varnish
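Two of the techniques named above, hashed dictionary lookup and streaming serialization, are general patterns that can be sketched outside MarkLogic. The example below is an assumption-laden illustration in Python, not the XQuery used in the project: it builds a hash index once, then emits a JSON array piece by piece instead of building the whole response in memory. The record shape and function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator


def build_index(records: Iterable[dict]) -> Dict[str, dict]:
    """Build a hash index keyed by record id so each lookup is O(1) on average."""
    return {r["id"]: r for r in records}


def stream_json_array(ids: Iterable[str], index: Dict[str, dict]) -> Iterator[str]:
    """Yield a JSON array chunk by chunk, looking each id up in the index.

    Unknown ids are skipped. Output order follows the order of `ids`.
    """
    yield "["
    first = True
    for record_id in ids:
        record = index.get(record_id)
        if record is None:
            continue
        if not first:
            yield ","
        yield json.dumps(record, sort_keys=True)
        first = False
    yield "]"


if __name__ == "__main__":
    index = build_index([
        {"id": "a", "title": "First"},
        {"id": "b", "title": "Second"},
    ])
    for chunk in stream_json_array(["b", "missing", "a"], index):
        print(chunk, end="")
    print()
```

A generator like this can be handed directly to a web framework that supports chunked responses, so the first bytes can be sent before the last record has been serialized.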

Content Storage Layout and Indexing
Readying the data for page delivery

Challenges
  - Sort orders
  - RDF Lists
  - Faceting, counting
Layout
  - Semantic RDF/XML includes in XML
  - RDF objects serialized in list order
  - Application XML for subject hierarchy
Indexes
  - Indexes over all elements
  - Range indexes for datatypes (e.g. datetimes)
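RDF lists are a challenge for ordered page delivery because an RDF collection is stored as a chain of rdf:first / rdf:rest triples that carries no order until it is walked. The sketch below shows one way to flatten such a chain into an ordered sequence, which is the kind of step needed before objects can be "serialized in list order". The triples and node names are toy data; only the rdf: vocabulary IRIs are standard.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF + "first"
RDF_REST = RDF + "rest"
RDF_NIL = RDF + "nil"

# Toy triples for the list ("article-a", "article-b"), deliberately unordered.
TRIPLES = [
    ("_:n2", RDF_FIRST, "article-b"),
    ("_:n1", RDF_REST, "_:n2"),
    ("_:n1", RDF_FIRST, "article-a"),
    ("_:n2", RDF_REST, RDF_NIL),
]


def flatten_list(head, triples):
    """Walk an rdf:first / rdf:rest chain from `head` and return its items in order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        # A well-formed list node has exactly one rdf:first and one rdf:rest.
        if node in seen or node not in first or node not in rest:
            raise ValueError(f"malformed RDF list at {node!r}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


if __name__ == "__main__":
    print(flatten_list("_:n1", TRIPLES))
```

Once flattened, the ordered items can be written out sequentially, so that document order in the serialized XML reflects list order and no list traversal is needed at query time.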

In Conclusion
A few lessons learned

Summary
  - An RDF metamodel allows for scalable enterprise-level data organization
  - It is crucial to adequately distinguish between external and internal use cases
  - A hybrid architecture proved to be an efficient internal solution for content delivery
Future Work
  - Grow the ontology so that it matches product requirements more closely
  - Support automated reasoning and richer query options, both RDF- and XML-based
  - Maintain and expand the vision of a shared semantic model as a core enterprise asset

Thank you
For more information please contact:

TONY HAMMOND
Data Architect, Content Data Services
tony.hammond@macmillan.com

MICHELE PASIN
Information Architect, Product Office
michele.pasin@macmillan.com