Big Data & Semantic Technology Michel de Ru michel.de.ru@dayon.nl



Similar documents
From Big Data to Smart Data. Marin Dimitrov - CTO

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

How To Make Sense Of Data With Altilia

A 360 Degree View of Anything

OpenText Content Hub for Publishers

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

Making Sense of Big Data in Insurance

A Content Hub strategy for newspaper and magazine publishers

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

Data Virtualization. Paul Moxon Denodo Technologies. Alberta Data Architecture Community January 22 nd, Denodo Technologies

MarkLogic Semantics in Healthcare and Life Sciences for LIDER COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Introduction to Epinomy. Big Data Semantics

How To Manage Your Digital Assets On A Computer Or Tablet Device

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

HOW TO DO A SMART DATA PROJECT

BIG DATA. John A. Eisenhauer Chair, Data Governance Society Rick Young - Managing Director 3Sage Consulting

MarkLogic Pre-conference Tutorial. om de titelstijl van het model te bewerken

Search and Real-Time Analytics on Big Data

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Structured Content: the Key to Agile. Web Experience Management. Introduction

Leveraging Big Data Technologies to Support Research in Unstructured Data Analytics

Big Data and Analytics: Challenges and Opportunities

The Foundations of Successful Reference Data Management

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

... Foreword Preface... 19

Big Data Analytics Nokia

Introduction to Glossary Business

The next level of enterprise digital asset management

Open Source egovernment Reference Architecture Osera.modeldriven.org. Copyright 2006 Data Access Technologies, Inc. Slide 1

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

Data collection architecture for Big Data

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

Automating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng.

Luncheon Webinar Series July 29, 2010

SAP HANA Cloud Portal Overview and Scenarios

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

Unlock Business Agility with Oracle Business Intelligence Cloud Service (BICS)

Smarter Content with a Dynamic Semantic Publishing Platform. The Semantic Technologies That Can Make Any Content Intelligent

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Data Virtualization A Potential Antidote for Big Data Growing Pains

MarkLogic Enterprise Data Layer

JBoss Data Services. Enabling Data as a Service with. Gnanaguru Sattanathan Twitter:@gnanagurus Website: bushorn.com

Providing real-time, built-in analytics with S/4HANA. Jürgen Thielemans, SAP Enterprise Architect SAP Belgium&Luxembourg

Data Governance for Regulated Industries

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Migrating Scientific Applications from Grid and Cluster Computing into the Cloud Issues & Challenges

COMP9321 Web Application Engineering

The big data business model: opportunity and key success factors

What is Digital Asset Management all about?

Architected Blended Big Data with Pentaho

Oracle Big Data SQL Technical Update

GetLOD - Linked Open Data and Spatial Data Infrastructures

BIG DATA GOVERNANCE: BALANCING BIG DATA VELOCITY & INFORMATION GOVERNANCE

Market Analysis For Photography: Extract From D5.1

Semantic Modeling with RDF. DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo

Software Architecture Document

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India

VRTRESEARCH&INNOVATION

Big Data & Security. Aljosa Pasic 12/02/2015

BIG DATA: PROMISE, POWER AND PITFALLS NISHANT MEHTA

Auto-Classification for Document Archiving and Records Declaration

Architectuur hulpmiddelen TechnoVision & CORA. Maarten Engels Nieuwegein, 9 februari 2012

The Liaison ALLOY Platform

De la Business Intelligence aux Big Data. Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris. 22/01/14 Séminaire Big Data

Apigee Edge API Services Manage, scale, secure, and build APIs and apps

Practical meta data solutions for the large data warehouse

The future of Big Data A United Hitachi View

Cloud Computing Landscape: The Importance Of Standards

DDI Lifecycle: Moving Forward Status of the Development of DDI 4. Joachim Wackerow Technical Committee, DDI Alliance

Big Data Architect Certification Self-Study Kit Bundle

MEDIA & CABLE. April Taras Bugir. Broadcast Reference Architecture. WW Managing Director, Media and Cable

Graph Database Performance: An Oracle Perspective

Functional Requirements for Digital Asset Management Project version /30/2006

Organisation data architecture with standards connections

Create and Distribute Rich Media for Optimized, Omnichannel Customer Engagement

SAP HANA Vora : Gain Contextual Awareness for a Smarter Digital Enterprise

White paper. Ensuring Big Data Security with Identity and Access Management

How To Build A Cloud Portal For Sap Hana Cloud Platform

Beyond Health 2.0: the semantic web and intelligent systems

Roadmap Talend : découvrez les futures fonctionnalités de Talend

A Case Study of Hadoop in Healthcare

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Big Data Integration: A Buyer's Guide

Transcription:

Michel de Ru michel.de.ru@dayon.nl

Newz.nl When competitors band together Introduction Michel de Ru Solution architect @ Dayon 16 years experience in publishing Among others Wolters-Kluwer, Sdu (ELS) and Dutch Railways Specialized in Content related Big Data challenges Specialized in added value through Semantic Technology Dayon, part of the HintTech Group We design, build and maintain content driven online and mobile applications We help customers develop their Content Strategy We realize it using Content Technology Partners include MarkLogic, Ontotext, Alfresco, Hippo CMS, Solr and OpenText Big Data projects for Dutch Public Library, Kluwer, Newz

Newz.nl When competitors band together Contents Challenges for news publishers Unified search and delivery platform How we built it New use-cases Conclusion Questions

Challenges for news publishers

Mln Euro Turn-key platform Newz Main challenges Traditional market is changing rapidly Publishers are under pressure 2,0 Gross income 1,6 1,2 0,8 0,4 0,0 2002 2003 2004 2005 2006 2007 2008 2008 2010 2011 2012 Public perception News for free Competitors steal content/piracy New initiatives have no content available

Main challenges Possible answers Ignore and let it happen Prohibit / forbid the use of our content Outsmart the competition We as combined publishers have the best quality and can offer the best knowledge about our combined news content

Main targets Efficiency: standardized B2B service delivery One delivery platform for all daily news publishers Quality: automated metadata enrichment Most newspapers don t have metadata available Linked open data and semantic technologies Centralized semantic curation Enable new business opportunities Centralized sourcing/licensing of content Incubator for new initiatives

Newz in the news

NDP Nieuwsmedia in the news Newz video in Dutch See www.newz.nl

The project 2012 January : RFI/RFP June : Dayon/Ontotext selected July : Project kick-off December: Newz BV founded 2013 March : System delivered July : Production

How it works

Data Journalistiek Applicatie

How we put it together

Newz.nl Big Data approach Volume Velocity Terra/Petabytes Catalogs Files Content Real time Near time Batch Streaming Volume Velocity Variety Big Data Variety Unstructured Structured Text, image, video Mixed rich content Value Value creation Added value Solutions

Dutch news = Big Data Volume Velocity Volume 15.000 news articles a day Variety Value Velocity Delivery spike during 2 hours a day (just before the morning starts) Usage is continuously (through API, Search and Subscription interfaces) Variety News articles without metadata and no structure whatsoever Linked Open Data Value Facilitate new News business solutions for integrators, app suppliers, etc. Deliver a standardized (NITF NewsML) and enriched format

Key aspects Big Data Content Store Enterprise NoSQL Volume Velocity Structured/unstructured ACID compliant (Atomicity, Consistency, Isolation, Durability) Semantic Technologies Concept extraction Variety Linked (Open) Data Graph databases / Inferencing Content Lifecycle Management Part of Application Lifecycle Management

Volume, Velocity Interface with News publishers Content Processing Framework Added a Java layer for full ETL and trailing capabilities Storage of News articles In cooperation with IPTC a Dutch version of NewsML-G2 has been defined Interface with Semantic Extraction framework Full search capabilities Enterprise grade We also calculated a MongoDB/Lucene solution ML won on: TCO, Success rate of business implementations, Enterprise resilience

Variety Semantic Extraction Existing news vocabularies and taxonomies + Linked Open Data World class Semantic Extraction (NLP, Golden Standard, Rules, etc.) Conversion to an ontology (similar to semantic web) Triples stored in OWLIM Enterprise Enrichment of news articles Organizations Persons Locations Events Keywords Mentions From a lot of data To even more data!

e.g. Barack Obama e.g. Democratic Party e.g. Netherlands

Architecture overview

Use cases

Voorbeeld: Automatische geo taxonomie Nieuwsartikel gaat over Haditha in Irak Wat als je meer wilt weten over de regio? 1. Artikel is semantisch verrijkt met de plaatsnaam 2. Op basis van Linked Open Data wordt een taxonomie getoond 3. Daarmee kan alle content die over de regio gaat gevonden worden

Voorbeeld: tijd reizen door infographics

Voorbeeld: Research Research over bepaalde onderwerpen Geef de meest relevante artikelen Geef relevatie in de tijd gezien Geef de mogelijkheid tot een verdiepende zoektocht

Voorbeeld: Mashups Research over bepaalde onderwerpen Verrijk resultaat met Verrijk Linked Open resultaat met Data Linked Open Data Verrijk resultaat met eigen taxonomie / ontologie

Concluding

Newz.nl When competitors band together In conclusion Lessons learned Hard to keep 12 frogs in the wheelbarrow, but we succeeded! Semantic Extraction is a gigantic specialism, start early in the project! MarkLogic delivers to it s promises, don t think about it just use it! Business opportunities need both commercial and technical perspectives! Next steps Commercialize the Newz organization Connect more publications Enable market initiatives by using the Newz API (commercial and non-commercial) Enable re-use within the publishers

Any Questions?