VRT RESEARCH & INNOVATION




15/02/2013
Project C: Future CMS
Research into options for content aggregation and categorisation
C.3.1.2 Proof of concept: content aggregation and categorisation

Scope: Content Aggregation & Categorisation
Within a Future Content Management System, be able to categorise and aggregate (external) content, with a view to optimising the authoring process on the one hand and recommending content on the other.

Strategy
Process, recognise and categorise (textual) content with metadata tags, based on thesauri (controlled vocabulary) and the Semantic Web (Linked Open Data, LOD). Group and aggregate matching content based on these metadata tags.

Architecture: Enrichment Service
[Diagram labels: Items, Enricher, Items & tags, Thesaurus, Linked Open Data]

Architecture: Aggregator Service
[Diagram labels: Items & tags, Lily, Aggregator, Contextualized Grouped items, User behavior, User Context]

Content Enrichment Frameworks: State of the Art
- OntoText KIM
- GATE (General Architecture for Text Engineering)
- Apache Stanbol

Content Enrichment Frameworks: Apache Stanbol (Selection)
PRO
- Open source framework (based on the Apache License)
- Flexible set of reusable components for semantic content management
- Support for content enhancement based on semantic engines
- Support for custom & domain vocabularies (like VRT's Thesaurus)
CON
- Incubator framework: work in progress

FutureCMS: enricher components
- Setup of RDF triple store with DBpedia, GeoNames & VRT Thesaurus
- Setup of Apache Stanbol
- Setup of Enricher Chain
- Implementation of Polopoly Enrich Service (based on JSON)

Enricher: setup of Stanbol
This is done through Chef Solo provisioning (http://wiki.opscode.com/display/chef/chef+solo). The chef run can be started on a clean Ubuntu Precise box; it downloads and builds Stanbol and installs it as a service. It also installs the Enrichment service, which makes use of Stanbol.
A vagrant (http://vagrantup.com/) configuration is provided in enrichment/Provisioning/vagrants/enricher_dev. Running vagrant up in this folder creates a VirtualBox virtual machine with Stanbol and the service on it.
We also provide a mccloud (https://github.com/jedi4ever/mccloud) configuration to easily run the chef recipes on the VRT instance.

Enricher: Custom Vocabulary
(http://stanbol.apache.org/docs/trunk/customvocabulary.html)
The chef run will assemble the indexing tool and place it in /opt/stanbol/indexing-working-dir. In that folder, you can run

    sudo java -Xmx1024m -jar org.apache.stanbol.entityhub.indexing.genericrdf-*-jar-with-dependencies.jar init

which will set up the folder structure. Then, place your .nt files with the SKOS thesaurus in /opt/stanbol/indexing-working-dir/indexing/resources/rdfdata. Then, run

    sudo java -Xmx1024m -jar org.apache.stanbol.entityhub.indexing.genericrdf-*-jar-with-dependencies.jar index

After a while, two files are created in /opt/stanbol/indexing-working-dir/indexing/dist:
- vrtthesaurus.solrindex.zip, which is a Solr index with the thesaurus labels
- org.apache.stanbol.data.site.vrtthesaurus-1.0.0.jar, which is a bundle to install in the Felix console of Stanbol
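To make the input of the indexing step more concrete, here is a minimal sketch of what a SKOS thesaurus serialised as N-Triples could look like. The concept URIs under example.org are hypothetical and the function names are illustrative only; the actual VRT thesaurus export is not shown in the slides.

```python
from __future__ import annotations

# Sketch only: the concept URIs below are hypothetical examples,
# not taken from the VRT thesaurus.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def concept_to_ntriples(uri: str, label: str, lang: str) -> list[str]:
    """Return the two N-Triples lines describing one SKOS concept."""
    return [
        f"<{uri}> <{RDF_TYPE}> <{SKOS_CONCEPT}> .",
        f'<{uri}> <{SKOS_PREF_LABEL}> "{escape_literal(label)}"@{lang} .',
    ]


def thesaurus_to_ntriples(concepts: dict[str, str], lang: str = "nl") -> str:
    """Serialise a mapping of concept URI -> preferred label as N-Triples."""
    lines: list[str] = []
    for uri, label in concepts.items():
        lines.extend(concept_to_ntriples(uri, label, lang))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "http://example.org/thesaurus/gaza": "GAZA",
        "http://example.org/thesaurus/egypte": "EGYPTE",
    }
    print(thesaurus_to_ntriples(sample), end="")
```

A file produced this way would be one candidate for the rdfdata folder before the index command is run.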

Enricher: Stanbol Chain Configuration
Keyword Linking Chain
This chain compares the labels of the controlled vocabulary (VRT's Thesaurus) with the words in the text. It is language independent. The configuration is as follows:

Enricher: Stanbol Chain Configuration
For the Gaza article (http://www.deredactie.be/cm/vrtnieuws/buitenland/1.1484869) the following entities are recognized:
Places: DIE, EEN, EGYPTE, EGYPTISCHE POLITIEKE GROEPERINGEN, GAZA, GEMEENSCHAP, ISLAMITISCHE JIHAD, MAROKKO, MOSLIMBROEDERS, WAREN
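The label comparison described for the Keyword Linking Chain can be illustrated with a small sketch. This is not Stanbol's implementation, only a simplified stand-in that matches vocabulary labels against word sequences in the text; the sample sentence is made up, and the labels are a subset of those listed above. It also suggests one plausible reason why ordinary Dutch words such as DIE, EEN and WAREN show up in the result: if such a word happens to be a label in the vocabulary, a purely label-based match will report it.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    """Split text into upper-cased word tokens (no language-specific logic)."""
    return re.findall(r"\w+", text.upper())


def link_keywords(text: str, labels: list[str], max_words: int = 4) -> list[str]:
    """Return the vocabulary labels that occur as whole-word sequences in text."""
    tokens = tokenize(text)
    # Map each label's token sequence back to the original label.
    wanted = {tuple(tokenize(label)): label for label in labels}
    found = set()
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            window = tuple(tokens[start:start + length])
            if window in wanted:
                found.add(wanted[window])
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical sentence; labels taken from the result list on the slide.
    sentence = "Een delegatie van de Islamitische Jihad in Gaza"
    vocabulary = ["GAZA", "ISLAMITISCHE JIHAD", "EEN", "MAROKKO"]
    print(link_keywords(sentence, vocabulary))
```

Because the matching never looks at grammar or language, the same sketch works for any language, which mirrors the language independence claimed for this chain.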

Enricher: NER Tagging Chain
This chain first performs Named Entity Recognition on the text and is therefore language dependent. The language of the text is first detected using the langid engine. Afterwards, the detected entities are linked with the controlled vocabulary using the NER Tagging Engine.

Enricher: NER Tagging Chain

Enricher: NER Tagging Chain
For the Gaza article (http://www.deredactie.be/cm/vrtnieuws/buitenland/1.1484869) the following entities are recognized:
People: MOEBARAK HOSNI, MURSI, VRANCKX RUDI
Organisations: VRT
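A client such as the Enrichment service could submit article text to a named chain over Stanbol's RESTful enhancer endpoint. The sketch below only illustrates the shape of such a request; the base URL, the chain name "ner-tagging" and the choice of JSON as the response format are assumptions, not values taken from the slides.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

STANBOL_BASE_URL = "http://localhost:8080"  # assumption: local Stanbol launcher
CHAIN_NAME = "ner-tagging"  # hypothetical chain name


def build_enhance_request(
    text: str,
    chain: str = CHAIN_NAME,
    base_url: str = STANBOL_BASE_URL,
) -> urllib.request.Request:
    """Build a POST request that sends plain text to an enhancement chain."""
    url = f"{base_url}/enhancer/chain/{urllib.parse.quote(chain, safe='')}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "application/json",  # assumption: JSON response wanted
        },
        method="POST",
    )


def enhance(text: str, **kwargs: str) -> object:
    """Send the text to Stanbol and return the parsed JSON response."""
    request = build_enhance_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_enhance_request("Rudi Vranckx bericht vanuit Gaza.")
    print(req.get_method(), req.full_url)
```

Calling enhance() requires a running Stanbol instance; build_enhance_request() can be inspected without one.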

C.3.1.2 Enricher: Prototype