DBpedia German: Extensions and Applications




Alexandru-Aurelian Todor, FU-Berlin
Innovationsforum Semantic Media Web, 7 October 2014

Overview
- Why DBpedia?
- New Developments in DBpedia German
- Problems in DBpedia

Why DBpedia?
- Knowledge bases are the core of intelligent web applications
  - Gazetteerless NER
  - Question answering engines
  - Document enrichment
  - Relation extraction
  - Event detection
- Large web companies are developing their own alternatives
  - Google Knowledge Graph/Freebase
  - Microsoft Satori KB
  - Wikidata
  - Yahoo Knowledge Graph
  - IBM Watson API, Wolfram Alpha etc.

DBpedia-LOD Cloud

International DBpedia Chapters
http://wiki.dbpedia.org/internationalization/chapters
- Goal: provide additional resources for extraction, access and services
  - Language-specific endpoints and services
  - Language-specific extensions
- The German chapter addresses the German language
  - URL: http://de.dbpedia.org/

DBpedia German: What We Offer
- DBpedia German Data Dumps: http://de.dbpedia.org/downloads/
- DBpedia German SPARQL Endpoint: http://de.dbpedia.org/sparql
- DBpedia German Spotlight: http://de.dbpedia.org/spotlight/demo
- DBpedia German Live: http://live.de.dbpedia.org/sparql
- DBpedia German Live Changesets: http://live.de.dbpedia.org/changesets
- Improved interlinking data, e.g. the Linked Hypernyms Dataset (hypernyms from the first sentences)
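As a rough illustration of how the SPARQL endpoint listed above could be used from a script, the following sketch encodes a query as an HTTP GET request in the style of the SPARQL 1.1 Protocol. The endpoint URL comes from the slide; the sample query, the function names and the assumption that the endpoint returns JSON results on request are illustrative and not part of the talk.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://de.dbpedia.org/sparql"  # endpoint URL taken from the slide

# Illustrative query (an assumption): fetch any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET URL with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and parse a JSON result (needs network access).

    Assumes the endpoint honours the Accept header below.
    """
    request = Request(
        build_request_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the request URL; call run_query(QUERY) to contact the server.
    print(build_request_url(QUERY))
```

The same helper should work for the Live endpoint by passing `endpoint="http://live.de.dbpedia.org/sparql"`, assuming it accepts the same request form.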

DBpedia German Statistics

Property           Value
Triples            146 million
Classes            206
Entities           4.3 million
Distinct subjects  7.6 million
Properties         15609
Distinct objects   36.7 million

Category           Improvement
Mappings           >30%
Missing labels     >300%

DBpedia German Infrastructure
- Main server (1):
  - CPU: 2x hexa-core Xeon Ivy Bridge => 24 vcores
  - Memory: 256 GB RAM
  - SSD: ~1 TB RAID 5 array
  - HDD: ~10 TB RAID 5 array
- Secondary servers (2):
  - CPU: quad-core Xeon Sandy Bridge => 16 vcores
  - Memory: 32 GB x2
  - HDD: 1 TB 10000 RPM x2

Problems in DBpedia
- Ontology & Missing Data
  - Missing Labels
  - Missing Types
  - Editing Capabilities
- Administration

Missing Labels: Why are missing labels a problem?
- People don't understand the ontology
- New classes and properties are created needlessly
- Not a true multilingual ontology

Language  Missing class labels  Missing property labels
German    147                   1530
French    280                   2534
Spanish   706                   2747
Italian   496                   2712
Polish    678                   2709

Missing Labels: How do we address the problem?
- Automatically translate labels using translation services
- Present translation suggestions to editors in batch mode
- Allow editors to edit and commit multiple translations at the same time
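The batch workflow described above (suggest, review, commit together) might be modelled roughly as follows. This is a minimal sketch, not the MissingBot implementation: the `Suggestion` class, the function names and the lookup table standing in for a translation service are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    item: str               # ontology class or property that lacks a label
    language: str           # target language code
    label: str              # machine-suggested label, editable by a human
    accepted: bool = False  # set by the editor during review


def suggest_labels(missing_items, language, translate):
    """Create one editable suggestion per item without a label."""
    return [Suggestion(item, language, translate(item, language))
            for item in missing_items]


def commit_batch(suggestions):
    """Collect every accepted label so they can be committed in one step."""
    return {s.item: s.label for s in suggestions if s.accepted}


# Stand-in for a real translation service (hypothetical lookup table).
DEMO_TRANSLATIONS = {
    ("Actor", "de"): "Schauspieler",
    ("Politician", "de"): "Politiker",
}


def demo_translate(text, language):
    # Falls back to the untranslated text when no translation is known.
    return DEMO_TRANSLATIONS.get((text, language), text)


batch = suggest_labels(["Actor", "Politician", "Settlement"], "de", demo_translate)
batch[0].accepted = True            # editor accepts the suggestion as-is
batch[1].label = "Politiker/in"     # editor corrects the suggestion first
batch[1].accepted = True
committed = commit_batch(batch)     # "Settlement" stays pending
```

Keeping acceptance as a flag on each suggestion lets an editor review many labels and commit only the approved ones in a single operation.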

Missing Labels: MissingBot
- Bot framework for editing the Mappings Wiki
- REST service for communicating with the Mappings Wiki and other applications
- Plugins for the Mappings Wiki in order to review added information
- https://github.com/dbpedia/missingbot

MissingBot Label Translations

Missing Types: Why are missing types a problem?
- rdf:type statements are the main way we query a KB
- Without precise type information there is no easy way to ask:
  - Which CDU politicians were born in Berlin?
  - List all capitals in Europe
  - List all actors (Schauspieler) in Berlin
- Without precise type information NER annotations are imprecise
  - You can't filter out or select specific entities
  - Ex: annotate only politicians, or software companies, in a text document
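To make the point about rdf:type concrete, the sketch below filters a toy in-memory triple set by type. The resource names and their typing are made-up illustrations, not data from the German DBpedia; the point is only that a resource with a coarse type is silently dropped from a query for the precise one.

```python
RDF_TYPE = "rdf:type"

# Toy triples; the typing shown here is an illustrative assumption.
TRIPLES = [
    ("dbr:Brad_Pitt", RDF_TYPE, "dbo:Person"),
    ("dbr:Brad_Pitt", RDF_TYPE, "dbo:Actor"),
    ("dbr:Tom_Hanks", RDF_TYPE, "dbo:Person"),   # precise type dbo:Actor missing
    ("dbr:Barack_Obama", RDF_TYPE, "dbo:Person"),
    ("dbr:Barack_Obama", RDF_TYPE, "dbo:Politician"),
]


def instances_of(triples, cls):
    """Return the sorted subjects that carry an rdf:type statement for cls.

    Roughly what the pattern `?s rdf:type <cls>` selects in SPARQL.
    """
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


actors = instances_of(TRIPLES, "dbo:Actor")
persons = instances_of(TRIPLES, "dbo:Person")

if __name__ == "__main__":
    print("actors:", actors)    # dbr:Tom_Hanks is absent: only typed as Person
    print("persons:", persons)
```

The same filter is what an NER pipeline would need in order to annotate only politicians or only actors, so a missing precise type affects both querying and annotation.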

Missing Types: Solution: Linked Hypernyms Dataset
- Cooperation with the Prague University of Economics
- http://ner.vse.cz/datasets/linkedhypernyms/
- Extract type information from hypernyms
- Significant improvement over the instance-types dataset

                  DBpedia Instance Types  LHD 1.0  LHD 2.0
Nr. of resources  910834                  893120   795415
New resources     N/A                     495924   403475
Improvement       N/A                     52.5%    44.3%

LHD Examples from the German DBpedia

DBpedia resource: http://de.dbpedia.org/resource/brad_pitt
  DBpedia types: wikidata:q5, owl:thing, schemaorg:person, wikidata:q215627, dul:agent, dul:naturalperson, dbo:agent, dbo:person
  LHD 1.0 type: dbo:actor
  LHD 2.0 types: dbo:actor

DBpedia resource: http://de.dbpedia.org/resource/tom_hanks
  DBpedia types: same
  LHD 1.0 type: dbo:actor
  LHD 2.0 types: dbo:actor

DBpedia resource: http://de.dbpedia.org/resource/wladimir_wladimirowitsch_putin
  DBpedia types: same
  LHD 1.0 type: dbo:politician
  LHD 2.0 types: dbo:politician

DBpedia resource: http://de.dbpedia.org/resource/barack_obama
  DBpedia types: same
  LHD 1.0 type: dbo:politician
  LHD 2.0 types: dbo:politician

DBpedia resource: http://de.dbpedia.org/resource/berlin
  DBpedia types: schemaorg:place, odp:location, dbo:place, wikidata:q532, opengis:_feature
  LHD 1.0 type: http://dbpedia.org/resource/capital_city
  LHD 2.0 types: dbo:place

DBpedia resource: http://de.dbpedia.org/resource/leipzig
  DBpedia types: same + dbo:populatedplace, dbo:settlement
  LHD 1.0 type: http://de.dbpedia.org/page/großstadt
  LHD 2.0 types: dbo:place

Missing Ontology Editing Capabilities: Why are they a problem?
- No good overview of the ontology
- No efficient way to rename or reorganize classes
- No efficient way to align the ontology with other ontologies

Missing Ontology Editing Capabilities: Web Protégé Integration
How to solve the ontology editing problem?
- Use an advanced collaborative ontology editor
- Solve the compatibility problem by integrating the editor into the existing framework
- Solve authentication and synchronization problems
Architecture: (diagram not reproduced)

Missing Editing Capabilities

Why Administration is a Problem
- Configuring the different DBpedia services is a very complex task
  - DBpedia Static: configuring the abstract extraction, generating datasets and importing them into Virtuoso
  - DBpedia Live: creating a SyncWiki, configuring the live extraction and an endpoint for the streaming updates
  - DBpedia Spotlight: configuring a Hadoop cluster for dataset generation and then configuring a REST service
  - DBpedia Lookup: generating the index for the lookup service
- Debugging: problems are very specific to a configuration; there is no way to inspect specific issues without replicating the environment

Addressing the Administration Problem: Container Virtualisation
- Package the different DBpedia services in Docker containers
- Share containers together with the configuration
- Docker
  - Build once, run everywhere
  - Filesystem-level versioning
  - Small containers
  - Easy deployment
- Docker Hub
  - Share containers
  - Push and pull
  - https://registry.hub.docker.com/repos/alexandru/
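A small sketch of what "pull and run" could look like when scripted: the helper below assembles `docker pull` and `docker run` command lines for a table of services. The service names, image names and ports are placeholders chosen for illustration; they are not the images actually published by the author.

```python
# Hypothetical service table: image names and ports are placeholders.
SERVICES = {
    "dbpedia-spotlight": {"image": "example/dbpedia-spotlight-de", "port": 2222},
    "dbpedia-endpoint": {"image": "example/dbpedia-endpoint-de", "port": 8890},
}


def docker_commands(name, spec):
    """Return the pull and run commands for one service as argument lists."""
    pull = ["docker", "pull", spec["image"]]
    run = [
        "docker", "run",
        "-d",                                   # run detached in the background
        "--name", name,                         # container name
        "-p", f"{spec['port']}:{spec['port']}",  # publish host:container port
        spec["image"],
    ]
    return [pull, run]


if __name__ == "__main__":
    # Print the commands instead of executing them; they could be passed
    # to subprocess.run on a machine where Docker is installed.
    for name, spec in SERVICES.items():
        for command in docker_commands(name, spec):
            print(" ".join(command))
```

Because the configuration travels inside the image, reproducing a reported problem would then come down to pulling the same image rather than rebuilding the environment by hand.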

Addressing the Administration Problem: DBpedia+Docker
Dockerized DBpedia components (from the diagram):
- DBpedia Spotlight
- Static Endpoint
- Live Endpoint
- Static Extraction
- Live Extraction
- DBpedia SyncWiki

Thank You! http://www.corporate-smart-content.de