Data Integration and Fusion using RDF

Mustafa Jarrar: Lecture Notes, Web Data Management (MCOM7348)
University of Birzeit, Palestine, 1st Semester, 2013

Data Integration and Fusion using RDF
Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info

Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2013/11/web-data-management.html

Thanks to Anton Deik for helping me prepare this lecture.

Example from the Government Domain

Consider this simplified example from the government domain, in which three governmental agencies record information about companies. We integrate the three databases by transforming each one into RDF and then concatenating the resulting RDF tables into one table. After that, we examine the concatenated data and link the resources that describe the same entities. Data integration is thus achieved by concatenating RDF graphs and linking resources across them; it also takes place when queries are built and executed over the concatenated dataset.

[Figure: Companies DB in Ministry of Justice; Companies DB in Chamber of Commerce; Companies DB in Ministry of Economy]

Ministry of Justice

The Ministry of Justice records some information about companies, as well as the advocates who represent them.

[Tables: Company, Advocate]

Ministry of Justice: To RDF

[Figure: the Company and Advocate tables mapped to RDF]
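The slide shows this mapping only as an image. As a rough sketch of the general idea (similar in spirit to a row-per-subject direct mapping, and not necessarily the exact mapping used in the lecture), each row can become a subject and each column a predicate. The row values, column names, and the assumption that :YH852 and :H782YU are the Ministry of Justice identifiers are all hypothetical.

```python
# Sketch only: maps relational rows to (subject, predicate, object) triples.
# All table contents and column names are hypothetical; a leading ":" marks a
# URI, following the colon convention used on these slides.

Triple = tuple[str, str, str]


def table_to_triples(rows: list[dict[str, str]], key: str) -> list[Triple]:
    """Map each row to triples: the key column gives the subject URI, and every
    other column becomes a predicate whose object is the cell value."""
    triples: list[Triple] = []
    for row in rows:
        subject = ":" + row[key]
        for column, value in row.items():
            if column != key:
                triples.append((subject, ":" + column, value))
    return triples


# Hypothetical rows standing in for the Ministry of Justice tables.
companies = [{"ID": "YH852", "Name": "Palestine Antiques", "Country": "Palestine"}]
advocates = [{"ID": "H782YU", "Name": "Tony Deik"}]

justice_rdf = table_to_triples(companies, "ID") + table_to_triples(advocates, "ID")
for triple in justice_rdf:
    print(triple)
```

Each printed tuple corresponds to one row of an S P O table; the primary key is used only to build the subject URI, so it does not appear as a separate triple.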

Chamber of Commerce

The Chamber of Commerce records information about companies, as well as about their owners.

[Tables: Company, Owner, Company_Owner]

Chamber of Commerce: To RDF

[Figure: the Chamber of Commerce tables mapped to RDF]
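A link table such as Company_Owner has no attributes of its own, so one plausible mapping (an assumption, since the slide shows the result only as an image) turns each of its rows into a single triple connecting two existing resources. The column names, the :hasOwner predicate, and all identifiers below are hypothetical.

```python
# Sketch only: each row of a many-to-many link table becomes one triple that
# connects two resources. Column names, the ":hasOwner" predicate, and the IDs
# are hypothetical; a leading ":" marks a URI, as on the slides.


def link_table_to_triples(
    rows: list[dict[str, str]], subject_col: str, object_col: str, predicate: str
) -> list[tuple[str, str, str]]:
    """Map each (subject_id, object_id) row to one (subject, predicate, object) triple."""
    return [(":" + row[subject_col], predicate, ":" + row[object_col]) for row in rows]


# Hypothetical Company_Owner rows: one company with two owners.
company_owner = [
    {"CompanyID": "C100", "OwnerID": "O1"},
    {"CompanyID": "C100", "OwnerID": "O2"},
]

owner_rdf = link_table_to_triples(company_owner, "CompanyID", "OwnerID", ":hasOwner")
for triple in owner_rdf:
    print(triple)
```

Mapping the link table this way means the join that SQL would need is replaced by following :hasOwner edges in the graph.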

Ministry of Economy

The Ministry of Economy records information about companies, their owners, and their advocates.

[Tables: Company, Owner, Lawyer]

Ministry of Economy: To RDF

[Figure: the Ministry of Economy tables mapped to RDF]
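When a company row references its owner and lawyer through foreign keys, a reasonable mapping (assumed here, not taken from the slide) turns those values into URIs, so the triple links two resources rather than storing a plain string. The columns, the set of foreign-key columns, and all values are hypothetical.

```python
# Sketch only: a foreign-key column becomes a link to another resource (a URI)
# instead of a plain value. The columns, foreign-key set, and values are
# hypothetical; a leading ":" marks a URI, anything else is a literal.


def row_to_triples(
    row: dict[str, str], key: str, foreign_keys: set[str]
) -> list[tuple[str, str, str]]:
    subject = ":" + row[key]
    triples = []
    for column, value in row.items():
        if column == key:
            continue
        # Foreign-key values point at rows in other tables, so they become URIs.
        obj = ":" + value if column in foreign_keys else value
        triples.append((subject, ":" + column, obj))
    return triples


# Hypothetical Company row from the Ministry of Economy, referencing an owner and a lawyer.
company = {"ID": "C200", "Name": "Palestine Antiques", "Owner": "P7", "Lawyer": "LW1"}

economy_rdf = row_to_triples(company, "ID", {"Owner", "Lawyer"})
for triple in economy_rdf:
    print(triple)
```

Because :LW1 is a URI rather than a string, it can later be linked to the same lawyer in another dataset with owl:sameAs.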

Integration of RDF Data

As simple as appending the subject-predicate-object (S P O) tables of all sources into one S P O table: S P O + S P O + S P O.
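Since an RDF graph is a set of triples, concatenating the S P O tables amounts to taking their union. A minimal sketch, assuming each source is already a list of triples; the triples are hypothetical, and assigning each company URI to a particular source is an assumption for illustration.

```python
# Sketch only: integrating RDF data by concatenation. Each source is a list of
# (subject, predicate, object) triples; the integrated graph is their union.
# All triples are hypothetical.

justice = [(":YH852", ":Name", "Palestine Antiques")]
commerce = [
    (":8327848", ":Name", "Palestine Antiques"),
    (":8327848", ":hasOwner", ":O1"),
]
economy = [(":4354JU", ":Name", "Palestine Antiques")]

# Union of the three S P O tables; a set keeps one copy of any identical triple.
integrated = set(justice) | set(commerce) | set(economy)

print(len(integrated))  # 4
```

No schema alignment is needed for this step; the three company URIs remain distinct until they are linked.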

In our example

[Figure: the RDF data of the three databases concatenated into one table]

Linking resources

How are the same entities described in different datasets linked? By linking their global identifiers, that is, their URIs**. Let's have a look:

:YH852 owl:sameAs :8327848
:YH852 owl:sameAs :4354JU
- Links the company called Palestine Antiques in the three databases.
- This is called entity resolution/disambiguation.

:H782YU owl:sameAs :L85652r
- Links the lawyer called Tony Deik recorded in the Ministry of Justice and the Ministry of National Economy.
- This is called entity resolution/disambiguation.

** Note that in our example we used colons to mark URIs. For example, :JK452, :H782YU, :Country, and :Name are all URIs; :H782YU might actually be something like http://www.palgov.ps//h782yu
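Because owl:sameAs is symmetric and transitive, the links above partition URIs into groups that denote the same entity. A minimal sketch using a union-find structure (a standard technique chosen here for illustration, not something the lecture prescribes), with the sameAs links listed above:

```python
# Sketch only: owl:sameAs links form equivalence classes (sameAs is symmetric
# and transitive). A small union-find groups each URI with all of its aliases.


def same_as_classes(links: list[tuple[str, str]]) -> list[set[str]]:
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Each sameAs link merges the groups of its two URIs.
    for a, b in links:
        parent[find(a)] = find(b)

    # Collect URIs by the root of their group.
    classes: dict[str, set[str]] = {}
    for uri in parent:
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


links = [(":YH852", ":8327848"), (":YH852", ":4354JU"), (":H782YU", ":L85652r")]
for group in same_as_classes(links):
    print(sorted(group))
```

The result has two groups: the three identifiers of Palestine Antiques and the two identifiers of Tony Deik. :8327848 and :4354JU end up together even though no link connects them directly.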

Data Integration and Fusion

Concatenating RDF graphs and linking entities across datasets forms an integrated view in which applications see all datasets as one integrated database.

Source: Christian Bizer
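One simple way to realize this integrated view (an assumption for illustration; query engines can also apply owl:sameAs reasoning at query time instead) is to rewrite every aliased URI to one canonical URI, so a query about a company sees the facts contributed by every source. The triples, predicates, and choice of :YH852 as the canonical URI are hypothetical.

```python
# Sketch only: fuse an integrated graph by rewriting owl:sameAs aliases to one
# canonical URI. Triples, predicates, and the canonical choice are hypothetical.

# Aliases of the company mapped to the chosen canonical URI.
canonical = {":8327848": ":YH852", ":4354JU": ":YH852"}

integrated = {
    (":YH852", ":Name", "Palestine Antiques"),
    (":8327848", ":hasOwner", ":O1"),
    (":4354JU", ":Name", "Palestine Antiques"),
}


def fuse(triples: set[tuple[str, str, str]], canonical: dict[str, str]) -> set[tuple[str, str, str]]:
    """Replace subject and object URIs by their canonical alias; duplicates collapse."""

    def c(term: str) -> str:
        return canonical.get(term, term)

    return {(c(s), p, c(o)) for s, p, o in triples}


fused = fuse(integrated, canonical)

# "Query" the integrated view: every fact about the company, whatever its source.
facts = sorted((p, o) for s, p, o in fused if s == ":YH852")
print(facts)  # [(':Name', 'Palestine Antiques'), (':hasOwner', ':O1')]
```

The two identical :Name facts from different sources collapse into one, while the owner fact, recorded under a different URI, now appears under the same company.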

Practical Session

Practical Session

Description (from previous practical sessions): Central management of students' profiles by the Ministry of Higher Education has become an urgent need in recent years. Many students in Palestine move from one university to another and need to transfer their academic records. The ministry also needs to certify students' diplomas and mark sheets, and there is a need to centrally manage and monitor students' financial aid. Therefore, the Ministry of Higher Education has decided to build a national student registry: each semester, every university must send the academic record of every student to the ministry, which then updates and integrates the records combined from all universities into the national student registry. The ministry wants to use RDF to integrate this data. Thus, each university must map its relational data (or data in any other model) into RDF, and the ministry integrates and fuses this data.

Task: Map the universities' relational data into RDF, then integrate and fuse it.

Practical Session

- Students work in groups of two; each group must include students from different universities (for their first-level degrees).
- Each group uses three mark sheets from different universities to construct three different hypothetical relational schemas of student records.
- Students populate the three databases (one per schema) with sample data.
- Students integrate and fuse all the data using RDF. They are strongly encouraged to reuse the ontologies developed in previous practical sessions when mapping and integrating the RDF data.
- Students write at least three SPARQL queries over the integrated RDF data that involve data from all three sources.
- Students must carry out this practical session using Oracle Semantic Technologies.
- After finishing, each group presents its work to all students to collect comments and feedback.

The final delivery includes:
(i) Snapshots of the three hypothetical databases and schemas, taken from Oracle DB.
(ii) The RDF mapping of each database (SPO tables).
(iii) The integrated final RDF, showing how entities were disambiguated.
(iv) The executed SPARQL queries and their results.

This final delivery should take the form of a report in which the discussion of each step is clear.