KD2R: a Key Discovery method for semantic Reference Reconciliation


KD2R: a Key Discovery method for semantic Reference Reconciliation
Danai Symeonidou, Nathalie Pernelle and Fatiha Saïs, LRI (University Paris-Sud XI)
February 8th, 2013

2 Linked Open Data cloud (LOD)
The LOD contains all the RDF sources on the Web and the links between them.
SameAs is the most important type of link: it combines information given in different data sources.
The number of already existing links is very small.
How to create links automatically?

3 Reference Reconciliation Problem
Dataset1, Dataset2
FirstName: Michael, LastName: Jackson, SSN: 011223456, Job: Singer
FirstName: Michael, LastName: Jackson, SSN: 011223456, Job: Singer
FirstName: Michael, LastName: Jackson, SSN: 444223456, Job: Teacher

4 Reference Reconciliation Problem
Dataset1, Dataset2
FirstName: Michael, LastName: Jackson, SSN: 011223456, Job: Singer
SameAs
FirstName: Michael, LastName: Jackson, SSN: 011223456, Job: Singer
FirstName: Michael, LastName: Jackson, SSN: 444223456, Job: Teacher

5 Reference Reconciliation Problem
Dataset1, Dataset2
FirstName: Michael, LastName: Jackson, SSN: 011223456, Job: Singer
FirstName: Michael, LastName: Jackson, SSN: 444223456, Job: Teacher
SameAs, SameAs
FirstName: Michael, LastName: Jackson, SSN: 011223456, Job: Singer

6 Reference Reconciliation Problem
How do we decide if two identifiers refer to the same real world entity?

SOURCE1
ID  Name                   Located  incountry  TicketPrice
11  Madame Tussauds        London   UK         O
12  Royal Academy of Arts  London   UK         O

SOURCE2
ID  Name                   Located  incountry  TicketPrice
21  Tate Britain           London   England    Free
22  Royal Academy of Arts  London   England    Free


8 Reference Reconciliation Problem
How do we decide if two identifiers refer to the same real world entity? (SOURCE1 and SOURCE2 tables as on slide 6.)
Sim.: 0.5

9 Reference Reconciliation Problem
(SOURCE1 and SOURCE2 tables as on slide 6.)
Sim(12, 22) = 0.5

10 Reference Reconciliation Problem
(SOURCE1 and SOURCE2 tables as on slide 6.)
Name: KEY

11 Reference Reconciliation Problem
(SOURCE1 and SOURCE2 tables as on slide 6.)
Sim. using keys: 1

12 Reference Reconciliation Problem
(SOURCE1 and SOURCE2 tables as on slide 6.)
Using keys: Sim(12, 22) = 1 → SameAs

13 Reference Reconciliation Problem
(SOURCE1 and SOURCE2 tables as on slide 6.)
Using keys: Sim. = 1
Solution → Use keys to reconcile data
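The idea of slides 10 to 13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `link_by_key`, the dictionary layout, and the choice to omit TicketPrice are not from the slides, and exact string equality stands in for whatever similarity measure a real linking tool would use.

```python
# Toy records transcribed from the two sources on the slides
# (TicketPrice omitted; the dictionary layout is an assumption).
SOURCE1 = {
    11: {"name": "Madame Tussauds", "located": "London", "incountry": "UK"},
    12: {"name": "Royal Academy of Arts", "located": "London", "incountry": "UK"},
}
SOURCE2 = {
    21: {"name": "Tate Britain", "located": "London", "incountry": "England"},
    22: {"name": "Royal Academy of Arts", "located": "London", "incountry": "England"},
}


def link_by_key(source1, source2, key):
    """Return (id1, id2) pairs whose records agree on every property of `key`."""
    # Index the second source by its key values.
    index = {}
    for id2, record in source2.items():
        index.setdefault(tuple(record[p] for p in key), []).append(id2)
    # Look up each record of the first source in that index.
    links = []
    for id1, record in source1.items():
        for id2 in index.get(tuple(record[p] for p in key), []):
            links.append((id1, id2))
    return links


print(link_by_key(SOURCE1, SOURCE2, ["name"]))  # [(12, 22)]
```

With `name` declared as a key, only the pair (12, 22) is linked, even though the two records disagree on `incountry`.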

14 Reference Reconciliation with or without key constraints
No knowledge given about the properties: all the properties have the same importance.
Knowledge given by an expert:
Specific expert rules [Arasu et al. 09, Low et al. 01, Volz et al. 09 (Silk)]. Example: max(jaro(phone-number, phone-number), jaro-winkler(ssn, ssn)) > 0.88
Key constraints [Saïs, Pernelle and Rousset 09]. Example: haskey( () (museumname, museumaddress))
Problem: when data sources contain numerous data and/or complex ontologies:
Some keys are not obvious to the expert.
Erroneous keys can be given by the expert.
Aim: automatic discovery of a complete set of keys from RDF data

15 Key discovery methods
Supervised → learn keys using a set of reconciled data.
Unsupervised → no additional information is given.
Property-based → guided by the properties: Suchanek et al. 2011 (only single keys), Atencia et al. 2012 (CWA).
Instance-based → guided by the instances: Symeonidou et al. 2011 (multi keys, OWA).

16 Key definition
RDF data conform to an OWL2 RL ontology.
Key for a class expression: a combination of (inverse) properties which uniquely identifies an entity.
HasKey( CE ( OPE_1 ... OPE_m ) ( DPE_1 ... DPE_n ) )
\[
\begin{aligned}
&\forall x, y, z_1, \dots, z_m, w_1, \dots, w_n: \\
&\text{if } x \in (CE)^C \text{ and } ISNAMED_O(x) \text{ and } y \in (CE)^C \text{ and } ISNAMED_O(y) \text{ and} \\
&\quad (x, z_i) \in (OPE_i)^{OP} \text{ and } (y, z_i) \in (OPE_i)^{OP} \text{ and } ISNAMED_O(z_i) \text{ for each } 1 \le i \le m \text{ and} \\
&\quad (x, w_j) \in (DPE_j)^{DP} \text{ and } (y, w_j) \in (DPE_j)^{DP} \text{ for each } 1 \le j \le n \\
&\text{then } x = y
\end{aligned}
\]
If we consider HasKey(City (Inverse(isInCity)) ()) as a key and we have in the dataset: isInCity(restaurant1, city1), isInCity(restaurant1, city2), isInCity(restaurant2, city2), then we will infer that city1 = city2.
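The restaurant and city example can be reproduced with a short Python sketch. It assumes the facts are given as (restaurant, city) pairs and that the key consists of the single inverse property; the function name `same_as_from_inverse_key` is hypothetical. With a one-property key, two cities are inferred equal as soon as they share one restaurant.

```python
from collections import defaultdict
from itertools import combinations

# isInCity facts from the slide, written as (restaurant, city) pairs.
IS_IN_CITY = [
    ("restaurant1", "city1"),
    ("restaurant1", "city2"),
    ("restaurant2", "city2"),
]


def same_as_from_inverse_key(facts):
    """Infer city equalities when Inverse(isInCity) is a key for City."""
    cities_of = defaultdict(set)
    for restaurant, city in facts:
        cities_of[restaurant].add(city)
    inferred = set()
    # Two cities reached from the same restaurant share a value of the
    # inverse property, so the key forces them to be the same entity.
    for cities in cities_of.values():
        for a, b in combinations(sorted(cities), 2):
            inferred.add((a, b))
    return inferred


print(same_as_from_inverse_key(IS_IN_CITY))  # {('city1', 'city2')}
```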

17 Key Discovery Problem in OWA
A set of RDF data sources: each data source conforms to an OWL 2 ontology.
Multivalued properties may exist.
Open world assumption (incomplete data).

    name     firstname  hasfriend
i1  Atencia  Manuel     i2, i3
i2  Atencia  Madalina
i3  David    Jerôme     i2, i4
i4  Chein    Michel

How to discover keys when we don't know whether:
i1 = i2 = i3 = i4?
hasfriend(i1, i4)? hasfriend(i2, i3)?
firstname(i1, Elodie)?

18 Key Discovery Problem: our assumptions
Unique Name Assumption (UNA): two distinct URIs refer to two different real world entities. In the LOD, we consider the data sources generated from relational databases or those built in such a way that the UNA is fulfilled (Yago): i1 <> i2 <> i3 <> i4.
Two literals that are syntactically different are semantically different (e.g. Napoleon Bonaparte <> Napoleon).
Heuristic 1 - Pessimistic:
Not instantiated property → all the values are possible. Example: hasfriend(i2, i3), hasfriend(i2, i4) are possible.
Instantiated property → only given values are considered. Example: not hasfriend(i1, i4).

19 Key Discovery Problem: our assumptions
A set of property expressions {pe_1, ..., pe_n} is a non key for the class c in a data source s_i if:
Example: {name} and {hasfriend} are non keys.
A set of property expressions {pe_1, ..., pe_n} is a key for the class c in a data source s_i if:
Example: {firstname}, {name, firstname}, {firstname, hasfriend} are keys.
{hasfriend, name} is neither a key nor a non key; it is called an undetermined key.

20 Key Discovery Problem: our assumptions
Heuristic 2 - Optimistic:
Not instantiated property → the value is not one of the already existing ones. Example: not hasfriend(i2, i3), not hasfriend(i2, i1), not hasfriend(i2, i4).
Instantiated property → only given values are considered. Example: not hasfriend(i1, i4).
The same definition for non keys.
A set of property expressions {pe_1, ..., pe_n} is a key for the class c in a data source s_i if: pe_j, Z pe_j(X, Z) W pe_j(Y, W) or
Example: {firstname}, {name, firstname}, {firstname, hasfriend} are keys.
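The two heuristics can be made concrete on the four instances of slide 17. The sketch below is an interpretation, not the authors' formal definitions (which did not survive in this transcript). It assumes that a set is a non key when two instances share a value for every property, that a pair is distinguished when some property has disjoint known values, and that a missing value is "possibly equal" under the pessimistic heuristic and "different" under the optimistic one. The name `classify` and the treatment of two missing values as different in the optimistic case are assumptions.

```python
from itertools import combinations

# Instances from slide 17; an absent property means "not instantiated".
DATA = {
    "i1": {"name": {"Atencia"}, "firstname": {"Manuel"}, "hasfriend": {"i2", "i3"}},
    "i2": {"name": {"Atencia"}, "firstname": {"Madalina"}},
    "i3": {"name": {"David"}, "firstname": {"Jerôme"}, "hasfriend": {"i2", "i4"}},
    "i4": {"name": {"Chein"}, "firstname": {"Michel"}},
}


def classify(props, data, heuristic="pessimistic"):
    """Return 'non key', 'key' or 'undetermined' for a set of properties."""
    undetermined = False
    for x, y in combinations(data, 2):
        shares_all = True         # x and y share a known value on every property
        surely_different = False  # some property certainly distinguishes x and y
        for p in props:
            vx, vy = data[x].get(p), data[y].get(p)
            if vx is None or vy is None:
                shares_all = False
                if heuristic == "optimistic":
                    surely_different = True  # missing value assumed to differ
            elif not (vx & vy):
                shares_all = False
                surely_different = True
        if shares_all:
            return "non key"
        if not surely_different:
            undetermined = True
    return "undetermined" if undetermined else "key"


for props in [{"name"}, {"hasfriend"}, {"firstname"}, {"hasfriend", "name"}]:
    print(sorted(props), classify(props, DATA), classify(props, DATA, "optimistic"))
```

Under these assumptions {hasfriend, name} is undetermined with the pessimistic heuristic, as on slide 19, and becomes a key with the optimistic one.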

21 KD2R approach
Find all minimal keys that are valid w.r.t. the previous definition, in all the considered data sources.
Scalability: do not check all the combinations of properties; partially scan the data.
Find first the set of maximal non keys and undetermined keys (inspired by Gordian [Y. Sismanis et al. 2006]) → derive keys from this set.
Unlike Gordian, KD2R is ontology based (the subsumption relation is exploited to inherit keys) and considers multi-valued properties and incomplete information.

22 KD2R approach
Topological sort of the classes (subsumption).
The keys are obtained by selecting the minimal keys of the Cartesian product (w.r.t. mappings) of the minimal key sets discovered in the sources S1, S2.
Example:
K1 = {{name, firstname}, {hasfriend}}
K2 = {{firstname}}
K1-2 = {{name, firstname}, {hasfriend, firstname}}
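The merge of K1 and K2 can be checked with a small Python sketch. It assumes the property names of both sources are already aligned by the mappings, so the same string denotes the same property; `merge_keys` and `minimal_sets` are hypothetical helper names.

```python
from itertools import product


def minimal_sets(sets):
    """Keep only the sets that have no strict subset in the collection."""
    unique = set(sets)
    return {s for s in unique if not any(other < s for other in unique)}


def merge_keys(*key_sets):
    """Union one key from each source, then keep the minimal results."""
    combined = {frozenset().union(*combo) for combo in product(*key_sets)}
    return minimal_sets(combined)


K1 = {frozenset({"name", "firstname"}), frozenset({"hasfriend"})}
K2 = {frozenset({"firstname"})}

for key in merge_keys(K1, K2):
    print(sorted(key))
```

Each merged key is valid in both sources because it contains a key of S1 and a key of S2.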

23 KD2R approach: Key Finder
The set of maximal non keys and undetermined keys is computed on a prefix-tree (a compact representation of the data of one class).
Key derivation:
Computation of the complement set of each non key and undetermined key.
Computation of the Cartesian product of the complement sets.
Selection of the minimal keys.
Time complexity: quadratic in terms of the number of discovered keys.
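The three derivation steps can be sketched as follows. This is a minimal illustration, assuming the maximal non keys and undetermined keys are already known; the function name `derive_keys` and the two inputs (taken from the friends example of slides 17 to 20) are assumptions.

```python
from itertools import product


def derive_keys(all_props, un_keys):
    """Derive minimal keys from maximal non keys / undetermined keys."""
    # Step 1: complement of each non key or undetermined key.
    complements = [all_props - un_key for un_key in un_keys]
    # Step 2: Cartesian product, picking one property from each complement.
    candidates = {frozenset(choice) for choice in product(*complements)}
    # Step 3: keep only the minimal candidates.
    return {c for c in candidates if not any(other < c for other in candidates)}


PROPS = frozenset({"name", "firstname", "hasfriend"})

# Pessimistic case: {name, hasfriend} is the only maximal UNKey.
print(derive_keys(PROPS, [frozenset({"name", "hasfriend"})]))

# Only the two non keys {name} and {hasfriend} are given.
print(derive_keys(PROPS, [frozenset({"name"}), frozenset({"hasfriend"})]))
```

A candidate is a key because it contains at least one property outside every non key, so it cannot be a subset of any of them.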

Pessimistic: Prefix-tree Creation - Step 1

    incountry  located  contains      museumname       museumaddress
1   Greece     City1    - - -         Archaeological   44 Patission Street
2   France     - - -    S1_p4, S1_p5                   19 rue Beaubourg
3   France     City3    - - -         Musee d'Orsay    62, rue de Lille
4   England    City4    - - -         Madame Tussauds  Marylebone Road

Prefix tree (one level per attribute):
incountry: Greece {M1}, France {M2, M3}, England {M4}
located: City1 {M1}, Null, City3 {M3}, City4 {M4}
contains: Null {M1}, P4, P5, Null {M3}, Null {M4}
Name: Archaeological {M1}, Musee d'Orsay {M3}, Madame Tussauds {M4}
Address: 44 Patission Street {M1}, rue Beaubourg, rue Beaubourg, rue de Lille {M3}, Marylebone Road {M4}

Each level represents an attribute of a class.
Each node describes instances that share the same father-cell value.
Each cell contains a value and a list of identifiers (URI List).
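Step 1 can be sketched as a nested dictionary, one level per attribute. This is an illustrative layout, not the authors' data structure: each node maps a value to a cell holding a URI set and a child node, a missing value is stored under `None`, a multi-valued property creates one cell per value, and the address level is left out to keep the example short.

```python
# Museum instances from the slide; absent properties are not instantiated.
MUSEUMS = {
    "M1": {"incountry": ["Greece"], "located": ["City1"],
           "museumname": ["Archaeological"]},
    "M2": {"incountry": ["France"], "contains": ["P4", "P5"]},
    "M3": {"incountry": ["France"], "located": ["City3"],
           "museumname": ["Musee d'Orsay"]},
    "M4": {"incountry": ["England"], "located": ["City4"],
           "museumname": ["Madame Tussauds"]},
}
ATTRS = ["incountry", "located", "contains", "museumname"]


def insert(node, attrs, instance, uri):
    """Insert one instance below `node`, one tree level per attribute."""
    if not attrs:
        return
    head, rest = attrs[0], attrs[1:]
    # A missing property yields a single Null (None) cell.
    for value in instance.get(head) or [None]:
        cell = node.setdefault(value, {"uris": set(), "child": {}})
        cell["uris"].add(uri)
        insert(cell["child"], rest, instance, uri)


def build_prefix_tree(instances, attrs):
    root = {}
    for uri, instance in instances.items():
        insert(root, attrs, instance, uri)
    return root


tree = build_prefix_tree(MUSEUMS, ATTRS)
print(sorted(tree["France"]["uris"]))  # ['M2', 'M3']
```

Step 2 of the slides (merging Null cells with their siblings under the pessimistic heuristic) is not covered by this sketch.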


Pessimistic: Prefix-tree Creation - Step 2
Merging the cells of a node. Merging nodes.

Prefix tree after Step 1:
incountry: Greece {M1}, France {M2, M3}, England {M4}
located: City1 {M1}, Null, City3 {M3}, City4 {M4}
contains: Null {M1}, P4, P5, Null {M3}, Null {M4}
Name: Archaeological {M1}, Musee d'Orsay {M3}, Madame Tussauds {M4}
Address: 44 Patission Street {M1}, rue Beaubourg, rue Beaubourg, rue de Lille {M3}, Marylebone Road {M4}

Final Prefix Tree:
incountry: Greece {M1}, France {M2, M3}, England {M4}
located: City1 {M1}, City3 {M2, M3}, City4 {M4}
contains: Null {M1}, P4 {M2, M3}, P5 {M2, M3}, Null {M4}
Name: Archaeological {M1}, Musee d'Orsay {M3}, Musee d'Orsay {M3}, Madame Tussauds {M4}
Address: 44 Patission Street {M1}, rue Beaubourg, rue de Lille {M3}, rue Beaubourg, rue de Lille {M3}, Marylebone Road {M4}

UNKeyFinder
Pipeline: RDF data (Wax(S1_m2), museumname(s1_m2, Wax), ...) → Prefix tree creation → UNKeyFinder → Maximal undetermined keys and non keys
Input: one dataset, one class, a set of known keys.
Output: set of maximal non keys and undetermined keys.
Examination of each possible subset of attributes.
Recursive method. The traversal is top down and left first.
→ When URI List > 1: two or more instances share the same value for a specific subset of attributes, so the subset of attributes belongs to a UNKey.
Different prunings:
Key monotonicity: detection of paths describing one entity; use existing inherited keys to avoid exploring sub-trees in the prefix-tree.
Non key anti-monotonicity: use the already computed non keys to avoid exploring sub-trees in the prefix-tree.
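For comparison, the same output can be computed by brute force on a tiny dataset. The sketch below is not the prefix-tree traversal of UNKeyFinder: it enumerates attribute subsets from largest to smallest and uses only the anti-monotonicity pruning (every subset of a non key is a non key). The function names and the `missing_matches` flag, which treats a missing value as a possible match in the pessimistic sense, are assumptions.

```python
from itertools import combinations

# Instances from slide 17; an absent property means "not instantiated".
DATA = {
    "i1": {"name": {"Atencia"}, "firstname": {"Manuel"}, "hasfriend": {"i2", "i3"}},
    "i2": {"name": {"Atencia"}, "firstname": {"Madalina"}},
    "i3": {"name": {"David"}, "firstname": {"Jerôme"}, "hasfriend": {"i2", "i4"}},
    "i4": {"name": {"Chein"}, "firstname": {"Michel"}},
}


def may_share(x, y, prop, missing_matches):
    vx, vy = x.get(prop), y.get(prop)
    if vx is None or vy is None:
        return missing_matches
    return bool(vx & vy)


def maximal_un_keys(data, props, missing_matches=False):
    """Brute-force search for maximal non keys (and undetermined keys)."""
    found = []
    for size in range(len(props), 0, -1):
        for subset in combinations(sorted(props), size):
            candidate = frozenset(subset)
            # Pruning: a subset of an already found set cannot be maximal.
            if any(candidate <= known for known in found):
                continue
            if any(all(may_share(data[a], data[b], p, missing_matches)
                       for p in candidate)
                   for a, b in combinations(data, 2)):
                found.append(candidate)
    return set(found)


PROPS = {"name", "firstname", "hasfriend"}
print(maximal_un_keys(DATA, PROPS))                        # non keys only
print(maximal_un_keys(DATA, PROPS, missing_matches=True))  # with undetermined keys
```

On this data the first call yields {name} and {hasfriend}, and the second yields the single maximal set {name, hasfriend}.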

UNKeyFinder Example
The walk-through runs on the Final Prefix Tree of Step 2, with the attribute order incountry, located, contains, Name, Address. The slides show the following steps in sequence:

We call the UNKeyFinder for the highlighted node. Since the URI List is 1 we stop. Pruning step (key monotonicity).

We call the UNKeyFinder for the highlighted node.

We call the UNKeyFinder for the node. In the next step we follow the left child of the highlighted node.

We call the UNKeyFinder for the highlighted node. Cell with URI List = 1: pruning step (1). Cell Musee d'Orsay with URI List = 1: pruning step (1). Now we have to merge the children of the node and call UNKeyFinder for the merged node.

We call the UNKeyFinder for the highlighted node. Name level after merging: Archaeological {M1}, Musee d'Orsay {M3}, Madame Tussauds {M4}.

Since there is a cell with URI List > 1, the curunkey is a UNKey.
Attributes: incountry, located, contains, Name, Address
UNKey: incountry, located, contains

40 Experiments: OAEI 10 datasets

Datasets             RDF files        #instances
Restaurants Dataset  Restaurant1.rdf  339
                     Restaurant2.rdf  1390
Person Dataset       Person11.rdf     1000
                     Person12.rdf     1000
                     Person21.rdf     1200

Experiments executed to compare: KD2R keys, Expert keys.

Datasets               Classes     Property set
Restaurants (2 files)  Restaurant  name, phonenumber, hascategory, hasaddress
                       Address     street, city, Inverse(hasAddress)
Person (3 files)       Person      givenname, state, surname, dateofbirth, socsecurityid, phonenumber, age, hasaddress
                       Address     street, housenumber, postcode, isinsuburb

41 Person Dataset
The Person dataset consists of 2000 instances of the classes Person and Address.

42 Restaurant Dataset
The Restaurant dataset describes 1729 instances (classes Restaurant and Address).

43 ChefMoz Dataset
32586 instances (class Restaurant). 1575 instances of the class Restaurant.

44 DBpedia Dataset
DBpedia Person → 6 discovered keys (763644 instances, 5639680 RDF triples)
Natural Places → 21 discovered keys (49887 instances, 1604347 RDF triples)
Subclasses of Natural Places:
Lake → 6 discovered keys
BodyOfWater → 17 discovered keys

45 Conclusion
An approach that discovers composite keys in RDF datasets:
different ontologies (aligned);
Unique Name Assumption.
Experiments:
Discovered keys improve the data linking.
KD2R is scalable thanks to the pruning techniques. Ex. DBpedia Natural Places: 5% of the data explored.

46 Future work
DAVI approach.
Keys with N exceptions: a key for which N instances violate the definition of the key.
Conditional keys.

QUESTIONS??? 47

THANK YOU!!! 48