Data Integration and Data Provenance. Profa. Dra. Cristina Dutra de Aguiar Ciferri
2 Outline
- Data Integration
  - Schema Integration
  - Instance Integration
  - Our Work
- Data Provenance
  - Basic Concepts
  - The PrInt Model
3 Outline
- Index Structures
  - Biological databases
  - Similarity search of complex data
  - Spatial data warehouses
  - Similarity search over data warehouses of images
- Mining of Medical Data
4 Data Integration
- Schema Level Integration
  - specification of mappings that describe the semantic relationships among schemas from heterogeneous data sources
- Instance Level Integration
  - identification of which entities from heterogeneous sources refer to the same real-world entity
  - resolution of value conflicts
5 Schema Level Integration
- Semantic Relativism
  - conflicts between two or more representations arise because different users model the same piece of the real world in different ways, according to their perceptions
  - Types of Conflict Identification
    - name, including homonyms and synonyms
    - semantic
    - structural
6 Types of Schema
- Global Schema (or Mediated Schema)
  - integration of several heterogeneous local schemas into a homogeneous global schema
  - development of mappings that describe the semantic relationships between the mediated schema and the schemas of the sources
- Federated Schema
  - there is no homogeneous global schema
  - there are several heterogeneous local schemas, each one related to a given source
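As a rough illustration of the mediated-schema idea, the sketch below stores one attribute mapping per source and uses it to translate local records into the global vocabulary. The source names (source_a, source_b), the attribute names, and the helper functions are hypothetical and do not come from the slides; real mappings are usually far richer than attribute renaming.

```python
# Hypothetical sources and attribute names, for illustration only.
# Each mapping relates an attribute of the mediated (global) schema to the
# corresponding attribute of one local source schema.
MAPPINGS = {
    "source_a": {"title": "paper_title", "year": "pub_year"},
    "source_b": {"title": "name", "year": "yr"},
}


def to_mediated(source, record):
    """Rewrite one local record into the mediated schema's vocabulary."""
    mapping = MAPPINGS[source]
    return {global_attr: record.get(local_attr)
            for global_attr, local_attr in mapping.items()}


def query_all(sources):
    """Collect every source's records, expressed in the mediated schema."""
    return [to_mediated(name, rec)
            for name, records in sources.items()
            for rec in records]
```

In a federated setting there would be no single MAPPINGS table targeting one global schema; each source would keep its own schema and be queried on its own terms.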
7 Example of Structural Mappings
[Figure: diagrams of structural mappings between classes. Legend: cor = corresponds to; a = association; atrib = attribute; C, W, X, Y, Z = classes]
SPACCAPIETRA, S., PARENT, C. View Integration: A Step Forward in Solving Structural Conflicts. IEEE Transactions on Knowledge and Data Engineering, v.6, n.2, p ,
8 Instance Level Integration
- Reference Reconciliation (or Entity Resolution)
  - automatically detect references to the same real-world entity and group them in a cluster of similar entities
- Value Conflict Resolution
  - resolve the differences among attribute values of the entities that refer to the same real-world entity
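A minimal sketch of value conflict resolution, assuming the references have already been grouped into one cluster. The merge policy used here (ignore missing values, take the most frequent value, break ties by the longest string) is an assumption chosen for illustration and is not taken from the slides. The sample values are those of the two conference entities used in the reference reconciliation example.

```python
from collections import Counter


def resolve(cluster):
    """Merge records describing one real-world entity into a single record.

    Assumed policy (not from the slides): per attribute, ignore missing
    values, take the most frequent value, and break ties by preferring the
    longest string form.
    """
    merged = {}
    attributes = {attr for record in cluster for attr in record}
    for attr in sorted(attributes):
        values = [r[attr] for r in cluster if r.get(attr) is not None]
        if not values:
            merged[attr] = None
            continue
        counts = Counter(values)
        merged[attr] = max(counts, key=lambda v: (counts[v], len(str(v))))
    return merged


c1 = {"name": "ACM Conference on Management of Data",
      "year": "1978", "local": "Austin, Texas"}
c2 = {"name": "ACM SIGMOD", "year": "1978", "local": None}
integrated = resolve([c1, c2])
```

Other policies (trusting one source over another, keeping the most recent value, or asking the user) fit the same interface; only the selection rule inside the loop changes.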
9 Reference Reconciliation
Examples of entities from the class article (a)
  a1 = ({"Distributed query processing in a relational database system"}, { }, {p1; p2; p3}, {c1})
  a2 = ({"Distributed query processing in a relational database system"}, { }, {p4; p5; p6}, {c2})
Examples of entities from the class person (p)
  p1 = ({"Robert S. Epstein"}, null, {p2, p3}, null)
  p2 = ({"Michael Stonebraker"}, null, {p1, p3}, null)
  p3 = ({"Eugene Wong"}, null, {p1, p2}, null)
  p4 = ({"Epstein, R.S."}, null, {p5, p6}, null)
  p5 = ({"Stonebraker, M."}, null, {p4, p6}, null)
  p6 = ({"Wong, E."}, null, {p4, p5}, null)
  p7 = ({"Eugene Wong"}, {"[email protected]"}, null, {p8})
  p8 = (null, {"[email protected]"}, null, {p7})
  p9 = ({"mike"}, {"[email protected]"}, null, null)
Examples of entities from the class conference (c)
  c1 = ({"ACM Conference on Management of Data"}, {"1978"}, {"Austin, Texas"})
  c2 = ({"ACM SIGMOD"}, {"1978"}, null)
Schema
  article: title, pages, *authors, *conference
  person: name, , *authors, *contact
  conference: name, year, local
10 Reference Reconciliation
Grouping from entities of the class article (a)
  grouping 1 = {a1, a2}
Grouping from entities of the class person (p)
  grouping 2 = {p1, p4}
  grouping 3 = {p2, p5, p8, p9}
  grouping 4 = {p3, p6, p7}
Grouping from entities of the class conference (c)
  grouping 5 = {c1, c2}
DONG, X.; HALEVY, A.; MADHAVAN, J. Reference Reconciliation in Complex Information Spaces. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), p.85-96,
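The person groupings can be partly reproduced with a very simple name-based heuristic, sketched below. This is not the algorithm of the cited paper: it only reduces each name to a (surname, first initial) key and groups references that share a key. The functions name_key and group_references are hypothetical helpers. Because it looks at names alone, it cannot place p8 (no name) or p9 ("mike") into the Stonebraker grouping; that step needs email and association evidence, which this sketch ignores.

```python
from collections import defaultdict


def name_key(name):
    """Reduce a person name to (surname, first initial), lowercased.

    Handles the two formats in the example: "First [Middle] Last" and
    "Last, F.M.".
    """
    if "," in name:
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        given, surname = parts[0], parts[-1]
    return surname.lower(), given[0].lower()


def group_references(people):
    """Group reference ids whose names reduce to the same key."""
    groups = defaultdict(list)
    for ref_id, name in people.items():
        groups[name_key(name)].append(ref_id)
    return sorted(groups.values())


people = {
    "p1": "Robert S. Epstein",
    "p2": "Michael Stonebraker",
    "p3": "Eugene Wong",
    "p4": "Epstein, R.S.",
    "p5": "Stonebraker, M.",
    "p6": "Wong, E.",
    "p7": "Eugene Wong",
}
groupings = group_references(people)
```

A key this coarse will also merge distinct people who share a surname and first initial, which is one reason practical reconcilers combine several kinds of evidence before grouping.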
11 Our Work
- The Academic Data Reconciler Tool
  - semi-automates the identification of
    - corresponding objects
    - inconsistencies
  - helps the user to
    - eliminate inconsistencies
    - complete data
    - exchange data
- The Academic Data Reconciler Tool for Reference Reconciliation
12 Academic Data Reconciler Tool (ADR)
- Functional aspects
  - visualization of objects from two documents
    - side-by-side visualization
  - synchronization of objects
  - editing of incomplete and erroneous data
  - data exchange between objects from different documents
13 Interface
14 Visualization
15 Synchronization
16 Editing
17 Data Exchange
18 ADR for Reference Reconciliation
[Figure labels: integrated entity; similar entities that belong to the same grouping]
