Data Integration and Data Cleaning in DWH


1 Spring Semester 2010 Data Integration and Data Cleaning in DWH Dr. Diego Milano

2 Organization
  Motivation: Data Integration and DWH
  Data Integration
    Schema (intensional) Level
    Instance (extensional) Level: Data Cleaning
  Building a DWH: Data Integration & Cleaning in DWH Design
  (Introduction to Data Quality)

3 Background Knowledge & Tools
If you don't master some of these tools, let me know immediately:
  Database basics:
    RDBMS concepts
    Relational model
    Entity Relationship Model (and possibly UML)
  Database design: from a conceptual model to the logical model

4 What is a DW?
A collection of data from different sources:
  Integrated
  Persistent
  Dynamically evolving
  Focused
  Used for decision support

5 DWH
[Diagram: operational data (from production/sales OLTP environments) and external data (e.g. exchange rates, prices from other sales chains, etc.) flow into the DWH, which feeds OLAP, data mining, and reporting. Annotation on the diagram: "We focus on what happens here".]

6 Data Integration
Given a set of data sources, data integration is the task of presenting them to the user as a single data source.
[Diagram: sources S1, S2, S3, ... with their local schemas feed an integrated DB with global schema G.]

7 Two approaches: virtual/materialized
Virtual integration: data stays at the sources, and the extension of the global schema is not materialized. Queries on the global schema are answered using data at the sources.
Pros/cons:
  + Updates on the local sources are immediately reflected in the (virtual) integrated DB
  + No redundancy, and no conflicts due to lack of synchronization
  - Enforcing constraints on the global schema is not always possible
  - Depending on the relationships between the global and the local schemas, answering queries may be hard (and inefficient)
  - Propagating updates from the global schema to the local sources is hard
  - Solving inconsistencies at the extensional level is hard

8 Two approaches: virtual/materialized
Materialized integration: data is copied to a single integrated database.
Pros/cons:
  + Queries on the integrated repository are more efficient
  + Possible/easier to apply complex transformations to the original data:
      the integrated schema can be very different from the source schemas
      instance-level transformations are made easier
  - The integrated DB goes out of sync with the sources and needs periodic refreshing
  - Less storage-efficient, with potential inconsistencies due to redundancy
A Data Warehouse is first of all a data integration system adopting the materialized approach.
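The trade-off between the two approaches can be made concrete with a small sketch. It assumes two hypothetical sources (`s1_customer`, `s2_customer`) living in one in-memory SQLite database, a view standing in for the virtual global relation, and a copied table standing in for the materialized one. None of these names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical local sources with the same structure.
    CREATE TABLE s1_customer(name TEXT, city TEXT);
    CREATE TABLE s2_customer(name TEXT, city TEXT);
    INSERT INTO s1_customer VALUES ('Anna', 'Basel');
    INSERT INTO s2_customer VALUES ('Marco', 'Rome');

    -- Virtual integration: the global relation is only a view definition.
    CREATE VIEW g_customer_virtual AS
        SELECT name, city FROM s1_customer
        UNION ALL
        SELECT name, city FROM s2_customer;

    -- Materialized integration: the global relation is a physical copy.
    CREATE TABLE g_customer_materialized AS
        SELECT name, city FROM g_customer_virtual;
""")

# A later update at one of the sources.
conn.execute("INSERT INTO s1_customer VALUES ('Lena', 'Zurich')")


def count(relation):
    """Number of rows currently visible in the given relation."""
    return conn.execute(f"SELECT COUNT(*) FROM {relation}").fetchone()[0]


virtual_rows = count("g_customer_virtual")            # sees the new row
materialized_rows = count("g_customer_materialized")  # stale until refreshed
```

After the source update the view returns three rows while the copy still holds two, which is the "out of sync, needs periodic refreshing" drawback listed for the materialized approach.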

9 Heterogeneity
The main issue in data integration tasks is heterogeneity. Data residing at different sources differ in a number of aspects, and these differences make it harder to reduce the data to a single, integrated view. It is not easy to classify heterogeneity in a crisp way. Some differences relate to syntactic aspects (the specific language/technology used to represent reality), others relate to semantic aspects (how a certain representation captures reality, its meaning), but these differences coexist and it is not always easy or possible to draw a line between what is syntax and what is semantics.

10 Heterogeneity (Systems/Technology/Syntax)
  Legacy systems (ad-hoc interfaces)
  Flat files
  Web sources
  XML files/databases
  Different DBMSs (e.g. RDBMS, OODBMS...)
  DBMSs of the same flavour (e.g. RDBMS) but with differences in proprietary syntax

11 Heterogeneity (Data Representation)
Intensional level (schema):
  Data model (modeling language): relational, object-oriented, network, semi-structured, etc.
  Structure (representation choices): different designers have different views of the world (and different application needs), and may use different constructs/data types to represent the same concepts/reality:
    e.g. Date represented as an attribute or as a standalone concept
    e.g. attribute 'sex' encoded as String / Acronym / Integer (0,1)
  Different views of the world include/exclude portions of information:
    e.g. recording the marital status of employees
  Linguistics/terminology: different designers may use different terms to denote the same concept, or use the same term to mean different concepts, at various levels:
    e.g. attribute 'price': $
Data Warehousing (CS242)
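As an illustration of the 'sex' example, the sketch below maps several source encodings onto one canonical code. The lookup table is an assumption: the slide does not say which integer stands for which value, so the choice 0 = male, 1 = female is purely hypothetical and would have to be confirmed per source.

```python
# Hypothetical lookup table: each source encoding maps to a canonical code.
# The assignment 0 -> 'M' and 1 -> 'F' is an assumption, not given by the slide.
CANONICAL = {
    "m": "M", "male": "M", "0": "M",
    "f": "F", "female": "F", "1": "F",
}


def normalize_sex(value):
    """Return 'M' or 'F' for a known encoding, or None if it is unknown."""
    if value is None:
        return None
    key = str(value).strip().lower()
    return CANONICAL.get(key)


examples = [normalize_sex(v) for v in ("Male", "F", 0, 1, "unknown")]
```

Unknown encodings are returned as `None` rather than guessed, so they can be routed to a data cleaning step.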

12 Heterogeneity (Data Representation)
Extensional level (instances):
  Unmappable or partially mappable domains:
    Non-overlapping domains: e.g. all students in Basel, only students enrolled after
    Domains with different granularity: e.g. sales per day/per month
    Application-specific domains: e.g. custom identifiers (like employee_code, color_code) meaningful only within a certain application domain
  Inconsistencies between semantically equivalent instances, due to errors or other Data Quality problems
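The granularity case (sales per day versus sales per month) can be sketched as follows. The figures are made up for illustration. The point is that the finer domain can be rolled up to the coarser one, whereas monthly totals cannot be split back into days without extra information.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales from one source: (day, amount).
daily_sales = [
    (date(2010, 3, 1), 120.0),
    (date(2010, 3, 15), 80.0),
    (date(2010, 4, 2), 50.0),
]


def to_monthly(rows):
    """Aggregate (day, amount) pairs into totals keyed by (year, month)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


monthly = to_monthly(daily_sales)
```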

13 Solving heterogeneity issues
  Systems/Model level: wrapper-based architectures
  Intensional level: schema integration
  Extensional level: instance identification, instance reconciliation

14 Wrapper-based Architectures
A wrapper is a piece of software that encapsulates another software system and acts as an interpreter for it. It makes it possible to:
  Hide technological differences
  Hide (to a certain extent) model differences, presenting all sources in a single canonical language
[Diagram: three wrappers, placed over a legacy system, an RDBMS, and XML data, expose their sources in the canonical model/language.]
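A minimal sketch of the wrapper idea, under the assumption that the canonical model is simply "a list of rows, each a dictionary". The class names `CsvWrapper` and `XmlWrapper` and the sample data are hypothetical. Only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET


class CsvWrapper:
    """Wraps a flat file (CSV text) and exposes it as a list of dict rows."""

    def __init__(self, text):
        self._text = text

    def rows(self):
        return [dict(r) for r in csv.DictReader(io.StringIO(self._text))]


class XmlWrapper:
    """Wraps an XML document whose root contains one element per record."""

    def __init__(self, text):
        self._root = ET.fromstring(text)

    def rows(self):
        return [{field.tag: field.text for field in record} for record in self._root]


flat_file = "code,name\n1,Pen\n2,Ink\n"
xml_doc = "<products><product><code>3</code><name>Paper</name></product></products>"

# Client code sees the same interface regardless of the source technology.
sources = [CsvWrapper(flat_file), XmlWrapper(xml_doc)]
all_rows = [row for source in sources for row in source.rows()]
```

The wrappers hide the technology, but not differences in naming or meaning between sources. Those are left to schema integration.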

15 Schema Integration
Given n data source schemas L1,...,Ln, integrating them means:
  Identifying correspondences among them
  Designing a new, integrated schema G that abstracts over all of them and is possibly tailored to some specific application (e.g. for Data Warehousing)
  Formally specifying mappings between the integrated schema and the source schemas
There are tools that semi-automatically perform some of the activities in schema integration, but these are mostly research-level prototypes. Schema integration is still a (complex) design task for humans. It requires expertise in database modeling and a deep knowledge of the application domains of the schemas to integrate.

16 Wrapper-Mediator
A mediator interacts with the wrappers and presents to the users a unified global view over the local schemas.
[Diagram: a mediator holding the mapping sits on top of three wrappers, placed over a legacy system, an RDBMS, and XML data.]
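The mediator role can be sketched as below. The assumptions are that each wrapper is a function returning dictionary rows, and that the mapping is a plain rename from local attribute names to global ones. The `Mediator` class, the wrapper functions, and the attribute names are all hypothetical.

```python
class Mediator:
    """Answers queries on the global view by consulting every wrapper."""

    def __init__(self):
        self._sources = []  # list of (fetch_rows, local-to-global mapping)

    def register(self, fetch_rows, mapping):
        self._sources.append((fetch_rows, mapping))

    def query(self, predicate=lambda row: True):
        result = []
        for fetch_rows, mapping in self._sources:
            for local_row in fetch_rows():
                # Rewrite the local row into the vocabulary of the global schema.
                global_row = {g: local_row[l] for l, g in mapping.items()}
                if predicate(global_row):
                    result.append(global_row)
        return result


def legacy_wrapper():
    return [{"EMP_NM": "Rossi", "DEPT": "Sales"}]


def rdbms_wrapper():
    return [{"worker": "Meier", "department": "IT"}]


mediator = Mediator()
mediator.register(legacy_wrapper, {"EMP_NM": "name", "DEPT": "department"})
mediator.register(rdbms_wrapper, {"worker": "name", "department": "department"})

everyone = mediator.query()
it_staff = mediator.query(lambda row: row["department"] == "IT")
```

This sketch fetches all rows and filters afterwards. A real mediator would rather rewrite the query and push it down to the sources.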

17 Schema Integration Steps
  1. Analysis, Normalization, Abstraction to a common conceptual modeling language
  2. Choice of integration strategy
  3. Schema Matching: identify relationships among local schemas
  4. Schema Alignment: solve conflicts
  5. Schema Fusion: create the global schema
The result of this process is a mapping between the source schemas and the integrated schema.

18 1. Analysis
For each data source in isolation, the designer must acquire a deep understanding of the application domain:
  In-depth analysis of the schema(s)
  Interaction with domain experts
The result of this phase is a conceptual schema in the canonical language of choice, which:
  Reflects the domain of interest as accurately and completely as possible
  Is well understood
  Is well documented

19 Analysis: Know Your Enemy
Gathering knowledge about complex application domains is difficult:
  Business rules are kept secret or not well documented
  (Cooperative) domain experts are key elements
Understanding the IS of an enterprise is difficult:
  Legacy systems require ad-hoc knowledge (e.g. no database schema, but data in flat files with a custom format)
  Even if the DB is relational:
    Software/system documentation is often poor. The domain conceptualization steps that led to a certain database design, and many design choices, may be lost.
    Reverse-engineering of the logical schemas and associated applications is sometimes required. This might involve:
      » Normalization: for efficiency reasons, or because of bad design, logical schemas are sometimes denormalized
      » Inferring constraints: not all constraints of the domain are enforced at the level of the logical schema (e.g. not enforced at all, or enforced at the application level)
  Systems are not always well designed, and schemas become old. Sometimes corrections to the schema are required.

20 Analysis, Normalization, Abstraction
Original logical schema:
  CREATE TABLE product(
    cat_desc VARCHAR(255),
    cat_name VARCHAR(255),
    cat_code INTEGER,
    prod_desc VARCHAR(255),
    prod_name VARCHAR(255),
    prod_code INTEGER PRIMARY KEY
  );
[ER diagram: a single entity Product with attributes cat_desc, cat_name, cat_code, prod_desc, prod_name, prod_code.]
After normalization/correction:
  CREATE TABLE category(
    cat_desc VARCHAR(255),
    cat_name VARCHAR(255),
    cat_code INTEGER PRIMARY KEY
  );
  CREATE TABLE product(
    prod_desc VARCHAR(255),
    prod_name VARCHAR(255),
    prod_code INTEGER PRIMARY KEY,
    cat_code INTEGER REFERENCES category(cat_code)
  );
Normalization/correction: the original logical schema is unnormalized AND does not enforce all constraints holding in the application domain.
[ER diagram: Product (1,1) belongs_to (0,n) Category. One entity has attributes Description/String, Name/String, Code/Integer; the other has Description/String, Name/String, Code/String.]
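To see the normalization step at the instance level, the sketch below loads a few made-up rows into the unnormalized table (renamed `product_flat` here, an assumption needed so that both versions can coexist in one database) and migrates them into the `category`/`product` pair. It uses SQLite through Python's `sqlite3` module, where foreign key enforcement has to be switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- Unnormalized source table (hypothetical name and sample rows).
    CREATE TABLE product_flat(
        cat_desc VARCHAR(255), cat_name VARCHAR(255), cat_code INTEGER,
        prod_desc VARCHAR(255), prod_name VARCHAR(255),
        prod_code INTEGER PRIMARY KEY
    );
    INSERT INTO product_flat VALUES
        ('Writing tools', 'Pens', 10, 'Blue ballpoint', 'Pen A', 1),
        ('Writing tools', 'Pens', 10, 'Black ballpoint', 'Pen B', 2),
        ('Sheets', 'Paper', 20, 'A4 ream', 'Paper A4', 3);

    -- Normalized target schema.
    CREATE TABLE category(
        cat_desc VARCHAR(255), cat_name VARCHAR(255),
        cat_code INTEGER PRIMARY KEY
    );
    CREATE TABLE product(
        prod_desc VARCHAR(255), prod_name VARCHAR(255),
        prod_code INTEGER PRIMARY KEY,
        cat_code INTEGER REFERENCES category(cat_code)
    );

    -- Each category is stored once; products keep only the reference.
    INSERT INTO category
        SELECT DISTINCT cat_desc, cat_name, cat_code FROM product_flat;
    INSERT INTO product
        SELECT prod_desc, prod_name, prod_code, cat_code FROM product_flat;
""")

category_count = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
product_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]

# The corrected schema now rejects a product pointing at a missing category.
try:
    conn.execute("INSERT INTO product VALUES ('x', 'Orphan', 99, 999)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

If the flat table held two different descriptions for the same `cat_code`, the `INSERT INTO category` would fail on the primary key. That is exactly the kind of inconsistency the unnormalized schema allowed.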

21 2. Choice of Integration Strategy
Comparing too many schemas at the same time is not always easy or feasible.
Integration process:
  binary: ladder, balanced
  n-ary: single step, iterative
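The two binary strategies differ only in the order in which pairs of schemas are integrated. The sketch below makes that order visible. `integrate_pair` is a hypothetical stand-in that just records which schemas were combined, and a non-empty list of schemas is assumed.

```python
from functools import reduce


def integrate_pair(a, b):
    """Stand-in for integrating two schemas: records the pairing as text."""
    return f"({a}+{b})"


def ladder(schemas):
    """Integrate one new schema at a time into the running result."""
    return reduce(integrate_pair, schemas)


def balanced(schemas):
    """Integrate schemas pairwise, then pair the intermediate results."""
    level = list(schemas)
    while len(level) > 1:
        paired = [integrate_pair(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd schema out moves up unchanged
            paired.append(level[-1])
        level = paired
    return level[0]


names = ["S1", "S2", "S3", "S4"]
ladder_order = ladder(names)
balanced_order = balanced(names)
```

With four schemas the ladder strategy yields `(((S1+S2)+S3)+S4)` and the balanced strategy yields `((S1+S2)+(S3+S4))`.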

22 3. Schema Matching
Schemas are comparatively analyzed to identify:
  Common concepts and relationships among them
  Differences and structural/semantic conflicts
  Inter-schema properties

23 Structural Conflicts on Concepts
Book is a common concept. Publisher and its relationship to Book have a structural conflict: the designers used different language constructs to model the same reality, an entity set plus a relationship in one schema, attributes in the other.
[Diagram: one schema has Book (title, ISBN, Publisher, Publisher_address); the other has Book (title, ISBN) published_by Publisher (Name, Address).]

24 Semantic Conflicts on Concepts
The attributes Age and Birthdate clearly model two semantically different concepts. However, this conflict is rather easy to solve because there is an obvious dependency between them. Solving the conflict means restructuring one of the schemas (and thus applying some transformation to the data) to make the two concepts identical.
[Diagram: Citizen (SSN, Birthdate) in one schema; Citizen (SSN, Age) in the other.]
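The dependency between the two attributes can be written down as a transformation. The sketch below derives Age from Birthdate with respect to a reference date. The function name `age_on` and the sample dates are hypothetical.

```python
from datetime import date


def age_on(birthdate, reference):
    """Completed years between birthdate and the reference date."""
    years = reference.year - birthdate.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


day_before_birthday = age_on(date(1980, 6, 15), date(2010, 6, 14))
on_birthday = age_on(date(1980, 6, 15), date(2010, 6, 15))
```

The dependency only works in one direction: Age can be computed from Birthdate, but Birthdate cannot be recovered from Age, so the schema storing Birthdate is the natural one to keep.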

25 Pitfalls in language: stat rosa pristina nomen...
  Homonymy: two concepts have the same name but different semantics
  Synonymy: two concepts have the same semantics but different names
[Diagram: three schemas, Employee (1,1) assigned_to (1,n) Department, Worker (1,1) assigned_to (1,n) Department, and Teacher (1,1) assigned_to (1,n) Department, annotated "Identical", "Equivalent, with linguistic conflicts: synonyms", and "Non-equivalent, homonyms!".]

26 Schema Comparison
  Identity: the concept is modeled in the same way, both in structure and in semantics
  Equivalence: the concepts have the same semantics (same view of the world) but there are structural conflicts
  Comparability: concepts are modeled with different structure/semantics, but the views of the world do not conflict
  Incomparability: the views of the world differ, producing a conflict that is not (easily) solvable

27 Different, but comparable views
[Diagram: one schema has Employee (1,1) participates_in (1,n) Project (1,1) belongs_to (1,n) Department; the other has Employee (1,1) assigned_to (1,n) Department.]

28 Incomparable views
The semantics of the two schemas look the same. However, there is a conflict in the integrity constraints which makes the schemas incompatible.
[Diagram: one schema has Professor (Name) (0,1) teaches (1,1) Course (Course_ID); the other has Professor (Name) (2,n) teaches (1,1) Course (Course_ID).]
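One way to make this conflict concrete is to treat each cardinality constraint as an interval and check whether the two intervals overlap. This reading is an assumption made for illustration, not a rule stated on the slide. In the sketch, `None` stands for the unbounded maximum n.

```python
def intersect(c1, c2):
    """Intersect two (min, max) cardinalities; max=None means unbounded (n).

    Returns the common (min, max) interval, or None if no number of
    participations can satisfy both constraints.
    """
    low = max(c1[0], c2[0])
    bounded = [m for m in (c1[1], c2[1]) if m is not None]
    high = min(bounded) if bounded else None
    if high is not None and low > high:
        return None
    return (low, high)


professor_side = intersect((0, 1), (2, None))   # (0,1) vs (2,n)
course_side = intersect((1, 1), (1, 1))         # (1,1) vs (1,1)
```

For the Professor side the intersection is empty: one schema allows at most one course per professor, the other requires at least two. No instance can satisfy both, which is why no simple restructuring reconciles the two views.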

29 Inter-schema properties
[Diagram: Schema 1 has Book (title, ISBN) published_by Publisher (Name, Address); Schema 2 has Book (title, ISBN) written_by Author (Name, Address). An inter-schema relationship works_for links Author and Publisher.]

30 4. Schema Alignment
The goal of this phase is to solve the differences/conflicts identified at the previous step. This is obtained by applying transformations to the local schemas:
  Names and types of attributes
  Functional dependencies
  Integrity constraints
Issues:
  Not all conflicts can be solved, e.g. when they derive from substantial differences in how different information systems are designed (how they model the application domain). In this case, users/domain experts must give hints on which interpretation of the world they prefer.
  In case of uncertainty, priority is given to those schemas which are more important in the system (e.g., for DWH, schemas with central concepts in the data mart).

31 5. Schema Fusion
Aligned schemas are merged to obtain a single integrated schema:
  Overlap common concepts
  Add all other concepts, connecting them to the common concepts
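A minimal sketch of the fusion step, assuming the schemas have already been aligned so that equal names denote the same concept. Schemas are represented as dictionaries from concept name to a set of attribute names, a simplification that ignores relationships and constraints. The Book/Publisher/Author content is a hypothetical example.

```python
def fuse(schema_a, schema_b):
    """Merge two aligned schemas given as {concept: set of attributes}."""
    fused = {}
    for schema in (schema_a, schema_b):
        for concept, attributes in schema.items():
            # Common concepts overlap; their attribute sets are unioned.
            fused.setdefault(concept, set()).update(attributes)
    return fused


schema_1 = {"Book": {"title", "ISBN"}, "Publisher": {"name", "address"}}
schema_2 = {"Book": {"title", "ISBN"}, "Author": {"name", "address"}}

integrated = fuse(schema_1, schema_2)
```

Book appears once in the result, while Publisher and Author are both carried over. Connecting them to the common concept (published_by, written_by) would require modeling relationships as well.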

32 Alignment and Fusion
Alignment and fusion are applied in an iterative way:
  Solve some conflicts
  Produce a temporary integrated schema
  To solve new conflicts, apply transformations either to the schemas or to the temporary integrated schema

33 Mappings
A mapping is a set of assertions about correspondences that hold between two schemas. For very different schemas, mappings are hard to formalize. As the integration process proceeds, it becomes possible to express relationships about the extensions of the schemas:
  At the conceptual level: as set relationships
  At the logical level: as queries (in the simplest case) or as transformations
The goal is to link every concept in the integrated schema to some concept in the initial schemas through a chain of transformations.

34 Questions & Answers


More information

Data Virtualization and ETL. Denodo Technologies Architecture Brief

Data Virtualization and ETL. Denodo Technologies Architecture Brief Data Virtualization and ETL Denodo Technologies Architecture Brief Contents Data Virtualization and ETL... 3 Summary... 3 Data Virtualization... 7 What is Data Virtualization good for?... 8 Applications

More information

Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments.

Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments. Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments Anuraj Gupta Department of Electronics and Communication Oriental Institute

More information

Integrated Data Management: Discovering what you may not know

Integrated Data Management: Discovering what you may not know Integrated Data Management: Discovering what you may not know Eric Naiburg [email protected] Agenda Discovering existing data assets is hard What is Discovery Discovery and archiving Discovery, test

More information

Data Integration and ETL Process

Data Integration and ETL Process Data Integration and ETL Process Krzysztof Dembczyński Institute of Computing Science Laboratory of Intelligent Decision Support Systems Politechnika Poznańska (Poznań University of Technology) Software

More information

Integration of Distributed Healthcare Records: Publishing Legacy Data as XML Documents Compliant with CEN/TC251 ENV13606

Integration of Distributed Healthcare Records: Publishing Legacy Data as XML Documents Compliant with CEN/TC251 ENV13606 Integration of Distributed Healthcare Records: Publishing Legacy Data as XML Documents Compliant with CEN/TC251 ENV13606 J.A. Maldonado, M. Robles, P. Crespo Bioengineering, Electronics and Telemedicine

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Secure Database Development

Secure Database Development Secure Database Development Jan Jurjens () and Eduardo B. Fernandez (2) () Computing Department, The Open University, Milton Keynes, MK7 8LA GB http://www.jurjens.de/jan (2) Dept. of Computer Science,

More information

The Benefits of Data Modeling in Data Warehousing

The Benefits of Data Modeling in Data Warehousing WHITE PAPER: THE BENEFITS OF DATA MODELING IN DATA WAREHOUSING The Benefits of Data Modeling in Data Warehousing NOVEMBER 2008 Table of Contents Executive Summary 1 SECTION 1 2 Introduction 2 SECTION 2

More information

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES MUHAMMAD KHALEEL (0912125) SZABIST KARACHI CAMPUS Abstract. Data warehouse and online analytical processing (OLAP) both are core component for decision

More information

Data Virtualization Usage Patterns for Business Intelligence/ Data Warehouse Architectures

Data Virtualization Usage Patterns for Business Intelligence/ Data Warehouse Architectures DATA VIRTUALIZATION Whitepaper Data Virtualization Usage Patterns for / Data Warehouse Architectures www.denodo.com Incidences Address Customer Name Inc_ID Specific_Field Time New Jersey Chevron Corporation

More information

Principal MDM Components and Capabilities

Principal MDM Components and Capabilities Principal MDM Components and Capabilities David Loshin Knowledge Integrity, Inc. 1 Agenda Introduction to master data management The MDM Component Layer Model MDM Maturity MDM Functional Services Summary

More information

Assistant Information Technology Specialist. X X X software related to database development and administration Computer platforms and

Assistant Information Technology Specialist. X X X software related to database development and administration Computer platforms and FUNCTIONAL AREA 5 Database Administration (DBA) Incumbents in this functional area plan, design, develop, test, implement, secure, and administer database systems. Database Administration applies to all

More information

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data

More information

Enabling Better Business Intelligence and Information Architecture With SAP PowerDesigner Software

Enabling Better Business Intelligence and Information Architecture With SAP PowerDesigner Software SAP Technology Enabling Better Business Intelligence and Information Architecture With SAP PowerDesigner Software Table of Contents 4 Seeing the Big Picture with a 360-Degree View Gaining Efficiencies

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring www.ijcsi.org 78 Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring Mohammed Mohammed 1 Mohammed Anad 2 Anwar Mzher 3 Ahmed Hasson 4 2 faculty

More information

Advanced Database Management MISM Course F14-95704 A Fall 2014

Advanced Database Management MISM Course F14-95704 A Fall 2014 Advanced Database Management MISM Course F14-95704 A Fall 2014 Carnegie Mellon University Instructor: Randy Trzeciak Office: Software Engineering Institute / CERT CIC Office hours: By Appointment Phone:

More information

The Influence of Master Data Management on the Enterprise Data Model

The Influence of Master Data Management on the Enterprise Data Model The Influence of Master Data Management on the Enterprise Data Model For DAMA_NY Tom Haughey InfoModel LLC 868 Woodfield Road Franklin Lakes, NJ 07417 201 755-3350 [email protected] Feb 19,

More information

Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998

Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998 1 of 9 5/24/02 3:47 PM Dimensional Modeling and E-R Modeling In The Data Warehouse By Joseph M. Firestone, Ph.D. White Paper No. Eight June 22, 1998 Introduction Dimensional Modeling (DM) is a favorite

More information

CSE 233. Database System Overview

CSE 233. Database System Overview CSE 233 Database System Overview 1 Data Management An evolving, expanding field: Classical stand-alone databases (Oracle, DB2, SQL Server) Computer science is becoming data-centric: web knowledge harvesting,

More information

A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment DOI: 10.15415/jotitt.2014.22021 A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment Rupali Gill 1, Jaiteg Singh 2 1 Assistant Professor, School of Computer Sciences, 2 Associate

More information

Component Approach to Software Development for Distributed Multi-Database System

Component Approach to Software Development for Distributed Multi-Database System Informatica Economică vol. 14, no. 2/2010 19 Component Approach to Software Development for Distributed Multi-Database System Madiajagan MUTHAIYAN, Vijayakumar BALAKRISHNAN, Sri Hari Haran.SEENIVASAN,

More information

Security Issues for the Semantic Web

Security Issues for the Semantic Web Security Issues for the Semantic Web Dr. Bhavani Thuraisingham Program Director Data and Applications Security The National Science Foundation Arlington, VA On leave from The MITRE Corporation Bedford,

More information

Understanding Data Warehousing. [by Alex Kriegel]

Understanding Data Warehousing. [by Alex Kriegel] Understanding Data Warehousing 2008 [by Alex Kriegel] Things to Discuss Who Needs a Data Warehouse? OLTP vs. Data Warehouse Business Intelligence Industrial Landscape Which Data Warehouse: Bill Inmon vs.

More information

Topics. Introduction to Database Management System. What Is a DBMS? DBMS Types

Topics. Introduction to Database Management System. What Is a DBMS? DBMS Types Introduction to Database Management System Linda Wu (CMPT 354 2004-2) Topics What is DBMS DBMS types Files system vs. DBMS Advantages of DBMS Data model Levels of abstraction Transaction management DBMS

More information

Improving your Data Warehouse s IQ

Improving your Data Warehouse s IQ Improving your Data Warehouse s IQ Derek Strauss Gavroshe USA, Inc. Outline Data quality for second generation data warehouses DQ tool functionality categories and the data quality process Data model types

More information

Comparing Data Integration Algorithms

Comparing Data Integration Algorithms Comparing Data Integration Algorithms Initial Background Report Name: Sebastian Tsierkezos [email protected] ID :5859868 Supervisor: Dr Sandra Sampaio School of Computer Science 1 Abstract The problem

More information

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals 1 Properties of a Database 1 The Database Management System (DBMS) 2 Layers of Data Abstraction 3 Physical Data Independence 5 Logical

More information

Enabling Better Business Intelligence and Information Architecture With SAP Sybase PowerDesigner Software

Enabling Better Business Intelligence and Information Architecture With SAP Sybase PowerDesigner Software SAP Technology Enabling Better Business Intelligence and Information Architecture With SAP Sybase PowerDesigner Software Table of Contents 4 Seeing the Big Picture with a 360-Degree View Gaining Efficiencies

More information