Data Integration and Data Cleaning in DWH
1 Spring Semester 2010. Data Integration and Data Cleaning in DWH. Dr. Diego Milano
2 Organization
- Motivation: Data Integration and DWH
- Data Integration
  - Schema (intensional) level
  - Instance (extensional) level: Data Cleaning
- Building a DWH: Data Integration & Cleaning in DWH Design
- (Introduction to Data Quality)
3 Background Knowledge & Tools
If you don't master some of these tools, let me know immediately:
- Database basics: RDBMS concepts
- Relational model
- Entity-Relationship model (and possibly UML)
- Database design: from a conceptual model to the logical model
4 What is a DW?
A collection of data from different sources that is:
- Integrated
- Persistent
- Dynamically evolving
- Focused
- Used for decision support
5 DWH
[Figure: operational data (from production/sales OLTP environments) and external data (e.g. exchange rates, prices from other sales chains) feed the DWH, which in turn serves OLAP, data mining, and reporting. One stage of this flow is marked "We focus on what happens here".]
6 Data Integration
Given a set of data sources, data integration is the task of presenting them to the user as a single data source.
[Figure: sources S1, S2, S3, ..., each with its local schema, are presented as one integrated DB with global schema G.]
7 Two approaches: virtual/materialized
Virtual integration: data stays at the sources, and the extension of the global schema is not materialized. Queries on the global schema are answered using the data at the sources.
Pros/cons:
+ Updates on the local sources are immediately reflected in the (virtual) integrated DB
+ No redundancy, and no conflicts due to lack of synchronization
- Enforcing constraints on the global schema is not always possible
- Depending on the relationships between the global and the local schemas, answering queries may be hard (and inefficient)
- Propagating updates from the global schema to the local sources is hard
- Solving inconsistencies at the extensional level is hard
8 Two approaches: virtual/materialized
Materialized integration: data is copied to a single integrated database.
Pros/cons:
+ Queries on the integrated repository are more efficient
+ It is possible (or easier) to apply complex transformations to the original data:
  - The integrated schema can be very different from the source schemas
  - Instance-level transformations are made easier
- The integrated DB goes out of sync with the sources and needs periodic refreshing
- Less storage-efficient, with potential inconsistencies due to redundancy
A Data Warehouse is first of all a data integration system adopting the materialized approach.
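The trade-off between the two approaches can be illustrated with a small sketch. It uses Python's standard sqlite3 module; the table names (s1_sales, s2_sales, g_sales_virtual, g_sales_materialized) and the data are invented for illustration and are not part of the lecture. A view plays the role of the virtual global schema, and a copied table plays the role of the materialized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical local sources with the same shape.
    CREATE TABLE s1_sales (item TEXT, amount INTEGER);
    CREATE TABLE s2_sales (item TEXT, amount INTEGER);
    INSERT INTO s1_sales VALUES ('pen', 10);
    INSERT INTO s2_sales VALUES ('ink', 5);

    -- Virtual integration: the global relation is only a query over the sources.
    CREATE VIEW g_sales_virtual AS
        SELECT item, amount FROM s1_sales
        UNION ALL
        SELECT item, amount FROM s2_sales;

    -- Materialized integration: the global relation is a physical copy.
    CREATE TABLE g_sales_materialized AS
        SELECT item, amount FROM g_sales_virtual;
""")


def total(relation):
    # `relation` is one of the fixed names defined above, never user input.
    return conn.execute(f"SELECT SUM(amount) FROM {relation}").fetchone()[0]


def refresh():
    # Periodic refresh: rebuild the materialized copy from the sources.
    conn.execute("DELETE FROM g_sales_materialized")
    conn.execute(
        "INSERT INTO g_sales_materialized SELECT item, amount FROM g_sales_virtual"
    )


# A new row arrives at one of the sources.
conn.execute("INSERT INTO s1_sales VALUES ('cap', 7)")

virtual_total = total("g_sales_virtual")      # sees the update immediately
stale_total = total("g_sales_materialized")   # still reflects the old copy
refresh()
fresh_total = total("g_sales_materialized")   # in sync again after the refresh
```

The view pays the cost of querying the sources on every read, while the copy is cheap to query but stays stale until it is refreshed, which matches the pros and cons listed on the two slides.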
9 Heterogeneity
The main issue in data integration tasks is heterogeneity. Data residing at different sources differ in a number of aspects, and these differences make it harder to reduce the data to a single, integrated view.
It is not easy to classify heterogeneity in a crisp way. Some differences relate to syntactic aspects (the specific language/technology used to represent reality), others relate to semantic aspects (how a certain representation captures reality, its meaning). These differences coexist, and it is not always easy or possible to draw the line between what is syntax and what is semantics.
10 Heterogeneity (Systems/Technology/Syntax)
- Legacy systems (ad-hoc interfaces)
- Flat files
- Web sources
- XML files/databases
- Different DBMSs (e.g. RDBMS, OODBMS, ...)
- DBMSs of the same flavour (e.g. RDBMS) but with differences in proprietary syntax
11 Heterogeneity (Data Representation)
Intensional level (schema):
- Data model (modeling language): relational, object-oriented, network, semi-structured, etc.
- Structure (representation choices): different designers have different views of the world (and different application needs), and may use different constructs/data types to represent the same concepts/reality:
  - e.g. a date represented as an attribute or as a standalone concept
  - e.g. attribute 'sex' encoded as String / Acronym / Integer (0,1)
  - Different views of the world include or exclude portions of information: e.g. whether to record the marital status of employees
- Linguistics/terminology: different designers may use different terms to denote the same concept, or use the same term to mean different concepts, at various levels: e.g. attribute 'price': $
Data Warehousing (CS242)
12 Heterogeneity (Data Representation)
Extensional level (instances):
- Unmappable or partially mappable domains
  - Non-overlapping domains: e.g. all students in Basel, only students enrolled after ...
  - Domains with different granularity: e.g. sales per day/per month
  - Application-specific domains: e.g. custom identifiers (like employee_code, color_code) meaningful only within a certain application domain
- Inconsistencies between semantically equivalent instances
  - Due to errors or other Data Quality problems
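As a small illustration of the granularity problem, the following sketch rolls a per-day source up to the per-month granularity of a second source, then compares the two. All figures and names (daily_sales, monthly_sales, to_monthly) are made up for the example. Note that the conversion only works in one direction: monthly totals cannot be split back into days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical source A: sales recorded per day.
daily_sales = [
    (date(2010, 3, 30), 120.0),
    (date(2010, 3, 31), 80.0),
    (date(2010, 4, 1), 50.0),
]

# Hypothetical source B: sales recorded per (year, month).
monthly_sales = {(2010, 3): 200.0, (2010, 4): 45.0}


def to_monthly(rows):
    """Aggregate (day, amount) rows to the coarser (year, month) granularity."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


rolled_up = to_monthly(daily_sales)

# At the common granularity, instance-level inconsistencies become visible.
mismatches = {
    month: (rolled_up.get(month), monthly_sales.get(month))
    for month in set(rolled_up) | set(monthly_sales)
    if rolled_up.get(month) != monthly_sales.get(month)
}
```

Here the two sources agree for March and disagree for April, which is the kind of inconsistency between semantically equivalent instances that data cleaning has to resolve.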
13 Solving heterogeneity issues
- Systems/model level: wrapper-based architectures
- Intensional level: schema integration
- Extensional level:
  - Instance identification
  - Instance reconciliation
14 Wrapper-based Architectures
A wrapper is a piece of software that encapsulates another software system and acts as an interpreter for it. It makes it possible to:
- Hide technological differences
- Hide (to a certain extent) model differences, presenting all sources in a single canonical language
[Figure: three wrappers, one each over a legacy system, an RDBMS, and XML data, expose their sources in a canonical model/language.]
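A minimal sketch of the wrapper idea, under stated assumptions: the canonical model is taken to be a plain dictionary with the keys code and name, and the two sources (a CSV flat file and a SQLite table) as well as the class names CsvWrapper and SqliteWrapper are invented for the example. Each wrapper hides its source's technology and naming behind the same records() method.

```python
import csv
import io
import sqlite3


class CsvWrapper:
    """Wraps a flat file (here an in-memory CSV string) as canonical records."""

    def __init__(self, text):
        self._text = text

    def records(self):
        for row in csv.DictReader(io.StringIO(self._text)):
            # Translate source-specific column names and types.
            yield {"code": int(row["prod_code"]), "name": row["prod_name"]}


class SqliteWrapper:
    """Wraps a relational source with its own table and column names."""

    def __init__(self, conn):
        self._conn = conn

    def records(self):
        query = "SELECT id, label FROM items ORDER BY id"
        for code, name in self._conn.execute(query):
            yield {"code": code, "name": name}


# Hypothetical sources.
flat_file = "prod_code,prod_name\n1,pen\n2,ink\n"
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
db.execute("INSERT INTO items VALUES (3, 'cap')")

# Client code sees only the canonical representation.
wrappers = [CsvWrapper(flat_file), SqliteWrapper(db)]
all_records = [record for wrapper in wrappers for record in wrapper.records()]
```

The wrappers remove technological and model differences only. Deciding whether prod_name and label really denote the same concept is a schema integration question that the wrapper cannot answer by itself.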
15 Schema Integration
Given n data source schemas L1, ..., Ln, integrating them means:
- Identifying correspondences among them
- Designing a new, integrated schema G that abstracts over all of them and is possibly tailored to some specific application (e.g. for Data Warehousing)
- Formally specifying mappings between the integrated schema and the source schemas
There are tools that semi-automatically perform some of the activities in schema integration, but these are mostly research-level prototypes. Schema integration is still a (complex) design task for humans. It requires expertise in database modeling and a deep knowledge of the application domains of the schemas to integrate.
16 Wrapper-Mediator
A mediator interacts with the wrappers and presents to the users a unified global view over the local schemas.
[Figure: a mediator sits on top of three wrappers (over a legacy system, an RDBMS, and XML data) and is connected to them through a mapping.]
17 Schema Integration Steps
1. Analysis, Normalization, Abstraction to a common conceptual modeling language
2. Choice of integration strategy
3. Schema Matching: identify relationships among local schemas
4. Schema Alignment: solve conflicts
5. Schema Fusion: create the global schema
The result of this process is a mapping between the source schemas and the integrated schema.
18 1. Analysis
For each data source in isolation, the designer must acquire a deep understanding of the application domain through:
- In-depth analysis of the schema(s)
- Interaction with domain experts
The result of this phase is a conceptual schema in the canonical language of choice, which:
- Reflects the domain of interest as accurately and completely as possible
- Is well understood
- Is well documented
19 Analysis: Know Your Enemy
Gathering knowledge about complex application domains is difficult:
- Business rules may be kept secret or be poorly documented
- (Cooperative) domain experts are key elements
Understanding the IS of an enterprise is difficult:
- Legacy systems require ad-hoc knowledge (e.g. no database schema, but data in flat files with a custom format)
- Even if the DB is relational:
  - Software/system documentation is often poor. The domain conceptualization steps that led to a certain database design, and many design choices, may be lost.
  - Reverse-engineering of the logical schemas and associated applications is sometimes required. This might involve:
    - Normalization: for efficiency reasons, or because of bad design, logical schemas are sometimes denormalized
    - Inferring constraints: not all constraints of the domain are enforced at the level of the logical schema (e.g. some are not enforced at all, others are enforced at the application level)
  - Systems are not always well designed, and schemas become old. Sometimes corrections to the schema are required.
20 Analysis, Normalization, Abstraction
Original logical schema:

    CREATE TABLE product(
        cat_desc  VARCHAR(255),
        cat_name  VARCHAR(255),
        cat_code  INTEGER,
        prod_desc VARCHAR(255),
        prod_name VARCHAR(255),
        prod_code INTEGER PRIMARY KEY
    );

Normalized/corrected logical schema:

    CREATE TABLE category(
        cat_desc VARCHAR(255),
        cat_name VARCHAR(255),
        cat_code INTEGER PRIMARY KEY
    );

    CREATE TABLE product(
        prod_desc VARCHAR(255),
        prod_name VARCHAR(255),
        prod_code INTEGER PRIMARY KEY,
        cat_code  INTEGER REFERENCES category(cat_code)
    );

Normalization/correction: the original logical schema is unnormalized AND does not enforce all constraints holding in the application domain.
[Figure: ER diagrams. Before: a single Product entity with attributes cat_desc, cat_name, cat_code, prod_desc, prod_name, prod_code. After: Product (1,1) belongs_to (0,n) Category, each entity with typed Description, Name, and Code attributes.]
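The normalization step on this slide can be tried out with a short sketch that migrates data from the flat table into the two normalized tables, using Python's sqlite3 module. Several details are assumptions made for the example: the flat table is renamed product_flat so that both schemas can coexist, the sample rows are invented, and NOT NULL is added to product.cat_code to reflect the (1,1) participation of Product in belongs_to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Unnormalized source table (renamed for this example).
    CREATE TABLE product_flat(
        cat_desc VARCHAR(255), cat_name VARCHAR(255), cat_code INTEGER,
        prod_desc VARCHAR(255), prod_name VARCHAR(255),
        prod_code INTEGER PRIMARY KEY
    );
    INSERT INTO product_flat VALUES
        ('Writing tools', 'Pens',  1, 'Blue ballpoint', 'Pen A',   100),
        ('Writing tools', 'Pens',  1, 'Black gel',      'Pen B',   101),
        ('Paper goods',   'Paper', 2, 'A4 ream',        'Paper A', 200);

    -- Normalized target schema.
    CREATE TABLE category(
        cat_desc VARCHAR(255), cat_name VARCHAR(255),
        cat_code INTEGER PRIMARY KEY
    );
    CREATE TABLE product(
        prod_desc VARCHAR(255), prod_name VARCHAR(255),
        prod_code INTEGER PRIMARY KEY,
        cat_code INTEGER NOT NULL REFERENCES category(cat_code)
    );

    -- Each category is stored once; products keep only a reference to it.
    INSERT INTO category
        SELECT DISTINCT cat_desc, cat_name, cat_code FROM product_flat;
    INSERT INTO product
        SELECT prod_desc, prod_name, prod_code, cat_code FROM product_flat;
""")

n_categories = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
n_products = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]

# The constraint that was implicit in the flat table is now enforced.
try:
    conn.execute("INSERT INTO product VALUES ('n/a', 'Orphan', 999, 42)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

If the flat table contained two different descriptions for the same cat_code, the INSERT into category would fail on the primary key. That failure is itself a useful signal of a data quality problem in the source.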
21 2. Choice of Integration Strategy
Comparing too many schemas at the same time is not always easy/feasible.
Integration process:
- binary
  - ladder
  - balanced
- n-ary
  - single step
  - iterative
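The two binary strategies differ only in the order in which schemas are paired. The sketch below makes that order explicit. The merge function is a deliberately trivial, hypothetical stand-in (a union of concept names); in practice each merge is the manual matching, alignment, and fusion work described in the following slides.

```python
from functools import reduce


def merge(a, b):
    """Hypothetical stand-in for integrating two schemas."""
    return a | b


def ladder(schemas):
    # Ladder: integrate the first two, then add one more schema at a time.
    return reduce(merge, schemas)


def balanced(schemas):
    # Balanced: integrate schemas pairwise, then pair up the results.
    level = list(schemas)
    while len(level) > 1:
        paired = [
            merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            paired.append(level[-1])  # an odd one out moves up unchanged
        level = paired
    return level[0]


# Invented schemas, each reduced to a set of concept names.
schemas = [
    frozenset({"Book", "Publisher"}),
    frozenset({"Book", "Author"}),
    frozenset({"Author", "Department"}),
    frozenset({"Employee", "Department"}),
]

ladder_result = ladder(schemas)
balanced_result = balanced(schemas)
```

With this toy merge both orders give the same result. With real schemas the order matters in practice, because each step fixes design decisions that later steps have to live with.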
22 3. Schema Matching
Schemas are comparatively analyzed to identify:
- Common concepts and relationships among them
- Differences and structural/semantic conflicts
- Inter-schema properties
23 Structural Conflicts on Concepts
- Book is a common concept
- Publisher and its relationship to Book have a structural conflict: the designers used different language constructs to model the same reality (an entity set plus a relationship in one schema, attributes in the other)
[Figure: one schema models Book with attributes title, ISBN, Publisher, and Publisher_address; the other models Book (title, ISBN) linked by published_by to an entity Publisher (Name, Address).]
24 Semantic Conflicts on Concepts
The attributes Age and Birthdate clearly model two semantically different concepts. However, this conflict is rather easy to solve because there is an obvious dependency between them. Solving the conflict means restructuring one of the schemas (and thus applying some transformation to the data) to make the two concepts identical.
[Figure: Citizen (SSN, Birthdate) in one schema; Citizen (SSN, Age) in the other.]
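The dependency between the two attributes can be written down as a transformation. The sketch below derives Age from Birthdate for a given reference day; the function name age_on, the sample record, and the reference date are assumptions made for the example.

```python
from datetime import date


def age_on(birthdate, today):
    """Completed years between birthdate and today."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


# Hypothetical instance of the schema that stores Birthdate.
citizens_with_birthdate = [{"ssn": "A1", "birthdate": date(1980, 5, 17)}]

# Restructure it to match the schema that stores Age.
reference_day = date(2010, 3, 1)
aligned = [
    {"ssn": c["ssn"], "age": age_on(c["birthdate"], reference_day)}
    for c in citizens_with_birthdate
]
```

The transformation only goes one way: Age can be computed from Birthdate, but not the reverse, and the computed value depends on the reference day. This suggests keeping Birthdate in the integrated schema and deriving Age when needed.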
25 Pitfalls in language: stat rosa pristina nomen...
- Homonymy: two concepts have the same name but different semantics
- Synonymy: two concepts have the same semantics but different names
[Figure: three schemas, Employee / Worker / Teacher, each (1,1) assigned_to (1,n) Department, annotated with the labels "Equivalent, with linguistic conflicts: synonyms", "Identical", and "Non-equivalent, homonyms!".]
26 Schema Comparison
- Identity: the concept is modeled in the same way, from the point of view of both structure and semantics
- Equivalence: the concepts have the same semantics (same view of the world) but there are structural conflicts
- Comparability: the concepts are modeled with different structure/semantics, but the views of the world do not conflict
- Incomparability: the views of the world differ, producing a conflict that is not (easily) solvable
27 Different, but comparable views
[Figure: schema 1: Employee (1,1) participates_in (1,n) Project, and Project (1,1) belongs_to (1,n) Department; schema 2: Employee (1,1) assigned_to (1,n) Department.]
28 Incomparable views
The semantics of the two schemas look the same. However, there is a conflict in the integrity constraints which makes the schemas incompatible.
[Figure: schema 1: Professor (Name) (0,1) teaches (1,1) Course (Course_ID); schema 2: Professor (Name) (2,n) teaches (1,1) Course (Course_ID).]
29 Inter-schema properties
[Figure: Schema 1: Book (title, ISBN) published_by Publisher (Name, Address). Schema 2: Book (title, ISBN) written_by Author (Name, Address). Inter-schema property: a works_for relationship between Author and Publisher.]
30 4. Schema Alignment
The goal of this phase is to solve the differences/conflicts identified in the previous step. This is done by applying transformations to the local schemas, affecting:
- Names and types of attributes
- Functional dependencies
- Integrity constraints
Issues:
- Not all conflicts can be solved, e.g. when they derive from substantial differences in how the information systems are designed (how they model the application domain). In this case, users/domain experts must indicate which interpretation of the world they prefer.
- In case of uncertainty, priority is given to the schemas that are more important in the system (e.g., for a DWH, schemas with central concepts in the data mart)
31 5. Schema Fusion
Aligned schemas are merged to obtain a single integrated schema:
- Overlap common concepts
- Add all other concepts, connecting them to the common concepts
32 Alignment and Fusion
Alignment and fusion are applied iteratively:
- Solve some conflicts and produce a temporary integrated schema
- To solve new conflicts, apply transformations either to the schemas or to the temporary integrated schema
33 Mappings
A mapping is a set of assertions about correspondences that hold between two schemas.
- For very different schemas, mappings are hard to formalize
- As the integration process proceeds, it becomes possible to express relationships about the extensions of the schemas:
  - At the conceptual level: as set relationships
  - At the logical level: as queries (in the simplest case) or as transformations
The goal is to link every concept in the integrated schema to some concept in the initial schemas through a chain of transformations.
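At the logical level, the simplest kind of mapping defines each global relation as a query over the sources. The sketch below expresses such a mapping as a SQL view, run through Python's sqlite3 module. It reuses the Book/Publisher situation from the structural conflict slide, but all table names (s1_book, s2_book, s2_publisher, g_book), column names, and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source 1: the publisher is modeled as attributes of book.
    CREATE TABLE s1_book (
        isbn TEXT PRIMARY KEY, title TEXT,
        publisher TEXT, publisher_address TEXT
    );
    -- Source 2: the publisher is a separate entity linked to book.
    CREATE TABLE s2_publisher (
        pub_id INTEGER PRIMARY KEY, name TEXT, address TEXT
    );
    CREATE TABLE s2_book (
        isbn TEXT PRIMARY KEY, title TEXT,
        pub_id INTEGER REFERENCES s2_publisher(pub_id)
    );
    INSERT INTO s1_book VALUES ('111', 'Title One', 'Pub A', 'Basel');
    INSERT INTO s2_publisher VALUES (1, 'Pub B', 'Zurich');
    INSERT INTO s2_book VALUES ('222', 'Title Two', 1);

    -- Mapping: the global relation g_book is defined by one query per source.
    CREATE VIEW g_book AS
        SELECT isbn, title, publisher AS publisher_name
        FROM s1_book
        UNION
        SELECT b.isbn, b.title, p.name
        FROM s2_book AS b
        JOIN s2_publisher AS p ON p.pub_id = b.pub_id;
""")

rows = conn.execute(
    "SELECT isbn, title, publisher_name FROM g_book ORDER BY isbn"
).fetchall()
```

Each branch of the UNION is one assertion of the mapping: it states how a source's structure corresponds to the global concept, and the join in the second branch is the transformation that resolves the structural conflict.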
34 Questions & Answers
Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998
1 of 9 5/24/02 3:47 PM Dimensional Modeling and E-R Modeling In The Data Warehouse By Joseph M. Firestone, Ph.D. White Paper No. Eight June 22, 1998 Introduction Dimensional Modeling (DM) is a favorite
CSE 233. Database System Overview
CSE 233 Database System Overview 1 Data Management An evolving, expanding field: Classical stand-alone databases (Oracle, DB2, SQL Server) Computer science is becoming data-centric: web knowledge harvesting,
A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment
DOI: 10.15415/jotitt.2014.22021 A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment Rupali Gill 1, Jaiteg Singh 2 1 Assistant Professor, School of Computer Sciences, 2 Associate
Component Approach to Software Development for Distributed Multi-Database System
Informatica Economică vol. 14, no. 2/2010 19 Component Approach to Software Development for Distributed Multi-Database System Madiajagan MUTHAIYAN, Vijayakumar BALAKRISHNAN, Sri Hari Haran.SEENIVASAN,
Security Issues for the Semantic Web
Security Issues for the Semantic Web Dr. Bhavani Thuraisingham Program Director Data and Applications Security The National Science Foundation Arlington, VA On leave from The MITRE Corporation Bedford,
Understanding Data Warehousing. [by Alex Kriegel]
Understanding Data Warehousing 2008 [by Alex Kriegel] Things to Discuss Who Needs a Data Warehouse? OLTP vs. Data Warehouse Business Intelligence Industrial Landscape Which Data Warehouse: Bill Inmon vs.
Topics. Introduction to Database Management System. What Is a DBMS? DBMS Types
Introduction to Database Management System Linda Wu (CMPT 354 2004-2) Topics What is DBMS DBMS types Files system vs. DBMS Advantages of DBMS Data model Levels of abstraction Transaction management DBMS
Improving your Data Warehouse s IQ
Improving your Data Warehouse s IQ Derek Strauss Gavroshe USA, Inc. Outline Data quality for second generation data warehouses DQ tool functionality categories and the data quality process Data model types
Comparing Data Integration Algorithms
Comparing Data Integration Algorithms Initial Background Report Name: Sebastian Tsierkezos [email protected] ID :5859868 Supervisor: Dr Sandra Sampaio School of Computer Science 1 Abstract The problem
Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components
Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals 1 Properties of a Database 1 The Database Management System (DBMS) 2 Layers of Data Abstraction 3 Physical Data Independence 5 Logical
Enabling Better Business Intelligence and Information Architecture With SAP Sybase PowerDesigner Software
SAP Technology Enabling Better Business Intelligence and Information Architecture With SAP Sybase PowerDesigner Software Table of Contents 4 Seeing the Big Picture with a 360-Degree View Gaining Efficiencies
