資 料 倉 儲 (Data Warehousing)



Similar documents
Lection 3-4 WAREHOUSING

Jagir Singh, Greeshma, P Singh University of Northern Virginia. Abstract

Data Warehouse Overview. Srini Rengarajan

IST722 Data Warehousing

Data Warehousing Systems: Foundations and Architectures

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Master Data Management. Zahra Mansoori

Chapter 5. Learning Objectives. DW Development and ETL

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Master Data Management and Data Warehousing. Zahra Mansoori

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Fluency With Information Technology CSE100/IMT100

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

Data Warehousing: A Technology Review and Update Vernon Hoffner, Ph.D., CCP EntreSoft Resouces, Inc.

CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Data Mining 資 料 探 勘. 分 群 分 析 (Cluster Analysis)

Turkish Journal of Engineering, Science and Technology

INTERACTIVE DECISION SUPPORT SYSTEM BASED ON ANALYSIS AND SYNTHESIS OF DATA - DATA WAREHOUSE

Understanding Data Warehousing. [by Alex Kriegel]

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

DATA WAREHOUSING AND OLAP TECHNOLOGY

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

B. 3 essay questions. Samples of potential questions are available in part IV. This list is not exhaustive it is just a sample.

MDM and Data Warehousing Complement Each Other

Business Intelligence: Effective Decision Making

ENTERPRISE RESOURCE PLANNING SYSTEMS

Decision Support and Business Intelligence Systems. Chapter 1: Decision Support Systems and Business Intelligence

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Data Warehousing and Data Mining in Business Applications

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

SAS BI Course Content; Introduction to DWH / BI Concepts

Data Warehouse (DW) Maturity Assessment Questionnaire

Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling

Research on Airport Data Warehouse Architecture

Data Warehouse: Introduction

BUSINESS INTELLIGENCE

Foundations of Business Intelligence: Databases and Information Management

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Week 3 lecture slides

Dashboards PRESENTED BY: Quaid Saifee Director, WIT Inc.

MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus

Building an Effective Data Warehouse Architecture James Serra

East Asia Network Sdn Bhd

Data warehouse and Business Intelligence Collateral

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Breadboard BI. Unlocking ERP Data Using Open Source Tools By Christopher Lavigne

OLAP Theory-English version

College of Engineering, Technology, and Computer Science

Building a Data Warehouse

Ursuline College Accelerated Program

Establish and maintain Center of Excellence (CoE) around Data Architecture

Key organizational factors in data warehouse architecture selection

Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

B.Sc (Computer Science) Database Management Systems UNIT-V

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

14. Data Warehousing & Data Mining

CHAPTER 4 Data Warehouse Architecture

An Overview of Database management System, Data warehousing and Data Mining

Evolving Data Warehouse Architectures

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

A Service-oriented Architecture for Business Intelligence

Whitepaper. Data Warehouse/BI Testing Offering YOUR SUCCESS IS OUR FOCUS. Published on: January 2009 Author: BIBA PRACTICE

A DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT

A Study on Integrating Business Intelligence into E-Business

A Survey on Data Warehouse Architecture

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Business Intelligence, Analytics & Reporting: Glossary of Terms

Data Integration: Using ETL, EAI, and EII Tools to Create an Integrated Enterprise. Colin White Founder, BI Research TDWI Webcast October 2005

Data Warehousing and OLAP Technology for Knowledge Discovery

A Model-based Software Architecture for XML Data and Metadata Integration in Data Warehouse Systems

Class 2. Learning Objectives

Data Warehousing and Data Mining

Foundations of Business Intelligence: Databases and Information Management

tku.edu.tw/myday/ , 26 1

When to consider OLAP?

Management Accountants and IT Professionals providing Better Information = BI = Business Intelligence. Peter Simons peter.simons@cimaglobal.

UC Berkeley Data Warehouse Roadmap. Data Warehouse Architecture

Managing Data in Motion

COURSE SYLLABUS. Enterprise Information Systems and Business Intelligence

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

IMPROVING THE QUALITY OF THE DECISION MAKING BY USING BUSINESS INTELLIGENCE SOLUTIONS

Implementing a Data Warehouse with Microsoft SQL Server

BUSINESS ANALYTICS AND DATA VISUALIZATION. ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Transcription:

商 業 智 慧 實 務 Prac&ces of Business Intelligence Tamkang University 資 料 倉 儲 (Data Warehousing) 1022BI04 MI4 Wed, 9,10 (16:10-18:00) (B113) Min-Yuh Day 戴 敏 育 Assistant Professor 專 任 助 理 教 授 Dept. of Information Management, Tamkang University 淡 江 大 學 資 訊 管 理 學 系 http://mail. tku.edu.tw/myday/ 2014-03- 12 1

課 程 大 綱 (Syllabus) 週 次 (Week) 日 期 (Date) 內 容 (Subject/Topics) 1 103/02/19 商 業 智 慧 導 論 (IntroducFon to Business Intelligence) 2 103/02/26 管 理 決 策 支 援 系 統 與 商 業 智 慧 (Management Decision Support System and Business Intelligence) 3 103/03/05 企 業 績 效 管 理 (Business Performance Management) 4 103/03/12 資 料 倉 儲 (Data Warehousing) 5 103/03/19 商 業 智 慧 的 資 料 探 勘 (Data Mining for Business Intelligence) 6 103/03/26 商 業 智 慧 的 資 料 探 勘 (Data Mining for Business Intelligence) 7 103/04/02 教 學 行 政 觀 摩 日 (Off- campus study) 8 103/04/09 資 料 科 學 與 巨 量 資 料 分 析 (Data Science and Big Data AnalyFcs) 2

課 程 大 綱 (Syllabus) 週 次 日 期 內 容 (Subject/Topics) 9 103/04/16 期 中 報 告 (Midterm Project PresentaFon) 10 103/04/23 期 中 考 試 週 (Midterm Exam) 11 103/04/30 文 字 探 勘 與 網 路 探 勘 (Text and Web Mining) 12 103/05/07 意 見 探 勘 與 情 感 分 析 (Opinion Mining and SenFment Analysis) 13 103/05/14 社 會 網 路 分 析 (Social Network Analysis) 14 103/05/21 期 末 報 告 (Final Project PresentaFon) 15 103/05/28 畢 業 考 試 週 (Final Exam) 3

A High- Level Architecture of BI 4

Decision Support and Business Intelligence Systems (9 th Ed., Pren&ce Hall) Chapter 8: Data Warehousing 5

Learning Objec&ves DefiniFons and concepts of data warehouses Types of data warehousing architectures Processes used in developing and managing data warehouses Data warehousing operafons Role of data warehouses in decision support Data integrafon and the extracfon, transformafon, and load (ETL) processes Data warehouse administrafon and security issues 6

Main Data Warehousing (DW) Topics DW definifons CharacterisFcs of DW Data Marts ODS, EDW, Metadata DW Framework DW Architecture & ETL Process DW Development DW Issues 7

Data Warehouse Defined A physical repository where relafonal data are specially organized to provide enterprise- wide, cleansed data in a standardized format The data warehouse is a collecfon of integrated, subject- oriented databases design to support DSS funcfons, where each unit of data is non- volafle and relevant to some moment in Fme 8

Characteris&cs of DW Subject oriented Integrated Time- variant (Fme series) NonvolaFle Summarized Not normalized Metadata Web based, relafonal/mulf- dimensional Client/server Real- Fme and/or right- Fme (acfve) 9

Data Mart A departmental data warehouse that stores only relevant data Dependent data mart A subset that is created directly from a data warehouse Independent data mart A small data warehouse designed for a strategic business unit or a department 10

Data Warehousing Defini&ons Opera&onal data stores (ODS) A type of database oben used as an interim area for a data warehouse Oper marts An operafonal data mart. Enterprise data warehouse (EDW) A data warehouse for the enterprise. Metadata Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisifon and use 11

A Conceptual Framework for DW Data Sources ERP Legacy POS Other OLTP/wEB ETL Process Select Extract Transform Integrate Load Metadata Enterprise Data warehouse No data marts option Access Data mart (Marketing) Data mart (Engineering) Data mart (Finance) A P I / Middleware Applications (Visualization) Routine Business Reporting Data/text mining OLAP, Dashboard, Web External data Replication Data mart (...) Custom built applications 12

Generic DW Architectures Three- &er architecture 1. Data acquisifon sobware (back- end) 2. The data warehouse that contains the data & sobware 3. Client (front- end) sobware that allows users to access and analyze data from the warehouse Two- &er architecture First 2 Fers in three- Fer architecture is combined into one somefme there is only one Fer? 13

Generic DW Architectures Tier 1: Client workstation Tier 2: Application server Tier 3: Database server Tier 1: Client workstation Tier 2: Application & database server 14

DW Architecture Considera&ons Issues to consider when deciding which architecture to use: Which database management system (DBMS) should be used? Will parallel processing and/or parffoning be used? Will data migrafon tools be used to load the data warehouse? What tools will be used to support data retrieval and analysis? 15

A Web- based DW Architecture Web pages Application Server Client (Web browser) Internet/ Intranet/ Extranet Web Server Data warehouse 16

Alterna&ve DW Architectures (a) Independent Data Marts Architecture ETL Source Systems Staging Area Independent data marts (atomic/summarized data) End user access and applications (b) Data Mart Bus Architecture with Linked Dimensional Datamarts ETL Source Systems Staging Area Dimensionalized data marts linked by conformed dimentions (atomic/summarized data) End user access and applications 17

Alterna&ve DW Architectures (c) Hub and Spoke Architecture (Corporate Information Factory) ETL Source Systems Staging Area Normalized relational warehouse (atomic data) End user access and applications Dependent data marts (summarized/some atomic data) (d) Centralized Data Warehouse Architecture ETL Source Systems Staging Area Normalized relational warehouse (atomic/some summarized data) End user access and applications 18

Alterna&ve DW Architectures (e) Federated Architecture Existing data warehouses Data marts and legacy systmes Data mapping / metadata Logical/physical integration of common data elements End user access and applications 19

Alterna&ve DW Architectures 20

Which Architecture is the Best? Bill Inmon versus Ralph Kimball Enterprise DW versus Data Marts approach Empirical study by Ariyachandra and Watson (2006) 21

Data Warehousing Architectures Ten factors that potentially affect the architecture selection decision: 1. InformaFon interdependence between organizafonal units 2. Upper management s informafon needs 3. Urgency of need for a data warehouse 4. Nature of end- user tasks 5. Constraints on resources 6. Strategic view of the data warehouse prior to implementafon 7. CompaFbility with exisfng systems 8. Perceived ability of the in- house IT staff 9. Technical issues 10. Social/poliFcal factors 22

Enterprise Data Warehouse (by Teradata Corpora&on) 23

Data Integra&on and the Extrac&on, Transforma&on, and Load (ETL) Process Data integra&on IntegraFon that comprises three major processes: data access, data federafon, and change capture. Enterprise applica&on integra&on (EAI) A technology that provides a vehicle for pushing data from source systems into a data warehouse Enterprise informa&on integra&on (EII) An evolving tool space that promises real- Fme data integrafon from a variety of sources Service- oriented architecture (SOA) A new way of integrafng informafon systems 24

Data Integra&on and the Extrac&on, Transforma&on, and Load (ETL) Process ExtracFon, transformafon, and load (ETL) process Packaged application Transient data source Data warehouse Legacy system Extract Transform Cleanse Load Other internal applications Data mart 25

ETL Issues affecfng the purchase of and ETL tool Data transformafon tools are expensive Data transformafon tools may have a long learning curve Important criteria in selecfng an ETL tool Ability to read from and write to an unlimited number of data sources/architectures AutomaFc capturing and delivery of metadata A history of conforming to open standards An easy- to- use interface for the developer and the funcfonal user 26

Benefits of DW Direct benefits of a data warehouse Allows end users to perform extensive analysis Allows a consolidated view of corporate data Beier and more Fmely informafon Enhanced system performance SimplificaFon of data access Indirect benefits of data warehouse Enhance business knowledge Present compeffve advantage Enhance customer service and safsfacfon Facilitate decision making Help in reforming business processes 27

Data Warehouse Development Data warehouse development approaches Inmon Model: EDW approach (top- down) Kimball Model: Data mart approach (boiom- up) Which model is best? There is no one- size- fits- all strategy to DW One alternafve is the hosted warehouse Data warehouse structure: The Star Schema vs. RelaFonal Real- Fme data warehousing? 28

DW Development Approaches (Kimball Approach) (Inmon Approach) 29

DW Structure: Star Schema (a.k.a. Dimensional Modeling) Start Schema Example for an Automobile Insurance Data Warehouse Driver Automotive Dimensions: How data will be sliced/ diced (e.g., by location, time period, type of automobile or driver) Claim Information Facts: Central table that contains (usually summarized) information; also contains foreign keys to access each dimension table. Location Time 30

Dimensional Modeling Data cube A two-dimensional, three-dimensional, or higher-dimensional object in which each dimension of the data represents a measure of interest - Grain - Drill-down - Slicing 31

Best Prac&ces for Implemen&ng DW The project must fit with corporate strategy There must be complete buy- in to the project It is important to manage user expectafons The data warehouse must be built incrementally Adaptability must be built in from the start The project must be managed by both IT and business professionals (a business supplier relafonship must be developed) Only load data that have been cleansed/high quality Do not overlook training requirements Be polifcally aware. 32

Risks in Implemen&ng DW No mission or objecfve Quality of source data unknown Skills not in place Inadequate budget Lack of supporfng sobware Source data not understood Weak sponsor Users not computer literate PoliFcal problems or turf wars UnrealisFc user expectafons (ConFnued ) 33

Risks in Implemen&ng DW Cont. Architectural and design risks Scope creep and changing requirements Vendors out of control MulFple planorms Key people leaving the project Loss of the sponsor Too much new technology Having to fix an operafonal system Geographically distributed environment Team geography and language culture 34

Things to Avoid for Successful Implementa&on of DW StarFng with the wrong sponsorship chain Sepng expectafons that you cannot meet Engaging in polifcally naive behavior Loading the warehouse with informafon just because it is available Believing that data warehousing database design is the same as transacfonal DB design Choosing a data warehouse manager who is technology oriented rather than user oriented 35

Real- &me DW (a.k.a. Ac&ve Data Warehousing) Enabling real- Fme data updates for real- Fme analysis and real- Fme decision making is growing rapidly Push vs. Pull (of data) Concerns about real- Fme BI Not all data should be updated confnuously Mismatch of reports generated minutes apart May be cost prohibifve May also be infeasible 36

Evolu&on of DSS & DW 37

Ac&ve Data Warehousing (by Teradata Corpora&on) Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 38

Comparing Tradi&onal and Ac&ve DW 39

Data Warehouse Administra&on Due to its huge size and its intrinsic nature, a DW requires especially strong monitoring in order to sustain its efficiency, producfvity and security. The successful administrafon and management of a data warehouse entails skills and proficiency that go past what is required of a tradifonal database administrator. Requires experfse in high- performance sobware, hardware, and networking technologies 40

DW Scalability and Security Scalability The main issues pertaining to scalability: The amount of data in the warehouse How quickly the warehouse is expected to grow The number of concurrent users The complexity of user queries Good scalability means that queries and other data- access funcfons will grow linearly with the size of the warehouse Security Emphasis on security and privacy 41

Summary DefiniFons and concepts of data warehouses Types of data warehousing architectures Processes used in developing and managing data warehouses Data warehousing operafons Role of data warehouses in decision support Data integrafon and the extracfon, transformafon, and load (ETL) processes Data warehouse administrafon and security issues 42

References Efraim Turban, Ramesh Sharda, Dursun Delen, Decision Support and Business Intelligence Systems, Ninth EdiFon, 2011, Pearson. 43