A Data Warehousing and Business Intelligence Tutorial. Starring Sakila

Similar documents
Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Basics of Dimensional Modeling

When to consider OLAP?

Sterling Business Intelligence

Week 3 lecture slides

Business Intelligence, Analytics & Reporting: Glossary of Terms

Designing a Dimensional Model

DATA WAREHOUSING - OLAP

Data W a Ware r house house and and OLAP II Week 6 1

Monitoring Genebanks using Datamarts based in an Open Source Tool

CHAPTER 4 Data Warehouse Architecture

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

DATA WAREHOUSING AND OLAP TECHNOLOGY

SQL Server 2012 Business Intelligence Boot Camp

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

IST722 Data Warehousing

Business Intelligence: Using Data for More Than Analytics

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

SAS BI Course Content; Introduction to DWH / BI Concepts

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Data Testing on Business Intelligence & Data Warehouse Projects

Breadboard BI. Unlocking ERP Data Using Open Source Tools By Christopher Lavigne

Week 13: Data Warehousing. Warehousing

Pentaho BI Capability Profile

Presented by: Jose Chinchilla, MCITP

Migrating a Discoverer System to Oracle Business Intelligence Enterprise Edition

Business Intelligence for SUPRA. WHITE PAPER Cincom In-depth Analysis and Review

Optimizing Your Data Warehouse Design for Superior Performance

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Part 22. Data Warehousing

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

Data Warehousing Systems: Foundations and Architectures

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

Data Warehousing, OLAP, and Data Mining

Data Warehouse design

OLAP Theory-English version

Dimensional Modeling for Data Warehouse

Open Source Business Intelligence Intro

Data Warehousing and Data Mining

Mario Guarracino. Data warehousing

Learning Objectives. Definition of OLAP Data cubes OLAP operations MDX OLAP servers

A DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT

Fluency With Information Technology CSE100/IMT100

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Implementing a Data Warehouse with Microsoft SQL Server

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Business Intelligence & Product Analytics

(Week 10) A04. Information System for CRM. Electronic Commerce Marketing

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Developing Business Intelligence and Data Visualization Applications with Web Maps

Data Warehousing and OLAP Technology for Knowledge Discovery

Cúram Business Intelligence and Analytics Guide

SAS Business Intelligence Online Training

OLAP and Data Warehousing! Introduction!

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

10 Hard Questions to Make Your Choice of Cloud Analytics Easier

By Makesh Kannaiyan 8/27/2011 1

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Implementing a Data Warehouse with Microsoft SQL Server

FINANCIAL REPORTING WITH BUSINESS ANALYTICS

Dimensional Data Modeling for the Data Warehouse

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Data Warehouse (DW) Maturity Assessment Questionnaire

3/17/2009. Knowledge Management BIKM eclassifier Integrated BIKM Tools

Implementing a Data Warehouse with Microsoft SQL Server

Building an Effective Data Warehouse Architecture James Serra

Sterling Business Intelligence

European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project

Course Outline. Module 1: Introduction to Data Warehousing

Chapter 4 Getting Started with Business Intelligence

Course Code CE609. Lecture : 03. Practical : 01. Course Credit. Tutorial : 00. Total : 04. Course Learning Outcomes

Overview of Data Warehousing and OLAP

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

A Service-oriented Architecture for Business Intelligence

Data Warehousing and OLAP

Implementing a Data Warehouse with Microsoft SQL Server MOC 20463

COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Chapter 5. Learning Objectives. DW Development and ETL

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić

Course 20463:Implementing a Data Warehouse with Microsoft SQL Server

MDM and Data Warehousing Complement Each Other

BUILDING OLAP TOOLS OVER LARGE DATABASES

Transcription:

A Data Warehousing and Business Intelligence Tutorial Starring Sakila

Welcome! Matt Casters Chief Data Integration, Pentaho http://www.ibridge.be/ @mattcasters Roland Bouman Software Engineer, Pentaho http://rpbouman.blogspot.com/ @rolandbouman Starring Sakila

Commercial Open Source Business Intelligence Full BI suite since 2005 Projects: Kettle (DI & ETL), Jfree (Reporting), Mondrian (OLAP), Weka (Data Mining) Community: CDF (Dashboarding), Saiku (OLAP) Recent: Focus on Big Data, esp. Hadoop http://www.pentaho.com http://sourceforge.net/projects/pentaho/ Pentaho

Business Intelligence Data Warehousing Anatomy of a Data Warehouse Physical Implementation Sakila a Star is Born Filling the Data Warehouse Presenting the Data - BI Applications Agenda

Part I: Business Intelligence Starring Sakila

Skills, technologies, applications and practices to acquire a better understanding of the commercial context of your business Turning data into information useful for business users Management Information Decision Support Business Intelligence

months, years: Data mining Should we become an appliance vendor instead of delivering software solutions Executives Strategic OLAP/Analysis weeks, months: In what region should we open a new store? Managers Tactical days, weeks: Who's available for tomorrow's shift Analysts Customers Partners Reporting Employees Operational Business Intelligence Scope

Front end Applications: Reports Charts and Graphs OLAP Pivot tables Data Mining Dashboards Back end Infrastructure Data Integration Data Warehouse Data Mart Metadata (ROLAP) Cube Functional Parts of a Business Intelligence Solution

Sources Back-end Front-end Reporting Operational Datastore ERP Charts /Graphs Enterprise Data Warehouse CRM OLAP/Analysis Staging Area Meta Data External Data Dashboards Datamarts Data Mining Extract Transform Load Present High Level BI Architecture

Part II: Data Warehousing Starring Sakila

A database designed to support Business Intelligence Applications Different requirements as compared to Operational Applications Analytic Database Systems (ADBMS) MySQL: Infobright, InfiniDB LucidDB, MonetDB What is a Data Warehouse?

Ultimately, it's just a Relational Database Designed for Business Intelligence applications Ease of use Performance Data from various source systems Tables, Columns,... Integration, Standardization, Data cleaning Add and maintain history Corporate memory What is a Data Warehouse?

A database designed to support BI applications BI applications (OLAP) differ from Operational applications (OLTP) OLTP: Online Transaction Processing OLAP: Online Analytical Processing Differences: Applications, Data Processing, Data Model What is a Data Warehouse?

OLTP OLAP Operational Tactical, Strategic 'Always' on Periodically Available All kinds of users Managers, Directors Many users Few(er) users Directly supports business process Redesign Business Process Keep a Record of Current status Decision support, long-term planning Maintains a history OLTP vs OLAP: Application Characterization

OLTP OLAP Transactions Groups Subject Oriented Aspect Oriented Add, Modify, Remove single rows Bulk load, rarely modify, never remove Human data entry Automated ETL jobs Queries for small sets of rows with all their details Scan large sets to return aggregates over arbitrary groups Standard queries Ad-hoc queries OLTP vs OLAP: Data Processing

OLTP OLAP Entity-Relationship model Dimensional model Entities, Attributes, Relationships Facts, Dimensions, Hierarchies Foreign key constraints Ref. integrity ensured in loading process Indexes to increase performance Scans on Fact table obliterates indexes Normalized to 3NF or BCNF Denormalized Dimensions (<= 1NF) OLTP vs OLAP: Data Model

Sources Back-end Front-end Reporting Operational Datastore ERP Charts /Graphs Enterprise Data Warehouse CRM OLAP/Analysis Staging Area Meta Data External Data Dashboards Datamarts Data Mining Extract Transform Load Present High Level BI Architecture

Part III: Dimensional Model Starring Sakila

An aspect-oriented logical data model optimized for querying and data presentation Divides data in two kinds: Facts Dimensions What is the Dimensional Model?

Facts Measures/Metrics of a Business Process Examples: Cost, Units Sold, Profit Dimensions Context of Business Process Who? What? Where? When? Why? Navigate Facts: Selection, Rollup, Drilldown Provide and maintain history The Dimensional Model

Date Dimension Location Dimension All locations America All America North South Europe All Europe East West 2008 Q4 All Months $ 3850 $ 2050 October November December $ 1000 $ 500 $ 1350 $ 750 $ 1500 $ 800 $ 1275 $ 775 $ 1800 $ 300 $ 200 $ 500 $ 500 $ 250 $ 600 $ 475 $ 325 $ 700 $ 800 $ 1000 $ 250 $ 250 $ 250 $ 350 $ 300 $ 400 Dimensional Data Presentation

Fact table structure: Several measures Keys to dimension tables Measures: Usually numeric, Additive, Semi-additive Sometimes pre-calculated Rapidly growing! Millions, Billions of rows (Terabytes) The Dimensional Model: Facts

Dimension table structure: Surrogate key and descriptive text attributes Relatively few rows Exception: Customer 'Monster' dimension Relatively static Exception: Slowly changing dimensions Used to navigate through fact data Hierarchies The Dimensional Model: Dimensions

Selection (Filter) Navigation: Attributes organized in Hierarchies Date dimension examples: Year, Quarter, Month, Day Year, Week, Day Groupings for Aggregation 'Roll up', 'Drill Down' 'Slice and Dice' The Dimensional Model: Navigating data with Dimensions

Fact table usually links to a date dimension Dimensions maintain their own history Slowly changing dimensions Type I Overwrite (no history) Type II History kept in rows (versioning) Type III History kept in columns The Dimensional Model: Maintaining History

Part V: Physical Implementation Starring Sakila

Related metrics stored in a Fact table Fact table references relevant dimensions Each Dimension stored in a Dimension Table Dimension tables shared by multiple fact tables Dimensional Model Implementation: Star Schema

Store Date Film Time Rentals Staff Customer Star Schema example: Sakila Rentals

dim_film dim_date fact_inventory fact_rental fact_payment dim_staff dim_customer dim_store dim_store Star Schema example: Sakila Rentals

Star schema is 'just' an implementation Optimized for simplicity Optimized for performance (?) Heavily denormalized dimensions There is an alternative: Snowflake Normalized dimensions Stars versus Snowflakes

Country Year City Quarter Month Store Date Week Language Film Rating Rentals Minute Customer Hour Country Staff City Snow Flake example: Sakila Rentals

Part V: A Star is Born Starring Sakila

MySQL Sample Database DVD rental business http://dev.mysql.com/doc/sakila/en/sakila.html Overly simplified database schema Typical OLTP database Dimensional Model example

Category Language Country Actor Film City Store Inventory Address Staff Rental Customer 3NF Source schema: Sakila Rentals

Who? Staff Customer When? What? Film Date Fact: Rentals Time Store Where? Target Star Schema

Select Business Process Sales, Purchase, Storage,... Define Facts and Key Metrics Facts: Key Event in Business Process Metrics (Fact Attributes): Count or Amount Choose Dimensions and Hierarchies What? When? Where? Who? Why? Dimensional Design

Select Business Process Rentals Identify Facts Count (number of rentals) Rental Duration Choose Dimensions What: Films Who: Customer, Staff When: Rental, Return Where: Store Example Business Process: Rentals

Inventory Staff Customer Rental A star is born: Rentals 3NF

Category Film Category Store Film Inventory Staff Customer Rental A star is born: Rentals 3NF

Category Film Category Store Film Inventory Staff Customer Rental A star is born: Denormalize

Store Film Category Store Staff Customer Rental A star is born: Denormalize

Address Store Film Category Store Staff Rental A star is born Customer

Address Store Film Category Store Staff Customer Rental A star is born: Denormalize

Language Film Category Address Store Address Store Staff Customer Rental A star is born: Denormalize

City City Language Film Category Address Store Address Store Staff Customer Rental A star is born: Denormalize

Country Country City City Language Film Category Address Store Address Store Staff Customer Rental A star is born: Rental Snowflake

What: Film Where: Store Who: Staff Who: Customer Language Film Country Store Country City Address Staff City Address Category Store Customer Rental A star is born: Rental Star Schema

Something is missing... Who? (Customer, Staff) What? (Film) Where? (Store)...? Dimensional Design

What: Film Where: Store Who: Staff Who: Customer Rental When: Date When: Time A star is born: Rental Date and Time

What: Film Where: Store Who: Staff Who: Customer Rental When: Rental Date When: Rental Time When: Return Date When: Return Time Role Playing: Date/Time for both Rentals and Returns

Rental Star Schema

Part IV: Filling the Data Warehouse Starring Sakila

Sources Back-end Front-end BI Applications Data Warehouse ERP Meta Data Staging Area CRM Datamarts External Data Extract Transform Load Reporting Analysis Visualizations Dashboards Data Mining Present High Level Data Warehouse Architecture

Physical Design Source to Target Mapping Define how data in the data warehouse is derived from data in the source system(s) Specification for designing the ETL process Column-level mapping Source system, schema, table, column, data type Target dimension/fact, column, defaults Transformation rules, cleansing, lookup, calculation Planning the ETL Process

Staging? Changed Data Capture / Extraction Denormalization Derived data / Enrichment Cleansing / Conforming History policy (dimensions) Granularity Dimension Lookup (facts) Designing the ETL Process

Flow ETL Engine Transformations Jobs Data flow and processing Workflow of ETL tasks Tools Spoon Kitchen Pan Designing ETL with Kettle

Load Dimension Tables Load Fact table Loading a Fact Table

Get Customers source data Lookup Address (Denormalize) Update Dimension Loading a Dimension Table

Loading a Fact Table

Part V: Presenting the Data: BI Applications Starring Sakila

months, years: Data mining Should we become an appliance vendor instead of delivering software solutions Executives Strategic OLAP/Analysis weeks, months: In what region should we open a new store? Managers Tactical days, weeks: Who's available for tomorrow's shift Analysts Customers Partners Reporting Employees Operational Business Intelligence Scope

Reporting

Reporting Mostly Operational Lists and Grouping Typically standardized Typically no or limited interactivity Subreporting

months, years: Data mining Should we become an appliance vendor instead of delivering software solutions Executives Strategic OLAP/Analysis weeks, months: In what region should we open a new store? Managers Tactical days, weeks: Who's available for tomorrow's shift Analysts Customers Partners Reporting Employees Operational Scope of Reporting

Reporting

Analysis

Analysis Tactical, Strategic OLAP Online Analytical Processing Pivot tables Typically Interactive Slice and Dice Drilldown Typically Ad-hoc

months, years: Data mining Should we become an appliance vendor instead of delivering software solutions Executives Strategic OLAP/Analysis weeks, months: In what region should we open a new store? Managers Analysts Tactical days, weeks: Who's available for tomorrow's shift Customers Partners Reporting Employees Operational Scope of OLAP & Analysis

Analysis Interactive Pivot table

Data Mining

Data Mining Strategic, Tactical Discover hidden patterns in data Machine learning Statistic analysis Typically not interactive, long running Expert matter Not readily consumable by end-users Characteristics of back-end processing

months, years: Data mining Should we become an appliance vendor instead of delivering software solutions Executives Strategic OLAP/Analysis weeks, months: In what region should we open a new store? Managers Tactical days, weeks: Who's available for tomorrow's shift Analysts Customers Partners Reporting Employees Operational Scope of Data Mining

Data Mining

Charts and Graphs

Charts and Graphs Operational, Tactical, Strategic Summarize large dataset Not a separate class but a presentation Data Visualization Standardized or ad-hoc Can be interactive Drive a subreport Drive drilldown

Dashboarding

Dashboarding Operational, Tactical, Strategic Not a separate class but a presentation Bundle: key metrics for a particular role or perspective different views on the same metrics Can contain reports, pivot tables, charts, graphs Typically interactive

Dashboard