Data Warehousing with Oracle

Data Warehousing with Oracle: Comprehensive Concepts Overview, Insight, Recommendations, Best Practices and a whole lot more. By Tariq Farooq. A BrainSurface Presentation.

What is a Data Warehouse? Designed and built for Performance, Performance & Performance! Relational database designed for query and analysis instead of transaction processing. Typically contains historical data. Consolidates data from various sources. Separates Analysis Workload from Transaction Workload. Has an ETL (Extraction, Transformation & Loading) engine.

What is a Data Warehouse? Designed for OLAP, Data Mining, Client Analysis Tools and other Data Gathering / Analysis applications. Good Design includes a Staging Area for receiving / transforming data from various transactional processing / feeder systems. Has Metadata, Raw Data, Mining Data & Summary Data. May contain Data Marts (Categorized subsets of a data warehouse).

Characteristics of a Data Warehouse. According to William Inmon, a Data Warehouse has the following main characteristics: Subject-Oriented, Integrated, Non-Volatile, Time-Variant.

Comparison of OLTP and Data Warehouse Systems. [Figure: Comparison of OLTP & Data Warehouse Systems, from Oracle Documentation.] Data Warehouse Design: Architected and designed for ad-hoc queries and analysis. DML: Gets data from Transactional Processing / Feeder systems (mostly in bulk loads). Schemas are de-normalized. Large volumes of data (thousands, millions, billions of records) are read for analysis. Contains Archived/Historical Data.

Data Warehouse Architecture. [Figure: Data Warehouse Architecture with a Staging Area & Data Marts, from Oracle Documentation.] Transactional Processing Systems feed data to the Data Warehouse. Staging Area is used to Cleanse and Process (Transform) your data. Pre-computed Aggregates are stored in Summary Data. Data Warehouse can contain Data Marts which are separated and categorized subsets of a Data Warehouse.

Benefits of Data Warehouses Massive Information Storage & Retrieval for reporting, analysis & decision-making. Consolidation of Information from various (similar and disparate) sources. Archival of historical Enterprise data. Enterprise-wide (Company-as-a-whole) Information is available for making well-informed, intelligent and time-sensitive decisions: Therefore, Data Warehouses are also called Decision Support Systems (DSS).

Information Retrieval from Data Warehouses: Reporting, Data Mining, Data Visualization, Online Analytical Processing (OLAP).

Designing a Data Warehouse: Logical Design; Physical Design.

Data Warehouse: Logical Design. Identify Business Requirements. Create a Conceptual Design. Use Entity-Relationship Diagramming.

Data Warehouse: Logical Design. Translate and map your design into dimensional data warehouse schemas: Star Schemas, Snowflake Schemas.

Data Warehouse: Logical Design: What is a Star Schema? Looks like a star. [Figure: Example of a Star Schema, from Oracle Documentation.] Most commonly used dimensional model in Data Warehouses. Recommended for its simplicity, ease of use and, most importantly, performance. Entity relationships are one level deep only.

Data Warehouse: Logical Design: What is a Star Schema? A Star Schema consists of Fact and Dimension tables. [Figure from Oracle Documentation.] Has one or more fact tables. Each fact table has one or more Dimension tables joined to it. Fact tables are huge. Dimension tables are small. Referential Integrity constraints are defined between Fact and Dimension tables.
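
The fact/dimension layout can be sketched with a small runnable example. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table contents (two products, two years, three sales) are invented for the demonstration. The table names loosely follow the sales/products/times naming used later in the deck.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the slides target Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Dimension tables: small, descriptive.
    CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE times    (time_id INTEGER PRIMARY KEY, calendar_year INTEGER);
    -- Fact table: one row per sale, with a foreign key to each dimension.
    CREATE TABLE sales (
        prod_id     INTEGER REFERENCES products(prod_id),
        time_id     INTEGER REFERENCES times(time_id),
        amount_sold REAL
    );
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO times    VALUES (10, 2009), (11, 2010);
    INSERT INTO sales    VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 11, 3.0);
""")

# A typical star query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT p.prod_name, t.calendar_year, SUM(s.amount_sold)
    FROM sales s
    JOIN products p ON s.prod_id = p.prod_id
    JOIN times t    ON s.time_id = t.time_id
    GROUP BY p.prod_name, t.calendar_year
    ORDER BY p.prod_name, t.calendar_year
""").fetchall()
print(rows)
```

Every join goes directly from the fact table to a dimension table, which is the "one level deep" property that distinguishes a star schema from a snowflake schema.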

Data Warehouse: Logical Design: Dimensions. [Figure: Example of a Dimension Hierarchy, from Oracle Documentation.] Dimensions categorize data. Dimensions have one or more hierarchies. Hierarchies are ordered levels of categorizing data within a dimension. Hierarchies can be used for data aggregation at different levels of granularity.

Data Warehouse: Physical Design. [Figure: Logical Design vs. Physical Design, from Oracle Documentation.] Translate and map the Logical Design into the Physical Design: Tablespaces, Tables, Partitions on Tables, Indexes, Materialized Views, Integrity Constraints, Dimensions.
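
To make the idea of aggregating along a hierarchy concrete, here is a minimal sketch in plain Python. The day/month/year hierarchy, the `TIME_DIM` mapping and the sales figures are all hypothetical; a real warehouse would hold them in dimension and fact tables.

```python
from collections import defaultdict

# Hypothetical time dimension: each leaf (day) maps to its ancestors in the hierarchy.
TIME_DIM = {
    "2010-01-15": {"month": "2010-01", "year": "2010"},
    "2010-01-20": {"month": "2010-01", "year": "2010"},
    "2010-02-03": {"month": "2010-02", "year": "2010"},
}
# Fact rows at the finest grain: (day, amount_sold).
SALES = [("2010-01-15", 10.0), ("2010-01-20", 5.0), ("2010-02-03", 7.0)]

def aggregate(level):
    """Sum amounts at the requested hierarchy level ('day', 'month' or 'year')."""
    totals = defaultdict(float)
    for day, amount in SALES:
        key = day if level == "day" else TIME_DIM[day][level]
        totals[key] += amount
    return dict(totals)

print(aggregate("month"))
print(aggregate("year"))
```

The same fact rows answer questions at each level of granularity; only the grouping key changes as the query moves up the hierarchy.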

Data Warehouse: Physical Design: Systems Planning. Recommendation: Plan for high I/O bandwidth. Recommendation: Use parallelization across Database Instances (Oracle Real Application Clusters) to provide load balancing and automatic failover for business continuity. Recommendation: Stripe and Mirror Everything (SAME). Storage Capacity Planning.

Data Warehouse: Physical Design: Partitions Recommendation: Partition your large tables into smaller manageable chunks. Partition Pruning helps queries scan for data in relevant partitions thereby reducing overall logical cost significantly. Recommendation: Composite Partitioning combines partitioning methods for faster performance e.g. Range-List, Range-Hash.
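
Partition pruning can be illustrated with a toy model in plain Python. This is a conceptual sketch, not how Oracle implements it: the quarterly range partitions, the `PARTITIONS` dictionary and the helper names are assumptions made for the example.

```python
from datetime import date

# Hypothetical range partitions keyed by (lower bound inclusive, upper bound exclusive).
PARTITIONS = {
    (date(2010, 1, 1), date(2010, 4, 1)): [(date(2010, 2, 14), 20.0)],
    (date(2010, 4, 1), date(2010, 7, 1)): [(date(2010, 5, 1), 35.0), (date(2010, 6, 30), 5.0)],
    (date(2010, 7, 1), date(2010, 10, 1)): [(date(2010, 8, 9), 12.0)],
}

def prune(start, end):
    """Return only the partitions whose range overlaps [start, end)."""
    return [bounds for bounds in PARTITIONS if bounds[0] < end and start < bounds[1]]

def total_sales(start, end):
    """Scan just the surviving partitions and sum rows inside [start, end)."""
    scanned = prune(start, end)
    total = sum(amount for bounds in scanned
                for day, amount in PARTITIONS[bounds] if start <= day < end)
    return total, len(scanned)

total, partitions_scanned = total_sales(date(2010, 4, 1), date(2010, 7, 1))
print(total, partitions_scanned)
```

A query restricted to the second quarter touches one partition out of three; the other two are eliminated from their bounds alone, without reading any of their rows.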

Data Warehouse: Physical Design: Indexes Recommendation: Use Bitmap Indexes in Data Warehouses. Bitmap Indexes are meant for columns with low cardinality. Bitmap Indexes significantly improve performance for ad-hoc queries. Recommendation: Bitmap Indexes on partitioned tables must be local.
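
Why bitmap indexes suit low-cardinality columns and ad-hoc filters can be seen in a small model. The sketch below is a simplification in plain Python (one integer per distinct value, one bit per row); the rows and column names are hypothetical, and real bitmap indexes are compressed and maintained by the database.

```python
# Hypothetical rows with two low-cardinality columns.
ROWS = [
    {"region": "EAST", "channel": "WEB"},
    {"region": "WEST", "channel": "STORE"},
    {"region": "EAST", "channel": "STORE"},
    {"region": "EAST", "channel": "WEB"},
]

def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index

region_idx = build_bitmap_index(ROWS, "region")
channel_idx = build_bitmap_index(ROWS, "channel")

# WHERE region = 'EAST' AND channel = 'WEB' becomes a single bitwise AND.
hits = region_idx["EAST"] & channel_idx["WEB"]
matching_rows = [i for i in range(len(ROWS)) if (hits >> i) & 1]
print(matching_rows)
```

Because each column has only a few distinct values, the index holds only a few bitmaps, and arbitrary combinations of predicates reduce to cheap bitwise operations.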

Data Warehouse: Physical Design: Integrity Constraints. Recommendation: Avoid the performance overhead of maintaining / enforcing constraints in Data Warehouses. Recommendation: Unique Constraints: Use the DISABLE VALIDATE clause. Recommendation: Foreign Key Constraints: Use the ENABLE NOVALIDATE clause. Recommendation: Because Transactional Processing Systems and ETL Processes regularly validate data integrity, the RELY (Belief / Non-Enforceable) clause can be used as an alternative to the above, e.g. RELY DISABLE NOVALIDATE.

Data Warehouse: Physical Design: Materialized Views. Materialized views are used to store precomputed summaries / aggregates of data in Data Warehouses. Recommendation: Create fast-refreshable Materialized Views using Materialized View Logs. Materialized Views make query rewrite possible, which can yield large performance gains for summary queries.
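
The idea behind fast refresh from a log can be sketched in plain Python. This is a toy model under stated assumptions, not Oracle's mechanism: the `SummaryWithLog` class and its methods are hypothetical names, and only inserts into a SUM aggregate are handled.

```python
from collections import defaultdict

class SummaryWithLog:
    """Toy stand-in for a materialized view kept current from a change log."""

    def __init__(self):
        self.detail = []                    # base (detail) table rows
        self.log = []                       # changes recorded since the last refresh
        self.summary = defaultdict(float)   # precomputed SUM(amount) per product

    def insert(self, product, amount):
        # Every change to the detail table is also recorded in the log.
        self.detail.append((product, amount))
        self.log.append((product, amount))

    def fast_refresh(self):
        """Apply only the logged deltas instead of recomputing from all detail rows."""
        applied = len(self.log)
        for product, amount in self.log:
            self.summary[product] += amount
        self.log.clear()
        return applied

mv = SummaryWithLog()
mv.insert("Widget", 5.0)
mv.insert("Gadget", 3.0)
mv.fast_refresh()
mv.insert("Widget", 7.0)
rows_applied = mv.fast_refresh()   # touches one logged row, not all three detail rows
print(dict(mv.summary), rows_applied)
```

The second refresh processes a single logged change, which is the point of a fast refresh: its cost follows the volume of changes rather than the size of the detail table.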

Data Warehouse: Physical Design: Materialized Views. Recommendation: Define Materialized Views and Dimensions. Recommendation: Enable/deploy Fast Refresh of Materialized Views (wherever possible). ALTER SYSTEM SET QUERY_REWRITE_ENABLED = TRUE. [Figure: Summary Management using Materialized Views, from Oracle Documentation.]

Data Warehouse: Physical Design: Materialized Views. [Figure: Transparent Query Rewrite using Materialized Views, from Oracle Documentation.] Transparent Query Rewrite dramatically increases query speed. The Query Optimizer automatically re-routes the query to the materialized view for better performance. Materialized Views can be partitioned for performance.
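
The re-routing step can be pictured with a deliberately simplified sketch in plain Python. It only handles the case where a stored summary matches the requested grouping exactly; the data, the `SUMMARIES` registry and the function name are hypothetical, and a real optimizer considers far more rewrite cases and compares costs.

```python
# Hypothetical detail rows: (product, year, amount_sold).
DETAIL = [("Widget", 2009, 5.0), ("Widget", 2010, 7.0), ("Gadget", 2010, 3.0)]
COLUMNS = {"product": 0, "year": 1}

# Precomputed summaries, keyed by the GROUP BY columns they can answer.
SUMMARIES = {("product",): {("Widget",): 12.0, ("Gadget",): 3.0}}

def sum_amount_by(group_by):
    """Return (totals, source), preferring a stored summary over a detail scan."""
    if group_by in SUMMARIES:
        return SUMMARIES[group_by], "summary"
    totals = {}
    for row in DETAIL:
        key = tuple(row[COLUMNS[col]] for col in group_by)
        totals[key] = totals.get(key, 0.0) + row[2]
    return totals, "detail"

by_product, source_a = sum_amount_by(("product",))
by_year, source_b = sum_amount_by(("year",))
print(source_a, by_product)
print(source_b, by_year)
```

The caller asks the same kind of question both times and never names the summary; whether the answer comes from precomputed data or from a scan is decided behind the interface, which is what "transparent" means here.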

Data Warehouse: Physical Design: Partition Change Tracking (PCT) Partition Change Tracking identifies which rows in a materialized view are affected by a detail table partition. Partition Change Tracking enables PCT refresh on Materialized Views for tracking a finer grain of freshness. DBMS_MVIEW.PMARKER function significantly reduces the cardinality of the materialized view.

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL). Data extraction from various (homogeneous/heterogeneous) sources. Physical transportation of potentially large volumes of data from source systems to the target warehouse. Transformation and loading of data into the target warehouse.

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL). Logical extraction types: Full Extraction, Incremental Extraction. Physical extraction methods: Online Extraction, Offline Extraction.

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL). Online Extraction: direct connection to the source system; snapshot logs, change tables etc. Offline Extraction: dump/flat files, redo/archive logs, external tables, transportable tablespaces.

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL). Incremental Extraction: Change Data Capture. Tremendously efficient. Trickle-feed data acquisition & delivery. Enables near-real-time Data Warehousing. Can be implemented by the following methods: Triggers, Timestamps, Partitioning.
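
Of the methods listed, the timestamp approach is the simplest to sketch. The example below uses Python's sqlite3 module as a stand-in source system; the `orders` table, its `last_modified` column and the `extract_changes` helper are assumptions made for illustration.

```python
import sqlite3

# Stand-in source system with a last-modified timestamp on every row.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT);
    INSERT INTO orders VALUES
        (1, 10.0, '2010-01-01 09:00:00'),
        (2, 20.0, '2010-01-02 09:00:00'),
        (3, 30.0, '2010-01-03 09:00:00');
""")

def extract_changes(conn, since):
    """Return rows modified after `since`, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT order_id, amount, last_modified FROM orders "
        "WHERE last_modified > ? ORDER BY last_modified",
        (since,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else since
    return rows, new_mark

changes, mark = extract_changes(src, "2010-01-01 12:00:00")
print(changes, mark)
```

Each run remembers the returned high-water mark and passes it to the next run, so only rows changed in between are extracted. The approach depends on the source reliably updating the timestamp column, and it cannot see deleted rows.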

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL). Transportation Mechanisms: Flat Files, Distributed Query Operations, Transportable Tablespaces. Copy generated data using periodically run batch processes.

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL) Loading & Transformation: Complex & Costly. Can range from simple data conversions to highly complex methods. Loading Mechanisms: SQL*Loader. External Tables. Transportable Tablespaces. Custom Code/Scripts/APIs.

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL). [Figure: Flow of a Pipelined Data Transformation, from Oracle Documentation.]

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL). [Figure: Flow of a Pipelined Data Transformation with Fanout, from Oracle Documentation.]

Data Warehouse: Physical Design: Extraction, Transformation & Loading (ETL). Error Handling: Business Rule Violations; Data Rule Violations. Filtering of Erroneous Data: Identification, Separation, Error Tables.
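
Identification and separation of bad rows might look like the following sketch in plain Python. The two rules shown (a numeric amount as the data rule, a positive quantity as the business rule) and the field names are hypothetical examples, not rules taken from the presentation.

```python
def split_valid_and_errors(rows):
    """Separate rows that pass the checks from rows destined for an error table."""
    valid, errors = [], []
    for row in rows:
        problems = []
        # Data rule: amount must be a number.
        if not isinstance(row.get("amount"), (int, float)):
            problems.append("amount is not numeric")
        # Business rule (hypothetical): quantity sold must be a positive integer.
        if not isinstance(row.get("quantity"), int) or row["quantity"] <= 0:
            problems.append("quantity must be a positive integer")
        if problems:
            errors.append({"row": row, "reasons": problems})
        else:
            valid.append(row)
    return valid, errors

staged = [
    {"amount": 9.5, "quantity": 2},
    {"amount": "n/a", "quantity": 1},
    {"amount": 4.0, "quantity": 0},
]
valid, errors = split_valid_and_errors(staged)
print(len(valid), [e["reasons"] for e in errors])
```

Rejected rows keep their original values together with the reasons for rejection, so they can be loaded into an error table and corrected later instead of silently disappearing from the load.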

Change Data Capture Prior Techniques: Table Differencing (MINUS Operators). Change-Value Selection. The new wave: Change Data Capture Synchronous CDC (Trigger-based). Asynchronous CDC (Log-based).
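
The difference between table differencing and trigger-based (synchronous) capture can be shown side by side. The sketch below uses Python's sqlite3 module as a stand-in: SQLite's EXCEPT plays the role of Oracle's MINUS, and the hand-written triggers and `customers_changes` table only imitate the idea of a change table. They are not the DBMS_CDC packages, and all names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, city TEXT);
    -- Change table populated in the same transaction as the source change.
    CREATE TABLE customers_changes (op TEXT, cust_id INTEGER, city TEXT);

    CREATE TRIGGER customers_ins AFTER INSERT ON customers
    BEGIN
        INSERT INTO customers_changes VALUES ('I', NEW.cust_id, NEW.city);
    END;
    CREATE TRIGGER customers_upd AFTER UPDATE ON customers
    BEGIN
        INSERT INTO customers_changes VALUES ('U', NEW.cust_id, NEW.city);
    END;

    INSERT INTO customers VALUES (1, 'Columbus');
    -- Snapshot taken before further changes, used for table differencing.
    CREATE TABLE customers_snapshot AS SELECT * FROM customers;
    INSERT INTO customers VALUES (2, 'Dayton');
    UPDATE customers SET city = 'Akron' WHERE cust_id = 1;
""")

# Older technique: difference the current table against the previous snapshot.
diff = conn.execute("""
    SELECT cust_id, city FROM customers
    EXCEPT
    SELECT cust_id, city FROM customers_snapshot
    ORDER BY cust_id
""").fetchall()

# Trigger-based capture: read the change table in the order changes happened.
captured = conn.execute(
    "SELECT op, cust_id, city FROM customers_changes ORDER BY rowid"
).fetchall()
print(diff)
print(captured)
```

Differencing has to compare two full copies of the table and only shows the net result, while the change table records each operation as it happens, at the price of extra work inside every source transaction.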

Change Data Capture: The Holy Grail of Data Warehousing. Near Real-Time Data Warehousing (Trickle-Feed Data Acquisition & Delivery). Publish & Subscribe Model: DBMS_CDC_PUBLISH, DBMS_CDC_SUBSCRIBE. Advanced Queues. Oracle Streams.

Change Data Capture. [Figure: Example of Publishers in a CDC System, from Oracle Documentation.]

Change Data Capture. [Figure: Example of a Subscriber in a CDC System, from Oracle Documentation.]

Change Data Capture. [Figure: Example of Synchronous CDC, from Oracle Documentation.]

Change Data Capture. [Figure: Example of Asynchronous CDC, from Oracle Documentation.]

Change Data Capture. [Figure: Example of Asynchronous Autolog Online CDC, from Oracle Documentation.]

Change Data Capture. [Figure: Example of Asynchronous Distributed Hotlog CDC, from Oracle Documentation.]

Change Data Capture. [Figure: Configuration of Streams Propagation used to achieve Asynchronous Distributed Hotlog CDC, from Oracle Documentation.]

Performance: Materialized Views & Query Rewrites. [Figure: Steps involved in a Query Rewrite Process, from Oracle Documentation.]

Performance: Materialized Views & Query Rewrites. Materialized Views are precomputed tables that store the results of summary or join queries. EXAMPLE:
    CREATE MATERIALIZED VIEW join_sales_time_product_mv
    ENABLE QUERY REWRITE AS
    SELECT p.prod_id, p.prod_name, t.time_id, t.week_ending_day,
           s.channel_id, s.promo_id, s.cust_id, s.amount_sold
    FROM sales s, products p, times t
    WHERE s.time_id = t.time_id
      AND s.prod_id = p.prod_id;

Performance: Materialized Views & Query Rewrites. Query Rewrite Types: Text Match Rewrite; Join Back; Aggregate Computability; Aggregate Rollup; Rollup Using a Dimension; Filtering the Data (Subset of Data); PCT Rewrite; Multiple Materialized Views.

Performance: Common Schema Models. Relational Schemas (OLTP Systems): 1st Normal Form (1NF) to 6th Normal Form (6NF). 3NF is mainstream and widely used: every non-prime attribute is non-transitively dependent on every key of the table ("the key, the whole key, and nothing but the key"). Dimensional Schemas (Data Warehouses): Star Schemas (highly recommended), Snowflake Schemas.

Performance: Common Schema Models. OLTP Schema Model: 3rd Normal Form (3NF). [Figure: Example of a small 3NF Schema, from Oracle Documentation.]

Performance: Materialized Views & Query Rewrites. Dimensional Schema Model: Star Schema. [Figure: Example of a Star Schema, from Oracle Documentation.]

Performance: Materialized Views & Query Rewrites. Dimensional Schema Model: Snowflake Schema. [Figure: Example of a Snowflake Schema, from Oracle Documentation.]

Performance: Aggregation. Cubes, Rollups, Grouping Functions, Composite Columns.
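
How rollups and cubes differ in the groupings they produce can be sketched in plain Python. This models the semantics of SQL's ROLLUP and CUBE grouping extensions on a tiny invented data set; the helper names are hypothetical, and `None` stands in for the NULL that marks a rolled-up column.

```python
from itertools import combinations

# Hypothetical fact rows.
ROWS = [
    {"channel": "WEB", "country": "US", "amount": 10.0},
    {"channel": "WEB", "country": "UK", "amount": 4.0},
    {"channel": "STORE", "country": "US", "amount": 6.0},
]

def rollup_groupings(cols):
    """ROLLUP(a, b) groups by (a, b), (a), and (): n + 1 groupings."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]

def cube_groupings(cols):
    """CUBE(a, b) groups by every subset of the columns: 2**n groupings."""
    return [combo for n in range(len(cols), -1, -1) for combo in combinations(cols, n)]

def aggregate(rows, groupings):
    """Sum `amount` for each grouping; None marks a column that was rolled up."""
    cols = groupings[0]  # the first grouping lists every column
    result = {}
    for grouping in groupings:
        for row in rows:
            key = tuple(row[c] if c in grouping else None for c in cols)
            result[key] = result.get(key, 0.0) + row["amount"]
    return result

rollup = aggregate(ROWS, rollup_groupings(["channel", "country"]))
cube = aggregate(ROWS, cube_groupings(["channel", "country"]))
print(rollup[("WEB", None)], rollup[(None, None)], cube[(None, "US")])
```

The rollup yields subtotals per channel plus a grand total, while the cube additionally yields subtotals per country, which the rollup never computes because it only drops columns from the right.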

Performance: Aggregation. [Figure: Example of Logical Cubes & Views, from Oracle Documentation.]

Tools: Oracle Data Integrator. [Figure: Oracle Data Integrator Overview, from Oracle Documentation.]

Tools: Oracle Warehouse Builder. Computer Aided Software Engineering (CASE) tool. Rich Oracle-centric Data Warehouse building feature set: Schema Modeling, Mapping, Data Quality, Data Profiling.

Tools: Oracle Warehouse Builder. CASE Tool: Practical Usage: Design, build & deploy Schema Structures (e.g. Tables, Indexes, External Tables), Mappings, Transformations, Dimensions, Cubes, and Queues (11g Rel. 2; Trickle-feed Data Acquisition & Delivery).

Tools: Comparison of Oracle Data Integrator (ODI) with Oracle Warehouse Builder (OWB). The tools complement each other, although there is some overlapping functionality. Oracle's product direction is to merge the tools into a single consolidated offering in the near future, bringing the best of both worlds into one product. For now, both tools are separately supported.

Project Planning: Approach: Bill Inmon vs. Ralph Kimball. Bill Inmon's Model: Top-Down Design: The data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form. Ralph Kimball's Model: Bottom-Up Design (Recommended): The data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

Project Planning: Phased Implementation. Business Requirements: Scope / Users / Initiatives; Risk Assessment. Key Requirements / Conceptual Design: Dimensional Modeling (Star Schemas), Reporting, OLAP, Data Mining, Data Marts. Technical Implementation: Future-proofing; Infrastructure (Hardware/Software); Human Resources; Technical Execution.

Physical Implementation: Software Infrastructure: RDBMS. It's all about blazing performance! Recommendation: Oracle 11g Rel. 2 Enterprise Edition with OLAP, XML DB & OWB options. Recommendation: >= 2-node Real Application Clusters configuration. Recommendation: Data Guard with Data Guard Broker for Business Continuity / Automatic Failover.

Physical Implementation: Software Infrastructure: RDBMS: Recommendations. It's all about blazing performance! >= 16KB Block Size. QUERY_REWRITE_ENABLED = TRUE. STAR_TRANSFORMATION_ENABLED = TRUE.

Physical Implementation: Hardware Infrastructure: Network: Recommendations. It's all about blazing performance! Network Backbone Infrastructure: >= GigE (1 gigabit per second). High-Speed SAN Infrastructure: >= 8Gb/s (8 gigabits per second) backbone: Fibre Channel, InfiniBand etc. High-Speed SAN Drives: Fibre Channel, Serial Attached SCSI (SAS) drives etc. SAN Mirroring: RAID 1+0 (10) for PROD. SAN Mirroring: RAID 5 for DEV, QA, TEST.

Physical Implementation: Hardware Infrastructure: Servers: Recommendations. It's all about blazing performance! Clustered low-cost commodity hardware. >= Dual-socket quad-core processors with Hyper-Threading. >= 64GB RAM. >= 2.5 GHz processor clock speeds. Dual HBAs / HCAs >= 8Gb/s.

Summary. To summarize, data warehousing in Oracle is proven, stable and fast, and is used by corporations and organizations worldwide to meet their BI/warehousing needs. Learn more at Oracle's Data Warehousing homepage: http://www.oracle.com/us/solutions/datawarehousing/index.htm