Oracle Data Warehousing Masterclass Mark Rittman, Director, Rittman Mead Consulting UKOUG Conference & Exhibition 2008 1
Who Am I? Oracle BI&W Architecture and Development Specialist Co-Founder of Rittman Mead Consulting Oracle BI&W Project Delivery Specialists 10+ years with Discoverer, OWB etc Oracle ACE Director, ACE of the Year 2005 Writer for OTN and Oracle Magazine Longest-running Oracle blog http://www.rittmanmead.com/blog Chair of UKOUG BIRT SIG Co-Chair of ODTUG BI&DW SIG Speaker at IOUG and BIWA events 2
Rittman Mead Consulting Oracle BI&DW Project Specialists providing consulting, training and support Clients in the UK, USA, Europe, Middle-East Voted UKOUG BI Partner of the Year 2008 Exhibitors at UKOUG 2008, Stand 90 Come and see us to discuss your requirements in more detail 3
So What Is A Data Warehouse? Store of historical information optimized for running queries Follows common design patterns: Inmon or Kimball Subject orientated, integrated, non-volatile and time-variant but there can be exceptions! High volumes of data Tuned and optimized for queries 4
Database Key Features for Data Warehousing Partitioning Parallel Query & DML Segment Compression Bitmap Indexes Materialized Views Star Transformations OLAP Option Data Mining Option Oracle Warehouse Builder Oracle Warehouse Builder Core Functionality 5
New Features in Oracle Database 11g and 10gR2 More partitioning options Advanced Compression Embedding of Oracle Warehouse Builder Cube Organized Materialized Views ADDM, AWR, ASH and the OEM Performance Views and Advisors Just Announced at Oracle Open World 2008 : Exadata Storage Server 6
Data Warehouses Are Often Very Large Data Warehouse databases are often very large Usually >100Gb, often >10Tb, now >1 Petabyte is not unknown Large databases require special handling Because they take a long time to query Because they take a long time to backup Because they take up a lot of disk space Because they take a long time to load with data 7
Oracle s Support for Very Large Databases (VLDB) The Partitioning Option Segment Compression The Advanced Compression Option Parallel DML Automatic Storage Management 8
Oracle Partitioning Option Splits tables into lots of smaller tables Partitioned table still behaves as one table Partitions can be backed-up, made read-only, moved on to archive disks Queries that only require a subset of data will only consider the relevant table partitions ( partition pruning ) Queries that only require a subset of data will only join the relevant partitions ( partition-wise joins ) New data can be prepared offline and swapped in to an empty partitiion ( partition exchange ) The No.1 feature used in virtually every Oracle DW 9
Partitioning Methods Up to Oracle 10g Range Partitioning Rolling window operations Hash Partitioning Increased parallelism List Partitioning Ease of maintenance operations Composite Range-Hash Range-List 10
New Partitioning Features in Oracle Database 11g Interval Partitioning Automatic creation of range-based partitions REF Partitioning Partition detail table based on the master-table key Virtual-Column Based Partitioning Partition based on an expression 11
Interval Partitioning Interval Partitions Automatically creates partitions as they are needed Saves scripting the creation of new partitions each day/week/month 12
Interval Partitioning Details Specified by using the INTERVAL clause in the CREATE TABLE statement Technically an extension to range partitioning, requires at least one range partition, range key value determines the range high point ( transition point ) Database automatically creates interval partitions beyond the transition point Local indexes can also be created along with new partitions Partitioning key can only be a single column, and either DATE or NUMBER datatype STORE IN clause lets you specify which tablespaces are used (round-robin) Partition names are system generated Partition can be referenced through PARTITION FOR clause 13
Interval Partitioning Example Partition P0 is the transition point, all interval partitions will be created from this date onwards (> 1st Feb 2006) CREATE TABLE sales ( prod_id NUMBER(6), cust_id NUMBER, time_id DATE, channel_id CHAR(1), promo_id NUMBER(6), quantity_sold NUMBER(3), amount_sold NUMBER(10,2) ) PARTITION BY RANGE (time_id) INTERVAL(NUMTOYMINTERVAL(1, 'MONTH')) ( PARTITION p0 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) ); 14
Interval Partitioning Segment Creation CREATE TABLE sales ( prod_id NUMBER(6), cust_id NUMBER, time_id DATE, channel_id CHAR(1), promo_id NUMBER(6), quantity_sold NUMBER(3), amount_sold NUMBER(10,2) ) PARTITION BY RANGE (time_id) INTERVAL(NUMTOYMINTERVAL(1, 'MONTH')) ( PARTITION p0 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) ); 15
Interval Partitioning Segment Creation INSERT INTO sales (time_id, prod_id, quantity_sold,... VALUES TO_DATE( 01-03-2006 ), 146, 17,...); 16
Demonstration Interval Partitioning 17
REF Partitioning Partition a table based on a column in another table Partition an detail table by the master table primary key Useful, but actually not relevant for star schemas The wrong way around, we want to partition the dimensions by the fact keys 18
REF Partitioning Example CREATE TABLE orders ( order_id NUMBER(12), order_date DATE, order_mode VARCHAR2(8), order_total NUMBER(8,2), CONSTRAINT orders_pk PRIMARY KEY(order_id) ) PARTITION BY RANGE(order_date) ( PARTITION Q1_2005 VALUES LESS THAN (TO_DATE('01-APR-2005','DD-MON-YYYY')), PARTITION Q2_2005 VALUES LESS THAN (TO_DATE('01-JUL-2005','DD-MON-YYYY')), PARTITION Q3_2005 VALUES LESS THAN (TO_DATE('01-OCT-2005','DD-MON-YYYY')), PARTITION Q4_2005 VALUES LESS THAN (TO_DATE('01-JAN-2006','DD-MON-YYYY')) ); CREATE TABLE order_items ( order_id NUMBER(12) NOT NULL, line_item_id NUMBER(3) NOT NULL, product_id NUMBER(6) NOT NULL, quantity NUMBER(8), CONSTRAINT order_items_fk FOREIGN KEY(order_id) REFERENCES orders(order_id) ) PARTITION BY REFERENCE(order_items_fk); 19
Virtual Column-Based Partitioning A new feature in Oracle Database 11g is Virtual Columns Allows you to create additional table columns that are based on expressions Purely virtual, metadata only Can have statistics gathered for them This column can then be used as the partition key for a partitioning clause CREATE TABLE accounts ( acc_no NUMBER(12), acc_name VARCHAR2(20), acc_branch NUMBER(2) GENERATED ALWAYS AS TO_NUMBER (SUBSTR(TO_CHAR(ACC_NO),1,2)) ) PARTITION BY LIST (acc_branch) 20
Segment Compression Segments (tables) can be compressed Compression can be specified at the partition level Typical compression ratios are 3:1 to 5:1 Particularly suited to DW schemas as lots of redundant/repeated data Has two main benefits: Reduces the amount of disk space required for the data warehouse Packs more rows of data into a typical block - good for full table scans Historically, only certain operations load data in a compressed form INSERT */+APPEND */ CREATE TABLE... AS SELECT ALTER TABLE... MOVE Compressed tables are mainly used for read-only operations Update, inserts and deletes are supported, but rows will be permanently decompressed 21
New In Oracle Database 11g : Advanced Compression Oracle Database 11g introduces the Advanced Compression Option Provides compression for the following data types OLTP relational data (supports INSERT, UPDATE, DELETE and retains compression) Unstructured data through SecureFiles (replacement for LOBs) Backups OLTP performance is unaffected as the blocks can be read in their compressed form Performance is actually likely to improve as more data per block (reduced I/O) 22
Enabling OLTP Compression in Tables Oracle 11g extends the COMPRESS clause to enable OLTP compression DW compression is enabled with COMPRESS FOR DIRECT_LOAD OPERATIONS CREATE TABLE emp ( emp_id NUMBER, first_name VARCHAR2(128), last_name VARCHAR2(128) ) COMPRESS [FOR DIRECT_LOAD OPERATIONS]; OLTP compression can now be enabled using COMPRESS FOR ALL OPERATIONS CREATE TABLE emp ( emp_id NUMBER, first_name VARCHAR2(128), last_name VARCHAR2(128) ) COMPRESS FOR ALL OPERATIONS; 23
How OLTP Compression Works Table is initially uncompressed when loaded When PCTFREE level reached, compression is triggered New inserts are added uncompressed When PCTFREE level reached again, compression is triggered again DML still causes block uncompression but data is recompressed when PCTFREE is reached again Inserts are again uncompressed Overhead Free Space Uncompressed Block usage reaches PCTFREE triggers Compression Compressed Inserts are uncompressed Block usage reaches PCTFREE triggers Compression 24
Demonstration Advanced Compression 25
ETL Enhancements in Oracle 10g and 11g Incorporation of Oracle Warehouse Builder into 11g Installer DML Error Logging (10gR2+) Previous releases introduced SQL ETL functions Merge Multi-Table Insert Table Functions etc 26
DML Error Logging Prior to Oracle Database 10gR2, loading data containing errors was handled two ways Process the data prior to loading strip out errors, then load Load the data using PL/SQL and the SAVE EXCEPTIONS clause DML Error Logging (Oracle 10gR2+) provides a third option An Error Logging table is created, data is INSERTed using the LOG ERRORS clause Rows causing errors are then loaded into the error log DBMS_ERRLOG.CREATE_ERROR_LOG ( dml_table_name err_log_table_name err_log_table_owner err_log_table_space skip_unsupported IN VARCHAR2, IN VARCHAR2 := NULL, IN VARCHAR2 := NULL, IN VARCHAR2 := NULL, IN BOOLEAN := FALSE); LOG ERRORS [INTO [schema.]table] [ (simple_expression) ] [ REJECT LIMIT {integer UNLIMITED} ] 27
DML Error Logging DML Error Logging will catch the following load errors: Column values that are too large Constraint violations (NOT NULL, unique, referential, and check constraints), except in certain circumstances detailed below Errors raised during trigger execution Errors resulting from type conversion between a column in a subquery and the corresponding column of the table Partition mapping errors DML Error Logging will NOT catch the following errors: Violated deferred constraints Out-of-space errors Any direct-path INSERT operation (INSERT or MERGE) that raises a unique constraint or index violation Any UPDATE operation (UPDATE or MERGE) that raises a unique constraint or index violation 28
DML Error Logging Example SQL> BEGIN 2 DBMS_ERRLOG.CREATE_ERROR_LOG('SALES_TARGET'); 3 END; 4 / SQL> INSERT /*+ APPEND */ 2 INTO sales_target 3 SELECT * 4 FROM sales_src 5 LOG ERRORS 6 REJECT LIMIT UNLIMITED 7 ; 918834 rows created. SQL> SELECT count(*) 2 FROM err$_sales_target 3 ; COUNT(*) ----- 9 29
DML Error Logging Example SQL> COLUMN ora_err_mesg$ FORMAT A50 SQL> SELECT ora_err_number$ 2, ora_err_mesg$ 3 FROM err$_sales_target 4 ; ORA_ERR_NUMBER$ ORA_ERR_MESG$ --------------- ------------------------------ 2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated 2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated 2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated 1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST"."SALES_TARGET"."PROMO_ID") Elapsed: 00:00:00.06 30
Demonstration DML Error Logging 31
Incorporation of OWB Into 11g Database Install Oracle s Data Warehouse design & build tool Now at version 11gR1, 11gR2 due 2008/2009 From 11gR1, OWB is automatically installed into RDBMS database home Repository schema automatically created Repository now divided into workspaces Other functionality much the same 11gR2 release is next major revision Support for ODI code templates Heterogeneous support OBIEE support CDC support 32
Oracle Database Machine and Oracle Exadata Storage Server New hardware initiative in association with HP Dedicated storage server, filters data at disk level Balanced disk and network capacity New enhancements delivered through 11.1.0.7 patchset Exadata Storage Server available standalone, or as part of HP Oracle Database Machine appliance Aimed at DW Appliance Market 33
Exadata Smart Scan Processing Exadata Storage Server Cells offload the predicate evaluation from the DB server Only returns relevant rows Only returns relevant columns Join filtering Incremental backup filtering Data reduction to around 10% Reduction in CPU load Filters are pushed to the disk unit 34
Oracle s In-Database OLAP Engine : The OLAP Option An option to Oracle Database Enterprise Edition, now at version 11g A alternative summary management solution for SQL-based business intelligence applications An alternative to table-based materialized views, offering improved query performance and fast, incremental updates A fully-featured multidimensional OLAP server Excellent query performance for ad-hoc and unpredictable queries Enhances the analytic content of business intelligence applications through new calculation capabilities Fast, incremental updates of data sets 35
Oracle OLAP Storage Characteristics Unlike relational storage, OLAP servers store data in multi-dimensional arrays Structured around business rules and business concepts Dimensions, facts, hierarchies, aggregation rules etc Can be faster to access cells of data for ad-hoc analysis Summary data stored in the cube, along with detail Offset addressing, no need for joins OLAP engines provide advanced analytics Time-series analysis Non-additive calculations Financial calculations Statistical forcasting, Models 36
Traditional Way of Accessing a Data Warehouse Query tools access star schema stored in an Oracle data warehouse Most queries at a summary level Summary queries against detail-level data can be expensive to process Aggregation Calculations Hence the need for summarizations, materialized views 37
Materialized Views and Automatic Query Rewrite Most DW customers use materialized views to improve summary performance Define appropriate summaries based on query patterns Each summary is typically defined at a particular grain Sales by month and category Profits by city, quarter and channel The CBO automatically rewrites queries to use materialized views as appropriate 38
The Materialized View Challenge Creating materialized views to support ad-hoc query patterns is challenging Managing multiple MV objects Ensuring they rewrite properly Ensuring they refresh properly Users expect excellent query response time across any summary Potentially many MVs to manage Practical limitations on the number of MVs you can manage, the CBO can consider and you can built in time 39
New in 11g : Cube Organized Materialized Views Summaries can now be stored in a single Oracle OLAP multidimensional cube A single cube can contain the equivalent of thousands of summarycombinations The Oracle 11g CBO treats OLAP cubes as MVs and rewrites queries to access cubes transparently Cubes refreshed using standard MV procedures 40
Positioned as the Oracle Data Warehouse Enhancer Transparent integration with SQL Swap-in replacement for MVs OLAP dimension, fact data also accessible directly via SQL views if rewrite is not appropriate Calculations in cube not present in relational source data CBO is aware of nature of SQL OLAP views, joins data in the OLAP layer 41
Demonstration Cube Organized Materialized Views 42
Optimizing and Tuning the Data Warehouse Prior to Oracle 10g, data warehouse tuning was largely a manual process 10046 trace files for session-level tuning, statspack for system-level tuning Oracle Database 10g and 11g come with a number of Advisors Provide insights into database performance, bottlenecks, potential resolutions Automatic recording of diagnostic data Require the Tuning and Diagnostics Pack Several in-process Advisors Automatic Workload Repository Automatic Session History Automatic Database Diagnostic Monitor A number of pages on Oracle DB Console Performance page Advisor Central Storage Management Application & SQL Management Backup & Recovery Management Database Management Intelligent Infrastructure System Resource Management Space Management 43
Automatic Workload Repository Successor to Statpack (though not free) Collects performance-related statistics on a regular (1-hour default) basis Monitors activity at the system level Periodically flush coarser-grain information to disk - in a self-managed tablespace, readily available & real-time accessible when needed On by default - flush to disk every 30 min, keep for 7 days New Oracle Background MMON, SYSAUX is the home tablespace Scripts: Awrrpt.sql Default AWR Report Ashrpt.sql Report on ASH Sampling Awrsqrpt.sql Detailed report on SQL 44
AWR Architecture 45
Active Session History Circular buffer that records statistics on active sessions Used for analyzing wait events for individual sessions Alternative to 10046 Extended SQL Tracing V$SESSION_WAIT with a bit of historical data 46
Automatic Database Diagnostic Monitor (ADDM) Performance expert in a box; based around response-time approach Provides database-wide performance diagnostic, including RAC Automatically runs at every capture of workload repository snapshot Provides impact and benefit analysis Provides Information vs. raw data Integrates all components using the Time Model Uses AWR snapshots Uses MMON to generate reports every 30 minutes Report format is very similar to STATSPACK, only better Storage Management Application & SQL Management Backup & Recovery Management Database Management Intelligent Infrastructure System Resource Management Space Management 47
Oracle DB Console Performance Features Performance tab provides an overview of server and database usage Host activity (CPU usage on Windows, Run Queue on Unix) Average Active Sessions (from ASH) Throughput, I/O, Parallel Execution and Services Links to other advisors, tasks 48
Average Active Sessions Provides detail on the waits within the selected wait class Top SQL and sessionsfor the selected wait class 49
View SQL and Execution Plan View SQL, Execution Plan for highlighted SQL statements View as diagram or as text 50
Create and View ADDM Reports Create and view ADDM reports Diagnose system-wide issues and provide resolutions 51
View Throughput, I/O, Parallel Query, Services View instance throughput, correlate with server and session load View disk I/O, degree of parallelism for queries 52
Tracing Individual Sessions The DB Console reports shown previously are system-wide, equivalent to statspack Sometimes you would want to trace an individual session, using ASH Accomplished through setting a CLIENT_ID for the individual session Session can then be spotted in the Top Consumers report ASH Report can be generated for this CLIENT_ID exec dbms_session.set_identifier('sql_test') select sum(sales) from customers, sales s where... 53
Generating Session Level ASH Report The Performance page has a Generate ASH Report button next to the Average Active Sessions graph ASH Report can then be filtered by the CLIENT_ID Review session-level waits, top SQL etc to identify issues 54
The SQL Access Advisor Part of the 11g Tuning Pack (add-on to Enterprise Edition) Recommends use of indexes, materialized views and partitions Good for MVs, only recommends b-tree indexes and hash partitions (in my experience) 55
The SQL Performance Analyzer Provides an environment for replaying a set of SQL statements in initial and changed environments, reports on impact of changes Useful for testing changes to SQL statements, DB parameters, indexes etc 56
SQL Tuning Sets Captured workloads that can be used as the input for tuning exercises Can grab SQL from the SQL cache, or listen over a defined period for SQL Stored in the database as a database object Can be exported, backed up, replayed etc 57
Demonstration Database Control Advisors and Performance Features 58
Over The Past Two Hours, We Have Covered... Introduction to Oracle 11g Data Warehousing New Extraction, Transformation & Load features in 10g and 11g New Partitioning and Compression features Summarization and Oracle OLAP Data Warehousing Performance features in Database Control 11g 59
And Finally... Thank you all for attending I hope the two hours have been of interest to you Please remember to complete the course evaluations Leave your contact details with me to forward on the slides Have a safe journey home Goodbye! 60
Oracle Data Warehousing Masterclass Mark Rittman, Director, Rittman Mead Consulting UKOUG Conference & Exhibition 2008 61