Scaling To Infinity: Partitioning Data Warehouses on Oracle Database. Thursday 15-November 2012 Tim Gorman www.evdbt.com




NoCOUG — Scaling To Infinity: Partitioning Data Warehouses on Oracle Database. Thursday, 15 November 2012. Tim Gorman, www.evdbt.com

Speaker Qualifications

Co-author:
1. Oracle8 Data Warehousing (John Wiley & Sons, 1998)
2. Essential Oracle8i Data Warehousing (John Wiley & Sons, 2000)
3. Oracle Insights: Tales of the Oak Table (Apress, 2004)
4. Basic Oracle SQL (Apress, 2009)
5. Expert Oracle Practices: Database Administration with the Oak Table (Apress, 2010)

28 years in IT:
- C programmer, sys admin, network admin (1984-1990)
- Consultant and technical consulting manager at Oracle (1990-1998)
- Independent consultant (http://www.evdbt.com) since 1998

- Board of Directors, Rocky Mountain Oracle Users Group (http://www.rmoug.org) since 1995
- Board of Directors, Oracle Developer Tools Users Group (http://www.odtug.com) since 2012
- Oak Table network (http://www.oaktable.net) since 2002
- Oracle ACE since 2007; Oracle ACE Director since 2012

Agenda
- The virtuous cycle and the death spiral
- Basic 5-step EXCHANGE PARTITION load technique
- 7-step EXCHANGE PARTITION technique for the "dribble effect"
- Performing MERGE/up-sert logic using EXCHANGE PARTITION

Data warehousing reality

We have to recognize how features for large data volumes and optimal queries work together:
- Partitioning
- Direct-path loading
- Compression
- Star transformation
- Bitmap indexes
- Bitmap-join indexes
- READ ONLY tablespaces
- Information lifecycle management

...because it really isn't documented anywhere.

The Virtuous Cycle

Non-volatile, time-variant data implies:
- Data warehouses are INSERT-only

Insert-only data warehouses imply:
- Tables and indexes range-partitioned by a DATE column

Tables range-partitioned by DATE enable:
- Data loading using the EXCHANGE PARTITION load technique
- Partitions organized into time-variant tablespaces
- Incremental statistics gathering and summarization

Data loading using EXCHANGE PARTITION enables:
- Direct-path (a.k.a. append) inserts
- Data purging using DROP/TRUNCATE PARTITION instead of DELETE
- Bitmap indexes and bitmap-join indexes
- Elimination of the ETL load window, and 24x7 availability for queries
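The "range-partitioned by a DATE column" shape at the core of the cycle can be sketched as DDL. This is an illustrative sketch only: the column names, tablespace names, and two-partition list are hypothetical (a real fact table would carry many more partitions, typically one per day).

```sql
-- Illustrative shape only: columns, tablespaces, and partition list
-- are hypothetical. Each partition lands in a time-variant tablespace
-- (here, one tablespace per month), which enables the READ ONLY and
-- tiered-storage practices described later in the deck.
create table txn (
  txn_date  date   not null,
  acct_key  number not null,
  amount    number
)
partition by range (txn_date)
( partition p0224 values less than (date '2012-02-25') tablespace ts_201202,
  partition p0225 values less than (date '2012-02-26') tablespace ts_201202
);
```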

The Virtuous Cycle

Direct-path (a.k.a. append) inserts enable:
- Loading more data, faster and more efficiently
- Optional NOLOGGING on inserts
- Basic table compression (9i) or HCC (11gR2, for Oracle storage)
- Elimination of contention in the Oracle Buffer Cache during data loading

Optional NOLOGGING inserts enable:
- The option to generate less redo during data loads
- Optimization of backups

Table compression enables:
- Less space consumed for tables and indexes
- Fewer I/O operations during queries

Partitions organized into time-variant tablespaces enable:
- READ ONLY tablespaces for older, less-volatile data

The Virtuous Cycle

READ ONLY tablespaces for older, less-volatile data enable:
- Tiered storage
- Backup efficiencies

Data purging using DROP/TRUNCATE PARTITION enables:
- Faster, more efficient data purging than DELETE statements

Bitmap indexes enable:
- Star transformations

Star transformations enable:
- Optimal query-execution plans for dimensional data models
- Bitmap-join indexes

Bitmap-join indexes enable:
- Further optimization of star transformations
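The purge and tiered-storage points each correspond to a one-line DDL operation. A minimal sketch, with hypothetical table, partition, and tablespace names:

```sql
-- Purge old data as a metadata operation rather than a bulk DELETE
alter table txn drop partition p0224 update global indexes;

-- Make an older, no-longer-loaded time-variant tablespace READ ONLY,
-- so it can be backed up once and placed on cheaper, tiered storage
alter tablespace ts_201202 read only;
```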

The Death Spiral

ETL using conventional-path INSERT, UPDATE, and DELETE operations:
- Conventional-path operations work well in transactional environments
- High-volume bulk data loads are problematic:
  - High parallelism causes contention in the Shared Pool and Buffer Cache
  - Queries and loads mix simultaneously on tables and indexes
  - Periodic rebuilds/reorgs of tables are needed if deletions occur
  - Full redo and undo are generated for all inserts, updates, and deletes

Bitmap indexes and bitmap-join indexes:
- Modifying bitmap indexes is slow, SLOW, SLOW
- Unavoidable locking issues arise during parallel operations

The Death Spiral

- ETL dominates the workload in the database
  - Queries come to consist mainly of dumps or extracts to downstream systems
- Query performance worsens as tables/indexes grow larger
  - Stats gathering takes longer; smaller samples worsen query performance
- Contention between queries and ETL becomes evident
- Uptime is impacted as bitmap indexes must be dropped/rebuilt
- Backups consume more and more time and resources
  - The entire database must be backed up regularly
- Data cannot be right-sized to storage options according to IOPS, so storage becomes non-uniform and patchwork; newer, less-expensive storage is integrated amongst older, high-quality storage, and failure points proliferate

Basic 5-step technique

The basic technique: bulk-load new data into a temporary "scratch" table, which is then indexed, analyzed, and finally published using the EXCHANGE PARTITION operation. This should be the default load technique for all large tables in a data warehouse.

Assumptions for this example:
- A type-2, time-variant, composite-partitioned fact table named TXN
  - Range-partitioned on DATE column TXN_DATE
  - Hash sub-partitioned on NUMBER column ACCT_KEY
- 25-Feb data to be loaded into a scratch table named TXN_SCRATCH
- Ultimately, the data is published into partition P0225 of TXN

[Diagram: the basic 5-step technique. Left: range-hash composite-partitioned table TXN with daily partitions 22-Feb, 23-Feb, 24-Feb, and an empty 25-Feb partition. Right: hash-partitioned scratch table TXN_SCRATCH. Steps: 1. create scratch table; 2. bulk loads; 3. table & column stats; 4. index creates; 5. exchange partition. After the exchange, the loaded data occupies the 25-Feb partition of TXN and TXN_SCRATCH is empty.]

Basic 5-step technique

1. Create scratch table TXN_SCRATCH as a hash-partitioned table
2. Perform a parallel, append load of data into TXN_SCRATCH
3. Gather CBO statistics on table TXN_SCRATCH (table and column stats only)
4. Create indexes on TXN_SCRATCH matching the local indexes on TXN
5. alter table TXN exchange partition P0225 with table TXN_SCRATCH including indexes without validation update global indexes;
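The five steps above can be sketched as literal SQL. This is a hedged sketch, not the presenter's exact code: the staging table EXT_STAGE is taken from the deck's later examples, but the hash-partition count, parallel degrees, index name, and the TXN_TYPE column are hypothetical.

```sql
-- 1. Create the scratch table, hash-partitioned to match TXN's
--    sub-partitioning on ACCT_KEY (partition count illustrative)
create table txn_scratch
partition by hash (acct_key) partitions 16
as select * from txn where 1 = 0;

-- 2. Parallel, direct-path (append) load
alter session enable parallel dml;
insert /*+ append parallel(n, 16) */ into txn_scratch n
select /*+ full(x) parallel(x, 16) */ * from ext_stage x;
commit;

-- 3. Table and column statistics only (no index stats yet --
--    the indexes do not exist at this point)
begin
  dbms_stats.gather_table_stats(user, 'TXN_SCRATCH', cascade => false);
end;
/

-- 4. Indexes matching TXN's local indexes
--    (index and column names hypothetical)
create bitmap index txn_scratch_bx1 on txn_scratch (txn_type) local;

-- 5. Publish: swap the loaded, indexed, analyzed scratch table
--    into partition P0225 as a metadata-only operation
alter table txn exchange partition p0225 with table txn_scratch
  including indexes without validation update global indexes;
```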

Basic 5-step technique

It is a good idea to encapsulate this logic inside PL/SQL packaged or stored procedures:

SQL> exec exchpart.prepare('TXN', 'TXN_SCRATCH', '25-FEB-');
SQL> alter session enable parallel dml;
SQL> insert /*+ append parallel(n, 16) */ into txn_scratch n
     select /*+ full(x) parallel(x, 16) */ *
     from ext_stage x
     where x.load_date >= '25-FEB-'
     and x.load_date < '26-FEB-';
SQL> commit;
SQL> exec exchpart.finish('TXN', 'TXN_SCRATCH');

DDL for the EXCHPART package is posted at http://www.evdbt.com/tools.htm#exchpart

The dribble effect

In real life, data loading is often much messier, because the range-partition key column does not match the load cycles. Example: the data to be loaded on 25-Feb is ~1,000,000 rows:
- 950,000 rows for 25-Feb
- 45,000 rows for 24-Feb
- 4,000 rows for 23-Feb
- 700 rows for 22-Feb
- 200 rows for 21-Feb
- 90 rows for 20-Feb
- ...and a dozen rows left over from 07-Jan

The dribble effect

Use the EXCHANGE PARTITION technique when a day has >= N rows; otherwise, use a conventional INSERT:

for d in (select trunc(txn_dt) dt, count(*) cnt
          from ext_stage
          group by trunc(txn_dt))
loop
  if d.cnt >= 100 then
    exchpart.prepare('TXN', 'TXN_P' || to_char(d.dt, 'YYYYMMDD'), d.dt);
    insert /*+ append parallel(n,16) */ into TXN_P0224 n
    select /*+ parallel(x,16) */ *
    from ext_stage x
    where x.txn_dt >= d.dt and x.txn_dt < d.dt + 1;
    exchpart.finish('TXN', 'TXN_P' || to_char(d.dt, 'YYYYMMDD'));
    exchpart.drop_indexes('TXN_P' || to_char(d.dt, 'YYYYMMDD'));
    insert /*+ append parallel(n,16) */ into TXN_P0224 n
    select /*+ parallel(x,16) */ *
    from ext_stage x
    where x.txn_dt >= d.dt and x.txn_dt < d.dt + 1;
  else
    insert into TXN
    select * from ext_stage
    where txn_dt >= d.dt and txn_dt < d.dt + 1;
  end if;
end loop;

[Diagrams: the 7-step technique. Left: range-hash composite-partitioned table TXN with daily partitions 22-Feb through 25-Feb (the 25-Feb partition holding the 950,000 newly loaded rows). Right: hash-partitioned scratch table TXN_P0224, created or reused from the previous load cycle. Steps 1-5 mirror the basic technique: create/reuse scratch table, bulk loads, table & column stats, index creates, exchange partition with the 24-Feb partition. Steps 6-7: drop the indexes on TXN_P0224, then perform a second bulk load into it.]

7-step technique

1. Use the existing hash-partitioned scratch table TXN_P0224 (from the previous load cycle)
2. Perform a parallel, append load of data into TXN_P0224
3. Gather CBO statistics on table TXN_P0224 (table and column stats only)
4. Create indexes on TXN_P0224 matching the local indexes on TXN
5. alter table TXN exchange partition P0224 with table TXN_P0224 including indexes without validation update global indexes;
6. Drop the indexes on TXN_P0224
7. Perform a parallel, append load of data into TXN_P0224 and...
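Steps 5-7 are what distinguish this from the basic technique. The idea appears to be that, after the exchange, TXN_P0224 holds the partition's pre-exchange contents, so re-applying today's dribble rows leaves the scratch table mirroring the published partition, ready for the next cycle's exchange. A hedged sketch; the index name, parallel degrees, and date bounds are illustrative:

```sql
-- 5. Publish: TXN_P0224 (previous partition contents plus today's
--    dribble rows) swaps with partition P0224
alter table txn exchange partition p0224 with table txn_p0224
  including indexes without validation update global indexes;

-- 6. After the swap, TXN_P0224 holds the partition's previous segment;
--    drop its indexes ahead of the next direct-path load
drop index txn_p0224_bx1;  -- hypothetical index name

-- 7. Re-apply today's 24-Feb rows so TXN_P0224 again mirrors the
--    published partition for the next load cycle
insert /*+ append parallel(n,16) */ into txn_p0224 n
select /*+ parallel(x,16) */ *
from ext_stage x
where x.txn_dt >= date '2012-02-24'   -- illustrative bounds
  and x.txn_dt <  date '2012-02-25';
commit;
```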

OK, more than 7 steps

8. Determine how long to retain timestamped scratch tables
- The EXCHPART.PREPARE procedure can first check whether the specified scratch table already exists
  - If not, it creates it from the base partition
  - If so, it just uses what exists
- Logic is needed to drop scratch tables after N load cycles
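Such retention logic could look like the following sketch. Everything here is an assumption for illustration: a hypothetical TXN_Pyyyymmdd naming convention for the scratch tables and a 7-cycle retention threshold; this is not part of the EXCHPART package.

```sql
-- Drop scratch tables older than 7 load cycles (threshold illustrative).
-- Assumes scratch tables follow a hypothetical TXN_Pyyyymmdd convention.
begin
  for t in (select table_name
            from user_tables
            where table_name like 'TXN\_P________' escape '\'
              and to_date(substr(table_name, 6), 'YYYYMMDD')
                  < trunc(sysdate) - 7)
  loop
    execute immediate 'drop table ' || t.table_name || ' purge';
  end loop;
end;
/
```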

MERGE / Up-sert logic

Slowly-changing dimension tables:
- Change often enough to require a time-variant image of the data
  - Should be loaded like fact tables, using the basic 5-step or advanced 7-step EXCHANGE PARTITION loads
- Also require a current point-in-time image of the data
  - MERGE or update-else-insert (a.k.a. "up-sert") logic: if the row exists, then update; else insert

MERGE / Up-sert

So we could either do it this way...

merge into curr_acct_dim
using (select * from acct_dim
       where eff_dt >= '25-FEB-' and eff_dt < '26-FEB-') s
on (...)
when matched then update set ...
when not matched then insert ...;

...or EXCHANGE PARTITION

1. Create/reuse hash-partitioned scratch table ACCT_SCRATCH
2. Perform a parallel, append load of data into ACCT_SCRATCH (nested in-line SELECT statements doing UNION, ranking, and filtering)
3. Gather CBO statistics on table ACCT_SCRATCH
4. Create indexes on ACCT_SCRATCH matching the local indexes on CURR_ACCT_DIM
5. alter table CURR_ACCT_DIM exchange partition PDUMMY with table ACCT_SCRATCH including indexes without validation;

[Diagram: the range-hash composite-partitioned table ACCT_DIM (type-2 dimension, with partitions 23-Feb, 24-Feb, 25-Feb) feeds a union/filter operation together with the range-hash composite-partitioned table CURR_ACCT_DIM (type-1 dimension); the result is loaded into the hash-partitioned table ACCT_SCRATCH, which is then exchanged into CURR_ACCT_DIM.]

Merge / Up-sert

CURR_ACCT_DIM:
- Range-hash composite-partitioned
  - Range partition key column = PK column
  - Single range partition named PDUMMY
- B*Tree index on PK (local)
- Bitmap indexes (local) on attributes

ACCT_SCRATCH (exchanged with the partition):
- Hash-partitioned
  - Hash partition key column same as CURR_ACCT_DIM
- Indexes created to match the local indexes on CURR_ACCT_DIM
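The two shapes described above can be sketched as DDL. Only the structure (single PDUMMY range partition on the PK column, hash sub-partitioning, local PK and bitmap indexes, matching hash-partitioned scratch table) comes from the presentation; the column names, sub-partition count, and index names are hypothetical.

```sql
-- Type-1 dimension: one range partition (PDUMMY) on the PK column,
-- hash sub-partitioned on the same key (columns/counts illustrative)
create table curr_acct_dim (
  acct_key  number not null,
  acct_name varchar2(80),   -- hypothetical attribute
  eff_dt    date
)
partition by range (acct_key)
subpartition by hash (acct_key) subpartitions 16
( partition pdummy values less than (maxvalue) );

-- Local B*Tree index supporting the primary key
create unique index curr_acct_dim_pk on curr_acct_dim (acct_key) local;
alter table curr_acct_dim add constraint curr_acct_dim_pk
  primary key (acct_key) using index curr_acct_dim_pk;

-- Local bitmap index on an attribute column
create bitmap index curr_acct_dim_bx1 on curr_acct_dim (acct_name) local;

-- Scratch table: hash-partitioned on the same key, same shape
create table acct_scratch
partition by hash (acct_key) partitions 16
as select * from curr_acct_dim where 1 = 0;
```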

Merge / Up-sert

INSERT /*+ append parallel(t,8) */ INTO ACCT_SCRATCH t
SELECT (list of columns)
FROM (SELECT (list of columns),
             ROW_NUMBER() OVER (PARTITION BY acct_key
                                ORDER BY eff_dt DESC) rn
      FROM (SELECT (list of columns) FROM CURR_ACCT_DIM
            UNION ALL
            SELECT (list of columns) FROM ACCT_DIM PARTITION (p0225)))
WHERE rn = 1;

1. The inner-most query pulls the newly-loaded data from ACCT_DIM, unioned with the existing data from the type-1 CURR_ACCT_DIM
2. The middle query ranks the rows within each ACCT_KEY value, sorted by EFF_DT in descending order
3. The outer-most query selects only the latest row for each ACCT_KEY and passes it to the INSERT
4. The INSERT is APPEND (direct-path) and parallel, and can compress rows, if desired

Merge / Up-sert

Assume that CURR_ACCT_DIM has 15m rows in total, and that 1m new rows were just loaded into the 25-Feb partition of ACCT_DIM: 100k (0.1m) rows are new accounts and 900k (0.9m) rows are changes to existing accounts.

Then, what will happen is:
- The inner-most query fetches 15m rows from CURR_ACCT_DIM unioned with 1m rows from the 25-Feb partition of ACCT_DIM, returning 16m rows in total
- The middle query ranks the rows within each ACCT_KEY by EFF_DT in descending order, returning 16m rows
- The outer-most query filters to the most-recent row for each ACCT_KEY, returning 15.1m rows
- The INSERT loads the 15.1m rows into ACCT_SCRATCH

Summary

1. During load cycles, load time-variant type-2 tables:
- either using the basic 5-step EXCHANGE PARTITION load technique, when load cycles match the granularity of the range partitions,
- or using the 7-step EXCHANGE PARTITION load technique for the "dribble effect", when load cycles do not match the granularity of the range partitions.

2. Then, merge the newly-loaded data from the time-variant tables into point-in-time type-1 tables, using the EXCHANGE PARTITION load technique to accomplish the merge / up-sert logic.

Things To Remember (1)

If:
- you are loading large volumes of data,
- the data is to be compressed, or
- there are bitmap or bitmap-join indexes,

...then use the EXCHANGE PARTITION load technique.

Things To Remember (2)

If you are performing a large MERGE/up-sert or UPDATE (or DELETE!) operation:
- The fastest MERGE or UPDATE is an INSERT...
- ...followed by EXCHANGE PARTITION

Tim's contact info:
Web: http://www.evdbt.com
Email: Tim@EvDBT.com

Thank You!