SQL Performance for a Big Data 22 Billion row data warehouse



Similar documents
Performance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.

SQL Considerations for a Big Data 22 Billion row data warehouse By Dave Beulke

DB2 V8 Performance Opportunities

Big Data Disaster Recovery Performance

www. d a v e b e u l k e. com 1

SQL Optimization & Access Paths: What s Old & New Part 1

Top 25+ DB2 SQL Tuning Tips for Developers. Presented by Tony Andrews, Themis Inc.

Data Warehousing With DB2 for z/os... Again!

Session: Archiving DB2 comes to the rescue (twice) Steve Thomas CA Technologies. Tuesday Nov 18th 10:00 Platform: z/os

Best Practices for DB2 on z/os Performance

Java DB2 Developers Performance Best Practices

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Advanced Oracle SQL Tuning

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Welcome to the presentation. Thank you for taking your time for being here.

Oracle Database 11g: SQL Tuning Workshop

Oracle Database 11 g Performance Tuning. Recipes. Sam R. Alapati Darl Kuhn Bill Padfield. Apress*

AV-005: Administering and Implementing a Data Warehouse with SQL Server 2014

CA Performance Handbook. for DB2 for z/os

Oracle Database 11g: SQL Tuning Workshop Release 2

DBMS / Business Intelligence, SQL Server

DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Monitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

Oracle EXAM - 1Z Oracle Database 11g Release 2: SQL Tuning. Buy Full Product.

Introduction to Querying & Reporting with SQL Server

DBAs having to manage DB2 on multiple platforms will find this information essential.

Instant SQL Programming

Many DBA s are being required to support multiple DBMS s on multiple platforms. Many IT shops today are running a combination of Oracle and DB2 which

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

Course ID#: W 35 Hrs. Course Content

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

Querying Microsoft SQL Server 20461C; 5 days

Tune That SQL for Supercharged DB2 Performance! Craig S. Mullins, Corporate Technologist, NEON Enterprise Software, Inc.

Optimizing Your Database Performance the Easy Way

Querying Microsoft SQL Server Course M Day(s) 30:00 Hours

LOBs were introduced back with DB2 V6, some 13 years ago. (V6 GA 25 June 1999) Prior to the introduction of LOBs, the max row size was 32K and the

Querying Microsoft SQL Server 2012

Introduction. Part I: Finding Bottlenecks when Something s Wrong. Chapter 1: Performance Tuning 3

SQL Performance and Tuning. DB2 Relational Database

Course 10774A: Querying Microsoft SQL Server 2012 Length: 5 Days Published: May 25, 2012 Language(s): English Audience(s): IT Professionals

SAP BO 4.1 COURSE CONTENT

Querying Microsoft SQL Server 2012

Course 10774A: Querying Microsoft SQL Server 2012

Oracle DBA Course Contents

Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services. By Ajay Goyal Consultant Scalability Experts, Inc.

DB2 for i5/os: Tuning for Performance

SQL Server Query Tuning

Course 20461C: Querying Microsoft SQL Server Duration: 35 hours

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package Data Federation Administration Tool Guide

MS SQL Performance (Tuning) Best Practices:

SQL Server 2012 Performance White Paper

Oracle Database 12c Enables Quad Graphics to Quickly Migrate from Sybase to Oracle Exadata

Improve SQL Performance with BMC Software

Introducing Microsoft SQL Server 2012 Getting Started with SQL Server Management Studio

Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0

Querying Microsoft SQL Server

In-memory Tables Technology overview and solutions

Who am I? Copyright 2014, Oracle and/or its affiliates. All rights reserved. 3

SQL Server Query Tuning

Inge Os Sales Consulting Manager Oracle Norway

W I S E. SQL Server 2008/2008 R2 Advanced DBA Performance & WISE LTD.

LearnFromGuru Polish your knowledge

HP Vertica and MicroStrategy 10: a functional overview including recommendations for performance optimization. Presented by: Ritika Rahate

Big Fast Data Hadoop acceleration with Flash. June 2013

FHE DEFINITIVE GUIDE. ^phihri^^lv JEFFREY GARBUS. Joe Celko. Alvin Chang. PLAMEN ratchev JONES & BARTLETT LEARN IN G. y ti rvrrtuttnrr i t i r

University of Aarhus. Databases IBM Corporation

Indexing Techniques for Data Warehouses Queries. Abstract

How To Improve Performance In A Database

Oracle Database: SQL and PL/SQL Fundamentals NEW

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

SQL Server and MicroStrategy: Functional Overview Including Recommendations for Performance Optimization. MicroStrategy World 2016

SQL Query Performance Tuning: Tips and Best Practices

Top Ten SQL Performance Tips

EII - ETL - EAI What, Why, and How!

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Oracle - Engineered for Innovation. Thomas Kyte

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

news from Tom Bacon about Monday's lecture

Data Warehousing and Data Mining

Oracle Database 12c: Performance Management and Tuning NEW

David Dye. Extract, Transform, Load

Optimizing Your Data Warehouse Design for Superior Performance

IBM WebSphere DataStage Online training from Yes-M Systems

Saskatoon Business College Corporate Training Centre

Efficient Data Access and Data Integration Using Information Objects Mica J. Block

ETL-EXTRACT, TRANSFORM & LOAD TESTING

SQL DATABASE PROGRAMMING (PL/SQL AND T-SQL)

Data warehousing with PostgreSQL

Oracle Database 10g. Page # The Self-Managing Database. Agenda. Benoit Dageville Oracle Corporation benoit.dageville@oracle.com

SQL Server 2008 Designing, Optimizing, and Maintaining a Database Session 1

Performance Tuning for the Teradata Database

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at

Data Warehousing on System z BWDB2UG Presentation 9/12/07 v

Oracle InMemory Database

Business Usage Monitoring for Teradata

Oracle Database In-Memory The Next Big Thing

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

1. This lesson introduces the Performance Tuning course objectives and agenda

Transcription:

SQL Performance for a Big Data Billion row data warehouse Dave Beulke dave @ d a v e b e u l k e.com Dave Beulke & Associates Session: F19 Friday May 8, 15 8: 9: Platform: z/os

D a v e @ d a v e b e u l k e. c o m Member of the inaugural IBM DB Information Champions One of 45 IBM DB Gold Consultant Worldwide President of DAMA-NCR, Past President of International DB Users Group - IDUG Best speaker at CMG conference & former TDWI instructor Former Co-Author of certification tests DB DBA Certification tests IBM Business Intelligence certification test Former Columnist for IBM Data Management Magazine Weekly Performance Tips: www.davebeulke.com Consulting CPU Demand Reduction Guaranteed! DB Performance Review DW & Database Design Review Security Audit & Compliance DB 11 Migration Assistance DB 1 Performance IBM White Paper & Redbook Teaching Educational Seminars DB Version 11 Transition 3 DB Performance for Java Developers Data Warehousing Designs for Performance How to Do a Performance Review Data Studio and purequery Extensive experience in Big Data systems, DW design and performance Working with DB on z/os since V1. Working with DB on LUW since OS/ Extended Edition Designed/implemented first data warehouse in 1988 for E.F. Hutton Working with Java for Syspedia since 1 Syspedia - Find, understand and integrate your data faster! 3

Big Data Era bends the SQL Rules AGENDA The bigger the database the more attention needed to: Standard SQL Rules (3+) Database Design Database Conditions Database Indexes DB SQL being used 99

Standard SQL Tips 1. Only SELECT the columns needed because it optimizes the I/O between DB and your program.. Only SELECT rows needed for the process or transaction and use FETCH FIRST x ROWS ONLY whenever possible. 3. Code WHERE clauses with indexed columns that have unique or good indexes. 4. Reference only tables that are absolutely necessary for getting the data. 5. Code as many WHERE column clauses as possible because the DB engine filters data faster than any application code. 5

Standard SQL Tips 6. Prioritize the WHERE column clauses to maximize their effectiveness. First code the WHERE column clauses that reference indexed keys, than the WHERE column clauses that limit the most data, than WHERE clauses on all columns that can filter the data further. 7. Code SQL JOINs instead of singular SQL access whenever possible. A single SQL JOIN is always faster than two SQL statements within an application program comparing and filtering the result set data. 8. Code SQL tables JOINs when there is a common indexed column or columns shared by both or all of the JOINed tables. If a common column is not indexed talk to a DBA and make a new index if possible. 9. When coding JOINs make sure to code the most restrictive JOIN table first and provide as many indexed and restrictive WHERE clauses as possible to limit the amount of data that needs to be JOINed to the second or subsequent tables. 1.Use DB OLAP DB functions to optimize your DB SQL answer result sets. Using these SQL DB OLAP functions is usually much faster than application code summing, totaling and calculations over your data. 6

Standard SQL Tips 11. Leverage DB functions appropriately. Analyze the SQL to determine the need for the various DB functions. Report writers generate SQL with multiple, duplicate or unnecessary DB functions. AVG, SUM etc.. 1. Examine the data columns and tables SELECTed in the SQL. Many of these new big data report writers have nice GUI interfaces that allow the users to point and click on anything. Dup of #1? 13. Verify the DB table joins for improving your DB SQL performance. evaluate table indexes uniqueness, cardinality, relationship to the table partitioning or column frequencies. 14. Eliminate data translations. Sometime the report writer generated SQL uses many CAST, SUBSTR, CASE statements, constants, or formulas in the SQL. The generated SQL may also have math formulas compared to DB table columns within the SQL. Check the ones that are not index-able and clean them for performance. 15. Understand the ORDER of JOIN and Access. By making sure the JOIN of the smaller table drives the data SELECTed from the big data DB table, your DB SQL should eliminate/minimize as much data as possible in the first JOIN. Also minimize SQL such as ORDER BY, GROUP BY, DISTINCT, UNION, or JOIN predicates that do not leverage the clustering order of the table, because they will require a DB sort. 7

Standard SQL Tips 16.Understand all SQL phrases that results in a SORT. DB SQL workloads Sorts are one of the most expensive operations especially with Big Data. i.e. - Distinct, Order by, UNION etc. 17.Verify Scalar function is index-able. In DB 11 many scalar function became index-able not all of them like DATE(timestamp-column). 18.Remove all math from the WHERE Clause if possible. Math in WHERE clauses sometime disrupts the data type comparisons and precision. Precalculate math before SQL statement and always verify index-ability of any that are questionable. 19.Understand the number of trips to the database. Use multi-row SELECT and INSERT operations whenever possible to access or to the add the database..minimize and optimize any WHERE OR clauses. Make sure that the OR does not make the entire WHERE clause become a table scan. Optimize all OR phases and turn them into a correlated sub-query if possible. 8

Standard SQL Tips 1.SELECT from INSERT, UPDATE whenever possible. Avoids multiple trips to the database and is great for identity, sequence and any other database column keys.use MERGE to reference the database and make sure the data is there and updated. Avoid complex program logic to perform the data checking and then update. 3.Create indexes on all WHERE clause expressions that are not already indexable. Companies have unique situations and data types. Create Indexes on Expressions that are used in your daily SQL processing. 4.Use Common Table Expression (CTEs) whenever possible because they can get the data so that the result set can be used repeatedly and efficiently. 5.Use DB TEMP tables for all data that is referenced more than three times in the flow of daily business. It is more efficient to get the TEMP result set working instead of repeated redundant data access. Remember they can also be set up as NOT LOGGED. 9

Standard SQL Tips 6.Use DB Virtual Indexes to model your potential Indexes Model INCLUDE Column, Index on Expression, Uniqueness and other new index definition possibilities. 7.Use Optimizer Filter Factor Hints to give full information. JOIN Skew/Correlation, accurate statistics, complex predicates and host variable and special register information. 8.Analyze the SYSSTATFEEDBACK table contents to understand your SQL Access Path results better. Can be used to generate better RUNSTATS input. 9.Leverage EXCLUDE NULL when creating indexes to reduce index size. Verify that the NULL-ability is not within the application WHERE clauses. 3.Overriding predicate selectivity with the DSN_PREDICAT_% tables. Populate these tables with the details of the selectivity of the tables uniqueness to improve optimizer knowledge for better access paths and improved performance. 1

Big Data Database Design Big data physical objects I/O bound operations Partition the database tables to minimize the data required for daily SQL Use the thousands of partitions available for a design How many parallel processes are your applications running today? Separate old data from new data Current Year, Quarter, Week, Day Temporal tables with the HISTORY tables Complicates the SQL also can make a lot of data quickly Materialized Query Table YTD sales figures Or composite tables to separate via TIME axis Year, Quarter, Week, Day Or composite tables to separate via Sales territory axis Country, Region, State, City, Zip code 11 C L A I M S _ 1 5 C L A I M S _ 1 4 C L A I M S _ 1 4 C L A I M S _ 1 C L A I M S _ 1 1 C L A I M S _ 1 C L A I M S _ 9 C L A I M S _ 8 C L A I M S _ 7 C L A I M S _ 6

Database Design for parallelism Partitioning to leverage??? more concurrent processes Same partitioning limit keys across multiple table spaces Via Customer number across those related tables Via Product SKU number across all the product related data ziip ziip ziip ziip G G G G G G ziip ziip Partitioning design leverages time properly The active partitions are only a segment of the entire table G G G Concentrates the I/O into the right sized portion of the database Current history available - Ancient history is not in database/archived easily ziip Indexes (NPIs or DPSIs) are appropriately designed Partitioned for parallelism and recovery Table clustered for SQL efficiencies 1

Big Data Flow Design encourages concurrent loads and queries Either through bulk, trickle or custom loads Automatically aggregates data within hierarchy Massively parallel processing of all workload aspects Leverage daily processing aggregates Fact-Daily Fact-Week Fact-Month Fact-1Q Fact-Yearly MQT MQT MQT MQT MQT 13

Materialized Query Table-MQT Example Daily sales figures feed MQTs Create additive MQTs MQTs created for all analysis comparison points Daily Sales Weekly Totals Monthly Totals Quarterly Totals View Y-T-D Totals MQTs can be built from other MQTs Define Quarterly Sales from Monthly Sales Combine MQTs through Views Views over region, territory, store id etc OLTP Detail Trans from today 14

MQT 1 to 1 times improvement! 5B rows per year 1 per 4k page= ½B pages MQT aggregates save large amounts of everything Create aggregates for every possibility On Demand information Sales by department Sales by zip code Sales by time period day/week/month/quarter/ap All reporting and analysis areas Fact-Yearly MQT Trace usage to create/eliminate aggregates Total by month ½B I/Os versus 1 I/Os Y-T-D View Fact-1Q MQT Fact-Month MQT Fact-Week MQT Fact-Daily MQT 15

Database Conditions RUNSTATs Sample your database 5% Table Page size Optimization Do you have the correct DB Page Size? Column Cardinality Important Columns Index columns JOIN columns All columns used within any WHERE clauses Index Cardinality Make sure to get detailed statistics INCLUDE Columns 16

Database Conditions How long does it take until your Index goes bad? You can collect distribution statistics when you collect inline statistics when you run the following utilities: LOAD REORG TABLESPACE You can collect histogram statistics when you collect inline statistics when you run the following utilities: LOAD REBUILD INDEX REORG INDEX REORG TABLESPACE 17

Database Conditions Identify your VOLATILE TABLEs Targeted for SAP Customers and the SAP Cluster Tables Favors using Index access whenever possible Avoids list prefetch Can be a problem for OR predicates or UPDATEs at risk of loop Optimize Page level options PAGE ROW Ratio needs to be optimized for storage and Update frequency PCTFREE FOR UPDATE in DB 11 Help keep clustered longer, reduces the use of overflow records and indirect references 18

Statement Level Hints Steps 1. Zparm required - Enable OPTHINTS. Populate DSN_USERQUERY_TABLE with SQL text From SYSPACKSTMT or Statement Cache 3. Populate PLAN_TABLE with corresponding hints Make sure to match QUERYNO in both tables - DSN_USERQUERY_TABLE & PLAN_TABLE 4. BIND QUERY to put Hint into the repository Next dynamic prepare should use the Hint FREE QUERY to remove it 19

Database Design Use correct database column definitions So all the DB Online Analytic Processing-OLAP functions can be used Understand the new Stage 1 predicates processed value BETWEEN col1 AND col value BETWEEN column-expression AND column-expression SUBSTR(COLX, 1, n) = value **NOTE only from first position** DATE(timestamp-column) = value YEAR(date-column) = value CASE WHEN THEN ELSE END = value

Steps to resolve SQL problems 1. Start with all EXPLAIN tables information. Research the database objects, Tables, Indexes 3. Research and understand the statistics a. Understand where the problems are: CPU, I/O, #-Sorts, #-tables, wrong indexes 4. Using all WHERE clauses possible most restrictive first 5. How many times accessing the same table 6. Change the SQL table order? Sorts 7. Filter Factors, uneven data distribution, range predicates a. Are Stats current, RTS Usage & Can you change the objects statistics to improve? 8. Can you use Hints to improve the optimizer access path choice 9. Buy a IBM DB Analytic Accelerator - IDAA 1

Thank You! Dave@ d a v e b e u l k e. c o m Session: F19 Title: SQL Performance for a Big Data Billion row data warehouse Please fill out your session evaluation before leaving!