SQL Optimization & Access Paths: What's Old & New, Part 1




David Simpson, Themis Inc. (dsimpson@themisinc.com)
© 2008 Themis, Inc. All rights reserved.

David Simpson is currently a Senior Technical Advisor at Themis Inc. He teaches courses on SQL, application programming and DB2 administration, as well as performance and tuning. He has supported transactional systems that use DB2 for z/OS databases in excess of 10 terabytes. David has worked with DB2 for 14 years as an application programmer, DBA and technical instructor. David is a certified DB2 DBA on both z/OS and LUW. David was voted Best User Speaker and Best Overall Speaker at IDUG North America 2006. He was also voted Best User Speaker at IDUG Europe 2006.

Disclaimer

The content of this presentation reflects my personal experience and is a product of the systems and applications I have worked with. Your results may vary. My results may or may not be typical. In other words, it DEPENDS! Themis makes no representation, warranties or guarantees whatsoever in relation to the information contained in this presentation. This presentation is provided solely to share information with the audience relative to the subject matter contained in the presentation and is not intended by the presenter or Themis to be relied upon by the audience of this presentation.

Goals of Optimization

Optimization is essentially about three things:
- Reducing CPU used by DB2
- Reducing I/O done by DB2
- Reducing contention

Accomplishing any of the above will likely improve overall application performance.

Reducing I/O in DB2

- Appropriate use of indexes
- System tuning
- Bufferpool tuning
- Early elimination of data from consideration (i.e. get a good access path)

Reducing I/O within DB2 is one of the easiest ways to improve the performance of SQL within an application. Creating appropriate indexes on columns or groups of columns that are commonly used to identify needed data can significantly reduce the I/O and CPU needed to retrieve a result.

It is also important that the DB2 system itself be configured optimally for the workload it must support. One important component of system tuning is the bufferpools. Bufferpools exist to reduce the amount of I/O needed by applications, and they should be configured to allow as much reuse of cached data as possible. Objects may be grouped by sequential and random access patterns and the settings of the pools adjusted accordingly. In some cases a system tuning effort can reap significant rewards. No amount of system tuning, however, can recover the resources wasted by a poor database design or by poor access paths generated by the DB2 optimizer.

This course focuses on optimization and tuning at the SQL level. In general, we want the optimizer to generate an access path that eliminates as much data from consideration as early as possible in the process.
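
The bufferpool point can be illustrated with a toy model. The sketch below is a simplification, assuming a plain least-recently-used (LRU) replacement policy and a hypothetical list of page references; it ignores prefetch, write activity and the pool thresholds DB2 actually manages. It only shows why a pool large enough to hold the working set turns repeated page references into cache hits instead of I/O.

```python
from collections import OrderedDict

def count_physical_reads(page_refs, pool_size):
    """Count page requests that miss an LRU cache holding pool_size pages.

    Simplified model only: no prefetch, no deferred writes, no thresholds.
    """
    pool = OrderedDict()  # page number -> None, least recently used first
    reads = 0
    for page in page_refs:
        if page in pool:
            pool.move_to_end(page)  # hit: mark as most recently used
        else:
            reads += 1  # miss: the page must be read from disk
            pool[page] = None
            if len(pool) > pool_size:
                pool.popitem(last=False)  # evict the least recently used page
    return reads

# Hypothetical workload that keeps revisiting three "hot" pages.
hot_workload = [1, 2, 3, 1, 2, 3, 1, 2, 3, 4]
print(count_physical_reads(hot_workload, pool_size=2))  # 10: every request misses
print(count_physical_reads(hot_workload, pool_size=4))  # 4: only first touches miss
```

With two slots, the cyclic pattern evicts each page just before it is needed again; with four, the hot pages stay cached.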

Determining What to Tune

- Trace data: accounting and statistics reports
- Online monitor: top 10 lists
- Critical path queries: may only execute once per day, but run for 2 hours
- Ad hoc queries

Determining which SQL statements should be targets of optimization is always a challenge. Most monitoring tools have reports that will generate a top n list of statements and their cost for a time period. This can be done either with the monitor's own historical data store or by summarizing trace data from SMF or GTF. Some queries require tuning even if they don't run very often. Batch SQL in the critical path may warrant a tuning effort even if it only executes once a day. Ad hoc queries that are part of a reporting system or data warehouse may also need to be optimized as they enter the system.
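
A minimal sketch of what a "top n" report does, assuming the trace or monitor data has already been reduced to hypothetical (statement identifier, CPU milliseconds) pairs. Real SMF/GTF records and monitor reports have their own formats, which are not modeled here.

```python
from collections import defaultdict

def top_n_statements(records, n):
    """Total the CPU per statement and return the n most expensive, highest first."""
    totals = defaultdict(int)
    for stmt_id, cpu_ms in records:
        totals[stmt_id] += cpu_ms
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical records: (statement identifier, CPU milliseconds for one execution).
records = [
    ("PGMA:QUERYNO 10", 20),
    ("PGMA:QUERYNO 10", 30),
    ("BATCH1:QUERYNO 5", 4500),  # runs once a day, but is expensive
    ("PGMB:QUERYNO 7", 10),
]
print(top_n_statements(records, 2))
# [('BATCH1:QUERYNO 5', 4500), ('PGMA:QUERYNO 10', 50)]
```

Ranking by total CPU is only one choice; ranking by elapsed time would bring long-running batch work to the top instead.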

The DB2 Optimizer

[Diagram: inputs to the optimizer (Catalog Statistics, Object Definitions, Access Path Hint) and its output (Access Path)]

Through the Data Manipulation Language (DML), the user of a DB2 database supplies the "WHAT"; that is, the data that is needed from the database to satisfy the business requirements. DB2 then uses the information in the DB2 Catalog to resolve "WHERE" the data resides. The DB2 Optimizer is then responsible for determining the all-important "HOW" to access the data most efficiently.

Ideally, the user of a relational database is not concerned with how the system accesses data. This is probably true for an end user of DB2 who writes SQL queries quickly for one-time or occasional use. It is less true for developers who write application programs and transactions, some of which will be executed thousands of times a day. For these cases, some attention to DB2 access methods can significantly improve performance. DB2's access paths can be influenced in four ways:
- By rewriting a query in a more efficient form.
- By creating, altering, or dropping indexes.
- By updating the catalog statistics that DB2 uses to estimate access costs.
- By utilizing optimizer hints.

Explain

    EXPLAIN PLAN SET QUERYNO = 10 FOR
      SELECT LASTNAME, SALARY
        FROM EMP
       WHERE EMPNO BETWEEN '000000' AND '099999'
         AND SALARY < 40000

or BIND PACKAGE with the option EXPLAIN(YES).

[Diagram: Optimizer output goes to PLAN_TABLE, DSN_STATEMNT_TABLE, DSN_FUNCTION_TABLE "& a bunch of super-secret hidden tables"]

The process of asking the DB2 optimizer to describe an access path that was chosen (or will be chosen) for a query is called an explain. When we run an explain, the output is placed in DB2 tables that we may then view.

A PLAN_TABLE is a regular DB2 table that holds the results of an EXPLAIN. IBM's SQL Reference Guide contains the format of the PLAN_TABLE and a description of all of its columns. Each user running an explain needs access to a plan table, either by owning one directly or through a secondary authid. In DB2 Version 8, aliases may also be used to allow users to share a single set of explain tables.

Although the plan table is required to run explains, several other explain tables may optionally be created to hold explain data. These extra tables will be populated during an explain if they exist. The DSN_STATEMNT_TABLE contains information about the total estimated cost of the query being explained. This cost data may be compared across several iterations of refinement of a query to see whether a change is likely to improve performance. Costs may also be tracked over time. Additionally, there are many more hidden explain tables that will be populated if they exist. IBM has chosen not to document the contents of these tables, but they are used by Visual Explain in DB2 Version 8 to display a much more detailed analysis of a query.

Retrieving Rows From a Plan Table

    SELECT *
      FROM MY.PLAN_TABLE
     WHERE PROGNAME = 'PACK1'
       AND COLLID   = 'COLL1'
       AND VERSION  = 'PROD1'
     ORDER BY TIMESTAMP, QUERYNO, QBLOCKNO, PLANNO, MIXOPSEQ;

Several processes can insert rows into the same plan table. To understand access paths, you must retrieve the rows for a particular query in an appropriate order.

Retrieving Rows for a Plan

The rows for a particular plan are identified by the value of APPLNAME; the rows for a particular package, as in the query above, are identified by PROGNAME, COLLID and VERSION. Ordered this way, the query returns the rows for all the explainable statements in the package in their logical order.

The Result of the ORDER BY Clause

It is important to arrange the rows selected from the PLAN_TABLE so that the access path can be viewed in its logical sequence. The result of the ORDER BY clause shows whether there are:
- Multiple QBLOCKNOs within a QUERYNO
- Multiple PLANNOs within a QBLOCKNO
- Multiple MIXOPSEQs within a PLANNO

All rows with the same non-zero value for QBLOCKNO and the same value for QUERYNO relate to a step within the query. QBLOCKNOs are not necessarily executed in the order shown in the PLAN_TABLE, but within a QBLOCKNO, the PLANNO column gives the substeps in the order they execute. For each substep, the TNAME column identifies the table accessed. Sorts can be shown as part of a table access or as a separate step.
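
The effect of that ORDER BY can be mimicked outside DB2. The sketch below assumes the rows have already been fetched into a hypothetical PlanRow tuple holding only a few PLAN_TABLE columns, and sorts them on the same five keys.

```python
from typing import NamedTuple

class PlanRow(NamedTuple):
    """A hypothetical subset of PLAN_TABLE columns, just enough to show the ordering."""
    timestamp: str  # fixed-width string, so it sorts chronologically
    queryno: int
    qblockno: int
    planno: int
    mixopseq: int
    tname: str

def logical_order(rows):
    """Sort rows like ORDER BY TIMESTAMP, QUERYNO, QBLOCKNO, PLANNO, MIXOPSEQ."""
    return sorted(rows, key=lambda r: (r.timestamp, r.queryno, r.qblockno, r.planno, r.mixopseq))

# Hypothetical rows for one explained query, fetched in no particular order.
rows = [
    PlanRow("2008010112000000", 10, 2, 1, 0, "DEPT"),
    PlanRow("2008010112000000", 10, 1, 2, 0, "DEPT"),
    PlanRow("2008010112000000", 10, 1, 1, 0, "EMP"),
]
for r in logical_order(rows):
    print(r.queryno, r.qblockno, r.planno, r.tname)
```

The sorted output lists query block 1 (EMP, then DEPT) before query block 2; as noted above, that is a display order, not necessarily the order in which query blocks execute.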

Visual Explain for Version 8

- Download from http://www-306.ibm.com/software/data/db2/zos/osc/ve
- Requires DB2 Connect access, either via Personal Edition or via the DB2 client and a DB2 Connect EE gateway.
- VE Version 8 will work with DB2 Version 7, with loss of some functionality.
- Requires all external and hidden explain tables to be created. VE will prompt you to create them if they don't exist for your SQLID. DBA authority will be needed to do this.

Visual Explain is a free tool provided by IBM to help with optimization and SQL tuning. Visual Explain was first introduced in Version 7. The original tool essentially displayed the data from the plan table in a graphical format, with some additional data provided from the DSN_STATEMNT_TABLE and the DSN_FUNCTION_TABLE. Visual Explain is a Windows-based tool that requires DB2 Connect (either directly or through a gateway) to connect to DB2 on z/OS.

In Version 8, Visual Explain has been rewritten and enhanced to provide additional information that cannot be obtained by looking at the plan table alone. Visual Explain uses additional explain tables that provide enhanced data for optimization. These additional tables are not documented, but may be viewed after an explain is done. They are populated when present, even if the explain is not performed with Visual Explain. All 12 tables are required for Visual Explain to perform an explain. The user must have the tables created under their own ID or have access through a secondary authid that has the tables available.

Starting Visual Explain

After launching Visual Explain, the List Databases window should come up with a list of all data sources configured for your client. Select the one you wish to use and click the Connect button. Log on with your userid and password. Once connected, you may use any of the functionality of Visual Explain against the subsystem. The most commonly used is the Tune SQL button, which is on the button bar and may also be accessed through the Tools menu option.

Tuning SQL with Visual Explain

[Screenshot callouts: "Connected"; "Open SQL Tuning Window"]

Enter SQL Statement

[Screenshot callouts: "SQL"; "Explain Tables to Use"; "Creator for Unqualified Tables"]

The SQL Tuning window provides a place to enter an SQL statement. The SQLID should be set to the owner of the explain tables to be used. This must be either the userid or a secondary authid to which the user is connected. The schema box may be edited to provide a qualifier for any references in the SQL statement to unqualified tables.

The Current Degree drop-down box specifies whether parallelism should be considered when optimizing the statement. A value of 1 means no parallelism will be used. A value of ANY means that parallelism will be considered. If System Default is specified, the system's default parallel mode will be used.

The Execute button will actually run the query and present the results in a grid window. The Analyze button will provide statistics recommendations for the query. The Explain button will explain the query and present the access path graph as a result.

Enter SQL Statement

[Screenshot callouts: "Parallelism?"; "Explain the Query"; "Run the Query"; "Get Stats Recommendation"]

Visual Explain Access Path Graph

The access path graph shows the explain data for the query. Access path graphs are read left to right, bottom to top. Each node on the graph represents a source of data or an operation on data as it moves towards the result set. Each node may be clicked to display details about that node on the left side of the screen.

Visual Explain Access Path Graph: Sources of Data

In this graph, there are two nodes that indicate they are a source of data. Data is accessed by doing an index scan of the XEMP03 index. Once the appropriate rows are identified, the data in the EMP table is retrieved using the identifiers from the index. If one of these nodes is highlighted by clicking on it, the catalog data about the object is displayed on the left side of the screen. Statistical information is displayed, as well as the timestamp when statistics were last gathered. By navigating the tree at the top left of the screen, information may be viewed about the table, the tablespace and any other indexes that exist on the referenced table.


Visual Explain Access Path Graph

The other nodes on the diagram represent operations on the data. If one of these nodes is selected, the left side of the screen will show the metrics that DB2 used in determining that this was the appropriate access method. Predicate-level data is shown, as well as estimates of how many rows will be passed to the next operation. These estimates may then be compared to reality to determine whether the optimizer made a good choice.

Tablespace Scan

    SELECT ... FROM EMP WHERE HIREDATE > ?;

PLAN_TABLE:

    PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
    1       0       EMP    R           0                      N          S

[Visual Explain access path graph]

Tablespace scans are shown in EXPLAIN output by ACCESSTYPE = R and PREFETCH = S. The query above illustrates a tablespace scan: the query has a predicate, but there are no matching indexes on the HIREDATE column.

When Tablespace Scans are Appropriate

DB2 typically selects tablespace scan access when:
- A matching index scan is not possible, because there are no indexes or there are no predicates that match the index columns.
- A high percentage of the rows in the table qualify.
- The indexes that have matching predicates have low cluster ratios, making them efficient only when a small number of rows qualify.

The Visual Explain access path graph for a tablespace scan is also shown. Notice that Visual Explain gives estimates of how many rows will remain at each level.

Tablespace Scan Detail

[Visual Explain callouts: "Limited Part Scan"; "Prefetch"]

If Visual Explain is used, more detail is available by clicking on the TBSCAN node in the graph. The left side of the screen will show that sequential prefetch has been selected, as well as a limited partition scan. The details also show which partitions will be scanned. Estimates of how many rows will actually qualify for the predicates involved are supplied, along with the filter factors DB2 used in making these estimates. Filter factors and their importance in the optimization process will be discussed in detail in an upcoming chapter.
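
As a preview of how filter factors turn into row estimates, here is a minimal sketch assuming ANDed predicates that are independent of each other and a hypothetical 100,000-row table. For an equal predicate with no distribution statistics, DB2 typically uses 1/COLCARDF as the filter factor; the column cardinalities below are invented for illustration, and the real optimizer handles many cases this ignores.

```python
from math import prod

def estimated_rows(cardinality, filter_factors):
    """Estimate qualifying rows as table cardinality times the product of filter factors.

    Assumes independent, ANDed predicates; this is a teaching model only.
    """
    return cardinality * prod(filter_factors)

# Hypothetical equal predicates on columns with 50 and 2 distinct values (1 / COLCARDF each).
ff_dept = 1 / 50
ff_sex = 1 / 2
print(estimated_rows(100_000, [ff_dept, ff_sex]))  # 1000.0
```

If the two columns were correlated, the true row count could differ greatly from this estimate, which is one reason to compare Visual Explain's estimates against reality.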

Index Structures

[Diagram: a three-level index. Level 2 is the root page, whose entries hold the highest key of nonleaf pages A and B. Level 1 holds nonleaf pages A and B, whose entries hold the highest key of leaf pages 1 through Z. Level 0 holds the leaf pages, whose key/record-ID entries point to rows in the table.]

Indexes can have multiple levels of pages. Index pages that point directly to the table data are called leaf pages. If an index has more than one leaf page, it will have at least one nonleaf page, containing the entries that point to leaf pages. If an index has more than one nonleaf page, the nonleaf pages that point to the leaf pages are referred to as level 1. A second level of nonleaf pages must point to level 1, and so on. The highest level contains a single page, called the root page. This page is created by DB2 when the index is initially built. The index tree then points directly to the table data through the key and the RID (record identifier).

Typically, the larger the key data component of the index, the more levels there will be in the index tree. This is due to the page structure: only a fixed number of bytes of data can be stored on any given page. Typically, the more levels an index has, the less likely DB2 is to use the index for matching index access.

Although indexes provide many performance advantages, such as direct access to data, avoiding sorts, enforcing uniqueness, clustering, speeding RI checks and assisting in joins, it is important to remember some of the costs associated with them:
- A row insert requires an insert to every index on that table.
- A row delete requires a delete to every index on that table.
- An update of an indexed column requires a delete and an insert on indexes referencing that column.
- When tables are reorganized or loaded, each index on the table must be rebuilt.
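
The effect of key size on tree height can be approximated with a little arithmetic. The sketch below is a hypothetical simplification: the capacity of leaf and nonleaf pages is passed in directly (as if already derived from the key length and page size), and every page is assumed full. Levels are counted with the leaf level included, so levels 0 through 2 make a three-level index.

```python
def index_levels(num_keys, entries_per_leaf, entries_per_nonleaf):
    """Count index levels, leaf level included, for a tree whose pages are all full."""
    pages = -(-num_keys // entries_per_leaf)  # ceiling division: number of leaf pages
    levels = 1
    while pages > 1:
        pages = -(-pages // entries_per_nonleaf)  # pages needed on the next level up
        levels += 1
    return levels

# Hypothetical capacities: a short key fits 400 entries per page, a long key only 50.
print(index_levels(1_000_000, 400, 400))  # 3
print(index_levels(1_000_000, 50, 50))    # 4
```

With the same million keys, the longer key adds a level, which means one more page to traverse on every matching index probe.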

Index Structures: Leaf Pages

Index on (LASTNAME, FIRSTNME, MIDINIT):

    Key (LASTNAME, FIRSTNME, MIDINIT)   RID
    Smith, Abby, A                      235/3
    Smith, Bubba, Z                     5/57
    Smith, David, C                     432/9
    Smith, Ed, B                        37/16
    Smith, Joe, A                       83/4
    Smith, Mary, N                      235/5
    Smith, Nancy, Z                     985/9
    Smith, Nate, C                      21/3
    Smith, Olivia, B                    39/42
    Smith, Traci, A                     875/8

The record ID (RID) is the physical location of the row.
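
Because leaf entries are kept in key order, a lookup on the leading columns amounts to a search over sorted keys. The sketch below loads the leaf entries from the slide into a sorted Python list and uses binary search (the standard-library bisect module) to find the RIDs for a leading-column prefix. It models only the leaf level, not the root and nonleaf pages DB2 actually traverses, and reading each RID as "page/row" is an assumption for illustration.

```python
from bisect import bisect_left

# Leaf entries from the slide, in key order: (LASTNAME, FIRSTNME, MIDINIT) -> RID.
# The "page/row" reading of the RID strings is an assumption.
LEAF_ENTRIES = [
    (("Smith", "Abby", "A"), "235/3"),
    (("Smith", "Bubba", "Z"), "5/57"),
    (("Smith", "David", "C"), "432/9"),
    (("Smith", "Ed", "B"), "37/16"),
    (("Smith", "Joe", "A"), "83/4"),
    (("Smith", "Mary", "N"), "235/5"),
    (("Smith", "Nancy", "Z"), "985/9"),
    (("Smith", "Nate", "C"), "21/3"),
    (("Smith", "Olivia", "B"), "39/42"),
    (("Smith", "Traci", "A"), "875/8"),
]
KEYS = [key for key, _rid in LEAF_ENTRIES]

def rids_for_prefix(prefix):
    """Return the RIDs of all leaf entries whose key starts with the given leading column values."""
    start = bisect_left(KEYS, prefix)  # first key >= prefix; a shorter tuple sorts before its extensions
    rids = []
    for key, rid in LEAF_ENTRIES[start:]:
        if key[: len(prefix)] != prefix:
            break  # keys are sorted, so no later entry can match
        rids.append(rid)
    return rids

print(rids_for_prefix(("Smith", "Mary")))  # ['235/5']

# Key order is not physical order: the data pages for all the Smiths jump around.
pages = [int(rid.split("/")[0]) for rid in rids_for_prefix(("Smith",))]
print(pages)  # [235, 5, 432, 37, 83, 235, 985, 21, 39, 875]
```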

Index Scan - Matching

    SELECT * FROM EMP
     WHERE LASTNAME = 'Coldsmith'
       AND FIRSTNME = 'Nichelle';

Index XEMP03 = (LASTNAME, FIRSTNME, MIDINIT)

1) The root page is read to determine the corresponding non-leaf page.
2) The non-leaf page is read to determine the corresponding leaf page.
3) The leaf page(s) are read to determine the RIDs of the corresponding data row(s).
4) The data pages are returned.

The index structure is used by reading some index pages and their corresponding data pages.

A matching index scan means there are column(s) specified in the predicate(s) that match the leading column(s) specified in the index. These predicates provide filtering capabilities. The higher the degree of filtering, the more efficient the matching index access becomes.

The general rules for determining the number of matching columns are fairly straightforward, although there are a few exceptions. The index columns are examined from leading to trailing. For each index column, DB2 will search for an indexable predicate on that column. If such a predicate exists, it can be used as a matching predicate. If no matching predicate is found for a column, the search for matching predicates stops. If a matching predicate is a range predicate, there can be no more matching columns.

The example above illustrates a composite index on LASTNAME, FIRSTNME and MIDINIT. Because the first column of the index is referenced in an indexable predicate, DB2 is able (if it chooses) to use the index in a matching mode. Also, the existence of FIRSTNME in an indexable predicate enables DB2 to use two columns of the index for matching.
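
The matching rules above are mechanical enough to write down. The sketch below is a teaching model, not the optimizer's full logic: it assumes only indexable predicates are passed in, each described by a hypothetical type label ("eq", "in" or "range"), and it ignores the exceptions mentioned in the text.

```python
def matching_columns(index_columns, predicates):
    """Count matching index columns using the leading-to-trailing rules from the slide.

    predicates maps a column name to "eq", "in" or "range" (indexable predicates only).
    """
    matched = 0
    for column in index_columns:  # examine index columns from leading to trailing
        kind = predicates.get(column)
        if kind is None:
            break  # no matching predicate on this column: the search stops
        matched += 1
        if kind == "range":
            break  # a range predicate ends matching
    return matched

XEMP03 = ["LASTNAME", "FIRSTNME", "MIDINIT"]
print(matching_columns(XEMP03, {"LASTNAME": "eq", "FIRSTNME": "eq"}))     # 2
print(matching_columns(XEMP03, {"LASTNAME": "range", "FIRSTNME": "eq"}))  # 1
print(matching_columns(XEMP03, {"FIRSTNME": "eq", "MIDINIT": "eq"}))      # 0
```

The first call is the Coldsmith/Nichelle query; the last has no predicate on the leading column, so nothing matches even though two index columns are referenced.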

Index Scan - Matching

    SELECT * FROM EMP
     WHERE LASTNAME = 'Coldsmith'
       AND FIRSTNME = 'Nichelle';

PLAN_TABLE:

    PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
    1       0       EMP    I           2          XEMP03      N

Matching index scans are depicted in the PLAN_TABLE by ACCESSTYPE = I, I1, N, or MX and MATCHCOLS > 0.

For a matching index scan, DB2 has determined that the query uses predicates that match index columns. In general, the matching predicates on the leading index columns are equal or IN predicates. The predicate that matches the final index column can be an equal, IN, or range predicate (<, <=, >, >=, LIKE, or BETWEEN).

The query above illustrates matching index access. Assume the table EMP has an index, XEMP03, on (LASTNAME, FIRSTNME, MIDINIT). The index XEMP03 is the chosen access path for this query, with MATCHCOLS = 2, because there are equal predicates on the first two columns of the index.

In Visual Explain, the IXSCAN detail shows which predicates were used to match columns, along with their filter factors. Row estimates are computed and displayed based on the available statistics for the table and index.

The value of MATCHCOLS shows the number of index columns DB2 can match to predicates in the query. Typically, index access will be more efficient the greater the number of matching columns. Effort placed on proper index design can have a huge return on investment in terms of DB2's ability to match index columns to query predicates.

Index Screening

    SELECT * FROM EMP
     WHERE LASTNAME = ?
       AND MIDINIT = ?;

INDEX XEMP03 on (LASTNAME, FIRSTNME, MIDINIT); MIDINIT = ? is the index screening predicate.

PLAN_TABLE:

    PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
    1       0       EMP    I           1          XEMP03      N

Index screening predicates are specified on index key columns but are not part of the matching columns used to scan the index structure. These screening predicates improve index access by reducing the number of rows that qualify while searching the index.

Assume the table EMP has an index, XEMP03, on (LASTNAME, FIRSTNME, MIDINIT). The query above illustrates DB2's ability to use one of the two predicates for matching against the index, i.e. with MATCHCOLS = 1. Once DB2 determines that a symbolic key entry matches on the predicate LASTNAME = ?, the predicate MIDINIT = ? can be applied during the index scan to further qualify rows. This is the process known as index screening. If an index entry meets the criteria of the screening predicates, the row will be retrieved. Once the data row has been retrieved, predicates on columns not in the index can be applied.

When Index Screening is Used

Index screening is used when there are predicates available to apply against columns in the index to further qualify rows. The PLAN_TABLE does not directly tell you when an index is screened. However, if MATCHCOLS is less than the number of index key columns, index screening is possible. Visual Explain does flag predicates where index screening is used.
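
The three kinds of predicates described above can be separated with a small teaching model. The sketch below is hypothetical: predicates are given as column-to-type labels ("eq", "in" or "range"), SALARY is an assumed non-index column added for illustration, and the stage 1/stage 2 distinction is ignored. Matching columns follow the leading-to-trailing rules; any other predicate on an index column becomes a screening predicate; the rest must wait for the data row.

```python
def classify_predicates(index_columns, predicates):
    """Split predicate columns into (matching, index screening, data-page) groups.

    predicates maps column -> "eq", "in" or "range". Teaching model only.
    """
    matching = []
    for column in index_columns:  # leading to trailing
        kind = predicates.get(column)
        if kind is None:
            break  # a gap in the index columns ends matching
        matching.append(column)
        if kind == "range":
            break  # a range predicate ends matching
    screening = [c for c in index_columns if c in predicates and c not in matching]
    data = [c for c in predicates if c not in index_columns]
    return matching, screening, data

XEMP03 = ["LASTNAME", "FIRSTNME", "MIDINIT"]
preds = {"LASTNAME": "eq", "MIDINIT": "eq", "SALARY": "range"}
print(classify_predicates(XEMP03, preds))
# (['LASTNAME'], ['MIDINIT'], ['SALARY'])
```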


Index Scan - Nonmatching

    SELECT * FROM EMP
     WHERE FIRSTNME = ?
       AND MIDINIT = ?;

Index XEMP03 = (LASTNAME, FIRSTNME, MIDINIT)

1) The leaf pages are scanned to acquire the corresponding RIDs.
2) The data pages are returned in index order.

The index leaf pages are read, and then their corresponding data pages are read.

A nonmatching index scan means there are no matching columns in the index. Because a nonmatching index scan does not use the index tree to locate a starting position, it is sometimes referred to as relative positioning.

The example above illustrates a composite index on LASTNAME, FIRSTNME and MIDINIT. Because the first column of the index is not referenced in the WHERE clause, DB2 is unable to use the index in a matching mode. However, the existence of FIRSTNME and MIDINIT in the WHERE clause does give DB2 the ability to use index screening. Through a screening process, DB2 can use a nonmatching index scan to pick off the data rows associated with the desired criteria.

The physical order in which a table's data pages are stored is important. DB2 uses CLUSTERRATIOF to determine the effectiveness of the index for such access. DB2 might also choose to scan a nonmatching index in order to avoid a sort operation or to evaluate a stage 1 predicate. Typically, CLUSTERRATIOF must be fairly high for this type of access strategy.

Index Scan - Nonmatching

    SELECT * FROM EMP
     WHERE FIRSTNME = ?
       AND MIDINIT = ?;

PLAN_TABLE:

    PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
    1       0       EMP    I           0          XEMP03      N

Nonmatching index scans are shown in EXPLAIN output by ACCESSTYPE = I and MATCHCOLS = 0. In Visual Explain it is possible to see which columns are used as screening predicates in a nonmatching index scan. Notice that MATCHCOLS shows up as zero in both the PLAN_TABLE and Visual Explain.

When Nonmatching Index Access is Used

Because there is little or no filtering, a nonmatching index scan is used in only a few special cases:
- When index screening is provided. In this case not all the data pages are accessed, only those that DB2 has determined qualify based on the screening.
- When the OPTIMIZE FOR n ROWS clause is used in conjunction with an ORDER BY clause and the index can support the ordering.
- When there is more than one table in a nonsegmented tablespace, the nonmatching index scan can provide access to the rows of that table.
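
With no matching columns, every leaf entry is examined. The sketch below models that, assuming a handful of hypothetical leaf entries for XEMP03 with RIDs written as (page, row) pairs: the screening predicates are evaluated against each index key, and only qualifying entries lead to data page access.

```python
def nonmatching_scan(leaf_entries, screen):
    """Read every leaf entry in key order; keep the RIDs of entries that pass the screen."""
    qualifying_rids = []
    for key, rid in leaf_entries:  # no matching columns: every leaf entry is read
        if screen(key):
            qualifying_rids.append(rid)  # only these rows need a data page access
    return qualifying_rids

# Hypothetical leaf entries: ((LASTNAME, FIRSTNME, MIDINIT), (page, row)).
leaf = [
    (("ADAMS", "JOE", "A"), (12, 1)),
    (("BAKER", "ANN", "C"), (40, 7)),
    (("CLARK", "JOE", "A"), (3, 2)),
    (("DAVIS", "JOE", "B"), (12, 5)),
]
# WHERE FIRSTNME = 'JOE' AND MIDINIT = 'A', screened on the second and third key columns.
rids = nonmatching_scan(leaf, lambda k: k[1] == "JOE" and k[2] == "A")
print(rids)  # [(12, 1), (3, 2)]
```

The qualifying rows come back in index order, so data page 12 is visited before page 3; how costly that is depends on how well the table is clustered on this index.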


Index Structures: Clustered Index

[Diagram: root page with keys 25 and 61; non-leaf pages with keys 8, 13, 33, 45, 75 and 86; leaf pages pointing to rows on the data pages of the table in the tablespace]

When a table has a clustering index, DB2 will insert each new data row as close as possible to the order of the index values in the index structure. Because the order of the rows reflects the order of the index, significant performance advantages exist when performing certain operations such as grouping, ordering, and comparisons other than equal.

DB2 uses a catalog statistic, CLUSTERRATIOF, to keep track of how closely the order of the index entries on the index leaf pages matches the actual order of the data on the data pages. In general, the closer the value of CLUSTERRATIOF is to 100%, the more closely the index entries and data rows are in the same clustered sequence.

The index structure above illustrates DB2's access through a clustering index structure. This illustration depicts a CLUSTERRATIOF of 100%. Note that to access the data in index order, the data pages are read in sequential order.
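
A rough intuition for CLUSTERRATIOF can be sketched in a few lines. The function below is a hypothetical stand-in, not DB2's formula (RUNSTATS computes the real statistic with its own rules): it takes the data page numbers of the rows in index key order and reports the percentage of consecutive steps that stay on the same page or move forward.

```python
def simple_cluster_ratio(pages_in_index_order):
    """Percentage of consecutive index entries whose data page does not go backwards.

    A simplified illustration of clustering, not the CLUSTERRATIOF calculation.
    """
    if len(pages_in_index_order) < 2:
        return 100.0
    steps = zip(pages_in_index_order, pages_in_index_order[1:])
    in_sequence = sum(1 for prev, cur in steps if cur >= prev)
    return 100.0 * in_sequence / (len(pages_in_index_order) - 1)

print(simple_cluster_ratio([1, 1, 2, 2, 3, 4]))  # 100.0: rows stored in index order
print(simple_cluster_ratio([4, 1, 3, 2, 4, 1]))  # 40.0: rows scattered across pages
```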

Index Structures - Nonclustered Index
[Figure: nonclustering index with root page, non-leaf pages, and leaf pages whose entries point to the data pages of the table in the tablespace, in random order]
The index structure above illustrates DB2's access through a nonclustering index. This illustration depicts a CLUSTERRATIOF far less than 100%. Note that to access the data in index order, the data pages are read in random rather than sequential order. In many cases a data page must be reread to reach the row containing the next key value. DB2 typically uses nonclustered indexes for random access to data rows.
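The cost of reading through a nonclustered index can be sketched by counting data page visits in index key order. The model assumes that only the most recently read page is retained. This overstates rereads compared with a real buffer pool. The function name and page numbers are hypothetical.

```python
def page_visits_in_key_order(pages_in_key_order):
    """Count data page visits when rows are fetched in index key order.

    Simplifying assumption: only the page just read is still at hand, so
    returning to any earlier page counts as another visit. A real buffer
    pool may satisfy some of these rereads without physical I/O.
    """
    visits = 0
    current_page = None
    for page in pages_in_key_order:
        if page != current_page:
            visits += 1
            current_page = page
    return visits


clustered = [1, 1, 2, 2, 3, 3, 4, 4]     # rows in the same order as the index
nonclustered = [3, 1, 4, 1, 2, 4, 2, 3]  # same 4 pages, random order
for name, pages in (("clustered", clustered), ("nonclustered", nonclustered)):
    visits = page_visits_in_key_order(pages)
    rereads = visits - len(set(pages))
    print(name, visits, rereads)
# clustered 4 0
# nonclustered 8 4
```

Both layouts touch the same four pages. In the nonclustered case, each page is visited twice.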

Index Only Access
SELECT LASTNAME, FIRSTNME, MIDINIT FROM EMP WHERE LASTNAME LIKE 'JO%';
If all the columns a query needs from a particular table are available in an index, the optimizer may be able to qualify and retrieve them from the index without going to the tablespace at all. This is called index-only access. It may provide a significant performance improvement, particularly when many rows must be evaluated through a nonclustered index. In this example only one column of XEMP03 (LASTNAME) is used to qualify rows. Placing the FIRSTNME and MIDINIT columns in the index still provides a significant benefit, because the query can then be answered without reading any data pages. Note that there is no table node for the EMP table in this diagram since only the index is used.
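Here is a minimal Python sketch of the two ideas on this slide. It checks whether an index covers every column a query needs, and it answers the LIKE 'JO%' predicate from index entries alone. The XEMP03 column list is inferred from the query above. The entries and helper names are hypothetical.

```python
from bisect import bisect_left

# Assumed key columns of XEMP03, inferred from the example query.
XEMP03_COLUMNS = ("LASTNAME", "FIRSTNME", "MIDINIT")

# Hypothetical XEMP03 leaf entries, sorted by the full key.
xemp03_entries = [
    ("BROWN", "DAVID", "A"),
    ("JOHNSON", "SYBIL", "D"),
    ("JONES", "WILLIAM", "T"),
    ("LEE", "WING", ""),
]


def can_use_index_only(needed_columns, index_columns):
    """Index-only access is possible when the index covers every needed column."""
    return set(needed_columns) <= set(index_columns)


def index_only_prefix_scan(entries, prefix):
    """Answer LASTNAME LIKE 'prefix%' from the index entries alone."""
    start = bisect_left(entries, (prefix,))  # position of the first key >= prefix
    result = []
    for entry in entries[start:]:
        if not entry[0].startswith(prefix):
            break  # past the matching key range
        result.append(entry)  # the selected columns come straight from the entry
    return result


needed = ("LASTNAME", "FIRSTNME", "MIDINIT")
print(can_use_index_only(needed, XEMP03_COLUMNS))                 # True
print(can_use_index_only(needed + ("SALARY",), XEMP03_COLUMNS))  # False
print(index_only_prefix_scan(xemp03_entries, "JO"))
# [('JOHNSON', 'SYBIL', 'D'), ('JONES', 'WILLIAM', 'T')]
```

Adding a column such as SALARY to the select list would break the coverage check and force access to the data pages.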

Index Scan - List Prefetch
SELECT ... FROM EMP WHERE DEPTNO = 'P01';
INDEX XEMP02 on DEPTNO

PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    I           1          XEMP02      N          L

[Figure: list prefetch through a single index]
1) A matching index scan (root page, non-leaf pages, leaf pages) produces a list of RIDs for the qualifying rows.
2) The RID list is sorted in ascending sequence by data page number (RID sort), prior to row retrieval.
3) Data pages are prefetched in the order of the sorted RID list.

List Prefetch reads a set of data pages determined by a list of RIDs taken from a matching scan of one or more indexes. The data pages need not be contiguous. The maximum number of pages that can be read in a single list prefetch is 32. The illustration above depicts List Prefetch during matching index access with a single index. As with any matching index access, the index structure is used to find the RIDs that qualify based on the indexable predicate(s). Once the RIDs have been determined at the leaf page level, they are sorted in data page sequence. The purpose of the RID sort is to avoid rereading data pages. Without it, rereads would be frequent because the CLUSTERRATIOF value is very low.
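The three numbered steps can be modelled in a few lines of Python. This is a sketch under stated assumptions. RIDs are modelled as (page, slot) pairs, and `list_prefetch_plan` and `max_pages_per_request` are hypothetical names. The 32-page limit per request comes from the slide text rather than from any DB2 API.

```python
def list_prefetch_plan(rids, max_pages_per_request=32):
    """Model the RID sort and the grouping of data pages into list prefetch requests.

    rids: (data page, slot) pairs in the order the matching index scan found them.
    Returns (sorted_rids, requests), where each request is a list of at most
    max_pages_per_request distinct, not necessarily contiguous, page numbers.
    """
    sorted_rids = sorted(rids)  # RID sort: ascending data page, then slot
    pages = sorted({page for page, _slot in sorted_rids})  # each page appears once
    requests = [
        pages[i:i + max_pages_per_request]
        for i in range(0, len(pages), max_pages_per_request)
    ]
    return sorted_rids, requests


# RIDs in index key order from a low-CLUSTERRATIOF index (hypothetical values).
rids = [(7, 1), (2, 3), (7, 4), (5, 2), (2, 1)]
sorted_rids, requests = list_prefetch_plan(rids)
print(sorted_rids)  # [(2, 1), (2, 3), (5, 2), (7, 1), (7, 4)]
print(requests)     # [[2, 5, 7]]

# 70 distinct pages need three requests of at most 32 pages each.
_, big = list_prefetch_plan([(page, 1) for page in range(70, 0, -1)])
print([len(r) for r in big])  # [32, 32, 6]
```

Pages 2 and 7 each hold two qualifying rows but are requested only once, which is the reread the RID sort avoids.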

Index Scan - List Prefetch (continued)
The qualifying data pages are not necessarily contiguous. Because the RIDs have been sorted in ascending sequence, however, the prefetch process can access the qualifying data pages in an efficient and orderly manner.
When List Prefetch is used
- With a single index that has a cluster ratio less than 80%.
- Sometimes with indexes that have a high cluster ratio. This happens when the estimated amount of data to be accessed is too small to make sequential prefetch efficient, but large enough to require more than one I/O.
- Always in conjunction with multiple index access.
- Always in conjunction with the inner table of a hybrid join.
When List Prefetch is not used
- Matching IN-list predicates cannot be used in conjunction with List Prefetch.
- The OPTIMIZE FOR 1 ROW clause discourages the use of List Prefetch.
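As a rough memory aid, the conditions above can be encoded as a Python function. This is not how the optimizer decides, because that decision is cost-based. The order in which the rules are checked is an assumption of this sketch, and `list_prefetch_expectation` and its parameters are hypothetical names. The cluster ratio is given as a percentage.

```python
def list_prefetch_expectation(*, multiple_index_access=False, hybrid_join_inner=False,
                              matching_in_list=False, optimize_for_1_row=False,
                              cluster_ratio=None):
    """Encode the slide's rules of thumb as a rough expectation string.

    The real decision is cost-based; checking the "always" cases first is
    an assumption of this sketch, not documented DB2 precedence.
    """
    if multiple_index_access or hybrid_join_inner:
        return "always"
    if matching_in_list:
        return "not used"
    if optimize_for_1_row:
        return "discouraged"
    if cluster_ratio is not None and cluster_ratio < 80:
        return "likely"
    # High (or unknown) cluster ratio: only sometimes, when there are too few
    # pages for sequential prefetch but more than one I/O's worth.
    return "sometimes"


print(list_prefetch_expectation(cluster_ratio=45))                         # likely
print(list_prefetch_expectation(multiple_index_access=True))               # always
print(list_prefetch_expectation(cluster_ratio=95, matching_in_list=True))  # not used
```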
