Lock Out Your Locking Problems Part 2. Lennart Henäng Svenska Handelsbanken AB

Transcription:

1 F11 Lock Out Your Locking Problems Part 2 Lennart Henäng Svenska Handelsbanken AB October 15, 2008 14:45-15:45 Platform: DB2 for z/OS In this second part of the two-part presentation we will dive into the information that DB2 for z/OS can provide to help you track down locking conflicts. Various system parameters controlling the behaviour of DB2 with regard to locking are described. Messages and trace records showing deadlocks, timeouts, and long-running transactions will be looked at in detail. Our ideas on a home-grown poor man's locking conflict monitor will be discussed. We will start with a reference to part 1. We will then continue with a description of the messages and trace records that DB2 can provide for deadlocks, timeouts, and long-running transactions respectively. We look at related system parameters that control the locking conflict resolution behaviour as well as the way DB2 produces messages and trace records. The next section discusses a proposal for a poor man's locking conflict monitor. The proposed monitor retrieves trace records by using the Instrumentation Facility Interface and saves the information immediately to a set of DB2 tables. The proposed monitor accepts subscriptions and, based on these subscriptions, can mail relevant locking conflict information to the responsible DBAs. The subscription part is built as a set of stored procedures and triggers. The presentation will highlight any differences between DB2 V8 and DB2 9.

2 Thanks Peter Backlund Haakon Roberts Claes Arrenius Olle Nyman 2 I want to thank Peter Backlund of Peter Backlund DB2-Konsult AB, who is one of the most dedicated DB2ers I know. It is always a fun learning experience to work with Peter. I also want to thank Haakon Roberts of DB2 for z/OS Development, who has been very helpful in answering my questions about locking. I also want to thank two colleagues at Handelsbanken: Claes Arrenius, the author of an application that scans DB2 messages and puts locking conflict information into a DB2 table, and Ola Nyman, who has been helpful in testing deadlocks and timeouts with DL/I batch. Finally, I want to thank my other colleagues, including my manager, at Handelsbanken, who have been patient while I've been working on this presentation.

3 Svenska Handelsbanken Group Universal bank established 1871 The biggest bank in Sweden and the third biggest bank in the Nordic area* Lending to general public: > SEK 1 300 billion Operating profit: SEK 19.4 billion (2007, incl. SPP) Total staff: 10 500 660 branches: 461 in Sweden, 171 in Great Britain and in the Nordic countries outside Sweden * Refers to lending to the public 3 Svenska Handelsbanken Group has 660 branches in total. The largest coverage is in Sweden, Great Britain and the other Nordic countries. The subsidiaries of Svenska Handelsbanken Group are Handelsbanken Finans, Stadshypotek, Handelsbanken Liv, and Handelsbanken Fonder.

4 What was in part 1? Locking from an application point of view. Lock size: row, page, table. Lock mode: S (share) or X (exclusive). Lock duration: commit or momentarily. Now we will take a look from a technical point of view 4 As a reminder of what we talked about in part 1, we recap the main DB2 locking terminology from an application standpoint. It had to do with the size of a lock, or rather how much data is locked by it; the mode of a lock, basically read or update; and the duration of a lock, that is, how long it will be held. In this part, we will look at locking from a more technical point of view to see what locking conflicts we can run into, how DB2 tries to solve such conflicts, and how DB2 tells you about their existence.

5 Agenda Defining locking conflicts. DB2's behaviour and how to adjust it. DB2 Messages and Instrumentation: Timeout, Deadlock, Lock escalation, Long runners. How to manage locking conflicts 5 The presentation will take us through examples, messages, and trace records for timeouts, deadlocks, lock escalations, and long runners. We will also see what kind of controls we have to modify DB2's behaviour in the event of a timeout, deadlock, lock escalation, or long runner. We start off by defining the most common locking conflicts, timeout and deadlock, and see how the application will be informed about them. Second, we delve into the behaviour of DB2 and how to adjust it to our will. Third, we look at how DB2 communicates information about locking conflict events to its administrators through messages and instrumentation (trace records). Finally, we present an idea on how to manage locking conflicts in order to make applications run as smoothly and problem-free as possible.

6 B-1 Timeout (1/2) [Diagram: along time axis t, TRANSACTION A holds an X-lock on RESOURCE.] 6 First of all, we have the timeout conflict that occurs if a transaction tries to access a resource that is already locked by another transaction. In this example, TRANSACTION A has, at a certain time, acquired an exclusive lock on a resource (which can be a table, page, or row). IBM's definition: Abnormal termination of either the DB2 subsystem or of an application because of the unavailability of resources. Installation specifications are set to determine both the amount of time DB2 is to wait for IRLM services after starting, and the amount of time IRLM is to wait if a resource that an application requests is unavailable. If either of these time specifications is exceeded, a timeout is declared.

7 B-1 Timeout (2/2) [Diagram: along time axis t, TRANSACTION A holds an X-lock on RESOURCE; TRANSACTION B requests an S- or X-lock on it and times out.] When the wait-time has been exceeded we get a timeout: SQLCODE -911, SQLSTATE 40001, reason code 00C9008E 7 While TRANSACTION A holds on to its exclusive lock, TRANSACTION B comes along and requests a lock on the very same resource. Since any lock request from TRANSACTION B is incompatible with TRANSACTION A's exclusive lock, TRANSACTION B has to wait (be suspended, in DB2 talk) to be granted the required lock. That is basic lock management behaviour. However, if TRANSACTION B has to wait for too long, DB2 decides to stop waiting and tells the application that it won't get the requested lock. TRANSACTION B will experience a TIMEOUT where DB2 has rolled back the complete transaction. This is done regardless of whether TRANSACTION B has made numerous updates before the timeout or none at all. TRANSACTION B will be rolled back and the application will receive SQLCODE -911, SQLSTATE 40001 together with reason code 00C9008E. So, what is too long to wait for a lock? We'll talk about that a bit later in the presentation, but for now, it is enough to know that it is controlled by the installation. Note! If the ROLLBACK fails, and the application does not abend, the application receives SQLCODE -913, SQLSTATE 57033 together with reason code 00C9008E. In this case the application is in control of the transaction and can choose to commit or roll back.
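The difference between the two timeout outcomes described above can be sketched in a few lines of Python. This is only an illustration: the function name, the result type, and the suggested actions are assumptions made for the example, not part of DB2 or of the presentation. Only the SQLCODE and reason code values come from the slide.

```python
from dataclasses import dataclass

# Reason code quoted on the slide for a timeout.
TIMEOUT_REASON = "00C9008E"


@dataclass(frozen=True)
class TimeoutOutcome:
    timed_out: bool      # True if the codes describe a lock timeout
    rolled_back: bool    # True if DB2 already rolled the transaction back
    next_step: str       # hypothetical hint for the application


def classify_timeout(sqlcode: int, reason_code: str) -> TimeoutOutcome:
    """Interpret SQLCODE plus reason code as described in the notes above."""
    if reason_code.upper() != TIMEOUT_REASON:
        return TimeoutOutcome(False, False, "not a timeout, handle elsewhere")
    if sqlcode == -911:
        # DB2 has rolled back the complete transaction.
        return TimeoutOutcome(True, True, "redo the unit of work from the start")
    if sqlcode == -913:
        # No rollback was done: the application is in control.
        return TimeoutOutcome(True, False, "application chooses COMMIT or ROLLBACK")
    return TimeoutOutcome(False, False, "not a timeout, handle elsewhere")
```

An application's error routine could call `classify_timeout(sqlcode, reason)` and branch on `rolled_back` to decide whether its earlier updates still exist.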

8 B-2 Timeout (1/2) [Diagram: along time axis t, TRANSACTION C holds an S-lock on RESOURCE.] 8 This is another example where TRANSACTION C has been granted an S-lock on the resource.

9 B-2 Timeout (2/2) [Diagram: along time axis t, TRANSACTION C holds an S-lock on RESOURCE; TRANSACTION D requests an X-lock on it and times out.] When the wait-time has been exceeded we get a timeout: SQLCODE -911, SQLSTATE 40001, reason code 00C9008E 9 TRANSACTION D comes along requesting an X-lock, which is incompatible with the S-lock of TRANSACTION C. Again, TRANSACTION D will be suspended for the installation-defined wait-time. Since TRANSACTION C does not release its lock within the wait-time, DB2 decides to stop waiting and tells the application that it won't get the requested lock. TRANSACTION D will experience a TIMEOUT where DB2 has rolled back the complete transaction. Again, the application will receive SQLCODE -911, SQLSTATE 40001 together with reason code 00C9008E.

10 B-3 Deadlock (1/4) [Diagram: along time axis t, TRANSACTION E holds an X-lock on RESOURCE 1.] 10 IBM's definition of a deadlock: Unresolvable contention for the use of a resource, such as a table or an index. This example starts off with TRANSACTION E acquiring an X-lock on RESOURCE 1.

11 B-3 Deadlock (2/4) [Diagram: along time axis t, TRANSACTION E holds an X-lock on RESOURCE 1 and TRANSACTION F holds an X-lock on RESOURCE 2.] 11 TRANSACTION F comes along and acquires an X-lock on another resource, RESOURCE 2. So far, everything is in order; each transaction has control over its own resource.

12 B-3 Deadlock (3/4) [Diagram: along time axis t, TRANSACTION E holds an X-lock on RESOURCE 1 and requests an S- or X-lock on RESOURCE 2, on which TRANSACTION F holds an X-lock.] 12 TRANSACTION E is moving along and asks for an S- or X-lock on RESOURCE 2. TRANSACTION E will be suspended, waiting for TRANSACTION F's X-lock to be released. There is still no real problem here, apart from the fact that TRANSACTION E could be timed out if TRANSACTION F holds on to its X-lock for too long (as we've learnt earlier in the presentation).

13 B-3 Deadlock (4/4) [Diagram: along time axis t, TRANSACTION E holds an X-lock on RESOURCE 1 and requests an S- or X-lock on RESOURCE 2; TRANSACTION F holds an X-lock on RESOURCE 2 and requests an S- or X-lock on RESOURCE 1 => deadlock.] This is a classical example of a deadlock: SQLCODE -911, SQLSTATE 40001, reason code 00C90088 13 But TRANSACTION F moves along and asks for an S- or X-lock on RESOURCE 1. Suddenly we have a classical deadlock situation where TRANSACTION E is waiting for TRANSACTION F and TRANSACTION F is waiting for TRANSACTION E. When a transaction is just waiting for a lock, the wait-time could be set to a higher value to make it wait longer and eventually get the lock it asked for when the blocker releases its lock. But a deadlock will never be solved by waiting. It is a definitive problem that has to be solved by DB2, as early as possible. The installation can control how often DB2 (or rather IRLM) will check for deadlock situations. When a deadlock is detected, DB2 will choose a deadlock victim that will have its transaction rolled back and then be notified by SQLCODE -911 and reason code 00C90088. The other transaction is then granted the lock it asked for and can continue its processing. Which transaction will DB2 choose as deadlock victim? Basically it will be the transaction that has written the fewest log records. As we will see later, DB2 assigns worth values to the transactions, and the one with the lowest worth value will be rolled back. In a situation where one of the transactions has updated a NOT LOGGED tablespace and the other has not, DB2 will always choose the transaction that did NOT update a NOT LOGGED tablespace as the victim! That is regardless of how many log records have been written by the two transactions respectively. If both have updated a NOT LOGGED tablespace, the one that wrote the most log records is the winner.
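The victim selection rules in the notes above can be written down as a small Python sketch. It models only what the notes state for two transactions; the class, field, and function names are invented for the illustration, the tie-break on equal log record counts is an assumption, and the real worth-value calculation inside DB2/IRLM is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    name: str
    log_records: int                  # log records written so far
    updated_not_logged: bool = False  # updated a NOT LOGGED tablespace?


def choose_victim(a: Transaction, b: Transaction) -> Transaction:
    """Pick the deadlock victim between two transactions (simplified)."""
    # If exactly one of them updated a NOT LOGGED tablespace, the other
    # one is the victim, regardless of log record counts.
    if a.updated_not_logged != b.updated_not_logged:
        return b if a.updated_not_logged else a
    # Otherwise the transaction with the fewest log records is rolled back.
    # Assumption: on a tie, the first argument is chosen.
    return a if a.log_records <= b.log_records else b
```

For example, with `E = Transaction("E", 10)` and `F = Transaction("F", 500)`, `choose_victim(E, F)` returns `E`; if `E` had updated a NOT LOGGED tablespace, `F` would be returned instead.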

14 B-4 Deadlock single resource [Diagram: along time axis t, TRANSACTION E and TRANSACTION F each hold an S-lock on RESOURCE and each request an X-lock on it => deadlock.] Result: Deadlock 14 A deadlock can also occur on a single resource. This slide shows such an example. TRANSACTION E acquires an S-lock on RESOURCE. Shortly thereafter, TRANSACTION F also acquires an S-lock on RESOURCE. This is perfectly OK, as we learned in the first part of this presentation. Next, TRANSACTION E asks for an X-lock on RESOURCE and has to wait because the X-lock is not compatible with TRANSACTION F's S-lock. Then, TRANSACTION F asks for an X-lock on RESOURCE and has to wait because the X-lock is not compatible with TRANSACTION E's S-lock. So, the two transactions will wait on each other forever: we have a deadlock situation. Again DB2 has to choose a deadlock victim and roll back its transaction to resolve the situation, making it possible for the other transaction to continue processing.
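What separates a deadlock from an ordinary lock wait is a cycle in the "who waits for whom" relation. The following Python sketch finds such a cycle with a depth-first search. It is a textbook illustration only, not a description of how IRLM detects deadlocks; the function name and the dictionary layout are assumptions.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of mutually waiting transactions, or None.

    waits_for maps a waiting transaction to the transactions that hold
    the incompatible locks it is waiting for.
    """
    path: List[str] = []      # transactions on the current search path
    finished: Set[str] = set()

    def visit(txn: str) -> Optional[List[str]]:
        if txn in finished:
            return None
        if txn in path:
            # We came back to a transaction on the current path: a cycle.
            return path[path.index(txn):]
        path.append(txn)
        for holder in sorted(waits_for.get(txn, ())):
            cycle = visit(holder)
            if cycle:
                return cycle
        path.pop()
        finished.add(txn)
        return None

    for txn in sorted(waits_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None
```

With the situation from the slide, `find_deadlock({"E": {"F"}, "F": {"E"}})` returns `["E", "F"]`, while a plain wait such as `{"B": {"A"}}` yields `None`, which is why only a timeout, not deadlock detection, ends that wait.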

15 DB2's behaviour and how to adjust it 15 This section will describe how to influence the behaviour of DB2 in locking conflicts.

16 B-5 Deadlock & Timeout parms
The most important parameters are:
IRLMRWT: 1-3600, default 60
  Resource wait time in seconds, a multiple of DEADLOK
  Defined in DSNZPARM
  Cannot be updated online; DB2 has to be restarted for a change
DEADLOK: 1-5 (seconds) or 100-5000 (milliseconds), default 1
  Time between deadlock checks
  Defined in the startup JCL for IRLM
  Can be changed dynamically by the MODIFY irlmproc command
  Use milliseconds, e.g. 500 (especially in data sharing)
IRLMRWT: The value specified for this option must be a multiple of the DEADLOCK TIME on installation panel DSNTIPJ, because IRLM uses its deadlock timer to initiate both timeout detection and deadlock detection. This value is rarely the actual wait time. For data sharing, the actual timeout period is longer than the timeout value. The IRLMRWT parameter CANNOT be changed dynamically by using the SET PARM= command.
DB2 calculates the timeout period as follows:
1. Divide RESOURCE TIMEOUT by DEADLOCK TIME
2. Round up to the next largest integer
3. Multiply that integer by DEADLOCK TIME
In non-data-sharing systems, the actual time that a transaction waits on a lock before timing out varies between the timeout period and the timeout period plus one DEADLOCK TIME interval.
MIN LOCAL TIMEOUT = timeout period
MAX LOCAL TIMEOUT = timeout period + DEADLOCK TIME value
AVERAGE LOCAL TIMEOUT = timeout period + DEADLOCK TIME value / 2
For example, if the timeout period for a given transaction is 60 seconds and the DEADLOCK TIME value is 5 seconds, the transaction waits between 60 and 65 seconds before timing out, with an average wait time of 62.5 seconds. This is because timeout is driven by the deadlock detection process, which is activated on a timer interval basis.
In a data sharing environment, because the deadlock detection process sends inter-system XCF messages, a given transaction typically waits somewhat longer before timing out than in a non-data-sharing environment.
MIN GLOBAL TIMEOUT = timeout period + DEADLOCK TIME value
MAX GLOBAL TIMEOUT = timeout period + 4 * DEADLOCK TIME value
AVERAGE GLOBAL TIMEOUT = timeout period + 2 * DEADLOCK TIME value
For example, if the timeout period for a given transaction is 60 seconds and the DEADLOCK TIME value is 5 seconds, the transaction waits between 65 and 80 seconds before timing out, with an average wait time of 70 seconds.
The DEADLOK parameter for IRLM can be changed dynamically by a MODIFY irlmproc command:
MODIFY irlmproc,SET,DEADLOCK=nnnn
nnnn specifies, in milliseconds, how often local deadlock processing is scheduled. It must be a number from 100 through 5000. If the IRLM is a member of a sysplex group and not all IRLMs are enabled for subsecond deadlock processing, message DXR106E is issued.

17 B-6 Actual timeout time
For non-data-sharing systems:
MIN LOCAL TIMEOUT = IRLMRWT
MAX LOCAL TIMEOUT = IRLMRWT + DEADLOK
AVERAGE LOCAL TIMEOUT = IRLMRWT + DEADLOK / 2
For data-sharing systems:
MIN GLOBAL TIMEOUT = IRLMRWT + DEADLOK
MAX GLOBAL TIMEOUT = IRLMRWT + 4 * DEADLOK
AVERAGE GLOBAL TIMEOUT = IRLMRWT + 2 * DEADLOK
DB2 calculates the timeout period as follows:
1. Divide RESOURCE TIMEOUT by DEADLOCK TIME
2. Round up to the next largest integer
3. Multiply that integer by DEADLOCK TIME
In non-data-sharing systems, the actual time that a transaction waits on a lock before timing out varies between the timeout period and the timeout period plus one DEADLOCK TIME interval.
MIN LOCAL TIMEOUT = timeout period
MAX LOCAL TIMEOUT = timeout period + DEADLOCK TIME value
AVERAGE LOCAL TIMEOUT = timeout period + DEADLOCK TIME value / 2
In a data sharing environment, because the deadlock detection process sends inter-system XCF messages, a given transaction typically waits somewhat longer before timing out than in a non-data-sharing environment.
MIN GLOBAL TIMEOUT = timeout period + DEADLOCK TIME value
MAX GLOBAL TIMEOUT = timeout period + 4 * DEADLOCK TIME value
AVERAGE GLOBAL TIMEOUT = timeout period + 2 * DEADLOCK TIME value

18 B-7 Example of timeout times
IRLMRWT = 5 and DEADLOK = 500
For non-data-sharing systems: Min 5 seconds, Max 5.5 seconds, Avg 5.25 seconds
For data-sharing systems: Min 5.5 seconds, Max 7 seconds, Avg 6 seconds
Note: Max and average values can be larger, depending on the number of waiters or heavy load on IRLM
Let's calculate the theoretical timeout times that we could expect using the formulas on the previous slide. First, for a non-data-sharing system: if the timeout period for a given transaction is 5 seconds and the DEADLOCK TIME value is 500 milliseconds, the transaction waits between 5 and 5.5 seconds before timing out, with an average wait time of 5.25 seconds. This is because timeout is driven by the deadlock detection process, which is activated on a timer interval basis. And then, for a data-sharing system: with the same timeout period of 5 seconds and DEADLOCK TIME value of 500 milliseconds, the transaction waits between 5.5 and 7 seconds before timing out, with an average wait time of 6 seconds. However, the maximum or average values can become larger, depending on the number of waiters in the system or if a heavy IRLM workload exists.
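The formulas and example values above can be checked with a short Python sketch. The function names are made up for the illustration; the calculation works in whole milliseconds to avoid rounding surprises, and it treats DEADLOK values 1-5 as seconds and 100-5000 as milliseconds, as on the parameter slide. It gives the theoretical window only, not the extra delay caused by many waiters or a heavily loaded IRLM.

```python
from typing import Tuple


def deadlok_to_ms(deadlok: int) -> int:
    """Normalise a DEADLOK value to milliseconds (1-5 s or 100-5000 ms)."""
    if 1 <= deadlok <= 5:
        return deadlok * 1000
    if 100 <= deadlok <= 5000:
        return deadlok
    raise ValueError("DEADLOK must be 1-5 (seconds) or 100-5000 (milliseconds)")


def timeout_window(irlmrwt_s: int, deadlok: int,
                   data_sharing: bool = False) -> Tuple[float, float, float]:
    """Return (min, max, average) wait in seconds before a timeout."""
    cycle = deadlok_to_ms(deadlok)
    # Timeout period: IRLMRWT rounded up to a multiple of the deadlock cycle.
    period = -(-(irlmrwt_s * 1000) // cycle) * cycle
    if data_sharing:
        low, high, avg = period + cycle, period + 4 * cycle, period + 2 * cycle
    else:
        low, high, avg = period, period + cycle, period + cycle / 2
    return (low / 1000, high / 1000, avg / 1000)
```

`timeout_window(5, 500)` gives `(5.0, 5.5, 5.25)` and `timeout_window(5, 500, data_sharing=True)` gives `(5.5, 7.0, 6.0)`, matching the slide; `timeout_window(60, 5)` gives `(60.0, 65.0, 62.5)`.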

19 B-8 Timeout multiplier by type
Type                                          Multiplier  Modifiable  Parameter
IMS BMPs                                      4           yes         BMPTOUT (1-254)
IMS DL/I batch                                6           yes         DLITOUT (1-254)
IMS Fast Path Nonmessage processing           6           no          -
BIND subcommand processing                    3           no          -
STOP DATABASE command processing              10          no          -
Utilities                                     6           yes         UTIMOUT (1-254)
Retained locks for all types                  0           yes         RETLWAIT (0-254)
Application accessing not logged tablespaces  >= 3        no          -
All other types                               1           no          -
NOTE! All modifiable timeout multipliers can be changed online
A timeout hits different workloads in different ways. An IMS transaction that is timed out while waiting for a lock automatically gets rescheduled, which means that the user probably doesn't even notice the timeout, apart from a longer response time. A batch job that has made a number of updates to the database will take longer to roll back, and it has to execute its restart logic to continue its work. Or, if it is a DL/I batch job, it will abend and terminate when hit by the timeout and has to be physically restarted. DB2 provides controls to set multipliers for the timeout period based on what type of processing is going on. The table on this page shows the different types, the default multiplier, whether it is possible to modify it, and the name of the parameter, if any.
Retained lock wait (RETLWAIT) works a bit differently from the other timeout multipliers. RETLWAIT indicates how long a transaction should wait for a lock on a resource if another DB2 in a data sharing group has failed and is holding an incompatible lock on that resource. If you use the default, 0, applications do not wait for incompatible retained locks; instead the lock request is immediately rejected, and the application receives a resource unavailable SQLCODE. The value that you use is a multiplier that is applied to the connection's normal timeout value. For example, if the retained lock multiplier is 2, the timeout period for a call attachment connection that is waiting for a retained lock is 1 * 2 (1 for the normal CAF timeout period, 2 for the additional time that is specified for retained locks). In other words, it is a multiplier to be multiplied with one of the other multipliers.
In DB2 9 you can turn off logging on a tablespace. When an application is accessing such a tablespace, DB2 will guarantee that it will have a multiplier that is at least three. For example, an IMS transaction will have three times the IRLMRWT while a DL/I batch will have six times the IRLMRWT. All modifiable timeout multipliers can be changed online via the SET PARM command.
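As a rough illustration of how the multipliers combine with IRLMRWT, here is a Python sketch. The dictionary keys are hypothetical labels chosen for this example (DB2 does not use them), the multiplier values are the defaults from the table, and the result is the nominal timeout period before the deadlock-cycle effects discussed earlier.

```python
from typing import Dict

# Default multipliers from the table; the keys are made-up labels.
DEFAULT_MULTIPLIERS: Dict[str, int] = {
    "IMS_BMP": 4,
    "DLI_BATCH": 6,
    "IMS_FAST_PATH": 6,
    "BIND": 3,
    "STOP_DATABASE": 10,
    "UTILITY": 6,
    "OTHER": 1,
}

NOT_LOGGED_MINIMUM = 3  # guaranteed minimum when a NOT LOGGED tablespace is accessed


def effective_multiplier(thread_type: str, accesses_not_logged: bool = False) -> int:
    """Multiplier applied to IRLMRWT for a given type of work."""
    multiplier = DEFAULT_MULTIPLIERS[thread_type]
    if accesses_not_logged:
        multiplier = max(multiplier, NOT_LOGGED_MINIMUM)
    return multiplier


def nominal_timeout(irlmrwt_s: int, thread_type: str,
                    accesses_not_logged: bool = False) -> int:
    """Nominal lock wait time in seconds."""
    return irlmrwt_s * effective_multiplier(thread_type, accesses_not_logged)


def retained_lock_wait(irlmrwt_s: int, thread_type: str, retlwait: int) -> int:
    """Wait for a retained lock; RETLWAIT multiplies the normal multiplier.

    With the default RETLWAIT of 0 the result is 0: the request is
    rejected immediately.
    """
    return irlmrwt_s * effective_multiplier(thread_type) * retlwait
```

With IRLMRWT = 60, an ordinary transaction touching a NOT LOGGED tablespace gets `nominal_timeout(60, "OTHER", True)` = 180 seconds, while a DL/I batch job keeps its 6 times, i.e. 360 seconds.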

20 B-9 Longrunners
Three important values in DSNZPARM:
URCHKTH: 0-255, default 0
  Warning if an application has been active during x checkpoints without committing
URLGWTH: 0-1000, default 0
  Warning if an application has written x thousand log records without committing
LRDRTHLD: 0-1439, default 0
  Warning if a read claim has been held for more than x minutes
These warnings are written to the SYSLOG and as trace records when Statistics Trace Class 3 is on
We want to avoid all types of locking conflicts. If we can make programs behave properly, they won't create problems for other programs accessing the same tables. We have to avoid creating timeouts, which can primarily be done by committing frequently, and we have to avoid deadlocks, which is more of a data access pattern issue. Long-running transactions might also result in a lengthy DB2 restart or a lengthy recovery situation for critical tables. DB2 provides ways to detect programs that don't commit as frequently as they should. This function is controlled by three parameters: URCHKTH, URLGWTH, and LRDRTHLD.
URCHKTH specifies the number of checkpoint cycles that are to complete before DB2 issues a warning message to the console and instrumentation for an uncommitted unit of recovery (UR). Specify a value that is based on how often a checkpoint occurs in your system and how much time you can allow for a restart or shutdown. For example, if your site's checkpoint interval is 5 minutes and the standard limit for issuing commits within units of recovery is 20 minutes, divide 20 by 5 to determine the best value for your system.
URLGWTH specifies the number of log records that are to be written by an uncommitted unit of recovery (UR) before DB2 issues a warning message to the console and instrumentation. Specify the value in 1-K (1000 log records) increments.
LRDRTHLD specifies the number of minutes that a read claim can be held by an agent before DB2 issues a warning message to the console and instrumentation to report it as a long-running reader. All the above ZPARMs can be changed online with the SET PARM command.
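The URCHKTH rule of thumb above (commit limit divided by checkpoint interval) is simple arithmetic; the following Python sketch just makes it explicit. The function name is hypothetical, and rounding down when the division is not exact is an assumption made so that the warning is not issued later than the site's limit.

```python
def suggest_urchkth(commit_limit_min: int, checkpoint_interval_min: int) -> int:
    """Suggest a URCHKTH value from the site's commit limit and checkpoint interval."""
    if checkpoint_interval_min <= 0 or commit_limit_min < 0:
        raise ValueError("interval must be positive and limit non-negative")
    # Number of complete checkpoint cycles that fit in the commit limit
    # (assumption: round down when the division is not exact).
    cycles = commit_limit_min // checkpoint_interval_min
    # URCHKTH accepts values from 0 to 255.
    return min(cycles, 255)
```

For the example in the notes, `suggest_urchkth(20, 5)` returns 4.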

21 Messages and Instrumentation 21 This section will describe the two ways that DB2 communicates with its administrators about locking conflict events.

22 B-10 Timeout - application
select deptname from dsn8910.dept
---------+---------+---------+---------+---------+---------+---------+---------+
DEPTNAME
---------+---------+---------+---------+---------+---------+---------+---------+
DSNE610I NUMBER OF ROWS DISPLAYED IS 0
DSNT408I SQLCODE = -911, ERROR: THE CURRENT UOW HAS BEEN ROLLED BACK DUE TO DEADLOCK OR TIMEOUT. REASON 00C9008E, TYPE OF RESOURCE 00000302, AND RESOURCE NAME DSN8D91A.DSN8S91D.X'000002'
DSNT418I SQLSTATE = 40001 SQLSTATE RETURN CODE
DSNT415I SQLERRP = DSNXRRC SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD = -190 -100 13172878 13813475 -1010298875 536870912 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD = X'FFFFFF42' X'FFFFFF9C' X'00C9008E' X'00D2C6E3' X'C3C81005' X'20000000' SQL DIAGNOSTIC INFORMATION
In this example we have an ordinary timeout situation on a tablespace page in the DSN8910.DEPT table. The SQLCA contains the SQLCODE, SQLSTATE, reason code, resource type, and resource name. In this example, we are using SPUFI, which in turn uses DSNTIAR to format the contents of the SQLCA. Please note that the SQLCA does not contain any information on other processes involved in the timeout.
SQLCODE: -911
SQLSTATE: 40001
Reason code: 00C9008E
Type of resource: 00000302 = Table Space Page (resource types are documented in Appendix A of the DB2 9 Messages manual)
Resource name: DSN8D91A.DSN8S91D.X'000002', i.e. page 2 in the DSN8S91D tablespace in database DSN8D91A
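The two SQLERRD lines in the output above carry the same six values, once as signed decimal numbers and once in hexadecimal, and the third value is the reason code. A small Python sketch shows the conversion and how the resource name of this example splits into its parts. The function names are hypothetical, and the `database.tablespace.X'page'` layout is assumed only for the table space page resource type shown here.

```python
import re
from typing import Tuple


def sqlerrd_to_hex(value: int) -> str:
    """Render a signed 32-bit SQLERRD value as 8 hexadecimal digits."""
    return format(value & 0xFFFFFFFF, "08X")


def parse_page_resource(name: str) -> Tuple[str, str, int]:
    """Split a name like DSN8D91A.DSN8S91D.X'000002' into its parts."""
    match = re.fullmatch(r"([A-Z0-9]+)\.([A-Z0-9]+)\.X'([0-9A-F]+)'", name)
    if match is None:
        raise ValueError("unexpected resource name layout: " + name)
    database, tablespace, page_hex = match.groups()
    return database, tablespace, int(page_hex, 16)
```

`sqlerrd_to_hex(13172878)` gives `00C9008E`, the timeout reason code, and `parse_page_resource("DSN8D91A.DSN8S91D.X'000002'")` gives `("DSN8D91A", "DSN8S91D", 2)`.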

23 B-11 Timeout - SYSLOG
DSNT376I -GT8G PLAN=DSNESPCS WITH CORRELATION-ID=GOLD106 CONNECTION-ID=TSO LUW-ID=ADCD.GT9GLU1.C23209926293=315 THREAD-INFO=GOLD106:*:*:* IS TIMED OUT. ONE HOLDER OF THE RESOURCE IS PLAN=DSNESPCS WITH CORRELATION-ID=GOLD105 CONNECTION-ID=TSO LUW-ID=ADCD.GT9GLU1.C232096C408D=313 THREAD-INFO=GOLD105:*:*:* ON MEMBER GT9G
DSNT501I -GT8G DSNILMCL RESOURCE UNAVAILABLE CORRELATION-ID=GOLD106 CONNECTION-ID=TSO LUW-ID=ADCD.GT9GLU1.C23209926293=0 REASON 00C9008E TYPE 00000302 NAME DSN8D91A.DSN8S91D.X'000002'
DB2 will document the timeout on the MVS console. Here we get some more information, since DB2 gives us the identity of ONE HOLDER OF THE RESOURCE in the DSNT376I message. DSNT501I gives us the reason code, the resource type (resource types are documented in Appendix A of the DB2 9 Messages manual), and the resource name.
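Scanning such console messages is one way to collect locking conflict information in a table. As a minimal sketch of that idea, the Python code below pulls the waiter and holder identities out of a DSNT376I message laid out like the one above. The function name and the returned dictionary layout are assumptions, and real messages may be wrapped across console lines differently, so this is not a complete parser.

```python
import re
from typing import Dict

# KEYWORD=value pairs such as PLAN=..., CORRELATION-ID=..., LUW-ID=...
_PAIR = re.compile(r"([A-Z][A-Z-]*)=(\S+)")

SAMPLE = (
    "DSNT376I -GT8G PLAN=DSNESPCS WITH CORRELATION-ID=GOLD106 "
    "CONNECTION-ID=TSO LUW-ID=ADCD.GT9GLU1.C23209926293=315 "
    "THREAD-INFO=GOLD106:*:*:* IS TIMED OUT. ONE HOLDER OF THE RESOURCE IS "
    "PLAN=DSNESPCS WITH CORRELATION-ID=GOLD105 CONNECTION-ID=TSO "
    "LUW-ID=ADCD.GT9GLU1.C232096C408D=313 THREAD-INFO=GOLD105:*:*:* "
    "ON MEMBER GT9G"
)


def parse_dsnt376i(message: str) -> Dict[str, Dict[str, str]]:
    """Split a DSNT376I message into waiter and holder attributes."""
    text = " ".join(message.split())  # collapse line wraps and extra blanks
    waiter_part, marker, holder_part = text.partition(" IS TIMED OUT. ")
    if not text.startswith("DSNT376I") or not marker:
        raise ValueError("not a DSNT376I timeout message")
    holder = dict(_PAIR.findall(holder_part))
    member = re.search(r"ON MEMBER (\S+)", holder_part)
    if member:
        holder["MEMBER"] = member.group(1)
    return {"waiter": dict(_PAIR.findall(waiter_part)), "holder": holder}
```

For the sample message, `parse_dsnt376i(SAMPLE)["waiter"]["CORRELATION-ID"]` is `GOLD106` and the holder's is `GOLD105`, on member `GT9G`.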

B-12 Display locks (DB2 V8)
-dis db(dsn8d81a) locks
DSNT360I -GT8G ***********************************
DSNT361I -GT8G * DISPLAY DATABASE SUMMARY * GLOBAL LOCKS
DSNT360I -GT8G ***********************************
DSNT362I -GT8G DATABASE = DSN8D81A STATUS = RW DBD LENGTH = 20180
DSNT397I -GT8G
NAME     TYPE PART  STATUS       CONNID   CORRID       LOCKINFO
-------- ---- ----- ------------ -------- ------------ --------
DSN8S81D TS         RW           TSO      GOLD106      H-IS,S,C
DSN8S81D TS         RW           TSO      GOLD105      H-IX,S,C
DSN8S81E TS -THRU 0004 DSN8S81P TS 0001 RW RW
DB2 lets you take a snapshot of the locks held on a database with the command DISPLAY DATABASE() LOCKS. If a long-running batch job has acquired a lock on the resource and does not COMMIT frequently enough, you might spot the lock holder(s) by using this command. The example on the slide does not show the locking conflict mentioned in the previous slides. It is only meant to show what information is available in DB2. As implied above, it is only in rare situations that you can spot a locking conflict by using the command. Furthermore, as soon as a timeout or deadlock has been reported, the locking conflict is already resolved and DB2 has no information left to show.
In this example of command output from DISPLAY DATABASE(DSN8D81A) LOCKS on a DB2 V8 subsystem, we can see that there are two lock holders on the DSN8S81D tablespace in database DSN8D81A, namely GOLD106 and GOLD105. The LOCKINFO consists of a lock status, followed by a dash (-), and then a lock state, a lock type and a lock duration delimited by commas. In this example, GOLD106 has a status of Hold, a lock state/mode of IS, a table space type of lock (S), and a Commit duration on the DSN8S81D tablespace. GOLD105 has a status of Hold, a lock state/mode of IX, a table space type of lock (S), and a Commit duration on the very same tablespace.
As we learned in the first part of this presentation, a lock in IS mode is compatible with a lock in IX mode, so everything is in order. The problem with this display output is to identify the thread that each row refers to.
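The LOCKINFO layout described above (status, dash, then state, type and duration separated by commas) can be split mechanically. The following is a minimal Python sketch, not part of the original presentation. The function and class names are hypothetical, only the codes explained in the notes are translated, and reading T as a table lock is an assumption; any other code is passed through unchanged.

```python
from typing import NamedTuple

# Meanings taken from the speaker notes; "T" = table is an assumption.
STATUS = {"H": "hold"}
LOCK_TYPE = {"S": "table space", "T": "table"}
DURATION = {"C": "commit"}


class LockInfo(NamedTuple):
    status: str
    state: str
    lock_type: str
    duration: str


def parse_lockinfo(text: str) -> LockInfo:
    """Split a LOCKINFO value such as 'H-IS,S,C' into its four parts."""
    status, rest = text.strip().split("-", 1)
    state, lock_type, duration = (part.strip() for part in rest.split(","))
    return LockInfo(
        STATUS.get(status, status),        # unknown codes stay as-is
        state,                             # lock state/mode, e.g. IS or IX
        LOCK_TYPE.get(lock_type, lock_type),
        DURATION.get(duration, duration),
    )


print(parse_lockinfo("H-IS,S,C"))
print(parse_lockinfo("H-IX,S,C"))
```

The full list of codes is documented under message DSNT397I, so a real parser would take its lookup tables from there.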

B-13 Display locks (DB2 9)
-dis db(dsn8d91a) locks
DSNT360I -GT9G ***********************************
DSNT361I -GT9G * DISPLAY DATABASE SUMMARY * GLOBAL LOCKS
DSNT360I -GT9G ***********************************
DSNT362I -GT9G DATABASE = DSN8D91A STATUS = RW DBD LENGTH = 24218
DSNT397I -GT9G
NAME     TYPE PART  STATUS       CONNID   CORRID       LOCKINFO
-------- ---- ----- ------------ -------- ------------ ---------
DSN8S91D TS         RW           TSO      GOLD106      H-IS,S,C
  - AGENT TOKEN 278
DSN8S91D TS         RW           TSO      GOLD105      H-IX,S,C
  - AGENT TOKEN 272
11       TB                      TSO      GOLD106      H-IS,T,C
  - AGENT TOKEN 278
11       TB                      TSO      GOLD105      H-IX,T,C
  - AGENT TOKEN 272
In DB2 9, the output from the DISPLAY DATABASE() LOCKS command has been enhanced. There is now an additional line in the output that identifies the lock holder's thread by its agent token. For some reason, this is only true for local lock holders. In this example, there is also additional information on the table locks held by the two agents. Table names do not fit in the tablespace name column, so they are represented by a number, an OBID within the database. The content of the different fields is documented under message DSNT397I in the DB2 9 Messages manual. Compatibility of different lock modes is documented in the section "Compatibility of lock modes" in the DB2 9 Performance Monitoring and Tuning Guide.

B-14 Timeout - trace
PRIMAUTH CONNECT ORIGAUTH END_USER WS_NAME ORIGAUTH CORRNAME CONNTYPE PLANNAME DESCRIPTION
GOLD106 TSO GOLD106 'BLANK' 'BLANK' GOLD106 GOLD106 TSO DSNESPCS TIMEOUT DATA
TIMEOUT HEADER
NUMBER OF HOLDERS/WAITERS: 1
LOCK HASH VALUE: X'00041402'
LOCK RES TYPE: DATA PAGE LOCK DBID: 266 OBID: 2
REQUESTED FUNCTION: LOCK
REQUESTED STATE: SHARED
REQUESTED DURATION: MANUAL
WAITERS CACHED STMT ID: X'0000006D...
H O L D E R
PRIMAUTH : GOLD105 PLAN NAME: DSNESPCS CORR ID: GOLD105 CONN: TSO
LOCK STATE: EXCLUSIVE LOCK DURATION: COMMIT MEMBER: GT8G
TRANSACT : 'BLANK' WS_NAME: 'BLANK' END_USER: 'BLANK'
STMT ID : X'00000044'
As mentioned earlier, DB2 also writes a trace record for timeouts if STATISTICS CLASS(3) is started (or if explicitly asked for). This slide shows the output from the OMEGAMON for DB2 batch report for IFCID 196. The trace record has a standard header that identifies the thread that experienced the timeout and got rolled back. In this case it is the GOLD106 user. The timeout header shows what kind of lock the thread requested and on what resource. In this example, GOLD106 requested a shared lock on a data page in tablespace DSN8S91D (DBID 266, OBID 2, see the previous list from the catalog) with duration manual. The holder part of the trace record shows that the holder of the incompatible lock is GOLD105, which has an exclusive lock with duration COMMIT. We need to know more about the two threads involved in this locking conflict, and DB2 gives us a hint about the SQL statements that are involved. In the timeout header you can find the waiter's cached stmt id, in this case X'0000006D'. For the holder, the stmt id is X'00000044'. These numbers refer to the SQL statement cache, which is used by dynamic SQL. Now we only have to look into the cache to find the SQL statements.

B-15 Timeout - SQL
WAITERS CACHED STMT ID: X'0000006D' = 109
Explain stmtcache all or Explain stmtcache stmtid 109
Select from Dsn_Statement_Cache_Table where stmtid = 109
109 GOLD106 select deptname from dsn8810.dept
H O L D E R STMT ID : X'00000044' = 68
68 GOLD105 update dsn8810.dept set deptname ='Planning' where deptno = 'B01'
The waiter's cached stmt id was X'0000006D', which translates to 109 in decimal notation. The holder's stmt id was X'00000044', which translates to 68 in decimal notation. One way to read the statement cache is to use the EXPLAIN STMTCACHE statement, which writes rows for cached statements to the DSN_STATEMENT_CACHE_TABLE. These rows contain identifying information about the statements in the cache, as well as statistics that reflect the execution of the statements by all processes that have executed them. Running a SELECT statement against the DSN_STATEMENT_CACHE_TABLE will display the SQL statement in question.
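The hexadecimal-to-decimal step is easy to get wrong by hand, so here is a small Python sketch of it (an illustration only, not something from the presentation). The helper names are hypothetical, and the column names STMT_ID and STMT_TEXT in the generated query are assumptions that should be checked against your own DSN_STATEMENT_CACHE_TABLE definition.

```python
def stmt_id_to_int(hex_literal: str) -> int:
    """Convert a trace value such as X'0000006D' to its decimal statement id."""
    digits = hex_literal.strip().upper()
    if digits.startswith("X'") and digits.endswith("'"):
        digits = digits[2:-1]  # drop the X'...' wrapper
    return int(digits, 16)


def cache_query(stmt_id: int) -> str:
    """Build a lookup against the statement cache table (column names assumed)."""
    return (
        "SELECT STMT_ID, STMT_TEXT FROM DSN_STATEMENT_CACHE_TABLE "
        f"WHERE STMT_ID = {stmt_id}"
    )


waiter = stmt_id_to_int("X'0000006D'")
holder = stmt_id_to_int("X'00000044'")
print(waiter, holder)
print(cache_query(waiter))
```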

B-16 Deadlock - application
UPDATE DSN8810.PROJ SET PROJNAME ='Planning' WHERE projno='op1010';
DSNE615I NUMBER OF ROWS AFFECTED IS 1
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
---------+---------+---------+---------+---------+---------+---------+---------+
SELECT deptname FROM DSN8810.DEPT;
---------+---------+---------+---------+---------+---------+---------+---------+
DEPTNAME
---------+---------+---------+---------+---------+---------+---------+---------+
DSNE610I NUMBER OF ROWS DISPLAYED IS 0
DSNT408I SQLCODE = -911, ERROR: THE CURRENT UOW HAS BEEN ROLLED BACK DUE TO DEADLOCK OR TIMEOUT. REASON 00C90088, TYPE OF RESOURCE 00000302, AND RESOURCE NAME DSN8D81A.DSN8S81D.X'000002'
DSNT418I SQLSTATE = 40001 SQLSTATE RETURN CODE
DSNT415I SQLERRP = DSNXRRC SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD = -190 -100 13172872 13813475 1010298875 536870912 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD = X'FFFFFF42' X'FFFFFF9C' X'00C90088' X'00D2C6E3' X'C3C81005' X'20000000' SQL DIAGNOSTIC INFORMATION
This transaction updates the PROJ table and then tries to read from the DEPT table. Another transaction, which this transaction doesn't know anything about, has probably updated the DEPT table and then tried to read from the PROJ table. DB2 detects a deadlock and decides who is going to be the victim. The deadlock victim is then rolled back and notified: control is returned after the SELECT against the DEPT table with SQLCODE -911 and reason code 00C90088. The example uses SPUFI, which in turn uses DSNTIAR to format the DSNT408I message. The message also contains the resource type and resource name. In this case it is page 2 in tablespace DSN8S81D in database DSN8D81A. No information about the other transaction is provided in the SQLCA.

B-17 Deadlock - SYSLOG
DSNT375I -GT8G PLAN=DSNESPCS WITH CORRELATION-ID=GOLD106 CONNECTION-ID=TSO LUW-ID=ADCD.GT8GLU1.C2320F682B81=338 THREAD-INFO=GOLD106:*:*:* IS DEADLOCKED WITH PLAN=DSNESPCS WITH CORRELATION-ID=GOLD105 CONNECTION-ID=TSO LUW-ID=ADCD.GT8GLU1.C2320F58CDCA=337 THREAD-INFO=GOLD105:*:*:* ON MEMBER GT8G
DSNT501I -GT8G DSNILMCL RESOURCE UNAVAILABLE CORRELATION-ID=GOLD106 CONNECTION-ID=TSO LUW-ID=ADCD.GT8GLU1.C2320F682B81=0 REASON 00C90088 TYPE 00000302 NAME DSN8D81A.DSN8S81D.X'000002'
DB2 documents the deadlock on the MVS console. Here we get some more information, since the DSNT375I message gives us the identity of another thread that was involved in the deadlock. In the explanation of DSNT375I you can read: "Plan plan-id2 identifies one of the members of the deadlock. DB2 does not attempt to identify all survivors of a deadlock or all participants in a deadlock in the DSNT375I message. Plan plan-id2 in message DSNT375I might be just one of several plans holding locks on the desired resource." DSNT501I gives us the reason code, the resource type (resource types are documented in Appendix A of the DB2 9 Messages manual) and the resource name.

B-18 Deadlock trace (1/2)
DEADLOCK HEADER
INTERVAL COUNT: 255505 WAITERS INVOLVED: 2 TIME DETECTED: 08/08/08 18:57:24.42
R E S O U R C E
LOCK RES TYPE: ROW LOCK DBID: 266 OBID: 25 RESOURCE ID: X'0000020E'
B L O C K E R
PRIMAUTH : GOLD106 PLAN NAME : DSNESPCS CORR ID : GOLD106 CONN ID : TSO
MEMBER : N/A DURATION : COMMIT STATE : EXCLUSIVE ACE : 2
TRANSACTION : 'BLANK' WS_NAME : 'BLANK' END_USER: 'BLANK'
PROGRAM NAME: DSNESM68 LOCATION : 'BLANK' PCKG/COLL ID: DSNESPCS CONS TOKEN : X'149EEA901A79FE48'
STMT ID : X'0000006E' STATUS : HOLD QW0172HF: X'12'
W A I T E R
PRIMAUTH : GOLD105 PLAN NAME : DSNESPCS CORR ID : GOLD105 CONN ID : TSO
MEMBER : N/A DURATION : MANUAL STATE : SHARED ACE : 3
TRANSACTION : 'BLANK' WS_NAME : 'BLANK' END_USER: 'BLANK'
PROGRAM NAME: DSNESM68 LOCATION : 'BLANK' PCKG/COLL ID: DSNESPCS CONS TOKEN : X'149EEA901A79FE48'
STMT ID : X'00000070' WORTH : X'12' QW0172WG: X'30'
As mentioned earlier, DB2 also writes a trace record for deadlocks if STATISTICS CLASS(3) is started (or if explicitly asked for). This slide shows the output from the OMEGAMON for DB2 batch report for IFCID 172. The trace record has a standard header that identifies the thread that was chosen as the deadlock victim and got rolled back. It is not included in the printout, but in this case it is the GOLD106 user. This slide shows the first resource involved in the deadlock. It is a row lock on table DSN8810.PROJ in database DSN8D81A (DBID 266, OBID 25). The blocker is GOLD106, which holds an X-lock with duration COMMIT on row X'0000020E'. We have information down to package version to identify the running program, and the cached statement id is also available. The waiter for this lock is GOLD105, which is requesting an S-lock on the very same row for manual duration. Please note that the waiter has a worth value associated with it. The worth value is X'12' (18 in decimal notation).

B-19 Deadlock trace (2/2)
R E S O U R C E
LOCK RES TYPE: DATA PAGE LOCK DBID: 266 OBID: 2 RESOURCE ID: X'00000200'
B L O C K E R
PRIMAUTH : GOLD105 PLAN NAME : DSNESPCS CORR ID : GOLD105 CONN ID : TSO
MEMBER : N/A DURATION : COMMIT STATE : EXCLUSIVE ACE : 3
TRANSACTION : 'BLANK' WS_NAME : 'BLANK' END_USER: 'BLANK'
PROGRAM NAME: DSNESM68 LOCATION : 'BLANK' PCKG/COLL ID: DSNESPCS CONS TOKEN : X'149EEA901A79FE48'
STMT ID : X'00000044' STATUS : HOLD QW0172HF: X'12'
W A I T E R
PRIMAUTH : GOLD106 PLAN NAME : DSNESPCS CORR ID : GOLD106 CONN ID : TSO
MEMBER : N/A DURATION : MANUAL STATE : SHARED ACE : 2
TRANSACTION : 'BLANK' WS_NAME : 'BLANK' END_USER: 'BLANK'
PROGRAM NAME: DSNESM68 LOCATION : 'BLANK' PCKG/COLL ID: DSNESPCS CONS TOKEN : X'149EEA901A79FE48'
STMT ID : X'0000006F' WORTH : X'11' QW0172WG: X'30'
This is the second resource involved in the deadlock. It is a page lock on a page in tablespace DSN8S81D in database DSN8D81A (DBID 266, OBID 2). The blocker is GOLD105, which holds an X-lock with duration COMMIT on page X'00000200'. We have information down to package version to identify the running program, and the cached statement id is also available. The waiter for this lock is GOLD106, which is requesting an S-lock on the very same page for manual duration. Please note that the waiter for this resource has a worth value of X'11' (17 in decimal notation). According to the message on a previous slide, GOLD106 is the one chosen as the deadlock victim and rolled back. This is because GOLD106 has the lowest worth value (X'11' as compared to X'12').
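The rule stated in the notes, that the waiter with the lowest worth value becomes the victim, can be written down in a few lines. This Python sketch only mirrors that stated rule with the two WORTH values from the trace; it is not a description of how DB2 computes worth, and the function name is hypothetical.

```python
def pick_victim(worth_by_waiter: dict[str, int]) -> str:
    """Return the waiter with the lowest worth value, as described in the notes."""
    return min(worth_by_waiter, key=worth_by_waiter.get)


# WORTH values as reported for the two waiters in the IFCID 172 report.
worth = {
    "GOLD105": 0x12,  # 18 decimal, waiter on the row lock
    "GOLD106": 0x11,  # 17 decimal, waiter on the page lock
}
print(pick_victim(worth))
```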

B-20 Deadlock - SQL
R E S O U R C E LOCK RES TYPE: ROW LOCK DBID: 266 OBID: 25 RESOURCE ID: X'0000020E'
B L O C K E R STMT ID : X'0000006E' = 110
110 GOLD106 UPDATE DSN8810.PROJ SET PROJNAME ='Planning' WHERE projno='op1010'
W A I T E R STMT ID : X'00000070' = 112
112 GOLD105 select projname from dsn8810.proj
R E S O U R C E LOCK RES TYPE: DATA PAGE LOCK DBID: 266 OBID: 2 RESOURCE ID: X'00000200'
B L O C K E R STMT ID : X'00000044' = 68
68 GOLD105 update dsn8810.dept set deptname ='Planning' where deptno = 'B01'
W A I T E R STMT ID : X'0000006F' = 111
111 GOLD106 SELECT deptname FROM DSN8810.dept
When you combine the deadlock trace from OMEGAMON for DB2 with the contents of the statement cache, you get a clear picture of what has happened and of the source of the locking conflict. On this slide you can see that GOLD106 updates the PROJ table and GOLD105 updates the DEPT table, and that GOLD106 then waits to select from the DEPT table at the same time as GOLD105 waits to select from the PROJ table. A classic deadlock situation!

B-21 Lockmax - application
Create tablespace nomax in gold106d lockmax 0 locksize row;
Create table tm (c1 integer not null) in gold106d.nomax;
Insert into tm with t(n) as (select 1 from sysibm.sysdummy1 union all select n+1 from t where n<20000) select n from t;
57011(-904) Unsuccessful execution caused by an unavailable resource. Reason code: "00C90096", type of resource: "00000304", and resource name: "GOLD106D.NOMAX.X'000029' '.X'38'".
00C90096 NUMLKUS exceeded
39 (X'27') pages * 255 rows + 56 (X'38') rows = 10001 rows (locks)
In this example, we force a situation where NUMLKUS is exceeded. We do it by telling DB2 to switch off lock escalation (which, by the way, is a mechanism to avoid an excessive number of locks) by setting LOCKMAX 0 on the tablespace, requesting row locking by setting LOCKSIZE ROW, and then trying to insert 20000 rows into the table tm. LOCKMAX specifies the maximum number of page, row, or LOB locks an application process can hold simultaneously in the table space. If a program requests more than that number, locks are escalated. Specifying zero (0) indicates that the number of locks on the table or table space is not counted and escalation does not occur. Resource type 304 is a tablespace RID. Since MAXROWS is not defined on the tablespace, the default number of rows on a page is 255. According to the message, NUMLKUS is reached on page X'29' and row number X'38'. Since page X'0' is the header page and page X'1' is the space map page, we can calculate NUMLKUS for this DB2 system. We have X'27' full pages of 255 rows and are trying to insert row number X'38' on the next page when we encounter the -904. X'27' is 39 in decimal notation and X'38' is 56 in decimal notation. The number of rows added = 39*255 + 56 = 10001, which tells us that NUMLKUS is 10000, the default value! Note that this situation is only a problem for the application itself, so DB2 will not put any information about it on the MVS console or in the trace records started by STATISTICS CLASS(3).
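The arithmetic above can be checked with a short Python sketch. It assumes, as the notes do, 255 rows per page and two non-data pages (header and space map) at the start of the tablespace; the function name is made up for illustration.

```python
ROWS_PER_PAGE = 255   # default when MAXROWS is not specified
NON_DATA_PAGES = 2    # header page X'0' and space map page X'1'


def locks_at_failure(page_hex: str, row_hex: str) -> int:
    """Number of row locks requested when the -904 was raised.

    page_hex and row_hex are the page and row numbers from the
    resource name in the message, without the X'...' wrapper.
    """
    full_pages = int(page_hex, 16) - NON_DATA_PAGES
    return full_pages * ROWS_PER_PAGE + int(row_hex, 16)


requested = locks_at_failure("29", "38")
print(requested)      # locks requested when the limit was hit
print(requested - 1)  # implied NUMLKUS
```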

B-22 Lock escalation - SYSLOG
Alter tablespace gold106d.nomax lockmax 700;
Insert into tm with t(n) as (select 1 from sysibm.sysdummy1 union all select n+1 from t where n<20000) select n from t;
DSNI031I -GT9G DSNILKES - LOCK ESCALATION HAS OCCURRED FOR
RESOURCE NAME = GOLD106.LOCKMAX
LOCK STATE = X
PLAN NAME : PACKAGE NAME = DISTSERV : SYSSH200
COLLECTION-ID = NULLID
STATEMENT NUMBER = 000001
CORRELATION-ID = aqt.exe
CONNECTION-ID = SERVER
LUW-ID = C0A800A7.ODCB.011584115044
THREAD-INFO = GOLD106 : BACKLUND-X60 : gold106 : aqt.exe
We now tell DB2 to allow a maximum of 700 locks on our tablespace and then insert 20000 rows into the table tm. It works like a charm: the application gets SQLCODE 0 after the completion of the insert. However, under the covers, DB2 has escalated the row locks to a tablespace lock. This can severely impact other users of this tablespace, so DB2 puts information about the event on the MVS console and in trace records. Note that LOCKMAX can also be set to SYSTEM, which means that DB2, for this tablespace, uses the subsystem-wide default lockmax, also known as NUMLKTS, which has a default value of 1000. Also note the detailed information about which statement triggered the lock escalation. This is great for a program using static SQL, but for dynamic SQL it doesn't help much.
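To contrast this example with the previous one, the two outcomes can be put side by side in a toy model. The Python sketch below is a deliberately simplified illustration of the behaviour described in the notes, not of DB2 internals: one process takes one row lock per inserted row, escalation happens when more than LOCKMAX locks are requested (LOCKMAX 0 meaning "never escalate"), and the request fails when NUMLKUS is exceeded. The function name and return strings are hypothetical.

```python
def insert_rows(rows: int, lockmax: int, numlkus: int = 10000) -> str:
    """Toy model of one process inserting `rows` rows with LOCKSIZE ROW."""
    held = 0
    for _ in range(rows):
        held += 1
        if lockmax > 0 and held > lockmax:
            # Row locks are traded for one tablespace lock; the insert
            # itself carries on and ends with SQLCODE 0.
            return "escalated"
        if held > numlkus:
            return "-904"  # reason code 00C90096, NUMLKUS exceeded
    return "ok"


print(insert_rows(20000, lockmax=0))    # previous slide: no escalation allowed
print(insert_rows(20000, lockmax=700))  # this slide: escalation after 700 locks
```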

B-23 Trace records to care for
IFCID 172 Deadlocks
IFCID 196 Timeouts
IFCID 313 Long running URs
  Based on # of checkpoints (URCHKTH)
  Based on # of logrecords (URLGWTH)
  Based on minutes for a read claim (LRDRTHLD)
IFCID 337 Lock escalations
These are the four important trace records when it comes to locking conflicts. As we recall from an earlier slide in this presentation, there are three zparms that DB2 can use to catch long-running URs: URCHKTH, URLGWTH, and LRDRTHLD. IFCID 313 and IFCID 337 are rather straightforward. They describe the culprit and when its bad behavior was detected by DB2. IFCID 172 and IFCID 196 are somewhat more complex in structure. We will look into that on the next slides.

B-24 IFCID 172 - Deadlock
Record layout: OP Header, LL, Headers, QW0172HE, 1st QW0172, ... (as many QW0172 as there are resources)
QW0172HE: Interval Counter, # of resources, STCK
QW0172 (resource): Hash Value, Locking Flag, Resource Name
  Holder: Holders ID, Member, Lock State, Lock Duration, Cached Stmt ID, Package Name, Collection Name, Consistency Token
  Waiter: Waiters ID, Lock Function, Requested State, Lock Duration, Worth, Member, Cached Stmt ID, Package Name, Collection Name, Consistency Token
Standard Header: IFCID, # of areas, SSID, STCK, IFCID Seq #, DEST Seq #, Trace Mask, Trace ID, Commit Count
Correlation Header: AUTHID, CORRID, CONNID, PLAN, OPERID
Data Sharing Header: Member Name, Group Name
This is a pictorial way to illustrate the contents of the IFCID 172 trace record. It is NOT complete nor fully accurate, but is meant to serve as a tool for understanding how these records are put together. The IFCID 172 record is written on behalf of the thread that was chosen as the deadlock victim. The correlation header tells you which thread was rolled back. QW0172HE is the deadlock header that tells you when the deadlock was detected and how many resources were involved in the deadlock. Holders ID and Waiters ID are not fields in the trace record, but denote a number of fields including plan name, correlation id, connection id, LUW id, thread token and client information. QW0172 is the resource record that tells you about the holder and the waiters for the resource. There is one QW0172 for each resource. The other headers are standard DB2 instrumentation headers. The Distributed Header is not included on the slide.

B-25 IFCID 196 - Timeout
Record layout: OP Header, LL, Headers, QW0196HE, 1st QW0196, ... (as many QW0196 as there are holders)
QW0196HE (resource): # of holders, Hash Value, Resource Name, Lock Function, Lock State, Lock Duration, Req Owning WU, Timeout Interval, Timeout Counter, Waiter Stmt ID
QW0196 (holder or prio waiter): Holders ID, Owning WU, Member, Lock State, Lock Duration, Holders Stmt ID
Standard Header: IFCID, # of areas, SSID, STCK, IFCID Seq #, DEST Seq #, Trace Mask, Trace ID, Commit Count
Correlation Header: AUTHID, CORRID, CONNID, PLAN, OPERID
Data Sharing Header: Member Name, Group Name
This is a pictorial way to illustrate the contents of the IFCID 196 trace record. It is NOT complete nor fully accurate, but is meant to serve as a tool for understanding how these records are put together. The IFCID 196 record is written on behalf of the thread that was timed out. The correlation header tells you which thread was rolled back. QW0196HE is the timeout header that tells you about the resource and the lock requested by the waiter, as well as its cached statement id. It also shows how many holders were involved in the timeout. QW0196 is the record that identifies a holder or priority waiter, its held or requested lock, and its cached statement id. A priority waiter is another transaction that is also waiting for a lock on the same resource, but is ahead of the timed out transaction in the queue. There is one QW0196 for each holder/priority waiter. Holders ID is not a field in the trace record, but denotes a number of fields including plan name, correlation id, connection id, LUW id, thread token and client information. The other headers are standard DB2 instrumentation headers. The Distributed Header is not included on the slide.

B-26 How to identify the culprit - By messages
Message       Event            PLAN  COLLECTION  PACKAGE  CONSISTENCY TOKEN  STATEMENT NUMBER  CACHED STATEMENT ID
DSNT375I      Deadlock         YES   -           -        -                  -                 -
DSNT376I      Timeout          YES   -           -        -                  -                 -
R035I, J031I  Longrunner       YES   N/A         N/A      N/A                N/A               N/A
DSNI031I      Lock escalation  YES   YES         YES      -                  YES               -
DSNU120I*     Deadlock         YES   -           -        -                  -                 -
DSNU121I*     Timeout          YES   -           -        -                  -                 -
* New message in DB2 9
The MVS console messages don't provide all the information we need. They do have information that relates them back to a unique transaction, but we also want to be able to track down the individual SQL statements that actually created the locking conflict. Long-running transactions differ from the other three events, since DB2 is only reporting a potential for a transaction to create a locking conflict. Most likely there is no individual SQL statement that creates the potential for locking conflicts, so there is no need to point out the statement. The only message that references an SQL statement is DSNI031I, which provides a statement number that references an SQL statement in a package. In DB2 9 for z/OS there are two new messages, DSNU120I and DSNU121I, related to utilities that encounter a deadlock or a timeout. Please see the Reference Materials for the structure of these messages. These messages are similar to DSNT375I and DSNT376I respectively, but contain additional information normally found only in trace records. Still, they don't contain the information needed to pinpoint the SQL statement involved in the locking conflict.

B-27 How to identify the culprit - By trace records
IFCID  Event            PLAN  COLLECTION  PACKAGE  CONSISTENCY TOKEN  STATEMENT NUMBER  CACHED STATEMENT ID
172    Deadlock         YES   YES(a)      YES(a)   YES                -                 YES
196    Timeout          YES   -           -        -                  -                 YES
313    Longrunner       YES   N/A         N/A      N/A                N/A               N/A
337    Lock escalation  YES   YES         YES      -                  YES               YES
(a) Does not support 128 bytes long unicode names
The trace records have information that relates them back to a unique transaction. But we also want to be able to track down the individual SQL statements that actually created the locking conflict. Long-running transactions differ from the other three events, since DB2 is only reporting a potential for a transaction to create a locking conflict. Most likely there is no individual SQL statement that creates the potential for locking conflicts, so there is no need to point out the statement. In IFCID 172, the collection name has a maximum of 18 unicode characters and the package name has a maximum of 8 unicode characters. IFCID 337 is the only one that references static SQL statements in a package. IFCID 172, IFCID 196, and IFCID 337 all reference dynamic SQL statements in the dynamic statement cache. We suggest that DB2 development take measures to make IFCID 172 and IFCID 196 complete, based on what is already done in IFCID 337:
For IFCID 172, we want the statement number for static SQL as well as support for long names of collection and package.
For IFCID 196, we want the statement number for static SQL and to have collection, package, and consistency token added (with long names where it applies).
For IFCID 337, we want to have the consistency token added.

An additional example
Here is an additional example that shows what happens if we run into a deadlock situation with three resources and three threads involved.

B-28 Three-way Deadlock
[Diagram: timeline (t) for TRANSACTION G, TRANSACTION H and TRANSACTION I, each holding an S-lock and then requesting an X-lock on RESOURCE 1, RESOURCE 2 and RESOURCE 3, ending in a deadlock]
This slide shows a deadlock in a different way compared to the slides at the beginning of this presentation. It shows a situation where three transactions are running against three resources and end up in a deadlock. TRANSACTION I is granted an S-lock on RESOURCE 1, TRANSACTION G is granted an S-lock on RESOURCE 2, and TRANSACTION H is granted an S-lock on RESOURCE 3. Then TRANSACTION I requests an X-lock on RESOURCE 2, and TRANSACTION G requests an X-lock on RESOURCE 3. So far, so good. At this stage, we have two potential timeouts coming along. Then, lastly, TRANSACTION H requests an X-lock on RESOURCE 1. And suddenly we are in a deadlock situation where all three transactions are waiting for each other in a ring. I is waiting for G, G is waiting for H, and H is waiting for I. How will DB2 handle this situation? Let's see on the next slide!
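The ring described above is a cycle in a "waits-for" graph, which can be made concrete with a short Python sketch. This is a textbook illustration under the simplifying assumption that each transaction waits for exactly one other transaction; it is not how DB2's deadlock detector is implemented, and the names are hypothetical.

```python
def find_cycle(waits_for: dict[str, str]) -> list[str]:
    """Follow waits-for edges from each transaction; return the first ring found."""
    for start in waits_for:
        path = [start]
        current = start
        while current in waits_for:
            current = waits_for[current]
            if current == start:
                return path           # we are back where we began: a ring
            if current in path:
                break                 # a ring exists, but start is not part of it
            path.append(current)
    return []


# Who holds the S-lock on each resource, and which X-lock each transaction requests.
holders = {"R1": "I", "R2": "G", "R3": "H"}
requests = {"I": "R2", "G": "R3"}

waits_for = {txn: holders[res] for txn, res in requests.items()}
print(find_cycle(waits_for))          # no ring yet, only two waiters

requests["H"] = "R1"                  # the last request closes the ring
waits_for = {txn: holders[res] for txn, res in requests.items()}
print(find_cycle(waits_for))
```

Before the last request the function returns an empty list, which corresponds to the "two potential timeouts" stage in the notes.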

B-29 The resolution
[Diagram: TRANSACTION I, TRANSACTION H, TRANSACTION G]
DB2 chooses to ROLLBACK TRANSACTION G
  Its S-lock on R 2 is released
  Its request for X-lock on R 3 is cancelled
TRANSACTION I gets its X-lock on R 2
TRANSACTION H has to wait for its X-lock on R 1
OK, so we now have three transactions waiting in a ring. TRANSACTION I is waiting for TRANSACTION G, TRANSACTION G is waiting for TRANSACTION H, and TRANSACTION H is waiting for TRANSACTION I. DB2 has to break the deadlock in some way, and in this case it chooses to roll back TRANSACTION G. This means that its S-lock on RESOURCE 2 is released and its request for an X-lock on RESOURCE 3 is cancelled. This in turn means that TRANSACTION I gets its X-lock on RESOURCE 2 and that TRANSACTION H has to wait for its X-lock on RESOURCE 1 until TRANSACTION I releases its lock on that resource. So we might run into a timeout situation here if TRANSACTION I doesn't release its lock in time.

B-30 Deadlock information
Deadlock message only contains
  The two threads involved in the resolution (G & I)
  The resource involved in the resolution (R 2)
Trace record contains more information
  Number of resources = 3
  For each resource
    Resource identity (DBID + OBID)
    Blocker identity, lock state and duration
    Waiter identity, requested lock state and duration
    Worth
  The victim (G)
How does DB2 report this deadlock? The deadlock message only contains information about the two transactions, and their common resource, that are immediately involved in the resolution of the deadlock, i.e. TRANSACTION G and TRANSACTION I as well as RESOURCE 2. The trace record gives you information about the complete deadlock situation, listing all involved resources and transactions. The trace record also identifies the chosen victim for the deadlock resolution.

How to manage locking conflicts
This section will describe our ideas on how to manage locking conflicts.

B-31 Catching Culprits - Today
[Diagram: DB2, SYSLOG, REXX, SHBLOCKS]
Table SHBLOCKS only contains info from DB2 messages
Currently, we gather all the information that is available in the deadlock, timeout, and lock escalation messages on the SYSLOG by parsing the messages and putting the data into a DB2 table called SHBLOCKS. There are some drawbacks with this solution.
1. There is not enough information in the messages. For example, we cannot pinpoint the involved SQL statements (except for static SQL statements involved in lock escalation) and we don't get all the involved transactions in a timeout situation.
2. The process is asynchronous, since the Rexx program runs as a batch job once a day on the sysout of xxxxmstr.
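To give a feel for what "parsing the messages" involves, here is a minimal sketch that pulls the two thread identities out of the DSNT375I text shown on slide B-17. The production solution described above is a Rexx program; Python and a regular expression are used here purely for illustration, and the function name and result keys are hypothetical.

```python
import re

# DSNT375I text copied from slide B-17 (line breaks are not significant).
MESSAGE = (
    "DSNT375I -GT8G PLAN=DSNESPCS WITH CORRELATION-ID=GOLD106 "
    "CONNECTION-ID=TSO LUW-ID=ADCD.GT8GLU1.C2320F682B81=338 "
    "THREAD-INFO=GOLD106:*:*:* IS DEADLOCKED WITH PLAN=DSNESPCS "
    "WITH CORRELATION-ID=GOLD105 CONNECTION-ID=TSO "
    "LUW-ID=ADCD.GT8GLU1.C2320F58CDCA=337 THREAD-INFO=GOLD105:*:*:* "
    "ON MEMBER GT8G"
)

# One thread description: plan, correlation id, connection id and LUW id.
THREAD = re.compile(
    r"PLAN=(?P<plan>\S+) WITH CORRELATION-ID=(?P<corrid>\S+) "
    r"CONNECTION-ID=(?P<connid>\S+) LUW-ID=(?P<luwid>\S+)"
)


def parse_deadlock_message(text: str) -> dict:
    """Extract the message id and the two thread descriptions."""
    normalized = " ".join(text.split())  # collapse console line wrapping
    threads = [m.groupdict() for m in THREAD.finditer(normalized)]
    if len(threads) < 2:
        raise ValueError("expected two thread descriptions")
    return {
        "message": normalized.split()[0],
        "thread": threads[0],
        "deadlocked_with": threads[1],
    }


parsed = parse_deadlock_message(MESSAGE)
print(parsed["thread"]["corrid"], "is deadlocked with",
      parsed["deadlocked_with"]["corrid"])
```

Even a complete parser of this kind can only recover what the message carries, which is the first drawback listed above.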

B-32 Catching Culprits - Tomorrow
[Diagram: DB2, IFCID, SQL, REXX, LOCKMON]
Start trace for IFCID 172, 196, 313, and 337
Get trace data with IFI READA
Get dynamic SQL statements with IFI READS
Store data and statements in DB2 tables
To get more information, and more up-to-date information, we are developing a locking conflict monitor (LOCKMON) that more or less continuously gathers locking conflict information from DB2 and stores it in a number of tables. This is done by a Rexx program in the following manner.
1. Connect to DB2.
2. -start trace(perfm) CLASS(32) IFCID(172,196,313,337) dest(opx) bufsize(32) tdata(cor,dist)
3. Wait for a specified time interval.
4. Issue READA against IFI.
5. Process any trace records received by parsing them and inserting the information into the right table.
6. If there are any references to a cached statement id, issue READS to read the statement from the dynamic statement cache and insert the statement into a table.
LOCKMON is dynamically controlled and monitored through a so-called tracker table.

B-33 LOCKMON Objects
LOCKMON_TRACKER, LOCKMON_RECORD, LOCKMON_LONGRUNNER, LOCKMON_ESCALATION, LOCKMON_TIMEOUT_RESOURCE, LOCKMON_TIMEOUT_BLOCKER, LOCKMON_DEADLOCK_EVENT, LOCKMON_DEADLOCK_RESOURCE, LOCKMON_STATUS
This is the current schema for the lock conflict monitor. It may evolve into more tables, especially when it comes to the timeout and deadlock parts.
LOCKMON_TRACKER is used to track the activity of LOCKMON and also contains dynamic parameters to control the behaviour of LOCKMON (numtimes, sleeptime, trace, and soft termination).
LOCKMON_RECORD is used when tracing is selected. LOCKMON can save trace data as is, either in a dataset or in this table or both.
LOCKMON_LONGRUNNER contains relevant info about any of the three types of longrunners.
LOCKMON_ESCALATION contains relevant info about transactions that have encountered a lock escalation.
LOCKMON_TIMEOUT_RESOURCE contains relevant info about the resource part of a timeout record, including info about the transaction that has been timed out.
LOCKMON_TIMEOUT_BLOCKER contains relevant info about the blockers (holder and priority waiters) in a timeout.
LOCKMON_DEADLOCK_EVENT contains relevant info about a deadlock event, including the time detected.
LOCKMON_DEADLOCK_RESOURCE contains relevant info about a specific resource and its holder and waiter involved in a deadlock event.
LOCKMON_STATUS is a stored procedure that tells you about the status of LOCKMON (by drawing some conclusions based on the contents of the LOCKMON_TRACKER table).
Note that there can be several timeout blockers per timeout resource, and several deadlock resources per deadlock event.
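One way to picture how trace records could be routed to these tables is a simple lookup keyed by IFCID. The mapping in this Python sketch is inferred from the table descriptions above and from the four IFCIDs that LOCKMON traces; it is an assumption for illustration, not the actual LOCKMON code (which is written in Rexx), and the function name is hypothetical.

```python
# Assumed mapping from IFCID to the tables that would receive its data.
TABLES_BY_IFCID = {
    172: ("LOCKMON_DEADLOCK_EVENT", "LOCKMON_DEADLOCK_RESOURCE"),
    196: ("LOCKMON_TIMEOUT_RESOURCE", "LOCKMON_TIMEOUT_BLOCKER"),
    313: ("LOCKMON_LONGRUNNER",),
    337: ("LOCKMON_ESCALATION",),
}


def target_tables(ifcid: int) -> tuple[str, ...]:
    """Return the tables a record of this IFCID would be stored in."""
    return TABLES_BY_IFCID.get(ifcid, ())  # unknown IFCIDs are ignored


for ifcid in (172, 196, 313, 337):
    print(ifcid, target_tables(ifcid))
```

The two-table entries reflect the one-to-many relationships mentioned in the notes: one deadlock event with several resources, and one timeout resource with several blockers.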

B-34 Status LOCKMON
In August 2008, a prototype is running
  Exploits the LOCKMON_TRACKER table
  Traces in LOCKMON can be sent to
    LOCKMON_RECORD table
    A dataset
  Information on longrunners is stored in LOCKMON_LONGRUNNER
Work is underway with the rest of functionality
Exploring a subscription part for e-mails based on
  Triggers and stored procedures
The LOCKMON prototype is running in our DB2 system for application development. It currently exploits the LOCKMON_TRACKER table. Trace records can be stored in the LOCKMON_RECORD table and/or in a dataset for further analysis. Information on longrunners is stored in the LOCKMON_LONGRUNNER table. There is also a stored procedure called LOCKMON_STATUS that can be run to show the status of LOCKMON (derived implicitly from information in the LOCKMON_TRACKER table).

B-35 Summary
Important to follow up on locking conflicts
Messages and trace records are
  not lined up
  not complete
Two requirements have been filed
  More information in trace records
  Better handling of locking conflicts in DL/I batch
We have ideas on how to monitor locking conflicts
  More immediate information
  All available information is used
At Handelsbanken we have procedures and an organization to follow up on locking conflicts. The work is based on information from the DB2 messages for deadlocks, timeouts, and lock escalation. However, the messages do not contain all the information we need. For example, they do not have a reference to the SQL statements that are involved in a locking conflict. Another example is that the timeout message only contains ONE HOLDER OF THE RESOURCE. This makes it tedious to find the culprit and prolongs the fixing of a problem program. The DB2 trace records have some more information in them, but are still not complete. We have posted a requirement for DB2 development to take unified action to make the trace records more complete. The requirement number in FITS is MR0905086141. We also encountered a strange behaviour when using DL/I batch: DB2 returns a single reason code for both deadlocks and timeouts, and the description only talks about the DL/I batch being the deadlock victim. We have filed a requirement to have this corrected. The requirement number in FITS is MR0905083727. We are prototyping a locking conflict monitor that gathers information from trace records. This will give us more timely and complete information.

50 Questions Any questions? 50 Are there any questions?

Documents to read
DB2 9 for z/OS Documentation
  SC18-9851 Performance Monitoring and Tuning Guide
Redbooks
  SG24-4725 Locking in DB2 for MVS/ESA Environment
  SG24-7111 Data Integrity with DB2 for z/OS
  SG24-7134 DB2 UDB for z/OS: Application Design for High Performance and Availability
Bonnie Baker
  Programmers Only in IBM Database Magazine
  http://cmp.ebookhost.net/db2/quest/8/

52 Lock Out Your Locking Problems Part 2 Lennart Henäng Svenska Handelsbanken AB lehe08@handelsbanken.se F11 52

Reference Materials 53

B-36 Timeout Chart
Result of deadlock TSO CAF RRS TSO CAF RRS IMS Nonmessage driven IMS Nonmessage driven IMS All other types of regions DL/I Batch CICS CICS
DB2 ROLLBACK YES FAILED YES FAILED YES YES YES FAILED
ABEND U0777 U0777(b) S04E 00D44033 or 00D44050
SQLCODE -911 -913(a) -911 -911 -913(a)
SQLSTATE 40001 57033(a) 40001 40001 57033(a)
(a) Only in a DB2 abend situation, i.e. when the DB2-initiated rollback fails
(b) In this case, IMS reschedules the transaction
NOTE! DL/I batch will receive an ABEND S04E together with reason code 00D44033. That reason code states that the DL/I batch has been chosen as a deadlock victim even though it has been timed out. DB2 Development has initiated a documentation change for this, and Handelsbanken has put forward a requirement to have two different reason codes.
In our previous examples, we have assumed that we have been running an ordinary batch job and that the DB2-initiated rollback succeeded. In fact, there are a number of different outcomes in a timeout situation, depending on which environment the SQL is executed from. The chart on this page tries to summarize the various outcomes. The TSO/CAF/RRS columns also include SQL coming in via DDF. We don't have the time to delve into the chart, but you should be aware that an application can get a -913, which means that the application then has the responsibility to take appropriate action, i.e. to COMMIT or to ROLLBACK the transaction. In IMS MPPs, IFPs or message-driven BMPs, the application will never receive an SQLCODE/SQLSTATE in a timeout situation. The transaction gets a pseudo ABEND and will, in the normal case, be rescheduled by IMS. Since DL/I batch crashes with an ABEND S04E, there are controls to let SQL in such a batch job wait longer for a resource than when issued from other environments. We'll come back to this subject later. For more information, see chapter 15 of the DB2 9 for z/OS Performance Monitoring and Tuning Guide.

B-37 Rollback Chart

Result of deadlock

Environment                      DB2 ROLLBACK  ABEND                                   SQLCODE   SQLSTATE
TSO, CAF, RRS                    YES           -                                       -911      40001
TSO, CAF, RRS                    FAILED        -                                       -913 (a)  57033 (a)
IMS non-message-driven           YES           -                                       -911      40001
IMS non-message-driven           FAILED        U0777                                   -         -
IMS, all other types of regions  YES           U0777 (b)                               -         -
DL/I batch                       YES           S04E, reason code 00D44033 or 00D44050  -         -
CICS, DROLLBACK = YES            YES           -                                       -911      40001
CICS, DROLLBACK = NO             NO            -                                       -913 (c)  57033 (c)

(a) Only in a DB2 abend situation, i.e. when the DB2-initiated rollback fails
(b) In this case, IMS reschedules the transaction
(c) Only the current SQL statement is rolled back, application must take appropriate action

In our previous examples, we assumed that we were running an ordinary batch job and that the DB2-initiated rollback succeeded. In fact, there is a multitude of different outcomes in a rollback situation, depending on which environment the SQL is executed from and on certain parameter settings. The chart on this page tries to summarize them. The TSO/CAF/RRS entries also include SQL coming in via DDF.

We don't have the time to delve into the chart, but you should be aware that an application can get a -913, which means that the application then has the responsibility to take appropriate action, i.e., to COMMIT or to ROLLBACK the transaction.

In IMS MPPs, IFPs, or message-driven BMPs, the application will never receive an SQLCODE/SQLSTATE in a deadlock situation. The transaction gets a pseudo-abend and will then, in the normal case, be rescheduled by IMS.

According to DB2 Development, you get a -911 if DB2 is the commit coordinator and rolls back. If DB2 is not the commit coordinator for the thread, DB2 cannot unilaterally roll back, and you get the -913. The effect of this is that you will never get a -913 in TSO, only in the case of IMS, CICS, or XA. This is not consistent with this slide, which got its information from the DB2 documentation.

For more information, see chapter 15 of the DB2 9 for z/OS Performance Monitoring and Tuning Guide.
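The difference between the two SQLCODEs can be summarized in a small sketch. This is only an illustration of the rule described in the notes, not code from the presentation; the function name and the returned action labels are hypothetical.

```python
# Minimal sketch: decide what the application must do after a deadlock or
# timeout, based on the SQLCODE it received.
# The function name and the action labels are hypothetical.

def action_after_lock_failure(sqlcode: int) -> str:
    """Return the follow-up action for a given SQLCODE."""
    if sqlcode == -911:
        # DB2 has already rolled back the unit of work; the application
        # can restart the unit of work from its beginning.
        return "restart_unit_of_work"
    if sqlcode == -913:
        # No rollback of the unit of work was done by DB2; the application
        # itself is responsible for issuing COMMIT or ROLLBACK.
        return "commit_or_rollback_yourself"
    if sqlcode < 0:
        # Any other negative SQLCODE is outside the scope of this sketch.
        return "handle_other_error"
    return "continue"


for code in (-911, -913, 0):
    print(code, action_after_lock_failure(code))
```

The point of keeping the two cases apart is that retrying after a -913 without first ending the unit of work leaves the locks already held by the application in place.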

B-38 Utility messages in DB2 9

DSNU120I csect-name DEADLOCK INFORMATION: INTERVAL COUNT =n, NUMBER OF WAITERS = m.
    LOCK NAME = lcknm TYPE = type FUNC/STATE/DURATION = f st dur
    HOLDER/WAITER = h / w PLAN-ID = planid CORR-ID = corrid MEMBER NM = mbrnm

DSNU121I csect-name TIMEOUT INFORMATION: NUMBER OF HOLDERS/WAITERS =m, TIMEOUT FACTOR = t.
    LOCK NAME = lcknm TYPE = type FUNC/STATE/DURATION = f st dur
    HOLDER/WAITER = h / w PLAN-ID = planid CORR-ID = corrid MEMBER NM = mbrnm

The following is the description of DSNU120I as found in the DB2 9 for z/OS Messages manual.

Utility processing encountered a resource that is currently unavailable because the resource is involved in a deadlock condition. In conjunction with other messages, this message will identify the system action and the action that should be taken by the installation or operator.

lcknm   lock resource name
type    lock type
f       lock resource function
st      lock resource state
dur     lock resource duration
planid  holder's or waiter's plan name
corrid  holder's or waiter's correlation ID
mbrnm   DB2 member name

The lock type, function, state, and duration values are as documented for the IFCID 172 deadlock trace record (QW0172).

The description of message DSNU121I is incorrect in the DB2 9 for z/OS Messages manual. DB2 Development has been notified and has issued a documentation change.

B-39 OBID for DSN8D91A (1/2)

DBID  OBID  QUALIFIER  NAME      TYPE    OBJECT
266      1  DSN8D91A   DSN8S91D  obid    TableSpace
266      2  DSN8D91A   DSN8S91D  psid    TableSpace
266      3  DSN8D91A   DSN8S91E  obid    TableSpace
266      4  DSN8D91A   DSN8S91E  psid    TableSpace
266      5  DSN8D91A   DSN8S91R  obid    TableSpace
266      6  DSN8D91A   DSN8S91R  psid    TableSpace
266      7  DSN8D91A   DSN8S91P  obid    TableSpace
266      8  DSN8D91A   DSN8S91P  psid    TableSpace
266      9  DSN8D91A   DSN8S91S  obid    TableSpace
266     10  DSN8D91A   DSN8S91S  psid    TableSpace
266     11  DSN8910    DEPT      obid    Table
266     12  DSN8910    XDEPT1    obid    Index
266     13  DSN8910    XDEPT1    isobid  Index
266     14  DSN8910    XDEPT2    obid    Index
266     15  DSN8910    XDEPT2    isobid  Index
266     16  DSN8910    XDEPT3    obid    Index
266     17  DSN8910    XDEPT3    isobid  Index
266     18  DSN8910    EMP       obid    Table
266     21  DSN8910    XEMP1     obid    Index
266     22  DSN8910    XEMP1     isobid  Index
266     23  DSN8910    XEMP2     obid    Index
266     24  DSN8910    XEMP2     isobid  Index
266     25  DSN8910    PROJ      obid    Table
266     28  DSN8910    XPROJ1    obid    Index
266     29  DSN8910    XPROJ1    isobid  Index
266     30  DSN8910    XPROJ2    obid    Index
266     31  DSN8910    XPROJ2    isobid  Index
266     32  DSN8910    ACT       obid    Table
266     33  DSN8910    XACT1     obid    Index
266     34  DSN8910    XACT1     isobid  Index
266     35  DSN8910    XACT2     obid    Index
266     36  DSN8910    XACT2     isobid  Index
266     37  DSN8910    PROJACT   obid    Table
266     40  DSN8910    XPROJAC1  obid    Index

By consulting the DB2 catalog, we can get a list of objects and their internal IDs. In the list shown, the objects are combined (unioned) from SYSIBM.SYSINDEXES, SYSIBM.SYSTABLES, and SYSIBM.SYSTABLESPACE. Note that indexes appear twice, since they have both an obid and an isobid.

B-40 OBID for DSN8D91A (2/2)

DBID  OBID  QUALIFIER  NAME                        TYPE    OBJECT
266     41  DSN8910    XPROJAC1                    isobid  Index
266     42  DSN8910    EMPPROJACT                  obid    Table
266     45  DSN8910    XEMPPROJACT1                obid    Index
266     46  DSN8910    XEMPPROJACT1                isobid  Index
266     47  DSN8910    XEMPPROJACT2                obid    Index
266     48  DSN8910    XEMPPROJACT2                isobid  Index
266     49  DSN8910    PARTS                       obid    Table
266     50  DSN8910    XPARTS                      obid    Index
266     51  DSN8910    XPARTS                      isobid  Index
266     55  DSN8910    EDEPT                       obid    Table
266     56  DSN8910    EEMP                        obid    Table
266     57  DSN8910    EPROJ                       obid    Table
266     58  DSN8910    EACT                        obid    Table
266     59  DSN8910    EPROJACT                    obid    Table
266     60  DSN8910    EEPA                        obid    Table
266     62  DSN8D91A   DSN8S91X                    obid    TableSpace
266     63  DSN8D91A   DSN8S91X                    psid    TableSpace
266     64  DSN8D91A   DSN8L91X                    obid    TableSpace
266     65  DSN8D91A   DSN8L91X                    psid    TableSpace
266     66  DSN8910    PLAN_TABLE                  obid    Table
266     67  DSN8910    DSN_FUNCTION_TABLE          obid    Table
266     68  DSN8910    DSN_STATEMNT_TABLE          obid    Table
266     69  DSN8910    DSN_STATEMENT_CACHE_TABLE   obid    Table
266     70  DSN8910    DSN_STATEMENT_CACHE_AUX     obid    Auxiliary
266     72  DSN8910    PLAN_TABLE_HINT_IX          obid    Index
266     73  DSN8910    PLAN_TABLE_HINT_IX          isobid  Index
266     74  DSN8910    DSN_STATEMENT_CACHE_IDX1    obid    Index
266     75  DSN8910    DSN_STATEMENT_CACHE_IDX1    isobid  Index
266     76  DSN8910    DSN_STATEMENT_CACHE_IDX2    obid    Index
266     77  DSN8910    DSN_STATEMENT_CACHE_IDX2    isobid  Index
266     78  DSN8910    DSN_STATEMENT_CACHE_IDX3    obid    Index
266     79  DSN8910    DSN_STATEMENT_CACHE_IDX3    isobid  Index
266     80  DSN8910    DSN_STATEMENT_CACHE_AUXINX  obid    Index
266     81  DSN8910    DSN_STATEMENT_CACHE_AUXINX  isobid  Index

This is the continuation of the previous output.

B-41 SQL for listing OBIDs

    Select dbid, obid as obid, creator as qualifier, name,
           'obid ' as type, 'Index' as object
      from sysibm.sysindexes
     where obid>0 and dbname = 'DSN8D91A'
    union all
    Select dbid, isobid as obid, creator as qualifier, name,
           'isobid' as type, 'Index' as object
      from sysibm.sysindexes
     where obid>0 and dbname = 'DSN8D91A'
    union all
    Select dbid, obid as obid, creator as qualifier, name,
           'obid ' as type,
           case type when 'T' then 'Table'
                     when 'C' then 'Clone'
                     when 'P' then 'XML'
                     when 'X' then 'Auxiliary'
                     else 'View'
           end as object
      from sysibm.systables
     where obid>0 and dbname = 'DSN8D91A'
    union all
    Select dbid, obid as obid, dbname as qualifier, name,
           'obid ' as type, 'TableSpace' as object
      from sysibm.systablespace
     where obid>0 and dbname = 'DSN8D91A'
    union all
    Select dbid, psid as obid, dbname as qualifier, name,
           'psid ' as type, 'TableSpace' as object
      from sysibm.systablespace
     where obid>0 and dbname = 'DSN8D91A'
    order by obid

This is Peter Backlund's first attempt to create the listing of the objects.

B-42 SQL for listing OBIDs

The SQL statement on the previous slide works perfectly well, but you have to repeat the database name five times, once for each unioned statement. Peter was challenged to find another way to write the SQL statement so that the database name appears in one single place.

B-43 SQL for listing OBIDs

    with c(cname) as (select 'DSN8D91A' from Sysibm.SysDummy1)
    Select dbid, obid as obid, creator as qualifier, name,
           'obid ' as type, 'Index' as object
      from sysibm.sysindexes, c
     where obid>0 and dbname = cname
    union all
    Select dbid, isobid as obid, creator as qualifier, name,
           'isobid' as type, 'Index' as object
      from sysibm.sysindexes, c
     where obid>0 and dbname = cname
    union all
    Select dbid, obid as obid, creator as qualifier, name,
           'obid ' as type,
           case type when 'T' then 'Table'
                     when 'C' then 'Clone'
                     when 'P' then 'XML'
                     when 'X' then 'Auxiliary'
                     else 'View'
           end as object
      from sysibm.systables, c
     where obid>0 and dbname = cname
    union all
    Select dbid, obid as obid, dbname as qualifier, name,
           'obid ' as type, 'TableSpace' as object
      from sysibm.systablespace, c
     where obid>0 and dbname = cname
    union all
    Select dbid, psid as obid, dbname as qualifier, name,
           'psid ' as type, 'TableSpace' as object
      from sysibm.systablespace, c
     where obid>0 and dbname = cname
    order by obid

And here it is. By using a common table expression (aka inline view), we can avoid repeating the database name. Basically, we use the common table expression to set a variable, cname, to the database name and refer to it in all five unioned SQL statements.
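The same pattern can be tried outside DB2. The sketch below uses Python's built-in sqlite3 module with two tiny stand-in tables; the table layouts, the sample rows, and the function name are assumptions made for illustration and are not the real DB2 catalog. It also passes the database name as a bound parameter instead of a literal, which is a small variation on the slide.

```python
import sqlite3

# Stand-in tables (hypothetical, heavily simplified versions of the
# catalog tables used on the slide).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sysindexes (dbid INTEGER, obid INTEGER, isobid INTEGER,
                             name TEXT, dbname TEXT);
    CREATE TABLE systables  (dbid INTEGER, obid INTEGER,
                             name TEXT, dbname TEXT);
    INSERT INTO systables  VALUES (266, 11, 'DEPT', 'DSN8D91A');
    INSERT INTO sysindexes VALUES (266, 12, 13, 'XDEPT1', 'DSN8D91A');
    INSERT INTO sysindexes VALUES (300, 5, 6, 'XOTHER', 'OTHERDB');
""")


def list_obids(connection, dbname):
    """List object IDs for one database; the name is supplied once."""
    sql = """
        WITH c(cname) AS (SELECT ?)
        SELECT dbid, obid AS obid, name, 'obid' AS type, 'Index' AS object
          FROM sysindexes, c
         WHERE obid > 0 AND dbname = cname
        UNION ALL
        SELECT dbid, isobid AS obid, name, 'isobid' AS type, 'Index' AS object
          FROM sysindexes, c
         WHERE obid > 0 AND dbname = cname
        UNION ALL
        SELECT dbid, obid AS obid, name, 'obid' AS type, 'Table' AS object
          FROM systables, c
         WHERE obid > 0 AND dbname = cname
        ORDER BY obid
    """
    return connection.execute(sql, (dbname,)).fetchall()


rows = list_obids(conn, "DSN8D91A")
for row in rows:
    print(row)
```

Because the one-row common table expression is joined into every branch of the union, changing the database name means changing a single value.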

B-44 Statement cache - all

 67  GOLD105  delete from dsn8810.dept where deptno = 'B01'
 68  GOLD105  update dsn8810.dept set deptname ='Planning' where deptno = 'B01'
112  GOLD105  select projname from dsn8810.proj
 70  GOLD106  select deptname from dsn8810.dept where deptno = 'B01'
 73  GOLD106  select projname from dsn8810.proj where projno = 'OP1010'
109  GOLD106  select deptname from dsn8810.dept
110  GOLD106  UPDATE DSN8810.PROJ SET PROJNAME ='Planning' WHERE projno='op1010'
111  GOLD106  SELECT deptname FROM DSN8810.dept

This is the content of the statement cache after running the test. Note that the SQL text is saved in the statement cache exactly as it is typed in. Please note that statement 109 and statement 111 are logically the same statement, but since DB2 only matches statements whose text is exactly the same, statement 111 was added to the statement cache as a separate entry.
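The effect of exact-text matching can be shown with a toy model. This is a deliberately simplified sketch, not how DB2 implements its cache: it assumes that a cache entry is identified only by authorization ID and statement text, and the function name is hypothetical.

```python
# Toy model of a statement cache keyed on the exact statement text.
# Assumption: an entry is identified by (authorization ID, SQL text) only.

def lookup_or_insert(cache, authid, sql_text):
    """Return (statement id, True on cache hit / False on insert)."""
    key = (authid, sql_text)
    if key in cache:
        return cache[key], True
    stmt_id = len(cache) + 1          # next free id in this toy cache
    cache[key] = stmt_id
    return stmt_id, False


cache = {}
first = lookup_or_insert(cache, "GOLD106", "select deptname from dsn8810.dept")
# Same meaning, different letter case: treated as a new statement.
second = lookup_or_insert(cache, "GOLD106", "SELECT deptname FROM DSN8810.dept")
# Character-for-character identical text: found in the cache.
third = lookup_or_insert(cache, "GOLD106", "select deptname from dsn8810.dept")

print(first, second, third)
```

In this model, as on the slide, consistent formatting of dynamic SQL text is what makes reuse of cached statements possible.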

B-45 DDL for LOCKMON_TRACKER

    CREATE TABLE LOCKMON_TRACKER
      ( RUN          INTEGER    WITH DEFAULT NULL,
        JOBNAME      CHAR(8)    WITH DEFAULT NULL,
        JOBID        CHAR(8)    WITH DEFAULT NULL,
        SSID         CHAR(4)    WITH DEFAULT NULL,
        STIME        TIMESTAMP  WITH DEFAULT NULL,
        LTIME        TIMESTAMP  WITH DEFAULT NULL,
        TTIME        TIMESTAMP  WITH DEFAULT NULL,
        NUMTIMES     INTEGER    WITH DEFAULT NULL,
        SLEEPTIME    INTEGER    WITH DEFAULT NULL,
        SCHEMA       CHAR(8)    WITH DEFAULT NULL,
        RTRACE       CHAR(1)    WITH DEFAULT NULL,
        SRECS        CHAR(1)    WITH DEFAULT NULL,
        NUMREQS      INTEGER    WITH DEFAULT NULL,
        TIMEOUTS     INTEGER    WITH DEFAULT NULL,
        DEADLOCKS    INTEGER    WITH DEFAULT NULL,
        LONGRUNNERS  INTEGER    WITH DEFAULT NULL,
        ESCALATIONS  INTEGER    WITH DEFAULT NULL,
        TERMINATE    CHAR(1)    WITH DEFAULT NULL)

This is the DDL for the LOCKMON_TRACKER table. LOCKMON stores information about its runs in this table. When LOCKMON starts up, it stores its startup parameters in this table. As long as LOCKMON is running, it updates the counters each time it has asked DB2 for trace records (as well as the timestamp of that event). Directly after the update, LOCKMON reads the table to see whether any of its most important startup parameters have been changed (these are marked with bold text on the slide). To terminate LOCKMON early, just set the TERMINATE column to Y.

B-46 Requirement MR0905086141

We suggest that DB2 development take measures to make IFCID 172 and IFCID 196 complete, based on what is already done in IFCID 337
For IFCID 172 we want statement number for static SQL as well as support for long names of collection and package
For IFCID 196 we want statement number for static SQL and to have collection, package, and consistency token added (with long names where it applies)
For IFCID 337 we want consistency token added

For more information, please see IBM requirement MR0905086141.

B-47 Requirement MR0905083727

We need better handling of deadlocks and timeouts for DL/I batch (today a single reason code, 00D44033)
Make DL/I batch work the same way as all other batch attachments, i.e. SQLCODE -911 and reason code 00C90088 or 00C9008E
Or, at least, have two different reason codes to distinguish between deadlock and timeout

00D44033 Explanation: The DB2-DL/I batch support cannot continue because the application was selected as a deadlock victim.

For more information, please see IBM requirement MR0905083727.

Backup slides

Some extra slides, if time permits

B-48 Why U-locks?

TRANSACTION E --U--> RESOURCE 1 <--U-- TRANSACTION F
Result: Wait

       S  U  X
   S   Y  Y  N
   U   Y  N  N
   X   N  N  N
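The compatibility matrix on this slide can be written down as a lookup table. The sketch below is only an illustration: the matrix is read as S/U/X rows and columns with Y meaning compatible, and the function name is hypothetical. It shows the property that motivates U locks: a U lock coexists with readers holding S locks, but two transactions that both intend to update cannot both hold U, so only one of them can go on to request X.

```python
# Lock compatibility for S, U and X locks as read from the slide.
# Key: (lock already held, lock requested) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,  ("S", "U"): True,  ("S", "X"): False,
    ("U", "S"): True,  ("U", "U"): False, ("U", "X"): False,
    ("X", "S"): False, ("X", "U"): False, ("X", "X"): False,
}


def must_wait(held: str, requested: str) -> bool:
    """True if a request for `requested` has to wait behind `held`."""
    return not COMPATIBLE[(held, requested)]


# Transaction E holds U, transaction F asks for U: F waits.
print(must_wait("U", "U"))
# A reader asking for S is not blocked by a U lock.
print(must_wait("U", "S"))
```

If both transactions had taken S locks instead and then both tried to promote to X, each would wait for the other's S lock, which is the deadlock that the U lock avoids.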

B-49 Other interesting zparms

RELCURHL  Yes, No          always Yes in V9
EVALUNC   Yes, No          default No
SKIPUNCI  Yes, No          default No
RRULOCK   Yes, No          default No
XLKUPDLT  Yes, No, Target  default No

B-50 Example of timeout times

IRLMRWT = 60 and DEADLOK = 5

                              Min         Max         Avg
For non-data-sharing systems  60 seconds  65 seconds  62.5 seconds
For data-sharing systems      65 seconds  80 seconds  70 seconds

Note: Max and average values can be larger, depending on # of waiters or heavy load on IRLM

Let's calculate the theoretical timeout times that we could expect, using the formulas on a previous slide.

First, for a non-data-sharing system: if the timeout period for a given transaction is 60 seconds and the DEADLOCK TIME value is 5 seconds, the transaction waits between 60 and 65 seconds before timing out, with an average wait time of 62.5 seconds. This is because timeout is driven by the deadlock detection process, which is activated on a timer interval basis.

And then, for a data-sharing system: if the timeout period for a given transaction is 60 seconds and the DEADLOCK TIME value is 5 seconds, the transaction waits between 65 and 80 seconds before timing out, with an average wait time of 70 seconds. Again, this is because timeout is driven by the deadlock detection process, which is activated on a timer interval basis.

However, the maximum or average values can be larger, depending on the number of waiters in the system or if a heavy IRLM workload exists.
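The figures on this slide can be reproduced with a few lines of arithmetic. The formulas slide itself is not part of this section, so the formulas below are an assumption chosen to be consistent with the numbers shown (for data sharing: timeout period plus 1, 4 and 2 times the deadlock time for min, max and average); the function name is hypothetical.

```python
# Theoretical timeout window for a given timeout period (IRLMRWT) and
# deadlock detection interval (DEADLOK), both in seconds.
# Assumption: formulas inferred from the figures on the slide.

def timeout_window(irlmrwt, deadlok, data_sharing=False):
    """Return (minimum, maximum, average) wait in seconds before timeout."""
    if data_sharing:
        return (irlmrwt + deadlok,
                irlmrwt + 4 * deadlok,
                irlmrwt + 2 * deadlok)
    # Non-data-sharing: the timeout is noticed at the next deadlock
    # detection cycle, so the wait falls somewhere within one interval.
    return (irlmrwt, irlmrwt + deadlok, irlmrwt + deadlok / 2)


print(timeout_window(60, 5))                     # non-data-sharing
print(timeout_window(60, 5, data_sharing=True))  # data sharing
```

As the slide notes, these are theoretical values; the real maximum and average can be larger with many waiters or a heavily loaded IRLM.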

B-51 Lock escalation - SYSLOG

    Select c1 from tm with rs;

    DSNI031I -GT9G DSNILKES - LOCK ESCALATION HAS OCCURRED FOR
        RESOURCE NAME = GOLD106.LOCKMAX
        LOCK STATE = S
        PLAN NAME : PACKAGE NAME = DSNESPCS : DSNESM68
        COLLECTION-ID = DSNESPCS
        STATEMENT NUMBER = 000251
        CORRELATION-ID = GOLD106
        CONNECTION-ID = TSO
        LUW-ID = GTHLSZ9.H3A5.C231B1C1BC95
        THREAD-INFO = GOLD106 : * : * : *

Lock escalation can also occur for S locks; here we are using isolation RS. Note the detailed information about which statement is hit by the lock escalation. This is great for a program using static SQL, but for dynamic SQL it doesn't help much.

B-52 Lock wait

A  update dsn8810.dept set deptname='Planning' where deptno='B01';
B  update dsn8810.proj set projname='Operation' where projno='OP1010';
C  select deptname from dsn8810.dept where deptno = 'B01';
A  rollback;
C  select projname from dsn8810.proj where projno = 'OP1010';
B  rollback;

B-53 Lock wait

   declare global temporary table temp_tab(timecol timestamp not null);
   insert into session.temp_tab(timecol) values(current timestamp);
C  select deptname from dsn8810.dept where deptno = 'B01';
   select current timestamp - timecol from session.temp_tab;
   ==> 15.341655
   update session.temp_tab set timecol=current timestamp;
C  select projname from dsn8810.proj where projno = 'OP1010';
   select current timestamp - timecol from session.temp_tab;
   ==> 18.749824
   drop table session.temp_tab;

B-54 Accounting trace

DB2 provides three different traces (Statistics, Accounting, and Performance), each of which can have different classes with various levels of detail.

For the accounting trace, the important classes are:
  1  time from first SQL to end
  2  time within DB2 at plan level
  3  wait within DB2 at plan level
  7  time within DB2 at package level
  8  wait within DB2 at package level

Time is CPU time and elapsed time
Wait can be I/O, lock, ...

B-55 Accounting report

PRIMAUTH  CONNECT  ORIGAUTH  END_USER  WS_NAME  ORIGAUTH  CORRNAME  CONNTYPE  PLANNAME
GOLD106   TSO      GOLD106   'BLANK'   'BLANK'  GOLD106   GOLD106   TSO       DSNESPCS

DESCRIPTION
ACCOUNTING CLASS 1
  BEGINNING STORE CLOCK TIME        04/04/08 14:17:12.195625
  ENDING STORE CLOCK TIME           04/04/08 14:18:01.905554
  ELAPSED TIME                      49.709929
  BEGINNING MVS TCB TIME            0.975455
CLASS 2
  DB2 ELAPSED TIME                  35.394762
  TCB TIME                          0.129828
CLASS 3
  LOCK/LATCH(DB2+IRLM) SUSP TIME    35.251260
  LOCK/LATCH(DB2+IRLM) SUSP EVENTS  4
  SYNCHRONOUS I/O SUSP TIME         0.000000
  LOG WRITE I/O SUSP TIME           0.000000
  OTHER READ SUSP TIME              0.000000
  OTHER WRITE SUSP TIME             0.000000
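A quick way to read a report like this is to break the elapsed time down by class. The sketch below uses the figures from the slide; the breakdown itself (time outside DB2 as class 1 minus class 2, and unaccounted time as class 2 elapsed minus CPU minus class 3 suspensions) is a common rule of thumb stated here as an assumption, and the function and key names are hypothetical.

```python
# Break down the elapsed time of one accounting record (all in seconds).
# Assumption: the usual rule-of-thumb breakdown, not a formula taken
# from the presentation.

def summarize_accounting(class1_elapsed, class2_elapsed, class2_cpu,
                         class3_suspensions):
    """Return a small breakdown of where the elapsed time went."""
    total_wait = sum(class3_suspensions.values())
    return {
        # Time spent in the application, outside DB2.
        "outside_db2": class1_elapsed - class2_elapsed,
        # Share of the in-DB2 time spent waiting for locks or latches.
        "lock_share_of_db2": class3_suspensions["lock_latch"] / class2_elapsed,
        # In-DB2 time that is neither CPU nor a recorded suspension.
        "not_accounted": class2_elapsed - class2_cpu - total_wait,
    }


summary = summarize_accounting(
    class1_elapsed=49.709929,
    class2_elapsed=35.394762,
    class2_cpu=0.129828,
    class3_suspensions={"lock_latch": 35.251260, "sync_io": 0.0,
                        "log_write_io": 0.0, "other_read": 0.0,
                        "other_write": 0.0},
)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

With these figures, nearly all of the time inside DB2 is lock/latch suspension, which is what one would expect from a test that deliberately waits for locks.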

B-56 Accounting report

PRIMAUTH  CONNECT  ORIGAUTH  END_USER  WS_NAME  ORIGAUTH  CORRNAME  CONNTYPE  PLANNAME
GOLD106   TSO      GOLD106   'BLANK'   'BLANK'  GOLD106   GOLD106   TSO       DSNESPCS

DESCRIPTION
PACKAGE/DBRM
  COLLECTION: DSNESPCS
  PACKAGE ID: DSNESM68
  SQL STMTS: 36
CLASS 7
  BEGINNING STORE CLOCK TIME  04/04/08 14:18:01.905367
  ENDING STORE CLOCK TIME     04/04/08 14:18:01.905554
  TOTAL ELAPSED TIME          35.394760
  TOTAL TCB TIME              0.129826
CLASS 8
  LOCK/LATCH SUSP TIME        35.251260
  LOCK/LATCH SUSP EVENTS      4
  SYNCHRONOUS I/O SUSP TIME   0.000000
  OTHER READ SUSP TIME        0.000000
  OTHER WRITE SUSP TIME       0.000000