Understanding LGWR, Log File Sync Waits and Commit Performance




Tanel Põder
http://blog.tanelpoder.com
http://tech.e2sn.com

Intro: About me

Tanel Põder
- http://tech.e2sn.com - My company and technical Oracle stuff
- http://blog.tanelpoder.com - Personal blog (more tech stuff)
- tanel@tanelpoder.com - Questions & enquiries

Consulting, Training, Seminars
- tanel@tanelpoder.com
- http://tech.e2sn.com/oracle-training-seminars
- Online seminars coming soon!

Topics

- How do commit and log file sync work? (Overview)
- Reasons for excessively long log file sync waits
- How to measure where the problem is

Reasons for log file sync waits

- Commits wait for log file sync by default
  - User commits
    - There's a "user commits" statistic in v$sesstat
  - DDL
    - The resulting recursive transactions commit
    - Recursive data dictionary DML
- Rollbacks wait too!
  - User rollbacks
    - The user/application issued a rollback command
  - Transaction rollbacks
    - An internal rollback happened (because of some failure)
    - Space allocation/ASSM problems, cancelled queries, killed sessions

Commit & log file sync flow: idealistic overview

[Timeline diagram with three lanes (FG proc., LGWR, IO) along a time axis. The LGWR lane is labelled with the log file parallel write wait, the FG lane with the log file sync wait.]

1) User issues a COMMIT
2) Foreground proc posts LGWR
3) LGWR issues the physical write syscall
4) The physical write syscall completes
5) LGWR posts the FG proc.
6) Commit complete

Log file sync performance -> disk IO speed

[Timeline diagram with three lanes (FG proc., LGWR, IO) along a time axis.]

- The physical write IO (log file parallel write) takes most of the time
- Most of the log file sync time was spent waiting on log file parallel write
- The other components (scheduling latency, IPC) were small

Log file sync performance -> scheduling latency

[Timeline diagram with three lanes (FG proc., LGWR, IO) along a time axis.]

1) User issues a COMMIT
2) LGWR waits in CPU runqueue
3) LGWR submits the IO, goes to sleep
4) IO completes, OS puts LGWR to CPU runqueue
5) LGWR gets onto CPU, posts foreground proc.
6) Foreground proc gets posted, gets onto CPU runqueue

Log file sync flow

1. Foreground process (FG) posts LGWR and goes to sleep
   - The log file sync wait starts
   - Posting is done via a semaphore operation on Unix/Linux
2. LGWR wakes up, gets onto CPU
   - Issues the IO request(s)
   - LGWR goes to sleep, waiting on the log file parallel write event
3. Hardware completes the IO and the OS wakes up LGWR
   - LGWR gets onto CPU
   - Marks the log file parallel write event complete and posts the FG
4. Foreground process is woken up by the LGWR post
   - Foreground process gets onto CPU and completes the log file sync wait
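The four steps above suggest that one log file sync wait is roughly the sum of posting (IPC), LGWR scheduling delay, the physical write, and foreground scheduling delay. A minimal Python sketch of that accounting follows. The class name, field names and all microsecond values are hypothetical and only illustrate the two cases shown on the earlier timeline slides (IO-bound versus scheduling-bound).

```python
from dataclasses import dataclass


@dataclass
class CommitTimeline:
    """Hypothetical component times (microseconds) for a single commit."""
    post_lgwr_us: int        # FG posts LGWR (semaphore operation)
    lgwr_runqueue_us: int    # LGWR waits to get onto CPU
    parallel_write_us: int   # LGWR's "log file parallel write"
    post_fg_us: int          # LGWR posts the FG back
    fg_runqueue_us: int      # FG waits to get onto CPU

    def log_file_sync_us(self) -> int:
        # What the foreground sees as one "log file sync" wait.
        return (self.post_lgwr_us + self.lgwr_runqueue_us
                + self.parallel_write_us + self.post_fg_us
                + self.fg_runqueue_us)

    def io_share(self) -> float:
        # Fraction of the sync wait explained by the physical write.
        return self.parallel_write_us / self.log_file_sync_us()


# Made-up numbers: same IO time, very different scheduling delay.
io_bound = CommitTimeline(20, 30, 900, 20, 30)
cpu_starved = CommitTimeline(20, 4000, 900, 20, 4060)

for name, t in (("io_bound", io_bound), ("cpu_starved", cpu_starved)):
    print(name, t.log_file_sync_us(), "us, IO share", round(t.io_share(), 2))
```

When the IO share is low, faster disks will not help much; the time is going to scheduling or IPC instead.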

Measuring LGWR "speed"

    SQL> @snapper out 1 10 1096

    -- Session Snapper v2.01 by Tanel Poder ( http://www.tanelpoder.com )

    ---------------------------------------------------------------------------------
    SID, USERNAME, TYPE, STATISTIC, DELTA, HDELTA/SEC, %TIME, GRAPH
    ---------------------------------------------------------------------------------
    1096, (LGWR), STAT, messages sent,                           12,       12,
    1096, (LGWR), STAT, messages received,                       10,       10,
    1096, (LGWR), STAT, background timeouts,                      1,        1,
    1096, (LGWR), STAT, physical write total IO requests,        40,       40,
    1096, (LGWR), STAT, physical write total multi block request, 38,      38,
    1096, (LGWR), STAT, physical write total bytes,          2884608,   2.88M,
    1096, (LGWR), STAT, calls to kcmgcs,                         20,       20,
    1096, (LGWR), STAT, redo wastage,                          4548,    4.55k,
    1096, (LGWR), STAT, redo writes,                             10,       10,
    1096, (LGWR), STAT, redo blocks written,                   2817,    2.82k,
    1096, (LGWR), STAT, redo write time,                         25,       25,
    1096, (LGWR), WAIT, LGWR wait on LNS,                   1040575,    1.04s, 104.1%, @@@@@@@@@@
    1096, (LGWR), WAIT, log file parallel write,             273837, 273.84ms,  27.4%, @@@
    1096, (LGWR), WAIT, events in waitclass Other,          1035172,    1.04s, 103.5%, @@@@@@@@@@
    -- End of snap 1, end=2010-03-23 12:46:04, seconds=1
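As a rough worked example, the Snapper deltas above can be turned into per-write averages. The sketch below is Python with the figures copied from the sample. It relies on two unit facts: "redo write time" is reported in centiseconds and wait time deltas in microseconds. Dividing the wait time by "redo writes" assumes one log file parallel write wait per redo write, which is an assumption made here for illustration.

```python
# Deltas copied from the 1-second Snapper sample of LGWR (SID 1096).
redo_writes = 10
redo_write_time_cs = 25                 # statistic is in centiseconds
redo_blocks_written = 2817
log_file_parallel_write_us = 273_837    # wait time is in microseconds

# 1 centisecond = 10 milliseconds
avg_redo_write_ms = redo_write_time_cs * 10 / redo_writes

# Average number of redo blocks flushed per write
avg_blocks_per_write = redo_blocks_written / redo_writes

# Assumption: one "log file parallel write" wait per redo write
avg_parallel_write_ms = log_file_parallel_write_us / 1000 / redo_writes

print(f"avg redo write time      : {avg_redo_write_ms:.1f} ms")
print(f"avg blocks per redo write: {avg_blocks_per_write:.1f}")
print(f"avg parallel write wait  : {avg_parallel_write_ms:.1f} ms")
```

The statistic-based and the wait-based averages land close to each other here, which is what one would hope for when the wait event is instrumented properly.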

LGWR and Asynch IO

    oracle@linux01:~$ strace -cp `pgrep -f lgwr`
    Process 12457 attached - interrupt to quit
    ^CProcess 12457 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- --------------
    100.00    0.010000         263        38         3 semtimedop
      0.00    0.000000           0       213           times
      0.00    0.000000           0         8           getrusage
      0.00    0.000000           0       701           gettimeofday
      0.00    0.000000           0        41           io_getevents
      0.00    0.000000           0        41           io_submit
      0.00    0.000000           0         2           semop
      0.00    0.000000           0        37           semctl
    ------ ----------- ----------- --------- --------- --------------
    100.00    0.010000                  1081         3 total

- io_getevents: this is what the log file parallel write wait event measures (AIO reaping duration)
- io_submit: this system call is not instrumented by the wait interface!

Warning

- The (background processes) IO instrumentation has quite a few bugs
  - Different IO modes: sync, async, direct, buffered etc
- On some versions, the log file parallel write (and db file parallel write) waits aren't properly instrumented
  - Version dependent, for example 11.2.0.1 on Linux
  - When filesystemio_options = NONE, IO syscall waits are instrumented OK (but you don't want to use this option)
  - When filesystemio_options = ASYNCH, IO reaping waits are all very short
    - However, there's unaccounted time in LGWR's wait profile
  - When filesystemio_options = SETALL, IO reaping waits are instrumented properly

Redo, commit related latches and tuning

- Redo related latches
  - redo allocation latches
    - Protect allocating space in the log buffer / RBA ranges in the redo log stream
  - redo copy latches
    - Used only for keeping track of whether anyone is copying data into the redo log buffer, so that LGWR knows to wait for these memory copies to complete before it tries to write buffers to disk
    - LGWR will wait on the "LGWR wait for redo copy" wait event in such cases
    - Used to be tuned by _log_simultaneous_copies
- Should we tune any of these?
  - No, we should fix only problems which exist
  - In other words, if the wait interface doesn't show anyone waiting for them, then don't bother tuning them!

Instrumentation

- Wait events:
  - log file sync
  - log file parallel write
  - log file single write
- Performance counters (V$SESSTAT, V$SYSSTAT):
  - redo size
  - redo write time
  - user commits
  - user rollbacks
  - transaction rollbacks

Wait event: log buffer space

- Not a commit problem
- LGWR is too slow flushing redo log buffer contents to disk
  - Either because the IO subsystem is too slow
  - Or because LGWR is not getting enough (quality) CPU time
  - Sometimes pops up due to large (unplanned) transactions
- Of course, it can also be because of a too small log buffer
  - Which is usually not the case anymore
  - The log buffer is usually multiple MB in size due to how it is allocated from the SGA
  - You shouldn't even set the log_buffer parameter in 10g+

Wait event: log file single write

- Single block redo IO is used mostly for log file header block reading/writing
  - Log switch is the main cause
  - Archiving as well, as it updates the log header
- Who waits: LGWR & ARCH
- Example of what LGWR does during a log switch:

    WAIT #0: nam='log file sequential read' ela= 12607 log#=0 block#=1
    WAIT #0: nam='log file sequential read' ela= 21225 log#=1 block#=1
    WAIT #0: nam='control file sequential read' ela= 358 file#=0
    WAIT #0: nam='log file single write' ela= 470 log#=0 block#=1
    WAIT #0: nam='log file single write' ela= 227 log#=1 block#=1

LGWR trace warnings

- Starting from 10.2.0.3 (or was it 10.2.0.4), LGWR is trying to be helpful and dumps warnings when the actual log write IO takes too long
- New parameter: _side_channel_batch_timeout_ms
  - "timeout before shipping out the batched side channel messages in milliseconds"
- LGWR trace file:

    *** 2010-03-10 11:36:06.759
    Warning: log write time 690ms, size 19KB
    *** 2010-03-10 11:37:23.778
    Warning: log write time 52710ms, size 0KB
    *** 2010-03-10 11:37:27.302
    Warning: log write time 3520ms, size 144KB
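These warnings are easy to pull out of a trace file with a small script. The following Python sketch assumes the warning lines look exactly like the sample above; the function name and the regular expression are illustrative, not part of any Oracle tooling.

```python
import re

# Matches lines such as: "Warning: log write time 690ms, size 19KB"
WARNING_RE = re.compile(r"Warning: log write time (\d+)ms, size (\d+)KB")


def parse_log_write_warnings(trace_text):
    """Return a list of (elapsed_ms, size_kb) tuples, one per warning."""
    return [(int(ms), int(kb)) for ms, kb in WARNING_RE.findall(trace_text)]


# Sample text copied from the LGWR trace excerpt above.
sample = """*** 2010-03-10 11:36:06.759
Warning: log write time 690ms, size 19KB
*** 2010-03-10 11:37:23.778
Warning: log write time 52710ms, size 0KB
*** 2010-03-10 11:37:27.302
Warning: log write time 3520ms, size 144KB
"""

log_write_warnings = parse_log_write_warnings(sample)
slowest = max(log_write_warnings)  # tuples compare by elapsed time first

print(log_write_warnings)
print("slowest write:", slowest[0], "ms for", slowest[1], "KB")
```

Note that the slowest write in the sample is also the smallest one, which hints that the delay is not explained by write size.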

Log file sync in Statspack / AWR

- How much of the end-to-end response time goes to log file sync?
  - How big is it compared to the full response time of the end user?
  - Log file sync may take 20% of your DB Time, but the DB Time itself may take only 10% of the total end user response time!

Log file sync in Statspack / AWR

- If log file sync waits take a significant part of the response time, look into the "Avg wait (ms)" column:

    Top 5 Timed Events                                       Avg  %Total
    ~~~~~~~~~~~~~~~~~~                                      wait    Call
    Event                                 Waits    Time (s)  (ms)    Time
    ------------------------------ ------------ ----------- ------ ------
    PL/SQL lock timer                    57,159       8,754    153   68.8
    db file sequential read              61,258       1,562     25   12.3
    log file sync                         5,873       1,522    259   12.0
    CPU time                                            463           3.6
    direct path write                   235,606         155      1    1.2
    ...
                                                         Avg          %Total
                                         %Tim Total Wait wait   Waits   Call
    Event                          Waits  out   Time (s) (ms)    /txn   Time
    ------------------------ ----------- ---- ---------- ---- ------- ------
    log file parallel write       13,292    0        108    8     0.9     .8
    db file parallel write         3,470    0         39   11     0.2     .3
    db file sequential read          257    0          4   16     0.0     .0
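The "Avg wait (ms)" figures can be reproduced from the Waits and Time columns, and comparing the two events is instructive. The Python sketch below uses the numbers from the report excerpt above. Subtracting one average from the other is only a rough estimate and rests on an assumption, since a single LGWR write can satisfy several commits.

```python
def avg_wait_ms(total_wait_s, waits):
    """Average wait in milliseconds from total seconds and wait count."""
    return total_wait_s * 1000 / waits


# Figures from the Statspack excerpt above.
sync_ms = avg_wait_ms(1522, 5873)     # log file sync (foreground sessions)
write_ms = avg_wait_ms(108, 13292)    # log file parallel write (LGWR)

# Rough estimate of the part of an average commit wait that is NOT
# explained by the physical redo write (scheduling, IPC, queueing...).
non_io_ms = sync_ms - write_ms

print(f"log file sync           : {sync_ms:.0f} ms")
print(f"log file parallel write : {write_ms:.0f} ms")
print(f"not explained by IO     : {non_io_ms:.0f} ms")
```

In this report the physical write accounts for only a few milliseconds of each commit wait, so the IO subsystem is unlikely to be the main problem.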

Better breakdown of wait times

- V$EVENT_HISTOGRAM
  - Instead of a single wait time average, breaks wait times into buckets

    SQL> select event, wait_time_milli, wait_count
      2  from v$event_histogram
      3  where event = 'log file parallel write';

    EVENT                     WAIT_TIME_MILLI WAIT_COUNT
    ------------------------- --------------- ----------
    log file parallel write                 1      22677
    log file parallel write                 2        424
    log file parallel write                 4        141
    log file parallel write                 8        340
    log file parallel write                16       1401
    log file parallel write                32        812
    log file parallel write                64        391
    log file parallel write               128         21
    log file parallel write               256          6
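To make the buckets easier to read, the counts can be converted into cumulative percentages. This Python sketch uses the rows from the query output above; each WAIT_TIME_MILLI value is treated as the upper bound of its bucket. The function name is illustrative.

```python
# (wait_time_milli, wait_count) pairs copied from the query output above.
histogram = [
    (1, 22677), (2, 424), (4, 141), (8, 340), (16, 1401),
    (32, 812), (64, 391), (128, 21), (256, 6),
]


def cumulative_percent(buckets):
    """Return (upper_ms, count, cumulative_pct) for each bucket."""
    total = sum(count for _, count in buckets)
    running = 0
    rows = []
    for upper_ms, count in buckets:
        running += count
        rows.append((upper_ms, count, 100.0 * running / total))
    return rows


rows = cumulative_percent(histogram)
total_waits = sum(count for _, count in histogram)

for upper_ms, count, cum_pct in rows:
    print(f"< {upper_ms:>3} ms  {count:>6}  {cum_pct:6.2f}%")
```

A view like this shows whether a mediocre average comes from uniformly slow writes or from a small tail of very slow ones.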

Identifying scheduling latency

- Easy on Solaris
  - prstat -m
    - Works since Solaris 8
  - timed_os_statistics
    - "OS Wait-cpu (latency) time" in v$sesstat
- On other OSes, it can't be directly measured with standard tools
  - Indirectly, you can look at the system-wide average CPU runqueue length and assume that LGWR was also queueing
  - Not a very systematic approach, huh?

Measure scheduling latency (Solaris)

- Reading thread-level microstate accounting data with prstat:
  - USR - % Time spent on CPU in user mode
  - SYS - % Time spent on CPU in kernel mode
  - TRP - % Time spent processing system traps (CPU traps)
  - TFL/DFL - % Time spent processing text/data page faults
  - LCK - % Time spent waiting for user locks
  - SLP - % Time spent sleeping (other than user locks)
  - LAT - % CPU scheduling latency

    # prstat -mlp 5124
      PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
     5124 oracle   4.2 1.6 0.0 0.6 0.0 0.0  94 0.1 412  20  3K   0 oracle/1
     5124 oracle   0.0 0.6 0.0 0.0 0.0  99 0.0 0.0  44   3  90   0 oracle/11
     5124 oracle   0.0 0.6 0.0 0.0 0.0  99 0.0 0.0  44   0  90   0 oracle/7
     5124 oracle   0.0 0.6 0.0 0.0 0.0  99 0.7 0.0  46   0  90   0 oracle/9
     5124 oracle   0.0 0.6 0.0 0.0 0.0  99 0.0 0.0  44   1  90   0 oracle/5
     5124 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  24   0   7   0 oracle/2
     5124 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   1   0   5   0 oracle/3
     5124 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   1   0   5   0 oracle/10
     5124 oracle   0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   1   0   5   0 oracle/8
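When watching many threads, it can help to filter the prstat output for non-zero LAT values. The Python sketch below parses data rows in the column layout shown above. The function names and the threshold are hypothetical, and the parser assumes whitespace-separated rows with exactly these columns.

```python
# Column names as printed in the prstat microstate header above.
HEADER = ("PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT "
          "VCX ICX SCL SIG PROCESS/LWPID").split()


def parse_prstat_line(line):
    """Map one whitespace-separated prstat data row onto the header names."""
    return dict(zip(HEADER, line.split()))


def threads_with_latency(lines, threshold_pct):
    """Return (PROCESS/LWPID, LAT) for threads whose LAT exceeds the threshold."""
    hits = []
    for line in lines:
        row = parse_prstat_line(line)
        lat = float(row["LAT"])
        if lat > threshold_pct:
            hits.append((row["PROCESS/LWPID"], lat))
    return hits


# Two rows copied from the sample output above.
sample = [
    "5124 oracle 4.2 1.6 0.0 0.6 0.0 0.0 94 0.1 412 20 3K 0 oracle/1",
    "5124 oracle 0.0 0.6 0.0 0.0 0.0 99 0.0 0.0 44 3 90 0 oracle/11",
]

print(threads_with_latency(sample, threshold_pct=0.0))
```

For LGWR the interesting case is a LAT value that stays visibly above zero while sessions are waiting on log file sync.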

Bugs, problems

- No instrumentation
  - On some version/platform/IO configurations the wait interface doesn't record log file parallel write waits at all
  - The same goes for db file parallel write waits (I've noticed it on 9.2-10.2.0.x on Solaris, for example)
  - For LGWR you can use the V$SESSTAT "redo write time" statistic instead
    - It's in centiseconds
- 1-second log file sync bug
  - Most log file syncs took ~1 second to complete
  - The posts sent back by LGWR were missed by the foreground process
  - Thus the FG always waited until the 1-second log file sync wait timeout happened

Tuning

- No need for tuning!
  - Log buffer is quite large by default
  - All memory remaining in a granule after the allocation for the fixed SGA is given to the log buffer
- Oracle used to have a single redo log buffer until v9.0
  - The redo allocation latch could become the ultimate contention point
- Since 9.2, Oracle can have the log buffer split into multiple buffers
  - Each protected by a separate redo allocation latch
- From 10g, Oracle can keep lots of small private redo strands in the shared pool
  - Each protected by a separate redo allocation latch
- @rs.sql
  - Shows the redo strands available

Evil tuning

- If you don't care about the D in ACID (and want to occasionally lose data for fun), then:
  - 10gR1:
    - commit_logging - transaction commit log write behavior
  - 10gR2:
    - commit_write - transaction commit log write behavior
    - commit_wait - transaction commit log wait behavior
  - Old undocumented stuff:
    - _wait_for_sync - wait for sync on commit MUST BE ALWAYS TRUE
- Old: put redo logs in /tmp (on Solaris) or on in-memory disks (/dev/shm) for the duration of a migration/upgrade
  - If your OS / server crashes, you'll need to restore from a backup!

Optimizations for working around bad applications

- Commit optimization in PL/SQL
  - Since Oracle 9i
  - The log file sync is deferred until the end of the PL/SQL call!

    SQL> exec while true loop update t set a=a+1 ; commit ; end loop;

  - No log file sync waits
  - log buffer space / log file switch completion waits more likely!

LGWR configuration - CPU

- Prevent priority decay
  - Put LGWR into a fixed priority scheduling class (FX60 on Solaris)
  - LGWR should get onto CPU faster when waking up
  - LGWR isn't as likely to be thrown off CPU
- If LGWR is still experiencing significant scheduling latency
  - You can put LGWR into a higher priority class
  - You should not put LGWR into the highest real-time class
  - Real time is tricky: your process can monopolize a CPU for itself
  - You don't want to make LGWR pre-empt the OS kernel!
- Note that Oracle sets some processes into higher priority by default:
  - _high_priority_processes: LMS*, VKTM

LGWR configuration - IO

- Reduce the amount of work and waiting a log file parallel write has to do
  - Unbuffered concurrent IO
    - Verify with truss/strace whether proper flags are used (O_DIRECT, O_DIO, O_CIO etc)
  - Or use raw devices
    - ASM is essentially a raw device
  - Or ODM for some cases
- And optimize the whole IO hardware stack, of course!
- Note that mid-to-large size storage arrays do have write cache built in
  - So, moving redo log files to SSD may not give any advantage!
  - Verify your current log file parallel write latency using v$event_histogram

log file sync magic tuning: super-secret!!!

COMMIT LESS!!!

Commit when your business transaction ends, not after every single update!
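A back-of-the-envelope calculation shows why this matters. The Python sketch below takes the 259 ms average log file sync from the Statspack example earlier in the deck and a hypothetical 1000-row business transaction. It assumes the average commit wait stays the same regardless of how much redo each commit flushes, which is a simplification.

```python
def total_commit_wait_s(commits, avg_log_file_sync_ms):
    """Total time (seconds) a session spends waiting on log file sync."""
    return commits * avg_log_file_sync_ms / 1000


rows = 1000          # hypothetical number of updates in one business transaction
avg_sync_ms = 259    # average log file sync from the Statspack example

commit_per_row_s = total_commit_wait_s(rows, avg_sync_ms)  # commit after every update
commit_once_s = total_commit_wait_s(1, avg_sync_ms)        # commit at transaction end

print(f"commit after every update: {commit_per_row_s:.1f} s of log file sync")
print(f"commit once at the end   : {commit_once_s:.3f} s of log file sync")
```

Under these assumptions the commit-per-row version spends a thousand times longer waiting on LGWR for the same work.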

Summary

- Application: Commit less!
  - Ideally only when your logical business transaction ends
- Troubleshooting: Measure log file sync at session level detail
- CPU: If waits for log file sync are significant, see whether LGWR gets:
  - Enough (quality) CPU time
  - Onto the CPU fast enough
- IO: See how much LGWR waits for the log file parallel write event
  - What's the log file parallel write completion time?
  - V$EVENT_HISTOGRAM for better detail

Thanks!

- Download slides from: http://tech.e2sn.com/oracle-slides-and-whitepapers
- Download Snapper from: http://tech.e2sn.com/oracle-scripts-and-tools/session-snapper
- Blog: http://blog.tanelpoder.com
- Email: tanel@tanelpoder.com