Users are Complaining that the System is Slow: What Should I Do Now? Part 1


Users are Complaining that the System is Slow: What Should I Do Now? Part 1
Jeffry A. Schwartz
July 15, 2014
SQLRx Seminar
jeffrys@isi85.com

Overview
- Most of you have had to deal with vague user complaints about slowness or hangs
- Obviously, this information is not very helpful in getting to the source of the problem
- Remember that erratic response times have proven to be more frustrating and counterproductive than consistently slow response times
  - People usually remember the 90th or 95th percentile response times as the average
- Database is often the first place to be blamed
- Dilemmas
  - Many times analysts do not know where to start
  - Process is especially time-consuming when analysts are inexperienced in performance

Overview (continued)
- DBAs need techniques for determining
  - Which hardware or software components, including servers, are in trouble?
  - Causes of poor performance
    - What is to blame?
    - Is the software itself the problem?
    - Are the hardware's problems caused by inefficient software, or simply by too much work?
  - For SQL Server servers, which queries are most troublesome?
  - How to develop appropriate solutions

Today's Session
- First of several 30-minute presentations
- Discusses methodology for pinpointing the sources of problems using PerfMon data alone
  - Subsequent presentations will delve into more SQL Server-centric metrics and tools
- Discusses high-level hardware-related and preliminary SQL Server performance analysis
- Describes information and techniques applicable to SQL Server 2005+ and all versions of Windows

Analysis Objectives
- Use measurements to corroborate or discredit user perceptions of performance
- Capture performance data on ALL servers involved in the user experience
  - IIS, Application, Database, etc.
  - If the SAN is shared, then collect on ALL servers that use it
- If perceptions are valid
  - When?
  - How bad?
  - Duration?
  - How do users know it is bad from 11:00 to noon?
- If hardware is in trouble, prove whether the application is the cause or just the amount of work
  - Too many database trips per transaction?
  - Too many transactions?

Analysis Methodology
- Use Windows Performance Monitor, a.k.a. PerfMon, to determine
  - When problems occur and their duration
  - Which servers and hardware components could be involved
  - Whether the physical servers or SQL Server are short of memory
- Use graphs to visually correlate problem periods on servers
- Using PerfMon to focus analyses is highly recommended by Microsoft

Analysis Techniques Covered
- Interpretation and usage of informative performance counters
  - Processor
  - Memory
  - Physical I/O
  - SQL Server
- Expand upon and clarify PerfMon explanations
- Provide useful graphical techniques
- Insights acquired from over 1,000 customer engagements
- Possible courses of action

PerfMon Hardware and SQL Server Metrics
- Invaluable hardware and SQL Server metrics
  - % User Time (Processor)
  - % Privileged Time (Processor)
  - % Interrupt Time (Processor)
  - % DPC Time (Processor)
  - % Idle Time (each Disk)
  - Avg. Disk sec/transfer (each Disk)
  - Avg. Disk sec/write (each Disk)
  - Available Bytes (Memory)
  - Page Life Expectancy (SQLServer:Buffer Manager)
  - Page Reads/sec (Memory & SQLServer:Buffer Manager)

Assessing Your System
Potential problems exist if consistently:

Counter                                            Criterion
% Processor Time                                   > 70%
% Privileged Time (Processor)                      > 30%
% Interrupt Time (Processor)                       > 20%
% DPC Time (Processor)                             > 25%
% Idle Time                                        < 40% for any Disk LUN and especially SQL LUNs
Avg. Disk sec/transfer                             > 0.040 seconds (40 ms)
Avg. Disk sec/write                                > 0.040 seconds (40 ms)
Available Bytes (Memory)                           < 1 GB
Page Life Expectancy (SQLServer:Buffer Manager)    < 300 seconds
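The criteria on this slide can be applied mechanically to averaged counter values. The following is a minimal sketch only: the dictionary keys are illustrative labels, not exact PerfMon counter paths, the function name flag_counters is hypothetical, and 1 GB is assumed to mean 1024^3 bytes. Because the slide says "consistently", the inputs are assumed to be averages over a sustained interval rather than single samples.

```python
# Thresholds transcribed from the "Assessing Your System" slide.
# Keys are illustrative labels (an assumption), not full PerfMon counter paths.
THRESHOLDS = {
    "% Processor Time": (">", 70.0),
    "% Privileged Time": (">", 30.0),
    "% Interrupt Time": (">", 20.0),
    "% DPC Time": (">", 25.0),
    "% Idle Time": ("<", 40.0),
    "Avg. Disk sec/transfer": (">", 0.040),
    "Avg. Disk sec/write": (">", 0.040),
    "Available Bytes": ("<", 1 * 1024**3),  # assumes 1 GB = 1024^3 bytes
    "Page Life Expectancy": ("<", 300.0),
}


def flag_counters(averages):
    """Return the sorted names of counters whose averaged value breaches its criterion.

    `averages` maps a counter label to its average over the interval of interest.
    Labels that have no criterion on the slide are ignored.
    """
    flagged = []
    for name, value in averages.items():
        if name not in THRESHOLDS:
            continue
        op, limit = THRESHOLDS[name]
        breached = value > limit if op == ">" else value < limit
        if breached:
            flagged.append(name)
    return sorted(flagged)
```

For example, passing averages of 85 for % Processor Time, 55 for % Idle Time, and 120 for Page Life Expectancy would flag the processor and Page Life Expectancy entries but not the disk.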

Useful Processor Performance Counters
- Processor object
  - Interrupt counters are isolated to specific processors and frequently ignored (often single-threaded through the interrupt processor)
  - Windows only reports % Privileged Time (contains kernel mode and interrupt times) and interrupt times
    - Actual kernel (Windows) time must be computed manually
    - REAL kernel mode time is the difference between % Privileged Time and the sum of the interrupt times (% Interrupt Time and % DPC Time)
- System object
  - Processor Queue Length (waiting list length ONLY)
    - Can be affected by application design
  - Context Switches/sec
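The manual kernel-time computation described on this slide is a single subtraction. The sketch below shows it in Python. The function names are hypothetical, and all inputs are assumed to be percentages taken from the same Processor instance over the same interval.

```python
def real_kernel_time(privileged_pct, interrupt_pct, dpc_pct):
    """Kernel-mode time excluding interrupt handling, in percent.

    % Privileged Time already contains % Interrupt Time and % DPC Time,
    so both are subtracted to leave the actual kernel (Windows) time.
    """
    return privileged_pct - (interrupt_pct + dpc_pct)


def processor_breakdown(user_pct, privileged_pct, interrupt_pct, dpc_pct):
    """Split processor usage into four non-overlapping components."""
    return {
        "User": user_pct,
        "Kernel": real_kernel_time(privileged_pct, interrupt_pct, dpc_pct),
        "Interrupt": interrupt_pct,
        "DPC": dpc_pct,
    }
```

With % Privileged Time at 30, % Interrupt Time at 4, and % DPC Time at 6, the real kernel time is 20 percent. Because the four components do not overlap, they can be summed or plotted together without double counting.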

[Figure: Processor Usage Overview. Left axis: percentage used (0% to 100%) for DPC, Interrupt, Kernel, and User time. Right axis: Deferred Procedure Calls Queued and Hardware Interrupts per second (0 to 9,000). Time range: Fri Jul 21 through Tue Jul 25.]

Physical I/O Measurements
- Critical for SQL Server systems because they are often I/O constrained
- I/O time measured directly by disk driver, which provides transfer times to Windows
- I/O time = service time + queue time, due to the driver's location in the I/O path
  - Disk response time
- Queuing not always the cause of large I/O times
- May not be possible to improve large service (working) times because of physical or financial constraints

Physical I/O Measurements (continued)
- Most frequently (and badly) misunderstood Windows metrics
- Data reported for
  - Each physical disk or LUN (PhysicalDisk)
    - When RAID is implemented in hardware, many actual disk drives can appear as one physical disk or LUN
    - Use disk idle and disk transfer times to detect contention within the RAID itself (see the "I/O Performance Counters Incomplete" slide)
  - Each logical disk partition (LogicalDisk)

Useful Disk Performance Counters
- Physical Disk
  - Avg. Disk sec/transfer
    - Should be 0.020 seconds (20 ms) at most unless the I/O size is huge
  - % Idle Time
    - Surprising how often this is ignored!
    - Once this reaches zero, no more I/Os can be processed
    - Performance usually degrades as it approaches zero
    - Easier to represent as utilization, i.e., 100 - % Idle Time
  - Disk Transfers/sec, Disk Bytes/sec
    - Beware of disk specs because they usually cite very large I/Os
    - Beware of cliffs, even on SSDs
  - Read- and write-specific counters are also valuable, especially when a read/write performance disparity exists or when using RAID 5
- Logical Disk
  - Same counters available, plus space-related ones
  - Useful when multiple logical drives reside on one physical LUN

Interpreting Performance Counters
- Disk queue lengths
  - Unlike processor queue length, this INCLUDES I/Os in progress
  - By far the most commonly quoted and used disk performance measurement
    - Assumes relationships among transfer times, utilizations, and queue lengths
    - Actually least useful, except when outrageously high or from a VM guest on a busy host (> 50% processor utilization)
    - Use transfer times and % Idle Time instead
  - Interpretation is very difficult because the number of physical disks in a LUN is usually unknown, unless the values are obscenely high

[Figure: Utilization versus Queue Depth graph]

Misunderstood PerfMon Counters
- Many PerfMon counters are misunderstood, e.g., % Disk Time
  - Many people continue in 2014 to believe that this metric is a utilization metric!
  - It most definitely is NOT!
  - The PerfMon explanation does not help: "% Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests."
- % Disk Time actually = 100 * Avg. Disk Queue Length
  - Artificially constrained to 100% by PerfMon
  - Actually useless, but frequently referenced and interpreted as disk busy time
- Actual busy = 100 - % Idle Time
  - Can indicate a capacity constraint even when performance is excellent
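The difference between the two quantities on this slide can be illustrated with a short Python sketch. It assumes only the relationships stated above (% Disk Time = 100 * Avg. Disk Queue Length, capped at 100%, and actual busy = 100 - % Idle Time). The function names are hypothetical.

```python
def reported_disk_time(avg_disk_queue_length):
    """% Disk Time as displayed: 100 * queue length, capped at 100."""
    return min(100.0, 100.0 * avg_disk_queue_length)


def actual_disk_busy(idle_pct):
    """Actual disk busy percentage derived from % Idle Time."""
    return 100.0 - idle_pct
```

A LUN with an average queue length of 3.2 and % Idle Time of 35 would show % Disk Time pinned at 100, while the disk was actually busy 65 percent of the time. The queue-based figure therefore says nothing reliable about how much capacity remains.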

[Figure: Disk LUN Driver Activity. Vertical axis: % Disk busy (0% to 100%). Time range: Mon Jul 31 through Wed Aug 2. Series: Disk 00 C, Disk 01, Disk 02 F, Disk 03 E, Disk 03 F, Disk 04 E, Disk 05 D, Disk 06 D.]

[Figure: Disk LUN Driver Activity. Vertical axis: % Disk busy (0% to 100%). Time range: Mon Mar 1, 2:49 through 4:12. Series: Disk 00 C D, Disk 01 R S V, Disk 02 T.]

[Figure: Disk LUN I/O Times. Vertical axis: milliseconds per I/O (0 to 300). Time range: Mon Mar 1, 2:49 through 4:12. Series: Disk 00 C D, Disk 01 R S V, Disk 02 T.]

[Figure: Disk I/O Byte Traffic Overview. Vertical axis: Disk Bytes/second (0 to 40,000,000). Time range: Mon Jul 17 through Thu Jul 20. Series: Disk 07, Disk 08, Disk 09, Disk 10, Disk 11, Disk 12, Disk 13.]

I/O Performance Counters Incomplete
- Some important metrics are not measured directly
  - Avg. Disk Service Time per Transfer
  - Avg. Disk Queuing Time per Transfer
- Missing values can be computed using the Utilization Law

Utilization Law
- U = X * S
  - U => utilization of a resource
  - X => completion rate
  - S => average service time required of a resource
- System assumed to be in steady state
- Not perfect, but an excellent tool

Using Utilization Law to Compute Missing I/O-Related Times
- Restate the Utilization Law: S = U / X
- All calculations use PhysicalDisk counters
  - LogicalDisk counters can be used, if necessary
- % Idle Time, Disk Transfers/sec, and Avg. Disk sec/transfer are PerfMon counters
- Disk Utilization = (100 - % Idle Time) / 100
- Disk service time (sec) = Disk Utilization / Disk Transfers/sec
- Disk queue time (sec) = Avg. Disk sec/transfer - Disk service time (sec)

RAID Example Calculations #1 and #2

Metric                    LUN #1                                          LUN #2
Disk Utilization          36.57%                                          77.67%
Disk Transfers/sec        0.65                                            30.89
Avg. Disk sec/transfer    2.0095 seconds!                                 2.4424 seconds!
Disk service time         0.3657 / 0.65 = 0.563 seconds (563 ms)          0.7767 / 30.89 = 0.025 seconds (25 ms)
Disk queue time           2.0095 - 0.563 = 1.447 seconds (1,447 ms)       2.4424 - 0.025 = 2.4174 seconds (2,417 ms)
Bytes/Transfer            1,307                                           22,437
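The two example calculations follow directly from the restated Utilization Law, S = U / X. The sketch below reproduces them in Python. The function name disk_times is hypothetical, and % Idle Time is inferred from the quoted utilizations (100 - 36.57 = 63.43 for LUN #1 and 100 - 77.67 = 22.33 for LUN #2).

```python
def disk_times(idle_pct, transfers_per_sec, avg_sec_per_transfer):
    """Derive (utilization, service time, queue time) from PhysicalDisk counters.

    utilization  = (100 - % Idle Time) / 100
    service time = utilization / Disk Transfers/sec        (S = U / X)
    queue time   = Avg. Disk sec/transfer - service time
    Times are in seconds. The law assumes the system is in steady state.
    """
    if transfers_per_sec <= 0:
        raise ValueError("Disk Transfers/sec must be positive to apply S = U / X")
    utilization = (100.0 - idle_pct) / 100.0
    service_time = utilization / transfers_per_sec
    queue_time = avg_sec_per_transfer - service_time
    return utilization, service_time, queue_time


# LUN #1 and LUN #2 inputs taken from the example slide.
lun1 = disk_times(idle_pct=63.43, transfers_per_sec=0.65, avg_sec_per_transfer=2.0095)
lun2 = disk_times(idle_pct=22.33, transfers_per_sec=30.89, avg_sec_per_transfer=2.4424)
```

Rounded to three decimals, lun1 yields a service time of 0.563 seconds and a queue time of 1.447 seconds, and lun2 yields 0.025 and 2.417 seconds, matching the slide. The slide's 2.4174 for LUN #2 comes from subtracting the already rounded service time.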

RAID Example #1 vs. #2
- I/O times (2.0095 vs. 2.4424 seconds) outrageously high
- Queuing occurred on both disks
- Low I/O rate of Disk #1 appears to contribute to high service times
  - 1,307 bytes should not require 563 milliseconds
- How could this happen?

RAID Example #1 vs. #2 (continued)
- Problems began when a faster processor complex was attached to an existing disk subsystem
- Customer blamed the new processor for poor performance
  - Customer wanted the vendor to take it back because the architecture was supposedly defective and slower than the original
  - In reality, it was MUCH faster and it was swamping the disk subsystem!
- Solution was to reconfigure the disk drives
  - Customer refused to state exactly what they changed
  - Probably multiple LUNs shared the same physical drives
    - Becoming a more common problem as many LUNs are spread across the same physical disks
    - Great in theory, but once hot spots develop, it is very difficult to unravel and correct

Database I/O Counters
- Page reads/sec and Page writes/sec counters
  - Measure physical I/Os, not logical I/Os
  - SQL Trace measures logical I/Os
- May indicate
  - Insufficient database memory
  - Applications improperly accessing the database
  - Improper database table implementation
- Plot reads and writes on the same graph
  - Highlights changes in workload behavior
  - Heavy write activity may coincide with periods of poor performance, especially when RAID 5 disks are involved
- Add Full Scans/sec and Forwarded Recs/sec for a better view

[Figure: I/O Activity vs. Scans and Forwarded Records graph]

Detecting Insufficient SQL Memory
- Use Page Life Expectancy
  - Measures the time unlocked buffers are allowed to remain in the buffer pool
  - Aged number that tends to drop quickly and increase slowly
- If Page Life Expectancy is too low (< 300)
  - Allocate more memory to SQL Server or optimize queries
  - Malformed queries that read inappropriate amounts of data can cause low Page Life Expectancy because of data churn
    - Page reuse is very low
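Because Page Life Expectancy drops quickly and recovers slowly, a single low sample matters less than a sustained run below the 300-second threshold. The following Python sketch finds such runs in a series of evenly spaced samples. The function name low_ple_periods and the min_samples parameter are assumptions made for illustration. The slides do not specify how long a low period must last to count as "consistently" low.

```python
def low_ple_periods(samples, threshold=300, min_samples=3):
    """Return (start_index, end_index) pairs for sustained low-PLE runs.

    A run is a stretch of at least `min_samples` consecutive samples whose
    value is below `threshold`. Indices refer to positions in `samples`,
    and both ends of each pair are inclusive.
    """
    periods = []
    start = None
    for i, value in enumerate(samples):
        if value < threshold:
            if start is None:
                start = i  # a new low run begins here
        else:
            if start is not None and i - start >= min_samples:
                periods.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(samples) - start >= min_samples:
        periods.append((start, len(samples) - 1))
    return periods
```

Mapping the returned indices back to sample timestamps gives the start and duration of each low-memory period, which can then be compared with the times at which users reported slowness.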

Page Lookups/sec Counter
- Measures the number of times SQL Server attempted to find a page in the buffer pool
- Logical read

[Figure: Page Life Expectancy vs. Page Lookups graph, titled "SQL Server Page Life Expectancy & Page Lookups". Left axis: lookups per second (0 to 250,000). Right axis: seconds (0 to 1,000). Series: Lookups, Page Life Expectancy, Page Life Expectancy Threshold. Time range: Fri Mar 9 through Sun Mar 11.]

Batch Requests/sec
- Number of select, insert, and delete statements
- Each of these statements triggers a batch event, which increments the counter
- Note: Also includes each of these statement types executed within a stored procedure

[Figure: Batch Requests vs. Page Lookups graph]

Conclusions
- PerfMon should always be used to focus application troubleshooting and tuning efforts
- Extremely important to combine Windows system performance data for ALL servers used by the application
  - Especially true for processor, memory, and I/O
- Include SQL Server PerfMon and internal metrics when applicable

Follow-Up
- Please attend the next session
  - Will discuss SQL Server Dynamic Management Views (DMVs)
- Please complete the four-question evaluation to let us know
  - How we can help
  - What topics you would like us to cover in future webinars