Solving Performance Problems in SQL Server
by Michal Tinthofer
Michal.Tinthofer@Woodler.eu
GOPAS: info@gopas.sk, www.gopas.sk, www.facebook.com/gopassr
Agenda
- Analyze the overall SQL Server state
- Focus on the problematic area
- Investigate the problematic area
  - Parallelism and other CPU waits
  - Memory pressure
  - Storage utilization
  - User concurrency
- Resolve the issue
Analyze the overall SQL Server state
Worker states & waits
How to start
- Not a trivial task
- Task Manager is not enough
- We need to look into the SQL Server service itself:
  - Performance Monitor
  - SQL Server Profiler
  - Database Engine Tuning Advisor
  - Extended Events
  - Dynamic Management Views
  - SSMS reports
Server worker states
- Suspended: resource waits
- Runnable: signal waits
- Running: CPU usage / real work
Wait stats (when the worker is suspended)
What the worker waits on / potential bottleneck:
- CPU
- Parallelism
- Storage
- Pages in memory
Signal waits
Total worker wait = resource wait + signal wait
Signal wait can build up
Wait on resource -> wait on CPU -> CPU allocated / working -> wait on another resource -> wait on CPU -> CPU allocated / working
Where to look
- sys.dm_os_wait_stats
- Alternatively: Data Collector > Server Activity
Column name: Description
- wait_type: Name of the wait type.
- waiting_tasks_count: Number of waits on this wait type. This counter is incremented at the start of each wait.
- wait_time_ms: Total wait time for this wait type in milliseconds. This time is inclusive of signal_wait_time_ms.
- max_wait_time_ms: Maximum wait time on this wait type.
- signal_wait_time_ms: Difference between the time that the waiting thread was signaled and when it started running.
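Because wait_time_ms includes signal_wait_time_ms, the resource part has to be derived by subtraction. The following is a minimal sketch of that arithmetic, assuming a row has already been fetched from sys.dm_os_wait_stats into a Python dict; the function name and the sample numbers are made up for illustration.

```python
def split_wait(row):
    """Split one sys.dm_os_wait_stats row into resource and signal parts."""
    total = row["wait_time_ms"]            # inclusive of signal wait
    signal = row["signal_wait_time_ms"]    # time spent runnable, waiting for CPU
    resource = total - signal              # time spent suspended on the resource
    signal_pct = 100.0 * signal / total if total else 0.0
    return {
        "wait_type": row["wait_type"],
        "resource_wait_ms": resource,
        "signal_wait_ms": signal,
        "signal_pct": signal_pct,
    }


# Illustrative values only, not measurements from a real server.
sample = {
    "wait_type": "PAGEIOLATCH_SH",
    "waiting_tasks_count": 200,
    "wait_time_ms": 10000,
    "max_wait_time_ms": 400,
    "signal_wait_time_ms": 500,
}
result = split_wait(sample)
print(result)
```

A high signal share suggests workers are ready to run but cannot get a scheduler, which points at CPU rather than at the resource named by the wait type.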
Wait Stats Showcase
Drawbacks of wait stats
- Deleted after a service restart
- Cumulative since service start
- Many waits are not interesting
- Knowledge about wait types is required
  - Categories
  - Types
  - Signal wait compared to resource wait
- GOC 630 from Gopas will help here
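Since the counters are cumulative, a common workaround is to take two snapshots and look only at the growth between them. This is a rough sketch of that idea, assuming each snapshot was loaded into a dict keyed by wait_type; the helper name and the sample values are hypothetical.

```python
COUNTERS = ("waiting_tasks_count", "wait_time_ms", "signal_wait_time_ms")


def wait_delta(before, after):
    """Return per-wait-type growth between two snapshots of sys.dm_os_wait_stats."""
    zero = {name: 0 for name in COUNTERS}
    delta = {}
    for wait_type, later in after.items():
        earlier = before.get(wait_type, zero)
        grown = {name: later[name] - earlier[name] for name in COUNTERS}
        # Keep only waits that actually grew; a negative value would mean
        # the counters were reset (for example by a service restart).
        if grown["wait_time_ms"] > 0:
            delta[wait_type] = grown
    return delta


# Illustrative snapshots, not real measurements.
before = {
    "WRITELOG": {"waiting_tasks_count": 100, "wait_time_ms": 5000, "signal_wait_time_ms": 200},
    "CXPACKET": {"waiting_tasks_count": 10, "wait_time_ms": 1000, "signal_wait_time_ms": 50},
}
after = {
    "WRITELOG": {"waiting_tasks_count": 150, "wait_time_ms": 8000, "signal_wait_time_ms": 300},
    "CXPACKET": {"waiting_tasks_count": 10, "wait_time_ms": 1000, "signal_wait_time_ms": 50},
}
delta = wait_delta(before, after)
print(delta)
```

Only WRITELOG appears in the result, because CXPACKET did not grow during the interval.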
Perfmon can help too
- Just focus on the important counters for overall server performance
  http://www.quest.com/backstage/images/promotions/sqlserver-perfmonance-poster.pdf
- Advantage: easy to collect and compare over time
- Disadvantages:
  - Data are aggregated
  - Plenty of counters, so it is easy to collect too much
  - Some counters are obsolete (Buffer Manager\Buffer cache hit ratio, Physical Disk\Avg. Disk Queue Length, Physical Disk\% Disk Time)
Investigate problematic area
There are many areas where the server can have an issue
Even though there are 649 wait types, only a few really matter for troubleshooting. The most common are:
- Parallelism and other CPU waits: CXPACKET, SOS_SCHEDULER_YIELD, THREADPOOL
- Memory concurrency: PAGELATCH_XX
- Storage utilization: PAGEIOLATCH_XX, WRITELOG
- User concurrency: LCK_M_XX, TRAN_MARKLATCH_XX
- External, network: ASYNC_NETWORK_IO, OLEDB
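The grouping on this slide can be turned into a small lookup that maps a wait type to its problem area. This is only a sketch: the area labels follow the slide, the matching is by name prefix, and lock waits are matched with the LCK_M_ prefix, which is how they appear in sys.dm_os_wait_stats.

```python
# Problem areas and the wait-type prefixes from the slide.
AREAS = [
    ("CPU / parallelism", ("CXPACKET", "SOS_SCHEDULER_YIELD", "THREADPOOL")),
    ("Memory concurrency", ("PAGELATCH_",)),
    ("Storage utilization", ("PAGEIOLATCH_", "WRITELOG")),
    ("User concurrency", ("LCK_M_", "TRAN_MARKLATCH_")),
    ("External / network", ("ASYNC_NETWORK_IO", "OLEDB")),
]


def classify(wait_type):
    """Return the problem area for a wait type, or 'Other' if it is not listed."""
    for area, prefixes in AREAS:
        if wait_type.startswith(prefixes):  # str.startswith accepts a tuple
            return area
    return "Other"


for name in ("CXPACKET", "PAGELATCH_EX", "PAGEIOLATCH_SH", "LCK_M_X", "SLEEP_TASK"):
    print(name, "->", classify(name))
```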
Parallelism and other CPU issues
- First identify the problematic database
  - sys.dm_exec_query_stats can help
  - Monitor your standard workload and compare against it
- sys.dm_os_schedulers
  - Workers
  - Tasks
  - Load factor
  - Work queue, thread pool
- Perfmon
  - Auto-Param Attempts/sec, Failed Auto-Params/sec
  - SQL Compilations/sec & SQL Re-Compilations/sec
  - Process CPU utilization
  - Compare these with your actual workload: Batch Requests/sec
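Comparing the compilation counters with Batch Requests/sec comes down to two ratios. The sketch below assumes the three perfmon values have already been sampled over the same interval; the function name is hypothetical and no threshold is implied, since what counts as too high depends on the baseline of the workload.

```python
def compile_ratios(batch_requests_per_sec, compilations_per_sec, recompilations_per_sec):
    """Relate SQL Compilations/sec and SQL Re-Compilations/sec to the workload."""
    if batch_requests_per_sec <= 0:
        return None  # no workload in the sample, ratios are meaningless
    recompile_share = (
        recompilations_per_sec / compilations_per_sec if compilations_per_sec else 0.0
    )
    return {
        "compiles_per_batch": compilations_per_sec / batch_requests_per_sec,
        "recompiles_per_compile": recompile_share,
    }


# Illustrative counter values only.
ratios = compile_ratios(1000, 100, 10)
print(ratios)
```

Tracking these ratios over time, rather than the raw counters, makes it easier to tell a busier server from one that has started compiling more per request.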
Parallelism and other CPU issues
- You really want to collect your queries
  - SQL Trace (NOT Profiler without filters on a production server!)
  - Extended Events
  - DMVs
  - Data Collector
- Most issues lie in your workload
  - More than 70% of the performance gain comes from tuning your queries and database design
- Focus on:
  - The most CPU-intensive queries
  - If CXPACKET dominates, the queries with the highest subtree cost
  - The queries waiting most on CPU waits
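Finding the most CPU-intensive queries is a ranking over total_worker_time from sys.dm_exec_query_stats, which the DMV reports in microseconds. A minimal sketch, assuming the rows were already fetched into dicts; the "query" key is a hypothetical label standing in for the statement text, and the sample rows are invented.

```python
def top_cpu_queries(rows, limit=5):
    """Rank query-stats rows by total CPU time, highest first."""
    ranked = sorted(rows, key=lambda r: r["total_worker_time"], reverse=True)
    report = []
    for r in ranked[:limit]:
        total_ms = r["total_worker_time"] / 1000  # microseconds -> milliseconds
        report.append({
            "query": r["query"],
            "total_cpu_ms": total_ms,
            "avg_cpu_ms": total_ms / r["execution_count"],
        })
    return report


# Illustrative rows only.
rows = [
    {"query": "lookup by id", "total_worker_time": 2_000_000, "execution_count": 1000},
    {"query": "monthly report", "total_worker_time": 9_000_000, "execution_count": 3},
]
top = top_cpu_queries(rows)
for entry in top:
    print(entry)
```

Showing both total and average CPU helps separate a cheap query that runs very often from an expensive query that runs rarely; the two call for different fixes.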
How to fix it
Focus on query issues like these:
- Missing indexes
- Old statistics
- Non-SARGable predicates
- Implicit conversions
- Parameter sniffing
  - Using the OPTIMIZE FOR hint
  - Recompilation options (WITH RECOMPILE)
- Ad hoc non-parameterized queries
Processor architectures
- Symmetric Multi Processing (SMP): CPU 0 to CPU 7 all share one memory
- NUMA: NUMA node 0 (CPU 0, 2, 4, 6) and NUMA node 1 (CPU 1, 3, 5, 7), each with its own memory
  - Local memory access: a CPU accesses the memory of its own node
  - Foreign memory access: a CPU accesses the memory of another node, at n x the local cost
Advanced troubleshooting
Diagnosing inappropriate parallelism
- Latches
  - A latch is a lightweight synchronization mechanism that protects reads and changes of in-memory structures
  - A latch is held only for the duration of the operation, unlike a lock, which may be held until the transaction commits
  - It prevents two threads from updating the same page at the same time
  - Example: even a NOLOCK read acquires a shared latch
  - Can be monitored via wait stats (LATCH class) or sys.dm_os_latch_stats
- Spinlocks
  - A lightweight synchronization mechanism used to control access to certain data structures in the engine
  - Used when the spinlock will be held only for a very short time
  - A thread waiting to acquire a spinlock burns some CPU spinning to see if it can get the spinlock, then gives up and backs off (yielding the scheduler) before trying again
  - Can be monitored via wait stats (SOS_SCHEDULER_YIELD) or sys.dm_os_spinlock_stats
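The spin, back off, retry behavior described for spinlocks can be illustrated with a toy model. This is not SQL Server's implementation, only a sketch of the pattern; the function name, the parameters and their default values are assumptions made for the example.

```python
def acquire_with_backoff(try_acquire, spins_per_round=3, max_rounds=4, backoff=lambda: None):
    """Spin a few times on try_acquire, back off, and retry.

    try_acquire: callable returning True when the lock was obtained.
    backoff:     callable invoked between rounds (stands in for yielding the scheduler).
    """
    spins = 0
    for round_no in range(max_rounds):
        for _ in range(spins_per_round):
            spins += 1                      # each spin burns a little CPU
            if try_acquire():
                return {"acquired": True, "spins": spins, "backoffs": round_no}
        backoff()                           # give up for now and yield
    return {"acquired": False, "spins": spins, "backoffs": max_rounds}


# Simulated lock that becomes free on the fifth attempt.
attempts = iter([False, False, False, False, True])
outcome = acquire_with_backoff(lambda: next(attempts))
print(outcome)
```

In this model the spin and backoff counts play the role that the counters in sys.dm_os_spinlock_stats play on a real server: many spins with many backoffs indicate contention on a structure that is supposed to be held only briefly.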
CPU & Task Handling Showcase
What is memory pressure
- External physical: physical memory (RAM) is running low; working sets are trimmed. SQL Server detects it and shrinks.
- Internal physical: high memory consumption internally; not enough memory for some components (say, QE etc.).
- External virtual: running low on available memory commitment; the difference between Memory: Commit Limit and Memory: Committed Bytes is low (could be due to lack of space in the system page files).
- Internal virtual: running low on VAS (direct allocations, DLLs loaded in SQL Server VAS, high number of threads) or VAS fragmentation (a lot of VAS is available, but in small blocks). MTL area. SQL Server detects this and responds.
Lots of misconceptions here: a SQL Server process working set at 80% of physical memory may not be a memory issue!
What to collect: external physical
- Check available memory and pages/sec
- sys.dm_os_sys_memory
  - system_memory_state_desc
  - Available memory
  - Page file size (especially with LPIM)
- sys.dm_os_memory_clerks
- Paged & nonpaged pool
- Perfmon
  - Memory object: Available [M, K]Bytes, Pages/sec
  - Process object: Working Set and Private Bytes counters for each process (the AWE part is not in the report)
What to collect: external virtual
- Memory: Commit Limit, the amount of virtual memory that can be committed without extending page file space
- Paging File: % Usage, Paging File: % Usage Peak
- Subtract Process: Working Set from Process: Private Bytes to see how much process memory has been paged out
- sys.dm_os_sys_memory
  - total_page_file_kb
  - available_page_file_kb
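The external virtual checks reduce to two subtractions: commit headroom (Commit Limit minus Committed Bytes) and a rough paged-out estimate (Private Bytes minus Working Set). A minimal sketch, assuming the four perfmon values are already available in bytes; the function name and the sample sizes are hypothetical, and the paged-out figure is only approximate because a working set also contains shared pages.

```python
GB = 1024 ** 3


def external_virtual_pressure(commit_limit, committed, private_bytes, working_set):
    """Summarize commit headroom and an estimate of paged-out process memory (all in bytes)."""
    return {
        "commit_headroom_bytes": commit_limit - committed,
        "commit_used_pct": 100.0 * committed / commit_limit,
        # Clamp at zero: the working set can exceed private bytes when it
        # contains many shared pages.
        "approx_paged_out_bytes": max(private_bytes - working_set, 0),
    }


# Illustrative sizes only.
summary = external_virtual_pressure(
    commit_limit=64 * GB,
    committed=48 * GB,
    private_bytes=40 * GB,
    working_set=30 * GB,
)
print(summary)
```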
What to collect: internal physical
- Performance Monitor, SQL Server: Buffer Manager object
  - High number of Checkpoint pages/sec
  - High number of Lazy writes/sec
  - Page life expectancy (PLE)
- RING_BUFFER_OOM
- Resource Monitor
What to collect: internal virtual
- Querying the sys.dm_os_virtual_address_dump DMV returns a map of SQL Server VAS by allocation
Memory Manager architecture change in SQL Server 2012
[Diagram comparing SQL Server 2008 R2 and SQL Server 2012. 2008 R2: Page Reservation, Memory Objects, CLR and Buffer Pool on top of a Memory Manager with a single-page allocator, a multi-page allocator and a VAS allocator. 2012: Page Reservation, Buffer Pool, Memory Objects and CLR on top of a Memory Manager with an any-size page allocator and a VAS allocator. Both sides show the -g MemToReserve startup option.]
Changes in monitoring
- sys.dm_os_memory_cache_counters
  - SQL Server 2012: pages_kb; pages_in_use_kb
  - SQL Server 2008 R2: single_pages_kb + multi_pages_kb; single_pages_in_use_kb + multi_pages_in_use_kb
- sys.dm_os_memory_cache_entries
  - SQL Server 2012: pages_kb
  - SQL Server 2008 R2: pages_allocated_count
- sys.dm_os_memory_clerks
  - SQL Server 2012: pages_kb; page_size_in_bytes
  - SQL Server 2008 R2: single_pages_kb + multi_pages_kb; page_size_bytes
- sys.dm_os_memory_objects
  - SQL Server 2012: pages_in_bytes; max_pages_in_bytes
  - SQL Server 2008 R2: pages_allocated_count; max_pages_allocated_count
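Monitoring scripts that must run on both versions need to pick the column expression by server version. The sketch below does that for sys.dm_os_memory_clerks, assuming the caller already knows the major version number (11 for SQL Server 2012, 10 for 2008 and 2008 R2); the function names and the total_kb alias are made up for the example.

```python
def clerk_pages_expression(major_version):
    """Return the memory-clerk size expression valid for the given major version."""
    if major_version >= 11:                      # SQL Server 2012 and later
        return "pages_kb"
    return "single_pages_kb + multi_pages_kb"    # SQL Server 2008 R2 and earlier


def clerk_query(major_version):
    """Build a per-clerk memory query using the version-appropriate columns."""
    expr = clerk_pages_expression(major_version)
    return (
        f"SELECT type, SUM({expr}) AS total_kb "
        "FROM sys.dm_os_memory_clerks "
        "GROUP BY type ORDER BY total_kb DESC;"
    )


print(clerk_query(11))
print(clerk_query(10))
```

Keeping the version switch in one helper means the rest of a collection script can stay identical across the two releases.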
Memory management: where to look
- sys.dm_os_sys_info
  - bpool_commit_target and bpool_committed (before SQL Server 2012)
  - committed_kb and committed_target_kb in SQL Server 2012
- sys.dm_os_buffer_descriptors
  - WARNING: be careful using sys.dm_os_buffer_descriptors, as it can return 200,000+ rows for just a 1.6 GB address space
  - It doesn't count everything, to avoid blocking and contention
What to monitor?
Common memory error
701 - There is insufficient system memory in resource pool 'pool_name' to run this query.
- Performance counters
- sys.dm_os_memory_clerks
- sys.dm_resource_governor_resource_pools
- DBCC FREESYSTEMCACHE
- Optimize the query
Other problematic areas
- Storage utilization
  - Check for storage latency: sys.dm_io_virtual_file_stats(NULL, NULL)
  - Check for I/O waits or log waits: PAGEIOLATCH_XX, WRITELOG
  - Defragment log files, reduce the VLF count, change file sizes, use filegroups
- User concurrency
  - Deadlocks: monitor the deadlock graph event via SQL Trace
  - Use best practices
  - Reduce long-running queries
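Storage latency from sys.dm_io_virtual_file_stats is the accumulated stall time divided by the number of I/Os. A minimal sketch of that calculation, assuming one row of the DMV has been fetched into a dict; the function name and the sample numbers are illustrative, and the result is an average since service start because the DMV is cumulative.

```python
def file_latency(row):
    """Compute average read and write latency in ms for one database file."""
    reads = row["num_of_reads"]
    writes = row["num_of_writes"]
    return {
        "database_id": row["database_id"],
        "file_id": row["file_id"],
        # Guard against files that have not been read or written yet.
        "avg_read_ms": row["io_stall_read_ms"] / reads if reads else 0.0,
        "avg_write_ms": row["io_stall_write_ms"] / writes if writes else 0.0,
    }


# Illustrative values only.
sample_file = {
    "database_id": 5,
    "file_id": 1,
    "num_of_reads": 1000,
    "io_stall_read_ms": 20000,
    "num_of_writes": 500,
    "io_stall_write_ms": 1500,
}
latency = file_latency(sample_file)
print(latency)
```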
Bonus: Consolidation challenges
Multi-Instance Consolidation Scenarios
- Run workloads, observe their characteristics and understand the baseline set of requirements for an instance
- Determine the processor and memory requirements for each instance
- Isolate the processors and memory for each instance
  - ALTER SERVER CONFIGURATION SET PROCESS AFFINITY
  - Do not set affinity from Task Manager
  - Keep instances within node boundaries
  - Always set max server memory when using affinity
  - Avoid using more memory than is available in the nodes
  - Change the configuration while the server is idle
Multi-Instance Consolidation Example
- Machine: 12 LPs, 48 GB RAM
- NUMA nodes: Node 0 (16 GB), Node 1 (16 GB), Node 2 (16 GB)
- Instances:
  - LPs: 8, memory: 32 GB
  - LPs: 2, memory: 12 GB
  - LPs: 2, memory: 4 GB
- LP = Logical Processor: a computing engine as seen by the OS, applications and drivers
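The example layout can be sanity-checked by adding up what the instances claim and comparing it with the machine. This sketch uses the numbers from the slide; the instance names and the function name are hypothetical, and it checks only the totals, not the placement on individual NUMA nodes.

```python
machine = {"logical_processors": 12, "memory_gb": 48}

# Instance names are placeholders; LP and memory values come from the slide.
instances = [
    {"name": "instance A", "lps": 8, "memory_gb": 32},
    {"name": "instance B", "lps": 2, "memory_gb": 12},
    {"name": "instance C", "lps": 2, "memory_gb": 4},
]


def check_fit(machine, instances):
    """Compare the summed LP and memory assignments with the machine's capacity."""
    lps_used = sum(i["lps"] for i in instances)
    memory_used = sum(i["memory_gb"] for i in instances)
    return {
        "lps_used": lps_used,
        "memory_used_gb": memory_used,
        "fits": (
            lps_used <= machine["logical_processors"]
            and memory_used <= machine["memory_gb"]
        ),
    }


fit = check_fit(machine, instances)
print(fit)
```

The 32 GB instance matches two 16 GB nodes, and the 12 GB and 4 GB instances together fill the third node, which is consistent with keeping instances within node boundaries.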
Questions & Answers
Thank You!
Michal.Tinthofer@Woodler.eu
www.woodler.eu