Optimising SQL Server CPU performance




At a glance:
- Troubleshooting database performance issues
- Reviewing hardware causes
- Using PerfMon to track database bottlenecks
- Evaluating query performance

By Zach Nichter

Troubleshooting performance problems on a database system can be an overwhelming task. Knowing where to look for trouble is important, but more crucial is knowing why your system reacts the way that it does to a particular request. A number of factors can affect CPU utilisation on a database server: compilation and recompilation of SQL statements, missing indexes, multithreaded operations, disk bottlenecks, memory bottlenecks, routine maintenance, and extract, transform and load (ETL) activity, among others. CPU utilisation is not a bad thing in itself; performing work is what the CPU is there for. The key to healthy CPU utilisation is making sure that the CPU is spending its time processing what you want it to process and not wasting cycles on poorly optimised code or sluggish hardware.

Two paths leading to the same place

When viewed from a high level, there are two paths to identifying CPU performance problems. The first is reviewing the system's hardware performance, an exercise that helps determine where to look when you head down the second path, reviewing the server's query efficiency. This second path is usually more effective in identifying SQL Server performance issues. Unless you know exactly where your query performance issues lie, however, you should always start with a system performance evaluation. In the end, you'll usually end up taking both paths. Let's lay some groundwork so that we can review both of these paths.

Hyper-threading

Hyper-threading is a topic that is worth discussing a bit more because of the way that it affects SQL Server. Hyper-threading presents two logical processors to the OS for each physical processor; it essentially leases time on the physical processors so that each processor ends up more fully utilised. The Intel Web site gives a far fuller description of how hyper-threading works.

On SQL Server systems, the DBMS handles its own extremely efficient queuing and threading to the OS, so hyper-threading only serves to overload the physical CPUs on systems with already high CPU utilisation. When SQL Server queues multiple requests to perform work on multiple schedulers, the OS has to switch the context of the threads back and forth on the physical processors to satisfy the requests being made, even if the two logical processors sit on top of the same physical processor. If you are seeing Context Switches/sec higher than 5000 per physical processor, you should strongly consider turning off hyper-threading on your system and retesting performance.

In rare cases, applications that experience high CPU utilisation on SQL Server can effectively use hyper-threading. Always test your applications against SQL Server with hyper-threading both turned on and off before implementing changes on your production systems.

Laying the groundwork

A high-end dual-core processor will easily outperform the RAM in a machine, which will in turn be faster than an attached storage device. A good CPU can handle approximately six times the throughput of current top-end DDR2 memory and about two times that of top-end DDR3 memory. Typical memory throughput is more than 10 times that of the fastest fibre channel drives.
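Since hyper-threading doubles the logical processor count, it helps to confirm what SQL Server itself sees. On SQL Server 2005 you can compare logical processors to physical packages with a quick DMV query. This is a sketch; note that hyperthread_ratio reports logical processors per physical package and does not distinguish hyper-threading from multiple cores:

```sql
-- Sketch: what SQL Server 2005 sees of the CPU layout.
-- cpu_count = logical processors; hyperthread_ratio = logical CPUs
-- per physical package (cores and hyper-threads combined).
SELECT cpu_count                     AS logical_cpus,
       hyperthread_ratio,
       cpu_count / hyperthread_ratio AS physical_packages
FROM sys.dm_os_sys_info;
```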
In turn, hard disks can only perform a finite number of IOPS (input/output operations per second), a value that is limited by the number of seeks per second a drive can perform. To be fair, it is not typical for a single storage drive to handle all of the storage needs on enterprise database systems. Most setups today utilise Storage Area Networks (SANs) or larger RAID groups on enterprise database servers, which can nullify or minimise the disk I/O issue. The most important thing to remember is that no matter what your setup looks like, disk and memory bottlenecks can affect the performance of your processors.

Because of the I/O speed differences, retrieving data from disk is much more costly than retrieving data from memory. A data page in SQL Server is 8KB. An extent in SQL Server is made up of eight 8KB pages, making it equivalent to 64KB. This is important to understand because when SQL Server requests a particular data page from disk, it retrieves not just that data page but the entire extent in which the data page resides. There are reasons that actually make this more cost-effective for SQL Server, but I won't go into details here. Pulling a data page that is already cached from the buffer pool, at peak performance, should take under half a millisecond; retrieving a single extent from disk should take between 2 and 4 milliseconds in an optimal environment. I typically expect a well-performing, healthy disk subsystem read to take between 4 and 10ms. Retrieving a data page from memory is generally between 4 and 20 times faster than pulling a data page from disk.

When SQL Server requests a data page, it checks the in-memory buffer cache before looking for the data page on the disk subsystem. If the data page is found in the buffer pool, the processor retrieves the data and then performs the work required. This is called a soft page fault.
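Because every page brought in from disk has to land in the buffer pool, it can be useful to see how those 8KB pages are distributed. On SQL Server 2005 and later, a sketch like the following, using the sys.dm_os_buffer_descriptors DMV, shows how much of the buffer pool each database currently occupies:

```sql
-- Sketch: buffer pool usage per database (one row per cached 8KB page).
SELECT DB_NAME(database_id)  AS database_name,
       COUNT(*)              AS cached_pages,
       COUNT(*) * 8 / 1024   AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_pages DESC;
```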
Soft page faults are ideal for SQL Server because the data that is retrieved as part of a request must be in the buffer cache before it can be used. A data page that is not found in the buffer cache must be retrieved from the server's disk subsystem. When the OS has to retrieve the data page from disk, it's known as a hard page fault.

When correlating memory performance, disk performance and CPU performance, a common denominator helps us put everything in perspective: throughput. In not-so-scientific terms, throughput is the measurement of how much data you can stuff down a finite pipe.

TechNet Magazine, January 2008

Path 1: System performance

There are really only a few methods for determining whether a server has a CPU bottleneck, and there aren't many potential causes of high CPU utilisation. Some of these issues can be tracked using PerfMon or a similar system-monitoring tool, while others are tracked using SQL Profiler or similar tools. Another method is to use SQL commands through Query Analyzer or SQL Server Management Studio (SSMS). The philosophy I use when evaluating a system's performance is "start broad, then focus deep". Obviously, you can't focus on problem areas until you've identified them.

After you evaluate overall CPU utilisation with a tool like PerfMon, you can use it to look at a couple of very simple and easy-to-understand performance counters. One of the most familiar performance counters is % Processor Time; in PerfMon, it's highlighted as soon as you open the Add Counter window. % Processor Time is the amount of time the processors stay busy executing work. Processor utilisation is generally considered high when this value is 80 percent or higher for most of your peak operating time. It's typical, and should be expected, that you will see spikes up to 100 percent at times even when the server is not operating at 80 percent utilisation.

Another counter you should review is Processor Queue Length, found under the System performance object in PerfMon. Processor Queue Length shows how many threads are waiting to perform work on the CPU. SQL Server manages its work through schedulers in the database engine, where it queues and processes its own requests. Because SQL Server manages its own work, it will only use a single CPU thread for each logical processor. This means that there should be minimal threads waiting in the processor queue on a system dedicated to SQL Server. Typically you shouldn't see anything higher than five times the number of physical processors on a dedicated SQL Server, but I consider more than two times problematic.
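Alongside the Processor Queue Length counter, SQL Server 2005 exposes its own schedulers through a DMV, which shows queuing from the engine's point of view. A sketch; consistently non-zero runnable task counts across schedulers point to CPU pressure:

```sql
-- Sketch: tasks waiting for CPU on each SQL Server scheduler.
SELECT scheduler_id,
       current_tasks_count,
       runnable_tasks_count   -- tasks ready to run but waiting for CPU
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;     -- user schedulers only
```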
On servers where the DBMS shares the system with other applications, you will want to review this along with the % Processor Time and Context Switches/sec performance counters (I'll discuss context switches shortly) to determine whether the other applications or the DBMS need to be moved to a different server.

[Figure 1: Selecting the counters to monitor]

When I see processor queuing along with high CPU utilisation, I look at the Compilations/sec and Re-Compilations/sec counters under the SQL Server: SQL Statistics performance object (see Figure 1). Compiling and recompiling query plans adds to a system's CPU utilisation. You should see values close to zero for the Re-Compilations, but watch trends within your systems to determine how your server typically behaves and how many compiles are normal. Recompiles can't always be avoided, but queries and stored procedures can be optimised to minimise recompiles and to reuse query plans. Compare these values to the actual SQL statements coming into the system through Batch Requests/sec, also found in the SQL Server: SQL Statistics performance object. If the compilations and recompilations per second make up a high percentage of the batch requests coming into the system, this is an area that should be reviewed. In some situations SQL developers may not understand how or why their code can contribute to these types of system resource problems. Later in this article I'll provide some references to help you minimise this type of activity.

While you're in PerfMon, check out the performance counter called Context Switches/sec (see Figure 2). This counter tells how many times threads have to be taken out of the OS schedulers (not SQL schedulers) to perform work for other waiting threads. Context switches are often much more frequent on database systems that are shared with other applications like IIS or other vendor application server components. The threshold that I use for Context Switches/sec is about 5000 times the number of processors in the server. This value can also be high on systems that have hyper-threading turned on and moderate to high CPU utilisation. When CPU utilisation and context switches both exceed their thresholds regularly, this indicates a CPU bottleneck. If this is a regular occurrence, you should start planning for the purchase of more or faster CPUs if your system is outdated. For further information, see the Hyper-threading sidebar.

The SQL Server Lazy Writer (as it's called in SQL Server 2000) or the Resource Monitor (as it's called in SQL Server 2005) is another area to monitor when CPU utilisation is high. Flushing the buffer and procedure caches can add to CPU time via the resource thread called the Resource Monitor. The Resource Monitor is a SQL Server process that determines which pages to keep and which pages need to be flushed from the buffer pool to disk. Each page in the buffer and procedure caches is originally assigned a cost representing the resources that are consumed when that page is placed into the cache. This cost value is decremented each time the Resource Monitor scans the page. When a request requires cache space, pages are flushed from memory based on the cost associated with each page; pages with the lowest values are the first to be flushed. The Resource Monitor's activity can be tracked through the Lazy Writes/sec performance counter under the SQL Server: Buffer Manager object within PerfMon. You should track how this value trends to determine what threshold is typical on your system. This counter is usually reviewed along with the Page Life Expectancy and Checkpoints/sec counters to determine whether there is memory pressure.
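The same counters PerfMon shows can also be read from inside SQL Server 2005 via sys.dm_os_performance_counters, which is handy for comparing compilations to batch requests without leaving SSMS. A sketch; note that these "per second" counters are cumulative since server start, so sample them twice and difference the values to get a true rate:

```sql
-- Sketch: raw (cumulative) values behind the PerfMon counters.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
  AND counter_name IN ('Batch Requests/sec',
                       'SQL Compilations/sec',
                       'SQL Re-Compilations/sec');
```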
The Page Life Expectancy (PLE) counter helps determine memory pressure. The PLE counter shows how long a data page stays in the buffer cache. The industry-accepted threshold for this counter is 300 seconds. Anything less than a 300-second average over an extended period of time tells you that data pages are being flushed from memory too often. This causes the Resource Monitor to work harder, which in turn forces more activity onto the processors.

Figure 2: Performance counters to watch

- % Processor Time (Processor). Threshold: > 80%. Potential causes include memory pressure, low query plan reuse and non-optimised queries.
- Context Switches/sec (System). Threshold: > 5000 x processors. Potential causes include other applications on the server, more than one instance of SQL Server running on the same server and hyper-threading turned on.
- Processor Queue Length (System). Threshold: > 5 x processors. Potential causes include other applications on the server, high compilations or recompilations and more than one instance of SQL Server running on the same server.
- Compilations/sec (SQLServer:SQL Statistics). Threshold: trend. Compare to Batch Requests/sec.
- Re-Compilations/sec (SQLServer:SQL Statistics). Threshold: trend. Compare to Batch Requests/sec.
- Batch Requests/sec (SQLServer:SQL Statistics). Threshold: trend. Compare with the Compilations and Re-Compilations per second.
- Page Life Expectancy (SQLServer:Buffer Manager). Threshold: < 300. Potential for memory pressure.
- Lazy Writes/sec (SQLServer:Buffer Manager). Threshold: trend. Potential for large data cache flushes or memory pressure.
- Checkpoints/sec (SQLServer:Buffer Manager). Threshold: trend. Evaluate checkpoints against PLE and Lazy Writes/sec.
- Cache Hit Ratio: SQL Plans (SQLServer:Plan Cache). Threshold: < 70%. Indicates low plan reuse.
- Buffer Cache Hit Ratio (SQLServer:Buffer Manager). Threshold: < 97%. Potential for memory pressure.

The PLE counter should be evaluated along with the Checkpoint Pages/sec counter. When a checkpoint occurs in the system, the dirty data pages in the buffer cache are flushed to disk, causing the PLE value to drop. The Resource Monitor process is the mechanism that actually flushes these pages to disk, so during these checkpoints you should also expect to see the Lazy Writes/sec value increase. If your PLE value goes up immediately after a checkpoint is completed, you can ignore this temporary symptom. On the other hand, if you find that you are regularly below the PLE threshold, there is a very good chance that additional memory will alleviate your problems and at the same time release some resources back to the CPU. All of these counters are found in the SQL Server: Buffer Manager performance object.

Path 2: Query performance

Query plans are evaluated, optimised, compiled and placed in the procedure cache when a new query is submitted to SQL Server. Each time a query is submitted to the server, the procedure cache is reviewed to attempt to match a query plan with the request. If one is not found, SQL Server will create a new plan for the query, which is a potentially costly operation. Some considerations for T-SQL CPU optimisation are:

- Query plan reuse
- Reducing compiles and recompiles
- Sort operations
- Improper joins
- Missing indexes
- Table/index scans
- Function usage in SELECT and WHERE clauses
- Multithreaded operations

So let's pull this back into perspective a bit. SQL Server typically pulls data from both memory and disk, and it's not often that you are working with just a single data page. More often, you have multiple parts of an application working on a record, running multiple smaller queries or joining tables to provide a full view of relevant data.
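One quick way to gauge plan reuse before digging into individual statements is to look at use counts in the plan cache. On SQL Server 2005, a sketch like this shows how many cached plans have only ever been used once; a large share of single-use ad hoc plans usually means low reuse:

```sql
-- Sketch: single-use plans versus total plans in the cache.
SELECT objtype,
       COUNT(*) AS total_plans,
       SUM(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY total_plans DESC;
```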
SP tracing

When tracing your SQL Server app, it pays to become familiar with the stored procedures used for tracing. Using the GUI interface (SQL Server Profiler) for tracing can increase system load by 15 to 25 percent. If you can use stored procedures in your tracing, this can drop by about half.

When I know that my system is bottlenecked somewhere and I want to determine which current SQL statements are causing problems on my server, I run the query below. This query helps me get a view of individual statements and the resources that they are currently using, as well as statements that need to be reviewed for performance enhancements. For more information on SQL traces, see msdn2.microsoft.com/ms191006.aspx.

SELECT substring(text, qs.statement_start_offset/2,
         (CASE WHEN qs.statement_end_offset = -1
               THEN len(convert(nvarchar(max), text)) * 2
               ELSE qs.statement_end_offset
          END - qs.statement_start_offset)/2),
       qs.plan_generation_num AS recompiles,
       qs.execution_count AS execution_count,
       qs.total_elapsed_time - qs.total_worker_time AS total_wait_time,
       qs.total_worker_time AS cpu_time,
       qs.total_logical_reads AS reads,
       qs.total_logical_writes AS writes
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
LEFT JOIN sys.dm_exec_requests r
  ON qs.sql_handle = r.sql_handle
ORDER BY 3 DESC

In OLAP environments, your applications may be pulling millions of rows from one or two tables so that you can consolidate, roll up and summarise data for a regional sales report. In situations like these, returning data can be measured in milliseconds if the data is in memory, but those milliseconds can turn into minutes when retrieving the same data from disk instead of RAM.

The first example is a situation with a high volume of transactions, where plan reuse depends on the application. Low plan reuse causes a large number of compilations of SQL statements, which in turn causes a great deal of CPU processing.
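Parameterising ad hoc statements is the usual cure for low plan reuse. Using the article's shipping example, a parameterised version via sp_executesql lets each execution with a different carton number share one cached plan (a sketch against the sample Container.Carton table):

```sql
-- Sketch: one plan is compiled and then reused for every @Carton_ID value.
EXEC sp_executesql
     N'SELECT Container_ID, Carton_ID, Product_ID, ProductCount, ModifiedDate
       FROM Container.Carton
       WHERE Carton_ID = @Carton_ID',
     N'@Carton_ID int',
     @Carton_ID = 982350144;
```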
In the second example, the heavy system resource utilisation can cause a system's CPU to be overly active, as existing data has to be constantly flushed from the buffer cache to make room for the large volume of new data pages.

Consider a highly transactional system where a SQL statement like the one shown below is executed 2000 times over a 15-minute period in order to retrieve shipping carton information. Without query plan reuse you could hypothetically have an individual execution time of around 450ms per statement. If the same query plan is used after the initial execution, every subsequent query could probably run in about 2ms, bringing the total execution time down to about 5 seconds.

USE SHIPPING_DIST01;
SELECT Container_ID,
       Carton_ID,
       Product_ID,
       ProductCount,
       ModifiedDate
FROM Container.Carton
WHERE Carton_ID = 982350144;

Query plan reuse is critical for optimal performance on highly transactional systems, and it is most often achieved by parameterising your queries or stored procedures. Here are a few excellent resources for information on query plan reuse:

- Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx)
- Optimizing SQL Server Stored Procedures to Avoid Recompiles (microsoft.com/uk/sqlserver)
- Query Recompilation in SQL Server 2000 (msdn2.microsoft.com/aa902682.aspx)

One helpful place that has a wealth of information is the SQL Server 2005 dynamic management views (DMVs). When CPU utilisation is high, there are a couple of DMVs that I use to help me determine whether the CPU is being used appropriately or not.

Figure 3: Determining server activity

SELECT es.session_id,
       es.program_name,
       es.login_name,
       es.nt_user_name,
       es.login_time,
       es.host_name,
       es.cpu_time,
       es.total_scheduled_time,
       es.total_elapsed_time,
       es.memory_usage,
       es.logical_reads,
       es.reads,
       es.writes,
       st.text
FROM sys.dm_exec_sessions es
LEFT JOIN sys.dm_exec_connections ec
  ON es.session_id = ec.session_id
LEFT JOIN sys.dm_exec_requests er
  ON es.session_id = er.session_id
OUTER APPLY sys.dm_exec_sql_text (er.sql_handle) st
WHERE es.session_id > 50    -- session_id < 50 indicates system sessions
ORDER BY es.cpu_time DESC

One of the DMVs I review is sys.dm_os_wait_stats, which supplies DBAs with a means to examine each resource type or function that SQL Server uses and measures the amount of time the system waits because of that resource. The counters in this DMV are cumulative.
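A quick first check against this DMV is the share of total wait time spent as signal waits, that is, time spent waiting for a processor to become free. A sketch; by the rule of thumb discussed in this article, a result much above 20 percent suggests CPU pressure:

```sql
-- Sketch: signal waits (CPU waits) as a percentage of all wait time.
SELECT SUM(signal_wait_time_ms) AS signal_wait_ms,
       SUM(wait_time_ms)        AS total_wait_ms,
       CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms)
            AS decimal(5, 2))   AS signal_wait_pct
FROM sys.dm_os_wait_stats;
```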
Because the counters are cumulative, to get a clean look at the resources that could be affecting different areas of the system you will first have to issue a DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR) command to reset all of the counters after you review the data for any outstanding issues. The sys.dm_os_wait_stats DMV is the equivalent of the DBCC SQLPERF(WAITSTATS) command in SQL Server 2000. You can find out more about the different wait types in SQL Server Books Online at msdn2.microsoft.com/ms179984.aspx.

It's important to know that waits are typical in a system even when everything is running optimally. You just have to determine whether the waits are being affected by a CPU bottleneck. Signal wait time is the time a request spends waiting for a processor after its resource wait has ended, so subtracting the signal wait time from the total wait time gives the time spent waiting on the resource itself. Signal waits should be as low as possible in relation to the overall wait time; as a rule of thumb, they should not be greater than approximately 20 percent of the total wait time.

The sys.dm_exec_sessions DMV shows all of the open sessions on the SQL Server. This DMV provides a high-level view of the performance of each session and all of the work that each session has performed from its inception. This includes the total amount of time the session has spent waiting, total CPU usage, memory usage and a count of reads and writes. The DMV will also provide you with the login, login time, host machine and the last time the session made a request of SQL Server.

Using the sys.dm_exec_sessions DMV, you will be able to see only the active sessions, so if you are seeing high CPU utilisation, this is one of the first places to look. Review the sessions that have a high CPU count first. Determine the application and the user that has been performing the work, then start to dig deeper. Pairing sys.dm_exec_sessions with the sys.dm_exec_requests DMV can provide much of the information that is available through the sp_who and sp_who2 stored procedures. If you join this data together with the sys.dm_exec_sql_text dynamic management function (DMF) through the sql_handle column, you can get the session's currently running query. The snippet in Figure 3 shows how to pull this data together to help determine what is currently happening on a server. I find that this statement helps determine which applications you need to focus on. When I compare CPU, memory, reads, writes and logical reads for all of the sessions within an application and determine that the CPU resource is much higher than the other resources utilised, I start focusing on those SQL statements.

To track SQL statements historically for an application, I use SQL Server traces. You can get to these either through the SQL Server Profiler tool or through the trace system stored procedures. (See the sidebar "SP tracing" for more information on this topic.) Profiler output should be reviewed for statements with high CPU utilisation as well as hash and sort warnings, cache misses and other red flags. This can help you narrow in on specific SQL statements or a specific time period that caused high resource utilisation. Profiler is capable of tracking things like the SQL statement text, execution plans, CPU usage, memory usage, logical reads, writes, caching of query plans, recompiles, ejection of query plans from the cache, cache misses, table and index scans, missing statistics and many other events.

After I have collected data from either the sp_trace stored procedures or SQL Server Profiler, I generally load it into a database, populated with trace data either after the fact or by setting the trace to write to the database directly.
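Setting up a lightweight server-side trace with the sp_trace procedures looks roughly like the sketch below. The file path is hypothetical, and the event and column IDs (10 = RPC:Completed, 12 = SQL:BatchCompleted; 1 = TextData, 12 = SPID, 13 = Duration, 18 = CPU) come from the sp_trace_setevent documentation; check them against Books Online for your build:

```sql
-- Sketch: minimal server-side trace capturing CPU per completed batch/RPC.
DECLARE @TraceID int, @MaxFileSize bigint, @on bit;
SET @MaxFileSize = 50;   -- MB
SET @on = 1;

-- Create the trace (file path is hypothetical).
EXEC sp_trace_create @TraceID OUTPUT, 0, N'S:\Traces\cpu_trace', @MaxFileSize, NULL;

-- Capture TextData, SPID, Duration and CPU for RPC:Completed (10)
-- and SQL:BatchCompleted (12).
EXEC sp_trace_setevent @TraceID, 10, 1,  @on;
EXEC sp_trace_setevent @TraceID, 10, 12, @on;
EXEC sp_trace_setevent @TraceID, 10, 13, @on;
EXEC sp_trace_setevent @TraceID, 10, 18, @on;
EXEC sp_trace_setevent @TraceID, 12, 1,  @on;
EXEC sp_trace_setevent @TraceID, 12, 12, @on;
EXEC sp_trace_setevent @TraceID, 12, 13, @on;
EXEC sp_trace_setevent @TraceID, 12, 18, @on;

-- Start the trace; stop it later with status 0 and close with status 2.
EXEC sp_trace_setstatus @TraceID, 1;
```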
Populating the database after the fact can be done using the SQL Server system function fn_trace_gettable. The benefit of this approach is that I can query and sort the data in multiple ways to see which SQL statements used the most CPU or had the most reads, count how many recompiles occurred, and many other things. Here's an example of how this function is used to load a table from a Profiler trace file. The default parameter specifies that all of the trace files for that trace will be loaded in the order in which they were created:

SELECT *
INTO trc_20070401
FROM fn_trace_gettable('S:\Mountpoints\TRC_20070401_1.trc', default);
GO

Wrapping it up

As you've seen, high CPU utilisation doesn't necessarily indicate a CPU bottleneck. High CPU utilisation can be masking a number of other application or hardware bottlenecks too. Once you've identified that you have high CPU utilisation despite your other counters looking healthy, you can start looking for the cause within the system and isolate a solution, whether that means purchasing more CPUs or optimising your SQL code. And whatever you do, don't give up! With the tips provided in this article, along with a bit of practice and research, optimising CPU utilisation under SQL Server is an attainable execution plan.

For more information, visit the SQL Server Tech Center: http://technet.microsoft.com/en-gb/sqlserver/default.aspx

Zach Nichter is a SQL Server professional with more than 10 years of experience. He has held a number of SQL Server support roles including DBA, team lead, manager and consultant. Currently, Zach is employed by Levi Strauss & Co. as the DBA Architect, focusing on SQL Server performance, monitoring and architecture as well as other strategic initiatives. Zach is also the author of a video blog found at www.sqlcatch.com.