WORKLOAD CHARACTERIZATION AND CAPACITY PLANNING FOR MS SQL SERVER

Yefim Somin
BMC Software, Inc.

Abstract

Microsoft SQL Server is a multi-component database management system taking advantage of many features of the underlying operating system. Obtaining a breakdown of its workload by categories such as users and applications requires a non-standard approach. This paper describes such an approach as well as architectural features of MS SQL Server relevant for workload analysis and capacity planning.

1. Introduction

Microsoft SQL Server is one of the newer major business database management systems. It is, however, moving into the area of business critical applications, and as a result, capacity planning for MS SQL Server is gaining in importance. With more and more sophisticated versions of SQL Server hitting the market, the amount of literature covering its performance is also growing. While on-line documentation is always available for a contemporary product [SQLS02], it is usually expanded on by books and papers. As an example, a CMG paper addressed the interpretation of standard performance counters provided through Windows Performance Monitor [SCHW02]. Many areas of SQL Server architecture have been covered in books such as Microsoft SQL Server 2000 Optimization Guide [FIEL00] and Microsoft SQL Server 2000 Performance Tuning, Technical Reference [WHAL01]. These publications deal mainly with configuration and tuning of the database. Although [WHAL01] contains sections on capacity planning, it covers primarily the fundamentals of queueing theory-based approaches for a user who does not possess modeling tools. Necessarily, it is limited to simple workload cases and does not address workload breakdown needs.

This paper aims to fill the gap currently left in the literature and to describe the specifics of workload characterization for MS SQL Server.
All stages of this task, from obtaining necessary data, to interpreting the data, to computing parameters characterizing workloads, require a non-standard approach beyond what one could get by observing and interpreting statistics provided by Performance Monitor. Relevant architectural features of SQL Servers are covered along the way, and a practical approach based on the data provided by several instrumentation sources is presented.

2. Workload Characterization Options for MS SQL Server

Workload characterization consists of breaking down resource utilization in a computing environment and assigning the work to appropriate categories. While determination of what constitutes a workload in a particular environment is in the eye of the analyst, it also depends on the kind of statistics available for this purpose.

The simplest approach to workload characterization for MS SQL Server could be separation of all SQL Server-related work from other activities on the system. However, while somewhat useful, this level of characterization is not sufficient for effective performance management and capacity planning. As described further in this paper, the breakdown into workloads for MS SQL Server could be done along the following categories inherent in its architecture:

- Instances
- Databases
- Users
- Applications

These components of SQL Server architecture are described in appropriate detail in the following section. It also turns out that statistics necessary to characterize these objects could be obtained.

Once workloads are characterized to the desired degree, performance modeling necessary for effective management, sizing and capacity planning could be carried out.

3. MS SQL Server Architecture

3.1 General

This section intends to cover those aspects of SQL Server architecture which are specifically relevant for workload characterization. It is not attempting to address the whole architecture of SQL Server, or even all architectural features affecting performance.

There are two releases of SQL Server currently in wide use: SQL Server 7.0 and SQL Server 2000. While changes have occurred between version 7.0 and version 2000, most of the features relevant for workload characterization apply to both of them; consequently, the approach described in this paper is applicable to both. We concentrate on breaking down the database work on a particular node within a particular SQL Server. While some facilities, e.g., MDTC (Microsoft Distributed Transaction Coordinator), Federated Servers, etc., move the SQL Server environment in a more distributed direction and could affect details of performance modeling, they are beyond the scope of this paper and would not change the main points addressed here. Finally, specialized tools like MS SQL Server Analysis Services, an OLAP engine, have their own use and structure and are not covered here.

3.2 Main Components of SQL Server Architecture

Let us review the main components of SQL Server Architecture. They include:

- SQL Server Clients
- MS SQL Server
- Databases
- Other components (SQL Server Agent, MDTC)

3.2.1 Client

Clients can be either remote or local to the node on which SQL Server is running. In either case, a client is a process containing the user presentation layer and application logic, which establishes a session with the database server. Clients could run from a command line interface or GUI. When a session is established, a client must supply access credentials, i.e., connect to the database as a valid database user. An application supported by the client can also supply an alphanumeric name designating itself to the database. Within the database it is known as the program (name). This name is supplied programmatically (and unfortunately, optionally, depending on the goodwill of the application developer) and is not tied to the command name of the OS process doing the client work.

3.2.2 SQL Server

SQL Server is an autonomous incarnation of the main database management functionality and is essentially an instance of MS SQL Server. Each instance is a collection of managed resources. It includes sets of memory areas, locks, buffers, logs, and associated databases (see below). Most importantly for the purposes of this paper, a SQL Server instance is implemented as one
OS process. Handling of multiple requests and auxiliary tasks is accomplished with the use of threads. There are two types of threads: worker threads and background threads. The former are used for sessions, the latter for background tasks such as lock monitoring and lazy writing. The equivalent facilities in other RDBMS environments are called background processes (Oracle) or core processes (DB2 UDB). A number of worker threads are created at database initialization. New ones could be created if necessary; however, there is a system configuration limit beyond which no threads could be added. This limitation could result in rejection of requests. While threads are more lightweight than processes, in that creation of threads, their context switching and maintenance are less time and resource consuming, there exists an even lighter option known as fibers (see for instance [WHAL01] for details). This choice may make a difference for the overall resource use by affecting switching overhead, but it does not change the workload characterization approach presented here.

In an important difference from some other major RDBMS systems, each SQL Server on a multi-instance machine is separate and does not share memory, database data files, or database transaction log files with other servers. The workloads on these SQL Servers are also separate and there is no automatic distribution of work among SQL Servers.

3.2.3 Database

Each SQL Server instance has a set of databases attached to it. These are storage entities, and can be of two types: system and user. System databases contain information about SQL Server as a whole and are used for management and operation of the instance. There is a well-known set of such databases (master, msdb, tempdb, etc.). It is user databases that are important for workload characterization. They are created by users and contain user and application specific information.
3.2.4 Other Components

Other components, if present, play an auxiliary role. For example, SQL Server Agent schedules multi-step jobs, including those used for maintenance, replication and other SQL Server features. MDTC (Microsoft Distributed Transaction Coordinator) is used to distribute requests over multiple servers. These services run in their own OS processes.

3.3 Lifecycle of a Session

When a client establishes a session with a SQL Server, it connects with a particular instance. At that point a thread from the pool of worker threads is assigned to this session for its duration. If a free thread is not available, an extra one is created, up to the configured limit. Each session is also connected to a specific database, from among those attached to the SQL Server instance. User requests, a.k.a. transactions, are passed to the worker thread, which in turn performs the appropriate operations gathering or modifying data from the database. Background threads assist in this work by maintaining management structures, keeping logs and assuring data integrity, usually asynchronously in relation to user requests. Thus, the work of both worker threads and background threads puts load on hardware resources, but background threads don't participate directly in the user-visible response time.

3.4 Example of a Multi-instance Configuration

Figure 1 provides a sample SQL Server configuration on one physical node, with a number of clients, SQL Server instances with threads, and databases.

- Clients: there are 4 remote and 2 local clients connected to the database system
- Instances: the system consists of 2 SQL Server instances, A and B; each instance contains a set of worker and background threads
- Databases: there are 4 databases attached to SQL Server instances, 2 for each

3.5 Workload Characterization Example

Figure 2 could be considered a zoom into a configuration like the one presented in Figure 1. Now we are looking at a specific SQL Server instance and several clients connected to it. Each client is connected to the server as a particular SQL user, e.g., bob, boss, or adm. Each client is also executing an application, e.g., acc, fix, check. Finally, each client's session is maintained by a worker thread assigned to it. Each thread is also connected to a database. If we had data on resource consumption by session (which is indeed the case, as will be described later) we could break the total work down by several categories, i.e., workloads:

- Users: we would have 4 workloads, by the number of users
- Applications (programs): 3 workloads (acc, check, fix); 2 clients are running the same application
- Databases: 4 workloads, by the number of databases

Finally, each of the SQL Server instances could also be a separate workload. All of the above categories could be used separately or together depending on the needs of the analysis. After all, workload is in the eye of the beholder.

It is possible to assemble workloads based on worker threads which are assigned to a particular application and database. Background threads don't have such assignments. They could be either pooled together to constitute a separate background workload, or their work could be distributed in some proportion among other workloads.

It should be noted that while handling of parallel tasks is typically implemented using threads, SQL Server utilities, such as SEM (SQL Enterprise Manager), as well as Microsoft online documentation refer to parallel tasks as Processes, which could create a certain amount of confusion.
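The breakdown by categories described above can be illustrated with a minimal Python sketch. The user and program names are taken from Figure 2; the assignment of sessions to databases, the CPU figures, and the helper name break_down are hypothetical and serve only to show how the same session records yield different workload sets depending on the chosen keys.

```python
from collections import defaultdict

# Sessions as in Figure 2. User and program names come from the figure;
# the database assignments and CPU seconds are hypothetical, for illustration.
sessions = [
    {"user": "bob",  "program": "acc",   "database": "X", "cpu_sec": 10.0},
    {"user": "ann",  "program": "acc",   "database": "Y", "cpu_sec": 20.0},
    {"user": "boss", "program": "check", "database": "V", "cpu_sec": 30.0},
    {"user": "adm",  "program": "fix",   "database": "Z", "cpu_sec": 40.0},
]


def break_down(sessions, keys):
    """Sum CPU seconds per workload; a workload is a tuple of key values."""
    totals = defaultdict(float)
    for s in sessions:
        totals[tuple(s[k] for k in keys)] += s["cpu_sec"]
    return dict(totals)


by_user = break_down(sessions, ["user"])              # 4 workloads
by_program = break_down(sessions, ["program"])        # 3 workloads
by_program_and_db = break_down(sessions, ["program", "database"])
```

Combining keys, as in the last call, corresponds to using the categories together rather than separately.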

[Figure 1: one SQL Server node hosting SQL Server Instance A and Instance B, each with worker threads and background threads; four remote and two local clients; databases X, Y, K, and L.]

Figure 1. SQL Server Architectural Components

[Figure 2: four remote clients (SQL user bob, program acc; SQL user ann, program acc; SQL user boss, program check; SQL user adm, program fix) connected through the worker threads of one SQL Server instance on a SQL Server node to databases X, Y, V, and Z.]

Figure 2. Worker Threads

4. Measurements Available for MS SQL Server

4.1 Threads and OS Processes

While distinct work is done by different threads, it is impossible to tell the specific identity of a specific thread from the point of view of the Operating System. That is, it is not possible to determine that a particular thread is serving a particular session. It is, however, possible to identify OS level processes (not to be confused with Processes within SQL Server) doing the relevant work. These processes are:

- sqlservr.exe: main process (one process per instance); this is the process which is split into threads to handle sessions and background tasks (see Figure 2)
- sqlagent.exe: processes implementing SQL Server Agents
- msdtc.exe: process implementing Microsoft Distributed Transactions Coordinator (an auxiliary facility, not part of SQL Server per se)
- sqlmangr.exe: SQL Service Manager, a utility program which allows one to start and stop the SQL Server and SQL Server Agent
- osql.exe and isql.exe: SQL Query command line interfaces
- isqlw.exe: SQL Query Windows interface
- bcp.exe: batch utility

Of the processes listed above, only the main sqlservr.exe process is of real interest in the server workload characterization task. Other auxiliary processes contribute little and cannot be broken down by categories, although it may be prudent to put them in a separate workload and monitor their behavior. Client processes, such as osql.exe, isql.exe, isqlw.exe, etc., may also be put into a separate client workload; however, in a typical environment most clients are remote and the nodes on which they are running are not of interest for workload characterization and analysis.

If MS SQL Server Analysis Services are present, the following process will be present:

- msmdsrv.exe

This process also has a system of threads to handle tasks, but it is not directly related to the basic functionality of SQL Server described here. There could also be other additional functions under the SQL Server umbrella, such as data warehousing, which are beyond the scope of this discussion.

4.2 OS Measurements

Once the relevant OS level processes are identified, measurements of their resource consumption could be obtained. This could be done by extracting the appropriate metrics from the Windows Registry. A number of resource consumption metrics are instrumented, primarily CPU, memory, etc. While thread level statistics could also be collected, doing so is more costly and in any case would not add to our understanding of workload characterization. Process level metrics summarize resource usage by all the component threads. In turn, accumulated resources for processes belonging to the SQL Server workload should be summarized.

4.3 Session Data from the Database

SQL Server keeps running statistics on current sessions. As has been mentioned, numerous statistics reflecting various areas of SQL Server configuration and activities are available from the Windows Registry and could be accessed using Performance Monitor, a standard Windows monitoring utility (see for instance [SCHW02] for a discussion of available metrics). However, session level statistics are not included in this access mechanism. They have to be extracted using a special programmatic interface. Each session record contains a number of identifiers linking the session with the type of work it is doing, for classification. These key identifiers are:

- Session ID: a unique ID for the session
- SQL User Name: name of the user defined within SQL Server
- Program: application name (can be blank if not defined)
- Database: name of the database the session is connected to (sessions for worker threads only)
- Machine: name of the node on which the session client is running
- Process: process ID of SQL Server

For each of these sessions the following resource statistics directly relevant for workload characterization are available:

- CPU used: the amount of CPU time used by the session
- Physical I/O: the count of physical I/O operations on behalf of the session
- Memory: the number of pages of memory used by the session

Other data, such as thread ID, last query start time, and the state of connection could also be obtained.

By regularly sampling records for all existing sessions, it is possible to obtain the amount of resources consumed by sessions over time intervals of interest. It should be noted, however, that session records are kept only while sessions are in existence (whether for active or dormant sessions). They disappear once a session is terminated. That means that a short-lived session, i.e., a session whose lifetime is shorter than the sampling interval, could be completely missed. Similarly, part of the resource use of longer-living sessions which are terminated between samples could also be missed. Although it is possible to increase the sampling frequency to improve the capture efficiency of session statistics, this is done at the cost of a higher collection overhead, thus a tradeoff must be found. This task is addressed in the following section.

5. Workload Characterization Methodology

It follows from the preceding discussion that to obtain both a proportionately correct workload breakdown and capture all the work of computing resources, it is necessary to use the measurements from both the OS and the database. In this context the OS data provide the more accurate estimate of the total amount of work.
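The session sampling described in Section 4.3 can be illustrated with a small Python sketch. It assumes, hypothetically, that each sample is a mapping from a session key to the cumulative CPU seconds reported for that session; the function name interval_usage and the data shapes are not from the paper, and the handling of a decreasing counter (treated as a reused session ID) is an assumption of this sketch.

```python
def interval_usage(prev, curr):
    """Estimate per-session CPU seconds consumed between two samples.

    prev and curr map a (hypothetical) session key to the cumulative CPU
    seconds reported for that session at sampling time.
    Returns (usage, ended): usage per session present in curr, and the keys
    of sessions that disappeared, whose final consumption was not observed.
    """
    usage = {}
    for key, cpu_now in curr.items():
        cpu_before = prev.get(key)
        if cpu_before is None or cpu_now < cpu_before:
            # New session, or (assumed) a reused ID: count everything seen.
            usage[key] = cpu_now
        else:
            usage[key] = cpu_now - cpu_before
    ended = [key for key in prev if key not in curr]
    return usage, ended


# Hypothetical samples: session 52 ended and session 53 started in between.
usage, ended = interval_usage({51: 10.0, 52: 5.0}, {51: 12.5, 53: 1.0})
```

Whatever session 52 consumed after the first sample is lost, and a session that started and ended between the two samples would not appear at all, which is why the session sums are expected to fall short of the OS totals.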
5.1 Computational Steps

- Select workload characterization categories (from the list of users, applications, databases, instances, or any combination thereof). For example, workload characterization by users and databases means that there will be one or more workloads for each database, broken down within a database by users.
- Summarize resource consumption (CPU, I/O) for each of the selected categories using session data records. As has been noted, each session record could be assigned to an appropriate summarization bin (workload) based on its object keys.
- For each computing node present in the environment, compute the sum of resources of all workload-specific bins.
- Separately, compute the total resource consumption of OS processes carrying out SQL Server work, namely processes with command name sqlservr.exe.
- For each node, compare the sum of OS based resources with the sum of resources derived from the database session data. It is expected that the OS sum will be greater than the database sum.
- Multiply each workload specific resource by the ratio of the OS based sum to the database sum. This operation will preserve the proportions of the workload breakdown yet accurately account for the total load SQL Server activities put on the system.

While parts of sessions and some complete sessions may be missed at collection time and not included in the database measurements, the above algorithm provides a statistical estimate of the underlying breakdown. The accuracy of such an estimate depends on the nature of the load, the frequency and the duration of collection, and as a practical matter is usually acceptable. For a discussion of statistical properties of such an estimate, as applied to resource measurements, see for instance [AGRA96].

6. Capacity Planning

Once workloads are separated, what-if scenarios for capacity planning could be tested using analytical modeling techniques, like the ones described in [LAZO84] and numerous other queueing theory books. Examples of such scenarios are:

- workload growth
- load balancing
- hardware changes

7. Workload Characterization Example

Here is an example of workload resource utilization computation. It was an experiment using Hammer, a freely available SQL Server workload generation utility. This utility allows the operator to specify the number of virtual users connected to the database and putting a load on it. Processes generating the load have the command name procslave and connect to the database with that application name. The operator can, however, determine the user name under which to connect. In this case, connections were made as three different users (test1, test2, and test3). In addition to the load, session information was collected from the database by an application called collect with the same user name. Some work was also done by the SQL utilities isql and isqlw, connected as the sa (sysadmin) user.

Data in Table 1 reflect a one hour collection interval. Statistics for sessions with the same user and application name are already summarized and the number of contributing sessions is also included. The total amount of CPU seconds as measured for the sessions is smaller than the amount of CPU resource used by the SQL Server process as measured by the Operating System. This is to be expected.

Number of sessions   User      Application   CPU (sec)
1                    collect   collect       33
1                    sa        isql          71
1                    sa        isqlw         105
10                   test1     procslave     485
10                   test2     procslave     479
10                   test3     procslave     491
Total                                        1664
sqlservr.exe OS process                      1731

Table 1. SQL Sessions CPU Consumption

In this case we define workloads based on Applications, hence all test users are combined into one workload. On the other hand, the sa user's work is broken down. A different aggregation scheme could easily be chosen.
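The scaling procedure of Section 5.1, applied to the data of Table 1 with workloads defined by application, can be sketched in Python as follows. The figures are those of Table 1; the variable names are illustrative only. The sketch uses the unrounded ratio of the OS sum to the session sum, so individual results may differ in the last digit from values computed with the rounded factor of 1.04.

```python
INTERVAL_SEC = 3600  # one hour collection interval

# (number of sessions, user, application, CPU seconds), as in Table 1
session_rows = [
    (1, "collect", "collect", 33),
    (1, "sa", "isql", 71),
    (1, "sa", "isqlw", 105),
    (10, "test1", "procslave", 485),
    (10, "test2", "procslave", 479),
    (10, "test3", "procslave", 491),
]
OS_CPU_SEC = 1731  # CPU seconds of the sqlservr.exe process per the OS

# Summarize session CPU by application (the chosen workload category).
cpu_by_app = {}
for _count, _user, app, cpu in session_rows:
    cpu_by_app[app] = cpu_by_app.get(app, 0) + cpu

# Ratio of the OS based sum to the database session sum.
session_total = sum(cpu_by_app.values())
scale = OS_CPU_SEC / session_total

# Scaled CPU utilization per workload over the collection interval.
utilization = {
    app: cpu * scale / INTERVAL_SEC for app, cpu in cpu_by_app.items()
}

for app, u in sorted(utilization.items()):
    print(f"{app:10s} {u:.2%}")
```

Because every workload is multiplied by the same factor, the proportions among workloads are preserved while the total matches the OS measurement.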
In the first step, CPU time values for all the sessions are multiplied by a factor of 1.04, the ratio between the OS measure of the SQL Server process and the sum of session measurements (1731 / 1664). Then, standard CPU utilization for the time interval is computed for all the workloads (the test users' numbers are added together first). The workloads obtained are shown in Table 2.

Workload    CPU Utilization
collect     0.95%
isql        2.05%
isqlw       3.03%
procslave   42.03%
Total       48.06%

Table 2. Workloads based on session statistics.

8. Conclusions

It has been demonstrated that a meaningful, i.e., useful for a performance analyst or an IT manager, workload characterization of an MS SQL Server environment can be accomplished. To this end, the following needs to be done:

- SQL Server session data, not available through the Windows Registry, need to be collected
- OS process resource consumption measurements need to be collected as well
- The data from the two sources need to be combined to accurately capture the total amount of work as well as the correct workload breakdown
- Details of the SQL Server component hierarchy need to be understood and utilized for the proper processing described in the paper

Capacity planning using performance modeling techniques can be performed on the workloads characterized with this methodology.

9. Acknowledgments

The author acknowledges advice and review provided by Chris Cavers, as well as preliminary work done by Saqib Syed, both within the framework of BMC Software.

10. References

[FIEL00] Jenney Lynne Fields, Microsoft SQL Server 2000 Optimization Guide, Prentice Hall, 2000
[WHAL01] Edward Whalen et al., Microsoft SQL Server 2000 Performance Tuning, Technical Reference, Microsoft Press, 2001
[SCHW02] Jeffrey A. Schwarz, Interpreting SQL Server 2000 Performance Counters, CMG 2002 Proceedings, Reno 2002
[SQLS02] SQL Server Books Online, Microsoft, 2002
[LAZO84] Edward Lazowska, John Zahorjan, Scott Graham, and Kenneth Sevcik, Quantitative System Performance, Prentice Hall, 1984
[AGRA96] Subhash Agrawal, Kenneth Newman, Michael Forsyth, Yefim Somin, Measurement and Analysis of Process & Workload CPU Times in UNIX Environments, CMG96 Proceedings, San Diego 1996