Profiling Application Workloads for Microsoft SQL Server

Unlocking I/O Performance Potential for Enterprise Applications

Understanding how each application contributes to the total I/O workload is crucial to designing a high-performance storage solution.

SQL SERVER IS A SERVICE. I/O LOAD ORIGINATES WITH THE APPLICATION.

- Poorly managed I/O workloads are a leading cause of degraded SQL Server performance.
- Microsoft SQL Server does not initiate I/O activity. It simply responds to requests from applications. Tuning the SQL engine without examining the source applications produces only haphazard results.
- I/O design elements can benefit certain application types while hindering the performance of others. The I/O environment must be evaluated as a whole.
- Windows Performance Monitor provides all the tools necessary to measure an application's I/O profile.

INTRODUCTION

The SQL database server requires constant and intricate performance tuning to deliver maximum value. Complicating the tuning effort is the wide variety of applications that combine to create the I/O work stream. Understanding how each application contributes to the load is crucial to forming an effective performance tuning strategy.

I/O CONSUMPTION

Understanding how applications consume server resources is a fundamental requirement of performance tuning and capacity planning. The modern server is a powerful engine that can support a myriad of applications. Performance tuning is about configuring the platform so the most favored applications run well. Capacity planning is about maintaining enough resources for all the applications to run. This paper describes how to measure one critical server resource: storage I/O.

Tracking down the source of I/O activity can be difficult because what the server sees is a confluence of application activity. At the Gulf of Mexico, the mighty Mississippi River runs deep and wide. That flow of water does not resemble any single upstream tributary that contributes to the total volume. Each application adds I/O volume to the data path just as rivers and streams add volume to the flow of the Mississippi. To predict the total I/O flow, each contributing source must be examined.

SN0430971-00 Rev. A 04/14
APPLICATION CONFLUENCE

Like a great river, SQL Server gathers the I/O inflows of many application sources and combines them into a larger, more singular workload. I/O performance tuning for the most mission-critical applications must be balanced against providing enough I/O for all the applications that are serviced by the SQL Server platform.

SQL SERVER - TIER 1 AND IN HIGH DEMAND

Building and maintaining a healthy SQL Server infrastructure is a prime directive for a growing enterprise. End users have come to expect a rapid response from their applications and are quick to complain when service starts to slow down. Likewise, overnight processing windows come with very firm deadlines. Batch processes must churn through an ever-increasing amount of data in a finite amount of time. Downstream processes in the critical path rely on this data for the business to function. Storage engineers, server administrators, and database administrators must work in concert to keep the SQL Server platform operating in peak condition.

AN AGGRESSIVE APPLICATION WITH ROCK STAR STATUS

SQL Server is an aggressive application with a voracious appetite for computing resources. It will seize every last hardware resource within its reach. This is a design choice with important ramifications: SQL Server will take resources away from any co-resident application. There are, however, certain benefits to being such a poor neighbor. The SQL Buffer Manager can operate under the assumption that it has exclusive access to the data in the databases. This allows it to buffer all reads and writes without the overhead of exhaustive verification or validation. Indeed, the data integrity strategy of SQL Server depends on this exclusivity. It is a tier-one VIP (very important program), and it does not mind flaunting its rock star status. When it comes to I/O workloads, however, SQL Server is a big consumer but less of an instigator.

PROFILING I/O WORKLOADS FOR SQL SERVER APPLICATIONS

It is commonly accepted that SQL Server places an extraordinary I/O workload on the storage architecture. What is less well known is that SQL Server does not initiate that workload. Rather, it simply responds to the requests of the applications it services. To understand how to achieve maximum performance, the effect of each application must be examined to see how the combined workload pushes SQL Server to its limits (Figure 1).

Figure 1. Applications Drive SQL I/O Workloads.

DELIVERING VALUE TO THE ENTERPRISE

Performance goals for database platforms are often expressed in business terms such as more transactions, quicker reports, and faster execution of complex processes. The benefits of reaching these milestones include more orders placed, less delay for end users, and the ability to analyze more data to support decision making. These business benefits are the ultimate measure of the platform's value. Taking the time to analyze how the business interacts with its applications makes it possible to set worthwhile goals that result in a meaningful ROI.

ISOLATE, PROFILE, MODEL

The first step in mapping I/O workloads is to isolate a target application from the general circulation. This task is not always easy, but it is necessary to determine an application's I/O characteristics. Isolation has two components. First, all other traffic must be silenced. For a SQL Server, this means that all other database traffic must be suspended. For a virtual deployment, all the I/O from co-resident virtual machines (VMs) must also be suspended. Second, a representative process must be identified that, when executed, generates a typical I/O signature for that application. This might be a certain canned report or a fairly intense batch process. The key is that the identified task can be repeated so the effect of any change can be measured.
Once a target application is identified and a sample process isolated, the next step is to profile the I/O workload. A number of commercially available tools can help with I/O profiling, and many storage devices can also capture I/O traffic patterns. In a Microsoft Windows environment, Performance Monitor (PerfMon) provides everything that is needed. This paper covers how to use PerfMon to profile a specific application workload as well as the aggregate load produced by all applications.
MEASURING I/O WORKLOADS

There are four primary characteristics of an I/O workload.

1. IOPS. Input/output operations per second (IOPS) describes the number of I/O transactions executed each second. Some applications are designed to complete a large number of individual transactions. A stock trading application, for example, deals with time-sensitive buy and sell orders. Even slight delays can affect the return on an investor's money. Pushing through a large number of IOPS is the key to achieving success. This type of application falls under the online transaction processing (OLTP) classification. Two PerfMon counters, Disk Reads/sec and Disk Writes/sec, capture the IOPS workload (Table 1).

2. MBPS. Megabytes per second (MBPS) describes the quantity of data the application must move. When a large amount of data must be processed, throughput becomes the most critical factor. A decision support system is an example of an application that must wade through reams of data to find the aggregate, average, or other key figures that will guide an executive choice. Getting to the right conclusion without undue delay is the crucial metric. This type of application falls under the online analytical processing (OLAP) classification. The PerfMon counters Disk Read Bytes/sec and Disk Write Bytes/sec report the amount of data the application transfers each second (Table 1).

3. I/O Size. The I/O size is important because of its relationship to IOPS and throughput. Smaller I/O blocks pass through the storage system faster, improving IOPS performance. Larger I/O blocks carry more data and produce better throughput. The PerfMon counters Avg. Disk Bytes/Read and Avg. Disk Bytes/Write provide strong evidence of the typical block size (Table 1). As with any average, values at either extreme can skew the result. For profiling purposes, however, knowing the average I/O size is very useful.

4. Latency. Latency is the time it takes for an I/O to complete. It is part of the performance equation for all applications because it directly affects response time. Faster response time is the most sought-after result of all performance improvement efforts. The PerfMon counters Avg. Disk sec/Read and Avg. Disk sec/Write describe the latency experienced by the application (Table 1).

PerfMon counters for profiling I/O workloads are available through the PhysicalDisk and LogicalDisk objects. The counters under the LogicalDisk object are preferred for application profiling because they report volume letters or mount point names (rather than disk numbers) and can provide a finer level of granularity when there are multiple volumes on a single LUN or Windows disk (Figure 2).

Figure 2. Performance Monitor Objects

KEEPING A POLITE DISTANCE

PerfMon is itself an application, and it consumes a very small amount of local computing resources. On systems that are extremely sensitive to additional load, PerfMon can be run remotely from another server, a VM, or a management workstation. To view performance counters from a remote computer, the Performance Logs and Alerts firewall exception must be enabled on the remote computer. In addition, members of the Performance Log Users group must also be members of the Event Log Readers group on the remote computer. From the Performance Monitor snap-in menu, select Performance, then Performance Monitor (Figure 3).

Table 1. Performance Monitor Counters for I/O Workloads

PerfMon Counters | Workload Characteristic | Description
Disk Reads/sec, Disk Writes/sec | Number of IOPS | Number of I/Os being issued against a particular LUN. Applications with high IOPS workloads respond well to server-side caching.
Disk Read Bytes/sec, Disk Write Bytes/sec | Throughput | Measure of the total throughput for a particular disk or LUN. Applications with large throughput requirements benefit from a high-bandwidth architecture.
Avg. Disk Bytes/Read, Avg. Disk Bytes/Write | Average size of I/Os | Size of I/Os being issued. Larger I/O sizes tend to increase latency. When used to monitor SQL Server, this reports the average size of the I/Os that SQL Server issues to fill query requests.
Avg. Disk sec/Read, Avg. Disk sec/Write | Latency | Lower latency values are better. Latency can vary depending on the size of the I/Os being issued. Larger I/Os tend to increase latency.
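The relationship among the first three characteristics can be checked with simple arithmetic: bytes per second divided by I/Os per second gives the average bytes per I/O. The following is a minimal Python sketch of that relationship, not part of any PerfMon tooling. The class name `IoSample`, its field names, the sample values, and the choice of 1 MB = 1,048,576 bytes are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: binary megabytes. Use 1_000_000 if decimal megabytes are preferred.
BYTES_PER_MB = 1024 * 1024


@dataclass
class IoSample:
    """Averaged values of four LogicalDisk counters from Table 1 (hypothetical container)."""
    reads_per_sec: float        # Disk Reads/sec
    writes_per_sec: float       # Disk Writes/sec
    read_bytes_per_sec: float   # Disk Read Bytes/sec
    write_bytes_per_sec: float  # Disk Write Bytes/sec

    @property
    def iops(self) -> float:
        """Total I/Os per second, reads plus writes."""
        return self.reads_per_sec + self.writes_per_sec

    @property
    def throughput_mbps(self) -> float:
        """Total data moved per second, in megabytes."""
        total_bytes = self.read_bytes_per_sec + self.write_bytes_per_sec
        return total_bytes / BYTES_PER_MB

    @property
    def avg_io_size_bytes(self) -> float:
        """Bytes per second divided by I/Os per second leaves bytes per I/O."""
        if self.iops == 0:
            return 0.0
        total_bytes = self.read_bytes_per_sec + self.write_bytes_per_sec
        return total_bytes / self.iops


# Made-up sample: 3,000 reads/sec and 1,000 writes/sec, all 8 KB in size.
sample = IoSample(
    reads_per_sec=3000.0,
    writes_per_sec=1000.0,
    read_bytes_per_sec=3000.0 * 8192,
    write_bytes_per_sec=1000.0 * 8192,
)
print(sample.iops, sample.avg_io_size_bytes, sample.throughput_mbps)
```

A profile like this sample, with many small I/Os and modest throughput, leans toward the OLTP pattern described above. The same IOPS at a much larger average I/O size would push the throughput figure up and look more like an OLAP workload.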
Figure 3. PerfMon Connecting to a Remote Computer

ADDING COUNTERS

1. Click the red X to delete any existing counters for the local server.
2. Click the green + to add counters for the remote server.
3. In the Add Counters window, either browse for the remote server or add the remote server by name in the format \\<servername>. After a moment, the available counters for the remote server are loaded.
4. Select the LogicalDisk counters listed in Table 1 along with the LUN where the target application resides, click the Add>> button, and then click OK.

CREATING COLLECTOR SETS

Now that PerfMon is monitoring counters, the next step is to create a Collector Set to record the information. The collector sets are then used to create reports.

1. Right-click Performance Monitor > select New > Data Collector Set.
2. Enter a name for the collector set and click Next.
3. Enter the location for the .blg file, which is the binary performance log file.

Once the collector set is created, it appears under the User Defined section of Data Collector Sets. To run a collector set at a scheduled time, right-click the collector set > Properties > Schedule tab. Once the collector set is started, it begins creating the report. The report is available under the User Defined section of Reports.

The performance log data can be captured in many formats, including binary, comma separated, tab separated, or native SQL Server. The log format can be set in the properties of the User Defined Data Collector Set, as shown in Figure 4.

Figure 4. Log Format Options for Collector Sets

CAPTURING THE I/O PROFILE

At this point, Performance Monitor is ready to profile the target application.

1. Open the user-defined collector set.
2. Begin recording with the Start button.
3. Launch the tasks that were selected to represent the application's core activities.
4. When the tasks have run their course, press the Stop button to end the recording.

VIEW THE CAPTURED RESULTS

To view the results:

1. Select the Performance Monitor option in the Windows Performance Monitor navigation pane.
2. In the console pane toolbar, click the Add Log Data button. The Performance Monitor properties page opens at the Source tab.
3. In the Data Source section, select Log files and click Add.
4. Browse to the log file created in the previous section and click Open.
5. Click the green + to add counters for review. Note that the only counters presented are those that were captured by the data collector set (Figure 5).
6. To view only a portion of the log file, adjust the beginning and ending time sliders. Click the zoom button, and the selected time frame fills the display.
7. Look at the maximum, minimum, and average values of the data sample. Adjusting the scale of the counter along with the minimum and maximum vertical scale can make the data easier to read.
8. Use the Change Graph Type pull-down button to present the data in a number of formats, including line, histogram bar, report, area, and stacked area.

Figure 5. Selecting Counters from the Log File

MODELING THE WORKLOAD

Now that the I/O pattern has been identified, the application profile starts to come together. The next step is to model the workload with a tool that allows this exact I/O pattern to be reproduced at will. Microsoft's SQLIO Disk Subsystem Benchmark Tool or QLogic's QSQLIO can emulate almost any SQL I/O workload. See the appendix in this technology brief for more information about QSQLIO. Build a script using either tool based on the profiling results. This provides the first building block of a more comprehensive analysis.

COMBINING WORKLOADS TO SEE THE BIGGER PICTURE

Repeat the process of isolating applications and determining an I/O profile for each critical production application that must share a common SQL Server. Build an I/O model for each application following the process outlined in this paper. Then combine the scripts so the workloads run concurrently, as they normally would in a production environment. Measure the effect of the combined I/O workload using the same data collector set used to profile each application. The combined I/O profile may bear some resemblance to the individual application profiles, but with a strong tendency toward larger-block sequential patterns. A primary goal of SQL Server is to maintain data in sequential patterns as much as possible because data stored sequentially can be read from disk faster and with fewer I/O requests (Figure 6).

Figure 6. SQL Blends Multiple I/O Workloads into More Sequential Patterns

READY FOR SQL SERVER ARCHITECTURAL PLANNING

Now that the application I/O workload is more fully understood, designing a SQL Server platform that meets an organization's needs becomes much more straightforward. Heavy I/O throughput performs best with the large-bandwidth data path provided by QLogic 2600 Series Adapters. For I/O-intense workloads that must complete a large number of transactions, the QLogic FabricCache Adapter delivers extreme IOPS performance with an SSD-based server-side caching solution.
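The minimum, maximum, and average values examined when viewing the captured results can also be computed outside PerfMon when the collector set writes a comma-separated log. The following Python sketch assumes the usual PerfMon CSV layout: a timestamp in the first column, one column per counter, and a blank cell where a sample is missing. The function name `summarize_perfmon_csv`, the server name `SQL01`, and the sample values are hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical three-sample log in the assumed PerfMon CSV layout.
SAMPLE_LOG = r'''"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\SQL01\LogicalDisk(E:)\Disk Reads/sec","\\SQL01\LogicalDisk(E:)\Avg. Disk sec/Read"
"04/14/2014 09:00:00.000","1200","0.004"
"04/14/2014 09:00:01.000","1800","0.006"
"04/14/2014 09:00:02.000"," ","0.005"
'''


def summarize_perfmon_csv(lines):
    """Return {counter name: {"min", "max", "avg"}} for each counter column."""
    reader = csv.reader(lines)
    header = next(reader)
    counters = header[1:]  # column 0 holds the timestamp
    values = {name: [] for name in counters}
    for row in reader:
        for name, cell in zip(counters, row[1:]):
            try:
                values[name].append(float(cell))
            except ValueError:
                # Blank or non-numeric cell: the sample is missing, so skip it.
                continue
    return {
        name: {"min": min(v), "max": max(v), "avg": mean(v)}
        for name, v in values.items()
        if v  # omit counters that recorded no usable samples
    }


summary = summarize_perfmon_csv(io.StringIO(SAMPLE_LOG))
for counter, stats in summary.items():
    print(counter, stats)
```

To summarize a real log, pass an open file object instead of the `io.StringIO` wrapper. Skipping blank cells rather than treating them as zero keeps a missed sample from dragging down the minimum and the average.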
SUMMARY

To realize the highest possible performance from SQL Server assets, it is important to understand the I/O workload produced by each application. A three-step process is used to isolate, profile, and model each individual work stream. These building blocks can then be systematically reassembled to see how SQL Server combines the workloads and passes the I/O requests along to the storage resources.

If the combined workload contains a large portion of small-block transactions, the storage solution needs more IOPS capability. The QLogic 10000 Series Adapter with server-side caching increases the IOPS capability of any Fibre Channel storage solution, enabling higher SQL Server performance levels for transactional applications.

Large-block I/O traffic coming from the SQL Server indicates applications that call for more throughput. QLogic 16Gb Gen 5 Fibre Channel solutions deliver the industry's highest level of throughput for SQL Server applications.

If extreme SQL database performance is the goal, QLogic has the solution. For more information, visit www.qlogic.com.
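The rule of thumb in the summary (small-block traffic calls for IOPS capability, large-block traffic calls for throughput) can be sketched as a small Python function applied to the per-application profiles. The 16 KB small-block boundary and the 50 percent cutoff are illustrative assumptions only; this paper does not specify either value, and the names `small_block_share` and `storage_emphasis` are hypothetical.

```python
# Assumption: I/Os of 16 KB or less count as "small block". Not a value from this paper.
SMALL_BLOCK_BYTES = 16 * 1024


def small_block_share(profiles):
    """Fraction of all I/Os per second that come from small-block profiles.

    profiles is a list of (average I/O size in bytes, IOPS) pairs,
    one pair per profiled application.
    """
    total_iops = sum(iops for _, iops in profiles)
    if total_iops == 0:
        return 0.0
    small_iops = sum(iops for size, iops in profiles if size <= SMALL_BLOCK_BYTES)
    return small_iops / total_iops


def storage_emphasis(profiles, cutoff=0.5):
    """Return "IOPS" when small-block I/O dominates, otherwise "throughput".

    The cutoff of 0.5 is an arbitrary illustrative default.
    """
    return "IOPS" if small_block_share(profiles) >= cutoff else "throughput"


# Made-up profiles: an order-entry application and a reporting application.
transactional_mix = [(8192, 4000.0), (65536, 500.0)]
reporting_mix = [(8192, 200.0), (262144, 800.0)]
print(storage_emphasis(transactional_mix))
print(storage_emphasis(reporting_mix))
```

Averaged per-application profiles hide the spread of I/O sizes within each application, so a result near the cutoff is best treated as a prompt to look at the captured data more closely rather than as a verdict.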
APPENDIX

OVERVIEW

QSQLIO provides a safe and effective method to measure the expected performance gain when QLogic FabricCache 10000 Series Adapters are used with Microsoft SQL Server. This tool provides its own data targets, executable scripts, and reports.

SQLIO

SQLIO is a disk subsystem benchmark tool provided by Microsoft to determine the I/O capacity of a specific configuration. QLogic's QSQLIO is an extension of SQLIO designed to simplify the setup, expedite the execution of test runs, and provide a convenient reporting mechanism.

REQUIREMENTS

- A server with a QLogic FabricCache 10000 Series Adapter installed.
- Two LUNs with 10 GB of available disk space.
- Microsoft Excel 2010, or later, to view reports.

EXECUTION

QSQLIO runs three scripts consecutively to exercise the storage array. Each script produces a different I/O workload so they compete for storage resources just as real applications do in a production SQL Server environment.

Figure 7. Competing workloads vie for storage resources.

RESULTS

Eighteen individual data samples are produced with each run. The results of each sample are imported into an Excel spreadsheet. The information is then aggregated and displayed in graph and text formats as shown in Figure 8.

Figure 8. A QSQLIO report shows the performance gained using FabricCache.

QSQLIO is available without charge from QLogic at http://qsales.qlogic.com/qsalesdocument/solutionguide_10000series_QSQLIODemonstratesSQLPerfGains.pdf.
Corporate Headquarters: QLogic Corporation, 26650 Aliso Viejo Parkway, Aliso Viejo, CA 92656, 949-389-6000
International Offices: UK, Ireland, Germany, France, India, Japan, China, Hong Kong, Singapore, Taiwan
www.qlogic.com

© 2014 QLogic Corporation. Specifications are subject to change without notice. All rights reserved worldwide. QLogic, the QLogic logo, and FabricCache are trademarks or registered trademarks of QLogic Corporation. Excel, Microsoft, SQL Server, and Windows are registered trademarks of Microsoft Corporation. All other brand and product names are trademarks or registered trademarks of their respective owners. Information supplied by QLogic Corporation is believed to be accurate and reliable. QLogic Corporation assumes no responsibility for any errors in this document. QLogic Corporation reserves the right, without notice, to make changes in product design or specifications.