Nimsoft Monitor sqlserver Guide v4.3 series
Legal Notices

Copyright 2012, Nimsoft. All rights reserved.

Warranty

The material contained in this document is provided "as is," and is subject to being changed, without notice, in future editions. Further, to the maximum extent permitted by applicable law, Nimsoft Corporation disclaims all warranties, either express or implied, with regard to this manual and any information contained herein, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Nimsoft Corporation shall not be liable for errors or for incidental or consequential damages in connection with the furnishing, use, or performance of this document or of any information contained herein. Should Nimsoft Corporation and the user have a separate written agreement with warranty terms covering the material in this document that conflict with these terms, the warranty terms in the separate agreement shall control.

Technology Licenses

The hardware and/or software described in this document are furnished under a license and may be used or copied only in accordance with the terms of such license. No part of this manual may be reproduced in any form or by any means (including electronic storage and retrieval or translation into a foreign language) without prior agreement and written consent from Nimsoft Corporation as governed by United States and international copyright laws.

Restricted Rights Legend

If software is for use in the performance of a U.S. Government prime contract or subcontract, Software is delivered and licensed as "Commercial computer software" as defined in DFAR 252.227-7014 (June 1995), or as a "commercial item" as defined in FAR 2.101(a) or as "Restricted computer software" as defined in FAR 52.227-19 (June 1987) or any equivalent agency regulation or contract clause. Use, duplication or disclosure of Software is subject to Nimsoft Corporation's standard commercial license terms, and non-DOD Departments and Agencies of the U.S. Government will receive no greater than Restricted Rights as defined in FAR 52.227-19(c)(1-2) (June 1987). U.S. Government users will receive no greater than Limited Rights as defined in FAR 52.227-14 (June 1987) or DFAR 252.227-7015 (b)(2) (November 1995), as applicable in any technical data.

Trademarks

Adobe, Acrobat, Acrobat Reader, and Acrobat Exchange are registered trademarks of Adobe Systems Incorporated. Intel and Pentium are U.S. registered trademarks of Intel Corporation. Java(TM) is a U.S. trademark of Sun Microsystems, Inc. Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. Netscape(TM) is a U.S. trademark of Netscape Communications Corporation. Oracle is a U.S. registered trademark of Oracle Corporation, Redwood City, California. UNIX is a registered trademark of the Open Group.
Contact Nimsoft

For your convenience, Nimsoft provides a single site where you can access information about Nimsoft products. At http://support.nimsoft.com/, you can access the following:

  Online and telephone contact information for technical assistance and customer services
  Information about user communities and forums
  Product and documentation downloads
  Nimsoft Support policies and guidelines
  Other helpful resources appropriate for your product

Provide Feedback

If you have comments or questions about Nimsoft product documentation, you can send a message to support@nimsoft.com.
Contents

Chapter 1: sqlserver 4.3
  Prerequisites and Requirements
  Configuration
    The Setup Tab
    Define Message
    The Connections Tab
    The Profiles Tab
    The Templates Tab
    The Status Tab
  Define New Checkpoint
  Edit Checkpoint
    Thresholds
    Schedules
  Checkpoints Metrics
    Single Counter Description

Chapter 2: sqlserver Metrics

Appendix A: Appendix
  V2 QoS Compatibility Mode
  Other QoS Issues
Chapter 1: sqlserver 4.3
This description applies to probe version 4.3.

The sqlserver probe runs selected SQL queries to extract vital information about your SQL Servers. The information is presented to the database administrator as alarms and/or as a report.

The following information is extracted and monitored out of the box:

  Database uptime
  Database state/status
  Number of databases available
  Data-file size for each database
  Log-file size for each database
  File-group size for each database
  Table size for each database
  Buffer cache-hit ratio
  Log-file cache-hit ratio
  Number of active users
  Number of users currently logged on to the server
  Number of deadlocks per second
  Number of transactions per second
  Number of database page reads/writes per second
  Number of flush waits per second
  Number of latch requests per second
  Number of full scans (table or index) per second
  Usage (growth/shrinking) of the transaction logs
  Table/index fragmentation
  Memory resources
  CPU and I/O resources
  Locking and unlocking resources
  Free connections
  Backup status
  Long running queries (SQL Server 2005 only)
  Long running jobs

This section contains the following topics:

  Documentation Changes
  Prerequisites and Requirements
  Configuration
  Define New Checkpoint
  Edit Checkpoint
  Checkpoints Metrics
Documentation Changes

This table describes the version history for this document.

Version   Date          What's New?
4.3       March 2012    For the logfile_usage and logfile_size checkpoints, no metric values are reported for an execution interval in which the given database is in recovery or restore mode.
4.2       August 2011   SOC support added.
                        Added support for signed stored procedures for standard and custom checkpoint queries. The probe can run in standard mode as well as in sign mode.
                        Added new checkpoints mirror_state, mirror_witness_server and mirror_sqlinstance for monitoring the database mirroring state, the status of the witness server and the status of the SQL Server instance hosting the mirrored database.
                        Fixed an issue where the sqlusr_cpu stored procedures were not deleted after executing queries on SQL Server 2000.
                        Modified the qos_key value for the user_cpu checkpoints to avoid a large amount of QoS data.
                        Fixed an issue where the subsystemid field showed a wrong value.
                        Fixed an issue where the long_jobs checkpoint did not send any alarms.
                        Fixed an issue where the logic_fragment checkpoint gave a "Lock request time out" error.
                        Fixed a handle leak.
                        Added support for configuring the unit as minutes, hours or days in the backup_status, transaction_backup_status and differential_backup_status checkpoints.
                        Added a new error alarm message that is sent when a checkpoint query fails to execute.
4.1       April 2011    Added support for reading alarm tokens from the configuration.

Related Documentation

  Documentation for other versions of the sqlserver probe (../../sqlserver.html)
  Getting Started with Nimsoft Probes
  Nimsoft Probes Reference
Prerequisites and Requirements

The sqlserver probe is supported on the following:

  Windows XP
  Windows 2003
  Windows Vista
  Windows 7
  Windows 2008

Configuration

The initial configuration of the sqlserver probe is done with the configuration tool (GUI) by setting up one or more database instance profiles. The probe may run locally on the database server, or it may be configured to run as a remote client.

To configure the probe, double-click the line representing the probe in the Infrastructure Manager. This brings up the configuration tool for the probe.

The Setup Tab

This property sheet sets the general run-time parameters of the sqlserver probe.

Setup tab

Generate status only
  Instructs the probe to generate status only and not to issue an alarm when a threshold is breached. Select the Status tab to see the status of the different checkpoints.

Alarm severity filter
  Sets a filter on which severity levels are considered alarms. The sqlserver probe is capable of checking many areas of the databases. Some of the events generated are vital to the performance and availability of the database. As a database administrator, you may want to pass the important events on to the operations centre or helpdesk, so that the event can trigger pagers, email and so on. The Alarm severity filter treats events matching the selected severity level and higher as alarms, and passes these on whenever the Generate status only option is unchecked. If you set the filter to major, only messages with severity major and upward are considered alarms.

Status Auto-Update
  A checkbox lets you activate or deactivate the Status Auto-Update functionality. The Status Auto-Update parameter (number of seconds) specifies the automatic refresh interval of the status window on the Status tab. If you set this parameter to a value higher than 0 and then select a profile on the Status tab, the status is updated automatically every x seconds. The checkpoints of the selected profile are displayed until you select another profile.
  Note: This parameter is a dialog value, which means it is not saved in the configuration file but on the machine running the dialog (like the window size, for example).

Log Size
  Sets the size of the probe's log file, to which probe-internal log messages are written. The default size is 100 KB.
  Note: When this size is reached, the contents of the file are cleared.

Log level
  Sets the level of detail written to the log file. Log as little as possible during normal operation to minimize disk consumption.

QoS V2 Compatibility
  Select this option to insert data in the QoS database (see V2 QoS Compatibility Mode in the appendix).

Message pool tab
  This tab contains a list of all alarm messages available. You select messages from this list when editing the properties of a checkpoint. Right-clicking in the list allows you to add, edit, copy or delete messages.
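The interaction between the Alarm severity filter and the Generate status only option can be sketched in a few lines of Python. This is an illustration only, not the probe's implementation: the function name and the severity ordering are assumptions (this guide names only the "major" level explicitly).

```python
# Assumed severity ordering, lowest to highest. Only "major" is named in
# this guide; the other level names are illustrative assumptions.
SEVERITY_ORDER = ["information", "warning", "minor", "major", "critical"]


def is_forwarded_as_alarm(event_severity, filter_severity, generate_status_only=False):
    """Return True if an event would be passed on as an alarm.

    Hypothetical helper: events at or above the filter level count as
    alarms, unless the probe is set to generate status only.
    """
    if generate_status_only:
        return False
    return SEVERITY_ORDER.index(event_severity) >= SEVERITY_ORDER.index(filter_severity)
```

With the filter set to "major", a "minor" event stays a status entry, while "major" and "critical" events are forwarded.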
Define Message

To define a message:

1. Select the Message pool tab, right-click in the list and select New.
2. Enter a name in the small dialog that pops up.
   Note: Use the name of the checkpoint for which you create the alarm message as the name. That makes it easier to find the alarm message when selecting an alarm message in the properties dialog for the checkpoint.
3. Select the checkpoint for which you create the alarm message in the drop-down list. All variables available for that checkpoint are listed in the right part of the dialog.
4. Enter the message, select the variables you need, and click the OK button when finished.

The new message appears in the message pool.
The Connections Tab

This list contains the connections to the instances that the sqlserver probe will monitor. You need to specify the user name, password and server name you want to use to connect to the instance. The password information is encrypted and placed in the configuration file. A connection can be used by more than one profile.
The list contains one predefined connection that you may modify to your preferences. You may add, edit, delete and copy connections. Select the connection and choose Edit from the right-click pop-up menu to display the connection property window for editing.

Description
  Short description of the connection.

Authentication
  Select the type of authentication the probe should use to connect to the database server. Valid options are:
    SQL Server authentication
    Windows authentication

Encryption
  If this option is selected, the communication between the probe and the database server is encrypted.

User ID and Password
  For SQL Server authentication, provide the user name and password for the SQL Server. For Windows authentication, provide the user name and password for the domain.

Server name
  The name of the server to be used. Adding the SQL port number in this field is optional.

Retry attempts
  Number of times the probe should retry the connection in case of failure. "0" means that only the initial connection attempt is made.

Retry delay
  The time the probe waits between two connection attempts.

Timeout
  Defines how long the probe waits for an answer before it aborts the connection process.

Test button
  Clicking this button tests whether the connection can be made. On success, it returns the instance name and its version number. Otherwise, an error message is returned.

Note: To append the domain name to the user credentials automatically, select the Windows Authentication option from the Authentication drop-down menu and then enable the Detect domain automatically checkbox. This auto-appends the domain name to the User ID text field.
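The meaning of Retry attempts and Retry delay can be illustrated with a short Python sketch. It is a hedged illustration under stated assumptions, not the probe's code: connect_with_retry and its connect callable are hypothetical names, and the Timeout setting is assumed to be enforced inside the connect callable itself.

```python
import time


def connect_with_retry(connect, retry_attempts=0, retry_delay=0.0, sleep=time.sleep):
    """Call connect() once, then retry up to retry_attempts more times.

    Hypothetical helper: retry_attempts=0 means only the initial
    connection attempt is made, matching the field description above.
    retry_delay is the wait (assumed to be in seconds) between attempts.
    """
    last_error = None
    for attempt in range(1 + retry_attempts):
        if attempt > 0:
            sleep(retry_delay)  # wait between two connection attempts
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

With retry_attempts=2, a connection that fails twice and then succeeds is reported as a success after three attempts in total.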
The Profiles Tab

The list contains a sample profile that you may modify to your preferences. Every profile runs as a separate thread, and multiple profiles can be used to monitor one instance. This way the probe can be configured to use the available resources in the best way, and several instances can be monitored independently and simultaneously.

Icons in the profile list

  A green icon in the profile line means the profile is active and running.
  A yellow icon means the profile is active but suspended (the suspend/resume button in the profile properties dialog allows stopping and starting profile monitoring dynamically, without deactivating or activating the probe).
  A black icon shows that the profile is inactive.

You may add, edit, delete and copy profiles. The suspend/resume commands allow stopping and starting profile monitoring dynamically, without deactivating or activating the probe. Select the profile and click Edit in the right-click pop-up menu to display the profile property window for editing.

The upper part of the window shows general profile properties and defaults. At the bottom, you will find a list of available checkpoints.

Description
  Short description of the profile.

Heartbeat
  Defines the interval at which all checkpoint schedules of the profile are tested and, where due, checkpoint execution is triggered. This number should be a common divisor of all check interval values used. The higher the value, the lower the profile overhead.

Connection
  Connection used in this profile. It has to be defined in the Connections dialog before the profile is created.

Check interval
  Default value for the check interval in the profile. It is used if nothing else is defined in the checkpoint, and it overrides the default checkpoint list setting.

Clear message
  Message name for the clear alarm.

SQL timeout
  Every checkpoint query runs asynchronously. If the query reaches the SQL timeout, the checkpoint processing is terminated and the next checkpoint is started. An alarm is issued.

Message
  Message name used for the SQL timeout alarm.

Profile timeout
  Defines the maximum processing time for all checkpoints in the profile. If this timeout is reached, the interval processing is finished and the probe waits for the next heartbeat to evaluate any checkpoint schedules. An alarm message is issued.

Message
  Message name used for the profile timeout alarm.

Timeout severity
  Severity for timeout messages.

Suspended/Resumed (indicator)
  This indicator is green when the profile is activated. The indicator changes to yellow when the profile is suspended and to black when it is deactivated.

Alarm Source
  This option lets you override the source name of the alarm.

Profile checkpoints
  At the bottom, you will find a list of available checkpoints. When you define a new profile, all available checkpoints (listed under the Templates tab) are listed here. Select the checkpoints you want for your new profile. The global and default checkpoint settings are used unless you modify the settings locally for your profile (see the note on checkpoint types below this table).

Alarm Source
  Lets you change the source for issued alarms. If not used, the default is assumed (robot IP).
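The Heartbeat recommendation (a value that divides every check interval in use) can be worked through with a small Python sketch. The function names are hypothetical and all intervals are assumed to be expressed in seconds; how the probe actually tests schedules on each heartbeat is not described in this guide.

```python
from functools import reduce
from math import gcd


def suggest_heartbeat(check_intervals):
    """Largest heartbeat (in seconds) that divides every check interval."""
    return reduce(gcd, check_intervals)


def due_on_heartbeat(tick, heartbeat, check_interval):
    """Assumed scheduling rule: a checkpoint is due on heartbeat number
    `tick` when the elapsed time is a multiple of its check interval."""
    return (tick * heartbeat) % check_interval == 0
```

For check intervals of 60, 300 and 90 seconds, the largest suitable heartbeat is 30 seconds. Choosing the largest common divisor keeps profile overhead as low as possible while still hitting every check interval exactly.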
Note on checkpoint types

When defining a profile, you can use two different strategies for handling checkpoints in the profile.

You can use checkpoint templates dynamically, which means that the checkpoints are defined globally (under the Templates tab) and represent the default settings. Every change to a template value is reflected in all profiles that use the dynamic template strategy.

Note: If you want specific settings that are valid for one profile only, right-click the checkpoint in the list and select Change to static. You can then double-click the checkpoint to modify its properties, and the settings will be valid for this profile only.

Note: If you attempt to modify a template checkpoint in the Profile dialog without changing it to static as described above, you will get a warning.

Template and static checkpoints can be mixed in one profile. If a checkpoint is managed as static, the checkpoint name appears in the list in blue, and it is marked as static in the Type column.

Conclusion:

Static
  To manage the properties of a checkpoint locally, change the checkpoint to static in your profile before modifying it. Once modified, the new settings are valid for this profile only.

Template
  To edit the properties of a checkpoint template, double-click the checkpoint in the profile list or on the Templates tab. Once modified, the new settings are valid for all profiles, unless overruled by static settings in the profile.

When deciding which checkpoints to activate or deactivate for a profile, see the section Single Counter Description for a description of the different checkpoints.
The Templates Tab

The list contains the predefined set of checkpoints that you may use in your profiles. These checkpoints can be modified to your preferences. By default, most checkpoints are active with a reasonable default threshold value.

The checkpoint properties may be used in a profile either dynamically, using the template values, or they can be added to the profile and managed statically in the profile.

Static
  To edit the properties of a checkpoint locally for a profile, right-click the checkpoint in the Checkpoints list in the profile dialog and change it to static. Then double-click the checkpoint to modify it. Once modified, the new settings are valid for this profile only.

Template
  To edit the properties of a checkpoint template, double-click the checkpoint in the profile list or on the Templates tab. Once modified, the new settings are valid for all profiles, unless overruled by static settings in the profile.

See the section Single Counter Description for a description of the checkpoints.
The Status Tab

The status is presented hierarchically, with profile name nodes and one or more checkpoint nodes (only active checkpoints are considered here). The highest status is propagated. Select the checkpoint in the navigation tree on the left to bring up the corresponding events.

Changing the individual values for checkpoints

The properties of an individual checkpoint object can also be modified here. Select a profile and a monitored checkpoint in the left pane, then double-click an object in the right pane. If the object belongs to a template object, you will be warned that a modification will make the checkpoint static for the selected profile.

See the section Edit Checkpoint for a description of the checkpoint properties.
Define New Checkpoint

Select the Templates tab, right-click in the checkpoint list and select Create new. A small dialog pops up. Enter a name for the new checkpoint.

The difference from the regular edit checkpoint dialog is the additional Query tab. The SQL query is the data source and therefore the central piece of every checkpoint. It is recommended that you test the query with SQL Server Management Studio (or another tool) before you start to create the new checkpoint.

The query has to return at least one numeric column, which is used as the checked value (all numeric formats are supported). If the query returns more than one row, the probe needs a unique identification per row, which is used as part of the suppression key and the QoS definition. The row key can be created by concatenating several columns in the checkpoint definition. Additional columns can be retrieved for use in generated messages.

Note: Use the rtrim and ltrim functions to remove leading and trailing blanks from string variables. Use explicit column names for manipulated values and avoid generated names such as Col0 and Col1.

Queries are stored in separate files, not in the configuration file itself. To create a new checkpoint, either click the New/Edit button, paste the query into the query field and enter the name of the query file in which the query is to be stored, or create the file first, load it with the Read button, and then click New/Edit to change or test the query. The query file name can contain a full path; otherwise the file is stored in the probe work directory.
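The query guidelines above (at least one numeric column, trimmed string columns, explicit column names, a row key built by concatenating columns) can be tried out with a small Python sketch. It uses an in-memory SQLite database purely as a stand-in for SQL Server so that the example is self-contained; the table demo_files, its columns and the "db.file" key format are invented for illustration and do not come from the probe.

```python
import sqlite3

# Explicit aliases for every manipulated value, and ltrim/rtrim on strings,
# as recommended above. Table and column names are hypothetical.
QUERY = """
SELECT ltrim(rtrim(db_name))   AS db_name,
       ltrim(rtrim(file_name)) AS file_name,
       size_kb * 1024          AS size_bytes
FROM demo_files
"""


def run_checkpoint_query(conn):
    """Return {row_key: checked_value}; the row key concatenates two columns."""
    cur = conn.execute(QUERY)
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    return {f"{r['db_name']}.{r['file_name']}": r["size_bytes"] for r in rows}


conn = sqlite3.connect(":memory:")  # stand-in database, not SQL Server
conn.execute("CREATE TABLE demo_files (db_name TEXT, file_name TEXT, size_kb INTEGER)")
conn.executemany(
    "INSERT INTO demo_files VALUES (?, ?, ?)",
    [("  sales ", "data1 ", 10), ("sales", " log1", 2)],
)
result = run_checkpoint_query(conn)
```

Without the trimming, the two rows would produce keys with stray blanks, which would make the row identification inconsistent between intervals.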
When you select the Interval modus checkbox, the variable "$interval_value.i" is added to the Message variables textbox. The "interval_value" variable can be configured using the Raw Configure dialog box.

You must click the New/Edit button to be able to test the query. For data security reasons, the probe forces you to enter the user ID and password every time you click New/Edit.

Note 1: The connection information (User ID, Password, Server name and so on) defined in this form must be the same as that defined in the SQL Server profile.

Note 2: If there are multiple profiles using different connections, the custom checkpoints must be converted to static, and in each static checkpoint the connection must match the connection used in the SQL Server profile.
You have to run a successful test every time you change the query itself. The query test is also necessary for the probe to obtain all information about the retrieved columns, so that all checkpoint variables can be defined.

The next step is to define the checkpoint variables by clicking the Edit button in the Message variables line. This opens a new window with a list of all available columns and their possible usage. You have to define their use and format.

Numeric columns can be used as:

  Value         Candidate for checkpoint checking, with standard formatting
  Value size    Candidate for checking, formatted as a file size (B, KB, MB, GB or TB)
  Value int     Candidate for checking, formatted as an integer number
  Information   No checking, standard formatting (two decimal places, if available)

Character columns can be used as:

  Row key       Candidate for row identification, string formatting
  Information   String formatting
The next step is to choose which variable to use for checking and to set the comparison operator. If your query returns more than one row, you need to define a unique row key. To see which variables you can use, type the "$" sign into the Row identification line; a sublist with all suitable variables appears.
The last setting on this page is the interval value checkbox. If you check interval value, the probe always subtracts the variable value at the beginning of an interval from the value at the end of the interval and uses the result for checking and QoS. If you do not check the interval value checkbox, the value of the variable as returned by the query is used for checking and QoS.
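The effect of the interval value checkbox can be sketched in Python as follows. This is a minimal illustration, assuming one query result per interval; the class name IntervalValue is hypothetical and the probe's internal bookkeeping may differ.

```python
class IntervalValue:
    """Turn raw query values into the value used for checking and QoS."""

    def __init__(self, use_interval_value):
        self.use_interval_value = use_interval_value
        self._previous = None  # value at the beginning of the current interval

    def update(self, raw_value):
        """Return the checked value, or None when no delta is available yet."""
        if not self.use_interval_value:
            return raw_value  # checkbox unchecked: use the value as returned
        previous, self._previous = self._previous, raw_value
        if previous is None:
            return None  # first interval: nothing to subtract from
        return raw_value - previous  # end-of-interval minus start-of-interval
```

For a counter that reads 100, 130 and 135 in three consecutive intervals, the checked values with the checkbox selected are none, 30 and 5.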
Then you need to define the rest of the checkpoint processing settings, such as:

  Interval value
  Sampling
  Scheduling
  Thresholds
  Messages
  QoS definitions

You can also create a hint text, but it is not mandatory.

Note: Save the definition and restart the probe to start using the new checkpoint.
Edit Checkpoint

The checkpoint properties may be used in a profile either dynamically, using the template values, or they can be added to the profile and managed statically in the profile.

Static
  To edit the properties of a checkpoint locally for a profile, right-click the checkpoint in the Checkpoints list in the profile dialog and change it to static. Then double-click the checkpoint to modify it. Once modified, the new settings are valid for this profile only.

Template
  To edit the properties of a checkpoint template, double-click the checkpoint in the profile list or on the Templates tab. Once modified, the new settings are valid for all profiles, unless overruled by static settings in the profile.

The properties of checkpoints are described below.
The upper part of the window contains general checkpoint settings. The lower part contains two lists with threshold and schedule settings.

Description
  Short description of the purpose of the checkpoint.

Active
  Check this option to activate the checkpoint.

Condition
  Information describing how the threshold values are evaluated.

Check interval
  Interval value used for this checkpoint. Every checkpoint can have a different check interval value. If no value is defined here, the default is taken from the profile definition or, if none is defined there, from the default checkpoint list.

Send Quality of Service
  Activates sending QoS values to the QoS database. If QoS is not available for a checkpoint, the checkbox is disabled.

QoS List
  Clicking this button opens the QoS list, showing the current QoS definitions (the default is one definition per checkpoint). Right-clicking in the list lets you add new QoS definitions and copy, edit or delete an existing QoS definition. The Edit QoS dialog offers the available metrics (numerical variables that can be reported as QoS) and the available object variables (if any, to be added to the QoS source). The name of the QoS has to start with the checkpoint name. QoS can be activated and deactivated as usual.
  Note: Some checkpoints have no QoS possibilities. For these checkpoints, the QoS dialog cannot be activated.

Samples
  The probe saves the number of samples specified here and calculates an average value. This average value is compared to the specified alarm threshold (see the threshold description below the table).
  With Samples = 1, no sampling is done.
  With Samples = 3, the average of the last 3 samples is used.
  With Samples = 0 (in a profile), the number of samples is taken from the template. If it is not set there, no sampling is done.
  Initially, after start-up, the probe calculates the average value from the number of samples available. Example, Samples = 3: in the first interval, the first sample value is used; in the second interval, the average of samples 1 and 2 is used, and so on.
  Note: Many checkpoints calculate an "interval value", so in the first interval there is no value at all (no threshold checking).
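The Samples setting amounts to a moving average over the most recent values. A minimal Python sketch, assuming the number of samples has already been resolved to 1 or more (the Samples = 0 fallback to the template is not modelled), could look like this; SampleAverager is a hypothetical name.

```python
from collections import deque


class SampleAverager:
    """Average of the last `samples` values, using fewer right after start-up."""

    def __init__(self, samples):
        if samples < 1:
            raise ValueError("resolve Samples = 0 from the template first")
        self._window = deque(maxlen=samples)  # oldest sample drops out automatically

    def add(self, value):
        """Store a new sample and return the value to compare to the threshold."""
        self._window.append(value)
        return sum(self._window) / len(self._window)
```

With Samples = 3 and measured values 10, 20, 30 and 40, the compared values are 10, 15, 20 and 30, which matches the start-up behaviour described above.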
Use excludes
  Checking this option lets you add excludes to the excludes list of some checkpoints (it does not make sense for all checkpoints). Using excludes, you can define objects that you do NOT want to monitor with the checkpoint. The exclude patterns found in the Excludes list (see below) are used for the checkpoint.

Excludes list
  Clicking this button opens the Excludes list. This list shows whether excludes are defined for the checkpoint. The excludes found in the list are used for the checkpoint if the Use excludes option (see above) is checked. Right-clicking in the list lets you add new excludes or edit, copy or delete existing excludes.
  When you add or edit an exclude pattern, a "match expression" dialog opens, letting you edit or define the exclude pattern. Excludes are defined using regular expression patterns.
  A test button lets you test the exclude pattern defined. This test is possible only for running, active profiles and checkpoints. The test uses the status list (on the Status tab) as input. Note that if there are already active excludes, the excluded objects are removed from the status list BEFORE the test.
  When you click the test button, an exclude test list pops up, showing the result of the test. Red text lines show the objects that would be excluded by the tested pattern.
  The object thresholds function as an include list: if special thresholds are defined for a specific object, this object always stays in, even if the exclude pattern would normally eliminate it. This is also taken into account in the test function.

Scheduling
  This field lets you select how to use the schedules settings, if any (see the description below the table).
  Rules: the checkpoint runs according to the rules described in the Schedules settings.
  Exceptions: the checkpoint runs except during the periods described in the Schedules settings.

Clear message
  Message name used for the clear alarm message.

Clear severity
  Severity used for the message issued in the normal state.

Thresholds/Schedules
  See the description below the table.
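The way exclude patterns and object thresholds interact can be sketched in Python. This is an assumption-laden illustration: filter_objects is a hypothetical name, and whether the probe matches a pattern anywhere in the object name (as re.search does here) or against the whole name is not stated in this guide.

```python
import re


def filter_objects(objects, exclude_patterns, objects_with_thresholds=()):
    """Return the objects that remain monitored.

    An object is dropped when any exclude pattern matches it, unless a
    special threshold is defined for it: object thresholds act as an
    include list and always keep the object in.
    """
    compiled = [re.compile(p) for p in exclude_patterns]
    kept = []
    for name in objects:
        excluded = any(rx.search(name) for rx in compiled)
        if not excluded or name in objects_with_thresholds:
            kept.append(name)
    return kept
```

For example, with the patterns "^temp" and "^master$" and a special threshold defined for "master", the objects master, tempdb and sales are reduced to master and sales.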
Thresholds

The list contains the predefined set of monitoring profiles, which you can use in your profiles and modify to your preferences. By default, most profiles are active with a reasonable default threshold value.

The threshold values may be defined by modifying checkpoints in the respective profile. Every checkpoint must have at least one threshold, but additional thresholds can be defined. The threshold identification consists of an object name (if applicable, such as a tablespace name or user ID) and a threshold ID, numbered from 0.

Threshold values should be descending or ascending, depending on the condition used in the checkpoint, starting with the highest-severity threshold condition.

Threshold object name
  Monitoring object name, if applicable, or default. Some special checkpoints have a second threshold called "count" (for example "locked_users").

Threshold value
  Value used for threshold evaluation.

Current value
  If invoked from the status report, it contains the last measured value.

Severity
  Alarm severity.

Message
  Name of the message used for the threshold alarm.

Message text
  Text of the message, containing variables that are replaced at run time. If the message text is changed from a profile list, you will be forced to create a new message.

Variables
  List of variables available in the checkpoint.

Schedules

If the schedules list is empty, the checkpoint is executed at every check interval, 24 hours a day. You can define a number of schedules per checkpoint, which define additional rules for the check interval or exceptions to it. Rules and exceptions cannot be mixed in one checkpoint.

In principle, a schedule is a definition of an execution period (or an execution break, if exceptions are used) with specified days, time from/to and date from/to values. If only date from and time from are defined, they define the first execution. Run once causes the checkpoint to run only once a day in the defined period (rather than multiple times, as when the interval is used).
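The threshold ordering described in the Thresholds section (values descending or ascending depending on the condition, highest severity first) can be illustrated with a short Python sketch. The first-match-wins evaluation, the function name and the severity labels are assumptions made for this example; the guide does not spell out the probe's evaluation algorithm.

```python
import operator

# Comparison operators a checkpoint condition might use (assumed set).
CONDITIONS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def evaluate(value, condition, thresholds):
    """Return the severity of the first threshold the value breaches.

    thresholds: list of (threshold_value, severity) pairs ordered with the
    highest severity first, i.e. descending values for ">="/">" conditions
    and ascending values for "<="/"<" conditions. Returns None when no
    threshold is breached (normal state).
    """
    compare = CONDITIONS[condition]
    for threshold_value, severity in thresholds:
        if compare(value, threshold_value):
            return severity
    return None
```

Ordering the list from the highest severity downward matters in this sketch: if the least severe threshold came first, a value breaching all thresholds would be reported with the lowest severity.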
Checkpoints Metrics

The following types of metrics are used:

Count
Refers to the absolute number of events in the interval. In the first interval, counts are not checked because their interval value cannot be calculated. If there is a "total" value in the message, it means "since the start of the server".

Count/sec
Refers to the number of events per second in the interval. It is calculated as the delta between the count at the beginning of the interval and the count at the end, divided by the length of the interval in seconds. In the first interval, counts are not checked because their interval value cannot be calculated. If there is a "total" value in the message, it means "since the start of the server".

Gauge
Refers to an absolute number describing the actual state of the system. If it describes a size, it is in KB or MB, depending on the actual size.

Ratio
Refers to a percentage calculated from interval counts. In the first interval, it is calculated from total counts (as the interval count cannot be calculated).

Status
Refers to an absolute value such as ONLINE.

Average
Refers to an average calculated from interval counts. In the first interval, it is calculated from absolute counts.
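The count/sec and ratio calculations described above can be written out as a short Python sketch. The function names and the use of None for "no previous sample" are assumptions made for illustration; counter resets after a server restart are not handled here.

```python
from typing import Optional


def count_per_second(previous: Optional[int], current: int,
                     interval_seconds: float) -> Optional[float]:
    """Delta between two cumulative counts divided by the interval length.

    Returns None in the first interval, when there is no previous sample
    and the interval value cannot be calculated.
    """
    if previous is None:
        return None
    return (current - previous) / interval_seconds


def ratio_percent(prev_part: Optional[int], part: int,
                  prev_whole: Optional[int], whole: int) -> Optional[float]:
    """Percentage of 'part' events among 'whole' events in the interval.

    In the first interval the totals since server start are used, because
    interval counts are not available yet.
    """
    if prev_part is None or prev_whole is None:
        delta_part, delta_whole = part, whole
    else:
        delta_part, delta_whole = part - prev_part, whole - prev_whole
    if delta_whole == 0:
        return None  # nothing happened in the interval, ratio is undefined
    return 100.0 * delta_part / delta_whole
```

For instance, a counter that moves from 1000 to 1300 over a 60-second interval yields 5.0 events per second, and 480 cache hits out of 500 look-ups in an interval yield a ratio of 96.0 percent.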
Single Counter Description

check_dbalive status
This checkpoint tries to connect to a server. If the connection cannot be established, an alert is generated. This checkpoint cannot be deactivated! In case of an alert, check the server connectivity and whether the server itself is running.

server_startup gauge
This checkpoint monitors the number of days the database server has been up and running.

database_count gauge
This checkpoint monitors changes in the number of databases on the server. If the number increases or decreases, an alert is generated.

backup_status status
This checkpoint monitors the number of days since the last database backup was taken.

buf_cachehit_ratio ratio
This checkpoint monitors the percentage of pages found in the buffer cache without having to read from disk. The ratio is the interval number of cache hits divided by the interval number of cache look-ups. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high. Generally, you can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server.

long_jobs count
This checkpoint finds all jobs running longer than the defined threshold in seconds.

long_queries count
This checkpoint finds all queries running longer than the defined threshold in seconds.

log_cachehit_ratio ratio
This checkpoint monitors the percentage of pages found in the log cache without having to read from disk. The ratio is the interval number of cache hits divided by the interval number of cache look-ups. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high. Generally, you can increase the log cache hit ratio by increasing the amount of memory available to SQL Server.

active_users gauge
This checkpoint monitors the number of users having an active transaction at the moment of the snapshot.
login_count gauge
This checkpoint monitors the number of users having an open connection to the server at the moment of the snapshot.
deadlocks count/sec
This checkpoint monitors the number of deadlocks per second in an interval. As deadlocks can cause a severe performance penalty, their number should be close to 0. Use trace flag 1204 or 1205 to identify the deadlocked resources and the applications involved. The sp_lock procedure also provides useful information about locking.

lock_timeouts count/sec
This checkpoint monitors the number of lock timeouts per second in an interval, with a precision of 0.001 sec.

lock_requests count/sec
This checkpoint monitors the number of lock requests per second in an interval.

lock_waits count/sec
This checkpoint monitors the number of lock waits per second in an interval.

transactions count/sec
This checkpoint monitors the number of transactions per second in an interval.

page_reads count/sec
This checkpoint monitors the number of physical database page reads that are issued per second in an interval. Since physical I/O is expensive, you may be able to minimize the cost by using a larger data cache, intelligent indexes, or more efficient queries, or by changing the database design.

page_writes count/sec
This checkpoint monitors the number of database page writes that are issued per second in an interval. Page writes are generally expensive. Reducing page-write activity is important for optimal tuning. One way to do this is to ensure that you do not run out of free buffers in the free buffer pool. If you do, page writes will occur while waiting for an unused cache buffer to flush.

log_flush_waits count/sec
This checkpoint monitors the number of commits per second waiting on the log flush in an interval. When commits are waiting for log flushes, the log device is usually the bottleneck.

latch_waits count/sec
This checkpoint monitors the number of latch requests in an interval that could not be granted immediately and had to wait before being granted. If this number is high, the system is generally experiencing a low cache hit ratio and is being forced to perform physical I/O.
Add more memory or increase the bandwidth of your system.

full_scans count/sec
This checkpoint monitors the number of full table or index scans per second in an interval. If this value is high (2-10), you need to analyse your queries.

log_file_growths count
This checkpoint monitors the number of times in an interval that the transaction log for the database has been expanded. If this happens often, consider resizing your log files.

log_file_shrinks count
This checkpoint monitors the number of times in an interval that the transaction log for the database has been shrunk. If this happens often, consider resizing your log files.

scan_density ratio
This checkpoint monitors the ratio of the best number of extents to the actual number of extents. It should be near 100%. A lower number indicates external fragmentation, in which case the object should be reorganized.

logic_fragment ratio
This checkpoint monitors the percentage of clustered index pages that are out of order. Any number higher than 10% indicates external fragmentation, and the index should be rebuilt.
Note: Nonclustered indexes are not monitored, because a table can have only one clustering sequence!

database_size gauge
This checkpoint monitors the space used by the respective database files, data and log files together (in KB/MB). An alert is issued whenever a particular size is exceeded.
Note: This checkpoint should be used as additional information to the checkpoints "free_space" and "logfile_usage".

database_status status
This checkpoint monitors the database status value. The status value is actually a combination of some configuration options and a status, so multiple values can be set at the same time (like "torn page detection" and "loading").

free_space ratio
This checkpoint monitors the amount of free space in database data files in percent. If there is at least one file with "unlimited" growth, the space in the whole database is considered 100% free. If you are using file groups, this could be misleading; in that case, deactivate this checkpoint and use only the "fg_free_space" checkpoint.

fg_free_space ratio
This checkpoint monitors the amount of free space in database file groups in percent.
If there is at least one file with "unlimited" growth in a file group, the space in this file group is considered 100% free.

logfile_usage ratio
This checkpoint monitors the amount of free space in the transaction log in percent. If there is at least one transaction log file with "unlimited" growth in a database, the space in its transaction log is considered 100% free.
Note: Whenever the database is in recovery or restore mode, this checkpoint reports no metric values for that execution interval.

logfile_size gauge
This checkpoint monitors the size of the transaction log in MB. If there is at least one transaction log file with "unlimited" growth in a database.
Note: Whenever the database is in recovery or restore mode, this checkpoint reports no metric values for that execution interval.

table_space gauge
This checkpoint monitors the amount of space (in KB/MB) reserved for a particular table in a database. This checkpoint can be used to control the size of fast-growing tables.

lock_memory gauge
This checkpoint monitors the amount of allocated lock memory in KB.

locks_used ratio
This checkpoint monitors the percentage of lock and lock owner blocks used.

connection_memory gauge
This checkpoint monitors the amount of memory in KB used to maintain connections to SQL Server.

optimizer_memory gauge
This checkpoint monitors the amount of memory in KB used for the SQL optimizer.

sqlcache_memory gauge
This checkpoint monitors the amount of memory in KB used for the SQL statement cache.

workspace_memory gauge
This checkpoint monitors the amount of memory in KB used for executing processes such as hash, sort, bulk copy, and index creation operations.

average_waittime average
This checkpoint monitors the average lock wait time in ms in an interval. A high wait time causes performance degradation; consider increasing the number of locks available or adding memory to the computer.

total_memory gauge
This checkpoint monitors the total amount of dynamic memory (in kilobytes) the server is currently using.

server_cpu ratio
This checkpoint monitors the percentage of CPU usage by the SQL Server instance in an interval.
server_io ratio
This checkpoint monitors the percentage of I/O busy time for the SQL Server instance in an interval.

free_connections ratio
This checkpoint monitors the percentage of free connections to the SQL Server instance, as specified by the parameter 'user connections' (max. 32767).

user_cpu ratio
This checkpoint monitors the percentage of CPU usage by user in an interval.
Note: The checkpoint user_cpu reports $spid.$hostid in the QoS target. This results in the creation of a new data series for each new $spid or $hostid. Hence, it is recommended that you disable the QoS for this checkpoint.

locked_users count
This checkpoint monitors the number of users suspended by locks at the moment of the snapshot. The blocked user and its current SQL are also displayed.

Transaction_backup_status status
This checkpoint sends QoS and alarms for those databases that are running in full or bulk-logged recovery mode.
Note: This checkpoint does not send QoS and alarms for those databases that are running in simple recovery mode.

Chapter 2: sqlserver Metrics

The following table describes the checkpoint metrics that can be configured using the sqlserver probe.

Monitor Name                              Units      Description
QOS_SQLSERVER_active_connection_ratio     Percent    Active Connection Ratio
QOS_SQLSERVER_active_users                Count      Active Users
QOS_SQLSERVER_alloc_space                 Percent    Allocated Space
QOS_SQLSERVER_av_fragmentation            Percent    Average Fragmentation
QOS_SQLSERVER_average_waittime            ms         Average wait time
QOS_SQLSERVER_backup_status               Count      Backup Status
QOS_SQLSERVER_blocked_users               Count      Blocked Users
QOS_SQLSERVER_buf_cachehit_ratio          Percent    Cachehit Ratio
QOS_SQLSERVER_check_dbalive               State      Availability
QOS_SQLSERVER_connection_memory           Kilobytes  Connection memory
QOS_SQLSERVER_database_count              Count      Database count
QOS_SQLSERVER_database_size               Megabyte   Database size
QOS_SQLSERVER_database_state              State      Database state
QOS_SQLSERVER_deadlocks                   Count/s    Deadlocks
QOS_SQLSERVER_differential_backup_status  Days       Differential Backup Status
QOS_SQLSERVER_fg_free_space               Percent    Fg space
QOS_SQLSERVER_free_connections            Percent    Free Connections
QOS_SQLSERVER_free_space                  Percent    Free Space
QOS_SQLSERVER_full_scans                  Count/s    Full Scans
QOS_SQLSERVER_latch_waits                 Reqs/s     Latch Waits
QOS_SQLSERVER_lock_memory                 Kilobyte   Lock Memory
QOS_SQLSERVER_lock_requests               Reqs/s     Lock Requests
QOS_SQLSERVER_lock_timeouts               Count/s    Lock Timeouts
QOS_SQLSERVER_lock_waits                  Count/s    Lock Waits
QOS_SQLSERVER_locked_users                Count      Locked Users
QOS_SQLSERVER_locks_used                  Percent    Locks Used
QOS_SQLSERVER_log_cachehit_ratio          Percent    Log Cachehit Ratio
QOS_SQLSERVER_log_file_growths            Count      Log File Growths
QOS_SQLSERVER_log_file_shrinks            Count      Log File Shrinks
QOS_SQLSERVER_log_flush_waits             Count/s    Log Flush Waits
QOS_SQLSERVER_logfile_size                Count      Log File Size
QOS_SQLSERVER_logfile_usage               Percent    Log File Usage
QOS_SQLSERVER_logic_fragment              Percent    Logic Fragment
QOS_SQLSERVER_login_count                 Count      Login Count
QOS_SQLSERVER_long_queries                Seconds    Long Queries
QOS_SQLSERVER_mirror_sqlinstance          State      Mirror Sqlinstance
QOS_SQLSERVER_mirror_state                State      Mirror State
QOS_SQLSERVER_mirror_witness_server       State      Mirror Witness Server
QOS_SQLSERVER_optimizer_memory            Kilobyte   Optimizer Memory
QOS_SQLSERVER_page_reads                  Count/s    Page Reads
QOS_SQLSERVER_page_writes                 Count/s    Page Writes
QOS_SQLSERVER_scan_density                Percent    Scan Density
QOS_SQLSERVER_server_cpu                  Percent    Server CPU
QOS_SQLSERVER_server_io                   Percent    IO
QOS_SQLSERVER_server_startup              Count      This checkpoint monitors number of days the database server is up and running.
QOS_SQLSERVER_sqlcache_memory             Kilobytes  Cache Memory
QOS_SQLSERVER_table_space                 Kilobytes  Table Space
QOS_SQLSERVER_total_memory                Kilobytes  Total Memory
QOS_SQLSERVER_transaction_backup_status   Count/s    Transaction Backup Status
QOS_SQLSERVER_transactions                Count/s    Transactions
QOS_SQLSERVER_user_cpu                    Percent    User CPU
QOS_SQLSERVER_workspace_memory            Percent    Workspace Memory
Appendix A: Appendix

This section contains the following topics:
V2 QoS Compatibility Mode
Other QoS Issues

V2 QoS Compatibility Mode
In V3, the has_max flag has been added to the following checkpoints:
alloc_space
fg_free_space
free_connections
free_space
locks_used
logic_fragment
scan_density
server_cpu
server_io
user_cpu
workspace_memory

If you created QoS definitions for these checkpoints under the V2 sqlserver probe, enable the QoS V2 compatibility checkbox in the General tab to ensure that all data is inserted correctly into the QoS database. To use the V3 format (has_max), delete the V2-generated QoS definitions for these checkpoints (all the data for these checkpoints will be deleted).

Other QoS Issues
For the user_cpu checkpoint, the QoS target definition is corrected in V3. The V2 target (loginname + program name) was not 100% unique, which can cause data inconsistencies in some cases. The target in V3 is changed to 'spid + hostid'. If you need the old target, change the QoS definition for this checkpoint and replace the current object definitions with $loginname.$programname.

Note: The target definition for the checkpoint "fg_free_space" was incorrect in V2: it was only 'db_name' instead of 'db_name + fg_name'. This is corrected in V3.
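The uniqueness problem with the V2 user_cpu target can be illustrated with a small Python sketch. It only mimics variable substitution with the standard string.Template class; how the probe itself expands $spid.$hostid and $loginname.$programname is not described here. The qos_target function and the sample session values are hypothetical.

```python
from string import Template
from typing import Mapping

# Target definitions as quoted in this guide.
V2_USER_CPU_TARGET = Template("$loginname.$programname")
V3_USER_CPU_TARGET = Template("$spid.$hostid")


def qos_target(definition: Template, session: Mapping[str, object]) -> str:
    """Expand a target definition using the values of one session."""
    return definition.substitute(session)


# Two hypothetical sessions opened by the same login and program.
session_a = {"spid": 52, "hostid": "4120",
             "loginname": "app_user", "programname": "report_tool"}
session_b = {"spid": 57, "hostid": "4120",
             "loginname": "app_user", "programname": "report_tool"}

# The V2 target is identical for both sessions, so their data would be
# written to one series; the V3 target keeps them apart.
v2_targets = {qos_target(V2_USER_CPU_TARGET, s) for s in (session_a, session_b)}
v3_targets = {qos_target(V3_USER_CPU_TARGET, s) for s in (session_a, session_b)}
```

In this sketch v2_targets holds a single entry while v3_targets holds two. The finer-grained V3 target is also what produces a new data series for every new spid or hostid.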