Foglight Cartridge for SQLServer™


Foglight Cartridge for SQLServer™
User Guide
Version 3.2.3

Copyright Quest Software, Inc. All rights reserved.

This document contains proprietary information, which is protected by copyright. The software described in this document is furnished under a software license or nondisclosure agreement. This software may be used or copied only in accordance with the terms of the applicable agreement. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose other than the purchaser's personal use without the written permission of Quest Software, Inc.

Warranty

The information contained in this document is subject to change without notice. Quest Software makes no warranty of any kind with respect to this information. QUEST SOFTWARE SPECIFICALLY DISCLAIMS THE IMPLIED WARRANTY OF THE MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Quest Software shall not be liable for any direct, indirect, incidental, consequential, or other damage alleged in connection with the furnishing or use of this information.

Trademarks

Foglight is a registered trademark of Quest Software, Inc.

Foglight software includes 1996 Expect software (freeware). Expect and its documentation are copyrights and trademarks of Don Libes, Associates.

This product includes software developed by the OpenSSL Project. Copyright (c) The OpenSSL Project. All rights reserved.

Portions of the code in this package were distributed by Carnegie Mellon University. 1989, 1991, 1992, Carnegie Mellon University. All rights reserved.

Portions of this product were obtained from the ucd-snmp package written by Wes Hardaker at the University of California, Davis copyright 1996, Copyright , Networks Associates Technology, Inc. All rights reserved. Portions of this code are also copyright , Cambridge Broadband Ltd. All rights reserved.

Portions of this software are derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm.

Foglight software includes Info-ZIP. Info-ZIP is provided "as is" without warranty of any kind, express or implied. In no event shall Info-ZIP or its contributors be held liable for any direct, indirect, incidental, special or consequential damages arising out of the use of or inability to use this software.

This product includes software developed by The Apache Software Foundation.

All other trademarks and registered trademarks used in this document are property of their respective owners.

World Headquarters
8001 Irvine Center Drive
Irvine, CA
info@quest.com
U.S. and Canada:

Please refer to our Web site for regional and international office information.

Foglight Cartridge for SQLServer™ User Guide
Updated - April 2005
Software Version
Foglight Version - 4.2

Contents

SQLSERVER AGENT
ABOUT QUEST SOFTWARE, INC
    Contacting Customer Support
    Contacting Quest Software
ABOUT THE SQLSERVER AGENT
USING SQLSERVER ASPS
    Setting the Connection Details
    Setting the Data Management Parameters
    Setting the Service Checks
    Setting the Collection Parameters
SELECTING A COLLECTION MODEL
MANAGING COLLECTION DATA
DEBUGGING AND TROUBLESHOOTING
CLUSTER HANDLING
COLLECTION ERROR RETURN CODES
EDITING THE ERRLOG SEARCH STRINGS LIST
EDITING THE ERRLOG EXCLUSION LIST
EXCLUDING DATABASES FROM BEING MONITORED
EDITING THE TABLE ID LIST
EXCLUDING JOBS FROM THE JOB TABLE
OVERRIDING DEFAULT SAMPLE AND PURGING VALUES
SQLSERVER AGENT TABLES
    AMPerfInfo Table
    Availability Table
    Backup Table
    Blocking Table (renamed Blocking_323)
    BMPerfInfo Table
    Cache Table
    CollectionStatus Table
    Connection Table
    Database Table (renamed Database_323)
    DBMonCount Table
    DBPerfInfo Table
    DBStatus Table
    ErrorLog Table (renamed ErrorLog_323)
    File Table (renamed File_323)
    Filegroup Table (renamed Filegroup_323)
    General Table (renamed General_323)
    Jobs Table (renamed Jobs_323)
    Lock Table
    LockRate Table
    LogShipping Table
    Memory Table
    MMPerfInfo Table
    Replication Table
    Response Table
    Service Table
    SQLConfig Table
    Statistic Table
    System Table
    TableSize Table
    TopUsers Table (renamed TopUsers_323)
    Trace Table
INVESTIGATIONS
    Investigating Data
    Investigating Locks and Blockers
    Investigating Memory Usage
    Investigating Performance
    Investigating Physical Space
    Investigating Processes and Jobs
VIEWS
    SQLServer Buffer Cache Free Pages Graph
    SQLServer Cache Hit Ratios Graph
    SQLServer Compiler Statistics Graph
    SQLServer Connections Activity Graph
    SQLServer Connections License Summary Graph
    SQLServer Connections Login Rate Graph
    SQLServer Connections Maximum Blocking Time Graph
    SQLServer Connections Type Graph
    SQLServer Database Space Overview Graph
    SQLServer Graph Database User
    SQLServer File Space Overview Graph
    SQLServer Filegroup Space Overview Graph
    SQLServer Lock Rate Graph
    SQLServer Memory Areas Graph
    SQLServer Rates Statistics Graph
    SQLServer Response Time Graph
    SQLServer Table Row Count Graph
    SQLServer Table Size Graph
RULES
    Changing Rule Threshold Values
    Agent Status Rule
    Agent Mail Status Rule
    Buffer Cache Free Pages Rule
    Buffer Cache Hit Ratio Rule
    Cluster Failover Rule
    Cluster Node Status Rule
    SQLServer Rule Collection Error
    SQLServer Rule Collection Timeout
    Database Status Rule
    Days Since Last Backup Rule
    Deadlock Rate Rule
    DTC Status Rule
    Error Log Rule
    FTS Status Rule
    Last Run Outcome Rule
    License Limit Rule
    Log Shipping Failures Rule
    Login Rate Rule
    Maximum Block Time MS Rule
    Max Server Memory Rule
    Monitored Databases Rule
    OLAP Status Rule
    Potential Growth MB Rule
    Procedure Cache Hit Ratio Rule
    Recompile Rate Rule
    Replication Agent Failed Rule
    Replication Agents Retrying Rule
    Replication Conflicts Rule
    Response Time Rule
    Server Mail Status Rule
    SQLServer Running Rule
    SQL Long Running Job Rule
    Trace Poor Performer Rule
    Worker Threads Rule

SQLServer Agent

This User Guide is a printable version of the Foglight online help. Where the two conflict, the online help supersedes the content in this guide.

About Quest Software, Inc.

Quest Software, Inc. delivers innovative products that help organizations get more performance and productivity from their applications, databases and infrastructure. Through a deep expertise in IT operations and a continued focus on what works best, Quest helps more than 18,000 customers worldwide meet higher expectations for enterprise IT. Quest Software, headquartered in Irvine, Calif., can be found in offices around the globe and at

Contacting Customer Support

Quest Software's world-class support team is dedicated to ensuring successful product installation and use for all Quest Software solutions.

SupportLink: support@quest.com

You can use SupportLink to do the following:
- Create, update, or view support requests
- Search the knowledge base
- Access FAQs
- Download patches

Contacting Quest Software

Phone: (United States and Canada)
info@quest.com
Mail: Quest Software, Inc. World Headquarters, 8001 Irvine Center Drive, Irvine, CA USA
Web site:

Please refer to our Web site for regional and international office information.

About the SQLServer Agent

This documentation is for the Foglight SQLServer agent release v

Use the SQLServer agent to monitor the performance of your SQLServer. The SQLServer agent uses a default set of rules to trigger alerts when specific conditions on your SQLServer occur. You can modify these rules to narrow or broaden the conditions that trigger an alert.

The SQLServer agent monitors SQL Servers running in a Windows environment and in a clustered Windows environment. To monitor SQL Servers in a clustered Windows environment you must follow specific installation instructions. For more information, see SQLServer Agent Cluster Handling.

ASP and Procedures

- Using SQLServer ASPs
- Managing SQLServer Agent Collection Data
- Selecting a SQLServer Agent Collection Model
- Overriding default sampling and purging values
- Excluding databases from being monitored
- Excluding jobs from the Job table
- Editing the Errlog Search Strings list
- Editing the Errlog Exclusion list
- Editing the Table ID list
- SQLServer Agent Debugging and Troubleshooting
- SQLServer Agent Cluster Handling
- SQLServer Agent Collection Error Return Codes

Investigating Views and Tables

- Investigating Data
- Investigating Processes and Jobs
- Investigating Memory Usage
- Investigating Physical Space
- Investigating Performance

Rules

- Buffer Cache Free Pages Rule
- Buffer Cache Hit Ratio Rule
- Cluster Failover Rule
- Cluster Node Status Rule
- SQLServer Rule Collection Error
- SQLServer Rule Collection Timeout
- Days Since Last Backup Rule
- Database Status Rule
- Deadlock Rate Rule
- Error Log Rule

- Last Run Outcome Rule
- License Limit Rule
- Login Rate Rule
- Log Shipping Failures Rule
- Maximum Block Time MS Rule
- Max Server Memory Rule
- Monitored Databases Rule
- Procedure Cache Hit Ratio Rule
- Recompile Rate Rule
- Replication Agent Failed Rule
- Replication Agents Retrying Rule
- Replication Conflicts Rule
- Response Time Rule
- SQL Long Running Job Rule
- SQLServer Running Rule
- SQL Service Checks
- Trace Poor Performer Rule
- Worker Threads Rule

Tables

- SQLServer Agent Tables

Using SQLServer ASPs

The SQLServer agent is shipped with agent startup parameters (ASPs) that dictate how the agent behaves. You can change these parameters to suit your particular system requirements. The parameters are grouped into the four topics below.

Setting the Connection Details

Use the options on the Connection Details tab to set the SQLServer agent connection parameters. You can specify the following:
- Instance Name
- Use Windows Authentication
- SQLServer user name and password
- Work Database Name
- Work Database Location
- Cluster Group Name
- SQL Connection Timeout

To set the connection details

1. Right-click the SQLServer agent icon and choose Edit, ASPs.

2. Click the Connection Details tab.
3. Complete the fields:
a) In the Instance Name field, enter the fully qualified name of the instance that the SQLServer agent is to monitor. The format of this entry is the same as that used by the Microsoft SQLServer tools, that is, HOST[\INSTANCE]. HOST is the name of the machine on which the SQLServer instance resides, and is mandatory. INSTANCE is optional; if it is not specified, the agent monitors the default SQLServer instance.
Examples for local and remote monitored instances are shown below. In these examples MYBOX is the local machine where the agent is running, on which an instance named INST2 resides, and OTHERBOX is a remote machine on which an instance named INST4 resides.
- To monitor the default instance on the local MYBOX, enter MYBOX
- To monitor instance INST2 on MYBOX, enter MYBOX\INST2
- To monitor the default instance on the remote OTHERBOX, enter OTHERBOX
- To monitor instance INST4 on OTHERBOX, enter OTHERBOX\INST4
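The HOST[\INSTANCE] format can be illustrated with a short Python sketch. This is not part of Foglight: the names InstanceName and parse_instance_name are hypothetical, and the validation rules beyond "HOST is mandatory, INSTANCE is optional" are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class InstanceName(NamedTuple):
    """Hypothetical container for the two parts of an Instance Name entry."""
    host: str
    instance: Optional[str]  # None means the default instance on that host


def parse_instance_name(value: str) -> InstanceName:
    """Split a HOST[\\INSTANCE] string into its host and optional instance."""
    text = value.strip()
    host, separator, instance = text.partition("\\")
    if not host:
        # The guide states that HOST is mandatory.
        raise ValueError("HOST is mandatory")
    if separator and not instance:
        # Assumption: a trailing backslash with no instance name is an error.
        raise ValueError("INSTANCE must not be empty when a backslash is given")
    if "\\" in instance:
        # Assumption: only one backslash separator is allowed.
        raise ValueError("only one backslash is allowed")
    return InstanceName(host, instance or None)


if __name__ == "__main__":
    for entry in ("MYBOX", "MYBOX\\INST2", "OTHERBOX", "OTHERBOX\\INST4"):
        print(entry, "->", parse_instance_name(entry))
```

An entry without a backslash yields an instance of None, which corresponds to monitoring the default instance on that host.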

Note: You can enter just a period (".") instead of the host name to specify the default instance on the local machine.
Note: When working in a clustered SQL Server environment, the Instance Name is the SQL Server cluster's instance name. See Cluster Handling for more information.
b) Select Use Windows Authentication to log in to the SQLServer instance using the same Windows credentials as used by the Foglight Host Service. When using Windows authentication the following two fields (SQLServer User Name and Password) are ignored.
Note: The installation default for the credentials used by the Foglight Host Service is the BUILTIN\administrator account.
Otherwise, clear the Use Windows Authentication flag and then enter an existing SQLServer User Name and SQLServer Password into the following two fields.
Note: This SQLServer user will typically be a DBA user. This user requires:
- create/read/write access for the work and temp databases
- read access to master, msdb and all monitored databases
- execute permission for xp_cmdshell.
c) Enter the SQLServer User Name if not using Windows authentication.
d) Enter the SQLServer Password if not using Windows authentication.
e) In the Work Database Name field, type the name of the database within the instance that the SQLServer agent is to use as its work area.
Note: The database name must not contain any spaces.
Note: If you have Quest Software's I/Watch product installed, you should not use the same database due to compatibility issues.
f) In the Work Database Data Location field, type the directory path name to be used to store the work database data files. Leave the field blank to use the location of the SQLServer master database.
g) In the Work Database Log Location field, type the directory path name to be used to store the work database log files. Leave the field blank to use the Work Database Data Location specified in the previous field.
h) In the Cluster Group Name field, type the name of the cluster resource group. Leave the field blank if you are not using clusters.
Note: The Cluster Group Name is the cluster resource group name as seen in the Windows Cluster Administrator for this cluster. This is the name of the group to which the target host belongs, and hence can be seen in the Windows Cluster Administrator under the host node "Active Groups". See Cluster Handling for more details.
i) In the SQL Connection Timeout (s) field, type the timeout value between 30 and 3600 seconds. This is the number of seconds that you are prepared to wait for a connection to the instance. The supplied value is 60.
4. Click OK.

Setting the Data Management Parameters

Use the options on the Data Management tab to set the SQLServer agent data management parameters. Here you can:
- set the sampling frequency for collections
- set the period for purging collected data from tables in the Foglight database

The agent collects data for each table periodically. This period is called the Sample Frequency. The Sample Frequency value might more accurately be described as a sample interval, that is, the elapsed period between collections.

The Foglight server periodically (typically overnight) purges old table data. Data in a table that is older than the Purge Frequency will be purged. The Purge Frequency value might more accurately be described as a retention period, that is, the elapsed period for which data is retained before being purged.

Modifying these parameters affects the amount of data collected and retained in the Foglight tables, and also has load implications for your SQLServer instance. For more information see Managing Data.

The sample and purge values in this dialog are "agent wide" values, that is, they apply to all collections that do not have overrides. You can set override values for specific collections using the SQL Collection Override list.

To set the data management parameters

1. Right-click the SQLServer agent icon and choose Edit, ASPs.

2. Click the Data Management tab.
3. Complete the fields:
a) In the Global Sample Frequency field, type how often you want the SQLServer agent to collect data (in seconds). This must be between 30 and 3600 seconds.
b) In the Global Purge Period field, type the purging period (in days). This is how long data is kept in the Foglight tables before it is purged. Type 9999 if you do not want data purged.
c) Click the down-arrow and choose an override table name from the list. The agent comes shipped with a number of collection models (override tables). These are based on various hypothetical site requirements and provide you with different reporting and alerting scenarios. They can and should be modified to suit your specific needs. For more information, see Selecting a SQLServer Agent Collection Model.
Alternatively, you can create a new override table. In this case simply type the new table name. A new table is then provided for you to edit as required. This new table is based on the contents of the default table SQL_Collection_Overrides_Standard.
d) Click the Edit button located next to the Overrides field to edit the override table. This allows you to set sampling or purging values for specific collections. See Overriding default sampling and purging values.
4. Click OK.
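The interaction between the sample interval and the retention period can be sketched in Python. This is an illustrative calculation only, not Foglight code: the class and method names are hypothetical, the lower bound on the purge period is an assumption, and the rows-per-collection figure must come from your own environment.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
NEVER_PURGE_DAYS = 9999  # value the guide gives for "do not purge"


@dataclass(frozen=True)
class DataManagementSettings:
    """Hypothetical model of the two global Data Management values."""
    sample_frequency_s: int  # Global Sample Frequency, in seconds
    purge_period_days: int   # Global Purge Period, in days

    def __post_init__(self) -> None:
        # Range taken from the guide: 30 to 3600 seconds.
        if not 30 <= self.sample_frequency_s <= 3600:
            raise ValueError("sample frequency must be between 30 and 3600 seconds")
        # Assumption: the guide does not state a minimum purge period.
        if self.purge_period_days < 1:
            raise ValueError("purge period must be at least 1 day")

    @property
    def purging_enabled(self) -> bool:
        return self.purge_period_days != NEVER_PURGE_DAYS

    def samples_per_day(self) -> float:
        """Number of collections per day at this sample interval."""
        return SECONDS_PER_DAY / self.sample_frequency_s

    def retained_rows(self, rows_per_collection: int) -> float:
        """Approximate rows held for one collection once retention is full."""
        if not self.purging_enabled:
            raise ValueError("data is never purged, so the row count keeps growing")
        return self.samples_per_day() * rows_per_collection * self.purge_period_days


if __name__ == "__main__":
    settings = DataManagementSettings(sample_frequency_s=300, purge_period_days=30)
    print(settings.retained_rows(rows_per_collection=10))
```

Under these assumptions, doubling the sample interval halves the number of retained rows, while doubling the purge period doubles it.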

Setting the Service Checks

Use the options on the Service Checks tab to check for the availability of the following services:
- Server Mail
- Agent
- Agent Mail
- DTC
- FTS
- OLAP

To set the service checks

1. Right-click the SQLServer agent icon and choose Edit, ASPs.
2. Click the Service Checks tab.
3. Enable the services you want checked. The status of checked services is displayed in the Service table. By default, the Check Server Mail option is not selected. If any of the services are not installed, the SQLServer agent generates errors to warn you of this. To stop these errors being generated, clear the services that are not installed.
4. Click OK.

Setting the Collection Parameters

Use the Collection Parameters tab in the SQLServer ASP dialog to set various collection-specific parameters.

To set the collection parameters

1. Right-click the SQLServer agent icon and choose Edit, ASPs.
2. Click the Collection Parameters tab.
3. Complete the fields:
a) Select Log agent to debug file if you want additional debugging information about the Foglight SQLServer agent logged. Information is logged to a file called <agentname>.log. For more information see Debugging and Troubleshooting.

b) In the SQL Statement Timeout (s) field, type the timeout value between 30 and 3600 seconds. This is the number of seconds that you are prepared to wait for any one collection to complete. The default value is 120.
c) In the SQL Command To Be Timed field, type the text of the SQL command that you want timed. The time taken by the database to complete this SQL command is displayed in the Response table.
Note: Do not enter an SQL statement here that is going to time out (see SQL Statement Timeout above), otherwise the SQL Response Time rule will not fire as intended.
d) In the Maximum Trace Rows field, type the maximum number of rows to be displayed in the Trace collection table. Reducing the value in this field limits the amount of data collected into the Trace collection table.
Note: The Trace collection is disabled by default. For information about enabling a collection, see Overriding default sampling and purging values.
e) Set the following four parameters to identify poorly performing (PP) SQL statements reported by the Trace collection. For an SQL statement to be reported as a poor performer, all four parameters must be exceeded.
- In the Trace PP Duration (ms) field, type the number of milliseconds.
- In the Trace PP CPU Usage (ms) field, type the number of milliseconds of CPU usage.
- In the Trace PP Logical Disk Reads field, type the number of reads.
- In the Trace PP Physical Disk Writes field, type the number of writes.
f) Select Trace system objects if you want the Trace collection to include all system objects in the collection. By default these rows are filtered out.
g) Select Trace Quest objects if you want the Trace collection to include all Quest Software's Foglight and Spotlight objects in the sample. By default these rows are filtered out.
h) In the Maximum Blocking Rows field, type the maximum number of rows to be displayed in the Blocking collection table. Reducing the value in this field limits the amount of data collected into the Blocking collection table.
i) Click the Edit button located next to the Errlog Search Strings field to add, modify or remove error strings from the search list. This list specifies the SQLServer error log entries that will be collected by this agent. A comprehensive list of search strings is supplied by default.

j) Click the Edit button located next to the Errlog Exclusion List to add or remove errorlog search strings from this list. This list specifies the errorlog search strings you want discarded from those collected (see above).
k) Click the Edit button located next to the Database Exclusion List to add or remove databases from this list. Use this to limit the number of databases monitored by the SQLServer agent. By default, the following databases are excluded: tempdb, pubs, Northwind, model. The database exclusion list affects the Database, Filegroup, File and Backup collection tables. If the Monitored Databases Rule raises an alert, you may use the Database Exclusion List to reduce the number of databases being monitored.
l) Click the Edit button located next to the Table ID List to add or remove table IDs. This list specifies which tables are to be monitored by the TableSize collection table. By default this list is empty.
m) Click the Edit button located next to the Job Exclusion List to add or remove job names. This list specifies the job names to be excluded from the Jobs collection table. By default this list is empty.
4. Click OK.

Note: You can create your own personalized Secondary ASP lists to suit your requirements. Simply type the name you want to give your list in the relevant secondary ASP field and click Edit. A new list based on the default list opens, ready for you to edit as required.

Selecting a SQLServer Agent Collection Model

This section describes four collection models shipped with the Foglight SQLServer agent. These models are based on various hypothetical site requirements, and are supplied as override tables (see Overriding default sampling and purging values). These frequencies can and should be modified as needed to meet the specific alerting and reporting requirements for your particular site, balancing the need for timely alerts with system and Foglight server loads.
SQLServer agent Collection Models

There are four collection models shipped with the SQLServer agent and available for selection. These are:
- Standard: Standard values balancing all general requirements. This is the default model used.
- DBDetail: Concentrating just on real-time DB statistics and performance.
- ProcDetail: Concentrating just on jobs and processes.
- Availability: Concentrating just on the availability of core resources.

To select a Collection Model

1. Right-click the SQLServer agent icon and choose Edit, ASPs.
2. Click the Data Management tab.
3. From the drop-down list next to Overrides select the collection model you want to use.
4. Click OK.

When you select a model to use, you can vary individual collection intervals within that model to suit your needs (see Overriding default sampling and purging values). You can view some statistical information about each individual collection in the Managing SQLServer Agent Collection Data topic.

Collection Categories

Each collection is loosely associated with one of the following categories. This grouping is purely informational and does not constitute any physical grouping.
- Intrinsic: These collections are concerned with data that is static, essentially static or else derived from the run state of the agent. The customer should not normally adjust the frequencies for these collections.
- Availability: These collections are concerned with the availability of core components.
- Database: These collections are concerned with database specifics.
- Process: These collections are concerned with job and process specifics.
- Response: These collections are concerned with user response specifics.

- Statistic: These collections are concerned with various SQL Server statistics that do not fit into other categories. The Cache collection is usually disabled by default due to its high cost.
- Perfinfo: These collections are concerned with raw SQLServer perfinfo counters. These break down into Statistic (AMPerfInfo, BMPerfInfo, MMPerfInfo) and Database (DBPerfInfo) categories.
- Diagnosis: These collections are concerned with troubleshooting specific bottlenecks. Both collections (TableSize and Trace) are disabled by default due to their high cost.

"As-Shipped" Sample Frequencies for Collection Models

CATEGORY (INFO ONLY)  COLLECTION TABLE NAME  STANDARD MODEL FREQ  AVAIL MODEL FREQ  DBDETAIL MODEL FREQ  PROCDETAIL MODEL FREQ
Intrinsic             System
                      SQLConfig
                      CollectionStatus       n/a                  n/a               n/a                  n/a
Availability          Availability
                      DBStatus
                      Service
Database              DBMonCount
                      Database
                      Filegroup
                      File
                      Backup
                      General
Process               Jobs
                      Errorlog
                      Replication
                      LogShipping
                      Blocking
Response              Response
                      Connection
Statistic             Statistic
                      LockRate
                      Memory
                      TopUsers
                      Lock
                      Cache
Perfinfo              AMPerfInfo
                      BMPerfInfo
                      DBPerfInfo
                      MMPerfInfo
Diagnosis             TableSize
                      Trace

Managing SQLServer Agent Collection Data

The Foglight SQLServer agent is installed with "global" collection sample frequencies and purging periods as well as specific collection override values. You can change these sample frequencies and purging periods to suit your specific monitoring and reporting requirements. These may focus on long-term reporting, monitoring of service availability, general day-to-day performance monitoring, or some other specific requirement (see more notes about this below).

The agent is shipped with a number of "collection models" which provide a variety of reporting options using combinations of collections and sample intervals. These models can be modified as required. For more information see Selecting a SQLServer Agent Collection Model.

The following table shows the standard installed default values for Sample Frequency and Purge Period, along with additional information about the volume of data collected by each collection.

COLLECTION TABLE NAME | STANDARD SAMPLE FREQUENCY | PURGE PERIOD (D) | BYTES PER ROW | ROWS PER COLLECTION
AMPerfInfo
Availability          <=
Backup                Varies
Blocking              Varies
BMPerfInfo
Cache                 <=20
CollectionStatus      *
Connection
DBMonCount
Database              Varies
DBPerfInfo            per db
DBStatus              No. of accessible dbs
Errorlog              Varies
File                  Varies
Filegroup             Varies
General               Varies
Jobs                  Varies
Lock                  <=6 per db
LockRate
LogShipping
Memory
MMPerfInfo
Replication
Response
SQLConfig             <=43
Service
Statistic
System
TableSize             Varies
TopUsers
Trace                 Varies

When changing any of these values you should consider the following factors:
- the amount of space consumed in the Foglight server database by the collections
- how often the collected data is likely to change
- how often you want the associated rules to be evaluated
- how far back in time you wish to graph the data
- how fine-grained you wish the graphs to appear
- the CPU load (relative expense) of the collection.

For example, if you double a particular sample interval, you halve the data volume over time, but any rule associated with that collection is not evaluated until the longer interval has elapsed, and any graph of that collection data is half as fine-grained as before. Sample intervals should be chosen with these tradeoffs in mind, along with consideration of how often the collected data is likely to change in your environment. Purging periods should be chosen with regard to your specific reporting requirements, and with consideration of the storage space required by the historical data.

Note: The CollectionStatus collection can help you to determine the relative expense of collections (DurationMS field), and also shows you the actual number of rows stored (RowsStored field).

Note: An overall limit of 4096 rows per collection is enforced to protect against data flooding. Reaching it would be a very rare occurrence. The 4096 row limit is really applicable only to the Errorlog, Backup, TableSize and Jobs collections, as all other collections have either a fixed number of returned rows or a lower limit. To ensure that this limit is not reached for those four collections, make sure that your choice of sample interval is appropriate.

Note: In addition to the volume of data collected per row, there is also a 72 byte one-off overhead per table.

SQLServer Agent Debugging and Troubleshooting

There may be times when the Foglight support staff require supplementary information about the SQL Server agent in order to resolve issues that arise. This topic describes how the support staff can obtain such information and is not generally applicable to normal Foglight users unless directed here by the support staff.

ASPs for debugging

Debugging
Set this ASP to true to enable logging to the agent debug log file. You do this by selecting Log to agent debug file on the Collection Parameters ASP tab. The agent debug log file provides developers and support engineers with a trace of the execution of the agent executable. The log file resides in the Foglight Client bin directory and is named <AgentInstanceName>.log, where <AgentInstanceName> is the agent name specified when an instance of the agent is created (that is, with the New->Agent menu option). This agent instance name defaults to SQLServer, so the name of the log file is usually SQLServer.log.
Note 1: The agent debug log file is appended to if it already exists, and is never truncated. Beware of disk space usage!
Note 2: This agent uses the existing ErrWorld and agent.log logging paradigms for warning, critical and fatal messages. If debugging is enabled, any messages that are sent via those paradigms are also duplicated in the agent debug log file, that is, the agent debug log contains all debug, warning, critical and fatal messages.

EncryptStoredProcs
Set this (hidden) ASP to false to be able to inspect or modify the stored procedures within the work database. By default the stored procedures within the work database are encrypted.

DecryptSPFile
Set this (hidden) ASP to false to be able to load an unencrypted databaseobjects.sql file. By default this file is expected to require decrypting at load time. There is also a utility executable called crypt available (contact us) for encrypting or decrypting the databaseobjects.sql file.

To modify hidden ASPs

Modifying hidden ASPs should be done with care and only on the specific instructions of a Quest support engineer.
1. Stop the existing agent and delete it.
2. From the Foglight console menu bar, select Tools and then Agent Browser.
3. Expand the Database node and select SQLServer.
4. Expand the ASP node and select SQLServer.
5. Select the ASP to be modified and change the default value appropriately.
6. Save the changed ASP.
Any new agent will now get the changed ASPs.

Agent performance issues

Memory: 6.7 MB (approximately)
Handles: 150 (approximately)
CPU: Most of the time the agent is idle and waiting. While it is performing a particular collection it uses more CPU for the duration of that collection. CPU usage is therefore seen as a series of spikes.

The amount of work that the agent has to do depends mostly on:
- the collection sample frequencies
- the number of databases (within the instance) being monitored
- the size of (that is, the number of objects contained in) each DB
- the frequency of events that cause alerts to be raised.

Performance testing on a heavily loaded 4 CPU system running SQL Server 2000 showed that the majority of collections were finished within 5 seconds. On a lightly loaded system, the figures are much lower.

Note: The Cache and Trace collections are turned off by default (that is, their sample frequency is set to zero) due to their large relative processing expense. For the same reason the TableSize collection is scheduled to run only once every 4 hours.

Work database issues

You can force the stored procedures to be reloaded by deleting the work database (using the SQLServer Enterprise Manager), and then (re)starting the agent.

Be aware that if Quest Software's I/Watch product is installed, the SQLServer agent must not use the same work database due to compatibility issues.

The transaction log file for the work database will eventually fill up, causing collections to fail with the MS SQLServer 9002 error, unless you perform periodic backups or periodically drop the database.

The work database must have the same collation as the master database. A work database that is created by the agent has the correct collation. However, if you specify the name of an existing database to use as the work database, you must ensure that its collation matches that of the master database. Collation conflicts cause collections to fail with the MS SQLServer 446 error. To find out which collation each database uses, use the MS SQLServer Query Manager (or a similar query execution tool) to execute sp_helpdb. The database's collation appears in the (very wide) column called status.

It is important that you perform database backups regularly to reduce the size of the database transaction logs. The transaction log for the Foglight work database grows over time. If you also have Spotlight on SQLServer installed, this transaction log may grow very quickly. Periodic maintenance of the work database (assuming it is called QuestSoftware) should include the following, after first stopping the agent and ensuring no users are accessing QuestSoftware.
    USE QuestSoftware;
    CHECKPOINT
    BACKUP LOG QuestSoftware WITH TRUNCATE_ONLY
    DBCC SHRINKDATABASE (QuestSoftware, TRUNCATEONLY)

Other useful information for debugging and tuning

The agent log contains user-oriented messages about the status of the agent. This is not to be confused with the agent debug log file described earlier. An agent may have a status of Running, Stopped, or Broken. Viewing the agent log can help you understand why the agent is in a certain state. To view the agent log, right-click on the agent and select Show Log.

The CollectionStatus Table provides some useful agent debugging and tuning information.

Individual collections can be disabled by setting their sample frequency to zero. This may help you to isolate a problematic collection. See Overriding default sampling and purging values.

The Connection ASP tab (see page 3) includes an SQL Connection Timeout ASP that you can set anywhere between 30 and 3600 seconds. The default setting is 60 seconds.

The Collection Parameters ASP tab (see page 9) includes an SQL Statement Timeout ASP that you can set anywhere between 30 and 3600 seconds. The default setting is 120 seconds. Note that in high SQL load situations, a collection may fail due to contention for SQL resources. Although this is not exactly the same as an SQL statement timeout, it is interpreted and acted upon exactly as if it were one.

The DBCC UPDATEUSAGE utility is usually run by the DBA after operations such as backups or index reorganizations. This ensures that the internal statistics are reinitialized. It corrects the rows, used, reserved, and dpages columns of the sysindexes table for tables and clustered indexes (this size information is not maintained for nonclustered indexes). If this utility is not run, these statistics can be incorrect and may falsely trigger alerts. For information about the DBCC UPDATEUSAGE utility, refer to the SQLServer Books Online help system.

Faulty performance counters

For a variety of reasons outside of the Foglight agent's control, the SQL Server performance counters (in the master.dbo.sysperfinfo table) may be inaccessible, incomplete or defective. During the System collection (performed at agent startup), the master.dbo.sysperfinfo table is checked and, if found to be faulty, a warning is sent to the agent log.
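If the agent log reports faulty counters, a DBA may want to look at the counter table directly from a query tool. The following is a minimal sketch of such a manual check, not the check the agent itself performs. The column names are those of the SQL Server 2000 sysperfinfo table; treating an empty table or a missing counter object as "faulty" is an assumption made for illustration.

```sql
-- Manual sanity check of the performance counter table (illustrative only).
-- A count of zero suggests that the counters are not loaded at all.
SELECT COUNT(*) AS counter_rows
FROM master.dbo.sysperfinfo;

-- List the counter objects that are present and how many counters each holds.
-- A missing object (for example the buffer manager or memory manager group)
-- would be consistent with a collection that returns no rows.
SELECT object_name, COUNT(*) AS counters
FROM master.dbo.sysperfinfo
GROUP BY object_name
ORDER BY object_name;
```

Comparing the output with that of a healthy instance of the same version is usually the quickest way to see which counter objects are absent.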
Faulty performance counters will result in problems for the following collections:

• AMPerfInfo (access method): No rows are returned.
• BMPerfInfo (buffer manager): No rows are returned.
• DBPerfInfo (database): No rows are returned.
• MMPerfInfo (memory manager): No rows are returned.
• LockRat: No rows are returned.
• Memory: All columns except BufferCacheHitRatio, ProcCacheActiveMB and ProcCacheHitRatio will be zero.
• Statistic: All columns will be zero.

Usually the missing performance counters can be restored on SQLServer by performing the following steps:

1. Open a command prompt in the SQLServer Binn directory, for example, C:\Program Files\Microsoft SQL Server\MSSQL\Binn
2. Determine the drivername from the sqlctr.ini file by running:
       findstr drivername sqlctr.ini
   Typically the drivername will be 'MSSQLServer' for unnamed instances.
3. Unload the performance counters by running:
       unlodctr MSSQLServer
   Substitute your drivername here as appropriate.
4. Reload the performance counters by running:
       lodctr sqlctr.ini
5. Restart the SQLServer instance.

Note  These steps are for non-clustered SQL Servers only.

You can check for successful unloading and loading messages in the application log within the Windows Event Viewer. If you get an unexpected error message, look it up at msdn.microsoft.com.

In the event that the above process does not remedy the missing performance counters, see article in the Microsoft Knowledge Base for an alternative possible solution.

Log shipping failure alerts

Log shipping creates jobs that run on both the source and destination SQLServer. The agent residing on the SQLServer Log Shipping Monitor Server raises an alert if it detects that log shipping has failed. Agents residing on the source or destination server also raise alerts for any log shipping job that has failed on that server.

The SQLServer Log Shipping Monitor Server may also write multiple repeating entries to the SQLServer Error Log, so the agent may raise multiple alerts for a single failure. You may therefore wish to filter out this particular class of error log message from the ErrorLog collection. For filtering instructions see Editing the Errlog Search Strings list.

SQLServer Agent Cluster Handling

The Foglight SQLServer Agent monitors the involvement of the host node within a cluster, and also monitors the role of the SQLServer instance within the cluster.

Installation

In a clustered scenario, all the SQLServer instances that are accessible from the cluster nodes should be monitored. To achieve this, install the following on each node in the cluster:

• The Foglight Host Services (RAPSD).
• A Foglight SQLServer Agent for every SQLServer instance that is installed within the cluster.

The Foglight Host Services and the Foglight SQLServer Agent must be installed on the node's local disk, rather than on the cluster disk array where the SQLServer instance is typically installed.

We recommend that you configure the Cluster Service to start before the Foglight Host Service. If the Foglight Host Service is configured to start automatically at boot time (which is the default), it may start up before the Cluster Service, causing the agent to report the SQLServer instance status incorrectly.

Before starting each agent for the first time, do the following:

1. Select the Connection Details ASP tab.
2. In the Instance Name field, enter the clustered SQLServer's instance name.
   Note  This must be a local instance name, as the cluster state cannot be monitored remotely.
3. In the Cluster Group Name field, enter the name of the cluster resource group that the SQLServer is in.
   Note  The Cluster Group Name is the cluster resource group name as seen in the Windows Cluster Administrator for this cluster. This is the name of the group to which the target host belongs, and hence can be seen in the Windows Cluster Administrator under the host node "Active Groups".
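Before filling in the Instance Name field, it can help to confirm how the instance identifies itself. The query below is a sketch under stated assumptions: it uses the standard SERVERPROPERTY function available in SQL Server 2000, and how its output maps onto the value the agent expects in the Instance Name field is an assumption, not something this guide specifies. The Cluster Group Name cannot be obtained this way and still has to be read from the Windows Cluster Administrator.

```sql
-- Run while connected to the SQLServer instance that the agent will monitor.
SELECT
    SERVERPROPERTY('IsClustered')  AS is_clustered,   -- 1 if the instance is clustered, 0 if not
    SERVERPROPERTY('MachineName')  AS machine_name,   -- virtual server name for a clustered instance
    SERVERPROPERTY('InstanceName') AS instance_name;  -- NULL for the default (unnamed) instance
```

A result with is_clustered = 0 suggests that the cluster-specific ASP fields do not apply to that instance.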

Some common cluster scenarios and IP Map examples

Active/Passive: One SQLServer instance is installed within the cluster. The instance on Box1 is active and the instance on Box2 is passive. In the event of Box1 failing, Box2 will become active.

• The agent running on Box1 is collecting data from the active SQLServer instance on Box1.
• The agent running on Box2 is in quiescent mode, only monitoring the status of the passive SQLServer instance on Box2.

Active/Active: Two separate SQLServer instances are installed within the cluster so that each node has one active SQLServer instance. This is commonly done to share workloads between the two nodes. Both nodes and both SQLServer instances are normally active. If one node fails, the SQLServer instance on it is failed over to the other node.

• The SQLServer_1 agent running on Box1 is collecting data from the active SQLServer instance on Box1.
• The SQLServer_2 agent running on Box1 is in quiescent mode, only monitoring the status of the passive SQLServer instance on Box1.
• The SQLServer_1 agent running on Box2 is in quiescent mode, only monitoring the status of the passive SQLServer instance on Box2.
• The SQLServer_2 agent running on Box2 is collecting data from the active SQLServer instance on Box2.

Operational notes

If an SQLServer instance is active on a clustered node, the agent (residing on that same node) performs normal monitoring of the instance.

If an SQLServer instance is inactive on a clustered node, the agent (residing on that same node) only monitors the cluster status of the instance, and does not perform any SQL dependent collections. The agent is in quiescent mode.

In the event of a failover, the agent on the failed node issues an alert and then enters quiescent mode, whilst the agent on the newly activated node takes over the normal monitoring of the (now active) SQLServer instance.

As described above, in the event of a failover, the agent does not shut down and then restart on another node in the cluster.

Associated tables and rules

The Availability collection table shows whether the SQLServer instance is running, and its role within a cluster.

The Cluster Failover rule issues a fatal alert when the cluster resource group becomes inactive within the cluster. The alert raised shows the name of the newly active node and provides advice.

The Cluster Node Status rule issues a warning alert when the cluster resource group becomes inactive within the cluster. The alert raised shows the name of the newly active node.

When clustered, the SQL Server Running rule issues a fatal alert if the cluster resource group's role is Active but the cluster resource group is not fully online (as seen by the Windows Cluster Manager).

SQLServer Agent Collection Error Return Codes

These are the possible error return codes from an SQLServer agent collection.

CODE  NOTES
  0   Successful execution and collection.
 -1   Internal software error. The SQLServer agent will shut down. In the rare event that this should occur, contact Quest Support at support@quest.com
 -2   No rowset returned. The agent will retry on the next scheduled collection.
 -3   Invalid rowset returned. The agent will retry on the next scheduled collection.
 -4   A serious SQL error suggesting that the connection has failed. The agent will then attempt to re-establish a connection to the SQLServer.
 -5   An ordinary SQL error. The agent will retry on the next scheduled collection.
 -6   A timeout. The agent will retry on the next scheduled collection.
 -7   An unexpected number of rows returned. The agent will retry on the next scheduled collection.

Note  The error codes -2, -3, -5 and -7 may sometimes be returned when a timeout or contention has occurred during the execution of the query. However, the agent is unable to discriminate between this event and other possible causes, so if these error codes persist in subsequent collections, contact Quest Support at support@quest.com

Editing the Errlog Search Strings list

Use this list to specify which SQLServer errorlog entries are to be alerted on. By default, this list contains all the anticipated strings, and the standard severities 1 through 25. You can add, delete or edit strings and assign the relevant severity so that Foglight raises an alert when these errors are logged by SQLServer. The Errorlog Exclusion List secondary ASP allows you to specify errorlog strings that you want discarded from the returned list.

To edit the error log search strings list

1. Right-click the SQLServer agent icon and choose Edit, ASPs.
2. Click the Collection Parameters tab.

3. Click the Edit button located next to the Errlog Search Strings field.

To add an errorlog string:

a) Click New.
b) The Add dialog opens.
c) In the String field, type the search string.
   Note  Error log entries that match the search string anywhere in the entry will be alerted on.
   Note  The search string must not contain any single quote or pipe characters, and wildcards are not accepted.
d) From the Severity list, select a severity.
e) Click OK.
f) If you want to add another error string, click Apply and then repeat steps c to e. If you are finished, click OK.

To edit a string:

a) Select a string from the list.
b) Click Edit.
c) Make your changes.
d) Click OK.

To delete a string:

a) Select a string from the list.
b) Click Delete.
c) Confirm the deletion by clicking Yes.
d) Click OK.

Editing the Errlog Exclusion list

You can specify a list of errorlog search strings to be excluded from the Errorlog collection table using the Errlog Exclusion List on the Collection Parameters tab. Each string in the list is searched for and located wherever it appears in the error log entry. These strings are then removed from the strings returned by the Errlog Search String collection.

To edit the error log string exclusion list

1. Right-click the SQLServer agent icon and choose Edit, ASPs.
2. Click the Collection Parameters tab.
3. Click the Edit button located next to the ErrLog Exclusion List.

To add a string to the exclusion list:

a) Click New.
b) The Add dialog opens.
c) In the Errorlog Exclusion String field, type the string you want to exclude.
   Note  When typing a string to be excluded, you must use the correct case if the SQLServer instance has been installed as case-sensitive.

d) If you want to exclude another string, click Apply and then repeat step b. If you have listed all the strings you want to exclude, click OK.

To delete a string from the list:

a) Select the name of the string you want to delete.
b) Click Delete.
c) Click Yes to confirm deletion.
d) Click OK.

To edit a string in the list:

a) Select the name of the string you want to edit.
b) Click Edit.
c) In the Errorlog Exclusion String field, modify the string.
d) Click OK.

Excluding databases from being monitored

You can stop the SQLServer agent from monitoring specific databases by adding the database name to the Database Exclusion List. By default, the following databases are excluded: Tempdb, Pubs, Northwind, Model.

The Database Exclusion List is used by the DBMonCount, Database, Filegroup, File, General and Backup collection tables.

If the Monitored Databases Rule is activated, you can use the Database Exclusion List (see page 27) to reduce the number of databases being monitored.

To edit the Database Exclusion list

1. Right-click the SQLServer agent icon and choose Edit, ASPs.
2. Click the Collection Parameters tab.
3. Click the Edit button located next to the Database Exclusion List.