Fifty Critical Alerts for Monitoring Windows Servers Best practices

The importance of consolidation, correlation, and detection
Enterprise Security Series White Paper
Publication Date: Jan 31, 2007
6990 Columbia Gateway Drive, Suite 250, Columbia MD 21046
877.333.1433

Abstract

How important is it for your organization to stop an intrusion immediately? How important is it for your organization to keep your critical applications up at all times? This document identifies and describes the most important events generated by your Windows servers so they can be addressed and corrected by IT personnel in the most efficient manner. The strategic benefit of monitoring these critical events, combined with a robust resolution strategy, is a significant reduction of IT costs while ensuring increased service availability and enhanced security of your enterprise.

The information contained in this document represents the current view of Prism Microsystems Inc. on the issues discussed as of the date of publication. Because Prism Microsystems must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Prism Microsystems, and Prism Microsystems cannot guarantee the accuracy of any information presented after the date of publication.

This document is for informational purposes only. Prism Microsystems MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, this paper may be freely distributed without permission from Prism, as long as its content is unaltered, nothing is added to the content and credit to Prism is provided.

Prism Microsystems may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Prism Microsystems, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The example companies, organizations, products, people and events depicted herein are fictitious. No association with any real company, organization, product, person or event is intended or should be inferred.

© 2007 Prism Microsystems Corporation. All rights reserved. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Overview

IT departments today face many additional duties and responsibilities, and IT managers are increasingly being held responsible for end-customer satisfaction. System administrators encounter daily challenges maintaining server security and uptime, and doing both effectively is difficult and time-consuming.

When a service-impacting event occurs on a critical server, the faster it is detected and the system administrator notified, the faster the issue can be fixed. In many cases the issue can be prevented from causing a service disruption.

So, how do you get instant event notification to the right person at the right time? The key is to identify the events that your IT staff needs to know about immediately and then automate the creation of alerts for those events. These critical alarms shorten service outage times and lower costs by reducing the time it takes IT Operations to respond to, and resolve, the issue.

The EventTracker product from Prism Microsystems, Inc. is a reliable, proactive and practical enterprise-class solution to centrally monitor, analyze and manage events. EventTracker provides instant notification for many out-of-the-box critical events for servers. In addition, you can create customized alert rules specific to your business and tune the rules so that false positive alerts are minimized. These alerts notify the right person at the right time through multiple notification methods, including email, pager, audible alarm, or an SNMP trap notification to an enterprise console such as HP OpenView or Tivoli.

Automating the monitoring of your critical systems produces the best of all worlds. Automation is less expensive and less resource intensive than manual processes. It frees resources to work on other priorities, while ensuring that problems in critical services can still be detected faster and addressed sooner.

The EventTracker Solution

EventTracker includes the fifty alerts that are most critical for your IT security and the operation of your enterprise. Each entry below gives the alert name followed by its description.

1. Disk Space is critically low
   This alert is generated when the system is running low on logical disk space. By default, 80% full is considered the warning point; the threshold is a configurable parameter.

2. Critical service is not running
   Monitoring the availability of critical services is vital for remote server diagnosis and problem resolution. Critical services being stopped during unusual hours of operation can also be a warning sign of intrusion.

3. Critical service could not be started
   This alert indicates that a critical service configured in EventTracker for automatic restart failed to start. An alert of this nature needs immediate action from system administrators.

4. Detected high memory usage
   This event is generated when memory usage exceeds a defined threshold. It alerts system administrators to examine the processes consuming the RAM.

5. Detected software <Some S/W> has been installed on this system
   Monitoring unauthorized software changes aids in early intrusion detection.

6. EventTracker agent service failed
   This alert notifies that the EventTracker agent service has failed and could not be restarted. Events from the system during this downtime could be lost unless guaranteed event delivery has been configured.

7. Domain policy changed
   This alert indicates a successful change to the Windows Active Directory security policies. This alert is also triggered when Group Policies are applied.

8. Active Directory: Group policy changed
   This alert indicates that a group policy or an OU policy has changed. The change may alter the permissions of Active Directory users.

9. Run away CPU process (a process consuming high CPU)
   A CPU-intensive process can adversely affect server performance by bogging down memory and slowing all database transactions, and can even bring the server to a halt. These alerts are critical for continued reliable performance and minimizing downtime.

10. Run Away Memory Process (a process is taking too much memory)
   This alert suggests that the running process may have a memory leak. It is important to monitor such a process closely.

11. Software uninstalled from a system
   Installation of unauthorized software packages can increase system vulnerability, resulting in virus attacks.

12. Excessive logon (Event ID 529) failures in your enterprise
   This event indicates an attempt to log on using an unknown user account, or a valid user account with an incorrect password. Concurrent occurrences of these events represent an attack on the enterprise.

13. Excessive Audit Failure message from a system
   Excessive audit failures on a system, or on a particular resource on a system, indicate a potential intrusion or a violation of a security policy.

14. Excessive access failures by a user
   Logon failures using accounts that have been locked can result in this intrusion alert.

15. Excessive access failures on a specific computer
   Sophisticated scripts run by hackers use a variety of user name and password combinations to get past Windows security. Logon failures on each system should be monitored closely.

16. Excessive access failures in your enterprise
   Enterprise-wide repeated logon failures in a short interval of time are a sure sign of intrusion.

17. Excessive file deletes on a computer
   This alert notifies that critical server data has been compromised.

18. Excessive VPN connection failure
   This alert indicates that someone may be persistently trying to access your VPN server to get into your network.

19. Too many concurrent requests to your web site
   This alert indicates that too many users are accessing your company web site at this time. Performance may be impacted, and the situation needs your attention.

20. Excessive logon attempts from a particular IP address
   A number of successive logon attempts from a single remote IP address is an indication of hacking activity. The source of the attack should be identified and blocked to prevent further attacks.

21. Excessive Ping failure (several systems are not reachable)
   Monitoring responses to ICMP packet requests and the receipt time of ICMP packets from each destination is essential for network performance tuning.

22. Excessive remote connections established on a local network service (port)
   Numerous unknown processes attached to local ports are sure signs of intrusion.

23. Excessive User lockout in your enterprise (ID=539)
   This event indicates a logon attempt for a locked account. It can indicate that a password attack was launched unsuccessfully, resulting in the account being locked out.

24. High CPU utilization (your system is running sluggishly)
   A continuous increase in system load is an indication of a potential problem. These alerts give a good head start on system performance tuning.

25. IIS: Logging Shutdown
   IIS logging shuts down when a disk full error is encountered. Administrators can either free some disk space on the logged drive or move log files to another location.

26. IIS: Server Stopped
   When users access an application from an ASP page, the underlying COM+ application fails if there is no user logged on to the IIS console. Administrators can quickly resolve this issue by specifying an appropriate user account.

27. IIS: World Wide Service Terminated
   This problem can occur if the Microsoft Distributed Transaction Coordinator (MSDTC) has been configured to use a certain range of ports for incoming requests, but the range that has been specified is not large enough.

28. ISA Server: All Port Scan detected
   This alert notifies that an attempt was made to access more than the pre-configured number of ports. One can specify a threshold indicating the number of ports that can be accessed.

29. ISA Server: Excessive Win Sock Applications open
   This alert is generated when the network system has run out of socket handles. WinSock applications that open and close sockets often without closing them properly can cause this error.

30. ISA Server: Failed to start service
   This alert indicates that ISA Server services failed to start. Analysis of the associated Windows events can help identify the cause.

31. ISA Server: Land attack
   This alert notifies that a TCP SYN packet was sent with a spoofed source IP address and port number that match the destination IP address and port. If the attack is successful, it can cause some TCP implementations to go into a loop that crashes the computer. If this alert occurs, server policy rules or packet filters should be configured to inhibit traffic from the source of the scans.

32. ISA Server: Network communication device may be down
   This event refers to a problem that has occurred at the data link level, or to a link connection that has been cleared. One should check for errors logged for data link or data communications hardware devices.

33. ISA Server: Out of band attack detected
   This alert is triggered by an out-of-band denial-of-service attack attempted against a computer protected by ISA Server. If mounted successfully, this attack causes the computer to crash or causes a loss of network connectivity on vulnerable computers.

34. ISA Server: Ping Attack
   This event occurs if a large amount of information has been appended to an ICMP echo request packet. If the attack is successful, it results in a kernel buffer overflow and a system crash. If this alert is received, one should create a protocol rule that specifically denies incoming ICMP echo request packets from the Internet.

35. ISA Server: Port scan detected on a well known port
   This alert indicates that an attempt was made to scan well-known ports on a computer to detect services running on those ports. If this alert occurs, one should identify the source of the port scan and check the access logs for indications of unauthorized access. If indications of unauthorized access are present, the system should be considered compromised and appropriate action taken.

36. ISA Server: Spoof Attack
   A spoof attack occurs when packets are received on an IP address that is not reachable via the interface. If logging for dropped packets is set, one can view details in the packet filter log.

37. ISA Server: UDP attack
   This alert occurs when there is an attempt to send an illegal UDP packet. A UDP packet that is constructed with illegal values in certain fields will cause some older operating systems to crash when the packet is received. If the target machine does crash, it is often difficult to determine the cause. Steps against this intruder activity include setting up a packet filter or policy rules to inhibit traffic from the source of the intrusion.

38. MSExchange: ADC service stopped
   This can be a mere informational event, or it can mean a service shutdown due to unexpected errors. If the service fails to start manually, administrators should analyze related error and warning messages in order to resolve the issue.

39. MSExchange: Database maximum size is reached
   This is normally logged after the database has shut down on reaching its capacity. The message means the server requires an upgrade to Enterprise server, or utilities must be run to free up space. A fix from Microsoft enables database extension by 1 GB.

40. MSExchange: IS Service cannot be started
   A critical error indicating that the Microsoft Exchange Information Store service failed to initialize.

41. MSExchange: Log disk is full
   This issue can occur with insufficient disk space on the drive that contains the databases that are being mounted.

42. MSExchange: Server cannot handle influx of mail
   This error alert is generated when another MTA service is attempting to send to an address that does not exist at the local server. It might be necessary to clean up AD with ADSI and rebuild the server.

43. MSExchange: Unable to start exchange server
   This error can result from a variety of faulty applications such as iexplore, dns, mmc, winlogon, etc. It requires application updates.

44. SQL Server: SQL server stopped
   Untimely service shutdown events of SQL Server and the SQL Server Agent service can be warning signs of intrusion.

45. SQL Server: Transaction log full
   These messages indicate that SQL Server cannot allocate the additional free space needed for expanding the database.

46. SQL Server: Backup failed
   Failing to perform backups within the given time frame exposes the server to the risk of data loss.

47. System is not reachable, it may be down
   Monitoring unreachable destinations is vital for network management.

48. System Resource exhausted
   This is a critical audit event indicating loss of audit records. Depending on the audit policy established, the loss is due to overwriting of earlier records or to cessation of auditing; it can also be caused by internal event queues exceeding their maximum length.

49. Back up failed
   This alert indicates that a backup operation has failed for some reason and immediate attention may be required.

50. Critical Web URL is not reachable
   This alert indicates that a certain critical Web URL may not be accessible. It may indicate that your web site is down.
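Many of the alerts above reduce to one of two rule shapes: a threshold on a measured value (the 80% disk space warning point in alert 1) or a count of similar events within a short interval (the "Excessive ..." alerts such as 12 through 16, 20 and 23). The Python sketch below illustrates both shapes under stated assumptions. It is not EventTracker's implementation or configuration format; the function names, the class name, and every parameter other than the 80% default are hypothetical and chosen only for illustration.

```python
import shutil
from collections import defaultdict, deque


def disk_usage_percent(path):
    """Return the percentage of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_disk_space_low(used_percent, threshold_percent=80.0):
    """Threshold rule: alert once usage reaches the configured percentage.

    The 80% default mirrors the warning point named in alert 1; treating
    "80% full" as "at or above 80%" is an assumption of this sketch.
    """
    return used_percent >= threshold_percent


class ExcessiveEventDetector:
    """Windowed-count rule (hypothetical helper, not an EventTracker API).

    Alerts when one key (a user, a computer, or a source IP address)
    produces at least `max_events` events within `window_seconds`.
    """

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._times = defaultdict(deque)  # key -> timestamps still in window

    def record(self, key, timestamp):
        """Record one event; return True if the key is now at the limit."""
        times = self._times[key]
        times.append(timestamp)
        # Discard events that have aged out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) >= self.max_events
```

As a usage illustration, a detector created with ExcessiveEventDetector(max_events=5, window_seconds=60) and fed failed-logon events keyed by source IP address approximates the idea behind alert 20. The values 5 and 60 are placeholders; in practice they would be tuned per environment to keep false positive alerts low.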

Summary

As IT departments are challenged with increasing security and server uptime, plus the added responsibility of ensuring end-customer satisfaction, it is becoming even more important for the appropriate staff to receive instant notification of critical server events. By employing EventTracker real-time alerts, IT managers can configure specific alerts to notify the right person, via the best method, of the events most critical to the organization. This allows IT staff to proactively prevent an intrusion, slowdown, or outage while still attending to other responsibilities.

For more information on EventTracker, visit www.eventlogmanager.com.