SolarWinds Comparison of Monitoring Techniques on Both Target Server & Polling Engine

Contents
  Executive Summary
  Why Should You Keep Reading (i.e.: why do I care?)
    SNMP polling (as compared to WMI)
    WMI polling (as compared to SNMP)
  Introduction
    Disclaimer
  SolarWinds Monitoring Impact on a Single Target
    Premise and Architecture
    Details
    Timeline and Graphical Results
  SolarWinds Monitoring Impact on a Polling Engine
    Premise and Architecture
    Details
    Timeline and Graphical Results

Executive Summary

SolarWinds' (relatively) new technique of monitoring Windows servers via WMI instead of SNMP represents a measurable, but manageable, impact on both the target and the polling engine.

On a target server, monitoring with WMI plus a SAM template had no effect whatsoever on RAM or CPU (compared to simple ping monitoring), although it did represent an average bandwidth increase of 12Kbps. The difference between WMI and SNMP polling was even less noticeable, with a 4Kbps bandwidth bump being the only observable effect.

On the polling engine the impact was more pronounced: monitoring 300 servers via WMI with a SAM template included (the most aggressive monitoring combination) resulted in the following increases compared to monitoring with simple ping:
- a 16% increase in average CPU utilization
- a 4% increase in average RAM usage
- a 4Mbps increase in incoming bandwidth

The difference between monitoring 300 machines with WMI vs. SNMP was even smaller on average:
- 6% CPU
- 2% RAM
- 2.5Mbps bandwidth received

Why Should You Keep Reading (i.e.: why do I care?)

If (as the executive summary states) the difference between WMI and SNMP polling is statistically negligible, then why the need for additional hand-wringing? Why not just make the switch and go? The answer is that the choice of polling method has other impacts beyond the physical toll on the machines involved. Functionally, there are some pros and cons to be weighed (the uptime point is illustrated in the sketch after these lists):

SNMP polling (as compared to WMI)
- CON: Cannot monitor Windows volume mount points
- CON: Challenges with earlier versions of Windows (NT, W2k)
- CON: Requires additional non-standard configuration actions (enabling the SNMP agent, etc.)
- PRO: Fewer ports for enterprise firewall rules
- PRO: No single point of failure for access
- CON: Changing the SNMP string requires enterprise-wide changes
- CON: Uses the SNMP service start time for uptime metrics
  - Work-around: set up a UnDP for hrSystemUptime
- PRO: Extremely efficient use of CPU, RAM, and bandwidth (on both target and poller)

WMI polling (as compared to SNMP)
- CON: WMI-only devices cannot use custom pollers (UnDP)
  - Work-around: if the machine has EVER been an SNMP-polled device, the SNMP info is retained and custom pollers can be used (at least until the SNMP RO string changes)
- PRO: Settings are used by SAM automatically
- CON: Significantly more firewall ports required
  - Work-around: per-server configuration can nail down WMI to just a couple of ports
- CON: Will not work across a NAT-ed WAN connection (VPN, etc.)
- CON: One password change in AD can cripple monitoring
- CON: Cannot monitor topology
- PRO: Uses the REAL reboot time for uptime metrics
- CON: Less efficient (vis-a-vis SNMP) use of CPU, RAM, and bandwidth on both target and poller
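To make the uptime difference concrete, here is a minimal sketch (Python, assuming the pysnmp and Windows-only wmi packages; the host name, community string, and credentials are placeholders, not values from our environment) that pulls all three sources: sysUpTime, which resets when the SNMP service restarts; hrSystemUptime, the UnDP work-around; and WMI's Win32_OperatingSystem.LastBootUpTime, the "real" reboot time.

```python
# Sketch: comparing the three uptime sources discussed above. Assumes
# pysnmp (4.x hlapi) and the Windows-only "wmi" package; the host name,
# community string, and credentials are placeholders.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)
import wmi  # requires pywin32; run this part from a Windows machine

TARGET = "server01.example.com"  # hypothetical target node

def snmp_get(oid):
    """Fetch a single OID via SNMPv2c and return its value."""
    err_ind, err_stat, _, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),   # placeholder RO string
        UdpTransportTarget((TARGET, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(oid))))
    if err_ind or err_stat:
        raise RuntimeError(err_ind or err_stat.prettyPrint())
    return var_binds[0][1]

# sysUpTime.0 resets whenever the SNMP service restarts...
print("sysUpTime:      ", snmp_get("1.3.6.1.2.1.1.3.0"))
# ...while hrSystemUptime.0 (the UnDP work-around) tracks the OS itself.
print("hrSystemUptime: ", snmp_get("1.3.6.1.2.1.25.1.1.0"))

# WMI reports the actual boot time of the operating system.
conn = wmi.WMI(computer=TARGET, user=r"DOMAIN\svc_mon", password="***")
print("LastBootUpTime: ", conn.Win32_OperatingSystem()[0].LastBootUpTime)
```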

Introduction

As we rolled out SolarWinds monitoring in our environment (about 5,000 servers and 3,000 network devices), the question of load, both on the target devices and on the monitoring infrastructure itself, became increasingly important. Even seemingly small additions, such as a single custom universal device poller, could have wide-ranging impacts when applied to 1,000 devices. We wanted to be able to respond with data to the concerns of both the application owners (who didn't want monitoring to rock the boat) and the monitoring team (who didn't want to turn on an option that looked nice on one or two systems but would crash everything when rolled out enterprise-wide).

Much of what we needed was already documented, either in the technical information or in online forums. However, when we looked for hard numbers regarding WMI we found less. When we asked experienced technical resources, "Can you show me the impact of turning on WMI in a large environment, and how that load compares to SNMP (or nothing)?" we received (at best) vague responses like "WMI is 5 times chattier than SNMP," and (at worst) responses that bordered on snarky: "Since we don't have 4000+ nodes in our test environment, it's difficult for me to tell you exactly what will be the impact of moving 4000+ nodes from SNMP to WMI polling."

So we ended up doing it ourselves. We broke the testing into two major areas of focus:
1) The impact of various monitoring methods on a single target server
2) The impact of various monitoring methods on a polling engine, when used on a significant number of target nodes

Because we own both NPM and SAM, the methods we focused on were:
- Ping
- SNMP standard collection
- WMI standard collection
- SAM monitoring

Disclaimer

These tests were designed and executed with exactly one goal: to answer my own curiosity and help me make the right decision for my project. They were not intended to be exhaustive or completely comprehensive. They had to be performed with the hardware we had at hand, in a relatively short timeframe, with minimal impact on both the infrastructure and my real task list. Your mileage may vary, caveat emptor, and don't forget to tip the wait staff.

SolarWinds Monitoring Impact on a Single Target

Premise and Architecture

We set out to answer the question of load on a single node (a Windows server, in this case) using 5 scenarios (a bare-bones sketch of scenario 1 appears at the end of this section):
1) Monitoring for ping only
2) Monitoring via ping and SNMP hardware collection on a standard Windows device
3) Monitoring via ping/SNMP plus a SAM template (perfmon, service, and eventlog)
4) Monitoring via ping/WMI plus a SAM template (perfmon, service, and eventlog)
5) Monitoring via ping/WMI hardware collection only

We wanted to avoid observer bias, where our monitoring of the server under load was causing more load than the monitoring that was generating the load. Therefore, we set up an extremely aggressive collection of hardware metrics via SNMP using one polling engine, and then let the server baseline itself at that level. Then we performed the real monitoring (i.e., the scenarios described above) using a different polling engine writing to a separate database (on a different server).

[Diagram: Poller A performs aggressive polling of the target node to observe changes, writing to Database A; Poller B performs normal polling using standard monitoring techniques, writing to Database B.]

This allowed us to change the monitoring scenarios while observing the effect of those changes on the target from a separate point of reference.

Summary of Results

Cutting to the chase: in the end, all of the various monitoring options had a negligible impact on the server.
- Overall, CPU ranged from 0 to 11% utilization, with the high point occurring during WMI + SAM monitoring.
- RAM varied only by 2% (from 22 to 24%).
- Bandwidth used by monitoring ranged between 5 and 35Kbps.
  - The only significant spike was in bandwidth used by WMI+SAM, which was higher by 10Kbps than any other monitoring technique.
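For reference, scenario 1 amounts to nothing more than a periodic ICMP echo. A minimal stand-in (Python, with a placeholder host name; this is only to make the method concrete, not what Orion actually runs) looks like this:

```python
# Minimal stand-in for scenario 1 (ping-only monitoring): one ICMP echo
# per poll cycle. Host name is a placeholder; "-n 1" is the Windows ping
# flag for a single echo (use "-c 1" on Linux).
import subprocess

def node_is_up(host: str) -> bool:
    """Return True if the host answers a single ICMP echo request."""
    result = subprocess.call(
        ["ping", "-n", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result == 0

print("server01 up:", node_is_up("server01.example.com"))
```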

Details

The diagram below (and associated spreadsheet) shows the change in RAM, CPU, and network for a target device when different monitoring is applied.
- The target device was a VMware guest running on ESX 5.0, provisioned as Windows 2008 R2 (Version 6.1.7601 Service Pack 1 Build 7601) with 4 single-core 3.07GHz Intel Xeon CPUs and 16GB of RAM. No other processes were running on the server while this testing was done.
- Poller A (the one doing the heavy collection of metrics) was another VMware guest on ESX 5.0 running Windows 2008 R2 (Version 6.1.7601 Service Pack 1 Build 7601) with 4 single-core 3.07GHz Intel Xeon CPUs and 16GB of RAM.
- Database A was an HP ProLiant BL460c G7 with 2 6-core 3.07GHz Intel Xeon CPUs and 16GB RAM, running Windows 2008 R2 and MS SQL 2008 Standard edition.
- Poller B (the one doing the standard monitoring we were measuring) was a VMware guest on ESX 5.0 running Windows 2008 R2 (Version 6.1.7601 Service Pack 1 Build 7601) with 4 dual-core CPUs with 2 logical processors per core, for an effective total of 16 3.07GHz Intel Xeon CPUs, and 12GB RAM.
- Database B was an HP ProLiant BL460c G7 with 2 12-core 3.07GHz Intel Xeon CPUs and 192GB RAM, running Windows 2008 R2 and MS SQL 2008 Standard edition.

For the SAM monitoring, we built a template that collected 3 perfmon counters, checked for 3 eventlog messages, and gathered the status of 1 service.
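For a sense of what such a template does on the wire, the sketch below (Python, using the Windows-only wmi package) performs a roughly equivalent collection. The specific counters, event-log source, and service name are placeholders, since the template's actual contents aren't listed here.

```python
# Sketch: a rough WMI equivalent of what the SAM template gathers each
# polling cycle. The counters, event-log source, and service name below
# are placeholders; the template's real contents aren't listed above.
import wmi  # Windows-only (pywin32)

conn = wmi.WMI(computer="server01.example.com")  # hypothetical target

# 1) Three perfmon counters, via the formatted perf-counter classes.
cpu = conn.Win32_PerfFormattedData_PerfOS_Processor(Name="_Total")[0]
mem = conn.Win32_PerfFormattedData_PerfOS_Memory()[0]
print("CPU %:        ", cpu.PercentProcessorTime)
print("Available MB: ", mem.AvailableMBytes)
print("Pages/sec:    ", mem.PagesPersec)

# 2) Event-log check (one of three shown; a real poller would also bound
#    the query by TimeGenerated to avoid scanning the whole log).
events = conn.query(
    "SELECT EventCode FROM Win32_NTLogEvent "
    "WHERE Logfile = 'Application' AND SourceName = 'MyApp'")  # placeholder
print("Matching events:", len(events))

# 3) Status of one service (placeholder name).
svc = conn.Win32_Service(Name="Spooler")[0]
print("Service state:", svc.State)
```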

Timeline and Graphical Results

The sequence of events (corresponding to the numbered red lines on the diagram) was:
1. 6:45: the target was baselined with ping-only
2. Switched to SNMP monitoring: CPU/RAM, topology, 2 HDs, RAM/physical memory as disk
3. 8:50: added the SAM template
4. 10:20: changed to WMI polling: CPU/RAM, 2 HDs, RAM/physical memory as disk, 1 NIC
5. 11:15: removed the SAM template

SolarWinds Monitoring Impact on a Polling Engine

Premise and Architecture

For this test, we wanted to understand how different monitoring types change the overall load on a polling engine when those monitors are performed on a significant number of machines. The test sequence was:
1) Load a number of servers and monitor their hardware via ICMP/SNMP
2) Add a SAM template
3) Convert those servers to ICMP/WMI
4) Remove the SAM template

In this scenario we couldn't reasonably have the monitoring server monitor itself, so we used a second monitoring implementation that would aggressively gather statistics from the real poller as we put it through each test.

[Diagram: Poller B performs normal polling of 300 devices using standard monitoring techniques, writing to Database B; Poller A aggressively polls Poller B to observe changes on it, writing to Database A.]

Summary

- RAM utilization remained steady throughout the testing, with a high point of 42.3% and a low of 32.7%.
- CPU usage ranged from 2% to 64% overall:
  - Ping-only ranged from 6% to 38%
  - SNMP ranged from 9% to 32%
  - Adding a SAM template to SNMP increased CPU to 12% through 50%
  - Switching to WMI polling (still with SAM) took the poller up to 17% through 62%
  - WMI polling alone used between 9% and 47% CPU
- Bandwidth usage ranged from 2Mbps up to 55Mbps overall:
  - Ping-only: 2-10Mbps
  - SNMP polling: 2.3-11.4Mbps
  - SNMP+SAM: 5-55Mbps
  - WMI+SAM: 3-51Mbps
  - WMI polling: 3-47Mbps
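As a back-of-envelope cross-check (my arithmetic on the figures already given, not an additional measurement), dividing the poller-side bandwidth deltas from the executive summary by the roughly 300 monitored nodes lands in the same order of magnitude as the single-target results:

```python
# Back-of-envelope check: divide the poller-side bandwidth deltas (from
# the executive summary) by the ~300 monitored nodes and compare with
# the single-target measurements.
nodes = 300
wmi_sam_vs_ping_mbps = 4.0   # poller ingress delta, WMI+SAM vs ping-only
wmi_vs_snmp_mbps = 2.5       # poller ingress delta, WMI vs SNMP

print(f"WMI+SAM vs ping: {wmi_sam_vs_ping_mbps * 1000 / nodes:.1f} Kbps/node "
      "(single-target test measured ~12 Kbps)")
print(f"WMI vs SNMP:     {wmi_vs_snmp_mbps * 1000 / nodes:.1f} Kbps/node "
      "(single-target test measured ~4 Kbps)")
```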

Details

The diagram below (and associated spreadsheet) shows the change in RAM, CPU, and network for a polling engine when different monitoring is applied to approximately 300 nodes.
- The target poller (the one doing the heavy lifting) was a VMware guest on ESX 5.0 running Windows 2008 R2 (Version 6.1.7601 Service Pack 1 Build 7601) with 4 single-core 3.07GHz Intel Xeon CPUs and 16GB of RAM. It should be noted that, besides the 300 nodes in this test scenario, the polling engine was managing another 700 nodes at the same time.
- The database connected to the target poller was an HP ProLiant BL460c G7 with 2 6-core 3.07GHz Intel Xeon CPUs and 16GB RAM, running Windows 2008 R2 and MS SQL 2008 Standard edition.
- The monitoring poller (the one watching the stats on the target poller) was a VMware guest on ESX 5.0 running Windows 2008 R2 (Version 6.1.7601 Service Pack 1 Build 7601) with 4 dual-core CPUs with 2 logical processors per core, for an effective total of 16 3.07GHz Intel Xeon CPUs, and 12GB RAM.
- The database connected to the monitoring poller was an HP ProLiant BL460c G7 with 2 12-core 3.07GHz Intel Xeon CPUs and 192GB RAM, running Windows 2008 R2 and MS SQL 2008 Standard edition.

Timeline and Graphical Results

The sequence of events (corresponding to the red markers on the diagram) was:
1. 2:23: scanned 400 nodes
2. 2:35: switched to ping-only
3. 2:50: finished switching to ping-only (309 nodes total)
4. 4:05: re-scanned nodes, updated to SNMP
5. 4:15: scan completed (309 nodes, 1008 disks, 1620 NICs)
6. 5:45: added SAM template
7. 9:03: re-scanned nodes, hand-converted some, updated to WMI
8. 9:50: scan and convert completed (301 nodes, 1226 disks, 299 NICs)
9. 5:45: removed SAM template