Version 3.0 and Beyond
September 21st, 2006
nagios@nagios.org

Where Things Currently Stand
Nagios 3.x Daemon:
- Coding 90% completed
- Lots of internal improvements
- Documentation needs to be written
- CVS code is pre-alpha, alpha/beta Real Soon Now(TM)
New Web Interface:
- Delayed until Nagios 4.x

Nagios 3.0 Features
Changes:
- Object definitions
- Notification logic
- Plugin spec
- Custom variables
- Host check logic

3.0 Features In Depth
Object Definitions
Multiple template names:
- Names separated by commas
- Allows for more advanced inheritance of object properties
- Easier configuration management for complex environments

Multiple Template Names
Multiple inheritance sources...

    # Generic host template
    define host{
        name                    generic-host
        active_checks_enabled   1
        check_interval          10
        ...
        register                0
        }

    # Development web server template
    define host{
        name                    development-server
        check_interval          15
        notification_options    d,u,r
        ...
        register                0
        }

    # Development web server
    define host{
        use                     generic-host,development-server
        host_name               devweb1
        ...
        }

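The slide does not say which template wins when two of them set the same directive (here, check_interval). The sketch below is a simplified, hypothetical model in Python, not Nagios code. It assumes that the template listed first in "use" takes precedence and that directives set on the object itself override every template. The TEMPLATES dictionary and the resolve() helper are illustration names only.

```python
# Hypothetical, simplified model of multi-template inheritance.
# Assumption: the template listed first in "use" wins on conflicts, and
# directives set directly on the object override all templates.

TEMPLATES = {
    "generic-host": {"active_checks_enabled": "1", "check_interval": "10"},
    "development-server": {"check_interval": "15", "notification_options": "d,u,r"},
}


def resolve(obj, templates):
    """Return the effective directives of obj after applying its templates."""
    effective = {}
    names = [n.strip() for n in obj.get("use", "").split(",") if n.strip()]
    # Apply templates from last to first so earlier names overwrite later ones.
    for name in reversed(names):
        effective.update(templates[name])
    # Directives set on the object itself always take precedence.
    effective.update({k: v for k, v in obj.items() if k != "use"})
    return effective


devweb1 = resolve(
    {"use": "generic-host,development-server", "host_name": "devweb1"},
    TEMPLATES,
)
print(devweb1)
```

Under these assumptions devweb1 ends up with check_interval 10 from generic-host and notification_options d,u,r from development-server. Templates that themselves use other templates are not modeled here.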
Multiple Template Names
Complex inheritance abilities...

    # Development web server
    define host{
        use                     1, 4, 8
        host_name               devweb1
        ...
        }

Object Definitions
Suppression of inherited object vars:
- Character variables in templates (e.g. event_handler) couldn't be cleared in objects using them... until now!
- Use null as keyword to clear value

    # Generic host template
    define host{
        name                    generic-host
        event_handler           handle-host-event
        ...
        register                0
        }

    # Development web server
    define host{
        host_name               devweb1
        event_handler           null
        ...
        }

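The effect of the null keyword can be illustrated with a small Python sketch. This is a hypothetical model rather than Nagios code: apply_template() is an illustration name, and the only behavior assumed is the one stated on the slide, namely that the value null on an object clears the inherited value.

```python
def apply_template(template, obj):
    """Merge template directives into obj.

    The literal value "null" on the object clears the inherited directive
    instead of overriding it with a new value.
    """
    effective = dict(template)
    for key, value in obj.items():
        if value == "null":
            effective.pop(key, None)  # drop the inherited value entirely
        else:
            effective[key] = value
    return effective


generic_host = {"event_handler": "handle-host-event"}
devweb1 = apply_template(
    generic_host,
    {"host_name": "devweb1", "event_handler": "null"},
)
print(devweb1)
```

In this model devweb1 has no event_handler at all, which is the outcome the null keyword is meant to make possible.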
Object Definitions
Extended info definitions:
- Hostextinfo and Serviceextinfo object types are gone
- Extended info now stored in host and service definitions
- Existing definitions are still processed by Nagios and automatically merged with host/service definitions

    # Dev server HTTP
    define service{
        host_name               devweb1
        description             HTTP
        icon_image              iis40.png
        icon_image_alt          IIS 5
        notes                   This is a web server
        notes_url               http://someurl
        action_url              http://someurl
        ...
        }

Object Definitions
Subgroup references:
- Host, service, and contact groups can now reference other groups for membership

Referencing Groups

    # All Windows servers
    define hostgroup{
        hostgroup_name          windows-servers
        hostgroup_names         web-servers,file-servers
        members                 pdc,bdc,!fs1
        }

    # Windows web servers
    define hostgroup{
        hostgroup_name          web-servers
        members                 a,b,c
        }

    # Windows file servers
    define hostgroup{
        hostgroup_name          file-servers
        members                 x,y,z,fs1
        }

Referencing Individual Hosts

    # All Windows servers
    define hostgroup{
        hostgroup_name          windows-servers
        members                 pdc,bdc,a,b,c,x,y,z
        }

    # Windows web servers
    define hostgroup{
        hostgroup_name          web-servers
        members                 a,b,c
        }

    # Windows file servers
    define hostgroup{
        hostgroup_name          file-servers
        members                 x,y,z,fs1
        }

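The two configurations on this slide are meant to produce the same membership for windows-servers. A rough Python sketch of the expansion follows. It is a hypothetical model, not the Nagios implementation: the GROUPS dictionary and expand() helper are illustration names, and the sketch assumes that a leading "!" excludes a host that would otherwise be pulled in through a referenced group.

```python
# Hypothetical model of the "Referencing Groups" configuration.
GROUPS = {
    "windows-servers": {
        "hostgroup_names": ["web-servers", "file-servers"],
        "members": ["pdc", "bdc", "!fs1"],
    },
    "web-servers": {"hostgroup_names": [], "members": ["a", "b", "c"]},
    "file-servers": {"hostgroup_names": [], "members": ["x", "y", "z", "fs1"]},
}


def expand(name, groups, path=()):
    """Return the set of hosts in a group, following subgroup references."""
    if name in path:
        return set()  # guard against circular group references
    group = groups[name]
    included, excluded = set(), set()
    for sub in group["hostgroup_names"]:
        included |= expand(sub, groups, path + (name,))
    for member in group["members"]:
        if member.startswith("!"):
            excluded.add(member[1:])  # "!host" removes an inherited member
        else:
            included.add(member)
    return included - excluded


print(sorted(expand("windows-servers", GROUPS)))
```

The result matches the explicit member list in the "Referencing Individual Hosts" version: fs1 comes in through file-servers and is then removed by !fs1.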
Object Definitions
Contacts:
- Notifications for hosts, services, and escalations can now be configured for individual contacts, rather than only for groups

    define host{
        host_name               devweb1
        contacts                paul,sheila
        ...
        }

    define host{
        host_name               devweb2
        contactgroups           web-developers
        ...
        }

    define host{
        host_name               devweb3
        contactgroups           web-developers
        contacts                !paul,gunter,sheila
        ...
        }

Notifications
First notification delay:
- Delay 1st notification until problem persists for x minutes
- Previously tough to do (had to use escalations)
Scheduled downtime:
- Notifications on downtime start, end, cancellation
Custom (TODO):
- User-initiated, custom notifications about hosts, services

    define host{
        host_name                   devweb1
        first_notification_delay    15
        notification_options        d,u,r,s
        ...
        }

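The first notification delay amounts to a simple time comparison, sketched below in Python. This is an illustration, not Nagios code. The function name notification_due() and the interval_length parameter are assumptions; the sketch treats the delay as a count of 60-second units, in line with the slide's "x minutes".

```python
def notification_due(problem_start, now, first_notification_delay,
                     interval_length=60):
    """Return True once the problem has lasted at least the configured delay.

    problem_start and now are timestamps in seconds. The delay is expressed
    in units of interval_length seconds (assumed to be 60, i.e. minutes).
    """
    return (now - problem_start) >= first_notification_delay * interval_length


# With first_notification_delay 15, a problem that began at t=0 is not
# announced after 14 minutes but is announced from the 15-minute mark.
print(notification_due(0, 14 * 60, 15))
print(notification_due(0, 15 * 60, 15))
```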
Plugin Output
Multiline output and perfdata:
- Extension of current plugin spec
- Maintains compatibility with existing plugins
- Supported for host/service and active/passive checks
- No inherent limit on # of lines or characters in output
Current plugin spec:

Plugin Output
New plugin spec:

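The spec diagrams from these two slides did not survive in this text, so the following Python sketch rests on an assumption about the layout: the first line holds short text and optional perfdata separated by "|", further lines hold long text, and once a later line contains "|" everything after it is additional perfdata. That matches the format described in the Nagios 3 plugin documentation as far as I know, but treat it as an assumption. parse_plugin_output() is an illustration name, not a Nagios function.

```python
def parse_plugin_output(raw):
    """Split plugin output into (short_text, long_text_lines, perfdata_lines).

    Assumed layout: "TEXT | PERFDATA" on line 1, then long text lines, then
    optionally "LAST LONG TEXT | PERFDATA" followed by more perfdata lines.
    """
    lines = raw.splitlines()
    if not lines:
        return "", [], []
    short, sep, perf = lines[0].partition("|")
    perfdata = [perf.strip()] if sep and perf.strip() else []
    long_text = []
    in_perfdata = False
    for line in lines[1:]:
        if in_perfdata:
            perfdata.append(line.strip())
        elif "|" in line:
            text, _, perf = line.partition("|")
            if text.strip():
                long_text.append(text.strip())
            if perf.strip():
                perfdata.append(perf.strip())
            in_perfdata = True  # all remaining lines are perfdata
        else:
            long_text.append(line.strip())
    return short.strip(), long_text, perfdata


sample = (
    "DISK OK - free space: / 3326 MB | /=2643MB\n"
    "/boot 68 MB\n"
    "/home 69357 MB | /boot=68MB\n"
    "/home=69357MB"
)
short, long_text, perfdata = parse_plugin_output(sample)
print(short)
print(long_text)
print(perfdata)
```

A single-line result with no "|" passes through unchanged as the short text, which is how compatibility with existing plugins would look under this layout.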
Custom Object Variables
Custom variables:
- Available in host, service, contact definitions
- Prefixed with an underscore (e.g. _mycustomvar)
- Contain user-specified data
  - Passwords
  - SNMP community strings
  - Location information
  - Instant messaging addresses
- Accessible in macros and environment vars
- Values can be modified via external commands

Custom Object Variables
Example - Custom host variables:

Host Definition

    define host{
        host_name               devweb1
        address                 192.168.0.1
        _mac_address            00-06-5B-75-AD-EB
        _LOCATION               Room 451, Lenard Hall
        _InventoryID            560781
        _owner                  Paul Lezaro
        ...
        }

Macros

    $_HOSTMAC_ADDRESS$ = 00-06-5B-75-AD-EB
    $_HOSTLOCATION$    = Room 451, Lenard Hall
    $_HOSTINVENTORYID$ = 560781
    $_HOSTOWNER$       = Paul Lezaro

Environment Vars

    NAGIOS__HOSTMAC_ADDRESS = 00-06-5B-75-AD-EB
    NAGIOS__HOSTLOCATION    = Room 451, Lenard Hall
    NAGIOS__HOSTINVENTORYID = 560781
    NAGIOS__HOSTOWNER       = Paul Lezaro

Custom Object Variables
Example - Custom service variables:

Service Definition

    define service{
        host_name               router1
        description             Uptime
        _SNMP_community         secret
        _Notes                  Some notes...
        ...
        }

Macros

    $_SERVICESNMP_COMMUNITY$ = secret
    $_SERVICENOTES$          = Some notes...

Environment Vars

    NAGIOS__SERVICESNMP_COMMUNITY = secret
    NAGIOS__SERVICENOTES          = Some notes...

Custom Object Variables
Example - Custom contact variables:

Contact Definition

    define contact{
        contact_name            paul
        _AIM_username           something
        _Skype_number           555555555
        _Yahoo_ID               something
        ...
        }

Macros

    $_CONTACTAIM_USERNAME$  = something
    $_CONTACTSKYPE_NUMBER$  = 555555555
    $_CONTACTYAHOO_ID$      = something

Environment Vars

    NAGIOS__CONTACTAIM_USERNAME = something
    NAGIOS__CONTACTSKYPE_NUMBER = 555555555
    NAGIOS__CONTACTYAHOO_ID     = something

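The naming pattern in the three examples above can be written down as a short Python sketch. It is inferred from the examples rather than taken from Nagios source: the variable name loses its leading underscore, is upper-cased, and is prefixed with _HOST, _SERVICE, or _CONTACT. The environment variable form (the macro name without dollar signs, prefixed with NAGIOS_) is an assumption, and custom_macro() and custom_env_var() are illustration names.

```python
# Prefix per object type, as seen in the macro examples.
PREFIXES = {"host": "_HOST", "service": "_SERVICE", "contact": "_CONTACT"}


def custom_macro(object_type, var_name):
    """Build the macro name for a custom variable such as "_mac_address"."""
    if not var_name.startswith("_"):
        raise ValueError("custom variable names must start with an underscore")
    return "$" + PREFIXES[object_type] + var_name[1:].upper() + "$"


def custom_env_var(object_type, var_name):
    """Build the environment variable name (assumed: NAGIOS_ + macro name)."""
    return "NAGIOS_" + custom_macro(object_type, var_name).strip("$")


print(custom_macro("host", "_mac_address"))
print(custom_macro("service", "_SNMP_community"))
print(custom_env_var("contact", "_Skype_number"))
```

The upper-casing step explains why _LOCATION, _InventoryID, and _owner all produce fully upper-case macro names regardless of how they were written in the definition.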
Host Check Logic
Major overhaul!
- Host checks are no longer a major bottleneck
- Most host checks run in parallel
- Scheduled host checks now help performance
- Host checks now have a retry interval

Old Host Check Logic
- All hosts UP to start
- Service problem detected

Old Host Check Logic
- Host is checked max_attempts times
- Host is determined to be not up
- Is it down or unreachable?

Old Host Check Logic
- Host check propagated to parent
- Parent is not up

Old Host Check Logic
- Check propagated to grandparent
- Grandparent host is UP

Old Host Check Logic
- Status of host and parent can now be determined

Old Host Check Logic
- Child hosts are checked (serially) and found to be unreachable as well

Old Host Check Logic
Terrible performance!
- All checks performed serially
- Everything else is put on hold
  - No notifications, service checks, etc.
Time cost: (hosts) x (attempts/host) x (time/attempt)
- Worst case cost: (8 hosts) x (3 attempts/host) x (5 seconds/attempt) = 120 seconds!
- Best case cost: (8 hosts) x (1 attempt/host) x (5 seconds/attempt) = 40 seconds

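The time cost formula for serial checking can be verified with a few lines of Python. The function name serial_check_cost() is an illustration name; the numbers are the ones from the slide.

```python
def serial_check_cost(hosts, attempts_per_host, seconds_per_attempt):
    """Total blocking time in seconds when every check runs one after another."""
    return hosts * attempts_per_host * seconds_per_attempt


worst_case = serial_check_cost(hosts=8, attempts_per_host=3, seconds_per_attempt=5)
best_case = serial_check_cost(hosts=8, attempts_per_host=1, seconds_per_attempt=5)
print(worst_case, best_case)
```

Because the cost is a product, it grows linearly with each factor, so doubling the number of affected hosts doubles the time during which nothing else runs.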
New Host Check Logic
- All hosts UP to start
- Service problem detected

New Host Check Logic
- Host is checked 1 time (real or cached)
- Host is determined to be not up
- Is it down or unreachable?

New Host Check Logic
Assuming max attempts > 1...
- Switch2 goes into a soft down state
- Parallel checks of parent and child hosts are initiated

New Host Check Logic
- Parent and children are not up

New Host Check Logic
- Soft states set for parent/child hosts
- Switch2 is soft unreachable after another re-check
- Parallel checks propagated to extended relatives

New Host Check Logic
Eventually...
- Parallel checks propagate to all necessary hosts
- Max attempts reached for all hosts
- Hosts enter hard states

New Host Check Logic
Determining current host status:
- Current host status is critical in monitoring
- How old is too old?
- Should the host be rechecked or can we use latest state?
Cached host checks:
- If last host check result is fresh enough (within cached check horizon), use old/cached status
- If not, run an actual check of the host

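The cached check decision described above can be sketched in Python as follows. This is a hypothetical illustration, not the Nagios implementation: current_host_state() and its parameters are illustration names, timestamps are plain seconds, and run_check stands in for whatever actually executes the host check.

```python
def current_host_state(last_state, last_check_time, now, horizon, run_check):
    """Return (state, used_cache).

    If the last result is no older than the cached check horizon (seconds),
    reuse it. Otherwise call run_check() to perform an actual check.
    """
    age = now - last_check_time
    if age <= horizon:
        return last_state, True
    return run_check(), False


calls = []


def fake_check():
    """Stand-in for a real host check; records that it was invoked."""
    calls.append("check")
    return "DOWN"


fresh = current_host_state("UP", last_check_time=100, now=110, horizon=15,
                           run_check=fake_check)
stale = current_host_state("UP", last_check_time=100, now=130, horizon=15,
                           run_check=fake_check)
print(fresh, stale, calls)
```

A larger horizon saves more checks at the price of acting on older information, which is the "how old is too old?" trade-off from the slide.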
New Host Check Logic
Predictive dependency checks:
- Host is in a soft problem state
- Parallel checks of all hosts it depends on will also be launched
- Helps ensure accurate dependency tests for notifications

New Host Check Logic
Much better performance:
- Most checks performed in parallel
- Cached results mean less overhead
- Notifications, service checks, etc. are not delayed
- Scales better, especially in network outages
Best performance when:
- Host checks are regularly scheduled
- Max attempts > 1
- Cached host checks are enabled

New Host Check Logic
Check logic options:

    use_old_host_check_logic=[0/1]
        0 = Use new host check logic (3.x)
        1 = Use old host check logic (2.x and earlier)

    cached_host_check_horizon=[#]
        Seconds before host status needs to be rechecked

    enable_predictive_host_dependency_checks=[0/1]
        0 = No predictive checks (2.x and earlier)
        1 = Perform predictive checks

Future Plans
Nagios 4.x:
- DB integration (MySQL/Postgres)
  - NDOUtils addon
- PHP-based GUI with
  - Multiple instance support
  - Internationalization
  - Easier addon integration
Other:
- Community website for news, events, etc.
- Documentation wiki of, by, and for the community

Questions? nagios@nagios.org