My concern is that other service monitors we have not yet discovered could have the same issue. We do not want to have to test, delete, and recreate every one of them to find out.
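One possible way to screen for other affected monitors without testing each one is to scan the runtime status file. The scan looks for services whose active checks or notifications are switched off, or whose last check is stale. This is a minimal sketch under stated assumptions. The status.dat path and the 60-second interval_length are Nagios defaults we have not confirmed on this server. The script name scan_status.py is hypothetical. The stale threshold of three check intervals is an arbitrary choice.

```python
#!/usr/bin/env python3
"""Screen Nagios status.dat for services whose runtime state may stop alerts.

Assumptions (not confirmed on this server):
  * status.dat is at /usr/local/nagios/var/status.dat
  * interval_length is the default 60 s, so check_interval is in minutes
"""
import sys
import time

STATUS_FILE = "/usr/local/nagios/var/status.dat"  # assumed default path


def parse_blocks(text, block_type):
    """Yield a dict of key=value pairs for each '<block_type> {' ... '}' block."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line == block_type + " {":
                current = {}
        elif line == "}":
            yield current
            current = None
        elif "=" in line:
            # Split on the first '=' only; plugin output may contain more.
            key, _, value = line.partition("=")
            current[key] = value


def suspicious_services(text, now, stale_factor=3.0):
    """Return (host, service, reasons) for services that look unable to alert."""
    results = []
    for svc in parse_blocks(text, "servicestatus"):
        reasons = []
        if svc.get("active_checks_enabled") == "0":
            reasons.append("active checks disabled")
        if svc.get("notifications_enabled") == "0":
            reasons.append("notifications disabled")
        interval_min = float(svc.get("check_interval") or 0)
        last_check = int(svc.get("last_check") or 0)
        # Flag services whose last check is older than stale_factor intervals.
        if interval_min > 0 and now - last_check > stale_factor * interval_min * 60:
            reasons.append("last check is stale")
        if reasons:
            results.append((svc.get("host_name"), svc.get("service_description"), reasons))
    return results


def main(path):
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            data = fh.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return 1
    for host, service, reasons in suspicious_services(data, time.time()):
        print(f"{host} / {service}: {', '.join(reasons)}")
    return 0


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else STATUS_FILE)
```

Run it as python3 scan_status.py, or pass a different status.dat path as the first argument. A clean report would not rule the problem out. A service whose check runs on schedule but keeps returning OK while the node is down would not be flagged.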
Tests:
- Submitting a passive check result of CRITICAL for any service generates an alert.
- Forcing a failure on the node that an affected service monitor checks does not trigger an alert, even after the configured number of check attempts has passed.
- Other services on the same host do send an alert when there is a failure.
- Services that do send alerts show the attempts in the Nagios and email logs.
- Services that do not send alerts show no attempts in either log.
- Enabling Nagios debug logging did not reveal any attempt by the problem service to trigger an alert (watched with tail -F /usr/local/nagios/var/nagios.debug).
- Deleting a service monitor and recreating it produces a fully functioning service monitor that alerts us when there is a failure.
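When forcing a failure, it helps to know how long to wait before concluding that no alert is coming. The check settings can be turned into a time window for this. This is a rough sketch. It assumes standard Nagios soft/hard state handling and the default interval_length of 60 seconds. notification_window is a hypothetical helper, not part of Nagios.

```python
def notification_window(check_interval, retry_interval, max_attempts, interval_length=60):
    """Return (earliest, latest) seconds from the start of a failure until the
    service should reach a HARD non-OK state. That is when the first
    notification is due if first_notification_delay is 0.

    Assumes standard soft/hard state handling; ignores check execution time
    and scheduler jitter.
    """
    # Attempt 1 is the next scheduled check: right away at best, a full
    # check_interval later at worst.
    detect_earliest = 0
    detect_latest = check_interval * interval_length
    # Attempts 2..max_attempts are retries spaced retry_interval apart.
    retries = (max_attempts - 1) * retry_interval * interval_length
    return detect_earliest + retries, detect_latest + retries


if __name__ == "__main__":
    earliest, latest = notification_window(check_interval=5, retry_interval=1, max_attempts=5)
    print(f"expect the first alert between {earliest // 60} and {latest // 60} minutes")
```

The configured values are check interval 5, retry interval 1, max attempts 5, and first notification delay 0. With these, the sketch gives roughly 4 to 9 minutes from the start of the failure, plus check execution time.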
- Services that send alerts appear to be configured the same as services that do not:
  - Service monitors are set to check every 5 minutes.
  - When a problem is detected, they recheck every 1 minute until 5 attempts in total have failed, and only then send an alert.
  - Notifications go to the same groups for the service monitors that send alerts and for those that do not.
- Manage Hosts: the same for both services (they use the same host)
Templates: xiwizard_website_http_content_service
Manage host groups: 0
Manage service groups: 1 (we tried removing the group so this was 0, in case the group was blocking alerts; the non-alerting service monitors still did not alert when tested)
Active: checked
Initial State: <none selected>
Check interval: 5
Retry interval: 1
Max attempts: 5
Active checks enabled: Skip
Passive checks enabled: Skip
Check period: xi_timeperiod_24x7
Freshness threshold: <blank>
Check freshness: Skip
Obsess over service: Skip
Event handler: <blank>
Event handler enabled: Skip
Low flap threshold: <blank>
High flap threshold: <blank>
Flap detection enabled: Skip
Flap detection options: <none selected>
Retain status information: Skip
Retain non-status information: Skip
Process perf data: Skip
Is volatile: Skip
Manage Contacts: same for both services
Manage contact groups: none
Notification period: xi_timeperiod_24x7
Notification options: Warning, Critical, Unknown, Recovery, Flapping, Scheduled Downtime
Notification interval: 60
First notification delay: 0
Notification enabled: On (we also tested Skip here: services that send alerts continued to do so, and those that do not still did not)
Stalking options: <none selected>
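Instead of comparing these settings by eye in the CCM, the object cache that Nagios writes at startup could be diffed for one alerting and one non-alerting service. This is a sketch under two assumptions, neither verified here. First, objects.cache is at /usr/local/nagios/var/objects.cache. Second, that file already contains template-resolved values. diff_services.py is a hypothetical name.

```python
#!/usr/bin/env python3
"""Diff two service definitions from Nagios objects.cache.

Assumptions (not verified on this server):
  * objects.cache is at /usr/local/nagios/var/objects.cache
  * definitions there already have template values applied
Usage (hypothetical script name):
  python3 diff_services.py HOST GOOD_SERVICE BAD_SERVICE
"""
import re
import sys

OBJECTS_CACHE = "/usr/local/nagios/var/objects.cache"  # assumed default path
SERVICE_START = re.compile(r"define\s+service\s*\{")


def load_services(text):
    """Map (host_name, service_description) -> {directive: value}."""
    services = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            # fullmatch skips servicegroup, serviceescalation, etc.
            if SERVICE_START.fullmatch(line):
                current = {}
        elif line == "}":
            key = (current.get("host_name"), current.get("service_description"))
            services[key] = current
            current = None
        elif line:
            # Directive name, then whitespace, then an optional value.
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return services


def diff_services(a, b, ignore=("service_description",)):
    """Return {directive: (value_in_a, value_in_b)} for directives that differ."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


def main(argv):
    if len(argv) != 4:
        print("usage: diff_services.py HOST GOOD_SERVICE BAD_SERVICE", file=sys.stderr)
        return 2
    host, good, bad = argv[1:]
    try:
        with open(OBJECTS_CACHE, encoding="utf-8", errors="replace") as fh:
            services = load_services(fh.read())
    except OSError as exc:
        print(f"cannot read {OBJECTS_CACHE}: {exc}", file=sys.stderr)
        return 1
    for name in (good, bad):
        if (host, name) not in services:
            print(f"not found: {host} / {name}", file=sys.stderr)
            return 1
    diffs = diff_services(services[(host, good)], services[(host, bad)])
    for directive, (a_val, b_val) in diffs.items():
        print(f"{directive}: {a_val!r} vs {b_val!r}")
    if not diffs:
        print("no differences outside service_description")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

An empty diff might point toward runtime or retained state rather than configuration. That would fit the observation that deleting and recreating a monitor fixes it, but it is only a hypothesis.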
- Environment: Nagios XI 5.5.8 on Red Hat 7.6