Notification troubles
Posted: Wed Apr 24, 2019 3:42 am
Hello,
I am running Nagios XI 5.4.13.
I am having trouble understanding the logic behind the behaviour I am seeing.
The story: we have a remote site with 13 devices. The site has terrible internet and frequent power outages, so we got many "flapping start" and "flapping stop" alarms (4-5 every hour for 10 devices; 2 more devices died before I even added them to Nagios). I tried adjusting the flap thresholds, but the flapping notifications continued. I know I can simply disable notifications for flapping, but once I did that and the flapping spam was gone, I saw that 1 or 2 times every hour a host is down long enough to be considered down rather than flapping. So I still received 10-20 notifications every hour for the 10 flapping devices, plus a notification every 15 minutes for the 2 devices that had died a long time ago.
At that point I decided to abandon flap detection and simply give each host a first notification delay of 120 minutes (the notification interval has been 0 from the start, so that I don't receive too many mails). The trouble is that I continue to receive host down notifications a few times every hour for these devices. It seems that no notification setting I change, apart from switching notifications_enabled on or off, has any effect.
I tried deleting the configuration files and writing them again. I tried deleting the hosts and adding them again. I tried increasing the check_interval to 60 minutes (I figured I would not receive notifications if no check had been made). I even added a parent host so that the devices would be reported as unreachable rather than down. The behaviour remains the same: almost every 15 minutes I get a "Host down" notification for nodes that are configured like the config below (taken from /usr/local/nagios/var/objects.cache).
The services on these hosts apparently don't suffer from the same issue: I no longer receive notifications for the Ping service, only the "Host Down" ones.
Can anyone please walk me through the notification logic? I have read many documents on it and quite a few topics in this forum as well, and I still seem to be missing a vital piece of information.
define host {
host_name <host name>
alias <alias>
address <IP address>
parents <site firewall>
check_period xi_timeperiod_24x7
check_command check_xi_host_ping!3000.0!80%!5000.0!100%
contacts <several contacts>
notification_period xi_timeperiod_24x7
initial_state o
importance 0
check_interval 5.000000
retry_interval 1.000000
max_check_attempts 4
active_checks_enabled 1
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 0
flap_detection_options a
freshness_threshold 0
check_freshness 0
notification_options r,d,s
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 120.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
_XIWIZARD <custom wizard name>
}
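To make my expectation concrete, here is a rough sketch of the timings I think the definition above should produce. It is only my reading of the settings, not how Nagios is confirmed to evaluate them. It assumes the default interval_length of 60 seconds (so one time unit is one minute), and the dictionary and function names are just for illustration.

```python
# Rough arithmetic for the timings I expect from the host definition above.
# Assumption: interval_length is the default 60 seconds, so 1 time unit = 1 minute.
INTERVAL_LENGTH_SECONDS = 60  # assumed default, not read from my nagios.cfg

# Values copied from the objects.cache entry above.
host = {
    "retry_interval": 1.0,
    "max_check_attempts": 4,
    "notification_interval": 0.0,
    "first_notification_delay": 120.0,
}


def to_minutes(time_units):
    """Convert Nagios time units to minutes under the assumed interval_length."""
    return time_units * INTERVAL_LENGTH_SECONDS / 60


def minutes_to_hard_down(cfg):
    """Approximate time from the first failed check to a hard DOWN state.

    The first failed check counts as attempt 1, so the remaining
    attempts are retries spaced retry_interval apart.
    """
    retries = cfg["max_check_attempts"] - 1
    return to_minutes(retries * cfg["retry_interval"])


def expected_first_notification_delay(cfg):
    """Minutes I expect Nagios to wait before the first problem notification."""
    return to_minutes(cfg["first_notification_delay"])


def expects_renotification(cfg):
    """A notification_interval of 0 should mean no repeat notifications."""
    return cfg["notification_interval"] > 0


print("hard DOWN after about", minutes_to_hard_down(host), "minutes")
print("first notification expected after", expected_first_notification_delay(host), "minutes")
print("repeat notifications expected:", expects_renotification(host))
```

By this reading, a host that stays down should produce one notification after about two hours and no repeats, which is not what I am getting.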
Thanks.