Nagios Support Forum

Posted: **Wed Apr 24, 2019 3:42 am**

Hello,

I'm running Nagios XI 5.4.13 and I'm having trouble understanding the logic behind the behaviour I'm seeing.
The story: we have a remote site with 13 devices. It has terrible internet and frequent power outages, so we got many flapping start and flapping stop alarms (4-5 every hour for 10 devices; 2 devices died before I even added them to Nagios). I tried adjusting the flap thresholds, but the flapping notifications continued. I know I can simply disable notifications for flapping, but once I did and that spam was gone, I saw that once or twice every hour a host is down long enough to be considered down rather than flapping. So I still received 10-20 notifications every hour for the 10 flapping devices, plus a notification every 15 minutes for the 2 devices that had died a long time ago.
At that point I decided to abandon flap detection and just give the hosts a first notification delay of 120 minutes (the notification interval has been 0 from the start, so I don't receive too many mails). The trouble is that I still receive host down notifications a few times every hour for these devices. It seems that no notification setting I change, apart from switching notifications_enabled on or off, has any effect.
I tried deleting the configuration files and writing them again, deleting the hosts and adding them again, and increasing check_interval to 60 minutes (I figured I wouldn't receive notifications if no check had been made). I even added a parent host so that the devices would be unreachable rather than down. The behaviour remains the same: almost every 15 minutes I get a "Host down" notification for nodes configured like the config below (taken from /usr/local/nagios/var/objects.cache).
The services on these hosts apparently don't suffer from the same issue: I no longer receive notifications for the Ping service, only for "Host Down".
Can anyone please walk me through the notification logic? I have read many documents on it, and quite a few topics in this forum as well, but I seem to be missing a vital piece of information.

define host {
host_name <host name>
alias <alias>
address <IP address>
parents <site firewall>
check_period xi_timeperiod_24x7
check_command check_xi_host_ping!3000.0!80%!5000.0!100%
contacts <several contacts>
notification_period xi_timeperiod_24x7
initial_state o
importance 0
check_interval 5.000000
retry_interval 1.000000
max_check_attempts 4
active_checks_enabled 1
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 0
flap_detection_options a
freshness_threshold 0
check_freshness 0
notification_options r,d,s
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 120.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
_XIWIZARD <custom wizard name>
}

Thanks.

Posted: **Wed Apr 24, 2019 2:15 pm**

I find that looking at the /usr/local/nagios/var/nagios.log file can be helpful when one isn't sure why alerts and notifications are triggering when they do. The log contains entries for state changes and notifications. I'd avoid making any config changes while you're monitoring the log: keep the config as static as possible, note the times of any odd behavior (emails you wouldn't expect, etc.), and then try to line those up with the events in the log.
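If it helps with lining things up, here is a rough Python sketch that pulls the HOST ALERT and HOST NOTIFICATION entries for a single host out of the log and converts the epoch timestamps to readable times. It assumes the usual Nagios Core log layout (`[epoch] HOST ALERT: host;state;state type;attempt;output` and `[epoch] HOST NOTIFICATION: contact;host;type;command;output`). The host name, contact, notification command and sample lines below are made up for illustration, so check them against what your own log actually contains.

```python
import re
from datetime import datetime, timezone

# Matches lines such as:
#   [1556072700] HOST NOTIFICATION: contact;host;DOWN;command;output
LINE_RE = re.compile(r"^\[(\d+)\] (HOST ALERT|HOST NOTIFICATION): (.*)$")


def host_events(lines, host_name):
    """Yield (utc_time, kind, fields) for alert/notification lines about one host."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        epoch, kind, rest = match.groups()
        # Split into at most 5 fields so semicolons in the plugin output survive.
        fields = rest.split(";", 4)
        # The host name is field 0 of an alert and field 1 of a notification.
        index = 0 if kind == "HOST ALERT" else 1
        if len(fields) > index and fields[index] == host_name:
            when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
            yield when, kind, fields


# Hypothetical sample lines; replace with the contents of your nagios.log.
sample = [
    "[1556072520] HOST ALERT: remote-sw1;DOWN;SOFT;1;CRITICAL - Host Unreachable",
    "[1556072700] HOST ALERT: remote-sw1;DOWN;HARD;4;CRITICAL - Host Unreachable",
    "[1556072700] HOST NOTIFICATION: admin;remote-sw1;DOWN;notify-host-by-email;CRITICAL - Host Unreachable",
    "[1556072760] SERVICE ALERT: remote-sw1;Ping;CRITICAL;HARD;5;CRITICAL",
    "[1556072800] HOST ALERT: other-host;DOWN;HARD;4;CRITICAL",
]

events = list(host_events(sample, "remote-sw1"))
for when, kind, fields in events:
    print(when.isoformat(), kind, ";".join(fields))
```

To run it against the real log, pass an open file handle instead of `sample`, for example `open("/usr/local/nagios/var/nagios.log")`. Seeing each notification next to the state change that preceded it should make it easier to tell which emails are unexpected.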

And at the risk of providing links to documents you've probably seen, here are some that go over flapping and notification logic:

https://assets.nagios.com/downloads/nag ... pping.html
https://assets.nagios.com/downloads/nag ... tions.html
http://sites.box293.com/nagios/guides/c ... oft-states

Feel free to PM me the logs, examples to look for and corresponding configurations.

notifications troubles
