Re: "HOST UP" flood after Nagios Core update

Posted: Wed Jun 27, 2018 9:26 am
by madsantos
scottwilkerson wrote: Can you determine if the emails are coming from this server or from the server with obsess_over_services enabled?
We have a "nagios central" server that receives passive checks from every host on our different networks. Let's call it Central. Central is the one that sends the emails, but its "obsess_over_hosts" and "obsess_over_services" flags are 0 (as you could see in the previous post, where I sent you the code from nagios.cfg). The flags were never 1 on Central, but they have always been 1 on every other machine so that we know when a machine goes offline.

If I delete the ochp command or set the flag to 0 on any machine other than Central, I get no warnings about that machine's STATE changes. If I keep the flag at 1, as it was before the update, I get a flood of "HOST IS UP" messages on every single check. I feel the problem has more to do with Nagios not understanding that the previous STATE was UP: it shouldn't send any more messages unless the STATE changes. You can see in the picture below that the emails start with a "RECOVERY" notification, which makes no sense because the machine was always ON. I currently have one machine with the ochp command enabled, which I am using to debug at the cost of receiving one message every 5 minutes:

https://imgur.com/a/LCnO2fE

EDIT: So, to answer the question, the mail comes from Central, which has neither the "obsess_over_services" nor the "obsess_over_hosts" flag enabled. It only receives passive checks and sends emails when it receives STATE changes.
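
For readers following the setup, here is a minimal sketch of what a forwarded host result might look like when it reaches Central. It assumes (the thread never shows the ochp command) that results are delivered as Nagios's PROCESS_HOST_CHECK_RESULT external command, for example via NSCA. The function name format_host_result and the command-file path are hypothetical. Host status codes are 0 = UP, 1 = DOWN, 2 = UNREACHABLE.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): format one passive host result
# the way Central would read it from its external command file.
# Assumption: the ochp command forwards results as PROCESS_HOST_CHECK_RESULT.
# Host status codes: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.
format_host_result() {
    host_name=$1
    status_code=$2
    plugin_output=$3
    timestamp=${4:-$(date +%s)}   # defaults to the current epoch time
    printf '[%s] PROCESS_HOST_CHECK_RESULT;%s;%s;%s\n' \
        "$timestamp" "$host_name" "$status_code" "$plugin_output"
}

# Writing a result to Central's command file (path assumed; check
# command_file in nagios.cfg) would look like:
#   format_host_result some_host 0 "PING OK" >> /usr/local/nagios/var/rw/nagios.cmd
format_host_result some_host 0 "PING OK" 1530000000
# prints: [1530000000] PROCESS_HOST_CHECK_RESULT;some_host;0;PING OK
```

The timestamp is passed explicitly only to make the output reproducible; a real forwarder would use the current time.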

Posted: Wed Jun 27, 2018 3:35 pm
by scottwilkerson
That sheds more light. Can you complete the exercise you did earlier but from the central server?
scottwilkerson wrote: Would it be possible to have you peer into objects.cache and grab the full definition for one of these hosts, and then obfuscate any sensitive info and post the definition?

Posted: Thu Jun 28, 2018 4:33 am
by madsantos
Hi, and thanks for your patience; hopefully we can find the cause of this behavior. Below is the definition of the only host for which I kept "obsess_over_hosts" at 1 and whose "recovery" emails are flooding my inbox.

Code:

define host {
	host_name	some_host
	alias	Some Host
	address	some_address
	check_period	24x7
	check_command	check-host-alive
	contact_groups	admins
	notification_period	24x7
	initial_state	o
	importance	0
	check_interval	5.000000
	retry_interval	1.000000
	max_check_attempts	10
	active_checks_enabled	0
	passive_checks_enabled	1
	obsess	1
	event_handler_enabled	1
	low_flap_threshold	0.000000
	high_flap_threshold	0.000000
	flap_detection_enabled	1
	flap_detection_options	a
	freshness_threshold	0
	check_freshness	0
	notification_options	r,d
	notifications_enabled	1
	notification_interval	0.000000
	first_notification_delay	0.000000
	stalking_options	n
	process_perf_data	1
	retain_status_information	1
	retain_nonstatus_information	1
	}
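
A deliberately simplified model of the behaviour expected from this definition (notification_options r,d with notification_interval 0): a notification only when the host's state actually changes, and nothing for repeated UP results. It is not Nagios Core's code; it assumes passive host results are treated as HARD states (passive_host_checks_are_soft=0, the nagios.cfg default) and ignores flapping, downtime and notification periods. The function name simulate is hypothetical.

```shell
#!/bin/sh
# Simplified model, NOT Nagios Core's implementation.
# Assumes every passive host result is a HARD state and that both
# "d" (DOWN) and "r" (recovery) notifications are enabled, as in the
# definition above. Host codes: 0 = UP, 1 = DOWN.
simulate() {
    last_state=$1   # state before the first result arrives
    shift
    for new_state in "$@"; do
        if [ "$new_state" -eq "$last_state" ]; then
            echo none        # no state change: no notification
        elif [ "$new_state" -eq 0 ]; then
            echo RECOVERY    # DOWN -> UP
        else
            echo PROBLEM     # UP -> DOWN
        fi
        last_state=$new_state
    done
}

# Host starts UP, then receives: UP, UP, DOWN, DOWN, UP, UP
simulate 0 0 0 1 1 0 0
# prints (one per line): none none PROBLEM none RECOVERY none
```

In this model a stream of UP results for a host that was already UP never produces a RECOVERY, so a RECOVERY on every check suggests Central is not carrying the previous UP state forward, as suspected earlier in the thread.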

Posted: Thu Jun 28, 2018 10:15 am
by scottwilkerson
I honestly don't see why this is happening.

Just to rule one other thing out: on the central server you do have just 1 nagios parent process running, correct?

Code:

ps -ef|grep nagios.cfg

Posted: Thu Jun 28, 2018 11:55 am
by madsantos
Just to clarify things a bit more: the first machine I updated was Central, and that did not change the number of warnings. Only after I updated one of the monitored machines did I start receiving all the alerts (for that specific machine only; all machines that have not been updated are fine). So I can't tell whether this would also happen if I had updated only the other machines and left Central on the old version.

Here's the result of the filtered ps command:

Code:

user@localhost:~$ ps -ef | grep nagios.cfg
user   6804   6556  0 17:41 pts/8    00:00:00 grep --color=auto nagios.cfg
nagios    98762      1  0 12:25 ?        00:00:40 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
nagios    98769  98762  0 12:25 ?        00:00:01 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
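
The question here is whether more than one nagios daemon is running against nagios.cfg. In the output above the only top-level daemon is PID 98762 (parent PID 1); 98769 is its child, and the grep line is the search itself. A small hypothetical helper that makes that count explicit, assuming ps supports the -o pid=,ppid=,args= format (as the procps ps on Linux does):

```shell
#!/bin/sh
# Hypothetical helper: count nagios daemons started with nagios.cfg whose
# parent is NOT itself such a daemon (i.e. independent parent processes).
# Input: "PID PPID ARGS" lines, e.g. from: ps -eo pid=,ppid=,args=
count_nagios_parents() {
    awk '
        # Match the daemon command line only; this skips "grep nagios.cfg".
        $0 ~ /bin\/nagios .*nagios\.cfg/ { ppid[$1] = $2 }
        END {
            n = 0
            for (p in ppid)
                if (!(ppid[p] in ppid)) n++   # parent is not a nagios daemon
            print n
        }'
}

# Replaying the three lines posted above (columns reduced to PID PPID ARGS):
printf '%s\n' \
    ' 6804  6556 grep --color=auto nagios.cfg' \
    '98762     1 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg' \
    '98769 98762 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg' |
    count_nagios_parents
# prints: 1
# On a live system: ps -eo pid=,ppid=,args= | count_nagios_parents
```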

Posted: Thu Jun 28, 2018 12:03 pm
by scottwilkerson
And you are 100% sure that Central is what is sending the alerts?

Was this image on the central?
https://i.imgur.com/lm5RrG8.png

Can you look at the notification history on the central and one of the upgraded servers?

Really, notifications should be disabled on the non-central machines, but I digress.

Posted: Fri Jun 29, 2018 4:42 am
by madsantos
I'm 100% sure that it is indeed Central that is sending the alerts, and I can confirm that the image corresponds to Central's notifications area. I've double-checked the notifications on the "debug machine" and there are no related entries in the list.
To sum up: Central floods my email and smartphone with "host up" alerts, and you can check their history in Central's notifications area. The other upgraded machines do not send those alerts, and consequently there are no entries in their notifications areas.

Posted: Fri Jun 29, 2018 2:16 pm
by scottwilkerson
At present I don't have anything more to suggest other than simply rolling back the servers you upgraded to 4.2.0, as we cannot replicate the issue and have had no other reports of the same or a similar problem.

4.2.0 may be downloaded here
https://assets.nagios.com/downloads/nag ... 2.0.tar.gz

Posted: Mon Jul 02, 2018 5:01 am
by madsantos
I downgraded Central to 4.2.0 but managed to keep the other machines updated to 4.4.1. Everything is acting as it should now.

Thank you for your time and effort, cheers

Posted: Mon Jul 02, 2018 10:56 am
by scottwilkerson
Well, it sounds like, for some reason, the issue is only with Central.

If we ever manage to replicate this, I will post back here.