"HOST UP" flood after Nagios Core update

Re: "HOST UP" flood after Nagios Core update

Postby madsantos » Wed Jun 27, 2018 9:26 am

scottwilkerson wrote:Can you determine if the emails are coming from this server or the server with obsess_over_services enabled?


We have a "nagios central" which receives passive checks from every host on their different networks. Let's call it Central. Central is the one which sends the emails, but its "obsess_over_hosts" and "obsess_over_services" flags are 0 (as you could see in the previous post, when I sent you the code from nagios.cfg). The flag was never 1 on Central, but it was always 1 on every other machine so that we know when a machine goes offline. If I delete the ochp command or set the flag to 0 on any machine (besides Central), I get no warnings about machine STATE changes, but if I keep the flag at 1, as it was before the update, then I get a flood of "HOST IS UP" messages on every single check. I feel like the problem has more to do with the fact that Nagios does not understand that the previous STATE was UP, and that it shouldn't send any more messages unless the STATE changes. You can see in the picture below that the emails start with a "RECOVERY" notification, which makes no sense because the machine was always ON. I currently have one machine with the ochp command enabled, which I am using to debug at the cost of receiving one message every 5 minutes:

https://imgur.com/a/LCnO2fE

EDIT: So to answer the question, the mail comes from Central, which has no "obsess_over_services" or "obsess_over_hosts" flag enabled. It only passive checks and sends emails in case it receives STATE changes.
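
For reference, the behavior expected above (a notification only when the host state actually changes) can be sketched as follows. This is a deliberately simplified model and not Nagios Core's actual logic: it ignores soft/hard states, max_check_attempts, flap detection and renotification intervals, and the function name `notifications_for` is hypothetical.

```python
def notifications_for(states, initial="UP"):
    """Return the notifications a state-change-only model would emit.

    Simplified illustration (assumption): only UP and DOWN are modeled,
    and soft/hard states, retries, flapping and renotification are ignored.
    """
    sent = []
    previous = initial
    for state in states:
        if state != previous:
            # DOWN after UP is a problem; UP after DOWN is a recovery.
            sent.append("RECOVERY" if state == "UP" else "PROBLEM")
        previous = state
    return sent


# A host that stays UP across passive results should produce nothing.
print(notifications_for(["UP", "UP", "UP"]))
# One outage followed by a return to UP should produce one pair.
print(notifications_for(["UP", "DOWN", "DOWN", "UP", "UP"]))
```

Under this model, a "RECOVERY" for a host that never left UP would mean the receiving side believed the previous state was something other than UP.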
madsantos
 
Posts: 18
Joined: Fri Apr 27, 2018 5:26 am

Re: "HOST UP" flood after Nagios Core update

Postby scottwilkerson » Wed Jun 27, 2018 3:35 pm

That sheds more light. Can you complete the exercise you did earlier but from the central server?

scottwilkerson wrote:would it be possible to have you peer into the objects.cached and grab the full definition for one of these hosts, and then obfuscate any sensitive info and post the definition?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
scottwilkerson
DevOps Engineer
 
Posts: 12331
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: "HOST UP" flood after Nagios Core update

Postby madsantos » Thu Jun 28, 2018 4:33 am

Hi and thanks for your patience, hopefully we can find the cause of this behavior. Below is the code referring to the only host I chose to maintain the "obsess_over_hosts" at 1 and whose "recovery" emails are flooding my inbox.

Code:
define host {
   host_name   some_host
   alias       Some Host
   address     some_address
   check_period   24x7
   check_command   check-host-alive
   contact_groups   admins
   notification_period   24x7
   initial_state   o
   importance   0
   check_interval   5.000000
   retry_interval   1.000000
   max_check_attempts   10
   active_checks_enabled   0
   passive_checks_enabled   1
   obsess   1
   event_handler_enabled   1
   low_flap_threshold   0.000000
   high_flap_threshold   0.000000
   flap_detection_enabled   1
   flap_detection_options   a
   freshness_threshold   0
   check_freshness   0
   notification_options   r,d
   notifications_enabled   1
   notification_interval   0.000000
   first_notification_delay   0.000000
   stalking_options   n
   process_perf_data   1
   retain_status_information   1
   retain_nonstatus_information   1
   }
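
When comparing definitions like this one between servers, it can help to pull out only the directives relevant to the problem. Below is a minimal sketch, assuming the block has the one-directive-per-line layout shown above; the helper name `parse_host_definition` and the shortened sample are illustrative only.

```python
def parse_host_definition(text):
    """Parse one 'define host { ... }' block into a dict of directives."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and the opening and closing lines of the block.
        if not line or line.startswith("define ") or line == "}":
            continue
        # Directive name and value are separated by whitespace.
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return directives


# Shortened copy of the definition posted above.
SAMPLE = """\
define host {
   host_name   some_host
   active_checks_enabled   0
   passive_checks_enabled   1
   obsess   1
   notification_options   r,d
   notification_interval   0.000000
   }
"""

host = parse_host_definition(SAMPLE)
for key in ("active_checks_enabled", "passive_checks_enabled",
            "obsess", "notification_options", "notification_interval"):
    print(f"{key} = {host.get(key)}")
```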
madsantos
 
Posts: 18
Joined: Fri Apr 27, 2018 5:26 am

Re: "HOST UP" flood after Nagios Core update

Postby scottwilkerson » Thu Jun 28, 2018 10:15 am

I honestly don't see why this is happening.

Just to rule one other thing out, on the central server you do just have 1 nagios parent process running correct?

Code:
ps -ef|grep nagios.cfg
scottwilkerson
DevOps Engineer
 
Posts: 12331
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: "HOST UP" flood after Nagios Core update

Postby madsantos » Thu Jun 28, 2018 11:55 am

scottwilkerson wrote:I honestly don't see why this is happening.

Just to rule one other thing out, on the central server you do just have 1 nagios parent process running correct?

Code:
ps -ef|grep nagios.cfg

Just to clarify things a bit more, the first machine I updated was Central, which didn't affect the amount of warnings. After I updated one of the monitored machines, that's when I started receiving all the alerts (from that specific machine, all outdated machines are fine). So I can't tell if this would happen if I only updated the other machines, while leaving Central outdated.

Here's the result of the filtered ps command:

Code:
user@localhost:~$ ps -ef | grep nagios.cfg
user   6804   6556  0 17:41 pts/8    00:00:00 grep --color=auto nagios.cfg
nagios    98762      1  0 12:25 ?        00:00:40 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
nagios    98769  98762  0 12:25 ?        00:00:01 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
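
The same check can be done programmatically. The sketch below counts Nagios processes whose parent is not itself a Nagios process, assuming the usual `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD) and the binary path shown in the output above; the function name `nagios_parents` is hypothetical.

```python
def nagios_parents(ps_output, binary="/usr/local/nagios/bin/nagios"):
    """Return PIDs of Nagios processes whose parent is not another Nagios process."""
    procs = {}
    for line in ps_output.splitlines():
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or unexpected format
        if fields[7].split()[0] == binary:
            procs[int(fields[1])] = int(fields[2])
    # A "parent" is any Nagios PID whose PPID is not another Nagios PID.
    return sorted(pid for pid, ppid in procs.items() if ppid not in procs)


# Output copied from the post above.
SAMPLE_PS = """\
user   6804   6556  0 17:41 pts/8    00:00:00 grep --color=auto nagios.cfg
nagios    98762      1  0 12:25 ?        00:00:40 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
nagios    98769  98762  0 12:25 ?        00:00:01 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
"""

print(nagios_parents(SAMPLE_PS))
```

For the output above this reports a single parent process (PID 98762); the second Nagios line is a child of the first, not a second instance.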
madsantos
 
Posts: 18
Joined: Fri Apr 27, 2018 5:26 am

Re: "HOST UP" flood after Nagios Core update

Postby scottwilkerson » Thu Jun 28, 2018 12:03 pm

and you are 100% sure the central is what is sending the alerts?

Was this image on the central?
https://i.imgur.com/lm5RrG8.png

Can you look at the notification history on the central and one of the upgraded servers?

Really notifications should be disabled on the non-central machines but I digress
scottwilkerson
DevOps Engineer
 
Posts: 12331
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: "HOST UP" flood after Nagios Core update

Postby madsantos » Fri Jun 29, 2018 4:42 am

scottwilkerson wrote:and you are 100% sure the central is what is sending the alerts?

Was this image on the central?
https://i.imgur.com/lm5RrG8.png

Can you look at the notification history on the central and one of the upgraded servers?

Really notifications should be disabled on the non-central machines but I digress


I'm 100% sure that it is indeed the central that is sending the alerts and I can confirm that the image corresponds to the Central notification area. I've double checked notifications on the "debug machine" and there are no related entries on the list.
To sum up: Central floods my email and smartphone with "host up" alerts and you can check their history on Central's notifications area. The other upgraded machines do not send those alerts and consequently there are no entries on their notifications area.
madsantos
 
Posts: 18
Joined: Fri Apr 27, 2018 5:26 am

Re: "HOST UP" flood after Nagios Core update

Postby scottwilkerson » Fri Jun 29, 2018 2:16 pm

At present I don't have anything more to suggest other than to simply roll the servers you upgraded back to 4.2.0,
as we cannot replicate the issue and have had no other reports of the same or similar happening.

4.2.0 may be downloaded here
https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.2.0.tar.gz
scottwilkerson
DevOps Engineer
 
Posts: 12331
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: "HOST UP" flood after Nagios Core update

Postby madsantos » Mon Jul 02, 2018 5:01 am

scottwilkerson wrote:At present I don't have anything more to suggest other than to simply roll the servers you upgraded back to 4.2.0,
as we cannot replicate the issue and have had no other reports of the same or similar happening.

4.2.0 may be downloaded here
https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.2.0.tar.gz


I downgraded Central to 4.2.0 but managed to keep the other machines updated to 4.4.1. Everything is acting as it should now.

Thank you for your time and effort, cheers
madsantos
 
Posts: 18
Joined: Fri Apr 27, 2018 5:26 am

Re: "HOST UP" flood after Nagios Core update

Postby scottwilkerson » Mon Jul 02, 2018 10:56 am

Well, it sounds like for some reason the issue is just with the central.

If we ever can replicate this I will post back here.
scottwilkerson
DevOps Engineer
 
Posts: 12331
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
