Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
scottwilkerson wrote: Can you determine whether the emails are coming from this server or from the server with obsess_over_services enabled?
We have a "nagios central" server that receives passive checks from every host on their respective networks; let's call it Central. Central is the one that sends the emails, but its "obsess_over_hosts" and "obsess_over_services" flags are 0 (as you could see in my previous post, where I sent you the code from nagios.cfg). The flag was never 1 on Central, but it has always been 1 on every other machine so that we know when a machine goes offline. If I delete the ochp command or set the flag to 0 on any machine other than Central, I get no warnings at all about that machine's STATE changes; but if I keep the flag at 1, as it was before the update, I get a flood of "HOST IS UP" messages on every single check. I suspect the problem is that Nagios does not recognize that the previous STATE was already UP; it shouldn't send any more messages unless the STATE actually changes. You can see in the picture below that the emails start with a "RECOVERY" notification, which makes no sense because the machine was always on. I currently have one machine with the ochp command enabled, which I am using for debugging at the cost of receiving one message every 5 minutes.
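For context on what each obsessing machine forwards: an ochp_command in this kind of distributed setup typically pipes each host check result into send_nsca, which submits it to Central as a passive result. The Python sketch below only builds that input line. It assumes the tab-separated host-result format described in the NSCA documentation (host name, return code, plugin output) and the usual 0/1/2 codes for UP/DOWN/UNREACHABLE; the function name is hypothetical and not part of any Nagios script.

```python
# Hypothetical helper: build the line an OCHP script would pipe into send_nsca.
# Assumed NSCA host-result format: <host_name>\t<return_code>\t<plugin_output>\n
HOST_STATE_CODES = {"UP": 0, "DOWN": 1, "UNREACHABLE": 2}


def format_nsca_host_result(host_name: str, state: str, output: str) -> str:
    """Return one send_nsca input line for a single host check result."""
    code = HOST_STATE_CODES[state.upper()]
    # Tabs or newlines inside the plugin output would break the record, so flatten them.
    clean_output = output.replace("\t", " ").replace("\n", " ")
    return f"{host_name}\t{code}\t{clean_output}\n"
```

In a real ochp_command the three arguments would come from the $HOSTNAME$, $HOSTSTATE$ and $HOSTOUTPUT$ macros. On a healthy machine every forwarded line carries code 0, so Central should see an unbroken run of UP results.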
EDIT: So, to answer the question: the mail comes from Central, which has neither "obsess_over_services" nor "obsess_over_hosts" enabled. It only receives passive checks and sends emails when it receives STATE changes.
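The behaviour the poster expects can be stated as a small model: notify only when the incoming state differs from the last known state, and call a return to UP a RECOVERY. This Python sketch is a deliberate simplification and not Nagios's actual notification logic. It ignores soft/hard states, re-notification intervals, flap detection and notification options. Under this model, a steady stream of UP results produces no notifications at all. Repeated RECOVERY emails would therefore fit Central treating the previous state as something other than UP each time.

```python
from typing import List, Tuple


def expected_notifications(states: List[str], initial: str = "UP") -> List[Tuple[int, str]]:
    """Simplified model: emit (index, kind) only when the state changes.

    A change back to UP is labelled RECOVERY; any other change is a PROBLEM.
    """
    notifications: List[Tuple[int, str]] = []
    previous = initial
    for i, state in enumerate(states):
        if state != previous:
            kind = "RECOVERY" if state == "UP" else "PROBLEM"
            notifications.append((i, kind))
        previous = state
    return notifications
```

For example, `expected_notifications(["UP", "UP", "UP"])` gives an empty list, while `["UP", "DOWN", "DOWN", "UP"]` gives one PROBLEM at index 1 and one RECOVERY at index 3.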
That sheds more light. Can you complete the exercise you did earlier but from the central server?
scottwilkerson wrote: Would it be possible to have you peer into objects.cache, grab the full definition for one of these hosts, obfuscate any sensitive info, and post the definition?
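Pulling one host's block out of the object cache and masking sensitive values can be scripted. The Python sketch below rests on a few assumptions. The cache is taken to use the usual `define host {` / `}` layout with one directive per line. The set of directives treated as sensitive is a hypothetical choice. The function name is illustrative only.

```python
import re
from typing import Iterable, List, Optional

# Assumption: which directives count as sensitive is up to the person posting.
SENSITIVE_DIRECTIVES = {"address", "notes_url", "action_url"}
# Matches "define host {" but not hostgroup, hostdependency, hostescalation, etc.
HOST_START = re.compile(r"^define\s+host\s*\{$")


def extract_host(lines: Iterable[str], host_name: str) -> Optional[str]:
    """Return the cached host block for host_name with sensitive values masked."""
    block: List[str] = []
    current_name: Optional[str] = None
    inside = False
    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        if not inside:
            if HOST_START.match(stripped):
                inside, block, current_name = True, [line], None
            continue
        if stripped == "}":
            inside = False
            if current_name == host_name:
                block.append(line)
                return "\n".join(block)
            continue
        parts = stripped.split(None, 1)
        if len(parts) == 2:
            key, value = parts
            if key == "host_name":
                current_name = value.strip()
            elif key in SENSITIVE_DIRECTIVES:
                # Keep the original indentation, replace only the value.
                indent = line[: len(line) - len(line.lstrip())]
                line = f"{indent}{key}\t<redacted>"
        block.append(line)
    return None
```

Pass it an open file handle for the cache. The file's location is set by `object_cache_file` in nagios.cfg and varies by install. Review the output by hand before posting, since the masking list is only a starting point.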
Hi, and thanks for your patience; hopefully we can find the cause of this behavior. Below is the definition of the only host on which I chose to keep "obsess_over_hosts" at 1, and whose "recovery" emails are flooding my inbox.
Just to clarify things a bit more: the first machine I updated was Central, and that didn't affect the number of warnings. It was only after I updated one of the monitored machines that I started receiving all the alerts (only for that specific machine; all the machines not yet updated are fine). So I can't tell whether this would happen if I had updated only the other machines and left Central on the old version.
Can you look at the notification history on the central server and on one of the upgraded servers?
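Besides the web UI's notifications page, the same history can be tallied from the main log file. This minimal Python sketch assumes host notifications are logged in the usual `[epoch] HOST NOTIFICATION: contact;host;state;command;output` form. It reads whatever file `log_file` in nagios.cfg points to. Rotated logs in the archive directory would need to be fed in as well. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed log format: [epoch] HOST NOTIFICATION: contact;host;state;command;output
NOTIFICATION_RE = re.compile(r"^\[(\d+)\] HOST NOTIFICATION: ([^;]*);([^;]*);([^;]*);")


def count_host_notifications(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count HOST NOTIFICATION log entries per (host, state) pair."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for line in lines:
        match = NOTIFICATION_RE.match(line)
        if match:
            _timestamp, _contact, host, state = match.groups()
            counts[(host, state)] += 1
    return counts
```

Running it against the log on Central and on an upgraded machine shows which of the two actually recorded the UP notifications.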
Really, notifications should be disabled on the non-central machines, but I digress.
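One way to do that at runtime is to write the DISABLE_NOTIFICATIONS external command to the command pipe. This is a minimal Python sketch with hypothetical helper names. The pipe path must come from `command_file` in that machine's nagios.cfg. Setting `enable_notifications=0` in nagios.cfg is the persistent alternative, though retained program state can override it when `use_retained_program_state` is enabled.

```python
import time
from typing import Optional


def disable_notifications_command(timestamp: Optional[int] = None) -> str:
    """Format the DISABLE_NOTIFICATIONS external command line."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"[{ts}] DISABLE_NOTIFICATIONS\n"


def send_external_command(command_file: str, command: str) -> None:
    """Write one command to the Nagios command pipe (command_file in nagios.cfg)."""
    # The command file is a named pipe; opening it blocks until Nagios is reading it.
    with open(command_file, "w") as pipe:
        pipe.write(command)
```

On a non-central machine this would be called as `send_external_command(path, disable_notifications_command())`, with `path` taken from that machine's configuration.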
I'm 100% sure that it is indeed Central that is sending the alerts, and I can confirm that the image corresponds to Central's notification area. I've double-checked the notifications on the "debug machine" and there are no related entries in the list.
To sum up: Central floods my email and smartphone with "host up" alerts, and you can check their history in Central's notifications area. The other upgraded machines do not send those alerts, and consequently there are no entries in their notifications areas.
At present I don't have anything more to suggest other than simply rolling back the servers you upgraded to 4.2.0, as we cannot replicate the issue and have had no other reports of the same or anything similar happening.
scottwilkerson wrote: At present I don't have anything more to suggest other than simply rolling back the servers you upgraded to 4.2.0, as we cannot replicate the issue and have had no other reports of the same or anything similar happening.