notification troubles after downtime end.

MichielvM
Posts: 160
Joined: Thu Oct 24, 2013 3:48 am


Post by MichielvM »

Hi all,

This weekend I had a group of Hosts going down for maintenance purposes.
I added them to a temporary hostgroup and dropped a downtime schedule on that group.
After rebooting, some hosts and some services (not all related to each other) did not return to an ok state.
The downtime period expired, so I expect that Nagios would send out notifications of these non-ok states. It didn't.
Our techs had to find out by looking at the XI operations centre that some services were still not OK.

By default, all hosts have Notifications disabled. This was mentioned by some as the possible culprit.
Looking through manuals, I cannot find anything to confirm this statement.

Note: My setup for a scheduled downtime for a host is adding schedules for both the host and its services, so basically there are two schedule periods active.
I read that scheduling only the host is enough to also hush its associated services. Is that correct? Could that have anything to do with it?
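
For reference, the two schedules described above correspond to two separate Nagios Core external commands, `SCHEDULE_HOSTGROUP_HOST_DOWNTIME` and `SCHEDULE_HOSTGROUP_SVC_DOWNTIME`. The following is only a sketch of how the command lines could be built; the hostgroup name, author, comment, and timestamps are made-up placeholders, and writing the lines to the Nagios command file is left out because its path depends on the installation.

```python
import time


def downtime_commands(hostgroup, start, end, author, comment, now=None):
    """Build external-command lines for a fixed downtime on a hostgroup.

    One line covers the hosts in the group, the other covers their services.
    All argument values are supplied by the caller; nothing is sent to Nagios.
    """
    now = int(time.time()) if now is None else int(now)
    duration = end - start
    # Field order: hostgroup;start;end;fixed;trigger_id;duration;author;comment
    fields = f"{hostgroup};{start};{end};1;0;{duration};{author};{comment}"
    return [
        f"[{now}] SCHEDULE_HOSTGROUP_HOST_DOWNTIME;{fields}",
        f"[{now}] SCHEDULE_HOSTGROUP_SVC_DOWNTIME;{fields}",
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for line in downtime_commands(
        "tmp-maintenance", 1700000000, 1700007200, "admin", "weekend maintenance"
    ):
        print(line)
```

This only shows that host downtime and service downtime are scheduled independently; it does not by itself answer whether the second schedule is needed.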
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: notification troubles after downtime end.

Post by cmerchant »

Here is a good writeup in the online Core documentation for notifications:

http://nagios.sourceforge.net/docs/nagi ... tions.html

From one of your points regarding the service checks:
As a side note, notifications for services are suppressed if the host they're associated with is in a period of scheduled downtime.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: notification troubles after downtime end.

Post by Box293 »

MichielvM wrote: After rebooting, some hosts and some services (not all related to each other) did not return to an ok state.
The downtime period expired, so I expect that Nagios would send out notifications of these non-ok states. It didn't.
If these host and service objects were acknowledged, then no notifications will be sent from that point on, regardless of whether a downtime period is currently in effect.

Also, if you have host and service escalations defined that only run for x amount of notifications, once these pass then no more notifications will be sent.

I hope some of this helps.
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Re: notification troubles after downtime end.

Post by Fred Kroeger »

I have had the same issue where we schedule downtime for server patching. Unfortunately, if a server is still down at the end of the downtime because it didn't restart, Nagios *does not* send a notification.

I have logged a case for this as it now means that I can't rely on Nagios for a lights out operation. Someone needs to check the Nagios screen after the patching to ensure that all servers came up again.
Unfortunately, this doesn't seem to be a priority as there has been no response to this since I logged this 2 months ago.

http://tracker.nagios.org/view.php?id=660

Fred
MichielvM
Posts: 160
Joined: Thu Oct 24, 2013 3:48 am

Re: notification troubles after downtime end.

Post by MichielvM »

Fred Kroeger wrote: I have had the same issue where we schedule downtime for server patching. Unfortunately, if a server is still down at the end of the downtime because it didn't restart, Nagios *does not* send a notification.

I have logged a case for this as it now means that I can't rely on Nagios for a lights out operation. Someone needs to check the Nagios screen after the patching to ensure that all servers came up again.
Unfortunately, this doesn't seem to be a priority as there has been no response to this since I logged this 2 months ago.

http://tracker.nagios.org/view.php?id=660

Fred
Seems logical to me, that if a host does not check OK after Downtime ends there should be bells and whistles all over the damn place.
Hint to Nagios: prioritize Fred's case!
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: notification troubles after downtime end.

Post by tmcdonald »

What XI and Core versions are you on? I tried to replicate but could not. Here was my setup:

1.) Create a dummy service attached to localhost that does check_dummy 0 with flapping off, notifications on a 5-minute repeat, and checks on 1 minute for both OK and non-OK states
2.) Force some checks to get a history going
3.) Change it to check_dummy 2 to produce a critical state
4.) Force more checks, receive critical email
5.) Schedule 5-minute downtime
6.) Force even more checks during that downtime, do not receive email
7.) Downtime ends, within 5 minutes I have an email notifying me of a critical state
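
The timeline in these steps can be sketched with a toy model. This is not Nagios Core's actual notification logic; it assumes only that a notification goes out when the service is critical, no downtime is active, and the repeat interval has elapsed since the last notification. The function name and all minute values are made up for illustration.

```python
def simulate(total_minutes, critical_from, downtime, interval):
    """Return the minutes at which this toy model would send a notification.

    downtime is a (start, end) pair in minutes; the end minute itself is
    treated as already outside the downtime window.
    """
    start, end = downtime
    sent = []
    last = None
    for minute in range(total_minutes + 1):  # one check per minute
        is_critical = minute >= critical_from
        in_downtime = start <= minute < end
        due = last is None or minute - last >= interval
        if is_critical and not in_downtime and due:
            sent.append(minute)
            last = minute
    return sent


if __name__ == "__main__":
    # Critical from minute 5, 5-minute downtime from minute 10, 5-minute repeat.
    print(simulate(20, 5, (10, 15), 5))
```

Under these assumptions the model stays silent during the downtime and notifies again once it ends, which matches the behaviour seen in step 7 for a service.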
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: notification troubles after downtime end.

Post by jdalrymple »

I was able to reproduce the issue in XI; however, I was working with a host, not a service. One potential workaround would be to enable notifications for scheduled downtime events, as those will show you the status of the host at both the beginning and the end of the downtime, so you can react to any host that remains down. This is far from ideal though, I would agree... especially if you have 1000 hosts affected by the downtime but only 1 doesn't recover.

I will bring this up with the developers and see if there is some underlying logic that we're overlooking. If they can't provide any we'll indicate the severity of the bug to them and hopefully it gets pushed up the list.
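
As a stopgap for the manual check after patching that was mentioned earlier in this thread, a small script could list hosts that are not UP and no longer in downtime. The sketch below assumes a status file made of `hoststatus { ... }` blocks with `host_name`, `current_state` (0 meaning UP), and `scheduled_downtime_depth` fields; that layout is an assumption to verify against your own installation, and the sample data and host names are invented.

```python
def parse_host_blocks(text):
    """Collect key=value pairs from each 'hoststatus {' block in the text."""
    hosts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "hoststatus {":
            current = {}
        elif line == "}" and current is not None:
            hosts.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return hosts


def hosts_still_down(text):
    """Names of hosts that are not UP and have no active scheduled downtime."""
    return sorted(
        h["host_name"]
        for h in parse_host_blocks(text)
        if h.get("current_state", "0") != "0"
        and h.get("scheduled_downtime_depth", "0") == "0"
    )


# Invented sample data in the assumed layout.
SAMPLE = """
hoststatus {
    host_name=web01
    current_state=0
    scheduled_downtime_depth=0
    }
hoststatus {
    host_name=db01
    current_state=1
    scheduled_downtime_depth=0
    }
hoststatus {
    host_name=app01
    current_state=1
    scheduled_downtime_depth=1
    }
"""

if __name__ == "__main__":
    print(hosts_still_down(SAMPLE))
```

Run shortly after the downtime window closes, a check like this would flag the one host out of many that failed to come back, without relying on a notification being sent.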
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Re: notification troubles after downtime end.

Post by Fred Kroeger »

All notification intervals are set to 0, so only 1 notification is sent out. We send an email to a ticketing system, so sending more than 1 notification is not an option.

Regards Fred
MichielvM
Posts: 160
Joined: Thu Oct 24, 2013 3:48 am

Re: notification troubles after downtime end.

Post by MichielvM »

Core is: 4.0.8
XI is: 2014R2.6

I'm gonna set up a sandbox to play with and post results back here. In the meantime I would appreciate it if your development team could shed some light on this.
I really have no clue what I could have overlooked when scheduling this downtime.
The host/services that failed had no active acknowledgements.

A point to add: the history graphs show nothing between the end of the downtime and the time our tech dept. fixed the host.
MichielvM
Posts: 160
Joined: Thu Oct 24, 2013 3:48 am

Re: notification troubles after downtime end.

Post by MichielvM »

In my OP I mentioned that by default all hosts have a no-notify profile. We only react to service checks.
Is it possible that this has something to do with it?