Critical notifications not being sent from XI

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Critical notifications not being sent from XI

Post by rferebee »

Hello,

We experienced an issue this month with a service check for one of our Windows servers. There is a service check configured for the C: drive which is set to go Critical when the drive space used exceeds 56.9GB. The service check went Critical May 1st at 22:30:27 and proceeded to send out a notification to the contact group assigned to it.

The issue is that the service remained in a Critical state for 10 days and it is configured to send out an alert every 24 hours to the same contact group. For whatever reason, the service check only notified the Nagios Admin contact group at the 24-hour interval for the days the service was Critical.

When the status changed on May 11th back to Warning, there was no notification sent to the contact group. The status then went back to Critical on May 19th and remained there for 3 days and no notifications were sent.

The server ended up cratering because no one was made aware of the issue via Nagios notifications. My superiors would like me to figure out why this happened to prevent it from happening in the future. Please see attached notification log and the graphical representation of the service status. I can also provide a System Profile if necessary.

Thank you.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Critical notifications not being sent from XI

Post by cdienger »

Please PM me a profile (Admin > System Config > System Profile > Download Profile) along with May's logs found in /usr/local/nagios/var/archives/. Please compress them if they are not already.

What is the name of the other contact group it should have notified?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Critical notifications not being sent from XI

Post by cdienger »

Data received and being reviewed. Are you certain about the dates and do you have emails from those days that can be provided? I'm just trying to line things up as best as possible and what I'm seeing is:

-The service is only configured to send recovery and critical alerts and not warnings.
-The service is currently configured to be escalated. Its first escalated notification is the 5th notification sent.
-The service is currently configured to escalate a second time. The second escalation occurs on the 9th notification.
-The escalations are configured to notify a select group (probably the Admin group you refer to).

This should help explain some of the behavior, but I do still see some things I can't quite explain (a gap between the 10th and 22nd and a "custom" dispatcher).
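The escalation behavior listed above can be sketched roughly as follows. This is only an illustration of how Nagios Core picks recipients: when one or more escalation ranges match the current notification number, the escalation's contact groups are used in place of the service's own contact groups. The notification ranges (5 to 8, then 9 onward) and the exact group names are assumptions made for the example, not values read from the actual configuration.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    first_notification: int
    last_notification: int  # 0 means "no upper bound", as in Nagios escalation definitions
    contact_groups: tuple


def recipients(notification_number, default_groups, escalations):
    """Return the contact groups that would receive notification number N."""
    matched = [
        e for e in escalations
        if e.first_notification <= notification_number
        and (e.last_notification == 0 or notification_number <= e.last_notification)
    ]
    if not matched:
        # No escalation applies: the service's own contact groups are used.
        return set(default_groups)
    groups = set()
    for e in matched:
        groups.update(e.contact_groups)
    return groups


# Hypothetical values for illustration only.
DEFAULT_GROUPS = ("ServerSupportContact",)
ESCALATIONS = [
    Escalation(first_notification=5, last_notification=8, contact_groups=("Nagios Admin",)),
    Escalation(first_notification=9, last_notification=0, contact_groups=("Nagios Admin",)),
]

for n in range(1, 11):
    print(n, sorted(recipients(n, DEFAULT_GROUPS, ESCALATIONS)))
```

Under these assumed ranges, the service's own group would stop receiving alerts from the 5th notification onward unless it is also listed in the escalations. It would not by itself explain a group receiving only the very first notification.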

It does look like there have been changes to the contactgroup memberships and possibly the notification handler. Are you aware of any changes that were made during the month to either of these or escalations?
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Critical notifications not being sent from XI

Post by rferebee »

I am fairly certain about the dates of the events. I can PM you emails that my team received during the time this issue was happening.

Escalations are configured for most of our Service Checks. If no one acknowledges or resolves the service alert after 5 days, an escalation email is sent to our Nagios Admin group every day for 5 days and once more on the 9th day. Those emails seemed to go out just fine during this time.

The ones I'm worried about are the ones that were supposed to be sent to the ServerSupportContact group. Looking at the notification log, it appears that only one notification was sent to that contact group, on May 1st, and then never again, despite the Alert Settings being configured to send a notification every 1440 minutes. See the attached screenshot.
Where are you seeing that only Recovery and Critical alerts are sent out?
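One way to double-check what the notification log shows is to count the `SERVICE NOTIFICATION` entries per contact in the archived Nagios logs. The sketch below assumes the standard Nagios Core log line layout (`[timestamp] SERVICE NOTIFICATION: contact;host;service;state;command;output`). The host name, service name, contact names, and timestamps in the sample are made up for illustration. Note that the log records individual contacts, not contact groups.

```python
import re
from collections import Counter

# [timestamp] SERVICE NOTIFICATION: contact;host;service;state;command;output
NOTIFICATION_RE = re.compile(
    r"^\[(\d+)\] SERVICE NOTIFICATION: ([^;]*);([^;]*);([^;]*);([^;]*);"
)


def count_service_notifications(lines, host, service):
    """Count notifications per (contact, state) for one host/service pair."""
    counts = Counter()
    for line in lines:
        m = NOTIFICATION_RE.match(line)
        if m and m.group(3) == host and m.group(4) == service:
            counts[(m.group(2), m.group(5))] += 1
    return counts


# Hypothetical sample lines; real input would come from the files under
# /usr/local/nagios/var/archives/.
sample = [
    "[1556700000] SERVICE NOTIFICATION: support_user;winsrv01;C: Drive;CRITICAL;notify-service-by-email;disk usage high",
    "[1556786400] SERVICE NOTIFICATION: admin_user;winsrv01;C: Drive;CRITICAL;notify-service-by-email;disk usage high",
    "[1556872800] SERVICE NOTIFICATION: admin_user;winsrv01;C: Drive;CRITICAL;notify-service-by-email;disk usage high",
    "[1556872900] SERVICE ALERT: winsrv01;C: Drive;CRITICAL;HARD;3;disk usage high",
    "[1556873000] SERVICE NOTIFICATION: admin_user;otherhost;C: Drive;CRITICAL;notify-service-by-email;disk usage high",
]

counts = count_service_notifications(sample, "winsrv01", "C: Drive")
for (contact, state), total in sorted(counts.items()):
    print(contact, state, total)
```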

The ServerSupportContact group is added to multiple service checks weekly, basically whenever we add new devices for monitoring it's possible that group will be added to alerting. I'm not aware of any changes to the notification handler for the group.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Critical notifications not being sent from XI

Post by cdienger »

Thanks for confirming. I'm labbing this up to see if I can reproduce and will keep you updated.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Critical notifications not being sent from XI

Post by rferebee »

The last notification that my XI environment sent out was 5/24/2019 at 10:15:00 and hasn't sent one out since.

Something happened, I'm not sure what.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Critical notifications not being sent from XI

Post by rferebee »

Wait. Somehow my failover server backup was written to my Production server, which disabled notifications.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Critical notifications not being sent from XI

Post by rferebee »

I don't know what the heck happened, but I think I have it fixed. Somehow the settings in my Prod environment were overwritten by the settings in my failover environment sometime on or before May 24th.

My failover environment is configured to not send out notifications, so my Prod environment hasn't been sending out any alerts since the 24th... damn.

Having 3 different environments that all back up to and from each other is a real pain sometimes.
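A quick guard against this kind of overwrite is to check the global notification flag after any restore. The sketch below assumes the usual Nagios Core `status.dat` layout, where a `programstatus { ... }` block carries an `enable_notifications=0|1` line. The file path in the usage comment is the common default and may differ on a given install.

```python
import re


def notifications_enabled(status_text):
    """Return True/False from the programstatus block, or None if not found."""
    block = re.search(r"programstatus\s*\{(.*?)\}", status_text, re.DOTALL)
    if block is None:
        return None
    flag = re.search(r"^\s*enable_notifications=(\d)", block.group(1), re.MULTILINE)
    if flag is None:
        return None
    return flag.group(1) == "1"


# Usage (path is an assumption; adjust to the local install):
# with open("/usr/local/nagios/var/status.dat") as fh:
#     print(notifications_enabled(fh.read()))
```

A check like this could run on the production server after each backup sync and raise an alarm through a channel that does not depend on Nagios notifications.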
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Critical notifications not being sent from XI

Post by cdienger »

Thanks for the update! Are we okay to lock this one up or did you have any further questions/concerns about this?
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Critical notifications not being sent from XI

Post by rferebee »

Hold on, sorry for the confusion.

The original issue I opened this thread for is NOT resolved. We still need to figure out why the ServerSupportContact group didn't get their notifications from May 11th-19th.

The issue I was talking about yesterday was something different entirely. Honestly, I shouldn't even have mentioned it in this thread.
Locked