Some notifications not firing after upgraded to 5.5.1

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
gmackey
Posts: 35
Joined: Wed Mar 22, 2017 1:13 pm
Location: Edmond, OK

Post by gmackey »

Last night a host, and a service on another host, went down, and we did not receive notifications for either from Nagios XI. I only found out because a third host was also affected, and for that one Nagios XI sent notifications as it should. I checked at the time: both were definitely in Unhandled status, with notifications enabled and contact groups assigned. The Notifications log in the XI web interface makes it clear that Nagios XI didn't even try to send notifications to anyone for the host or service in question. Since the upgrade from 5.4 to 5.5.1, that log has had far fewer notifications than normal. Our Linux admin also noticed the day before that he wasn't receiving notifications for a problem that showed the same behavior.

Could you tell me what the next troubleshooting step should be? I'm sure there's a log somewhere I need to check to find out what's going on. Thanks!
=================================================
Nagios XI 5.6.5 Enterprise
CentOS 6.10 (64-bit) VMware image
SSL implemented and forced, with exception for localhost
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: Some notifications not firing after upgraded to 5.5.1

Post by jomann »

You should definitely check the event log and search for the host/service that did not send a notification. Find the alert and see whether Nagios tried to notify; if it had, though, it should have shown up in the notifications section. After that, the most obvious thing to check is that the host/service actually has the settings it needs to be able to alert. To view the actual flattened definition, look up the host/service in objects.cache (/usr/local/nagios/var/objects.cache) and check its notification_options. There was no downtime or anything similar in effect during this, right?
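
A minimal sketch of that objects.cache lookup, assuming the cache uses the usual "define service { ... }" blocks with one directive per line. The function name show_service_def is made up for this example, and the host/service names are taken from later in this thread:

```shell
#!/bin/sh
# Sketch: print the flattened definition of one service from objects.cache.
# show_service_def is a hypothetical helper, not part of Nagios.
# Usage: show_service_def HOST SERVICE [CACHE_FILE]
show_service_def() {
    awk -v host="$1" -v svc="$2" '
        # Strip the directive name plus surrounding whitespace, leaving the value.
        function value(line, name) {
            sub("^[ \t]*" name "[ \t]+", "", line)
            sub(/[ \t]+$/, "", line)
            return line
        }
        # Start collecting at the top of every service block.
        /^define service[ \t]*[{]/ { inblock = 1; block = ""; h = 0; s = 0 }
        inblock {
            block = block $0 "\n"
            if ($1 == "host_name" && value($0, "host_name") == host) h = 1
            if ($1 == "service_description" && value($0, "service_description") == svc) s = 1
            # At the closing brace, print the block only if both names matched.
            if ($1 == "}") {
                if (h && s) printf "%s", block
                inblock = 0
            }
        }
    ' "${3:-/usr/local/nagios/var/objects.cache}"
}
```

Something like show_service_def "Kane" "Svc - SCCM" should then print the whole block, including notification_options, so it can be compared side by side with a service that still notifies.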
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
gmackey

Re: Some notifications not firing after upgraded to 5.5.1

Post by gmackey »

After checking the event log, it is clear that there is an issue.

Here is a service that sends notifications properly and is configured identically from what I can see:

SERVICE ALERT: Baubo;Svc - Apache Tomcat 8;OK;HARD;1;Tomcat8=running (auto)
SERVICE ALERT: Baubo;Svc - Apache Tomcat 8;CRITICAL;HARD;5;critical(Tomcat8=stopped (auto))
SERVICE ALERT: Baubo;Svc - Apache Tomcat 8;UNKNOWN;HARD;5;Failed to open service manager: 1115: A system shutdown is in progress.
SERVICE ALERT: Baubo;Svc - Apache Tomcat 8;CRITICAL;SOFT;1;critical(Tomcat8=stopping (auto))

And here is a service that does not trigger notifications to be sent:

SERVICE ALERT: Kane;Svc - SCCM;CRITICAL;SOFT;1;critical(SMS_EXECUTIVE=stopped (auto))

It never gets past that first service alert. The same goes for another service I tried. All of these services and hosts were sending notifications right before we upgraded from 5.4.13 to 5.5.1. I have evidence of this: a temporary network misconfiguration on a core router brought every single host and service down, resulting in about 14,500 notifications. So notifications were working great before the upgrade and are not triggering after it. Some of them still work, though, and I can't see any difference in the config on those.
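
To look for other services in the same situation, here is a rough sketch that scans SERVICE ALERT lines and lists every service whose most recent alert is a non-OK SOFT state. It assumes the lines are in chronological order (oldest first) and that the log lives at /usr/local/nagios/var/nagios.log; the function name stuck_soft_services is made up for this example. A service that shows up here may simply be in the middle of its retries, so only the ones that stay listed for a long time are suspicious:

```shell
#!/bin/sh
# Sketch: list services whose latest SERVICE ALERT is a non-OK SOFT state.
# stuck_soft_services is a hypothetical helper; the default path is an assumption.
# Usage: stuck_soft_services [LOG_FILE]
stuck_soft_services() {
    awk -F';' '
        /SERVICE ALERT: / {
            # Field 1 is "<prefix>SERVICE ALERT: <host>"; keep only the host.
            host = $1
            sub(/^.*SERVICE ALERT: /, "", host)
            key = host ";" $2
            # Later lines overwrite earlier ones, so the last alert wins.
            state[key] = $3
            type[key] = $4
        }
        END {
            for (k in state)
                if (type[k] == "SOFT" && state[k] != "OK")
                    print k ";" state[k]
        }
    ' "${1:-/usr/local/nagios/var/nagios.log}" | sort
}
```

Each output line has the form host;service;state.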
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Some notifications not firing after upgraded to 5.5.1

Post by scottwilkerson »

Are the hosts down for the services that are not sending?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
gmackey

Re: Some notifications not firing after upgraded to 5.5.1

Post by gmackey »

No, the hosts were left operational. To test, I literally just picked a random service on a random host that had been sending email notifications days before. The other service I tested was the one I mentioned earlier, which went down recently when a related host (a shared database server) was completely down.
scottwilkerson

Re: Some notifications not firing after upgraded to 5.5.1

Post by scottwilkerson »

I have a feeling this may be due to a reported bug about some settings not getting refactored properly in Core
https://github.com/NagiosEnterprises/na ... issues/557

I added this thread to that ticket so that, when a fix is available, they can post an update here.
scottwilkerson

Re: Some notifications not firing after upgraded to 5.5.1

Post by scottwilkerson »

I believe I found the cause in Core, and it is fixed in the maint branch on GitHub:
https://github.com/NagiosEnterprises/na ... ee/maint

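# Download and unpack the maint branch of Nagios Core
wget https://github.com/NagiosEnterprises/nagioscore/archive/maint.tar.gz
tar xzf maint.tar.gz
cd nagioscore-maint

# Use a SysV init script when systemd is absent or an init script already exists
configureflags="--with-command-group=nagcmd"
if ! command -v systemctl >/dev/null 2>&1 || [ -f /etc/init.d/nagios ]; then
    configureflags="--with-init-type=sysv $configureflags"
fi

# Left unquoted on purpose so that each flag is passed as a separate argument
./configure $configureflags
make -j 2 all
make install

service nagios restart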

After this, once the services stuck in a soft state return to an OK state, either naturally or by stopping nagios and removing retention.dat, they should no longer get stuck.
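
If you go the retention.dat route, here is a cautious sketch that moves the file aside rather than deleting it, so it can be put back if needed. The path /usr/local/nagios/var/retention.dat is an assumption about a default install, and backup_retention is a made-up helper name. Keep in mind that retention.dat also holds retained state such as comments and scheduled downtime, which will be gone from the running system once the file is moved:

```shell
#!/bin/sh
# Sketch: move the retention file aside instead of deleting it.
# backup_retention is a hypothetical helper; the default path is an assumption.
# Usage: backup_retention [RETENTION_FILE]
backup_retention() {
    file="${1:-/usr/local/nagios/var/retention.dat}"
    if [ ! -f "$file" ]; then
        echo "no retention file at $file" >&2
        return 1
    fi
    # Timestamped name so repeated runs do not overwrite an older backup.
    backup="$file.$(date +%Y%m%d%H%M%S).bak"
    mv "$file" "$backup" && echo "$backup"
}

# Intended order (run as root, with Nagios stopped while the file is moved):
#   service nagios stop
#   backup_retention
#   service nagios start
```

Nagios needs to be stopped first because it writes its retained state back to the file on shutdown.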