Nagios XI notification backlog

dgianetti
Posts: 10
Joined: Wed Jun 19, 2013 9:21 am
Location: Connecticut


Post by dgianetti »

We had a period of about a week where we were wondering if Nagios was sending email notifications. Today, I updated to the latest version and we are now being inundated with old notifications. The problem is they are all dated 'now'. The mail queue was empty and there were no signs that a subsystem was not working correctly. However, the upgrade went successfully and now it appears all the old notifications are emptying out.

First question: where would all these event notifications queue up if they weren't in the mail queue? Is there something I can monitor to alert me to a buildup here? Is there some other way to kick the system to get things going if this should happen again? I'm stumped.

As of today, we're on 5.6.2 of XI.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios XI notification backlog

Post by dwhitfield »

My suspicion is these were events in MySQL (or Postgres). Restarting the database might have done it, but you might have needed to clear the queues. I will leave it to someone in support to give you the info on clearing the queues.
dgianetti
Posts: 10
Joined: Wed Jun 19, 2013 9:21 am
Location: Connecticut

Re: Nagios XI notification backlog

Post by dgianetti »

Thanks. That was my suspicion too. 'mailq' returned '0' and there didn't appear to be any other signs of a backup. All the subsystems were reporting up as well, so it's disturbing that this problem existed and there was no indication. A way to monitor this queue would be great. Input from someone in support is earnestly solicited. :)

Thanks!
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI notification backlog

Post by tgriep »

In Nagios XI, when notifications are generated, they are put into a database table so another process can read the command and process it.
After a command is processed, it should be removed from the database, but I suspect that they were not removed.
When the server was upgraded, those stuck commands were processed, and that is where the false emails could have come from.

The xi_events SQL table is where they are stored, but it should typically be empty.
dgianetti
Posts: 10
Joined: Wed Jun 19, 2013 9:21 am
Location: Connecticut

Re: Nagios XI notification backlog

Post by dgianetti »

Fantastic! That's the answer I was after, thanks. In our case, notifications suddenly stopped being received. The notifications were showing up in reports as being sent, but the users reported never receiving them. Test emails sent from the Nagios XI UI were received successfully, however. There were no other signs of things being queued up. I even restarted the server to see if that would help. Ultimately, the update seemed to shake it loose. Unfortunately for me, the system is relatively new and there are still skeptics in our organization. Events like this don't help my case.

I'm hoping I might put a monitor on that table and have it alert when the table starts to show signs of queuing up notifications. Was there something else I could have done to give things a kick and get it going again? We've run Nagios XI in a separate environment of ours for many years and have never encountered this issue before.

Thanks again for the help!
dgianetti
Posts: 10
Joined: Wed Jun 19, 2013 9:21 am
Location: Connecticut

Re: Nagios XI notification backlog

Post by dgianetti »

Actually, it looks like it's happening again. I just queried the table:
use nagiosxi;
select * from xi_events;

I get back 244 rows! First one is at 02:09:37 today (2019-05-29). If I'm understanding you correctly, that means the problem reoccurred at around 2am this morning and no notifications have been sent out since that time. This is weird.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI notification backlog

Post by tgriep »

Having data in that table is OK, as it holds queued-up data for all of the events that the system is running. It is not exclusive to notifications.

The following query outputs only the notification events in that table.


echo "select * from xi_events WHERE event_type=2;"| mysql -u root -pnagiosxi nagiosxi
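A rough sketch of how the query above could be turned into a Nagios-style check on the notification backlog. The credentials and the event_type=2 filter are taken from the query above; the function names and the thresholds are hypothetical, and since the table normally holds some queued data, any thresholds would need tuning against what a healthy system shows.

```shell
#!/bin/sh
# Sketch only: count queued notification events and map the count to a
# Nagios plugin status. Function names and thresholds are assumptions.

# Print the number of rows in xi_events with event_type=2.
# -N suppresses column headers, -B gives plain batch output.
count_notification_events() {
    mysql -u root -pnagiosxi -N -B \
        -e "SELECT COUNT(*) FROM xi_events WHERE event_type=2;" nagiosxi
}

# Map a row count to a status line and plugin exit code.
# usage: classify_backlog COUNT WARN CRIT
classify_backlog() {
    count=$1 warn=$2 crit=$3
    case $count in
        ''|*[!0-9]*) echo "UNKNOWN - could not read row count"; return 3 ;;
    esac
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count notification events queued"; return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count notification events queued"; return 1
    fi
    echo "OK - $count notification events queued"; return 0
}
```

It could be wired up as `classify_backlog "$(count_notification_events)" 50 200; exit $?`, where 50 and 200 are placeholder thresholds.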
Let's stop the Nagios processes, clear out the temporary data in the SQL tables, run a repair, and start the processes again.
Run the following as root.


service npcd stop
service nagios stop
service ndo2db stop
service crond stop
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
mysqlcheck -f -r -u root -pnagiosxi --all-databases --use-frm
service mysqld restart
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /var/lib/mrtg/mrtg_l
rm -f /usr/local/nagiosxi/var/*.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
pkill -9 -u nagios
pkill python
service httpd restart
service ndo2db start
service nagios start
service npcd start
service crond start
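After the commands above, a quick way to confirm that everything came back up might look like the sketch below. It assumes that `service NAME status` exits 0 for a running service, which is the usual convention but worth confirming on your distribution. The function name and the STATUS_CMD override are hypothetical and only exist so the loop can be tried without touching real services.

```shell
#!/bin/sh
# Sketch only: report which of the given services are not running.
# STATUS_CMD is a hypothetical hook; by default the real `service` command is used.

report_stopped_services() {
    status_cmd=${STATUS_CMD:-service}
    failed=""
    for svc in "$@"; do
        # Treat any non-zero exit from the status call as "not running".
        if ! "$status_cmd" "$svc" status >/dev/null 2>&1; then
            failed="$failed $svc"
        fi
    done
    if [ -n "$failed" ]; then
        echo "not running:$failed"
        return 1
    fi
    echo "all running"
    return 0
}
```

For the services touched above, that would be `report_stopped_services mysqld httpd ndo2db nagios npcd crond`.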
Let the system run for 5 to 10 minutes, then log in to the GUI.
Go to the Home > Notifications menu and see if the system is generating notifications.
If it is, check whether the users are receiving them.

If not, we would need to see the System Profile from the server.
To get your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and upload it to the forum post.