"Last check" occasionally hangs

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tmvision
Posts: 32
Joined: Fri Dec 01, 2017 8:15 am

Post by tmvision »

Hi,

Our Nagios XI installation has started acting up: it doesn't appear to be taking in new service checks. That is, the status in the web front-end isn't updating (most obvious is the "Last check" column, but as far as I can tell none of the fields are updated).
Looking at the performance graph of a service shows that data is in fact collected, as the graph is up-to-date. But we are not sure if state-history is actually saved or not.
It will typically hang for up to an hour, and then suddenly resume as if nothing had happened. Then it may suddenly hang again.
We followed the troubleshooting steps outlined in https://support.nagios.com/kb/article.php?id=19 and came to the conclusion that this may be a problem with ndo2db, as the information in "classic" Nagios (non-XI) is up-to-date. When looking in /etc/mariadb/mariadb.log we see no mention of crashed tables.

We have enabled debug logging for ndo2db (debug_level=3) and notice that this log is at a standstill when the hang occurs, and resumes printing when the system works again. I don't see anything in this log describing the actual problem, though. Sometimes we get the following in our nagios.log:

[1512126403] ndomod: Error writing to data sink!  Some output may get lost...
[1512126403] ndomod: Please check remote ndo2db log, database connection or SSL Parameters

Can you help us track down this odd behavior?

System:
CentOS 7, 64-bit
Nagios XI 5.4.11 (manually installed by downloading installation-script)
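
The bracketed values in the nagios.log lines quoted above are Unix epoch timestamps, which makes it hard to line the ndomod errors up against the times the web front-end stopped updating. The following is a minimal sketch of a filter that rewrites them as UTC dates. It assumes GNU date (the `-d @SECONDS` form, as shipped with CentOS 7); the function name `epoch_to_utc` is hypothetical and not part of Nagios.

```shell
# Rewrite a leading "[epoch] " stamp on each input line as a UTC date.
# Lines without such a stamp pass through unchanged.
# Assumes GNU date (the "-d @SECONDS" form), as shipped with CentOS 7.
epoch_to_utc() {
  while IFS= read -r line; do
    stamp=${line%%\]*}   # text before the first "]"
    stamp=${stamp#\[}    # drop the leading "["
    case $stamp in
      ''|*[!0-9]*) printf '%s\n' "$line"; continue ;;
    esac
    case $line in
      "[$stamp] "*)
        rest=${line#*"] "}
        printf '[%s] %s\n' "$(date -u -d "@$stamp" '+%Y-%m-%d %H:%M:%S')" "$rest"
        ;;
      *) printf '%s\n' "$line" ;;
    esac
  done
}
```

Piping the log through the function, for example `grep 'ndomod: Error' nagios.log | epoch_to_utc`, shows each error with a readable time; the stamp 1512126403 above corresponds to 2017-12-01 11:06:43 UTC.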
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Post by npolovenko »

Hello, @tmvision. Could you still upload ndo2db.log? Maybe we'll be able to spot something unusual. It would also be useful to get your system profile, so we can take a look at the config files and other system logs. To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and attach it to your next post. Alternatively, upload it to the cloud storage of your choice and share a link with me in a PM.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
tmvision
Posts: 32
Joined: Fri Dec 01, 2017 8:15 am

Post by tmvision »

Hello

I am attaching the profile.zip to this post
tmvision
Posts: 32
Joined: Fri Dec 01, 2017 8:15 am

Post by tmvision »

And here are the ndo2db debug logs.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Post by npolovenko »

@tmvision, thank you. I see two problems right away. The first is that NPCD is timing out frequently, and right now your timeout value is too small. Please open /usr/local/nagios/etc/pnp/process_perfdata.cfg and change the timeout value from this:

TIMEOUT = 5

To this:

TIMEOUT = 40

After that, please run:

service npcd restart

Also, there are a lot of messages indicating crashed DB tables. Please navigate to /usr/local/nagiosxi/scripts/ and run:

./repair_databases.sh

Let us know if that fixes your issue.
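
Before restarting NPCD, it can be worth reading the value back from the file to confirm the edit was saved. This is a minimal sketch, assuming the setting appears as an uncommented `TIMEOUT = N` line as quoted above; the function name `perfdata_timeout` is hypothetical and not part of Nagios XI or PNP.

```shell
# Print the numeric value of the last uncommented TIMEOUT line in a file.
# Commented lines (starting with "#") are ignored because the pattern
# only allows whitespace before the keyword.
perfdata_timeout() {
  sed -n 's/^[[:space:]]*TIMEOUT[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

Running `perfdata_timeout /usr/local/nagios/etc/pnp/process_perfdata.cfg` should print 40 once the change is in place.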
tmvision
Posts: 32
Joined: Fri Dec 01, 2017 8:15 am

Post by tmvision »

Thank you.

We have now applied the configuration change and repaired the database (again).

We will keep an eye on the system to see if any issues occur again, as it was an issue which wasn't apparent right away the first time around.
tmvision
Posts: 32
Joined: Fri Dec 01, 2017 8:15 am

Post by tmvision »

The issue unfortunately was not resolved.

The fields are still not being updated, and we even had to reboot the server unexpectedly because we could no longer reach it.

I am attaching the latest debug logs. I had to clear up the dbmaint.lock but that should be resolved.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

Please take a look at https://support.nagios.com/kb/article.php?id=139

You definitely have kernel queue issues. I'm not seeing the messages I'd expect in the syslog, but we only get a tail of it, so we may simply have missed them.
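
The contents of the KB article are not quoted in this thread. Assuming that "kernel queue issues" refers to the System V message queue limits that NDOUtils relies on, a first step is to look at the current limits. The sketch below reads them from /proc/sys/kernel; the function name `show_msg_limits` is hypothetical, and the optional directory argument exists only so the function can be tried against a test directory.

```shell
# Print the kernel's System V message queue limits.
# Reads from /proc/sys/kernel by default; pass another directory to test.
show_msg_limits() {
  dir=${1:-/proc/sys/kernel}
  for name in msgmax msgmnb msgmni; do
    if [ -r "$dir/$name" ]; then
      printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
    else
      printf 'kernel.%s = (unreadable)\n' "$name"
    fi
  done
}
```

Calling `show_msg_limits` with no argument prints the live values, which can then be compared against whatever the KB article recommends. `ipcs -q` lists the existing queues and how many bytes each currently holds.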