"Last check" occasionally hangs

tmvision · Post by **tmvision** » Fri Dec 01, 2017 8:44 am

Hi,

Our Nagios XI installation has started acting up: it doesn't appear to be taking in new service check results. That is, the status in the web front-end isn't updating (most obvious in the "Last check" column, but as far as I can tell none of the fields are updated).
Looking at the performance graph of a service shows that data is in fact collected, as the graph is up-to-date. But we are not sure if state-history is actually saved or not.
It will typically hang for up to an hour, and then suddenly resume as if nothing had happened. Then it may suddenly hang again.
We followed the troubleshooting steps outlined in https://support.nagios.com/kb/article.php?id=19 and came to the conclusion that this may be a problem with ndo2db, as the information in "classic" Nagios (non-XI) is up-to-date. When looking in /etc/mariadb/mariadb.log we see no mention of crashed tables.

We have enabled debug logging for ndo2db (debug_level=3) and notice that this log is at a standstill when the hang occurs and resumes printing when the system works. I don't see anything in this log describing the actual problem, though. Sometimes we get the following in our nagios.log:

Code: Select all

[1512126403] ndomod: Error writing to data sink!  Some output may get lost...
[1512126403] ndomod: Please check remote ndo2db log, database connection or SSL Parameters

Can you help us track down this odd behavior?

System:
CentOS 7, 64-bit
Nagios XI 5.4.11 (manually installed by downloading installation-script)
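
To line up the ndomod errors quoted above with the times the front-end stalls, it may help to print them with readable timestamps. This is only a rough sketch: the function name `ndomod_errors` is made up for illustration, it assumes every log line starts with `[<epoch seconds>] ` as in the excerpt, and it relies on GNU `date` (as shipped with CentOS 7).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ndomod error lines with UTC timestamps.
# Assumes lines look like "[1512126403] ndomod: Error ..." (format taken from the excerpt).
ndomod_errors() {
    local logfile="$1" line ts msg
    grep 'ndomod: Error' "$logfile" | while IFS= read -r line; do
        ts="${line#\[}"       # drop the leading "["
        ts="${ts%%\]*}"       # keep only the digits before the first "]"
        msg="${line#*\] }"    # message text after "] "
        # GNU date: "@<seconds>" interprets the value as a Unix epoch timestamp
        printf '%s %s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$msg"
    done
}
```

It could be called as `ndomod_errors /usr/local/nagios/var/nagios.log`, where the log path is an assumption about a default install and may differ on your system.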

npolovenko · Post by **npolovenko** » Fri Dec 01, 2017 10:52 am

Hello, @tmvision. Could you still upload ndo2db.log? Maybe we'll be able to spot something unusual. Also, it would be useful to get your system profile, so we can take a look at the cfg files and other system logs. To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file and attach it to your next post. Or you may upload it to the cloud storage of your choice and share a link with me in PM.

tmvision · Post by **tmvision** » Mon Dec 04, 2017 3:21 am

Hello

I am attaching the profile.zip to this post

tmvision · Post by **tmvision** » Mon Dec 04, 2017 4:10 am

And the ndo2db debug logs.

npolovenko · Post by **npolovenko** » Mon Dec 04, 2017 12:08 pm

@tmvision, thank you. I see two problems right away. The first is that NPCD is timing out frequently, and your current timeout value is too small. Please open /usr/local/nagios/etc/pnp/process_perfdata.cfg and change the timeout value from this:

Code: Select all

TIMEOUT = 5

To this:

Code: Select all

TIMEOUT = 40

After that please run:

Code: Select all

service npcd restart
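
If you would rather script the edit than make it by hand, something along these lines might work. It is a minimal sketch, not an official procedure: `set_pnp_timeout` is a hypothetical helper name, it assumes the setting appears on a single line of the form `TIMEOUT = <number>`, and it uses GNU `sed -i`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set TIMEOUT in a process_perfdata.cfg-style file.
# Keeps a copy of the original as <file>.bak before editing in place.
set_pnp_timeout() {
    local cfg="$1" value="$2"
    cp -p "$cfg" "$cfg.bak" || return 1
    # Replace only the numeric value, preserving the spacing around "="
    sed -i -E "s/^([[:space:]]*TIMEOUT[[:space:]]*=[[:space:]]*)[0-9]+/\1${value}/" "$cfg"
    # Print the resulting line so the change can be confirmed by eye
    grep -E '^[[:space:]]*TIMEOUT[[:space:]]*=' "$cfg"
}
```

For example, `set_pnp_timeout /usr/local/nagios/etc/pnp/process_perfdata.cfg 40`, followed by the npcd restart shown above.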

Also, there are a lot of messages indicating crashed DB tables. Please navigate to /usr/local/nagiosxi/scripts/ and run:

Code: Select all

./repair_databases.sh

Let us know if that fixed your issue.
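
To see which tables the crash messages refer to, before or after the repair, a quick filter over the database log can help. This is a sketch under assumptions: `list_crashed_tables` is a made-up name, and it assumes the usual MySQL/MariaDB wording, `Table '<name>' is marked as crashed and should be repaired`, with the table name in single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct tables reported as crashed in a DB log.
# Assumes messages of the form: Table './db/table' is marked as crashed ...
list_crashed_tables() {
    local logfile="$1"
    grep -i 'marked as crashed' "$logfile" \
        | grep -o "'[^']*'" \
        | tr -d "'" \
        | sort -u
}
```

Run it against whichever log your database writes to, for example `list_crashed_tables /etc/mariadb/mariadb.log` (the path mentioned earlier in this thread). Empty output only means no such message was found in that particular file.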

tmvision · Post by **tmvision** » Tue Dec 05, 2017 4:31 am

Thank you.

We have now applied the configuration change and repaired the database (again).

We will keep an eye on the system to see if any issues occur again, as it was an issue which wasn't apparent right away the first time around.

tmvision · Post by **tmvision** » Tue Dec 05, 2017 9:09 am

The issue unfortunately was not resolved.

The fields are still not being updated, and we even had to reboot the server all of a sudden because we could no longer reach it.

I am attaching the latest debug logs. I had to clear up the dbmaint.lock but that should be resolved.

dwhitfield · Post by **dwhitfield** » Tue Dec 05, 2017 2:35 pm

Please take a look at https://support.nagios.com/kb/article.php?id=139

You definitely have kernel queue issues. I'm not seeing the messages I'd expect in the syslog, but we just get a tail of it, so we might have just missed it.
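
To get a quick picture of the queue state yourself, something like the following may help. It is a sketch that assumes the queue in question is the System V message queue reported by `ipcs -q`, and that the output has the usual Linux column order (key, msqid, owner, perms, used-bytes, messages); `summarize_queues` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total up the System V message queues from `ipcs -q` output on stdin.
# Data rows are recognised by a key starting with "0x"; headers are skipped.
summarize_queues() {
    awk '$1 ~ /^0x/ { bytes += $5; msgs += $6; n++ }
         END { printf "queues=%d used_bytes=%d messages=%d\n", n, bytes, msgs }'
}
```

Typical use would be `ipcs -q | summarize_queues`, compared against the limits shown by `sysctl kernel.msgmnb kernel.msgmax kernel.msgmni`. A used-bytes total sitting at or near `kernel.msgmnb` while the front-end is stalled would be consistent with a full queue.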

Nagios Support Forum