last check status

Posted: Thu Sep 04, 2014 10:24 am
by vvz
Hello!
I noticed that in my Nagios web interface, the "Last Check" column (Service Status Details window) shows records with time stamps well behind the current time.
I googled the problem and learned that it can be caused by files in the /var/log/nagios/spool/checkresults/ folder.
My question is: can I (and should I) delete all files in that folder to fix the problem? Are there any other steps to take?
Thank you

Re: last check status

Posted: Thu Sep 04, 2014 11:19 am
by slansing
It may not have to do with those files. It could be that your system time, your PHP time, or a combination of both are off. So let's check the basics first. What is the output of:

grep 'date.time' /etc/php.ini
hwclock
date
And what is your current timezone?

Re: last check status

Posted: Thu Sep 04, 2014 11:26 am
by vvz
$ sudo grep 'date.time' /etc/php.ini
$ sudo hwclock
Thu 04 Sep 2014 12:18:55 PM CLT -0.937417 seconds
$ sudo date
Thu Sep 4 12:24:21 CLT 2014

Re: last check status

Posted: Thu Sep 04, 2014 11:33 am
by slansing
What is the current time on your workstation? And how far off are the results in the web interface? Well behind can be a relative term. :) It does look like you have a bit of drift present on your system that we can fix up with NTP. I'm more curious to know how far off the results are though, is this across the board? Or only on specific checks?
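
To put a number on the drift visible in the output above, here is a minimal sketch that subtracts the `hwclock` time from the `date` time. The helper name `to_seconds` is hypothetical, and the sketch assumes both times fall on the same day and are already in 24-hour `HH:MM:SS` form.

```shell
# Hypothetical helper: convert HH:MM:SS (24-hour) to seconds since midnight.
to_seconds() {
    h=${1%%:*}          # text before the first colon
    rest=${1#*:}        # text after the first colon
    m=${rest%%:*}
    s=${rest#*:}
    # Strip one leading zero so values like "08" are not parsed as octal
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Times taken from the posted output: hwclock 12:18:55, date 12:24:21
drift=$(( $(to_seconds 12:24:21) - $(to_seconds 12:18:55) ))
echo "system clock is ${drift}s ahead of the hardware clock"
```

With the posted values this works out to 326 seconds, a little over five minutes, which is far short of the hours-long lag reported in the web interface.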

Re: last check status

Posted: Thu Sep 04, 2014 12:04 pm
by vvz
The Nagios server itself is located in Chile, and the current time on the server was 12:18 when I ran the commands you asked for.

The difference between the current time and the records in the web interface was about 6 hours. I monitor 11 boxes with 117 services in Chile. I am not able to tell you exactly how many of those were behind (I restarted Nagios about an hour ago), but it was not just 2 or 4 services; it was about 50% of them.

As I mentioned, I restarted Nagios an hour ago. The current time on the server is now 12:44, and about 50% of services show 12:34 or 12:33 in the column.

One more thing I should mention (it may make your task easier): everything was just perfect with Nagios, but about a week ago the server crashed because of our programmers. It wasn't a Nagios problem. They were testing internal software on the server and completely filled the hard drive that Nagios runs on. 100%, no space at all. (They were given another drive for their tests; do not ask me why they didn't use it.) After that I ran into the problem with time stamps. There is another problem, which I believe is linked: for one box, the notification markers go up and down. Nobody touched the config files, and I've checked that notifications are enabled in the configs, but periodically the web interface marks some services as having notifications disabled. After a few minutes the same services are marked as having notifications enabled. I don't do anything, I just watch the web interface. When I click on a particular service and go to its Service State Information window, sometimes I see Notifications: enabled and sometimes: disabled.

I've already installed Nagios on a dedicated server, and I'm just using this situation to get more familiar with Nagios troubleshooting. For now I just need to connect Asterisk to the new Nagios instance.
How safe is it to delete files from the checkresults folder? As far as I understand, Nagios creates them automatically... right?

Re: last check status

Posted: Fri Sep 05, 2014 11:42 am
by sreinhardt
You are correct, Nagios does create them automatically, and removing them should be just fine. It will cause the loss of a few check results if any are stored there. I really don't know if this is going to resolve your issue though. 9 times out of 10 it's a mismatch between MySQL, PHP, system, and Nagios timestamps that causes this, especially since this is a repeatable item. However, clearing these won't hurt anything, so go for it.
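
For anyone who wants to look before deleting, here is a minimal sketch that only lists files in the spool directory that have not been modified for a while. The function name `stale_results` and the 60-minute default are assumptions for illustration. The default path is the one given in the first post and may differ on other installs. It relies on the `-maxdepth` and `-mmin` options found in GNU and BSD `find`.

```shell
# Hypothetical helper: list check result files older than N minutes.
# $1 = spool directory (default: path mentioned in this thread)
# $2 = age threshold in minutes (default: 60, an arbitrary choice)
stale_results() {
    dir="${1:-/var/log/nagios/spool/checkresults}"
    minutes="${2:-60}"
    # -mmin +N matches files last modified more than N minutes ago
    find "$dir" -maxdepth 1 -type f -mmin +"$minutes" 2>/dev/null
}

# Example (read-only, nothing is deleted):
#   stale_results /var/log/nagios/spool/checkresults 60
```

If the list looks like leftovers from the disk-full incident rather than fresh results, stopping Nagios first and then removing those files matches the advice above.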

Re: last check status

Posted: Fri Sep 05, 2014 11:56 am
by vvz
I've stopped the Nagios server and NSCA and moved into checkresults/, and what I can see is that files are periodically added there even though Nagios is stopped. The owner of the files is the nagios user. For a few seconds the folder is empty, and then new files appear. Is it mysqld? Shall I stop it too?

Re: last check status

Posted: Fri Sep 05, 2014 12:00 pm
by vvz
I guess it is Apache that adds them.

Re: last check status

Posted: Fri Sep 05, 2014 2:28 pm
by sreinhardt
Yes, they can be added via Apache or anything else that would want to send a result to Core. MySQL, as you seem to have found, should not be the culprit there.

Re: last check status

Posted: Fri Sep 05, 2014 2:35 pm
by vvz
After stopping the Nagios process for 20 minutes and deleting all files in checkresults, all active checks are working normally now. The last problem is passive checks: according to the web interface, they have not been updated for the last 3 hours.

Working on that right now.

Thank you for your help. I believe we can close this thread