Monitoring Process Issue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jwessels
Posts: 16
Joined: Mon May 12, 2014 5:57 am

Monitoring Process Issue

Post by jwessels »

Hi Support,

I have a problem with the monitoring process: it seems to be running behind schedule.

Image

I am also receiving this warning for all services in the event log:
Warning: The check of service 'nagiosxi-64 VM Status' on host 'tsamarvca.tharisa.com' looks like it was orphaned (results never came back; last_check=1409025921; next_check=1409026579). I'm scheduling an immediate check of the service...
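
The last_check and next_check values in that warning are Unix epoch seconds. A minimal sketch for reading them, assuming GNU date (the version shipped with CentOS); the helper name epoch_to_local is made up for illustration and is not part of Nagios:

```shell
# Hypothetical helper: print an epoch timestamp in a given timezone.
# Assumes GNU date; the "-d @SECONDS" form is a GNU extension.
epoch_to_local() {
    TZ="${2:-UTC}" date -d "@$1" '+%Y-%m-%d %H:%M:%S %Z'
}

last_check=1409025921   # copied from the warning above
next_check=1409026579   # copied from the warning above

epoch_to_local "$last_check" Africa/Johannesburg
epoch_to_local "$next_check" Africa/Johannesburg

# Gap between the two values, in seconds
echo "gap: $(( next_check - last_check ))s"
```

The two values are 658 seconds apart. Comparing the converted times with the time the warning was logged may help show whether the scheduler or the clock is the part that is off.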

Also, after applying the configuration when adding or editing hosts/services, it reports that active host checks, active service checks, and notifications are disabled. This corrects itself after about an hour.

This issue started after the disk filled up with backups.
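
Since the trouble began with a full disk, it may be worth confirming that space has actually been freed. A rough sketch, assuming the two directories that appear elsewhere in this thread; the helper name disk_used_pct and the 90% threshold are arbitrary choices for illustration:

```shell
# Hypothetical helper: print the used percentage (digits only) of the
# filesystem that holds the given path. "df -P" forces one line per
# filesystem, so the Capacity column is always field 5 of line 2.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Paths taken from this thread; 90 is an assumed threshold, not a Nagios default.
for path in /var/lib/mysql /usr/local/nagios/var; do
    [ -d "$path" ] || continue
    pct=$(disk_used_pct "$path")
    if [ "$pct" -ge 90 ]; then
        echo "WARNING: $path is on a filesystem that is ${pct}% full"
    else
        echo "OK: $path is on a filesystem that is ${pct}% full"
    fi
done
```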
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Monitoring Process Issue

Post by lmiltchev »

Do you have any database errors?

Code: Select all

tail -25 /var/log/mysqld.log
What is the output of the following command?

Code: Select all

grep embedded /usr/local/nagios/etc/nagios.cfg
Be sure to check out our Knowledgebase for helpful articles and solutions!
jwessels
Posts: 16
Joined: Mon May 12, 2014 5:57 am

Re: Monitoring Process Issue

Post by jwessels »

Hi

Here is the output

[root@tsamarnagios ~]# tail -25 /var/log/mysqld.log
140825 13:59:15 [Note] /usr/libexec/mysqld: Shutdown complete

140825 13:59:15 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
140825 13:59:27 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140825 13:59:28 InnoDB: Initializing buffer pool, size = 8.0M
140825 13:59:28 InnoDB: Completed initialization of buffer pool
140825 13:59:28 InnoDB: Started; log sequence number 0 44243
140825 13:59:28 [Note] Event Scheduler: Loaded 0 events
140825 13:59:28 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
140825 14:19:09 [Note] /usr/libexec/mysqld: Normal shutdown

140825 14:19:09 [Note] Event Scheduler: Purging the queue. 0 events
140825 14:19:11 InnoDB: Starting shutdown...
140825 14:19:15 InnoDB: Shutdown completed; log sequence number 0 44243
140825 14:19:15 [Note] /usr/libexec/mysqld: Shutdown complete

140825 14:19:15 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
140825 14:19:26 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140825 14:19:26 InnoDB: Initializing buffer pool, size = 8.0M
140825 14:19:26 InnoDB: Completed initialization of buffer pool
140825 14:19:26 InnoDB: Started; log sequence number 0 44243
140825 14:19:26 [Note] Event Scheduler: Loaded 0 events
140825 14:19:26 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
You have new mail in /var/spool/mail/root
[root@tsamarnagios ~]# grep embedded /usr/local/nagios/etc/nagios.cfg
enable_embedded_perl=0
use_embedded_perl_implicitly=0
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Monitoring Process Issue

Post by tmcdonald »

Is the timing off or is your system clock off?

Code: Select all

grep "date.timezone" /etc/php.ini
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
date
Former Nagios employee
jwessels
Posts: 16
Joined: Mon May 12, 2014 5:57 am

Re: Monitoring Process Issue

Post by jwessels »

Code: Select all

[root@tsamarnagios ~]# grep "date.timezone" /etc/php.ini
; http://www.php.net/manual/en/datetime.configuration.php#ini.date.timezone
date.timezone = Africa/Johannesburg
[root@tsamarnagios ~]# ls -l /etc/localtime
lrwxrwxrwx 1 root root 39 May 22 21:42 /etc/localtime -> /usr/share/zoneinfo/Africa/Johannesburg
[root@tsamarnagios ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Thu Aug 28 8:28:34 SAST 2014
The local time looked wrong, so I used the following commands to set it:

Code: Select all

mv /etc/localtime /etc/localtime.bak
ln -s /usr/share/zoneinfo/Africa/Johannesburg /etc/localtime
lrwxrwxrwx 1 root root 39 Aug 28 08:35 /etc/localtime -> /usr/share/zoneinfo/Africa/Johannesburg

That cleared the warning about the orphaned checks, but the eventlog still shows that the checks are now an hour behind
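
One way to double-check that the system zone, the clock, and PHP agree after a change like this. This is only a sketch: the helper name zone_from_target is invented here, and the /etc/php.ini path is the one used earlier in this thread.

```shell
# Hypothetical helper: strip everything up to and including "/zoneinfo/"
# from a symlink target, leaving just the zone name.
zone_from_target() {
    echo "${1#*/zoneinfo/}"
}

target=$(readlink /etc/localtime 2>/dev/null)
if [ -n "$target" ]; then
    echo "system zone: $(zone_from_target "$target")"
else
    echo "/etc/localtime is not a symlink (it may be a copied zone file)"
fi

# Current clock reading with zone abbreviation and UTC offset
date '+system clock: %Y-%m-%d %H:%M:%S %Z (UTC%z)'

# PHP setting, if the file is present
if [ -r /etc/php.ini ]; then
    grep '^date.timezone' /etc/php.ini
fi
```

Daemons that were already running when /etc/localtime changed may keep using the old zone until they are restarted.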
Capture.PNG
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Monitoring Process Issue

Post by lmiltchev »

Have you tried restarting the server after making the changes? You can also try running the following:

Code: Select all

service nagios stop
rm -f /usr/local/nagios/var/retention.dat
service nagios start
Note: The status of all of your checks will go to "Pending".
jwessels
Posts: 16
Joined: Mon May 12, 2014 5:57 am

Re: Monitoring Process Issue

Post by jwessels »

Hi,

I ran the commands and removed the retention file, but this had no effect. The times of the events in the eventlog are still 8+ hours behind.

Restarting the server has the same effect as restarting the nagios service, active checks and notifications are disabled.

After restarting the appliance or the nagios service, or after adding hosts/services, I have to start the monitoring engine manually to enable the checks, or wait 30+ minutes for it to start automatically.

Here is the status of the monitoring process.
monitor.PNG
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Monitoring Process Issue

Post by sreinhardt »

What version of Nagios XI are you currently running? Are you using any other NEB modules, such as mod_gearman or livestatus? Just to clarify: you did a full server reboot and the time did not correct itself? We may need to clear the retention.dat file so that checks are not scheduled so far in the future, since the scheduling stored in it is likely partially, if not fully, off because of the earlier time issues. Simply moving the file as shown below and restarting the nagios daemon will recreate it.

Code: Select all

mv /usr/local/nagios/var/retention.dat /usr/local/nagios/var/retention.dat.old
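
Putting the stop, move, and start steps from this thread together, a possible sketch follows. The helper name set_aside and the timestamped backup name are assumptions made for illustration; they are not Nagios conventions.

```shell
# Hypothetical helper: move a file aside, adding a timestamp so an
# earlier backup is not overwritten. Prints the backup path on success.
set_aside() {
    src="$1"
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    dest="${src}.$(date +%Y%m%d-%H%M%S).old"
    mv "$src" "$dest" && echo "$dest"
}

# Intended use on the XI server, as root, with the commands from this thread:
#   service nagios stop
#   set_aside /usr/local/nagios/var/retention.dat
#   service nagios start
```

Stopping the service before moving the file is the safer order, since Nagios normally writes its retained state back out when it shuts down.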
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
jwessels
Posts: 16
Joined: Mon May 12, 2014 5:57 am

Re: Monitoring Process Issue

Post by jwessels »

Hi

No custom configurations or modules, except for the install of the vmware perl sdk and yum updates.
VMware 64bit appliance
XI 2014R1.4
CentOS release 6.5 (Final)
cpe:/o:centos:linux:6:GA

Neither full server reboots nor nagios service restarts correct the time.

Renaming or removing retention.dat does reset the checks to the correct time, but the scheduler doesn't keep up and ends up behind again, and the warnings about orphaned checks return.

I do get the following when restarting the nagios service: Warning - nagios did not exit in a timely manner

After restarting the nagios service, the monitoring engine stops (the process state shows as stopped) and I have to start it manually.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Monitoring Process Issue

Post by lmiltchev »

If Nagios is not exiting in a timely manner, you can try following the steps outlined on our FAQ wiki page here:

http://support.nagios.com/wiki/index.ph ... ely_manner

Let us know if this helped.