Monitoring Process Issue

Post by **jwessels** » Tue Aug 26, 2014 9:10 am

Hi Support,

I have a problem with the process monitor: it seems to be behind schedule.

I am also receiving this warning for all services in the event log:
Warning: The check of service 'nagiosxi-64 VM Status' on host 'tsamarvca.tharisa.com' looks like it was orphaned (results never came back; last_check=1409025921; next_check=1409026579). I'm scheduling an immediate check of the service...
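The `last_check` and `next_check` values in that warning are Unix epoch seconds. Here is a minimal Python sketch for decoding them so they can be compared with the server's wall clock. It assumes the server should be on Africa/Johannesburg time (SAST, a fixed UTC+2 offset with no daylight saving). `to_local` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# Epoch values copied from the orphaned-check warning above
LAST_CHECK = 1409025921
NEXT_CHECK = 1409026579

# Assumption: the server runs on Africa/Johannesburg time (SAST),
# a fixed UTC+2 offset with no daylight saving
SAST = timezone(timedelta(hours=2), "SAST")


def to_local(epoch: int) -> str:
    """Render a Unix timestamp as a SAST wall-clock string."""
    return datetime.fromtimestamp(epoch, tz=SAST).strftime("%Y-%m-%d %H:%M:%S %Z")


print("last_check:", to_local(LAST_CHECK))    # 2014-08-26 06:05:21 SAST
print("next_check:", to_local(NEXT_CHECK))    # 2014-08-26 06:16:19 SAST
print("interval (s):", NEXT_CHECK - LAST_CHECK)  # 658
```

Comparing the decoded times with the output of `date` on the server is one way to tell a skewed system clock apart from a scheduler that is lagging.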

In addition, after applying the configuration when adding or editing hosts/services, XI reports that active host and service checks and notifications are disabled. This corrects itself after about an hour.

This issue started after the disk filled up with backups.

Post by **lmiltchev** » Tue Aug 26, 2014 1:01 pm

Do you have any database errors?

Code:

tail -25 /var/log/mysqld.log

What is the output of the following command?

Code:

grep embedded /usr/local/nagios/etc/nagios.cfg

Post by **jwessels** » Wed Aug 27, 2014 1:26 am

Hi

Here is the output

[root@tsamarnagios ~]# tail -25 /var/log/mysqld.log
140825 13:59:15 [Note] /usr/libexec/mysqld: Shutdown complete

140825 13:59:15 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
140825 13:59:27 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140825 13:59:28 InnoDB: Initializing buffer pool, size = 8.0M
140825 13:59:28 InnoDB: Completed initialization of buffer pool
140825 13:59:28 InnoDB: Started; log sequence number 0 44243
140825 13:59:28 [Note] Event Scheduler: Loaded 0 events
140825 13:59:28 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
140825 14:19:09 [Note] /usr/libexec/mysqld: Normal shutdown

140825 14:19:09 [Note] Event Scheduler: Purging the queue. 0 events
140825 14:19:11 InnoDB: Starting shutdown...
140825 14:19:15 InnoDB: Shutdown completed; log sequence number 0 44243
140825 14:19:15 [Note] /usr/libexec/mysqld: Shutdown complete

140825 14:19:15 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
140825 14:19:26 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140825 14:19:26 InnoDB: Initializing buffer pool, size = 8.0M
140825 14:19:26 InnoDB: Completed initialization of buffer pool
140825 14:19:26 InnoDB: Started; log sequence number 0 44243
140825 14:19:26 [Note] Event Scheduler: Loaded 0 events
140825 14:19:26 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
You have new mail in /var/spool/mail/root
[root@tsamarnagios ~]# grep embedded /usr/local/nagios/etc/nagios.cfg
enable_embedded_perl=0
use_embedded_perl_implicitly=0

Post by **tmcdonald** » Wed Aug 27, 2014 11:07 am

Is the check timing off, or is your system clock off?

Code:

grep "date.timezone" /etc/php.ini
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
date
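The commands above compare several views of the time zone. Below is a small Python sketch that automates the PHP-versus-OS part of that comparison. It assumes the default paths `/etc/php.ini` and `/etc/localtime`, and that `/etc/localtime` is a symlink into `/usr/share/zoneinfo/`. The helper names are hypothetical:

```python
import os
import re
from typing import Optional

ZONEINFO_DIR = "/usr/share/zoneinfo/"

# Matches an active (uncommented) date.timezone line, optionally quoted
_PHP_TZ = re.compile(r'^\s*date\.timezone\s*=\s*"?([^";\s]+)', re.M)


def php_timezone(ini_text: str) -> Optional[str]:
    """Return the date.timezone value from php.ini text, or None if unset."""
    match = _PHP_TZ.search(ini_text)
    return match.group(1) if match else None


def localtime_zone(path: str = "/etc/localtime") -> Optional[str]:
    """Return the zone name if path is a symlink into the zoneinfo tree.

    A plain copied file carries no zone name, so None is returned and
    the zone has to be checked some other way.
    """
    if not os.path.islink(path):
        return None
    target = os.path.normpath(
        os.path.join(os.path.dirname(path), os.readlink(path))
    )
    if target.startswith(ZONEINFO_DIR):
        return target[len(ZONEINFO_DIR):]
    return None


def zones_agree(ini_path: str = "/etc/php.ini",
                localtime_path: str = "/etc/localtime") -> bool:
    """True when php.ini and the /etc/localtime symlink name the same zone."""
    with open(ini_path) as fh:
        php_zone = php_timezone(fh.read())
    return php_zone is not None and php_zone == localtime_zone(localtime_path)
```

Running `zones_agree()` on the XI server should return `True` when PHP and the OS name the same zone. It does not verify that the clock itself shows the correct time.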

Post by **jwessels** » Thu Aug 28, 2014 4:29 am

Code:

[root@tsamarnagios ~]# grep "date.timezone" /etc/php.ini
; http://www.php.net/manual/en/datetime.configuration.php#ini.date.timezone
date.timezone = Africa/Johannesburg
[root@tsamarnagios ~]# ls -l /etc/localtime
lrwxrwxrwx 1 root root 39 May 22 21:42 /etc/localtime -> /usr/share/zoneinfo/Africa/Johannesburg
[root@tsamarnagios ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Thu Aug 28 8:28:34 SAST 2014

The local time looked wrong, so I used the following commands to set it:

Code:

mv /etc/localtime /etc/localtime.bak
ln -s /usr/share/zoneinfo/Africa/Johannesburg /etc/localtime

lrwxrwxrwx 1 root root 39 Aug 28 08:35 /etc/localtime -> /usr/share/zoneinfo/Africa/Johannesburg

That cleared the orphaned-check warning, but the event log still shows the checks running an hour behind.

Capture.PNG

Post by **lmiltchev** » Thu Aug 28, 2014 10:20 am

Have you tried restarting the server after making the changes? You can also try running the following:

Code:

service nagios stop
rm -f /usr/local/nagios/var/retention.dat
service nagios start

Note: The status of all of your checks will go to "Pending".

Post by **jwessels** » Tue Sep 02, 2014 12:27 am

Hi,

I ran the commands and removed the retention file, but this had no effect: the event times in the event log are still 8+ hours behind.

Restarting the server has the same effect as restarting the nagios service: active checks and notifications are disabled.

After restarting the appliance or the nagios service, or after adding hosts/services, I have to start the monitoring engine or the processing manually to enable the checks, or else wait 30+ minutes for them to start automatically.
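The manual step described here can also be scripted through Nagios' external command pipe. Here is a hedged Python sketch, assuming the pipe is at the usual XI location `/usr/local/nagios/var/rw/nagios.cmd` (check `command_file=` in `nagios.cfg`). `format_commands` and `submit` are hypothetical helper names:

```python
import time
from typing import Iterable, List, Optional

# Assumption: default command pipe on a Nagios XI install; confirm with
#   grep command_file /usr/local/nagios/etc/nagios.cfg
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Program-wide external commands that turn checks and notifications back on
RESUME_COMMANDS = (
    "START_EXECUTING_HOST_CHECKS",
    "START_EXECUTING_SVC_CHECKS",
    "ENABLE_NOTIFICATIONS",
)


def format_commands(names: Iterable[str], now: Optional[int] = None) -> List[str]:
    """Build lines in the '[epoch] COMMAND' form Nagios reads from its pipe."""
    stamp = int(time.time()) if now is None else now
    return [f"[{stamp}] {name}\n" for name in names]


def submit(lines: Iterable[str], path: str = COMMAND_FILE) -> None:
    """Write command lines to the pipe; Nagios must be running and reading it."""
    with open(path, "w") as pipe:
        pipe.writelines(lines)
```

For example, `submit(format_commands(RESUME_COMMANDS))` could be run as root or the nagios user. This only automates the workaround. It does not explain why the flags are off after each restart.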

Here is the status of the monitoring process.

monitor.PNG

Post by **sreinhardt** » Tue Sep 02, 2014 3:52 pm

What version of Nagios XI are you currently running? Are you using any other NEB modules, such as mod_gearman or livestatus? Just to clarify: you did a full server reboot and the time did not correct itself? We may need to clear the retention.dat file so that scheduling is not so far in the future, since it is likely partially, if not fully, skewed by the earlier time issues. Simply move the file as shown below and restart the nagios daemon, which will recreate it.

Code:

mv /usr/local/nagios/var/retention.dat /usr/local/nagios/var/retention.dat.old
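Before moving the file, it can help to see how far the saved schedule has drifted. Here is a minimal Python sketch that flags hosts and services whose `next_check` is more than an hour away from "now". It assumes retention.dat (or status.dat) uses `key=value` lines inside `host {` / `service {` (or `hoststatus {` / `servicestatus {`) blocks, with `next_check` stored as a Unix timestamp. `skewed_checks` is a hypothetical helper name:

```python
import re
from typing import List, Tuple

# Assumption: blocks look like
#   service {
#   host_name=web01
#   service_description=PING
#   next_check=1409026579
#   }
# where next_check is a Unix timestamp (0 when nothing is scheduled).
BLOCK_RE = re.compile(
    r"^\s*(host|service)(?:status)?\s*\{\s*$(.*?)^\s*\}\s*$", re.S | re.M
)


def skewed_checks(text: str, now: int, max_skew: int = 3600) -> List[Tuple[str, str, int]]:
    """List (kind, name, next_check) for checks scheduled more than
    max_skew seconds away from now, in either direction."""
    found = []
    for kind, body in BLOCK_RE.findall(text):
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        next_check = int(fields.get("next_check") or 0)
        if next_check == 0:
            continue  # nothing scheduled for this object
        name = fields.get("host_name", "?")
        if kind == "service":
            name += "/" + fields.get("service_description", "?")
        if abs(next_check - now) > max_skew:
            found.append((kind, name, next_check))
    return found
```

For example, pass it the contents of `/usr/local/nagios/var/retention.dat` together with `int(time.time())`. A long list of entries hours away from the current time would support the theory that the retained schedule, not the clock, is what is out of step.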

Post by **jwessels** » Tue Sep 09, 2014 1:27 am

Hi

No custom configurations or modules, except for installing the VMware Perl SDK and applying yum updates.
VMware 64bit appliance
XI 2014R1.4
CentOS release 6.5 (Final)
cpe:/o:centos:linux:6:GA

Neither full server reboots nor nagios service restarts correct the time.

Renaming or removing retention.dat does reset the checks to the correct time, but scheduling doesn't keep up: it falls behind again and the orphaned-check warnings return.

I do get the following when restarting the nagios service: Warning - nagios did not exit in a timely manner

After restarting the nagios service, either the monitoring engine stops and I have to start it manually, or the process state is stopped and I have to start that manually.

Post by **lmiltchev** » Tue Sep 09, 2014 9:06 am

If Nagios is not exiting in a timely manner, try the steps outlined on our FAQ wiki page:

http://support.nagios.com/wiki/index.ph ... ely_manner

Let us know if this helped.

Nagios Support Forum
