I set auto_reschedule_checks=0 and Nagios XI still stops processing. Core seems to continue to function normally. Currently running Nagios XI 2014R2.6.
In XI I still get "System Ok" and all green checks. Under Monitor Engine Process everything shows green as well.
Process Start Time 2015-04-02 10:42:28
Total Running Time 2h 22m 57s (this must be client side, as it is updating)
Process ID 17233
But when I look at a check in XI (at 1:10 PM) I see
Last Check: 2015-04-02 12:15:27
Next Check: 2015-04-02 12:20:27
Checking the running processes, they are indeed running:
ps -ef | grep 17233
nagios 17233 1 0 10:42 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 17235 17233 0 10:42 ? 00:00:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 17236 17233 0 10:42 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 17237 17233 0 10:42 ? 00:00:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 17238 17233 0 10:42 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 17239 17233 0 10:42 ? 00:00:14 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 17240 17233 0 10:42 ? 00:00:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 17246 17233 0 10:42 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
One interesting thing: even though I have auto_reschedule_checks=0, I see the following in the messages log:
Apr 2 13:02:27 nagios nagios: Warning: The check of service 'FTP' on host 'nyctp' looks like it was orphaned (results never came back; last_check=1427993136; next_check=1427993436). I'm scheduling an immediate check of the service...
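For reference, the epoch values in that warning can be turned into readable times with something like the following. This is only a minimal sketch: it assumes GNU date (the `-d @` syntax), and `epoch_to_utc` is a helper name made up here, not part of Nagios.

```shell
# Convert an epoch timestamp to a readable UTC time.
# Assumes GNU date; BSD/macOS date uses "date -u -r <epoch>" instead.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Values copied from the orphaned-check warning above.
last_check=1427993136
next_check=1427993436

echo "last_check: $(epoch_to_utc "$last_check") UTC"
echo "next_check: $(epoch_to_utc "$next_check") UTC"
echo "interval:   $(( next_check - last_check )) seconds"
```

The two values are 300 seconds apart, which matches the 5-minute gap between Last Check and Next Check shown in XI. The output is in UTC, so it has to be shifted to the server's local zone before comparing it with the times in the log.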
localhost (nagios) was alerting that it needed updates. I ran yum update to get the new ssl. I see in the messages log that Nagios recognized the update:
Apr 2 13:00:27 nagios nagios: SERVICE ALERT: localhost;Yum Updates;OK;HARD;4;YUM OK: O/S is up to date.
But when I look in the XI interface, I see:
YUM WARNING: O/S requires an update.
Status Details
Service State: Warning
Duration: 15h 30m 3s
Service Stability: Unchanging (stable)
Last Check: 2015-04-02 12:25:26
Next Check: 2015-04-02 12:30:26
Checking Nagios Core, it shows the correct info; it is just XI that is failing to function.
Current Status: OK (for 0d 0h 17m 57s)
Status Information: YUM OK: O/S is up to date.
Performance Data:
Current Attempt: 1/4 (HARD state)
Last Check Time: 04-02-2015 13:15:26
Check Type: ACTIVE
Check Latency / Duration: 0.000 / 0.743 seconds
Next Scheduled Check: 04-02-2015 13:20:26
Last State Change: 04-02-2015 13:00:26
I still have no clue where to start troubleshooting why XI keeps failing, with no major changes to what was once a working system.