
System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 10:32 am
by rseiwert
If the core Nagios process is not running, how can these status lights be green?

Recently I noticed that the system status across the top of the page, and the System Status and Monitoring Engine Status panels, are reporting invalid information. Initially I thought they were not updating because Nagios had crashed. Then I saw that sysstat.php was still running:
/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
/bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
I believe these pages get their information from this sysstat.php script. The log file /usr/local/nagiosxi/var/sysstat.log states that Nagios is not running, yet the XI interface (see the pictures) shows a healthy, green monitoring engine.

Code:

DB BACKEND:
Array
(
    [last_checkin] => 2015-04-02 20:09:23
    [bytes_processed] => 33339486
    [entries_processed] => 46796
    [connect_time] => 2015-04-02 17:21:14
    [disconnect_time] => 0000-00-00 00:00:00
)
CMDLINE=/etc/init.d/nagios status
nagios is not running
OUTPUT=nagios is not running
RETURNCODE=0
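
One detail in the excerpt above is worth noting: the status command printed "nagios is not running", yet RETURNCODE is 0. A check that keys only on the exit status would read this as healthy. Whether sysstat.php actually works that way is not established in this thread. The following is only a minimal shell sketch of classifying the engine state from the output text instead. The function names engine_state and state_from_log are hypothetical, and the "is running" phrase is an assumption, since only the "is not running" message appears in the log shown here.

```shell
#!/bin/sh
# Hypothetical helper: classify the engine state from the text printed by
# the status command, instead of trusting its exit status.
engine_state() {
    # $1 = text printed by the status command
    case "$1" in
        *"is not running"*) echo "down" ;;     # phrase taken from the log excerpt
        *"is running"*)     echo "up" ;;       # assumed phrasing for the healthy case
        *)                  echo "unknown" ;;
    esac
}

# Hypothetical helper: read a sysstat-style log ($1 = path), take the last
# OUTPUT= line, and report the state it implies.
state_from_log() {
    out=$(sed -n 's/^OUTPUT=//p' "$1" | tail -n 1)
    engine_state "$out"
}

# Example (not run here):
#   state_from_log /usr/local/nagiosxi/var/sysstat.log
```

With the excerpt above as input, the last OUTPUT= line contains "is not running", so this sketch would report the engine as down regardless of the return code.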


The XI interface shows:
[Image] [Image] [Image]

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 11:09 am
by mp4783
I don't quite understand your question. What are you looking at that suggests there's a problem? The picture included looks fine.

If it's just log entries, then I'm sure you realize you will see such messages when Nagios reconfigures itself.

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 11:22 am
by ssax
It also happens to pop up when you apply configuration and browse other pages in different tabs during that time if you're quick enough.

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 11:34 am
by rseiwert
So my question is: how could the core Nagios process be down for almost 12 hours while all the little engine status lights stayed green? The nagios process definitely was not running. How can XI say everything is OK? sysstat.php is running and logging its latest results.

Nagios is down and sysstat.php knows it's down. How can it still be green? Shouldn't something on these pictures be red?

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 11:49 am
by rseiwert
Just to be clear: when those pictures were captured, Nagios had not been running for the previous 12 hours.
Image

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 12:31 pm
by ssax
Nagios XI pulls that information from the PostgreSQL DB. If it's not showing the proper information, it means that the DB wasn't being updated by the backend process that checks it.

Let me dig into it a little further and I'll update you.
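
If the UI only reflects what was last written to the database, one way to catch this situation is to look at how old the most recent update is. The sketch below is an illustration only, not how XI itself works: it computes the age of a timestamp in the "YYYY-MM-DD HH:MM:SS" form seen in the log earlier in the thread (for example the last_checkin value). It assumes GNU date (the -d option), treats both timestamps as being in the same time zone, and uses hypothetical function names and an arbitrary threshold.

```shell
#!/bin/sh
# Hypothetical helper: age in seconds between two timestamps of the form
# "YYYY-MM-DD HH:MM:SS". Assumes GNU date; both values are parsed as UTC
# so that only their difference matters.
checkin_age() {
    # $1 = last update time, $2 = reference ("now") time
    last=$(date -u -d "$1" +%s) || return 1
    now=$(date -u -d "$2" +%s) || return 1
    echo $((now - last))
}

# Hypothetical helper: succeeds (exit 0) when the last update is older
# than the allowed maximum age in seconds.
is_stale() {
    # $1 = last update time, $2 = reference time, $3 = max age in seconds
    age=$(checkin_age "$1" "$2") || return 2
    [ "$age" -gt "$3" ]
}

# Example (not run here), with an arbitrary 5 minute threshold:
#   is_stale "2015-04-02 20:09:23" "$(date -u '+%Y-%m-%d %H:%M:%S')" 300 \
#       && echo "status data is stale"
```

A status light driven by such a check could switch away from green once the data behind it stops being refreshed, rather than continuing to show the last known good state.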

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 12:36 pm
by ssax
Please attach your /usr/local/nagiosxi/var/sysstat.log

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 12:38 pm
by lmiltchev
This is a really weird issue. The web UI shows that nagios indeed was running (process id 1600). I wonder if you had two nagios processes (one that "died" and another that was running). It is hard to say now (after the fact). You can probably show us (in code wraps) the nagios.log from that time. Hopefully, we will find some clues in it.

Also, to rule out permission issues, run the following commands and show us the output:

Code:

ll -d /usr/local/nagios/var
ll /usr/local/nagios/var

What version of Nagios XI are you currently using?

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 12:45 pm
by ssax
Also, please attach the following files:

Code:

/var/lib/pgsql/data/pg_log/postgresql-Thu.log
/var/lib/pgsql/data/pg_log/postgresql-Fri.log

Re: System Status and Monitoring Engine Status Invalid

Posted: Fri Apr 03, 2015 12:52 pm
by mp4783
There's an easy way to tell if your Nagios processes are up on a Linux box. The process counts will depend upon your configuration.

Nagios Core Collector:

ps -ef | grep "nagios/bin/nagios --worker" | grep -v 'grep'

There should be 5+ processes.

Nagios Cron Jobs:

ps -ef | grep "/php/bin/php -q /opt/app/nagios/nagiosxi/cron" | grep -v 'grep'

There should be 10+ processes.

Nagios Database Backend:

ps -ef | grep "ndo2db.cfg" | grep -v 'grep'

There should be 3 processes.

If you happen to be running Mod Gearman, that can cause issues like you've described.
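
The three checks above can be wrapped in a small script that counts matching processes and compares the count with an expected minimum. This is only a sketch: count_matching and meets_minimum are hypothetical names, the patterns are the ones quoted in this post (the cron path in particular depends on the install location), and the minimums are the rough figures given above, which vary with configuration.

```shell
#!/bin/sh
# Hypothetical helper: count lines of a process listing (read from stdin)
# that contain the fixed string $1, ignoring any grep processes.
count_matching() {
    awk -v pat="$1" 'index($0, pat) && !index($0, "grep") { n++ } END { print n + 0 }'
}

# Hypothetical helper: succeeds (exit 0) when count $1 is at least minimum $2.
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Example (not run here), using a pattern and minimum from the post above:
#   n=$(ps -ef | count_matching "ndo2db.cfg")
#   meets_minimum "$n" 3 || echo "ndo2db: only $n process(es) found"
```

Reading the listing from stdin keeps the counting logic separate from ps, so the same function works on a saved ps -ef capture when investigating after the fact.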