System Status and Monitoring Engine Status Invalid

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

System Status and Monitoring Engine Status Invalid

Post by rseiwert »

If the core Nagios process is not running how can these status lights be green?

Something I noticed recently was that the system status across the top of the page and the System Status and Monitoring Engine Status panels are reporting invalid information. Initially I thought they were not updating because Nagios had crashed. Then I saw that /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php was still running:
/bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
In /usr/local/nagiosxi/var/sysstat.log I see entries stating that Nagios is down, but the XI interface shows a healthy monitoring engine. I believe these pages get their info from this sysstat.php script. The log file says Nagios is not running, yet the pictures show green.

Code:

DB BACKEND:
Array
(
    [last_checkin] => 2015-04-02 20:09:23
    [bytes_processed] => 33339486
    [entries_processed] => 46796
    [connect_time] => 2015-04-02 17:21:14
    [disconnect_time] => 0000-00-00 00:00:00
)
CMDLINE=/etc/init.d/nagios status
nagios is not running
OUTPUT=nagios is not running
RETURNCODE=0


The XI interface shows: [Image] [Image] [Image]
Last edited by rseiwert on Fri Apr 03, 2015 11:51 am, edited 2 times in total.
Grumpy Olde IT Guy
mp4783
Posts: 116
Joined: Wed May 14, 2014 11:11 am

Re: System Status and Monitoring Engine Status Invalid

Post by mp4783 »

I don't quite understand your question. What are you looking at that suggests there's a problem? The picture included looks fine.

If it's just log entries, then I'm sure you realize you will see such messages when Nagios reconfigures itself.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: System Status and Monitoring Engine Status Invalid

Post by ssax »

It also happens to pop up when you apply configuration and browse other pages in different tabs during that time if you're quick enough.
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: System Status and Monitoring Engine Status Invalid

Post by rseiwert »

So my question is: how could the core Nagios process be down for almost 12 hours while all the little engine status lights stayed green? The Nagios process definitely was not running. How can XI say everything is OK? sysstat.php is running and logging its latest results.

Nagios is down and sysstat.php knows it's down. How can it still be green? Shouldn't something on these pictures be red?
Grumpy Olde IT Guy
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: System Status and Monitoring Engine Status Invalid

Post by rseiwert »

Just to be clear, when those pictures were captured Nagios had not been running for the last 12 hours.
Image
Grumpy Olde IT Guy
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: System Status and Monitoring Engine Status Invalid

Post by ssax »

Nagios XI pulls that information from the PostgreSQL DB. If it's not showing the proper information, it means the DB wasn't being updated by the backend process that checks it.

Let me dig into it a little further and I'll update you.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: System Status and Monitoring Engine Status Invalid

Post by ssax »

Please attach your /usr/local/nagiosxi/var/sysstat.log file.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: System Status and Monitoring Engine Status Invalid

Post by lmiltchev »

This is a really weird issue. The web UI shows that nagios indeed was running (process id 1600). I wonder if you had two nagios processes (one that "died" and another that was running). It is hard to say now (after the fact). You can probably show us (in code wraps) the nagios.log from that time. Hopefully, we will find some clues in it.

Also, to rule out permission issues, run the following commands and show us the output:

Code:

ll -d /usr/local/nagios/var
ll /usr/local/nagios/var
What is the Nagios XI version that you are currently using?
Be sure to check out our Knowledgebase for helpful articles and solutions!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: System Status and Monitoring Engine Status Invalid

Post by ssax »

Also, please attach the following files:

Code:

/var/lib/pgsql/data/pg_log/postgresql-Thu.log
/var/lib/pgsql/data/pg_log/postgresql-Fri.log
mp4783
Posts: 116
Joined: Wed May 14, 2014 11:11 am

Re: System Status and Monitoring Engine Status Invalid

Post by mp4783 »

There's an easy way to tell if your Nagios processes are up on a Linux box. The expected process counts will depend upon your configuration.

Nagios Core Collector:

ps -ef | grep "nagios/bin/nagios --worker" | grep -v 'grep'

There should be 5+ processes.

Nagios Cron Jobs:

ps -ef | grep "/php/bin/php -q /opt/app/nagios/nagiosxi/cron" | grep -v 'grep'

There should be 10+ processes.

Nagios Database Backend:

ps -ef | grep "ndo2db.cfg" | grep -v 'grep'

There should be 3 processes.

If you happen to be running Mod Gearman, that can cause issues like you've described.
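
The three checks above could be wrapped in a small script. This is only a sketch under a few assumptions: the function names (count_matching, verdict, check_procs) are made up for illustration, the minimum counts are the ones quoted in this post and will differ per configuration, and the cron pattern is shortened to "nagiosxi/cron" so that it matches both the /opt/app/nagios/nagiosxi/cron path used here and the /usr/local/nagiosxi/cron path from the first post.

```shell
#!/bin/sh
# Sketch: count Nagios-related processes and compare against a minimum.
# Patterns and minimums come from the post above; adjust them per install.

# Count the lines on stdin that contain the fixed string $1.
count_matching() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -F -c -- "$1" || true
}

# Print a one-line verdict; returns 1 if count ($2) is below minimum ($3).
verdict() {
    if [ "$2" -ge "$3" ]; then
        echo "OK: $1 ($2 running)"
    else
        echo "PROBLEM: $1 ($2 running, expected at least $3)"
        return 1
    fi
}

# check_procs LABEL PATTERN MINIMUM
check_procs() {
    count=$(ps -ef | grep -v 'grep' | count_matching "$2")
    verdict "$1" "$count" "$3"
}

status=0
check_procs "Nagios Core workers" "nagios/bin/nagios --worker" 5 || status=1
check_procs "Nagios XI cron jobs" "nagiosxi/cron" 10 || status=1
check_procs "NDO database backend" "ndo2db.cfg" 3 || status=1
[ "$status" -eq 0 ] || echo "One or more checks failed."
```

Run it as the root or nagios user so that ps -ef shows full command lines. A PROBLEM line for the core workers while the web UI stays green would match the symptom described at the top of this thread.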