XI Failing without Reporting IT

rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Post by rseiwert »

Had an interesting morning due to a communication failure with our SAN. The problem was in the network switches, but that is not why I'm posting here. When I checked this morning, XI showed everything up and no issues. We run Nagios XI 2014R2.6 and auto-login a read-only user by default. We use the Operations Center screen, which was stamped with the current date and time and was not reporting anything down.

When the SAN is down, all the VM clusters are down, and Nagios XI runs on the VM cluster along with email, database and file servers, Citrix PNA, and everything else. Everything was down (including Nagios), but Nagios didn't know it. Nagios's hard disk had been ripped out from under it, but according to the screen everyone looks at, everything was OK.

It is my humble opinion that when invalid data is being presented, there should be some indication that it is invalid. I do have checks for XI Daemons, XI Jobs, ActiveHostChecks, and ActiveServiceChecks, but of course when Nagios itself has failed they mean nothing. I really feel that when the checks are stale they should be reported as such. I also feel that the system health checks should be shown to non-admin users, so they at least have a clue that what they are looking at is invalid. Finally, the system health checks need to be made more accurate: actually verifying that the PID belongs to the process it is supposed to be and, if possible, checking for a heartbeat on the service.
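To make the last two requests concrete, here is a minimal Python sketch of what "stale" and "the PID is the process it thinks" could mean. This is only an illustration under my own assumptions, not how XI implements its health checks: the function names, the two-interval staleness threshold, and the use of Linux's `/proc/<pid>/cmdline` are all hypothetical choices.

```python
import os
import time
from typing import Optional


def is_stale(last_check: float, check_interval: float,
             now: Optional[float] = None, slack: float = 2.0) -> bool:
    """Return True when a result is older than `slack` check intervals.

    `slack` is an assumed tolerance, not a Nagios setting.
    """
    if now is None:
        now = time.time()
    return (now - last_check) > slack * check_interval


def pid_matches(pid: int, expected_name: str, proc_root: str = "/proc") -> bool:
    """Return True only if `pid` exists and its command line mentions `expected_name`.

    Linux-only: reads <proc_root>/<pid>/cmdline, where arguments are NUL-separated.
    """
    try:
        with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as fh:
            raw = fh.read()
    except OSError:
        return False  # no such process, or its entry is not readable
    needle = expected_name.encode()
    return any(needle in arg for arg in raw.split(b"\0") if arg)
```

The point of `pid_matches` is that a PID file alone proves nothing: the number may have been reused by an unrelated process, or the process may be gone entirely. Neither function helps if the code running them is itself frozen, which is why the heartbeat has to be judged by something outside the failed system.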
Grumpy Olde IT Guy
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: XI Failing without Reporting IT

Post by WillemDH »

Monitor the production monitoring Nagios XI from another Nagios Core or XI, preferably from another datacenter / location. You can install a Nagios XI free edition that can monitor up to 7 hosts. Grtz
Nagios XI 5.8.1
https://outsideit.net
rseiwert
Posts: 196
Joined: Wed Jun 22, 2011 10:33 pm
Location: Somewhere between Here and Now

Re: XI Failing without Reporting IT

Post by rseiwert »

A good idea. But if Nagios is reporting that it's up, will that really help? Ping and HTTP checks would not have caught this. Remotely executing the system health checks might, but it has been documented that these are not accurate.
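For a remote check to help here, it would have to test how fresh the monitoring data is rather than whether the box answers. A minimal sketch of that decision in Python follows, using the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). How the remote side obtains `last_update` is deliberately left open, and the function name and message wording are hypothetical.

```python
import time
from typing import Optional, Tuple

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def freshness_status(last_update: Optional[float], max_age: float,
                     now: Optional[float] = None) -> Tuple[int, str]:
    """Map the age of the newest monitoring data to a plugin exit code and message.

    `last_update` is the epoch time of the most recent data seen on the
    monitored instance, or None if it could not be retrieved.
    """
    if now is None:
        now = time.time()
    if last_update is None:
        return UNKNOWN, "UNKNOWN - no timestamp available from monitored instance"
    age = now - last_update
    if age > max_age:
        return CRITICAL, f"CRITICAL - monitoring data is {age:.0f}s old (limit {max_age:.0f}s)"
    return OK, f"OK - monitoring data is {age:.0f}s old"
```

A wrapper script would print the message and exit with the returned code. The important property is that the age is computed against the clock of the machine doing the checking, not the clock of the instance that may have failed.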

The point of Nagios, to me, is single-pane-of-glass monitoring, not checking Nagios with Nagios in order to check Nagios. Quis custodiet ipsos custodes? The true problem is that the XI PHP that generates the web pages should be able to figure out that something is rotten or stale, or whether the system even has a heartbeat.

It should then present that information to non-admin users. System health is only exposed to administrators. Most people here do not log in; they use the read-only auto-login until they need to acknowledge an issue or configure the system.
Grumpy Olde IT Guy
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: XI Failing without Reporting IT

Post by jdalrymple »

I can recreate your circumstances pretty easily, and I agree: the first line of defense should be some amount of monitoring of Nagios performed by the browser.

I'll bring it up with the devs and let you know what they say.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: XI Failing without Reporting IT

Post by jdalrymple »

Internal feature request created. No ETA, as usual, but given how high-profile this issue is, I'd expect the devs to give it high priority and get it into the next version.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: XI Failing without Reporting IT

Post by Box293 »

FYI, there is the "Nagios XI Wizard", which checks a number of XI things, and the server it checks can be a remote one. So have a free XI instance monitoring production, and production monitoring the free instance.