Nagios Core & Nagios XI seem to disagree on Monitoring

Posted: Fri Apr 17, 2015 1:00 pm
by JakeHatMacys
So we've set up a reverse ping for some of our servers behind the firewall. It works on 98% of the servers we initially couldn't ping. For a few, however, we swapped the host check command over but Nagios XI still shows them as down, and when you go into the same host in Core and run it from the command line it works fine.

Nagios XI results:
Capture.JPG
Same monitor run in core using "Test Check Command"
Capture1.JPG
Capture2.JPG
Ever seen anything like this before? Any ideas on what could be causing it?

Re: Nagios Core & Nagios XI seem to disagree on Monitoring

Posted: Fri Apr 17, 2015 1:16 pm
by jdalrymple
The thing that has me a bit thrown is that the status indicates to me that the plugin is indeed running... but I'm not entirely convinced.

What is your shell script trying to actually do?

Does the nagios user have rights to do all the stuff required by that shell script?

Try running the plugin as the nagios user. That generally rules out permissions problems, and then we can move on to thinking about SELinux.

Re: Nagios Core & Nagios XI seem to disagree on Monitoring

Posted: Fri Apr 17, 2015 1:25 pm
by JakeHatMacys
jdalrymple wrote:The thing that has me a bit thrown is that the status indicates to me that the plugin is indeed running... but I'm not entirely convinced.

What is your shell script trying to actually do?

Does the nagios user have rights to do all the stuff required by that shell script?

Try running the plugin as the nagios user. That generally rules out permissions problems, and then we can move on to thinking about SELinux.
The script just logs into the server over SSH, runs the Unix uptime command, and reports back to Nagios with 0 = okay or 1 = critical.

I don't think it could be a permissions issue, as we have this working on about 500 other servers on the box with no issues. To me it seems like the Nagios Plugin manager isn't accepting the return code in this instance. Is there a log file we can look at to see the execution of the plugin? I didn't see it in nagios.log.

Re: Nagios Core & Nagios XI seem to disagree on Monitoring

Posted: Fri Apr 17, 2015 1:40 pm
by jdalrymple
The reason it would be interesting to see the script is that the "is not responding" text has to come from somewhere. Nagios isn't putting that in. I suspect it's part of some logic in your script that might also indicate where the return code 2 is coming from.
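
For reference, Nagios plugins report state through their exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The actual script was never posted, so the following is only a minimal sketch of how a check of this kind might map a failed remote command to exit code 2; the function name `check_uptime` and the wording of the status line are assumptions for illustration.

```shell
#!/bin/sh
# Standard Nagios plugin exit codes.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_uptime CMD [ARGS...]  (hypothetical helper, not the poster's script)
# Runs the given command, prints a single status line, and returns the
# matching Nagios state. Any non-zero exit from the command is reported
# as CRITICAL (2).
check_uptime() {
    if output=$("$@" 2>&1); then
        echo "OK - $output"
        return "$STATE_OK"
    else
        echo "CRITICAL - host is not responding: $output"
        return "$STATE_CRITICAL"
    fi
}
```

It could be called as `check_uptime ssh -o BatchMode=yes svc_account@somehost uptime`, where the account and host names are placeholders. With a structure like this, any SSH failure (bad key, wrong user, unresolvable address) ends up as the same CRITICAL line, which is why seeing the real script matters.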

Re: Nagios Core & Nagios XI seem to disagree on Monitoring

Posted: Sun Apr 19, 2015 8:55 pm
by Box293
JakeHatMacys wrote:The script just logs into the server over SSH, runs the Unix uptime command, and reports back to Nagios with 0 = okay or 1 = critical.
Anything SSH related needs to be tested as the "nagios" user.
JakeHatMacys wrote:Same monitor run in core using "Test Check Command"

The "Test Check Command" executes as the apache user, so for anything SSH-related, files such as the id_dsa key that SSH uses are specific to each user's home directory.
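
One way to see the difference is to compare which default key files exist under each user's home directory. This is a rough sketch only: `ssh_identities` is a hypothetical helper, and the list of file names covers the defaults OpenSSH has commonly looked for, which varies by version.

```shell
#!/bin/sh
# ssh_identities HOME_DIR  (hypothetical helper)
# Prints the default SSH private key files that exist under HOME_DIR/.ssh.
# Prints nothing if the directory or the keys are missing.
ssh_identities() {
    dir="$1/.ssh"
    for key in "$dir/id_dsa" "$dir/id_rsa" "$dir/id_ecdsa" "$dir/id_ed25519"; do
        if [ -f "$key" ]; then
            echo "$key"
        fi
    done
    return 0
}
```

Running `ssh_identities ~nagios` and `ssh_identities ~apache` (assuming both users exist on the box) would show whether the two users have the same keys available. A script that passes an explicit key path to ssh is not affected by these defaults, but that key still has to be readable by whichever user runs the check.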

Re: Nagios Core & Nagios XI seem to disagree on Monitoring

Posted: Tue Apr 21, 2015 3:07 pm
by JakeHatMacys
To be clear, the script itself uses one of our service accounts to log into the box; the nagios user is just kicking off the script. We did make some changes that are helping with some of the hosts in Nagios Core. It's an odd problem: for some servers we used the IP address for the "address" field instead of the DNS entry, and it's not working. On some of them, when we switch the "address" back to the host name, it works fine. So it might have something to do with our DNS; we use round robin.

Here are some screenshots of what we did from the server, Core, and XI:
Capture3.jpg
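
To check the round-robin theory, it may help to list every address a host name resolves to on the Nagios server and compare that list with the IP entered in the "address" field. A minimal sketch, assuming a Linux box where `getent ahosts` is available; `unique_addresses` is a hypothetical helper that just reduces that output to distinct addresses.

```shell
#!/bin/sh
# unique_addresses  (hypothetical helper)
# Reads `getent ahosts NAME` output on stdin, where the first column of
# each line is an address, and prints each distinct address once.
unique_addresses() {
    awk '{ print $1 }' | sort -u
}
```

Usage would look like `getent ahosts somehost.example.com | unique_addresses`, with the host name as a placeholder. If the IP configured in Nagios is not in the list, or only some of the listed addresses accept the SSH login, that would fit the behavior described above.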

Re: Nagios Core & Nagios XI seem to disagree on Monitoring

Posted: Tue Apr 21, 2015 3:12 pm
by ssax
Try your command again but run the command below first:

su nagios
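
Once in the nagios user's shell, the check can be run by hand and its exit status inspected directly. The wrapper below is only an illustrative sketch; `run_check` is a hypothetical name, and the plugin path in the usage note is a placeholder.

```shell
#!/bin/sh
# run_check CMD [ARGS...]  (hypothetical helper)
# Shows which user is running the command, runs it, then prints and
# returns its exit status so it can be compared with what Nagios reports.
run_check() {
    echo "running as: $(id -un)"
    rc=0
    "$@" || rc=$?
    echo "exit status: $rc"
    return "$rc"
}
```

For example, `run_check /path/to/your_check_script somehost` should print the plugin's own status line between the two lines added by the wrapper. If the exit status here is 0 but XI still shows the host down, the difference is more likely in the environment Nagios runs the check in than in the script itself.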