
Nagios XI & Core not reflecting the same status on Services

Posted: Tue Feb 09, 2016 3:43 pm
by JakeHatMacys
I have a script that runs fine in Core, but when I check Nagios XI the service is always red and says it's timing out. I can log in manually without any problem as well. Ever seen anything like this?
Capture.JPG
When I run it manually in Core it comes back fine, though, which makes no sense to me:
Capture1.JPG
Thoughts? The script basically just logs in via SSH, runs df -v, and collects some metrics. It works like a charm for 90% of our servers, but we're working through our problem children and this is one scenario we keep seeing.
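For readers following along, here is a minimal sketch of the "take some metrics" step. It is not the poster's script (which is a shell script and was not shared). It assumes the remote command's output is in the POSIX `df -P` column layout rather than `df -v`, and the function names and the 80/90 thresholds are hypothetical. It turns the text into a Nagios-style exit code and status line with performance data.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical sample of what `df -P` might return from a remote host.
SAMPLE = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 20511312 17434615 3076697 85% /
/dev/sdb1 103081248 20616249 82464999 20% /data
"""


def parse_df(output):
    """Return {mount_point: percent_used} from `df -P` style text."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # not a complete data row
        capacity = fields[4].rstrip("%")
        if not capacity.isdigit():
            continue  # some pseudo filesystems report "-"
        usage[fields[5]] = int(capacity)
    return usage


def evaluate(usage, warn=80, crit=90):
    """Map the fullest filesystem to an exit code and a status line."""
    if not usage:
        return UNKNOWN, "DISK UNKNOWN - no filesystems parsed"
    worst_mount = max(usage, key=usage.get)
    worst = usage[worst_mount]
    if worst >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif worst >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    perfdata = " ".join(
        "{}={}%;{};{}".format(mount, pct, warn, crit)
        for mount, pct in sorted(usage.items())
    )
    return code, "DISK {} - {} at {}% | {}".format(label, worst_mount, worst, perfdata)


if __name__ == "__main__":
    exit_code, message = evaluate(parse_df(SAMPLE))
    print(message)
    sys.exit(exit_code)
```

Keeping the parsing separate from the SSH login makes it possible to test the metric logic against captured output from a problem host without logging in again.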

Re: Nagios XI & Core not reflecting the same status on Servi

Posted: Tue Feb 09, 2016 4:50 pm
by rkennedy
The web check and the check that XI runs use different usernames. Which username did you establish the SSH key for?

Re: Nagios XI & Core not reflecting the same status on Servi

Posted: Wed Feb 10, 2016 8:30 am
by JakeHatMacys
rkennedy wrote:The web check and the check that XI runs use different usernames. Which username did you establish the SSH key for?
We actually don't use keys; the security team shot down using SSH keys. We're using a homebrewed shell script. Would the web UI kick that off differently?

And again, out of roughly 13,000 services running, only about 600 of these are failing, and I can't say that it's due to this every time. But we're trying to sort out the use cases, and this seems to be an oddity we keep running into. Testing it via Core works like a charm, but XI keeps coming back timed out after 60 seconds.
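One way to narrow down a check that "works manually but times out when scheduled" is to run the same command under an explicit wall-clock limit and record how long it really takes. The sketch below is only an illustration: `run_with_timeout` and `Result` are hypothetical names, and the command passed in is whatever the check would normally execute.

```python
import subprocess
import sys
import time
from typing import NamedTuple, Optional, Sequence


class Result(NamedTuple):
    timed_out: bool
    elapsed_s: float
    returncode: Optional[int]
    output: str


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Result:
    """Run cmd, killing it if it exceeds timeout_s seconds of wall-clock time."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error text alongside normal output
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return Result(True, time.monotonic() - start, None, "")
    return Result(False, time.monotonic() - start, proc.returncode, proc.stdout)


if __name__ == "__main__":
    # Stand-in for the real check command: a child process that sleeps too long.
    demo = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 1.0)
    print("timed out: {} after {:.1f}s".format(demo.timed_out, demo.elapsed_s))
```

Logging the elapsed time per host shows whether the failing checks are consistently slow (for example, stuck waiting at a prompt) or only occasionally cross the 60-second limit.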

Re: Nagios XI & Core not reflecting the same status on Servi

Posted: Wed Feb 10, 2016 9:58 am
by rkennedy
To clarify, I believe you're testing in the CCM (not Core). The CCM runs the script under a different username than the one used when it runs from the CLI or as a Nagios check.

What are the full permissions on the file on these servers that aren't working?
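Since the question is which account runs the script and what the file permissions are, a small diagnostic like the following can be run from each context (the CLI and the web test) and the results compared. This is a rough sketch for Unix-like systems only; `describe_plugin` is a hypothetical helper, and the path passed to it is whatever plugin file is being investigated.

```python
import os
import pwd
import stat
import sys


def _user_name(uid):
    """Resolve a numeric uid to a login name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def describe_plugin(path):
    """Report who is running this process and how the file at path is protected."""
    st = os.stat(path)
    return {
        "running_as": _user_name(os.geteuid()),
        "owner": _user_name(st.st_uid),
        "mode": stat.filemode(st.st_mode),  # same style as `ls -l`
        # os.access() checks against the real uid/gid of this process.
        "executable_by_me": os.access(path, os.X_OK),
    }


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else sys.argv[0]
    for key, value in describe_plugin(target).items():
        print("{}: {}".format(key, value))
```

If `running_as` differs between the two contexts and `executable_by_me` is false in one of them, that points at permissions rather than at the remote host.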

Re: Nagios XI & Core not reflecting the same status on Servi

Posted: Wed Feb 10, 2016 11:00 am
by JakeHatMacys
The script is located on our Nagios server in our libexec directory. We actually log into the servers using sshpass 1.05 (via the script). I can give you the local file permissions in a bit; we're currently migrating the server to another VM cluster to help with I/O performance.

Re: Nagios XI & Core not reflecting the same status on Servi

Posted: Wed Feb 10, 2016 11:55 am
by rkennedy
Sounds good - I'll watch for them. Usually errors like this are related to permissions.