Works in shell and in "Test Command Check" but not live?

Posted: Tue Jul 14, 2015 10:44 am
by phil821
Hello, I am trying to run a few Perl scripts that run locally on the Nagios server.

The checks run perfectly in the shell, and they also work when I use "Test Command Check" and input the hostname. However, they do not work when I activate the check with the same hostname I tested.

What's also interesting is that even though the check is simply duplicated, it works on some hosts and not on others. Moreover, I use the same script for several checks (i.e., temperature, datastore...), and it works with a host for some checks but not with the same host for other checks.

Some of these checks worked prior to a server restart.

Re: Works in shell and in "Test Command Check" but not live?

Posted: Tue Jul 14, 2015 2:32 pm
by tgriep
Could you post the command that you ran that you are having issues with and how it is configured in XI?
Can you provide a screen shot of the errors that you are receiving in the GUI?

Re: Works in shell and in "Test Command Check" but not live?

Posted: Wed Jul 15, 2015 8:55 am
by phil821
Sure.

Okay so here is a screenshot of how the test command is successful in Nagios.
NagiosTestCommandSuccess.jpg
When I add that host, which carries the exact same hostname originally passed after the -H option, and run it live, I get this error (I have included two additional checks that behave similarly but give different error messages):
NagiosLiveCommandFail.jpg

Re: Works in shell and in "Test Command Check" but not live?

Posted: Wed Jul 15, 2015 11:15 am
by jdalrymple
It's hard to read what the actual problem is.

Some checks work when pointed at some hosts, but not all the time, etc. etc? Our postgres check works on some hosts, but our temperature check doesn't ever work...

Can you help us find a common theme amongst the failing checks? If not, my next best recommendation is to look for disk consistency problems, out-of-memory issues, etc. Your problems seem very broad and it's difficult to know where to start troubleshooting.

Re: Works in shell and in "Test Command Check" but not live?

Posted: Wed Jul 15, 2015 10:44 pm
by Box293
phil821 wrote: The checks run perfectly in the shell
Are you testing as the user 'nagios'? This is the user account that the monitoring engine uses. If you are SSH'd to XI as root, this is not the same test. The "Test Check Command" button runs as the user 'apache'.

SSH to XI and run the check as the nagios user:

Code:

su nagios
/usr/local/nagios/libexec/check_postgres.pl -H blah blah blah

When you are finished, type exit.

Re: Works in shell and in "Test Command Check" but not live?

Posted: Thu Jul 16, 2015 11:47 am
by phil821
One common theme is that the failing checks (I have the same issue with check_esxhost) are both written in Perl.

Yes, I have run the checks as the user nagios and they run successfully.

Unfortunately I do not know any Perl, so it is hard for me to map out the scripts' actual logic. However, both of these scripts came from the "Monitoring Wizard" built into XI.

Re: Works in shell and in "Test Command Check" but not live?

Posted: Thu Jul 16, 2015 11:57 am
by phil821
Ok so I just fixed the problem.

In the check, there was a parameter called --dbpass. Our specific password had a $ in it, which was messing things up. I really don't know why the command worked in the test (which should be EXACTLY the same as what runs live), but regardless, the problem was a special character.

Re: Works in shell and in "Test Command Check" but not live?

Posted: Thu Jul 16, 2015 2:52 pm
by tmcdonald
Dollar signs are weird. They have special meaning in PHP and the shell, both of which are in use when checks are saved and then run. Usually we can just escape them like $$ and be done with it, but then Nagios itself uses macros of the form $MACRO$ so that causes some issues.
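
As a rough illustration of the shell side of this, here is a minimal sketch using a made-up password (pa$w0rd is hypothetical, not the one from this thread, and the variable names are only for the example). It shows how quoting changes what the shell passes along, and how doubling the dollar sign produces the $$ form mentioned above. How XI itself stores and re-expands the value at each step is not shown here.

```shell
# Hypothetical password for illustration only.
unset w0rd

# Double quotes: the shell treats $w0rd as a variable, which is unset,
# so everything from the dollar sign onward silently disappears.
expanded="pa$w0rd"

# Single quotes: the dollar sign is passed through literally.
literal='pa$w0rd'

# Doubling each dollar sign gives the $$ escape form.
escaped=$(printf '%s' "$literal" | sed 's/\$/$$/g')

printf 'double-quoted: %s\n' "$expanded"   # prints: double-quoted: pa
printf 'single-quoted: %s\n' "$literal"    # prints: single-quoted: pa$w0rd
printf 'escaped:       %s\n' "$escaped"    # prints: escaped:       pa$$w0rd
```

If a check works from one place but not another, comparing the exact argument string each path ends up passing (for example by temporarily echoing it) is a quick way to spot this kind of silent truncation.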

There is probably a philosophical note here about money being the root of all problems, but I'll save that for another time.

Are we alright to close this thread?