I have been trialling the Nagios XI appliance (2014R2.7) and have noticed that check_nrpe returns error code 255 if the remote nrpe agent is not running. I would have expected this to be a 2 (CRITICAL).
For example:
[root@nagiosxi-64 ~]# /usr/local/nagios/libexec/check_nrpe -H MYIPADDRESS
connect to address MYIPADDRESS port 5666: Connection refused
connect to host MYIPADDRESS port 5666: Connection refused
[root@nagiosxi-64 ~]# echo $?
255
MYIPADDRESS is a server with the nrpe agent installed but *not running*. When the agent is running, it works as expected. The problem is specifically that when check_nrpe gets "connection refused", it returns 255.
When Nagios executes this check, it isn't able to interpret 255, so the check status is:
(Return code of 255 is out of bounds)
The version of check_nrpe included with the appliance is 2.15.
It looks like this problem has been addressed in this pull request from October 2014:
https://github.com/NagiosEnterprises/nrpe/pull/13
Is this fix likely to be incorporated into the version of check_nrpe that is shipped with Nagios XI?
Many thanks.
--
Chris
check_nrpe returning error code 255 when connection refused
Re: check_nrpe returning error code 255 when connection refu
cgoerner wrote: Is this fix likely to be incorporated into the version of check_nrpe that is shipped with Nagios XI?
I am not sure if it is, but it can easily be patched:
cd /tmp
wget https://github.com/NagiosEnterprises/nrpe/archive/master.zip
unzip master.zip
cd nrpe-master/src/
wget https://github.com/SteveLowe/nrpe/commit/a788d94b0c9dd4e130c1b06339f947774a798560.patch -O 255.patch
patch < 255.patch
cd ..
./configure
make
make install
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: check_nrpe returning error code 255 when connection refu
This was just discussed in another thread in the customer-only forum, so the knowledge is very fresh in my brain.
It's very likely that 255 was used for a reason. That reason is likely that "check_nrpe" is absolutely not intended to check whether the nrpe service is running. Its intent is to check disk or memory or something else on a remote host. So, while this is more philosophical in nature, some might argue that because NRPE is down, any result we gave back regarding the state of our disk or memory check would be invalid.
Personally, I think it should return UNKNOWN, and I've modified the code in my personal repositories to do that. However, we all have our own opinions. What I'm saying is that I wouldn't count on that pull request being included, and who knows when the next NRPE update will be. If you want the change made, I would consider making it yourself and then using your own NRPE source on your systems.
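If rebuilding check_nrpe is not an option, one possible workaround is a small wrapper that maps any out-of-range exit status to UNKNOWN (3). This is only a sketch, not part of NRPE or Nagios XI: the function name run_check and the message text are made up for illustration, and it assumes the plugin is passed to the wrapper as its arguments.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with NRPE or Nagios XI).
# Runs the command given as arguments and maps any exit status
# outside the 0-3 range Nagios understands to 3 (UNKNOWN).

run_check() {
    "$@"            # run the plugin; its output passes straight through
    rc=$?
    case "$rc" in
        0|1|2|3)
            return "$rc"
            ;;
        *)
            # e.g. 255 when the connection to the agent is refused
            echo "UNKNOWN: plugin exited with status $rc"
            return 3
            ;;
    esac
}

# Act as a wrapper only when a command was actually supplied.
if [ "$#" -gt 0 ]; then
    run_check "$@"
    exit $?
fi
```

Saved under a name of your choosing and made executable, it would be called with the full check_nrpe command line as its arguments, for example /usr/local/nagios/libexec/check_nrpe -H MYIPADDRESS.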
FWIW, the alternative that our customer chose was to use a service dependency on check_tcp for check_nrpe. This is also a very logical solution, just a bit of legwork.
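As a rough idea of what that dependency could look like in plain Nagios object configuration (in XI you would normally set this up through the configuration manager instead): MYHOST, "NRPE Port" and "Disk Usage" are placeholder names, and this assumes a check_tcp command that takes the port as its first argument and a generic-service template, as in the stock sample configs.

```
# Hypothetical definitions; adjust names and templates to your setup.
define service {
    use                             generic-service
    host_name                       MYHOST
    service_description             NRPE Port
    check_command                   check_tcp!5666
}

define servicedependency {
    host_name                       MYHOST
    service_description             NRPE Port
    dependent_host_name             MYHOST
    dependent_service_description   Disk Usage
    execution_failure_criteria      c,u
    notification_failure_criteria   c,u
}
```

With this in place, the check_nrpe-based "Disk Usage" service should not be executed or notified on while the port check is CRITICAL or UNKNOWN, so the out-of-bounds result is avoided rather than fixed.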
Re: check_nrpe returning error code 255 when connection refu
Thanks abrist and jdalrymple.
That philosophy makes some sense, but UNKNOWN does seem to be a better choice.
It sounds like a workaround is going to be the best solution.
Thanks for your help. Much appreciated.