NRPE Errors generating notifications.

Posted: Fri Jun 21, 2013 6:37 am
by Viyullas33
Hello,

We have a problem with the NRPE plugin.
Some of our monitored servers are overloaded and, occasionally, a check against them does not return a valid response code. When that happens, the plugin output is "Return code of 255 is out of bounds".

This is treated as a CRITICAL alert, so notifications are sent. Is there any way to avoid this behavior?

I've tried to create an event handler to intercept them and change SERVICESTATE to Warning, but I don't think it's possible to change the severity on the fly.

Re: NRPE Errors generating notifications.

Posted: Fri Jun 21, 2013 10:03 am
by abrist
Have you tried increasing the timeout of the check?

Re: NRPE Errors generating notifications.

Posted: Sat Jun 22, 2013 1:46 am
by Viyullas33
I don't think it's a timeout problem, because that would give me a timeout message, not an out-of-bounds one, right?
Anyway, I will set a 50 sec timeout and see what happens.

Re: NRPE Errors generating notifications.

Posted: Mon Jun 24, 2013 9:37 am
by slansing
Please verify that the commands point to the correct directories, and that the permissions on the plugins allow them to execute. Also, let us know how things go after changing the timeout, though I agree it should give you a timeout message instead.

Re: NRPE Errors generating notifications.

Posted: Tue Jun 25, 2013 4:16 am
by Viyullas33
I don't see any change after increasing the timeout.
The checks work just fine most of the time, so the permissions should be OK.

This is a sample command check:

check_local_disk ssh $ARG1$@$HOSTADDRESS$ -C '"/home/"$ARG1$"/nagios/libexec/check_disk" -w $ARG2$ -c $ARG3$ -E -t 50 -p $ARG4$'
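
For context on the message itself: Nagios accepts plugin return codes 0 to 3 (OK, WARNING, CRITICAL, UNKNOWN) and reports anything else as out of bounds, while ssh exits with 255 when it fails on its own, for example when the connection cannot be established. One possible workaround, sketched here under assumptions, is to run the check through a small wrapper that remaps any code outside 0 to 3 to UNKNOWN. The name clamp_status and the choice of 3 as the fallback are illustrative and not part of Nagios or NRPE.

```shell
#!/bin/sh
# clamp_status: hypothetical wrapper, not shipped with Nagios or NRPE.
# Runs the given check command and remaps any exit code outside the
# plugin range 0-3 (for example ssh's 255 on a connection error) to 3.
clamp_status() {
    # Capture stdout and stderr so the original message is preserved.
    output=$("$@" 2>&1)
    status=$?
    case "$status" in
        0|1|2|3)
            # Valid plugin state: pass output and code through unchanged.
            printf '%s\n' "$output"
            return "$status"
            ;;
        *)
            # Anything else is reported as UNKNOWN (assumed fallback).
            printf 'UNKNOWN: check returned code %s: %s\n' "$status" "$output"
            return 3
            ;;
    esac
}

# When run as a script with arguments, wrap the command that was passed in.
if [ "$#" -gt 0 ]; then
    clamp_status "$@"
    exit $?
fi
```

Saved as a script (say, a hypothetical /usr/local/bin/clamp_status.sh), it would go in front of ssh in the command definition. Whether an UNKNOWN result still sends a notification then depends on the service's notification_options.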

Re: NRPE Errors generating notifications.

Posted: Tue Jun 25, 2013 9:24 am
by slansing
They are working "most of the time"? I am assuming until something times out? There has to be a missing piece here that has not been shared, because those options alone control the timeout error you get back. I see you are SSHing into a system and then executing a command. What is the reason for this?

Re: NRPE Errors generating notifications.

Posted: Tue Jun 25, 2013 5:17 pm
by Viyullas33
This is the command we are using for these checks.
If it is a timeout problem, why am I getting an out-of-bounds error?

The systems are AIX, the Nagios server is a Linux box, and yes, the check works most of the time. The error only shows up occasionally, and it happens with different AIX servers and different checks, most of them disk checks.
If I connect to the server and execute the same command myself, it returns a valid response. But on the occasions when I managed to run the check while the error was showing, I could see that the AIX server was responding very slowly.

What would be the best way to run a disk check?

Re: NRPE Errors generating notifications.

Posted: Wed Jun 26, 2013 9:38 am
by abrist
Use check_by_ssh. But if the TCP connection for the SSH session is timing out, you may have a problem with any active check. You may need to look at passive checks.
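
As a rough illustration of the check_by_ssh suggestion, the disk check from earlier in the thread could be run along these lines. This is only a sketch: the plugin path, the 60-second timeout, and the helper name remote_disk_check are assumptions, while -H, -l, -t and -C are the plugin's host, login name, timeout and remote command options.

```shell
#!/bin/sh
# Assumed plugin location; adjust to the local install.
CHECK_BY_SSH="${CHECK_BY_SSH:-/usr/local/nagios/libexec/check_by_ssh}"

# remote_disk_check USER HOST WARN CRIT PATH
# Hypothetical helper mirroring the check_local_disk command from the
# thread, but letting check_by_ssh manage the ssh session and its timeout.
remote_disk_check() {
    user=$1
    host=$2
    warn=$3
    crit=$4
    path=$5
    # The remote command keeps the original check_disk options (-E -t 50).
    "$CHECK_BY_SSH" -H "$host" -l "$user" -t 60 \
        -C "/home/$user/nagios/libexec/check_disk -w $warn -c $crit -E -t 50 -p $path"
}
```

Running it by hand from the Nagios server, for example `remote_disk_check nagios 192.0.2.10 20% 10% /home` with a placeholder user and address, is a quick way to see what the plugin returns while one of the AIX hosts is slow.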