NRPE Errors generating notifications.

Viyullas33
Posts: 6
Joined: Fri Jan 20, 2012 3:54 am

NRPE Errors generating notifications.

Post by Viyullas33 »

Hello,

We have a problem with the NRPE plugin.
Some of our monitored servers are overloaded and occasionally fail to return a valid response code when the check runs, so we get a "Return code of 255 is out of bounds" output from the plugin.

This is treated as a CRITICAL alert, so notifications are sent. Is there any way to avoid this behavior?

I've tried to create an event handler to intercept them and change SERVICESTATE to WARNING, but I don't think it's possible to change the severity on the fly.
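
For context: Nagios accepts plugin return codes 0 to 3 and reports anything else as "out of bounds", and ssh itself exits with 255 when it hits an error of its own (for example, a failed connection) rather than passing back the remote command's status. One possible workaround, sketched below, is a small wrapper that runs the real check and maps any out-of-range code to UNKNOWN (3). The names `map_exit_code` and `run_wrapped` are hypothetical, and this is only a sketch under those assumptions, not something taken from this thread. UNKNOWN still triggers notifications unless the service's notification_options leave out `u`.

```python
import subprocess
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def map_exit_code(code):
    """Pass valid plugin codes (0-3) through; map anything else to UNKNOWN."""
    return code if code in (OK, WARNING, CRITICAL, UNKNOWN) else UNKNOWN


def run_wrapped(argv):
    """Run the wrapped check and return (mapped_code, output_text)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    output = result.stdout.strip() or result.stderr.strip()
    code = map_exit_code(result.returncode)
    if code != result.returncode:
        # Keep the original code visible so the cause can still be diagnosed.
        output = f"UNKNOWN: wrapped command exited with {result.returncode}: {output}"
    return code, output


# Demonstration with a child process that exits with 255, the code ssh uses
# for its own errors. A real wrapper would pass sys.argv[1:] instead and
# finish with sys.exit(status).
status, text = run_wrapped([sys.executable, "-c", "import sys; sys.exit(255)"])
print(status, text)
```

In a command definition, the wrapper would be placed in front of the existing ssh command line so that Nagios only ever sees codes 0 to 3.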
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: NRPE Errors generating notifications.

Post by abrist »

Have you tried increasing the timeout of the check?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Viyullas33
Posts: 6
Joined: Fri Jan 20, 2012 3:54 am

Re: NRPE Errors generating notifications.

Post by Viyullas33 »

I don't think it's a timeout problem, because that would give me a timeout message, not an out-of-bounds error, right?
Anyway, I will set a 50-second timeout and see what happens.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: NRPE Errors generating notifications.

Post by slansing »

Please verify that the commands point to the correct directories, and that the permissions on the plugins allow them to execute. Also, let us know how things go after changing the timeout value, though I agree it should give you a timeout message instead.
Viyullas33
Posts: 6
Joined: Fri Jan 20, 2012 3:54 am

Re: NRPE Errors generating notifications.

Post by Viyullas33 »

I don't see any change after increasing the timeout.
The checks work just fine most of the time, so the permissions should be OK.

This is a sample command check:

check_local_disk ssh $ARG1$@$HOSTADDRESS$ -C '"/home/"$ARG1$"/nagios/libexec/check_disk" -w $ARG2$ -c $ARG3$ -E -t 50 -p $ARG4$'
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: NRPE Errors generating notifications.

Post by slansing »

They are working "most of the time"? I assume that means until something times out? There has to be a missing piece here that has not been shared, because those options alone control the timeout error you get back. I see you are using SSH to log in to a system and then run a command. What is the reason for this?
Viyullas33
Posts: 6
Joined: Fri Jan 20, 2012 3:54 am

Re: NRPE Errors generating notifications.

Post by Viyullas33 »

This is the command we are using for these checks.
If it is a timeout problem, why am I getting an out-of-bounds error?

The systems are AIX, the Nagios server is a Linux box, and yes, the check works most of the time. The error only shows up occasionally, and it happens with different AIX servers and different checks, most of them disk checks.
If I connect to the server and execute the same command myself, it returns a valid response. But on the occasions when I managed to run the check while the error was showing, I could see that the AIX server was responding very slowly.

What would be the best way to run a disk check?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: NRPE Errors generating notifications.

Post by abrist »

Use check_by_ssh. But if the TCP connection for the SSH session is timing out, you may have a problem with any active check. You may need to look at passive checks.
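
As a rough illustration of the passive approach: the monitored host runs the check itself and reports the result, which Nagios accepts as a PROCESS_SERVICE_CHECK_RESULT external command. The sketch below only builds that line. The function name `format_passive_result`, the host name `aix01`, and the service description `Disk /home` are made up for the example, and how the line reaches the Nagios server (for example via NSCA, or by writing to the external command file on the server) depends on the installation and is not covered here.

```python
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for Nagios."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # External commands are one per line, so flatten any newlines in the output.
    clean = output.replace("\n", " ")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean}\n"


# Hypothetical host and service names; the timestamp is fixed for the example.
line = format_passive_result(
    "aix01", "Disk /home", 0, "DISK OK - free space: /home 1024 MB",
    timestamp=1358000000,
)
print(line, end="")
# prints: [1358000000] PROCESS_SERVICE_CHECK_RESULT;aix01;Disk /home;0;DISK OK - free space: /home 1024 MB
```

The host and service names in the line have to match an existing service definition on the Nagios side, with passive checks enabled for it.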