Hello,
We have a problem with the NRPE plugin.
Some of our monitored servers are overloaded, and occasionally, when the check runs, they don't return a valid return code. As a result, the plugin output is "Return code of 255 is out of bounds".
This is treated as a CRITICAL alert, so notifications are sent. Is there any way to avoid this behavior?
I've tried creating an event handler to intercept these results and change SERVICESTATE to WARNING, but I don't think it's possible to change the severity on the fly.
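Rather than changing the state after the fact, one workaround is to keep the out-of-range code from ever reaching Nagios. A minimal sketch, assuming a hypothetical wrapper script (`normalize_exit.py` is a made-up name) that is prepended to the existing command_line: it passes plugin states 0 to 3 through unchanged and turns anything else, such as 255, into UNKNOWN (3).

```python
#!/usr/bin/env python3
"""Hypothetical wrapper (normalize_exit.py): run a check command and keep its
exit code inside the Nagios plugin range 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN."""
import subprocess
import sys

VALID_STATES = (0, 1, 2, 3)
UNKNOWN = 3


def normalize_exit_code(rc, fallback=UNKNOWN):
    """Return rc unchanged if it is a valid plugin state, otherwise fallback."""
    return rc if rc in VALID_STATES else fallback


def main(argv):
    if not argv:
        print("UNKNOWN - no command given")
        return UNKNOWN
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except OSError as exc:
        # The command itself could not be started (missing, not executable, ...).
        print(f"UNKNOWN - could not run {argv[0]}: {exc}")
        return UNKNOWN
    state = normalize_exit_code(proc.returncode)
    output = proc.stdout.strip()
    if state != proc.returncode:
        # Keep the original exit code visible in the plugin output.
        detail = output or proc.stderr.strip()
        output = f"UNKNOWN - command exited with {proc.returncode}: {detail}"
    print(output)
    return state


# Act as a wrapper only when a command to run was supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

UNKNOWN results can then be kept out of notifications by leaving `u` out of the service's `notification_options`; whether that is acceptable depends on how confident you are that UNKNOWN only ever means "transient problem" for these services.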
NRPE Errors generating notifications.
Re: NRPE Errors generating notifications.
Have you tried increasing the timeout of the check?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
- Posts: 6
- Joined: Fri Jan 20, 2012 3:54 am
Re: NRPE Errors generating notifications.
I don't think it's a timeout problem, because then I would get a timeout message, not an out-of-bounds error, right?
Anyway, I will set a 50-second timeout and see what happens.
-
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: NRPE Errors generating notifications.
Please verify that the commands point to the correct directories and that the permissions on the plugins allow them to execute. Also, let us know how things go after changing the timeout, though I agree it should give you a timeout error instead.
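Because in this setup the plugin lives on the monitored host (the command runs `/home/$ARG1$/nagios/libexec/check_disk` over ssh), the path and permission check has to be done there, as the same user the check logs in as. A small illustrative sketch, assuming Python is available on that host; the script and function names are made up for this example.

```python
#!/usr/bin/env python3
"""Illustrative check: can the current user actually execute the given plugin(s)?"""
import os
import sys


def plugin_problems(path):
    """Return a list of reasons why `path` cannot be run as a plugin (empty if fine)."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    problems = []
    if not os.path.isfile(path):
        problems.append(f"{path} is not a regular file")
    if not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable by uid {os.getuid()}")
    return problems


def main(paths):
    """Print every problem found; return 0 if all plugins look runnable, else 1."""
    failed = False
    for path in paths:
        for problem in plugin_problems(path):
            print(problem)
            failed = True
    return 1 if failed else 0


# Only run the check when at least one path was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Pass it the full plugin path(s) used by the check command, logged in as the monitoring user on the monitored host.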
-
- Posts: 6
- Joined: Fri Jan 20, 2012 3:54 am
Re: NRPE Errors generating notifications.
I don't see any change after increasing the timeout.
The checks work fine most of the time, so the permissions should be OK.
This is a sample check command:
check_local_disk ssh $ARG1$@$HOSTADDRESS$ -C '"/home/"$ARG1$"/nagios/libexec/check_disk" -w $ARG2$ -c $ARG3$ -E -t 50 -p $ARG4$'
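On the "out of bounds" question: OpenSSH's `ssh` exits with the remote command's exit status, or with 255 if an error occurred in ssh itself (for example, the connection could not be established in time). So the 255 most likely comes from the ssh connection to the overloaded AIX host failing, not from check_disk, and the `-t 50` passed to check_disk cannot help, because it only limits the plugin once it is already running on the remote side. A minimal sketch, assuming the same ssh-based approach as the command above, that bounds both the connection (`ConnectTimeout`) and the whole run, and reports ssh failures as UNKNOWN instead of letting 255 through:

```python
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
SSH_ERROR = 255  # ssh's own exit status when ssh itself fails


def build_ssh_command(user, host, remote_cmd, connect_timeout=10):
    """Build an ssh argv that fails fast instead of waiting on a slow host."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # never prompt for a password
        "-o", f"ConnectTimeout={connect_timeout}",  # seconds allowed to connect
        f"{user}@{host}",
        remote_cmd,
    ]


def run_remote_check(argv, timeout=50):
    """Run the check and return (state, output) with state always in 0..3."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return UNKNOWN, f"UNKNOWN - no result within {timeout}s"
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - could not start {argv[0]}: {exc}"
    if proc.returncode == SSH_ERROR:
        # Note: a remote plugin that itself exits 255 looks the same here.
        return UNKNOWN, f"UNKNOWN - ssh failed: {proc.stderr.strip()}"
    if proc.returncode in (OK, WARNING, CRITICAL, UNKNOWN):
        return proc.returncode, proc.stdout.strip()
    return UNKNOWN, f"UNKNOWN - unexpected exit code {proc.returncode}"
```

For example, `run_remote_check(build_ssh_command("nagios", "aix01", "/home/nagios/nagios/libexec/check_disk -w 20% -c 10% -p /home"))`, where the user, host name and thresholds are placeholders.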
-
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: NRPE Errors generating notifications.
They are working "most of the time"? I assume that means until something times out? There has to be a missing piece here that has not been shared, because those options alone control the timeout error you get back. I see you are SSHing into the system and then executing a command; what is the reason for this?
-
- Posts: 6
- Joined: Fri Jan 20, 2012 3:54 am
Re: NRPE Errors generating notifications.
This is the command we are using for these checks.
If it is a timeout problem, why am I getting an out-of-bounds error?
The systems are AIX, the Nagios server is a Linux box, and yes, the check works most of the time. It only shows this error occasionally, and it happens with different AIX servers and different checks, mostly disk checks.
If I connect to the server and run the same command myself, it returns a valid response. But on the occasions when I managed to run the check while the error was showing, I could see that the AIX server was responding very slowly.
What would be the best way to do a disk check?
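Since the failures only happen while a host is briefly overloaded, retrying is a reasonable first answer. Nagios already does this itself: with `max_check_attempts` greater than 1, a non-OK result first puts the service in a SOFT state, it is rechecked every `retry_interval`, and notifications only go out once the state becomes HARD. If you would rather retry inside the check command, here is a minimal sketch of the same idea; treating only exit code 255 as transient is an assumption, and `run_once` stands for whatever function actually runs the check.

```python
import time

UNKNOWN = 3
TRANSIENT_CODES = {255}  # assumption: only ssh's own failure is worth retrying


def run_with_retries(run_once, attempts=3, delay=5.0, sleep=time.sleep):
    """Call run_once() -> (rc, output) until it returns a non-transient code.

    Returns the first non-transient result, or UNKNOWN if every attempt failed.
    """
    rc, output = UNKNOWN, "check never ran"
    for attempt in range(1, attempts + 1):
        rc, output = run_once()
        if rc not in TRANSIENT_CODES:
            return rc, output
        if attempt < attempts:
            sleep(delay)  # give the overloaded host a moment before retrying
    return UNKNOWN, f"UNKNOWN - {attempts} attempts failed, last exit {rc}: {output}"
```

Keep `attempts * (delay + per-attempt timeout)` below the service check timeout (`service_check_timeout` in nagios.cfg), or Nagios will kill the wrapper before it reports anything.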
Re: NRPE Errors generating notifications.
Use check_by_ssh. But if the TCP connection for SSH itself is timing out, you may have a problem with any kind of active check. You may need to look at passive checks instead.
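With passive checks, the monitored host runs the plugin on its own schedule and the result is handed to Nagios, so a slow SSH login no longer turns into a failed check. On the Nagios server, a result is accepted by writing a `PROCESS_SERVICE_CHECK_RESULT` external command to the command file; how the result travels from the AIX host to the server (NSCA is a common choice) is left open here. A minimal sketch of building and submitting that line; the command file path is an assumption, so check the `command_file` setting in nagios.cfg.

```python
import time

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"  # assumed path; see command_file in nagios.cfg


def passive_result_line(host, service, rc, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if rc not in (0, 1, 2, 3):
        raise ValueError(f"invalid plugin state: {rc}")
    timestamp = int(time.time() if now is None else now)
    one_line = " ".join(output.splitlines())  # the command file is line-based
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{rc};{one_line}\n"


def submit(line, command_file=COMMAND_FILE):
    """Append the command to the Nagios command file (normally a named pipe)."""
    with open(command_file, "a") as fh:
        fh.write(line)
```

Passive services are usually paired with freshness checking (`check_freshness` and `freshness_threshold`), so that a host which stops sending results is still noticed.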
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.