nrpe; ignore connection time-out triggered critical alert ?

Posted: Mon Mar 27, 2017 3:31 pm
by nagmoto
Hi

We have a service check on a Linux server that occasionally runs for an unusually long time.
This triggers a critical service alert with a time-out message.
Is it possible for Nagios Core 4.x to ignore a critical service check result that carries a "time-out" message from the remote NRPE daemon?

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Mon Mar 27, 2017 3:43 pm
by avandemore
You can use plugin thresholds like this:

https://support.nagios.com/forum/viewto ... 02#p209502

However it may be more desirable to use -u.

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Mon Mar 27, 2017 3:45 pm
by dwhitfield
In addition to what @avandemore said, you could also increase the global timeout in the nagios.cfg. It's possible this will lead to increased load on the server, but it's an option.

What plugin are you using? We can be more specific with help in that regard.
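
For reference, the global timeout mentioned above is the service_check_timeout directive in nagios.cfg (60 seconds in the sample config). A minimal sketch, assuming 120 seconds is long enough for the slow check; the file path varies by install, and service_check_timeout_state is only worth setting if a timed-out check should report something other than CRITICAL:

```cfg
# nagios.cfg on the Nagios server
# (path is an assumption: often /usr/local/nagios/etc/nagios.cfg or /etc/nagios/nagios.cfg)

# Maximum time, in seconds, that Nagios Core lets any service check run
# before killing it. 120 is an illustrative value.
service_check_timeout=120

# State reported when a service check hits the timeout above:
# c = CRITICAL (default), u = UNKNOWN, w = WARNING, o = OK.
service_check_timeout_state=u
```

Nagios has to be restarted or reloaded to pick up the change. Both settings are global, so they apply to every service check, not just the slow one.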

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Tue Mar 28, 2017 2:32 am
by nagmoto
Thanks for the great pointer.
I am currently using the following.

Code: Select all

check_ora_tablespace -t 120 .....
I will try:

Code: Select all

check_ora_tablespace -u -t 120 .....

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Tue Mar 28, 2017 9:11 am
by cdienger
Let us know once you've had a chance to try it. Thanks!

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Tue Mar 28, 2017 1:57 pm
by nagmoto
Sure, I will update.

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Tue Mar 28, 2017 2:04 pm
by cdienger
Thanks.

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Tue Mar 28, 2017 5:23 pm
by nagmoto
Here is what I have so far, including a change I made on the remote Oracle server.

1. This is my check command on the Nagios server side.

Code: Select all

check_command   check_nrpe!check_ora_tablespace -u -t 120 -a nagios dbpas123 m218279dcss3001.tet.com pbax081 95 80 1523
     
2. I am seeing this in /var/log/nagios/nagios.log on the Nagios server:

Code: Select all

   
Oracle Tablespace Check 7A;CRITICAL;HARD;3;(Service check timed out after 60.01 seconds)
     
It looks like -t 120 was not passed on.
3. So I logged in to the Oracle server and changed the timeout in /etc/nagios/nrpe.cfg from 60 to 120 seconds.

Code: Select all

   command_timeout=60 -> command_timeout=120
This change should make the timeouts much less frequent.
4. This is the check_ora_tablespace command definition on the NRPE agent.

Code: Select all

command[check_ora_tablespace]=/usr/lib64/nagios/plugins/contrib/dbmon.py -a tablespaceUsage -u $ARG1$ -p $ARG2$ -s $ARG3$ -d $ARG4$ -c $ARG5$ -w $ARG6$ -r $ARG7$
5. Question:
How can I pass the more desirable "-u" option from the host's service check definition?
Is there a variable that can be set in the NRPE agent's /etc/nagios/nrpe.cfg to get the same effect as "-u"?
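
For context, the nrpe.cfg settings involved in steps 3 and 4 can be sketched as below. command_timeout and dont_blame_nrpe are standard nrpe.cfg directives; the values are illustrative, and the dont_blame_nrpe line assumes the agent was built with command-argument support.

```cfg
# /etc/nagios/nrpe.cfg on the remote Oracle server

# Maximum time, in seconds, that the NRPE daemon lets a command run
# before killing it.
command_timeout=120

# Must be 1 for the daemon to accept the -a arguments that fill
# $ARG1$..$ARG7$ in the check_ora_tablespace definition.
dont_blame_nrpe=1
```

The NRPE daemon has to be restarted after editing the file. Three separate timeouts are in play here: check_nrpe's -t on the Nagios server, command_timeout on the agent, and Nagios Core's own service check timeout. The "Service check timed out after 60.01 seconds" wording looks like the last of these, so raising command_timeout alone may not make that alert go away.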

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Wed Mar 29, 2017 10:08 am
by avandemore
This document explains how to get NRPE working along with how the arguments work:

https://assets.nagios.com/downloads/nag ... g_NRPE.pdf

This document is for troubleshooting NRPE issues:

https://assets.nagios.com/downloads/nag ... utions.pdf

In the context I was referring to, -u is an argument for check_nrpe, not the plugin. You can view information about that by running:

Code: Select all

/usr/local/nagios/libexec/check_nrpe -h
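
To illustrate where -u belongs, here is a sketch of a command definition on the Nagios server that passes it to check_nrpe itself. The command name check_nrpe_unknown, the 120 second value, and the two-argument layout ($ARG1$ for the remote command name, $ARG2$ for its arguments) are assumptions for illustration; an existing check_nrpe definition may be laid out differently.

```cfg
# commands.cfg on the Nagios server (file name is an assumption)
#
# -u : report UNKNOWN instead of CRITICAL when check_nrpe hits a
#      connection or socket timeout
# -t : seconds check_nrpe waits for the agent before giving up
# -c : command name as defined in the agent's nrpe.cfg
# -a : arguments that fill $ARG1$..$ARGn$ on the agent side
define command {
    command_name    check_nrpe_unknown
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -u -t 120 -c $ARG1$ -a $ARG2$
}
```

A service would then reference it as check_nrpe_unknown!check_ora_tablespace!<plugin arguments>, where each "!" separates one $ARGn$ macro from the next. With this layout -u never reaches the remote plugin; it only changes how check_nrpe reports its own timeouts.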

Re: nrpe; ignore connection time-out triggered critical aler

Posted: Wed Mar 29, 2017 4:17 pm
by nagmoto
Now that I have set the timeout value in the NRPE agent's /etc/nagios/nrpe.cfg file, is the following syntax correct for passing "-u" to the remote NRPE agent from the Nagios server?

Code: Select all

check_command   check_nrpe!check_ora_tablespace -u  -a nagios dbpas123 m218279dcss3001.tet.com pbax081 95 80 1523