Cleaning out False Positive Alerts (ssl handshake/timeouts)
Posted: Thu Aug 11, 2016 10:14 am
by and1100
Hi,
I am trying to disable the WARNING/CRITICAL alerts (essentially the email alerts) that are sent when the following events happen:
- Random 'CHECK_NRPE: Error - Could not complete SSL handshake.' on servers that are definitely working. Recovery happens a few moments later. Very sporadic and random (and annoying).
- Sporadic 'CHECK_NRPE: Socket timeout after 30 seconds.' These also recover quickly.
Is there a way to do any of the following:
- make these failures NOT count as WARNING/CRITICAL?
- disable alerts for such events?
- change whatever is possible to stop these checks from failing in this manner?
Thank you.
Posted: Thu Aug 11, 2016 4:22 pm
by rkennedy
There are a couple of ways this could be done:
- You should be able to raise max_check_attempts, which will help counteract the false positives.
- Increase your notification_interval to a longer period so the problem has time to resolve itself, thus preventing false positives.
Will either of those solutions work for you?
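A minimal sketch of the first suggestion applied to a single service, using the host and service names from the definition posted later in this thread; the numbers are illustrative assumptions, not recommendations. Nagios does not send notifications for SOFT states, so a service only alerts after max_check_attempts consecutive non-OK results, re-checked every retry_interval; a failure that clears on the next retry never becomes HARD and never emails. On the second suggestion: notification_interval sets how long Nagios waits before re-notifying about a problem that is still present, while the wait before the first notification is a separate directive, first_notification_delay.

```
define service{
    use                       generic-service
    host_name                 ahost
    service_description       Process: sssd
    check_command             check_nrpe_linux!check_sssd
    max_check_attempts        5    ; consecutive non-OK results needed before a HARD state (and a notification)
    retry_interval            1    ; time units between SOFT re-checks (minutes with the default interval_length=60)
    notification_interval     60   ; time units between repeat notifications while the problem persists
    first_notification_delay  5    ; optional: time units to wait before the first problem notification
    }
```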
Posted: Thu Aug 11, 2016 5:00 pm
by Box293
and1100 wrote: make these queries NOT be warn/criticals?
NRPE v3 has the ability to do this (check_nrpe):
NEW TIMEOUT SYNTAX
-t <interval>:<state>
<interval> = Number of seconds before connection times out (default=10)
<state> = Check state to exit with in the event of a timeout (default=CRITICAL)
Timeout state must be a valid state name (case-insensitive) or integer:
(OK, WARNING, CRITICAL, UNKNOWN) or integer (0-3)
/usr/local/nagios/libexec/check_nrpe -H centos18 -t 2:3
CHECK_NRPE STATE UNKNOWN: Socket timeout after 2 seconds.
echo $?
3
https://support.nagios.com/kb/article.php?id=515
https://support.nagios.com/kb/article.php?id=520
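A sketch of wiring the new -t syntax into a command definition on the Nagios server, so the timeout and timeout state can be chosen per service. The command name check_nrpe_timeout is hypothetical, $USER1$ is assumed to point at /usr/local/nagios/libexec, and check_load is assumed to be defined in the remote nrpe.cfg. Exiting UNKNOWN (3) only stops the emails if u is not in the service's notification_options, so the example service leaves it out.

```
# Hypothetical command: $ARG1$ = NRPE command name, $ARG2$ = <interval>:<state> for -t (NRPE v3 check_nrpe)
define command{
    command_name    check_nrpe_timeout
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$
    }

# Example service: time out after 30 seconds and exit UNKNOWN instead of CRITICAL
define service{
    use                     generic-service
    host_name               centos18
    service_description     Current Load
    check_command           check_nrpe_timeout!check_load!30:3
    notification_options    w,c,r    ; no "u", so UNKNOWN results do not send notifications
    }
```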
Posted: Fri Aug 12, 2016 6:36 am
by and1100
rkennedy wrote: You should be able to raise the max_check_attempts, which will help to counteract the false positive.
Hi, I think this is doable. I would raise max_check_attempts on each individual service, correct?
Posted: Fri Aug 12, 2016 6:37 am
by and1100
Box293 wrote: NRPE v3 has the ability to do this (check_nrpe):
-t <interval>:<state>
Hi, this sounds great; however, wouldn't this be a change I would have to make on each client? My goal is to make the changes on the Nagios server itself; I should have specified that. Is there a way to disable SSL handshaking on the Nagios server as a whole, rather than for each NRPE check individually?
Thank you.
Posted: Fri Aug 12, 2016 12:26 pm
by rkennedy
Nope, this would just take adjusting your check_nrpe command to use the -t parameter, as in the -t 2:3 example above. In your case, it would probably be -t 30:0 (time out after 30 seconds, return OK on timeout).
Keep in mind this is a feature of NRPE v3. SSL needs to be configured on the client side, as that is where the settings used to talk to the client are specified.
Posted: Wed Aug 17, 2016 6:35 am
by and1100
Hi Guys,
I'm back now and giving this a shot. Recently, we had this scenario:
A host randomly sent this email alert:
Hyperlink: TBD;
Additional Info:
CHECK_NRPE: Error - Could not complete SSL handshake.
The max_check_attempts for the HOST seems to be pretty high already (it's 10).
max_check_attempts 10
The notification interval seems to be standard:
notification_interval 30
Here's the definition of the service that sent this alert:
define service{
    use                     generic-service
    host_name               ahost
    service_description     Process: sssd
    check_command           check_nrpe_linux!check_sssd
    }
I'm not seeing where to configure check_nrpe to use the parameters you mentioned. Just to confirm: do I need to recompile NRPE v3 on the Nagios server? If so, I'm doing that right now.
Thanks for all of your help.
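For reference, the max_check_attempts 10 quoted above is the host's setting; a service alert such as Process: sssd is governed by the service's own max_check_attempts, which this service inherits from generic-service unless it is set elsewhere. Rather than editing every service individually, one option is a template that inherits from generic-service and overrides only the retry behaviour; the template name nrpe-tolerant-service and the values are hypothetical.

```
# Hypothetical template: inherits everything from generic-service, overrides retry behaviour
define service{
    name                    nrpe-tolerant-service
    use                     generic-service
    max_check_attempts      5
    retry_interval          1
    register                0    ; template only, not a real service
    }

# Services then point at the new template instead of generic-service
define service{
    use                     nrpe-tolerant-service
    host_name               ahost
    service_description     Process: sssd
    check_command           check_nrpe_linux!check_sssd
    }
```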
Posted: Wed Aug 17, 2016 11:47 am
by rkennedy
It really depends on how your check_nrpe_linux command is defined, but I imagine you could hard-code the parameter there so it applies to every service check. Keep in mind, though, that this only applies to NRPE v3, not earlier versions. What you'd want to pass is probably -t 60:0.
Yes, you need to upgrade to NRPE v3 for this.
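The actual check_nrpe_linux definition was not posted, so the following is a hypothetical sketch of hard-coding the timeout in it, assuming the command passes the NRPE command name as $ARG1$. Existing services such as check_nrpe_linux!check_sssd would then pick up the change without edits. Per the help text quoted earlier, the <state> part applies in the event of a timeout, so it may not change how an SSL handshake failure is reported. Exiting OK on timeout also hides a genuinely unresponsive agent; UNKNOWN (3) is a less silent alternative.

```
# Hypothetical definition: requires the NRPE v3 check_nrpe on the Nagios server
# -t 60:0 = wait up to 60 seconds, then exit OK (0) instead of CRITICAL on timeout
define command{
    command_name    check_nrpe_linux
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60:0 -c $ARG1$
    }
```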
Posted: Wed Aug 17, 2016 4:31 pm
by Box293
and1100 wrote: I need to recompile nrpe to v3 on the Nagios server? If so, I'm performing that right now.
You only need to upgrade check_nrpe, which is outlined at the bottom of this KB article:
https://support.nagios.com/kb/article.php?id=515