Cleaning out False Positive Alerts (ssl handshake/timeouts)

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
and1100
Posts: 93
Joined: Mon Mar 25, 2013 8:37 am

Post by and1100 »

Hi,

I am trying to disable the warnings/criticals (essentially the email alerts) that are sent when the following events happen:

- random 'CHECK_NRPE: Error - Could not complete SSL handshake.' on servers that most definitely work. Recovery happens a few moments later. Very sporadic and random (and annoying).

- sporadic 'CHECK_NRPE: Socket timeout after 30 seconds.' Also recovers quickly.


Are there ways to do the following:
- make these queries NOT be warn/criticals?
or
- disable alerts for such events
or
- disable whatever is possible to stop these checks from failing in this manner

Thank you.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Post by rkennedy »

There are a couple ways this could be done:
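- You should be able to raise max_check_attempts on the affected services. Nagios only sends a problem notification once a service reaches a HARD state (max_check_attempts consecutive non-OK results), so a failure that clears on the next retry never generates an email.
- Set a first_notification_delay so the problem has time to resolve itself before the first notification goes out, preventing the false positive alerts. (notification_interval only controls how often re-notifications are sent while the problem persists.)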
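To illustrate why raising max_check_attempts filters out a one-off SSL handshake failure, here is a simplified shell model of Nagios's SOFT/HARD state handling. It is a sketch of the documented behavior, not Nagios's own code: it counts consecutive non-OK results (exit codes 1-3) and reports the check number at which the first problem notification would go out. The function name notify_points is made up for this example. In real Nagios the retries are spaced retry_interval apart, so the first alert arrives roughly (max_check_attempts - 1) times retry_interval after the first failure.

```shell
#!/bin/sh
# Simplified model of Nagios SOFT/HARD state handling (not Nagios's own code).
# Usage: notify_points MAX_CHECK_ATTEMPTS RESULT...
#   RESULT values are plugin exit codes: 0 = OK, 1-3 = WARNING/CRITICAL/UNKNOWN.
# Prints the check number(s) at which a problem notification would be sent.
notify_points() {
    max_attempts=$1
    shift
    failures=0   # consecutive non-OK results (the attempt counter)
    check=0
    notified=""
    for rc in "$@"; do
        check=$((check + 1))
        if [ "$rc" -eq 0 ]; then
            failures=0   # an OK result resets the attempt counter
        else
            failures=$((failures + 1))
            # Reaching max_check_attempts means a HARD state, which notifies.
            if [ "$failures" -eq "$max_attempts" ]; then
                notified="$notified $check"
            fi
        fi
    done
    echo "${notified# }"
}

notify_points 1 0 2 0 0   # prints: 2
notify_points 3 0 2 0 0   # prints an empty line: the blip never becomes HARD
```

With max_check_attempts 1, the single CRITICAL in the sequence 0 2 0 0 alerts immediately; with 3, the same sequence produces no alert at all.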

Will either of those solutions work for you?
Former Nagios Employee
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Post by Box293 »

and1100 wrote: make these queries NOT be warn/criticals?
NRPE v3 has the ability to do this (check_nrpe):
NEW TIMEOUT SYNTAX
-t <interval>:<state>
<interval> = Number of seconds before connection times out (default=10)
<state> = Check state to exit with in the event of a timeout (default=CRITICAL)
Timeout state must be a valid state name (case-insensitive) or integer:
(OK, WARNING, CRITICAL, UNKNOWN) or integer (0-3)

/usr/local/nagios/libexec/check_nrpe -H centos18 -t 2:3
CHECK_NRPE STATE UNKNOWN: Socket timeout after 2 seconds.

echo $?
3
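The <state> half of the -t value decides whether a timeout shows up as CRITICAL. As a sketch of the rule in the help text above, this hypothetical helper (timeout_exit_code is not part of NRPE) takes a -t value and prints the exit code a timeout would produce, accepting either a case-insensitive state name or an integer 0-3. It shows, for example, that -t 30:0 and -t 30:ok mean the same thing.

```shell
#!/bin/sh
# Hypothetical helper mirroring the documented "-t <interval>:<state>" rule.
# Prints the exit code a timeout would produce, or fails on invalid input.
timeout_exit_code() {
    spec=$1
    interval=${spec%%:*}
    state=${spec#*:}
    # No ":<state>" part means the documented default, CRITICAL.
    [ "$state" = "$spec" ] && state=CRITICAL
    case $interval in
        ''|*[!0-9]*) echo "invalid interval: $interval" >&2; return 1 ;;
    esac
    # State names are case-insensitive, so compare in upper case.
    case $(printf '%s' "$state" | tr '[:lower:]' '[:upper:]') in
        OK|0)       echo 0 ;;
        WARNING|1)  echo 1 ;;
        CRITICAL|2) echo 2 ;;
        UNKNOWN|3)  echo 3 ;;
        *) echo "invalid state: $state" >&2; return 1 ;;
    esac
}

timeout_exit_code 2:3     # prints: 3 (matches the echo $? above)
timeout_exit_code 30:ok   # prints: 0
```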
https://support.nagios.com/kb/article.php?id=515
https://support.nagios.com/kb/article.php?id=520
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
and1100

Post by and1100 »

rkennedy wrote: You should be able to raise the max_check_attempts, which will help to counteract the false positive.
Hi -- I think this is doable. I am upping the max_check_attempts on each individual service, correct?
and1100

Post by and1100 »

Box293 wrote:
NRPE v3 has the ability to do this (check_nrpe):
-t <interval>:<state>
Hi -- this sounds great. However, wouldn't this be a change I would have to make on each client? My goal is to be able to make the changes on the Nagios server itself; I should have specified that. Is there a way to disable SSL handshaking on the Nagios server as a whole, rather than for individual NRPE checks?

Thank you.
rkennedy

Post by rkennedy »

Nope, this only requires adjusting the check_nrpe command on the Nagios server to pass the -t <interval>:<state> parameter (for example -t 2:3). In your case, it would probably be -t 30:0 (time out after 30 seconds, return OK on timeout).

Keep in mind this is a feature of NRPE v3. As for SSL, it has to be configured on the client side, because the NRPE daemon on each client is where the settings used to talk to that client are specified.
and1100

Post by and1100 »

Hi Guys,

I am back now and giving this a shot. Recently, we had this scenario happen:

A host randomly threw out this email alert:

Hyperlink: TBD;
Additional Info:

CHECK_NRPE: Error - Could not complete SSL handshake.

The max_check_attempts for the HOST seems to be pretty high already (it's 10).
max_check_attempts 10

The notification interval seems to be standard:
notification_interval 30

Here's an example of the service that threw out this alert:

define service{
    use                     generic-service
    host_name               ahost
    service_description     Process: sssd
    check_command           check_nrpe_linux!check_sssd
    }

I'm not seeing where to add the parameters you mentioned to check_nrpe. Just to confirm, I need to recompile nrpe to v3 on the Nagios server? If so, I'm performing that right now.

Thanks for all of your help.
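One thing worth checking before raising anything further: the max_check_attempts of 10 quoted above is set on the host, but service alerts like this one are governed by the service's own max_check_attempts, which this definition inherits from generic-service. Nagios writes the resolved object definitions (templates already applied) to its object cache file, so a small awk filter can show the value each service actually uses. The default path below assumes a source install (check object_cache_file in nagios.cfg), the tab-separated layout is an assumption about that file's format, and service_attempts is a made-up name.

```shell
#!/bin/sh
# List host, service and effective max_check_attempts for every service in the
# Nagios object cache. Assumes the usual layout: a "define service {" line, then
# tab-indented "<directive><TAB><value>" lines, closed by a tab-indented "}".
service_attempts() {
    cache=${1:-/usr/local/nagios/var/objects.cache}   # assumed source-install path
    awk -F'\t' '
        # [[:space:]] keeps "define servicegroup" etc. from matching.
        /^define service[[:space:]]/ { in_svc = 1; host = svc = att = ""; next }
        in_svc && $2 == "host_name"           { host = $3 }
        in_svc && $2 == "service_description" { svc = $3 }
        in_svc && $2 == "max_check_attempts"  { att = $3 }
        in_svc && ($1 == "}" || $2 == "}") {
            print host ": " svc ": max_check_attempts=" att
            in_svc = 0
        }
    ' "$cache"
}

# Example: show only the service from the alert above.
# service_attempts | grep 'Process: sssd'
```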
rkennedy

Post by rkennedy »

It really depends on how your check_nrpe_linux command is defined, but I imagine you could hard-code the parameter there so every service check using that command gets it. Keep in mind, though, that this only applies to NRPE v3, nothing prior. What you'd want to pass is probably -t 60:0.

Yes, you need to upgrade to NRPE v3 for this.
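To make the "hard-code it in the command" suggestion concrete: the -t option goes into the command_line of the command definition your services call, so every service using check_nrpe_linux picks it up without touching the clients. The sketch below prints such a definition; it assumes NRPE v3's check_nrpe, the stock $USER1$ macro pointing at the plugins directory, and that the NRPE command name is passed as $ARG1$ (as in check_nrpe_linux!check_sssd). The helper name nrpe_command_definition is hypothetical, and your existing definition may pass other options, so compare rather than paste blindly. Per the help text, the state only applies to timeouts, so the SSL handshake errors still rely on max_check_attempts. Also note that :0 makes a timeout report OK, which could mask a client that stops responding entirely; :3 (UNKNOWN, as in Box293's example) keeps timeouts distinguishable from real CRITICAL results.

```shell
#!/bin/sh
# Print a check_nrpe_linux command definition that passes -t <interval>:<state>.
# Assumes NRPE v3 check_nrpe, $USER1$ set to the plugins directory in resource.cfg,
# and services calling it as check_nrpe_linux!<nrpe command name>.
nrpe_command_definition() {
    timeout_spec=${1:-60:0}   # the value suggested above; e.g. 30:3 for UNKNOWN instead
    # Unquoted heredoc so $timeout_spec expands; Nagios macros are escaped with \$.
    cat <<EOF
define command{
    command_name    check_nrpe_linux
    command_line    \$USER1\$/check_nrpe -H \$HOSTADDRESS\$ -t $timeout_spec -c \$ARG1\$
    }
EOF
}

nrpe_command_definition 60:0
```

After updating the real definition, verifying the configuration with /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (default source-install paths) before restarting Nagios catches typos early.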
Box293

Post by Box293 »

and1100 wrote: I need to recompile nrpe to v3 on the Nagios server? If so, I'm performing that right now.
Only check_nrpe needs to be upgraded; the steps are outlined at the bottom of this KB article:
https://support.nagios.com/kb/article.php?id=515