check_http certificate socket timeout

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Post by Shivaramakrishnan »

Hello,
I have been trying to troubleshoot this problem for over three days now.
I have a remote server with a certificate check that uses the check_http plugin. The check used to work fine, but for the past few days I have been getting socket timeout alerts. I checked the server for load issues, but the load looks fine. I tried increasing the check timeout on the remote server to 120 seconds (-t 120), but that did not fix the issue. Once I get the socket timeout email, the alert clears by itself after some time. Below are my command definitions.

command[check_cert]=/usr/lib/nagios/plugins/check_http --ssl -I xyz.com -C 30 -t 120
OK - Certificate will expire on 09/22/2013 15:01.

command[check_cert1]=/usr/lib/nagios/plugins/check_http --ssl -I abc.com -C 30 -t 120
OK - Certificate will expire on 09/11/2013 12:00.

How can I fix this? Any help would be greatly appreciated.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: check_http certificate socket timeout

Post by sreinhardt »

When you see this error, are you able to run the check manually and get a correct result? Also, do you have other checks on that web server to verify that nothing else is happening to it? Everything in your check definitions looks fine, so that shouldn't be the issue.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Shivaramakrishnan

Post by Shivaramakrishnan »

When I got the alert, I tried running the command manually, both on the remote server and from the Nagios server.

At the time of alert:
mgr01:~# /usr/lib/nagios/plugins/check_http --ssl -I xyz.com -C 30 -t 120
CRITICAL - Cannot make SSL connection
OK - Certificate will expire on 09/22/2013 15:01.

Once the alert recovers:
mgr01:~# /usr/lib/nagios/plugins/check_http --ssl -I xyz.com -C 30 -t 120
OK - Certificate will expire on 09/22/2013 15:01.

From Nagios Server:
ngo01.atl2:/var/log/nagios3# /usr/lib/nagios/plugins/check_nrpe -H 10.12.7.210 -p 5666 -c check_cert
CHECK_NRPE: Socket timeout after 10 seconds.



I have other checks running fine on the remote server. I did not get any alerts for load or total processes. Any idea?
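To pin down when the intermittent timeout happens, it can help to run the same check_nrpe call in a loop from the Nagios server and log the time, exit code and duration of every run. A minimal bash sketch, assuming a hypothetical helper named probe_check; the command to probe is passed in as arguments, so nothing about check_nrpe is assumed beyond the command line already shown above:

```shell
#!/usr/bin/env bash

# probe_check COUNT INTERVAL CMD [ARGS...]   (hypothetical helper)
# Runs CMD COUNT times, sleeping INTERVAL seconds between runs, and prints one
# line per run: "<epoch> rc=<exit code> elapsed=<seconds>s <first output line>".
probe_check() {
    local count=$1 interval=$2
    shift 2
    local i start end rc out first
    for ((i = 1; i <= count; i++)); do
        start=$(date +%s)
        out=$("$@" 2>&1)     # capture stdout and stderr of the check
        rc=$?                # exit status of the command substitution
        end=$(date +%s)
        first=$(printf '%s\n' "$out" | head -n 1)
        printf '%s rc=%d elapsed=%ds %s\n' "$start" "$rc" "$((end - start))" "$first"
        # Do not sleep after the last run.
        if ((i < count)); then
            sleep "$interval"
        fi
    done
}
```

For example, `probe_check 120 30 /usr/lib/nagios/plugins/check_nrpe -H 10.12.7.210 -p 5666 -c check_cert` probes every 30 seconds for roughly an hour. A run whose elapsed time matches the check_nrpe timeout points at the transport timeout rather than at the certificate check itself.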
sreinhardt

Post by sreinhardt »

Would I be correct to assume that mgr01 is the remote server you are running NRPE checks on, and ngo01.atl2 is the Nagios server that executes check_nrpe? If so, at the time of an error, have you verified that you can run another NRPE check, or the same one, from ngo01 to mgr01? Also, other checks may not be seeing the issue simply because they are not scheduled to run at the same moment, and this seems to recover quite quickly. You could try a couple of things:

1) Increase the check_nrpe timeout to 30 or 60 seconds and see whether the same error occurs: ./check_nrpe -t [time in seconds]
2) Implement flap detection on that service, so that if it fails and then comes back within a moment or two, it does not alert you.
See the Nagios documentation: Detection and Handling of State Flapping
Shivaramakrishnan

Post by Shivaramakrishnan »

Yes, your assumptions are right.
I tried that already. When I run check_nrpe from the Nagios server at the time of the alert, I get a socket timeout.
I believe I have already increased the NRPE timeout to 120 seconds on the remote server:
mgr01:~# /usr/lib/nagios/plugins/check_http --ssl -I xyz.com -C 30 -t 120


Do I need to make any changes on the Nagios server?

This is what I have on my Nagios server for defining the service:
define service{
        name                            generic-service
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               0       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0       ; Only send notifications on status change by default.
        is_volatile                     0
        check_period                    24x7
        normal_check_interval           5
        retry_check_interval            1
        max_check_attempts              3
        notification_period             24x7
        notification_options            w,u,c,r
        notification_interval           60
        contact_groups                  admins
        register                        0
}

------
define service {
        use                             generic-service
        host_name                       mgr01.atl2
        service_description             HTTPS Check Cert - xyz.com
        check_command                   check_nrpe_1arg!check_cert
}
sreinhardt

Post by sreinhardt »

Actually yes, you will need to change the check_nrpe timeout, not the check_http one. check_http could also be having issues, although it does not seem to be. The problem is more likely the check_nrpe timeout, especially since you raised the check_http timeout well above check_nrpe's default of 10 seconds (the value reported in the "Socket timeout after 10 seconds" message above). The flag is the same for both plugins, though: -t.
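The two failure messages posted earlier come from different layers: "CRITICAL - Cannot make SSL connection" is check_http's own result, while "CHECK_NRPE: Socket timeout after 10 seconds." means check_nrpe on the Nagios server stopped waiting before any answer arrived. A small bash sketch that tells them apart; classify_nrpe_output is a hypothetical name, and it only recognizes the message formats quoted in this thread plus the usual OK/WARNING/CRITICAL/UNKNOWN prefix:

```shell
#!/usr/bin/env bash

# classify_nrpe_output TEXT   (hypothetical helper)
# Prints "client-timeout <N>s" for a check_nrpe socket timeout,
# "plugin-result <STATE>" for normal plugin output, or "other".
classify_nrpe_output() {
    local line=$1 secs
    case $line in
        "CHECK_NRPE: Socket timeout after "*)
            # check_nrpe gave up waiting; extract the number of seconds.
            secs=${line#CHECK_NRPE: Socket timeout after }
            secs=${secs%% *}
            echo "client-timeout ${secs}s"
            ;;
        OK\ -* | WARNING\ -* | CRITICAL\ -* | UNKNOWN\ -*)
            # A real result from the plugin (check_http here).
            echo "plugin-result ${line%% *}"
            ;;
        *)
            echo "other"
            ;;
    esac
}
```

If it reports client-timeout, raising check_http's -t alone cannot help, because check_nrpe stops waiting first.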
Shivaramakrishnan

Post by Shivaramakrishnan »

Where do I change the check_nrpe timeout? Does it need to be on the remote server or on the Nagios server? Can you tell me which file to change and which parameters need to be changed?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

You will find the line in /usr/local/nagios/etc/nrpe.cfg on the remote host (with the Debian/Ubuntu packages your paths suggest, the file is usually /etc/nagios/nrpe.cfg):

command_timeout=[timeout in seconds]
The value is in seconds. You will also have to edit the check_nrpe command definition on the Nagios server to add "-t [seconds]" so that check_nrpe waits long enough.
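Several timeouts are stacked here, and each outer one has to be longer than the one it wraps: check_http's -t (120 in this thread) runs inside NRPE's command_timeout on the remote host, check_nrpe waits for NRPE's answer with its own -t (10 seconds by default, as the error above shows), and Nagios itself kills any service check that runs longer than service_check_timeout in nagios.cfg (a global setting, 60 seconds by default). The bash sketch below checks that ordering. The function names are hypothetical, and the strict "each inner value below the next outer one" rule is a simplification, not something the plugins enforce.

```shell
#!/usr/bin/env bash

# cfg_value FILE KEY   (hypothetical helper)
# Prints the numeric value of an uncommented KEY=value line in FILE
# (the last one if the key appears more than once).
cfg_value() {
    sed -n "s/^[[:space:]]*$2=\([0-9][0-9]*\).*/\1/p" "$1" | tail -n 1
}

# check_timeout_chain HTTP_T NRPE_COMMAND_T CHECK_NRPE_T [SERVICE_CHECK_T]
# Prints one line per layer whose timeout is not below the layer wrapping it
# and returns 1 if any problem was found. SERVICE_CHECK_T defaults to 60.
check_timeout_chain() {
    local http_t=$1 nrpe_cmd_t=$2 check_nrpe_t=$3 svc_t=${4:-60}
    local rc=0
    if ((http_t >= nrpe_cmd_t)); then
        echo "check_http -t $http_t is not below nrpe command_timeout=$nrpe_cmd_t"
        rc=1
    fi
    if ((nrpe_cmd_t >= check_nrpe_t)); then
        echo "nrpe command_timeout=$nrpe_cmd_t is not below check_nrpe -t $check_nrpe_t"
        rc=1
    fi
    if ((check_nrpe_t >= svc_t)); then
        echo "check_nrpe -t $check_nrpe_t is not below service_check_timeout=$svc_t"
        rc=1
    fi
    return "$rc"
}
```

For instance, `check_timeout_chain 120 60 10 60` flags two problems, while `check_timeout_chain 120 125 130 140` passes. `cfg_value /usr/local/nagios/etc/nrpe.cfg command_timeout` reads the current value on the remote host; it prints nothing if the key is not set, so check that before passing the result on.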
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Shivaramakrishnan

Post by Shivaramakrishnan »

Is there a way to increase the timeout for just the service below?
This is my service definition on the Nagios server. Is there something like
service_check_timeout 120
that I could define here? Is this right?
The reason I am asking is that my remote servers' nrpe.cfg is managed by our Puppet server. If I change the timeout there, it will change on all the remote servers.


define service {
        use                             generic-service
        host_name                       mgr01.atl2
        service_description             HTTPS Check Cert - xyz.com
        check_command                   check_nrpe_1arg!check_cert
}
sreinhardt

Post by sreinhardt »

From some internal testing last week: check_nrpe does have a -t flag, but it does not let a check override the timeout set elsewhere; it only lets you lower the limit below the maximum check time. It seems that you need to increase the timeout on both sides, the remote host (nrpe.cfg) and the Nagios server (check_nrpe -t), or it will not work as expected. Is it possible to exclude that config from Puppet for a short time, a day or two, to verify the issue? Otherwise, as I mentioned before, flap detection is probably a simple fix for this, since the check appears to fail only once and then come back. Flap detection was designed specifically for services that intermittently have issues and should not alert unless the issues persist, say for 3, 4 or 5 failed checks. When that single timeout happens, instead of sending you a failure alert and then a recovery, Nagios carries on and alerts you only if the check keeps failing for the configured number of checks.
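Strictly speaking, alerting only after several consecutive failed checks is what Nagios's soft/hard states do, driven by max_check_attempts (3 in the generic-service template above, with retry_check_interval 1). Flap detection is a separate mechanism that looks at how often the state changed over recent checks. The following is a simplified bash model of the soft/hard logic. The function name is hypothetical, and it ignores details such as WARNING/CRITICAL changes during soft states. It shows why a single timeout should not notify. With those template values, the emails suggest the failure lasted for at least three consecutive attempts.

```shell
#!/usr/bin/env bash

# first_hard_alert MAX_CHECK_ATTEMPTS RESULT...   (hypothetical helper)
# Simplified model of Nagios soft/hard states: each non-OK result counts as
# one attempt, an OK result resets the count, and the service only reaches a
# HARD (notifying) state once MAX_CHECK_ATTEMPTS non-OK results occur in a row.
# Prints the 1-based position of that result, or "none".
first_hard_alert() {
    local max=$1
    shift
    local n=0 attempts=0 result
    for result in "$@"; do
        n=$((n + 1))
        if [ "$result" = OK ]; then
            attempts=0                 # recovery resets the soft-failure count
        else
            attempts=$((attempts + 1))
            if ((attempts >= max)); then
                echo "$n"
                return 0
            fi
        fi
    done
    echo none
}
```

For example, `first_hard_alert 3 OK CRITICAL OK OK` prints `none`, whereas `first_hard_alert 3 OK CRITICAL CRITICAL CRITICAL` prints `4`.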