Socket Timeout Errors
Like a lot of people, I inherited Nagios Core, but we are moving to Nagios XI very soon.
The system has been running basically OK for a long time. On Friday morning we started getting a lot of socket timeout errors, even though no configuration changes had been made on Nagios. Right now, checks time out and then later show a recovery. In the past, whenever this problem came up, everyone denied it was theirs, the network team included, but it always seemed that someone went in and did something.
My thing is, it's been working for so long that if the network team or anyone else doesn't want to bother with it, is there a simple way to just increase the timeout setting? Then I can focus on getting my new, clean (not inherited) XI install up and running.
Looking at some of the other threads, we do not have the "check_nrpe!check_ftp" issue another user was having.
Thanks
Re: Socket Timeout Errors
Is this specific to your NRPE checks? If so, you can increase the NRPE check timeout by adding '-t <seconds>' to the command. You can do this for specific services or for the command as a whole. It looks like your NRPE command has the standard name, check_nrpe, so find that command in your commands.cfg file and add the timeout, similar to the following:
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$
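For reference, check_nrpe waits 10 seconds by default, so any command_line without a working -t falls back to that, which is why a "Socket timeout after 10 seconds" message usually means the option never took effect. The sketch below reads the effective timeout off a command_line string. effective_nrpe_timeout is a hypothetical helper, not part of Nagios; it assumes the option is written as "-t <seconds>" or "-t<seconds>" and only understands that short spelling.

```python
import shlex

# check_nrpe's built-in timeout (seconds) when no -t is given.
DEFAULT_NRPE_TIMEOUT = 10


def effective_nrpe_timeout(command_line: str) -> int:
    """Return the timeout (seconds) a check_nrpe command_line would use.

    Hypothetical helper. Only ASCII hyphens count, so a typographic dash
    such as "\u2013t" is treated as "no timeout given".
    """
    tokens = shlex.split(command_line)
    timeout = DEFAULT_NRPE_TIMEOUT
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-a":
            break  # everything after -a is passed to the remote command
        value = None
        if tok == "-t" and i + 1 < len(tokens):
            value = tokens[i + 1]  # "-t 30"
            i += 1
        elif tok.startswith("-t") and len(tok) > 2:
            value = tok[2:]  # "-t30"
        if value is not None:
            # Tolerate a trailing ":<state>" suffix in case your check_nrpe
            # version accepts one. In this sketch a later -t overrides an
            # earlier one.
            timeout = int(value.split(":", 1)[0])
        i += 1
    return timeout
```

For example, the -t 30 line above gives 30, while a line such as "$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_dns" with no -t gives 10.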
Re: Socket Timeout Errors
I was editing my CommandsCheck.cfg file and ended up adding "-t 45" to all the NRPE lines in it. Then I switched them all to 30, but I was still getting "Socket timeout after 10 seconds" errors.
This doesn't reflect the latest version of my file; it's just where it was at one point the other day. I'm still working on getting the timeout raised to a larger number.
Thanks
;check_nrpe for Windows
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckCPU -a warn=$ARG1$ crit=$ARG2$ time=20m time=10s time=4
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 45 -c CheckDriveSize -a ShowAll MinWarn=$ARG2$ MaxCrit=$ARG3$ Drive=$ARG1$
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 45 -c CheckServiceState -a ShowAll $ARG1$
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckTaskSched -a "filter=title eq $ARG1$ AND exit_code ne 0" "syntax=%title% (%most_recent_run_time%)" crit=>0
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckTaskSched2 -a "filter=title eq $ARG1$ AND exit_code ne 0" "syntax=%title% (%most_recent_run_time%)" crit=>0
command_name check_nrpe_no_ssl
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 45 -n
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_dns
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 45 -c check_active_procs -a "$ARG1$"
command_name check_nrpe_no_arg_nt
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 45 -c $ARG1$
command_name check_nrpe_no_arg
command_line $USER1$/check_nrpe.1.8 -H $HOSTADDRESS$ -t 45 -c $ARG1$
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 45 -a $ARG3$
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 45 -c check_oracle -a $ARG1$
;check_nrpe for NT
command_name nt_check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 45 -c $ARG1$ -a $ARG2$
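When -t seems to be ignored, two things in a pasted config are worth checking: command_line entries that never got a -t, and options typed with a typographic dash (for example "–t" with an en dash, which is easy to pick up when copying from a web page or word processor), which check_nrpe will not treat as the -t option. The sketch below is a hypothetical audit helper, not a Nagios tool; it assumes one command_line per line and does not see -t values passed in through $ARGn$ macros.

```python
import re

# Characters that render like "-" but are not the ASCII hyphen check_nrpe
# expects: U+2010..U+2015 (hyphens and dashes) and U+2212 (minus sign),
# when they start an option-like token such as "\u2013t".
LOOKALIKE_OPTION = re.compile(r"(?<!\S)[\u2010-\u2015\u2212][A-Za-z]")
# An ASCII "-t" written as its own token ("-t 45") or glued ("-t45").
ASCII_TIMEOUT = re.compile(r"(?<!\S)-t(?=\s|\d)")


def audit_nrpe_commands(cfg_text: str) -> list[tuple[int, str]]:
    """Flag check_nrpe command_line entries whose timeout may not apply.

    Hypothetical helper: it only looks at lines beginning with
    "command_line" that mention check_nrpe, does not understand $ARGn$
    macros, and does not know that arguments after -a go to the remote
    command.
    """
    findings = []
    for lineno, raw in enumerate(cfg_text.splitlines(), start=1):
        line = raw.strip()
        if not line.startswith("command_line") or "check_nrpe" not in line:
            continue
        if LOOKALIKE_OPTION.search(line):
            findings.append((lineno, "typographic dash used instead of '-'"))
        if not ASCII_TIMEOUT.search(line):
            findings.append((lineno, "no ASCII -t; check_nrpe uses its 10 s default"))
    return findings
```

To run it against a real file, something like audit_nrpe_commands(open("commands.cfg", encoding="utf-8").read()) returns a list of (line number, problem) pairs; the path is just an example.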
Re: Socket Timeout Errors
You will also need to raise the command and connection timeouts (command_timeout and connection_timeout) in nrpe.cfg on the remote host. Don't forget to restart the remote NRPE daemon afterwards!
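It can help to remember that several timeouts stack up here. Nagios Core itself kills a check that runs longer than service_check_timeout in nagios.cfg (60 seconds by default), check_nrpe gives up after its -t value (10 seconds by default), and the remote NRPE daemon stops a plugin that runs past command_timeout in nrpe.cfg (60 seconds by default). Whichever is smallest is the one you will actually hit, so raising only one of them may not help. The sketch below just makes that comparison explicit; binding_timeout and the example values are hypothetical, and it leaves out connection_timeout, which limits how long the daemon waits for a connection to be established rather than how long a plugin may run.

```python
def binding_timeout(service_check_timeout: int,
                    check_nrpe_timeout: int,
                    nrpe_command_timeout: int) -> tuple[str, int]:
    """Return (setting name, seconds) for whichever limit expires first.

    Hypothetical helper; the arguments mirror service_check_timeout
    (nagios.cfg), check_nrpe -t (commands.cfg) and command_timeout
    (nrpe.cfg on the remote host).
    """
    limits = {
        "service_check_timeout (nagios.cfg)": service_check_timeout,
        "check_nrpe -t (commands.cfg)": check_nrpe_timeout,
        "command_timeout (remote nrpe.cfg)": nrpe_command_timeout,
    }
    # min() returns the first smallest entry, so ties go to the setting
    # listed first above.
    name = min(limits, key=limits.get)
    return name, limits[name]
```

With everything at its default, binding_timeout(60, 10, 60) returns the check_nrpe -t entry at 10 seconds, which matches the "Socket timeout after 10 seconds" symptom in this thread.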
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.