check_nrpe works from CLI, fails from server with timeout
ChristopherSchultz
- Posts: 7
- Joined: Tue Jun 18, 2013 10:58 am
Today I saw that one of my services was giving an error with the message: "CHECK_NRPE: Socket timeout after 10 seconds."
I figured the service was down, so I started checking. The service was up, so Nagios was reporting a false failure. I went to the command line on the Nagios server (the one making the check_nrpe call, not the server being probed) and ran this:
$ time /usr/lib/nagios/plugins/check_nrpe -H my.hostname -c check_my_nrpe_service
PING OK - Packet loss = 0%, RTA = 88.35 ms|rta=88.345001ms;100.000000;1000.000000;0.000000 pl=0%;10;10;0
real 0m4.129s
user 0m0.008s
sys 0m0.000s
(The "service" to be checked is basically running check_ping).
So, the probed server responds within 5 seconds, but check_nrpe complains about a 10-second timeout.
I have other services on this same server being checked via NRPE (e.g. system load, user load, disk space, etc.) and they all seem to work without a problem.
I searched around, and the only promising lead was a badly cached IP address lookup (which *has* happened to me when configuring iptables after a host's IP address changed). However, I double-checked the hostname in the monitor's config file (it's correct), DNS resolves correctly, and I restarted Nagios entirely in case there was a stale cached DNS lookup. No change in behavior.
Any suggestions?
Re: check_nrpe works from CLI, fails from server with timeout
Any reason why you are using check_nrpe to do a ping check? (Are you checking a separate network node?)
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
ChristopherSchultz
Re: check_nrpe works from CLI, fails from server with timeout
Yes, I'm running check_ping on the remote host because I need to check whether a VPN tunnel is available from that host. I can't check it from anywhere else.
Re: check_nrpe works from CLI, fails from server with timeout
Try increasing the timeout, in case the scheduler is running a bit behind or the server is under load.
Was this check working at one point? Or has it been failing since deployment?
Former Nagios employee
ChristopherSchultz
Re: check_nrpe works from CLI, fails from server with timeout
It /was/ working for a while. I have tried increasing the timeout, but I may be doing it incorrectly:
define command {
    command_name    check_nrpe_with_timeout
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$
}
define service {
    use                     local-service
    host_name               hostname
    service_description     VPN:[Client Name]
    check_command           check_nrpe_with_timeout!check_VPN_[client_name]!30
}
I still get this error:
CHECK_NRPE: Socket timeout after 10 seconds.
I would have expected "Socket timeout after 30 seconds" when specifying the timeout. I definitely restarted Nagios after making those changes, and I have only one Nagios server running (no intermediaries).
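As a sanity check on the definitions above: Nagios splits a service's check_command on `!`, and the pieces after the command name become $ARG1$, $ARG2$, and so on in the command's command_line. The shell sketch below is a hypothetical helper (the function names are made up, and it only handles $ARGn$, leaving macros such as $USER1$ and $HOSTADDRESS$ untouched) that performs just that substitution, so you can see the command line these definitions should produce.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nagios: replace every literal
# occurrence of $2 in $1 with $3 (POSIX sh has no global substitution).
replace_all() {
    s=$1
    out=""
    while :; do
        case $s in
            *"$2"*)
                out=$out${s%%"$2"*}$3   # text before the match + replacement
                s=${s#*"$2"}            # continue after the match
                ;;
            *) break ;;
        esac
    done
    printf '%s' "$out$s"
}

# Hypothetical helper: roughly mimic how Nagios splits a service's
# check_command on "!" and substitutes $ARG1$, $ARG2$, ... into the
# command_line of the matching "define command" block. Other macros
# such as $USER1$ and $HOSTADDRESS$ are deliberately left as-is.
expand_check_command() {
    result=$1          # command_line from "define command"
    check_command=$2   # check_command from "define service"

    # Everything after the first "!" is the argument list.
    args=${check_command#*!}
    if [ "$args" = "$check_command" ]; then
        args=""        # no "!" at all means no arguments
    fi

    n=1
    while [ -n "$args" ]; do
        arg=${args%%!*}                 # next argument
        case $args in
            *!*) args=${args#*!} ;;     # drop it and its separator
            *)   args="" ;;             # that was the last one
        esac
        result=$(replace_all "$result" "\$ARG$n\$" "$arg")
        n=$((n + 1))
    done
    printf '%s\n' "$result"
}
```

Calling `expand_check_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$' 'check_nrpe_with_timeout!check_VPN_client_name!30'` prints `$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_VPN_client_name -t 30`. Since the "Socket timeout after N seconds" message comes from check_nrpe's own timer, a message that still says 10 seconds would suggest that the `-t 30` is not reaching the check_nrpe that Nagios actually runs.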
Re: check_nrpe works from CLI, fails from server with timeout
You may have the directive:
command_timeout=10
OR
connection_timeout=10
declared in the remote host's nrpe.cfg.
Former Nagios employee
ChristopherSchultz
Re: check_nrpe works from CLI, fails from server with timeout
This is all I have:
$ grep _timeout `find . -type f`
./conf.d/my_vpn_host.cfg: command_name check_nrpe_with_timeout
./conf.d/my_vpn_host.cfg: check_command check_nrpe_with_timeout!check_VPN_client_name!30
./nagios.cfg:service_check_timeout=60
./nagios.cfg:host_check_timeout=30
./nagios.cfg:event_handler_timeout=30
./nagios.cfg:notification_timeout=30
./nagios.cfg:ocsp_timeout=5
./nagios.cfg:perfdata_timeout=5
Any other suggestions?
ChristopherSchultz
Re: check_nrpe works from CLI, fails from server with timeout
Whoops, I just realized that you might have meant the server being monitored, since you suggested looking at nrpe.cfg.
I only have /etc/nagios/nrpe.cfg -- no other configuration files on the server.
$ grep _timeout nrpe.cfg
command_timeout=60
connection_timeout=300
So the 10-second timeout is still a mystery to me.
Re: check_nrpe works from CLI, fails from server with timeout
Also check your nagios.cfg on the core server for:
service_check_timeout=10
Former Nagios employee
ChristopherSchultz
Re: check_nrpe works from CLI, fails from server with timeout
You can see above that it is already set to 60:
> ./nagios.cfg:service_check_timeout=60
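One way to line up the numbers in this thread: an NRPE check is roughly cut short by whichever limit is smallest among check_nrpe's -t value (10 seconds when -t is not given), service_check_timeout in nagios.cfg on the Nagios server, and command_timeout in nrpe.cfg on the monitored host. The function below is a hypothetical helper, not part of Nagios or NRPE, that simply takes the minimum; it assumes whole-second integer values.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nagios or NRPE: print the smallest of
# the timeouts given (whole seconds). The tightest limit among check_nrpe's
# -t value, nagios.cfg's service_check_timeout and nrpe.cfg's
# command_timeout is roughly what cuts a slow NRPE check short.
effective_timeout() {
    min=$1
    shift
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then
            min=$t
        fi
    done
    printf '%s\n' "$min"
}

# Values reported in this thread: -t 30, service_check_timeout=60,
# command_timeout=60.
effective_timeout 30 60 60   # prints 30
# check_nrpe falls back to a 10-second timeout when -t is not given.
effective_timeout 10 60 60   # prints 10
```

With the configured values the tightest limit should be 30 seconds; only the case where -t never reaches check_nrpe yields the 10 seconds in the error message. That is one hypothesis worth ruling out, not a confirmed diagnosis.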