nrpe "Connection refused by host"

lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

Tony, that's a great trick: turning on debug logging on the server. I chose debug_level=16 (host/service checks), and even then I had to quickly rename the debug log files as they filled up while I waited for my checks to run.

Here's the output, grep'd for my remote host name:

root@asb-con-ngs-001:/usr/local/nagios/var #> grep asb-sac-jac-001 nagios.deb*
nagios.debug:[1299104890.159400] [016.0] [pid=11518] Scheduling a forced, active check of service 'SSH/LINUX' on host 'asb-sac-jac-001' @ Wed Mar  2 14:27:58 2011
nagios.debug:[1299104890.160424] [016.0] [pid=11518] Scheduling a forced, active check of service 'Load/LINUX' on host 'asb-sac-jac-001' @ Wed Mar  2 14:27:58 2011
nagios.debug:[1299104890.160779] [016.0] [pid=11518] Scheduling a forced, active check of service 'JBoss/PVTL' on host 'asb-sac-jac-001' @ Wed Mar  2 14:27:58 2011
nagios.debug:[1299104890.219353] [016.0] [pid=11518] Attempting to run scheduled check of service 'SSH/LINUX' on host 'asb-sac-jac-001': check options=1, latency=12.219000
nagios.debug:[1299104890.219388] [016.0] [pid=11518] Checking service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104890.341347] [016.0] [pid=11518] Attempting to run scheduled check of service 'Load/LINUX' on host 'asb-sac-jac-001': check options=1, latency=12.341000
nagios.debug:[1299104890.342928] [016.0] [pid=11518] Checking service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104890.466791] [016.0] [pid=11518] Attempting to run scheduled check of service 'JBoss/PVTL' on host 'asb-sac-jac-001': check options=1, latency=12.466000
nagios.debug:[1299104890.467505] [016.0] [pid=11518] Checking service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.717799] [016.1] [pid=11518] Handling check result for service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.717856] [016.0] [pid=11518] ** Handling check result for service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.717876] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: SSH/LINUX, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: SSH OK - OpenSSH_4.3 (protocol 2.0)\n
nagios.debug:[1299104895.718062] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'SSH/LINUX' on host 'asb-sac-jac-001' @ Wed Mar  2 17:28:10 2011
nagios.debug:[1299104895.718903] [016.1] [pid=11518] Checking service 'SSH/LINUX' on host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104895.718970] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104895.722048] [016.1] [pid=11518] Handling check result for service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.722067] [016.0] [pid=11518] ** Handling check result for service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.722083] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: Load/LINUX, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 2, OUTPUT: Connection refused by host\n
nagios.debug:[1299104895.722227] [016.0] [pid=11518] ** On-demand check for host 'asb-sac-jac-001'...
nagios.debug:[1299104895.722244] [016.0] [pid=11518] ** Run sync check of host 'asb-sac-jac-001'...
nagios.debug:[1299104895.722298] [016.0] [pid=11518] ** Executing sync check of host 'asb-sac-jac-001'...
nagios.debug:[1299104896.843634] [016.1] [pid=11518] HOST: asb-sac-jac-001, ATTEMPT=1/10, CHECK TYPE=ACTIVE, STATE TYPE=HARD, OLD STATE=0, NEW STATE=0
nagios.debug:[1299104896.843690] [016.1] [pid=11518] Pre-handle_host_state() Host: asb-sac-jac-001, Attempt=1/10, Type=HARD, Final State=0
nagios.debug:[1299104896.843710] [016.1] [pid=11518] Post-handle_host_state() Host: asb-sac-jac-001, Attempt=1/10, Type=HARD, Final State=0
nagios.debug:[1299104896.843728] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.843825] [016.1] [pid=11518] Checking service 'Load/LINUX' on host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.843860] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.843975] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'Load/LINUX' on host 'asb-sac-jac-001' @ Wed Mar  2 15:28:10 2011
nagios.debug:[1299104896.845086] [016.1] [pid=11518] Handling check result for service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug:[1299104896.845180] [016.0] [pid=11518] ** Handling check result for service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug:[1299104896.845200] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: JBoss/PVTL, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 2, OUTPUT: Connection refused by host\n
nagios.debug:[1299104896.845313] [016.0] [pid=11518] ** On-demand check for host 'asb-sac-jac-001'...
nagios.debug:[1299104896.845330] [016.0] [pid=11518] ** Run sync check of host 'asb-sac-jac-001'...
nagios.debug:[1299104896.845416] [016.1] [pid=11518] Checking service 'JBoss/PVTL' on host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.845453] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.845533] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'JBoss/PVTL' on host 'asb-sac-jac-001' @ Wed Mar  2 14:33:10 2011
nagios.debug.1:[1299104844.425668] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug.1:[1299104844.644729] [016.1] [pid=11518] Checking service 'JBoss/PVTL' on host 'asb-sac-jac-001' for flapping...
nagios.debug.1:[1299104844.644905] [016.1] [pid=11518] Checking service 'Load/LINUX' on host 'asb-sac-jac-001' for flapping...
nagios.debug.1:[1299104844.645081] [016.1] [pid=11518] Checking service 'SSH/LINUX' on host 'asb-sac-jac-001' for flapping...
Looking at timestamps 1299104895.722083 and 1299104896.845200 it looks to me like the JBoss and Load checks (the ones using nrpe) ran when scheduled, and got the result I've been seeing: "Connection refused by host".
Maybe the return code of 2 is a problem.
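
On the return code: under the standard Nagios plugin convention, an exit code of 2 just means CRITICAL, so it is what a failed check is expected to return rather than a separate fault. A minimal Python sketch of that mapping follows; the name `describe_state` is purely illustrative, and treating out-of-range codes as UNKNOWN is an assumption of this sketch.

```python
# Standard Nagios plugin exit codes and the service states they map to.
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_state(return_code: int) -> str:
    """Translate a plugin exit code into its service state name.

    Codes outside 0-3 are not part of the convention; labelling them
    UNKNOWN is an assumption made for this sketch.
    """
    return PLUGIN_STATES.get(return_code, "UNKNOWN")


if __name__ == "__main__":
    print(describe_state(0))  # the SSH/LINUX result in the log above
    print(describe_state(2))  # the Load/LINUX and JBoss/PVTL results
```

So the RETURN CODE: 2 lines only confirm that check_nrpe itself reported the failure; the open question is why the connection was refused.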

Here's what running these 2 checks manually looks like:

nagios@h:w #> whoami
nagios
nagios@h:w #> /usr/local/nagios/libexec/check_nrpe -H asb-sac-jac-001 -c check_jboss_log -t 90
OK - 0 Pivotal errors found
nagios@h:w #> /usr/local/nagios/libexec/check_nrpe -H asb-sac-jac-001 -c check_load
OK - load average: 0.08, 0.12, 0.06|load1=0.080;15.000;30.000;0; load5=0.120;10.000;25.000;0; load15=0.060;5.000;20.000;0; 
Thanks for any ideas....Lyle
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: nrpe "Connection refused by host"

Post by mguthrie »

I notice you have this in your service definition:
check_command check_nrpe!check_jbosslog

But you're running this from the command line:
check_nrpe -H asb-sac-jac-001 -c check_jboss_log -t 90
lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

That *was* a typo. I got my check definition name and script name mixed up. But that didn't fix things, and check_load is just the vanilla stuff from the install.

But I have to confess that I do have an NRPE version mismatch for this remote host.

check_nrpe on the server reports that it's 2.5.1, while the nrpe executable on the remote host reports that it's 2.12.

Not sure if this is the problem. Thanks...Lyle
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: nrpe "Connection refused by host"

Post by tonyyarusso »

There was actually a reason I used a debug_level of -1 instead of 16: you'll note that mine gave the actual command, as it would be run manually, not just a message saying the check was running. That's what I'm after here.
Tony Yarusso
Technical Services
___
TIES
Web: http://ties.k12.mn.us/
lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

To the untrained eye, it doesn't look a lot different with debug_level=-1, but here it is:

root@asb-con-ngs-001:/usr/local/nagios/var #> grep asb-sac-jac-001 nagios.debug*
nagios.debug.old.2:[1299108786.075906] [128.1] [pid=11518] Command Arguments: asb-sac-jac-001;1299108771
nagios.debug.old.2:[1299108786.076043] [016.0] [pid=11518] Scheduling a forced, active check of service 'SSH/LINUX' on host 'asb-sac-jac-001' @ Wed Mar  2 15:32:51 2011
nagios.debug.old.2:[1299108786.077111] [016.0] [pid=11518] Scheduling a forced, active check of service 'Load/LINUX' on host 'asb-sac-jac-001' @ Wed Mar  2 15:32:51 2011
nagios.debug.old.2:[1299108786.077593] [016.0] [pid=11518] Scheduling a forced, active check of service 'JBoss/PVTL' on host 'asb-sac-jac-001' @ Wed Mar  2 15:32:51 2011
nagios.debug.old.2:[1299108798.048592] [008.0] [pid=11518] ** Service Check Event ==> Host: 'asb-sac-jac-001', Service: 'SSH/LINUX', Options: 1, Latency: 27.048000 sec
nagios.debug.old.2:[1299108798.048631] [016.0] [pid=11518] Attempting to run scheduled check of service 'SSH/LINUX' on host 'asb-sac-jac-001': check options=1, latency=27.048000
nagios.debug.old.2:[1299108798.048702] [016.0] [pid=11518] Checking service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108798.164065] [008.0] [pid=11518] ** Service Check Event ==> Host: 'asb-sac-jac-001', Service: 'Load/LINUX', Options: 1, Latency: 27.164000 sec
nagios.debug.old.2:[1299108798.164112] [016.0] [pid=11518] Attempting to run scheduled check of service 'Load/LINUX' on host 'asb-sac-jac-001': check options=1, latency=27.164000
nagios.debug.old.2:[1299108798.165679] [016.0] [pid=11518] Checking service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108798.279435] [008.0] [pid=11518] ** Service Check Event ==> Host: 'asb-sac-jac-001', Service: 'JBoss/PVTL', Options: 1, Latency: 27.279000 sec
nagios.debug.old.2:[1299108798.279482] [016.0] [pid=11518] Attempting to run scheduled check of service 'JBoss/PVTL' on host 'asb-sac-jac-001': check options=1, latency=27.279000
nagios.debug.old.2:[1299108798.281271] [016.0] [pid=11518] Checking service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108802.187444] [016.1] [pid=11518] Handling check result for service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108802.187488] [016.0] [pid=11518] ** Handling check result for service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108802.187504] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: SSH/LINUX, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: SSH OK - OpenSSH_4.3 (protocol 2.0)\n
nagios.debug.old.2:[1299108802.187809] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'SSH/LINUX' on host 'asb-sac-jac-001' @ Wed Mar  2 18:33:18 2011
nagios.debug.old.2:[1299108802.188577] [016.1] [pid=11518] Checking service 'SSH/LINUX' on host 'asb-sac-jac-001' for flapping...
nagios.debug.old.2:[1299108802.188628] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug.old.2:[1299108802.188817] [016.1] [pid=11518] Handling check result for service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108802.188850] [016.0] [pid=11518] ** Handling check result for service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108802.188865] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: Load/LINUX, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 2, OUTPUT: Connection refused by host\n
nagios.debug.old.2:[1299108802.188959] [016.0] [pid=11518] ** On-demand check for host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108802.188987] [016.0] [pid=11518] ** Run sync check of host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108802.189129] [016.0] [pid=11518] ** Executing sync check of host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108803.311564] [016.1] [pid=11518] HOST: asb-sac-jac-001, ATTEMPT=1/10, CHECK TYPE=ACTIVE, STATE TYPE=HARD, OLD STATE=0, NEW STATE=0
nagios.debug.old.2:[1299108803.311613] [016.1] [pid=11518] Pre-handle_host_state() Host: asb-sac-jac-001, Attempt=1/10, Type=HARD, Final State=0
nagios.debug.old.2:[1299108803.311661] [016.1] [pid=11518] Post-handle_host_state() Host: asb-sac-jac-001, Attempt=1/10, Type=HARD, Final State=0
nagios.debug.old.2:[1299108803.311694] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug.old.2:[1299108803.311827] [016.1] [pid=11518] Checking service 'Load/LINUX' on host 'asb-sac-jac-001' for flapping...
nagios.debug.old.2:[1299108803.311877] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug.old.2:[1299108803.311949] [032.0] [pid=11518] ** Service Notification Attempt ** Host: 'asb-sac-jac-001', Service: 'Load/LINUX', Type: 0, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
nagios.debug.old.2:[1299108803.312198] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'Load/LINUX' on host 'asb-sac-jac-001' @ Wed Mar  2 16:33:18 2011
nagios.debug.old.2:[1299108803.313165] [016.1] [pid=11518] Handling check result for service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108803.313201] [016.0] [pid=11518] ** Handling check result for service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108803.313217] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: JBoss/PVTL, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 2, OUTPUT: Connection refused by host\n
nagios.debug.old.2:[1299108803.313339] [016.0] [pid=11518] ** On-demand check for host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108803.313367] [016.0] [pid=11518] ** Run sync check of host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108803.313505] [016.1] [pid=11518] Checking service 'JBoss/PVTL' on host 'asb-sac-jac-001' for flapping...
nagios.debug.old.2:[1299108803.313551] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug.old.2:[1299108803.313607] [032.0] [pid=11518] ** Service Notification Attempt ** Host: 'asb-sac-jac-001', Service: 'JBoss/PVTL', Type: 0, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
nagios.debug.old.2:[1299108803.313757] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'JBoss/PVTL' on host 'asb-sac-jac-001' @ Wed Mar  2 15:38:18 2011
nagios.debug.old.2:[1299108804.127324] [008.0] [pid=11518] ** Host Check Event ==> Host: 'asb-sac-jac-001', Options: 0, Latency: 16.127000 sec
nagios.debug.old.2:[1299108804.127382] [016.0] [pid=11518] Attempting to run scheduled check of host 'asb-sac-jac-001': check options=0, latency=16.127000
nagios.debug.old.2:[1299108804.127419] [016.0] [pid=11518] ** Running async check of host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108804.128303] [016.0] [pid=11518] Checking host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108807.133749] [016.1] [pid=11518] Handling check result for host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108807.133781] [016.1] [pid=11518] ** Handling async check result for host 'asb-sac-jac-001'...
nagios.debug.old.2:[1299108807.133911] [016.1] [pid=11518] HOST: asb-sac-jac-001, ATTEMPT=1/10, CHECK TYPE=ACTIVE, STATE TYPE=HARD, OLD STATE=0, NEW STATE=0
nagios.debug.old.2:[1299108807.133958] [016.1] [pid=11518] Pre-handle_host_state() Host: asb-sac-jac-001, Attempt=1/10, Type=HARD, Final State=0
nagios.debug.old.2:[1299108807.134002] [016.1] [pid=11518] Post-handle_host_state() Host: asb-sac-jac-001, Attempt=1/10, Type=HARD, Final State=0
nagios.debug.old.2:[1299108807.134032] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug.old.2:[1299108807.134136] [016.0] [pid=11518] Scheduling a non-forced, active check of host 'asb-sac-jac-001' @ Wed Mar  2 15:38:27 2011
nagios.debug.old.2:[1299108807.134449] [016.1] [pid=11518] ** Async check result for host 'asb-sac-jac-001' handled: new state=0
The results happen at timestamps 1299108802.188865 and 1299108803.313217
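
Rather than picking the result lines out by eye, they can be filtered mechanically. The following is a minimal Python sketch, assuming the line layout seen in the grep output above (optional file-name prefix, epoch timestamp, `[level.verbosity]`, pid, message); the names `LINE_RE`, `RESULT_RE` and `service_results` are illustrative, not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Layout assumed from the grep output above:
#   [file:][epoch.micros] [level.verbosity] [pid=N] message
LINE_RE = re.compile(
    r"^(?:(?P<file>[^:\[]+):)?"
    r"\[(?P<ts>\d+\.\d+)\] \[(?P<level>\d+)\.(?P<verbosity>\d+)\] "
    r"\[pid=(?P<pid>\d+)\] (?P<msg>.*)$"
)
# Service check result lines carry both a SERVICE and a RETURN CODE field.
RESULT_RE = re.compile(
    r"HOST: (?P<host>[^,]+), SERVICE: (?P<service>[^,]+),"
    r".*?RETURN CODE: (?P<rc>\d+), OUTPUT: (?P<output>.*)$"
)


def service_results(lines):
    """Yield (utc_time, host, service, return_code, output) per result line."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        r = RESULT_RE.search(m.group("msg"))
        if not r:
            continue
        output = r.group("output")
        if output.endswith("\\n"):  # the log writes the newline as a literal \n
            output = output[:-2]
        when = datetime.fromtimestamp(float(m.group("ts")), tz=timezone.utc)
        yield when, r.group("host"), r.group("service"), int(r.group("rc")), output


# Two lines copied from the output above.
SAMPLE = [
    "nagios.debug.old.2:[1299108802.187504] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: SSH/LINUX, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: SSH OK - OpenSSH_4.3 (protocol 2.0)\\n",
    "nagios.debug.old.2:[1299108802.188865] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: Load/LINUX, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 2, OUTPUT: Connection refused by host\\n",
]

if __name__ == "__main__":
    for when, host, service, rc, output in service_results(SAMPLE):
        print(when.isoformat(), host, service, rc, output)
```

The timestamps are converted to UTC here; the wall-clock times inside the log messages are in the server's local time zone.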

Thanks for the help on this one, Tony. I can tell you're checking back frequently.

But I thought I'd get the book thrown at me for the version mismatch. :roll:

....Lyle
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: nrpe "Connection refused by host"

Post by tonyyarusso »

Can you try actually grepping for libexec, not the host name? The particular line I'm looking for will have the IP address, not the name:

tail -f /usr/local/nagios/var/nagios.debug | grep libexec
lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

:shock:

I hesitated to send you all 500 lines of debug info containing "libexec", so I started looking for the IP address of the remote host. Couldn't find it, so I looked for "check_jboss_log" and saw an unexpected address. Turns out it was the address of our F5 load balancers that the JBoss servers sit behind. The JBoss servers use the F5's as their default gateway.

I'm guessing that when I issue the check manually, the nslookup happens and I reach the JBoss server. But the Core Server must cache the IP address of the F5's as the return address of the JBoss servers.

Not sure how to solve this, but it's a relief to know I'm not going nuts. :)

...Lyle
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: nrpe "Connection refused by host"

Post by tonyyarusso »

Oh, interesting. So I take it then that your host definitions have "address" set as the DNS name, not the IP address? I suppose one approach would be to just define them with the IP address statically, although it'd be more interesting to figure out how to make the lookup work as expected.
lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

Well it turns out the resolution wasn't really *that* interesting, Tony.

I had mistakenly entered the addresses of the F5's into the JBoss server host definitions. Maybe I was told "you get to the JBoss servers via the F5's". Host checks were happy, though they were checking the wrong remote host. But when I started using check_nrpe, nothing on the F5 end was listening to accept the connection. Of course, when I ran a check manually with the host name, the name lookup translated it to the correct address.
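
A mix-up like this can be caught by comparing each host definition's `address` against what the host name actually resolves to. The following is a minimal Python sketch, not a Nagios tool: `parse_hosts` and `find_mismatches` are illustrative names, the sample definition uses placeholder documentation addresses and a hypothetical template name, and the parser ignores values inherited through `use` templates.

```python
import re
import socket

HOST_BLOCK_RE = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)


def parse_hosts(config_text):
    """Return {host_name: address} for each 'define host' block found."""
    hosts = {}
    for body in HOST_BLOCK_RE.findall(config_text):
        fields = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        if "host_name" in fields and "address" in fields:
            hosts[fields["host_name"]] = fields["address"]
    return hosts


def find_mismatches(hosts, resolve=socket.gethostbyname):
    """List (host_name, configured, resolved) where the two differ."""
    mismatches = []
    for name, configured in sorted(hosts.items()):
        try:
            resolved = resolve(name)
        except OSError:
            continue  # name does not resolve; nothing to compare against
        if configured not in (name, resolved):
            mismatches.append((name, configured, resolved))
    return mismatches


# Placeholder data: 192.0.2.x addresses are documentation-only values.
SAMPLE_CONFIG = """
define host {
    use        linux-server      ; hypothetical template name
    host_name  asb-sac-jac-001
    address    192.0.2.10        ; stands in for the F5 address
}
"""

if __name__ == "__main__":
    # A fake resolver keeps the demo offline; drop it to use real DNS.
    fake_dns = {"asb-sac-jac-001": "192.0.2.25"}
    for row in find_mismatches(parse_hosts(SAMPLE_CONFIG), resolve=fake_dns.__getitem__):
        print("%s: configured %s, resolves to %s" % row)
```

An `address` given as a different DNS name is also reported, which may or may not be a real problem; the output is a list of things to look at, not a verdict.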

Thanks for all the help on this one.....Lyle
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: nrpe "Connection refused by host"

Post by tonyyarusso »

Ah, well I'm glad that's sorted!