Re: nrpe "Connection refused by host"
Posted: Wed Mar 02, 2011 5:52 pm
Tony, that's a great trick: turning on debug_logging on the server. I chose debug_level=16 (Host/service checks), and even then I had to quickly rename the debug log files as they filled up while I waited for my checks to run.
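For anyone following along, here is a minimal sketch for checking which debug settings are active on the Nagios server. It assumes the config lives at /usr/local/nagios/etc/nagios.cfg (a guess based on the install prefix used in this thread), and the function name show_debug_settings is made up for illustration. The directive names themselves (debug_level, debug_verbosity, debug_file, max_debug_file_size) are standard nagios.cfg options.

```shell
# Hypothetical helper: list the debug-related directives set in a nagios.cfg.
# The default path is an assumption; pass your own file as the first argument.
show_debug_settings() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    # Match only active (uncommented) directive lines.
    grep -E '^[[:space:]]*(debug_level|debug_verbosity|debug_file|max_debug_file_size)[[:space:]]*=' "$cfg"
}
```

If the debug file fills up too quickly, max_debug_file_size is probably the setting to look at before resorting to renaming files by hand.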
Looking at timestamps 1299104895.722083 and 1299104896.845200 in the log output below, it looks to me like the JBoss and Load checks (the ones using nrpe) ran when scheduled and got the result I've been seeing: "Connection refused by host".
Thanks for any ideas....Lyle
Here's the output, grep'd for my remote host name:
Code:
root@asb-con-ngs-001:/usr/local/nagios/var #> grep asb-sac-jac-001 nagios.deb*
nagios.debug:[1299104890.159400] [016.0] [pid=11518] Scheduling a forced, active check of service 'SSH/LINUX' on host 'asb-sac-jac-001' @ Wed Mar 2 14:27:58 2011
nagios.debug:[1299104890.160424] [016.0] [pid=11518] Scheduling a forced, active check of service 'Load/LINUX' on host 'asb-sac-jac-001' @ Wed Mar 2 14:27:58 2011
nagios.debug:[1299104890.160779] [016.0] [pid=11518] Scheduling a forced, active check of service 'JBoss/PVTL' on host 'asb-sac-jac-001' @ Wed Mar 2 14:27:58 2011
nagios.debug:[1299104890.219353] [016.0] [pid=11518] Attempting to run scheduled check of service 'SSH/LINUX' on host 'asb-sac-jac-001': check options=1, latency=12.219000
nagios.debug:[1299104890.219388] [016.0] [pid=11518] Checking service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104890.341347] [016.0] [pid=11518] Attempting to run scheduled check of service 'Load/LINUX' on host 'asb-sac-jac-001': check options=1, latency=12.341000
nagios.debug:[1299104890.342928] [016.0] [pid=11518] Checking service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104890.466791] [016.0] [pid=11518] Attempting to run scheduled check of service 'JBoss/PVTL' on host 'asb-sac-jac-001': check options=1, latency=12.466000
nagios.debug:[1299104890.467505] [016.0] [pid=11518] Checking service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.717799] [016.1] [pid=11518] Handling check result for service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.717856] [016.0] [pid=11518] ** Handling check result for service 'SSH/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.717876] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: SSH/LINUX, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: SSH OK - OpenSSH_4.3 (protocol 2.0)\n
nagios.debug:[1299104895.718062] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'SSH/LINUX' on host 'asb-sac-jac-001' @ Wed Mar 2 17:28:10 2011
nagios.debug:[1299104895.718903] [016.1] [pid=11518] Checking service 'SSH/LINUX' on host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104895.718970] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104895.722048] [016.1] [pid=11518] Handling check result for service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.722067] [016.0] [pid=11518] ** Handling check result for service 'Load/LINUX' on host 'asb-sac-jac-001'...
nagios.debug:[1299104895.722083] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: Load/LINUX, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 2, OUTPUT: Connection refused by host\n
nagios.debug:[1299104895.722227] [016.0] [pid=11518] ** On-demand check for host 'asb-sac-jac-001'...
nagios.debug:[1299104895.722244] [016.0] [pid=11518] ** Run sync check of host 'asb-sac-jac-001'...
nagios.debug:[1299104895.722298] [016.0] [pid=11518] ** Executing sync check of host 'asb-sac-jac-001'...
nagios.debug:[1299104896.843634] [016.1] [pid=11518] HOST: asb-sac-jac-001, ATTEMPT=1/10, CHECK TYPE=ACTIVE, STATE TYPE=HARD, OLD STATE=0, NEW STATE=0
nagios.debug:[1299104896.843690] [016.1] [pid=11518] Pre-handle_host_state() Host: asb-sac-jac-001, Attempt=1/10, Type=HARD, Final State=0
nagios.debug:[1299104896.843710] [016.1] [pid=11518] Post-handle_host_state() Host: asb-sac-jac-001, Attempt=1/10, Type=HARD, Final State=0
nagios.debug:[1299104896.843728] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.843825] [016.1] [pid=11518] Checking service 'Load/LINUX' on host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.843860] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.843975] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'Load/LINUX' on host 'asb-sac-jac-001' @ Wed Mar 2 15:28:10 2011
nagios.debug:[1299104896.845086] [016.1] [pid=11518] Handling check result for service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug:[1299104896.845180] [016.0] [pid=11518] ** Handling check result for service 'JBoss/PVTL' on host 'asb-sac-jac-001'...
nagios.debug:[1299104896.845200] [016.1] [pid=11518] HOST: asb-sac-jac-001, SERVICE: JBoss/PVTL, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 2, OUTPUT: Connection refused by host\n
nagios.debug:[1299104896.845313] [016.0] [pid=11518] ** On-demand check for host 'asb-sac-jac-001'...
nagios.debug:[1299104896.845330] [016.0] [pid=11518] ** Run sync check of host 'asb-sac-jac-001'...
nagios.debug:[1299104896.845416] [016.1] [pid=11518] Checking service 'JBoss/PVTL' on host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.845453] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug:[1299104896.845533] [016.0] [pid=11518] Scheduling a non-forced, active check of service 'JBoss/PVTL' on host 'asb-sac-jac-001' @ Wed Mar 2 14:33:10 2011
nagios.debug.1:[1299104844.425668] [016.1] [pid=11518] Checking host 'asb-sac-jac-001' for flapping...
nagios.debug.1:[1299104844.644729] [016.1] [pid=11518] Checking service 'JBoss/PVTL' on host 'asb-sac-jac-001' for flapping...
nagios.debug.1:[1299104844.644905] [016.1] [pid=11518] Checking service 'Load/LINUX' on host 'asb-sac-jac-001' for flapping...
nagios.debug.1:[1299104844.645081] [016.1] [pid=11518] Checking service 'SSH/LINUX' on host 'asb-sac-jac-001' for flapping...
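The lines that matter in a grep like the one above are the ones carrying "RETURN CODE" and "OUTPUT". As a rough sketch (the function name summarize_check_results is hypothetical, and the pattern assumes the debug line layout shown in this post), they can be boiled down to one short line per check result:

```shell
# Hypothetical helper: read Nagios debug log lines on stdin and print
# "service | rc=N | plugin output" for each service check result line.
# Assumes the "SERVICE: ..., RETURN CODE: N, OUTPUT: ..." layout seen above.
summarize_check_results() {
    sed -n 's/.*SERVICE: \([^,]*\),.*RETURN CODE: \([0-9][0-9]*\), OUTPUT: \(.*\)$/\1 | rc=\2 | \3/p'
}
```

Usage would be something like: grep asb-sac-jac-001 nagios.deb* | summarize_check_results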
Maybe the return code of 2 is a problem.
Here's what running these 2 checks manually looks like:
Code:
nagios@h:w #> whoami
nagios
nagios@h:w #> /usr/local/nagios/libexec/check_nrpe -H asb-sac-jac-001 -c check_jboss_log -t 90
OK - 0 Pivotal errors found
nagios@h:w #> /usr/local/nagios/libexec/check_nrpe -H asb-sac-jac-001 -c check_load
OK - load average: 0.08, 0.12, 0.06|load1=0.080;15.000;30.000;0; load5=0.120;10.000;25.000;0; load15=0.060;5.000;20.000;0;
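Since the debug log reports RETURN CODE: 2 for the scheduled checks, it may help to see the exit status of the manual runs too, not just their output. In the standard Nagios plugin convention, 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. Below is a small sketch of a wrapper that prints both. The name run_and_report is made up for illustration, and this is only one way to do it (appending "; echo $?" to the command works as well).

```shell
# Hypothetical wrapper: run a plugin command, then print its exit status,
# the matching Nagios state name, and the plugin output on one line.
run_and_report() {
    # Capture output and exit status without tripping "set -e".
    output=$("$@") && status=0 || status=$?
    case "$status" in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        3) state=UNKNOWN ;;
        *) state=OTHER ;;
    esac
    printf 'exit=%s (%s) output=%s\n' "$status" "$state" "$output"
    return "$status"
}
```

Usage would be something like: run_and_report /usr/local/nagios/libexec/check_nrpe -H asb-sac-jac-001 -c check_load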