nrpe "Connection refused by host"

lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Post by lyle »

I'm having some problems getting nrpe checks running on a new remote host. Our Core 3.0.3 Solaris Server tells me "Connection refused by host" for the service on the remote host.

On the remote host (CentOS 5.5), I've installed NRPE 2.12 per your "Installing the Linux Agent" instructions, and written a custom plugin check.

I can query the remote host interactively from the Core Server, via:
/usr/local/nagios/libexec/check_nrpe -H <my remote host> -c <my new plugin check>
and that all works fine.
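
When running the check by hand like this, the exit status is worth looking at as well as the text output, since Nagios derives the service state from it. A minimal sketch, assuming bash; `nagios_state` is a made-up helper name, while the 0 to 3 mapping is the standard Nagios plugin convention:

```shell
#!/usr/bin/env bash
# Map a plugin exit status to the Nagios state name.
# nagios_state is a hypothetical helper, not part of NRPE or Nagios.
nagios_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "UNKNOWN (out-of-range status $1)" ;;
  esac
}
```

Calling `nagios_state "$?"` straight after the manual check_nrpe run would print the state the scheduler would record for that result.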

With debug on in nrpe.cfg, performing the above interactive query from the Server produces this in the remote host logfile:
Feb 28 10:54:58 asb-sac-jac-001 xinetd[23489]: START: nrpe pid=23515 from=172.20.5.72
Feb 28 10:54:58 asb-sac-jac-001 nrpe[23515]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Feb 28 10:54:58 asb-sac-jac-001 xinetd[23489]: EXIT: nrpe status=0 pid=23515 duration=0(sec)
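
The `from=` field in the xinetd START line records which address actually connected, which helps when checking whether a given request reached this host at all. A rough sketch, assuming the log format shown above; `nrpe_clients` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the client address of every NRPE connection that xinetd logged.
# Reads syslog-style lines on stdin; nrpe_clients is a hypothetical helper.
nrpe_clients() {
  # Only xinetd "START: nrpe" lines carry the from= field we want.
  sed -n 's/.*xinetd\[[0-9]*]: START: nrpe .*from=\([^ ]*\).*/\1/p'
}
```

Something like `nrpe_clients < /var/log/messages | sort | uniq -c` on the remote host would then summarize who has been connecting.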

So that all looks good, and would seem to say the SSL handshake is working OK.

But when I let Nagios Core schedule the check, *nothing* shows up in the logfile on the remote host, and in the logfile on the Core Server I get:
"sshd[7719]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer"

I think I'm pretty close. Thanks for any advice....Lyle
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: nrpe "Connection refused by host"

Post by tonyyarusso »

a) What does your check_nrpe command definition look like?
b) Why is that log showing messages from sshd?
Tony Yarusso
Technical Services
___
TIES
Web: http://ties.k12.mn.us/
lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

Thanks for the reply, Tony.

I now think the sshd error in the Core Server's /var/adm/messages is unrelated to my problem. I've done some tests that seem to confirm the two aren't connected.

The Core webpage does show the checks being executed, but failing with "Connection refused by host". As I said earlier in the thread, I can execute the checks by hand from the Server, and everything is fine.

Here are some of the config files:

Code:

# from asb-linux.cfg:
define service {
  use           pvtl-jboss
  host_name     asb-sac-jac-001,asb-sac-jac-002
  servicegroups clippercard-web-services
  contact_groups appdev-group
}


# from templates-linux.cfg:
define service {
  name                  pvtl-jboss
  use                   generic-service
  service_description   JBoss/PVTL
  check_command         check_nrpe!check_jbosslog
  check_period          24x7
  normal_check_interval 5
  retry_check_interval  1
  contact_groups        pvtl-group
  notification_options  w,u,c,r
  notification_period   24x7
  icon_image            http1.png
  icon_image_alt        HTTP
}


# from commands-erg.cfg:
define command{
  command_name  check_nrpe
  command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 90
}
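
For comparison with the hand-typed command, here is roughly what that command_line turns into once the macros are filled in. This is only an illustration of the substitution, not how Nagios implements it; `expand_command` is a hypothetical helper, and the values passed to it (the libexec path for $USER1$ and a placeholder address) are assumptions:

```shell
#!/usr/bin/env bash
# Expand the three macros used by the check_nrpe command definition.
# expand_command is a hypothetical helper; Nagios does this internally.
expand_command() {
  local user1="$1" hostaddress="$2" arg1="$3"
  # The command_line exactly as written in commands-erg.cfg.
  local line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 90'
  line=${line//\$USER1\$/$user1}
  line=${line//\$HOSTADDRESS\$/$hostaddress}
  line=${line//\$ARG1\$/$arg1}
  printf '%s\n' "$line"
}

# Assumed values: $USER1$ from resource.cfg, the address from the host definition.
expand_command /usr/local/nagios/libexec 192.0.2.10 check_jbosslog
```

The point of interest is $HOSTADDRESS$: it comes from the host definition's address directive, which is not necessarily the name or address typed on the command line.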
lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

I added another service check, this time an nrpe check of the standard "check_load" plugin that's part of the distribution on the remote host.

On the Core Server web page, I get "Connection refused by host", the same as before. Yet when I issue the check_nrpe command interactively from the Server, I immediately get the correct response.

At least that eliminates my script on the remote host.

Thanks....Lyle
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: nrpe "Connection refused by host"

Post by tonyyarusso »

Well, that certainly seems right. Here are two things you can try next:
1) When you're running the check_nrpe interactively, are you doing it as the 'nagios' user, not root? (I haven't seen this make a difference with check_nrpe before, but have with other plugins.)
2) If you watch the output of 'top -c', when you see your checks being executed do they match what you were running?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: nrpe "Connection refused by host"

Post by mguthrie »

Just out of curiosity, do you get the same result from the command line when you manually run the check with the timeout flag?

check_nrpe -H <address> -c check_jbosslog -t 90
lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

I noticed that when I run the check manually from the Server, /var/log/messages on the remote host clearly shows the event (including debug info).

But when Core on the Server schedules the check, the remote host's log file is silent. Yet the Core webpage says "Connection refused by host".

Any help is appreciated. I'm running out of things to try. Thanks...Lyle
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: nrpe "Connection refused by host"

Post by tonyyarusso »

That sounds suspiciously like it's pointing at the wrong machine. Since I just spent the last few minutes being frustrated by a similar problem, I'll ask, are you sure you have the right IP address configured for the host in question in Nagios?
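
One quick way to check is to list the address Nagios has on file for each host and compare it with what you type by hand. A rough sketch, assuming host definitions in the usual `define host { ... }` object format; `host_addresses` is a made-up helper name, and the config file holding the host definitions is whichever one applies on your system:

```shell
#!/usr/bin/env bash
# Print "host_name address" for each host definition read from stdin.
# host_addresses is a hypothetical helper, not a Nagios tool.
host_addresses() {
  awk '
    # Start of a host block (hostgroup and other object types are skipped).
    /^[[:space:]]*define[[:space:]]+host[[:space:]]*[{]/ { inhost = 1; name = ""; addr = ""; next }
    inhost && $1 == "host_name" { name = $2 }
    inhost && $1 == "address"   { addr = $2 }
    # Closing brace ends the block: report what was collected.
    inhost && /[}]/             { print name, addr; inhost = 0 }
  '
}
```

Feeding it the file that defines the host prints one line per host, so a stale or mistyped address stands out.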
lyle
Posts: 158
Joined: Sun Nov 21, 2010 3:05 am

Re: nrpe "Connection refused by host"

Post by lyle »

Thanks for the suggestions, guys.

I tried the "-t 90" switch on the interactive command and it made no difference. But I'm glad to try *anything* at this point.

I ran the check interactively from root, the nagios user, and a personal account. All ran quickly and correctly.

I never noticed the "-c" switch on top, but I set up a loop to capture top output to a file. Haven't seen the interactive check captured to the file yet, and haven't tried capturing a scheduled check. I'll keep playing with it, but something tells me the check happens so quickly it will slip through the cracks of top, and it's the scheduled check that I really want to see.

....Lyle
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: nrpe "Connection refused by host"

Post by tonyyarusso »

This may be an easier way than dumping top output:

Edit /usr/local/nagios/etc/nagios.cfg, scroll down to debug_level, and set it to -1 (negative one, which logs everything). Restart Nagios with 'service nagios restart'. Now watch the debug output, which will look something like this:

Code:

[root@localhost var]# tail -f /usr/local/nagios/var/nagios.debug | grep libexec
[1299077835.011568] [2048.1] [pid=29426]   Done.  Final output: '/usr/local/nagios/libexec/check_nrpe -H 192.168.5.52 -t 30 -c check_disk -a '-w 20% -c 10% -p /home''
[1299077837.028125] [2048.1] [pid=29426]   Done.  Final output: '/usr/local/nagios/libexec/check_nrpe -H 173.45.235.65 -t 30 -c check_mem -a '-w 15 -c 5''
[1299077839.046666] [2048.1] [pid=29426]   Done.  Final output: '/usr/local/nagios/libexec/check_nrpe -H 192.168.5.52 -t 30 -c check_procs -a '-w 250 -c 350''
[1299077842.073864] [2048.1] [pid=29426]   Done.  Final output: '/usr/local/nagios/libexec/check_nrpe -H 173.45.238.97 -t 30 -c check_init_service -a 'crond''
[1299077845.095156] [2048.1] [pid=29426]   Done.  Final output: '/usr/local/nagios/libexec/check_nrpe -H 173.45.235.65 -t 30 -c check_load -a '-w 15,10,5 -c 30,20,10''
[1299077846.105028] [2048.1] [pid=29426]   Done.  Final output: '/usr/local/nagios/libexec/check_icmp -H 173.45.238.97 -w 3000.0,80% -c 5000.0,100% -p 5'
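
To pull out just the address and NRPE command from lines like those, something along these lines may help. It is a sketch that assumes the debug format shown above; `nrpe_targets` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print "address command" for each check_nrpe invocation found in
# nagios.debug-style lines on stdin. nrpe_targets is a hypothetical helper.
nrpe_targets() {
  awk '
    /Final output:/ && /check_nrpe/ {
      host = ""; cmd = ""
      for (i = 1; i < NF; i++) {
        # Take the first -H and the first -c; later ones belong to -a arguments.
        if ($i == "-H" && host == "") host = $(i + 1)
        if ($i == "-c" && cmd == "")  cmd  = $(i + 1)
      }
      gsub(/\047/, "", cmd)   # drop a trailing single quote, if any
      if (host != "") print host, cmd
    }
  '
}
```

Running `nrpe_targets < /usr/local/nagios/var/nagios.debug` should then show which address each scheduled check is really being sent to.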