Sudden Check_NRPE failures on monitored hosts

agenerette
Posts: 50
Joined: Wed Jul 25, 2012 5:09 pm

Re: Sudden Check_NRPE failures on monitored hosts

Post by agenerette »

Understood. I was thinking that if it did turn out to be a Chef-related thing, then I would need to take it outside of the Nagios forum.

If, though, we confine our troubleshooting to the files that are in place, I'm hoping that we'll be able to find out what's going on, independent of "chef-client" being run against a particular host to update it. Oh, and we're using github.com for version control.

I also wanted to send this over (run from the monitored host):

root@ip-10-170-213-72:~# iptables -L | grep nrpe | grep 54.245
ACCEPT tcp -- ec2-54-245-0-0.us-west-2.compute.amazonaws.com/16 anywhere tcp dpt:nrpe ctstate NEW /* allow NRPE from NA */
ACCEPT tcp -- ec2-54-245-143-104.us-west-2.compute.amazonaws.com anywhere tcp dpt:nrpe ctstate NEW /* allow NRPE from nagios-host */

So, it really looks like the iptables rules should allow NRPE communication between the server and the host.
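As a quick sanity check on the two rules above, here is a minimal Python sketch that tests which source addresses they would match. It assumes the reverse-DNS names in the iptables output map back to 54.245.0.0/16 and 54.245.143.104 (the usual EC2 naming pattern, but still an assumption), and it only covers the two rules that the grep returned, not the rest of the chain. The helper name `matching_rules` is made up for illustration.

```python
import ipaddress

# Source networks read off the two ACCEPT rules shown above.
# Assumption: the EC2 reverse-DNS names translate back to these addresses.
ALLOWED_SOURCES = [
    ipaddress.ip_network("54.245.0.0/16"),
    ipaddress.ip_network("54.245.143.104/32"),
]


def matching_rules(source_ip, allowed=ALLOWED_SOURCES):
    """Return the allowed networks that contain source_ip."""
    addr = ipaddress.ip_address(source_ip)
    return [net for net in allowed if addr in net]


# The Nagios server's public address, taken from the rule comment above.
public_matches = matching_rules("54.245.143.104")
# A private 10.x source, for comparison.
private_matches = matching_rules("10.244.20.90")

print("public: ", [str(n) for n in public_matches])
print("private:", [str(n) for n in private_matches])
```

If the server reaches the host from a private address, neither of these two rules would match it, so a different rule (not shown by this grep) would have to allow that traffic.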
agenerette
Posts: 50
Joined: Wed Jul 25, 2012 5:09 pm

Re: Sudden Check_NRPE failures on monitored hosts

Post by agenerette »

Yeah, this makes no sense. You'll notice that the server_port directive in both nodes' nrpe.cfg files is set to 5666. Where you see 10.244.20.90 in the monitored host's nrpe.cfg file, that, again, is the Nagios server's private IP; 54.245.143.104 is the server's public IP. I just added the latter address to the host's nrpe.cfg file, restarted the Nagios services on both nodes, and ran check_nrpe again.

Now, I'm seeing the following in /var/log/syslog on the monitored host:

Aug 18 18:37:31 ip-10-170-213-72 kernel: [23822048.653570] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=172.16.0.23 LEN=72 TOS=0x00 PREC=0x00 TTL=64 ID=54504 DF PROTO=UDP SPT=41068 DPT=53 LEN=52 UID=1009 GID=1010
Aug 18 18:37:31 ip-10-170-213-72 kernel: [23822048.653840] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=50.31.164.240 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=60846 DF PROTO=TCP SPT=34303 DPT=443 SEQ=951138630 ACK=0 WINDOW=14600 RES=0x00 SYN URGP=0 UID=1009 GID=1010
Aug 18 18:38:00 ip-10-170-213-72 kernel: [23822077.475928] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=172.16.0.23 LEN=72 TOS=0x00 PREC=0x00 TTL=64 ID=61710 DF PROTO=UDP SPT=60009 DPT=53 LEN=52 UID=1009 GID=1010
Aug 18 18:38:00 ip-10-170-213-72 kernel: [23822077.476386] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=172.16.0.23 LEN=99 TOS=0x00 PREC=0x00 TTL=64 ID=61710 DF PROTO=UDP SPT=53162 DPT=53 LEN=79 UID=1009 GID=1010
Aug 18 18:38:00 ip-10-170-213-72 kernel: [23822077.476648] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=172.16.0.23 LEN=72 TOS=0x00 PREC=0x00 TTL=64 ID=61710 DF PROTO=UDP SPT=43826 DPT=53 LEN=52 UID=1009 GID=1010
Aug 18 18:38:00 ip-10-170-213-72 kernel: [23822077.476930] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=50.31.164.240 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=43289 DF PROTO=TCP SPT=34304 DPT=443 SEQ=1599232023 ACK=0 WINDOW=14600 RES=0x00 SYN URGP=0 UID=1009 GID=1010
Aug 18 18:38:16 ip-10-170-213-72 nrpe[4252]: Error: Could not complete SSL handshake. 1
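To see what the firewall is actually dropping, the log lines above can be tallied by destination, protocol, and port. The following is a rough Python sketch, not a definitive tool: the regular expression is written against the DROP_AFW_OUTPUT format shown in this excerpt only, and the names `DROP_RE` and `summarize_drops` are hypothetical. The sample text is a subset of the lines above, copied as-is.

```python
import re
from collections import Counter

# A few of the syslog lines quoted above, copied verbatim.
SAMPLE_LOG = """\
Aug 18 18:37:31 ip-10-170-213-72 kernel: [23822048.653570] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=172.16.0.23 LEN=72 TOS=0x00 PREC=0x00 TTL=64 ID=54504 DF PROTO=UDP SPT=41068 DPT=53 LEN=52 UID=1009 GID=1010
Aug 18 18:37:31 ip-10-170-213-72 kernel: [23822048.653840] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=50.31.164.240 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=60846 DF PROTO=TCP SPT=34303 DPT=443 SEQ=951138630 ACK=0 WINDOW=14600 RES=0x00 SYN URGP=0 UID=1009 GID=1010
Aug 18 18:38:00 ip-10-170-213-72 kernel: [23822077.475928] DROP_AFW_OUTPUT IN= OUT=eth0 SRC=10.170.213.72 DST=172.16.0.23 LEN=72 TOS=0x00 PREC=0x00 TTL=64 ID=61710 DF PROTO=UDP SPT=60009 DPT=53 LEN=52 UID=1009 GID=1010
Aug 18 18:38:16 ip-10-170-213-72 nrpe[4252]: Error: Could not complete SSL handshake. 1
"""

# Pull destination address, protocol, and destination port out of a drop line.
DROP_RE = re.compile(
    r"DROP_AFW_OUTPUT\b.*?\bDST=(\S+).*?\bPROTO=(\S+).*?\bDPT=(\d+)"
)


def summarize_drops(log_text):
    """Count dropped packets per (destination, protocol, destination port)."""
    drops = Counter()
    for line in log_text.splitlines():
        match = DROP_RE.search(line)
        if match:
            dst, proto, dpt = match.groups()
            drops[(dst, proto, int(dpt))] += 1
    return drops


drops = summarize_drops(SAMPLE_LOG)
for (dst, proto, dpt), count in sorted(drops.items()):
    print(f"{count} x {proto} to {dst}:{dpt}")

# Did any logged drop involve the NRPE port (5666)?
nrpe_dropped = any(dpt == 5666 for (_, _, dpt) in drops)
print("NRPE port in drops:", nrpe_dropped)
```

In this excerpt the drops are outbound DNS (UDP 53) and HTTPS (TCP 443) packets; none of them involve port 5666. That suggests, without proving it, that the logged drops and the SSL handshake error may be separate problems.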

Beyond the question of why the server's public address isn't getting auto-added to the hosts' nrpe.cfg files, there's this port question. I know of no other way to tell the NRPE utilities which port to use than the server_port directive.
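One way to check both points without involving Chef is to read the deployed nrpe.cfg and compare it against the addresses the server may connect from. The sketch below is only an illustration: the config text is a hypothetical excerpt (the real file was not posted here), it assumes the server addresses live in the allowed_hosts directive, and it does a plain string comparison, so CIDR ranges or hostnames in allowed_hosts would not be handled. The function names are made up. On the server side, check_nrpe also accepts a -p option for the port, which is worth comparing against server_port.

```python
def parse_nrpe_cfg(text):
    """Parse key=value lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_allowed_hosts(cfg_text, required):
    """Return the required addresses that allowed_hosts does not list."""
    settings = parse_nrpe_cfg(cfg_text)
    allowed = {
        host.strip()
        for host in settings.get("allowed_hosts", "").split(",")
        if host.strip()
    }
    return [ip for ip in required if ip not in allowed]


# Hypothetical excerpt; the addresses come from this thread, the layout is assumed.
SAMPLE_CFG = """\
# nrpe.cfg (illustrative excerpt only)
server_port=5666
allowed_hosts=127.0.0.1,10.244.20.90
"""

settings = parse_nrpe_cfg(SAMPLE_CFG)
missing = missing_allowed_hosts(SAMPLE_CFG, ["10.244.20.90", "54.245.143.104"])

print("server_port:", settings["server_port"])
print("not in allowed_hosts:", missing)
```

Running something like this before and after a chef-client run would show whether Chef is rewriting the file and dropping the public address.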

Do you happen to know anything about controlling Nagios/NRPE settings and, for that matter, iptables settings, via Chef? Especially with iptables, I'm thinking that something must have changed, but I'm not sure what to look at changing.

-Anthony
agenerette
Posts: 50
Joined: Wed Jul 25, 2012 5:09 pm

Re: Sudden Check_NRPE failures on monitored hosts

Post by agenerette »

It's looking like my reply to your last posting isn't showing up on the forum. Did you see it, by chance?
eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Sudden Check_NRPE failures on monitored hosts

Post by eloyd »

I saw it. I just haven't had time to respond.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd • I'm a Nagios Fanatic!
agenerette
Posts: 50
Joined: Wed Jul 25, 2012 5:09 pm

Re: Sudden Check_NRPE failures on monitored hosts

Post by agenerette »

Ah, understood. I certainly know how busy things can get. The main thing that makes this something of a priority is that the alerts are generating hundreds of emails per day, which makes it difficult to see when a legitimate alert comes in.

I do appreciate your taking the time to help me with the issue, though.

-Anthony
eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Sudden Check_NRPE failures on monitored hosts

Post by eloyd »

I may have some time today to look into this further. I will let you know what my little brain comes up with.
agenerette
Posts: 50
Joined: Wed Jul 25, 2012 5:09 pm

Re: Sudden Check_NRPE failures on monitored hosts

Post by agenerette »

Sounds good. Thanks.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Sudden Check_NRPE failures on monitored hosts

Post by tmcdonald »

agenerette, did eloyd ever contact you via PM about this?
Former Nagios employee
eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Sudden Check_NRPE failures on monitored hosts

Post by eloyd »

We discussed this offline. I think things were resolved, if I remember correctly, but I've had a lot of NRPE discussions lately and they might be a little mixed up. :-)
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Sudden Check_NRPE failures on monitored hosts

Post by tmcdonald »

I'm going to assume this was resolved since we never heard back from OP. If not, please let me know and I will re-open this.
Former Nagios employee
Locked