Weird DNS issue, NRPE

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rkymtnhigh
Posts: 95
Joined: Tue May 12, 2015 11:53 am

Post by rkymtnhigh »

Today I changed DNS servers on our CentOS Nagios install, and all of my checks are still communicating except for one.
It uses check_cpu_stats via NRPE.
The check actually succeeds when I use the IP address of the host, but fails when I use the hostname (or FQDN).
So, DNS, right? The weird thing is that nslookup resolves the host correctly, using the new name server.
I thought it was a cache issue, but the correct IP seems to be showing. Also, other NRPE checks on the same host (using DNS name) succeed.
I'm puzzled! Any input or thoughts are greatly appreciated!

RMH
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Weird DNS issue, NRPE

Post by jdalrymple »

My (seemingly invaluable) notes:
xinetd:

- At xinetd start time it does a forward lookup to get the address for "only_from" - this doesn't seem to matter though
- Every time a request comes in there is a reverse lookup. If the proper name isn't returned in the reverse lookup the connection fails with "CHECK_NRPE: Error - Could not complete SSL handshake."

nrpe -d:

- Every time a check_nrpe request comes in, a forward lookup is done. If the IP matches, it works; if the record doesn't match, check_nrpe fails with "CHECK_NRPE: Error - Could not complete SSL handshake."

So both seem resilient to dynamic DNS. With xinetd you'll need a properly functioning reverse lookup zone (for it to work at all), and with nrpe -d you'll need a forward lookup zone that updates quickly.
My guess is you're using xinetd, and you didn't update the PTR?

Just a guess. The above described behavior is 100% accurate though.
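
To make the two lookup directions in the notes above concrete, here is a minimal Python sketch (not NRPE or xinetd code) that checks whether a name's forward lookup and each address's reverse lookup agree. The function name `check_forward_reverse` and its injectable `forward`/`reverse` parameters are assumptions made for illustration; only the `socket` calls are real.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname_ex,
                          reverse=socket.gethostbyaddr):
    """Map each address of `hostname` to True if its PTR names include `hostname`.

    `forward` and `reverse` default to the system resolver; they are
    parameters only so the logic can be exercised without live DNS.
    """
    wanted = hostname.lower().rstrip(".")
    # Forward lookup: name -> list of IPv4 addresses.
    _, _, addresses = forward(hostname)
    report = {}
    for addr in addresses:
        try:
            # Reverse lookup: address -> primary name plus aliases.
            primary, aliases, _ = reverse(addr)
        except OSError:
            # socket.herror and socket.gaierror are OSError subclasses.
            report[addr] = False
            continue
        names = {n.lower().rstrip(".") for n in [primary] + list(aliases)}
        report[addr] = wanted in names
    return report


# Example call (needs a working resolver, so no output is shown here):
# print(check_forward_reverse("host.domain.com"))
```

Run it on whichever machine performs the lookup in question; an address mapped to `False` is one whose PTR is missing or does not point back to the name.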
rkymtnhigh
Posts: 95
Joined: Tue May 12, 2015 11:53 am

Re: Weird DNS issue, NRPE

Post by rkymtnhigh »

Thanks for your help. The error I get is CHECK_NRPE: Socket timeout after 10 seconds.

Where does the PTR record need to be created?

Thank you,

RMH
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Weird DNS issue, NRPE

Post by jdalrymple »

The fact that you're getting a different error is a tell-tale sign that the PTR isn't your problem, though the PTR record should be correct nonetheless. The fact that you've already verified the check works by IP also points away from a reverse-lookup failure.
rkymtnhigh wrote: Where does the PTR record need to be created?
PTR records are created in your DNS infrastructure. They're also known as reverse records.

https://en.wikipedia.org/wiki/List_of_DNS_record_types
rkymtnhigh
Posts: 95
Joined: Tue May 12, 2015 11:53 am

Re: Weird DNS issue, NRPE

Post by rkymtnhigh »

Ok yeah so the PTR record exists.

Is there anywhere else I might look for a potential problem?

Thank you
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Weird DNS issue, NRPE

Post by jdalrymple »

Did you perform a `dig -x` from the monitored host to verify that it's reverse resolving the IP properly? Clearly something isn't quite right.

I suspect you desperately want to think it's NRPE caching or some weird thing, but I promise that what I described above is the behavior of the daemon with regard to name resolution for the only_from or allowed_hosts fields. It does perform the lookup each time; you can verify this for yourself with tcpdump.
rkymtnhigh
Posts: 95
Joined: Tue May 12, 2015 11:53 am

Re: Weird DNS issue, NRPE

Post by rkymtnhigh »

dig -x returns the correct FQDN from the correct DNS server.

I have other services for the same host that are resolving correctly.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Weird DNS issue, NRPE

Post by jdalrymple »

Let's back up a few steps; I may be looking at this from the wrong angle.

Where did the DNS server change take place, the Nagios host or the monitored host? I assumed the monitored host but am now feeling like I had it backwards.

If that is indeed the case please post the output of:

1) check command succeeding using IP
2) check command failing using hostname
3) cat /etc/nsswitch.conf
4) cat /etc/hosts
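
A note on why items 3 and 4 matter: nslookup and dig query the DNS server directly, while a client program normally goes through the system resolver, which also consults /etc/hosts according to nsswitch.conf. The sketch below lists what the system resolver returns for a name, so it can be compared with the nslookup answer. It is an illustration only; `resolver_addresses` and its `getaddrinfo` parameter are hypothetical names, and port 5666 is taken from this thread.

```python
import socket


def resolver_addresses(hostname, port=5666, getaddrinfo=socket.getaddrinfo):
    """Return de-duplicated (family, address) pairs from the system resolver.

    `getaddrinfo` is injectable only so the function can be tested offline.
    """
    infos = getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    seen = []
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        entry = (label, sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen


# Example call (output depends on the local resolver configuration):
# for family, addr in resolver_addresses("host.domain.com"):
#     print(family, addr)
```

If the list contains an address that nslookup did not show (an extra IPv6 entry or a stale hosts-file entry, for example), that address is a candidate for the one the failing check is trying to reach.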
rkymtnhigh
Posts: 95
Joined: Tue May 12, 2015 11:53 am

Re: Weird DNS issue, NRPE

Post by rkymtnhigh »

Thank you for your help. I changed DNS servers on the Nagios host.

1) check command succeeding using IP

./check_nrpe -H 192.168.XXX.XXX -u -c check_cpu_stats -a '-w 85 -c 95'
CPU STATISTICS OK: user=41.93% system=4.31% iowait=0.20% idle=53.56% | user=41.93% system=4.31% iowait=0.20%;85;95 idle=53.56%
2) check command failing using hostname

./check_nrpe -H host.domain.com -u -c check_cpu_stats -a '-w 85 -c 95'
CHECK_NRPE: Socket timeout after 10 seconds.
3) cat /etc/nsswitch.conf

#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Valid entries include:
#
#       nisplus                 Use NIS+ (NIS version 3)
#       nis                     Use NIS (NIS version 2), also called YP
#       dns                     Use DNS (Domain Name Service)
#       files                   Use the local files
#       db                      Use the local database (.db) files
#       compat                  Use NIS on compat mode
#       hesiod                  Use Hesiod for user lookups
#       [NOTFOUND=return]       Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd:    db files nisplus nis
#shadow:    db files nisplus nis
#group:     db files nisplus nis

passwd:     files
shadow:     files
group:      files

#hosts:     db files nisplus nis dns
hosts:      files dns

# Example - obey only what nisplus tells us...
#services:   nisplus [NOTFOUND=return] files
#networks:   nisplus [NOTFOUND=return] files
#protocols:  nisplus [NOTFOUND=return] files
#rpc:        nisplus [NOTFOUND=return] files
#ethers:     nisplus [NOTFOUND=return] files
#netmasks:   nisplus [NOTFOUND=return] files

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files

netgroup:   nisplus

publickey:  nisplus

automount:  files nisplus
aliases:    files nisplus
4) cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Weird DNS issue, NRPE

Post by jdalrymple »

Very weird.

What do you see if you `tcpdump port 5666` while running 1 & 2?
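
Alongside the packet capture, a plain TCP connection attempt to port 5666 can show whether a given address is reachable at all, independent of NRPE itself. This is a rough sketch under stated assumptions: `probe_addresses` is a hypothetical helper, the 3-second timeout is arbitrary, and a successful connect says nothing about the SSL handshake or allowed_hosts.

```python
import socket


def probe_addresses(addresses, port=5666, timeout=3.0):
    """Try a TCP connect to each address and record the outcome as a string."""
    results = {}
    for addr in addresses:
        try:
            # create_connection accepts IP literals as well as names.
            with socket.create_connection((addr, port), timeout=timeout):
                results[addr] = "connected"
        except socket.timeout:
            # Must be caught before OSError, of which it is a subclass.
            results[addr] = "timeout"
        except OSError as exc:
            results[addr] = f"error: {exc}"
    return results


# Example call, using the placeholder address from this thread:
# print(probe_addresses(["192.168.XXX.XXX"]))
```

An address that reports "timeout" here while the known-good IP reports "connected" would be consistent with the hostname resolving to a different address than expected.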