Weird DNS issue, NRPE
Posted: Mon Aug 24, 2015 2:28 pm
by rkymtnhigh
Today I changed the DNS servers on our CentOS Nagios install, and all of my checks are still communicating except one.
It uses check_cpu_stats via NRPE.
The check actually succeeds when I use the IP address of the host, but fails when I use the hostname (or FQDN).
So, DNS right? The weird thing is an nslookup resolves the host correctly, using the new name server.
I thought it was a cache issue, but the correct IP seems to be showing. Also, other NRPE checks on the same host (using DNS name) succeed.
I'm puzzled! Any input or thoughts are greatly appreciated!
RMH
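One thing worth comparing here: nslookup queries the DNS server directly, while a program that resolves a name through the system resolver also goes through /etc/nsswitch.conf and /etc/hosts, and may get back more than one address (for example an IPv6 one). The sketch below lists what the system resolver returns for a name. It assumes check_nrpe resolves names through the normal libc resolver, which this thread does not confirm, and `resolved_addresses` is a hypothetical helper name.

```python
import socket


def resolved_addresses(host, port=5666):
    """Return the distinct (family, address) pairs the system resolver gives for host.

    Unlike nslookup, getaddrinfo honours /etc/nsswitch.conf and /etc/hosts.
    Port 5666 is the port used for NRPE elsewhere in this thread.
    """
    seen = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        # family is normally an AddressFamily enum; fall back to str() otherwise.
        entry = (getattr(family, "name", str(family)), sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen
```

Calling `resolved_addresses("host.domain.com")` on the Nagios host and comparing the result with the nslookup answer would show whether both paths agree, and whether the name returns any address other than the one that works by IP.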
Re: Weird DNS issue, NRPE
Posted: Mon Aug 24, 2015 2:49 pm
by jdalrymple
My (seemingly invaluable) notes:
xinetd:
- At xinetd start time it does a forward lookup to get the address for "only_from" - this doesn't seem to matter though
- Every time a request comes in, there is a reverse lookup. If the proper name isn't returned in the reverse lookup, the connection fails with "CHECK_NRPE: Error - Could not complete SSL handshake."
nrpe -d:
- Every time a check_nrpe request comes in, a forward lookup is done. If the IP matches, it works; if the record doesn't match, check_nrpe fails with "CHECK_NRPE: Error - Could not complete SSL handshake."
So both seem resilient to dynamic DNS. With xinetd you'll need a properly functioning reverse lookup zone (for it to work at all), and with nrpe -d you'll need a quick-to-update forward lookup zone.
My guess is you're using xinetd, and you didn't update the PTR?
Just a guess. The above described behavior is 100% accurate though.
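To make the two lookup directions in the notes above concrete, here is a minimal sketch that checks whether an IP's reverse (PTR) names resolve forward to the same IP again. It is not the code xinetd or NRPE actually run; the function names are hypothetical, and the resolver arguments are injectable only so the logic can be exercised without touching real DNS.

```python
import socket


def reverse_names(ip, resolver=socket.gethostbyaddr):
    """Names returned by a reverse (PTR) lookup of ip, or an empty set on failure."""
    try:
        primary, aliases, _addrs = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return set()
    return {primary, *aliases}


def forward_addresses(name, resolver=socket.gethostbyname_ex):
    """IPv4 addresses returned by a forward (A) lookup of name, or an empty set."""
    try:
        return set(resolver(name)[2])
    except OSError:
        return set()


def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """True if at least one PTR name for ip resolves forward to ip again."""
    return any(ip in forward_addresses(name, forward)
               for name in reverse_names(ip, reverse))
```

Run on the monitored host with the Nagios server's IP, `forward_confirmed(...)` returning False would point at a stale PTR or A record; True means the forward and reverse zones agree for that address.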
Re: Weird DNS issue, NRPE
Posted: Tue Aug 25, 2015 10:51 am
by rkymtnhigh
Thanks for your help. The error I get is CHECK_NRPE: Socket timeout after 10 seconds.
Where does the PTR record need to be created?
Thank you,
RMH
Re: Weird DNS issue, NRPE
Posted: Tue Aug 25, 2015 11:55 am
by jdalrymple
The fact that you're getting a different error is a bit of a tell-tale that the PTR isn't your problem, but your PTR should be the same nonetheless. The fact that you've already verified it works by IP, though, would indicate otherwise.
rkymtnhigh wrote: Where does the PTR record need to be created?
PTR records are created in your DNS infrastructure. They're also known as reverse records.
https://en.wikipedia.org/wiki/List_of_DNS_record_types
Re: Weird DNS issue, NRPE
Posted: Tue Aug 25, 2015 12:31 pm
by rkymtnhigh
Ok yeah so the PTR record exists.
Is there anywhere else I might look for a potential problem?
Thank you
Re: Weird DNS issue, NRPE
Posted: Tue Aug 25, 2015 12:41 pm
by jdalrymple
Did you perform a `dig -x` from the monitored host to verify that it's reverse resolving the IP properly? Clearly something isn't quite right.
I suspect you want desperately to think it's NRPE caching or some weird thing, but I promise the behavior specified above is how the daemon handles name resolution in the only_from or allowed_hosts fields. It does perform the lookup each time; you can verify this for yourself with tcpdump.
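For reference, `dig -x` looks up the PTR record stored under the reversed address in the in-addr.arpa zone. A small sketch of how that query name is built, using a documentation address (192.0.2.10) as a stand-in because the real IP is masked in this thread; `ptr_query_name` is a hypothetical helper:

```python
import ipaddress


def ptr_query_name(ip):
    """Return the DNS name whose PTR record a reverse lookup of ip asks for."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_query_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa
```

If that name has no PTR record in the zone the new DNS server is authoritative for (or forwards to), the reverse lookup comes back empty even though the forward A record resolves fine.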
Re: Weird DNS issue, NRPE
Posted: Tue Aug 25, 2015 3:43 pm
by rkymtnhigh
dig -x returns the correct FQDN from the correct DNS server.
I have other services for the same host that are resolving correctly.
Re: Weird DNS issue, NRPE
Posted: Tue Aug 25, 2015 3:55 pm
by jdalrymple
Let's back up a few steps; I'm possibly looking at this from the wrong angle.
Where did the DNS server change take place, the Nagios host or the monitored host? I assumed the monitored host but am now feeling like I had it backwards.
If that is indeed the case, please post the output of:
1) check command succeeding using IP
2) check command failing using hostname
3) cat /etc/nsswitch.conf
4) cat /etc/hosts
Re: Weird DNS issue, NRPE
Posted: Tue Aug 25, 2015 4:18 pm
by rkymtnhigh
Thank you for your help. I changed DNS servers on the Nagios host.
1) check command succeeding using IP
./check_nrpe -H 192.168.XXX.XXX -u -c check_cpu_stats -a '-w 85 -c 95'
CPU STATISTICS OK: user=41.93% system=4.31% iowait=0.20% idle=53.56% | user=41.93% system=4.31% iowait=0.20%;85;95 idle=53.56%
2) check command failing using hostname
./check_nrpe -H host.domain.com -u -c check_cpu_stats -a '-w 85 -c 95'
CHECK_NRPE: Socket timeout after 10 seconds.
3) cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Valid entries include:
#
#   nisplus             Use NIS+ (NIS version 3)
#   nis                 Use NIS (NIS version 2), also called YP
#   dns                 Use DNS (Domain Name Service)
#   files               Use the local files
#   db                  Use the local database (.db) files
#   compat              Use NIS on compat mode
#   hesiod              Use Hesiod for user lookups
#   [NOTFOUND=return]   Stop searching if not found so far
#
# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd: db files nisplus nis
#shadow: db files nisplus nis
#group: db files nisplus nis
passwd: files
shadow: files
group: files
#hosts: db files nisplus nis dns
hosts: files dns
# Example - obey only what nisplus tells us...
#services: nisplus [NOTFOUND=return] files
#networks: nisplus [NOTFOUND=return] files
#protocols: nisplus [NOTFOUND=return] files
#rpc: nisplus [NOTFOUND=return] files
#ethers: nisplus [NOTFOUND=return] files
#netmasks: nisplus [NOTFOUND=return] files
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: nisplus
publickey: nisplus
automount: files nisplus
aliases: files nisplus
4) cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
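Reading the two files above together: the `hosts: files dns` line means /etc/hosts is consulted first and DNS second, and the posted /etc/hosts has no entry for the monitored host, so the name should fall through to DNS. A rough sketch of that reading, with hypothetical helper names and a simplified parser (it ignores action items such as [NOTFOUND=return]):

```python
def hosts_lookup_order(nsswitch_text):
    """Return the sources on the active 'hosts:' line, in lookup order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def hosts_file_entries(hosts_text, name):
    """Return the addresses that a hosts file maps name to (possibly none)."""
    matches = []
    wanted = name.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the rest are the names for it
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            matches.append(fields[0])
    return matches
```

Feeding in the contents of /etc/nsswitch.conf and /etc/hosts from this post gives a lookup order of files then dns and no hosts-file match for host.domain.com, which is consistent with the name being answered by the new DNS server rather than by a stale local entry.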
Re: Weird DNS issue, NRPE
Posted: Tue Aug 25, 2015 4:22 pm
by jdalrymple
Very weird.
What do you see if you `tcpdump port 5666` while running 1 & 2?
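Alongside the tcpdump capture, a plain TCP connect to every address the name resolves to can show whether the timeout is tied to one particular address. This is only a sketch: it does not speak the NRPE protocol or do the SSL handshake, it just reports whether port 5666 accepts a connection, and `probe` is a hypothetical helper name.

```python
import socket
import time


def probe(host, port=5666, timeout=3.0):
    """Try a TCP connect to each address host resolves to.

    Returns a list of (address, outcome, seconds) tuples, one per address.
    """
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        start = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "connected"
        except socket.timeout:
            outcome = "timeout"
        except OSError as exc:
            outcome = f"error: {exc}"
        elapsed = round(time.monotonic() - start, 3)
        results.append((sockaddr[0], outcome, elapsed))
    return results
```

Running `probe("host.domain.com")` and `probe` with the working IP from the Nagios host, while tcpdump is capturing, would show whether the hostname leads to an extra or different address that never answers.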