Address resolution issue

Posted: Tue Sep 25, 2018 10:18 am
by as300182
I've got Nagios core 4.2.2 running on RedHat 5 without any issues.

I've built a new box with Nagios core 4.4.2 running on RedHat 7. The new Nagios instance seems to have some issues with resolving addresses with check_http. It looks like the Nagios worker threads can't resolve an address for a host that has multiple A records. I see entries like this in the logs:

Code:

[1537886820] Warning: Check of host 'openplatform-int' timed out after 30.01 seconds
[1537886820] wproc: Core Worker 3858: job 2632 (pid=20581): Dormant child reaped
[1537886871] wproc: Core Worker 3858: job 2724 (pid=21160) timed out. Killing it
[1537886871] wproc: CHECK job 2724 from worker Core Worker 3858 timed out after 30.00s
[1537886871] wproc:   host=openplatform-int; service=(null);
[1537886871] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
An nslookup on both servers reveals the same response:

Code:

nslookup openplatform-int.com
Server:         10.1.2.23
Address:        10.1.2.23#53

openplatform-int.com        canonical name = internal-opt-prod.lb.anypointdns.net.
Name:   internal-opt-prod.lb.anypointdns.net
Address: 10.251.60.138
Name:   internal-opt-prod.lb.anypointdns.net
Address: 10.251.60.253
The server running ver 4.2.2 doesn't have a problem with this at all, but the one running 4.4.2 does. The former is using check_http v2.1.1, the latter is using check_http v2.2.1 so you might think this is a plugin issue. However, running check_http from the command line returns a correct result for the same host, so it's looking like Nagios core is the bad boy here.

Has anyone else experienced this, or does anyone have suggestions as to the cause and how to fix it?
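
One way to compare what the resolver returns on each box, independently of check_http, is a small script like the one below. It is only a sketch: `unique_addresses` and `resolve` are hypothetical helper names, and port 443 is an assumption based on the https URL used in the check. Running it as the account the Nagios daemon uses (an assumption about where the difference might lie, not something confirmed here) would show whether that environment sees both A records.

```python
import socket


def unique_addresses(addrinfo):
    """Return the distinct IP addresses from a getaddrinfo() result, in order."""
    seen = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4, (ip, port, flow, scope) for IPv6
        if ip not in seen:
            seen.append(ip)
    return seen


def resolve(hostname, port=443):
    """Resolve hostname through the system resolver and list every address returned.

    Assumption: port 443 and TCP, because the check targets an https URL.
    """
    results = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return unique_addresses(results)
```

Calling `resolve("openplatform-int.com")` on both servers and comparing the lists would be the intended use. If either server returns fewer addresses than nslookup shows, the difference sits below the plugin.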

Re: Address resolution issue

Posted: Wed Sep 26, 2018 10:36 am
by cdienger
Can you provide the config file for the check in question? I would also be curious to see a packet capture:

tcpdump -s 0 -i any port 53 or host 10.251.60.138 or host 10.251.60.253 -w output.pcap

Let it run just long enough for the check to run on its own a couple of times, then PM me the output.pcap file.

Do you have other configs that use check_http that work? If so, do they only have a single A record?

Re: Address resolution issue

Posted: Tue Oct 23, 2018 8:16 am
by as300182
The service command is a straightforward check_http command.

Code:

define service{
	use                             snow-service,nagiosgraph
	host_name                       openplatform-int
	service_description             Web Server Availability
	contact_groups                  +openapi-platform
	check_command                   check_http! -f follow -H openplatform-int.com -u "https://openplatform-int.com/" -e "HTTP/1.1 301,HTTP/1.1 403 Forbidden"
}
tcpdump isn't installed, and I couldn't get it installed for some reason, but with some help I managed to get Wireshark to collect some data. For what it's worth, here it is.

Code:

89	6.961933	10.58.26.85	10.58.96.15	DNS	86	Standard query 0xe32e A openplatform-int.com
90	6.961949	10.58.26.85	10.58.96.15	DNS	86	Standard query 0xf69a AAAA openplatform-int.com
91	6.962139	10.58.96.15	10.58.26.85	DNS	222	Standard query response 0xf69a AAAA openplatform-int.com CNAME internal-arm-opt-prod.lb.anypointdns.net SOA ns-1268.awsdns-30.org
92	6.966955	10.58.96.15	10.58.26.85	DNS	553	Standard query response 0xe32e A openplatform-int.com CNAME internal-arm-opt-prod.lb.anypointdns.net A 10.251.60.253 A 10.251.60.138 NS h.gtld-servers.net NS b.gtld-servers.net NS c.gtld-servers.net NS f.gtld-servers.net NS d.gtld-servers.net NS m.gtld-servers.net NS k.gtld-servers.net NS l.gtld-servers.net NS g.gtld-servers.net NS a.gtld-servers.net NS j.gtld-servers.net NS e.gtld-servers.net NS i.gtld-servers.net A 192.41.162.30 A 192.12.94.30 A 192.48.79.30 A 192.54.112.30 A 192.52.178.30 A 192.33.14.30 A 192.26.92.30 A 192.35.51.30 A 192.43.172.30 A 192.31.80.30
93	6.967129	10.58.26.85	10.58.96.15	DNS	86	Standard query 0x9a4d AAAA openplatform-int.com
94	6.967314	10.58.96.15	10.58.26.85	DNS	222	Standard query response 0x9a4d AAAA openplatform-int.com CNAME internal-arm-opt-prod.lb.anypointdns.net SOA ns-1268.awsdns-30.org
95	6.969508	10.58.26.85	10.58.96.15	DNS	86	Standard query 0x1e61 A openplatform-int.com
96	6.969707	10.58.96.15	10.58.26.85	DNS	553	Standard query response 0x1e61 A openplatform-int.com CNAME internal-arm-opt-prod.lb.anypointdns.net A 10.251.60.253 A 10.251.60.138 NS j.gtld-servers.net NS e.gtld-servers.net NS c.gtld-servers.net NS h.gtld-servers.net NS m.gtld-servers.net NS a.gtld-servers.net NS g.gtld-servers.net NS l.gtld-servers.net NS k.gtld-servers.net NS d.gtld-servers.net NS b.gtld-servers.net NS f.gtld-servers.net NS i.gtld-servers.net A 192.41.162.30 A 192.12.94.30 A 192.48.79.30 A 192.54.112.30 A 192.52.178.30 A 192.33.14.30 A 192.26.92.30 A 192.35.51.30 A 192.43.172.30 A 192.31.80.30
97	6.969896	10.58.26.85	10.251.60.253	ICMP	100	Echo (ping) request  id=0x0602, seq=1/256, ttl=64 (no response found!)
Does that mean anything to you?
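
To make a capture summary like the one above easier to read, the queries and responses can be paired by DNS transaction ID. The following is a rough sketch that assumes the Wireshark summary columns are No., Time, Source, Destination, Protocol, Length, Info, separated by whitespace. `parse_dns_line` and `pair_by_txid` are hypothetical names, and the sample Info text is shortened from the lines above.

```python
import re
from decimal import Decimal

# Matches the start of the Info column for DNS queries and responses.
INFO_RE = re.compile(r"Standard query (response )?(0x[0-9a-f]+) (\S+) (\S+)")


def parse_dns_line(line):
    """Parse one summary line; return None for non-DNS or unrecognised lines."""
    fields = line.split(None, 6)  # keep the Info column in one piece
    if len(fields) < 7 or fields[4] != "DNS":
        return None
    m = INFO_RE.match(fields[6])
    if not m:
        return None
    return {
        "frame": int(fields[0]),
        "time": Decimal(fields[1]),
        "is_response": m.group(1) is not None,
        "txid": m.group(2),
        "qtype": m.group(3),
        "name": m.group(4),
    }


def pair_by_txid(lines):
    """Return (query frame, response frame, record type, round-trip time) tuples."""
    pending, pairs = {}, []
    for line in lines:
        rec = parse_dns_line(line)
        if rec is None:
            continue
        if not rec["is_response"]:
            pending[rec["txid"]] = rec
        elif rec["txid"] in pending:
            query = pending.pop(rec["txid"])
            pairs.append((query["frame"], rec["frame"], rec["qtype"],
                          rec["time"] - query["time"]))
    return pairs


sample = [
    "89 6.961933 10.58.26.85 10.58.96.15 DNS 86 Standard query 0xe32e A openplatform-int.com",
    "90 6.961949 10.58.26.85 10.58.96.15 DNS 86 Standard query 0xf69a AAAA openplatform-int.com",
    "91 6.962139 10.58.96.15 10.58.26.85 DNS 222 Standard query response 0xf69a AAAA openplatform-int.com CNAME internal-arm-opt-prod.lb.anypointdns.net",
    "92 6.966955 10.58.96.15 10.58.26.85 DNS 553 Standard query response 0xe32e A openplatform-int.com CNAME internal-arm-opt-prod.lb.anypointdns.net A 10.251.60.253 A 10.251.60.138",
]

for pair in pair_by_txid(sample):
    print(pair)
```

Pairing on the transaction ID rather than on line order matters here because the AAAA response (frame 91) arrives before the A response (frame 92).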

Re: Address resolution issue

Posted: Wed Oct 24, 2018 10:54 am
by cdienger
Do you have access to the DNS server to verify the A records for this host? It looks like there could be a loop. A request is made:

89 6.961933 10.58.26.85 10.58.96.15 DNS 86 Standard query 0xe32e A openplatform-int.com

and the response contains an A record containing a hostname instead of an IP:

92 6.966955 10.58.96.15 10.58.26.85 DNS 553 Standard query response 0xe32e A openplatform-int.com ....

then 0.000174 seconds later there is another request for the same host:

93 6.967129 10.58.26.85 10.58.96.15 DNS 86 Standard query 0x9a4d AAAA openplatform-int.com

and the A record containing the hostname is again returned.
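
If it helps to check this against the capture text, the records listed in a response's Info column can be split into (type, value) pairs and each A value tested for whether it parses as an IP address. This is a sketch only: it assumes the Info column lists records as alternating type and value tokens after the question, as in the Wireshark summary earlier in the thread, and `answer_records`, `is_ip`, and `a_records_with_hostnames` are hypothetical helper names. Wireshark may truncate long Info text, so the full packet detail remains the authoritative view.

```python
import ipaddress
import re

# Captures everything after the question (type and name) in a response summary.
RESPONSE_RE = re.compile(r"Standard query response 0x[0-9a-f]+ \S+ \S+ ?(.*)")


def answer_records(info):
    """Split a response Info string into (record type, value) pairs."""
    m = RESPONSE_RE.match(info)
    if not m:
        return []  # not a response line
    tokens = m.group(1).split()
    return list(zip(tokens[0::2], tokens[1::2]))


def is_ip(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def a_records_with_hostnames(info):
    """Return the values of A records that do not parse as IP addresses."""
    return [value for rtype, value in answer_records(info)
            if rtype == "A" and not is_ip(value)]
```

Passing the Info text of frames 92 and 96 to `a_records_with_hostnames` lists any A record whose value is a name rather than an address, and `answer_records` shows the CNAME and A entries separately.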