Nagios raises alerts when I reboot a DNS server

hgp-it
Posts: 3
Joined: Sat Mar 22, 2014 5:17 pm


Post by hgp-it »

Hi all,

We are using Nagios 3.4.4 (part of Nagios FAN on CentOS5). We have an issue where, even though we have three DNS servers on our network, if we reboot the primary DNS server, Nagios throws a wobbly and raises critical alerts for many hosts.

Our Nagios server's /etc/resolv.conf is as follows:

"
; generated by /sbin/dhclient-script
search hayley-group.local
nameserver 10.11.1.217
nameserver 10.11.200.3
nameserver 10.11.200.2
"

The server gets its IP address and DNS info via DHCP.
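Without `options rotate`, the glibc stub resolver tries the nameservers in the order they are listed in /etc/resolv.conf, and it uses at most the first three (MAXNS, per resolv.conf(5)). As a rough illustration, the Python sketch below parses the file above into that ordered list. `parse_resolv_conf` and `ResolvConf` are hypothetical names, and the parser deliberately ignores keywords such as `domain` and `sortlist`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAXNS = 3  # glibc uses at most three nameserver lines (see resolv.conf(5))


@dataclass
class ResolvConf:
    nameservers: list[str] = field(default_factory=list)
    search: list[str] = field(default_factory=list)
    options: list[str] = field(default_factory=list)


def parse_resolv_conf(text: str) -> ResolvConf:
    """Parse resolv.conf text, keeping nameservers in the order they are queried."""
    conf = ResolvConf()
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with ';' or '#' are comments.
        if not line or line[0] in ";#":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            conf.nameservers.append(values[0])
        elif keyword == "search":
            conf.search = values  # the last search line wins
        elif keyword == "options":
            conf.options.extend(values)
    conf.nameservers = conf.nameservers[:MAXNS]
    return conf
```

For the file in this post, the resulting order is 10.11.1.217, then 10.11.200.3, then 10.11.200.2, so every lookup starts at the server that gets rebooted.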

Can anyone please explain why Nagios doesn't fall back to the secondary and tertiary DNS servers when the first one fails to resolve? Is there a DNS server list specific to Nagios that I need to update?

Thank you!

Elliot
eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Nagios raises alerts when I reboot a DNS server

Post by eloyd »

Nagios does not have a specific DNS server list. My question to you is, does anything else go wobbly? It sounds like your DNS server might be partially responding while it's being restarted, so the Nagios server never tries the second one.

How about this: stop DHCP from generating /etc/resolv.conf and maintain your own. Then list your DNS servers in reverse order, so you're not always using the one that gets rebooted.
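On CentOS, a commonly used way to stop dhclient-script from rewriting the file is setting `PEERDNS=no` in the interface's `/etc/sysconfig/network-scripts/ifcfg-*` file (worth checking against your release). The Python sketch below only renders the replacement file with the nameserver order reversed, as suggested above. `render_static_resolv_conf` is a hypothetical helper, and the addresses are the ones from the first post.

```python
from __future__ import annotations


def render_static_resolv_conf(search: list[str], nameservers: list[str]) -> str:
    """Render a hand-maintained resolv.conf with the nameserver order reversed."""
    lines = ["; maintained by hand, not by dhclient-script"]
    if search:
        lines.append("search " + " ".join(search))
    # The resolver tries nameservers top to bottom, so the last DHCP entry goes first.
    lines.extend(f"nameserver {ns}" for ns in reversed(nameservers))
    return "\n".join(lines) + "\n"
```

Called with the search domain and three addresses from the first post, this puts 10.11.200.2 first and 10.11.1.217 last.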
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd • I'm a Nagios Fanatic!
hgp-it

Re: Nagios raises alerts when I reboot a DNS server

Post by hgp-it »

Hi eloyd,

Thank you for your reply.

The last time this happened was when the DNS server received Windows updates and required a reboot. It was the DNS server closest to our Nagios box, and so naturally its primary DNS server. So I guess it could have appeared "wobbly" as services were going down during the restart. I just expected DNS to be a bit more robust than this: either it responds with a straight answer, or the service is unavailable and the client goes "No real reply from DNS? Try the next."
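A server that answers, even with an error, lets the resolver move on quickly, but a host in the middle of a reboot may send nothing back at all. In that case the resolver sits out the full per-server timeout (5 seconds by default in glibc, per resolv.conf(5)) before trying the next server. The Python sketch below is a deliberately simplified model of that sequential fallback. `simulate_lookup` and its state labels are hypothetical, "refused" is assumed to move on instantly, and the model ignores details such as glibc's backoff on later rounds and separate A/AAAA queries.

```python
from __future__ import annotations


def simulate_lookup(
    servers: list[str],
    state: dict[str, str],
    timeout: int = 5,
    attempts: int = 2,
) -> tuple[str | None, int]:
    """Simplified model of a stub resolver walking its nameserver list in order.

    `state` maps each server to "answers", "silent" (no reply at all, e.g. mid-reboot)
    or "refused" (an immediate error). Unknown servers are treated as "silent".
    Returns (server that answered or None, seconds spent waiting on timeouts).
    """
    waited = 0
    for _ in range(attempts):
        for server in servers:
            status = state.get(server, "silent")
            if status == "answers":
                return server, waited
            if status == "silent":
                waited += timeout  # wait out the full per-server timeout
            # "refused": assumed to move on to the next server immediately
    return None, waited
```

A few seconds per lookup sounds small, but every check that resolves a hostname pays it, which can be enough to push checks past their own timeouts.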

An example of one of the alerts we get during the DNS server's reboot is:
"
NOTIFICATION TYPE: PROBLEM
HOSTNAME: rp2
HOST ALIAS: NGD
HOST ADDRESS: rp2.hayley-group.local
HOST GROUP NAMES: _Windows_Servers,NGD_Data_Centre

STATE: DOWN
INFORMATION: (Host Check Timed Out)
DATE / TIME: 25-06-2014 16:28:47
"

So, are you suggesting that before we reboot Nagios' primary DNS server, we edit resolv.conf and re-order the nameservers? We can handle that; it just seems a shame that DNS isn't handled more robustly. Surely other people have encountered this issue, or are they just running DNS on Linux boxes which they never reboot? :D

Cheers.

Ell
eloyd

Re: Nagios raises alerts when I reboot a DNS server

Post by eloyd »

Actually, from your last post, it sounds as though you are getting the kind of alerts I would expect. You should see a host down alert and a service down alert if the box is being rebooted. If you're seeing more than that, then maybe something else is going on.

Nagios boxes don't generally run a DNS service themselves, so they have to use a DNS server somewhere. Unix DNS servers may stay up longer than Windows servers, but they do reboot from time to time. I'd be curious whether you're saying that while the primary is rebooting, NO DNS resolution can occur on the Nagios box. If that's true, then the underlying operating system has a problem, not Nagios.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Nagios raises alerts when I reboot a DNS server

Post by Box293 »

You could add "options rotate" to /etc/resolv.conf on your Nagios server. This sets the resolver's RES_ROTATE option, so that it round-robins the DNS servers.

From the man page (man 5 resolv.conf):
rotate sets RES_ROTATE in _res.options, which causes round-robin selection of nameservers from among those listed. This has the effect of spreading the query load among all listed servers, rather than having all clients try the first listed server first every time.
In practice, this means that when your first DNS server is rebooted, there is a chance the Nagios server is not using it at that moment.
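As a rough model of what `rotate` changes, the Python sketch below returns the order in which servers would be tried for successive queries. Without `rotate`, every query starts at the first entry; with it, the starting point advances by one each time. `query_order` is a hypothetical helper. glibc keeps this rotation state in each process's resolver state, so for short-lived plugin processes the real behaviour depends on the glibc version.

```python
from __future__ import annotations


def query_order(servers: list[str], query_number: int, rotate: bool) -> list[str]:
    """Order in which nameservers are tried for the n-th query (simplified model)."""
    servers = list(servers)
    if not rotate or not servers:
        return servers
    start = query_number % len(servers)  # advance the starting server each query
    return servers[start:] + servers[:start]
```

Rotation spreads the load, but a query whose turn starts at the rebooting server still waits out that server's timeout before falling through to the next one.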
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
hgp-it

Re: Nagios raises alerts when I reboot a DNS server

Post by hgp-it »

Hi guys,

You are on track, eloyd, with your last sentence:
"while the primary is rebooting, NO DNS resolution can occur on the Nagios box"
That's exactly what's happening from my point of view.

We have hundreds of Windows computers on the same network, which fall back to our secondary or tertiary DNS servers fine if our primary is down when doing basic nslookups or whatever. But it's as though the Nagios system doesn't know about the fallback DNS servers.

I need to do some more testing I think, to see if I can nail it down to something more specific happening with the operating system (CentOS5).
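One way to separate an operating-system problem from a DNS-server problem is to query each nameserver directly while the primary reboots, bypassing the resolv.conf ordering entirely (`nslookup name server` or `dig @server name` do the same job from the shell). The Python sketch below hand-builds a minimal RFC 1035 A-record query using only the standard library. `build_query` and `probe` are hypothetical helper names, and the 2-second timeout is an arbitrary choice.

```python
from __future__ import annotations

import random
import socket
import struct

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qid: int) -> bytes:
    """Build a minimal RFC 1035 query for the A record of `name`."""
    # Header: ID, flags (only RD, recursion desired, set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    # (Labels longer than 63 bytes are not validated in this sketch.)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, timeout: float = 2.0) -> str:
    """Send one query straight to `server` on UDP port 53 and describe the outcome."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qid), (server, 53))
            reply, _ = sock.recvfrom(512)
        except socket.timeout:  # caught before OSError, of which it is a subclass
            return "no reply (timed out)"
        except OSError as exc:
            return f"error: {exc}"
    if len(reply) < 12:
        return "malformed reply"
    reply_id, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    if reply_id != qid:
        return "reply ID mismatch"
    rcode = flags & 0x000F
    return f"{RCODES.get(rcode, rcode)}, {ancount} answer(s)"
```

For example, calling `probe(server, "rp2.hayley-group.local")` for each of the three addresses in resolv.conf during the next reboot would show whether the secondary and tertiary servers answer while the primary is silent.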

Thanks all for your input thus far.

Elliot
Box293

Re: Nagios raises alerts when I reboot a DNS server

Post by Box293 »

hgp-it wrote: "We have hundreds of Windows computers on the same network, which will fallback to our secondary or tertiary DNS servers fine"
It's a common misconception that Windows computers have primary, secondary and tertiary DNS servers in their TCP/IP settings.

The Windows TCP/IP stack round-robins the DNS servers in its list every 15 minutes (by default). This is why, in the DNS server settings, they are labelled "Preferred DNS Server" and "Alternate DNS Server".

The following article no longer seems to be available online but here is a copy from my records:
http://blogs.technet.com/b/ajayr/archiv ... rnate.aspx

- Under What conditions is the Alternate DNS Server used by a Client Machine?
- Why is the Client sending DNS Queries to the Alternate DNS Server, when the Preferred is working?
There is a fair bit of confusion around the purpose of the Alternate DNS Server.
This Post should hopefully put these questions to bed.
The general assumption is that the Windows DNS Client will, on all counts, send a DNS query to the Preferred DNS first. If this query fails, then it will query the Alternate DNS Server, and so on and so forth.
The above statement is true, however there is a twist.

The Windows DNS Client will reset the DNS Server Priority at periodic intervals. By default, the server priorities are reset every 15 minutes.
Let's look at an example:
I have a DNS Client configured as follows:
Preferred DNS: 192.168.0.1
Alternate DNS: 10.10.0.1
The DNS Client will start by sending queries to 192.168.0.1. After 15 minutes it will switch priority to 10.10.0.1. Thus all queries will first be sent to 10.10.0.1 for a period of 15 minutes before switching back to 192.168.0.1.

There is another condition that triggers a Priority Switch.
If say the Preferred DNS timed out on a DNS query, the DNS Client will send that DNS Query to the Alternate DNS.
If the Alternate DNS resolves the Query, the Priority will now switch to the Alternate DNS, until either it times out on a Query or the Priority Time Limit expires.
It is a common practice to configure the Preferred DNS Server with the IP of a Local Site DNS Server and the Alternate DNS Server with that of a Remote Site. The problem arises when Firewall/Network folk raise complaints that Clients are sending DNS Traffic to Remote DNS Servers. Well, that is because they have been configured to do so.
The solution or rather workaround to this is a Registry Change described in: http://support.microsoft.com/kb/320760.
The Key will Change the Server Priority Time Limit to 0, which means it never has a chance to reset Priorities.

- Ajay Rodrigues
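To make the timeout-triggered switch described in the article concrete, the Python sketch below is a toy model, not Windows' actual DNS client code. It assumes, as one reading of the article, that once the priority time limit (15 minutes, i.e. 900 seconds, by default) has passed since a switch, the list goes back to the configured order; the periodic switch in the article's first example is not modeled. `PriorityDnsClient` is a hypothetical name.

```python
from __future__ import annotations


class PriorityDnsClient:
    """Toy model of the timeout-triggered priority switch described in the article.

    Assumption (one reading of the article): once `time_limit` seconds have passed
    since the last switch, the list resets to the configured order.
    """

    def __init__(self, servers: list[str], time_limit: int = 900) -> None:
        self.configured = list(servers)
        self.current = list(servers)
        self.time_limit = time_limit
        self.switched_at: int | None = None

    def query(self, now: int, reachable: set[str]) -> str | None:
        """Return the server that answers a query sent at time `now` (seconds)."""
        if self.switched_at is not None and now - self.switched_at >= self.time_limit:
            self.current = list(self.configured)
            self.switched_at = None
        for index, server in enumerate(self.current):
            if server in reachable:
                if index > 0:
                    # Servers ahead of it timed out: promote the one that answered.
                    self.current.insert(0, self.current.pop(index))
                    self.switched_at = now
                return server
        return None
```

In this model, a single timeout on the preferred server is enough to send every later query to the alternate until the time limit runs out, which matches the article's point about clients querying remote DNS servers.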
eloyd

Re: Nagios raises alerts when I reboot a DNS server

Post by eloyd »

@Box,

While I agree that "options rotate" would make this proportionally less likely to occur, I'm thinking that "options timeout:2 attempts:1" might be more useful. This means one attempt is made with a 2-second timeout (hgp-it, you may need to adjust this upwards to find a number that works) before moving on to the next server. This way, the primary is always used unless it fails. "options rotate" always round-robins, so if any of them are down, you have a 1-in-X chance of a failure.
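To put numbers on the two suggestions, the Python sketch below parses an `options` line using the defaults and caps documented in resolv.conf(5) (timeout 5 s, maximum 30; attempts 2, maximum 5). It then estimates how long a lookup waits before the next server is tried, and how long it waits if every server is silent. This uses a simplified model that ignores glibc's backoff on later rounds; `parse_resolver_options` and `worst_case_wait` are hypothetical names.

```python
from __future__ import annotations

# Defaults and upper bounds from resolv.conf(5).
DEFAULT_TIMEOUT, MAX_TIMEOUT = 5, 30
DEFAULT_ATTEMPTS, MAX_ATTEMPTS = 2, 5


def parse_resolver_options(options_line: str) -> tuple[int, int, bool]:
    """Return (timeout, attempts, rotate) from a line such as 'options timeout:2 attempts:1'."""
    timeout, attempts, rotate = DEFAULT_TIMEOUT, DEFAULT_ATTEMPTS, False
    tokens = options_line.split()
    if tokens and tokens[0] == "options":
        tokens = tokens[1:]
    for token in tokens:
        if token.startswith("timeout:"):
            timeout = min(int(token.split(":", 1)[1]), MAX_TIMEOUT)
        elif token.startswith("attempts:"):
            attempts = min(int(token.split(":", 1)[1]), MAX_ATTEMPTS)
        elif token == "rotate":
            rotate = True
    return timeout, attempts, rotate


def worst_case_wait(options_line: str, nscount: int) -> dict[str, int]:
    """Seconds spent on timeouts: before the next server, and when every server is silent."""
    timeout, attempts, _ = parse_resolver_options(options_line)
    return {
        "before_next_server": timeout,
        "all_servers_silent": timeout * attempts * nscount,
    }
```

With the three servers from the first post, the defaults give 5 and 30 seconds, while `timeout:2 attempts:1` gives 2 and 6.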

The other option is to figure out why you need to reboot your Windows DNS server so frequently that it causes you to post to a Nagios forum for help. :lol:
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios raises alerts when I reboot a DNS server

Post by tmcdonald »

hgp-it, any updates? If it is working I would like to close the thread.
Former Nagios employee