check_bind.sh plugin script logic
Posted: Thu Sep 24, 2015 2:08 pm
by linuser
I have noticed that the check_bind script's values, or more accurately, what gets designated as a successful or NXDOMAIN request, seem to flatten out and drop to zero over time if there are no changes, or if the difference between changes is always the same. For instance, I am using the plugin in a dev environment. I run a script that makes queries to the DNS server at set intervals. Our Nagios server calls the check script every 5 minutes, and it runs and dumps stats as expected. If I grep the named stats file for the values that the script looks for, I can see the info that gets reported back, such as:
Code:
less named.stats.tmp | grep 'resulted in successful'
1 queries resulted in successful answer
785 queries resulted in successful answer
1 queries resulted in successful answer
772 queries resulted in successful answer
and:
Code:
less named.stats.tmp | grep 'resulted in NXDOMAIN'
1 queries resulted in NXDOMAIN
331 queries resulted in NXDOMAIN
1 queries resulted in NXDOMAIN
319 queries resulted in NXDOMAIN
Looking at the script, what it reports is the difference between the 2 values. So we have a difference of 13 for successful answers (785 - 772) and 12 for NXDOMAIN (331 - 319). The differences have stayed at 13 and 12 in every dump for at least the last hour, so I suspect this is why everything is now zeroed out. And here is the script logic:
Code:
if [ "$succ_1st" == '' ]
then
success=0
else
success=`expr $succ_1st - $succ_2nd`
fi
So my question is: what does [ "$succ_1st" == '' ] mean? It seems to be why the script is now returning 0 values. I was under the impression that even if the difference in the output is always the same, it should still report the values. Is this not the case?
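A minimal sketch of the arithmetic described above, assuming bash (the plugin's `==` inside `[ ]` is a bash extension; POSIX `test` uses `=`). `report_delta` is a hypothetical name, and `$(( ))` stands in for the script's `expr`. The point it illustrates: the `[ ... == '' ]` test is only true when the variable holds an empty string, i.e. when nothing was extracted at all. A difference that repeats from dump to dump, such as 13 or 12, is still reported as-is.

```shell
#!/bin/bash
# report_delta NEWER OLDER
# Hypothetical helper mirroring the plugin's logic: print how much a
# cumulative counter grew between two stats dumps.
report_delta() {
    local newer=$1 older=$2
    # Same test as check_bind.sh: true only when NEWER is an empty string
    # (the extraction found nothing), in which case 0 is reported.
    if [ "$newer" == '' ]; then
        echo 0
    else
        echo $(( newer - older ))
    fi
}

report_delta 785 772   # successful answers: prints 13
report_delta 331 319   # NXDOMAIN: prints 12
report_delta '' 772    # nothing extracted: prints 0
```

So a constant query rate alone should not produce zeros; only an empty first value (or two equal counts) would.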
Re: check_bind.sh plugin script logic
Posted: Thu Sep 24, 2015 4:23 pm
by linuser
I restarted nrpe and named, and fired my script back up. So far so good: values are being populated into Nagios every five minutes. I suspect the problem may have been that at some point I was getting 4 lines in the file for each query, while the script was only expecting 2 and did not know how to handle 4, so it went into success = 0 mode. Just a hunch.
Ever since I started it back up, I have only seen 2 occurrences of each instead of 4, as shown below:
Code:
less named.stats.tmp | grep 'resulted in successful'
748 queries resulted in successful answer
714 queries resulted in successful answer
less named.stats.tmp | grep 'resulted in NXDOMAIN'
727 queries resulted in NXDOMAIN
694 queries resulted in NXDOMAIN
Again, this is just a guess. If anyone has any other knowledge on this, or knows whether the script should have been able to handle 4 lines instead of 2 and NOT go back to spitting out zeros, please let me know.
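To see why the extra lines could matter, this sketch feeds the grep outputs quoted in this thread through `awk '{ print $1 }' | grep -m1 ''`, the extraction tail kept in linuser's edited line further down. It assumes, without confirmation, that the stock check_bind.sh takes $succ_1st the same way, i.e. as the first matching count in named.stats.tmp. The sample text is copied from the outputs above; `first_count` is a hypothetical name.

```shell
#!/bin/bash
# first_count: read lines like "N queries resulted in ..." on stdin and
# print the number from the first one.
first_count() {
    awk '{ print $1 }' | grep -m1 ''
}

# Two matching lines (after the restart).
two_lines='748 queries resulted in successful answer
714 queries resulted in successful answer'

# Four matching lines (before the restart).
four_lines='1 queries resulted in successful answer
785 queries resulted in successful answer
1 queries resulted in successful answer
772 queries resulted in successful answer'

printf '%s\n' "$two_lines"  | first_count   # prints 748
printf '%s\n' "$four_lines" | first_count   # prints 1
```

With four lines, the first match is the stray count 1 rather than 785. If $succ_2nd likewise landed on the matching stray line of the older dump (1 again), the difference would be 1 - 1 = 0, which would fit the zeros; the thread doesn't show how $succ_2nd is extracted, so treat that as a guess.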
Re: check_bind.sh plugin script logic
Posted: Thu Sep 24, 2015 5:06 pm
by Box293
linuser wrote:
Code:
if [ "$succ_1st" == '' ]
then
success=0
else
success=`expr $succ_1st - $succ_2nd`
fi
So my question is: what does [ "$succ_1st" == '' ] mean? It seems to be why the script is now returning 0 values. I was under the impression that even if the difference in the output is always the same, it should still report the values. Is this not the case?
This is smart/lazy programming. It's a guard designed to prevent negative numbers or invalid results when $succ_1st is null (an empty string).
Not sure why you were getting four results.
Re: check_bind.sh plugin script logic
Posted: Thu Sep 24, 2015 6:54 pm
by linuser
So, looking at that code, would you say that having 4 results instead of 2 would cause the script to break and start zeroing out values?
Re: check_bind.sh plugin script logic
Posted: Fri Sep 25, 2015 8:46 am
by linuser
It does appear that when 4 lines occur in named.stats, the script gets thrown off. The 2 extra lines seem to come from queries the server makes against itself, or from a locally installed service that depends on "localhost.localdomain".
Code:
less /tmp/named.stats.tmp | grep 'resulted in NXDOMAIN'
4 queries resulted in NXDOMAIN
1305 queries resulted in NXDOMAIN
4 queries resulted in NXDOMAIN
1272 queries resulted in NXDOMAIN
And this output is from named.stats:
Code:
[localhost.localdomain]
[155.168.192.in-addr.arpa]
4 queries resulted in NXDOMAIN
4 queries resulted in nxrrset
12 queries resulted in authoritative answer
4 queries resulted in successful answer
The thing is, I am not sure of the best way to fix this. I suppose I could modify check_bind.sh to exclude those lines somehow. Anyone got any ideas?
Re: check_bind.sh plugin script logic
Posted: Fri Sep 25, 2015 12:29 pm
by Box293
Unfortunately I'm really stretched for time: the conference is next week, and after that I'm on holiday for another 2 weeks. After that I will be able to look deeper into the plugin; otherwise, someone else may be able to help.
If you don't have it solved in three weeks time let me know and I'll help out.
Re: check_bind.sh plugin script logic
Posted: Fri Sep 25, 2015 2:50 pm
by linuser
Thanks. I edited the lines in the script to use sed to omit anything between the 2 patterns that surround the extra lines. I'll keep an eye on it and report back.
Code:
succ_1st=`grep 'resulted in successful answer' $path_tmp/named.stats.tmp | sed '/localhost.localdomain/,/testdns.net/{//!d}' | awk '{ print $1 }' | grep -m1 ''`
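For reference, `sed '/START/,/END/{//!d}'` deletes every line strictly between a line matching START and the next line matching END, while keeping the two boundary lines: inside the range, the empty regex `//` reuses the most recently applied regex, so only the boundary lines match it and escape `d`. This is GNU sed behavior; BSD/macOS sed may need `{//!d;}`. The sample below is a made-up, trimmed-down stand-in for named.stats.tmp, not the real file layout.

```shell
#!/bin/bash
# Hypothetical, simplified stand-in for named.stats.tmp (not the real layout).
sample='[localhost.localdomain]
[155.168.192.in-addr.arpa]
4 queries resulted in successful answer
4 queries resulted in NXDOMAIN
[testdns.net]'

# Delete the lines strictly between the two patterns; the boundary lines stay.
printf '%s\n' "$sample" | sed '/localhost.localdomain/,/testdns.net/{//!d}'
# Output:
# [localhost.localdomain]
# [testdns.net]
```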
Re: check_bind.sh plugin script logic
Posted: Mon Sep 28, 2015 8:04 am
by linuser
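Actually, I had to reverse the order of grep and sed: "sed out" what I did not want first, then pipe the rest to grep. With grep first, the lines containing localhost.localdomain and testdns.net had already been filtered out, so the sed range never matched anything. Here is what the final command looks like: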
Code:
succ_1st=`sed '/localhost.localdomain/,/testdns.net/{//!d}' $path_tmp/named.stats.tmp | grep 'resulted in successful answer' | awk '{ print $1 }' | grep -m1 ''`
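A sketch of why the order matters, using a made-up, simplified stand-in for named.stats.tmp in which a stray count (4) sits between the `[localhost.localdomain]` and `[testdns.net]` marker lines and the count the plugin should report (748) comes after them. That layout is an assumption for illustration, not necessarily what BIND writes. `range_filter` is a hypothetical name wrapping the sed command from the thread.

```shell
#!/bin/bash
# Hypothetical stand-in for named.stats.tmp (assumed layout).
sample='[localhost.localdomain]
[155.168.192.in-addr.arpa]
4 queries resulted in successful answer
[testdns.net]
748 queries resulted in successful answer'

range_filter() { sed '/localhost.localdomain/,/testdns.net/{//!d}'; }

# First attempt: grep strips the marker lines, so the sed range never opens.
grep_first=$(printf '%s\n' "$sample" | grep 'resulted in successful answer' \
    | range_filter | awk '{ print $1 }' | grep -m1 '')

# Final version: sed removes the stray lines before grep runs.
sed_first=$(printf '%s\n' "$sample" | range_filter \
    | grep 'resulted in successful answer' | awk '{ print $1 }' | grep -m1 '')

echo "grep first: $grep_first"   # grep first: 4
echo "sed first:  $sed_first"    # sed first:  748
```

The filter only works while both marker strings appear in the file in that order; if the zone names change, the range patterns in check_bind.sh need to change with them.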
Although this gets the performance data populating again, it raises another important concern, which I will post in a new thread. This one can be closed, since it is resolved.