Checks always falling behind
Posted: Thu Jul 18, 2013 1:16 pm
by grimm26
I'm having a problem with scheduled checks always falling behind. I've seen it as far as an hour behind. This is probably a result of the number of checks (~3K feeding another ~10K passive services), the fact that most of them are SNMP walks, and that I am using NDOutils to feed it all into MySQL. I found that NDO is hitting the ceiling of some kernel params for messaging so I cranked those up and those warnings seem to have stopped. I've disabled host checking since I only care about services. I restarted nagios and now things seem to be hovering around a couple minutes late - I can deal with that.
Now, I have a check_nagios running via cron to make sure that things are flowing at all, but is there a check that I can do to check how far behind the scheduling queue is running?
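For reference, the kernel messaging limits mentioned above can be inspected before and after tuning. The post does not say which parameters were raised, so this is only a sketch: it assumes the System V message-queue limits (kernel.msgmax, kernel.msgmnb, kernel.msgmni) that Linux exposes under /proc/sys/kernel, and show_msg_limits is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the System V message-queue limits found under a
# directory (default: /proc/sys/kernel on Linux).
# ASSUMPTION: these three limits are the ones NDOutils was hitting; the
# original post only says "kernel params for messaging".
show_msg_limits() {
    dir=${1:-/proc/sys/kernel}
    for name in msgmax msgmnb msgmni; do
        if [ -r "$dir/$name" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf 'kernel.%s = (not readable)\n' "$name"
        fi
    done
}
```

Calling show_msg_limits with no argument reads the live values; running ipcs -q alongside it lists the message queues currently allocated.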
Re: Checks always falling behind
Posted: Thu Jul 18, 2013 6:10 pm
by abrist
You will probably want to just pull information from the scheduling queue CGI and grab the topmost table entry for the next check time:
Code: Select all
http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7
I came up with this one-liner to get the time of the next check:
Code: Select all
curl -s -u 'nagiosadmin:<password>' 'http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7' | grep -m 2 "<TR CLASS=" | tail -n1 | awk 'BEGIN { FS = "<TD CLASS=\047queueOdd\047>|<TD CLASS=\047queueEven\047>" } ; { print $4 }' | sed 's/<.*//'
Obviously, replace <password> and <nagios server ip> with their actual values for your environment. At this point you can compare the date reported to the current date of the nagios system and report it through a plugin script right to the XI interface:
Code: Select all
#!/bin/bash
# Get time/date from topmost entry in the schedule queue for the next check. Returns 'CCYY-MM-DD hh:mm:ss'.
# The field reordering assumes the queue page shows dates as MM-DD-CCYY hh:mm:ss. The 'g3' sed flag needs GNU sed.
NEXT=$(curl -s -u 'nagiosadmin:<password>' 'http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7' | grep -m 2 "<TR CLASS=" | tail -n1 | awk 'BEGIN { FS = "<TD CLASS=\047queueOdd\047>|<TD CLASS=\047queueEven\047>" } ; { print $4 }' | sed 's/<.*//'| awk 'BEGIN { FS = " |-"};{ print $3,$1,$2,$4 }' | sed 's/ /-/g' | sed 's/-/ /g3')
# Converts date time above to unix time.
NEXTUT=$(date -d "$NEXT" +%s)
# Get current unix time
CURRENT=$(date +%s)
# Subtract current time from next check time (a negative result means the topmost check is already overdue)
OFFSET=$(($NEXTUT - $CURRENT))
# Echo offset string for nagios status data.
echo "The scheduler is currently offset by $OFFSET seconds | offset=$OFFSET"
# Exit with 0 so that Nagios shows 'OK'
exit 0
That was fun.
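The script above always exits 0, so it graphs the offset but never alerts. A minimal sketch of turning the offset into a plugin state, assuming the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL); offset_state and the threshold values in the usage note are hypothetical, not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: map a scheduler offset (seconds) to a plugin state.
# A negative offset means the next scheduled check is already in the past.
# Usage: offset_state OFFSET WARN_SECONDS CRIT_SECONDS
offset_state() {
    offset=$1
    warn=$2
    crit=$3
    # Lateness is how far in the past the next check time lies (0 if not late).
    if [ "$offset" -lt 0 ]; then
        late=$(( 0 - offset ))
    else
        late=0
    fi
    perf="late=${late}s;${warn};${crit}"
    if [ "$late" -ge "$crit" ]; then
        echo "CRITICAL - scheduler is ${late}s behind | $perf"
        return 2
    elif [ "$late" -ge "$warn" ]; then
        echo "WARNING - scheduler is ${late}s behind | $perf"
        return 1
    fi
    echo "OK - scheduler is ${late}s behind | $perf"
    return 0
}
```

To try it, the final echo and exit 0 in the script above could be replaced with offset_state "$OFFSET" 120 300 followed by exit $?. The 120 and 300 second thresholds are placeholders to tune for your environment.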
Re: Checks always falling behind
Posted: Mon Jul 22, 2013 10:05 am
by grimm26
I'll try that. However, even though I have disabled host checks I still see them in the scheduling queue. Does it still queue them and only check if they are enabled when it tries to run the check?
Re: Checks always falling behind
Posted: Tue Jul 23, 2013 11:08 am
by lmiltchev
If you deactivated the check via the CCM, it should get removed from the queue.
Re: Checks always falling behind
Posted: Tue Jul 23, 2013 1:28 pm
by grimm26
lmiltchev wrote: If you deactivated the check via the CCM, it should get removed from the queue.
I disabled host checks via the web UI and also set execute_host_checks to 0 in nagios.cfg. Even after a restart, host checks still show in the queue.
Re: Checks always falling behind
Posted: Tue Jul 23, 2013 3:37 pm
by abrist
You may need to flush retention.dat:
Code: Select all
service nagios stop
rm /usr/local/nagios/var/retention.dat
service nagios start
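Before deleting retention.dat, it may be worth comparing the value in nagios.cfg with what Nagios has retained. This is a sketch under assumptions: cfg_value is a hypothetical helper, the nagios.cfg path is the common source-install default, and the active_host_checks_enabled key in retention.dat is my understanding of where the program-wide setting is stored, so verify it against your own file.

```shell
#!/bin/sh
# Hypothetical helper: print the last value assigned to KEY in a key=value
# file, ignoring comment lines. Prints nothing if the key is absent.
# Usage: cfg_value KEY FILE
cfg_value() {
    key=$1
    file=$2
    awk -F= -v k="$key" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim whitespace around the key
            if (name == k) {
                val = $2
                gsub(/^[ \t]+|[ \t]+$/, "", val) # trim whitespace around the value
                found = 1
            }
        }
        END { if (found) print val }
    ' "$file"
}
```

For example, cfg_value execute_host_checks /usr/local/nagios/etc/nagios.cfg and cfg_value active_host_checks_enabled /usr/local/nagios/var/retention.dat. If the first prints 0 and the second prints 1, retained program state (use_retained_program_state) may be overriding the config, which is the case flushing retention.dat is meant to fix.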
Re: Checks always falling behind
Posted: Tue Jul 23, 2013 4:15 pm
by grimm26
I mean new host checks are constantly in the queue, not old ones that would be retained. It is still actively scheduling host checks.
Re: Checks always falling behind
Posted: Tue Jul 23, 2013 4:40 pm
by abrist
Is there a chance you have multiple nagios parent processes running?
Code: Select all
service nagios stop
ps -aef | grep nagios.cfg
killall nagios
service nagios start
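The ps | grep step above also matches the grep process itself. A small sketch of the same check that prints the PID and parent PID of each match, which makes leftover processes easier to spot; nagios_procs is a hypothetical helper name, and it reads ps output on stdin so it can be tried on saved output too.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -eo pid,ppid,args` output on stdin and print
# "PID PPID" for every process whose command line mentions nagios.cfg.
# The header line and unrelated processes produce no output.
nagios_procs() {
    awk '$0 ~ /nagios\.cfg/ { print $1, $2 }'
}
```

Usage: ps -eo pid,ppid,args | nagios_procs. Any output after service nagios stop means something is still running. While Nagios is running, expect one long-lived parent plus short-lived children; the parent PID column is only a hint, since check processes can be reparented.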
Re: Checks always falling behind
Posted: Tue Jul 23, 2013 5:11 pm
by grimm26
Nope.
Re: Checks always falling behind
Posted: Wed Jul 24, 2013 4:02 pm
by abrist
Are the checks running as well as queuing, or just queuing?
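One way to answer that is to look at the last_check timestamps Nagios writes for each host. A rough sketch, assuming the usual status.dat layout (hoststatus { ... } blocks containing host_name= and last_check= lines); hosts_checked_since is a hypothetical helper, and the status.dat path in the usage note is an assumed default.

```shell
#!/bin/sh
# Hypothetical helper: list hosts whose last_check is at or after CUTOFF
# (unix time), reading a Nagios status file.
# ASSUMPTION: the file contains "hoststatus {" blocks with host_name= and
# last_check= lines, closed by a line starting with "}".
# Usage: hosts_checked_since CUTOFF_EPOCH STATUS_FILE
hosts_checked_since() {
    cutoff=$1
    file=$2
    awk -F= -v cutoff="$cutoff" '
        /^[ \t]*hoststatus[ \t]*[{]/ { inhost = 1; name = ""; last = 0; next }
        inhost && /^[ \t]*[}]/ {
            # End of a host block: report it if it was checked recently enough.
            if (last + 0 >= cutoff + 0) print name, last
            inhost = 0
            next
        }
        inhost {
            key = $1
            gsub(/^[ \t]+/, "", key)   # status lines are usually indented
            if (key == "host_name")  name = $2
            if (key == "last_check") last = $2
        }
    ' "$file"
}
```

Usage: hosts_checked_since "$(( $(date +%s) - 600 ))" /usr/local/nagios/var/status.dat lists hosts checked in the last ten minutes. No output suggests the host checks are queued but not executing. Treat it as a hint only, since last_check may also move for reasons other than a scheduled active check.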