Checks always falling behind

grimm26
Posts: 36
Joined: Wed Dec 12, 2012 1:57 pm

Re: Checks always falling behind

Post by grimm26 »

abrist wrote:Are the checks running as well as queuing, or just queuing?
I don't think they are running, but some hosts say that a check has been run recently. Some are pending, but I think those are all hosts that do not have a service assigned to them.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Checks always falling behind

Post by abrist »

Are these actual host checks, or is there a chance they are a service check that is running a host-alive check or other icmp check?
Former Nagios employee
grimm26

Re: Checks always falling behind

Post by grimm26 »

abrist wrote:You will probably want to just pull information from the scheduling queue CGI and grab the topmost table entry for the next check time:


http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7
I came up with this one liner to get the time of the next check:


curl -s -u 'nagiosadmin:<password>' 'http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7' | grep -m 2 "<TR CLASS=" | tail -n1 | awk 'BEGIN { FS = "<TD CLASS=\047queueOdd\047>|<TD CLASS=\047queueEven\047>" } ; { print $4 }' | sed 's/<.*//'
Obviously, replace <password> and <nagios server ip> with their actual values for your environment. At this point you can compare the date reported to the current date of the nagios system and report it through a plugin script right to the XI interface:


#!/bin/bash

# Get time/date from topmost entry in the schedule queue for the next check.  Returns 'CCYY-MM-DD hh:mm:ss'.
# The credentials and URL are quoted so the shell does not treat '<', '>' or '?' specially.
NEXT=$(curl -s -u 'nagiosadmin:<password>' 'http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7' | grep -m 2 "<TR CLASS=" | tail -n1 | awk 'BEGIN { FS = "<TD CLASS=\047queueOdd\047>|<TD CLASS=\047queueEven\047>" } ; { print $4 }' | sed 's/<.*//' | awk 'BEGIN { FS = " |-"};{ print $3,$1,$2,$4 }' | sed 's/ /-/g' | sed 's/-/ /g3')

# Converts date time above to unix time.
NEXTUT=$(date -d "$NEXT" +%s)

# Get current unix time
CURRENT=$(date +%s)

# Subtract current time from next check time
OFFSET=$(($NEXTUT - $CURRENT))

# Echo offset string for nagios status data.
echo "The scheduler is currently Offset by $OFFSET seconds | offset=$OFFSET"

# Exit with 0 so that Nagios shows 'OK'  
exit 0
That was fun.
Thanks, that's a nice start. I'm interested in $5 of the first awk, though (the Next Check column). I also had to undo your manipulation of the date/time, since I already have Nagios set for iso8601. So I become concerned when your script starts showing a negative offset :). I'm putting this in now. I had this issue crop up again overnight and I think it correlates with when I did a nagios reload instead of a restart. I'll avoid reloads from now on and see if that helps. If only I could get ndomod to load config data asynchronously at startup....
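The adjustment described above (take the Next Check column as-is when Nagios already prints ISO 8601 dates, and treat a negative offset as the worrying case) might look roughly like the sketch below. This is not the script that was actually deployed: the function name `check_lag`, the 60/300 second thresholds, and the optional second argument are made up for illustration, and GNU `date` is assumed.

```shell
#!/bin/bash
# Sketch only: check_lag and the thresholds below are illustrative assumptions.
# $1 = next-check timestamp in ISO 8601 form, e.g. '2013-06-20 08:15:00'
# $2 = current time in epoch seconds (optional, defaults to now)
check_lag() {
    local next_epoch now offset lag
    # GNU date parses 'YYYY-MM-DD hh:mm:ss' directly, so no field shuffling is needed
    next_epoch=$(date -d "$1" +%s 2>/dev/null) || {
        echo "UNKNOWN - cannot parse timestamp '$1'"
        return 3
    }
    now=${2:-$(date +%s)}
    offset=$((next_epoch - now))
    # A negative offset means the topmost queued check is already overdue
    lag=$(( offset < 0 ? -offset : 0 ))
    if (( lag >= 300 )); then
        echo "CRITICAL - scheduler is ${lag}s behind | offset=$offset"
        return 2
    elif (( lag >= 60 )); then
        echo "WARNING - scheduler is ${lag}s behind | offset=$offset"
        return 1
    fi
    echo "OK - next check due in ${offset}s | offset=$offset"
    return 0
}

# Example: check_lag "$NEXT"; exit $?
```

Returning 1 or 2 instead of always exiting 0 lets Nagios itself alert when the queue falls behind, rather than only graphing the offset.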
grimm26

Re: Checks always falling behind

Post by grimm26 »

abrist wrote:Are the checks running as well as queuing, or just queuing?
Upon further review today, the host checks are definitely running. execute_host_checks is set to 0 in nagios.cfg but I am using a default template for my hosts which has checks enabled. That shouldn't matter if the master setting in nagios.cfg is set to 0, right?
I'm running Nagios 3.5.0 on RHEL6.4.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Checks always falling behind

Post by scottwilkerson »

Well, there is also another layer that takes precedence. If a command was submitted via the web UI or the command pipe, it will override the setting in nagios.cfg.

The only way to know for sure would be to look in the objects.cache file.
Former Nagios employee
grimm26

Re: Checks always falling behind

Post by grimm26 »

Nope. I'm the only one using the UI or the CLI. Nagios is showing that host checks are disabled, but they are still queueing up and running.
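One way to see what the running daemon believes, as opposed to what nagios.cfg says, is to read the program-level block of the status file. The sketch below is an illustration, not a confirmed procedure: it assumes the block is named `programstatus` and that the flag of interest is called `active_host_checks_enabled`, so check both against your own status.dat. `program_flag` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: program_flag is a hypothetical helper; the block name
# (programstatus) and key names are assumptions to verify locally.
# Usage: program_flag <status-file> <key>
program_flag() {
    awk -v key="$2" '
        # Start of the program-level block, e.g. "programstatus {"
        $1 == "programstatus" { inblock = 1; next }
        # Closing brace ends the block, so later host/service blocks are ignored
        inblock && $1 == "}"  { exit }
        inblock {
            line = $0
            sub(/^[ \t]+/, "", line)          # strip leading indentation
            eq = index(line, "=")
            if (eq > 0 && substr(line, 1, eq - 1) == key) {
                print substr(line, eq + 1)    # value after the first "="
                exit
            }
        }
    ' "$1"
}

# Example: program_flag /var/log/nagios/status.dat active_host_checks_enabled
```

Comparing that value with `grep '^execute_host_checks' nagios.cfg` shows whether the two layers disagree.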
grimm26

Re: Checks always falling behind

Post by grimm26 »

Got a chance to restart Nagios this morning and I added


active_checks_enabled  0
to the generic-host template. Only after that are host checks not being scheduled and executed.
Bottom line, it seems like


execute_host_checks=0
in nagios.cfg doesn't do anything.
grimm26

Re: Checks always falling behind

Post by grimm26 »

I opened issue 469
abrist

Re: Checks always falling behind

Post by abrist »

Great. Thanks for the sleuthing. This directive should either be fixed or removed, at least from the documentation.
grimm26

Re: Checks always falling behind

Post by grimm26 »

Anyway, service checks are still falling behind on this machine :). nagiostats tells me:


Nagios Stats 3.5.0
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 03-15-2013
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /var/log/nagios/status.dat
Status File Age:                        0d 0h 0m 3s
Status File Version:                    3.5.0

Program Running Time:                   0d 14h 39m 24s
Nagios PID:                             3360
Used/High/Total Command Buffers:        0 / 1927 / 8192

Total Services:                         24096
Services Checked:                       24095
Services Scheduled:                     6911
Services Actively Checked:              6912
Services Passively Checked:             17184
Total Service State Change:             0.000 / 36.780 / 0.143 %
Active Service Latency:                 0.000 / 550.965 / 528.919 sec
Active Service Execution Time:          0.000 / 190.632 / 1.134 sec
Active Service State Change:            0.000 / 36.780 / 0.353 %
Active Services Last 1/5/15/60 min:     531 / 2646 / 6911 / 6911
Passive Service Latency:                0.069 / 5.159 / 2.946 sec
Passive Service State Change:           0.000 / 11.320 / 0.058 %
Passive Services Last 1/5/15/60 min:    673 / 5122 / 17184 / 17184
Services Ok/Warn/Unk/Crit:              24010 / 4 / 71 / 11
Services Flapping:                      0
Services In Downtime:                   0
What is the difference between service latency and service execution time, and why is there such a big difference between the two? My service checks all run at 5-minute intervals and the max execution time fits within that. Why is the latency so high, then?
[edit] oh duh cuz it doesn't fork 6911 checks at once. I'm working on getting the execution time down but I may just have to split out into multiple instances.
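A rough back-of-envelope check on those numbers, offered as a sketch rather than a diagnosis: latency is how long a check waits past its scheduled time before it starts, while execution time is how long the plugin itself runs. With 6911 active checks due every 300 seconds but only 2646 run in the last 5 minutes, one full pass takes about 13 minutes, and the overshoot beyond the 5-minute interval lands in the same ballpark as the reported average latency of about 529 seconds. The function name `cycle_estimate` is made up for illustration.

```shell
#!/bin/bash
# Sketch only: cycle_estimate is an illustrative helper, not part of Nagios.
# $1 = active checks scheduled
# $2 = active checks actually run during the window
# $3 = window length in seconds (here equal to the check interval)
cycle_estimate() {
    awk -v total="$1" -v ran="$2" -v win="$3" 'BEGIN {
        if (ran <= 0 || win <= 0) { print "no data"; exit 1 }
        rate_needed = total / win          # checks/sec required to keep up
        rate_actual = ran / win            # checks/sec actually achieved
        cycle = total / rate_actual        # seconds to get through every check once
        printf "needed=%.1f/s actual=%.1f/s cycle=%.0fs overshoot=%.0fs\n",
               rate_needed, rate_actual, cycle, cycle - win
    }'
}

# Figures taken from the nagiostats output above
cycle_estimate 6911 2646 300
```

On these figures the scheduler would need roughly 23 checks per second and is managing under 9, which is why shortening execution time or splitting into multiple instances both attack the same bottleneck.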