> abrist wrote:
> Are the checks running as well as queuing, or just queuing?
I don't think they are running, but some hosts say that a check has been run recently. Some are pending, but I think those are all hosts that do not have a service assigned to them.
Checks always falling behind
Re: Checks always falling behind
Are these actual host checks, or is there a chance they are service checks running a host-alive check or another ICMP check?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Checks always falling behind
> abrist wrote:
> You will probably want to just pull information from the scheduling queue CGI and grab the topmost table entry for the next check time:
> Code: Select all
> http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7
> I came up with this one-liner to get the time of the next check:
> Code: Select all
> curl -s -u nagiosadmin:<password> "http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7" | grep -m 2 "<TR CLASS=" | tail -n1 | awk 'BEGIN { FS = "<TD CLASS=\047queueOdd\047>|<TD CLASS=\047queueEven\047>" } ; { print $4 }' | sed 's/<.*//'
> Obviously, replace <password> and <nagios server ip> with their actual values for your environment. At this point you can compare the date reported to the current date of the Nagios system and report it through a plugin script right to the XI interface:
> Code: Select all
> #!/bin/bash
> # Get time/date from topmost entry in the schedule queue for the next check. Returns 'CCYY-MM-DD hh:mm:ss'.
> NEXT=$(curl -s -u nagiosadmin:<password> "http://<nagios server ip>/nagios/cgi-bin/extinfo.cgi?type=7" | grep -m 2 "<TR CLASS=" | tail -n1 | awk 'BEGIN { FS = "<TD CLASS=\047queueOdd\047>|<TD CLASS=\047queueEven\047>" } ; { print $4 }' | sed 's/<.*//' | awk 'BEGIN { FS = " |-"};{ print $3,$1,$2,$4 }' | sed 's/ /-/g' | sed 's/-/ /g3')
> # Converts date time above to unix time.
> NEXTUT=$(date -d "$NEXT" +%s)
> # Get current unix time
> CURRENT=$(date +%s)
> # Subtract current time from next check time
> OFFSET=$(($NEXTUT - $CURRENT))
> # Echo offset string for nagios status data.
> echo "The scheduler is currently Offset by $OFFSET seconds | offset=$OFFSET"
> # Exit with 0 so that Nagios shows 'OK'
> exit 0
> That was fun.
Thanks, that's a nice start. I'm interested in $5 of the first awk, though (the Next Check column). I also had to undo your manipulation of the date/time, since I already have Nagios set for iso8601. So I become concerned when your script starts showing a negative offset.
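The quoted script always exits 0, so Nagios reports OK no matter what the offset is. Below is a minimal sketch of how thresholds might be added. The function name `report_offset` and the 60/300 second thresholds are hypothetical and do not appear in the thread. Since the offset is next check time minus current time, a negative value is treated here as the number of seconds the topmost queued check is overdue.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): turn "next check" and "now"
# (both Unix epoch seconds) into plugin output plus a standard Nagios
# exit status: 0 = OK, 1 = WARNING, 2 = CRITICAL.
report_offset() {
    local next="$1" now="$2" warn="$3" crit="$4"
    local offset late state rc
    offset=$(( $next - $now ))

    # A negative offset means the topmost queued check is already overdue.
    late=0
    if [ "$offset" -lt 0 ]; then
        late=$(( 0 - $offset ))
    fi

    state=OK
    rc=0
    if [ "$late" -ge "$crit" ]; then
        state=CRITICAL
        rc=2
    elif [ "$late" -ge "$warn" ]; then
        state=WARNING
        rc=1
    fi

    echo "SCHEDULER $state - next check offset is $offset seconds | offset=$offset"
    return $rc
}

# Example wiring (assumes $NEXT holds a timestamp that `date -d` can parse,
# such as the iso8601 'CCYY-MM-DD hh:mm:ss' form mentioned above):
#   report_offset "$(date -d "$NEXT" +%s)" "$(date +%s)" 60 300; exit $?
```

Taking both timestamps as arguments keeps the threshold logic separate from the HTML scraping, so it can be exercised without a running Nagios instance.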
Re: Checks always falling behind
> abrist wrote:
> Are the checks running as well as queuing, or just queuing?
Upon further review today, the host checks are definitely running. execute_host_checks is set to 0 in nagios.cfg, but I am using a default template for my hosts which has checks enabled. That shouldn't matter if the master setting in nagios.cfg is set to 0, right?
I'm running Nagios 3.5.0 on RHEL6.4.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Checks always falling behind
> grimm26 wrote:
>> abrist wrote:
>> Are the checks running as well as queuing, or just queuing?
> Upon further review today, the host checks are definitely running. execute_host_checks is set to 0 in nagios.cfg but I am using a default template for my hosts which has checks enabled. That shouldn't matter if the master setting in nagios.cfg is set to 0, right?
> I'm running Nagios 3.5.0 on RHEL6.4.
Well, there is also another layer that takes precedence. If a command was submitted via the web UI or command pipe, it will override the setting in nagios.cfg.
The only way to know for sure would be to look in objects.cache.
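One way to look for such an override is to compare the configured value with the state the running daemon reports. The sketch below makes several assumptions: it reads status.dat instead of the object cache, on the assumption that the program-wide toggle appears there as `active_host_checks_enabled` (as it does in the Nagios 3.x status files I have seen). The function names and the nagios.cfg path in the usage comment are hypothetical.

```shell
#!/bin/bash
# Print the value of the first "key=value" line for key $1 in file $2,
# ignoring leading whitespace (status.dat indents its entries).
get_value() {
    sed -n "s/^[[:space:]]*$1=//p" "$2" | head -n 1
}

# Hypothetical helper: compare execute_host_checks in nagios.cfg ($1) with
# active_host_checks_enabled in status.dat ($2).
# Returns 0 when they agree, 1 when the running state differs.
compare_host_check_state() {
    local cfg="$1" status="$2"
    local configured running
    configured=$(get_value execute_host_checks "$cfg")
    running=$(get_value active_host_checks_enabled "$status")
    echo "nagios.cfg execute_host_checks=$configured, running active_host_checks_enabled=$running"
    [ "$configured" = "$running" ]
}

# Example (the nagios.cfg path is an assumption; adjust for your install):
#   compare_host_check_state /etc/nagios/nagios.cfg /var/log/nagios/status.dat
```

A mismatch would suggest that retained or externally submitted state is in effect, not the value in nagios.cfg.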
Re: Checks always falling behind
> scottwilkerson wrote:
>> grimm26 wrote:
>>> abrist wrote:
>>> Are the checks running as well as queuing, or just queuing?
>> Upon further review today, the host checks are definitely running. execute_host_checks is set to 0 in nagios.cfg but I am using a default template for my hosts which has checks enabled. That shouldn't matter if the master setting in nagios.cfg is set to 0, right?
>> I'm running Nagios 3.5.0 on RHEL6.4.
> Well, there is also another layer that takes precedence. If a command was submitted via the web UI or command pipe, it will override the setting in nagios.cfg.
> The only way to know for sure would be to look in objects.cache.
Nope. I'm the only one using the UI or the CLI. Nagios is showing that host checks are disabled, but they are still queueing up and running.
Re: Checks always falling behind
Got a chance to restart Nagios this morning and I added
Code: Select all
active_checks_enabled 0
to the generic-host template. Only after that are host checks not being scheduled and executed.
Bottom line, it seems like
Code: Select all
execute_host_checks=0
in nagios.cfg doesn't do anything.
Re: Checks always falling behind
Great. Thanks for the sleuthing. This directive should either be fixed or removed, at least from the documentation.
Re: Checks always falling behind
Anyway, service checks are still falling behind on this machine. nagiostats tells me:
Code: Select all
Nagios Stats 3.5.0
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 03-15-2013
License: GPL
CURRENT STATUS DATA
------------------------------------------------------
Status File: /var/log/nagios/status.dat
Status File Age: 0d 0h 0m 3s
Status File Version: 3.5.0
Program Running Time: 0d 14h 39m 24s
Nagios PID: 3360
Used/High/Total Command Buffers: 0 / 1927 / 8192
Total Services: 24096
Services Checked: 24095
Services Scheduled: 6911
Services Actively Checked: 6912
Services Passively Checked: 17184
Total Service State Change: 0.000 / 36.780 / 0.143 %
Active Service Latency: 0.000 / 550.965 / 528.919 sec
Active Service Execution Time: 0.000 / 190.632 / 1.134 sec
Active Service State Change: 0.000 / 36.780 / 0.353 %
Active Services Last 1/5/15/60 min: 531 / 2646 / 6911 / 6911
Passive Service Latency: 0.069 / 5.159 / 2.946 sec
Passive Service State Change: 0.000 / 11.320 / 0.058 %
Passive Services Last 1/5/15/60 min: 673 / 5122 / 17184 / 17184
Services Ok/Warn/Unk/Crit: 24010 / 4 / 71 / 11
Services Flapping: 0
Services In Downtime: 0
What is the difference between service latency and service execution time, and why is there such a big difference between the two? My service checks are all on 5-minute intervals and the max execution time fits within that. Why is the latency so high, then?
[edit] oh duh, cuz it doesn't fork 6911 checks at once. I'm working on getting the execution time down, but I may just have to split out into multiple instances.
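For a rough sense of scale: latency is how long a check waits past its scheduled time before it starts, while execution time is how long the plugin itself runs. The sketch below estimates how many checks would need to be in flight at once, on average, to run every active check once per interval. It is back-of-the-envelope arithmetic on the nagiostats figures above, not a model of the Nagios scheduler, and the function name `required_concurrency` is made up for illustration.

```shell
#!/bin/bash
# Rough capacity estimate: average number of checks that must be running
# concurrently so that every active check runs once per interval.
#   $1 = number of active checks
#   $2 = average execution time in seconds
#   $3 = check interval in seconds
required_concurrency() {
    LC_ALL=C awk -v n="$1" -v t="$2" -v i="$3" 'BEGIN { printf "%.1f\n", n * t / i }'
}

# Figures from the nagiostats output above: 6911 scheduled services,
# 1.134 s average execution time, 5-minute (300 s) interval.
required_concurrency 6911 1.134 300   # prints 26.1
```

If the instance cannot sustain roughly that many concurrent checks, plus headroom for outliers such as the 190-second maximum, checks start later than scheduled and the reported latency grows.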