Spikes of on-demand checks

Post by **raulpe** » Mon Nov 25, 2013 3:58 pm

Hello,

I recently set up MRTG to record the performance of my Nagios configuration. The attached chart shows a one-day snapshot of active host checks, where the green area represents scheduled checks and the blue line represents on-demand checks.

Are the spikes in the on-demand checks normal? If not, could they be caused by assigning multiple parents to a host? If they are normal, why do they happen every hour?

Thank you.

Post by **sreinhardt** » Mon Nov 25, 2013 5:24 pm

On-demand checks are generally caused by a service check failing, which schedules an immediate host check. Have you recently set up any checks that fail on a somewhat regular basis?

Post by **raulpe** » Tue Nov 26, 2013 8:07 am

Are on-demand checks triggered by soft states, or by hard states only? I have a few services that depend on a file being available, but I have allowed plenty of time for the file to appear before the service reaches a hard state. That's the only thing I can imagine generating this number of checks simultaneously.

Post by **sreinhardt** » Tue Nov 26, 2013 11:51 am

They are triggered by both, if I recall correctly. I know they are run on each soft state, but they may be used on hard states too, to determine again whether the host or the service is down for notification logic. Your case might make sense if those services return a soft warning or critical state when the file is not available. You might try creating a temp file, or some other way of recording the last time that file was available; if the file is not presently there, check the temp file to see whether it is still within a reasonable time range. Something like:

Code:

if (file does not exist) {
  if (temp file exists) {
    if (temp file within 2 hours) { // return ok, no file but within range
      echo "File not found, but within time specified"
      exit 0
    }
    else { // return critical, no file and out of range
      echo "File not found, outside of time specified"
      exit 2
    }
  }
  else { // return critical, file and temp file don't exist
    echo "File not found and temp file does not exist"
    exit 2
  }
}
else { // file exists, so do normal checks; if everything is ok, touch the temp file so its timestamp can be used for comparison
  ... some code to check file contents...
  touch /tmp/temp-file
}

Just some pseudocode that might help reduce false positives on those file checks. If you go this route, you would also want to shorten the soft-state retry time and reduce the number of check attempts.
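
For anyone who wants something closer to runnable, the same idea could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name `check_file`, the parameter names, and the `now` argument (there so the logic can be exercised without waiting) are made up for this example, and the actual content checks are left out just as in the pseudocode. The return codes follow the usual plugin convention of 0 for OK and 2 for CRITICAL.

```python
import os
import time

OK, CRITICAL = 0, 2  # conventional plugin exit codes


def check_file(path, stamp_path, grace_seconds=2 * 60 * 60, now=None):
    """Return (exit_code, message) for a file that may be briefly absent.

    stamp_path is the "temp file" from the pseudocode: its modification
    time records when the watched file was last seen.
    """
    now = time.time() if now is None else now

    if os.path.exists(path):
        # File is present. Real content checks would go here (omitted).
        # Then create/refresh the stamp file, like `touch` would.
        with open(stamp_path, "a"):
            pass
        os.utime(stamp_path, (now, now))
        return OK, "File found"

    if not os.path.exists(stamp_path):
        # Neither the file nor the stamp exists: nothing to fall back on.
        return CRITICAL, "File not found and temp file does not exist"

    age = now - os.path.getmtime(stamp_path)
    if age <= grace_seconds:
        return OK, "File not found, but within time specified"
    return CRITICAL, "File not found, outside of time specified"
```

A small wrapper would print the returned message and call `sys.exit()` with the returned code, so that Nagios sees the message as plugin output and the code as the check state.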

Nagios Support Forum
