Interpreting the change service_check_timeout

nagmoto
Posts: 195
Joined: Fri Jan 09, 2015 8:05 am

Post by nagmoto »

Hi
As seen in the attached screenshot, around 10:30am today service_check_timeout was increased from the default 60 seconds to 200.
After this change, the number of active service checks dropped sharply.
Can you help me understand why?
Attachments
Screen Shot 2015-10-18 at 10.42.45 PM.png
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Post by Box293 »

How is this graph being generated? Is this a service on localhost? If so, what plugin is used?

Post by nagmoto »

How is this graph being generated?

MRTG graphing from the /usr/bin/nagiostats command.

Code:

[root@nagios ~]# egrep -v '^#|^$'  /etc/nagios/mrtg.cfg
WorkDir: /usr/share/nagios/html/stats
<snipped>
Target[nagios-j]: `/usr/bin/nagiostats --mrtg --data=NUMSACTSVCCHECKS5M,NUMOACTSVCCHECKS5M,PROGRUNTIME,NAGIOSVERPID`
MaxBytes[nagios-j]: 7000
Title[nagios-j]: Active Service Checks
PageTop[nagios-j]: <H1>Active Service Checks</H1>
Options[nagios-j]: growright,gauge,nopercent
YLegend[nagios-j]: Checks
ShortLegend[nagios-j]:  
LegendI[nagios-j]:  Scheduled Checks:
LegendO[nagios-j]:  On-Demand Checks:
<snipped>
[root@nagios ~]#
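How MRTG consumes that Target line can be sketched in Python. With `--mrtg`, nagiostats prints each requested `--data` variable on its own line, and MRTG expects exactly four lines from an external command: the In value (here NUMSACTSVCCHECKS5M, scheduled active service checks in the last 5 minutes), the Out value (NUMOACTSVCCHECKS5M, on-demand checks), an uptime string and a name string. The helper `parse_mrtg_output` and the sample output below are hypothetical and only illustrate that mapping; they are not part of nagiostats or MRTG.

```python
from typing import NamedTuple


class MrtgSample(NamedTuple):
    """One reading of the four lines MRTG expects from a Target command."""
    scheduled_5m: int   # line 1: NUMSACTSVCCHECKS5M (MRTG "In" value)
    on_demand_5m: int   # line 2: NUMOACTSVCCHECKS5M (MRTG "Out" value)
    uptime: str         # line 3: PROGRUNTIME, displayed by MRTG as uptime
    name: str           # line 4: NAGIOSVERPID, displayed by MRTG as the name


def parse_mrtg_output(text: str) -> MrtgSample:
    """Parse captured `nagiostats --mrtg --data=...` output (hypothetical helper)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) != 4:
        raise ValueError(f"expected 4 lines, got {len(lines)}")
    return MrtgSample(int(lines[0]), int(lines[1]), lines[2], lines[3])


if __name__ == "__main__":
    # Illustrative output only; real values come from running nagiostats.
    sample = "1420\n3\n2d 4h 10m 5s\nNagios 3.5.1 (pid=1234)\n"
    reading = parse_mrtg_output(sample)
    print(reading.scheduled_5m, reading.on_demand_5m)
```

Because the graph uses the `gauge` option, each plotted point is the raw 5-minute count itself rather than a rate derived from successive readings.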

Is this a service on localhost?

Yes, on localhost. The system is CentOS 6.x, using the vendor-packaged Nagios 3.5.1.

If so, what plugin is used?

I followed page 275 of Tom Ryder's book "Nagios Core Administration Cookbook".

Post by nagmoto »

It looks like many service checks needed between 60 and 200 seconds to complete.
So once the service timeout was set to 200, those checks were able to finish executing instead of being resubmitted on the next run.
If this is the case, can I look up those timeout records in the nagios.log* files?
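One way to test this hypothesis is to bucket check execution times against both timeouts: roughly, only checks that take longer than 60 seconds but no longer than 200 should behave differently after the change. A minimal Python sketch, assuming you already have (service, execution time in seconds) pairs, for example from a profiling script; the function names and sample numbers are hypothetical.

```python
from collections import Counter

OLD_TIMEOUT = 60   # seconds, previous service_check_timeout
NEW_TIMEOUT = 200  # seconds, new service_check_timeout


def timeout_band(seconds: float) -> str:
    """Classify one execution time relative to the old and new timeouts."""
    if seconds <= OLD_TIMEOUT:
        return "completes under both"
    if seconds <= NEW_TIMEOUT:
        return "only completes with 200s"
    return "times out under both"


def summarize(durations: dict[str, float]) -> Counter:
    """Count how many services fall into each timeout band."""
    return Counter(timeout_band(t) for t in durations.values())


if __name__ == "__main__":
    # Hypothetical sample data, not measurements from this thread.
    sample = {"disk /": 0.4, "http": 2.1, "backup job": 95.0, "slow db": 240.0}
    for band, count in summarize(sample).items():
        print(f"{band}: {count}")
```

If the "only completes with 200s" band is nearly empty, this hypothesis alone cannot explain a large drop in scheduled checks.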
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald »

There's a cool script here that can analyze your check times:

https://exchange.nagios.org/directory/P ... me/details

You will need to modify the line that points to your log file, but otherwise it usually works right out of the box. Not sure what you are looking for specifically, but the output should be related.

Post by nagmoto »

Hi Tom

1. Thanks for the pointer to the script. I am able to see a list of checks and their time spent.
2. Still, I don't understand why changing service_check_timeout in /etc/nagios/nagios.cfg from 60 to 200 seconds would reduce the number of scheduled checks from 1420 to 150. The profiling Perl script I ran shows that very few checks take longer than 60 seconds.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Post by jdalrymple »

If you're not seeing many checks run up against the timeout, then I am not sure what is going on here. If you look at the Nagios process statistics in the UI, do they correlate well with what your MRTG graph indicates? I am a bit put off by how ridiculously flat your graph is; it seems overly ideal.
nagmoto wrote: So once the service timeout was set to 200, those checks were able to finish executing instead of being resubmitted on the next run. If this is the case, can I look up those timeout records in the nagios.log* files?
Once logged, it's logged and doesn't change. You should be able to grep the log archives for timeouts so you can compare the period before the change with the current one. I'm not sure I expect that to explain your situation, though.
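The grep-and-compare step could be sketched in Python. It relies on each nagios.log line starting with a bracketed Unix timestamp (e.g. `[1445100000] SERVICE ALERT: ...`) and assumes that timeout entries contain the text "timed out"; check a real timeout entry in your own logs and adjust the match if needed. The cutoff value and sample lines are made up for illustration.

```python
import re
from typing import Iterable

# nagios.log lines start with a bracketed Unix timestamp, e.g. "[1445100000] ..."
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def count_timeouts(lines: Iterable[str], cutoff: int) -> tuple[int, int]:
    """Return (timeouts before cutoff, timeouts at or after cutoff).

    Assumes timeout entries contain the phrase "timed out" (case-insensitive);
    verify this against an actual timeout entry in your logs.
    """
    before = after = 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match or "timed out" not in match.group(2).lower():
            continue  # not a log entry, or not a timeout
        if int(match.group(1)) < cutoff:
            before += 1
        else:
            after += 1
    return before, after


if __name__ == "__main__":
    # Made-up sample lines; in practice pass open log files instead.
    sample = [
        "[1445100000] SERVICE ALERT: host1;svc1;CRITICAL;SOFT;1;(Service Check Timed Out)",
        "[1445100100] SERVICE ALERT: host1;svc2;OK;HARD;1;OK - fine",
        "[1445200000] SERVICE ALERT: host1;svc1;CRITICAL;SOFT;1;(Service Check Timed Out)",
    ]
    print(count_timeouts(sample, cutoff=1445150000))
```

To scan the archives, `fileinput.input(files=[...])` can be passed as `lines`, with `cutoff` set to the epoch time of the configuration change.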

Let us know what you find.