Hi
As seen in the attached screenshot, around 10:30am today service_check_timeout was increased from the default of 60 seconds to 200.
After this change, the number of active service checks dropped a lot.
Can you help me understand why?
Interpreting the change service_check_timeout
Box293
Re: Interpreting the change service_check_timeout
How is this graph being generated? Is this a service for localhost? If so, what plugin is used?
Re: Interpreting the change service_check_timeout
Box293 wrote: How is this graph being generated?
MRTG graphing from the output of the /usr/bin/nagiostats command:
Code:
[root@nagios ~]# egrep -v '^#|^$' /etc/nagios/mrtg.cfg
WorkDir: /usr/share/nagios/html/stats
<snipped>
Target[nagios-j]: `/usr/bin/nagiostats --mrtg --data=NUMSACTSVCCHECKS5M,NUMOACTSVCCHECKS5M,PROGRUNTIME,NAGIOSVERPID`
MaxBytes[nagios-j]: 7000
Title[nagios-j]: Active Service Checks
PageTop[nagios-j]: <H1>Active Service Checks</H1>
Options[nagios-j]: growright,gauge,nopercent
YLegend[nagios-j]: Checks
ShortLegend[nagios-j]:
LegendI[nagios-j]: Scheduled Checks:
LegendO[nagios-j]: On-Demand Checks:
<snipped>
[root@nagios ~]#
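To check the raw numbers behind the graph without going through MRTG, the same nagiostats call can be run and parsed by hand. The following is only a sketch: it assumes Python 3.7+ is available on the host (CentOS 6 does not ship it by default) and that nagiostats prints one value per line in --mrtg mode, which is its default delimiter. The function names are made up for illustration.

```python
import subprocess

# Variable names copied from the Target[nagios-j] line above.
MRTG_VARS = ["NUMSACTSVCCHECKS5M", "NUMOACTSVCCHECKS5M", "PROGRUNTIME", "NAGIOSVERPID"]


def parse_mrtg_output(output, names=MRTG_VARS):
    """Pair each requested variable with the matching output line (kept as strings)."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) != len(names):
        raise ValueError(f"expected {len(names)} lines, got {len(lines)}")
    return dict(zip(names, lines))


def read_nagiostats(binary="/usr/bin/nagiostats", names=MRTG_VARS):
    """Run nagiostats in MRTG mode and return the parsed values."""
    cmd = [binary, "--mrtg", "--data=" + ",".join(names)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_mrtg_output(result.stdout, names)
```

Calling read_nagiostats() on the Nagios host and printing the result every few minutes should show whether NUMSACTSVCCHECKS5M really sits at the values the graph reports.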
Box293 wrote: Is this a service for localhost?
Yes, on localhost. The system is CentOS 6.x, using its vendor Nagios 3.5.1 package.
Box293 wrote: If so what is the plugin used?
I followed page 275 of Tom Ryder's "Nagios Core Administration Cookbook".
Re: Interpreting the change service_check_timeout
It looks like many service checks needed between 60 and 200 seconds to complete.
So once the service timeout was set to 200, those checks were able to finish without being resubmitted for the next run.
If this is the case, can I look up those timeout records in the nagios.log* files?
Re: Interpreting the change service_check_timeout
There's a cool script here that can analyze your check times:
https://exchange.nagios.org/directory/P ... me/details
You will need to modify the line that points to your log file, but otherwise it usually works right out of the box. Not sure what you are looking for specifically, but the output should be related.
Re: Interpreting the change service_check_timeout
Hi Tom
1. Thanks for the pointer to the script. I am able to see a list of checks and the time each one spent.
2. Still, I don't understand why changing service_check_timeout from 60 to 200 seconds in /etc/nagios/nagios.cfg would reduce the number of scheduled checks from 1420 to 150. The profiling Perl script I ran shows that very few checks take longer than 60 seconds.
jdalrymple
Re: Interpreting the change service_check_timeout
If you're not seeing many checks run up against the timeout, then I am not sure what is going on here. If you look at the Nagios process statistics in the UI, do they correlate well with what your MRTG graph indicates? I am a bit put off by how ridiculously flat your graph is; it seems overly ideal.
nagmoto wrote: So once the service timeout was set to 200, those checks were able to finish without being resubmitted for the next run. If this is the case, can I look up those timeout records in the nagios.log* files?
Once logged, it's logged and doesn't change. You should be able to grep the log archives for timeouts so you can compare the period before the change with the current one. I don't know that I expect that to explain your situation, though.
Let us know what you find.
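As a rough way to do that before/after comparison, the timeout lines can be counted per hour. This sketch relies on Nagios log lines starting with a Unix timestamp in square brackets; the search string "timed out" is an assumption about the wording of the timeout warning in your version, so check a real log line and adjust it. The function name is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Nagios log lines start with a Unix timestamp in square brackets.
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def timeouts_per_hour(lines, needle="timed out"):
    """Count log lines containing `needle` (case-insensitive), bucketed by UTC hour.

    `needle` is an assumed substring of the timeout warning, not a confirmed format.
    """
    needle = needle.lower()
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or needle not in match.group(2).lower():
            continue
        stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Feeding it the lines of the current log and of each archived nagios.log* file (the exact paths depend on your package), then printing the sorted items, should make it obvious whether the number of timeouts changed around 10:30am.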