Some service checks need to run longer than 1 minute

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Post by dlukinski »

Hello XI support

Need help with this one. Could you explain (in detail, if possible) how we can let some service checks run well over 1 minute before they time out?
We have an existing environment with over 500 hosts and 6.5k+ checks.

Now, after integrating with Selenium, our QA programmers are telling us that checks finishing in under 1 minute are completely unrealistic (they need a lot longer, possibly double-digit minutes).
- So is there a way to let only SOME service checks run longer than 1 minute (so that they do not time out)?


If not, and this is only a global setting, what should we take into consideration when changing it?
We have many retries configured to happen every minute or every two minutes (some are without templates).
Even if the 1-minute retries could be changed to every 2 minutes, many checks run every 3-5 minutes and have to stay that way.
So we are really unsure how to approach changing the global setting, if that is required.
------------------------------------------------------------------------------------------------
Maybe we should make a ticket out of it?

Would this approach be correct?
- https://deadlockprocess.wordpress.com/2 ... tosrhel-5/
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Post by gormank »

Look at the service definitions and see what command runs them. If it's check_nrpe, for example, you can create a check_nrpe_long (or whatever) command that uses a longer timeout, or make the timeout one of the command's arguments.

I'd guess the timeout needs to be less than the check interval, or there will be problems.
dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Post by dlukinski »

gormank wrote:Look at the service definitions and see what command runs them. If it's check_nrpe, for example, you can create a check_nrpe_long (or whatever) command that uses a longer timeout, or make the timeout one of the command's arguments.

I'd guess the timeout needs to be less than the check interval, or there will be problems.
This does not work so far. With the command line

$USER1$/check_selenium -t 300 --script=$USER1$/$ARG1$

the check still fails with: (Service check timed out after 60.01 seconds)
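The 60.01-second result despite -t 300 fits how the core-wide limit works: Nagios kills any service check that runs past service_check_timeout in nagios.cfg, whatever the plugin's own -t says. A minimal sketch of that rule, assuming the check stops at whichever limit is reached first; effective_timeout is a hypothetical helper for illustration, not part of Nagios.

```python
from typing import Optional


def effective_timeout(plugin_timeout: Optional[float], service_check_timeout: float) -> float:
    """Return how many seconds a check may run before it is stopped.

    plugin_timeout is the plugin's own limit (e.g. the value passed with -t),
    or None if the plugin has no built-in timeout.
    service_check_timeout is the core-wide limit from nagios.cfg.
    Whichever limit is reached first ends the check.
    """
    if plugin_timeout is None:
        # No built-in timeout: only the core-wide limit applies.
        return service_check_timeout
    return min(plugin_timeout, service_check_timeout)


# The situation reported above: check_selenium -t 300 with service_check_timeout=60.
print(effective_timeout(300, 60))    # 60
# After raising service_check_timeout to 1800 seconds (an example value).
print(effective_timeout(300, 1800))  # 300
```

In other words, a per-command -t can only shorten a check below the global limit, never extend it.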
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Post by gormank »

# grep 60 /usr/local/nagios/etc/nagios.cfg
host_freshness_check_interval=60
interval_length=60
max_check_result_file_age=3600
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=60
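The directive that matters here is service_check_timeout=60. A minimal sketch for pulling specific directives out of nagios.cfg instead of grepping for a number, assuming the plain name=value format shown above with # comment lines; read_directives is a hypothetical helper. interval_length=60 also matters, because check and retry intervals in service definitions are counted in units of that many seconds.

```python
def read_directives(text, wanted):
    """Return {name: value} for the requested directives in nagios.cfg text.

    Blank lines and lines starting with # are skipped. In this sketch,
    if a directive appears more than once, the last occurrence wins.
    """
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name, value = name.strip(), value.strip()
        if name in wanted:
            found[name] = value
    return found


sample = """\
# excerpt matching the grep output above
host_freshness_check_interval=60
interval_length=60
service_check_timeout=60
"""

settings = read_directives(sample, {"service_check_timeout", "interval_length"})
print(settings)  # {'interval_length': '60', 'service_check_timeout': '60'}
```

On a server, the text would come from reading /usr/local/nagios/etc/nagios.cfg, the path used in the grep above.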
dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Post by dlukinski »

gormank wrote:# grep 60 /usr/local/nagios/etc/nagios.cfg
host_freshness_check_interval=60
interval_length=60
max_check_result_file_age=3600
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=60
That would impact all service checks. Is that OK?
- Does it mean we have to specify timeouts manually for all the other checks so they don't fall back to the default value?
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Post by gormank »

To answer those questions, you need to look at your service definitions, as I suggested earlier.
What timeouts are defined in them?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

Most plugins should have a built-in default timeout, so increasing the system-wide service check timeout will not affect those plugins.
You would have to monitor the nagios.log file for the 60-second service check timeouts you are currently getting, then edit those checks and add a timeout to them.
Then, when you increase the system-wide timeout, those checks will not take the longer time to time out.
Be sure to check out our Knowledgebase for helpful articles and solutions!
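A minimal sketch of the nagios.log review described above, assuming the standard Nagios Core SERVICE ALERT line format (host;service;state;state type;attempt;output) and that core-enforced timeouts carry the "Service check timed out" text seen earlier in this thread. The host and service names are made up, and timed_out_services is a hypothetical helper. On an XI server the log is usually /usr/local/nagios/var/nagios.log, but verify the path on your system.

```python
import re
from collections import Counter

# Assumed Nagios Core log format:
# [epoch] SERVICE ALERT: host;service;state;state type;attempt;output
ALERT = re.compile(r"^\[\d+\] SERVICE ALERT: ([^;]*);([^;]*);[^;]*;[^;]*;[^;]*;(.*)$")


def timed_out_services(lines):
    """Count core-enforced timeouts per (host, service) in nagios.log lines."""
    counts = Counter()
    for line in lines:
        match = ALERT.match(line.rstrip("\n"))
        if match and "Service check timed out" in match.group(3):
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Hypothetical log lines for illustration.
sample_log = [
    "[1455551234] SERVICE ALERT: web01;Selenium Login;CRITICAL;SOFT;1;(Service check timed out after 60.01 seconds)",
    "[1455551294] SERVICE ALERT: web01;HTTP;OK;HARD;1;HTTP OK: HTTP/1.1 200 OK",
    "[1455551354] SERVICE ALERT: web01;Selenium Login;CRITICAL;SOFT;2;(Service check timed out after 60.01 seconds)",
]

for (host, service), n in timed_out_services(sample_log).items():
    print(f"{host};{service}: {n} timeout(s)")
# web01;Selenium Login: 2 timeout(s)
```

Because iterating an open file yields its lines, the same function can be given the real log file object directly. The services it reports are the candidates for an explicit per-command timeout.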
dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Post by dlukinski »

gormank wrote:To answer those questions, you need to look at your service definitions, as I suggested earlier.
What timeouts are defined in them?
In production it is 60 seconds.

What we are trying to understand is the impact on all the other checks, which have 1-2 minute retry intervals and 5-10 minute check intervals, when only the Selenium checks specifically may require 10-30 minute timeouts.
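The rule of thumb earlier in the thread (the timeout should be less than the check interval) can be checked mechanically against a list of services. A minimal sketch, assuming interval_length=60 so intervals are in minutes, that a check can run until the lower of its plugin's own timeout and service_check_timeout, and that the shorter of check_interval and retry_interval is the one to compare against. The service names and numbers are hypothetical, and interval_conflicts is not a Nagios function.

```python
from typing import Iterator, NamedTuple, Optional, Tuple


class Service(NamedTuple):
    name: str
    check_interval: float            # interval_length units (minutes when interval_length=60)
    retry_interval: float            # same units as check_interval
    plugin_timeout: Optional[float]  # seconds; None if the plugin has no built-in timeout


def interval_conflicts(
    services, service_check_timeout: float, interval_length: float = 60
) -> Iterator[Tuple[str, float, float]]:
    """Yield (name, worst_case_seconds, shortest_interval_seconds) for each
    service that could still be running when its next check is due."""
    for svc in services:
        # A check ends at whichever limit is reached first.
        if svc.plugin_timeout is None:
            worst = service_check_timeout
        else:
            worst = min(svc.plugin_timeout, service_check_timeout)
        # Compare against the shorter of the normal and retry intervals.
        shortest = min(svc.check_interval, svc.retry_interval) * interval_length
        if worst >= shortest:
            yield svc.name, worst, shortest


services = [
    Service("Selenium Login", 30, 30, 1200),
    Service("Selenium Checkout", 15, 5, 1200),
    Service("Disk /", 5, 1, 10),
    Service("Legacy script", 5, 2, None),
]

# Example: service_check_timeout raised to 1800 seconds (30 minutes).
for name, worst, shortest in interval_conflicts(services, 1800):
    print(f"{name}: may run {worst:.0f}s, next run due after {shortest:.0f}s")
# Selenium Checkout: may run 1200s, next run due after 300s
# Legacy script: may run 1800s, next run due after 120s
```

With the example numbers, the sketch flags a Selenium check whose 5-minute retry is shorter than its 20-minute plugin timeout, and a plugin with no built-in timeout, which inherits the full raised global value. Checks with their own short timeouts are unaffected by the increase.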
dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Post by dlukinski »

tgriep wrote:Most plugins should have a built-in default timeout, so increasing the system-wide service check timeout will not affect those plugins.
You would have to monitor the nagios.log file for the 60-second service check timeouts you are currently getting, then edit those checks and add a timeout to them.
Then, when you increase the system-wide timeout, those checks will not take the longer time to time out.

So if I get this right:

1. We increase the system-wide timeout to whatever value we need.
2. We monitor the logs: pretty much only one script, check_selenium (there may be more of them later), would require long timeouts.
3. All other checks should not take longer to time out after the increase is made. What about checks we create after the increase?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

None of the other checks will take longer to run. The system-wide timeout setting is there for plugins that don't have a built-in timeout; it keeps those from running too long.