
Active Check Following a Non-OK Check Occurs Too Soon

Posted: Wed Nov 01, 2023 12:19 pm
by DStackley
Nagios Core 4.4.4

With the following parameters set in the cfg file for a service check:

check_period 24x7
max_check_attempts 4
check_interval 7
retry_interval 6

The following behavior was observed:
[10-29-2023 19:09:01] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;1;<summary>
[10-29-2023 19:10:31] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;2;<summary>
[10-29-2023 19:15:24] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;3;<summary>
[10-29-2023 19:21:24] SERVICE ALERT: <hostname>;<service description>;CRITICAL;HARD;4;<summary>

The time between SOFT 1 and SOFT 2 was 1 minute 30 seconds; it should have been 6 minutes.
The time between SOFT 2 and SOFT 3 was 4 minutes 53 seconds; it should have been 6 minutes.
The time between SOFT 3 and HARD 4 was 6 minutes 0 seconds, which is correct.
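For anyone wanting to reproduce the arithmetic above, here is a minimal sketch that parses the four alert timestamps and compares each gap with the expected retry spacing. It assumes `interval_length` in nagios.cfg is left at its default of 60 seconds, so `retry_interval 6` means 360 seconds; the helper name `retry_deltas` is purely illustrative.

```python
from datetime import datetime

# Timestamps and attempt numbers copied from the SERVICE ALERT lines above.
ALERTS = [
    ("10-29-2023 19:09:01", "SOFT", 1),
    ("10-29-2023 19:10:31", "SOFT", 2),
    ("10-29-2023 19:15:24", "SOFT", 3),
    ("10-29-2023 19:21:24", "HARD", 4),
]

RETRY_INTERVAL = 6     # retry_interval from the service definition
INTERVAL_LENGTH = 60   # assumption: interval_length left at its default (seconds per unit)


def retry_deltas(alerts):
    """Return (from_attempt, to_attempt, seconds) for each consecutive pair of alerts."""
    times = [datetime.strptime(ts, "%m-%d-%Y %H:%M:%S") for ts, _, _ in alerts]
    return [
        (alerts[i][2], alerts[i + 1][2], int((times[i + 1] - times[i]).total_seconds()))
        for i in range(len(alerts) - 1)
    ]


expected = RETRY_INTERVAL * INTERVAL_LENGTH
for start, end, secs in retry_deltas(ALERTS):
    print(f"attempt {start} -> {end}: {secs // 60}m {secs % 60:02d}s "
          f"(expected {expected // 60}m, off by {secs - expected:+d}s)")
```

Only the last gap (360 s) matches the configured retry spacing; the first two come in 270 s and 67 s early.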

The Nagios server involved is a RHEL 7 virtual machine with almost no load or CPU utilization.

The Nagios Program-Wide Performance Information chart is below:
[Attachment: 2023-11-01 09_54_40-Nagios nflmsnagprvs001 - Program-Wide Performance Information.jpg]
If anyone can offer insight into why Nagios runs service retry checks earlier than scheduled, it would be greatly appreciated. If additional information is needed, please let me know.

Thanks in Advance!

Re: Active Check Following a Non-OK Check Occurs Too Soon

Posted: Thu Nov 02, 2023 9:36 am
by gwesterman
Hi @DStackley,

It would help to get a look at your service cfg file itself and your main configuration file (nagios.cfg). In the meantime, this article that covers host and service check scheduling might be of assistance.

Thank you!

Re: Active Check Following a Non-OK Check Occurs Too Soon

Posted: Mon Nov 06, 2023 5:28 pm
by DStackley
Hi @gwesterman,

I have attached the nagios.cfg file. By "service cfg file", I assume you mean all of the cfg files that contain the service definitions. That would be 146 files, which I would need to sanitize before posting. One metric you may be interested in is the number of active service checks: there are 1,553 of them, with an average check interval of 6.5 minutes.
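To put those numbers in perspective, a quick back-of-the-envelope calculation of the average check rate follows. The inter-check delay formula (average check interval divided by the number of services) is our reading of the "smart" delay method described in the Nagios Core check-scheduling documentation, assumed here to be in use; as we understand it, that delay mainly spreads out initial scheduling rather than governing retries.

```python
# Figures reported in the post above.
ACTIVE_SERVICE_CHECKS = 1553
AVG_CHECK_INTERVAL_MIN = 6.5

avg_interval_s = AVG_CHECK_INTERVAL_MIN * 60  # 390 seconds

# Average number of service checks Nagios must launch per second.
checks_per_second = ACTIVE_SERVICE_CHECKS / avg_interval_s

# Assumption: "smart" inter-check delay = average check interval / total services.
inter_check_delay_s = avg_interval_s / ACTIVE_SERVICE_CHECKS

print(f"~{checks_per_second:.2f} checks/s, "
      f"~{inter_check_delay_s:.3f} s between check launches")
```

Roughly four checks per second is a modest rate for an otherwise idle server.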

Please let me know if the information provided is enough or if you still require the service cfgs.

Thanks,
ds

Re: Active Check Following a Non-OK Check Occurs Too Soon

Posted: Tue Nov 07, 2023 1:40 am
by kg2857
The OP may want to look into the Nagios scheduler and its timing, which is somewhat inexact because it has to schedule many services within a limited time.

Re: Active Check Following a Non-OK Check Occurs Too Soon

Posted: Sun Dec 24, 2023 4:54 am
by clausxebec
gwesterman wrote: Thu Nov 02, 2023 9:36 am Hi @DStackley,
It would help to get a look at your service cfg file itself and your main configuration file (nagios.cfg). In the meantime, this article that covers host and service check scheduling might be of assistance.

Thank you!
Thank you for your assistance

Re: Active Check Following a Non-OK Check Occurs Too Soon

Posted: Mon Dec 25, 2023 2:19 am
by bekean23
Check freshness settings allow Nagios to consider service check results "stale" if they haven't been updated within a certain timeframe. Adjusting freshness thresholds can affect how often checks are scheduled.
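The staleness idea can be sketched as a simple predicate. This is a simplified illustration, not Nagios source code: `ServiceState`, `is_stale`, and `fallback_threshold` are hypothetical names, and `fallback_threshold` merely stands in for the threshold Nagios calculates on its own when `freshness_threshold` is 0 or unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceState:
    last_check: float                      # epoch seconds of the last check result
    check_freshness: bool                  # mirrors the check_freshness directive
    freshness_threshold: Optional[float]   # seconds; 0 or None means "not set"


def is_stale(svc: ServiceState, now: float, fallback_threshold: float) -> bool:
    """Simplified test: a result is stale if it is older than the threshold."""
    if not svc.check_freshness:
        return False
    # Hypothetical stand-in: use the fallback when no explicit threshold is set.
    threshold = svc.freshness_threshold or fallback_threshold
    return now - svc.last_check > threshold


svc = ServiceState(last_check=1000.0, check_freshness=True, freshness_threshold=600.0)
print(is_stale(svc, now=1500.0, fallback_threshold=300.0))  # 500 s old: fresh
print(is_stale(svc, now=1700.0, fallback_threshold=300.0))  # 700 s old: stale
```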

Re: Active Check Following a Non-OK Check Occurs Too Soon

Posted: Thu Feb 22, 2024 9:59 pm
by otisjame
clausxebec wrote: Sun Dec 24, 2023 4:54 am
gwesterman wrote: Thu Nov 02, 2023 9:36 am Hi @DStackley,
It would help to get a look at your service cfg file itself and your main configuration file (nagios.cfg). In the meantime, this article that covers host and service check scheduling might be of assistance.

Thank you!
Thank you for your assistance
Thanks for the information; I will try to look into it further.