Active Check Following a Non-OK Check Occurs Too Soon

DStackley
Posts: 2
Joined: Fri Oct 25, 2013 12:28 pm

Active Check Following a Non-OK Check Occurs Too Soon

Post by DStackley »

Nagios Core 4.4.4

With the following parameters set in the cfg file for a service check:

check_period 24x7
max_check_attempts 4
check_interval 7
retry_interval 6

The following behavior was observed:
[10-29-2023 19:09:01] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;1;<summary>
[10-29-2023 19:10:31] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;2;<summary>
[10-29-2023 19:15:24] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;3;<summary>
[10-29-2023 19:21:24] SERVICE ALERT: <hostname>;<service description>;CRITICAL;HARD;4;<summary>

The interval between SOFT 1 and SOFT 2 was 1 minute 30 seconds (expected: 6 minutes).
The interval between SOFT 2 and SOFT 3 was 4 minutes 53 seconds (expected: 6 minutes).
The interval between SOFT 3 and HARD 4 was 6 minutes 0 seconds, which is correct.
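
For anyone who wants to re-derive these gaps from the log timestamps, here is a minimal Python sketch. It is not part of the original post. It assumes interval_length in nagios.cfg is left at its default of 60 seconds, so that retry_interval 6 means 360 seconds. The names ALERTS and retry_deltas are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the SERVICE ALERT lines in this post.
ALERTS = [
    ("SOFT 1", "10-29-2023 19:09:01"),
    ("SOFT 2", "10-29-2023 19:10:31"),
    ("SOFT 3", "10-29-2023 19:15:24"),
    ("HARD 4", "10-29-2023 19:21:24"),
]

RETRY_INTERVAL = 6     # retry_interval from the service definition
INTERVAL_LENGTH = 60   # ASSUMPTION: interval_length is the default 60 seconds


def retry_deltas(alerts, fmt="%m-%d-%Y %H:%M:%S"):
    """Return (from_label, to_label, gap_in_seconds) for consecutive alerts."""
    times = [(label, datetime.strptime(stamp, fmt)) for label, stamp in alerts]
    return [
        (a_label, b_label, int((b - a).total_seconds()))
        for (a_label, a), (b_label, b) in zip(times, times[1:])
    ]


expected = RETRY_INTERVAL * INTERVAL_LENGTH
for start, end, observed in retry_deltas(ALERTS):
    print(f"{start} -> {end}: observed {observed}s, expected {expected}s, "
          f"short by {expected - observed}s")
```

Under that assumption the first retry came 270 seconds early and the second 67 seconds early, while the third matched the configured 360 seconds.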

The Nagios server involved is a Linux RHEL 7 VM server with almost no load or CPU utilization.

The Nagios Program-Wide Performance Information chart is below:
[Attached image: 2023-11-01 09_54_40-Nagios nflmsnagprvs001 - Program-Wide Performance Information.jpg]
If anyone can offer insight into why Nagios service checks fire earlier than configured during a retry, it would be greatly appreciated. If additional information is needed, please let me know.

Thanks in Advance!
gwesterman
Posts: 202
Joined: Wed Aug 23, 2023 11:29 am

Re: Active Check Following a Non-OK Check Occurs Too Soon

Post by gwesterman »

Hi @DStackley,

It would help to get a look at your service cfg file itself and your main configuration file (nagios.cfg). In the meantime, this article that covers host and service check scheduling might be of assistance.

Thank you!
DStackley
Posts: 2
Joined: Fri Oct 25, 2013 12:28 pm

Re: Active Check Following a Non-OK Check Occurs Too Soon

Post by DStackley »

Hi @gwesterman,

I have attached the nagios.cfg file. By service cfg file, you are likely referring to all of the cfg files that contain the service definitions. That would be 146 files, which would need to be sanitized before I can post them. One metric you may be looking for is the number of active service checks: there are 1,553 in total, with an average check interval of 6.5 minutes across all of them.
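
As a rough sanity check on scheduler load, those two figures imply an average check rate. The Python sketch below is illustrative only and not part of the original post. It assumes every check runs at its normal check_interval, ignores retries, and assumes checks are spread evenly over time. The function name checks_per_second is hypothetical.

```python
ACTIVE_SERVICE_CHECKS = 1553     # from the post
AVG_CHECK_INTERVAL_MIN = 6.5     # from the post, in minutes


def checks_per_second(num_checks, avg_interval_minutes):
    """Average checks started per second if each check runs once per interval."""
    return num_checks / (avg_interval_minutes * 60)


rate = checks_per_second(ACTIVE_SERVICE_CHECKS, AVG_CHECK_INTERVAL_MIN)
print(f"{rate:.2f} checks per second on average")
```

This works out to roughly four checks per second on average, which only describes steady-state load and says nothing by itself about why individual retries ran early.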

Please let me know if the information provided is enough or if you still require the service cfgs.

Thanks,
ds
Attachments
nagios.cfg
kg2857
Posts: 304
Joined: Wed Apr 12, 2023 5:48 pm

Re: Active Check Following a Non-OK Check Occurs Too Soon

Post by kg2857 »

The OP may want to look into the Nagios scheduler and its timing, which is somewhat inexact because it has many services to schedule in a limited time.
clausxebec
Posts: 2
Joined: Sun Dec 24, 2023 4:51 am

Re: Active Check Following a Non-OK Check Occurs Too Soon

Post by clausxebec »

gwesterman wrote: Thu Nov 02, 2023 9:36 am Hi @DStackley,
It would help to get a look at your service cfg file itself and your main configuration file (nagios.cfg). In the meantime, this article that covers host and service check scheduling might be of assistance.

Thank you!
Thank you for your assistance
bekean23
Posts: 12
Joined: Sat Jul 01, 2023 11:39 pm

Re: Active Check Following a Non-OK Check Occurs Too Soon

Post by bekean23 »

Check freshness settings allow Nagios to treat a service check result as "stale" if it has not been updated within a certain timeframe. Adjusting freshness thresholds can affect how often checks are scheduled.
otisjame
Posts: 9
Joined: Mon Sep 18, 2023 11:54 pm

Re: Active Check Following a Non-OK Check Occurs Too Soon

Post by otisjame »

clausxebec wrote: Sun Dec 24, 2023 4:54 am
gwesterman wrote: Thu Nov 02, 2023 9:36 am Hi @DStackley,
It would help to get a look at your service cfg file itself and your main configuration file (nagios.cfg). In the meantime, this article that covers host and service check scheduling might be of assistance.

Thank you!
Thank you for your assistance
Thanks for the information. I will try to look into it further.