Nagios Core 4.4.4
With the following parameters set in the cfg file for a service check:
check_period 24x7
max_check_attempts 4
check_interval 7
retry_interval 6
The following behavior was observed:
[10-29-2023 19:09:01] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;1;<summary>
[10-29-2023 19:10:31] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;2;<summary>
[10-29-2023 19:15:24] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;3;<summary>
[10-29-2023 19:21:24] SERVICE ALERT: <hostname>;<service description>;CRITICAL;HARD;4;<summary>
The time between Soft 1 and Soft 2 was 1 minute 30 seconds; it should have been 6 minutes.
The time between Soft 2 and Soft 3 was 4 minutes 53 seconds; it should have been 6 minutes.
The time between Soft 3 and Hard 4 was 6 minutes 0 seconds, which matches the 6-minute retry_interval.
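The deltas quoted above can be reproduced from the alert timestamps with a short script. This is only a sketch: it assumes the log dates are in MM-DD-YYYY format and that interval_length in nagios.cfg is at its default of 60 seconds per interval unit (the posted values do not confirm either). The helper name retry_deltas is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the SERVICE ALERT lines above.
ALERTS = [
    ("SOFT 1", "10-29-2023 19:09:01"),
    ("SOFT 2", "10-29-2023 19:10:31"),
    ("SOFT 3", "10-29-2023 19:15:24"),
    ("HARD 4", "10-29-2023 19:21:24"),
]

RETRY_INTERVAL = 6    # retry_interval from the service definition
INTERVAL_LENGTH = 60  # assumption: nagios.cfg interval_length default (seconds)


def retry_deltas(alerts, retry_interval, interval_length):
    """Return (from, to, observed_seconds, drift_seconds) for each retry."""
    expected = retry_interval * interval_length
    times = [
        (label, datetime.strptime(stamp, "%m-%d-%Y %H:%M:%S"))
        for label, stamp in alerts
    ]
    rows = []
    for (prev_label, prev), (label, cur) in zip(times, times[1:]):
        observed = int((cur - prev).total_seconds())
        rows.append((prev_label, label, observed, observed - expected))
    return rows


for prev_label, label, observed, drift in retry_deltas(
    ALERTS, RETRY_INTERVAL, INTERVAL_LENGTH
):
    minutes, seconds = divmod(observed, 60)
    print(f"{prev_label} -> {label}: {minutes}m {seconds:02d}s (drift {drift:+d}s)")
```

A negative drift means the retry ran earlier than retry_interval would suggest.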
The Nagios server involved is a Linux RHEL 7 VM server with almost no load or CPU utilization.
The Nagios Program-Wide Performance Information chart is below:
If anyone can give insight as to why Nagios service checks fire before they are supposed to during a retry, any information would be greatly appreciated. Also, if additional information is needed, please let me know.
Thanks in Advance!
Active Check Following a Non-OK Check Occurs Too Soon
Re: Active Check Following a Non-OK Check Occurs Too Soon
Hi @DStackley,
It would help to get a look at your service cfg file itself and your main configuration file (nagios.cfg). In the meantime, this article that covers host and service check scheduling might be of assistance.
Thank you!
Re: Active Check Following a Non-OK Check Occurs Too Soon
Hi @gwesterman,
I have attached the nagios.cfg file. By service cfg file, you are likely referring to all of the cfg files that contain the service descriptions. That would be 146 files, which would need to be sanitized before I can post them. One metric you may be looking for is the number of active service checks: there are 1,553 in total, with an average check interval of 6.5 minutes across all of them.
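For a rough sense of scheduler load, the two figures above (1,553 active checks, 6.5-minute average interval) translate into checks per second and an average spacing between checks. The sketch below is plain arithmetic. Treating the spacing as the service inter-check delay assumes the "smart" delay method described in the Nagios check scheduling documentation (average check interval divided by total services), which this thread does not confirm is in use. The function name check_load is hypothetical.

```python
TOTAL_ACTIVE_SERVICE_CHECKS = 1553      # figure quoted in the post
AVERAGE_CHECK_INTERVAL_MINUTES = 6.5    # figure quoted in the post


def check_load(total_checks, avg_interval_minutes):
    """Return (checks_per_second, seconds_between_checks) for an even spread."""
    avg_interval_seconds = avg_interval_minutes * 60
    checks_per_second = total_checks / avg_interval_seconds
    # Assumption: an even spread, as with the "smart" inter-check delay method.
    seconds_between_checks = avg_interval_seconds / total_checks
    return checks_per_second, seconds_between_checks


rate, spacing = check_load(
    TOTAL_ACTIVE_SERVICE_CHECKS, AVERAGE_CHECK_INTERVAL_MINUTES
)
print(f"about {rate:.2f} checks per second, one check every {spacing:.3f} s")
```

At roughly four checks per second the scheduler has little slack between checks, which is relevant when judging how precisely a retry can be timed.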
Please let me know if the information provided is enough or if you still require the service cfgs.
Thanks,
ds
Attachments: nagios.cfg (45 KiB)
Re: Active Check Following a Non-OK Check Occurs Too Soon
The OP may want to look into the Nagios scheduler and its timing, which is somewhat inexact because it has many services to schedule in a limited time.
Re: Active Check Following a Non-OK Check Occurs Too Soon
gwesterman wrote: ↑Thu Nov 02, 2023 9:36 am
Hi @DStackley,
It would help to get a look at your service cfg file itself and your main configuration file (nagios.cfg). In the meantime, this article that covers host and service check scheduling might be of assistance.
Thank you!
Thank you for your assistance.
Re: Active Check Following a Non-OK Check Occurs Too Soon
Check freshness settings allow Nagios to consider service checks as "stale" if they haven't been updated within a certain timeframe. Adjusting freshness thresholds can impact how often checks are scheduled.
Re: Active Check Following a Non-OK Check Occurs Too Soon
clausxebec wrote: ↑Sun Dec 24, 2023 4:54 am
Thank you for your assistance.
Thanks for the information. I will try to look into it further.