a question about services.cfg

Locked
smcracraft
Posts: 35
Joined: Sat Sep 25, 2010 12:53 pm

a question about services.cfg

Post by smcracraft »

I still don't understand how best to use

check_interval
retry_interval
max_check_attempts

to measure something over a period so that it doesn't
massively alarm unnecessarily.

Can someone explain it in plain English non-geek-speak?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: a question about services.cfg

Post by tmcdonald »

check_interval:
Alright, so I have this thing I want to monitor. As long as it is checking out alright, I only want to check it every 5 minutes. So I'll set "check_interval 5" in that service's config file.
retry_interval:
Sometimes things don't check out alright, maybe there is a problem that needs to be looked into. If I check this thing and there *is* a problem, I want to re-check it every minute since it's important now, so I'll set "retry_interval 1".
max_check_attempts:
Just in case the thing turns out to be a temporary problem, I want to re-check it a few times before determining it really is something to worry about and start sending alert emails. 3 times should be enough (remembering that it will check every 1 minute, as per retry_interval) so I will set "max_check_attempts 3".
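As a rough sketch of the arithmetic behind those three settings (this is not Nagios code; the function names are made up for illustration, it assumes intervals are in minutes as with the default interval_length of 60 seconds, and it ignores scheduling latency), the delay before an alert can be worked out like this:

```python
def minutes_until_hard_state(retry_interval, max_check_attempts):
    """Minutes from the first failed check to the check that confirms the problem.

    The first failed check already counts as attempt 1, so only
    (max_check_attempts - 1) re-checks are needed, each retry_interval apart.
    """
    return (max_check_attempts - 1) * retry_interval


def worst_case_minutes_to_alert(check_interval, retry_interval, max_check_attempts):
    """Upper bound when the problem starts right after a successful check."""
    return check_interval + minutes_until_hard_state(retry_interval, max_check_attempts)


print(minutes_until_hard_state(1, 3))        # 2
print(worst_case_minutes_to_alert(5, 1, 3))  # 7
```

So with the example values above, a problem is confirmed about 2 minutes after it is first seen, and at most roughly 7 minutes after it actually starts.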
Former Nagios employee
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: a question about services.cfg

Post by eloyd »

I give a Nagios training presentation that talks about the "Nagios Timeline." Basically, take what Trevor just said and turn it into a lot of text and one picture:


If a service is OK it is in a HARD state.
In normal OK state, checks are performed at check_interval intervals.
If it then becomes non-OK, it is in a SOFT state.
  Further checks are made at a decreased interval (retry_interval)
Services remain in a SOFT state until they have had max_check_attempts successive non-OK attempts.
  At that point, they are in a HARD state equal to the last status (HARD WARNING or HARD CRITICAL).
  Further checks are made at the normal interval from now on (check_interval).
Services that are in a non-OK HARD state that then become OK go straight to a HARD OK state (a "hard recovery").
  A service that becomes OK while it is still in a SOFT non-OK state has a "soft recovery" instead; either way the attempt count starts over, and this process repeats.

This allows for “instantaneous” outages that don’t immediately trigger notifications or event handlers.
The picture shows checks running at the normal check_interval until something goes wrong, then a max_check_attempts of 3 being checked at the reduced retry_interval until the problem goes HARD, and then checks returning to check_interval until the service goes OK.
[Image: Nagios Timeline.png]
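For anyone who prefers to see the timeline above as numbers, here is a minimal sketch that walks a list of check results and prints when each check would run. It is only an illustration, not how Nagios itself is implemented: the `timeline` function and its tuple layout are invented for this example, times are in minutes, and scheduling latency is ignored.

```python
def timeline(results, check_interval=5, retry_interval=1, max_check_attempts=3):
    """Return (minute, result, attempt, minutes_to_next_check) for each check.

    results is a list of check outcomes such as "OK" or "CRITICAL".
    """
    rows = []
    minute = 0
    failures = 0  # consecutive non-OK results seen so far
    for result in results:
        if result == "OK":
            failures = 0
            attempt = 1
            wait = check_interval
        else:
            failures = min(failures + 1, max_check_attempts)
            attempt = failures
            # While attempts are still being counted, re-check quickly.
            # Once the count reaches max_check_attempts the problem is
            # HARD and the normal interval applies again.
            wait = retry_interval if failures < max_check_attempts else check_interval
        rows.append((minute, result, attempt, wait))
        minute += wait
    return rows


for row in timeline(["OK", "OK", "CRITICAL", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]):
    print(row)
```

With the default values, the checks land at minutes 0, 5, 10, 11, 12, 17 and 22: five minutes apart while OK, one minute apart for attempts 1 and 2, and back to five minutes once attempt 3 makes the problem HARD.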
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: a question about services.cfg

Post by slansing »

Hopefully this helps answer the OP's question. Thanks for the tips, guys. Let us know if you have further questions, smcracraft!