We are testing Nagios XI and have alerts set up for 10 services. I set max_check_attempts to 1 and was receiving about 20 alerts per day.
When I switch max_check_attempts to 3, the alerts disappear. I'm wondering if this is something to be concerned about or if this is typical behavior when checking services (HTTP, FTP, mail, etc.) over the internet. I don't want to mask any problems, but I'm curious why the services report being down 20 times per day, even if only momentarily.
Thanks,
Bill
setting max_check_attempts to 1 ?
Re: setting max_check_attempts to 1 ?
With max_check_attempts set to 3, Nagios rechecks the host or service 2 more times before sending an alert, so the problem has to persist across all 3 checks. With it set to 1, the first failed check is treated as a confirmed problem, so an alert goes out on every state change. If you have flap detection enabled, Nagios sends a single flapping notification and then temporarily suppresses notifications until the host/service state stabilizes. Notification settings are extremely flexible, so if there's something specific you're looking for, let us know.
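To make the difference concrete, here is a minimal Python sketch of the recheck behavior described above. It is a simplified model, not Nagios code: the function name `count_notifications` and the sample check history are made up for illustration, and it ignores recovery notifications, flap detection, and re-notification.

```python
def count_notifications(results, max_check_attempts):
    """Count problem notifications for a sequence of check results.

    results: iterable of booleans, True = check OK, False = check failed.
    Simplified model (an assumption, not Nagios source): one problem
    notification is sent when the number of consecutive failed checks
    reaches max_check_attempts. Recoveries, flapping, and repeat
    notifications are ignored.
    """
    notifications = 0
    consecutive_failures = 0
    for ok in results:
        if ok:
            # Any OK result resets the failure counter.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            # Notify only at the moment the threshold is reached.
            if consecutive_failures == max_check_attempts:
                notifications += 1
    return notifications


# Illustrative history: one momentary blip, then four failed checks in a row.
history = [True, False, True, True, False, False, False, False, True]
print(count_notifications(history, 1))  # blip and outage both alert
print(count_notifications(history, 3))  # only the sustained outage alerts
```

In this model the momentary blip only produces an alert when max_check_attempts is 1, while the sustained outage produces one alert under either setting.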
bmccarty12
Re: setting max_check_attempts to 1 ?
At this point I'm trying to understand what the alerts are telling me.
With max_check_attempts set to 1, I think it's telling me that 20 times per day one of the services we are monitoring is not responding.
Does that indicate some type of intermittent problem on our network?
Is max_check_attempts = 1 overly sensitive when checking services over the internet?
If I set max_check_attempts = 3 and the alerts stop, am I just masking a problem?
Re: setting max_check_attempts to 1 ?
It's hard to say. It depends on what the check is and where the warning and critical thresholds are set, and possibly also on the timeout settings for the check itself. If you want to detect every possible problem, leaving it at 1 would certainly do that. If you're more interested in "hard" states, where a site or service was truly in a problem state, then setting it to 3 would give a better representation. The checks are essentially snapshots of a host or service state taken every X minutes (X being your "check_interval"). Ultimately, you'll have to make the call for your monitoring environment. Some admins want to know every time something is in a "soft" problem state, and some only want notifications on a "hard" problem state.
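One way to judge whether 3 attempts would mask something you care about is to estimate how long a problem must last before it becomes a "hard" state. The Python sketch below assumes the standard Nagios Core behavior where rechecks after a first failure are scheduled at "retry_interval" (a directive not mentioned earlier in this thread) and that intervals are in minutes; the function name and the sample values are hypothetical.

```python
def minutes_until_hard_state(max_check_attempts, retry_interval):
    """Approximate delay between the first failed check and the hard state.

    Assumption: after the first failure, each recheck is scheduled
    retry_interval minutes after the previous one, and the hard state is
    reached on check number max_check_attempts.
    """
    return (max_check_attempts - 1) * retry_interval


# Illustrative values only; substitute the settings from your own services.
for attempts in (1, 3):
    delay = minutes_until_hard_state(attempts, retry_interval=1)
    print(f"max_check_attempts={attempts}: about {delay} minute(s) of rechecks before notifying")
```

Under these assumptions, a problem that clears before the rechecks finish never reaches a hard state, which is why short blips stop generating alerts at 3 attempts while longer outages still do.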