Passive Monitoring | Workflow | Dependencies | Thresholds

Post by **steffi** » Sun Mar 10, 2019 2:38 pm

Hi.
Even though I have read about Nagios, flapping, passive monitoring, and so on, I do not completely understand what is going on. My services are flapping and I want to understand exactly why this is happening.

Here is my setup:
- Passive checks are transmitted from the remote servers every 5 minutes (the default on an Ubuntu installation), which I would rather not touch if possible.
- Every time a passive check is processed, I see in the logs that the result is OK and the service state either stays OK (if it was OK before) or goes through a SOFT recovery (if it was not OK at the last check and is flapping).
- When flapping is detected, Nagios schedules an ACTIVE async check for execution 2 minutes later, which in my case triggers this:
check_command critical_stale_state!2!***Service is in STALE STATE. Passive Check Result HAS NOT BEEN RECEIVED from remote server***
I want to be informed as soon as possible when the main server stops receiving passive checks, so I send a NOT OK state to trigger notifications when this happens.
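
For readers following along: a command such as critical_stale_state is often a thin wrapper around the standard check_dummy plugin, which returns whatever state and text it is given. The definition below is only a sketch under that assumption. The actual command definition is not shown in this thread, and the $USER1$ plugin path is the conventional one, not one confirmed here.

```
# Hypothetical definition, assuming check_dummy is installed under $USER1$
define command {
    command_name    critical_stale_state
    # $ARG1$ = state to return (2 = CRITICAL), $ARG2$ = message text
    command_line    $USER1$/check_dummy $ARG1$ "$ARG2$"
}
```

With a definition like this, every execution of the check command returns CRITICAL no matter what the remote server last reported.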

So the passive checks keep trying to set the state to OK, while every 2 minutes the defined active check sets the state to CRITICAL. I guess the 2-minute delay is inherited from the defined retry_interval.

What exactly is going on? How can I solve this? I am stuck for the moment.

Furthermore, I cannot find my flap_detection_options value a mentioned anywhere in the docs. I do not remember changing that option, but there is a chance I changed it myself somehow. If not, what does it stand for and what is its purpose?

Here is the configuration I am running.

define service {
host_name mailer.my.tld
service_description imap-587
check_period 24x7
check_command critical_stale_state!2!***Service is in STALE STATE. Passive Check Result HAS NOT BEEN RECEIVED from remote server***
contact_groups admins
notification_period 24x7
initial_state o
importance 0
check_interval 10.000000
retry_interval 2.000000
max_check_attempts 3
is_volatile 0
parallelize_check 1
active_checks_enabled 1
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options a
freshness_threshold 630
check_freshness 0
notification_options r,w,u,c
notifications_enabled 1
notification_interval 60.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
}

Thanks,
Steffi

Post by **cdienger** » Mon Mar 11, 2019 1:32 pm

Having active_checks_enabled 1 causes this: Nagios executes the check command every 10 minutes (check_interval), and after the first non-OK result it runs it 2 more times, 2 minutes apart (retry_interval, up to max_check_attempts 3). Set active_checks_enabled to 0 so that this doesn't occur; the check command will then only be run if a passive check doesn't come in within 630 seconds (freshness_threshold).
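
As a rough sketch, the relevant part of the service definition could look like the fragment below. Only the directives that matter for this change are shown, and the rest of the posted definition is assumed to stay as it is. One assumption to flag: the posted definition has check_freshness 0, while the Nagios documentation describes freshness checking as requiring check_freshness 1 on the service (and check_service_freshness enabled in the main configuration), so the sketch sets it to 1.

```
define service {
    host_name               mailer.my.tld
    service_description     imap-587
    check_command           critical_stale_state!2!***Service is in STALE STATE. Passive Check Result HAS NOT BEEN RECEIVED from remote server***
    active_checks_enabled   0       ; no regularly scheduled active checks
    passive_checks_enabled  1       ; results arrive from the remote server
    check_freshness         1       ; assumption: enabled so the threshold is enforced
    freshness_threshold     630     ; seconds without a passive result before check_command is run
}
```

The idea is that check_command is only executed when a passive result is overdue, so it no longer overwrites the OK results that arrive every 5 minutes.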

Post by **steffi** » Mon Mar 11, 2019 1:50 pm

Thanks.
This is working.

Am I right that check_interval no longer has any function when a service is monitored passively only?

Post by **cdienger** » Mon Mar 11, 2019 2:45 pm

Glad to hear! check_interval is not used for passive checks.

Nagios Support Forum
