Hi.
Even though I have read about Nagios, flapping, passive monitoring and so on, I do not completely understand what is going on. My services are flapping and I want to understand exactly why this is happening.
Here is my setup:
- Passive check results are sent from the remote servers every 5 minutes (the default on an Ubuntu installation), which I would rather not change if possible.
- Every time a passive check result is processed, the logs show that the result is OK and the service state either stays OK (if it was OK before) or goes through a SOFT recovery (if it was not OK at the last check and is flapping).
- When flapping is detected, Nagios schedules an asynchronous ACTIVE check to run 2 minutes later, which in my case executes this:
check_command critical_stale_state!2!***Service is in STALE STATE. Passive Check Result HAS NOT BEEN RECEIVED from remote server***
I want to be informed as soon as possible when the main server stops receiving passive checks, so the command returns a NOT OK state to trigger notifications when this happens.
So the passive checks keep setting the state to OK, while every 2 minutes the active check sets it back to CRITICAL. I guess the 2-minute delay comes from the configured retry_interval.
What exactly is going on? How can I solve this? I am stuck at the moment.
Furthermore, I cannot find the value "a" of my flap_detection_options mentioned anywhere in the docs. I do not remember changing that option, but there is a chance I did. If not, what does it stand for and what is its purpose?
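The guess about retry_interval can be illustrated with a rough Python sketch of how the delay before the next active check is chosen. It assumes the standard Nagios Core behaviour (an OK or HARD non-OK service is rechecked after check_interval, a SOFT non-OK service after retry_interval, both multiplied by interval_length, 60 seconds by default). The function name is hypothetical and this is not Nagios source code.

```python
# Hypothetical sketch of how Nagios Core picks the delay before the next
# ACTIVE check. Not Nagios source code; assumes interval_length = 60 s.

def next_active_check_delay(state: str, state_type: str,
                            check_interval: float, retry_interval: float,
                            interval_length: int = 60) -> float:
    """Return seconds until the next active check.

    state      -- "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    state_type -- "SOFT" or "HARD"
    """
    if state != "OK" and state_type == "SOFT":
        # Problem not yet confirmed (max_check_attempts not reached): retry faster.
        return retry_interval * interval_length
    # OK, or a confirmed (HARD) problem: normal schedule.
    return check_interval * interval_length


# Values from the service definition in this thread.
print(next_active_check_delay("OK", "HARD", 10, 2))        # 600
print(next_active_check_delay("CRITICAL", "SOFT", 10, 2))  # 120
```

Because the stale-state command always returns CRITICAL, every active run starts a SOFT problem that is retried after 120 seconds, while the passive results keep resetting the state to OK.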
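For reference, a simplified Python sketch of Nagios-style flap detection. It assumes the behaviour described in the Nagios Core documentation: the last 21 recorded states are kept, each change between consecutive entries is weighted so that newer changes count more, and the total is expressed as a percent state change compared against a low and a high threshold. The weights 0.75 (oldest) to 1.25 (newest) are an assumption based on Nagios Core's flapping.c and may differ between versions. The thresholds 5.0 and 20.0 are the low_service_flap_threshold and high_service_flap_threshold values from the sample nagios.cfg; they apply here because a per-service threshold of 0 means "use the program-wide value". As far as I know, flap_detection_options a is Nagios Core 4 shorthand for all states (o, w, c, u). All function names are hypothetical.

```python
# Simplified, hypothetical sketch of Nagios-style flap detection.
# Assumptions: 21-entry history, linear weights 0.75..1.25,
# thresholds low=5.0 / high=20.0 (sample nagios.cfg defaults).

HISTORY_SIZE = 21
LOW_WEIGHT, HIGH_WEIGHT = 0.75, 1.25


def percent_state_change(history: list[str]) -> float:
    """history: oldest first, newest last; exactly HISTORY_SIZE entries."""
    if len(history) != HISTORY_SIZE:
        raise ValueError(f"expected {HISTORY_SIZE} states")
    weighted_changes = 0.0
    for i in range(1, HISTORY_SIZE):
        if history[i] != history[i - 1]:
            # Change number i (1..20): weight grows linearly from oldest to newest.
            weight = LOW_WEIGHT + (i - 1) * (HIGH_WEIGHT - LOW_WEIGHT) / (HISTORY_SIZE - 2)
            weighted_changes += weight
    return weighted_changes * 100.0 / (HISTORY_SIZE - 1)


def update_flapping(is_flapping: bool, pct: float,
                    low: float = 5.0, high: float = 20.0) -> bool:
    """Hysteresis: start at or above `high`, stop at or below `low`."""
    if pct >= high:
        return True
    if pct <= low:
        return False
    return is_flapping  # between thresholds: keep the current flap state


# Passive OK results interleaved with CRITICAL results from the stale check.
alternating = ["OK", "CRITICAL"] * 10 + ["OK"]
steady = ["OK"] * HISTORY_SIZE
print(round(percent_state_change(alternating), 1))                # 100.0
print(percent_state_change(steady))                               # 0.0
print(update_flapping(False, percent_state_change(alternating)))  # True
```

Under these assumptions, a history that keeps alternating between OK and CRITICAL quickly exceeds the 20 percent threshold, which would explain the flapping state.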
Here is the configuration I am running.
define service {
    host_name                       mailer.my.tld
    service_description             imap-587
    check_period                    24x7
    check_command                   critical_stale_state!2!***Service is in STALE STATE. Passive Check Result HAS NOT BEEN RECEIVED from remote server***
    contact_groups                  admins
    notification_period             24x7
    initial_state                   o
    importance                      0
    check_interval                  10.000000
    retry_interval                  2.000000
    max_check_attempts              3
    is_volatile                     0
    parallelize_check               1
    active_checks_enabled           1
    passive_checks_enabled          1
    obsess                          1
    event_handler_enabled           1
    low_flap_threshold              0.000000
    high_flap_threshold             0.000000
    flap_detection_enabled          1
    flap_detection_options          a
    freshness_threshold             630
    check_freshness                 0
    notification_options            r,w,u,c
    notifications_enabled           1
    notification_interval           60.000000
    first_notification_delay        0.000000
    stalking_options                n
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
}
Thanks,
Steffi
Passive Monitoring | Workflow | Dependencies | Thresholds
Re: Passive Monitoring | Workflow | Dependencies | Threshold
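Having active_checks_enabled set to 1 causes this: Nagios executes the check command every 10 minutes (check_interval) and, once the result is not OK, runs it two more times 2 minutes apart (retry_interval). Set active_checks_enabled to 0 so that this doesn't happen; the check will then only run if no passive result has come in within 630 seconds (freshness_threshold). Note that this freshness check only runs if check_freshness is set to 1, which your current definition does not do.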
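The freshness behaviour this advice relies on can be modelled roughly as below. It is a simplified Python sketch, not Nagios code: it assumes the stale check only fires when check_freshness is enabled, and that Nagios adds additional_freshness_latency (15 seconds in the sample nagios.cfg) on top of freshness_threshold. The function name is hypothetical.

```python
# Hypothetical model of a Nagios freshness check for a passive service.
# Assumes additional_freshness_latency = 15 s (sample nagios.cfg value).

def result_is_stale(now: float, last_result_time: float,
                    freshness_threshold: float, check_freshness: bool,
                    additional_latency: float = 15.0) -> bool:
    """True if the stale-state check_command would be run."""
    if not check_freshness:
        # Freshness checking disabled: staleness never triggers the check.
        return False
    age = now - last_result_time
    return age > freshness_threshold + additional_latency


# Passive results every 300 s, freshness_threshold 630 s (values from this thread).
print(result_is_stale(now=300, last_result_time=0, freshness_threshold=630, check_freshness=True))   # False
print(result_is_stale(now=700, last_result_time=0, freshness_threshold=630, check_freshness=True))   # True
print(result_is_stale(now=700, last_result_time=0, freshness_threshold=630, check_freshness=False))  # False
```

With passive results every 300 seconds, a threshold of 630 seconds tolerates one missed result (a 600-second gap) and fires after the second one is missed.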
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Passive Monitoring | Workflow | Dependencies | Threshold
Thanks.
This is working.
Am I right that check_interval has no function when only passive checks are used?
Re: Passive Monitoring | Workflow | Dependencies | Threshold
Glad to hear it! check_interval is not used for passive checks; it only schedules active checks.