Passive Monitoring | Workflow | Dependencies | Thresholds
Posted: Sun Mar 10, 2019 2:38 pm
Hi.
Even though I have read about Nagios, flapping, passive monitoring and so on, I do not completely understand what is going on. My services are flapping, and I want to understand exactly why this is happening.
Here is my setup:
- Passive checks are transmitted from the remote servers every 5 minutes (the default on an Ubuntu installation), which I would prefer not to change if possible.
- Every time a passive check is processed, I see in the logs that the result is OK and that the service state either stays OK (if it was OK before) or goes through a SOFT recovery (if it was not OK at the last check and is flapping).
- When flapping has been detected, Nagios schedules an ACTIVE asynchronous check to run 2 minutes later, which in my case runs this:
check_command critical_stale_state!2!***Service is in STALE STATE. Passive Check Result HAS NOT BEEN RECEIVED from remote server***
I want to be informed as soon as possible when the main server stops receiving passive checks, so this command returns a NOT OK state to trigger notifications when that happens.
So the passive checks try to set the state to OK, while every 2 minutes the defined active check sets it to CRITICAL. I guess the 2-minute delay is inherited from the configured retry_interval.
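To see why an OK/CRITICAL alternation like this would be treated as flapping, here is a minimal Python sketch of the weighted "percent state change" calculation that the Nagios Core documentation describes for flap detection. It is an illustration under assumptions, not Nagios code: the history length of 21, the linear weights from 0.75 (oldest change) to 1.25 (newest change) and the 5%/20% thresholds are assumed (the thresholds are the sample nagios.cfg values; the post sets low_flap_threshold and high_flap_threshold to 0, which per the object documentation means the program-wide values apply). The state sequences are made up, not taken from the logs.

```python
# Sketch of Nagios-style flap detection, under stated assumptions:
# - the last 21 check results are kept, ordered oldest first;
# - each state change is weighted linearly from 0.75 (oldest) to 1.25 (newest);
# - thresholds are the sample nagios.cfg values (assumed, not from the post).

HISTORY_SIZE = 21
LOW_WEIGHT, HIGH_WEIGHT = 0.75, 1.25
LOW_THRESHOLD, HIGH_THRESHOLD = 5.0, 20.0  # percent


def percent_state_change(history: list[str]) -> float:
    """Weighted percent state change over a history ordered oldest -> newest."""
    if len(history) != HISTORY_SIZE:
        raise ValueError(f"expected {HISTORY_SIZE} results, got {len(history)}")
    weighted = 0.0
    for i in range(1, HISTORY_SIZE):
        if history[i] != history[i - 1]:
            # Transition number i - 1 runs from 0 (oldest) to 19 (newest).
            weighted += LOW_WEIGHT + (i - 1) * (HIGH_WEIGHT - LOW_WEIGHT) / (HISTORY_SIZE - 2)
    return weighted * 100.0 / (HISTORY_SIZE - 1)


def update_flapping(is_flapping: bool, percent: float) -> bool:
    """Hysteresis: start at/above the high threshold, stop at/below the low one."""
    if percent >= HIGH_THRESHOLD:
        return True
    if percent <= LOW_THRESHOLD:
        return False
    return is_flapping  # in between: keep the current flap state


# Hypothetical history: passive OK results interleaved with the
# stale-state check's CRITICAL results.
mixed = ["OK", "OK", "CRITICAL"] * 7
steady = ["OK"] * HISTORY_SIZE

for name, history in (("mixed", mixed), ("steady", steady)):
    pct = percent_state_change(history)
    print(f"{name}: {pct:.1f}% flapping={update_flapping(False, pct)}")
```

With an OK, OK, CRITICAL pattern the weighted change lands far above the 20% high threshold, so flapping would start; a steady run of OK results scores 0%.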
What exactly is going on, and how can I solve this? I am stuck at the moment.
Furthermore, I cannot find my flap_detection_options value "a" mentioned anywhere in the docs. I do not remember changing that option, but it is possible that I changed it myself at some point. If not, what does it stand for and what is its purpose?
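The configuration below looks like an objects.cache dump (floating-point intervals, single-letter option values such as stalking_options n). Assuming, as appears to be the case, that Nagios Core abbreviates a fully enabled option set to a single "a" when writing that file, flap_detection_options a would be equivalent to the documented o,w,u,c. The helper below is hypothetical (it is not part of Nagios) and simply makes that assumed mapping explicit.

```python
# Hypothetical helper, not part of Nagios: expands a flap_detection_options
# value into the service states it covers. Assumption: a lone "a" means all
# states (o, w, u, c), as Nagios appears to write into objects.cache.
STATE_LETTERS = {"o": "OK", "w": "WARNING", "u": "UNKNOWN", "c": "CRITICAL"}


def expand_flap_options(value: str) -> set[str]:
    """Return the state names covered by an option string such as 'o,c' or 'a'."""
    value = value.strip()
    if value == "a":
        return set(STATE_LETTERS.values())
    states = set()
    for letter in value.split(","):
        letter = letter.strip()
        if letter not in STATE_LETTERS:
            raise ValueError(f"unknown flap_detection_options letter: {letter!r}")
        states.add(STATE_LETTERS[letter])
    return states


print(sorted(expand_flap_options("a")))
print(sorted(expand_flap_options("o,c")))
```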
Here is the configuration I am running.
define service {
    host_name mailer.my.tld
    service_description imap-587
    check_period 24x7
    check_command critical_stale_state!2!***Service is in STALE STATE. Passive Check Result HAS NOT BEEN RECEIVED from remote server***
    contact_groups admins
    notification_period 24x7
    initial_state o
    importance 0
    check_interval 10.000000
    retry_interval 2.000000
    max_check_attempts 3
    is_volatile 0
    parallelize_check 1
    active_checks_enabled 1
    passive_checks_enabled 1
    obsess 1
    event_handler_enabled 1
    low_flap_threshold 0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled 1
    flap_detection_options a
    freshness_threshold 630
    check_freshness 0
    notification_options r,w,u,c
    notifications_enabled 1
    notification_interval 60.000000
    first_notification_delay 0.000000
    stalking_options n
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
}
Thanks,
Steffi