jdalrymple wrote:
Sounds like it's a very infrequent problem that occurs? That is going to make it even tougher to sort out.
Any chance you could find one of the false alerts in your nagios.log and share the contents exactly? Like I mentioned check_icmp is pretty solid so I doubt the actual plugin is where the problem lies. I'm wondering if there is some useful output coming back.

It is a nightmare to troubleshoot.
Anyway, the support team cited cases on the net where check_icmp was giving false alarms.
I just need the parameters that will minimize the alerts when there is intermittent packet loss.
Ideally, check_icmp should confirm that the loss persists for about 10 seconds before alerting.
Right now the host is configured to be checked every 5 minutes, with a retry after 5 minutes before alerting.
However, we get a "host is down" alert and then, the very next minute, a "host is up" alert. I am not sure what happened to the recheck after 5 minutes.
Please advise.
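
To make the question concrete, here is a rough sketch of the kind of command definition I have in mind. It is only a guess on my part, not something we run today: the command name is made up, the thresholds are placeholders, and I have not verified how every check_icmp version handles the unit suffix on -i. The idea is to send 10 packets about one second apart, so the plugin only reports critical if every packet in a roughly 10-second window is lost.

```
# Hypothetical command name; $USER1$ is assumed to point at the plugin directory.
define command {
    command_name    check-host-alive-10s
    # -n 10         send 10 packets per check
    # -i 1000ms     about one second between packets (roughly a 10-second window)
    # -w / -c       thresholds as rta,loss%; critical only at 100% loss
    # -t 15         plugin timeout in seconds, kept above the probe window
    command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -n 10 -i 1000ms -w 3000.0,80% -c 5000.0,100% -t 15
}
```

If something like this is reasonable, the host would reference it with check_command check-host-alive-10s in place of the current check command. The host definition as it stands today follows.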
Code:
define host {
    host_name                       My Server
    alias                           Staging Server
    address                         10.10.10.10
    check_period                    24x7
    check_command                   check-host-fping!!!!!!!!
    contact_groups                  CGRP_INFRA_SM1_WINTEL,CGRP_TMC
    notification_period             24x7
    initial_state                   o
    importance                      0
    check_interval                  5.000000
    retry_interval                  5.000000
    max_check_attempts              2
    active_checks_enabled           1
    passive_checks_enabled          1
    obsess                          1
    event_handler_enabled           1
    low_flap_threshold              0.000000
    high_flap_threshold             0.000000
    flap_detection_enabled          1
    flap_detection_options          a
    freshness_threshold             0
    check_freshness                 0
    notification_options            r,d
    notifications_enabled           1
    notification_interval           44640.000000
    first_notification_delay        0.000000
    stalking_options                n
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
..
...