Notification sent before threshold breach
Posted: Wed Apr 15, 2020 8:20 am
Hi,
We are running Nagios XI 5.6.6 on CentOS 7.
The Nagios Core version is 4.4.3.
One of our sysadmins complained that they received a disk utilization notification while the actual utilization was below the threshold.
The Nagios XI performance graph does not show the utilization breaching the threshold.
However, /usr/local/nagios/var/nagios.log shows that the service went directly from WARNING to CRITICAL HARD, with no CRITICAL SOFT state in between.
Please help us identify the issue.
Service definition:
######################################################
define service {
host_name XXXXXXXX
service_description Root Volume
check_period 24x7
check_command check_xi_hpe_ncpa_disk!-t 5nidNag -p 5693!disk/logical/!|!/used_percent!-w 80 -c 90!!!
contacts servicenow_integration
notification_period 24x7
initial_state o
importance 0
check_interval 1.000000
retry_interval 1.000000
max_check_attempts 3
is_volatile 0
parallelize_check 1
active_checks_enabled 1
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options a
freshness_threshold 0
check_freshness 0
notification_options r,w,c
notifications_enabled 1
notification_interval 480.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
_AG Platform#Alerts
}
####################################################
nagios.log entries:
[1586934000] CURRENT SERVICE STATE: s1l00103g;Ncpa_Agent_Status;OK;HARD;1;HTTP OK: HTTP/1.1 200 OK - 184 bytes in 0.026 second response time
[1586934000] CURRENT SERVICE STATE: s1l00103g;Root Volume;WARNING;HARD;3;WARNING: Used_percent was 82.50 %
[1586934000] CURRENT SERVICE STATE: s1l00103g;SSH Service Monitoring;OK;HARD;1;SSH OK - OpenSSH_7.4 (protocol 2.0)
[1586934000] CURRENT SERVICE STATE: s1l00103g;Swap Usage;OK;HARD;1;OK: Used swap was 9.60 % (Total: 3.10 GiB, Used: 0.30 GiB, Free: 2.80 GiB)
[1586940698] SERVICE NOTIFICATION: servicenow_integration;s1l00103g;Root Volume;WARNING;notify_servicenow_service;WARNING: Used_percent was 82.50 %
[1586943165] SERVICE NOTIFICATION: servicenow_integration;s1l00103g;Root Volume;CRITICAL;notify_servicenow_service;CRITICAL: Used_percent was 91.50 %
[1586943165] SERVICE ALERT: s1l00103g;Root Volume;CRITICAL;HARD;3;CRITICAL: Used_percent was 91.50 %
[1586943224] SERVICE NOTIFICATION: servicenow_integration;s1l00103g;Root Volume;WARNING;notify_servicenow_service;WARNING: Used_percent was 82.80 %
[1586943224] SERVICE ALERT: s1l00103g;Root Volume;WARNING;HARD;3;WARNING: Used_percent was 82.80 %
####################
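To make the epoch timestamps easier to compare against the performance graph, here is a rough Python sketch I would use to list the Root Volume events in readable UTC time. The function names (parse_line, service_events) are my own and not part of Nagios, and the field layout is only assumed from the log lines pasted above.

```python
import re
from datetime import datetime, timezone

# Matches lines such as: [1586943165] SERVICE ALERT: host;service;...
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def parse_line(line):
    """Return (utc_datetime, entry_type, fields) or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    epoch, kind, rest = m.groups()
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    # Split into at most 6 fields so a ';' inside the plugin output is preserved.
    return when, kind, rest.split(";", 5)


def service_events(lines, service="Root Volume"):
    """Collect (time, entry_type, state, state_type) for one service."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, kind, fields = parsed
        if len(fields) < 6:
            continue
        # Assumed layout: host;service;state;state_type;attempt;output
        if kind in ("SERVICE ALERT", "CURRENT SERVICE STATE") and fields[1] == service:
            events.append((when, kind, fields[2], fields[3]))
        # Assumed layout: contact;host;service;state;command;output
        elif kind == "SERVICE NOTIFICATION" and fields[2] == service:
            events.append((when, kind, fields[3], ""))
    return events


# The last five entries from the log excerpt above.
SAMPLE = [
    "[1586940698] SERVICE NOTIFICATION: servicenow_integration;s1l00103g;Root Volume;WARNING;notify_servicenow_service;WARNING: Used_percent was 82.50 %",
    "[1586943165] SERVICE NOTIFICATION: servicenow_integration;s1l00103g;Root Volume;CRITICAL;notify_servicenow_service;CRITICAL: Used_percent was 91.50 %",
    "[1586943165] SERVICE ALERT: s1l00103g;Root Volume;CRITICAL;HARD;3;CRITICAL: Used_percent was 91.50 %",
    "[1586943224] SERVICE NOTIFICATION: servicenow_integration;s1l00103g;Root Volume;WARNING;notify_servicenow_service;WARNING: Used_percent was 82.80 %",
    "[1586943224] SERVICE ALERT: s1l00103g;Root Volume;WARNING;HARD;3;WARNING: Used_percent was 82.80 %",
]

if __name__ == "__main__":
    for when, kind, state, state_type in service_events(SAMPLE):
        print(when.strftime("%Y-%m-%d %H:%M:%S UTC"), kind, state, state_type)
```

Run against the entries above, this puts the CRITICAL HARD alert at 2020-04-15 09:32:45 UTC and the return to WARNING HARD 59 seconds later, at 09:33:44 UTC.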