[Bug/Notification] Service notification not triggered after state recovery and re-failure (Nagios Core 4.4.14)
Posted: Mon May 26, 2025 11:01 pm
## Summary
In Nagios Core 4.4.14 (on AlmaLinux 9.5, installed via YUM), we are encountering a problem where service notifications are not sent after state transitions like:
`CRITICAL → OK → CRITICAL`
Only the first CRITICAL state triggers a notification. After recovery (OK), if the service becomes CRITICAL again, no notification is sent.
## Expected Behavior
We want Nagios to send a service notification (and run our custom notification script) **every time** the service enters a HARD problem state such as CRITICAL, even if it was already CRITICAL before the preceding recovery.
## Actual Behavior
- Initial state change from OK to CRITICAL → notification sent
- Recovery to OK → notification sent
- State changes back to CRITICAL → **no notification sent**
- No `SERVICE NOTIFICATION` entry is written to `/var/log/nagios/nagios.log` for the second CRITICAL.
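
For anyone trying to reproduce this, the sequence above can be checked by pulling the relevant entries for the service out of the log. The following is only a sketch: `service_events` is a hypothetical helper name, and it assumes the host and service names contain no regular-expression metacharacters.

```bash
#!/bin/bash
# Hypothetical helper: print alert, notification, and flapping entries
# for one service from a Nagios log file.
# Usage: service_events LOGFILE HOST SERVICE
service_events() {
    local logfile="$1" host="$2" service="$3"
    # NOTIFICATION lines carry the contact name before the host,
    # hence the optional leading field.
    grep -E "SERVICE (ALERT|NOTIFICATION|FLAPPING ALERT): ([^;]+;)?${host};${service};" "$logfile"
}

# Example call with the path and names from this post; runs only if the log is readable.
if [ -r /var/log/nagios/nagios.log ]; then
    service_events /var/log/nagios/nagios.log TEST001 TESTS001
fi
```

Flapping entries are included because the service template sets `flap_detection_enabled 1`, so it is worth seeing whether any appear between the recovery and the second CRITICAL.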
## Notification Script
The custom script we are using is:
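```bash
#!/bin/bash
# Positional arguments passed by the send-alarm2 command:
#   $1: HOSTADDRESS
#   $2: HOSTALIAS
#   $3: SERVICESTATE
#   $4: CONTACTEMAIL

# Map the service state to the event label used in the mail.
if [ "${3}" == "OK" ]; then
    Stat="IF-UP"
else
    Stat="IF-DOWN"
fi

# Send the event mail; the recipient is quoted so an empty or odd value stays a single argument.
/usr/bin/printf "%b" "Event: ${Stat}\nIpAddress: ${1}\nHost: ${2}\n" | /usr/bin/mail -r [email protected] -s "${Stat} ${1}" "${4}"
```
Note: we replaced the actual email addresses with [email protected] for privacy.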
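
To exercise the state mapping without sending any mail, the branch in the script can be pulled out into a function. This is a minimal sketch; `state_to_event` is a hypothetical name that does not exist in our script.

```bash
#!/bin/bash
# Hypothetical helper mirroring the script's if/else: only "OK" maps to IF-UP.
state_to_event() {
    if [ "$1" == "OK" ]; then
        echo "IF-UP"
    else
        echo "IF-DOWN"
    fi
}

# Print the event label for each service state Nagios can report.
for state in OK WARNING UNKNOWN CRITICAL; do
    printf '%s -> %s\n' "$state" "$(state_to_event "$state")"
done
```

WARNING and UNKNOWN produce the same `IF-DOWN` label as CRITICAL. With `service_notification_options u,c,r` on the contact, WARNING should not reach the script in practice.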
## Configuration Summary
### services.cfg

    define service {
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        check_interval                  30
        retry_interval                  1
        contact_groups                  ADMING001
        notification_options            w,u,c,r
        notification_interval           0
        notification_period             24x7
        register                        0
    }

    define service {
        use                             generic-service
        host_name                       TEST001
        service_description             TESTS001
        check_command                   check_ping!1000.0,80%!2000.0,100%
        check_interval                  2
        retry_interval                  1
    }
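
The retry settings above determine how long a failure has to persist before the state becomes HARD, which is the point at which a notification is considered at all. The arithmetic below is a rough sketch: it assumes `interval_length` is left at the Nagios default of 60 seconds (our `nagios.cfg` is not shown here) and ignores check execution time. `seconds_to_hard` is a hypothetical helper name.

```bash
#!/bin/bash
# Values taken from the TESTS001 service (template value where not overridden).
max_check_attempts=3
retry_interval=1
# Assumption: interval_length is the Nagios default of 60 seconds.
interval_length=60

# Seconds between the first non-OK check and the check that makes the state HARD.
seconds_to_hard() {
    local attempts="$1" retry="$2" unit="$3"
    echo $(( (attempts - 1) * retry * unit ))
}

echo "HARD after about $(seconds_to_hard "$max_check_attempts" "$retry_interval" "$interval_length") seconds of consecutive failures"
```

Under these assumptions, a second outage that clears before the third consecutive failed check stays SOFT, so it is worth confirming in the log that the second CRITICAL actually reached `HARD`.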
### contacts.cfg

    define contact {
        contact_name                    ADMIN001
        use                             generic-contact
        service_notification_options    u,c,r
        host_notification_options       n
        service_notification_commands   send-alarm2
        host_notification_commands      send-alarm2
        email                           [email protected]
    }
### commands.cfg

    define command {
        command_name    send-alarm2
        command_line    /usr/local/nagios/libexec/send-alarm2.sh $HOSTADDRESS$ $HOSTALIAS$ $SERVICESTATE$ $CONTACTEMAIL$
    }
## Question
Is this behavior expected?
How can we configure Nagios so that it sends a notification on every HARD state change, even when the new state is a recurrence of an earlier one?