Hi,
We are experiencing some strange problems with e-mail alerts. For some services, a CRITICAL alert is sent after check 1 of 5, not after 5 of 5 as it should be. The service and service template are configured properly.
For some of those services, the OK alert isn't sent at all, even though it is enabled in the notification options.
This is not happening on all services, only on a few of the 5000 we have.
Does anyone know why this is happening?
Nagios XI 5.5.3
XI alerts sending problem
Re: XI alerts sending problem
Can you show us the actual config of a 'problem' service, along with the configs of all relevant templates that this service is using?
Also, show a screenshot of the State History and Notifications reports for this service, both covering the same time period.
Note: In the State History report, select State Type = Both.
Re: XI alerts sending problem
So, this is the problem:
Sometimes Nagios assumes a HARD state even though the check is only at attempt 1/5, as the attached Service state history and Notification history show. In this example it is a WARNING state. In addition, some alerts aren't sent at all. We noticed that this started happening after the Nagios upgrade (5.5.2 --> 5.5.3).
The Ping service uses the xiwizard_xxxx_emerson_ping_service service template, which in turn uses the xiwizard_generic_service service template.
Ping service config:
Code: Select all
define service {
    host_name               napajanje-st-pujanke
    service_description     Ping
    use                     xiwizard_xxxx_emerson_ping_service
    check_command           check_icmp!200,20%!500,60%
    max_check_attempts      5
    check_interval          5
    retry_interval          1
    check_period            xi_timeperiod_24x7
    notification_interval   720
    notification_period     xi_timeperiod_24x7
    contacts                xxxx
    contact_groups          xxxx,xxxx,xxxx,xxxx
    _contacts               xxxx
    _contact_groups         xxxx,xxxx,xxxx,xxxx
    _ping_critical          500ms
    _ping_critical_perct    60%
    _ping_warning           200ms
    _ping_warning_perct     20%
    _xiwizard               xxxx
    register                1
}
xiwizard_xxxx_emerson_ping_service service template:
Code: Select all
define service {
    name                    xiwizard_xxxx_emerson_ping_service
    service_description     Checkping
    servicegroups           check_ping
    use                     xiwizard_generic_service
    check_command           check_icmp!200.0,20%!500.0,60%
    max_check_attempts      5
    check_interval          3
    retry_interval          1
    active_checks_enabled   1
    passive_checks_enabled  0
    check_period            xi_timeperiod_24x7
    flap_detection_enabled  0
    notification_interval   720
    notification_period     xi_timeperiod_24x7
    notification_options    w,c,u,r,f,
    notifications_enabled   1
    register                0
}
xiwizard_generic_service service template:
Code: Select all
define service {
    name                            xiwizard_generic_service
    check_command                   check_xi_service_none
    is_volatile                     0
    max_check_attempts              5
    check_interval                  5
    retry_interval                  1
    active_checks_enabled           1
    passive_checks_enabled          1
    check_period                    xi_timeperiod_24x7
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 0
    event_handler_enabled           1
    flap_detection_enabled          1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    notification_interval           60
    notification_period             xi_timeperiod_24x7
    notifications_enabled           1
    register                        0
}
check_xi_service_none command:
Code: Select all
$USER1$/check_dummy 0 "Nothing to monitor"
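To double-check which values the Ping service actually ends up with, the `use` chain can be resolved by hand. The following is only a rough sketch in Python, assuming the simplest inheritance rule (a directive set on the object wins, otherwise the nearest template supplies it). It ignores additive inheritance and multiple templates, copies only a few directives from the configs above, and the names `TEMPLATES`, `PING_SERVICE` and `resolve` are made up for illustration.

```python
# Simplified model of Nagios object inheritance: a directive set on the
# object itself wins, otherwise the value comes from the nearest template
# in the `use` chain. Additive (+) inheritance and multiple templates are
# deliberately ignored in this sketch.

TEMPLATES = {
    "xiwizard_generic_service": {
        "max_check_attempts": "5",
        "check_interval": "5",
        "retry_interval": "1",
        "flap_detection_enabled": "1",
        "notification_interval": "60",
    },
    "xiwizard_xxxx_emerson_ping_service": {
        "use": "xiwizard_generic_service",
        "max_check_attempts": "5",
        "check_interval": "3",
        "retry_interval": "1",
        "flap_detection_enabled": "0",
        "notification_interval": "720",
        "notification_options": "w,c,u,r,f,",
    },
}

PING_SERVICE = {
    "use": "xiwizard_xxxx_emerson_ping_service",
    "max_check_attempts": "5",
    "check_interval": "5",
    "retry_interval": "1",
    "notification_interval": "720",
}


def resolve(obj, templates):
    """Return the effective directives for obj, following its `use` chain."""
    chain = [obj]
    seen = set()
    parent = obj.get("use")
    while parent is not None:
        if parent in seen:
            raise ValueError(f"circular template chain at {parent}")
        seen.add(parent)
        template = templates[parent]
        chain.append(template)
        parent = template.get("use")
    effective = {}
    # Apply the most generic template first so nearer definitions override it.
    for level in reversed(chain):
        for key, value in level.items():
            if key != "use":
                effective[key] = value
    return effective


effective = resolve(PING_SERVICE, TEMPLATES)
for key in sorted(effective):
    print(f"{key} = {effective[key]}")
```

Under these assumptions the service resolves to max_check_attempts 5, check_interval 5 and retry_interval 1, with flap detection disabled by the ping template, so the posted configuration by itself would not explain a notification at attempt 1/5.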
Re: XI alerts sending problem
I believe it is working as expected and that you are seeing proper functionality. To be sure, though, we would need to see what state the host was in at the time the services went hard at 1 of 5.
Taken from here:
When a service check results in a non-OK state, Nagios will check the host that the service is associated with to determine whether or not it is UP. If the host is not UP (i.e. it is either down or unreachable), Nagios will immediately put the service into a hard non-OK state and it will reset the current attempt number to 1. Since the service is in a hard non-OK state, the service check will be rescheduled at the normal frequency specified by the check_interval option instead of the retry_interval option.
https://assets.nagios.com/downloads/nag ... uling.html
After an extensive discussion with the developers and the other techs here, it seems to be working as intended (it was broken in the past, and it currently works as it should).
If the host is in a down state (hard or soft) when a service check returns a non-OK result, Nagios checks the host state, and because the host is down (whether hard or soft) the service goes into a hard problem state and the current attempt is reset to 1.
One way to get around it is to set host_down_disable_service_checks=1 in your /usr/local/nagios/etc/nagios.cfg and restart the nagios service:
Code: Select all
service nagios restart
Setting that will stop the service checks from even running while the host is in a problem state (hard or soft), which prevents the alerts/notifications.
Please include the host in the state history output if you'd like us to validate whether that is indeed what is occurring.
Thank you
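The rule quoted from the documentation can be illustrated with a small simulation. This is a simplified model in Python, not Nagios Core's actual code: the `Service` class and `process_result` function are hypothetical, soft recoveries are folded into a plain OK state, and only the attempt counter and state type are tracked.

```python
from dataclasses import dataclass


@dataclass
class Service:
    max_check_attempts: int = 5
    state: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1


def process_result(svc, result, host_up):
    """Apply one check result and return (state, state_type, attempt)."""
    was_ok = svc.state == "OK"
    if result == "OK":
        # Recovery: back to OK (soft recoveries are not modelled here).
        svc.state, svc.state_type, svc.attempt = "OK", "HARD", 1
    elif not host_up:
        # Host down or unreachable: hard non-OK at once, attempt reset to 1.
        svc.state, svc.state_type, svc.attempt = result, "HARD", 1
    else:
        if was_ok:
            svc.attempt = 1       # first non-OK result
        elif svc.state_type == "SOFT":
            svc.attempt += 1      # a retry while still in a soft state
        svc.state = result
        if was_ok or svc.state_type == "SOFT":
            reached_max = svc.attempt >= svc.max_check_attempts
            svc.state_type = "HARD" if reached_max else "SOFT"
    return svc.state, svc.state_type, svc.attempt


# Host UP: the service walks through the soft retries before going hard.
up = Service()
print([process_result(up, "WARNING", host_up=True) for _ in range(5)])

# Host not UP: the very first non-OK result is already a hard state.
down = Service()
print(process_result(down, "WARNING", host_up=False))
```

In this model a WARNING with the host UP stays SOFT for attempts 1 to 4 and only becomes HARD at attempt 5, while the same WARNING with the host down is HARD at attempt 1, which matches the "hard at 1/5" entries described in this thread.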