Nagios Notifications rules
Posted: Fri Jun 29, 2012 12:31 pm
Hi,
I was wondering if there is any way to suppress notifications depending on the status text a check returns?
We have a situation where we check performance remotely through "check_by_ssh", and every now and then we get a CRITICAL alert, but it is CRITICAL only because of a timeout...
So, Severity is CRITICAL and Info says:
Info: (Service Check Timed Out)
For check_by_ssh, I have already set the "-t 300" option.
Because all contacts receive CRITICAL alerts, I can't stop these alerts from going to the operations center, and they start panicking... So, is there a rule I can set up which says: if "Info" is like "Timeout", then send the notification just to the admin instead of all contacts?
If not, is there at least a way to have "Timeout" errors categorized as "Unknown"? Unknown alerts are currently sent only to the Nagios admin, who can look at them, rather than to everybody.
I'd like to be notified when there is a timeout, just not as CRITICAL - it almost always recovers - and if it still remains Unknown, we'd go check it anyway.
I know there is a "service_check_timeout" setting in nagios.cfg - it is still left at "60". Maybe increasing this could solve the problem altogether... however, I cannot be 100% sure, so getting "timeout" errors to alert as "Unknown" seems to be the best option, i.e. if it is an option.
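To make the question concrete, here is a sketch of the nagios.cfg lines involved. The first directive is the current setting described above. The second is an assumption: newer Nagios Core releases document a "service_check_timeout_state" directive, but whether the installed version supports it would need to be checked against its documentation. Note also that Nagios kills any service check that runs longer than "service_check_timeout", so with the value at 60 the "-t 300" on check_by_ssh presumably never gets a chance to take effect.

```
# nagios.cfg (sketch only; verify against the installed version's docs)

# Current setting: Nagios kills any service check that runs longer than
# this many seconds and reports "(Service Check Timed Out)".
service_check_timeout=60

# Assumption: only valid if the installed Nagios Core version supports it.
# State to report when a service check times out:
#   c = CRITICAL (default), w = WARNING, u = UNKNOWN, o = OK
service_check_timeout_state=u
```

If that directive is available, timed-out checks would be reported as UNKNOWN and would follow whatever notification options the contacts already have for that state.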
Thanks in advance.