[Nagios-devel] BUG: Recovery notifications sent to contacts which
Posted: Wed Aug 20, 2008 6:25 am
Greetings,
it seems I triggered a bug with our new Nagios instance, as it shows quite
strange behaviour.
Quoting from the Nagios 3.x documentation:
http://nagios.sourceforge.net/docs/3_0/ ... tions.html
Service and Host Filters:
"Note: Notifications about host or service recoveries are only sent out if
a notification was sent out for the original problem. It doesn't make sense
to get a recovery notification for something you never knew was a problem..."
This is what happened:
1. Service went CRITICAL -> Notifications to the contacts user1-mail, user2-mail
2. Service went WARNING -> Notifications to the contacts user1-mail, user2-mail
3. Service went OK -> Notifications to the contacts user1-mail, user2-mail, user1-sms, user2-sms
vmctx02  CPU  CRITICAL  18-08-2008 16:24:50  user1-mail  mail-notification  CRITICAL: 15m: average load 100% critical
vmctx02  CPU  CRITICAL  18-08-2008 16:24:50  user2-mail  mail-notification  CRITICAL: 15m: average load 100% critical
vmctx02  CPU  WARNING   18-08-2008 16:31:50  user1-mail  mail-notification  WARNING: 15m: average load 99% warning
vmctx02  CPU  WARNING   18-08-2008 16:31:50  user2-mail  mail-notification  WARNING: 15m: average load 99% warning
vmctx02  CPU  OK        18-08-2008 16:32:50  user1-sms   sms-notification   OK: 15m: average load 92%
vmctx02  CPU  OK        18-08-2008 16:32:50  user2-sms   sms-notification   OK: 15m: average load 92%
vmctx02  CPU  OK        18-08-2008 16:32:50  user1-mail  mail-notification  OK: 15m: average load 92%
vmctx02  CPU  OK        18-08-2008 16:32:50  user2-mail  mail-notification  OK: 15m: average load 92%
I do not understand why the two SMS contacts were notified: they never
received a problem notification in the first place. An escalation triggered
those SMS messages, but in my opinion it should not have. In our environment
this seems to happen only if exactly 2 notifications were sent before the
recovery.
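The expectation described above can be written down as a small model. This is only a sketch of the behaviour the quoted documentation seems to promise, applied per contact; it is not how Nagios is implemented, and the class and method names (`ServiceNotificationState`, `record_problem`, `recovery_recipients`) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNotificationState:
    """Hypothetical per-service record of who was told about the current problem."""

    notified_contacts: set[str] = field(default_factory=set)

    def record_problem(self, contacts: list[str]) -> list[str]:
        """Remember every contact that receives a problem notification."""
        self.notified_contacts.update(contacts)
        return list(contacts)

    def recovery_recipients(self, candidates: list[str]) -> list[str]:
        """Keep only candidates that saw the problem, then reset the record."""
        recipients = [c for c in candidates if c in self.notified_contacts]
        self.notified_contacts.clear()
        return recipients


state = ServiceNotificationState()
state.record_problem(["user1-mail", "user2-mail"])  # step 1: CRITICAL
state.record_problem(["user1-mail", "user2-mail"])  # step 2: WARNING

# Step 3: the escalation offers the SMS contacts as additional candidates.
recovery = state.recovery_recipients(
    ["user1-mail", "user2-mail", "user1-sms", "user2-sms"]
)
print(recovery)
```

Under this model the recovery would go to user1-mail and user2-mail only, which is what I expected to see in the notification log.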
These are the relevant configs:
Contacts and Templates (user1 and user2 are identical):
define contact {
name generic-contact-mail
host_notification_period 24x7
service_notification_period 24x7
host_notification_options d,r
service_notification_options u,c,w,r
host_notification_commands mail-notification
service_notification_commands mail-notification
register 0
}
define contact {
contact_name user1-mail
use generic-contact-mail
alias User1
email [email protected]
}
define contact {
name generic-contact-sms
host_notification_period 24x7
service_notification_period 24x7
host_notification_options d,r
service_notification_options u,c,r
host_notification_commands sms-notification
service_notification_commands sms-notification
register 0
}
define contact {
contact_name user1-sms
use generic-contact-sms
alias S R
pager +49-DONT-CALL-ME
}
Service Templates and Service:
define service {
name generic-service
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 1
retry_check_interval 3
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 0
check_freshness 1
freshness_threshold 120
notifications_enabled 1
notification_interval 60
notification_period 24x7
notification_options u,c,w,r
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]