Two identical services with different email behaviour

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Post by WillemDH »

Hello,

I was kind of surprised today: when I submitted a passive OK result to a service, our on-duty phone received an email, while the service isn't configured to send emails to this on-duty contact. After some research I still don't understand why this service is sending out emails to this contact.
In fact, the service belongs to a cluster node, and the exact same service on another node, with exactly the same config, is not sending emails to this on-duty contact when its state changes... When I retried by submitting critical results, it again sent an email to a contact that is defined neither in the service nor in the service template.

Config of the service and service template:

Service that does not send an email to on duty contact:

Code: Select all

define service {
        host_name                       clusternode02
        service_description             CLU_Cluster_Services_Events
        use                             dig-windows-eventlog-cluster-prio1
        servicegroups                   +all_svc_dummy_servicegroup
        check_period                    xi_timeperiod_24x7
        notification_period             xi_timeperiod_24x7
        contacts                        +nagiosadmin
        contact_groups                  +xi_dig_dummy_contact_group
        icon_image                      windowseventlog.png
        _xiwizard                       windowseventlog
        register                        1
        }
Service that does send an email to on duty contact, while not defined!:

Code: Select all

define service {
        host_name                       clusternode01
        service_description             CLU_Cluster_Services_Events
        use                             dig-windows-eventlog-cluster-prio1
        servicegroups                   +all_svc_dummy_servicegroup
        check_period                    xi_timeperiod_24x7
        notification_period             xi_timeperiod_24x7
        contacts                        +nagiosadmin
        contact_groups                  +xi_dig_dummy_contact_group
        icon_image                      windowseventlog.png
        _xiwizard                       windowseventlog
        register                        1
        }
Service template:

Code: Select all

define service {
       name                                     dig-windows-eventlog-cluster-prio1
       service_description                      CLU_Cluster_Services_Events
       display_name                             CLU_Cluster_Services_Events
       servicegroups                            +all_crit_svc_win_cluster,all_svc_win_evt_cluster
       check_command                            check_dummy!0!'Dummy check passed'!!!!!!
       is_volatile                              0
       initial_state                            o
       max_check_attempts                       3
       check_interval                           1440
       retry_interval                           10
       active_checks_enabled                    0
       passive_checks_enabled                   1
       check_period                             xi_timeperiod_24x7
       obsess_over_service                      0
       check_freshness                          0
       event_handler_enabled                    0
       flap_detection_enabled                   1
       process_perf_data                        0
       retain_status_information                1
       retain_nonstatus_information             1
       notification_interval                    1440
       first_notification_delay                 0
       notification_period                      xi_timeperiod_24x7
       notifications_enabled                    1
       stalking_options                         o,w,c,u,
       register                                 0
}
As you can see, only nagiosadmin should be notified. And the config is the same for clusternode01 and clusternode02. So why does our onduty contact receive emails when the state of CLU_Cluster_Services_Events on clusternode01 changes?

I'm probably missing something?

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Two identical services with different email behaviour

Post by abrist »

Can we see the configs for the "on duty" and nagiosadmin contacts? Additionally, do you have any escalations or hostgroups that may affect the objects in question?
Former Nagios employee

Re: Two identical services with different email behaviour

Post by WillemDH »

Andy,

We are not using any host or service escalations. This is the config of the on-duty contact:

Code: Select all

define contact {
        contact_name                            wachtdienst.blackberry
        alias                                   BlackBerry Wachtdienst
        host_notifications_enabled              1
        service_notifications_enabled           1
        host_notification_period                wachtdienst.blackberry_notification_times
        service_notification_period             wachtdienst.blackberry_notification_times
        host_notification_options               d,u,r
        service_notification_options            c,r
        email                                   [email protected]
        host_notifications_enabled              1
        service_notifications_enabled           1
        use                                     xi_contact_generic
        }
And the config of the nagiosadmin:

Code: Select all

define contact {
        contact_name                            nagiosadmin
        alias                                   Nagios Administrator
        host_notifications_enabled              1
        service_notifications_enabled           1
        host_notification_period                nagiosadmin_notification_times
        service_notification_period             nagiosadmin_notification_times
        host_notification_options               d,u,r
        service_notification_options            w,u,c,r
        host_notification_commands              xi_host_notification_handler
        service_notification_commands           xi_service_notification_handler
        email                                   nagios@localhost (=> strange, because the email address in Users is a different one)
        use                                     xi_contact_generic
        }
Grtz
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Two identical services with different email behaviour

Post by sreinhardt »

One more thing: could you provide the host configuration for those two as well? By default, a service will notify the contacts of its host if no contact is specified directly in the service object. Since you have additive contacts instead of standard contacts defined, I am wondering if that might be where the additional contact is coming from. I should also note that nagios@localhost is perfectly fine: the contact information does not actually change to match the user if you update the user info only, since it doesn't matter to the PHP mailer script. (Just so you know, since I saw your comment.)
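To make the "additive contacts" point concrete, here is a minimal sketch of how a leading `+` combines a local value with a value inherited from a template. It is a simplified model for illustration only, not Nagios's actual resolution code: the function name `resolve_list_directive` is hypothetical, and the sketch deliberately ignores implied inheritance from the host and multi-level template chains.

```python
def resolve_list_directive(local, inherited):
    """Simplified model of additive inheritance for comma-separated directives.

    local:     the value written on the object itself (may start with '+'),
               or None if the directive is absent on the object
    inherited: the value supplied by the template, or None
    Returns the effective list of names.
    """
    def split(value):
        # Turn "a, b,c" into ["a", "b", "c"]; None or "" becomes [].
        if not value:
            return []
        return [item.strip() for item in value.split(",") if item.strip()]

    if local is None:
        # Nothing defined locally: the template value is used as-is.
        return split(inherited)
    if local.startswith("+"):
        # Additive: the local names are appended to the inherited ones.
        return split(inherited) + split(local[1:])
    # No '+': the local value overrides the inherited one.
    return split(local)
```

Under this model, `contacts +nagiosadmin` on an object whose template already lists another contact yields both contacts, while `contacts nagiosadmin` (without the `+`) yields only nagiosadmin.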
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.

Re: Two identical services with different email behaviour

Post by WillemDH »

I'll post the host config once I find the time. One remark though: the services are identical and have +nagiosadmin defined as contact. I know host contacts are inherited if no contact is defined, but there is a contact defined on both services and their template is the same, so how is it possible that they behave differently?

Re: Two identical services with different email behaviour

Post by WillemDH »

Ok, the config of the hosts:

Code: Select all

define host {
        host_name                       clusternode01.domain
        use                             dig_windows_server_prio1
        alias                           clusternode01.domain
        address                         10.10.10.01
        hostgroups                      +all_srv_inf_phy_win_fts_bla,all_srv_os_windows_2003,all_srv_rol_backup,all_ss_loc_ac
        check_period                    xi_timeperiod_24x7
        contacts                        +nagiosadmin
        contact_groups                  +xi_dig_dummy_contact_group
        notification_period             xi_timeperiod_24x7
        icon_image                      win_server.png
        statusmap_image                 win_server.png
        _xiwizard                       windowsserver
        register                        1
        }

Code: Select all

define host {
        host_name                       clusternode02.domain
        use                             dig_windows_server_prio1
        alias                           clusternode02.domain
        address                         10.10.10.02
        hostgroups                      +all_srv_inf_phy_win_fts_bla,all_srv_os_windows_2003,all_srv_rol_backup,all_ss_loc_dg
        check_period                    xi_timeperiod_24x7
        contacts                        +nagiosadmin
        contact_groups                  +xi_dig_dummy_contact_group
        notification_period             xi_timeperiod_24x7
        icon_image                      win_server.png
        statusmap_image                 win_server.png
        _xiwizard                       windowsserver
        register                        1
        }
The only difference between the hosts is the all_ss_loc_ac and all_ss_loc_dg hostgroup; these hostgroups are almost identical, so I don't see how this can create the different behaviour.
I redid the tests (I had to disable flap detection first) and the result is the same. The nagiosadmin contact was notified by both CLU_Cluster_Services_Events services, as expected. Only CLU_Cluster_Services_Events on clusternode01.domain sends out an email to the on-duty contact, which makes my head spin...

I changed the max check attempts of the service template to 1, by the way, as the event is only received once. This is the reason I noticed this issue so late. I hope my other clusters are not behaving the same way now...

Grtz

Willem

Re: Two identical services with different email behaviour

Post by sreinhardt »

I just noticed that your on-duty contact is using the generic core notification handlers. Have you checked the local mail spool, or did you intend to have it use the XI contact handlers instead? I bet that is our big difference... unless you're going to shoot me down and say that this contact is working for other hosts/services.

Re: Two identical services with different email behaviour

Post by WillemDH »

Indeed, this contact seems to work fine for other services/hosts. And like I said earlier, it is working for the service on one cluster node and not for the other service, which is identical except for its host. So I would think the problem is not in the configuration of the on-duty contact?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Two identical services with different email behaviour

Post by lmiltchev »

Can you open "/usr/local/nagios/var/objects.cache" in a text editor, find both host definitions (the one that works and the one that doesn't), and compare them? See if you have the same contacts / contactgroups listed.
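If comparing by eye is tedious, the same comparison can be scripted. The following is a rough sketch, assuming objects.cache uses the `define <type> {` / `directive value` / `}` layout seen in this thread. The helper names (`parse_objects`, `find_object`, `diff_directives`) are made up for this example and are not part of Nagios.

```python
import re


def parse_objects(text):
    """Parse 'define <type> { ... }' blocks into (type, {directive: value}) pairs."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        start = re.match(r"define\s+(\w+)\s*\{", line)
        if start:
            current = (start.group(1), {})
            continue
        if line == "}":
            if current is not None:
                objects.append(current)
            current = None
            continue
        if current is not None:
            # The first token is the directive name, the rest is its value.
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


def find_object(objects, obj_type, **match):
    """Return the directives of the first object of obj_type matching all keys."""
    for kind, attrs in objects:
        if kind == obj_type and all(attrs.get(k) == v for k, v in match.items()):
            return attrs
    return None


def diff_directives(a, b, ignore=("host_name", "alias", "address")):
    """Return {directive: (value_in_a, value_in_b)} for directives that differ."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if key in ignore:
            continue
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs
```

To use it, read the cache file into a string (for example with `open(path).read()`), call `parse_objects` on it, look up the two definitions with `find_object(objs, "host", host_name="clusternode01")` and the clusternode02 equivalent (or `"service"` plus `service_description`), and print the result of `diff_directives`. Directives that are expected to differ, such as host_name, are ignored by default.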
Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: Two identical services with different email behaviour

Post by WillemDH »

Code: Select all

define host {
        host_name     clusternode01
        alias   clusternode01
        address ipadress
        check_period    xi_timeperiod_24x7
        check_command   check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
        contacts        wachtdienst.blackberry,nagiosadmin
        contact_groups  xi_dig_dummy_contact_group
        notification_period     xi_timeperiod_24x7
        initial_state   o
        check_interval  5.000000
        retry_interval  1.000000
        max_check_attempts      3
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess_over_host        1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  o,d,u
        freshness_threshold     0
        check_freshness 0
        notification_options    d,u,r,f,s
        notifications_enabled   1
        notification_interval   30.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        failure_prediction_enabled      1
        icon_image      win_server.png
        statusmap_image win_server.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowsserver
        }

Code: Select all

define host {
        host_name       clusternode02
        alias   clusternode02
        address ipadress
        check_period    xi_timeperiod_24x7
        check_command   check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
        contacts        wachtdienst.blackberry,nagiosadmin
        contact_groups  xi_dig_dummy_contact_group
        notification_period     xi_timeperiod_24x7
        initial_state   o
        check_interval  5.000000
        retry_interval  1.000000
        max_check_attempts      3
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess_over_host        1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  o,d,u
        freshness_threshold     0
        check_freshness 0
        notification_options    d,u,r,f,s
        notifications_enabled   1
        notification_interval   30.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        failure_prediction_enabled      1
        icon_image      win_server.png
        statusmap_image win_server.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowsserver
        }
Seems the same to me, no?

And the specific service where I'm having the issue:

Code: Select all

define service {
        host_name       clusternode01
        service_description     CLU_Cluster_Services_Events
        display_name    CLU_Cluster_Services_Events
        check_period    xi_timeperiod_24x7
        check_command   check_dummy!0!'Dummy check passed'!!!!!!
        contacts        wachtdienst.blackberry,nagiosadmin
        contact_groups  xi_dig_dummy_contact_group
        notification_period     xi_timeperiod_24x7
        initial_state   o
        check_interval  1440.000000
        retry_interval  10.000000
        max_check_attempts      1
        is_volatile     0
        parallelize_check       1
        active_checks_enabled   0
        passive_checks_enabled  1
        obsess_over_service     0
        event_handler_enabled   0
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  o,w,u,c
        freshness_threshold     0
        check_freshness 0
        notification_options    u,w,c,r,f,s
        notifications_enabled   1
        notification_interval   1440.000000
        first_notification_delay        0.000000
        stalking_options        o,u,w,c
        process_perf_data       0
        failure_prediction_enabled      1
        icon_image      windowseventlog.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowseventlog
        }

Code: Select all

define service {
        host_name       clusternode02
        service_description     CLU_Cluster_Services_Events
        display_name    CLU_Cluster_Services_Events
        check_period    xi_timeperiod_24x7
        check_command   check_dummy!0!'Dummy check passed'!!!!!!
        contacts        nagiosadmin
        contact_groups  xi_dig_dummy_contact_group
        notification_period     xi_timeperiod_24x7
        initial_state   o
        check_interval  1440.000000
        retry_interval  10.000000
        max_check_attempts      1
        is_volatile     0
        parallelize_check       1
        active_checks_enabled   0
        passive_checks_enabled  1
        obsess_over_service     0
        event_handler_enabled   0
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  o,w,u,c
        freshness_threshold     0
        check_freshness 0
        notification_options    u,w,c,r,f,s
        notifications_enabled   1
        notification_interval   1440.000000
        first_notification_delay        0.000000
        stalking_options        o,u,w,c
        process_perf_data       0
        failure_prediction_enabled      1
        icon_image      windowseventlog.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowseventlog
        }
It seems there is a difference here: wachtdienst.blackberry is listed as a contact for clusternode01's CLU_Cluster_Services_Events service, while it is not configured in the service (I triple-checked in CCM) nor in the service template. I'm confused.
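To check whether other services are affected in the same way, one option is to scan objects.cache for every service whose effective `contacts` directive lists a given contact. This is only a sketch under the assumption that the cache layout matches the excerpts above. The function name `services_with_contact` is hypothetical, and the scan looks only at the `contacts` directive, not at contact group membership.

```python
import re

# Matches one "define service { ... }" block; the closing brace is expected
# on its own line, as in the objects.cache excerpts above.
SERVICE_BLOCK = re.compile(r"define\s+service\s*\{(.*?)\n\s*\}", re.DOTALL)


def services_with_contact(cache_text, contact):
    """Return (host_name, service_description) for services listing `contact`."""
    hits = []
    for block in SERVICE_BLOCK.finditer(cache_text):
        attrs = {}
        for line in block.group(1).splitlines():
            parts = line.split(None, 1)
            if len(parts) == 2:
                attrs[parts[0]] = parts[1].strip()
        contacts = [c.strip() for c in attrs.get("contacts", "").split(",")]
        if contact in contacts:
            hits.append((attrs.get("host_name"), attrs.get("service_description")))
    return hits
```

Calling `services_with_contact(text, "wachtdienst.blackberry")` on the contents of the cache file would list every host/service pair to review against what is configured in CCM.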

Grtz
Locked