Weird notification issue!

Posted: Thu Nov 20, 2014 10:28 am
by BanditBBS
OK, I've got a weird one here that I cannot figure out!

I have two services on the same host:

Code:

define service {
        host_name                       extn-chi-pdb02
        service_description             PRODEBS - Database Alert Log Errors
        use                             extn_generic-service-5
        servicegroups                   extn_prodebs
        check_command                   check_by_ssh_dblog!/us1001/app/oracle/diag/rdbms/prodebs/PRODEBS2/trace/alert_PRODEBS2.log!ORA!-c 1!!!!!
        max_check_attempts              1
        check_period                    xi_timeperiod_24x7
        flap_detection_enabled          0
        notification_period             xi_timeperiod_24x7
        register                        1
        }
		
define service {
        host_name                       extn-chi-pdb02
        service_description             PRODEBS - WF Deferred Queue
        use                             extn_generic-service-5
        servicegroups                   extn_prodebs
        check_command                   check_xi_oraclequery2!1522!PRODEBS!xxxxxxx!xxxxxxx!"Select count(*) from apps.wf_deferred where state=0"!"WF Deferred Queue"!--warning=100 --critical=125!
        check_period                    xi_timeperiod_24x7
        notification_period             xi_timeperiod_24x7
        _xiwizard                       oraclequery
        register                        1
        }
When a notification goes out for the top one it is only sent to the escalations and not the inherited contacts from the host. When a notification is sent for the bottom one it goes to the escalations and the inherited contacts from the host. WTF am I missing here?


EDIT #1: Now this is even weirder. I have another host with the identical setup (and I mean identical), and the service is the same as well, and it alerts everyone it should. So now I am even more confused.

Re: Weird notification issue!

Posted: Thu Nov 20, 2014 4:04 pm
by cmerchant
You haven't shown us your service escalation or the extn_generic-service-5 service template. In your first service definition you set max_check_attempts to 1, so if you don't escalate until, say, the third check with the changed state, you won't ever reach the escalation.

Re: Weird notification issue!

Posted: Thu Nov 20, 2014 4:09 pm
by BanditBBS
As I stated, the escalations are working perfectly, but here it is regardless:

Code:

define serviceescalation {
        hostgroup_name                          EXTN-Prod-Servers
        service_description                     *
        contacts                                matmakuru,pushwinder,sasaf
        first_notification                      1
        last_notification                       1
        notification_interval                   15
        escalation_period                       24x7
        escalation_options                      w,u,c,r,
        }
The contacts inherited from the HOST are the ones not being notified, and not for every service, just some. Of the two services I listed above, one works and one doesn't.
Here is the service template:

Code:

define service {
       name                                     extn_generic-service-5
       service_description                      Generic Exterran Service(5 Min)
       is_volatile                              0
       max_check_attempts                       3
       check_interval                           5
       retry_interval                           2
       active_checks_enabled                    1
       passive_checks_enabled                   1
       check_period                             24x7
       parallelize_check                        1
       obsess_over_service                      1
       check_freshness                          0
       event_handler_enabled                    1
       flap_detection_enabled                   1
       process_perf_data                        1
       retain_status_information                1
       retain_nonstatus_information             1
       notification_interval                    0
       notification_period                      24x7
       notification_options                     w,c,u,r,
       notifications_enabled                    1
       register                                 0

}
And here is the host:

Code:

define host {
        host_name                       extn-chi-pdb02
        use                             extn_generic_host
        alias                           extn-chi-pdb02(PRODEBS2)
        address                         10.xx.xx.xx
        contacts                        +extn_nagios_db
        contact_groups                  +EXTN_NAGIOS_ALL_CG
        register                        1
        }
and before you ask:

Code:

define host {
       name                                     extn_generic_host
       check_command                            check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
       max_check_attempts                       2
       check_interval                           5
       retry_interval                           1
       check_period                             xi_timeperiod_24x7
       event_handler_enabled                    1
       flap_detection_enabled                   1
       process_perf_data                        1
       retain_status_information                1
       retain_nonstatus_information             1
       contact_groups                           +EXTN_NAGIOS_ALL_CG
       notification_interval                    0
       notification_period                      xi_timeperiod_24x7
       notifications_enabled                    1
       register                                 0

}

Re: Weird notification issue!

Posted: Thu Nov 20, 2014 5:05 pm
by BanditBBS
I'm beginning to wonder if the escalations are messing with the inheritance from host to service... but if so, why isn't it happening everywhere, for every service? Very confusing.

Re: Weird notification issue!

Posted: Thu Nov 20, 2014 5:40 pm
by scottwilkerson
Once an escalation is triggered, the original contacts are no longer notified unless they are part of the escalation.
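
One way to check whether the host's contacts are part of the escalation is to list who each escalation entry names and compare that with the host definition. Here is a rough sketch, not an official Nagios tool: `escalation_contacts` is a made-up helper name, and it assumes each directive sits on its own line as `directive value`, with the contact lists written without spaces after the commas.

```shell
# Sketch only: summarize who each serviceescalation block would notify.
# escalation_contacts is a hypothetical helper, not part of Nagios.
# Usage (path is an assumption about a default install):
#   escalation_contacts /usr/local/nagios/var/objects.cache
escalation_contacts() {
    # $1 = path to a Nagios object file (a config file or the objects cache)
    awk '
        # Start of an escalation block: reset the collected fields.
        $1 == "define" && $2 ~ /^serviceescalation[{]?$/ {
            inblock = 1; target = ""; contacts = "-"; cgroups = "-"
            next
        }
        # Remember which host or hostgroup the block applies to.
        inblock && ($1 == "host_name" || $1 == "hostgroup_name") { target = $2 }
        inblock && $1 == "contacts"       { contacts = $2 }
        inblock && $1 == "contact_groups" { cgroups = $2 }
        # End of block: print one summary line per escalation.
        inblock && $1 == "}" {
            printf "%s contacts=%s contact_groups=%s\n", target, contacts, cgroups
            inblock = 0
        }
    ' "$1"
}
```

A `-` in the output means the block did not set that directive. If a contact or contact group from the host is missing from every line printed for that host, it would not be expected to receive the escalated notification.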

Re: Weird notification issue!

Posted: Thu Nov 20, 2014 5:49 pm
by BanditBBS
scottwilkerson wrote: Once an escalation is triggered, original contacts are not notified any longer unless they are part of the escalation
Scott (and everyone),

Please trust me that these two hosts are set up identically.
This is from the problem host when the service fails:
Capture2.PNG
This is from a working host when the service fails. The setup is identical (I PROMISE :) )
Capture.PNG

Re: Weird notification issue!

Posted: Thu Nov 20, 2014 6:17 pm
by scottwilkerson
"Setup identical" including the same contacts on all of the hosts, services, and escalations? Are all of the contacts set up the same, with the same time periods, etc.?

Re: Weird notification issue!

Posted: Thu Nov 20, 2014 8:45 pm
by BanditBBS
scottwilkerson wrote: "Setup identical" including, the same contacts on all of the hosts, services, escalations ? Are all of the contacts set the same, same time periods, etc?
Yes sir. The only differences are the IP on the host, the hostname, and the slight difference in the service description. Besides that, everything is identical. Maybe I am missing something, so I will edit this post in a bit and paste comparisons of the two services that are identical yet behave differently.

Hosts:

Code:

define host {
        host_name                       extn-chi-pdb02
        use                             extn_generic_host
        alias                           extn-chi-pdb02(PRODEBS2)
        address                         10.160.101.17
        contacts                        +extn_nagios_db
        contact_groups                  +EXTN_NAGIOS_ALL_CG
        register                        1
        }
define host {
        host_name                       extn-chi-pdb05
        use                             extn_generic_host
        alias                           extn-chi-pdb05(ASCP)
        address                         10.160.101.20
        contacts                        +extn_nagios_db
        contact_groups                  +EXTN_NAGIOS_ALL_CG
        register                        1
        }
Services:

Code:

define service {
        host_name                       extn-chi-pdb02
        service_description             PRODEBS - Database Alert Log Errors
        use                             extn_generic-service-5
        servicegroups                   extn_prodebs
        check_command                   check_by_ssh_dblog!/us1001/app/oracle/diag/rdbms/prodebs/PRODEBS2/trace/alert_PRODEBS2.log!ORA!-c 1!!!!!
        max_check_attempts              1
        check_period                    xi_timeperiod_24x7
        flap_detection_enabled          0
        notification_period             xi_timeperiod_24x7
        register                        1
        }

define service {
        host_name                       extn-chi-pdb05
        service_description             PRODASCP - Database Alert Log Errors
        use                             extn_generic-service-5
        servicegroups                   extn_prodascp
        check_command                   check_by_ssh_dblog!/us1001/oracle/db/tech_st/11.1.0/admin/PRODASCP_extn-chi-pdb05/diag/rdbms/prodascp/PRODASCP/trace/alert_PRODASCP.log!ORA!-c 1!!!!!
        max_check_attempts              1
        check_period                    xi_timeperiod_24x7
        flap_detection_enabled          0
        notification_period             xi_timeperiod_24x7
        register                        1
        }
The templates and other stuff have already been pasted earlier in the thread. Also, the images in my previous post show the difference in the notifications sent.

Re: Weird notification issue!

Posted: Thu Nov 20, 2014 11:51 pm
by Box293
Can you run these two commands and post the output:

Code:

cat /usr/local/nagios/var/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.extn-chi-pdb02/p;d};ba}"
cat /usr/local/nagios/var/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.extn-chi-pdb05/p;d};ba}"
Can you run these two commands and post the output for the "PRODEBS - Database Alert Log Errors" and "PRODASCP - Database Alert Log Errors" services:

Code:

cat /usr/local/nagios/var/objects.cache | sed -rn "/define service \{/{:a;N;/}/{/.host_name.extn-chi-pdb02/p;d};ba}"
cat /usr/local/nagios/var/objects.cache | sed -rn "/define service \{/{:a;N;/}/{/.host_name.extn-chi-pdb05/p;d};ba}"
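
The two service commands match on the host name only, so they print every service on that host. If that is too much output, a filter on both host_name and service_description narrows it to a single service. This is only a sketch: `show_service` is a made-up helper name, and it assumes the file uses the `directive<whitespace>value` layout with one directive per line.

```shell
# Sketch only: print the one "define service" block that matches both a
# host name and a service description. show_service is a hypothetical
# helper, not part of Nagios.
# Usage (path taken from the commands above):
#   show_service /usr/local/nagios/var/objects.cache extn-chi-pdb02 \
#       "PRODEBS - Database Alert Log Errors"
show_service() {
    # $1 = object file, $2 = host_name, $3 = service_description
    awk -v host="$2" -v desc="$3" '
        # Return the value part of a "directive value" line, so that
        # descriptions containing spaces are compared as a whole.
        function value(line) {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", line)
            sub(/[ \t]+$/, "", line)
            return line
        }
        # Start of a service block: reset the buffer and the match flags.
        $1 == "define" && $2 ~ /^service[{]?$/ { inblock = 1; block = ""; h = 0; d = 0 }
        inblock {
            block = block $0 "\n"
            if ($1 == "host_name" && value($0) == host) h = 1
            if ($1 == "service_description" && value($0) == desc) d = 1
            # End of block: print it only if both fields matched.
            if ($1 == "}") {
                if (h && d) printf "%s", block
                inblock = 0
            }
        }
    ' "$1"
}
```

Running it once per host/service pair gives two blocks that can be compared line by line, for example with diff.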

Re: Weird notification issue!

Posted: Fri Nov 21, 2014 12:21 am
by BanditBBS
Here you go sir:

First two commands:

Code:

[jclark@iss-chi-nag05 ~]$ cat /usr/local/nagios/var/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.extn-chi-pdb02/p;d};ba}"
define host {
        host_name       extn-chi-pdb02
        alias   extn-chi-pdb02(PRODEBS2)
        address 10.160.101.17
        check_period    xi_timeperiod_24x7
        check_command   check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
        contact_groups  EXTN_NAGIOS_ALL_CG
        notification_period     xi_timeperiod_24x7
        initial_state   o
        importance      0
        check_interval  5.000000
        retry_interval  1.000000
        max_check_attempts      2
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  a
        freshness_threshold     0
        check_freshness 0
        notification_options    a
        notifications_enabled   1
        notification_interval   0.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        retain_status_information       1
        retain_nonstatus_information    1
        }
[jclark@iss-chi-nag05 ~]$ cat /usr/local/nagios/var/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.extn-chi-pdb05/p;d};ba}"
define host {
        host_name       extn-chi-pdb05
        alias   extn-chi-pdb05(ASCP)
        address 10.160.101.20
        check_period    xi_timeperiod_24x7
        check_command   check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
        contact_groups  EXTN_NAGIOS_ALL_CG
        notification_period     xi_timeperiod_24x7
        initial_state   o
        importance      0
        check_interval  5.000000
        retry_interval  1.000000
        max_check_attempts      2
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  a
        freshness_threshold     0
        check_freshness 0
        notification_options    a
        notifications_enabled   1
        notification_interval   0.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        retain_status_information       1
        retain_nonstatus_information    1
        }
Second two:
Umm, those commands don't seem to show all the services they should, and neither shows the one we are looking for. Also, why do the host ones above only list the contact groups in that output and not the contacts? And the services that do show up list "contact_groups admins", and that's nowhere on them.

EDIT: I'm an idiot. My stuff is offloaded. I'll redo the commands within the hour and repost!