Weird notification issue!
Posted: Thu Nov 20, 2014 10:28 am
by BanditBBS
OK, I've got a weird one here that I cannot figure out!
I have two services on the same host:
Code: Select all
define service {
host_name extn-chi-pdb02
service_description PRODEBS - Database Alert Log Errors
use extn_generic-service-5
servicegroups extn_prodebs
check_command check_by_ssh_dblog!/us1001/app/oracle/diag/rdbms/prodebs/PRODEBS2/trace/alert_PRODEBS2.log!ORA!-c 1!!!!!
max_check_attempts 1
check_period xi_timeperiod_24x7
flap_detection_enabled 0
notification_period xi_timeperiod_24x7
register 1
}
define service {
host_name extn-chi-pdb02
service_description PRODEBS - WF Deferred Queue
use extn_generic-service-5
servicegroups extn_prodebs
check_command check_xi_oraclequery2!1522!PRODEBS!xxxxxxx!xxxxxxx!"Select count(*) from apps.wf_deferred where state=0"!"WF Deferred Queue"!--warning=100 --critical=125!
check_period xi_timeperiod_24x7
notification_period xi_timeperiod_24x7
_xiwizard oraclequery
register 1
}
When a notification goes out for the top one it is only sent to the escalations and not the inherited contacts from the host. When a notification is sent for the bottom one it goes to the escalations and the inherited contacts from the host. WTF am I missing here?
EDIT #1: Now this is even weirder. I have another host with the identical setup (and I mean identical), the service is the same as well, and it alerts everyone it should. So now I am even more confused.
Re: Weird notification issue!
Posted: Thu Nov 20, 2014 4:04 pm
by cmerchant
You haven't shown us your service escalation or the extn_generic-service-5 service template. In your first service definition you set max_check_attempts to 1. If you don't escalate until, say, the third check in the changed state, you will never reach the escalation.
Re: Weird notification issue!
Posted: Thu Nov 20, 2014 4:09 pm
by BanditBBS
As I stated, the escalations are working perfectly, but here it is regardless:
Code: Select all
define serviceescalation {
hostgroup_name EXTN-Prod-Servers
service_description *
contacts matmakuru,pushwinder,sasaf
first_notification 1
last_notification 1
notification_interval 15
escalation_period 24x7
escalation_options w,u,c,r,
}
The inherited ones from the HOST are the ones not working for every service, just some. The services I listed above, one works and one doesn't.
Here is the service template:
Code: Select all
define service {
name extn_generic-service-5
service_description Generic Exterran Service(5 Min)
is_volatile 0
max_check_attempts 3
check_interval 5
retry_interval 2
active_checks_enabled 1
passive_checks_enabled 1
check_period 24x7
parallelize_check 1
obsess_over_service 1
check_freshness 0
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 0
notification_period 24x7
notification_options w,c,u,r,
notifications_enabled 1
register 0
}
And here is the host:
Code: Select all
define host {
host_name extn-chi-pdb02
use extn_generic_host
alias extn-chi-pdb02(PRODEBS2)
address 10.xx.xx.xx
contacts +extn_nagios_db
contact_groups +EXTN_NAGIOS_ALL_CG
register 1
}
and before you ask:
Code: Select all
define host {
name extn_generic_host
check_command check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
max_check_attempts 2
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
contact_groups +EXTN_NAGIOS_ALL_CG
notification_interval 0
notification_period xi_timeperiod_24x7
notifications_enabled 1
register 0
}
Re: Weird notification issue!
Posted: Thu Nov 20, 2014 5:05 pm
by BanditBBS
I'm beginning to wonder if the escalations are messing with the inheritance from host to service... but if so, why isn't it happening everywhere, for every service? Very confusing.
Re: Weird notification issue!
Posted: Thu Nov 20, 2014 5:40 pm
by scottwilkerson
Once an escalation is triggered, original contacts are not notified any longer unless they are part of the escalation
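To make that rule concrete, here is a minimal Python sketch of the recipient selection as described above. It is a simplified model, not Nagios source code: the `Escalation` class and the `recipients` function are hypothetical names, `escalation_period` and `escalation_options` are ignored, and `last_notification 0` is treated as "no upper bound". The contact names come from the definitions posted earlier in the thread.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Escalation:
    """Hypothetical, simplified stand-in for a serviceescalation object."""
    contacts: Set[str]
    first_notification: int
    last_notification: int  # 0 is treated as "no upper bound"

    def covers(self, notification_number: int) -> bool:
        """True if this escalation applies to the given notification number."""
        if notification_number < self.first_notification:
            return False
        return (self.last_notification == 0
                or notification_number <= self.last_notification)


def recipients(notification_number: int,
               object_contacts: Iterable[str],
               escalations: List[Escalation]) -> Set[str]:
    """Return who gets notified under the rule stated in the post above.

    If at least one escalation covers this notification, only the
    escalation contacts are used; otherwise the object's own
    (possibly inherited) contacts are used.
    """
    matching = [e for e in escalations if e.covers(notification_number)]
    if matching:
        escalated: Set[str] = set()
        for escalation in matching:
            escalated |= escalation.contacts
        return escalated
    return set(object_contacts)


# Illustration with the escalation from this thread (first=1, last=1).
esc = Escalation({"matmakuru", "pushwinder", "sasaf"}, 1, 1)
print(sorted(recipients(1, {"extn_nagios_db"}, [esc])))
# -> ['matmakuru', 'pushwinder', 'sasaf']
```

Under this model the inherited contact would only be notified for a notification number that the escalation does not cover.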
Re: Weird notification issue!
Posted: Thu Nov 20, 2014 5:49 pm
by BanditBBS
scottwilkerson wrote:Once an escalation is triggered, original contacts are not notified any longer unless they are part of the escalation
Scott(and everyone),
Please trust me that these two hosts are set up identically.
This is from this problem host when the service fails:
Capture2.PNG
This is from a working host when it fails. It is set up identically (I PROMISE).
Capture.PNG
Re: Weird notification issue!
Posted: Thu Nov 20, 2014 6:17 pm
by scottwilkerson
"Setup identical" including the same contacts on all of the hosts, services, and escalations? Are all of the contacts set the same, with the same time periods, etc.?
Re: Weird notification issue!
Posted: Thu Nov 20, 2014 8:45 pm
by BanditBBS
scottwilkerson wrote:"Setup identical" including the same contacts on all of the hosts, services, and escalations? Are all of the contacts set the same, with the same time periods, etc.?
Yes sir. The only differences are the IP on the host, the hostname, and a slight difference in the service description. Besides that, everything is identical. I'll go ahead and show it all in case I am missing something: I will edit this post in a bit and paste comparisons of the 2 services that are identical yet behave differently.
Hosts:
Code: Select all
define host {
host_name extn-chi-pdb02
use extn_generic_host
alias extn-chi-pdb02(PRODEBS2)
address 10.160.101.17
contacts +extn_nagios_db
contact_groups +EXTN_NAGIOS_ALL_CG
register 1
}
define host {
host_name extn-chi-pdb05
use extn_generic_host
alias extn-chi-pdb05(ASCP)
address 10.160.101.20
contacts +extn_nagios_db
contact_groups +EXTN_NAGIOS_ALL_CG
register 1
}
Services:
Code: Select all
define service {
host_name extn-chi-pdb02
service_description PRODEBS - Database Alert Log Errors
use extn_generic-service-5
servicegroups extn_prodebs
check_command check_by_ssh_dblog!/us1001/app/oracle/diag/rdbms/prodebs/PRODEBS2/trace/alert_PRODEBS2.log!ORA!-c 1!!!!!
max_check_attempts 1
check_period xi_timeperiod_24x7
flap_detection_enabled 0
notification_period xi_timeperiod_24x7
register 1
}
define service {
host_name extn-chi-pdb05
service_description PRODASCP - Database Alert Log Errors
use extn_generic-service-5
servicegroups extn_prodascp
check_command check_by_ssh_dblog!/us1001/oracle/db/tech_st/11.1.0/admin/PRODASCP_extn-chi-pdb05/diag/rdbms/prodascp/PRODASCP/trace/alert_PRODASCP.log!ORA!-c 1!!!!!
max_check_attempts 1
check_period xi_timeperiod_24x7
flap_detection_enabled 0
notification_period xi_timeperiod_24x7
register 1
}
The templates and other stuff have already been pasted earlier in the thread. Also, the images in my previous post show the difference in notifications sent.
Re: Weird notification issue!
Posted: Thu Nov 20, 2014 11:51 pm
by Box293
Can you run these two commands and post the output:
Code: Select all
cat /usr/local/nagios/var/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.extn-chi-pdb02/p;d};ba}"
cat /usr/local/nagios/var/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.extn-chi-pdb05/p;d};ba}"
Can you run these two commands and post the output for the "PRODEBS - Database Alert Log Errors" and "PRODASCP - Database Alert Log Errors" services:
Code: Select all
cat /usr/local/nagios/var/objects.cache | sed -rn "/define service \{/{:a;N;/}/{/.host_name.extn-chi-pdb02/p;d};ba}"
cat /usr/local/nagios/var/objects.cache | sed -rn "/define service \{/{:a;N;/}/{/.host_name.extn-chi-pdb05/p;d};ba}"
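For readers who find the sed one-liners hard to follow: each one collects a "define host {" or "define service {" block up to its closing brace and prints it only if the block contains the given host_name. Below is a rough Python equivalent, offered as a sketch. `iter_blocks` and `find_objects` are hypothetical helper names, and the parser assumes the one-directive-per-line layout that objects.cache uses. Unlike the sed pattern, it compares values exactly and can also filter on service_description.

```python
from typing import Dict, Iterator, List


def iter_blocks(text: str, object_type: str) -> Iterator[Dict[str, str]]:
    """Yield each 'define <object_type> { ... }' block as a dict of directives."""
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        if block is None:
            # Start of a block, e.g. "define host {" or "define service {".
            if line.endswith("{") and line[:-1].split() == ["define", object_type]:
                block = {}
        elif line == "}":
            # Closing brace ends the current block.
            yield block
            block = None
        elif line:
            # Directive line: name, whitespace, then the rest is the value.
            parts = line.split(None, 1)
            block[parts[0]] = parts[1] if len(parts) > 1 else ""


def find_objects(text: str, object_type: str, **wanted: str) -> List[Dict[str, str]]:
    """Return blocks of the given type whose directives equal all wanted values."""
    return [
        block for block in iter_blocks(text, object_type)
        if all(block.get(key) == value for key, value in wanted.items())
    ]
```

Against the real file this could be called as find_objects(open("/usr/local/nagios/var/objects.cache").read(), "service", host_name="extn-chi-pdb02", service_description="PRODEBS - Database Alert Log Errors"), which narrows the output to the one service in question.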
Re: Weird notification issue!
Posted: Fri Nov 21, 2014 12:21 am
by BanditBBS
Here you go sir:
First two commands:
Code: Select all
[jclark@iss-chi-nag05 ~]$ cat /usr/local/nagios/var/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.extn-chi-pdb02/p;d};ba}"
define host {
host_name extn-chi-pdb02
alias extn-chi-pdb02(PRODEBS2)
address 10.160.101.17
check_period xi_timeperiod_24x7
check_command check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
contact_groups EXTN_NAGIOS_ALL_CG
notification_period xi_timeperiod_24x7
initial_state o
importance 0
check_interval 5.000000
retry_interval 1.000000
max_check_attempts 2
active_checks_enabled 1
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options a
freshness_threshold 0
check_freshness 0
notification_options a
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
}
[jclark@iss-chi-nag05 ~]$ cat /usr/local/nagios/var/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.extn-chi-pdb05/p;d};ba}"
define host {
host_name extn-chi-pdb05
alias extn-chi-pdb05(ASCP)
address 10.160.101.20
check_period xi_timeperiod_24x7
check_command check_xi_host_ping!3000.0!80%!5000.0!100%!!!!
contact_groups EXTN_NAGIOS_ALL_CG
notification_period xi_timeperiod_24x7
initial_state o
importance 0
check_interval 5.000000
retry_interval 1.000000
max_check_attempts 2
active_checks_enabled 1
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options a
freshness_threshold 0
check_freshness 0
notification_options a
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
}
Second two:
Umm, those commands don't seem to show all the services that they should, and neither shows the one we are looking for. Also, why do the host ones above only list the contact groups in that output and not the contacts? And the services that do appear show "contact_groups admins", and that's nowhere on them.
EDIT:
I'm an idiot. My stuff is offloaded. I'll redo the commands within the hour and repost!