Not Alerting to Complete List

Posted: Tue Aug 27, 2013 1:35 pm
by pteegarden
I have a host (host_name=eatools) that has two contact objects defined:

define host {
	host_name			eatools
	use				xiwizard_windowsserver_host
	alias				eatools
	address				xxx.xxx.xxx.xxx
	hostgroups			windows-servers
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	contacts			tellpaul
	contact_groups			Windows
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	first_notification_delay	0
	notification_options		d,u,r,f,s,
	notifications_enabled		1
	icon_image			win_server.png
	statusmap_image			win_server.png
	_xiwizard			windowsserver
	register			1
	}	
"tellpaul" is my email address, and "Windows" is a contact group with 4 other email addresses.

I also have a "ping" service defined for this host with the same contacts:

define service {
	host_name			eatools
	service_description		Ping
	use				xiwizard_windowsserver_ping_service
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		1
	contacts			tellpaul
	contact_groups			Windows
	_xiwizard			windowsserver
	register			1
	}	
Nagios is triggering a flapping started/stopped notice for both the HOST and the SERVICE, but only a subset of the intended recipients are getting emails.

Notifications are supposed to go to "tellpaul" (pteegarden), and the members of the Windows contact group (rquinto,sclaverie,skrok,tmjessen).

The Notification log indicates that emails are being sent only to sclaverie, skrok, and tmjessen.

This type of error also occurred with another host/service yesterday.
Strangely, yesterday's occurrence worked (at 20:34) and then failed (at 20:39).

What would be your recommendation to fix?

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 2:14 pm
by sreinhardt
Are all of these emails managed as just contacts, or as XI users? Often this is simply an issue of the notification handler and the type of contact. Otherwise, how long has this been in place, and were yesterday and today the only two known occurrences?

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 2:29 pm
by pteegarden
I have been going through all of the Hosts/services for the last three workdays, adjusting the contacts/contact_groups to the desired members.
The following are both contacts and XI users: rquinto, sclaverie, tmjessen, skrok.
The following is just a contact: pteegarden

I noticed this error Monday morning when my supervisor (rquinto) pointed out that he didn't get the messages that his supervisor (sclaverie) was getting.
These messages came in over the weekend (as a result of changes made on Friday???) for 4 different hosts/services. I rebooted the server, hoping that it would clear up whatever ailed it, then sent out a Custom Notification on one of the failing hosts/services (cypress-dev) and verified that everybody got it.
My assumption was that the reboot cleared up the problem.

When the problem showed up again this morning, my supervisor asked me to check in with support.

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 2:35 pm
by sreinhardt
I am guessing that your notification command is also defined as xi-notify-[host\service]-by-email? Can you make that additional contact an XI user as well? Otherwise you will need to alter the contacts to use the standard core commands, notify-[host\service]-by-email. This, however, will alter the message somewhat and may not work if you are using an SMTP server as opposed to the internal MTA on the Nagios server.

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 3:01 pm
by pteegarden
You got me thinking to look at the contacts.cfg file.
Some of the members were defined differently with respect to:
host_notification_options d,u,r,f,s
service_notification_options w,u,c,r,f,s

So some would see flapping and some would not.

So I went through the CCM interface and made all the contacts the same.
Am I correct in assuming that if none of the (d,u,r,f,s) checkboxes are checked, they are all assumed to be checked?
Same with the services: if none of the (w,u,c,r,f,s) checkboxes are checked, are they all assumed to be checked?

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 3:16 pm
by sreinhardt
Unless you have this defined in a template or group, this thinking is unfortunately incorrect. The way alerting works is that first the filter on the host or service being alerted for is checked to see if that state should send an alert. Then each individual contact is looked at, and their filter is applied to see if they opt in to receiving that notification type. So if the contact has nothing checked, nothing will come through to them. To receive all notifications, you would need all boxes checked. I would, however, highly suggest not marking flapping states unless you know that flapping is not a frequent occurrence.
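
The two-stage check described above can be sketched in a few lines of Python. This is only a simplified model under stated assumptions, not Nagios source code: the function names `parse_options` and `recipients` are made up for illustration, the per-contact option strings are hypothetical values chosen to mirror the symptom in this thread, and time periods, escalations, and downtime are ignored.

```python
# Simplified model of the two-stage notification filter described above.
# Not Nagios source code; time periods, escalations, and downtime are ignored.


def parse_options(value):
    """Turn a string such as 'd,u,r,f,s,' into a set of option letters."""
    return {opt.strip() for opt in value.split(",") if opt.strip()}


def recipients(event, object_options, contact_options):
    """Return the contacts that would be notified for one event letter.

    event           -- single option letter, e.g. 'f' for flapping
    object_options  -- notification_options string of the host or service
    contact_options -- dict: contact name -> that contact's options string
    """
    # Stage 1: the host or service must allow this notification type.
    if event not in parse_options(object_options):
        return []
    # Stage 2: each contact must opt in individually.
    return sorted(
        name
        for name, options in contact_options.items()
        if event in parse_options(options)
    )


# Hypothetical per-contact settings (assumed for illustration only).
contacts = {
    "pteegarden": "d,u,r",
    "rquinto": "d,u,r",
    "sclaverie": "d,u,r,f,s",
    "skrok": "d,u,r,f,s",
    "tmjessen": "d,u,r,f,s",
}

# Flapping ('f') passes the host filter but only three contacts opt in.
print(recipients("f", "d,u,r,f,s,", contacts))
# Down ('d') passes the host filter and every contact opts in.
print(recipients("d", "d,u,r,f,s,", contacts))
```

In this model a contact with an empty option string never appears in the result, which matches the point that a contact with nothing checked receives nothing.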

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 4:19 pm
by pteegarden
Ok. I will check (d,u,r) and (w,u,c,r). Thanks for the heads-up.

One more question before we close this topic: Each notification_option contains an "s". What does that stand for?

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 4:21 pm
by sreinhardt
You are welcome! The S is for scheduled downtime. It alerts people when the host or service is going into downtime and when it comes back up. It does not allow them to receive alerts during downtime.

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 4:25 pm
by pteegarden
Thanks!
Do I close this topic (I don't immediately see how...) or do you?

Re: Not Alerting to Complete List

Posted: Tue Aug 27, 2013 4:36 pm
by abrist
We do, though once you know our system, you can put a green check on the first post by editing it - but only if you never want to hear from us again on that topic :)