Have I somehow misconfigured notifications?

logic_bomb421
Posts: 43
Joined: Tue Jul 15, 2014 6:58 pm

Post by logic_bomb421 »

First of all, I want to thank you guys for being so helpful. Every question I've had has been answered or responded to.

I recently tweaked my notification settings so that notifications for critical events go out faster. I think I may have broken something, though, because I no longer seem to get any notifications for certain events. I tested a server shutdown just now and never received an email (nor did the rest of my team). The notification page in the Nagios console says it sent emails for that event; we just never got them. At first I thought something was wrong with the mail service on the server again, but then I realized that we're still getting emails in the middle of the night like normal (we have a server that reboots nightly). So emails are getting out, just not for the things I specifically reconfigured.

For reference, what I did was change the check attempts in templates.cfg to 2 instead of 10, and I changed the retry interval to 0.5 minutes. In theory this should mean that, when a server goes down, it waits half a minute, rechecks, and then sends the notification, right?
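
As a rough sanity check of that reasoning, here is a minimal sketch of the arithmetic. It assumes interval_length is at its default of 60 seconds and ignores check execution time and scheduling latency; the variable names are only illustrative.

```shell
# Values described in the post: 2 check attempts, 0.5-minute retry interval.
max_check_attempts=2
retry_interval=0.5     # in Nagios time units
interval_length=60     # seconds per time unit (assumed default)

# After the first failed check there are (max_check_attempts - 1) retries,
# each scheduled retry_interval * interval_length seconds after the last.
delay=$(awk -v a="$max_check_attempts" -v r="$retry_interval" -v l="$interval_length" \
    'BEGIN { print (a - 1) * r * l }')

echo "Approximate seconds from first failure to notification: $delay"
```

Under those assumptions the delay works out to about 30 seconds, which matches the expectation of waiting half a minute, rechecking, and then notifying.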

See, even now as I think this through, I'm realizing that:
A) Emails are still getting through (as confirmed by our nightly server reboot)
and B) the notifications page in Nagios says it sent a notification for the incident

So why didn't I get it? Any ideas?
eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Post by eloyd »

It's hard to diagnose without some configuration samples and log files.

Can you post an example of your services.cfg file that shows one of the service definitions that's not working? And can you post the nagios.log file portion that shows where you think it's sending them? Generally, this is /usr/local/nagios/var/nagios.log
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic!

Post by logic_bomb421 »

Okay so I still use the original template files as a way to manage things at a single point. So here is my template file for my Windows Hosts:

Code:

# Windows host definition template - This is NOT a real host, just a template!

define host{
	name			windows-server	; The name of this host template
	use			generic-host	; Inherit default values from the generic-host template
	check_period		24x7		; By default, Windows servers are monitored round the clock
	check_interval		1		; Actively check the server every 1 minute
	retry_interval		0.5		; Schedule host check retries at 0.5 minute intervals
	max_check_attempts	1		; Check each server 1 time (max), so there are no retries before alerting
	check_command		check-host-alive	; Default command to check if servers are "alive"
	notification_period	24x7		; Send notification out at any time - day or night
	notification_interval	10		; Resend notifications every 10 minutes
	notification_options	d,r		; Only send notifications for specific host states
	contact_groups		admins		; Notifications get sent to the admins by default
	hostgroups		3		; Host groups that Windows servers should be a member of
	register		0		; DONT REGISTER THIS - ITS JUST A TEMPLATE
	}
This template is used by the hosts in my main hosts file. For example:

Code:

define host{
	use		windows-server
	host_name	RDS03
	alias		RDS03
	parents		Dell LAN Switch 1
	address		SERVER IP ADDRESS
	statusmap_image server.png
	}
And finally, here is a copy of the nagios.log that should show that these notifications were (or should have been) sent out:

Code:

[1407180136] HOST ALERT: RDS03;DOWN;HARD;1;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
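
Entries like these show only that Nagios ran the notification command, not that any mail was delivered. As a rough sketch, the notification lines can be counted and the handling command pulled out with grep and cut. The temporary file below holds a few lines copied from the excerpt above and stands in for the real nagios.log.

```shell
# Sample lines copied from the excerpt above; a real run would read the
# actual nagios.log instead of this temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
[1407180136] HOST ALERT: RDS03;DOWN;HARD;1;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
EOF

# Fields after "HOST NOTIFICATION: " are contact;host;state;command;output.
# Count the notifications logged for the host...
count=$(grep -c 'HOST NOTIFICATION: [^;]*;RDS03;' "$log")
# ...and list which notification command(s) handled them (4th ';' field).
commands=$(grep 'HOST NOTIFICATION: ' "$log" | cut -d';' -f4 | sort -u)

echo "$count notification(s) via: $commands"
rm -f "$log"
```

The command name in the fourth field is the definition worth inspecting next when notifications are logged but never arrive.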

Post by eloyd »

Can we also see the "notify-host-by-email" definition from commands.cfg?

This is just a guess, and I haven't looked closely yet, but I suspect the code expects "retry_interval" to be an integer and the decimal is throwing everything else off. I will look at the Nagios Core source later to confirm that, but if you want to experiment, find "interval_length" in nagios.cfg and change it from 60 to 30, then change your retry_interval from 0.5 to 1. Then force a failure somehow and see if it works.
Last edited by eloyd on Tue Aug 05, 2014 9:15 pm, edited 1 time in total.
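
For anyone trying that experiment, keep in mind that interval_length scales every interval, not just retry_interval. Here is a quick sketch of what the template's values would mean with interval_length set to 30; the to_seconds helper is hypothetical and exists only for this illustration.

```shell
# Seconds represented by a setting = value * interval_length.
to_seconds() { echo $(( $1 * $2 )); }

interval_length=30                             # experimental value (default is 60)
retry=$(to_seconds 1 "$interval_length")       # retry_interval 1
check=$(to_seconds 1 "$interval_length")       # check_interval 1 from the template
renotify=$(to_seconds 10 "$interval_length")   # notification_interval 10 from the template

echo "retry=${retry}s check=${check}s renotify=${renotify}s"
```

So the retry still happens after 30 seconds, but regular checks would also run every 30 seconds and notifications would repeat every 5 minutes instead of every 10, unless those values are doubled to compensate.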

Post by logic_bomb421 »

Alright, the notify host command is as follows:

Code:

# 'notify-host-by-email' command definition
define command{
	command_name	notify-host-by-email
	command_line	/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$##\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" #$CONTACTEMAIL$
	}
As for the integer thing, I was kind of wondering that myself. I forget where, but I know I used a decimal number in a cfg and it worked, so I must have thought it would work here as well.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

The command looks ok. Did you change the decimal to an int? If so, did it help?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Post by logic_bomb421 »

I did change the decimal back to an int, and it hasn't made a difference. It's just weird: I can see service notifications that happen early in the morning/overnight, but nothing comes through for host up/down.

Post by abrist »

logic_bomb421 wrote:
# 'notify-host-by-email' command definition
define command{
	command_name	notify-host-by-email
	command_line	/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$##\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" #$CONTACTEMAIL$
	}

Is there a reason why the "$CONTACTEMAIL$" macro is commented out with a "#"? Everything after that "#" is treated as a comment, so the mail command never gets a recipient address.
Try removing the pound sign before $CONTACTEMAIL$:

Code:

define command{
   command_name   notify-host-by-email
   command_line   /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$##\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
   }
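
To see why that one character matters, here is a small sketch of how a shell reads the tail end of the command line. The count_args function is a hypothetical stand-in for /usr/bin/mail that only reports how many arguments it received, and admin@example.com is a placeholder for whatever $CONTACTEMAIL$ expands to.

```shell
# Hypothetical stand-in for /usr/bin/mail: prints the number of arguments.
count_args() { echo "$#"; }

# Single quotes keep each command text literal until eval hands it to the
# shell, which then parses it like any other command line.
# A '#' at the start of a word begins a comment, so the address is dropped.
broken=$(eval 'count_args -s "Host Alert" #admin@example.com')
fixed=$(eval 'count_args -s "Host Alert" admin@example.com')

echo "arguments with '#': $broken"
echo "arguments without '#': $fixed"
```

With the "#" in place the stand-in sees only the -s option and the subject; without it, the recipient address arrives as a third argument. That would fit Nagios logging the notification as sent while nobody received the email.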

Post by eloyd »

Nice catch. I'll bet you an IPA that's the problem. :)

Post by logic_bomb421 »

Jeez guys... I have absolutely no idea how that comment marker got there. That was definitely the problem!

Thank you for all the help!