Have I somehow misconfigured notifications?
-
- Posts: 43
- Joined: Tue Jul 15, 2014 6:58 pm
First of all, I want to thank you guys for being so helpful. Every question I've had has been answered or responded to.
I recently tweaked my notification settings to make notifications for critical events go out faster. I feel like I may have messed something up, though, because I no longer seem to get any notifications for certain events. I tested a server shutdown just now and noticed that I never received an email (nor did the rest of my team). When I check the notifications page in the Nagios console, it says it sent emails for that event; we just never got them. At first I thought there was something wrong with the mail service on the server again, but then I realized that we're still getting emails in the middle of the night like normal (we have a server that reboots nightly). This must mean emails are getting out, just not for the things I specially configured.
For reference, what I did was change the check attempts in templates.cfg to 2 instead of 10, and I changed the retry interval to 0.5 minutes. In theory this should mean that, when a server goes down, it waits half a minute, rechecks, and then sends the notification, right?
See, even now as I'm thinking about this, I'm realizing that:
A) Emails are still getting through (as confirmed by our nightly server reboot)
and B) the notifications page in Nagios says it sent a notification for the incident
So why didn't I get it? Any ideas?
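The timing reasoning in the post can be sketched in a few lines of Python. This is a rough model only: it assumes the default interval_length of 60 seconds, ignores scheduling latency, and counts time from the first failed check. The function name is made up for illustration, and the old retry interval of 1 minute is an assumption (the post only states the old attempt count of 10).

```python
def seconds_until_hard_state(max_check_attempts, retry_interval, interval_length=60):
    """Approximate delay between the first failed check and the HARD state.

    The first failure counts as attempt 1, so only the remaining attempts
    each wait one retry interval. interval_length is the number of seconds
    per interval unit (assumed to be 60 here).
    """
    retries = max(max_check_attempts - 1, 0)
    return retries * retry_interval * interval_length


# Settings described in the post: 2 attempts, 0.5 minute retry interval.
print(seconds_until_hard_state(2, 0.5))

# Old attempt count of 10; the 1 minute retry interval is an assumption.
print(seconds_until_hard_state(10, 1))
```

Under these assumptions, the new settings reach a HARD state about half a minute after the first failed check, which matches the expectation in the post. With max_check_attempts set to 1, the delay would be zero, because the first failure is already the last attempt.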
Re: Have I somehow misconfigured notifications?
It's hard to diagnose without some configuration samples and log files.
Can you post an example of your services.cfg file that shows one of the service definitions that's not working? And can you post the nagios.log file portion that shows where you think it's sending them? Generally, this is /usr/local/nagios/var/nagios.log
Re: Have I somehow misconfigured notifications?
Okay so I still use the original template files as a way to manage things at a single point. So here is my template file for my Windows Hosts:
Code: Select all
# Windows host definition template - This is NOT a real host, just a template!
define host{
    name windows-server ; The name of this host template
    use generic-host ; Inherit default values from the generic-host template
    check_period 24x7 ; By default, Windows servers are monitored round the clock
    check_interval 1 ; Actively check the server every 1 minutes
    retry_interval 0.5 ; Schedule host check retries at 0.5 minute intervals
    max_check_attempts 1 ; Check each server 2 times (max)
    check_command check-host-alive ; Default command to check if servers are "alive"
    notification_period 24x7 ; Send notification out at any time - day or night
    notification_interval 10 ; Resend notifications every 10 minutes
    notification_options d,r ; Only send notifications for specific host states
    contact_groups admins ; Notifications get sent to the admins by default
    hostgroups 3 ; Host groups that Windows servers should be a member of
    register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}
This connects to my hosts in my main hosts file. For example:
Code: Select all
define host{
    use windows-server
    host_name RDS03
    alias RDS03
    parents Dell LAN Switch 1
    address SERVER IP ADDRESS
    statusmap_image server.png
}
And finally, here is a copy of the nagios.log that should show that these notifications were (or should have been) sent out:
Code: Select all
[1407180136] HOST ALERT: RDS03;DOWN;HARD;1;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%
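For reading log excerpts like the one above, a small parser can help. The following is a minimal sketch in Python, assuming the semicolon-separated layout seen in these lines (contact, host, state, notification command, plugin output); the function and variable names are illustrative and not part of Nagios. Note that a HOST NOTIFICATION entry only records that Nagios ran the notification command. It says nothing about whether the mail was actually delivered.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Layout assumed from the excerpt: [epoch] HOST NOTIFICATION: contact;host;state;command;output
HOST_NOTIFICATION = re.compile(
    r"^\[(?P<ts>\d+)\] HOST NOTIFICATION: "
    r"(?P<contact>[^;]*);(?P<host>[^;]*);(?P<state>[^;]*);"
    r"(?P<command>[^;]*);(?P<output>.*)$"
)


def parse_host_notification(line):
    """Return a dict for a HOST NOTIFICATION line, or None for any other line."""
    match = HOST_NOTIFICATION.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # The bracketed number is a Unix timestamp; convert it to an aware UTC datetime.
    entry["time"] = datetime.fromtimestamp(int(entry.pop("ts")), tz=timezone.utc)
    return entry


sample = [
    "[1407180136] HOST ALERT: RDS03;DOWN;HARD;1;PING CRITICAL - Packet loss = 100%",
    "[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%",
    "[1407180136] HOST NOTIFICATION: TEAMMEMBER;RDS03;DOWN;notify-host-by-email;PING CRITICAL - Packet loss = 100%",
]

entries = [e for e in map(parse_host_notification, sample) if e is not None]
per_contact = Counter(e["contact"] for e in entries)

for e in entries:
    print(e["time"].isoformat(), e["contact"], e["host"], e["state"], e["command"])
print(dict(per_contact))
```

The command field is the useful part here: it names the notification command Nagios executed, which is the next place to look when the log claims a notification went out but no mail arrived.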
Re: Have I somehow misconfigured notifications?
Can we also see the "notify-host-by-email" definition from commands.cfg?
It's just a guess right now, and I haven't looked more closely, but I suspect the code expects "retry_interval" to be an integer and the decimal is throwing everything else off. I will look at the Nagios Core source code later to confirm that, but if you want to experiment, find "interval_length" and change it from 60 to 30, then change your retry_interval from 0.5 to 1. Then force a failure somehow and see if it works.
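One thing worth keeping in mind with that experiment: interval_length is a global setting, so it rescales every interval, not just retry_interval. A rough Python sketch of the arithmetic, using the values from the posted template (the helper name is made up for illustration):

```python
def effective_seconds(settings, interval_length):
    """Convert interval settings (in interval units) to seconds."""
    return {name: value * interval_length for name, value in settings.items()}


# Values from the posted windows-server template.
current = {"check_interval": 1, "retry_interval": 0.5, "notification_interval": 10}

# The suggested experiment: interval_length 30 and retry_interval 1,
# with the other settings left untouched.
experiment = {"check_interval": 1, "retry_interval": 1, "notification_interval": 10}

before = effective_seconds(current, 60)
after = effective_seconds(experiment, 30)

for name in current:
    print(name, before[name], "->", after[name])
```

The retry delay stays at 30 seconds, but checks would run every 30 seconds instead of every 60, and notifications would repeat every 5 minutes instead of every 10, unless those values are doubled as well. The same rescaling applies to every other host and service that uses interval settings.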
Last edited by eloyd on Tue Aug 05, 2014 9:15 pm, edited 1 time in total.
Re: Have I somehow misconfigured notifications?
Alright, the notify host command is as follows:
Code: Select all
# 'notify-host-by-email' command definition
define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$##\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" #$CONTACTEMAIL$
}
As for the integer thing, I was kind of wondering that myself. I forget where, but I know I used a decimal number in a cfg and it worked, so I must have thought it would work here as well.
Re: Have I somehow misconfigured notifications?
The command looks ok. Did you change the decimal to an int? If so, did it help?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Have I somehow misconfigured notifications?
I did change the decimal back to an int, and it hasn't made a difference. It's just weird: I can see service notifications that happen early in the morning/overnight, but nothing comes through for host up/down.
Re: Have I somehow misconfigured notifications?
Is there a reason why the "$CONTACTEMAIL$" macro is commented out (#)?
logic_bomb421 wrote:
# 'notify-host-by-email' command definition
define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$##\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" #$CONTACTEMAIL$
}
Try removing the pound sign before $CONTACTEMAIL$:
Code: Select all
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$##\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
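Why the stray pound sign matters: the command line is handed to a shell, and an unquoted # at the start of a word turns the rest of the line into a comment, so mail is started without a recipient. A minimal illustration in Python, using the standard shlex module to approximate shell tokenizing (the address and subject below are placeholders, and shlex is only an approximation of a real shell):

```python
import shlex


def shell_visible_args(command_line):
    """Tokenize roughly like a POSIX shell, dropping text after an unquoted '#'."""
    return shlex.split(command_line, comments=True)


# Placeholder subject and address, standing in for the expanded Nagios macros.
broken = '/usr/bin/mail -s "** PROBLEM Host Alert: RDS03 is DOWN **" #admin@example.com'
fixed = '/usr/bin/mail -s "** PROBLEM Host Alert: RDS03 is DOWN **" admin@example.com'

print(shell_visible_args(broken))
print(shell_visible_args(fixed))
```

In the broken form the recipient never reaches mail, even though Nagios itself ran the command and logged the notification as sent. The ## inside the printf string is harmless by comparison, because it sits inside double quotes and is printed literally.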
Re: Have I somehow misconfigured notifications?
Nice catch. I'll bet you an IPA that's the problem.
Re: Have I somehow misconfigured notifications?
Jeez, guys... I have absolutely no idea how that comment marker got there. That was definitely the problem!
Thank you for all the help!