Email Notification question
Hi All,
I am trying to separate the notification emails for prod and stage servers/services. What I did was define notification commands for each environment, separate contact definitions, and separate host/service templates for each environment.
But for some reason, when I get an alert notification for a prod server/service, there is a duplicate email for the same server/service alert with a different email heading that points to the stage definition. It's very weird. This is what I have:
commands
Prod
------
# 'notify-host-by-email' command definition
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $LONGHOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s " $HOSTNAME$ is $HOSTSTATE$ " $CONTACTEMAIL$
}
# 'notify-service-by-email' command definition
define command{
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nComment: $SERVICEACKCOMMENT$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n$LONGSERVICEOUTPUT$" | /bin/mail -s "$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ " $CONTACTEMAIL$
}
Stage
-------
# 'notify-host-by-email-stage' command definition
define command{
command_name notify-host-by-email-stage
command_line /usr/bin/printf "%b" "***** Nagios Stage Alert*****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $LONGHOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s " $HOSTNAME$ is $HOSTSTATE$ " $CONTACTEMAIL$
}
# 'notify-service-by-email-stage' command definition
define command{
command_name notify-service-by-email-stage
command_line /usr/bin/printf "%b" "***** Nagios Stage Alert*****\n\nNotification Type: $NOTIFICATIONTYPE$\nComment: $SERVICEACKCOMMENT$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n$LONGSERVICEOUTPUT$" | /bin/mail -s "$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ " $CONTACTEMAIL$
}
Contact definitions
----------------------
prod
define contact{
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
service_notification_commands notify-service-by-email ; send service notifications via email
host_notification_commands notify-host-by-email ; send host notifications via email
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
stage
define contact{
name stage-generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
service_notification_commands notify-service-by-email-stage ; send service notifications via email
host_notification_commands notify-host-by-email-stage ; send host notifications via email
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
prod
#######################################################
################# ALERTS SYSTEM #####################
#######################################################
define contact{
use generic-contact
contact_name alerts_system
alias Alerts System
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
email alerts-system@blah.com
}
stage
#######################################################
################# ALERTS SYSTEM STAGING ##############
#######################################################
define contact{
use stage-generic-contact
contact_name stage-alerts_system
alias Stage-Alerts System
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
email alerts-system@blah.com
}
define contactgroup{
contactgroup_name apps
alias Application Server Administrators
members alerts_system
}
define contactgroup{
contactgroup_name stage-apps
alias Stage Application Server Administrators
members stage-alerts_system
}
For the service I have this template and service definition
define service{
name mniv-services
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 0
check_freshness 0
notifications_enabled 1
notification_interval 60
notification_period 24x7
notification_options w,u,c,r
contact_groups apps
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
max_check_attempts 3
check_period 24x7
normal_check_interval 3
retry_check_interval 1
action_url /graphs/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/graphs/popup?host=$HOSTNAME$&srv=$SERVICEDESC$'
register 0
}
define service {
use mniv-services
hostgroup_name MNIV
service_description LOAD
check_command check_nrpe!check_load
}
Contact group is set for apps.
So when there is an alert on one of the servers in this MNIV group, I get the following two emails, one with the Stage header and one without. But since this is a prod server, I shouldn't see the one with Stage in the header.
***** Nagios Stage Alert*****
Notification Type: PROBLEM
Comment:
Service: LOAD
Host: us1map03
Address: xx.xx.4.103
State: WARNING
Date/Time: Mon Aug 4 10:25:15 PDT 2014
Additional Info:
WARNING - load average: 9.58, 10.41, 10.31
***** Nagios*****
Notification Type: PROBLEM
Comment:
Service: LOAD
Host: us1map03
Address: xx.xx.4.103
State: WARNING
Date/Time: Mon Aug 4 10:25:15 PDT 2014
Additional Info:
WARNING - load average: 9.58, 10.41, 10.31
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Email Notification question
Can you show us the host definition for us1map03 AND any template definitions it is using.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Email Notification question
Here it is
define host{
name mniv-hosts ; The name of this host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 3 ; Check each host up to 3 times (max)
check_command check-host-alive; Default command to check Linux hosts
contact_groups apps
notification_period 24x7 ; Notifications can be sent at any time
notification_interval 60 ; Resend notifications every 60 minutes (with the default interval_length)
notification_options d,u,r ; Only send notifications for specific host states
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
define host{
use mniv-hosts
host_name US1MAP03
alias us1map03
address xx.xx.4.103
}
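To see which contact groups a host actually ends up with after its templates are applied, something like the following rough Python sketch could help. It is only an illustration and makes several assumptions: it handles just plain overrides, the `use` chain, and the `+` additive prefix, and it ignores the other inheritance rules Nagios applies. The helper names `parse_objects` and `effective` and the host `EXAMPLE01` are made up for this example.

```python
def parse_objects(text):
    """Parse 'define <type>{ ... }' blocks into (type, {directive: value}) tuples."""
    objects, current = [], None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("define "):
            obj_type = line[len("define"):].strip().rstrip("{").strip()
            current = (obj_type, {})
        elif line == "}" and current is not None:
            objects.append(current)
            current = None
        elif current is not None:
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


def effective(objects, obj_type, directive, start):
    """Resolve `directive` for the object dict `start` by walking its `use` chain."""
    templates = {d["name"]: d for t, d in objects if t == obj_type and "name" in d}

    def walk(d, seen):
        local = d.get(directive)
        if local is not None and not local.startswith("+"):
            return local  # a plain local value overrides anything inherited
        inherited = None
        for tpl in [n.strip() for n in d.get("use", "").split(",") if n.strip()]:
            if tpl in templates and tpl not in seen:
                inherited = walk(templates[tpl], seen | {tpl})
                if inherited is not None:
                    break  # the first template that supplies a value wins
        if local is None:
            return inherited
        extra = local[1:].strip()  # '+value' is appended to the inherited value
        return extra if inherited is None else inherited + "," + extra

    return walk(start, frozenset())


SAMPLE = """
define host{
    name            mniv-hosts
    contact_groups  apps
    register        0
}
define host{
    use             mniv-hosts
    host_name       US1MAP03
}
define host{
    use             mniv-hosts
    host_name       EXAMPLE01      ; hypothetical host, for illustration only
    contact_groups  +stage-apps
}
"""

objs = parse_objects(SAMPLE)
hosts = {d["host_name"]: d for t, d in objs if t == "host" and "host_name" in d}
for name, d in hosts.items():
    print(name, "->", effective(objs, "host", "contact_groups", d))
```

With the definitions posted above, a host like US1MAP03 should resolve to `apps` only, so the host template by itself does not seem to explain the stage notification.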
- Box293
Re: Email Notification question
Nothing obvious sticks out.
I suggest you turn on debug logging and see what happens.
In your nagios.cfg please set debug_level to -1:
debug_level=-1
Restart Nagios.
Watch /usr/local/nagios/var/nagios.debug and see what is logged when these dual emails are sent.
Re: Email Notification question
Is it possible that this is a bug? I have different notification commands, my contact definition uses are separate, contact groups are different.
Somehow production contact group is referencing both notification commands, and stage contact group references only one notification command.
- Box293
Re: Email Notification question
Everything is possible!
We need to investigate to find out what is happening, debug logs will help!
Re: Email Notification question
Here is a snippet from the log file
[1407952514] SERVICE NOTIFICATION: alerts_system;US1NRDB03;DISK SPACE;WARNING;notify-service-by-email;DISK WARNING - free space: / 138843 MB (96% inode=99%): /mongodbdata 216723 MB (20% inode=99%): /journal 96508 MB (95% inode=99%):
[1407952514] SERVICE NOTIFICATION: stage-alerts_system;US1NRDB03;DISK SPACE;WARNING;service_email_stage;DISK WARNING - free space: / 138843 MB (96% inode=99%): /mongodbdata 216723 MB (20% inode=99%): /journal 96508 MB (95% inode=99%):
The first line is fine; the second line should only be logged if it's a stage server, and in this case it's not. As you can see, it is referencing a different contact and a different command for some reason, but sends out for the same host. Would a duplicate service definition cause this?
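To spot every case like this in a larger log, a small Python sketch along these lines might help. It assumes the field order seen in the two lines above (contact;host;service;state;command;output); the function names are made up for this example, and the sample output text is shortened.

```python
from collections import defaultdict


def parse_service_notification(line):
    """Split one 'SERVICE NOTIFICATION' log line into a dict, or return None."""
    marker = "] SERVICE NOTIFICATION: "
    if not line.startswith("[") or marker not in line:
        return None
    timestamp, rest = line[1:].split(marker, 1)
    if not timestamp.isdigit():
        return None
    # maxsplit=5 keeps any ';' inside the plugin output intact
    parts = rest.split(";", 5)
    if len(parts) < 6:
        return None
    contact, host, service, state, command, output = parts
    return {
        "time": int(timestamp),
        "contact": contact,
        "host": host,
        "service": service,
        "state": state,
        "command": command,
        "output": output.strip(),
    }


def group_notifications(lines):
    """Map (time, host, service) to the list of (contact, command) pairs notified."""
    grouped = defaultdict(list)
    for line in lines:
        rec = parse_service_notification(line)
        if rec:
            key = (rec["time"], rec["host"], rec["service"])
            grouped[key].append((rec["contact"], rec["command"]))
    return dict(grouped)


sample = [
    "[1407952514] SERVICE NOTIFICATION: alerts_system;US1NRDB03;DISK SPACE;WARNING;notify-service-by-email;DISK WARNING - free space: / 138843 MB",
    "[1407952514] SERVICE NOTIFICATION: stage-alerts_system;US1NRDB03;DISK SPACE;WARNING;service_email_stage;DISK WARNING - free space: / 138843 MB",
]

for key, sent in group_notifications(sample).items():
    if len(sent) > 1:  # more than one contact/command for the same alert
        print(key, sent)
```

Any key that lists both a prod and a stage contact points at a service whose effective contacts include both groups.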
- Box293
Re: Email Notification question
Can you post the whole log from when the service triggers a warning state and all the commands that take place up to and including the double notification.
Re: Email Notification question
I made a change that fixed the issue. This could even be a bug in Nagios; I still have no clue why this would be the case.
I had contact groups apps and stage-apps. Somehow stage-apps was being used as well, although none of the production definitions inherited from a template that would have stage-apps as a contact group.
What I did was change stage-apps to stageApps, so that the two names differ more in pattern. This worked.
But again, it's still very weird. I have never seen this problem before, although until now I only had to work with one set of host and service email notification commands; in this case I have two sets, stage and prod. They are separated and should have worked without any headaches.
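For anyone hitting something similar, one way to check where a contact group name is referenced is to scan the object definitions for exact matches. The following is only a rough Python sketch: `parse_objects` and `references` are made-up helper names, the `stage-services` template and the `contactgroup_members` line in the sample are hypothetical (they are not from my config), and only the `contact_groups`, `contactgroup_members` and `contactgroups` directives are checked.

```python
def parse_objects(text):
    """Parse 'define <type>{ ... }' blocks into (type, {directive: value}) tuples."""
    objects, current = [], None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("define "):
            obj_type = line[len("define"):].strip().rstrip("{").strip()
            current = (obj_type, {})
        elif line == "}" and current is not None:
            objects.append(current)
            current = None
        elif current is not None:
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


def references(objects, group):
    """List (type, directive, label) for objects that name `group` exactly."""
    label_keys = ("name", "service_description", "host_name",
                  "contactgroup_name", "contact_name")
    hits = []
    for obj_type, directives in objects:
        label = next((directives[k] for k in label_keys if k in directives), "?")
        for key in ("contact_groups", "contactgroup_members", "contactgroups"):
            # '+' is the additive-inheritance prefix; strip it before comparing
            names = [n.strip().lstrip("+") for n in directives.get(key, "").split(",")]
            if group in names:
                hits.append((obj_type, key, label))
    return hits


SAMPLE = """
define service{
    name            mniv-services
    contact_groups  apps
    register        0
}
define service{
    name            stage-services     ; hypothetical template
    contact_groups  stage-apps
    register        0
}
define contactgroup{
    contactgroup_name     apps
    members               alerts_system
    contactgroup_members  stage-apps   ; hypothetical leak, for illustration only
}
"""

for hit in references(parse_objects(SAMPLE), "stage-apps"):
    print(hit)
```

To run it against real files, the text of each .cfg file under the Nagios config directory could be passed to `parse_objects` in turn; the location of that directory depends on the install.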
-
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Email Notification question
Honestly I've not heard of this happening before, but we will keep an eye out for similar issues in the future. What version of Core was this on?