Email not working for one contact

Postby blackrino9 » Mon Feb 12, 2018 2:40 pm

Hello,

I'll start out by stating that I'm not sure this contact ever worked. I've recently started relying on it more heavily for some email-only alerts (anything critical goes straight to pagerduty) and I've noticed that email isn't coming through. I should also state that I use NagiosQL for a nice graphical interface for Nagios Core.

Other than this one contact, email is working normally on the server and I do receive alerts which are configured to go to other addresses.

I can manually email the address at the command line and the message is delivered.

If I tail the nagios.log, I can see the alert being triggered and it calls out the contact in question for the notification.
[1518461479] SERVICE NOTIFICATION: CloudServices-QPS-Admin-Email-contact;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;notify-service-by-email;DTU Utilization on qp-prod-dbs/qp_master is 37%: DTU Warning


The maillog for Postfix does not contain any record of the email being sent. Postfix queues are empty. Other emails are being sent as expected.
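A quick way to confirm whether Postfix ever logged a delivery attempt for one recipient is to count matching `to=<...>` entries in the mail log. This is a minimal sketch; the helper name `count_mail_to` is made up for illustration, and it assumes the usual Postfix log format in which each delivery line contains `to=<address>`.

```shell
#!/bin/bash
# Count mail log entries addressed to one recipient.
# $1 = path to the mail log, $2 = recipient address
# (count_mail_to is a hypothetical helper, not part of Postfix or Nagios.)
count_mail_to() {
    local logfile="$1" recipient="$2"
    # -F matches the pattern as a fixed string; -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the function's status clean.
    grep -cF "to=<${recipient}>" "$logfile" || true
}
```

For example, `count_mail_to /var/log/maillog someone@example.com` printing 0 right after a notification fired would suggest the message never reached Postfix at all.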



Each service in our environment is constructed by defining the Service Check and then applying nested service templates.
Top layer: Define the service to be tested (essentially, define the command to use)
define service {
#NAGIOSQL_CONFIG_NAME QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master
hostgroup_name QP-Prod-App-SQL-Azure
service_description QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master
use CloudServices-QPS-Crit-Email
check_command Azure_Check_DB_DTU!<MyAzureSubscriptionID>!"QP-Prod"!qp-prod-dbs!qp_master!80!90
register 1
}


Middle Layer: Contact Group
define service {
name CloudServices-QPS-Crit-Email
service_description CloudServices-QPS-Crit-Email
display_name CloudServices-QPS-Crit-Email
use PagerDuty_Monitoring_Profile-5-5-1
contact_groups CloudServices-QPS-Admin-Email-Group
register 0
}


Bottom Layer: Frequency and notification options
define service {
name PagerDuty_Monitoring_Profile-5-5-1
service_description PagerDuty_Monitoring_Profile-5min
display_name PagerDuty_Monitoring_Profile-5min
is_volatile 0
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
passive_checks_enabled 1
check_period 24x7
obsess_over_service 1
check_freshness 1
flap_detection_enabled 0
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 60
notification_period 24x7
notification_options w,u,c,r
notifications_enabled 1
parallelize_check 1
register 0
}


The contact group contains the contact
define contactgroup {
contactgroup_name CloudServices-QPS-Admin-Email-Group
alias CloudServices-QPS-Admin-Email-Group
members CloudServices-QPS-Admin-Email-contact
register 1
}



The contact appears to be set up correctly.
define contact {
contact_name CloudServices-QPS-Admin-Email-contact
alias CloudServices-QPS-Admin-Email-contact
host_notifications_enabled 1
service_notifications_enabled 1
host_notification_period 24x7
service_notification_period 24x7
host_notification_options d,r
service_notification_options w,c,r
host_notification_commands notify-host-by-email
service_notification_commands notify-service-by-email
can_submit_commands 0
retain_status_information 1
retain_nonstatus_information 1
email <myemailaddress>
register 1
}


The command works perfectly fine with any other contact

define command {
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios 4.3.1 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\n\n$LONGSERVICEOUTPUT$\n$SERVICEPERFDATA$\n\n\nFollow-up Response:\n\n1) Briefly define Root cause\n2) Impact to IT services (as felt by our customer)\n3) Remedy to fix (Immediate/short term)\n4) Remedy to prevent a recurrence (future/long term)\n\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" "$CONTACTEMAIL$"
register 1
}
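To see what a command line like this produces for one particular service output without sending anything, the body can be rendered to stdout instead of being piped to /usr/bin/mail. The sketch below is only an approximation: `render_notification` is a hypothetical helper, its arguments stand in for the Nagios macros, the message is shortened, and it uses the shell's builtin printf rather than /usr/bin/printf.

```shell
#!/bin/bash
# Dry run: build a shortened version of the notification body and print it
# instead of mailing it. The arguments stand in for Nagios macros
# (render_notification is a hypothetical helper, not part of Nagios).
render_notification() {
    local type="$1" service="$2" host="$3" state="$4" output="$5"
    # "%b" expands backslash escapes such as \n inside the argument;
    # a literal % in the service output is left alone because it is not in the format string.
    printf "%b" "***** Nagios 4.3.1 *****\n\nNotification Type: ${type}\n\nService: ${service}\nHost: ${host}\nState: ${state}\n\nAdditional Info:\n\n${output}\n"
}
```

Running it with the exact plugin output from the failing service, under the same account the Nagios daemon uses, is one way to check whether the text itself upsets the pipeline.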


Any ideas why this one contact seems to be non-functional?

Thanks
blackrino9
 
Posts: 27
Joined: Wed Apr 12, 2017 7:19 pm

Re: Email not working for one contact

Postby npolovenko » Mon Feb 12, 2018 5:03 pm

@blackrino9, Could it be that the email gets rejected on the recipient's side? Did you check the spam folder? Or perhaps nagios may have gotten blacklisted? Also, I don't seem to have double quotes around $CONTACTEMAIL$ in my notify service by email command:
/usr/bin/printf "%b" "***** Nagios 4.3.1 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\n\n$LONGSERVICEOUTPUT$\n$SERVICEPERFDATA$\n\n\nFollow-up Response:\n\n1) Briefly define Root cause\n2) Impact to IT services (as felt by our customer)\n3) Remedy to fix (Immediate/short term)\n4) Remedy to prevent a recurrence (future/long term)\n\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$

But since you said all other contacts work fine, I don't think that's the cause. Can you temporarily change this contact's email to a different one, then force a critical check to trigger a notification and see if it works with a different email, preferably one hosted on a different domain?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
npolovenko
Support Tech
 
Posts: 1833
Joined: Mon May 15, 2017 5:00 pm

Re: Email not working for one contact

Postby blackrino9 » Mon Feb 12, 2018 7:08 pm

Hello,

That's a good thought. Even though I can send from the command line to that address, that doesn't necessarily mean that the nagios address hasn't been blocked somehow. I checked the group mailbox (it's o365) but I couldn't find anyplace where there might be settings for spam/junk.

Yeah, I tried substituting another email address (my gmail instead of a corporate address) into that contact but that didn't seem to make a difference. I also tried updating the contact group to use different contacts and just now, I removed all the nested service template configuration and just set things up in a single service health check and STILL couldn't get this service to send an email.

I've tried generating alerts with both Warning and Critical Status. I can see Nagios sending the Notification to the contact, but I can't see the email being sent in the maillog.

Regarding the quotes around the CONTACTEMAIL variable, we had an issue a while ago where, after a round of patching, no emails at all were being sent until we updated the command to enclose that variable in double quotes as well.

Considering I couldn't get that service to send an email no matter what I did, I started wondering if perhaps something in my script's output could be interfering with Nagios's ability to send mail. I tried commenting out the "echo" statements in the final results block but that didn't seem to have any effect.

#!/bin/bash

sName="check_azure_database_dtu_utilization_5min"
TimePeriod=5
Metric="dtu_consumption_percent"
declare -i Warn=0
declare -i Crit=0

while getopts S:R:s:d:w:c: option
do
case "${option}"
in
S) Subscription=${OPTARG};;
R) ResourceGroup=${OPTARG};;
s) SQLServer=${OPTARG};;
d) Database=${OPTARG};;
w) Warn=${OPTARG};;
c) Crit=${OPTARG};;
esac
done

if [ -n "$Subscription" ] && [ -n "$ResourceGroup" ] && [ -n "$SQLServer" ] && [ -n "$Database" ] && [ -n "$Warn" ] && [ -n "$Crit" ]; then
#get time information
timestamp=$(date +"%Y-%m-%dT%H:%M:%SZ" -d "$TimePeriod min ago")
Interval="PT"$TimePeriod"M"

#Azure login performed separately

#perform the service test
ServiceTest=`timeout 50s az monitor metrics list --resource "/subscriptions/$Subscription/resourceGroups/$ResourceGroup/providers/Microsoft.Sql/servers/$SQLServer/databases/$Database" --metric $Metric --start-time $timestamp --interval $Interval`

#parse out everything except the value of the metric
ServiceTest=`echo "$ServiceTest" | grep average`

if [[ ${ServiceTest} != *"average"* ]];then
errormail=`echo "$Database API Call Failed" | mailx -s "$Database API Call Failed" <myemail@corporateaccount.com>`
echo "API Call Failed"
exit 3
else
#Using the colon as the delimiter, select the second set of information
ServiceTest=`echo $ServiceTest | cut -d ":" -f2|cut -d "," -f1|cut -d "." -f1|tr -d '"'`

#Perform tests and return health response
if (( ${ServiceTest} <= Warn )); then
echo "DTU Utilization on $SQLServer:$Database is $ServiceTest%: DTU OK"
exit 0
elif (( ${ServiceTest} >= Warn )) && (( ${ServiceTest} <= Crit )); then
echo "DTU Utilization on $SQLServer:$Database is $ServiceTest%: DTU Warning"
exit 1
elif (( ${ServiceTest} >= Crit )); then
echo "DTU Utilization on $SQLServer:$Database is $ServiceTest%: DTU Critical"
exit 2
fi
fi
# If inputs are not as expected, print help.
else
echo -e "\n\n\t\t### $sName Version 1.0###\n"
echo -e "# Usage:\t$sName -S <subscriptionID> -R <ResourceGroup> -s <SQLServerName> -d <DatabaseName> -m <MetricName> -w <Warn> -c <Crit>"
exit
fi
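As a side note on the threshold tests in the script above, the `<= Warn` and `>= Warn` branches overlap at the boundary. One way to make the ranges non-overlapping is sketched below. It assumes the common Nagios convention that a value at or above a threshold trips it, which differs slightly from the posted script (there, a value exactly equal to Warn reports OK). The function name `dtu_state` is made up for illustration.

```shell
#!/bin/bash
# Map an integer metric to a Nagios state using non-overlapping ranges.
# Assumed convention: value >= crit is CRITICAL, value >= warn is WARNING.
# (dtu_state is a hypothetical helper, not taken from the posted script.)
dtu_state() {
    local -i value="$1" warn="$2" crit="$3"
    if (( value >= crit )); then
        echo "CRITICAL"
        return 2
    elif (( value >= warn )); then
        echo "WARNING"
        return 1
    else
        echo "OK"
        return 0
    fi
}
```

Checking the critical threshold first means each value lands in exactly one branch, and the return code matches the exit status Nagios expects from a plugin (0, 1 or 2).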

Re: Email not working for one contact

Postby blackrino9 » Mon Feb 12, 2018 7:48 pm

The more I research this, the more I don't think it's the contact that is the problem. It seems to be related to this service check as the contact group works fine when used by any other check.

This is a custom-written script for monitoring Azure SQL DTU utilization via the Azure CLI. The CLI does require authentication, which I handle via a separate scheduled task to cut down on the number of times I need to touch the CLI when performing a test. Is there something in the way I've constructed this script which is preventing Nagios from sending email alerts when the status is warn or crit?

Re: Email not working for one contact

Postby npolovenko » Tue Feb 13, 2018 5:42 pm

@blackrino9, I'm not sure why you have "register 1" in the mail command; I'd get rid of it. Then, please restart Nagios.

define command {
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios 4.3.1 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\n\n$LONGSERVICEOUTPUT$\n$SERVICEPERFDATA$\n\n\nFollow-up Response:\n\n1) Briefly define Root cause\n2) Impact to IT services (as felt by our customer)\n3) Remedy to fix (Immediate/short term)\n4) Remedy to prevent a recurrence (future/long term)\n\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" "$CONTACTEMAIL$"
register 1
}


After that please upload these two files:
/usr/local/nagios/var/status.dat
/usr/local/nagios/var/objects.cache

Re: Email not working for one contact

Postby blackrino9 » Tue Feb 13, 2018 6:15 pm

I suspect the line "register 1" has to do with NagiosQL (the management suite we use to administer Nagios). I've removed it manually for now but this will be re-added the next time I save command information.

Requested files attached.
Last edited by blackrino9 on Tue Feb 20, 2018 1:46 am, edited 1 time in total.

Re: Email not working for one contact

Postby npolovenko » Wed Feb 14, 2018 5:00 pm

@blackrino9, This timestamp tells me that the notification was sent out yesterday at 2:56:58 PM US central time:

contactstatus {
   contact_name=CloudServices-QPS-Admin-Email-contact
   last_service_notification=1518555418
}
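The `last_service_notification` value is a Unix epoch, so it can be checked from a shell. A minimal sketch, assuming GNU date; the function name `epoch_to_central` is made up for illustration, and the explicit POSIX TZ string is used so the result does not depend on an installed zoneinfo database.

```shell
#!/bin/bash
# Convert a Unix epoch to US Central time with GNU date.
# The TZ string spells out the US daylight-saving rules explicitly.
# (epoch_to_central is a hypothetical helper name.)
epoch_to_central() {
    TZ='CST6CDT,M3.2.0,M11.1.0' date -d "@$1" '+%Y-%m-%d %H:%M:%S %Z'
}

epoch_to_central 1518555418   # prints: 2018-02-13 14:56:58 CST
```

The same function works on the bracketed timestamps in nagios.log.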


Can you upload a screenshot of the service check output in Core Web interface when it's in a Critical state?

Re: Email not working for one contact

Postby blackrino9 » Wed Feb 14, 2018 6:08 pm

Hello,

I can see the Service Notification when the service is Warning or Critical in the nagios logs, but the email never actually goes out. The mail logs confirm (no entry) that the email doesn't actually make it to the mail queue.

This is what I see when I tail both nagios.log and /var/log/maillog. As you can see, there are no entries made in the maillog to correspond with the email alert being generated.

==> nagios.log <==
[1518649415] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649416] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649417] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;SOFT;1;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649418] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649418] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;SOFT;2;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649420] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649420] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;SOFT;3;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649421] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649422] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;SOFT;4;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649423] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;HARD;5;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649423] SERVICE NOTIFICATION: CloudServices-QPS-Admin-Email-contact;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;notify-service-by-email;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
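The excerpt above shows four SOFT alerts, then the HARD state on attempt 5 (matching max_check_attempts 5), followed by the notification line. To pull just the notification events out of a longer log, something like the following sketch could be used. The function name `list_service_notifications` is made up for illustration, and it assumes the log line layout shown above (contact;host;service;state;command;output).

```shell
#!/bin/bash
# Print "epoch contact state" for every SERVICE NOTIFICATION line in a Nagios log.
# $1 = path to nagios.log
# (list_service_notifications is a hypothetical helper, not part of Nagios.)
list_service_notifications() {
    # Replace the "[epoch] SERVICE NOTIFICATION: " prefix with "epoch;" and
    # print only the lines that matched, then split the fields on ';'.
    sed -n 's/^\[\([0-9]*\)\] SERVICE NOTIFICATION: /\1;/p' "$1" |
        awk -F';' '{ print $1, $2, $5 }'
}
```

Comparing the epochs it prints against the timestamps in /var/log/maillog makes it easier to see which notifications never produced a mail log entry.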
Attachments
Nagios.png

Re: Email not working for one contact

Postby npolovenko » Thu Feb 15, 2018 3:03 pm

The maillog for Postfix does not contain any record of the email being sent.

Does it have entries for other service check notifications? Can you actually upload it? /var/log/maillog, I suppose.
And all the other service notifications work just fine? Can you test that to make sure?

Also, what you could do is change the debug level in
/usr/local/nagios/etc/nagios.cfg

look for debug_level and change it to:
debug_level=-1

Yes, -1
Then restart nagios:
service nagios restart

Then force the notification for that service again. And upload the debug file:
/usr/local/nagios/var/nagios.debug

After that don't forget to change the debug_level= back to 0, and restart Nagios, otherwise it'll fill up your hard drive pretty quickly.
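Since the debug level has to be switched on and then back off, the edit can be scripted so the second step is not forgotten. A minimal sketch, assuming GNU sed and an existing `debug_level=` line in the file; the function name `set_debug_level` is made up for illustration.

```shell
#!/bin/bash
# Set debug_level in a Nagios main config file, editing it in place (GNU sed).
# $1 = path to nagios.cfg, $2 = new level (-1 = log everything, 0 = off)
# (set_debug_level is a hypothetical helper, not part of Nagios.)
set_debug_level() {
    local cfg="$1" level="$2"
    # Only rewrites an existing "debug_level=" line; it does not add one if missing.
    sed -i "s/^debug_level=.*/debug_level=${level}/" "$cfg"
}
```

For example, `set_debug_level /usr/local/nagios/etc/nagios.cfg -1` before reproducing the problem and `set_debug_level /usr/local/nagios/etc/nagios.cfg 0` afterwards, with a Nagios restart after each change.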

Re: Email not working for one contact

Postby blackrino9 » Thu Feb 15, 2018 4:27 pm

Hello,

Yes, it contains a full record for all email that has been sent by the server. I can manually send emails to the contact assigned to this service from the command line and they will send out (and log) as expected. I hope you can take my word for this one as I'd rather not upload a file with dozens of our email addresses in it for public inspection. Confirmed that every other service/host check is working correctly with email notifications.

debug log attached
