Email not working for one contact

blackrino9
Posts: 27
Joined: Wed Apr 12, 2017 7:19 pm

Email not working for one contact

Post by blackrino9 »

Hello,

I'll start out by stating that I'm not sure this contact ever worked. I've recently started relying on it more heavily for some email-only alerts (anything critical goes straight to PagerDuty) and I've noticed that email isn't coming through. I should also mention that I use NagiosQL as a graphical interface for Nagios Core.

Other than this one contact, email is working normally on the server and I do receive alerts which are configured to go to other addresses.

I can manually email the address at the command line and the message is delivered.

If I tail the nagios.log, I can see the alert being triggered and it calls out the contact in question for the notification.
[1518461479] SERVICE NOTIFICATION: CloudServices-QPS-Admin-Email-contact;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;notify-service-by-email;DTU Utilization on qp-prod-dbs/qp_master is 37%: DTU Warning
The maillog for Postfix does not contain any record of the email being sent. Postfix queues are empty. Other emails are being sent as expected.



Each service in our environment is constructed by defining the Service Check and then applying nested service templates.
Top layer: Define the service to be tested (essentially, define the command to use)
define service {
#NAGIOSQL_CONFIG_NAME QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master
hostgroup_name QP-Prod-App-SQL-Azure
service_description QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master
use CloudServices-QPS-Crit-Email
check_command Azure_Check_DB_DTU!<MyAzureSubscriptionID>!"QP-Prod"!qp-prod-dbs!qp_master!80!90
register 1
}
Middle Layer: Contact Group
define service {
name CloudServices-QPS-Crit-Email
service_description CloudServices-QPS-Crit-Email
display_name CloudServices-QPS-Crit-Email
use PagerDuty_Monitoring_Profile-5-5-1
contact_groups CloudServices-QPS-Admin-Email-Group
register 0
}
Bottom Layer: Frequency and notification options
define service {
name PagerDuty_Monitoring_Profile-5-5-1
service_description PagerDuty_Monitoring_Profile-5min
display_name PagerDuty_Monitoring_Profile-5min
is_volatile 0
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
passive_checks_enabled 1
check_period 24x7
obsess_over_service 1
check_freshness 1
flap_detection_enabled 0
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 60
notification_period 24x7
notification_options w,u,c,r
notifications_enabled 1
parallelize_check 1
register 0
}
The contact group contains the contact
define contactgroup {
contactgroup_name CloudServices-QPS-Admin-Email-Group
alias CloudServices-QPS-Admin-Email-Group
members CloudServices-QPS-Admin-Email-contact
register 1
}

The contact appears to be set up correctly.
define contact {
contact_name CloudServices-QPS-Admin-Email-contact
alias CloudServices-QPS-Admin-Email-contact
host_notifications_enabled 1
service_notifications_enabled 1
host_notification_period 24x7
service_notification_period 24x7
host_notification_options d,r
service_notification_options w,c,r
host_notification_commands notify-host-by-email
service_notification_commands notify-service-by-email
can_submit_commands 0
retain_status_information 1
retain_nonstatus_information 1
email <myemailaddress>
register 1
}
The command works perfectly fine with any other contact
define command {
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios 4.3.1 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\n\n$LONGSERVICEOUTPUT$\n$SERVICEPERFDATA$\n\n\nFollow-up Response:\n\n1) Briefly define Root cause\n2) Impact to IT services (as felt by our customer)\n3) Remedy to fix (Immediate/short term)\n4) Remedy to prevent a recurrence (future/long term)\n\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" "$CONTACTEMAIL$"
register 1
}
Any ideas why this one contact seems to be non-functional?

Thanks
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Email not working for one contact

Post by npolovenko »

@blackrino9, could it be that the email gets rejected on the recipient's side? Did you check the spam folder? Or perhaps the Nagios server has been blacklisted? Also, I don't have double quotes around $CONTACTEMAIL$ in my notify-service-by-email command:

Code: Select all

/usr/bin/printf "%b" "***** Nagios 4.3.1 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\n\n$LONGSERVICEOUTPUT$\n$SERVICEPERFDATA$\n\n\nFollow-up Response:\n\n1) Briefly define Root cause\n2) Impact to IT services (as felt by our customer)\n3) Remedy to fix (Immediate/short term)\n4) Remedy to prevent a recurrence (future/long term)\n\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
But since you said all other contacts work fine, I don't think that's the cause. Can you temporarily change this contact's email to a different one, preferably hosted on a different domain, then force a critical check to trigger a notification and see if it works?
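A quick way to separate a mail problem from a Nagios problem is to run the same printf-into-mail pipeline by hand, as the nagios user, with the macros filled in manually. A minimal sketch, under these assumptions: the variable values are hand-typed stand-ins for the Nagios macros (copied from the log line in the first post where possible), the body is shortened, the shell's builtin printf stands in for /usr/bin/printf, and the recipient in the commented-out line is a placeholder.

```shell
#!/bin/bash
# Hand-typed stand-ins for the Nagios macros used by notify-service-by-email.
NOTIFICATIONTYPE="PROBLEM"
SERVICEDESC="QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master"
HOSTALIAS="QP-Prod-US-East-Azure-SQL-Azure"   # assumed; the real alias may differ
SERVICESTATE="WARNING"
SERVICEOUTPUT="DTU Utilization on qp-prod-dbs/qp_master is 37%: DTU Warning"

# Build the message body the same way the command definition does (shortened).
render_body() {
    printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE\n\nService: $SERVICEDESC\nHost: $HOSTALIAS\nState: $SERVICESTATE\n\nAdditional Info:\n\n$SERVICEOUTPUT\n"
}

# Build the subject line the same way the command definition does.
render_subject() {
    echo "** $NOTIFICATIONTYPE alert - $HOSTALIAS/$SERVICEDESC is $SERVICESTATE **"
}

# Print the body so it can be inspected before anything is sent.
render_body

# To actually send it, pipe the body into mail (placeholder recipient):
#   render_body | /usr/bin/mail -s "$(render_subject)" "you@example.com"
```

If the hand-run pipeline shows up in the mail log but the Nagios-triggered one does not, the difference is more likely in how Nagios expands or executes the command than in the mail system itself.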
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
blackrino9
Posts: 27
Joined: Wed Apr 12, 2017 7:19 pm

Re: Email not working for one contact

Post by blackrino9 »

Hello,

That's a good thought. Even though I can send from the command line to that address, that doesn't necessarily mean that the nagios address hasn't been blocked somehow. I checked the group mailbox (it's o365) but I couldn't find anyplace where there might be settings for spam/junk.

Yeah, I tried substituting another email address (my gmail instead of a corporate address) into that contact but that didn't seem to make a difference. I also tried updating the contact group to use different contacts and just now, I removed all the nested service template configuration and just set things up in a single service health check and STILL couldn't get this service to send an email.

I've tried generating alerts with both Warning and Critical Status. I can see Nagios sending the Notification to the contact, but I can't see the email being sent in the maillog.

Regarding the quotes around the CONTACTEMAIL variable, we had an issue a while ago where, after a round of patching, no emails at all were being sent until we updated the command to enclose that variable in double quotes as well.

Considering I couldn't get that service to send an email no matter what I did, I started wondering if perhaps something in my script's output could be interfering with Nagios's ability to send mail. I tried commenting out the "echo" statements in the final results block but that didn't seem to have any effect.
#!/bin/bash

sName="check_azure_database_dtu_utilization_5min"
TimePeriod=5
Metric="dtu_consumption_percent"
declare -i Warn=0
declare -i Crit=0

while getopts S:R:s:d:w:c: option
do
case "${option}"
in
S) Subscription=${OPTARG};;
R) ResourceGroup=${OPTARG};;
s) SQLServer=${OPTARG};;
d) Database=${OPTARG};;
w) Warn=${OPTARG};;
c) Crit=${OPTARG};;
esac
done

if [ -n "$Subscription" ] && [ -n "$ResourceGroup" ] && [ -n "$SQLServer" ] && [ -n "$Database" ] && [ -n "$Warn" ] && [ -n "$Crit" ]; then
#get time information (in UTC, since the timestamp carries a "Z" suffix)
timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ" -d "$TimePeriod min ago")
Interval="PT"$TimePeriod"M"

#Azure login performed separately

#perform the service test
ServiceTest=`timeout 50s az monitor metrics list --resource "/subscriptions/$Subscription/resourceGroups/$ResourceGroup/providers/Microsoft.Sql/servers/$SQLServer/databases/$Database" --metric $Metric --start-time $timestamp --interval $Interval`

#parse out everything except the value of the metric
ServiceTest=`echo "$ServiceTest" | grep average`

if [[ ${ServiceTest} != *"average"* ]];then
errormail=`echo "$Database API Call Failed" | mailx -s "$Database API Call Failed" <myemail@corporateaccount.com>`
echo "API Call Failed"
exit 3
else
#Using the colon as the delimiter, select the second set of information
ServiceTest=`echo $ServiceTest | cut -d ":" -f2|cut -d "," -f1|cut -d "." -f1|tr -d '"'`

#Perform tests and return health response
# Below Warn is OK; from Warn up to (but not including) Crit is a warning;
# Crit and above is critical. The ranges no longer overlap at the thresholds.
if (( ${ServiceTest} < Warn )); then
echo "DTU Utilization on $SQLServer:$Database is $ServiceTest%: DTU OK"
exit 0
elif (( ${ServiceTest} >= Warn )) && (( ${ServiceTest} < Crit )); then
echo "DTU Utilization on $SQLServer:$Database is $ServiceTest%: DTU Warning"
exit 1
elif (( ${ServiceTest} >= Crit )); then
echo "DTU Utilization on $SQLServer:$Database is $ServiceTest%: DTU Critical"
exit 2
fi
fi
# If inputs are not as expected, print help.
else
echo -e "\n\n\t\t### $sName Version 1.0###\n"
echo -e "# Usage:\t$sName -S <subscriptionID> -R <ResourceGroup> -s <SQLServerName> -d <DatabaseName> -w <Warn> -c <Crit>"
# Exit 3 (UNKNOWN) so Nagios does not treat a usage error as OK
exit 3
fi
blackrino9
Posts: 27
Joined: Wed Apr 12, 2017 7:19 pm

Re: Email not working for one contact

Post by blackrino9 »

The more I research this, the more I don't think it's the contact that is the problem. It seems to be related to this service check as the contact group works fine when used by any other check.

This is a custom-written script for monitoring Azure SQL DTU Utilization via the Azure CLI. The CLI does require authentication, which I handle via a separate scheduled task to cut down on the number of times I need to touch the CLI when performing a test. Is there something in the way I've constructed this script which is preventing Nagios from sending email alerts when the status is warn or crit?
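For reference, Nagios decides the service state from the plugin's exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL, and 3 is UNKNOWN. The threshold step of a check like this one can be sketched as a single function with non-overlapping ranges. The function name classify_dtu is made up for illustration and is not part of the script above.

```shell
#!/bin/bash
# Map a metric value to a Nagios state name and matching return code.
# Convention: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
classify_dtu() {
    local value="$1" warn="$2" crit="$3"

    # Anything that is not a plain non-negative integer is reported as UNKNOWN.
    if ! [[ "$value" =~ ^[0-9]+$ && "$warn" =~ ^[0-9]+$ && "$crit" =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN"
        return 3
    fi

    # 10# forces base 10 so a value such as "08" is not parsed as octal.
    if (( 10#$value >= 10#$crit )); then
        echo "CRITICAL"
        return 2
    elif (( 10#$value >= 10#$warn )); then
        echo "WARNING"
        return 1
    fi

    echo "OK"
    return 0
}
```

Checking the critical threshold first means a value exactly at a threshold always lands in the more severe state, and no value can match two branches.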
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Email not working for one contact

Post by npolovenko »

@blackrino9, I'm not sure why you have "register 1" in the mail command; I'd get rid of it. Then, please restart Nagios.

Code: Select all

define command {
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios 4.3.1 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\n\n$LONGSERVICEOUTPUT$\n$SERVICEPERFDATA$\n\n\nFollow-up Response:\n\n1) Briefly define Root cause\n2) Impact to IT services (as felt by our customer)\n3) Remedy to fix (Immediate/short term)\n4) Remedy to prevent a recurrence (future/long term)\n\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" "$CONTACTEMAIL$"
register 1
}
After that please upload these two files:

Code: Select all

/usr/local/nagios/var/status.dat
/usr/local/nagios/var/objects.cache
blackrino9
Posts: 27
Joined: Wed Apr 12, 2017 7:19 pm

Re: Email not working for one contact

Post by blackrino9 »

I suspect the line "register 1" has to do with NagiosQL (the management suite we use to administer Nagios). I've removed it manually for now but this will be re-added the next time I save command information.

Requested files attached.
Last edited by blackrino9 on Tue Feb 20, 2018 1:46 am, edited 1 time in total.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Email not working for one contact

Post by npolovenko »

@blackrino9, This timestamp tells me that the notification was sent out yesterday at 2:56:58 PM US central time:

Code: Select all

contactstatus {
	contact_name=CloudServices-QPS-Admin-Email-contact
	last_service_notification=1518555418
}
Can you upload a screenshot of the service check output in Core Web interface when it's in a Critical state?
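The last_service_notification value is a Unix epoch timestamp, so it can be converted to a readable time with GNU date. A small sketch, assuming GNU coreutils; the helper name epoch_to_time is made up for illustration, and the America/Chicago zone only resolves if the system has time zone data installed.

```shell
#!/bin/bash
# Convert a Unix epoch timestamp to a readable time in the given zone.
# The zone defaults to UTC when no second argument is passed.
epoch_to_time() {
    local epoch="$1" zone="${2:-UTC}"
    TZ="$zone" date -d "@${epoch}" '+%Y-%m-%d %H:%M:%S %Z'
}

epoch_to_time 1518555418 UTC               # prints 2018-02-13 20:56:58 UTC
epoch_to_time 1518555418 America/Chicago   # same instant in US Central time
```

The same conversion works for the bracketed timestamps at the start of each nagios.log line.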
blackrino9
Posts: 27
Joined: Wed Apr 12, 2017 7:19 pm

Re: Email not working for one contact

Post by blackrino9 »

Hello,

I can see the Service Notification when the service is Warning or Critical in the nagios logs, but the email never actually goes out. The mail logs confirm (no entry) that the email doesn't actually make it to the mail queue.

This is what I see when I tail both nagios.log and /var/log/maillog. As you can see, there are no entries made in the maillog to correspond with the email alert being generated.

==> nagios.log <==
[1518649415] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649416] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649417] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;SOFT;1;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649418] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649418] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;SOFT;2;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649420] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649420] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;SOFT;3;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649421] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;1518649414
[1518649422] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;SOFT;4;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649423] SERVICE ALERT: QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;HARD;5;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
[1518649423] SERVICE NOTIFICATION: CloudServices-QPS-Admin-Email-contact;QP-Prod-US-East-Azure-SQL-Azure;QP-Prod-App-SQL-Azure-Check_DTU_Utilization-qp_master;WARNING;notify-service-by-email;DTU Utilization on qp-prod-dbs:qp_master is 36%: DTU Warning
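To line the two logs up, the timestamp, contact, state, and notification command can be pulled out of each SERVICE NOTIFICATION line and then searched for in the mail log around the same time. A sketch, assuming the semicolon-separated layout shown in the log lines above; parse_notification is a hypothetical helper, not part of Nagios.

```shell
#!/bin/bash
# Split one nagios.log SERVICE NOTIFICATION line into its main fields and
# print them as: epoch|contact|state|command
parse_notification() {
    local line="$1"
    local epoch rest contact host service state command

    # The epoch is the text between the leading "[" and the first "]".
    epoch="${line#\[}"
    epoch="${epoch%%\]*}"

    # Everything after the record type is a semicolon-separated field list.
    rest="${line#*SERVICE NOTIFICATION: }"
    IFS=';' read -r contact host service state command _ <<< "$rest"

    printf '%s|%s|%s|%s\n' "$epoch" "$contact" "$state" "$command"
}

# Example usage against a log file (the path is whatever your install uses):
#   grep 'SERVICE NOTIFICATION' nagios.log | while IFS= read -r l; do
#       parse_notification "$l"
#   done
```

A notification that appears here with no mail log entry within a few seconds of the same timestamp suggests the command ran (or failed) before anything was handed to the mail system.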
Attachments
Nagios.png
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Email not working for one contact

Post by npolovenko »

The maillog for Postfix does not contain any record of the email being sent.
Does it have entries for other service check notifications? Can you actually upload it? It's /var/log/maillog, I suppose.
And all the other service notifications work just fine? Can you test that to make sure?

Also, what you could do is change the debug level in

Code: Select all

/usr/local/nagios/etc/nagios.cfg
look for debug_level and change it to:

Code: Select all

debug_level=-1
Yes, -1
Then restart nagios:

Code: Select all

service nagios restart
Then force the notification for that service again. And upload the debug file:

Code: Select all

/usr/local/nagios/var/nagios.debug
After that, don't forget to change debug_level back to 0 and restart Nagios; otherwise the debug file will fill up your hard drive pretty quickly.
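The switch to full debugging and back can be sketched as a small helper that edits debug_level in place and keeps a backup. This assumes GNU sed and a config file that already contains a debug_level= line; set_debug_level is a made-up name, while the paths and the restart command in the comments are the ones given above.

```shell
#!/bin/bash
# Set debug_level in a Nagios main config file, keeping a .bak copy first.
set_debug_level() {
    local cfg="$1" level="$2"
    cp -p "$cfg" "$cfg.bak" || return 1
    sed -i "s/^debug_level=.*/debug_level=${level}/" "$cfg"
}

# Usage, run as root:
#   set_debug_level /usr/local/nagios/etc/nagios.cfg -1 && service nagios restart
#   ...force the notification, then collect /usr/local/nagios/var/nagios.debug...
#   set_debug_level /usr/local/nagios/etc/nagios.cfg 0 && service nagios restart
```

Wrapping the edit in a function makes it harder to forget the second step, since the same call with 0 undoes the change.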
blackrino9
Posts: 27
Joined: Wed Apr 12, 2017 7:19 pm

Re: Email not working for one contact

Post by blackrino9 »

Hello,

Yes, it contains a full record for all email that has been sent by the server. I can manually send emails to the contact assigned to this service from the command line and they will send out (and log) as expected. I hope you can take my word for this one as I'd rather not upload a file with dozens of our email addresses in it for public inspection. Confirmed that every other service/host check is working correctly with email notifications.

debug log attached