Email Notifications seem to be working intermittently
Posted: Thu Oct 27, 2016 10:45 am
I am running Nagios XI 5.3.0 on CentOS. This is the Enterprise Edition 64-bit VMware image, originally deployed from the OVA download. It has since been through a few upgrades, both via yum and of Nagios XI itself.
About 10 days ago, some (not all, just some) of the notification emails quit working. It appears that the emails go out for Problems, but often not when the problem Resolves. It also seems to be getting worse over time. Now we are starting to see times where neither the problem nor the resolution email is sent.
I can look at the Notifications, and Nagios says that it sent the emails out.
I can test from the user and I get test notifications.
I can look at the Event Log and I see where Nagios says it sent out the email.
I have run ./repair_databases.sh nagios, no issues.
I have run tcpdump and do not see these emails leave the server at the times Nagios reports sending them in the logs above.
We send our messages via an SMTP relay at the site. I have had those guys looking at this for over an hour, and they have many thousands of emails going through without issue. The Nagios server will "say" in its logs that an email was sent, and sometimes it shows up at the SMTP relay and sometimes it does not.
For the past 10 days I have not been able to get reliable delivery of these messages, and I have not been able to find a common cause either.
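A rough sketch of how the Nagios side of that comparison could be pulled out of the log, so the timestamps can be lined up against what the SMTP relay actually received. It assumes the standard Nagios Core log line layout ("[epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output", with no service field on HOST lines); the function names and the idea of feeding it lines from nagios.log are illustrative, not anything Nagios XI provides.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed layout of a Nagios Core notification log line:
#   [epoch] HOST NOTIFICATION: contact;host;state;command;output
#   [epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output
NOTIFY_RE = re.compile(r"^\[(\d+)\] (HOST|SERVICE) NOTIFICATION: (.*)$")


def parse_notifications(lines):
    """Yield (utc_time, kind, contact, host, state) for each notification line."""
    for line in lines:
        match = NOTIFY_RE.match(line.strip())
        if not match:
            continue  # not a notification entry
        epoch, kind, rest = match.groups()
        fields = rest.split(";")
        # The state sits one position later on SERVICE lines (after the service name).
        state_idx = 2 if kind == "HOST" else 3
        if len(fields) <= state_idx:
            continue  # truncated or unexpected line, skip it
        when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        yield when, kind, fields[0], fields[1], fields[state_idx]


def count_by_state(lines):
    """Count logged notifications per state (e.g. CRITICAL, OK, DOWN)."""
    return Counter(state for _, _, _, _, state in parse_notifications(lines))
```

Running parse_notifications over the lines of the Nagios log gives a list of times and states that Nagios believes it notified on. Any entry with no matching message at the relay within a minute or so would be one of the "logged but never sent" cases, and count_by_state shows whether recoveries are logged less often than problems to begin with.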