I mentioned this in a PM, but for others that come to the thread, please note that we are not open on the weekend. Details at https://www.nagios.com/contact/
As for the issue at hand, if you go to Admin -> Mail Settings can you send a screenshot of that page? You can PM it if concerned about security.
Also on that page, can you turn on the phpmailer log? Give it a few minutes, and then run tail /usr/local/nagiosxi/tmp/phpmailer.log. Again, you can PM the output if necessary.
If you do PM, please update this post as that is the only way for it to come back up on our support dashboard.
URGENT! Nagios flooding mail server but mailq is empty!
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: URGENT! Nagios flooding mail server but mailq is empty!
Here is the log. I am having to block port 25 because of the backlog of alerts.
If you can tell me where those backlogs sit, all I have to do is clear them and I should be back in business.
SMTP Error: Could not connect to SMTP host. (method=smtp;host=xxxx.onshoreit.net;port=25;security=none)
SMTP Error: Could not connect to SMTP host. (method=smtp;host=xxxx.onshoreit.net;port=25;security=none)
SMTP Error: Could not connect to SMTP host. (method=smtp;host=xxxx.onshoreit.net;port=25;security=none)
SMTP Error: Could not connect to SMTP host. (method=smtp;host=xxxx.onshoreit.net;port=25;security=none)
SMTP Error: Could not connect to SMTP host. (method=smtp;host=xxxx.onshoreit.net;port=25;security=none)
SMTP Error: Could not connect to SMTP host. (method=smtp;host=xxxx.onshoreit.net;port=25;security=none)
SMTP Error: Data not accepted.<p>SMTP server error: </p>
(method=smtp;host=xxxx.onshoreit.net;port=25;security=none)
SMTP Error: The following recipients failed: [email protected]<p>SMTP server error: </p>
(method=smtp;host=xxxx.onshoreit.net;port=25;security=none)
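For anyone reading a longer phpmailer.log, it can help to tally the entries by message to see which failure dominates. Below is a minimal Python sketch, assuming the log looks like the lines above (an "SMTP Error:" line, sometimes followed by a separate "(method=...)" line). The function name tally_smtp_errors is made up for illustration and is not part of Nagios XI.

```python
from collections import Counter


def tally_smtp_errors(lines):
    """Count phpmailer log entries by error message, ignoring connection details."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("SMTP Error:"):
            continue  # skip continuation lines such as "(method=smtp;...)"
        # Keep only the message text before any "(method=..." or "<p>" suffix.
        message = line.split("(method=", 1)[0].split("<p>", 1)[0].strip()
        counts[message] += 1
    return counts


# Usage on the XI server (log path taken from the earlier post):
# with open("/usr/local/nagiosxi/tmp/phpmailer.log") as fh:
#     for message, count in tally_smtp_errors(fh).most_common():
#         print(count, message)
```

A log dominated by "Could not connect to SMTP host" would be consistent with port 25 being blocked at the firewall, as described above.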
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: Couple of questions regarding notifications
"Could you clarify your responses?"
Sure, just point out where you'd like to see it.
"Where on the Nagios server is the mail held that is sent by php?"
Nagios doesn't cache mail sent by php, it is a direct handoff to the SMTP server. For this reason and others, I prefer to use the Sendmail option and use a smarthost if need be.
"I noticed that the individual mails come out about once a minute. Why would I have such a long delay between mails?"
Are you sure these aren't new notifications? If you enable the Debug Log, what are the contents? What does the Notification log say?
"For the flapping, it is typical for a T1 interface to flap once every 2 minutes as it tries to restart itself. I would like Nagios to detect this as a flap and suppress alerts"
Have you seen this document:
https://assets.nagios.com/downloads/nag ... pping.html
Specifically the section "How Flap Detection Works".
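As a rough illustration of the idea in that section, here is a simplified Python sketch of a percent-state-change calculation over recent check results. It is an assumption-laden model, not the Nagios implementation: the documented algorithm also weights recent changes more heavily, which this sketch leaves out, and the threshold values are placeholders to be replaced with whatever is configured.

```python
def percent_state_change(history):
    """Unweighted share of adjacent check results that differ, as a percentage."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


def is_flapping(history, was_flapping, low=5.0, high=20.0):
    """Apply high/low thresholds with hysteresis (threshold values are placeholders)."""
    pct = percent_state_change(history)
    if pct > high:
        return True          # crossed the high threshold: flapping starts
    if pct < low:
        return False         # dropped below the low threshold: flapping stops
    return was_flapping      # in between: keep the previous verdict


if __name__ == "__main__":
    # 21 results where the state flips after every second check.
    history = ["OK", "OK", "CRITICAL", "CRITICAL"] * 5 + ["OK"]
    print(percent_state_change(history))  # 50.0
    print(is_flapping(history, was_flapping=False))  # True
```

In this model a service that changes state on every second check sits at 50%, well above the placeholder high threshold, so it would be treated as flapping.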
"The documentation for flap detection was hard for me to follow and I was hoping that a practical example may help. I would like for flap detection to treat failures within a 5 minute interval as the same outage and not alert. Is that possible?"
Reading over your description, I don't think you want to mess around with flap detection. Instead, you should adjust your check interval to something "reasonable" like 5 minutes, then give it a grace period. You can do this by setting Retry interval = 1 and Max check attempts = 5. This will make notifications go out if the bad state hasn't recovered 5 minutes from the first detection of the bad state.
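The grace-period arithmetic behind the Retry interval and Max check attempts suggestion can be sketched in a few lines of Python. This is a simplified model, assuming the default interval_length of 60 seconds (so intervals are minutes), that the first failed check counts as attempt 1, and that every retry also fails. It ignores check execution time and scheduling jitter, and the function names are hypothetical.

```python
def minutes_until_hard_state(retry_interval, max_check_attempts):
    """Minutes from the first failed check (attempt 1) to the final attempt.

    Assumes intervals are expressed in minutes and every retry also fails.
    """
    return (max_check_attempts - 1) * retry_interval


def worst_case_minutes(check_interval, retry_interval, max_check_attempts):
    """Upper bound measured from the real failure, which may occur just after a normal check."""
    return check_interval + minutes_until_hard_state(retry_interval, max_check_attempts)


if __name__ == "__main__":
    print(minutes_until_hard_state(1, 5))  # 4
    print(worst_case_minutes(5, 1, 5))     # 9
```

Under this model the fifth attempt lands about 4 minutes after the first failed check, in line with the roughly 5-minute grace period described above. The same arithmetic can be rerun with other values to see whether a given notification deadline is reachable.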
Previous Nagios employee
Re: URGENT! Nagios flooding mail server but mailq is empty!
The mail MUST be cached somewhere, otherwise I would not be seeing down messages streaming in while the interface shows everything up. I have verified this by seeing that lsof shows the process "php" sending mail to my mail server. This process and connection happens about once a minute consistently. Please see those command outputs earlier in this post. I just need to know where this file or table is so it can be cleared/flushed. I can't allow notifications until I can get rid of all the backlog.
So when you say it is a direct handoff to an SMTP server, which server would that be? If I turn off postfix, mail still flows and mailq shows zero messages, so it is neither postfix nor sendmail sending the email. It also can't be my Exchange server, because if I block SMTP in front of the Nagios server, I stop getting the mail at my off-site Exchange server.
The output from the debug log was posted in a previous comment. I know about the blocks as the firewall is currently blocking port 25 to prevent the mail flood.
I am ABSOLUTELY CERTAIN that these are not new alerts. Birdseye shows no new alerts since 12:33pm CST Friday.
I have had trouble with Sendmail in the past which is why I switched to SMTP. What are the differences? Is the Sendmail option using Postfix in the background? What are the advantages/disadvantages to using Sendmail? I have no issue switching if that is a better method but I would really like to understand why I am seeing such a delay with alerts from the phpmailer and where it is holding queued mail.
I read the flap detection document but it did not help much, which is why I was looking for a practical example. Could you provide one?
I wish I could set the retry and max as you suggest as I know that would solve it but we are under contractual obligation to alert in 3 minutes so the retry at 5 would make us miss a lot of legitimate alerts. It seems to me that flap detection would be a good method to suppress multiple alerts if I understand the feature correctly.
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: URGENT! Nagios flooding mail server but mailq is empty!
It sounds like a cron job is failing to run or is just running through things very slowly.
What's the output of systemctl status crond?
In order to expedite things, can you PM me your Profile? You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button towards the top. If for whatever reason you *cannot* download the profile, please put the output of View System Info (5.3.4+, Show Profile if older) in the thread (that will at least get us some info). The profile will give us access to many of the logs we would otherwise ask for individually. If security is a concern, you can unzip the profile, take out what you like, and then zip it up again. We may end up needing something you remove, but we can ask for that specifically.
After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.
UPDATE: Profile received and shared with techs.
Last edited by dwhitfield on Mon Apr 03, 2017 5:02 pm, edited 1 time in total.
Reason: profile received
Re: URGENT! Nagios flooding mail server but mailq is empty!
[root@bmcap-nagios01 ~]# systemctl status crond
-bash: systemctl: command not found
So I tried sysctl instead
[root@bmcap-nagios01 ~]# sysctl status crond
error: "status" is an unknown key
error: "crond" is an unknown key
profile sent as PM.
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: URGENT! Nagios flooding mail server but mailq is empty!
What is the output of:
# mysql -u root -pnagiosxi nagiosxi -e "SELECT COUNT(*) FROM xi_eventqueue;"
Previous Nagios employee
Re: URGENT! Nagios flooding mail server but mailq is empty!
[root@bmcap-nagios01 ~]# mysql -u root -pnagiosxi nagiosxi -e "SELECT COUNT(*) FROM xi_eventqueue;"
ERROR 1146 (42S02) at line 1: Table 'nagiosxi.xi_eventqueue' doesn't exist
Re: URGENT! Nagios flooding mail server but mailq is empty!
I tried "service" instead of sysctl and I do see output for that command. Was that what you were looking for?
[root@bmcap-nagios01 ~]# service crond status
crond (pid 1664) is running...
Re: URGENT! Nagios flooding mail server but mailq is empty!
I tried this since there is no xi_eventqueue. I used database nagiosxi. Does that help?
mysql> SELECT COUNT(*) FROM xi_events;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)