URGENT! Nagios flooding mail server but mailq is empty!

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
dfmco
Posts: 257
Joined: Wed Dec 04, 2013 11:05 am

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by dfmco »

Received: from MAIL1.onshoreit.local (172.31.128.25) by MAIL1.onshoreit.local
(172.31.128.25) with Microsoft SMTP Server (TLS) id 15.0.847.32 via Mailbox
Transport; Fri, 31 Mar 2017 18:27:23 -0500
Received: from MAIL1.onshoreit.local (172.31.128.25) by MAIL1.onshoreit.local
(172.31.128.25) with Microsoft SMTP Server (TLS) id 15.0.847.32; Fri, 31 Mar
2017 18:27:23 -0500
Received: from bmcap-nagios01.bexar.e-911.net (99.20.165.50) by
MAIL1.onshoreit.local (172.31.128.25) with Microsoft SMTP Server (TLS) id
15.0.847.32 via Frontend Transport; Fri, 31 Mar 2017 18:27:23 -0500
Date: Fri, 31 Mar 2017 18:26:58 -0500
To: <[email protected]>
From: <[email protected]>
Reply-To: <[email protected]>
Subject: BEXAR PROBLEM Host Alert - Seguin-S2 is DOWN
Message-ID: <[email protected]>
X-Mailer: PHPMailer 5.2.22 (https://github.com/PHPMailer/PHPMailer)
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="b1_5fe1b29fcf9ee6e2fd2eb06eadfcc5cb"
Content-Transfer-Encoding: 8bit
Return-Path: [email protected]
X-MS-Exchange-Organization-AuthSource: MAIL1.onshoreit.local
X-MS-Exchange-Organization-AuthAs: Internal
X-MS-Exchange-Organization-AuthMechanism: 10
X-MS-Exchange-Organization-Network-Message-Id: 34dffa0f-8af4-4672-df21-08d4788d7c3b
X-MS-Exchange-Organization-AVStamp-Enterprise: 1.0
dfmco
Posts: 257
Joined: Wed Dec 04, 2013 11:05 am

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by dfmco »

Pager Duty is removed.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by avandemore »

Can you provide the full email, headers and all?
Previous Nagios employee
dfmco
Posts: 257
Joined: Wed Dec 04, 2013 11:05 am

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by dfmco »

Here is what is available to me:

Headers:
Received: from MAIL1.onshoreit.local (172.31.128.25) by MAIL1.onshoreit.local
(172.31.128.25) with Microsoft SMTP Server (TLS) id 15.0.847.32 via Mailbox
Transport; Fri, 31 Mar 2017 18:17:26 -0500
Received: from MAIL1.onshoreit.local (172.31.128.25) by MAIL1.onshoreit.local
(172.31.128.25) with Microsoft SMTP Server (TLS) id 15.0.847.32; Fri, 31 Mar
2017 18:17:26 -0500
Received: from bmcap-nagios01.bexar.e-911.net (99.20.165.50) by
MAIL1.onshoreit.local (172.31.128.25) with Microsoft SMTP Server (TLS) id
15.0.847.32 via Frontend Transport; Fri, 31 Mar 2017 18:17:26 -0500
Date: Fri, 31 Mar 2017 18:17:01 -0500
To: <[email protected]>
From: <[email protected]>
Reply-To: <[email protected]>
Subject: BEXAR PROBLEM Host Alert - LiveOak-S2 is DOWN
Message-ID: <[email protected]>
X-Mailer: PHPMailer 5.2.22 (https://github.com/PHPMailer/PHPMailer)
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="b1_e7cc30be1afb0b52517662917ab59304"
Content-Transfer-Encoding: 8bit
Return-Path: [email protected]
X-MS-Exchange-Organization-AuthSource: MAIL1.onshoreit.local
X-MS-Exchange-Organization-AuthAs: Internal
X-MS-Exchange-Organization-AuthMechanism: 10
X-MS-Exchange-Organization-Network-Message-Id: 886ecab2-9ff6-4650-2cc9-08d4788c183d
X-MS-Exchange-Organization-AVStamp-Enterprise: 1.0

Body:





***** DFMCO Alert *****

Nagios has detected a problem with this host.

Notification Type: PROBLEM
Host: LiveOak-S2
State: DOWN
Address: 10.36.14.2
Info: CRITICAL - 10.36.14.2: rta nan, lost 100%
Date/Time: 2017-03-31 18:17:01

Respond: http://10.4.199.12/nagiosxi/rr.php?uid= ... 3ba48516f2
Nagios URL: http://10.4.199.12/nagiosxi/
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by avandemore »

So here's what I can say. It's apparent there was some delay between the alert being generated and then received by your SMTP server, assuming the timezone is correct on it.
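To put a number on that delay, the Date header can be compared with the timestamp on the first Received hop. Below is a minimal Python sketch using the values from the LiveOak-S2 email above; the helper name `header_delay_seconds` is made up for illustration, and it assumes both clocks are set correctly.

```python
from email.utils import parsedate_to_datetime


def header_delay_seconds(date_header: str, received_header: str) -> float:
    """Return seconds between the Date header and a Received header's timestamp.

    Hypothetical helper for illustration; not part of Nagios XI or PHPMailer.
    """
    sent = parsedate_to_datetime(date_header)
    # In a Received header, the timestamp is the text after the last semicolon.
    received_stamp = received_header.rsplit(";", 1)[1].strip()
    received = parsedate_to_datetime(received_stamp)
    return (received - sent).total_seconds()


# Values copied from the LiveOak-S2 message headers.
date_header = "Fri, 31 Mar 2017 18:17:01 -0500"
received_header = (
    "from bmcap-nagios01.bexar.e-911.net (99.20.165.50) by "
    "MAIL1.onshoreit.local (172.31.128.25) with Microsoft SMTP Server (TLS) id "
    "15.0.847.32 via Frontend Transport; Fri, 31 Mar 2017 18:17:26 -0500"
)

print(header_delay_seconds(date_header, received_header))  # 25.0
```

For this particular message the gap between generation and receipt by the frontend transport works out to 25 seconds.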

Since PHPMailer debug wasn't enabled at the time, there is nothing to cross-reference. Even if it had been, the ability to cross-reference would be limited, as PHPMailer debug logs aren't very informative.

If you want more control over mail on the server, I would select Sendmail as the option and then configure Postfix as a smarthost. You can find more info on that here:

https://devops.profitbricks.com/tutoria ... -centos-7/

If your SMTP server requires authentication, you'd need to make sure the credentials are correct.
Previous Nagios employee
dfmco
Posts: 257
Joined: Wed Dec 04, 2013 11:05 am

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by dfmco »

What concerns me is that Nagios just uploaded a video 2 weeks ago on your YouTube channel recommending SMTP for mail over Sendmail. If there are delays, do you think they are PHPMailer related, or do you think we may have a bigger system issue? What worries me is how often I get I/O wait with SSD drives and 16 GB of RAM on a relatively small install. I have 2 other installs that are literally twice the size, with regular HDDs and less RAM, that run smoother.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by dwhitfield »

Curious where that video is. I'm looking at https://www.youtube.com/user/nagiosvideo/videos and see nothing.

Regardless, selecting the right mail system really depends on your environment.

Do those other environments have checks running less frequently? Do they utilize ramdisk? We're very quickly moving into "this should be another thread" territory, so if you really want to tackle the performance issues rather than email, that's where I'd suggest answering those two questions. Alternatively, I leave them here for your personal inquiry.
dfmco
Posts: 257
Joined: Wed Dec 04, 2013 11:05 am

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by dfmco »

Here is the video.
https://www.youtube.com/watch?v=glrfs6ImDhc

My concern is that if the mail is not being queued, what caused it to be delayed for days?

I can start a new topic if that is what you recommend. Now that I have a backup server fully functional, I will try killing the firewall again to see if I can recreate the problem. It has happened each time the firewall has gone down (the client has moved the firewall in the rack 3 times in as many months), so I am sure it will happen again. If I go this route, let me know what information I should capture. Could I schedule some time so someone could look at the system while it is failed? I won't be able to leave it failed due to the havoc that it causes, but I can fail it long enough to capture some data.

Let me know how you want to proceed.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by dwhitfield »

That video was posted in April *2016*. Admittedly, I listened at 2x, but I didn't hear him say SMTP > sendmail. It's just an option. Regardless, I stand by my comment that it just depends on what works best in your environment.

The notifications are stored in the database, but unless you know what you are looking for, it is difficult to be precise. You could always truncate the table in the future if it is worth the possibility of losing notifications.

As for why Nagios could not talk to your mail server, you'll need to talk to your Exchange admin about that, because your profile didn't show any issues with the database.

As for the performance stuff, yes, I think that should be a different thread.
dfmco
Posts: 257
Joined: Wed Dec 04, 2013 11:05 am

Re: URGENT! Nagios flooding mail server but mailq is empty!

Post by dfmco »

I did not catch the 2016 date. I am subscribed to the channel and got a notice of the video last week. Good catch! You are correct that they never said SMTP was better, but since this was the method documented, I assumed it would be recommended over the other.

Could you answer the second part of my question about the positives/negatives of each method so I can make the best choice? A pointer to a document would be great if that saves you some time. From what you told me, Sendmail has better logging, but are there any other drivers? I tried to enable Sendmail and I get no test notifications. I did a quick read through some Sendmail docs and don't see the problem immediately. Also, isn't Postfix being used in the background, replacing Sendmail? I'm curious, as I want to make sure I am reading the correct documentation.

Once you send me the documentation, I will review it, and if I decide to go that route, I will open another thread to figure out why Sendmail is not working on any of my servers. Since you are on the front line of solving problems, I am leaning toward your recommendation, but I do want to document the reasons why for my company in case I am hit by a bus!

When you say "The notifications are stored in the database", that is EXACTLY what I was looking for! Can you tell me where these are and how to wipe that table? I have had several occasions where a customer has done something that caused the monitoring to register a mass alert, and we have always had to "wait it out". During that time, we have missed legitimate alarms for other clients. A way to clear the notifications is all I was after in this thread. Will this get rid of all notifications in history as well, or just the ones queued to be sent? Please send the commands with the database and table info if you don't mind, as my MySQL is a bit rusty.

The reason you saw that Nagios can't talk to Exchange is that I put a block on the firewall denying outbound SMTP to Exchange from Nagios until we could resolve the problem. I have since disabled that rule and SMTP connects normally again.

Thanks to both of you for all the help so far. I appreciate the work you all do.
Locked