Narrowed it down a bit further. Checked today's logs and there was a series of events, from early this morning until I restarted Nagios, that were missing from maillog but showing in Nagios Notifications.
However... for one of the alerts, I am sending email via the mutt command I posted previously ("Email-service-event") to one user and via the standard "notify-service-by-email" to another user. The maillog shows the mail being sent for the "notify-service-by-email" command only. So it now looks like there is a problem with the way Nagios interacts with mutt?
BTW - I am using mutt because I need to masquerade the reply-to address and have different mutt.rc files for each client.
I have updated mutt on the two servers now and the next time I have missing emails, I will check for any mutt processes before restarting Nagios.
regards... Fred
Email notifications stopped
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
- Contact:
Re: Email notifications stopped
Further Update.
Checked Notifications - it showed that Nagios had sent an email using "Email-service-event", but there was no corresponding event in maillog. (In fact there were 3 days' worth missing - Fri evening to Mon morning.)
I then ran the "Email-service-event" command from the command line, saw the entry in the maillog and the email was successfully delivered.
After this, Nagios sent another email using "Email-service-event" - again, no corresponding event found in maillog.
I then restarted Nagios and waited for another event to be generated.
Nagios sent another email using "Email-service-event"; this time I saw the entry in the maillog and the email was successfully delivered.
The fact that I can manually send emails suggests that there is nothing wrong with the mutt client or postfix.
Also, the fact that emails from Nagios using mutt only start working again after a Nagios restart indicates to me that Nagios is unable to run the command before the restart.
After the restart, emails are sent correctly.
Fred
Re: Email notifications stopped
When you run the "Email-service-event"command from the command line, do you run it as root or as the nagios user?
Can you run the following command while logged in to the Nagios server as the nagios user and post the output?
Code: Select all
mutt -D
This will print the value of all of mutt's configuration options to stdout.
BTW, you can change the from address for the mail command by adding the -r option.
-r address
Sets the From address. Overrides any from variable specified in environment or startup files. Tilde escapes are disabled. The -r address options are passed to the mail transfer agent unless SMTP is used. This option exists for compatibility only; it is recommended to set the from variable directly instead.
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
- Contact:
Re: Email notifications stopped
Good point - I ran it as root. It stopped again last night, so I will try again as nagios.
mutt config is attached.
As well as the reply-to, I use the set realname feature in mutt so that I can control how the name is displayed.
OK - so this is what I found. The emails from mutt are logged in ~/sent
All entries up to 19:36 were sent OK - no further entries were generated from nagios after that, even though many more are shown in the Notifications log in Nagios.
I sent an email manually from the command line - it showed up OK in ~/sent and maillog and was delivered OK.
This is the end of the ~/sent file.
It shows an event being sent successfully to two email addresses at 19:21.
The interesting thing is that the event at 19:36 was only sent to one address; the second is missing and no further emails are sent after that from Nagios. You can see an entry at 20:18 from root and then my manual email at 06:44.
So after Nagios failed to send the second email, it stopped sending them altogether.
Code: Select all
Aug 15 19:21:08 nagios postfix/pickup[4572]: 9129940D: uid=500 from=<nagios>
Aug 15 19:21:08 nagios postfix/cleanup[45143]: 9129940D: message-id=<[email protected]>
Aug 15 19:21:08 nagios postfix/qmgr[46981]: 9129940D: from=<[email protected]>, size=886, nrcpt=1 (queue active)
Aug 15 19:21:09 nagios postfix/smtp[45164]: 9129940D: to=<[email protected]>, relay=wmailscan.client.com.au[10.19.3.41]:25, delay=0.44, delays=0.05/0.01/0.37/0.01, dsn=2.0.0, status=sent (250 Message accepted for delivery)
Aug 15 19:21:09 nagios postfix/qmgr[46981]: 9129940D: removed
Aug 15 19:21:09 nagios postfix/pickup[4572]: 8EFDE40D: uid=500 from=<nagios>
Aug 15 19:21:09 nagios postfix/cleanup[45143]: 8EFDE40D: message-id=<[email protected]>
Aug 15 19:21:09 nagios postfix/qmgr[46981]: 8EFDE40D: from=<[email protected]>, size=891, nrcpt=1 (queue active)
Aug 15 19:21:09 nagios postfix/smtp[45164]: 8EFDE40D: to=<[email protected]>, relay=wmailscan.client.com.au[10.19.3.41]:25, delay=0.04, delays=0.02/0/0.01/0, dsn=2.0.0, status=sent (250 Message accepted for delivery)
Aug 15 19:21:09 nagios postfix/qmgr[46981]: 8EFDE40D: removed
Aug 15 19:36:01 nagios postfix/pickup[4572]: E001769A: uid=500 from=<nagios>
Aug 15 19:36:01 nagios postfix/cleanup[41428]: E001769A: message-id=<[email protected]>
Aug 15 19:36:01 nagios postfix/qmgr[46981]: E001769A: from=<[email protected]>, size=878, nrcpt=1 (queue active)
Aug 15 19:36:01 nagios postfix/smtp[41448]: E001769A: to=<[email protected]>, relay=wmailscan.client.com.au[10.19.3.41]:25, delay=0.09, delays=0.05/0.02/0.01/0, dsn=2.0.0, status=sent (250 Message accepted for delivery)
Aug 15 19:36:01 nagios postfix/qmgr[46981]: E001769A: removed
Aug 15 20:18:29 nagios postfix/pickup[4572]: 61D15472: uid=0 from=<root>
Aug 15 20:18:29 nagios postfix/cleanup[20782]: 61D15472: message-id=<[email protected]>
Aug 15 20:18:29 nagios postfix/qmgr[46981]: 61D15472: from=<[email protected]>, size=3134, nrcpt=1 (queue active)
Aug 15 20:18:29 nagios postfix/local[20788]: 61D15472: to=<[email protected]>, orig_to=<root>, relay=local, delay=1108, delays=1107/0.07/0/0.04, dsn=2.0.0, status=sent (delivered to mailbox)
Aug 15 20:18:29 nagios postfix/qmgr[46981]: 61D15472: removed
Aug 16 05:00:01 nagios postfix/pickup[21614]: 9F901AEF: uid=0 from=<root>
Aug 16 05:00:01 nagios postfix/cleanup[19451]: 9F901AEF: message-id=<[email protected]>
Aug 16 05:00:01 nagios postfix/qmgr[46981]: 9F901AEF: from=<[email protected]>, size=781, nrcpt=1 (queue active)
Aug 16 05:00:01 nagios postfix/local[19454]: 9F901AEF: to=<[email protected]>, orig_to=<root>, relay=local, delay=0.2, delays=0.14/0.05/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Aug 16 05:00:01 nagios postfix/qmgr[46981]: 9F901AEF: removed
Aug 16 06:44:15 nagios postfix/pickup[40318]: C040D695: uid=500 from=<nagios>
Aug 16 06:44:15 nagios postfix/cleanup[40319]: C040D695: message-id=<[email protected]>
Aug 16 06:44:15 nagios postfix/qmgr[46981]: C040D695: from=<[email protected]>, size=513, nrcpt=1 (queue active)
Aug 16 06:44:15 nagios postfix/smtp[40322]: C040D695: to=<[email protected]>, relay=wmailscan.client.com.au[10.19.3.41]:25, delay=0.13, delays=0.06/0.05/0.01/0.02, dsn=2.0.0, status=sent (250 Message accepted for delivery)
Aug 16 06:44:15 nagios postfix/qmgr[46981]: C040D695: removed
regards... Fred
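For anyone wanting to line up the Nagios Notifications log against maillog the same way, here is a minimal sketch that pulls out only the messages submitted locally by one uid. The function name list_pickups is made up for illustration, and it assumes the syslog layout shown in the log above, where uid=500 is the nagios user on this server.

```shell
#!/bin/sh
# list_pickups UID (hypothetical helper, not part of Nagios or Postfix)
# Reads Postfix maillog lines on stdin and prints the timestamp and queue ID
# of every message handed to postfix/pickup by the given numeric uid.
list_pickups() {
    awk -v uid="$1" '
        $5 ~ /^postfix\/pickup\[/ && $7 == "uid=" uid {
            sub(/:$/, "", $6)        # strip the trailing colon from the queue ID
            print $1, $2, $3, $6     # month, day, time, queue ID
        }'
}
```

Feeding it the maillog (for example: list_pickups 500 < /var/log/maillog, where the path is an assumption and varies by distribution) gives one line per mail that actually reached Postfix, which can then be compared entry by entry with what the Notifications log claims was sent.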
Re: Email notifications stopped
Haven't used mutt in ages, but are there any debug or verbose flags you could append to your notification command to get some more output? Or just redirect stdout and stderr to a file in case there are errors we don't get to see.
Former Nagios employee
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
- Contact:
Re: Email notifications stopped
I've shown that while Nagios is unable to send a notification via mutt, I can manually send an email OK as the nagios user. There are no errors displayed.
There don't appear to be any locks or hung mail processes while it is in this state, and the fact that mail via mutt starts working again after restarting Nagios would indicate that Nagios can't run that external command. If there is a problem running the mutt command from Nagios (doesn't get a response back, etc) is there something in Nagios that stops it from running that command again?
As this happens when we are sending emails to more than one user, is it possible to stagger the notifications so that they don't all go out at once?
Re: Email notifications stopped
Fred Kroeger wrote: If there is a problem running the mutt command from Nagios (doesn't get a response back, etc) is there something in Nagios that stops it from running that command again ?
It will only run the command once, as it's timed. Nagios doesn't know that your notification command is failing; it just does what it's told.
Fred Kroeger wrote: As this happens when we are sending emails to more than one user, is it possible to stagger the notifications so that they don't all go out at once?
Yes, you would need to look into setting up host / service escalations, which would let you do tiered notifications. Try appending 2>&1 | tee /tmp/notify.txt to your notification command (make sure /tmp/notify.txt is writable by the nagios user). This will give us a bit more information about what exactly is going on when Nagios tries to execute it.
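As a variation on the tee suggestion, here is a rough sketch of a small wrapper that appends the output to a log and also records the exit status. With a plain pipe into tee, the status the shell reports is tee's rather than mutt's, so a failing mutt can go unnoticed. The name run_logged is hypothetical, and the real "Email-service-event" command line is not shown in this thread, so it would need to be substituted in.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]   (hypothetical helper for illustration)
# Runs COMMAND with stdin passed through, appends its stdout and stderr to
# LOGFILE between timestamped START/END markers, and returns COMMAND's own
# exit status so the caller can still see a failure.
run_logged() {
    log=$1
    shift
    printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >>"$log"
    "$@" >>"$log" 2>&1
    status=$?
    printf '%s END exit=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >>"$log"
    return "$status"
}
```

Wrapping the existing mutt invocation as run_logged /tmp/notify.txt followed by the usual mutt arguments would leave a START line for every attempt Nagios makes. If the START lines stop appearing while the Notifications log keeps growing, that would point at Nagios not launching the command at all rather than at mutt failing.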
Former Nagios Employee
-
Fred Kroeger
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
- Contact:
Re: Email notifications stopped
I have abandoned using the mutt email client. There is something seriously wrong here.
I had two notifications set up - one with the standard email client and the other with mutt. Both showed as being sent in the Nagios Notifications log; however, the email log shows that only the email using the standard client went through.
If I restart Nagios, then both will work for a while.
You can close this.
Re: Email notifications stopped
Closing.