nagios bug

solarflow · Post by **solarflow** » Fri Dec 27, 2013 12:17 am

I think I might have found a bug in Nagios, although this is version 3.4.1. When a host goes down, the notification command runs but times out, yet host UP notifications always work, so I can't see it being a sendmail problem. I've tried many things and still can't find the cause; increasing the timeout to 600 doesn't help. Running the command from a Linux shell works perfectly. Here's what the logs show:

HOST NOTIFICATION: nagiosadmin;Router;DOWN;notify-host-by-email;(Host Check Timed Out)
[1388111250] Warning: Contact 'nagiosadmin' host notification command '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: Router\nState: DOWN\nAddress: 192.168.2.10\nInfo: (Host Check Timed Out)\n\nDate/Time: Thu Dec 26 21:26:29 EST 2013\n" | /bin/mail -s "** PROBLEM Host Alert: Router is DOWN **" [email protected]' timed out after 60 seconds
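
For anyone reading along: the bracketed number at the start of the warning line is a Unix epoch timestamp, and converting it makes it easier to line the warning up with the Date/Time inside the command. A minimal sketch, assuming GNU date (the `-d "@SECONDS"` form is not portable to every `date`); the function name `epoch_utc` is just an illustrative choice:

```shell
#!/bin/sh
# epoch_utc SECONDS: print a Unix epoch timestamp as a UTC date.
# Assumes GNU date, which accepts "@SECONDS" as an argument to -d.
epoch_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Timestamp taken from the warning line above.
epoch_utc 1388111250
```

This prints 2013-12-27 02:27:30 UTC, which is 21:27:30 EST on Dec 26, about one minute after the Date/Time in the command. That is consistent with the command hanging for the full 60-second timeout rather than failing immediately.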

And if I set host_notification_options = n then it just fails on the service_notification instead:

SERVICE NOTIFICATION: nagiosadmin;hp1810-SW;Port 1 Link Status;CRITICAL;notify-service-by-email;SNMP CRITICAL - *down(2)*
[1388120076] Warning: Contact 'nagiosadmin' service notification command '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\n\nService: Port 1 Link Status\nHost: hp1810-8g\nAddress: 192.168.10.2\nState: CRITICAL\n\nDate/Time: Thu Dec 26 23:53:35 EST 2013\n\nAdditional Info:\n\nSNMP CRITICAL - *down(2)*\n" | /bin/mail -s "** PROBLEM Service Alert: hp1810-8g/Port 1 Link Status is CRITICAL **" [email protected]' timed out after 60 seconds

As soon as connectivity is restored, all the recovery emails come in.

In Templates.cfg:

service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email

tmcdonald · Post by **tmcdonald** » Fri Dec 27, 2013 12:22 pm

Have you filed a bug report for this yet? It certainly does not seem like expected behavior. Can you expand on "Running the command from linux works perfectly"? Do you mean running the whole /bin/mail command works?

slansing · Post by **slansing** » Fri Dec 27, 2013 12:26 pm

Can you show us the output from your maillog of the mail actually timing out?

solarflow · Post by **solarflow** » Fri Dec 27, 2013 3:46 pm

I haven't filed a bug report yet, since I wanted to make sure it was really a bug and see whether anyone else has the same problem. Here is the output from maillog, and what happens when I run the whole command from Linux:

sendmail[17867]: rBRKZGBl017867: from=nagios, size=0, class=0, nrcpts=0, relay=nagios@localhost

And here is when the recovery emails come in:

sendmail[18190]: rBRKeS6Y018190: from=nagios, size=430, class=0, nrcpts=1, msgid=<[email protected]>, relay=nagios@localhost
solarflow sendmail[18191]: rBRKeSVd018191: from=<[email protected]>, size=673, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA, relay=localhost [127.0.0.1]
solarflow sendmail[18190]: rBRKeS6Y018190: to=[email protected], ctladdr=nagios (496/496), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=30430, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (rBRKeSVd018191 Message accepted for delivery)

And here is when I run it from the command line:

$ /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: Router\nState: DOWN\nAddress: 192.168.2.10\nInfo: (Host Check Timed Out)\n\nDate/Time: Fri Dec 27 15:35:16 EST 2013\n" | /bin/mail -s "** PROBLEM Host Alert: Router is DOWN **" [email protected]

$ mail
Heirloom Mail version 12.4 7/29/08. Type ? for help.
"/var/spool/mail/root": 2 messages 1 new
1 Mail System Internal Fri Dec 27 15:45 13/544 "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA"
>N 2 root Fri Dec 27 15:45 28/931 "** PROBLEM Host Alert: Router is DOWN **"

scottwilkerson · Post by **scottwilkerson** » Mon Dec 30, 2013 9:11 am

In your first message it looks like you have a single ' after the email address, followed by some other info. This doesn't seem to be properly formatted. Can you post your notify-host-by-email command?

HOST NOTIFICATION: nagiosadmin;Router;DOWN;notify-host-by-email;(Host Check Timed Out)
[1388111250] Warning: Contact 'nagiosadmin' host notification command '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: Router\nState: DOWN\nAddress: 192.168.2.10\nInfo: (Host Check Timed Out)\n\nDate/Time: Thu Dec 26 21:26:29 EST 2013\n" | /bin/mail -s "** PROBLEM Host Alert: Router is DOWN **" [email protected]' timed out after 60 seconds

solarflow · Post by **solarflow** » Sun Jan 12, 2014 5:09 pm

Just to provide some closure on this issue: the problem seems to stem from DNS not being available. Even with host entries in /etc/hosts, sendmail still queries DNS when delivering messages locally; if it can't reach a DNS server, nothing goes into the mailq and delivery fails silently. In my tests sendmail would not deliver unless it got an NXDOMAIN response. There's probably a configuration option to change this, but postfix seemed to handle it better, and it's slated to replace sendmail as the default MTA in RHEL and Fedora anyway.
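
If anyone wants to check for the same failure mode, one option is to run a DNS query under a time limit and see whether it returns, fails quickly, or hangs. This is only a rough sketch, assuming GNU coreutils `timeout` (which exits with status 124 when the limit expires); the function name `probe` and the 5-second limit are arbitrary choices of mine, and the `host` example assumes that tool is installed:

```shell
#!/bin/sh
# probe LIMIT CMD [ARGS...]: run CMD under a time limit and classify the result.
# Assumes GNU coreutils `timeout`, which exits with status 124 on expiry.
probe() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case $rc in
        0)   echo "ok" ;;
        124) echo "timed out" ;;
        *)   echo "failed (exit $rc)" ;;
    esac
}

# Example (not run here): query DNS directly instead of going through /etc/hosts.
# probe 5 host "$(hostname)"
```

A quick "failed" result would match the case where sendmail got an answer such as NXDOMAIN and carried on, while "timed out" would match the case where the DNS server was unreachable and the notification command hung.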

Thanks for the help ...

tmcdonald · Post by **tmcdonald** » Mon Jan 13, 2014 10:11 am

Thanks for getting back to us! Glad to see you got it working. Yeah, postfix seems to be the preferred MTA these days, so I'm not surprised. Good to see some empirical evidence though.

I'm going to lock this up now, but feel free to open another if you have more questions.

Nagios Support Forum
