Nagios bug
Posted: Fri Dec 27, 2013 12:17 am
I think I might have found a bug in Nagios, although this is version 3.4.1. When a host goes down, the notification tries to send but times out. The host UP notifications always work, so I don't think it's a sendmail problem. I've tried many things and still can't find the cause. Increasing the timeout to 600 seconds doesn't help, and running the same command from the Linux shell works perfectly. Here's what the logs show:
HOST NOTIFICATION: nagiosadmin;Router;DOWN;notify-host-by-email;(Host Check Timed Out)
[1388111250] Warning: Contact 'nagiosadmin' host notification command '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: Router\nState: DOWN\nAddress: 192.168.2.10\nInfo: (Host Check Timed Out)\n\nDate/Time: Thu Dec 26 21:26:29 EST 2013\n" | /bin/mail -s "** PROBLEM Host Alert: Router is DOWN **" [email protected]' timed out after 60 seconds
And if I set host_notification_options to n, it just fails on the service notification instead:
SERVICE NOTIFICATION: nagiosadmin;hp1810-SW;Port 1 Link Status;CRITICAL;notify-service-by-email;SNMP CRITICAL - *down(2)*
[1388120076] Warning: Contact 'nagiosadmin' service notification command '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\n\nService: Port 1 Link Status\nHost: hp1810-8g\nAddress: 192.168.10.2\nState: CRITICAL\n\nDate/Time: Thu Dec 26 23:53:35 EST 2013\n\nAdditional Info:\n\nSNMP CRITICAL - *down(2)*\n" | /bin/mail -s "** PROBLEM Service Alert: hp1810-8g/Port 1 Link Status is CRITICAL **" [email protected]' timed out after 60 seconds
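The number in brackets at the start of each warning is a Unix timestamp. The short Python check below converts both timestamps and compares them with the Date/Time printed inside each message. It assumes the server clock is on EST (a fixed UTC-5 offset), as the Date/Time fields suggest.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset; assumes the server is on EST (no DST in late December).
EST = timezone(timedelta(hours=-5), "EST")


def log_time(stamp: int) -> datetime:
    """Convert the bracketed Unix timestamp from a nagios.log line to EST."""
    return datetime.fromtimestamp(stamp, tz=EST)


# (timestamp of the "timed out" warning, Date/Time printed in the message body)
pairs = [
    (1388111250, datetime(2013, 12, 26, 21, 26, 29, tzinfo=EST)),  # host alert
    (1388120076, datetime(2013, 12, 26, 23, 53, 35, tzinfo=EST)),  # service alert
]

for stamp, generated in pairs:
    warned = log_time(stamp)
    # Seconds between building the message and logging the timeout warning.
    print(warned.strftime("%H:%M:%S"), (warned - generated).total_seconds())
```

Both warnings are logged about 61 seconds after their notification was generated. That fits the mail command being left to run for the full 60-second timeout rather than failing right away.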
As soon as connectivity is restored, all the recovery emails come in.
In Templates.cfg:
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
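Running the command by hand from a terminal is not quite the same as how Nagios runs it. The minimal Python sketch below runs a command string through /bin/sh and kills the whole pipeline, printf and /bin/mail included, once a timeout expires. It assumes Nagios 3 handles a notification command roughly this way, giving up after notification_timeout seconds. The function name run_notification_command is hypothetical.

```python
import os
import signal
import subprocess


def run_notification_command(command: str, timeout: float = 60.0) -> tuple[str, str]:
    """Run `command` through /bin/sh, killing it after `timeout` seconds.

    Returns ("ok", output), ("failed", output) or ("timed out", "").
    """
    # start_new_session puts the shell and every command in the pipeline
    # into one new process group, so a timeout can kill all of them at once.
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,
        text=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # kill the whole pipeline
        proc.communicate()  # reap the shell and drain the closed pipes
        return "timed out", ""
    return ("ok" if proc.returncode == 0 else "failed"), output
```

One way to use it would be to pass the exact command from the warning, running as the nagios user, once while the router is down and once after it is back. If the call times out only while connectivity is lost, that would suggest /bin/mail itself is blocking during the outage.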