[SOLVED] - Host notifications not always sent

tapp
Posts: 5
Joined: Mon Feb 03, 2014 1:44 pm

[SOLVED] - Host notifications not always sent

Post by tapp »

Hi,

I have installed Nagios v4.0.2, and for some hosts notifications are not always sent.

This is an excerpt from the log, where you can see the entries related to a particular host:

Code:

May 21 15:53:03 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
May 21 15:53:28 eureka nagios: HOST ALERT: srvext56;UP;SOFT;2;HTTP WARNING: HTTP/1.0 400 Bad Request - 1622 bytes in 0.191 second response time
May 21 16:55:43 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
May 21 16:56:07 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds
May 21 16:57:17 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;3;CRITICAL - Socket timeout after 10 seconds
May 21 16:58:27 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;4;CRITICAL - Socket timeout after 10 seconds
May 21 16:59:37 eureka nagios: HOST ALERT: srvext56;DOWN;HARD;5;CRITICAL - Socket timeout after 10 seconds
     May 21 16:59:37 eureka nagios: HOST NOTIFICATION: nagios-admin;srvext56;DOWN;host-notify-by-email;CRITICAL - Socket timeout after 10 seconds
May 21 17:35:38 eureka nagios: HOST ALERT: srvext56;UP;HARD;1;HTTP WARNING: HTTP/1.0 400 Bad Request - 1622 bytes in 0.190 second response time
May 21 17:37:48 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
May 21 17:38:35 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds
May 21 17:39:45 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;3;CRITICAL - Socket timeout after 10 seconds
May 21 17:40:31 eureka nagios: HOST ALERT: srvext56;DOWN;SOFT;4;CRITICAL - Socket timeout after 10 seconds
May 21 17:41:41 eureka nagios: HOST ALERT: srvext56;DOWN;HARD;5;CRITICAL - Socket timeout after 10 seconds
May 21 17:55:38 eureka nagios: HOST ALERT: srvext56;UP;HARD;1;HTTP WARNING: HTTP/1.0 400 Bad Request - 1622 bytes in 0.654 second response time
As you can see, a notification was logged when the host went down (I've indented that line), but there are no notifications for when it came back up.

I'm experiencing this issue with some of the configured servers. However, other servers that share the same configuration produce correct notifications.

The configuration for the host is as follows:

Code:

define host{
       	use                     linux-server            ; Name of host template to use
                                                       	; This host definition will inherit all variables that are defined
                                                        ; in (or inherited by) the linux-server host template definition.
        host_name              srvext56
        alias                  srvext56
       	address                XX.XX.XX.XX
       	}
And the notifications configuration is as follows:

Code:

define contact{
       	contact_name                    nagios-admin
       	alias                           Nagios Admin
       	service_notification_period     24x7
       	host_notification_period        24x7
       	service_notification_options    w,c,r
       	host_notification_options       d,r
        service_notification_commands   notify-by-email
        host_notification_commands	host-notify-by-email
        email                          email_address
        }
The host definition template is:

Code:

define host{
	name                            linux-server    ; The name of this host template
        use                             generic-host    ; This template inherits other values from the generic-host template
        check_period                    24x7            ; By default, Linux hosts are checked round the clock
        max_check_attempts              5              ; Check each Linux host 5 times (max)
        check_interval                  2
        check_command                   check-host-alive-by-http ; Default command to check Linux hosts
        notification_period             workhours       ; Linux admins hate to be woken up, so we only notify during the day
                                                       	; Note that the notification_period variable is being overridden from
                                                       	; the value that is inherited from the generic-host template!
       	notification_interval           0             ; 0 = do not resend notifications
       	notification_options            d,r           ; Only send notifications for specific host states
       	contact_groups                  admins          ; Notifications get sent to the admins by default
       	register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
       	}
The contact group is defined as:

Code:

define contactgroup{
        contactgroup_name       admins
       	alias                   Nagios Administrators
        members                 nagios-admin
        }
The command is defined as follows:

Code:

define command {
    command_name check-host-alive-by-http
    command_line $USER1$/check_http -H $HOSTADDRESS$
}
What I've found is that there is usually no notification when the host comes back up (but, as I said before, this does not apply to all hosts, although all of them share the same configuration).

Any suggestions?

Thanks in advance.
Last edited by tapp on Mon May 26, 2014 2:05 am, edited 1 time in total.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Host notifications not always sent

Post by sreinhardt »

The only thing that stands out from what we see so far is that your host check is returning WARNING instead of OK. The documentation states that a WARNING result should be treated as an UP state, but it is possible that there is an error in the logic there, so that it is not fully considered a recovery. Could you get that host into a DOWN state, submit a passive OK check result, and see whether that results in a notification being sent?
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
tapp
Posts: 5
Joined: Mon Feb 03, 2014 1:44 pm

Re: Host notifications not always sent

Post by tapp »

Hi sreinhardt,

Thanks for your reply. I submitted a passive check result after getting the host into a down state, and I got the notification.

However, after it returned to the up state, I took it down again, and this time I received the notification when it came back up... :-m

It seems that Nagios is behaving erratically when sending notifications. In the meantime, I've found a similar problem with another server, but this time the notification was sent about 15 hours later!

Nagios detected the host down at 18:24, but didn't send the notification until 09:00:

Code:

May 21 18:24:32 eureka nagios: HOST ALERT: srvext23;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
May 21 18:25:42 eureka nagios: HOST ALERT: srvext23;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds
May 21 18:26:52 eureka nagios: HOST ALERT: srvext23;DOWN;SOFT;3;CRITICAL - Socket timeout after 10 seconds
May 21 18:27:14 eureka nagios: HOST ALERT: srvext23;DOWN;SOFT;4;CRITICAL - Socket timeout after 10 seconds
May 21 18:27:35 eureka nagios: HOST ALERT: srvext23;DOWN;HARD;5;CRITICAL - Socket timeout after 10 seconds
May 21 18:32:04 eureka nagios: SERVICE ALERT: srvext23;Check Disk;CRITICAL;SOFT;1;Timeout while attempting connection
May 21 18:34:04 eureka nagios: SERVICE ALERT: srvext23;Check Disk;CRITICAL;SOFT;2;Timeout while attempting connection
May 21 18:36:04 eureka nagios: SERVICE ALERT: srvext23;Check Disk;CRITICAL;SOFT;3;Timeout while attempting connection
May 21 18:38:04 eureka nagios: SERVICE ALERT: srvext23;Check Disk;CRITICAL;SOFT;4;Timeout while attempting connection
May 21 18:40:04 eureka nagios: SERVICE ALERT: srvext23;Check Disk;CRITICAL;HARD;5;Timeout while attempting connection
May 22 09:00:14 eureka nagios: HOST NOTIFICATION: nagios-admin;srvext23;DOWN;host-notify-by-email;CRITICAL - Socket timeout after 10 seconds
I'm thinking about completely reinstalling/updating nagios in case something is broken.

Any suggestions?

Thanks in advance.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Host notifications not always sent

Post by slansing »

We would need to see the configurations for that host, its template, etc. as well to help you figure out what happened. A maillog would also be nice to see. It looks like you have a fairly clean environment if nothing is going down between those hours!
tapp
Posts: 5
Joined: Mon Feb 03, 2014 1:44 pm

Re: Host notifications not always sent

Post by tapp »

Hi slansing,

Thanks for your reply. I put the host configuration and template in the first post. Regarding the maillog, there are no errors or problems in it (I mean retries, errors with the MTA, etc.). Please note that while no notifications are sent for some hosts, notifications are sent for other hosts during the same period.

I've been reviewing the logs, and about 25% of the UP notifications are not sent. I've found that this is not related to the MTA but to Nagios: sometimes the UP notification for a host is not sent, and sometimes it is (which suggests that it is not a configuration issue).

Look at this excerpt from the nagios log:

Code:

[root@eureka ~]# cat /var/log/messages | grep 'srvext21' | grep NOTIF
May 19 16:30:19 eureka nagios: HOST NOTIFICATION: nagios-admin;srvext21;DOWN;host-notify-by-email;CRITICAL - Socket timeout after 10 seconds
May 20 10:56:45 eureka nagios: HOST NOTIFICATION: nagios-admin;srvext21;DOWN;host-notify-by-email;CRITICAL - Socket timeout after 10 seconds
May 20 10:58:20 eureka nagios: HOST NOTIFICATION: nagios-admin;srvext21;UP;host-notify-by-email;HTTP WARNING: HTTP/1.0 400 Bad Request - 1602 bytes in 0.169 second response time
May 20 15:34:24 eureka nagios: HOST NOTIFICATION: nagios-admin;srvext21;DOWN;host-notify-by-email;CRITICAL - Socket timeout after 10 seconds
May 20 15:38:44 eureka nagios: HOST NOTIFICATION: nagios-admin;srvext21;UP;host-notify-by-email;HTTP WARNING: HTTP/1.0 400 Bad Request - 1602 bytes in 0.165 second response time
You can see that after the first DOWN notification, no UP notification was sent. However, the UP alert is present in the log:

Code:

May 19 17:18:20 eureka nagios: HOST ALERT: srvext21;UP;HARD;1;HTTP WARNING: HTTP/1.0 400 Bad Request - 1602 bytes in 1.168 second response time
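
The manual comparison of alerts and notifications above can be automated. The following is a minimal Python sketch, not part of the original post, that lists HARD host alerts for which no HOST NOTIFICATION was logged. It assumes the syslog line format shown in this thread and assumes that a notification, when sent, is logged with the same timestamp as its alert (as in the 16:59:37 pair for srvext56). The names `Event` and `unnotified_hard_alerts` are hypothetical.

```python
import re
from collections import namedtuple

# One log event, identified by syslog timestamp, host name and host state.
Event = namedtuple("Event", "timestamp host state")

# Matches HARD host alerts only; SOFT alerts never trigger notifications.
ALERT_RE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d+\s[\d:]{8}) \S+ nagios: HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>UP|DOWN|UNREACHABLE);HARD;"
)
# Matches host notifications; the first field is the contact name.
NOTIF_RE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d+\s[\d:]{8}) \S+ nagios: HOST NOTIFICATION: "
    r"[^;]+;(?P<host>[^;]+);(?P<state>[^;]+);"
)


def unnotified_hard_alerts(lines):
    """Return HARD host alerts with no notification at the same timestamp.

    Assumption: a notification is logged in the same second as its alert.
    A notification that is delayed (sent hours later) is reported as missing.
    """
    alerts = []
    notified = set()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            alerts.append(Event(match["ts"], match["host"], match["state"]))
            continue
        match = NOTIF_RE.match(line)
        if match:
            notified.add(Event(match["ts"], match["host"], match["state"]))
    return [alert for alert in alerts if alert not in notified]
```

Calling `unnotified_hard_alerts(open("/var/log/messages"))` would return the alerts to investigate. Comparing their timestamps against the host's `notification_period` is a useful next step.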
I'll think about doing a fresh install.

Thanks.
Stuart Watts
Posts: 40
Joined: Wed Sep 25, 2013 7:01 am

Re: Host notifications not always sent

Post by Stuart Watts »

tapp wrote: Nagios detected the host down at 18:24, but didn't send the notification until 09:00:
The host has "notification_period" set to "workhours" which by default is 0900-1700 Monday to Friday. This might explain what you're referring to here.
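
To check this against the log timestamps posted earlier, here is a minimal Python sketch. It assumes the default "workhours" window of 09:00-17:00, Monday to Friday, treats 17:00 itself as outside the window (an assumption), and assumes the year 2014, since syslog timestamps omit the year. The helper names are hypothetical. Parsing the month name with `%b` also assumes an English locale.

```python
from datetime import datetime, time

# Assumed default "workhours" window: Monday-Friday, 09:00 up to 17:00.
WORKHOURS_START = time(9, 0)
WORKHOURS_END = time(17, 0)


def parse_syslog_time(stamp, year=2014):
    """Parse a syslog timestamp such as 'May 21 16:59:37' (year is assumed)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def in_workhours(moment):
    """Return True if moment falls inside the assumed workhours window."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WORKHOURS_START <= moment.time() < WORKHOURS_END


# Timestamps of HARD state changes and notifications taken from this thread.
for stamp in ("May 21 16:59:37", "May 21 17:35:38",
              "May 21 18:27:35", "May 22 09:00:14"):
    print(stamp, in_workhours(parse_syslog_time(stamp)))
```

Under these assumptions, the 16:59:37 DOWN event (notified) is inside the window, while the 17:35:38 UP and 18:27:35 DOWN events (not notified at the time) are outside it. The delayed DOWN notification at 09:00:14 the next morning coincides with the window opening again.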
tapp
Posts: 5
Joined: Mon Feb 03, 2014 1:44 pm

Re: Host notifications not always sent

Post by tapp »

Hi Stuart,

That's a good point! I think you've found it. I've set it to 24x7, and I'll post the results in 2-3 days.

Thanks!

Regards.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Host notifications not always sent

Post by slansing »

Excellent! Timeperiods are one of those things you need to be careful with, as they are very unforgiving if they are set incorrectly. :)
tapp
Posts: 5
Joined: Mon Feb 03, 2014 1:44 pm

Re: [SOLVED] - Host notifications not always sent

Post by tapp »

Hi Stuart,

This seems fixed now. After two days, all the notifications have been sent.

It seems easy when you know how... ;-)

Thanks for your help.


Regards.
Stuart Watts
Posts: 40
Joined: Wed Sep 25, 2013 7:01 am

Re: [SOLVED] - Host notifications not always sent

Post by Stuart Watts »

Good news!

Have a read of the documentation on notifications. This should give you an idea of the steps Nagios takes in deciding whether to send an alert or not.