
Warnings causing service failure

Posted: Wed Jan 29, 2014 10:48 am
by MikeM-2468
With 4.0.2 I'm finding that when a config file has anything in it that causes a warning, the service fails to restart. With 3.x, the warnings were displayed, but the services still started. Only errors caused the service to fail to start. I know that the real fix is to resolve the warnings, but is it supposed to work like this?

Re: Warnings causing service failure

Posted: Wed Jan 29, 2014 11:13 am
by abrist
Can you give us an example?

Re: Warnings causing service failure

Posted: Wed Jan 29, 2014 11:16 am
by MikeM-2468
The following warning causes the service not to start:

Code:

Warning: Service 'Certificate' on host 'web.domain.com'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.

Re: Warnings causing service failure

Posted: Wed Jan 29, 2014 2:30 pm
by lmiltchev
Do you have any config errors (or just warnings)?

Code:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Re: Warnings causing service failure

Posted: Wed Jan 29, 2014 2:36 pm
by MikeM-2468
Just that one warning. It relates to having the normal_check_interval specified in the file for that host. If I remove that, the warning goes away and the service starts.

Re: Warnings causing service failure

Posted: Thu Jan 30, 2014 12:54 pm
by lmiltchev
Can you post the service definition (hide the sensitive info)?

Re: Warnings causing service failure

Posted: Thu Jan 30, 2014 1:07 pm
by MikeM-2468

Code:

define service{
		use							generic-service
		host_name					web.domain.com
		normal_check_interval		43200		; Check the service every month under normal conditions
		service_description			Certificate
		check_command				check_http!-H web.domain.com -S -C 90
		}

Re: Warnings causing service failure

Posted: Thu Jan 30, 2014 1:17 pm
by tmcdonald
There's a pretty good explanation of the issue here.

From the article:
The reason this happens is that Nagios cannot send a notification based on "old" data. The notification reflects the current status, which comes from the latest check. If your check interval is smaller than the notification interval, Nagios can run more checks and then send the alert based on the latest data.

But if it is the other way around, the alert sent might be based on out-of-date data, so there is a strong possibility of a "false positive". Nagios is reminding you of that possibility, and it also refrains from sending messages if it does not have a more current check.
Basically, having your notification interval lower than your check interval can cause issues, and you will need to adjust the notification interval up accordingly.
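
As a rough sketch of that adjustment, the service definition posted above could override notification_interval so it is no smaller than the check interval. This assumes the generic-service template (which was not posted) sets a notification_interval below 43200; the 43200 used for notification_interval here is only an illustrative value, not something taken from the thread.

Code:

define service{
        use                         generic-service
        host_name                   web.domain.com
        service_description         Certificate
        check_command               check_http!-H web.domain.com -S -C 90
        normal_check_interval       43200       ; Check the service every month under normal conditions
        notification_interval       43200       ; Assumed value: not smaller than the check interval
        }

After a change like this, re-running the verification command from earlier in the thread should show whether the warning is still reported.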