Service Escalation Notification on Critical Only

Posted: Tue Jan 07, 2014 11:58 am
by Brick
Running Nagios 3.2.1

On my system, the service escalation options are set to unknown and critical, and the first notification is set to 3.

The problem is that when a service is in a warning state for two notifications and then switches to critical for one notification, this counts as three and the escalation alerts.

I would like Nagios to only alert when it has received three critical notifications.
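
The counting behaviour described above can be sketched with a toy model in C. This is only an illustration: the names `toy_state` and `toy_first_escalation` are hypothetical and not part of Nagios, and each array element is assumed to stand for the service state at one notification.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical and are not Nagios core code. */
enum toy_state { TOY_OK, TOY_WARNING, TOY_CRITICAL, TOY_UNKNOWN };

/*
 * Each array element stands for the service state at one notification.
 * Returns the 1-based position at which a critical/unknown escalation with
 * the given first_notification would fire, or 0 if it never fires.
 *
 * critical_only == 0: every non-OK notification advances the counter
 *                     (the behaviour described in the post).
 * critical_only != 0: a WARNING resets the counter
 *                     (the behaviour being asked for).
 */
int toy_first_escalation(const enum toy_state *states, size_t n,
                         int first_notification, int critical_only)
{
    int counter = 0;
    size_t i;

    assert(states != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (states[i] == TOY_OK ||
            (critical_only && states[i] == TOY_WARNING)) {
            counter = 0;    /* recovery, or a warning in critical-only mode */
            continue;
        }
        counter++;
        /* the escalation only applies to critical/unknown states */
        if (states[i] != TOY_WARNING && counter >= first_notification)
            return (int)(i + 1);
    }
    return 0;
}
```

With the sequence warning, warning, critical and a first notification of 3, the shared counter fires on the third entry, while the critical-only variant does not fire until three critical entries have been seen in a row.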

I found an old workaround here: http://tracker.nagios.org/view.php?id=163 but it appears to require a reinstall of Nagios to make the change...

Has there been a fix for this issue since then? Or is there an easier way of achieving this?

Re: Service Escalation Notification on Critical Only

Posted: Tue Jan 07, 2014 1:00 pm
by sreinhardt
You are correct that this would require recompiling core, but not a true reinstall, as everything else would stay the same. In fact, I don't think you would even need to recompile the CGIs, just the core engine binary itself. Unfortunately, I don't really see this being implemented in core: there are people who want warnings and escalations for them, and having a warning->critical state change reset the count would cause undue latency between when an issue occurs and when contacts are notified. Certainly feel free to patch this in if it is something you are looking for, but short of adding a config option to enable this behavior (with a default of not resetting), I highly doubt this would be added to core. :(

Re: Service Escalation Notification on Critical Only

Posted: Mon Feb 17, 2014 6:53 am
by Brick
I've now tested this out, but my code doesn't appear to be working. Can anyone shed any light on this?

This is the section of checks.c I am working on:

The bit I have added is the last three lines, just inside the closing bracket of the main 'if' statement.

Code:

	/* a state change occurred... */
	/* reset last and next notification times and acknowledgement flag if necessary, misc other stuff */
	if(state_change == TRUE || hard_state_change == TRUE) {

		/* reschedule the service check */
		reschedule_check = TRUE;

		/* reset notification times */
		temp_service->last_notification = (time_t)0;
		temp_service->next_notification = (time_t)0;

		/* reset notification suppression option */
		temp_service->no_more_notifications = FALSE;

		if(temp_service->acknowledgement_type == ACKNOWLEDGEMENT_NORMAL && (state_change == TRUE || hard_state_change == FALSE)) {

			temp_service->problem_has_been_acknowledged = FALSE;
			temp_service->acknowledgement_type = ACKNOWLEDGEMENT_NONE;

			/* remove any non-persistant comments associated with the ack */
			delete_service_acknowledgement_comments(temp_service);
			}
		else if(temp_service->acknowledgement_type == ACKNOWLEDGEMENT_STICKY && temp_service->current_state == STATE_OK) {

			temp_service->problem_has_been_acknowledged = FALSE;
			temp_service->acknowledgement_type = ACKNOWLEDGEMENT_NONE;

			/* remove any non-persistant comments associated with the ack */
			delete_service_acknowledgement_comments(temp_service);
			}

		/* do NOT reset current notification number!!! */
		/* hard changes between non-OK states should continue to be escalated, so don't reset current notification number */
		/*temp_service->current_notification_number=0;*/
		
		if(temp_service->current_state == STATE_WARNING) {

			temp_service->current_notification_number=0;
			}
		
		}
I've also tried setting it to "temp_service->current_notification_number=-1;" but this hasn't worked either; it still counts warnings as notifications.


Incidentally, I was wondering if I would be better placed to alter the very start of the if statement to simply exclude warning state changes... Would this be a better way of doing this? Or is there another, even better way?

Re: Service Escalation Notification on Critical Only

Posted: Mon Feb 17, 2014 5:50 pm
by sreinhardt
Could you clarify which lines you added/modified? It's pretty hard to see which ones you mean. Thanks!

Re: Service Escalation Notification on Critical Only

Posted: Tue Feb 18, 2014 4:38 am
by Brick
Apologies, this is the bit I added:

Code:

	if(temp_service->current_state == STATE_WARNING) {
      
      temp_service->current_notification_number=0;
      }

Re: Service Escalation Notification on Critical Only

Posted: Tue Feb 18, 2014 11:39 am
by sreinhardt
I am looking at the Core 4 source, not 3.5, so we might have a few line differences. Looking further down the code, you might have to change a few other places to check for warning as well.

Line 512

Code:

/* increment the current attempt number if this is a soft state (service was rechecked) */
if(temp_service->state_type == SOFT_STATE && (temp_service->current_attempt < temp_service->max_attempts)) {
	/* suggested addition: do not increase the check attempt count for warning statuses */
	if(temp_service->current_state != STATE_WARNING)
		temp_service->current_attempt = temp_service->current_attempt + 1;
	}
This change might cause warning states to go into a continual state of immediate retry checks though, as they would be in a perpetual soft state with 0 attempts made. I would say definitely test this out before implementing in prod.
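
The concern about a perpetual soft state can be illustrated with a small toy simulation in C. It is a sketch under simplifying assumptions, not Nagios code: `sim_checks_until_hard` and its parameters are hypothetical, the attempt counter is assumed to start at 1, and a state is treated as hard once the counter reaches `max_attempts`.

```c
#include <assert.h>

/*
 * Hypothetical toy simulation, not Nagios core code.
 * Returns how many checks it takes for the attempt counter to reach
 * max_attempts (treated here as "hard state"), or -1 if that does not
 * happen within `cap` checks.
 * skip_increment != 0 models never increasing the attempt count,
 * as the suggested change would do for warning statuses.
 */
int sim_checks_until_hard(int max_attempts, int skip_increment, int cap)
{
    int current_attempt = 1;    /* assumption: counting starts at 1 */
    int checks;

    assert(max_attempts >= 1 && cap >= 1);

    for (checks = 1; checks <= cap; checks++) {
        if (current_attempt >= max_attempts)
            return checks;      /* would become a hard state here */
        if (!skip_increment)
            current_attempt++;  /* normal soft-state retry */
    }
    return -1;                  /* still soft after `cap` checks */
}
```

In this model, a service with `max_attempts` of 3 goes hard on the third check when the counter is incremented, but never does when the increment is skipped, which matches the retry loop warned about above.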

Re: Service Escalation Notification on Critical Only

Posted: Thu Feb 20, 2014 5:19 am
by Brick
Hmmm, it doesn't appear to be working as planned...

I think it's because the if statement it is in is actually based on state change, so it's only executed when the state changes, not when it notifies... So when the state changes from warning to critical, the 'set to 0' command doesn't execute because the service is no longer in a warning state... But I've a few ideas I still need to try :-)

One quick question that will be a real help to me though: every time I make a change to this I basically do a complete reinstall of the whole Nagios system, i.e. remove the sysconfdir and the localstatedir and do a full configure/make all/make install/make install-init/make install-config/make install-commandmode/make install-webconf!

Now I'm assuming this is complete overkill, but I don't want to risk missing the change actually being made! Can anyone tell me the minimum I need to do to get changes to the checks.c file into the live code?
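
The reasoning above can be sketched as a toy model in C. Everything here is hypothetical (`demo_service` and both functions are illustrative, not the Nagios structures), and it assumes that a field holding the previous state is available when the state-change block runs; whether that holds at that point in the real core is untested.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the Nagios core structures. */
enum demo_state { DEMO_OK, DEMO_WARNING, DEMO_CRITICAL, DEMO_UNKNOWN };

struct demo_service {
    enum demo_state last_state;     /* state before the change (assumed available) */
    enum demo_state current_state;  /* state after the change */
    int current_notification_number;
};

/* The attempted patch: reset only if the NEW state is WARNING. */
void demo_reset_if_now_warning(struct demo_service *svc)
{
    assert(svc != NULL);
    if (svc->current_state == DEMO_WARNING)
        svc->current_notification_number = 0;
}

/* One possible alternative: reset when the service is LEAVING WARNING. */
void demo_reset_if_was_warning(struct demo_service *svc)
{
    assert(svc != NULL);
    if (svc->last_state == DEMO_WARNING &&
        svc->current_state != DEMO_WARNING)
        svc->current_notification_number = 0;
}
```

For a warning to critical change with two notifications already counted, the first function leaves the counter at 2 because the current state is already critical, while the second resets it to 0.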

Thanks!

Re: Service Escalation Notification on Critical Only

Posted: Thu Feb 20, 2014 3:11 pm
by sreinhardt
Bare minimum would be the following, on a system that is already installed and had ./configure run on it:

make clean
make all; make install

The rest are not really needed in this case, but here is what they do:

make install-init - installs init scripts
make install-config - installs basic configs, probably don't want to do this on an existing system
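make install-commandmode - installs and sets permissions on the directory holding the external command file (nagios.cmd), which would already be in place with the right permissions
make install-webconf - installs the Apache config file for the web interface, which you are not modifying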

Hope that helps!