Service Escalation Notification on Critical Only

Brick
Posts: 26
Joined: Thu Aug 29, 2013 6:02 am

Service Escalation Notification on Critical Only

Post by Brick »

Running Nagios 3.2.1

In my system my service escalation options are set for unknown or critical and the first notification is set to 3

The problem is that when a service is in a warning state for two notifications and then switches to critical for one notification, this counts as three and the escalation alerts.

I would like Nagios to only alert when it has received three critical notifications.

I found an old workaround at http://tracker.nagios.org/view.php?id=163 but it appears to require a reinstall of Nagios to make the change.

Has there been a fix for this issue since, or is there an easier way of achieving this?
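
To make the counting problem concrete, here is a minimal sketch in C. It is a toy model, not Nagios code: the function first_escalated_cycle and its parameters are made up for illustration, and it assumes one notification per cycle, a counter that is only reset on recovery to OK, and an escalation restricted to critical/unknown.

```c
#include <assert.h>
#include <stddef.h>

/* State values as used by Nagios plugins (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). */
#define STATE_OK       0
#define STATE_WARNING  1
#define STATE_CRITICAL 2
#define STATE_UNKNOWN  3

/*
 * Toy model (an assumption, not Nagios code): each entry in `states` is one
 * notification cycle. Returns the 1-based cycle on which an escalation limited
 * to CRITICAL/UNKNOWN with the given first_notification would first match,
 * or 0 if it never matches.
 * If reset_on_warning is non-zero, the counter is held at 0 while in WARNING,
 * which is the behaviour being asked for in this post.
 */
int first_escalated_cycle(const int *states, size_t n,
                          int first_notification, int reset_on_warning)
{
	int notification_number = 0;
	size_t i;

	assert(states != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (states[i] == STATE_OK) {
			notification_number = 0;   /* recovery resets the counter */
			continue;
		}
		if (reset_on_warning && states[i] == STATE_WARNING) {
			notification_number = 0;   /* warnings do not count */
			continue;
		}
		notification_number++;
		if ((states[i] == STATE_CRITICAL || states[i] == STATE_UNKNOWN)
		    && notification_number >= first_notification)
			return (int)(i + 1);
	}
	return 0;
}
```

With the sequence warning, warning, critical, critical, critical and first_notification set to 3, the model matches on cycle 3 (the first critical) when warnings count, and on cycle 5 (the third critical) when they do not.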
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Service Escalation Notification on Critical Only

Post by sreinhardt »

This would require recompiling Core, you are correct, but not a true reinstall, as everything else would stay the same. In fact, I don't think you would even need to recompile the CGIs, just the core engine binary itself. Unfortunately I don't really see this being implemented in Core: there are people who want warnings and escalations for them, and resetting the notification count on a warning->critical state change would add undue latency between when an issue occurs and when contacts are notified. Certainly feel free to patch this in if it is something you are looking for, but short of adding a config option that enables this behavior and defaults to not resetting, I highly doubt this would be added to Core. :(
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.

Re: Service Escalation Notification on Critical Only

Post by Brick »

I've now tested this out but my code doesn't appear to be working. Can anyone shed any light on this?

This is the section of checks.c I am working on.

The bit I have added is the last three lines, just inside the closing bracket of the main 'if' statement:

Code:

	/* a state change occurred... */
	/* reset last and next notification times and acknowledgement flag if necessary, misc other stuff */
	if(state_change == TRUE || hard_state_change == TRUE) {

		/* reschedule the service check */
		reschedule_check = TRUE;

		/* reset notification times */
		temp_service->last_notification = (time_t)0;
		temp_service->next_notification = (time_t)0;

		/* reset notification suppression option */
		temp_service->no_more_notifications = FALSE;

		if(temp_service->acknowledgement_type == ACKNOWLEDGEMENT_NORMAL && (state_change == TRUE || hard_state_change == FALSE)) {

			temp_service->problem_has_been_acknowledged = FALSE;
			temp_service->acknowledgement_type = ACKNOWLEDGEMENT_NONE;

			/* remove any non-persistant comments associated with the ack */
			delete_service_acknowledgement_comments(temp_service);
			}
		else if(temp_service->acknowledgement_type == ACKNOWLEDGEMENT_STICKY && temp_service->current_state == STATE_OK) {

			temp_service->problem_has_been_acknowledged = FALSE;
			temp_service->acknowledgement_type = ACKNOWLEDGEMENT_NONE;

			/* remove any non-persistant comments associated with the ack */
			delete_service_acknowledgement_comments(temp_service);
			}

		/* do NOT reset current notification number!!! */
		/* hard changes between non-OK states should continue to be escalated, so don't reset current notification number */
		/*temp_service->current_notification_number=0;*/
		
		if(temp_service->current_state == STATE_WARNING) {
			temp_service->current_notification_number=0;
			}
		
		}

I've also tried setting it to "temp_service->current_notification_number=-1;" but this hasn't worked either; it still counts warnings as notifications.

Incidentally, I was wondering whether I would be better off altering the very start of the if statement to simply exclude warning state changes. Would this be a better way of doing this? Or is there another, even better way?

Re: Service Escalation Notification on Critical Only

Post by sreinhardt »

Could you clarify which lines you added/modified? It's pretty hard to see which ones you mean. Thanks!

Re: Service Escalation Notification on Critical Only

Post by Brick »

Apologies, this is the bit I added:

Code:

	if(temp_service->current_state == STATE_WARNING) {
		temp_service->current_notification_number=0;
		}

Re: Service Escalation Notification on Critical Only

Post by sreinhardt »

I am looking at the Core 4 source, not 3.5, so we might have a few line differences. Looking further down the code, you might have to change a few other places to check for warning as well.

Line 512

Code:

/* increment the current attempt number if this is a soft state (service was rechecked) */
if(temp_service->state_type == SOFT_STATE && (temp_service->current_attempt < temp_service->max_attempts)) {
	/* suggested addition: do not increase check attempts for warning statuses */
	if(temp_service->current_state != STATE_WARNING)
		temp_service->current_attempt = temp_service->current_attempt + 1;
	}
This change might cause warning states to go into a continual state of immediate retry checks though, as they would be in a perpetual soft state with 0 attempts made. I would say definitely test this out before implementing in prod.
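
As a rough illustration of that concern, here is a toy C model of the attempt counter. It is only a sketch under assumptions: the struct toy_service, both function names, and the choice to start at 0 attempts are made up for this example and are not taken from the Nagios source.

```c
#include <assert.h>

#define STATE_WARNING  1
#define STATE_CRITICAL 2
#define SOFT_STATE 0
#define HARD_STATE 1

/* Hypothetical stand-in for the few service fields involved; not the Nagios struct. */
struct toy_service {
	int current_state;
	int state_type;
	int current_attempt;
	int max_attempts;
};

/* Apply one non-OK check result to the toy service. */
static void toy_handle_result(struct toy_service *svc, int new_state, int skip_warning)
{
	svc->current_state = new_state;
	if (svc->state_type == SOFT_STATE && svc->current_attempt < svc->max_attempts) {
		/* with skip_warning set, WARNING results never advance the counter */
		if (!skip_warning || svc->current_state != STATE_WARNING)
			svc->current_attempt++;
	}
	/* the toy service goes HARD once max_attempts is reached */
	if (svc->current_attempt >= svc->max_attempts)
		svc->state_type = HARD_STATE;
}

/* Returns how many identical results it takes to reach HARD, or -1 if `limit` is hit first. */
int toy_checks_until_hard(int state, int max_attempts, int skip_warning, int limit)
{
	struct toy_service svc = { 0, SOFT_STATE, 0, 0 };
	int checks;

	assert(max_attempts > 0 && limit > 0);
	svc.max_attempts = max_attempts;

	for (checks = 1; checks <= limit; checks++) {
		toy_handle_result(&svc, state, skip_warning);
		if (svc.state_type == HARD_STATE)
			return checks;
	}
	return -1;
}
```

In this model a critical service with max_attempts of 3 goes hard after three checks either way, while a warning service with the skip enabled never leaves the soft state, however many checks are run.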

Re: Service Escalation Notification on Critical Only

Post by Brick »

Hmmm, it doesn't appear to be working as planned...

I think it's because the if statement it is in is based on state change, so it is only executed when the state changes, not when a notification is sent. So when the state changes from warning to critical, the 'set to 0' line doesn't execute because the service is no longer in a warning state. But I've a few ideas I still need to try :-)
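
One of those ideas, sketched in C: test the state the service is leaving rather than the state it is in now. This is only a sketch. The struct sketch_service and both function names are hypothetical, and it assumes the previous state is still available (here as last_state) at the point where the state change is handled, which I have not confirmed for the real code.

```c
#include <assert.h>
#include <stddef.h>

#define STATE_WARNING  1
#define STATE_CRITICAL 2

/* Hypothetical stand-in for the service fields involved; not the Nagios struct. */
struct sketch_service {
	int last_state;                  /* state before this check result (assumed available) */
	int current_state;               /* state after this check result */
	int current_notification_number;
};

/* The attempt from my earlier post: tests the state the service is in now. */
void reset_if_currently_warning(struct sketch_service *svc)
{
	assert(svc != NULL);
	if (svc->current_state == STATE_WARNING)
		svc->current_notification_number = 0;
}

/* Alternative sketch: tests the state the service is leaving. */
void reset_if_leaving_warning(struct sketch_service *svc)
{
	assert(svc != NULL);
	if (svc->last_state == STATE_WARNING && svc->current_state != STATE_WARNING)
		svc->current_notification_number = 0;
}
```

For a service going from warning to critical with the counter at 2, the first function leaves the counter at 2 and the second resets it to 0, so the critical notifications would start counting from scratch.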

One quick question that will be a real help to me though: every time I make a change to this I basically do a complete reinstall of the whole Nagios system, i.e. remove the sysconfdir and the localstatedir and do a full configure/make all/make install/make install-init/make install-config/make install-commandmode/make install-webconf!

Now I'm assuming this is complete overkill, but I don't want to risk missing the change actually being made! Can anyone tell me the minimum I need to do to get changes to the checks.c file into the live code?

Thanks!

Re: Service Escalation Notification on Critical Only

Post by sreinhardt »

Bare minimum would be the following, on a system that is already installed and had ./configure run on it:

make clean
make all; make install

The rest are not really needed in this case, but for reference they do the following:

make install-init - installs the init script
make install-config - installs sample configs, probably don't want to do this on an existing system
make install-commandmode - sets up the directory and permissions for the external command file (nagios.cmd), which would already be in place
make install-webconf - installs the Apache config file for the web interface, which you are not modifying (the CGIs themselves are installed by make install)

Hope that helps!