Service checks notifications switching back to enabled

Posted: Sat Jun 13, 2015 10:52 pm
by krobertson71
Red Hat EL6, Nagios XI 2.7

We have been running into a weird issue lately. We have notifications disabled on all of our NCPA swap partition checks. We collect the data; we just do not alert on it. For several weeks now, off and on, we have noticed that notifications for all the swap checks are enabled again, usually because an angry admin says they got alerted. I won't get into the history of that decision.

At first I thought some of our Nagios admins were just disabling entire hostgroups instead of using scheduled downtime for patching nights. But I checked this myself last night, and scheduling and unscheduling downtime does not re-enable notifications on the service checks (which is good).

So I have combed through the many logs, subsystem logs, audits, etc., and can find nothing showing when this is happening.

Is there a known bug with service notifications for one reason or another that could be doing this?

If not, can someone point me in the right direction to find out how this is occurring?

Re: Service checks notifications switching back to enabled

Posted: Sun Jun 14, 2015 11:32 pm
by Box293
Can you download the History tab component:

https://exchange.nagios.org/directory/A ... ab/details

Upload it via Admin > System Extensions > Manage Components

Now go to one of these services and click the History tab.
Is there any information displayed here that may show if Notifications were enabled?

Re: Service checks notifications switching back to enabled

Posted: Mon Jun 15, 2015 10:52 am
by krobertson71
First, thanks for the History tab! Some good information there.

Problem:

If I enable and disable notifications on a single service check it will register it on the history tab.

I had to mass disable about 80 service checks yesterday and those are not showing up in the respective history tabs.

I did this by going into the service group information screen and selecting "disable all services in the service group".

I have checked several of those and nothing in the history tab shows this event occurring.

There was a scheduled downtime earlier that day for patching, so everything was put into downtime. I am wondering whether removing the downtime somehow enabled notifications for services that had them disabled before the downtime.

I have gone through all the logs I can think of. I tested this manually last night. I put one host's services into a scheduled downtime, with one of the services having its notifications disabled beforehand.

I then removed that downtime via Mass Acknowledgement. Notifications for that service check remained disabled. This is the same process we follow when scheduling and removing downtime.

Any other ideas would be a great help here!

Re: Service checks notifications switching back to enabled

Posted: Mon Jun 15, 2015 11:53 am
by lmiltchev
krobertson71 wrote:Problem:

If I enable and disable notifications on a single service check it will register it on the history tab.

I had to mass disable about 80 service checks yesterday and those are not showing up in the respective history tabs.

I did this by going into the service group information screen and selecting "disable all services in the service group".

I have checked several of those and nothing in the history tab shows this event occurring.
You are correct. Disabling notifications for all services in a service group won't show up in the "history tab" component. I will talk to Troy to see if this is something that can be added easily to the component.
krobertson71 wrote:There was a scheduled downtime earlier that day for patching, so everything was put into downtime. I am wondering whether removing the downtime somehow enabled notifications for services that had them disabled before the downtime.
I was not able to recreate this issue in house. You said you went through a bunch of different logs. Have you tried grepping nagios.log for the name of the "problem" service? Anything that can give us some clues? Also, have you tried disabling notifications in the CCM (Alert Settings tab)? Does this change "stick"?
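
As a rough illustration of the nagios.log search suggested above, here is a minimal Python sketch. It assumes external commands are being logged to nagios.log in the usual `[epoch] EXTERNAL COMMAND: NAME;args` form, and that the commands of interest are the `ENABLE_*NOTIFICATIONS` / `DISABLE_*NOTIFICATIONS` family. The function names and the default path (`/usr/local/nagios/var/nagios.log`) are assumptions for illustration, not something confirmed in this thread.

```python
import re
from datetime import datetime, timezone

# Assumed log line shape, e.g.:
# [1434250320] EXTERNAL COMMAND: ENABLE_SVC_NOTIFICATIONS;host01;Swap Usage
LINE_RE = re.compile(
    r"^\[(\d+)\] EXTERNAL COMMAND: ((?:ENABLE|DISABLE)_\w*NOTIFICATIONS);?(.*)$"
)


def find_notification_commands(lines, needle=""):
    """Return (timestamp, command, args) for notification enable/disable commands.

    If `needle` is given, keep only commands whose arguments contain it
    (case-insensitive), e.g. a host, service, or group name.
    """
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        epoch, command, args = match.groups()
        if needle and needle.lower() not in args.lower():
            continue
        when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        hits.append((when, command, args.split(";") if args else []))
    return hits


def scan_log(path="/usr/local/nagios/var/nagios.log", needle=""):
    """Scan a log file on disk; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        return find_notification_commands(handle, needle)
```

Note that a group-wide command carries only the group name in its arguments, so filtering on a single service name would miss it; searching with an empty `needle` first may be the safer starting point.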

Re: Service checks notifications switching back to enabled

Posted: Mon Jun 15, 2015 1:02 pm
by krobertson71
Let me say this better.

When I go through the audit log in XI, all it shows is that a user submitted a command to the subsystem. What they submitted is not shown. I have checked nagios.log, the cmdsubsys log, etc., and I cannot see anything that shows notifications being disabled.

I have grepped the entire log directory using "grep -i disable" and "grep -i notifications" (lots of hits, though not what I was looking for).

I guess another question is: since what downtime really does is disable notifications, is it possible that ending downtime is turning all notifications back on?

We have multiple hostgroups for different applications and services. We also have a LIVE hostgroup that contains all the production hosts and services. When patching time comes around, we schedule downtime for the hostgroups and all services.

Re: Service checks notifications switching back to enabled

Posted: Mon Jun 15, 2015 1:40 pm
by krobertson71
We are also having another issue around this. I was going to open another thread, but it now seems to fit better here. A couple of our admins were trying out the /import directory to make changes to hosts, such as mass updating thresholds on an NCPA CPU check.

What they are doing is taking the service.cfg file for the hosts, doing a mass edit, and importing it again via the import directory. This concerns me, as these are the same config files that state "Do not edit this file", and they do not use the format described in your "Automation of Hosts and Services" documentation. I am wondering if this could be causing some issues with notification settings. Here is why:

The front end (GUI) shows the icon indicating that notifications are disabled:
nagiios-notificaions-ccm-2.png
The CCM shows the opposite:
nagiios-notificaions-ccm-1.png

Re: Service checks notifications switching back to enabled

Posted: Mon Jun 15, 2015 3:07 pm
by abrist
Object config state and runtime state are two separate things. You will most likely find that notifications are disabled in the retention.dat file, even though they are enabled in the CCM. On a restart, Nagios will read the object configs (generated from the CCM) first, writing that information into objects.cache and status.dat. It will then parse the retention.dat file and overwrite any settings that are different in status.dat.
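
To check what the retained runtime state actually says, something like the following Python sketch could list the services whose retained state has notifications turned off. It assumes retention.dat is made of `service { ... }` blocks containing `key=value` lines, including `host_name`, `service_description`, and `notifications_enabled`. The function names are hypothetical, and the file location (commonly `/usr/local/nagios/var/retention.dat`) is an assumption. The sketch only reads the text it is given and does not modify the file.

```python
def parse_retention_blocks(text, block_type="service"):
    """Parse '<block_type> { key=value ... }' blocks into a list of dicts."""
    blocks = []
    current = None  # dict for the block being read, or None when outside one
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line == block_type + " {":
                current = {}
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            # Split on the first '=' only; values may contain '=' themselves.
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def services_with_notifications_disabled(text):
    """Return (host_name, service_description) pairs with notifications_enabled=0."""
    return [
        (block.get("host_name"), block.get("service_description"))
        for block in parse_retention_blocks(text)
        if block.get("notifications_enabled") == "0"
    ]
```

Comparing this list against what the CCM shows for the same services would make any mismatch between object config state and retained runtime state visible.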

Re: Service checks notifications switching back to enabled

Posted: Mon Jun 15, 2015 11:44 pm
by Box293
krobertson71 wrote:Problem:

If I enable and disable notifications on a single service check it will register it on the history tab.

I had to mass disable about 80 service checks yesterday and those are not showing up in the respective history tabs.

I did this by going into the service group information screen and selecting "disable all services in the service group".

I have checked several of those and nothing in the history tab shows this event occurring.
lmiltchev wrote:You are correct. Disabling notifications for all services in a services group won't show up in the "history tab" component. I will talk to Troy to see if this is something that can be added easily to the component.
I've added this to my to-do list. It might take a while to get to, as I'm busy with some other projects.

Re: Service checks notifications switching back to enabled

Posted: Tue Jun 16, 2015 7:27 am
by CatalystX
abrist wrote:Object config state and runtime state are two separate things. You will most likely find that notifications are disabled in the retention.dat file, even though they are enabled in the CCM. On a restart, Nagios will read the object configs (generated from the CCM) first, writing that information into objects.cache and status.dat. It will then parse the retention.dat file and overwrite any settings that are different in status.dat.
Umm ... Can you clarify that? Because if configs are correct and CCM is correct, then objects.cache and status.dat should take precedence, shouldn't they?

Re: Service checks notifications switching back to enabled

Posted: Tue Jun 16, 2015 8:51 am
by krobertson71
So if retention.dat is off, what is the proper procedure to make sure it is set the way we want, given that the top of the file says not to modify it?

Can we delete this file, then make changes to the services for which we want notifications disabled, and have it regenerate?