Lost Persistent Comments\Acknowledged Service Issues

Post by yavin » Wed Dec 05, 2012 3:21 pm

I have started having a strange problem with our Nagios 3.3 installation. When restarting Nagios (via a service stop/start or a restart), I occasionally lose all of my persistent acknowledgement history and comments. It does not happen every time, but I would guess that in the last 15 restarts I have lost them about 3 times. Not only do I lose the historical acknowledgement data, which is painful enough, but all of the known service issues (of which there are LOTS) then have to be re-acknowledged to clean things up so that real issues can be identified. I can see comments being written to Nagios's status.dat file, but I am having a hard time finding documentation on how they are preserved across restarts, which would let me pinpoint where the process is breaking down. Any suggestions at all would be appreciated, thanks!
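For reference: in Nagios Core 3.x, status.dat is a snapshot that the running daemon keeps rewriting, while comments and acknowledgements are carried across a restart by the state retention file, controlled by the `retain_state_information`, `state_retention_file` and `retention_update_interval` directives in nagios.cfg. The shell sketch below simply prints those settings so you can see where the retained data is supposed to go. `cfg_value` and `report_retention` are hypothetical helper names, and the parsing is deliberately simple (plain `key=value` lines, comment lines skipped).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nagios): print the value of a directive
# from nagios.cfg. Comment lines are skipped; if a directive appears more
# than once, the last occurrence is printed. An unset directive prints "".
cfg_value() {
    local cfg="$1" key="$2"
    awk -F= -v k="$key" '
        /^[[:space:]]*#/ { next }               # skip comment lines
        $1 == k { sub(/^[^=]*=/, ""); v = $0 }  # keep everything after the first "="
        END { print v }
    ' "$cfg"
}

# Print the nagios.cfg directives that decide whether comments and
# acknowledgements are retained between program restarts.
report_retention() {
    local cfg="$1" key
    for key in retain_state_information state_retention_file retention_update_interval; do
        printf '%s=%s\n' "$key" "$(cfg_value "$cfg" "$key")"
    done
}
```

Usage, assuming the config file lives at /etc/nagios/nagios.cfg (adjust to wherever your nagios.cfg actually is): `report_retention /etc/nagios/nagios.cfg`. If `retain_state_information` is not 1, or `state_retention_file` points somewhere unexpected, that is the first place to look.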

Re: Lost Persistent Comments\Acknowledged Service Issues

Post by slansing » Wed Dec 05, 2012 4:59 pm

Just to be clear, this is happening when you run:

Code:
service nagios stop

service nagios start


and you are NOT shutting the whole server down? This can happen if the entire system is shut down unexpectedly.

What are you changing beforehand that requires these Nagios restarts?
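One way to narrow down which step of the stop/start loses the data is to count what the status and retention files hold at each step. A minimal sketch, assuming the Nagios 3.x file layout in which comments are written as `hostcomment {` / `servicecomment {` blocks and acknowledgements as `problem_has_been_acknowledged=1` lines; `count_persisted` is a hypothetical helper name, not a Nagios tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nagios): count the comment blocks and
# acknowledged hosts/services recorded in a status.dat or retention.dat file.
# Assumes the Nagios 3.x layout: "hostcomment {", "servicecomment {" and
# "problem_has_been_acknowledged=1" lines (leading whitespace allowed).
count_persisted() {
    local file="$1"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    awk '
        /^[[:space:]]*hostcomment[[:space:]]*[{]/    { hc++ }
        /^[[:space:]]*servicecomment[[:space:]]*[{]/ { sc++ }
        /^[[:space:]]*problem_has_been_acknowledged=1[[:space:]]*$/ { ack++ }
        END { printf "hostcomments=%d servicecomments=%d acknowledged=%d\n", hc, sc, ack }
    ' "$file"
}
```

For example, run it on the file named by `state_retention_file` just before `service nagios stop` and again right after the stop (Nagios normally writes its retention data as it shuts down), then run it on status.dat once Nagios is back up. Roughly: if the retention file is already empty after the stop, the data is being lost on shutdown; if the retention file looks complete but status.dat is empty after the start, the problem is on the loading side.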

Re: Lost Persistent Comments\Acknowledged Service Issues

Post by yavin » Thu Dec 06, 2012 2:34 am

I have experienced the issue both when running 'service nagios stop' and 'service nagios start' individually and when running 'service nagios restart' (which appears to just automate the stop and start). Our Nagios configuration files are built from a remote Cacti database, and a restart has historically been used to push the changes from Cacti into Nagios. While looking more closely at the start and stop scripts tonight, I did find two misconfigured variables in our /etc/rc.d/init.d/nagios script (btw, I have recently inherited responsibility for this system, so I am unsure how or when this came to be):

NagiosStatusFile=/var/log/nagios/status.dat
NagiosCommandFile=/var/log/nagios/rw/nagios.cmd

These two entries were pointing to invalid targets, so I have just corrected them to match the locations listed in the nagios.cfg file. All the other entries appear to be correct. From reading the Nagios documentation and the startup scripts, it appears both of these files should be deleted each time Nagios is stopped. So my question is: if they were not being deleted, what effect would that have on a subsequent start? Could leftover copies of these files intermittently confuse Nagios when it comes back up? I can test whether this resolves the problem, but I would rather avoid as much trial and error as possible, since the re-acknowledgement process is so time-consuming. :-)
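A quick way to catch this kind of drift after future edits is to compare the two paths hard-coded in the init script with the `status_file` and `command_file` directives in nagios.cfg. The sketch below is a hypothetical helper, not part of the Nagios packaging. It reads both files as plain `KEY=VALUE` text (the init script is never sourced or executed), so values built from shell variables such as `${prefix}`, or wrapped in quotes, will show up as mismatches and need checking by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value assigned to KEY in a simple KEY=VALUE
# file. Works for nagios.cfg directives and for unindented shell assignments
# such as NagiosStatusFile=... in an init script. Quotes and variable
# expansion are NOT handled.
kv_value() {
    local file="$1" key="$2"
    awk -F= -v k="$key" '
        /^[[:space:]]*#/ { next }               # skip comment lines
        $1 == k { sub(/^[^=]*=/, ""); v = $0 }  # last assignment wins
        END { print v }
    ' "$file"
}

# Compare the init script's paths with the ones nagios.cfg uses.
# Prints one line per pair and returns non-zero if any pair differs.
check_init_paths() {
    local init="$1" cfg="$2" rc=0
    local pair init_key cfg_key a b
    for pair in NagiosStatusFile:status_file NagiosCommandFile:command_file; do
        init_key="${pair%%:*}"
        cfg_key="${pair##*:}"
        a="$(kv_value "$init" "$init_key")"
        b="$(kv_value "$cfg" "$cfg_key")"
        if [ "$a" = "$b" ]; then
            printf 'OK: %s=%s\n' "$init_key" "$a"
        else
            printf 'MISMATCH: %s=%s, but %s=%s\n' "$init_key" "$a" "$cfg_key" "$b"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage, with the init script path from this thread and an assumed config location: `check_init_paths /etc/rc.d/init.d/nagios /etc/nagios/nagios.cfg`. A non-zero exit status makes it easy to run before each Cacti-driven restart.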