Major problem with CCM changes not being committed

cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Major problem with CCM changes not being committed

Post by cwscribner »

Hi all.

I'm having a severe problem with CCM changes not being committed to the config files. On the morning of 9/15/2011, I had 2300 devices showing in Nagios. After getting very irritated at seeing dozens and dozens of pages of CCM devices marked "Sync Missed", I decided to do some troubleshooting and poke around in the documentation. I ran through the troubleshooting suggestions in the FAQ, including restarting Nagios and running reconfigure_nagios.sh. As of this morning (9/16/2011), the device count is 3140. That's 840 devices committed in 24 hours! Yet the time the config was last changed, according to Nagios, is 9/1/2011, two weeks ago. "Apply Configuration" in the CCM is executed several times a day, and it seemingly does nothing to apply the new devices. I had also restarted Nagios several times before yesterday, and none of the changes were committed. It would be awesome if someone could explain why this is happening and how I can fix it.


P.S. (Oncoming gripe) I noticed in the FAQ, there's a solution to making mass edits by using MySQL directly. That would've been helpful information when I posted a few weeks ago looking for a mass edit solution.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Major problem with CCM changes not being committed

Post by mguthrie »

Does your installation utilize a proxy?

What version of XI are you using?

Can you post the output from:

Code:

cd /usr/local/nagiosxi/scripts
./reconfigure_nagios.sh &> output.txt
Then, when you run the wizard, run this right before completing it:

Code:

tail -f /usr/local/nagiosxi/var/cmdsubsys.log | tee cmdoutput.txt
Then click Apply Configuration. Once the logging output from running the wizard appears to have stopped, go ahead and post the output from both of these.
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Major problem with CCM changes not being committed

Post by cwscribner »

No proxy.

2011R1.6

I'll run the commands and post output later today. My boss is currently doing a Nagios demo for a client using the server in question.
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Major problem with CCM changes not being committed

Post by cwscribner »

Output files attached.
nscott
Posts: 1040
Joined: Wed May 11, 2011 8:54 am

Re: Major problem with CCM changes not being committed

Post by nscott »

Can you check the permissions on your /usr/local/nagios/etc/import directory? Post the results here.
Nicholas Scott
Former Nagios employee
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Major problem with CCM changes not being committed

Post by cwscribner »

nscott wrote:Can you check the permissions on your /usr/local/nagios/etc/import directory? Post the results here.

Code:

drwsrwsr-x 2 apache nagios  20480 Sep 15 14:06 import
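For anyone following along, a listing like the one above can be produced with `ls -ld`, which reports on the directory itself rather than its contents. The helper below is only a sketch: the function name is made up for illustration and is not part of Nagios XI, and the default path is simply the directory asked about in this thread. In the mode string, an `s` in the owner and group positions means the setuid and setgid bits are set along with execute permission.

```shell
# Hypothetical helper (not shipped with Nagios XI): print the mode, owner
# and group of a directory itself rather than of the files inside it.
show_dir_perms() {
    # Default to the import directory discussed in this thread
    dir=${1:-/usr/local/nagios/etc/import}
    # -d lists the directory entry instead of descending into it
    ls -ld -- "$dir"
}
```

Calling `show_dir_perms` with no argument checks the import directory; pass another path to check, for example, the hosts or services directories.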
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Major problem with CCM changes not being committed

Post by mguthrie »

It looks like there are errors in the object configuration, as well as some permissions issues:
Error: Could not find any hostgroup matching 'desktop-mgmt-systems' (config file '/usr/local/nagios/etc/hosts/wconsccmp01.cfg', starting on line 14)
Error processing object config files!


***> One or more problems was encountered while processing the config files...

Check your configuration file(s) to ensure that they contain valid
directives and data defintions. If you are upgrading from a previous
version of Nagios, you should be aware that some variables/definitions
may have been removed or modified in this version. Make sure to read
the HTML documentation regarding the config files, as well as the
'Whats New' section to find out what has changed.
RET: 254
/usr/local/nagiosxi/nom/checkpoints/nagioscore/errors /usr/local/nagiosxi/scripts
tar: Removing leading `/' from member names
/usr/local/nagiosxi/scripts
LATEST NOM SNAPSHOT: /usr/local/nagiosxi/nom/checkpoints/nagioscore/1316369823.tar.gz
/ /usr/local/nagiosxi/scripts
RESTORING NOM SNAPSHOT : /usr/local/nagiosxi/nom/checkpoints/nagioscore/1316369823.tar.gz
tar: usr/local/nagios/etc/static: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/static: Cannot change mode to rwsrwsr-x: Operation not permitted
tar: usr/local/nagios/etc/import: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/import: Cannot change mode to rwsrwsr-x: Operation not permitted
tar: usr/local/nagios/etc/hosts: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/hosts: Cannot change mode to rwsrwsr-x: Operation not permitted
tar: usr/local/nagios/etc/pnp/check_commands: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/pnp/check_commands: Cannot change mode to rwxrwxr-x: Operation not permitted
tar: usr/local/nagios/etc/pnp/pages: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/pnp/pages: Cannot change mode to rwxrwxr-x: Operation not permitted
tar: usr/local/nagios/etc/pnp: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/pnp: Cannot change mode to rwxrwxr-x: Operation not permitted
tar: usr/local/nagios/etc/services: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/services: Cannot change mode to rwsrwsr-x: Operation not permitted
tar: usr/local/nagios/etc: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc: Cannot change mode to rwxrwsr-x: Operation not permitted
tar: Error exit delayed from previous errors
/usr/local/nagiosxi/scripts
Fix #1:
You'll need to correct the error in the configuration in order for it to actually Apply. If Nagios XI detects that you're trying to apply a bad config, it will roll back the changes in the files to the last known good configuration in order to keep Nagios running.
http://support.nagios.com/wiki/index.ph ... leshooting
Let us know if you need help debugging the configuration snapshot.

Fix #2: There appear to be several permissions issues that are preventing the configuration from applying. The log files revealed that the configs generated by the wizard are either not being written to or not being read from the /usr/local/nagios/etc/import directory. We have a script in a final beta stage that fixes all Nagios-related permissions on a system. The script is attached. Can you post the ls -l output from the /usr/local/nagiosxi/scripts directory?
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Major problem with CCM changes not being committed

Post by cwscribner »

Could you give me a bit more direction on fix #2? I tried running

Code:

python fix_global_perms.py --nagioscore
But it returned...

Code:

Cannot open nagioscoreperms.fix. Make sure it is in the same directory and rerun the script.
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Re: Major problem with CCM changes not being committed

Post by cwscribner »

So I definitely need some help fixing the configuration errors. There are two different hostgroups that "cannot be found" when attempting to apply the configuration via ./reconfigure_nagios.sh. I tried removing the host configuration files, but I still get the hostgroup errors. Neither of the hostgroups shows up in hostgroups.cfg either, so I'm not sure where to go in trying to fix this.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Major problem with CCM changes not being committed

Post by mguthrie »

What you need to do is access the Configure->Configuration Snapshots page, and take a look at the most recent "bad snapshot". There is a text file that displays the error output from the verification check, and then the tarball has the bad config files in it. If you're looking for the bad configs on the actual filesystem you won't find them, since they've already been rolled back. The changes need to be made in the Core Config Manager, and then written to file.
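If you download the bad snapshot tarball and want to search it from a shell, something along these lines may help. This is a rough sketch, not an official tool: the function name is hypothetical, and it assumes the tarball is a gzip-compressed tar archive like the .tar.gz checkpoints shown in the log output earlier in the thread. It unpacks the archive into a scratch directory, so nothing under /usr/local/nagios is touched.

```shell
# Hypothetical helper (not part of Nagios XI): unpack a snapshot tarball
# into a scratch directory and search the extracted files for a string.
search_snapshot() {
    tarball=$1
    pattern=$2
    scratch=$(mktemp -d) || return 1
    # Extract under the scratch directory, never over the live config
    tar -xzf "$tarball" -C "$scratch" || { rm -rf "$scratch"; return 1; }
    # -F: fixed string, -r: recurse, -n: show line numbers
    (cd "$scratch" && grep -rnF -- "$pattern" .) && status=0 || status=$?
    rm -rf "$scratch"
    return $status
}
```

For example, `search_snapshot bad_snapshot.tar.gz desktop-mgmt-systems` (the tarball name here is a placeholder) would list every file and line in the snapshot that mentions that hostgroup.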

Another way to do this is to use the Write Config Tool to manually write the configurations to file, then verify them to find the error. Keep going back into the CCM to look for and correct the error until verification completes successfully.
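The verification step can also be run from a shell with Nagios Core's `-v` option. The wrapper below is a minimal sketch: the function name is made up, and the default binary and main config paths are assumptions based on the /usr/local/nagios layout seen in this thread, so adjust them if your install differs. It keeps only the lines beginning with "Error:" or "Warning:", the same kind of line as the hostgroup error quoted above.

```shell
# Hypothetical wrapper: run the Nagios Core pre-flight check and keep only
# the lines that start with "Error:" or "Warning:".
verify_nagios_config() {
    # Default paths are assumptions for a typical install
    nagios_bin=${1:-/usr/local/nagios/bin/nagios}
    main_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
    # -v verifies the configuration without starting the daemon.
    # The exit status is grep's: 0 if any Error/Warning line was printed.
    "$nagios_bin" -v "$main_cfg" 2>&1 | grep -E '^(Error|Warning):'
}
```

If `verify_nagios_config` prints nothing, no Error or Warning lines were found in the files currently on disk.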

Feel free to email us the bad snapshot tarball and the text file if you need help debugging.

I'll have the tech who wrote the fix_global_perms.py script reply to that part of the thread. In the meantime, I'd also recommend running:

Code:

/usr/local/nagiosxi/scripts/reset_config_perms
and see if that helps.

Also, are you using check_mk on your system at all?