Nagios stuck, won't check devices in queue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME
Contact:

Post by cwscribner »

Hi all.

My XI server has developed some additional odd behavior within the past week. I have two devices that have been pending for about a week. Also, I deleted several hundred devices yesterday through the CCM, and those changes aren't showing up either. Oddly enough, Apply Configuration is working fine, with no timeouts or anything. Any thoughts on what would cause a seemingly arbitrary stop to device processing?

P.S. It's still checking the devices that are already accounted for. It's just the new changes that haven't been assimilated yet.
Last edited by cwscribner on Mon Feb 20, 2012 4:55 pm, edited 1 time in total.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios stuck, won't check devices in queue

Post by scottwilkerson »

Can we check to see if the configuration files for these hosts/services actually got deleted from
/usr/local/nagios/etc/ (hosts or services directory)?
cwscribner

Re: Nagios stuck, won't check devices in queue

Post by cwscribner »

According to my co-worker who ran the analysis, there are ~1150 host config files for devices that are not in the database...

How in the heck can there be that much of a disconnect between the interface and actual config files?!
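
The thread does not say how the co-worker ran that analysis. One way to reproduce such a comparison is sketched below, under two assumptions that are not stated in the thread: each host has a file named `<host_name>.cfg` in the hosts directory, and the host names known to the CCM have already been exported to a plain text file, one per line. The export step is left out, and `list_orphan_cfgs` and `known_hosts.txt` are hypothetical names.

```shell
# Sketch: print host names that have a config file on disk but do not
# appear in a list of known hosts.
# Assumptions: files are named <host_name>.cfg, and the second argument is
# a text file with one known host name per line (export not shown here).
list_orphan_cfgs() {
    hosts_dir="$1"     # e.g. /usr/local/nagios/etc/hosts
    known_list="$2"    # e.g. known_hosts.txt (hypothetical)

    on_disk=$(mktemp) || return 1
    known=$(mktemp) || { rm -f "$on_disk"; return 1; }

    # Host names on disk, with the .cfg suffix stripped, sorted for comm.
    for cfg in "$hosts_dir"/*.cfg; do
        [ -e "$cfg" ] || continue
        basename "$cfg" .cfg
    done | LC_ALL=C sort -u > "$on_disk"

    LC_ALL=C sort -u "$known_list" > "$known"

    # comm -23 keeps lines that appear only in the first file.
    LC_ALL=C comm -23 "$on_disk" "$known"

    rm -f "$on_disk" "$known"
}

# Example: list_orphan_cfgs /usr/local/nagios/etc/hosts known_hosts.txt | wc -l
```

Piping the result through `wc -l` gives a count that can be compared against the figure quoted above.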
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios stuck, won't check devices in queue

Post by mguthrie »

That is a substantial disconnect. It would be worth checking to verify permissions for files under the nagios/etc directory. You can reset these by running:

/usr/local/nagiosxi/scripts/reset_config_perms


You might also try test-deleting a few hosts, and make sure there aren't any PHP timeouts or memory limits being hit when attempting this. You can tail the Apache log and see if anything shows up.

tail -f /var/log/httpd/error_log

Is it common in your environment for large numbers of hosts to be deleted at once?

Do you guys use the "active/inactive" functionality of the Core Config Manager much in your environment for your configs?
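
If the log is busy, it can help to filter it for the PHP timeout and memory-limit failures mentioned above rather than watching it scroll. A minimal sketch, assuming PHP's default wording for these errors ("Maximum execution time of ... exceeded" and "Allowed memory size of ... exhausted") and the log path given above; `php_limit_errors` is a hypothetical helper name.

```shell
# Sketch: show only PHP timeout and memory-limit errors from an Apache log.
# Assumption: the messages use PHP's default English wording.
php_limit_errors() {
    log_file="${1:-/var/log/httpd/error_log}"
    grep -Ei 'maximum execution time of [0-9]+ seconds? exceeded|allowed memory size of [0-9]+ bytes exhausted' "$log_file"
}

# Example: php_limit_errors /var/log/httpd/error_log | tail -n 20
```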
cwscribner

Re: Nagios stuck, won't check devices in queue

Post by cwscribner »

We often do delete hosts in bulk through the CCM. I have nagiosql set to produce ~200 lines to make it easier. When doing an apply config, it doesn't ever time out, but it takes several minutes to complete, probably due to the number of devices.
mguthrie

Re: Nagios stuck, won't check devices in queue

Post by mguthrie »

Unfortunately, no obvious ideas come to mind as to how so many of those could have been deleted from the CCM but not from the files. It does happen periodically that a host deletion fails to delete a file correctly, so the host is gone from the CCM but not from the XI interface. However, I've never seen it fail on that scale before.

The host files are safe to physically delete from the XI server. When you attempt to delete a host from the Core Config Manager, do you get any error messages at the bottom of the page showing that the deletion failed?
cwscribner

Re: Nagios stuck, won't check devices in queue

Post by cwscribner »

Nope. I just removed ~400 devices via CCM and not one gave an error. The majority of the devices either have no services associated with them, or they use group associated config files.
mguthrie

Re: Nagios stuck, won't check devices in queue

Post by mguthrie »

Now just to clarify: for the devices that you just removed, were the config files deleted correctly, or do they still appear to be there? (/usr/local/nagios/etc/hosts)
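
To spot-check a handful of removed devices, something like the following could be used. It is only a sketch: it assumes each host's file is named `<host_name>.cfg`, and `report_leftover_cfgs` and the example host names are hypothetical.

```shell
# Sketch: for each host name given, report whether its config file
# still exists in the hosts directory.
# Assumption: files are named <host_name>.cfg.
report_leftover_cfgs() {
    hosts_dir="$1"    # e.g. /usr/local/nagios/etc/hosts
    shift
    for name in "$@"; do
        if [ -e "$hosts_dir/$name.cfg" ]; then
            echo "still present: $name"
        else
            echo "deleted: $name"
        fi
    done
}

# Example: report_leftover_cfgs /usr/local/nagios/etc/hosts switch01 switch02
```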
cwscribner

Re: Nagios stuck, won't check devices in queue

Post by cwscribner »

I haven't done a 1:1 comparison, but some are still there. There were two devices that got added last week and they never got past pending. They hung there for ~5 days before I deleted them. I think that's the point when this problem started.
cwscribner

Re: Nagios stuck, won't check devices in queue

Post by cwscribner »

In theory, if I deleted ALL of the host configuration files and then did an Apply Configuration, the database would propagate all of the proper config files, right?
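
The thread ends without an answer to this question, so whether Apply Configuration rewrites every host file is not confirmed here. For anyone trying it, moving the files aside instead of deleting them keeps a way back. A cautious sketch, with `stash_host_cfgs` and the backup location as hypothetical names:

```shell
# Sketch: move all host .cfg files into a timestamped backup directory
# instead of deleting them, and print the backup path.
# Assumption: the backup root is a directory of your choosing.
stash_host_cfgs() {
    hosts_dir="$1"      # e.g. /usr/local/nagios/etc/hosts
    backup_root="$2"    # e.g. /root/nagios-cfg-backup (hypothetical)
    backup_dir="$backup_root/hosts-$(date +%Y%m%d-%H%M%S)"

    mkdir -p "$backup_dir" || return 1
    for cfg in "$hosts_dir"/*.cfg; do
        [ -e "$cfg" ] || continue
        mv -- "$cfg" "$backup_dir"/ || return 1
    done
    echo "$backup_dir"
}

# Example: stash_host_cfgs /usr/local/nagios/etc/hosts /root/nagios-cfg-backup
```

If the regenerated files turn out to be incomplete, the originals can be moved back from the printed directory.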