Ghost host and services issue
Posted: Fri Sep 11, 2015 9:53 am
Good morning, guys. I'm having a really hard time with two files being recreated whenever I click apply configuration. I've dealt with ghost hosts and services before, but it's never given me this much trouble.
A host was added a while back by somebody else on our team. I had given him a quick crash course on how to add hosts, and he added a few switches with the snmpwalk wizard. (It was our network supervisor. Very competent individual, not just some noob I was trusting to not mess up.) He added a few switches and applied the config, but now whenever I try to save new config changes, I get the error quoted in full further down: a service in /usr/local/nagios/etc/services/10.228.251.1.cfg has no hosts and/or service_description, starting on line 32.
If I comment out the service starting on line 32, the error just moves down about 16 lines to the next service entry. There are 77 services in this file, so there are a lot of lines to go through. I waded through a dozen or so, commenting them out, only to have the error move down to the next service each time. This by itself isn't a big deal; I don't mind deleting and re-adding the host. Here's where it becomes a problem, though.
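In XI, I can delete all the services for this host, and I can delete the host itself. Applying the config causes the files to get re-written, and the apply fails with the above error. Following the FAQ entries for ghost hosts, I stopped Nagios, removed the host and service .cfg files for 10.228.251.1 from the CLI, and started Nagios again (exact commands further down).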
After I delete the files from the CLI, I go into the CCM and, under services and hosts, delete everything there regarding this host. 77 services and 1 host are successfully deleted. After I've done that, I go into CCM -> Write Config Tool and manually write the data to file. That returns successfully, with no mention of the host 10.228.251.1. I verify my config, and that also comes back OK. I click Restart Nagios, and it completes without error. At this point, if I click Apply Configuration, I get the error I mentioned above, and those two files I deleted via CLI are both back.
I've even tried removing the physical files via CLI, removing the entries in XI via CCM, writing configs to file under CCM->Tools, and doing a complete server reboot. Once the server was back up, I used a different browser than the one I removed everything in, just to make sure there's nothing cached anywhere causing things to get re-written. No matter what I do, these two files keep coming back when I click Apply Configuration.
I'm about to the point of making sure I've got current backups of all configs, deleting all host/service configuration files under CCM->Write Config Files, and then importing all of the other configs. I really don't want to go that route if it can be avoided. Until this incident, Nagios has been working extremely well with minimal fuss, and I'm afraid doing that may just cause other issues. I've restored my oldest backup already, but either the files are just that persistent, or this guy was already added by the time that backup was made. I didn't realize there was a problem with this host until I tried to do some end-of-summer cleanup in Nagios a few weeks later. Is there anything else I can try instead?
CentOS 6.6
64 bit
Physical server, manual install
No special configs, nothing else running on this machine
The error I get on Apply Configuration:
Code: Select all
Error: Service has no hosts and/or service_description (config file '/usr/local/nagios/etc/services/10.228.251.1.cfg', starting on line 32)
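Since the error above only names the first offending service, here is a rough sketch of how every incomplete service block in the file could be listed in one pass instead of commenting entries out one at a time. It is only an approximation of the check behind the message: it assumes the usual `define service { ... }` object syntax, treats either `host_name` or `hostgroup_name` as "has a host", skips blocks marked `register 0`, and does not follow `use` template inheritance, so a service that inherits its host from a template would be flagged falsely. The function name `find_incomplete_services` is made up for illustration.

```python
import re
import sys

# Assumption: either directive is enough to tie a service to a host.
HOST_KEYS = ("host_name", "hostgroup_name")


def find_incomplete_services(text):
    """Return [(start_line, [missing directives])] for service blocks in text."""
    problems = []
    start = None      # line where the current "define service" block opened
    directives = {}   # directive name -> value inside the current block
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()   # drop inline ';' comments
        if not line or line.startswith("#"):
            continue
        if start is None:
            if re.match(r"define\s+service\s*\{", line):
                start, directives = lineno, {}
            continue
        if line.startswith("}"):
            # Blocks marked "register 0" are templates and are skipped.
            if directives.get("register") != "0":
                missing = []
                if not any(directives.get(key) for key in HOST_KEYS):
                    missing.append("host_name")
                if not directives.get("service_description"):
                    missing.append("service_description")
                if missing:
                    problems.append((start, missing))
            start = None
            continue
        # A directive with no value (e.g. a bare "host_name") is stored as "".
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for start_line, missing in find_incomplete_services(handle.read()):
            print(f"line {start_line}: missing {', '.join(missing)}")
```

Saved under a hypothetical name such as check_services.py, it would be run with the path of the service file as its only argument, and it prints one line per block that looks incomplete.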
The commands I ran, following the FAQ entries for ghost hosts:
Code: Select all
killall nagios
sudo rm /usr/local/nagios/etc/hosts/10.228.251.1.cfg
sudo rm /usr/local/nagios/etc/services/10.228.251.1.cfg
sudo service nagios start
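After removing the files as above, one quick sanity check is whether anything else on disk still mentions the host. The sketch below walks a config directory and reports every line that names a given address. The function name `find_references` is hypothetical, and the check only sees files on disk: it says nothing about whatever the CCM itself still holds, so an empty result does not rule out the files being regenerated from there.

```python
import os
import re
import sys


def find_references(root, address):
    """Return [(path, line_number, line)] for lines under root naming address."""
    # Digit guards keep 10.228.251.1 from also matching 10.228.251.10.
    pattern = re.compile(r"(?<!\d)" + re.escape(address) + r"(?!\d)")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if pattern.search(line):
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it rather than abort
    return hits


if __name__ == "__main__" and len(sys.argv) > 2:
    for path, lineno, line in find_references(sys.argv[1], sys.argv[2]):
        print(f"{path}:{lineno}: {line.strip()}")
```

It takes the directory and the address as arguments, for example /usr/local/nagios/etc and 10.228.251.1, and may need to be run with enough privileges to read the config tree.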
I'm seeing two errors in error_log. I don't think they're related to this issue, but I'll let you guys determine that.
Code: Select all
[Fri Sep 11 08:28:17 2015] [error] [client 10.0.0.109] PHP Notice: Undefined variable: sync_table_status in /usr/local/nagiosxi/html/includes/components/ccm/page_templates/ccm_table.php on line 196, referer: https://nagios.psd.ms/nagiosxi/includes/components/ccm/xi-index.php
and
Code: Select all
[Fri Sep 11 08:46:36 2015] [error] [client ::1] PHP Notice: Undefined index: language in /usr/local/nagiosxi/html/includes/components/ccm/includes/common_functions.inc.php on line 711