Host and services still showing but don't exist in CCM - av6
Posted: Mon Oct 26, 2015 11:30 am
I'm currently running XI version 5.2.0 and started experiencing the same behavior after data in /usr/local/nagios/spool/perfdata filled the disk in my XI VM. After moving the data to /var and dropping in a symlink to make things match up, I was able to get things going again and rolled back to a previous config snapshot.
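For anyone following along, the move-and-symlink step described above can be sketched roughly as follows. This is only an illustration: the function name is made up, the destination under /var is a hypothetical path (the post only says "/var"), and you would presumably want the Nagios services stopped while the directory is moved.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(src_dir: str, dest_dir: str) -> None:
    """Move src_dir to dest_dir and leave a symlink at the old path.

    Hypothetical helper, not part of Nagios XI.
    """
    if os.path.islink(src_dir):
        raise ValueError(f"{src_dir} is already a symlink")
    if os.path.exists(dest_dir):
        raise FileExistsError(dest_dir)
    # shutil.move falls back to copy-then-delete when the destination is on
    # another filesystem. File ownership may not survive that copy, so it
    # may need to be restored afterwards.
    shutil.move(src_dir, dest_dir)
    # The old path now resolves to the new location.
    os.symlink(dest_dir, src_dir)


if __name__ == "__main__":
    # Harmless demonstration inside a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "spool", "perfdata")
        new = os.path.join(root, "var", "perfdata")  # hypothetical target
        os.makedirs(old)
        os.makedirs(os.path.dirname(new))
        relocate_with_symlink(old, new)
        print(os.path.islink(old), os.path.isdir(new))
```

On a real system the call would look something like relocate_with_symlink("/usr/local/nagios/spool/perfdata", "/var/nagios-spool/perfdata"), where the second path is again just an assumed example.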
When I tried to delete a switch I had retired after the snapshot, I saw the behavior Louis described. I deleted the device from CCM, validated the config and restarted. However, while the switch was gone from CCM, flat files, and Nagios Core, XI still saw the switch and generated host down notifications. Likewise, a possibly unrelated NRPE problem on the host itself generated notifications from Core, but XI showed everything normal.
The wiki suggests trying this: https://support.nagios.com/wiki/index.p ... _Hosts.29. While the process lets me confirm the device has been removed from Core, it doesn't clean up XI in my case.
In an attempt to clear the switch from XI, I tweaked the deadpool settings to make sure it was included for cleanup. When deadpool.php runs, it skips the switch and logs this message:
Code:
PROCESSING HOSTS...
Processing host 'edge-2-sw.xxx.xxx.com' in stage 2
Error: Could not get ID for host 'edge-2-sw.xxx.xxx.com' - skipping
PROCESSED HOSTS:
I don't market myself as a DBA, but it looks like the host was partially deleted: the main record (and with it the primary key, or "ID") was wiped out before all of its relationships were cleaned up, leaving the ghost. And, since I'm not a DBA, I'm not going to dig into the schema and drop any row in any table that includes a reference to the retired switch. Not sure where to go from here. I could probably reinitialize the whole works and import my flat files, but I'd like to keep my RRDs for history.
Moderator Edit: Link to original thread: https://support.nagios.com/forum/viewto ... 373#158001
I'm going to split this post in two so we can address the two of you separately. In the future, if you are having a similar issue, please create a new thread and link to the related one instead of posting directly in the related thread.
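If more than one ghost host is suspected, the deadpool output can be scanned for every host it skipped. The sketch below assumes the skip message always has the exact wording quoted in this post; the function name and the sample text are illustrative only, and how you capture the output of deadpool.php is left open.

```python
import re
from typing import List

# Pattern taken from the message quoted in the post; other wordings that
# deadpool.php may emit are not covered.
SKIP_RE = re.compile(r"^Error: Could not get ID for host '([^']+)' - skipping$")


def skipped_hosts(log_text: str) -> List[str]:
    """Return host names that were skipped for lack of an ID, in order seen."""
    hosts: List[str] = []
    for line in log_text.splitlines():
        match = SKIP_RE.match(line.strip())
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


SAMPLE = """PROCESSING HOSTS...
Processing host 'edge-2-sw.xxx.xxx.com' in stage 2
Error: Could not get ID for host 'edge-2-sw.xxx.xxx.com' - skipping
PROCESSED HOSTS:"""

if __name__ == "__main__":
    print(skipped_hosts(SAMPLE))  # ['edge-2-sw.xxx.xxx.com']
```

The resulting list is just a starting point for support: each name is a host XI still knows about but can no longer resolve to an ID.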