Nagios Support Forum

Posted: **Mon Oct 26, 2015 11:30 am**

I'm currently running XI version 5.2.0 and started experiencing the same behavior after data in /usr/local/nagios/spool/perfdata filled the disk in my XI VM. After moving the data to /var and dropping in a symlink to make things match up, I was able to get things going again and rolled back to a previous config snapshot.
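For anyone checking a similar setup, here is a minimal shell sketch that reports whether a spool path is a symlink and how many files sit beneath it. The function name `spool_report` is made up for illustration and is not part of Nagios XI; it only uses standard `find` and `wc`.

```shell
# Hypothetical helper (not shipped with Nagios XI): summarise a spool directory.
spool_report() {
    dir=$1
    # Report whether the path itself is a symlink (e.g. after relocating it to /var).
    if [ -L "$dir" ]; then
        echo "symlink: yes"
    else
        echo "symlink: no"
    fi
    # Count regular files beneath the directory, following the symlink if there is one.
    count=$(find -L "$dir" -type f | wc -l)
    # Arithmetic expansion strips any padding some wc implementations add.
    echo "files: $((count))"
}
```

It could be called as `spool_report /usr/local/nagios/spool/perfdata`; a steadily growing file count there would suggest perfdata is not being processed.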

When I tried to delete a switch I had retired after the snapshot, I saw the behavior Louis described. I deleted the device from CCM, validated the config and restarted. However, while the switch was gone from CCM, flat files, and Nagios Core, XI still saw the switch and generated host down notifications. Likewise, a possibly unrelated NRPE problem on the host itself generated notifications from Core, but XI showed everything normal.

The wiki suggests trying this: https://support.nagios.com/wiki/index.p ... _Hosts.29. While the process lets me confirm the device has been removed from Core, it doesn't clean up XI in my case.

In an attempt to clear the switch from XI, I tweaked the deadpool settings to make sure it was included for cleanup. When deadpool.php runs, it skips the switch and logs this message:

Code: Select all

PROCESSING HOSTS...
Processing host 'edge-2-sw.xxx.xxx.com' in stage 2
Error: Could not get ID for host 'edge-2-sw.xxx.xxx.com' - skipping
PROCESSED HOSTS:

I don't market myself as a DBA, but it looks like the host was only partially deleted: the main record (and with it the primary key, or "ID") was wiped out before all of its relationships were cleaned up, leaving the ghost. And since I'm not a DBA, I'm not going to dig into the schema and drop every row in every table that references the retired switch. I'm not sure where to go from here. I could probably reinitialize the whole works and import my flat files, but I'd like to keep my RRDs for history.
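If the deadpool output is captured to a file, the skipped hosts can be pulled out of it with a small filter. This is only a sketch: the function name `skipped_hosts` is hypothetical, and it assumes the error line has exactly the wording shown in the output above.

```shell
# Hypothetical helper: list host names that deadpool skipped because no ID was found.
# Reads the named files, or standard input when no file is given.
skipped_hosts() {
    # Print only the quoted host name from lines of the form:
    #   Error: Could not get ID for host '<name>' - skipping
    sed -n "s/^Error: Could not get ID for host '\(.*\)' - skipping\$/\1/p" "$@"
}
```

For example, `skipped_hosts deadpool.log` (the log file name is an assumption) would print one skipped host per line.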

Moderator Edit: Link to original thread: https://support.nagios.com/forum/viewto ... 373#158001
I'm going to split this post in two so we can address the two of you separately. In the future, if you are having a similar issue, please create a new thread and link to the related one instead of posting directly in the related thread.

Posted: **Mon Oct 26, 2015 3:15 pm**

Let's repair the MySQL database and restart the services to see if that resolves the problem. Run the following in a shell as root on the XI server.

Code: Select all

cd /usr/local/nagiosxi/scripts
./repair_databases.sh
service nagios stop
killall -9 nagios
service ndo2db stop
service mysqld restart
service ndo2db start
service nagios start

Check whether the hosts and services are in the CCM. If they are, delete the services first and then the hosts, and see whether Apply Configuration removes them.

Posted: **Mon Oct 26, 2015 5:01 pm**

tgriep wrote: Check and see if the hosts and services are in the CCM and if they are, delete the services and then the hosts and see if the apply config works to remove them.

No change. The host is not present in CCM, Core or flat files, but still exists in XI's host detail lists. Oddly, both the last check and next scheduled check times reflect the date when the switch was powered off.

Posted: **Mon Oct 26, 2015 5:37 pm**

Honestly, I'm pretty sure upgrading to XI 5 (or 2014R2.7) should fix your issue. XI 2012 uses Core 3.x, and it wasn't as strict with its configs.

Is there any possibility to upgrade?

Posted: **Tue Oct 27, 2015 10:25 am**

Box293 wrote: Is there any possibility to upgrade?

I'm already on XI 5.2.0 with Core 4.1.1. I upgraded weeks ago.

Posted: **Tue Oct 27, 2015 3:02 pm**

Go to CCM -> Tools -> Write Config Files, click the "Delete", "Write", and "Verify" buttons in exactly that order, and then Apply Configuration. Check whether the host/services still show in the GUI.

Posted: **Tue Oct 27, 2015 3:09 pm**

Another place to look for that host is in the retention.dat file. It may have been corrupted when the system's drive filled up.
Here is where the file is located on the system.

Code: Select all

/usr/local/nagios/var/retention.dat

Open it up and search for that host. If it is in there, stop the nagios process:

Code: Select all

service nagios stop

Remove the entries from that file and save it.
Then restart Nagios and see if the host is gone:

Code: Select all

service nagios start

Let us know what you find.
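A quick way to run that search without opening the file in an editor is sketched below. It assumes retention.dat stores each host and service as a block containing a `host_name=<name>` line, which is how the file normally looks; the function name `count_retention_entries` is made up for this example.

```shell
# Hypothetical helper: count blocks in a retention file that belong to one host.
# Assumes each host/service block contains a line "host_name=<name>".
count_retention_entries() {
    file=$1
    host=$2
    awk -v want="host_name=$host" '
        # Strip leading whitespace so indented and unindented files both match.
        { line = $0; sub(/^[ \t]+/, "", line) }
        # Exact comparison: "edge-2" must not match "edge-2-sw".
        line == want { n++ }
        END { print n + 0 }
    ' "$file"
}
```

Calling `count_retention_entries /usr/local/nagios/var/retention.dat edge-2-sw.xxx.xxx.com` prints the number of matching blocks; 0 means the host is not in the file.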

Posted: **Tue Oct 27, 2015 4:16 pm**

Rewriting the config files had no effect.

There were no references to the host in retention.dat.

The host no longer appears in Nagios Core; it only hangs around in the XI frontend (i.e., /nagiosxi/includes/components/xicore/status.php).

Posted: **Wed Oct 28, 2015 12:01 am**

Let's look in the database. Using this command I can find the host with the alias win2008r2-01. Can you run it for your host name and see if it appears? Make sure the cAsE is correct. You may need to use display_name instead of alias.

Code: Select all

echo "select * from nagios_hosts where alias like 'win2008r2-01' \G;" | mysql -pnagiosxi nagios

Please post the output.
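To switch between alias and display_name without retyping the query, a small wrapper along these lines may help. It is a sketch only: `build_host_query` is a hypothetical name, it uses an exact `=` match rather than `like`, and it only prints the SQL so it can be reviewed before being piped to mysql.

```shell
# Hypothetical helper: print a lookup query against nagios_hosts for one column.
build_host_query() {
    column=$1
    value=$2
    # Only the two columns mentioned above are accepted.
    case "$column" in
        alias|display_name) ;;
        *) echo "unsupported column: $column" >&2; return 1 ;;
    esac
    # Escape backslashes and double any single quotes so the value stays
    # inside the SQL string literal.
    escaped=$(printf '%s' "$value" | sed -e 's/\\/\\\\/g' -e "s/'/''/g")
    # \G asks the mysql client for vertical (one field per line) output.
    printf '%s\n' "SELECT * FROM nagios_hosts WHERE $column = '$escaped' \\G"
}
```

Usage would be along the lines of `build_host_query display_name 'edge-2-sw.xxx.xxx.com' | mysql -pnagiosxi nagios`, reusing the mysql invocation from the command above.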

Posted: **Wed Oct 28, 2015 12:09 am**

tgriep wrote: Another place to look for that host is in the retention.dat file. It may have been corrupted when the system's drive filled up.
Here is where the file is located on the system.
Code: Select all
/usr/local/nagios/var/retention.dat
Open it up and search for that host. If it is in there, stop the nagios process
Code: Select all
service nagios stop
remove the entries from that file and save it.
Restart nagios and see if the host is gone.
Code: Select all
service nagios start
Let us know what you find.

So that worked, but the moment I applied the config in CCM, it came back.

Box293 wrote: Lets look in the database. Using this command I can find the host with the alias win2008r2-01, can you run it for your host name and see if it appears. Make sure the cAsE is correct. You may need to use display_name instead of alias.
Code: Select all
echo "select * from nagios_hosts where alias like 'win2008r2-01' \G;" | mysql -pnagiosxi nagios
Please post the output.

[root@nagios ~]# echo "select * from nagios_hosts where alias like 'Brisbane CORE Router' \G;" | mysql -pnagiosxi nagios
*************************** 1. row ***************************
host_id: 91939
instance_id: 1
config_type: 1
host_object_id: 227
alias: Brisbane CORE Router
display_name: Brisbane CORE Router
address: <redacted>
check_command_object_id: 54
check_command_args: 3000.0!80%!5000.0!100%
eventhandler_command_object_id: 0
eventhandler_command_args:
notification_timeperiod_object_id: 115
check_timeperiod_object_id: 115
failure_prediction_options:
check_interval: 2
retry_interval: 1
max_check_attempts: 3
first_notification_delay: 0
notification_interval: 60
notify_on_down: 1
notify_on_unreachable: 1
notify_on_recovery: 1
notify_on_flapping: 1
notify_on_downtime: 1
stalk_on_up: 0
stalk_on_down: 0
stalk_on_unreachable: 0
flap_detection_enabled: 1
flap_detection_on_up: 1
flap_detection_on_down: 1
flap_detection_on_unreachable: 1
low_flap_threshold: 0
high_flap_threshold: 0
process_performance_data: 1
freshness_checks_enabled: 0
freshness_threshold: 0
passive_checks_enabled: 1
event_handler_enabled: 1
active_checks_enabled: 1
retain_status_information: 1
retain_nonstatus_information: 1
notifications_enabled: 1
obsess_over_host: 1
failure_prediction_enabled: 1
notes:
notes_url:
action_url:
icon_image: switch.png
icon_image_alt:
vrml_image:
statusmap_image: switch.png
have_2d_coords: 0
x_2d: -1
y_2d: 0
have_3d_coords: 0
x_3d: 0
y_3d: 0
z_3d: 0
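When only one or two values from this kind of vertical output are needed (for instance host_object_id), they can be picked out with a short filter. This is a sketch under the assumption that each line has the `name: value` shape shown above; `vertical_field` is a made-up name.

```shell
# Hypothetical helper: print the value of one field from mysql \G (vertical) output.
# Reads the named files, or standard input when no file is given.
vertical_field() {
    field=$1
    shift
    awk -v f="$field" '
        {
            line = $0
            # The mysql client may right-align names, so drop leading whitespace.
            sub(/^[ \t]+/, "", line)
            # Split on the first colon only, so values containing ":" survive.
            pos = index(line, ":")
            if (pos == 0) next
            name = substr(line, 1, pos - 1)
            value = substr(line, pos + 1)
            sub(/^[ \t]+/, "", value)
            if (name == f) print value
        }
    ' "$@"
}
```

For example, piping the query output through `vertical_field host_object_id` would print just that ID for each returned row.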

Host and services still showing but don't exist in CCM - av6
