
Configuration Discrepancies

Posted: Wed Jul 30, 2014 6:53 am
by chrisp
I'm struggling to get our Nagios system to apply config changes & it appears to be hanging on to servers which have been removed...

For example, I deleted several switches from the Nagios config via the XI GUI and they're no longer searchable in the CCM, but they still show up in the running system, mostly going critical/down because we moved them to internal IP ranges and nagios can no longer see them.
2014-07-30 12_26_43-Nagios XI.png
2014-07-30 12_33_08-Nagios XI - Configuration.png
There's also a server which I deleted but which remains doggedly in the config and isn't manageable via the GUI. It has 24 services, all critical, so it is very annoying to our out-of-hours guys, who keep seeing all the red and panicking.

How do I get around this sort of thing?

Because of another odd configuration verification error, I just checked the services in CCM, and there are several copies of the same service with no hosts assigned. I don't remember putting them there: we like to have one global service check that can be applied to a hostgroup or to individual hosts, not a separate service for each host.
2014-07-30 12_39_53-Nagios XI - Configuration.png
Looking at the CoreDNS_53 service there, that was what was causing my config verification to bork and I had to edit and save it, in order to get the config verification to finally go green, but that's when I started to poke around and found the multiple identical service checks shown above.

I am a bit worried that it's got itself (or been helped) into a mess that I don't know how to untangle. Currently, when I apply the config, it works (goes green and applies), but I still have the legacy hosts lingering. Any ideas?

Re: Configuration Discrepancies

Posted: Wed Jul 30, 2014 9:43 am
by tmcdonald
What version of XI are you on?

For the lingering hosts/services, they are called Ghost Hosts and are pretty easy to remove:

http://support.nagios.com/wiki/index.ph ... t_Hosts.29

For the duplicate services with no assigned hosts, are there hosts assigned via templates applied to the services?

Re: Configuration Discrepancies

Posted: Wed Jul 30, 2014 11:35 am
by chrisp
We're on: Nagios XI 2012R2.9

This issue has survived the several reboots I've performed and I just did "killall nagios ; killall nagios" (to be sure), then "service nagios start" and it continues to be an issue.

I can also see the config files for the deleted hosts in /usr/local/nagios/etc/

Code:

# ll -R /usr/local/nagios/etc/|grep -i server03.hq      
-rw-rw-r-- 1 apache nagios 1237 Jul 24 10:36 server03.hq.inty.net.cfg
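
A host left behind like this may be referenced from more than one file. The following is a minimal sketch, not an official Nagios tool, for listing every .cfg file under the config tree that mentions a given host name. The function name `cfgs_mentioning_host` is hypothetical, and the default directory is simply the one shown in the listing above.

```shell
# Sketch: list every .cfg file under a Nagios config tree that mentions a host.
# cfgs_mentioning_host is a hypothetical helper name; the default directory is
# an assumption taken from the listing above.
cfgs_mentioning_host() {
    host="$1"
    dir="${2:-/usr/local/nagios/etc}"
    # find limits the search to .cfg files; grep -l prints only file names,
    # and -F treats the host name as a fixed string rather than a regex.
    find "$dir" -type f -name '*.cfg' -exec grep -lF -- "$host" {} +
}
```

For example, `cfgs_mentioning_host server03.hq` would print the path of each config file containing that string, which gives a list to review before anything is removed.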

Re: Configuration Discrepancies

Posted: Thu Jul 31, 2014 4:51 am
by chrisp
OK, so I just did this (for the chance to roll back my next crazy action, just in case of borkage...): -

Code:

root@Nagios:/usr/local/nagios/etc/hosts
# cat server03.hq.inty.net.cfg 
Then I did this: -

Code:

root@Nagios:/usr/local/nagios/etc/hosts
# rm -vf server03.hq.inty.net.cfg
# killall nagios && service nagios restart
I no longer see the "ghost host" that's in that particular config file.

In fact, I just hunted out the config files for the switches I had removed via the GUI and binned them too. Now I am ghostless & when I attempt to verify and update the config via the GUI, it's all green.
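
A slightly safer variant of the removal above is to move the stale file aside rather than delete it, so it can be restored if something breaks. This is only a sketch under assumptions: the function name `retire_host_cfg` and the backup location are hypothetical, and the hosts directory is the one shown in the prompt above.

```shell
# Sketch: move a stale host config into a backup directory instead of deleting it.
# retire_host_cfg, ETC_DIR and BACKUP_DIR are hypothetical names; the defaults
# are assumptions and should be adjusted to the actual installation.
retire_host_cfg() {
    host="$1"
    etc_dir="${ETC_DIR:-/usr/local/nagios/etc}"
    backup_dir="${BACKUP_DIR:-/root/nagios-cfg-backup}"
    cfg="$etc_dir/hosts/$host.cfg"

    # Refuse to continue if the expected file is not there.
    if [ ! -f "$cfg" ]; then
        echo "no such file: $cfg" >&2
        return 1
    fi

    # Keep a copy that can be moved back to undo the change.
    mkdir -p "$backup_dir" || return 1
    mv -- "$cfg" "$backup_dir/"
}
```

After moving a file this way, the configuration can be checked before restarting with `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg`, assuming the default install paths.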

I'm not sure if what I have done is right. I think Whitney sang a song about it...

Re: Configuration Discrepancies

Posted: Thu Jul 31, 2014 10:45 am
by tmcdonald
chrisp wrote: I'm not sure if what I have done is right. I think Whitney sang a song about it...
You handled this situation appropriately, unlike the unsavory characters Whitney describes in her song.

Re: Configuration Discrepancies

Posted: Thu Aug 07, 2014 5:38 am
by chrisp
Good news about the ghost hosts, but I am increasingly worried about some strange service relationships and wonder if the config is somehow corrupted. Here's the latest thing to catch my eye, while diagnosing unexpected Nagios Notification behaviour (I'm not even sure where to start diagnosing and correcting this): -
NagiosXI_Config_Borked.png

Re: Configuration Discrepancies

Posted: Thu Aug 07, 2014 10:57 am
by tmcdonald
The host you have blurred out simply has some pings attached to it (quite a few, oddly enough). You cannot delete a host that has services still associated with it, so you will need to either delete or deactivate the services before you can delete the host.

Re: Configuration Discrepancies

Posted: Thu Aug 07, 2014 11:22 am
by chrisp
So sorry, I wasn't as clear as I could have been (my car's brakes had failed earlier in the day, leaving me a bit... spaced out).

I was looking at the host and saw that it had a bunch of pings associated, but we (whenever possible) take advantage of Nagios template inheritance, so there actually is only 1 legitimate "Ping" service in our config, which is assigned mostly by means of HostGroup membership and, in the odd case, to individual hosts.

I was just using the Database Relationships information button as a little window on the config, so I could show you the oddness in a single screenshot. I am now becoming aware of many more (but not all) services that have multiple duplicates like this: where there used to be a single service, there are now 8 identical services!

I'm not sure how to proceed. Do I delete all but one of the services and then try to confirm that all the host-service relationships are as they should be? It's going to be a bit of a slog :(

Re: Configuration Discrepancies

Posted: Thu Aug 07, 2014 5:05 pm
by abrist
Can you post the config (save button next to service config) for the service Ping-Ping?

Re: Configuration Discrepancies

Posted: Thu Aug 07, 2014 5:13 pm
by chrisp
Do you mean this?

Code:

###############################################################################
#
# Service configuration file
#
# Created by: Nagios QL Version 3.0.3
# Date:	      2014-08-07 22:09:43
# Version:    Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND --- 
# Nagios QL will overwite all manual settings during the next update
#
###############################################################################

define service {
	host_name			parkgroup.nagios.inty.com,sven-birmingham.nagios.inty.com,sven-guildford-internet.nagios.inty.com,sven-guildford-vpn.nagios.inty.com,sven-london.nagios.inty.com
	service_description		Ping
	display_name			Ping
	check_command			check_ping!200.0,5%!1000.0,80%!!!!!!
	max_check_attempts		5
	check_interval			2
	retry_interval			2
	active_checks_enabled		1
	check_period			xi_timeperiod_24x7
	check_freshness			1
	process_perf_data		1
	retain_status_information	1
	retain_nonstatus_information	1
	notification_interval		0
	notification_period		xi_timeperiod_24x7
	notification_options		c,r,s,
	notifications_enabled		1
	contact_groups			CG_Automated_Ticketing,CG_Infrastructure_Team,CG_Operations_Team,CG_Out_Of_Hours
	register			1
	}	

###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################
BTW, for this "Ping" service, I have deleted the 7 duplicates, to see if anything exploded... AFAIK, it didn't.
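
To get a rough picture of which services are defined more than once in the written-out config files, something like the following could help. It is a sketch only: the function name `count_service_defs` is hypothetical, it reads the .cfg files on disk rather than the CCM database, and a repeated service_description is just a hint, since the same description can legitimately appear in separate definitions for different hosts.

```shell
# Sketch: report service_description values that appear in more than one
# "define service" block under a directory. count_service_defs is a
# hypothetical helper name; it inspects .cfg files only, not the CCM database.
count_service_defs() {
    dir="$1"
    find "$dir" -type f -name '*.cfg' -exec awk '
        # Entering a service definition block.
        /^[ \t]*define[ \t]+service[ \t]*[{]/ { in_svc = 1; next }
        # Leaving the current block.
        in_svc && /^[ \t]*[}]/ { in_svc = 0; next }
        # Print the description text with the directive name stripped.
        in_svc && $1 == "service_description" {
            $1 = ""
            sub(/^[ \t]+/, "")
            print
        }
    ' {} + | sort | uniq -c | awk '$1 > 1'
}
```

Running `count_service_defs /usr/local/nagios/etc/services` (the path is an assumption) would print one line per repeated description, prefixed by how many definitions carry it.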