We recently noticed an oddity in our Nagios installation, tried to correct it, and now Nagios won't start because of a configuration error.
We had a ping check running for a host that wasn't configured. I looked and found a configuration text file for it on the server, CNN1059.cfg. I thought that by creating a host cnn1059 in the Core Config Manager, I could apply the configuration, then remove the host, and that might fix the problem. Sadly, it only got worse. My initial apply after creating the host failed and, for whatever reason, it did not create a snapshot that would let me see what the error was. I removed the entry I had just added and applied again. Same problem: the apply did not work and there was no snapshot.
I then decided to restart Nagios, and now it won't start, saying there is a configuration error. When I ran a check on the configuration, it reported an error in a host config file, saying a template didn't exist, but I know that it is out to lunch. It complained about the definition at line 79 of a config file, stating that the template does not exist. We double-checked and the template is set up. As the screenshot attached below shows, we use that same template for other definitions that it is not complaining about, and we use it for several other servers; the error refers only to this one config file. Any help would be appreciated.
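For anyone hitting the same "template does not exist" message, here is a minimal sketch, in Python, of a cross-check that lists `use` references with no matching template `name` of the same object type across a set of config files. The function names are hypothetical, the parser ignores many details of the real Nagios object syntax, and the authoritative result is still the check Nagios itself performs. It may help when a template file has been emptied or truncated, since every object that uses it would then fail to resolve.

```python
import re
from collections import defaultdict

# Matches "define <type> { ... }" blocks; assumes no "}" inside a block body.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Yield (object_type, directives) for each define block in text."""
    for match in DEFINE_RE.finditer(text):
        obj_type, body = match.group(1), match.group(2)
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        yield obj_type, directives


def missing_templates(config_texts):
    """Return sorted (filename, object_type, template) for unresolved 'use' references.

    config_texts maps a file name to that file's contents.
    """
    defined = defaultdict(set)  # object type -> template names defined via 'name'
    used = []                   # every 'use' reference found
    for filename, text in config_texts.items():
        for obj_type, directives in parse_objects(text):
            if "name" in directives:
                defined[obj_type].add(directives["name"])
            for template in directives.get("use", "").split(","):
                template = template.strip()
                if template:
                    used.append((filename, obj_type, template))
    return sorted(u for u in used if u[2] not in defined[u[1]])
```

Feeding it the contents of every .cfg file under the config directory returns one tuple per dangling reference. An empty template file defines nothing, so all of its users show up in the result.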
Nagios won't start
Our installation is currently:
CentOS Linux release 6.0 (Final)
32 bit
VM Image
Special Configurations: No
Re: Nagios won't start
Are you guys doing any manual maintenance of the configuration files under /usr/local/nagios/etc?
See the following wiki:
http://support.nagios.com/wiki/index.ph ... om_the_CCM
Re: Nagios won't start
Hi, this is in addition to what Jason posted about re-adding the ghost host that was in our running config but not in XI (the nagiosql DB)...
We restored (unzipped) a previous configuration to /usr/local/nagios/etc from a last known good snapshot, and our core Nagios is running fine now. However, we aren't able to apply the XI configuration with changes: if we make a configuration change (e.g. add a host or contact) and apply, it fails; if we remove those changes and apply, it works. No snapshot results are being generated when the apply fails. We are adding valid configurations, so I don't believe the error is in what we are adding...
Re: Nagios won't start
Quote: "We restored (unzipped) a previous configuration to /usr/local/nagios/etc from a last known good snapshot"
I have alarm bells going off in my mind about this. You should never have to manually restore XI to a good working snapshot. Nagios XI does this for you every time you Apply Configuration and get a config error: XI automatically rolls back to the last good configuration to keep the monitoring engine running.
The fact that new snapshots aren't being created is a concern. Are you interested in a remote session either this afternoon or Monday?
Re: Nagios won't start
Yes, if you could do a remote session, that would be great.
As mentioned, Jason added that duplicate host, applied, and got the error; unfortunately, it didn't generate a snapshot error log. We tried reversing that change and re-applying, to no effect. His apply wiped out servicetemplate.cfg (0 bytes), so we had to use the config snapshot to get our monitoring back online. I did back up the previous etc directory.
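Since an apply left a config file at 0 bytes, a quick scan for empty .cfg files after each apply can confirm whether it happened again. A minimal sketch in Python follows. The function name is hypothetical, and the directory to pass in is whatever your installation uses (this thread mentions /usr/local/nagios/etc).

```python
import os


def find_empty_configs(root, suffix=".cfg"):
    """Return sorted paths of zero-byte files ending in suffix under root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A file truncated by an interrupted write shows up with size 0.
            if name.endswith(suffix) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Calling `find_empty_configs("/usr/local/nagios/etc")` right after a failed apply would list any files that were emptied, which is easier than eyeballing directory listings.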
Re: Nagios won't start
How's the hard drive space on that machine?
Re: Nagios won't start
nagios server vnl64
[root@vnl64 /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       99G   52G   42G  56% /
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda1              97M   64M   29M  69% /boot
tmpfs                  75M   30M   46M  40% /var/nagiosramdisk
vnl65.gov.ab.ca:/wmictmp
                       35G  5.7G   28G  17% /wmictmp
database server vnl65
[root@vnl65 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_vnl65-lv_root
                       35G  5.7G   28G  17% /
tmpfs                1012M     0 1012M   0% /dev/shm
/dev/sda1             485M   49M  411M  11% /boot
Re: Nagios won't start
Yeah, that appears OK. We're somewhat booked this afternoon for remote sessions, so I'll follow up with you by PM and we'll get a time set up.
Re: Nagios won't start
Figured we'll keep working on this in the meantime since this is a BIG clue.
Quote: "I further bumped up my timeout values in the php.ini file
max_execution_time=240
max_input_time=480
memory_limit=2048M
I have a couple of service config files that have over 150 service checks in them. Looking at the /usr/local/nagios/etc folders, it appeared that those would be written at 0 bytes at times, but not consistently. I just applied the configuration and it appeared to work and write the changes. When I click Configure -> Tools -> Write Config Files -> Write Monitoring Data, that screen goes blank. I'm thinking it should show me something????"
Yes, if you get a blank page, that means you have a fatal PHP error, and in this scenario you're hitting either PHP's memory limit or its timeout. Check /var/log/httpd/error_log to see which error is occurring.
On a side note, I wouldn't recommend a memory limit over 1024M. That limit applies to each PHP script run, so if you've got a lot of connections it's better to keep it lower. Make sure you restart Apache after making changes to the php.ini file so the new settings take effect.
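To tell the two failure modes apart, the Apache error log can be scanned for the two PHP fatal error messages. The sketch below, in Python, counts each one. The function name is hypothetical, and the patterns assume the usual English wording of PHP's messages ("Allowed memory size of N bytes exhausted" and "Maximum execution time of N seconds exceeded"), so adjust them if your log reads differently.

```python
import re

# Assumed wording of PHP's two fatal errors; adjust if your log differs.
PATTERNS = {
    "memory_limit": re.compile(r"Allowed memory size of \d+ bytes exhausted"),
    "max_execution_time": re.compile(r"Maximum execution time of \d+ seconds? exceeded"),
}


def classify_php_errors(lines):
    """Count log lines that match each known PHP limit error."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Something like `classify_php_errors(open("/var/log/httpd/error_log"))` gives a quick tally. A non-zero `max_execution_time` count points at the timeout rather than memory.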
Re: Nagios won't start
Just a follow-up on this thread. The issue was found to be caused by the Write Config process timing out. We increased the max_execution_time parameter in the php.ini file, restarted Apache, and the issue appears to be resolved.
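For anyone checking their own settings, here is a minimal sketch in Python that reads php.ini-style text and flags the two things discussed in this thread: a missing `max_execution_time` entry and a `memory_limit` above 1024M. The function names and the 1024M default ceiling are assumptions taken from the advice above, not part of PHP or Nagios XI. As far as I know, PHP ignores unrecognized directive names without complaint, so a mistyped name (for example `max_execution` instead of `max_execution_time`) would show up here as "not set".

```python
def parse_php_ini(text):
    """Return a dict of key -> value from php.ini-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # ';' starts a comment
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def to_megabytes(value):
    """Convert PHP shorthand sizes such as '2048M' or '1G' to megabytes."""
    units = {"K": 1 / 1024, "M": 1, "G": 1024}
    suffix = value[-1].upper()
    if suffix in units:
        return float(value[:-1]) * units[suffix]
    return float(value) / (1024 * 1024)  # a plain number means bytes


def review_limits(settings, memory_ceiling_mb=1024):
    """Return a list of human-readable notes about the parsed settings."""
    notes = []
    if "max_execution_time" not in settings:
        notes.append("max_execution_time is not set")
    mem = settings.get("memory_limit")
    # "-1" means unlimited in PHP and is skipped here.
    if mem is not None and mem != "-1" and to_megabytes(mem) > memory_ceiling_mb:
        notes.append("memory_limit %s exceeds %dM" % (mem, memory_ceiling_mb))
    return notes
```

Running `review_limits(parse_php_ini(open(path).read()))` against your php.ini (the path varies by installation) returns an empty list when neither condition applies.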