Re: Core 3.5 generic pre-flight verification error
Posted: Tue Dec 02, 2014 9:07 am
by v3-alex
Code: Select all
ls -la /usr/local/nagios/var/ndomod.tmp
returns the following output:
Code: Select all
-rw-r--r-- 1 root root 0 Aug 16 2011 /usr/local/nagios/var/ndomod.tmp
Listing the /usr/local/nagios/var/ directory returns the following output:
Code: Select all
drwxrwxr-x 5 nagios nagios 4096 Dec 2 07:57 /usr/local/nagios/var/
Checking for the ndo2db process returns:
Code: Select all
nagios 22706 1 0 Nov28 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
returns:
Note: this service was not running when I first ran the status command. After I executed the "service ndo2db start" command, the service started up; however, Nagios still returned the same error as before.
Code: Select all
ls -la /usr/local/nagios/var/ndo.sock
returns:
Code: Select all
srwxr-xr-x 1 nagios nagios 0 Nov 28 10:36 /usr/local/nagios/var/ndo.sock
Let me know what you think.
Re: Core 3.5 generic pre-flight verification error
Posted: Tue Dec 02, 2014 3:57 pm
by sreinhardt
OK let's take it from the top:
ls -la /usr/local/nagios/var/ndomod.tmp should return:
Code: Select all
-rw-r--r-- 1 nagios nagios 0 Nov 20 15:10 /usr/local/nagios/var/ndomod.tmp
So for some reason ndomod.tmp is owned by root:root, but otherwise the permissions are fine. I believe this file gets created on nagios start with ndomod loaded, so let's stop nagios and remove that file.
Code: Select all
service nagios stop
rm -f /usr/local/nagios/var/ndomod.tmp
service nagios start
ls -la /usr/local/nagios/var/ndomod.tmp
var/ permissions look great! Your ndo2db process also looks fine, considering it was started at the same time the associated files were created. I am going to have to assume you started it on the 28th, since everything points that way. ndo.sock also looks great, as it was created at the right time and with the right permissions. After looking at all of this, I think ndomod.tmp is probably our culprit causing the crash.
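If it helps, the ownership checks above can be rolled into one small helper. This is only a rough sketch: the function name check_owner is made up for illustration, and the expected owner (nagios:nagios) and the paths are simply the ones discussed in this thread, so adjust them for your install.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios or NDOUtils): compare the
# owner:group of each path against an expected value.
# Prints one line per path; returns 1 if any existing path mismatches.
check_owner() {
    expected="$1"; shift
    rc=0
    for path in "$@"; do
        # ls -ld works for regular files, directories, and sockets alike;
        # fields 3 and 4 of its output are the owner and the group.
        actual=$(ls -ld "$path" 2>/dev/null | awk '{print $3 ":" $4}')
        if [ -z "$actual" ]; then
            echo "MISSING $path"
        elif [ "$actual" != "$expected" ]; then
            echo "MISMATCH $path is $actual, expected $expected"
            rc=1
        else
            echo "OK $path"
        fi
    done
    return $rc
}

# Example call, using the paths from this thread (assumed default install):
# check_owner nagios:nagios /usr/local/nagios/var/ndomod.tmp \
#     /usr/local/nagios/var/ndo.sock /usr/local/nagios/var/
```

A missing file is reported but not treated as a failure, since ndomod.tmp is expected to be absent right after it is removed.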
Re: Core 3.5 generic pre-flight verification error
Posted: Wed Dec 03, 2014 12:39 pm
by v3-alex
Code: Select all
service nagios stop
rm -f /usr/local/nagios/var/ndomod.tmp
This part went fine; I was able to delete the file. However, I am still unable to start the nagios service due to a configuration error.
service nagios start returns the following error:
Code: Select all
Starting nagios:CONFIG ERROR! Start aborted. Check your Nagios configuration.
Running the Nagios configuration check results in exactly the same error as before, only now I am unable to start the service at all.
Code: Select all
ls -la /usr/local/nagios/var/ndomod.tmp
returns:
Code: Select all
ls: /usr/local/nagios/var/ndomod.tmp: No such file or directory
Re: Core 3.5 generic pre-flight verification error
Posted: Wed Dec 03, 2014 3:06 pm
by sreinhardt
Let's get another strace just like before, and if you could, send a tar of your configs. I think it's easiest if you just PM them over so I can attempt the verification myself and see what's happening directly.
Re: Core 3.5 generic pre-flight verification error
Posted: Thu Dec 04, 2014 1:06 pm
by v3-alex
Sent both archives, let me know what you think.
Re: Core 3.5 generic pre-flight verification error
Posted: Thu Dec 04, 2014 4:20 pm
by sreinhardt
Got them. I have not had a chance to look at them yet today, so it might be a tomorrow morning thing at this point. Just wanted to let you know that I have them and will post back after looking at them!
Re: Core 3.5 generic pre-flight verification error
Posted: Fri Dec 05, 2014 11:14 am
by v3-alex
Spenser, thank you for your assistance in this matter. We troubleshot this problem with a Linux dev yesterday and found the culprit. The objects.cache file located in /usr/local/nagios/var/ was last timestamped October 13th. Looking at the object configuration files in /usr/local/nagios/etc/objects/linux and /usr/local/nagios/etc/objects/windows, we noticed a slew of hosts that had been added after this date. After appending .old to the filenames and running the pre-flight verification again, the error disappeared. By renaming the files back one by one and running the verification check each time, we identified the offending config file. The file did not have any hostgroups specified, so after we modified the configuration the pre-flight check passed.
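For anyone who lands here with the same error, the approach above can be sketched roughly as follows. It is a one-file-at-a-time variant of what we did by hand, not the exact steps: the function names newer_than_cache and find_bad_cfg are made up for illustration, and the Nagios binary path in the example comments assumes a default /usr/local/nagios install.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Nagios).

# List .cfg files under the given directories that were modified after
# the cache file (e.g. objects.cache) was last written.
newer_than_cache() {
    cache="$1"; shift
    find "$@" -type f -name '*.cfg' -newer "$cache"
}

# Disable each .cfg file in a directory in turn by renaming it to .old,
# run the verify command, and print the file if verification passes
# without it. Each file is renamed back before moving on to the next.
# Arguments: a directory, then the verify command and its arguments.
find_bad_cfg() {
    dir="$1"; shift
    for f in "$dir"/*.cfg; do
        [ -e "$f" ] || continue
        if [ -e "$f.old" ]; then
            # Never overwrite an existing backup.
            echo "skipping $f: $f.old already exists" >&2
            continue
        fi
        mv "$f" "$f.old"
        if "$@" >/dev/null 2>&1; then
            echo "$f"
        fi
        mv "$f.old" "$f"
    done
}

# Example calls (paths assumed from this thread; stop nagios first):
# newer_than_cache /usr/local/nagios/var/objects.cache \
#     /usr/local/nagios/etc/objects/linux /usr/local/nagios/etc/objects/windows
# find_bad_cfg /usr/local/nagios/etc/objects/linux \
#     /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```

This only pinpoints the problem when a single file is responsible; if several files are broken, verification keeps failing no matter which one is disabled, and the all-at-once rename described above is the better route.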
Thanks again for taking the time to troubleshoot this issue, guys!