
ndoutils or other problems

Posted: Mon Aug 27, 2012 11:52 am
by johndoe
Hi all, Nagios has stopped working for us since the upgrade to 3.2 (I also tried upgrading to 3.3, but that didn't fix it).

I've poked around but can't find the answer myself, so I'm posting it here.

System details: a dedicated CentOS VM running Nagios XI 3.3.

All data seems to be frozen and nothing updates from NRDP checks. My test (which is also failing at the moment) is to send an NRDP check for an unconfigured object and see whether it shows up under Unconfigured Objects. Nothing so far.
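For reference, the kind of NRDP test submission described above can be sketched as follows. This is a minimal sketch, not the poster's actual method. It assumes the standard NRDP `submitcheck` command with `token`, `cmd` and `XMLDATA` form fields. The host and service names, the token `CHANGEME` and the URL are hypothetical placeholders. Only the XML payload is built here, and the `curl` call is left commented out.

```shell
#!/bin/sh
# Build an NRDP "submitcheck" XML payload for one passive service result.
# Hypothetical values: host/service names, token and URL in the usage line.
build_nrdp_xml() {
    host=$1 service=$2 state=$3 output=$4
    # No XML escaping is done: keep the plugin output free of <, > and &.
    printf "<?xml version='1.0'?>\n"
    printf "<checkresults>\n"
    printf "  <checkresult type='service'>\n"
    printf "    <hostname>%s</hostname>\n" "$host"
    printf "    <servicename>%s</servicename>\n" "$service"
    printf "    <state>%s</state>\n" "$state"
    printf "    <output>%s</output>\n" "$output"
    printf "  </checkresult>\n"
    printf "</checkresults>\n"
}

# Example submission (placeholders; uncomment and adjust before use):
# curl -s --data-urlencode "token=CHANGEME" --data-urlencode "cmd=submitcheck" \
#      --data-urlencode "XMLDATA=$(build_nrdp_xml nrdp-test-host nrdp-test-svc 0 'OK - test')" \
#      http://nagiosxi.example.com/nrdp/
```

Using a host/service pair that is deliberately not configured means a successful submission should appear under Unconfigured Objects rather than update an existing check.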

I've tried rebooting the machine, cleaning up lock files, and shutting down services, but nothing fixes it.

Prompt help would be much appreciated, since Nagios is useless to us at the moment. Please check the screenshots for further info.

During boot:
Starting httpd: [Mon Aug 27 16:17:42 2012] [warn] Useless use of AllowOverride in line 16 of /etc/httpd/conf.d/phpPgAdmin.conf.
httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
[ OK ]
Starting crond: [ OK ]
Starting xfs: [ OK ]
Starting npcd: NPCD started.
[ OK ]
Starting Avahi daemon... [ OK ]
Starting HAL daemon: [ OK ]
Starting monit: Starting monit daemon
Monit start delay set -- pause for 60s
[ OK ]
Starting ajaxterm: AjaxTerm at http://localhost:8022/ pid: 2867
[ OK ]
Starting nagios: done.
Starting nagiosxi: [ OK ]
Starting ndo2db: done.

In /var/log/messages:
Aug 27 16:17:51 XXXXX nagios: Nagios 3.4.1 starting... (PID=2909)
Aug 27 16:17:51 XXXXX nagios: Local time is Mon Aug 27 16:17:51 UTC 2012
Aug 27 16:17:51 XXXXX nagios: LOG VERSION: 2.0
Aug 27 16:17:51 XXXXX nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Aug 27 16:17:51 XXXXX nagios: ndomod: Could not open data sink! I'll keep trying, but some output may get lost...
Aug 27 16:17:51 XXXXX nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[...]
Aug 27 16:17:55 XXXXX nagios: Warning: Host 'XXXXX ' has no default contacts or contactgroups defined!
(Quite a few of these show up.)
Aug 27 16:17:56 XXXXX nagios: Finished daemonizing... (New PID=3005)
Aug 27 16:22:02 XXXXX nagios: ndomod: Successfully connected to data sink. 3116 queued items to flush.
Aug 27 16:22:02 XXXXX ndo2db: Error: queue send error, retrying...
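ndo2db in ndoutils 1.5 (the version in the log above) is believed to pass data through a System V message queue, so a `queue send error` may point at the kernel's message-queue limits. That link is an assumption, not something confirmed in this thread, and not necessarily what the fix linked below covers. A minimal read-only sketch for checking the current limits on Linux:

```shell
#!/bin/sh
# Print the System V message-queue limits from /proc/sys/kernel.
# Assumes a Linux host with /proc mounted (e.g. the CentOS VM in this thread).
show_msgqueue_limits() {
    # msgmnb: max bytes per queue; msgmax: max bytes per message;
    # msgmni: max number of queues system-wide.
    for key in msgmnb msgmax msgmni; do
        printf 'kernel.%s = %s\n' "$key" "$(cat "/proc/sys/kernel/$key")"
    done
}

show_msgqueue_limits
```

`ipcs -q` lists the message queues that currently exist. Which limit values would be appropriate is not established here.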

Re: ndoutils or other problems

Posted: Mon Aug 27, 2012 12:33 pm
by mguthrie
Try the following fix for the ndoutils error:
http://support.nagios.com/wiki/index.ph ... 3.x_Issues

Re: ndoutils or other problems

Posted: Mon Aug 27, 2012 1:35 pm
by johndoe
Hi mguthrie,

I applied the fix you proposed and restarted the server; the status was the same as in the previous screenshots.
After that I stopped the nagios process and restarted it through the interface, which produced a lot of the following:
Aug 27 18:31:01 XXXX nagios: Warning: Could not stat() check result file '/usr/local/nagios/var/spool/checkresults/cBTVr91'.
(This line was repeated many times.)
Aug 27 18:31:02 XXXX nagios: Caught SIGTERM, shutting down...
Aug 27 18:31:02 XXXX nagios: Successfully shutdown... (PID=2998)
Aug 27 18:31:02 XXXX nagios: ndomod: Shutdown complete.
Aug 27 18:31:02 XXXX nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
It appears to run for a few seconds and then turns red again.

Re: ndoutils or other problems

Posted: Mon Aug 27, 2012 1:54 pm
by mguthrie
Can you post the permissions for the following directory and any of the files in it?

/usr/local/nagios/var/spool/checkresults/

They should generally all be owned by nagios.
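A minimal sketch for listing the files in that directory that are not owned by the expected user, with owner, group and octal mode. The defaults are the path and user from this thread. It relies on GNU find's `-printf`, which CentOS ships.

```shell
#!/bin/sh
# List regular files in a check-result directory NOT owned by the expected user.
# Output per line: owner:group mode filename
list_foreign_results() {
    dir=${1:-/usr/local/nagios/var/spool/checkresults}
    user=${2:-nagios}
    # -maxdepth 1: only the spool directory itself, no recursion.
    find "$dir" -maxdepth 1 -type f ! -user "$user" -printf '%u:%g %m %f\n'
}
```

For the directory itself, `stat -c '%U:%G %a %n' /usr/local/nagios/var/spool/checkresults` shows its owner, group and mode.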

Are you using any external event brokers like check_mk, DNX, or mod_gearman?
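The event brokers Nagios loads are the `broker_module=` lines in nagios.cfg. A minimal sketch that prints them together with `check_result_path`, the spool directory discussed here. The default config path is the one passed to `nagios -d` in the `ps` output later in this thread.

```shell
#!/bin/sh
# Print active broker_module and check_result_path directives from nagios.cfg.
show_brokers() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    # Anchored at line start, so commented-out (#...) directives are skipped.
    grep -E '^[[:space:]]*(broker_module|check_result_path)=' "$cfg"
}
```

A stock setup like the one in this thread would be expected to show only `ndomod.o`. Extra lines would indicate brokers such as check_mk, DNX or mod_gearman.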

Re: ndoutils or other problems

Posted: Mon Aug 27, 2012 2:58 pm
by johndoe
-rwxrwx--- 1 apache nagcmd 0 Jul 21 21:09 c0Uj7u5
-rwxrwx--- 1 apache nagcmd 0 Aug 6 16:34 c0UjBcs
-rwxrwx--- 1 apache nagcmd 0 Aug 6 19:30 c0UjFug
-rwxrwx--- 1 apache nagcmd 0 Aug 12 03:28 c0UjZNK
-rwxrwx--- 1 apache nagcmd 0 Aug 10 05:48 c0uk9sb
-rwxrwx--- 1 apache nagcmd 0 Aug 10 01:44 c0uKAbv
-rwxrwx--- 1 apache nagcmd 0 Aug 10 13:40 c0ukDML
-rwxrwx--- 1 apache nagcmd 0 Aug 6 07:27 c0ukfJf
-rwxrwx--- 1 apache nagcmd 0 Aug 10 05:10 c0UKHQU
# ls -lha /usr/local/nagios/var/spool/checkresults/ | wc -l
207231
Wow, that's a lot of files.
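`ls -lha | wc -l` also counts the `total` line and the `.` and `..` entries. The 207231 lines above therefore correspond to roughly 207228 directory entries, `.ok` markers included. A minimal sketch that counts regular files per owner instead, assuming GNU find:

```shell
#!/bin/sh
# Count regular files in the check-result spool, grouped by owning user.
count_by_owner() {
    dir=${1:-/usr/local/nagios/var/spool/checkresults}
    # %u prints the owner name; sort | uniq -c tallies each owner.
    find "$dir" -maxdepth 1 -type f -printf '%u\n' | sort | uniq -c
}
```

On this system the output would show how many of the files belong to `apache` versus `nagios`.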

Re: ndoutils or other problems

Posted: Mon Aug 27, 2012 5:00 pm
by scottwilkerson
Can you please post a copy of your nagios.cfg?

Re: ndoutils or other problems

Posted: Mon Aug 27, 2012 5:02 pm
by mguthrie
Hmm, apache should not own anything in that directory. You've probably got a nagios instance running as apache, which you don't want.


service nagios stop
killall -9 nagios
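# find avoids the "Argument list too long" error a globbed rm hits with ~200k files
find /usr/local/nagios/var/spool/checkresults -maxdepth 1 -type f -delete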
service nagios start
Then run:


ps aux | grep /bin/nagios
and make sure apache does not own the nagios process.
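`ps aux | grep /bin/nagios` also lists the `grep` process itself. A minimal sketch that reads `ps -eo user=,pid=,args=` output and prints only the user and PID of processes whose executable ends in `/bin/nagios`:

```shell
#!/bin/sh
# Filter "user pid args..." lines down to nagios daemon processes.
# Matching on the executable field ($3) means the awk itself never matches.
nagios_owners() {
    awk '$3 ~ /\/bin\/nagios$/ { print $1, $2 }'
}

# Usage: show any nagios process whose owner is not "nagios":
#   ps -eo user=,pid=,args= | nagios_owners | awk '$1 != "nagios"'
```

Empty output from the usage line means every running nagios process belongs to `nagios`.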

Re: ndoutils or other problems

Posted: Tue Aug 28, 2012 5:40 am
by johndoe
As you can see, the process is not owned by apache, but the files are still being created as apache:


-rw-r--r-- 1 apache apache    0 Aug 28 10:38 cgx1DKw.ok
-rwxrwx--- 1 apache nagcmd  339 Aug 28 10:38 cjevTvh
-rw-r--r-- 1 apache apache    0 Aug 28 10:38 cjevTvh.ok
-rwxrwx--- 1 apache nagcmd  330 Aug 28 10:38 cnTvg88
-rw-r--r-- 1 apache apache    0 Aug 28 10:38 cnTvg88.ok
-rwxrwx--- 1 apache nagcmd  292 Aug 28 10:38 cqRJODJ
-rw-r--r-- 1 apache apache    0 Aug 28 10:38 cqRJODJ.ok
-rwxrwx--- 1 apache nagcmd  292 Aug 28 10:38 cV2bIMf
-rw-r--r-- 1 apache apache    0 Aug 28 10:38 cV2bIMf.ok
-rwxrwx--- 1 apache nagcmd  284 Aug 28 10:38 cwyBMu6
-rw-r--r-- 1 apache apache    0 Aug 28 10:38 cwyBMu6.ok
-rwxrwx--- 1 apache nagcmd  284 Aug 28 10:38 cXAqOIg
-rw-r--r-- 1 apache apache    0 Aug 28 10:38 cXAqOIg.ok
[root@XXXX checkresults]# ps aux | grep /bin/nagios
nagios   12494  0.3  0.0  28984  2428 ?        Ssl  10:34   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     18831  0.0  0.0  61196   764 xvc0     S+   10:38   0:00 grep /bin/nagios
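Nagios 3 processes a check result file in `check_result_path` only once a companion `<name>.ok` file exists next to it. The `.ok` files in the listing above serve that purpose. Result files without one are skipped on each scan. A minimal sketch that lists result files lacking the marker, assuming the 7-character `c??????` names seen in this thread:

```shell
#!/bin/sh
# List check-result files that have no matching "<name>.ok" marker.
missing_ok_markers() {
    dir=${1:-/usr/local/nagios/var/spool/checkresults}
    # c?????? matches "c" plus 6 characters, so the .ok files are excluded.
    for f in "$dir"/c??????; do
        # If nothing matched, the pattern stays literal; skip it.
        [ -f "$f" ] || continue
        [ -e "$f.ok" ] || printf '%s\n' "${f##*/}"
    done
}
```

The glob expands inside the shell rather than being passed to another program, so it is not subject to the argument-length limit, though it is slow on a directory this size.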

Re: ndoutils or other problems

Posted: Tue Aug 28, 2012 9:24 am
by mguthrie
Do you use NRDP for passive checks at all?

Re: ndoutils or other problems

Posted: Tue Aug 28, 2012 9:25 am
by johndoe
Yes, 99% of our checks are NRDP-based.