Multiple Instances / Doubling in the messages log

nseltzer · Post by **nseltzer** » Wed Jan 23, 2013 3:33 pm

Those are the hung processes after Nagios was stopped. I did, however, take a look at each of the children servers and they all appear to be synced via NTP.

Code: Select all

papmoncp00
Wed Jan 23 13:29:41 MST 2013

papmoncp01
Wed Jan 23 13:29:44 MST 2013

papmoncp02
Wed Jan 23 13:29:46 MST 2013

papmoncp03
Wed Jan 23 13:29:47 MST 2013

papmoncp04
Wed Jan 23 13:29:49 MST 2013

papmoncp05
Wed Jan 23 13:29:51 MST 2013

papmoncp06
Wed Jan 23 13:29:53 MST 2013

papmoncp07
Wed Jan 23 13:29:54 MST 2013

scottwilkerson · Post by **scottwilkerson** » Wed Jan 23, 2013 3:38 pm

It is strange that these are the hung processes, because one of them is the parent:

Code: Select all

nagios    4888     1  1 Jan22 ?        00:14:26 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Do you get errors when you run:

Code: Select all

service nagios stop

It could be timing out, in which case we may need to help you adjust the init script to allow a longer timeout when stopping Nagios.
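To make the idea of a stop timeout concrete, here is a minimal sketch of a wait loop that polls a PID once per second and gives up after a configurable number of seconds. The helper name `wait_for_exit` is hypothetical and this is not the code from the stock Nagios init script; it only illustrates the kind of loop whose limit would be raised.

```shell
#!/bin/sh
# Sketch only: wait_for_exit is a hypothetical helper, not part of the
# stock Nagios init script.
# Usage: wait_for_exit PID TIMEOUT_SECONDS
# Returns 0 once the PID is gone, 1 if it is still present after the timeout.
wait_for_exit() {
    pid=$1
    timeout=$2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "Warning - process $pid did not exit within ${timeout}s" >&2
    return 1
}
```

For example, after sending a normal `kill 4888` to the parent shown above, `wait_for_exit 4888 30` would wait up to 30 seconds before reporting a failure.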

nseltzer · Post by **nseltzer** » Wed Jan 23, 2013 3:47 pm

We have seen "Warning - nagios did not exit in a timely manner". I compared the init script on the new server to the old server and it appears that they are identical. Please let me know what you would advise.

Thanks!

scottwilkerson · Post by **scottwilkerson** » Wed Jan 23, 2013 5:59 pm

Attached is an init file with line 160 set to allow 30 seconds to shut down (instead of the default 10).

mguthrie · Post by **mguthrie** » Wed Jan 23, 2013 6:00 pm

Also, can you verify that your RAM disk has enough space left on it, and that all of the directories on it are owned nagios:nagios?
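One possible way to run both checks is sketched below. The helper name `list_bad_owner` is hypothetical, and the mount point in the commented usage lines is an assumed path; substitute wherever your RAM disk is actually mounted.

```shell
#!/bin/sh
# Sketch only: list_bad_owner is a hypothetical helper.
# Usage: list_bad_owner DIR [USER] [GROUP]
# Prints every directory under DIR that is NOT owned by USER:GROUP
# (defaults: nagios:nagios). No output means ownership looks correct.
list_bad_owner() {
    dir=$1
    owner=${2:-nagios}
    group=${3:-nagios}
    find "$dir" -type d \( ! -user "$owner" -o ! -group "$group" \) -print
}

# Example usage (assumed mount point, adjust to your system):
#   df -h /var/nagiosramdisk            # free space on the RAM disk
#   list_bad_owner /var/nagiosramdisk   # directories with the wrong owner
```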

nseltzer · Post by **nseltzer** » Thu Jan 24, 2013 10:21 am

Good morning,

Code: Select all

$ df -h
...snip...
tmpfs                 1.0G   14M 1011M   2% /var/nagiosramdisk
...snip...

I've made the change to the Nagios init script to allow for 30 seconds instead of 10 when shutting down. I will restart the Nagios services and update the thread accordingly.

nseltzer · Post by **nseltzer** » Thu Jan 24, 2013 11:44 am

All,

We're still having issues with forked processes not falling off in a timely manner.

Code: Select all

$ date
Thu Jan 24 09:40:08 MST 2013

Every 1.0s: ps -ef | grep bin/nagios | grep -v grep                                                                                   
nagios    7149 11171  0 09:25 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   11171     1  5 08:38 ?        00:03:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15360 11171  0 09:38 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15902 11171  0 09:39 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26750 11171  0 09:05 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

PIDs 26750 and 7149 have been hanging around for a while and they don't appear to be going anywhere.
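For anyone who wants to pick out lingering forks without watching the list by eye, a rough sketch follows. It filters the children of a given parent by elapsed run time, using only the POSIX `ps` output fields `pid`, `ppid`, and `etime`. Both helper names (`etime_to_seconds`, `old_children`) are hypothetical, and the 600-second threshold in the usage note is an arbitrary example value.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names.

# Convert a ps "etime" value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    echo "$1" | awk '{
        days = 0
        t = $1
        if (index(t, "-") > 0) {        # optional "dd-" prefix
            split(t, d, "-")
            days = d[1]
            t = d[2]
        }
        n = split(t, p, ":")
        if (n == 3) secs = p[1] * 3600 + p[2] * 60 + p[3]
        else        secs = p[1] * 60 + p[2]
        print days * 86400 + secs
    }'
}

# Usage: old_children PARENT_PID MIN_AGE_SECONDS
# Prints "PID AGE_SECONDS" for each child of PARENT_PID at least that old.
old_children() {
    parent=$1
    min_age=$2
    ps -eo pid=,ppid=,etime= | while read -r pid ppid etime; do
        [ "$ppid" = "$parent" ] || continue
        age=$(etime_to_seconds "$etime")
        if [ "$age" -ge "$min_age" ]; then
            echo "$pid $age"
        fi
    done
}
```

With the listing above, `old_children 11171 600` would report the forks of parent 11171 that have been alive for ten minutes or more.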

I've attached a copy of the current Nagios init file we're using for your information.

kdavison · Post by **kdavison** » Thu Jan 24, 2013 12:54 pm

Newer profile.txt

nseltzer · Post by **nseltzer** » Thu Jan 24, 2013 12:55 pm

My boss, kdavison, has posted a profile from the XI interface. This profile is from when Nagios is in a "stalled" state. The process is still running, but the scheduler has stopped processing External Commands and all of the forks are in a frozen state.

Code: Select all

Every 1.0s: ps -ef | grep bin/nagios | grep -v grep                                                                                   Thu Jan 24 10:55:21 2013

nagios    1053 11171  0 10:12 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    7149 11171  0 09:25 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    8273 11171  0 10:25 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    9500 11171  0 10:28 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   11171     1  4 08:38 ?        00:05:53 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   20715 11171  0 09:48 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26750 11171  0 09:05 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26909 11171  0 10:00 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Stopping Nagios resulted in the following:

Code: Select all

$ for i in nagiosxi npcd ndo2db nagios;do sudo /sbin/service $i stop;done
NPCD Stopped.
Stopping ndo2db: done.
Stopping nagios: ..............................
Warning - nagios did not exit in a timely manner

mguthrie · Post by **mguthrie** » Thu Jan 24, 2013 5:05 pm

I've only seen something like this one other time, and I'm not sure whether the issue is related, but let's try the following commands:

Code: Select all

service nagios stop
killall -9 nagios

rm -f /usr/local/nagios/var/retention.dat

service nagios start

This will start Nagios with everything in a pending state until results come in, but I'd like to rule out a retention issue as a possibility.
