Re: Multiple Instances / Doubling in the messages log
Posted: Wed Jan 23, 2013 3:33 pm
by nseltzer
Those are the hung processes after Nagios was stopped. I did, however, take a look at each of the child servers, and they all appear to be synced via NTP.
Code: Select all
papmoncp00
Wed Jan 23 13:29:41 MST 2013
papmoncp01
Wed Jan 23 13:29:44 MST 2013
papmoncp02
Wed Jan 23 13:29:46 MST 2013
papmoncp03
Wed Jan 23 13:29:47 MST 2013
papmoncp04
Wed Jan 23 13:29:49 MST 2013
papmoncp05
Wed Jan 23 13:29:51 MST 2013
papmoncp06
Wed Jan 23 13:29:53 MST 2013
papmoncp07
Wed Jan 23 13:29:54 MST 2013
Re: Multiple Instances / Doubling in the messages log
Posted: Wed Jan 23, 2013 3:38 pm
by scottwilkerson
It is strange that these are the hung processes, because one of them is the parent:
Code: Select all
nagios 4888 1 1 Jan22 ? 00:14:26 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Do you get errors when you run
It could be timing out, in which case we may need to help you adjust the init script to allow a longer timeout when stopping Nagios.
Re: Multiple Instances / Doubling in the messages log
Posted: Wed Jan 23, 2013 3:47 pm
by nseltzer
We have seen "Warning - nagios did not exit in a timely manner". I compared the init script on the new server to the old server and it appears that they are identical. Please let me know what you would advise.
Thanks!
Re: Multiple Instances / Doubling in the messages log
Posted: Wed Jan 23, 2013 5:59 pm
by scottwilkerson
Attached is an init file with line 160 set to allow 30 seconds to shut down (instead of the default 10).
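For readers without the attachment, the idea of a stop timeout can be sketched roughly as below. This is a minimal sketch, not the actual Nagios init script: the function name `wait_for_exit` and its arguments are hypothetical, and the real script may structure its loop differently.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the Nagios init script):
# wait up to $2 seconds for the process with PID $1 to exit.
# Returns 0 if the process is gone, 1 if it is still alive after the timeout.
wait_for_exit() {
    pid=$1
    timeout=$2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # kill -0 sends no signal; it only tests whether the PID exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Final check after the last sleep.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

A caller could then run something like `wait_for_exit "$pid" 30 || echo "Warning - nagios did not exit in a timely manner"`, where `$pid` is assumed to hold the PID of the Nagios parent process.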
Re: Multiple Instances / Doubling in the messages log
Posted: Wed Jan 23, 2013 6:00 pm
by mguthrie
Also, can you verify that your RAM disk has enough space left on it, and that all of the directories on it are owned by nagios:nagios?
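Both checks can be scripted. The following is a minimal sketch, assuming a POSIX shell with `find`, `df`, and `awk` available; the helper names `check_owner` and `avail_kb` are hypothetical, and the RAM disk path must be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: list everything under directory $1 that is NOT owned
# by user $2 and group $3. Prints nothing when ownership is consistent.
check_owner() {
    dir=$1
    user=$2
    group=$3
    find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}

# Hypothetical helper: print the available space, in 1K blocks, on the
# filesystem that holds path $1.
avail_kb() {
    # -P forces the POSIX one-line-per-filesystem format;
    # column 4 of the second line is the "Available" figure.
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}
```

For example, `check_owner /var/nagiosramdisk nagios nagios` should print nothing if ownership is correct, and `avail_kb /var/nagiosramdisk` reports the free space.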
Re: Multiple Instances / Doubling in the messages log
Posted: Thu Jan 24, 2013 10:21 am
by nseltzer
Good morning,
Code: Select all
$ df -h
...snip...
tmpfs 1.0G 14M 1011M 2% /var/nagiosramdisk
...snip...
I've changed the Nagios init script to allow 30 seconds instead of 10 when shutting down. I will restart the Nagios services and update the thread accordingly.
Re: Multiple Instances / Doubling in the messages log
Posted: Thu Jan 24, 2013 11:44 am
by nseltzer
All,
We're still having issues with forked processes not exiting in a timely manner.
Code: Select all
$ date
Thu Jan 24 09:40:08 MST 2013
Every 1.0s: ps -ef | grep bin/nagios | grep -v grep
nagios 7149 11171 0 09:25 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 11171 1 5 08:38 ? 00:03:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 15360 11171 0 09:38 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 15902 11171 0 09:39 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 26750 11171 0 09:05 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
PIDs 26750 and 7149 have been hanging around for a while, and they don't appear to be going anywhere.
I've attached a copy of the current Nagios init file we're using for your information.
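To pick out the forked children and their start times from listings like the one above, a small filter along these lines may help. It is only a sketch: the helper name `children_of` is hypothetical, and it assumes the usual `ps -ef` column order (UID, PID, PPID, C, STIME, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` style lines on stdin and print the PID
# and start time (STIME) of every process whose parent PID equals $1.
children_of() {
    awk -v ppid="$1" '$3 == ppid { print $2, $5 }'
}
```

For example, `ps -ef | grep bin/nagios | grep -v grep | children_of 11171` lists only the forks of parent 11171. Note that `ps -ef` shows a date rather than a time in STIME for processes started on an earlier day.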
Re: Multiple Instances / Doubling in the messages log
Posted: Thu Jan 24, 2013 12:54 pm
by kdavison
Newer profile.txt
Re: Multiple Instances / Doubling in the messages log
Posted: Thu Jan 24, 2013 12:55 pm
by nseltzer
My boss, kdavison, has posted a profile from the XI interface. This profile is from when Nagios is in a "stalled" state. The process is still running, but the scheduler has stopped processing External Commands and all of the forks are in a frozen state.
Code: Select all
Every 1.0s: ps -ef | grep bin/nagios | grep -v grep Thu Jan 24 10:55:21 2013
nagios 1053 11171 0 10:12 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 7149 11171 0 09:25 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8273 11171 0 10:25 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 9500 11171 0 10:28 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 11171 1 4 08:38 ? 00:05:53 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 20715 11171 0 09:48 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 26750 11171 0 09:05 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 26909 11171 0 10:00 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Stopping Nagios resulted in the following:
Code: Select all
$ for i in nagiosxi npcd ndo2db nagios;do sudo /sbin/service $i stop;done
NPCD Stopped.
Stopping ndo2db: done.
Stopping nagios: ..............................
Warning - nagios did not exit in a timely manner
Re: Multiple Instances / Doubling in the messages log
Posted: Thu Jan 24, 2013 5:05 pm
by mguthrie
I've only seen something like this one other time. I'm not sure whether the issue is related, but let's try the following commands:
Code: Select all
service nagios stop
killall -9 nagios
rm -f /usr/local/nagios/var/retention.dat
service nagios start
This will start Nagios with everything in a pending state until results come in, but I'd like to rule out a retention issue as a possibility.