Multiple Instances / Doubling in the messages log

nseltzer
Posts: 18
Joined: Tue Sep 11, 2012 12:10 pm
Location: Sidney, NE
Contact:

Post by nseltzer »

Those are the hung processes after Nagios was stopped. I did, however, take a look at each of the child servers, and they all appear to be synced via NTP.

Code: Select all

papmoncp00
Wed Jan 23 13:29:41 MST 2013

papmoncp01
Wed Jan 23 13:29:44 MST 2013

papmoncp02
Wed Jan 23 13:29:46 MST 2013

papmoncp03
Wed Jan 23 13:29:47 MST 2013

papmoncp04
Wed Jan 23 13:29:49 MST 2013

papmoncp05
Wed Jan 23 13:29:51 MST 2013

papmoncp06
Wed Jan 23 13:29:53 MST 2013

papmoncp07
Wed Jan 23 13:29:54 MST 2013
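A check like the one above can be reduced to a single number by collecting epoch timestamps and printing their spread. This is only a sketch: the function name `max_skew` is made up for illustration, and the usage comment assumes passwordless SSH to each host. Because the hosts are queried one after another, the spread also includes the time the loop itself takes, so a few seconds of difference does not necessarily indicate clock drift.

```shell
#!/bin/sh
# Print the spread (max - min) of epoch timestamps read from stdin, one per line.
# max_skew is a hypothetical helper name, not part of Nagios.
max_skew() {
    awk 'NR == 1 { min = $1; max = $1 }
         $1 < min { min = $1 }
         $1 > max { max = $1 }
         END { if (NR > 0) print max - min }'
}

# Example usage (assumes passwordless SSH to the hosts listed above):
#   for h in papmoncp00 papmoncp01 papmoncp02; do ssh "$h" date +%s; done | max_skew
```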
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Post by scottwilkerson »

It is strange that these are the hung processes, because one of them is the parent:

Code: Select all

nagios    4888     1  1 Jan22 ?        00:14:26 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Do you get errors when you run the following?

Code: Select all

service nagios stop
It could be timing out, in which case we may need to help you adjust the init script to allow a longer time period when stopping Nagios.
Former Nagios employee
nseltzer

Post by nseltzer »

We have seen "Warning - nagios did not exit in a timely manner". I compared the init script on the new server to the old server and it appears that they are identical. Please let me know what you would advise.

Thanks!
scottwilkerson

Post by scottwilkerson »

Attached is an init file with line 160 set to allow 30 seconds to shut down (instead of the default 10).
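For readers who cannot see the attachment, the general shape of such a stop timeout is a polling loop that gives up after a fixed number of seconds. The following is a rough sketch under that assumption, not the contents of the attached init script; `wait_until_gone` is a hypothetical name and the check command is supplied by the caller.

```shell
#!/bin/sh
# Poll once per second until CHECK_COMMAND fails (process gone) or LIMIT seconds pass.
# Usage: wait_until_gone LIMIT CHECK_COMMAND [ARGS...]
# Hypothetical helper for illustration; not taken from the Nagios init script.
wait_until_gone() {
    limit=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        if ! "$@" >/dev/null 2>&1; then
            return 0    # check failed, so the process is no longer running
        fi
        printf '.'
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One last check after the final sleep.
    if "$@" >/dev/null 2>&1; then
        echo
        echo 'Warning - nagios did not exit in a timely manner'
        return 1
    fi
    return 0
}

# Example usage (pid is assumed to hold the PID of the Nagios parent process):
#   wait_until_gone 30 kill -0 "$pid"
```

Raising the limit only helps if the process is slow to exit; a process that is genuinely hung will still trigger the warning after the longer wait.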
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Post by mguthrie »

Also, can you verify that your RAM disk has enough space left on it, and that all of the directories on it are owned by nagios:nagios?
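One way to check both points from a shell is sketched below. The helper name `dirs_not_owned_by` is made up for illustration, and the mount point in the usage comment is the one that appears elsewhere in this thread; adjust it if your RAM disk lives somewhere else.

```shell
#!/bin/sh
# List directories under DIR that are not owned by USER:GROUP.
# Prints nothing when every directory has the expected owner and group.
# dirs_not_owned_by is a hypothetical helper name.
dirs_not_owned_by() {
    dir=$1
    user=$2
    group=$3
    find "$dir" -type d \( ! -user "$user" -o ! -group "$group" \) -print
}

# Example usage:
#   df -h /var/nagiosramdisk
#   dirs_not_owned_by /var/nagiosramdisk nagios nagios
```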
nseltzer

Post by nseltzer »

Good morning,

Code: Select all

$ df -h
...snip...
tmpfs                 1.0G   14M 1011M   2% /var/nagiosramdisk
...snip...
I've made the change to the Nagios init script to allow for 30 seconds instead of ten when shutting down. I will restart the Nagios services. I will update the thread accordingly.
nseltzer

Post by nseltzer »

All,

We're still having issues with forked processes not falling off in a timely manner.

Code: Select all

$ date
Thu Jan 24 09:40:08 MST 2013

Every 1.0s: ps -ef | grep bin/nagios | grep -v grep                                                                                   
nagios    7149 11171  0 09:25 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   11171     1  5 08:38 ?        00:03:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15360 11171  0 09:38 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15902 11171  0 09:39 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26750 11171  0 09:05 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
PIDs 26750 and 7149 have been hanging around for a while, and they don't appear to be going anywhere.
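Rather than eyeballing the start times, the long-lived forks can be picked out by elapsed time. This is a sketch only: `stale_children` is a hypothetical name, and the usage comment relies on the `etimes` output field, which is an assumption about the installed `ps` (older procps releases may not support it).

```shell
#!/bin/sh
# Read lines of "PID PPID ELAPSED_SECONDS ..." on stdin and print the PIDs of
# children of PARENT that have been running for at least MIN_AGE seconds.
# Usage: stale_children PARENT MIN_AGE
# stale_children is a hypothetical helper name.
stale_children() {
    awk -v parent="$1" -v min_age="$2" \
        '$2 == parent && $3 >= min_age { print $1 }'
}

# Example usage (11171 is the parent PID from the listing above; 600 = 10 minutes):
#   ps -e -o pid= -o ppid= -o etimes= -o args= | grep '[b]in/nagios' | stale_children 11171 600
```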

I've attached a copy of the current Nagios init file we're using for your information.
kdavison
Posts: 3
Joined: Tue May 08, 2012 10:23 am

Post by kdavison »

Newer profile.txt
nseltzer

Post by nseltzer »

My boss, kdavison, has posted a profile from the XI interface. This profile is from when Nagios is in a "stalled" state. The process is still running, but the scheduler has stopped processing External Commands and all of the forks are in a frozen state.

Code: Select all

Every 1.0s: ps -ef | grep bin/nagios | grep -v grep                                                                                   Thu Jan 24 10:55:21 2013

nagios    1053 11171  0 10:12 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    7149 11171  0 09:25 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    8273 11171  0 10:25 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    9500 11171  0 10:28 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   11171     1  4 08:38 ?        00:05:53 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   20715 11171  0 09:48 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26750 11171  0 09:05 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26909 11171  0 10:00 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Stopping Nagios resulted in the following:

Code: Select all

$ for i in nagiosxi npcd ndo2db nagios;do sudo /sbin/service $i stop;done
NPCD Stopped.
Stopping ndo2db: done.
Stopping nagios: ..............................
Warning - nagios did not exit in a timely manner
mguthrie

Post by mguthrie »

I've only seen something like this one other time, and I'm not sure whether the issue is related, but let's try the following commands:

Code: Select all

service nagios stop
killall -9 nagios

rm -f /usr/local/nagios/var/retention.dat

service nagios start
This will start Nagios with everything in a pending state until results come in, but I'd like to rule out a retention issue as a possibility.
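If you would rather keep the old retention data around for later inspection, a variation is to move the file aside instead of deleting it. This is a sketch, not part of the suggested procedure; `set_aside` is a hypothetical helper, and Nagios should still start with everything pending because the file is no longer at its original path.

```shell
#!/bin/sh
# Move FILE to FILE.bak.<epoch seconds> instead of deleting it.
# Succeeds silently if FILE does not exist.
# set_aside is a hypothetical helper name.
set_aside() {
    file=$1
    [ -e "$file" ] || return 0
    mv -- "$file" "$file.bak.$(date +%s)"
}

# Example usage, in place of the rm step above:
#   service nagios stop
#   killall -9 nagios
#   set_aside /usr/local/nagios/var/retention.dat
#   service nagios start
```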