
Multiple Instances / Doubling in the messages log

Posted: Tue Jan 22, 2013 5:49 pm
by nseltzer
Hello,

I'm working on a new Nagios install where we moved our backup from one server and restored it on another. We have been putting out fires along the way, and the most recent one has me stumped. Over time, Nagios spawns an ever-increasing number of instances until the box succumbs to the load. The highest I've seen is 222 instances. Below is the current process list:

Code:

$ ps aux | grep nagios.cfg | grep -v grep
nagios    1497  0.0  0.1 156352 39852 ?        S    14:35   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1500  0.0  0.1 156352 39852 ?        S    14:35   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1808  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1836  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1837  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1838  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1840  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1843  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1845  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1847  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1849  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1851  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1852  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1855  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1857  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    4888  5.5  0.1 156340 41572 ?        Ssl  13:42   6:31 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15966  0.0  0.1 156352 38660 ?        S    14:00   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30058  0.0  0.1 156344 40412 ?        S    15:30   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30207  0.0  0.1 156352 39944 ?        S    14:28   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   31757  0.0  0.1 156352 39932 ?        S    14:31   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
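To confirm that the instance count really climbs over time (rather than fluctuating as checks run), a periodic count can be logged. A minimal sketch: the function names, the 60-second interval, and the log path are arbitrary choices, not anything from Nagios itself. The pattern requires bin/nagios in the first word of the command line, so the grep process is not counted, which is the job `grep -v grep` does in the commands used in this thread.

```shell
#!/bin/sh
# Count Nagios daemon processes in `ps -eo args` output read from stdin.
# The pattern requires bin/nagios in the first word of the command line,
# so a grep whose arguments mention bin/nagios is not counted.
count_nagios() {
    grep -c '^[^ ]*bin/nagios -d'
}

# Append a timestamped count to a log file every 60 seconds (runs until
# interrupted). The default log path is a hypothetical example.
log_nagios_count() {
    log=${1:-/tmp/nagios_count.log}
    while :; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') $(ps -eo args | count_nagios)" >> "$log"
        sleep 60
    done
}

# Usage: log_nagios_count /tmp/nagios_count.log &
```

A steadily rising number in the log points to forks that never exit; a number that rises and falls points to normal check activity.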
There is also a strange quirk where /var/log/messages shows every Nagios-related entry twice:

Code:

Jan 22 15:45:17 servername nagios: SERVICE ALERT: XXXXXXX;Portal;CRITICAL;SOFT;2;No process matching Portal.exe found : CRITICAL
Jan 22 15:45:17 servername nagios: SERVICE ALERT: XXXXXXX;Portal;CRITICAL;SOFT;2;No process matching Portal.exe found : CRITICAL
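To gauge how widespread the doubling is, the Nagios entries in the syslog file can be collapsed with `uniq -c` and only the repeated ones kept. A minimal sketch, assuming entries carry the ` nagios: ` tag exactly as in the excerpt above; the function name is hypothetical, and `uniq` only catches copies that sit on adjacent lines, as they do here.

```shell
#!/bin/sh
# Print Nagios syslog entries that appear more than once in a row,
# prefixed with their repeat count. Reads syslog-format lines on stdin.
doubled_nagios_entries() {
    grep ' nagios: ' | uniq -c | awk '$1 > 1'
}

# Usage (as root): doubled_nagios_entries < /var/log/messages | head
```

If every Nagios line shows a count of 2 while other daemons' lines do not, the duplication is specific to how Nagios messages reach syslog rather than to syslog as a whole.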
Stopping the Nagios service, running kill -9 on all of the Nagios processes, and restarting doesn't resolve the issue (http://support.nagios.com/forum/viewtop ... t=multiple). We're using gearmand to distribute checks to several child servers.

Any advice on how to proceed with troubleshooting would be appreciated.

Thanks,



nseltzer

Re: Multiple Instances / Doubling in the messages log

Posted: Tue Jan 22, 2013 6:05 pm
by nseltzer
I read after the fact that these details should be included:

Code:

$ uname -a
Linux servername.something.woo 2.6.18-308.24.1.el5 #1 SMP Wed Nov 21 11:42:14 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
Manual or VM install? Manual XI install

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Jan 23, 2013 10:19 am
by scottwilkerson
What version of Nagios XI and Core are you running?

Also, can you run

Code:

ps -ef|grep bin/nagios | grep -v grep

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Jan 23, 2013 10:35 am
by nseltzer
scottwilkerson wrote: What version of Nagios XI and Core are you running?
Nagios XI 2011R3.3
Nagios® Core Version 3.4.1
scottwilkerson wrote: Also, can you run

Code:

ps -ef|grep bin/nagios | grep -v grep
Sure thing. Here's the output:

Code:

nagios    1497  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1500  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    4307  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    4888     1  1 Jan22 ?        00:14:26 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    7082  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    8796  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   10216  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   14097  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15966  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   20191  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   20772  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   27569  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30058  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30207  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   31757  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Jan 23, 2013 10:46 am
by kdavison
Attaching profile.txt

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Jan 23, 2013 11:20 am
by scottwilkerson
Looking at your

Code:

ps -ef|grep bin/nagios | grep -v grep
This is just one parent process (4888) with several child processes. This is considered normal behavior, as Nagios forks itself to run checks.
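The parent/child reading of that listing can be checked mechanically by grouping the Nagios processes by parent PID. A minimal sketch, assuming the binary path ends in bin/nagios as in the listings in this thread; the function name is hypothetical.

```shell
#!/bin/sh
# Summarize Nagios processes by parent PID ("PPID count" per line).
# Input: output of `ps -eo pid,ppid,args` on stdin. Field 3 is the
# executable path; the header line ("COMMAND") does not match.
count_nagios_children() {
    awk '$3 ~ /bin\/nagios$/ { n[$2]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# Usage on a live system:
# ps -eo pid,ppid,args | count_nagios_children
```

Fed the second listing above, this would print `1 1` (the daemon 4888, parented by init) and `4888 14` (its fourteen forks). A single line with PPID 1 is the healthy case; several would mean genuinely separate Nagios instances.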

It does seem a bit odd to have so many forks if gearmand is set up and running properly.

One out-of-the-ordinary thing I noticed in the system profile you posted was:
No lock file found in /usr/local/nagios/var/nagios.lock
Are we starting Nagios with the standard service command?

Code:

service nagios start
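The "No lock file found" message can be followed up by checking the lock file directly. Nagios writes the PID of the running daemon into the file named by `lock_file` in nagios.cfg; the path below is taken from the profile message and should be confirmed against that setting. A minimal sketch with a hypothetical function name; it uses /proc rather than `kill -0` so it also works when run as a user other than nagios.

```shell
#!/bin/sh
# Report whether the Nagios lock file exists and whether the PID it
# records is still alive. Default path matches the profile message;
# confirm it against lock_file= in nagios.cfg.
check_nagios_lock() {
    lock=${1:-/usr/local/nagios/var/nagios.lock}
    if [ ! -s "$lock" ]; then
        echo "missing"                       # absent or empty lock file
        return 1
    fi
    pid=$(head -n 1 "$lock" | tr -dc '0-9')  # keep digits only
    if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
        echo "running $pid"
    else
        echo "stale $pid"                    # recorded PID no longer exists
        return 1
    fi
}

# Usage: check_nagios_lock /usr/local/nagios/var/nagios.lock
```

While the service is up, "running" with the PID of the parent process (4888 in the listing above) is the expected result; "missing" while Nagios is running suggests it was not started through the init script.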

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Jan 23, 2013 11:38 am
by nseltzer
Thanks for the quick response.

Yes, we are starting Nagios using /sbin/service nagios start. However, the service was not running at the time the profile was generated.

The Nagios processes that get created (the "forks") don't go away until they are killed with "kill -9" or the machine is rebooted. After a seemingly random period of time, Nagios simply stops executing checks.
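If the forks really never exit, their age should show it. The sketch below lists Nagios child processes older than a threshold using the elapsed-time column of `ps`. Assumptions, all labeled: a check fork would normally finish within a few minutes, the parent is the process whose PPID is 1 (true for the daemonized 4888 above), the 300-second default is arbitrary, and the function name is hypothetical.

```shell
#!/bin/sh
# List Nagios child processes older than a threshold in seconds.
# Input: output of `ps -eo pid,ppid,etime,args` on stdin.
# etime has the form [[dd-]hh:]mm:ss; secs() converts it to seconds.
old_nagios_children() {
    awk -v max="${1:-300}" '
    function secs(t,   a, n, d) {
        d = 0
        if (index(t, "-")) { split(t, a, "-"); d = a[1]; t = a[2] }
        n = split(t, a, ":")
        return d * 86400 + (n == 3 ? a[1] * 3600 : 0) + a[n-1] * 60 + a[n]
    }
    # Skip the daemon itself (PPID 1); print PID and elapsed time.
    $4 ~ /bin\/nagios$/ && $2 != 1 && secs($3) > max + 0 { print $1, $3 }'
}

# Usage: ps -eo pid,ppid,etime,args | old_nagios_children 300
```

Children that have been sleeping for hours, as in the listings above, would all show up here, which helps separate stuck forks from short-lived check processes.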

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Jan 23, 2013 2:34 pm
by scottwilkerson
Can you post your nagios.cfg?

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Jan 23, 2013 2:42 pm
by nseltzer
Here ya go!

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Jan 23, 2013 3:23 pm
by scottwilkerson
This looks OK...

Looking back at
http://support.nagios.com/forum/posting ... 00#pr42866

I noticed you ran this on Jan 23, but the start dates for these processes are all Jan 22.

This may or may not be related, but is the date on the XI server current and in sync with your mod_gearman servers?
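One quick way to answer the clock question is to compare the XI server's Unix time with each worker's. A minimal sketch, assuming key-based SSH access to the workers; the worker host names are hypothetical placeholders, and the reported skew includes a little SSH round-trip latency.

```shell
#!/bin/sh
# Absolute difference in seconds between two Unix timestamps.
clock_skew() {
    d=$(( $1 - $2 ))
    [ "$d" -lt 0 ] && d=$((-d))
    echo "$d"
}

# Compare this server's clock with each mod_gearman worker given as an
# argument. Host names passed in are placeholders for your own workers.
check_workers() {
    for host in "$@"; do
        remote=$(ssh "$host" date +%s) || { echo "$host: unreachable"; continue; }
        echo "$host: $(clock_skew "$(date +%s)" "$remote")s skew"
    done
}

# Usage: check_workers worker1.example.com worker2.example.com
```

A skew of more than a few seconds on any worker would be worth fixing with NTP before chasing the process problem further.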