Multiple Instances / Doubling in the messages log

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
nseltzer
Posts: 18
Joined: Tue Sep 11, 2012 12:10 pm
Location: Sidney, NE

Multiple Instances / Doubling in the messages log

Post by nseltzer »

Hello,

I'm working on a new Nagios install where we moved our backup from one server and restored it on another. We have been putting out fires along the way, and the most recent one has me stumped. Nagios spawns an ever-increasing number of Nagios processes over time until the box succumbs to the load. The highest I've seen is 222 instances. Below is a sample of the process list:

Code:

$ ps aux | grep nagios.cfg | grep -v grep
nagios    1497  0.0  0.1 156352 39852 ?        S    14:35   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1500  0.0  0.1 156352 39852 ?        S    14:35   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1808  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1836  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1837  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1838  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1840  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1843  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1845  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1847  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1849  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1851  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1852  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1855  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1857  0.0  0.1 156352 40596 ?        S    15:39   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    4888  5.5  0.1 156340 41572 ?        Ssl  13:42   6:31 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15966  0.0  0.1 156352 38660 ?        S    14:00   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30058  0.0  0.1 156344 40412 ?        S    15:30   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30207  0.0  0.1 156352 39944 ?        S    14:28   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   31757  0.0  0.1 156352 39932 ?        S    14:31   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
There is also a strange quirk: every Nagios-related entry in /var/log/messages is logged twice.

Code:

Jan 22 15:45:17 servername nagios: SERVICE ALERT: XXXXXXX;Portal;CRITICAL;SOFT;2;No process matching Portal.exe found : CRITICAL
Jan 22 15:45:17 servername nagios: SERVICE ALERT: XXXXXXX;Portal;CRITICAL;SOFT;2;No process matching Portal.exe found : CRITICAL
Stopping the Nagios service, kill -9'ing all of the nagios processes, and restarting doesn't resolve the issue (http://support.nagios.com/forum/viewtop ... t=multiple). We're using gearmand to distribute checks to several child servers.
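
To gauge how widespread the doubling in /var/log/messages is, a quick count of back-to-back identical Nagios lines can help. This is only a sketch: the function name count_doubled is made up for illustration, and it assumes the syslog tag appears as " nagios: " as in the sample lines above.

```shell
# count_doubled FILE
# Prints the number of distinct Nagios syslog lines in FILE that are
# immediately followed by at least one identical copy.
# (count_doubled is a hypothetical helper name, not part of Nagios.)
count_doubled() {
    # Keep only lines tagged "nagios:", let uniq -d emit one line per
    # group of adjacent duplicates, then count those groups.
    grep ' nagios: ' "$1" | uniq -d | wc -l | tr -d ' '
}

# Example (run as a user who can read the log):
#   count_doubled /var/log/messages
```

If the result is close to half the total number of Nagios lines, essentially every entry is being written twice rather than only some of them.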

Any advice on how to proceed with troubleshooting would be appreciated.

Thanks,



nseltzer
nseltzer

Re: Multiple Instances / Doubling in the messages log

Post by nseltzer »

Because I read after the fact:

Code:

$ uname -a
Linux servername.something.woo 2.6.18-308.24.1.el5 #1 SMP Wed Nov 21 11:42:14 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
Manual or VM install? Manual XI install
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Multiple Instances / Doubling in the messages log

Post by scottwilkerson »

What version of Nagios XI and Core are you running?

Also, can you run

Code:

ps -ef|grep bin/nagios | grep -v grep
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
nseltzer

Re: Multiple Instances / Doubling in the messages log

Post by nseltzer »

scottwilkerson wrote:What version of Nagios XI and Core are you running?
Nagios XI 2011R3.3
Nagios® Core Version 3.4.1
scottwilkerson wrote:Also, can you run

Code:

ps -ef|grep bin/nagios | grep -v grep
Sure thing. Here's the output:

Code:

nagios    1497  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1500  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    4307  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    4888     1  1 Jan22 ?        00:14:26 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    7082  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    8796  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   10216  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   14097  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   15966  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   20191  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   20772  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   27569  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30058  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30207  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   31757  4888  0 Jan22 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
kdavison
Posts: 3
Joined: Tue May 08, 2012 10:23 am

Re: Multiple Instances / Doubling in the messages log

Post by kdavison »

Attaching profile.txt
scottwilkerson

Re: Multiple Instances / Doubling in the messages log

Post by scottwilkerson »

Looking at the output of your

Code:

ps -ef|grep bin/nagios | grep -v grep
This is just one parent process 4888 with several child processes. This would be considered normal behavior as nagios will fork itself to run checks.

It does seem a bit odd to have so many forks if gearmand is set up and running properly.
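
To confirm that every extra process really hangs off a single parent, the ps -ef output can be grouped by its PPID column. The following is a minimal sketch; children_per_parent is a hypothetical helper name, and it assumes the standard ps -ef column order (UID, PID, PPID, ...).

```shell
# children_per_parent
# Reads `ps -ef` style lines on stdin and prints "PPID COUNT" for each
# parent PID, sorted numerically by PPID.
# (children_per_parent is a hypothetical helper name.)
children_per_parent() {
    # Column 3 of `ps -ef` is the parent PID; tally one per line.
    awk '{ n[$3]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# Example:
#   ps -ef | grep bin/nagios | grep -v grep | children_per_parent
```

With the listing posted above, this would show one line for parent 1 (the daemon itself) and one line for parent 4888 with all of the forks. More than one busy parent would point to several independent daemons instead.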

One thing I noticed that is out of the ordinary in the system profile you posted was:
No lock file found in /usr/local/nagios/var/nagios.lock
Are we starting nagios with the standard service command?

Code:

service nagios start
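
A quick way to check the lock file by hand is sketched below. It assumes the lock file holds the daemon's PID on its first line; lock_status is a hypothetical helper name, and the /proc fallback is Linux-specific.

```shell
# lock_status LOCKFILE
# Prints "missing", "invalid", "stale PID" or "running PID" depending on
# whether LOCKFILE exists and whether the PID stored in it is alive.
# (lock_status is a hypothetical helper name, not a Nagios command.)
lock_status() {
    lockfile=$1
    if [ ! -r "$lockfile" ]; then
        echo "missing"
        return 1
    fi
    # Assumption: the first line of the lock file is the daemon PID.
    pid=$(head -n 1 "$lockfile" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    # kill -0 only tests for existence; fall back to /proc in case the
    # process belongs to another user and kill -0 is refused.
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        echo "running $pid"
    else
        echo "stale $pid"
        return 1
    fi
}

# Example:
#   lock_status /usr/local/nagios/var/nagios.lock
```

A "missing" result while the daemon is running, or a "stale" PID, would suggest that Nagios was started or stopped outside the init script at some point.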
nseltzer

Re: Multiple Instances / Doubling in the messages log

Post by nseltzer »

Thanks for the quick response.

Yes, we are starting Nagios using /sbin/service nagios start. However, the service was not running at the time the profile was taken.

The Nagios processes that are created, the "forks", don't go away until either killed via "kill -9" or a reboot of the machine. After a seemingly random period of time, Nagios will simply stop executing checks.
scottwilkerson

Re: Multiple Instances / Doubling in the messages log

Post by scottwilkerson »

Can you post your nagios.cfg?
nseltzer

Re: Multiple Instances / Doubling in the messages log

Post by nseltzer »

Here ya go!
scottwilkerson

Re: Multiple Instances / Doubling in the messages log

Post by scottwilkerson »

This looks OK...

Looking back at
http://support.nagios.com/forum/posting ... 00#pr42866

I noticed you ran this on Jan 23, but the start date for all of these processes is Jan 22.

This may or may not be related, but is the date on the XI server current and in sync with your mod_gearman servers?
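
One rough way to compare clocks is to collect epoch seconds from each machine and look at the difference. This is a sketch only: clock_skew is a hypothetical helper, worker1 is a placeholder hostname, and date +%s assumes GNU date.

```shell
# clock_skew LOCAL_EPOCH REMOTE_EPOCH
# Prints the absolute difference between two epoch timestamps in seconds.
# (clock_skew is a hypothetical helper name.)
clock_skew() {
    d=$(( $1 - $2 ))
    # Flip the sign so the result is always non-negative.
    if [ "$d" -lt 0 ]; then
        d=$(( -d ))
    fi
    echo "$d"
}

# Example (worker1 is a placeholder for one of the mod_gearman servers):
#   clock_skew "$(date +%s)" "$(ssh worker1 date +%s)"
```

The ssh round trip adds a second or so of noise, so only a difference of many seconds or more would be meaningful here.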