Bug with two nagios services starting

Posted: Wed Mar 23, 2011 10:58 am
by niebais
OS: Centos 5.5
Version: Nagios XI 2009R1.4B

Problem:
This bug is showing up frequently enough that I need to report it so it can be looked at. I consider it rather serious.

We have many people working on the system now and applying changes all the time. The problem now is that we keep getting duplicate nagios parents running on our system.

Duplication:
To reproduce it, all we need to do is change something inside Nagios and restart it. Then I run this command:
ps -ef | grep "/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg" | grep -v grep

It will display something like this:
nagios 27657 1 3 09:43 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
with a ton of children

and
nagios 1024 1 3 09:43 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
with its children

This causes the "process was orphaned" problem in the /usr/local/nagios/var/nagios.log file. Also, we noticed that we cannot view any notifications from yesterday (3-22 0:00:00 - 3-23 0:00:00), which we think might be related to this issue.

Any ideas on how I can stop the duplicate nagios service from appearing?

Re: Bug with two nagios services starting

Posted: Wed Mar 23, 2011 11:21 am
by niebais
Would this have anything to do with it? I stopped and started the nagios service with this command earlier this week (as root): service nagios restart.

Re: Bug with two nagios services starting

Posted: Wed Mar 23, 2011 12:58 pm
by tonyyarusso
niebais wrote:We have many people working on the system now
Could you define "many"? I'm curious whether there's a flaw in the process locking logic or some kind of race condition.
niebais wrote:Would this have anything to do with it? I stopped and started the nagios service with this command earlier this week (as root): service nagios restart.
No, that should honor the same locking system as everything else.

Re: Bug with two nagios services starting

Posted: Wed Mar 23, 2011 2:05 pm
by niebais
It's hard to tell, but during the day we probably have about 20 people in the system or more. Currently we have 80 users that regularly log in. 10 of those regularly make changes.

Re: Bug with two nagios services starting

Posted: Thu Mar 24, 2011 11:08 am
by mguthrie
We had that issue a while back, but we posted a fix and hadn't seen the issue for quite some time. Is this something that seems to be surfacing recently (since any particular upgrade)?

We'll do some investigating on this.

A system with that many active XI users is somewhat unique. As more of a favor, would you be willing to PM me with a system profile of your monitoring environment (hosts + services, hardware specs, any special system configurations)? We're trying to get a better sense of system capabilities and hardware requirements for XI.

Re: Bug with two nagios services starting

Posted: Thu Mar 24, 2011 12:20 pm
by niebais
mguthrie wrote:We had that issue a while back, but we posted a fix and hadn't seen the issue for quite some time. Is this something that seems to be surfacing recently (since any particular upgrade)?

We'll do some investigating on this.


A system with that many active XI users is somewhat unique. As more of a favor, would you be willing to PM me with a system profile of your monitoring environment (hosts + services, hardware specs, any special system configurations)? We're trying to get a better sense of system capabilities and hardware requirements for XI.
Yeah I can do that. What's the best way to get you that information?

Re: Bug with two nagios services starting

Posted: Thu Mar 24, 2011 12:33 pm
by mguthrie
Select the PM button next to my name to send me a personal message. Thanks!

Re: Bug with two nagios services starting

Posted: Wed Apr 06, 2011 11:13 am
by niebais
I'm still working on this one, but we created a nagios check to help us find out when this problem occurs.
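
For anyone who wants something similar, here is a minimal sketch of such a check. It is not the exact check from this thread. It assumes a "parent" is a nagios daemon whose PPID is 1, as in the ps output from the first post, and the function names are made up for illustration. The exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```shell
#!/bin/sh
# Sketch of a plugin that flags duplicate Nagios parent daemons.
# Assumption: a parent is a "nagios -d" process whose PPID is 1.
# Function names are hypothetical.

NAGIOS_CMD="/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg"

# Reads "PID PPID ARGS..." lines on stdin and prints how many of them
# are the Nagios daemon command line with a PPID of 1.
count_nagios_parents() {
    awk -v cmd="$NAGIOS_CMD" '
        {
            # Rebuild the command line from field 3 onward.
            args = $3
            for (i = 4; i <= NF; i++) args = args " " $i
            if ($2 == 1 && args == cmd) n++
        }
        END { print n + 0 }
    '
}

# Turns a parent count into plugin output and a Nagios return code.
report_nagios_parents() {
    count=$1
    if [ "$count" -eq 1 ]; then
        echo "OK - 1 nagios parent process"
        return 0
    elif [ "$count" -eq 0 ]; then
        echo "CRITICAL - no nagios parent process found"
        return 2
    else
        echo "CRITICAL - $count nagios parent processes found"
        return 2
    fi
}

# Ties the two together using the live process table.
check_nagios_parents() {
    count=$(ps -eo pid,ppid,args | count_nagios_parents)
    report_nagios_parents "$count"
}
```

A plugin file built on this would end by calling check_nagios_parents and exiting with its return status. Keep in mind that a check run by the affected Nagios instance may itself be unreliable while the duplicate is running, so running it from cron or another monitoring host may be safer.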

Re: Bug with two nagios services starting

Posted: Wed Apr 06, 2011 12:43 pm
by rdedon
Just send that info when you can as we know you are addressing a few different issues.

Re: Bug with two nagios services starting

Posted: Wed Apr 20, 2011 9:53 am
by niebais
Ok, here's what we have figured out so far:

The duplicate process seems to be related to people applying changes in Nagios. It's possible that two people click the apply configuration button at roughly the same time. The other possibility is that the old nagios process isn't quite killed when changes are being applied.
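
If the first possibility is the cause, one generic way to rule it out is to serialize restarts behind a lock. The sketch below is not how XI does its own locking. The lock path and function names are hypothetical, and it relies only on mkdir being atomic, so exactly one caller can create the lock directory.

```shell
#!/bin/sh
# Sketch: let only one restart run at a time.
# LOCK_DIR and the function names are hypothetical.

LOCK_DIR="${LOCK_DIR:-/tmp/nagios-restart.lock}"

# mkdir either creates the directory or fails, so only one caller wins.
acquire_restart_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_restart_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Runs the given command only if no other restart holds the lock,
# and passes the command's exit status back to the caller.
with_restart_lock() {
    if ! acquire_restart_lock; then
        echo "restart already in progress, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    release_restart_lock
    return $status
}
```

Usage would look like: with_restart_lock service nagios restart. A second caller that arrives while the first is still restarting gets a non-zero status instead of starting another daemon. One caveat: if the script is killed mid-restart, the lock directory is left behind and has to be removed by hand.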