Re: Multiple Instances / Doubling in the messages log

Posted: Thu Jan 24, 2013 6:21 pm
by nseltzer
I kill -9'd all running forks of Nagios, blew away retention.dat (I moved it to my home dir), and restarted the box.

Code:

$ sudo mv /usr/local/nagios/var/retention.dat .

I still appear to be having issues with forks locking on me.

Code:

Nagios instances:
5
nagios    3072  4894  0 15:26 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    4894     1  5 15:10 ?        00:03:44 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    9896  4894  0 15:37 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   25328  4894  0 16:06 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   31157  4894  0 16:17 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Cliche alert!: Help me Nagios Support Team. You're my only hope.

Re: Multiple Instances / Doubling in the messages log

Posted: Fri Jan 25, 2013 9:03 am
by scottwilkerson
I'm starting to wonder if this could be related to your mod_gearman setup.

If you run

Code:

gearman_top

can you see the checks being processed?

Re: Multiple Instances / Doubling in the messages log

Posted: Fri Jan 25, 2013 10:13 am
by nseltzer
Yessir. I'm not discounting the possibility that something is breaking within mod_gearman, but the configs are almost (save for gearmand server settings) the same.

Code:

2013-01-25 08:12:30  -  localhost:4730   -  v0.25

 Queue Name                     | Worker Available | Jobs Waiting | Jobs Running
---------------------------------------------------------------------------------
 check_results                  |               2  |           0  |           0
 eventhandler                   |               5  |           0  |           0
 host                           |             386  |           0  |           2
 service                        |             386  |           0  |         115
 worker_papmoncp00.cabelas.corp |               1  |           0  |           0
 worker_papmoncp01.cabelas.corp |               1  |           0  |           0
 worker_papmoncp02.cabelas.corp |               1  |           0  |           0
 worker_papmoncp03.cabelas.corp |               1  |           0  |           0
 worker_papmoncp04.cabelas.corp |               1  |           0  |           0
 worker_papmoncp05.cabelas.corp |               1  |           0  |           0
 worker_papmoncp06.cabelas.corp |               1  |           0  |           0
 worker_papmoncp07.cabelas.corp |               1  |           0  |           0
 worker_sidhqmonm0_eventhandler |               1  |           0  |           0
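
For anyone who wants to watch this over time rather than eyeball it, here is a rough Python sketch that parses gearman_top output like the above and lists queues with jobs waiting. It assumes the four-column, pipe-separated layout shown in this post; the function names parse_gearman_top and backlogged are my own, not part of mod_gearman.

```python
def parse_gearman_top(text):
    """Parse the pipe-separated queue table printed by gearman_top.

    Assumes rows look like: name | workers | waiting | running.
    Header, separator and banner lines are skipped.
    """
    queues = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4:
            continue  # banner or separator line
        name, *counts = parts
        if not all(c.isdigit() for c in counts):
            continue  # header row
        workers, waiting, running = map(int, counts)
        queues[name] = {"workers": workers, "waiting": waiting, "running": running}
    return queues


def backlogged(queues):
    """Return the names of queues that have at least one job waiting."""
    return sorted(name for name, q in queues.items() if q["waiting"] > 0)


# A few rows copied from the output above.
SAMPLE = """\
 Queue Name                     | Worker Available | Jobs Waiting | Jobs Running
---------------------------------------------------------------------------------
 check_results                  |               2  |           0  |           0
 host                           |             386  |           0  |           2
 service                        |             386  |           0  |         115
"""

if __name__ == "__main__":
    parsed = parse_gearman_top(SAMPLE)
    print(parsed["service"])
    print(backlogged(parsed))
```

With nothing in the Jobs Waiting column, as here, the backlog list comes back empty, which matches what I see: the workers are keeping up.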

Re: Multiple Instances / Doubling in the messages log

Posted: Fri Jan 25, 2013 10:24 am
by mguthrie
Ah, thank you swilkerson, I think I found a clue. Can you turn off distributing event handlers in your gearman config? If you're using XI's notification handler, it won't be able to connect to the local database and submit any notifications. Look for eventhandler=yes on the broker_module line:

broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf server=127.0.0.1:4730 keyfile=/usr/local/nagios/etc/gearman_key.txt eventhandler=yes services=yes hosts=yes

If that doesn't fix it, send us all of your mod_gearman-related configs.

Re: Multiple Instances / Doubling in the messages log

Posted: Mon Feb 04, 2013 5:41 pm
by gwakem
We disabled the event handling portion of gearman entirely by turning it off in both mod_gearman_neb.conf and nagios.cfg (per below), restarted, and within ten minutes noticed the same issues with processes hanging.

/usr/local/nagios/etc/nagios.cfg:

Code:

broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf server=127.0.0.1:4730 keyfile=/usr/local/nagios/etc/gearman_key.txt eventhandler=no services=yes hosts=yes

/etc/mod_gearman/mod_gearman_neb.conf:

Code:

# defines if the module should distribute execution of
# eventhandlers.
eventhandler=no

I have attached the configs from a child and the master gearmand server.
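
In case it helps anyone double-check the same thing, below is a small Python sketch that compares the key=value options on the broker_module line with those in the neb conf file and reports any that disagree. It is only a sanity check under my own assumptions: the helper names are made up, and it says nothing about which of the two places mod_gearman treats as authoritative.

```python
def broker_module_options(line):
    """Split a nagios.cfg broker_module line into (module path, options dict)."""
    _, _, value = line.partition("=")      # drop the leading "broker_module="
    tokens = value.split()
    module_path, args = tokens[0], tokens[1:]
    opts = dict(tok.split("=", 1) for tok in args if "=" in tok)
    return module_path, opts


def conf_options(text):
    """Read key=value pairs from a conf file body, ignoring comments and blanks."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, val = line.split("=", 1)
        opts[key.strip()] = val.strip()
    return opts


def mismatches(cfg_opts, neb_opts):
    """Return {option: (nagios.cfg value, conf value)} where both define it differently."""
    shared = cfg_opts.keys() & neb_opts.keys()
    return {k: (cfg_opts[k], neb_opts[k]) for k in shared if cfg_opts[k] != neb_opts[k]}


# The two snippets quoted in this post.
BROKER_LINE = (
    "broker_module=/usr/lib64/mod_gearman/mod_gearman.o "
    "config=/etc/mod_gearman/mod_gearman_neb.conf server=127.0.0.1:4730 "
    "keyfile=/usr/local/nagios/etc/gearman_key.txt "
    "eventhandler=no services=yes hosts=yes"
)
NEB_CONF = """\
# defines if the module should distribute execution of
# eventhandlers.
eventhandler=no
"""

if __name__ == "__main__":
    _, cfg = broker_module_options(BROKER_LINE)
    print(mismatches(cfg, conf_options(NEB_CONF)))
```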

Re: Multiple Instances / Doubling in the messages log

Posted: Tue Feb 05, 2013 11:22 am
by mguthrie
Could this issue be caused by certain check plugins timing out? Are you having multiple *parent* processes spawn, or just forks of the Nagios process? Nagios forks itself to run checks, so for longer-running checks you'll see many child instances of it running.
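
One way to tell the two cases apart is to look at the PPID column. Here is a minimal Python sketch, assuming `ps -ef` style output like the listing posted earlier in this thread (UID, PID, PPID, C, STIME, TTY, TIME, CMD). The function name is just illustrative, and treating "parent is not another nagios process" as "master daemon" is my own simplification.

```python
def classify_nagios_processes(ps_lines):
    """Split `ps -ef` style lines into master daemons and forked children.

    Returns two lists of (pid, ppid, start_time) tuples. A process counts as
    a master if its parent is not itself one of the listed nagios processes.
    """
    rows = []
    for line in ps_lines:
        fields = line.split(None, 7)       # keep the command as one field
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue                       # header or malformed line
        rows.append((int(fields[1]), int(fields[2]), fields[4]))
    pids = {pid for pid, _, _ in rows}
    masters = [r for r in rows if r[1] not in pids]
    forks = [r for r in rows if r[1] in pids]
    return masters, forks


# Lines copied from the process listing earlier in the thread.
CMD = "/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg"
SAMPLE = [
    "nagios    3072  4894  0 15:26 ?        00:00:00 " + CMD,
    "nagios    4894     1  5 15:10 ?        00:03:44 " + CMD,
    "nagios    9896  4894  0 15:37 ?        00:00:00 " + CMD,
    "nagios   25328  4894  0 16:06 ?        00:00:00 " + CMD,
    "nagios   31157  4894  0 16:17 ?        00:00:00 " + CMD,
]

if __name__ == "__main__":
    masters, forks = classify_nagios_processes(SAMPLE)
    print("masters:", masters)
    print("forks:", forks)
```

Run against that listing, this reports a single master (PID 4894, parent 1) with four children hanging off it, which would point at stuck check forks rather than multiple daemons.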

Re: Multiple Instances / Doubling in the messages log

Posted: Tue Feb 05, 2013 11:33 am
by gwakem
Aha! Yes indeed, we do have a lot of WMI plugins timing out due to multiple remote-side rules. I was in the process of attempting to clear those up, and this gives me additional ammo to do so. Thanks, I will see if getting that cleared out helps and let you know.

Re: Multiple Instances / Doubling in the messages log

Posted: Tue Feb 05, 2013 12:05 pm
by scottwilkerson
Hopefully this will get us on the right track. Let us know if this resolves the issue...

Re: Multiple Instances / Doubling in the messages log

Posted: Wed Feb 06, 2013 10:19 am
by gwakem
This was a triumph. I'm making a note here: HUGE SUCCESS. It's hard to overstate my satisfaction.

That did it! Thanks for the help guys. We still have doubling in the /var/log/messages logfile, but that's not critical. I can open a separate post for that later. This can be closed. Thanks again!