Re: Checks stop running randomly

Posted: Tue Jun 19, 2012 1:41 pm
by KevinD
Alas... no change...

Ran the DB upgrade. I had to make some minor tweaks because primary key columns were being changed from varchar to text (which doesn't work in MySQL), but nothing major.

The count on the frozen checks continues to rise until we trigger the script to force a check.

I would LOVE to do a remote session.
Feel free to email, PM, USPS, FedEx, smoke signals, or telepathically transmit that and we can get something arranged for 2 minutes from now if you're available.

Re: Checks stop running randomly

Posted: Tue Jun 19, 2012 2:35 pm
by mguthrie
Just a follow-up for other users who may read this. The issue appears to be caused by using DNX with the Nagios Core 3.4.1 engine. Not sure of the exact cause yet. More detail will be posted as we find out.

Re: Checks stop running randomly

Posted: Tue Jun 19, 2012 3:25 pm
by mguthrie
Hey, what version of Nagios Core was running on the DNX slave machine(s)?

Re: Checks stop running randomly

Posted: Tue Jun 19, 2012 3:30 pm
by gwakem
Those were upgraded from r2.3 to r3.1 also, so Nagios Core 3.4.1. They have no configuration, so their databases are local and we didn't need to modify the upgrade scripts.

Re: Checks stop running randomly

Posted: Fri Jun 22, 2012 3:47 pm
by mguthrie
Did my best to hunt this one down, but tracing it through the event timing loop is pretty brutal. I actually passed the question on to the nagios-devel mailing list on SourceForge to see if they have any ideas.

Re: Checks stop running randomly

Posted: Fri Jun 22, 2012 3:54 pm
by gwakem
We appreciate your work and help on tracking it down. I'm going to start investigating mod_gearman on Monday, since it seems to be highly recommended. Any pointers or docs that have proven helpful for gotchas would be appreciated, but I'll be scouring the page as well as the documents section next week. Have a great weekend, guys.

Re: Checks stop running randomly

Posted: Mon Jun 25, 2012 9:36 am
by scottwilkerson
Mod-Gearman has the required items in RPM form now at http://mod-gearman.org/download/ so the install looks something like this:

Code: Select all

cd /tmp
wget http://mod-gearman.org/download/v1.3.0/rhel6/x86_64/gearmand-0.25-1.rhel6.x86_64.rpm
wget http://mod-gearman.org/download/v1.3.0/rhel6/x86_64/gearmand-devel-0.25-1.rhel6.x86_64.rpm
wget http://mod-gearman.org/download/v1.3.0/rhel6/x86_64/mod_gearman-1.3.0-1.e.rhel6.x86_64.rpm
yum --nogpgcheck -y install gearmand-0.25-1.rhel6.x86_64.rpm
yum --nogpgcheck -y install gearmand-devel-0.25-1.rhel6.x86_64.rpm
yum --nogpgcheck -y install mod_gearman-1.3.0-1.e.rhel6.x86_64.rpm

You will need to add the NEB module to nagios.cfg, something like this:

Code: Select all

broker_module=/usr/local/share/nagios/mod_gearman.o keyfile=/usr/local/share/nagios/secret.txt server=localhost eventhandler=yes hosts=yes services=yes
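
As a quick sanity check after editing nagios.cfg, something like the sketch below can confirm that the broker_module line is active (not commented out) and show which keyfile it points at. This is only an illustration: the function name `neb_keyfile` is made up here, and the path to nagios.cfg is an assumption you would adjust for your install.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mod-Gearman): print the keyfile path
# from the active mod_gearman broker_module line in a Nagios config file.
# Commented-out lines are ignored because the pattern is anchored at the
# start of the line.
neb_keyfile() {
    grep -E '^[[:space:]]*broker_module=.*mod_gearman' "$1" |
        tr ' ' '\n' |               # one option per line
        sed -n 's/^keyfile=//p' |   # keep only the keyfile value
        head -n 1
}

# Example call; the config path is an assumption:
# neb_keyfile /usr/local/nagios/etc/nagios.cfg
```

If this prints nothing, either the module line is missing or commented out, or it uses key= instead of keyfile=.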

On each of the workers, adjust the following values in /etc/mod_gearman/mod_gearman_worker.conf.

The server address is that of your gearmand server (usually the Nagios server):
server=localhost:4730

The shared key or keyfile needs to be the same on both the workers and the gearmand server:
key=should_be_changed
or
keyfile=
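
Since a mismatched key between the server and a worker is an easy mistake to make, here is a rough sketch of one way to generate a shared keyfile and confirm that two copies are identical. The function names and the example paths are made up for illustration. The 32-character length is chosen because, as far as I know, Mod-Gearman only uses the first 32 characters of the key.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Mod-Gearman.

# Write a random 32-character alphanumeric key to the given file.
# The subshell keeps the restrictive umask from leaking into the caller.
make_keyfile() {
    (
        umask 077
        LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32 > "$1"
    )
}

# Succeed only if the two key files have identical contents.
same_key() {
    cmp -s "$1" "$2"
}

# Example use; paths are assumptions:
# make_keyfile /usr/local/share/nagios/secret.txt
# (copy the file to each worker, then on a machine that has both copies:)
# same_key secret.txt worker_secret.txt && echo "keys match"
```

How you copy the file to the workers (scp, config management, etc.) is up to you; the point is that every worker and the NEB module read the same bytes.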

Re: Checks stop running randomly

Posted: Fri Jul 13, 2012 12:50 pm
by mguthrie
We were finally able to locate and correct the check issue with DNX. The fix will be available in the upcoming 3.3 release, but if you'd like a patch sooner than that, let me know and I'll get that to you. (Not sure if you moved to Mod-Gearman or not.)

Re: Checks stop running randomly

Posted: Fri Jul 13, 2012 2:27 pm
by gwakem
We did move to mod_gearman, and it is actually filling our needs a bit better. Thank you for letting us know though!