Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Sat Dec 12, 2015 5:00 am
by TBT
WillemDH wrote:You say this is happening on multiple XI hosts? All the same symptoms?
Currently it seems to be only the one host. But it happened again, around the 6-hour mark.
Update: Affecting 3 of our 7 XI 5.2.3 servers.
Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Mon Dec 14, 2015 8:09 am
by TBT
Last communication from Nagios was Fri Dec 11, 2015 3:00 pm (Central). We need support on this ASAP.
Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Mon Dec 14, 2015 10:14 am
by tmcdonald
If you've sent in a ticket about this then we would have continued troubleshooting in the ticket, and I see that Troy got back to you late on the 13th. We generally will either do a ticket or a forum post, but not both as that would double the time spent. Please let us know if you would like to continue in the ticket or this thread, and we'll go from there.
Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Mon Dec 14, 2015 11:10 am
by TBT
tmcdonald wrote:If you've sent in a ticket about this then we would have continued troubleshooting in the ticket, and I see that Troy got back to you late on the 13th. We generally will either do a ticket or a forum post, but not both as that would double the time spent. Please let us know if you would like to continue in the ticket or this thread, and we'll go from there.
But not continual troubleshooting forum posts?
Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Mon Dec 14, 2015 11:17 am
by tmcdonald
Normally when people open a ticket as you have done, we lock the forum thread and continue in the ticket so we are not trying to keep two support avenues synchronized. If you only meant to send in an attachment, please let us know and in the future please either attach it directly to the post or PM it to the requesting team member if it contains sensitive information. This will avoid opening a ticket in our system.
Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Mon Dec 14, 2015 2:43 pm
by TBT
Another oddity discovered.
When the Monitoring Engine has stopped, the System Status indicator in the header and the page at Admin > System Status > XI System Component Status are both correct: they indicate the engine has stopped (fig. 1).
mon2.PNG
However, Home > Monitoring Process > Process Info > Monitoring Engine Process shows the engine running with an associated PID (fig. 2). This is false: the engine has in fact stopped and no such process is running.
mon1.PNG
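For anyone wanting to cross-check the two views from the shell, one option is to compare the PID recorded in the lock file against the process table. This is only a minimal sketch: the path /usr/local/nagios/var/nagios.lock is an assumption (the usual source-install default; check lock_file in nagios.cfg), and check_lock_pid is a made-up helper name, not part of Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper: report whether the PID stored in a lock file
# belongs to a live process.
# Usage: check_lock_pid /path/to/nagios.lock
check_lock_pid() {
    lockfile=$1
    if [ ! -r "$lockfile" ]; then
        echo "no-lockfile"
        return 2
    fi
    # Take the first line and keep only its digits.
    pid=$(head -n 1 "$lockfile" | tr -cd '0-9')
    # kill -0 sends no signal, it only tests for existence; the /proc
    # check covers processes owned by another user (Linux only).
    if [ -n "$pid" ] && { kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; }; then
        echo "running $pid"
        return 0
    fi
    echo "stale $pid"
    return 1
}

# Assumed default path; adjust to the lock_file value in nagios.cfg.
check_lock_pid /usr/local/nagios/var/nagios.lock || true
```

A "stale" result would be consistent with a page that reports a PID read from the lock file while no such process exists.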
Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Mon Dec 14, 2015 5:31 pm
by tmcdonald
Almost looks like either cron is not running properly, or duplicate nagios processes are fighting it out. I'll add this to the list of things to check in our remote session.
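Both possibilities can be roughly checked from the shell. The sketch below assumes a RHEL/CentOS host where the cron daemon is named crond and where a daemonized nagios master has init (PID 1) as its parent; Nagios Core 4 also runs worker children named nagios, so a plain process count is not enough. count_top_level is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "PID PPID" pairs on stdin and print how many
# of them have PPID 1, i.e. look like independent top-level daemons.
count_top_level() {
    awk '$2 == 1 { n++ } END { print n + 0 }'
}

# Is the cron daemon running? (crond is the RHEL/CentOS name.)
if pgrep -x crond >/dev/null 2>&1; then
    echo "crond: running"
else
    echo "crond: not found"
fi

# More than one top-level nagios process suggests duplicates.
masters=$(ps -C nagios -o pid=,ppid= 2>/dev/null | count_top_level)
echo "top-level nagios processes: $masters"
```

A value above 1 on the last line would point towards duplicate engines; 0 while the UI shows a PID matches the symptom described above.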
Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Mon Feb 22, 2016 2:33 pm
by TBT
For the record, our issue was attributed to a segfault when using mod_gearman. We have since stopped using mod_gearman; no bug was reported to ConSol Labs.
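If anyone else wants to check for the same thing, kernel segfault messages normally end up in the system log. A minimal sketch, assuming the log is /var/log/messages (the RHEL/CentOS default) and is readable by the current user; segfaults_for is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention both "segfault"
# and the given program name.
# Usage: segfaults_for PROGRAM LOGFILE
segfaults_for() {
    prog=$1
    logfile=$2
    grep -i 'segfault' "$logfile" 2>/dev/null | grep -F "$prog"
}

# Assumed log location; adjust for your syslog configuration.
segfaults_for nagios /var/log/messages \
    || echo "no matching segfault lines found (or log not readable)"
```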
Re: Monitoring Engine randomly stops in Nagios XI 5.2.3
Posted: Mon Feb 22, 2016 2:39 pm
by hsmith
Here's a quote from one of our developers regarding gearman issues:
bheden wrote:Recently, we've been able to identify some memory leak issues revolving around the mod_gearman Nagios event broker module and we've identified a possible solution. I'd like to see what happens if you upgrade to a newer version, using the following instructions. Please make sure you have a working backup of your server in case of failure.
Code:
cd /tmp
yum remove libgearman-devel libgearman gearmand mod_gearman
mkdir gearman_install
cd gearman_install/
wget http://mod-gearman.org/download/v2.1.1/rhel6/x86_64/gearmand-0.33-2.rhel6.x86_64.rpm
wget http://mod-gearman.org/download/v2.1.1/rhel6/x86_64/gearmand-devel-0.33-2.rhel6.x86_64.rpm
wget http://mod-gearman.org/download/v2.1.1/rhel6/x86_64/gearmand-server-0.33-2.rhel6.x86_64.rpm
wget http://mod-gearman.org/download/v2.1.1/rhel6/x86_64/mod_gearman2-2.1.1-1.rhel6.x86_64.rpm
yum --nogpgcheck localinstall *
sed -i 's/\(^broker_module=.*mod_gearman.*\)/#\1/' /usr/local/nagios/etc/nagios.cfg
echo "broker_module=/usr/lib64/mod_gearman2/mod_gearman2.o config=/etc/mod_gearman/mod_gearman_neb.conf eventhandler=no" >> /usr/local/nagios/etc/nagios.cfg
service nagios stop
service mod_gearman_worker stop
service gearmand stop
service gearmand start
service mod_gearman_worker start
service nagios start
Please inform us if this resolves your issue. Thank you.
Hopefully this is of some use to you if you want to continue using gearman.
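After the sed and echo steps in the quoted instructions, nagios.cfg should end up with exactly one uncommented broker_module line for mod_gearman. A quick sanity check before restarting, as a sketch only: active_gearman_modules is a made-up helper name, and the paths are the ones used in the instructions above.

```shell
#!/bin/sh
# Hypothetical helper: print the number of uncommented broker_module
# lines in a config file that load a mod_gearman module.
active_gearman_modules() {
    grep -c '^broker_module=.*mod_gearman' "$1"
}

cfg=/usr/local/nagios/etc/nagios.cfg
if [ -r "$cfg" ]; then
    echo "active mod_gearman broker lines: $(active_gearman_modules "$cfg")"
    # Nagios' own pre-flight check of the configuration.
    if [ -x /usr/local/nagios/bin/nagios ]; then
        /usr/local/nagios/bin/nagios -v "$cfg" | tail -n 5
    fi
fi
```

A count other than 1 would mean either the old module line is still active or the new one was not appended.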