Re: Unable to start nagios - no errors
Posted: Fri Apr 08, 2016 1:50 pm
by emartine
yum list installed | grep gearman
gearmand.x86_64 1:0.33-2 @/gearmand-0.33-2.rhel6.x86_64
gearmand-devel.x86_64 1:0.33-2 @/gearmand-devel-0.33-2.rhel6.x86_64
gearmand-server.x86_64 1:0.33-2 @/gearmand-server-0.33-2.rhel6.x86_64
mod_gearman2.x86_64 2.1.1-1.el6 @/mod_gearman2-2.1.1-1.rhel6.x86_64
The worker runs on this server and on 2 other hosts. I have disabled the 2 external hosts for now.
What are we looking for at debug level 3?
Posted: Fri Apr 08, 2016 1:51 pm
by bheden
I'd just like to see more information regarding this line in your original worker log:
[ERROR] worker error: flush(Broken pipe) lost connection to server during send -> libgearman/connection.cc:761
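To pull that line out of a busy trace log together with what happened just before and after it, a small grep wrapper like the one below may help. It is a sketch: the function name is hypothetical, and the log path in the example comment is an assumption (use whatever `logfile=` is set to in your worker config).

```shell
#!/usr/bin/env bash
# Print every "Broken pipe" error in a worker log along with the 5 lines
# before and after it, so the surrounding TRACE output is visible.
# show_broken_pipe_context is a hypothetical helper; only standard grep flags are used.
show_broken_pipe_context() {
  local logfile="$1"
  # -n: prefix line numbers (":" marks the match, "-" marks context lines)
  # -B/-A: lines of context before/after each match
  grep -n -B 5 -A 5 'Broken pipe' "$logfile"
}

# Example (log path is an assumption; check logfile= in your worker config):
# show_broken_pipe_context /var/log/mod_gearman2/mod_gearman_worker.log
```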
Posted: Fri Apr 08, 2016 2:08 pm
by emartine
I stopped and started the worker. I didn't see that error.
[2016-04-08 14:00:10][8647][TRACE] 428 +++>
HqYZpEe5+0bHzBLfuGtrhnos3YHRMgPZHEFYN33EISP0kGyVwojWlEikHuTPND9UgRD+/fcb2/k/D70uvw890EZdtlPzQsOtt6Z62Dc0vln6fb/QnZ20p1mwTofhdiLLjuxezMwGklrS+Q67TSWg9mILJ+SIK5n9Y5uM1FPrD/gDc1fIEOeAmz4P+XK3Pr6BQQtP6Z9RmieOt4xd4kYhjzs=
<+++
[2016-04-08 14:00:10][8647][TRACE] add_job_to_queue() finished successfully: 0 0
[2016-04-08 14:00:10][8647][TRACE] send_result_back() finished successfully
[2016-04-08 14:00:10][8647][TRACE] send_result_back() has no duplicate servers to send to.
[2016-04-08 14:00:10][8647][TRACE] set_state(1)
[2016-04-08 14:00:10][8641][TRACE] idle_sighandler(14)
[2016-04-08 14:00:10][8641][TRACE] clean_worker_exit(0)
[2016-04-08 14:00:10][8641][TRACE] cleaning worker
[2016-04-08 14:00:10][8641][TRACE] cleaning client
[2016-04-08 14:00:11][8640][TRACE] waitpid() worker exited with: 0
[2016-04-08 14:00:11][8640][TRACE] make_new_child(2)
[2016-04-08 14:00:11][8640][TRACE] forking status worker
[2016-04-08 14:00:11][8892][DEBUG] child started with pid: 8892
[2016-04-08 14:00:11][8892][TRACE] status worker client started
[2016-04-08 14:00:11][8892][TRACE] set_worker()
[2016-04-08 14:00:11][8892][TRACE] create_client()
[2016-04-08 14:00:11][8645][TRACE] set_state(0)
[2016-04-08 14:00:11][8645][TRACE] get_job()
[2016-04-08 14:00:11][8645][TRACE] got new job H:<NAGIOSXIHOSTNAME>:36092
[2016-04-08 14:00:11][8645][TRACE] 384 +++>
wuyIXMz+16Yrsk+RkDoGE1fxNhEeIx8oHzgKi1lpVip+V0wkVINNHVNzDLRIQRIJb1kssVve1EZo9eZifaVJ2zU7WU7rydYDb+LIKjDoU1CtwfGa0a4yDuWNWtjzhG+WiE7/GGKdnoLnL5wbHwnd6xn6HTOvABcLTiPoQlepFZH+ipmw3c0SFzMz2yetFBfuP7uBALTxHSCzK2Q8V5De7zi9Kr32EE9aMooEcnfC8s+qpOBN7us1wT0iDtxV1n7bMX75mT182FuKPaWWrcKqr7N6miqzRxrqLYISlb0xt9qL0
<+++
[2016-04-08 14:00:11][8645][TRACE] 287 --->
type=service
result_queue=check_results
host_name=SERVER1
service_description=Disk - C
start_time=1460142011.0
next_check=1460142011.0
core_time=1460142011.965547
timeout=60
command_line=/usr/local/nagios/libexec/check_nt -H SERVER1 -p 1248 -v USEDDISKSPACE -l C -w 85 -c 95
Posted: Fri Apr 08, 2016 2:35 pm
by ssax
Try setting result_workers=1 in your mod_gearman_neb.conf. Another customer with a similar issue made this change and it got things working for them.
Posted: Fri Apr 08, 2016 2:45 pm
by emartine
I can't seem to find mod_gearman_neb.conf anywhere.
Posted: Mon Apr 11, 2016 2:56 am
by Box293
It should be on your XI server at either /etc/mod_gearman/mod_gearman_neb.conf or /etc/mod_gearman2/module.conf.
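If you are not sure which of those two files your install uses, a quick check like this reports whichever one exists and prints any active `result_workers` line in it. It is a minimal sketch; the function name is hypothetical and it only looks at the paths you pass in.

```shell
#!/usr/bin/env bash
# Report which candidate NEB module config exists and show its result_workers
# setting. find_result_workers is a hypothetical helper.
# Returns 0 if at least one file was found, 1 otherwise.
find_result_workers() {
  local f found=1
  for f in "$@"; do
    if [ -f "$f" ]; then
      echo "config: $f"
      # Only uncommented lines count; a leading "#" disables the setting
      grep -E '^[[:space:]]*result_workers[[:space:]]*=' "$f" \
        || echo "result_workers not set in $f"
      found=0
    fi
  done
  return "$found"
}

# Example, using the two locations mentioned above:
# find_result_workers /etc/mod_gearman/mod_gearman_neb.conf /etc/mod_gearman2/module.conf
```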
Posted: Mon Apr 11, 2016 11:30 am
by emartine
Found /etc/mod_gearman2/module.conf, thanks.
result_workers=1 was already set.
Posted: Mon Apr 11, 2016 11:48 am
by bheden
You can turn down the debug verbosity now (reset it to 0 or 1).
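If you would rather not edit the file by hand, something like the following resets the level with sed and keeps a backup. This is a sketch under assumptions: it assumes the setting is a `debug=` line in a file such as /etc/mod_gearman2/worker.conf (check the actual path on your server), and the function name is hypothetical. Restart the worker afterwards so it picks up the change.

```shell
#!/usr/bin/env bash
# Rewrite the debug= line of a mod_gearman config to the given level and keep
# the original as <file>.bak. set_debug_level is a hypothetical helper;
# the "debug=" key name and the 0-3 range are assumptions based on this thread.
set_debug_level() {
  local conf="$1" level="$2"
  case "$level" in
    [0-3]) ;;   # this thread used 3 for tracing, 0 or 1 for normal running
    *) echo "invalid level: $level" >&2; return 1 ;;
  esac
  # -i.bak: edit in place, saving the original with a .bak suffix
  # \1 keeps the "debug=" prefix (and any spacing) and replaces only the value
  sed -i.bak -E "s/^([[:space:]]*debug[[:space:]]*=).*/\1${level}/" "$conf"
}

# Example (path is an assumption):
# set_debug_level /etc/mod_gearman2/worker.conf 0
```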
So after all this stopping and starting of services, are you still having the same issues? As funny as it sounds, just getting the services stopped and started in the right order often resolves a lot of problems.
If not, can I see the output of
and
cat /usr/local/nagiosxi/var/cmdsubsys.log
Posted: Mon Apr 11, 2016 2:09 pm
by emartine
I'm not sure what I did, but it's working now. Was it rebooting the server, or starting the engine from the web interface instead of the command line? Either way, I will be restarting the server again shortly. What startup priority should the services have so that they come up properly at boot?
ls -alh /usr/local/nagios/var/rw
prw-rw---- 1 nagios nagios 0 Apr 11 10:46 nagios.cmd
srw-rw---- 1 nagios nagios 0 Apr 11 10:46 nagios.qh
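In that listing, the leading `p` in the mode column marks nagios.cmd as a named pipe (FIFO) and the leading `s` marks nagios.qh as a Unix socket, which is what you would typically expect to see while Nagios is running. A check along these lines can be dropped into a script; the function name is hypothetical and the default directory is taken from the ls output above.

```shell
#!/usr/bin/env bash
# Verify that nagios.cmd is a named pipe and nagios.qh is a Unix socket.
# check_nagios_rw is a hypothetical helper; returns 1 if either is missing.
check_nagios_rw() {
  local dir="${1:-/usr/local/nagios/var/rw}" rc=0
  if [ -p "$dir/nagios.cmd" ]; then     # -p: exists and is a FIFO
    echo "ok: $dir/nagios.cmd is a named pipe"
  else
    echo "missing: $dir/nagios.cmd (no named pipe)"; rc=1
  fi
  if [ -S "$dir/nagios.qh" ]; then      # -S: exists and is a socket
    echo "ok: $dir/nagios.qh is a socket"
  else
    echo "missing: $dir/nagios.qh (no socket)"; rc=1
  fi
  return "$rc"
}

# Example: check_nagios_rw
```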
cat /usr/local/nagiosxi/var/cmdsubsys.log
.................
Current priority settings:
# chkconfig: 2345 85 15
# description: Mod-Gearman2 Worker Daemon
# gearmand Startup script for the Gearman server
# chkconfig: - 85 15
# chkconfig: 345 99 01
# description: Nagios network monitor
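Each `# chkconfig:` header lists the runlevels, then the start priority, then the stop priority; at boot, scripts with lower start numbers run first. A `-` in place of the runlevels (as in the gearmand header) means the script is not enabled in any runlevel by default, so whether it actually starts at boot depends on it having been turned on, for example with `chkconfig gearmand on` (`chkconfig --list gearmand` shows the current state). The sketch below prints the headers sorted by start priority so the boot order is easy to compare against the intended gearmand -> nagios -> worker order. The function name is hypothetical, and it assumes the scripts live under /etc/init.d.

```shell
#!/usr/bin/env bash
# For each init script with a "# chkconfig:" header, print
# "<start> <stop> <runlevels> <script>", sorted by start priority.
# list_start_order is a hypothetical helper.
list_start_order() {
  local f
  for f in "$@"; do
    awk -v name="$(basename "$f")" '
      /^#[ \t]*chkconfig:/ {
        # Strip the prefix; fields are then: runlevels, start, stop
        sub(/^#[ \t]*chkconfig:[ \t]*/, "")
        print $2, $3, $1, name
        exit
      }' "$f"
  done | sort -n -k1,1
}

# Example: list_start_order /etc/init.d/gearmand /etc/init.d/nagios /etc/init.d/*gearman*
```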
Posted: Mon Apr 11, 2016 2:23 pm
by bheden
Those look fine. The order goes gearmand -> nagios -> worker.
Glad it's all working!
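For a manual restart that respects the gearmand -> nagios -> worker order, one approach is to stop in reverse order and then start in forward order. This is a sketch assuming SysV init scripts driven by `service`; the worker's script name (mod-gearman2-worker here) is an assumption, so match the names to what is under /etc/init.d on your server. `SERVICE_CMD` is a hypothetical override that allows a dry run.

```shell
#!/usr/bin/env bash
# Restart services in dependency order: stop in reverse, then start forward.
# Service names are assumptions; set SERVICE_CMD=echo for a dry run.
SERVICE_CMD="${SERVICE_CMD:-service}"
ORDER=(gearmand nagios mod-gearman2-worker)

restart_in_order() {
  local i
  # Stop the worker first and gearmand last; ignore failures, since a
  # service that is already stopped may return non-zero
  for (( i=${#ORDER[@]}-1; i>=0; i-- )); do
    "$SERVICE_CMD" "${ORDER[i]}" stop || true
  done
  # Start gearmand first so the Nagios module and the worker have a server
  # to connect to; abort on the first failed start
  for i in "${!ORDER[@]}"; do
    "$SERVICE_CMD" "${ORDER[i]}" start || return 1
  done
}

# Example (as root):  restart_in_order
# Dry run:            SERVICE_CMD=echo restart_in_order
```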