(host check orphaned, is the mod-gearman worker on queue 'ho

Posted: Fri Dec 09, 2016 2:25 pm
by craig.sands
####################################################
# ISSUE
####################################################
I am receiving sporadic alerts from a random number of hosts with the following message:

Host Status: (host check orphaned, is the mod-gearman worker on queue 'host' running?)


Sometimes every host fails with that error, sometimes only some of them do, and sometimes none do.

Code:

####################################################
# MASTER SERVER
####################################################
[root@master ~]# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)

[root@master ~]# gearmand -V

gearmand 0.33 - https://bugs.launchpad.net/gearmand

[root@master ~]# /usr/bin/mod_gearman_worker -V
-bash: /usr/bin/mod_gearman_worker: No such file or directory

[root@master ~]# uname -a
Linux master 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


[root@master ~]# ps -ef | grep gearman
root      38963  23862  0 15:20 pts/2    00:00:00 grep --color=auto gearman
gearmand 112324      1  2 Nov28 ?        05:22:56 /usr/sbin/gearmand -d --worker-wakeup=10 --retention-file=/tmp/gearmand.retention -q retention --log-file=/var/log/gearmand/gearmand.log

[root@master ~]# tail /var/log/gearmand/gearmand.log
  ERROR 2016-11-08 19:22:02.000000 [     1 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2016-11-08 19:22:02.000000 [     1 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2016-11-08 19:22:02.000000 [     1 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2016-11-08 19:22:02.000000 [     1 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2016-11-08 19:22:02.000000 [     1 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2016-11-08 19:22:02.000000 [     1 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2016-11-08 19:22:02.000000 [     4 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2016-11-08 19:22:02.000000 [     4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2016-11-08 19:22:02.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2016-11-08 19:22:02.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109



####################################################
# WORKER SERVER
####################################################
[root@worker ~]# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)

[root@worker ~]# gearmand -V

gearmand 0.33 - https://bugs.launchpad.net/gearmand

[root@worker ~]# /usr/bin/mod_gearman_worker -V
-bash: /usr/bin/mod_gearman_worker: No such file or directory

[root@worker ~]# uname -a
Linux worker 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@worker ~]# ps -ef | grep gearman
gearmand   1712      1  0 Nov29 ?        00:00:13 /usr/sbin/gearmand -d --worker-wakeup=10 --retention-file=/tmp/gearmand.retention -q retention --log-file=/var/log/gearmand/gearmand.log
root      81374  49625  0 15:41 pts/0    00:00:00 grep --color=auto gearman

[root@worker ~]# tail /var/log/gearmand/gearmand.log
  ERROR 2016-10-23 04:34:18.000000 [  main ] gearman_server_job_add _queue_replay_add(JOB_EXISTS) -> libgearman-server/server.c:820
  ERROR 2016-10-23 04:34:18.000000 [  main ] gearman_server_job_add _queue_replay_add(JOB_EXISTS) -> libgearman-server/server.c:820
  ERROR 2016-10-23 04:34:18.000000 [  main ] gearman_server_job_add _queue_replay_add(JOB_EXISTS) -> libgearman-server/server.c:820
  ERROR 2016-10-28 16:32:21.000000 [  main ] bind(Address already in use) -> libgearman-server/gearmand.cc:526
  ERROR 2016-10-28 16:32:21.000000 [  main ] bind(Transport endpoint is not connected) -> libgearman-server/gearmand.cc:540
  ERROR 2016-10-28 16:32:21.000000 [  main ] GEARMAND_WAKEUP_SHUTDOWN(Bad file descriptor) -> libgearman-server/gearmand.cc:364
  ERROR 2016-10-29 20:01:24.000000 [  main ] /tmp/gearmand.retention(Permission denied) -> libgearman-server/plugins/queue/retention/queue.cc:204
  ERROR 2016-11-06 16:01:12.000000 [  main ] bind(Address already in use) -> libgearman-server/gearmand.cc:526
  ERROR 2016-11-06 16:01:12.000000 [  main ] bind(Transport endpoint is not connected) -> libgearman-server/gearmand.cc:540
  ERROR 2016-11-06 16:01:12.000000 [  main ] GEARMAND_WAKEUP_SHUTDOWN(Bad file descriptor) -> libgearman-server/gearmand.cc:364

Code:

####################################################
# SYSTEM PROFILE
####################################################
Nagios XI Installation Profile
System:
Nagios XI Version : 5.3.2
master 3.10.0-327.36.3.el7.x86_64 x86_64
CentOS Linux release 7.2.1511 (Core)
Gnome is not installed
Apache Information
PHP Version: 5.6.26
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0
Server Name: master
Server Address: xxx.xxx.xxx.xxx
Server Port: 443
Date/Time
PHP Timezone: US/Eastern
PHP Time: Thu, 08 Dec 2016 15:09:01 -0500
System Time: Thu, 08 Dec 2016 15:09:01 -0500
Nagios XI Data
License ends in: QSNMMU

nagios (pid 6461) is running...
NPCD running (pid 4366).
ndo2db (pid 127965) is running...
CPU Load 15: 0.11
Total Hosts: 2204
Total Services: 17247 


CentOS minimal install
Connections via SSH (Password or keys)
Internal network (No internet)

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Fri Dec 09, 2016 2:46 pm
by tgriep
I just want to verify that the MASTER server is the Nagios XI server, correct?

In the output of the ps command you ran on the worker server, I do not see a mod_gearman worker running, only a gearman server (gearmand).
Stop the gearman server on the worker, make sure the mod_gearman worker is running, and see if that resolves the issue.

Also, take a look at this doc for more details.
https://assets.nagios.com/downloads/nag ... ios_XI.pdf
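
The check described above can be sketched as a small shell helper. This is only an illustration, not an official Nagios tool: the function name gearman_roles is made up for this example, and it assumes the worker process shows up in ps under the name mod_gearman_worker (the binary name used earlier in this thread), which may differ on your install.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and report which
# gearman roles are present on the host.
#   prints "server" if a gearmand daemon is listed
#   prints "worker" if a mod_gearman_worker process is listed
gearman_roles() {
    awk '
        /grep/ { next }                      # ignore the grep process itself
        /mod_gearman_worker/ { worker = 1 }  # assumption: worker process name
        /\/gearmand( |$)/ { server = 1 }     # e.g. /usr/sbin/gearmand -d ...
        END {
            if (server) print "server"
            if (worker) print "worker"
        }
    '
}
```

Run it on each machine as `ps -ef | gearman_roles`. On a dedicated worker you would hope to see only "worker"; seeing only "server" there matches the situation in the ps output posted above.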

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Fri Dec 09, 2016 2:48 pm
by dwhitfield
Per https://support.nagios.com/forum/viewto ... 16&t=31797, we may end up needing to move this to a ticket, but we might as well see if there is an easy fix first (like the one already suggested!).

Assuming that doesn't do the trick, can you PM me your Profile? You can download it by going to Admin > System Config > System Profile and clicking the Download Profile (not Show Profile) button in the top right corner. If for whatever reason you cannot download the profile, please post the output of Show Profile in the thread (that will at least give us some info).

After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.