####################################################
# ISSUE
####################################################
I am receiving sporadic alerts from a varying number of hosts with the following message:
Host Status: (host check orphaned, is the mod-gearman worker on queue 'host' running?)
Sometimes every host fails with this error, sometimes only some do, and sometimes none do.
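Since the alert asks whether a worker is running on the 'host' queue, one way to check would be to look for queues that have jobs waiting but no workers attached. The following is only a rough sketch: the function name `stalled_queues` is made up for illustration, and it assumes the status text has the usual Gearman admin layout of four tab-separated columns (function, total jobs, running jobs, available workers) ending with a line that contains only a dot. It also assumes `gearadmin` is installed alongside gearmand, which is not confirmed by the output below.

```shell
#!/bin/sh
# Print the name of every queue that has jobs queued but zero workers.
# Reads Gearman "status" text on stdin, one queue per line:
#   FUNCTION<TAB>TOTAL<TAB>RUNNING<TAB>AVAILABLE_WORKERS
# The terminating "." line, if present, is skipped.
# NOTE: stalled_queues is a hypothetical helper, not part of gearmand or mod_gearman.
stalled_queues() {
    awk -F '\t' '
        $0 == "." { next }                             # end-of-reply marker
        NF == 4 && $2 > 0 && $4 == 0 { print $1 }      # jobs waiting, no workers
    '
}

# Possible usage (assumes gearadmin can reach gearmand with its default settings):
#   gearadmin --status | stalled_queues
```

If 'host' shows up in the output while the alerts are firing, that would suggest the workers for that queue had dropped off at that moment.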
####################################################
# MASTER SERVER
####################################################
[root@master ~]# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)
[root@master ~]# gearmand -V
gearmand 0.33 - https://bugs.launchpad.net/gearmand
[root@master ~]# /usr/bin/mod_gearman_worker -V
-bash: /usr/bin/mod_gearman_worker: No such file or directory
[root@master ~]# uname -a
Linux master 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@master ~]# ps -ef | grep gearman
root 38963 23862 0 15:20 pts/2 00:00:00 grep --color=auto gearman
gearmand 112324 1 2 Nov28 ? 05:22:56 /usr/sbin/gearmand -d --worker-wakeup=10 --retention-file=/tmp/gearmand.retention -q retention --log-file=/var/log/gearmand/gearmand.log
[root@master ~]# tail /var/log/gearmand/gearmand.log
ERROR 2016-11-08 19:22:02.000000 [ 1 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
ERROR 2016-11-08 19:22:02.000000 [ 1 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2016-11-08 19:22:02.000000 [ 1 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
ERROR 2016-11-08 19:22:02.000000 [ 1 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2016-11-08 19:22:02.000000 [ 1 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
ERROR 2016-11-08 19:22:02.000000 [ 1 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2016-11-08 19:22:02.000000 [ 4 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
ERROR 2016-11-08 19:22:02.000000 [ 4 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
ERROR 2016-11-08 19:22:02.000000 [ 2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
ERROR 2016-11-08 19:22:02.000000 [ 2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
####################################################
# WORKER SERVER
####################################################
[root@worker ~]# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)
[root@worker ~]# gearmand -V
gearmand 0.33 - https://bugs.launchpad.net/gearmand
[root@worker ~]# /usr/bin/mod_gearman_worker -V
-bash: /usr/bin/mod_gearman_worker: No such file or directory
[root@worker ~]# uname -a
Linux worker 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@worker ~]# ps -ef | grep gearman
gearmand 1712 1 0 Nov29 ? 00:00:13 /usr/sbin/gearmand -d --worker-wakeup=10 --retention-file=/tmp/gearmand.retention -q retention --log-file=/var/log/gearmand/gearmand.log
root 81374 49625 0 15:41 pts/0 00:00:00 grep --color=auto gearman
[root@worker ~]# tail /var/log/gearmand/gearmand.log
ERROR 2016-10-23 04:34:18.000000 [ main ] gearman_server_job_add _queue_replay_add(JOB_EXISTS) -> libgearman-server/server.c:820
ERROR 2016-10-23 04:34:18.000000 [ main ] gearman_server_job_add _queue_replay_add(JOB_EXISTS) -> libgearman-server/server.c:820
ERROR 2016-10-23 04:34:18.000000 [ main ] gearman_server_job_add _queue_replay_add(JOB_EXISTS) -> libgearman-server/server.c:820
ERROR 2016-10-28 16:32:21.000000 [ main ] bind(Address already in use) -> libgearman-server/gearmand.cc:526
ERROR 2016-10-28 16:32:21.000000 [ main ] bind(Transport endpoint is not connected) -> libgearman-server/gearmand.cc:540
ERROR 2016-10-28 16:32:21.000000 [ main ] GEARMAND_WAKEUP_SHUTDOWN(Bad file descriptor) -> libgearman-server/gearmand.cc:364
ERROR 2016-10-29 20:01:24.000000 [ main ] /tmp/gearmand.retention(Permission denied) -> libgearman-server/plugins/queue/retention/queue.cc:204
ERROR 2016-11-06 16:01:12.000000 [ main ] bind(Address already in use) -> libgearman-server/gearmand.cc:526
ERROR 2016-11-06 16:01:12.000000 [ main ] bind(Transport endpoint is not connected) -> libgearman-server/gearmand.cc:540
ERROR 2016-11-06 16:01:12.000000 [ main ] GEARMAND_WAKEUP_SHUTDOWN(Bad file descriptor) -> libgearman-server/gearmand.cc:364
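Because the log excerpts above are repetitive, it may help to tally the errors by the source location printed after the `->` marker. This is a minimal sketch that assumes every relevant line starts with `ERROR` and ends with `-> file:line`, as in the lines shown; `summarize_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Count ERROR lines per originating source location (the text after " -> ").
# Reads log text on stdin and prints "COUNT LOCATION", most frequent first.
# NOTE: summarize_errors is a hypothetical helper, not a gearmand tool.
summarize_errors() {
    awk '
        $1 == "ERROR" && / -> / {
            loc = $0
            sub(/.* -> /, "", loc)     # keep only the part after the last " -> "
            count[loc]++
        }
        END { for (loc in count) printf "%d %s\n", count[loc], loc }
    ' | sort -rn
}

# Possible usage, with the log path taken from the ps output above:
#   summarize_errors < /var/log/gearmand/gearmand.log
```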
####################################################
# SYSTEM PROFILE
####################################################
Nagios XI Installation Profile
System:
Nagios XI Version : 5.3.2
master 3.10.0-327.36.3.el7.x86_64 x86_64
CentOS Linux release 7.2.1511 (Core)
Gnome is not installed
Apache Information
PHP Version: 5.6.26
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0
Server Name: master
Server Address: xxx.xxx.xxx.xxx
Server Port: 443
Date/Time
PHP Timezone: US/Eastern
PHP Time: Thu, 08 Dec 2016 15:09:01 -0500
System Time: Thu, 08 Dec 2016 15:09:01 -0500
Nagios XI Data
License ends in: QSNMMU
nagios (pid 6461) is running...
NPCD running (pid 4366).
ndo2db (pid 127965) is running...
CPU Load 15: 0.11
Total Hosts: 2204
Total Services: 17247
CentOS minimal install
Connections via SSH (Password or keys)
Internal network (No internet)