Worker shows up in Unhandled hosts "mod_gearman_worker down"

litsupport.box
Posts: 80
Joined: Wed Apr 02, 2014 7:24 am

Post by litsupport.box »

Hello,

Today (not for the first time) I saw my worker show up under "Unhandled hosts". When I pinged the host I got a response, so it was up. I logged into the worker over SSH and ran "service mod_gearman_worker status", which told me it was not running.

How can I check why my mod_gearman worker stopped working?

Please advise on how to investigate/repair this.
Nagios XI Version : 2014R2.6
fqdn 2.6.32-431.17.1.el6.x86_64 x86_64
CentOS release 6.5 (Final)
Gnome is not installed
Proxy appears to be in use
VMware Image
Mod Gearman
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

Is the "mod_gearman_worker" set up to start automatically on reboot? What is the output of the following command?

Code: Select all

chkconfig --list | grep mod_gearman_worker
Be sure to check out our Knowledgebase for helpful articles and solutions!

Post by litsupport.box »

Code: Select all

# chkconfig --list | grep mod_gearman_worker
mod_gearman_worker      0:off   1:off   2:on    3:on    4:on    5:on    6:off

So I guess it is.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Post by jdalrymple »

A few things:

By default there is a log at /var/log/mod_gearman/mod_gearman_worker.log

Look there for clues; I bet there will be some. If not, raise the debug level in /etc/mod_gearman/mod_gearman_worker.conf to 2:

Code: Select all

# use debug to increase the verbosity of the module.
# Possible values are:
#     0 = only errors
#     1 = debug messages
#     2 = trace messages
#     3 = trace and all gearman related logs are going to stdout.
# Default is 0.
debug=2
Is the worker on the same host as Nagios? I assume not.
Does the Linux box hosting the worker have any monitoring enabled (load, memory, etc.)?
Do you have multiple workers, and if so, did any others exhibit weirdness?
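If the log is large, it can help to collapse identical messages before reading it line by line. A minimal sketch, assuming every line starts with "[timestamp][pid]" like the excerpts in this thread; summarize_log is a hypothetical helper name, not part of mod_gearman:

```shell
#!/bin/sh
# Sketch: group worker log messages by text and count them.
# Assumption: each line begins with "[timestamp][pid]" followed by "[LEVEL] message".

summarize_log() {
    # $1 = path to (a copy of) mod_gearman_worker.log
    # Strip the two leading bracketed fields so identical messages group together,
    # then count each distinct message and list the most frequent first.
    sed 's/^\[[^]]*]\[[^]]*]//' "$1" | sort | uniq -c | sort -rn
}

# Usage (log path taken from the post above):
#   summarize_log /var/log/mod_gearman/mod_gearman_worker.log | head
```

The most frequent messages come out first, which makes it easier to spot when a new kind of error starts appearing.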

Post by litsupport.box »

This is what I see in the log.

There are a great many of these:
[3423][INFO ] no checks in 2minutes, restarting all workers
[1431][INFO ] no checks in 2minutes, restarting all workers

Earlier that day:
[2015-06-08 00:13:52][1432][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 00:18:54][1432][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 00:23:54][1432][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 00:32:32][1432][INFO ] mod_gearman worker daemon started with pid 1432
[2015-06-08 00:32:32][1432][INFO ] found pid file for: 1432
[2015-06-08 00:32:32][1432][INFO ] pidfile already exists, cannot start!
[2015-06-08 08:36:23][3423][INFO ] mod_gearman worker daemon started with pid 3423
[2015-06-08 08:36:23][3423][INFO ] found pid file for: 1432
[2015-06-08 08:36:23][3423][INFO ] removed stale pidfile
[2015-06-08 08:36:24][3423][INFO ] no checks in 2minutes, restarting all workers

Later that day:
[2015-06-08 09:05:52][4091][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:05:54][4092][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:00][3423][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 09:06:14][4092][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:14][4092][ERROR] cannot start client
[2015-06-08 09:06:24][4102][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:24][4100][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:24][4101][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:24][4099][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:24][4103][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:35][4104][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4102][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4102][ERROR] cannot start client
[2015-06-08 09:06:44][4101][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4101][ERROR] cannot start client
[2015-06-08 09:06:44][4099][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4099][ERROR] cannot start client
[2015-06-08 09:06:44][4100][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4100][ERROR] cannot start client
[2015-06-08 09:06:44][4103][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4103][ERROR] cannot start client
[2015-06-08 09:06:55][4104][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:55][4104][ERROR] cannot start client
[2015-06-08 09:07:05][4108][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:07:05][4110][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:07:05][4107][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:07:05][4111][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:07:05][4109][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211

At 11:07 I restarted the mod_gearman_worker service.
Around the same time I got the mail that my worker was unreachable:

[2015-06-08 11:02:56][3423][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 11:12:36][1431][INFO ] mod_gearman worker daemon started with pid 1431
[2015-06-08 11:12:36][1431][INFO ] found pid file for: 3423
[2015-06-08 11:12:36][1431][INFO ] removed stale pidfile
[2015-06-08 11:12:38][1431][INFO ] no checks in 2minutes, restarting all workers
------------------------------------------------------------------------------------------------------------------------------------

Is the worker on the same host as Nagios?
No. This worker is installed on a completely different host from the Nagios XI server, as are 15 others.

Does the Linux box hosting the worker have any monitoring enabled, load, memory, etc?
No. Do you have any tips?

Do you have multiple workers and if so did any others exhibit weirdness?
Yes, there are 15+ others around the world. This is maybe the second or third time this has happened to one of my workers. Every time, restarting the service or the machine fixes it, but that is more of a workaround for me. I didn't investigate the earlier occurrences.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

I found this description for the GEARMAN_GETADDRINFO error: "Name resolution failed for a host."
Was there a DNS issue on your network at the time of the failure?
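Since GEARMAN_GETADDRINFO points at name resolution, one quick check is whether the worker box can resolve the job server named in its own config. A rough sketch, assuming the config uses "server=host:port" lines and that getent is available on the box; list_server_hosts and check_resolution are made-up helper names:

```shell
#!/bin/sh
# Sketch: list the job server hosts named in a worker config and try to resolve each.
# Assumption: servers are configured as "server=host:port", one per line.

list_server_hosts() {
    # $1 = path to mod_gearman_worker.conf
    # Keep only "server=" lines, drop the key, then trim whitespace and the port.
    grep -E '^[[:space:]]*server[[:space:]]*=' "$1" \
        | sed 's/^[^=]*=//' \
        | sed 's/^[[:space:]]*//; s/:[0-9]*[[:space:]]*$//' \
        | sed '/^$/d'
}

check_resolution() {
    # Prints "OK <host>" or "FAIL <host>" using the system resolver via getent.
    for host in $(list_server_hosts "$1"); do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "OK $host"
        else
            echo "FAIL $host"
        fi
    done
}

# Usage (config path taken from earlier in the thread):
#   check_resolution /etc/mod_gearman/mod_gearman_worker.conf
```

Running this from cron every minute and logging the output would show whether resolution failures line up with the times the worker stops, even if DNS looks fine when checked by hand.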

Post by litsupport.box »

There were no DNS issues at that moment.

Today it happened again. I checked the log to see what happened:

Code: Select all

[2015-06-15 11:10:24][1420][DEBUG] --------------------------------
[2015-06-15 11:10:24][1431][INFO ] mod_gearman worker daemon started with pid 1431
[2015-06-15 11:10:24][1431][DEBUG] Version 1.5.0b1
[2015-06-15 11:10:24][1431][DEBUG] running on libgearman 1.1.8
[2015-06-15 11:10:24][1431][INFO ] found pid file for: 1431
[2015-06-15 11:10:24][1431][INFO ] pidfile already exists, cannot start!
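Note that in this excerpt the pid file names 1431, the same pid as the daemon that just started. One possible reading (a guess, the log does not confirm it) is a left-over pid file whose number was reused. A small sketch for checking what a pid file points at; the pid file location is not shown in this thread, so it is passed as an argument, and pidfile_state is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: report whether the pid recorded in a pid file refers to a live process.
# Run as root: without permission to signal the process, a live pid also looks stale.

pidfile_state() {
    # $1 = pid file path; prints one of: missing, empty, stale, running
    [ -f "$1" ] || { echo "missing"; return 0; }
    # Keep only the digits in case the file has trailing whitespace.
    pid=$(tr -cd '0-9' < "$1")
    [ -n "$pid" ] || { echo "empty"; return 0; }
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}

# Usage (the path is a placeholder, not taken from this thread):
#   pidfile_state /path/to/mod_gearman_worker.pid
```

If it prints "running" while the service reports that it is not running, it is worth checking with ps which process actually owns that pid.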

Post by tgriep »

Could you upload the following files so we can review them?

Code: Select all

/var/log/mod_gearman/mod_gearman_worker.log
/etc/mod_gearman/mod_gearman_worker.conf

Post by litsupport.box »

mod_gearman_worker.conf
mod_gearman_workeredited.txt
I had to cut a lot out of the log, since it was 22 MB. I left only the day it happened again.

Post by tgriep »

In the log file, it looks like the worker hit its max-jobs limit. Your config file sets it to 1000. Here is the description of that option:
max-jobs
Controls the amount of jobs a worker will do before he exits.
Use this to control how fast the amount of workers will go down after high load times. Disabled when set to 0. Default: 1000

It looks like the worker is restarted when this limit is hit. In the log file, it looks like that happened on your system, but the worker wouldn't restart.
You may need to look in other log files to see why it didn't restart. See if there is any information in the messages or debug logs.

You can try setting max-jobs to 0 to disable the limit. That should prevent the restart, and with it this issue.
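A sketch of making that change from the shell, assuming the option appears as a "max-jobs=..." line in the worker config; set_max_jobs is a hypothetical helper, and it leaves a .bak copy next to the file:

```shell
#!/bin/sh
# Sketch: set max-jobs in a worker config, keeping a backup copy of the original.
# Assumption: the option is written as "max-jobs=<number>" on its own line.

set_max_jobs() {
    # $1 = config path, $2 = new value (0 disables the limit, per the description above)
    cp "$1" "$1.bak" || return 1
    if grep -Eq '^[[:space:]]*max-jobs[[:space:]]*=' "$1"; then
        # Rewrite the existing line, reading from the backup and writing the live file.
        sed "s/^[[:space:]]*max-jobs[[:space:]]*=.*/max-jobs=$2/" "$1.bak" > "$1"
    else
        # No active max-jobs line (absent or commented out): append one.
        echo "max-jobs=$2" >> "$1"
    fi
}

# Usage (config path taken from earlier in the thread):
#   set_max_jobs /etc/mod_gearman/mod_gearman_worker.conf 0
#   service mod_gearman_worker restart
```

The worker has to be restarted afterwards for the new value to take effect.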