jdalrymple wrote:A few things:
By default there is a log at /var/log/mod_gearman/mod_gearman_worker.log
Look there and see if there are any clues. I bet there will be. If not, raise your debug level in /etc/mod_gearman/mod_gearman_worker.conf to 2:
# use debug to increase the verbosity of the module.
# Possible values are:
# 0 = only errors
# 1 = debug messages
# 2 = trace messages
# 3 = trace and all gearman related logs are going to stdout.
# Default is 0.
debug=2
Is the worker on the same host as Nagios? I assume not.
Does the Linux box hosting the worker have any monitoring enabled, load, memory, etc?
Do you have multiple workers and if so did any others exhibit weirdness?
This is what I see in the log.
There are a great many of these:
[3423][INFO ] no checks in 2minutes, restarting all workers
[1431][INFO ] no checks in 2minutes, restarting all workers
Earlier that day:
[2015-06-08 00:13:52][1432][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 00:18:54][1432][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 00:23:54][1432][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 00:32:32][1432][INFO ] mod_gearman worker daemon started with pid 1432
[2015-06-08 00:32:32][1432][INFO ] found pid file for: 1432
[2015-06-08 00:32:32][1432][INFO ] pidfile already exists, cannot start!
[2015-06-08 08:36:23][3423][INFO ] mod_gearman worker daemon started with pid 3423
[2015-06-08 08:36:23][3423][INFO ] found pid file for: 1432
[2015-06-08 08:36:23][3423][INFO ] removed stale pidfile
[2015-06-08 08:36:24][3423][INFO ] no checks in 2minutes, restarting all workers
Later that day:
[2015-06-08 09:05:52][4091][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:05:54][4092][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:00][3423][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 09:06:14][4092][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:14][4092][ERROR] cannot start client
[2015-06-08 09:06:24][4102][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:24][4100][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:24][4101][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:24][4099][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:24][4103][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:35][4104][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4102][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4102][ERROR] cannot start client
[2015-06-08 09:06:44][4101][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4101][ERROR] cannot start client
[2015-06-08 09:06:44][4099][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4099][ERROR] cannot start client
[2015-06-08 09:06:44][4100][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4100][ERROR] cannot start client
[2015-06-08 09:06:44][4103][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:44][4103][ERROR] cannot start client
[2015-06-08 09:06:55][4104][ERROR] client error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:06:55][4104][ERROR] cannot start client
[2015-06-08 09:07:05][4108][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:07:05][4110][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:07:05][4107][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:07:05][4111][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
[2015-06-08 09:07:05][4109][ERROR] worker error: gearman_connection_create_args(GEARMAN_GETADDRINFO) -> libgearman/connection.cc:211
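The GEARMAN_GETADDRINFO errors above most likely mean that the worker could not resolve the address of the gearmand server at that moment (this is a reading of the error name, not something the log states outright). A minimal Python sketch for checking name resolution from the worker host follows. The function name `can_resolve` is made up for illustration, port 4730 is the usual gearmand default, and "localhost" is only a placeholder for whatever host the worker is actually configured to connect to (assumed here to be the `server=` entry in mod_gearman_worker.conf).

```python
import socket


def can_resolve(host, port=4730):
    """Try to resolve host the same way a TCP client would.

    Returns (True, [addresses]) on success or (False, error text) on failure.
    The port only matters for building the address tuples; 4730 is the
    usual gearmand default and is an assumption here.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # This is the kind of failure that would surface as a
        # getaddrinfo error in a client library.
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved IP address.
    return True, sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the configured server host.
    ok, detail = can_resolve("localhost")
    print("resolved" if ok else "FAILED", detail)
```

Running this from cron every minute on an affected worker and keeping the output might show whether resolution fails at the same times the errors appear in the log.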
At 11:07 I restarted the mod_gearman_worker service.
Around the same time, I got the mail that my worker was unreachable:
[2015-06-08 11:02:56][3423][INFO ] no checks in 2minutes, restarting all workers
[2015-06-08 11:12:36][1431][INFO ] mod_gearman worker daemon started with pid 1431
[2015-06-08 11:12:36][1431][INFO ] found pid file for: 3423
[2015-06-08 11:12:36][1431][INFO ] removed stale pidfile
[2015-06-08 11:12:38][1431][INFO ] no checks in 2minutes, restarting all workers
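Since the log is long and repetitive, it can help to count how often each message occurs. The following is a rough Python sketch that assumes every line has the shape seen in the excerpts above: an optional `[timestamp]`, then `[pid]`, then `[LEVEL]`, then the message. The names `LINE_RE` and `tally` are made up for illustration; the log path is the default one mentioned at the top of this post.

```python
import re
from collections import Counter

# Matches lines such as:
#   [2015-06-08 09:06:14][4092][ERROR] cannot start client
#   [3423][INFO ] no checks in 2minutes, restarting all workers
LINE_RE = re.compile(
    r"^(?:\[(?P<ts>[^\]]+)\])?"      # optional timestamp
    r"\[(?P<pid>\d+)\]"              # process id
    r"\[(?P<level>[A-Z]+)\s*\]\s*"   # level, possibly space-padded
    r"(?P<msg>.*)$"                  # rest of the line
)


def tally(lines):
    """Count occurrences of each (level, message) pair; skip other lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    path = "/var/log/mod_gearman/mod_gearman_worker.log"
    with open(path, encoding="utf-8", errors="replace") as handle:
        for (level, msg), count in tally(handle).most_common(10):
            print(f"{count:6d}  {level:5s}  {msg}")
```

The output lists the ten most frequent messages with their counts, which makes it easier to compare an affected worker with a healthy one.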
------------------------------------------------------------------------------------------------------------------------------------
Is the worker on the same host as Nagios?
No. This worker is installed on a completely different host from the Nagios XI server, as are 15 others.
Does the Linux box hosting the worker have any monitoring enabled, load, memory, etc?
No. Do you have any tips?
Do you have multiple workers and if so did any others exhibit weirdness?
Yes, there are 15+ others around the world. This is maybe the second or third time this has happened to any of my workers. Every time I restart the service or the machine, it is fixed, but that is more of a workaround for me. I didn't investigate the earlier occurrences.