gearman - a lot JOBs waiting
Posted: Thu Mar 03, 2016 8:48 am
by bosecorp
I've started noticing that my gearman server is falling behind on jobs.
It seems to be having a hard time keeping up:
2016-03-03 21:47:28 - 139.68.12.15:4730 - v1.1.8
Queue Name            | Worker Available | Jobs Waiting | Jobs Running
------------------------------------------------------------------------
check_results         |                1 |         1255 |            0
hostgroup_gearman_hk1 |               24 |            0 |            9
worker_gearmanhk1     |                1 |            0 |            0
------------------------------------------------------------------------
Re: gearman - a lot JOBs waiting
Posted: Thu Mar 03, 2016 3:43 pm
by rkennedy
How many CPUs do you have allocated to the gearman machine? Additionally, what's the output of top | head -25?
Re: gearman - a lot JOBs waiting
Posted: Thu Mar 03, 2016 4:08 pm
by bosecorp
Here you go:
root@hk-ngmon01:(03-04 05:07): /root
# cat /proc/cpuinfo | grep processor | wc -l
1
top - 05:07:16 up 27 days, 14:54, 1 user, load average: 1.43, 1.09, 0.91
Tasks: 223 total, 3 running, 220 sleeping, 0 stopped, 0 zombie
Cpu(s): 37.1%us, 4.2%sy, 0.0%ni, 58.2%id, 0.0%wa, 0.3%hi, 0.2%si, 0.0%st
Mem: 3916052k total, 3185824k used, 730228k free, 431896k buffers
Swap: 15724540k total, 0k used, 15724540k free, 2110000k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3445 gdm 20 0 392m 63m 9952 R 50.3 1.7 51:37.83 gnome-settings-
5203 nagios 20 0 144m 11m 2128 R 38.7 0.3 0:00.20 check_wmi_plus.
195 root 20 0 0 0 0 S 1.9 0.0 45:54.06 scsi_eh_1
5090 nagios 20 0 71836 3660 3056 S 1.9 0.1 0:00.01 wmic
1 root 20 0 19356 1536 1228 S 0.0 0.0 0:16.62 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.04 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:11.28 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/0
6 root RT 0 0 0 0 S 0.0 0.0 0:03.69 watchdog/0
7 root 20 0 0 0 0 S 0.0 0.0 1:59.11 events/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/0
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/0
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cgroup
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 netns
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 async/mgr
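For reference, a load average only means much relative to the number of cores, and the 1.43 above is on a single processor. A rough sketch for putting the two together; `load_per_core` is just an illustrative helper name, and reading "above 1.0 per core" as saturated is a rule of thumb, not a hard limit:

```sh
#!/bin/sh
# Hypothetical helper: prints load average divided by core count (2 decimals).
# $1 = load average, $2 = number of cores
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on Linux (not run here):
#   load_per_core "$(cut -d ' ' -f1 /proc/loadavg)" "$(grep -c ^processor /proc/cpuinfo)"
```

With the values from this post, `load_per_core 1.43 1` gives 1.43, so the single core is oversubscribed on average over the last minute.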
Re: gearman - a lot JOBs waiting
Posted: Thu Mar 03, 2016 5:06 pm
by jolson
check_results | 1 | 1255 | 0
There's only one worker for your 1255 checks to run.
Is this a gearman_worker node or is this your gearman server?
Either way, I'd like to see your worker config file:
Once you get that back, we'll see if there's anything wrong with the configuration file.
I'm also interested in your waiting jobs:
Code: Select all
ps -ef | grep gearman | wc -l
ps -ef | grep gearman | tail -n20
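One caveat with the first command: `ps -ef | grep gearman | wc -l` also counts the grep process itself and the gearmand daemon. If only the worker processes matter, something like the sketch below may give a cleaner number. `count_workers` is a made-up helper name, and it assumes the command sits in column 8, as in standard `ps -ef` output:

```sh
#!/bin/sh
# Hypothetical helper: reads `ps -ef` output on stdin and counts only
# processes whose command (column 8) ends in mod_gearman_worker.
# The count includes the parent worker process as well as its children.
count_workers() {
  awk '$8 ~ /mod_gearman_worker$/ { n++ } END { print n + 0 }'
}

# Usage (not run here):
#   ps -ef | count_workers
```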
Re: gearman - a lot JOBs waiting
Posted: Thu Mar 03, 2016 7:32 pm
by bosecorp
My worker runs on the same server as gearman.
Code: Select all
root@hk-ngmon01:(03-04 08:29): /root
# ps -ef | grep gearman | wc -l
43
root@hk-ngmon01:(03-04 08:29): /root
# ps -ef | grep gearman | tail -n20
nagios 36122 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36134 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36135 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36136 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36146 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36311 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36317 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36318 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36319 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36344 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36345 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36346 38099 0 08:31 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36418 38099 0 08:32 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36445 38099 0 08:32 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36446 38099 0 08:32 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36529 38099 0 08:32 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios 36547 38099 0 08:32 ? 00:00:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
root 36601 34571 0 08:32 pts/0 00:00:00 grep gearman
nagios 38099 1 0 04:25 ? 00:00:02 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
gearmand 60583 1 0 Mar03 ? 00:04:12 /usr/sbin/gearmand -d --log-file=/var/log/gearmand/gearmand.log -q builtin --verbose WARNING
root@hk-ngmon01:(03-04 08:29): /root
#
# cat /etc/mod_gearman/mod_gearman_worker.conf
###############################################################################
#
# Mod-Gearman - distribute checks with gearman
#
# Copyright (c) 2010 Sven Nierlein
#
# Worker Module Config
#
###############################################################################
# Identifier, hostname will be used if undefined
identifier=gearmanhk1
# use debug to increase the verbosity of the module.
# Possible values are:
# 0 = only errors
# 1 = debug messages
# 2 = trace messages
# 3 = trace and all gearman related logs are going to stdout.
# Default is 0.
debug=1
# Path to the logfile.
logfile=/var/log/mod_gearman/mod_gearman_worker.log
# sets the address of your gearman job server. Can be specified
# more than once to add more servers.
#server=10.100.30.113:4730
server=localhost:4730
# sets the address of your 2nd (duplicate) gearman job server. Can
# be specified more than once to add more servers.
#dupserver=<host>:<port>
# defines if the module should distribute execution of
# eventhandlers.
eventhandler=no
# defines if the module should distribute execution of
# service checks.
services=no
# defines if the module should distribute execution of
# host checks.
hosts=no
# sets a list of hostgroups which will go into separate
# queues. Either specify a comma-separated list or use
# multiple lines.
hostgroups=gearman_hk1
#hostgroups=name2,name3
# sets a list of servicegroups which will go into separate
# queues.
# Set this to 'no' if you want Mod-Gearman to only take care of
# servicechecks. No hostchecks will be processed by Mod-Gearman. Use
# this option to disable hostchecks and still have the possibility to
# use hostgroups for easy configuration of your services.
# If set to yes, you still have to define which hostchecks should be
# processed by either using 'hosts' or the 'hostgroups' option.
# Default is Yes.
do_hostchecks=yes
# enables or disables encryption. It is strongly
# advised to not disable encryption. Anybody will be
# able to inject packages to your worker.
# Encryption is enabled by default and you have to
# explicitly disable it.
# When using encryption, you will either have to
# specify a shared password with key=... or a
# keyfile with keyfile=...
# Default is On.
encryption=yes
# A shared password which will be used for
# encryption of data packets. Should be at least 8
# bytes long. Maximum length is 32 characters.
key=Qb86Fg93
# The shared password will be read from this file.
# Use either key or keyfile. Only the first 32
# characters will be used.
#keyfile=/path/to/secret.file
###############################################################################
#
# Worker Config
#
# the following settings are for the worker only and
# will be ignored by the neb module.
#
###############################################################################
# Path to the pidfile. Usually set by the init script
#pidfile=/var/mod_gearman/mod_gearman_worker.pid
# Default job timeout in seconds. Currently this value is only used for
# eventhandler. The worker will use the values from the core for host and
# service checks.
job_timeout=60
# Minimum number of worker processes which should
# run at any time.
min-worker=39
# Maximum number of worker processes which should
# run at any time. You may set this equal to
# min-worker setting to disable dynamic starting of
# workers. When setting this to 1, all services from
# this worker will be executed one after another.
max-worker=100
# Time after which an idling worker exits
# This parameter controls how fast your waiting workers will
# exit if there are no jobs waiting.
idle-timeout=30
# Controls the amount of jobs a worker will do before he exits
# Use this to control how fast the amount of workers will go down
# after high load times
max-jobs=1000
# max-age is the threshold for discarding too old jobs. When a new job is older
# than this amount of seconds it will not be executed and just discarded. Set to
# zero to disable this check.
#max-age=0
# defines the rate of spawned worker per second as long
# as there are jobs waiting
spawn-rate=1
# Use this option to disable an extra fork for each plugin execution. Disabling
# this option will reduce the load on the worker host but can lead to problems with
# unclean plugin. Default: yes
fork_on_exec=no
# Use this option to show stderr output of plugins too.
# Default: yes
show_error_output=yes
# Use dup_results_are_passive to set if the duplicate result send to the dupserver
# will be passive or active.
# Default is yes (passive).
#dup_results_are_passive=yes
# When embedded perl has been compiled in, you can use this
# switch to enable or disable the embedded perl interpreter.
enable_embedded_perl=on
# Default value used when the perl script does not have a
# "nagios: +epn" or "nagios: -epn" set.
# Perl scripts not written for epn support usually fail with epn,
# so its better to set the default to off.
use_embedded_perl_implicitly=off
# Cache compiled perl scripts. This makes the worker process a little
# bit bigger but makes execution of perl scripts even faster.
# When turned off, Mod-Gearman will still use the embedded perl
# interpreter, but will not cache the compiled script.
use_perl_cache=on
# path to p1 file which is used to execute and cache the
# perl scripts run by the embedded perl interpreter
p1_file=/usr/share/mod_gearman/mod_gearman_p1.pl
# Workarounds
# workaround for rc 25 bug
# duplicate jobs from gearmand result in exit code 25 of plugins
# because they are executed twice and get killed because of using
# the same resource.
# Sending results (when exit code is 25 ) will be skipped with this
# enabled.
workaround_rc_25=off
Re: gearman - a lot JOBs waiting
Posted: Thu Mar 03, 2016 7:51 pm
by gormank
I tried running my nagios servers (gearman on the same host) on a VM w/ a single core and they were not happy. I went back to 4 cores and now load averages are around .25 or so.
I'm too dim to know how to show the jobs waiting as shown in post 1. Please enlighten me. I did google it, but that wasn't much help... I also looked at the output of gearman -H and check_gearman2.
Re: gearman - a lot JOBs waiting
Posted: Fri Mar 04, 2016 10:26 am
by jolson
bosecorp,
Could you try allocating one or two more CPUs to the gearman machine?
gormank,
The application you're looking for is either called gearman_top or gearman_top2.
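If you'd rather work with the raw numbers, the same information should also be available from gearmand's plain-text `status` output, which the `gearadmin --status` tool (shipped with gearmand, assuming it is installed) prints as tab-separated `queue, total, running, workers` lines ending in a lone `.`. A minimal sketch under those assumptions; `queue_backlog` is a hypothetical helper name, and "waiting" is taken to be total minus running:

```sh
#!/bin/sh
# Hypothetical helper: reads gearmand "status" text on stdin
# (tab-separated: queue, total jobs, running jobs, available workers;
# terminated by a line containing only ".") and prints one line per queue.
# Assumption: waiting = total - running.
queue_backlog() {
  awk -F '\t' '
    $0 == "." { exit }    # end-of-status marker
    NF >= 4 {
      printf "%s workers=%d waiting=%d running=%d\n", $1, $4, $2 - $3, $3
    }'
}

# Usage (not run here):
#   gearadmin --status | queue_backlog
```

A queue with a large `waiting` value and a small `workers` value, like check_results in this thread, is the one to look at first.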
Re: gearman - a lot JOBs waiting
Posted: Fri Mar 04, 2016 10:41 am
by bosecorp
Did that.
The situation improved for the first few hours, but the queue built up again over time.
I have 8 CPUs with 8GB now, but I still see 800 jobs waiting:
2016-03-04 23:43:28 - 139.68.12.15:4730 - v1.1.8
Queue Name            | Worker Available | Jobs Waiting | Jobs Running
------------------------------------------------------------------------
check_results         |                1 |          941 |            1
hostgroup_gearman_hk1 |               39 |            0 |           18
worker_gearmanhk1     |                1 |            0 |            0
------------------------------------------------------------------------
Re: gearman - a lot JOBs waiting
Posted: Fri Mar 04, 2016 2:01 pm
by ssax
Try adjusting your worker configuration: disable debug logging (it increases load) and reduce min-worker to 5. The worker will automatically start as many processes as it needs, up to max-worker, so there is no need to consume the extra resources at startup.
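As a rough sketch, the two changes could be applied with sed. `tune_worker_conf` is a hypothetical helper name; `debug=0` is the "only errors" level described in the config comments above, and the config path is the one from the earlier post. It only writes to stdout, so the original file stays untouched until you review the result:

```sh
#!/bin/sh
# Hypothetical helper: reads a mod_gearman worker config on stdin and
# writes a copy to stdout with debug logging off and min-worker set to 5.
# All other lines (including max-worker) pass through unchanged.
tune_worker_conf() {
  sed -e 's/^debug=.*/debug=0/' \
      -e 's/^min-worker=.*/min-worker=5/'
}

# Usage (not run here; review the new file before swapping it in):
#   tune_worker_conf < /etc/mod_gearman/mod_gearman_worker.conf > /tmp/mod_gearman_worker.conf.new
```

Restart the worker afterwards so it picks up the new values.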
Re: gearman - a lot JOBs waiting
Posted: Fri Mar 04, 2016 2:31 pm
by bosecorp
Just implemented that... it still doesn't help:
2016-03-05 03:31:06 - 139.68.12.15:4730 - v1.1.8
Queue Name            | Worker Available | Jobs Waiting | Jobs Running
------------------------------------------------------------------------
check_results         |                1 |         1065 |            1
hostgroup_gearman_hk1 |               33 |            0 |            9
worker_gearmanhk1     |                1 |            0 |            0
------------------------------------------------------------------------