check_gearman question

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

check_gearman question

Post by Fred Kroeger »

I have implemented the latest version of mod_gearman with NagiosXI v5.6.6 and all is working well.
I would like to monitor mod_gearman - specifically the Waiting jobs queue.
Searching for check_gearman on the Nagios server, I find two different versions:

Code: Select all

# /usr/bin/check_gearman -V
check_gearman: version 3.0.7 running on libgearman 0.33

# /usr/local/nagios/libexec/check_gearman -V
check_gearman 0.2.1
Using the version in libexec, I'm not sure how to monitor the jobs that are waiting. The plugin's help text says to use the -f option; however, when I use it, it measures Jobs Running, not Jobs Waiting.

Code: Select all

-f, --flist=STRING
   Check for the functions listed in STRING, separated by comma. If optional threshold is given (separated by :), check that waiting jobs for this particular function are not exceeding that value
Running the following returns a Warning, but gearman_top, which I am watching at the same time, shows the Jobs Waiting queue at zero.

Code: Select all

# /usr/local/nagios/libexec/check_gearman -f hostgroup_SITE-DC1:1
CHECK_GEARMAN WARNING - 20 jobs for hostgroup_SITE-DC1 exceeds threshold 1
2019-08-29 19:09:44  -  localhost:4730  -  v0.33


 Queue Name                 | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------------
 check_results              |               1  |           1  |           0
 eventhandler               |              45  |           0  |           0
 host                       |              45  |           0  |           0
 hostgroup_SITE-XX1         |              31  |           0  |           0
 hostgroup_SITE-XX2         |              31  |           0  |           0
 hostgroup_SITE-DC1         |              70  |           0  |          43
So it would appear that check_gearman is returning the number of Jobs Running rather than Jobs Waiting.
Also, it would be good if I could define a Critical threshold, as Jobs Waiting is a better indication of there being a problem.
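
One way to cross-check which number a plugin is reporting is to read the queue counters straight from gearmand. The sketch below assumes the standard Gearman admin text protocol, in which the `status` command returns one tab-separated line per function (name, total jobs, running jobs, available workers) followed by a line containing a single `.`; waiting jobs are then total minus running. The function name `gearman_waiting` is hypothetical, and the sample input is made up for illustration rather than captured from a server.

```shell
#!/bin/sh
# Hypothetical helper: read gearmand "status" output on stdin and print
# "<queue> <waiting>" per line, where waiting = total - running.
# Assumes tab-separated columns: function, total, running, workers.
gearman_waiting() {
    awk -F'\t' '
        $0 == "." { exit }              # a lone dot ends the status listing
        NF >= 4   { print $1, $2 - $3 } # total minus running = waiting
    '
}

# Illustrative input only (not captured from a real server).
printf 'check_results\t1\t0\t1\nhostgroup_SITE-DC1\t43\t43\t70\n.\n' | gearman_waiting
```

Against a live server, the input could come from something like `printf 'status\n' | nc -w 1 localhost 4730`, assuming `nc` is installed and the admin port is reachable from the Nagios host.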

Regards... Fred
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: check_gearman question

Post by tgriep »

First, don't use this check anymore, as it is an older version and may not work.

Code: Select all

/usr/local/nagios/libexec/check_gearman
Use this one.

Code: Select all

/usr/bin/check_gearman
On my test system, I ran gearman_top, and it shows that the servicegroup_Centos7_ServiceGroup queue has 2 Jobs Waiting.

Code: Select all

2019-08-29 14:16:23  -  localhost:4730  -  v0.33

 Queue Name                        | Worker Available | Jobs Waiting | Jobs Running
------------------------------------------------------------------------------------
 check_results                     |               1  |           0  |           0
 eventhandler                      |              10  |           0  |           0
 hostgroup_Centos7_HostGroup       |               0  |           1  |           0
 hostgroup_Local_HostGroup         |              10  |           0  |           2
 servicegroup_Centos7_ServiceGroup |               0  |           2  |           0
 worker_centos7.localdomain        |               0  |           0  |           0
 worker_localhost.localdomain      |               1  |           0  |           0
------------------------------------------------------------------------------------
I then ran this command:

Code: Select all

check_gearman -H localhost -q  servicegroup_Centos7_ServiceGroup
It gave this output:

Code: Select all

check_gearman CRITICAL - Queue servicegroup_Centos7_ServiceGroup has 2 jobs without any worker. |'servicegroup_Centos7_ServiceGroup_waiting'=2;10;100;0 'servicegroup_Centos7_ServiceGroup_running'=0 'servicegroup_Centos7_ServiceGroup_worker'=0;25;50;0
This shows 2 jobs waiting, so the new check_gearman seems to report the correct results.

With the new check, you need to specify the Gearman host with the -H option and the queue you want to look at with -q.

If you omit the -q option, it returns all of the queues that are hosted on the Gearman server at once, as in this example.

Code: Select all

check_gearman -H localhost
check_gearman CRITICAL - Queue hostgroup_Centos7_HostGroup has 1 job without any worker. Queue servicegroup_Centos7_ServiceGroup has 2 jobs without any worker. |'check_results_waiting'=0;10;100;0 'check_results_running'=0 'check_results_worker'=1;25;50;0 'eventhandler_waiting'=0;10;100;0 'eventhandler_running'=0 'eventhandler_worker'=6;25;50;0 'hostgroup_Centos7_HostGroup_waiting'=1;10;100;0 'hostgroup_Centos7_HostGroup_running'=0 'hostgroup_Centos7_HostGroup_worker'=0;25;50;0 'hostgroup_Local_HostGroup_waiting'=0;10;100;0 'hostgroup_Local_HostGroup_running'=1 'hostgroup_Local_HostGroup_worker'=6;25;50;0 'servicegroup_Centos7_ServiceGroup_waiting'=2;10;100;0 'servicegroup_Centos7_ServiceGroup_running'=0 'servicegroup_Centos7_ServiceGroup_worker'=0;25;50;0 'worker_centos7.localdomain_waiting'=0;10;100;0 'worker_centos7.localdomain_running'=0 'worker_centos7.localdomain_worker'=0;25;50;0 'worker_localhost.localdomain_waiting'=0;10;100;0 'worker_localhost.localdomain_running'=0 'worker_localhost.localdomain_worker'=1;25;50;0
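
When all queues come back on one line like this, the waiting counts can be pulled out of the performance data. The following is a minimal sketch, assuming the perfdata labels keep the `'QUEUE_waiting'=N;warn;crit;min` shape seen in the output above; the function name `waiting_queues` is hypothetical, and the sample line is abridged from that output.

```shell
#!/bin/sh
# Hypothetical helper: read check_gearman output on stdin and print
# "<queue> <waiting>" for every queue with at least one waiting job.
# Assumes perfdata labels of the form 'QUEUE_waiting'=N;warn;crit;min.
waiting_queues() {
    tr ' |' '\n\n' |                    # one perfdata token per line
    sed -n "s/^'\(.*\)_waiting'=\([0-9][0-9]*\).*/\1 \2/p" |
    awk '$2 > 0'                        # keep only non-empty queues
}

# Abridged copy of the plugin output shown above.
sample="check_gearman CRITICAL - Queue hostgroup_Centos7_HostGroup has 1 job without any worker. |'check_results_waiting'=0;10;100;0 'check_results_running'=0 'hostgroup_Centos7_HostGroup_waiting'=1;10;100;0 'servicegroup_Centos7_ServiceGroup_waiting'=2;10;100;0"
printf '%s\n' "$sample" | waiting_queues
```

Splitting on both spaces and the `|` separator means the first perfdata label is picked up whether or not there is a space after the pipe.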
To specify thresholds, use the following on the command line.
[ -w=<jobs warning level> ] default: 10
[ -c=<jobs critical level> ] default: 100
[ -W=<worker warning level> ] default: 25
[ -C=<worker critical level> ] default: 50
It is unclear whether the job thresholds apply to Jobs Running or Jobs Waiting, so you will have to test that out.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Re: check_gearman question

Post by Fred Kroeger »

Brilliant - thanks very much for the info. I've always assumed (obviously incorrectly) that the plugins in libexec would be the latest supported versions.
I tried your suggestion and it works as I would have expected:

Code: Select all

# ./check_gearman -H localhost -q hostgroup_SITE-DC1
check_gearman CRITICAL - Queue hostgroup_SITE-DC1 has 82 worker. |'hostgroup_SITE-DC1_waiting'=0;10;100;0 'hostgroup_SITE-DC1_running'=13 'hostgroup_SITE-DC1_worker'=82;25;50;0
It appears that the -w/-c thresholds apply to the Jobs Waiting queue and the -W/-C parameters apply to the Worker Available count. I'm not sure what use the latter is, as the number of workers available will never be higher than what is configured in the worker config file. I would expect it to alert if the number is zero. Otherwise, how would you monitor whether the worker has stopped (apart from Jobs Waiting increasing for the hostgroups)?

Code: Select all

# ./check_gearman -H localhost -q worker_hostname.domain
check_gearman OK - 0 jobs running and 0 jobs waiting. Version: 0.33|'worker_hostname.domain_waiting'=0;10;100;0 'worker_hostname.domain_running'=0 'worker_hostname.domain_worker'=1;25;50;0
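
If an explicit alert on zero workers is wanted regardless of how -W/-C behave, one option is a small wrapper around the plugin output. This is only a sketch: the function name `check_workers_present` is hypothetical, it assumes the `'QUEUE_worker'=N;warn;crit;min` perfdata label seen above, and it uses the usual Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Hypothetical wrapper logic: read check_gearman output for ONE queue on
# stdin and return a Nagios-style exit code based on the worker count.
# Assumes a perfdata label of the form 'QUEUE_worker'=N;warn;crit;min.
check_workers_present() {
    workers=$(tr ' |' '\n\n' |
        sed -n "s/^'.*_worker'=\([0-9][0-9]*\).*/\1/p" | head -n 1)
    if [ -z "$workers" ]; then
        echo "UNKNOWN - no worker perfdata found"
        return 3
    elif [ "$workers" -eq 0 ]; then
        echo "CRITICAL - no workers registered"
        return 2
    fi
    echo "OK - $workers worker(s) registered"
    return 0
}

# Output line copied from the post above.
printf '%s\n' "check_gearman OK - 0 jobs running and 0 jobs waiting. Version: 0.33|'worker_hostname.domain_waiting'=0;10;100;0 'worker_hostname.domain_running'=0 'worker_hostname.domain_worker'=1;25;50;0" | check_workers_present
```

In practice the input would be the output of `/usr/bin/check_gearman -H localhost -q <queue>` for a single queue, piped into the function.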
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: check_gearman question

Post by tgriep »

The check will generate an alert if the workers drop below the thresholds, and especially if the number of workers drops to 0.
I would guess the thresholds are needed if you have a worker configured outside of the default settings, in which case you would have to specify them.
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Re: check_gearman question

Post by Fred Kroeger »

Thanks - you can close this
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: check_gearman question

Post by scottwilkerson »

Fred Kroeger wrote: Thanks - you can close this
Great!

Locking