Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Wed Sep 13, 2017 11:31 am
by emartine
Ok. So I just saw an error come in at around 11:11:32

[2017-09-13 11:11:15][30705][DEBUG] received job for queue service: 01hp - Current Users
[2017-09-13 11:11:15][30705][DEBUG] service: '01hp' - 'Current Users', next_check is at 2017-09-13 11:11:15, latency so far: 0
[2017-09-13 11:11:15][30705][DEBUG] service job completed: 01hp Current Users: 0
[2017-09-13 11:11:22][30705][DEBUG] received job for queue host: tst01hp
[2017-09-13 11:11:22][30705][DEBUG] host: 'tst01hp', next_check is at 2017-09-13 11:11:22, latency so far: 0
[2017-09-13 11:11:22][30705][DEBUG] host job completed: tst01hp: 0
[2017-09-13 11:11:26][30705][DEBUG] received job for queue service: 01hp - Current Load
[2017-09-13 11:11:26][30705][DEBUG] service: '01hp' - 'Current Load', next_check is at 2017-09-13 11:11:26, latency so far: 0
[2017-09-13 11:11:27][30705][DEBUG] service job completed: 01hp Current Load: 0
[2017-09-13 11:11:28][30705][DEBUG] received job for queue service: tst01dar - Swap
[2017-09-13 11:11:28][30705][DEBUG] service: 'tst01dar' - 'Swap', next_check is at 2017-09-13 11:11:28, latency so far: 0
[2017-09-13 11:11:29][30705][DEBUG] service job completed: tst01dar Swap: 0
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver-dr - Ping
[2017-09-13 11:11:32][30705][DEBUG] service: 'nagiosserver-dr' - 'Ping', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] received job for queue host: 02hp
[2017-09-13 11:11:32][30705][DEBUG] host: '02hp', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver - Current Load
[2017-09-13 11:11:32][30705][DEBUG] service: 'nagiosserver' - 'Current Load', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] service job completed: nagiosserver-dr Ping: 0

I then did a grep for "orphaned" and it appears to be happening often, but not often enough to trigger an alert.


[2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned
[2017-09-13 10:39:32][30705][DEBUG] host check for 01dar orphaned
[2017-09-13 10:41:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 10:42:32][30705][DEBUG] host check for 02dar orphaned
[2017-09-13 10:46:32][30705][DEBUG] host check for tst01hp orphaned
[2017-09-13 11:08:32][30705][DEBUG] host check for tst01dar orphaned
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
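
For anyone wanting a per-host tally rather than a raw grep, here is a minimal sketch in Python. It assumes the debug lines keep exactly the format quoted above; the function name `count_orphans` and the regular expression are illustrative and not part of Mod-Gearman.

```python
import re
from collections import Counter

# Matches the debug lines quoted above, for example:
# [2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
# The pattern is an assumption based only on those sample lines.
ORPHAN_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[\d+\]\[DEBUG\] host check for (?P<host>\S+) orphaned"
)

def count_orphans(lines):
    """Return a Counter mapping host name to number of orphaned host checks."""
    counts = Counter()
    for line in lines:
        match = ORPHAN_RE.match(line.strip())
        if match:
            counts[match.group("host")] += 1
    return counts

# The seven lines from the grep output above.
sample = [
    "[2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned",
    "[2017-09-13 10:39:32][30705][DEBUG] host check for 01dar orphaned",
    "[2017-09-13 10:41:32][30705][DEBUG] host check for 02hp orphaned",
    "[2017-09-13 10:42:32][30705][DEBUG] host check for 02dar orphaned",
    "[2017-09-13 10:46:32][30705][DEBUG] host check for tst01hp orphaned",
    "[2017-09-13 11:08:32][30705][DEBUG] host check for tst01dar orphaned",
    "[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned",
]

for host, n in count_orphans(sample).most_common():
    print(host, n)
```

To run it against a real log, pass an open file handle for the debug log in place of `sample`; any iterable of lines works.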


I understand that there are newer versions elsewhere, but I am looking for the version supported by Nagios.

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Wed Sep 13, 2017 1:00 pm
by tgriep
Orphaned host checks could be caused by a hostgroup setting in the Mod-Gearman configuration files.
If the Gearman server tries to run the host check on the wrong Gearman worker, it could generate that error.
Another possible cause is the Nagios server or the Gearman server / worker not processing checks quickly enough because of resource issues.
Make sure you have enough CPUs and memory assigned to the servers.

One thing to try is to edit the Gearman worker config files and increase the min workers and max jobs. That could solve the issue, as the worker can then process more checks.

Code:

# Minimum number of worker processes which should
# run at any time.
min-worker=5

Code:

# Controls the amount of jobs a worker will do before he exits
# Use this to control how fast the amount of workers will go down
# after high load times
max-jobs=1000
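
To double-check which values a worker config file actually contains, a rough Python sketch like the following could be used. It assumes plain `key=value` lines with `#` comments, as in the snippets above; the helper name `read_settings` is hypothetical, and letting a later duplicate key override an earlier one is a choice of this sketch, not documented Mod-Gearman behaviour.

```python
def read_settings(text, keys=("min-worker", "max-jobs")):
    """Collect the requested key=value settings, skipping blanks and # comments."""
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in keys:
            # Assumption of this sketch: a later duplicate overrides an earlier one.
            found[key] = value.strip()
    return found

# Sample text built from the two config snippets quoted above.
sample_conf = """\
# Minimum number of worker processes which should
# run at any time.
min-worker=5

# Controls the amount of jobs a worker will do before he exits
max-jobs=1000
"""

print(read_settings(sample_conf))
```

Feeding it the contents of the real worker config file shows at a glance whether the edited values are the ones on disk.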

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Wed Sep 13, 2017 2:07 pm
by emartine
I have not configured hostgroup settings in Mod-Gearman.
Our DR server is checking 9 hosts with a total of 65 services. It is a physical server with 128G RAM and the following CPU:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Stepping: 4
CPU MHz: 1200.000

The Gearman worker is set to 10 workers, and max jobs is set to 1000.

Any other ideas?

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Wed Sep 13, 2017 2:48 pm
by tgriep
Check the logs for all of the workers and see if there are any errors in them that could help.
Does the DR system have its own Gearman infrastructure, or is it sharing it with others?

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Wed Sep 13, 2017 4:33 pm
by emartine
The DR server has only the worker and the server on it. It is a standalone, manual DR.

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Thu Sep 14, 2017 9:15 am
by emartine
I enabled debug on the Gearman worker as well as the server. Only the server reports orphaned checks, and they are only for hosts, not services.

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Thu Sep 14, 2017 9:48 am
by tgriep
Can you post the full worker.conf file, the module.conf file, and the output of the following command run on the Nagios server?

Code:

gearman_top2 -b

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Mon Sep 18, 2017 7:23 pm
by emartine
I PM'd you the items.

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Tue Sep 19, 2017 9:13 am
by tmcdonald
tgriep is actually out this week - can you please post the items or send them to me in a PM? I will be sure to share the items with the rest of the team.

Mod Edit: Files received and placed on shared drive

Re: Nagios XI 5.4.4 Appluconfiguration taking forever

Posted: Tue Sep 19, 2017 11:06 am
by emartine
Messages sent.