Nagios XI 5.4.4 Apply Configuration taking forever

Post by **emartine** » Wed Sep 13, 2017 11:31 am

OK. I just saw an error come in at around 11:11:32:

[2017-09-13 11:11:15][30705][DEBUG] received job for queue service: 01hp - Current Users
[2017-09-13 11:11:15][30705][DEBUG] service: '01hp' - 'Current Users', next_check is at 2017-09-13 11:11:15, latency so far: 0
[2017-09-13 11:11:15][30705][DEBUG] service job completed: 01hp Current Users: 0
[2017-09-13 11:11:22][30705][DEBUG] received job for queue host: tst01hp
[2017-09-13 11:11:22][30705][DEBUG] host: 'tst01hp', next_check is at 2017-09-13 11:11:22, latency so far: 0
[2017-09-13 11:11:22][30705][DEBUG] host job completed: tst01hp: 0
[2017-09-13 11:11:26][30705][DEBUG] received job for queue service: 01hp - Current Load
[2017-09-13 11:11:26][30705][DEBUG] service: '01hp' - 'Current Load', next_check is at 2017-09-13 11:11:26, latency so far: 0
[2017-09-13 11:11:27][30705][DEBUG] service job completed: 01hp Current Load: 0
[2017-09-13 11:11:28][30705][DEBUG] received job for queue service: tst01dar - Swap
[2017-09-13 11:11:28][30705][DEBUG] service: 'tst01dar' - 'Swap', next_check is at 2017-09-13 11:11:28, latency so far: 0
[2017-09-13 11:11:29][30705][DEBUG] service job completed: tst01dar Swap: 0
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver-dr - Ping
[2017-09-13 11:11:32][30705][DEBUG] service: 'nagiosserver-dr' - 'Ping', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] received job for queue host: 02hp
[2017-09-13 11:11:32][30705][DEBUG] host: '02hp', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver - Current Load
[2017-09-13 11:11:32][30705][DEBUG] service: 'nagiosserver' - 'Current Load', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] service job completed: nagiosserver-dr Ping: 0
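To spot which jobs in an excerpt like the one above were received but never completed, the "received job" and "job completed" lines can be paired up. This is a minimal sketch that assumes the message wording is exactly as it appears in this log; other Mod-Gearman versions may phrase these DEBUG lines differently, and `unfinished_jobs` is a hypothetical helper, not part of any Nagios tool.

```python
import re

# Patterns copied from the DEBUG lines in the excerpt; other Mod-Gearman
# versions may word these messages differently (assumption).
RECEIVED = re.compile(r"received job for queue (host|service): (.+)$")
COMPLETED = re.compile(r"(host|service) job completed: (.+): -?\d+$")
ORPHANED = re.compile(r"(host|service) check for (.+) orphaned$")


def unfinished_jobs(lines):
    """Return (still_open, orphaned) lists of (kind, name) tuples.

    still_open: jobs that were received but have no matching
    'job completed' line within the given lines.
    orphaned: jobs the log explicitly reports as orphaned.
    """
    pending = {}
    orphaned = []
    for line in lines:
        # Drop the "[timestamp][pid][DEBUG] " prefix.
        msg = line.split("[DEBUG] ", 1)[-1].strip()
        if m := RECEIVED.search(msg):
            # "01hp - Current Users" is logged as "01hp Current Users"
            # on completion, so normalise the separator.
            key = (m.group(1), m.group(2).replace(" - ", " "))
            pending[key] = pending.get(key, 0) + 1
        elif m := COMPLETED.search(msg):
            key = (m.group(1), m.group(2))
            if pending.get(key):
                pending[key] -= 1
        elif m := ORPHANED.search(msg):
            orphaned.append((m.group(1), m.group(2)))
    still_open = sorted(key for key, count in pending.items() if count > 0)
    return still_open, orphaned
```

Fed the 11:11:32 lines, this flags the 02hp host job as both open and orphaned, and the nagiosserver Current Load service job as open only because the excerpt stops before any completion line for it.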

I then grepped for "orphaned", and it appears to be happening often, but not often enough to trigger an alert:

[2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned
[2017-09-13 10:39:32][30705][DEBUG] host check for 01dar orphaned
[2017-09-13 10:41:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 10:42:32][30705][DEBUG] host check for 02dar orphaned
[2017-09-13 10:46:32][30705][DEBUG] host check for tst01hp orphaned
[2017-09-13 11:08:32][30705][DEBUG] host check for tst01dar orphaned
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
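The grep output above can be summarised per host and per check type, together with the time between consecutive orphan events, which makes any recurring pattern easier to see. This is a sketch that assumes the line format shown in this thread; `summarize_orphans` is a hypothetical helper.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# [2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned
ORPHAN = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\[\d+\]\[DEBUG\] "
    r"(host|service) check for (.+) orphaned$"
)


def summarize_orphans(lines):
    """Count orphaned checks per name and per kind, plus the gaps between them."""
    events = []
    for line in lines:
        m = ORPHAN.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group(2), m.group(3)))
    events.sort()  # grep output is normally already in time order
    per_name = Counter(name for _, _, name in events)
    per_kind = Counter(kind for _, kind, _ in events)
    # Seconds between consecutive orphan events.
    gaps = [(later[0] - earlier[0]).total_seconds()
            for earlier, later in zip(events, events[1:])]
    return per_name, per_kind, gaps
```

For the seven lines above, it reports 02hp twice, every event as a host check, and gaps between events that are all whole minutes.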

I understand that there are newer versions elsewhere, but I am looking for the version supported by Nagios.

Post by **tgriep** » Wed Sep 13, 2017 1:00 pm

Orphaned host checks could be caused by hostgroup settings in the Mod-Gearman configuration files.
If the Gearman server tries to run the host check on the wrong Gearman worker, it could generate that error.
Another possible cause is that the Nagios server or the Gearman server/worker cannot process the checks quickly enough because of resource constraints.
Make sure you have enough CPUs and memory assigned to the servers.
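One way to check the hostgroup theory is to compare the hostgroups the NEB module routes into their own queues with the hostgroups the worker says it will serve; a routed hostgroup that no worker lists would leave its checks waiting. This sketch assumes the stock Mod-Gearman `hostgroups=` option appears in both module.conf and worker.conf as comma-separated names (possibly repeated on several lines); with several worker config files, the worker side would be the union of all of them. The helper names are hypothetical.

```python
def read_conf(text):
    """Parse key=value lines, skipping blanks and '#' comments.

    Repeated keys are collected into a list, on the assumption that
    some options may be given on multiple lines.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings.setdefault(key, []).append(value)
    return settings


def hostgroups(settings):
    """Flatten every `hostgroups=` entry (comma-separated) into a set."""
    groups = set()
    for value in settings.get("hostgroups", []):
        groups.update(g.strip() for g in value.split(",") if g.strip())
    return groups


def unserved_hostgroups(module_text, worker_text):
    """Hostgroups the module routes to their own queue but the worker does not list."""
    routed = hostgroups(read_conf(module_text))
    served = hostgroups(read_conf(worker_text))
    return sorted(routed - served)
```

An empty result means every routed hostgroup has a matching worker entry, which would point away from this particular cause.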

One thing to try is to edit the Gearman worker config files and increase the min-worker and max-jobs settings. That could solve the issue, because the worker can then process more checks.

Code: Select all

# Minimum number of worker processes which should
# run at any time.
min-worker=5

Code: Select all

# Controls the number of jobs a worker will do before it exits
# Use this to control how fast the number of workers will go down
# after high load times
max-jobs=1000
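Editing the file by hand is usually enough, but if several worker config files need the same change, a small helper can set a `key=value` line while leaving the comments intact. This is a sketch that assumes the simple one-setting-per-line format shown above; `set_option` is hypothetical, and the worker process would presumably still need a restart to pick up the new values.

```python
import re


def set_option(conf_text, key, value):
    """Return conf_text with `key=value` set.

    Replaces the first uncommented `key=...` line if there is one,
    otherwise appends the setting at the end. Comment lines such as
    "# min-worker=5" are left untouched.
    """
    # [ \t]* rather than \s* so the match never spans a newline.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(conf_text):
        # A function replacement avoids any backslash handling in new_line.
        return pattern.sub(lambda _match: new_line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"
```

For example, `set_option(text, "min-worker", 10)` turns the `min-worker=5` line above into `min-worker=10`; the values themselves are only illustrative.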

Post by **emartine** » Wed Sep 13, 2017 2:07 pm

I have not configured any hostgroup settings in Mod-Gearman.
Our DR server is checking 9 hosts with a total of 65 services. It is a physical server with 128 GB of RAM and the following CPU:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Stepping: 4
CPU MHz: 1200.000

It has a Gearman worker configured with 10 workers and max jobs set to 1000.

Any other ideas?
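A rough way to judge whether 10 workers is enough is Little's law: the average number of busy workers equals the rate at which checks start multiplied by how long each check holds a worker. The 74 checks (9 hosts + 65 services) come from the post; the 300-second interval and 2-second average duration below are hypothetical placeholders, not measured values, and `workers_needed` is an illustrative helper.

```python
import math


def workers_needed(num_checks, interval_s, avg_duration_s):
    """Estimate average worker concurrency with Little's law (L = lambda * W).

    num_checks:     host + service checks scheduled per interval
    interval_s:     check interval in seconds (assumed uniform)
    avg_duration_s: average time one check occupies a worker
    Returns (average busy workers, that figure rounded up).
    """
    arrival_rate = num_checks / interval_s  # checks started per second
    busy = arrival_rate * avg_duration_s
    return busy, math.ceil(busy)


# 9 hosts + 65 services from the post; interval and duration are assumptions.
busy, rounded = workers_needed(9 + 65, 300, 2)
print(f"~{busy:.2f} busy workers on average, {rounded} rounded up")
```

This gives an average, not a peak: checks scheduled in the same second can briefly need more workers. Under these assumptions, though, 10 workers sits well above average demand.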

Post by **tgriep** » Wed Sep 13, 2017 2:48 pm

Check the logs for all of the workers and see if there are any errors in them that could help.
Does the DR system have its own Gearman infrastructure, or is it sharing it with others?

Post by **emartine** » Wed Sep 13, 2017 4:33 pm

The DR server has only the worker and the server on it. It is a standalone, manual DR system.

Post by **emartine** » Thu Sep 14, 2017 9:15 am

I enabled debug on the Gearman worker as well as the server. Only the server reports orphaned checks, and they are only for hosts, not services.

Post by **tgriep** » Thu Sep 14, 2017 9:48 am

Can you post the full worker.conf file and the module.conf file and the output of the following command run on the Nagios server?

Code: Select all

gearman_top2 -b

Post by **emartine** » Mon Sep 18, 2017 7:23 pm

I PM'd you the items.

Post by **tmcdonald** » Tue Sep 19, 2017 9:13 am

tgriep is actually out this week. Can you please post the items or send them to me in a PM? I will be sure to share them with the rest of the team.

Mod Edit: Files received and placed on shared drive

Post by **emartine** » Tue Sep 19, 2017 11:06 am

Messages sent.

Nagios Support Forum
