Nagios XI 5.4.4 Apply Configuration taking forever
Re: Nagios XI 5.4.4 Apply Configuration taking forever
Ok. So I just saw an error come in at around 11:11:32
[2017-09-13 11:11:15][30705][DEBUG] received job for queue service: 01hp - Current Users
[2017-09-13 11:11:15][30705][DEBUG] service: '01hp' - 'Current Users', next_check is at 2017-09-13 11:11:15, latency so far: 0
[2017-09-13 11:11:15][30705][DEBUG] service job completed: 01hp Current Users: 0
[2017-09-13 11:11:22][30705][DEBUG] received job for queue host: tst01hp
[2017-09-13 11:11:22][30705][DEBUG] host: 'tst01hp', next_check is at 2017-09-13 11:11:22, latency so far: 0
[2017-09-13 11:11:22][30705][DEBUG] host job completed: tst01hp: 0
[2017-09-13 11:11:26][30705][DEBUG] received job for queue service: 01hp - Current Load
[2017-09-13 11:11:26][30705][DEBUG] service: '01hp' - 'Current Load', next_check is at 2017-09-13 11:11:26, latency so far: 0
[2017-09-13 11:11:27][30705][DEBUG] service job completed: 01hp Current Load: 0
[2017-09-13 11:11:28][30705][DEBUG] received job for queue service: tst01dar - Swap
[2017-09-13 11:11:28][30705][DEBUG] service: 'tst01dar' - 'Swap', next_check is at 2017-09-13 11:11:28, latency so far: 0
[2017-09-13 11:11:29][30705][DEBUG] service job completed: tst01dar Swap: 0
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver-dr - Ping
[2017-09-13 11:11:32][30705][DEBUG] service: 'nagiosserver-dr' - 'Ping', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] received job for queue host: 02hp
[2017-09-13 11:11:32][30705][DEBUG] host: '02hp', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver - Current Load
[2017-09-13 11:11:32][30705][DEBUG] service: 'nagiosserver' - 'Current Load', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] service job completed: nagiosserver-dr Ping: 0
I then grepped the log for "orphaned". It appears to be happening often, but not often enough to trigger an alert.
[2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned
[2017-09-13 10:39:32][30705][DEBUG] host check for 01dar orphaned
[2017-09-13 10:41:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 10:42:32][30705][DEBUG] host check for 02dar orphaned
[2017-09-13 10:46:32][30705][DEBUG] host check for tst01hp orphaned
[2017-09-13 11:08:32][30705][DEBUG] host check for tst01dar orphaned
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
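For anyone who wants a per-host tally rather than a raw grep, here is a minimal Python sketch. It assumes the log lines have exactly the shape shown above ("host check for NAME orphaned"); the function name `count_orphans` and the `SAMPLE_LOG` variable are made up for illustration and are not part of Mod-Gearman or Nagios.

```python
import re
from collections import Counter

# Matches only the line shape seen in this thread, e.g.
# [2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned
ORPHAN_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[\d+\]\[\w+\] host check for (?P<host>.+) orphaned$"
)


def count_orphans(lines):
    """Return a Counter of orphaned host checks, keyed by host name."""
    counts = Counter()
    for line in lines:
        match = ORPHAN_RE.match(line.strip())
        if match:
            counts[match["host"]] += 1
    return counts


# Sample lines copied from the log excerpts in this thread.
SAMPLE_LOG = """\
[2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned
[2017-09-13 10:39:32][30705][DEBUG] host check for 01dar orphaned
[2017-09-13 10:41:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 10:42:32][30705][DEBUG] host check for 02dar orphaned
[2017-09-13 10:46:32][30705][DEBUG] host check for tst01hp orphaned
[2017-09-13 11:08:32][30705][DEBUG] host check for tst01dar orphaned
[2017-09-13 11:11:22][30705][DEBUG] host job completed: tst01hp: 0
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
"""

if __name__ == "__main__":
    # Print hosts from most to least frequently orphaned.
    for host, n in count_orphans(SAMPLE_LOG.splitlines()).most_common():
        print(f"{host}: {n}")
```

To run it against a real log, pass an open file object to `count_orphans` in place of the sample lines.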
I understand that there are newer versions available elsewhere, but I am looking for the version supported by Nagios.
Re: Nagios XI 5.4.4 Apply Configuration taking forever
Orphaned host checks could be caused by the hostgroup settings in the Mod-Gearman configuration files.
If the Gearman server tries to run the host check on the wrong Gearman worker, it could generate that error.
Another possible cause is that the Nagios server or the Gearman server / worker cannot process the checks quickly enough because of resource issues.
Make sure you have enough CPUs and memory assigned to the servers.
One thing to try is to edit the Gearman worker config files and increase the min workers and max jobs. That could solve the issue, as the worker can then process more checks.
# Minimum number of worker processes which should
# run at any time.
min-worker=5

# Controls the amount of jobs a worker will do before he exits
# Use this to control how fast the amount of workers will go down
# after high load times
max-jobs=1000
Be sure to check out our Knowledgebase for helpful articles and solutions!
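To double-check which values a worker config actually contains before and after editing it, a small Python sketch like the one below may help. It assumes the file is plain `key=value` lines with `#` comments, as in the snippets above; `parse_worker_conf` and `SAMPLE_CONF` are hypothetical names, and the sketch keeps only the last value for a repeated key, which is a simplification.

```python
def parse_worker_conf(text):
    """Parse key=value lines, skipping blank lines and '#' comments.

    Simplification: if a key appears more than once, the last value wins.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            settings[key.strip()] = value.strip()
    return settings


# The two settings quoted in this thread, used here as sample input.
SAMPLE_CONF = """\
# Minimum number of worker processes which should
# run at any time.
min-worker=5

# Controls the amount of jobs a worker will do before he exits
max-jobs=1000
"""

if __name__ == "__main__":
    settings = parse_worker_conf(SAMPLE_CONF)
    print(settings["min-worker"], settings["max-jobs"])  # prints: 5 1000
```

For a real check, read the contents of your own worker config file and pass the text to `parse_worker_conf`; the file location depends on your installation.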
Re: Nagios XI 5.4.4 Apply Configuration taking forever
I have not configured Hostgroup settings in modgearman.
Our DR server is checking 9 hosts with a total of 65 services. It is a physical server with 128 GB of RAM and the following CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Stepping: 4
CPU MHz: 1200.000
The Gearman worker on it is set to 10 workers, with max jobs set to 1000.
Any other ideas?
Re: Nagios XI 5.4.4 Apply Configuration taking forever
Check the logs for all of the workers and see if there are any errors in them that could help.
Does the DR system have its own Gearman infrastructure, or is it sharing it with others?
Re: Nagios XI 5.4.4 Apply Configuration taking forever
The DR server has only the Gearman worker and the Gearman server on it. It is a standalone, manual DR setup.
Re: Nagios XI 5.4.4 Apply Configuration taking forever
I enabled debug on the gearman worker as well as the server. Only the server reports orphaned checks, and they are only for hosts, not services.
Re: Nagios XI 5.4.4 Apply Configuration taking forever
Can you post the full worker.conf file and the module.conf file, along with the output of the following command run on the Nagios server?
gearman_top2 -b
Re: Nagios XI 5.4.4 Apply Configuration taking forever
I PMd you the items.
Re: Nagios XI 5.4.4 Apply Configuration taking forever
tgriep is actually out this week. Can you please post the items or send them to me in a PM? I will be sure to share them with the rest of the team.
Mod Edit: Files received and placed on shared drive
Former Nagios employee
Re: Nagios XI 5.4.4 Apply Configuration taking forever
Messages sent.