Nagios XI 5.4.4 Apply Configuration taking forever

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by emartine »

OK, so I just saw an error come in at around 11:11:32:

[2017-09-13 11:11:15][30705][DEBUG] received job for queue service: 01hp - Current Users
[2017-09-13 11:11:15][30705][DEBUG] service: '01hp' - 'Current Users', next_check is at 2017-09-13 11:11:15, latency so far: 0
[2017-09-13 11:11:15][30705][DEBUG] service job completed: 01hp Current Users: 0
[2017-09-13 11:11:22][30705][DEBUG] received job for queue host: tst01hp
[2017-09-13 11:11:22][30705][DEBUG] host: 'tst01hp', next_check is at 2017-09-13 11:11:22, latency so far: 0
[2017-09-13 11:11:22][30705][DEBUG] host job completed: tst01hp: 0
[2017-09-13 11:11:26][30705][DEBUG] received job for queue service: 01hp - Current Load
[2017-09-13 11:11:26][30705][DEBUG] service: '01hp' - 'Current Load', next_check is at 2017-09-13 11:11:26, latency so far: 0
[2017-09-13 11:11:27][30705][DEBUG] service job completed: 01hp Current Load: 0
[2017-09-13 11:11:28][30705][DEBUG] received job for queue service: tst01dar - Swap
[2017-09-13 11:11:28][30705][DEBUG] service: 'tst01dar' - 'Swap', next_check is at 2017-09-13 11:11:28, latency so far: 0
[2017-09-13 11:11:29][30705][DEBUG] service job completed: tst01dar Swap: 0
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver-dr - Ping
[2017-09-13 11:11:32][30705][DEBUG] service: 'nagiosserver-dr' - 'Ping', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] received job for queue host: 02hp
[2017-09-13 11:11:32][30705][DEBUG] host: '02hp', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver - Current Load
[2017-09-13 11:11:32][30705][DEBUG] service: 'nagiosserver' - 'Current Load', next_check is at 2017-09-13 11:11:32, latency so far: 0
[2017-09-13 11:11:32][30705][DEBUG] service job completed: nagiosserver-dr Ping: 0
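
A minimal sketch, assuming the debug log keeps exactly the line shapes shown above, of how received jobs could be paired with their completions so that orphaned or still-pending checks stand out. The function names (`track_jobs`, `normalize`) and the regular expressions are illustrative and not part of Mod-Gearman; the "service check for ... orphaned" form is an assumption, since only host entries appear in this thread.

```python
import re

# Outer shape: [timestamp][pid][LEVEL] message
LINE_RE = re.compile(r"^\[[^\]]+\]\[\d+\]\[\w+\] (?P<msg>.*)$")
RECEIVED_RE = re.compile(r"^received job for queue (?P<kind>host|service): (?P<name>.+)$")
COMPLETED_RE = re.compile(r"^(?P<kind>host|service) job completed: (?P<name>.+): (?P<rc>-?\d+)$")
# Assumption: service orphans, if logged, use the same wording as host orphans.
ORPHANED_RE = re.compile(r"^(?P<kind>host|service) check for (?P<name>.+) orphaned$")


def normalize(name):
    """'host - Service' (received form) -> 'host Service' (completed form)."""
    return name.replace(" - ", " ", 1)


def track_jobs(lines):
    """Return (orphaned, pending) job keys seen in the given log lines."""
    pending = set()
    orphaned = []
    for line in lines:
        outer = LINE_RE.match(line.strip())
        if not outer:
            continue
        msg = outer.group("msg")
        m = RECEIVED_RE.match(msg)
        if m:
            pending.add((m.group("kind"), normalize(m.group("name"))))
            continue
        m = COMPLETED_RE.match(msg)
        if m:
            pending.discard((m.group("kind"), m.group("name")))
            continue
        m = ORPHANED_RE.match(msg)
        if m:
            key = (m.group("kind"), m.group("name"))
            pending.discard(key)
            orphaned.append(key)
    return orphaned, pending


SAMPLE = """\
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver-dr - Ping
[2017-09-13 11:11:32][30705][DEBUG] received job for queue host: 02hp
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 11:11:32][30705][DEBUG] received job for queue service: nagiosserver - Current Load
[2017-09-13 11:11:32][30705][DEBUG] service job completed: nagiosserver-dr Ping: 0
"""

orphaned, pending = track_jobs(SAMPLE.splitlines())
print("orphaned:", orphaned)
print("pending:", sorted(pending))
```

On this excerpt the host check for 02hp is reported as orphaned, and the Current Load check on nagiosserver is merely still pending because the excerpt ends before its completion line. A host name that itself contains " - " would confuse `normalize`, so treat the pairing as approximate.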

I then grepped for "orphaned", and it appears to be happening often, but not often enough to trigger an alert.


[2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned
[2017-09-13 10:39:32][30705][DEBUG] host check for 01dar orphaned
[2017-09-13 10:41:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 10:42:32][30705][DEBUG] host check for 02dar orphaned
[2017-09-13 10:46:32][30705][DEBUG] host check for tst01hp orphaned
[2017-09-13 11:08:32][30705][DEBUG] host check for tst01dar orphaned
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
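
The grep output above can be summarised with a short script. This is only a sketch under the assumption that every orphan entry follows the line format shown; `parse_orphans` and `summarize` are hypothetical helper names, and matching "service" entries is an assumption because only host entries appear here.

```python
import re
from collections import Counter
from datetime import datetime

ORPHAN_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\[\d+\]\[\w+\] "
    r"(?P<kind>host|service) check for (?P<name>.+) orphaned$"
)


def parse_orphans(lines):
    """Return a list of (timestamp, kind, name) for each orphan entry."""
    events = []
    for line in lines:
        m = ORPHAN_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group("kind"), m.group("name")))
    return events


def summarize(events):
    """Count orphan entries per object name and per second-of-minute."""
    per_name = Counter(name for _, _, name in events)
    per_second = Counter(ts.second for ts, _, _ in events)
    return per_name, per_second


SAMPLE = """\
[2017-09-13 10:33:32][30705][DEBUG] host check for 01hp orphaned
[2017-09-13 10:39:32][30705][DEBUG] host check for 01dar orphaned
[2017-09-13 10:41:32][30705][DEBUG] host check for 02hp orphaned
[2017-09-13 10:42:32][30705][DEBUG] host check for 02dar orphaned
[2017-09-13 10:46:32][30705][DEBUG] host check for tst01hp orphaned
[2017-09-13 11:08:32][30705][DEBUG] host check for tst01dar orphaned
[2017-09-13 11:11:32][30705][DEBUG] host check for 02hp orphaned
"""

events = parse_orphans(SAMPLE.splitlines())
per_name, per_second = summarize(events)
print(per_name.most_common())
print(dict(per_second))
```

In this sample, all seven entries fall on second 32 of the minute and 02hp appears twice. That regularity might point to a periodic sweep rather than random failures, but it is an observation from seven lines, not a diagnosis.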


I understand that newer versions are available elsewhere, but I am looking for the version supported by Nagios.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by tgriep »

Orphaned host checks could be caused by Hostgroup settings in the Mod-Gearman configuration files.
If the Gearman server tries to run the host check on the wrong Gearman worker, it could generate that error.
Another possible cause is that the Nagios server or the Gearman server / worker cannot process the checks quickly enough because of resource issues.
Make sure you have enough CPUs and memory assigned to the servers.

One thing to try is to edit the Gearman worker config files and increase the min-worker and max-jobs values. That could solve the issue, as the worker would be able to process more checks.

# Minimum number of worker processes which should
# run at any time.
min-worker=5

# Controls the amount of jobs a worker will do before he exits
# Use this to control how fast the amount of workers will go down
# after high load times
max-jobs=1000
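
To confirm which values are in effect before and after such an edit, the two settings can be read back with a few lines of script. This is a rough sketch that assumes the worker config is plain `key=value` text with `#` comments, as in the excerpts above; `parse_worker_settings` is a hypothetical helper, and a real Mod-Gearman setup may use include files or repeated keys that this ignores.

```python
def parse_worker_settings(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # A later occurrence of the same key overwrites the earlier one.
            settings[key.strip()] = value.strip()
    return settings


# Inline sample taken from the excerpts in this post; in practice the text
# would be read from the worker config file on the Gearman worker.
SAMPLE_CONF = """\
# Minimum number of worker processes which should
# run at any time.
min-worker=5

# Controls the amount of jobs a worker will do before he exits
max-jobs=1000
"""

settings = parse_worker_settings(SAMPLE_CONF)
for key in ("min-worker", "max-jobs"):
    print(key, "=", settings.get(key, "<not set>"))
```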
Be sure to check out our Knowledgebase for helpful articles and solutions!
emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by emartine »

I have not configured Hostgroup settings in modgearman.
Our DR server is checking 9 hosts with a total of 65 services. It is a physical server with 128 GB of RAM and the following CPU:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Stepping: 4
CPU MHz: 1200.000

It has a Gearman worker set to 10 workers, with max jobs set to 1000.

Any other ideas?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by tgriep »

Check the logs for all of the workers and see if there are any errors in them that could help.
Does the DR system have its own Gearman infrastructure, or is it sharing it with others?
emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by emartine »

The DR server has only the worker and the server on it. It is a standalone, manual DR system.
emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by emartine »

I enabled debug on the Gearman worker as well as the server. Only the server reports orphaned checks, and they are only for hosts, not services.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by tgriep »

Can you post the full worker.conf and module.conf files, along with the output of the following command run on the Nagios server?

gearman_top2 -b
emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by emartine »

I PM'd you the items.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by tmcdonald »

tgriep is actually out this week - can you please post the items or send them to me in a PM? I will be sure to share them with the rest of the team.

Mod Edit: Files received and placed on shared drive
Former Nagios employee
emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Re: Nagios XI 5.4.4 Apply Configuration taking forever

Post by emartine »

Messages sent.