Re: host check orphaned, is the mod-gearman worker on queue
Posted: Mon Apr 09, 2018 10:13 am
by rtsupport
Yes, neb.conf was from the **** 107.31 server, and we have only one worker server for the Test and Dev environments.
I have sent a document with some findings to your PM; please check and advise.
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Mon Apr 09, 2018 2:16 pm
by tgriep
@cdienger is out of the office this week and we do not have access to his forum inbox, so can you send the information to my PM account?
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Tue Apr 10, 2018 4:52 am
by rtsupport
Sure, the details have been sent to your PM. Please check.
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Tue Apr 10, 2018 8:44 am
by tgriep
I just want to verify that you are running 2 Nagios servers and that both of them are running the Gearman server.
Then you have 2 Gearman workers that are set up to process checks from both of the Gearman servers, is that correct?
In your document, you say that when you stop the Gearman worker, the Jobs Waiting count increases. That would be normal, as the checks are set up to be run by that worker and it is not running.
When a Gearman worker is not running and a check sits in the Gearman queue for longer than 10 minutes, the Nagios server gets back the "host check orphaned, is the mod-gearman worker on queue" message, which is telling you that a Gearman worker is down. That is normal as well.
You posted this last Friday.
"Also noticed that X server was having Host group - admin_locale_wb_xgi and when I changed it other admin_locale_wv_xgi its throwing old group error. ( Please see attached error for more better understanding ) and when I add host group " nagios_infrastructure " it started working fine."
What steps did you take to do this?
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Wed Apr 11, 2018 5:48 am
by rtsupport
No, we have only 1 worker server, which is configured for both Nagios servers.
Worker server --
Nagios Server -- 107.31
Nagios Server -- 108.37
With that in mind, if we stop the worker server, the queue should ideally increase on both servers, but that is happening only on 108.37. On the other server, 107.31, checks and alerts are working fine (I have shared a screenshot as well).
I have checked neb.conf and module.conf on both Nagios servers and they are identical. I have attached them for your reference as well.
Gearman Version on both Nagios servers -- gearmand 1.1.12
Steps I took for the host group change: CCM > Host > Select the Host > Manage Host Group > removed "admin_locale_wb_xgi" and added "admin_locale_wv_xgi" using the search option > Save > Apply Configuration.
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Wed Apr 11, 2018 8:43 am
by tgriep
OK, I think I know what you are doing.
You have 2 Nagios Servers and each of them is running a Gearman Server.
Then you have only 1 Gearman worker and you want it to process the queues from both Gearman Servers, correct?
You may have to set use_uniq_jobs to no, because with the duplicate hostgroups the servers may be having problems when the worker goes down and the jobs do not get processed.
use_uniq_jobs
Using uniq keys prevents the gearman queues from filling up when there is no worker. However, gearmand seems to have problems with the uniq key and sometimes jobs get stuck in the queue. Set this option to off when you run into problems with stuck jobs but make sure your worker are running. Default is On.
Another option is to set up only one Gearman server and have the 2 Nagios servers connect to that server, then set up 2 result_queues on the Gearman server, one for each Nagios server. Then set up the worker to use only that Gearman server.
result_queue
sets the result queue. Necessary when putting jobs from several Naemon instances onto the same gearman queues. Default: check_results
A third option is to run 2 unique workers on the worker server: one worker for one Gearman server and the other worker for the other Gearman server.
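For reference, the second option (one shared Gearman server, a separate result queue per Nagios server) might look roughly like the sketch below. The hostname gearman.example.local and the two queue names are placeholders, not values from this thread, and the three sections belong in three different files (the Mod-Gearman module config on each Nagios server, plus the worker config). Please check the Mod-Gearman documentation for your version before relying on it.

```ini
# --- Mod-Gearman module config on the first Nagios server ---
# Placeholder hostname for the single shared Gearman server; 4730 is the gearmand default port.
server=gearman.example.local:4730
# Placeholder queue name; it must differ between the two Nagios servers.
result_queue=check_results_nagios1

# --- Mod-Gearman module config on the second Nagios server ---
server=gearman.example.local:4730
result_queue=check_results_nagios2

# --- worker config on the worker server ---
# The worker connects only to the shared Gearman server.
server=gearman.example.local:4730
```

With distinct result_queue values, each Nagios server should only collect the results of the checks it submitted itself, even though both put jobs on the same Gearman server.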
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Wed Apr 11, 2018 10:36 am
by rtsupport
Yes, you are correct about the server infrastructure.
As you advised, I set use_uniq_jobs=no. After that, when I assigned the location-specific host groups ("admin_locale_wb_xgi", "admin_locale_wv_xgi") to one of the servers, checks for that server stopped (I monitored it for almost an hour), this time without the "host check orphaned" error. The rest of the servers, which have the host group "nagios_infrastructure", are working fine.
I set the value as below on both Nagios servers, then restarted nagios and gearmand.
I restarted gearman_worker on the worker server as well.
# use_uniq_jobs
# Using uniq keys prevents the gearman queues from filling up when there
# is no worker. However, gearmand seems to have problems with the uniq
# key and sometimes jobs get stuck in the queue. Set this option to 'off'
# when you run into problems with stuck jobs but make sure your worker
# are running.
use_uniq_jobs=no
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Wed Apr 11, 2018 11:12 am
by tgriep
What I think is happening is that the worker is having issues with the duplicate hostgroups that are coming from 2 separate servers, and it is not working right.
I suggest going to a single Gearman server and a single Gearman worker and see if that works better for your needs.
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Thu Apr 12, 2018 8:48 am
by rtsupport
We have commented out ..108.37 on the worker server, so now we have only one Gearman server (..107.31) with one worker server.
After this change, all checks for Linux and Windows servers are working fine with the specified host groups.
But we have to identify why we are not able to run two Gearman servers with one worker server, because we are in the process of upgrading and our server architecture would then be:
Two Nagios servers (PRD/DR) connected with:
X Site: two worker servers
Y Site: two worker servers
Z Site: two worker servers
M Site: two worker servers
The worker servers at every site will connect to both Nagios servers (PRD/DR), so if a single worker server at a site (for example X Site) is down, the other worker server at that site can process the checks.
Additionally, I want to check one more thing.
Currently we have Nagios installed on Linux 6.9, which we have to upgrade to RHEL 7.4. What would be the best practice for that?
As per my understanding: install the existing Nagios version on RHEL 7.4 > restore the existing Nagios backup > upgrade Nagios XI to the latest version.
If we go with this process, will the Nagios configuration from Linux 6.9 work on RHEL 7.4?
Or do you have a better plan for it?
Re: host check orphaned, is the mod-gearman worker on queue
Posted: Thu Apr 12, 2018 10:32 am
by tgriep
Mod Gearman was not developed by us, so to be sure, you may have to ask the author why your setup didn't work.
I have not seen the scenario you are trying to set up in your environment.
Your steps for migrating from a 6.x OS to a 7.x OS are sound.
Set up the new server and install the same version of XI to make sure everything is compatible.
Do the backup on the original server and restore it on the new server. Here is the link for that procedure if you need it.
https://support.nagios.com/kb/article.php?id=180
Then I suggest doing the manual upgrade of XI on the new server. The documentation for that is here.
https://support.nagios.com/kb/article/n ... i-134.html
As long as both OSes are running the same architecture, it should migrate for you.
A few things to do on the new server:
If you set up SSL on the old server, you will have to set it up on the new one, as the settings for that are not migrated.
If you are using a different hostname for the server, you would have to go and update the Program URL and the External URL in the Admin > System Settings menu.