Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Tue Oct 30, 2018 4:44 pm
by ssax
We have checked the load on worker servers but it's not an issue.
Did you check the load on the remote machines giving the errors during that time?

Is it only the host checks that are orphaned? Do any of the services give orphaned messages for those systems when the host reports that?

Have you tried stopping and restarting the nagios and mod_gearman services on the XI server and the workers to see if that resolves it?

What is the version of gearman installed on the XI server AND on the workers?

rpm -qa | grep gearman

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Wed Oct 31, 2018 12:19 am
by progressive.nagiosXI
The problem is with host checks only; service checks are working fine.


[root@monitoring-nagiosxi ~]# rpm -qa | grep gearman
gearmand-server-0.33-2.x86_64
gearmand-0.33-2.x86_64
mod_gearman2-2.1.1-1.el7.centos.x86_64
gearmand-devel-0.33-2.x86_64
gearmand-debuginfo-0.33-2.x86_64

================================================================
This is the output we get when we check the status of gearmand:

[root@monitoring-nagiosxi ~]# systemctl status gearmand
● gearmand.service - LSB: start and stop the Gearman server
Loaded: loaded (/etc/rc.d/init.d/gearmand; bad; vendor preset: disabled)
Active: failed (Result: timeout) since Wed 2018-10-31 00:02:57 IST; 10h ago
Docs: man:systemd-sysv-generator(8)
Process: 836 ExecStart=/etc/rc.d/init.d/gearmand start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/gearmand.service
└─2217 /usr/sbin/gearmand -d --worker-wakeup=10 --retention-file=/...

Oct 30 23:57:57 monitoring-nagiosxi.progressive.in systemd[1]: Starting LSB: ...
Oct 30 23:57:58 monitoring-nagiosxi.progressive.in runuser[838]: pam_unix(run...
Oct 30 23:57:59 monitoring-nagiosxi.progressive.in runuser[838]: pam_unix(run...
Oct 30 23:57:59 monitoring-nagiosxi.progressive.in gearmand[836]: Starting ge...
Oct 30 23:57:59 monitoring-nagiosxi.progressive.in gearmand[836]: /etc/rc.d/i...
Oct 30 23:57:59 monitoring-nagiosxi.progressive.in systemd[1]: PID file /var/...
Oct 31 00:02:57 monitoring-nagiosxi.progressive.in systemd[1]: gearmand.servi...
Oct 31 00:02:57 monitoring-nagiosxi.progressive.in systemd[1]: Failed to star...
Oct 31 00:02:57 monitoring-nagiosxi.progressive.in systemd[1]: Unit gearmand....
Oct 31 00:02:57 monitoring-nagiosxi.progressive.in systemd[1]: gearmand.servi...
Hint: Some lines were ellipsized, use -l to show in full.

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Wed Oct 31, 2018 2:35 pm
by tgriep
Without any debugging information or errors, it is hard to determine what the issue is, so let's enable debugging.

On the system running the Gearman server, edit the following file:

/etc/mod_gearman2/module.conf
Change this from

debug=0
to

debug=1
Save the file and restart the gearman server.

On the workers, edit this file

/etc/mod_gearman2/worker.conf
Change this from

debug=0
to

debug=1
Save the file and restart the gearman worker.
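
If you would rather script that edit than make it by hand, here is a minimal sketch. The function name enable_debug is made up for this post and is not part of Mod-Gearman; it assumes the config file contains a plain debug=0 line, as described above, and it keeps a .bak copy of the original file.

```shell
#!/bin/sh
# enable_debug: hypothetical helper, not shipped with Mod-Gearman.
# Switches a "debug=0" line to "debug=1" in the given config file
# and leaves the original next to it as <file>.bak.
enable_debug() {
    conf="$1"
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    # Keep a backup so the change is easy to revert.
    cp -p "$conf" "$conf.bak" || return 1
    # Rewrite only a line that is exactly debug=0 (spaces tolerated).
    sed 's/^[[:space:]]*debug[[:space:]]*=[[:space:]]*0[[:space:]]*$/debug=1/' \
        "$conf.bak" > "$conf" || return 1
    # Show the resulting setting for a quick visual check.
    grep -n '^debug=' "$conf"
}
```

For example, run enable_debug /etc/mod_gearman2/module.conf as root, then restart the service as described above. Copy the .bak file back to undo the change once you have the logs you need.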


The next time you have the host check orphaned message, look at the Gearman server log file at this location

/var/log/mod_gearman2/mod_gearman_neb.log
and the Gearman Worker log file at this location

/var/log/mod_gearman2/mod_gearman_worker.log
for any errors that happen at the time of the orphan message to see if you can figure out why the message is happening.
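
A rough way to narrow the logs down to the time of the orphan message is sketched below. It assumes each log line carries a timestamp in the form YYYY-MM-DD HH:MM; the exact log format may differ on your systems, and the function name errors_at and the keyword list are only illustrations.

```shell
#!/bin/sh
# errors_at: hypothetical helper for this post.
# Usage: errors_at "<timestamp prefix>" <logfile> [<logfile> ...]
# Prints lines that contain the timestamp prefix AND one of a few
# error-like keywords. Adjust the keyword list to taste.
errors_at() {
    stamp="$1"
    shift
    # -F: match the timestamp literally; -H: prefix matches with the file name.
    grep -H -F -- "$stamp" "$@" | grep -i -E 'error|fail|timeout|orphan'
}
```

For example: errors_at '2018-11-12 16:0' /var/log/mod_gearman2/mod_gearman_neb.log /var/log/mod_gearman2/mod_gearman_worker.log would cover the ten minutes starting at 16:00.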

You may want to implement the check_gearman plugin on the Nagios server so it can check the status of the Gearman Server / Worker to see if it is still functioning.
See this link under the How To section
https://labs.consol.de/nagios/mod-gearman/index.html
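
Separately from check_gearman, if you have a saved copy of the queue status table (pipe-separated columns: queue name, workers available, jobs waiting, jobs running), a small filter can point out which queues look unhealthy. This is only a sketch under that format assumption; flag_queues is a made-up name and the two conditions it reports are my own choice, not an official health check.

```shell
#!/bin/sh
# flag_queues: hypothetical helper for this post.
# Reads a pipe-separated queue table on stdin and prints:
#   BACKLOG  <queue>  when jobs are waiting
#   NOWORKER <queue>  when no worker is available and nothing is waiting
flag_queues() {
    awk -F'|' '
        # Only data rows: four fields, second field numeric.
        NF == 4 && $2 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the queue name
            workers = $2 + 0
            waiting = $3 + 0
            if (waiting > 0)
                printf "BACKLOG %s waiting=%d workers=%d\n", name, waiting, workers
            else if (workers == 0)
                printf "NOWORKER %s\n", name
        }'
}
```

Usage: flag_queues < saved_status.txt, where saved_status.txt is whatever file you saved the table to. Header and separator lines are skipped automatically.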

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Mon Nov 12, 2018 5:31 am
by progressive.nagiosXI
Hi Team,

We are facing the issue again and found the output below. Please suggest.

2018-11-12 16:00:05 - localhost:4730 - v0.33

Queue Name                                     | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------------------------------
check_results                                  |                1 |            0 |            0
eventhandler                                   |              296 |            0 |            0
host                                           |                4 |            0 |            0
hostgroup_Ambicasteel                          |               31 |            0 |           28
hostgroup_Thomascook                           |               51 |            0 |           48
hostgroup_dnatamonitor                         |               49 |            0 |           49
hostgroup_hungama                              |                4 |            0 |            0
hostgroup_instarem                             |               50 |          102 |           50
hostgroup_laptop                               |                7 |            0 |            0
hostgroup_medanta                              |               50 |            9 |           50
hostgroup_somanyserver                         |               50 |           54 |           50
service                                        |                4 |            0 |            0
worker_AS-MONITOR-SERVER                       |                1 |            0 |            0
worker_Hungama.NagiosXI                        |                1 |            0 |            0
worker_LTOB-Monitor                            |                1 |            0 |            0
worker_dcm.somany.com                          |                0 |            0 |            0
worker_ip-10-0-1-51.eu-west-1.compute.internal |                1 |            0 |            0
worker_localhost.localdomain                   |                2 |            0 |            0
worker_medanta.monitoring                      |                1 |            0 |            0
worker_monitoring-nagiosxi.progressive.in      |                0 |            0 |            0
worker_nod-s-nagios.com                        |                1 |            0 |            0
-----------------------------------------------------------------------------------------------


After some time it gives the error below:


2018-11-12 16:01:32 - localhost:4730 - v0.33

error reading from localhost:4730 - Interrupted system call

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Mon Nov 12, 2018 10:26 am
by tgriep
Whichever server gave that message, change the server option in its config to use the IP address instead of the localhost entry.

server=127.0.0.1:4730
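
To see which config files still point at localhost, something like the sketch below can help. It assumes the Mod-Gearman config files live in /etc/mod_gearman2 and end in .conf, as in the paths mentioned earlier in this thread; list_gearman_servers is a made-up name.

```shell
#!/bin/sh
# list_gearman_servers: hypothetical helper for this post.
# Prints every "server=" line, with file name and line number, from the
# .conf files in the given directory (default: /etc/mod_gearman2).
list_gearman_servers() {
    dir="${1:-/etc/mod_gearman2}"
    grep -H -n -E '^[[:space:]]*server[[:space:]]*=' "$dir"/*.conf
}
```

Any line in the output that still shows localhost is a candidate for the change above.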

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Wed Nov 14, 2018 3:20 am
by progressive.nagiosXI
It's our monitoring server (the mod-gearman server).

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Wed Nov 14, 2018 6:42 am
by progressive.nagiosXI
Why is the gearmand service showing as failed on my gearman server (Nagios XI)?

[root@monitoring-nagiosxi ~]# systemctl status gearmand
● gearmand.service - LSB: start and stop the Gearman server
Loaded: loaded (/etc/rc.d/init.d/gearmand; bad; vendor preset: disabled)
Active: failed (Result: timeout) since Mon 2018-11-05 10:12:02 IST; 1 weeks 2 days ago
Docs: man:systemd-sysv-generator(8)
Process: 978 ExecStart=/etc/rc.d/init.d/gearmand start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/gearmand.service
└─1088 /usr/sbin/gearmand -d --worker-wakeup=10 --retention-file=/tmp/gearmand.retention -q retention --log-file=/var/log/gearmand/gearmand.log

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.



Also, sometimes all gearman checks stop without any notification and keep showing the last check data. This is very critical.

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Wed Nov 14, 2018 7:02 am
by progressive.nagiosXI
I have made the change you recommended, from localhost to the IP address, but we are still getting the same problem.

Note: all of this was done on the Nagios XI server (the Mod-Gearman server). After changing the config file, I restarted the mod-gearman-worker service.




2018-11-14 17:28:11 - localhost:4730 - v0.33

Queue Name                                     | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------------------------------
check_results                                  |                1 |            0 |            0
eventhandler                                   |              343 |            0 |            0
host                                           |                4 |            0 |            0
hostgroup_Ambicasteel                          |               32 |            0 |           28
hostgroup_HTMEDIA                              |               32 |            0 |           28
hostgroup_Thomascook                           |               60 |            0 |           55
hostgroup_dnatamonitor                         |               83 |            0 |           83
hostgroup_hungama                              |                3 |            0 |            0
hostgroup_instarem                             |               45 |            0 |           42
hostgroup_laptop                               |                6 |            0 |            0
hostgroup_medanta                              |               33 |            0 |           33
hostgroup_somanyserver                         |               45 |            0 |           41
service                                        |                4 |            0 |            0
worker_AS-MONITOR-SERVER                       |                1 |            0 |            0
worker_Hungama.NagiosXI                        |                1 |            0 |            0
worker_LTOB-Monitor                            |                1 |            0 |            0
worker_dcm.somany.com                          |                1 |            0 |            0
worker_ip-10-0-1-51.eu-west-1.compute.internal |                1 |            0 |            0
worker_localhost.localdomain                   |                2 |            0 |            0
worker_medanta.monitoring                      |                0 |            0 |            0
worker_monitoring-nagiosxi.progressive.in      |                0 |            0 |            0
worker_nod-s-nagios.com                        |                0 |            0 |            0
-----------------------------------------------------------------------------------------------

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Wed Nov 14, 2018 9:36 am
by tgriep
Can you post the Mod Gearman Logfiles from all of the servers so we can check them?

/var/log/gearmand/gearmand.log
/var/log/mod_gearman2/mod_gearman_neb.log
/var/log/mod_gearman2/mod_gearman_worker.log
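
If the log files turn out to be too large to post, a trimmed and compressed copy of the most recent lines is usually enough to start with. A minimal sketch, where trim_log is a made-up name and the default of 20000 lines is an arbitrary choice:

```shell
#!/bin/sh
# trim_log: hypothetical helper for this post.
# Usage: trim_log <logfile> [<lines>] [<output.gz>]
# Writes the last <lines> lines of <logfile> (default 20000) to a gzip file
# and prints the name of the file it created.
trim_log() {
    src="$1"
    lines="${2:-20000}"
    out="$3"
    # Default output name: <log name>.tail.gz in the current directory.
    [ -n "$out" ] || out="$(basename "$src").tail.gz"
    tail -n "$lines" "$src" | gzip -c > "$out" && echo "$out"
}
```

For example, trim_log /var/log/gearmand/gearmand.log 50000 keeps the last 50000 lines. Run it soon after the problem shows up so the relevant entries are still near the end of the file.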

Re: (host check orphaned, is the mod-gearman worker on queue

Posted: Thu Nov 15, 2018 7:42 am
by progressive.nagiosXI
Hi Team,

These files are very large, around 15 GB.