
gearman error

Posted: Tue Mar 03, 2020 4:04 pm
by progressive.nagiosXI
Hi team,

We updated Nagios Core and Mod-Gearman today.

We are now getting the error below in /var/log/gearmand/gearmand.log, and the disk keeps filling up. Please help on priority.
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691


[root@monitoring-nagiosxi gearmand]# rpm -qa |grep -i gearman
mod_gearman-3.0.7-1.el7.x86_64
gearmand-0.33-7.x86_64
mod_gearman-debuginfo-3.0.7-1.el7.x86_64
gearmand-devel-0.33-7.x86_64
gearmand-server-0.33-7.x86_64


[root@monitoring-nagiosxi gearmand]# /usr/local/nagios/bin/nagios --help

Nagios Core 4.4.5
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-08-20
License: GPL

Thanks

Re: gearman error

Posted: Tue Mar 03, 2020 8:33 pm
by Box293
What procedure did you use to upgrade?

Re: gearman error

Posted: Wed Mar 04, 2020 10:07 am
by tgriep
Find the systemd unit file that starts the gearman daemon.

Code: Select all

find / -name gearmand.service
Typically it is at this path.

Code: Select all

/etc/systemd/system/multi-user.target.wants/gearmand.service
Add this line in the file

Code: Select all

LimitNOFILE=50000
directly below this section header

Code: Select all

[Service]
Save the file and run this as root

Code: Select all

systemctl daemon-reload
then run this

Code: Select all

systemctl restart gearmand
Look at the gearmand.log file to see if the error is gone.
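
One way to confirm that the new limit actually reached the daemon is to read it back from /proc for the running process. This is only a minimal sketch, assuming a Linux /proc filesystem. The function names `nofile_limit` and `open_fd_count` are made up for illustration.

```shell
# Print the soft "Max open files" limit of a running process.
# Usage: nofile_limit PID
nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Count the file descriptors a process currently holds open.
# Usage: open_fd_count PID   (needs root for other users' processes)
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Example against the current shell, which always exists:
echo "this shell: limit=$(nofile_limit $$) open=$(open_fd_count $$)"
```

For the daemon itself, something like `nofile_limit "$(pidof gearmand)"` should print 50000 after the restart if the unit file change took effect (assuming pidof returns a single PID). `systemctl show -p LimitNOFILE gearmand` shows the value systemd has configured for the service.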

You may also have to increase the per-user open file limit on the Linux server.
Edit this file

Code: Select all

/etc/security/limits.conf
Add this to the bottom

Code: Select all

* hard nofile 1000000
* soft nofile 1000000
Reboot the server.
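
After the reboot, the values can be read back from a fresh login shell. As far as I know, limits.conf is applied by pam_limits to login sessions, while a service started by systemd takes its limit from LimitNOFILE in the unit file, so the two settings cover different cases. The sketch below assumes Linux and a POSIX shell, and the function name `show_open_file_limits` is hypothetical.

```shell
# Show the open-file limits that apply to the current login session
# and the kernel-wide ceilings they sit under.
show_open_file_limits() {
    echo "session soft nofile: $(ulimit -Sn)"
    echo "session hard nofile: $(ulimit -Hn)"
    echo "fs.file-max (system-wide): $(cat /proc/sys/fs/file-max)"
    echo "fs.nr_open (per-process cap): $(cat /proc/sys/fs/nr_open)"
}

show_open_file_limits
```

If the session values still show the old numbers, the limits.conf entries were probably not picked up for that login.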

Which file or folder has the data that is filling up the drive?
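
If it is not obvious where the space is going, a quick way to answer that question is to rank files and directories by size with du. This is a rough sketch, not a tuned tool, and the function name `largest_entries` is made up here.

```shell
# List the N largest files and directories under DIR, biggest first.
# Sizes are in KiB; -x keeps du on one filesystem, -a includes files.
# Usage: largest_entries DIR [N]
largest_entries() {
    du -axk "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, `largest_entries /var/log 10` run as root lists the ten biggest entries under /var/log. The first row is always the directory itself with its total.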

Re: gearman error

Posted: Wed Mar 04, 2020 12:54 pm
by progressive.nagiosXI
Hi,

Thanks tgriep.
After making the suggested changes and observing for 60 minutes, we no longer get the error below in /var/log/gearmand/gearmand.log, which was filling the disk at 2 GB/min.

Code: Select all

ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
But now we are getting the error below in /var/log/gearmand/gearmand.log

Code: Select all

[root@monitoring-nagiosxi gearmand]# tail -f  gearmand.log
  ERROR 2020-02-04 17:30:52.000000 [     3 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:30:52.000000 [     3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2020-02-04 17:30:52.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:30:52.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2020-02-04 17:31:16.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:31:16.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2020-02-04 17:31:16.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:31:16.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2020-02-04 17:31:16.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:31:16.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
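
With a log this repetitive it can help to count lines per distinct message instead of reading the tail. The sketch below assumes every line follows the layout shown above (level, date, time, bracketed thread, message). The function name `summarize_gearmand_log` is hypothetical.

```shell
# Count gearmand.log lines per distinct message, most frequent first.
# Strips the leading "<LEVEL> <date> <time> [ <thread> ]" prefix so that
# identical messages from different threads and times group together.
# Usage: summarize_gearmand_log FILE
summarize_gearmand_log() {
    sed -E 's/^[[:space:]]*[A-Z]+ +[0-9-]+ +[0-9:.]+ +\[[^]]*\] +//' "$1" |
        sort | uniq -c | sort -rn
}
```

Running `summarize_gearmand_log /var/log/gearmand/gearmand.log` prints one row per message with its count in front, which makes it easier to see whether one error dominates.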

Re: gearman error

Posted: Wed Mar 04, 2020 5:39 pm
by tgriep
Those errors may be coming from a worker that cannot connect to the gearman server. Run the gearman_top command and verify that all of the workers are listed. If any workers are missing, start by looking at those.

Otherwise it may be a module.conf setting on the gearman server or a setting in one of the worker.conf files on a worker.
I would have to see those files to check the settings, as well as the output of this command run on the Nagios server.

Code: Select all

gearman_top -b
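
To pick out missing workers without reading the whole table, the same numbers can be filtered with awk. This sketch assumes the gearadmin tool that ships with gearmand is installed, and that `gearadmin --status` prints the server's tab-separated status lines (queue name, total jobs, running jobs, available workers). The function name `queues_without_workers` is made up.

```shell
# Read gearmand "status" lines (name, total jobs, running jobs,
# available workers; tab-separated) on stdin and print the queues
# that have jobs waiting but no worker registered.
# Usage: gearadmin --status | queues_without_workers
queues_without_workers() {
    awk -F '\t' 'NF == 4 && ($2 - $3) > 0 && $4 == 0 {
        printf "%s: %d waiting, 0 workers\n", $1, $2 - $3
    }'
}
```

Any queue it prints has checks piling up with nothing to process them, so that worker is the first place to look.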

Re: gearman error

Posted: Thu Mar 05, 2020 12:45 pm
by progressive.nagiosXI
Hi,

All details are attached and shared in a PM to tgriep:
1) gearman server module.conf
2) gearman_top -b output
3) worker.conf for the hostgroup where Jobs Waiting is highest

Thanks

Re: gearman error

Posted: Thu Mar 05, 2020 3:18 pm
by tgriep
Edit the worker.conf file and increase this option

Code: Select all

min-worker=50
to

Code: Select all

min-worker=250
Restart the Mod Gearman worker on that server.
Run the gearman_top command and see if it starts to process more checks.
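
If the same change has to be made on several workers, the edit can be scripted. This is a minimal sketch that assumes the file uses plain `key=value` lines as shown above. The function name `set_conf_option` is hypothetical, and the path to worker.conf depends on your install.

```shell
# Set KEY=VALUE in a key=value config file, keeping a .bak copy.
# Replaces an existing uncommented KEY= line, or appends one if absent.
# Usage: set_conf_option FILE KEY VALUE
set_conf_option() {
    cp "$1" "$1.bak" || return 1
    if grep -q "^$2=" "$1"; then
        sed "s|^$2=.*|$2=$3|" "$1.bak" > "$1"
    else
        echo "$2=$3" >> "$1"
    fi
}
```

For example, `set_conf_option /path/to/worker.conf min-worker 250`, followed by a restart of the worker. If the file also sets max-worker, it seems sensible to keep it at or above the new min-worker value, but that relationship is my assumption.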

If not, edit the worker.conf file and enable debugging.

Code: Select all

debug=2
Then check this log for errors.

Code: Select all

/var/log/mod_gearman2/mod_gearman_worker.log
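
With debugging on, the log grows quickly, so it may be easier to filter it than to read it whole. This sketch simply matches the word "error" in any case and makes no assumption about the rest of the line format. The function name `recent_errors` is made up.

```shell
# Print the last N lines of a log that mention "error" (any case).
# Usage: recent_errors FILE [N]
recent_errors() {
    grep -i 'error' "$1" | tail -n "${2:-20}"
}
```

For example, `recent_errors /var/log/mod_gearman2/mod_gearman_worker.log 50`. Remember to set debug back to its previous value once the problem is found.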