gearman error


Post by progressive.nagiosXI »

Hi team,

We updated Nagios Core and Mod-Gearman today.

We are now getting the error below in /var/log/gearmand/gearmand.log, and the disk fills up again and again. Please help as a priority.
ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691


[root@monitoring-nagiosxi gearmand]# rpm -qa |grep -i gearman
mod_gearman-3.0.7-1.el7.x86_64
gearmand-0.33-7.x86_64
mod_gearman-debuginfo-3.0.7-1.el7.x86_64
gearmand-devel-0.33-7.x86_64
gearmand-server-0.33-7.x86_64


[root@monitoring-nagiosxi gearmand]# /usr/local/nagios/bin/nagios --help

Nagios Core 4.4.5
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-08-20
License: GPL

Thanks

Post by Box293 »

What procedure did you use to upgrade?

Post by tgriep »

Find the systemd unit file that starts the gearman daemon:

    find / -name gearmand.service

Typically it is in this location:

    /etc/systemd/system/multi-user.target.wants/gearmand.service

Add this line to the file:

    LimitNOFILE=50000

directly after the following section header:

    [Service]

Save the file, then run this as root:

    systemctl daemon-reload

Then restart the daemon:

    systemctl restart gearmand

Check the gearmand.log file to see whether the error is gone.
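
One way to confirm that the new limit actually reached the running daemon is to read the process limits from /proc. This is only a minimal sketch, not part of the advice above: the helper names `open_files_limit` and `open_fd_count` are hypothetical, and it assumes a Linux /proc filesystem and that `pgrep` is installed.

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# The line looks like: "Max open files  <soft>  <hard>  files",
# so the soft limit is the fourth whitespace-separated field.
open_files_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Count the file descriptors a process currently holds (Linux only).
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Example usage (assumes a single gearmand process is running):
#   pid=$(pgrep -o -x gearmand)
#   open_files_limit "/proc/$pid/limits"
#   open_fd_count "$pid"
```

If the first command still prints the old value after the restart, the unit file that was edited is probably not the one systemd is loading.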

You may also have to raise the per-user open file limit on the Linux system.
Edit this file:

    /etc/security/limits.conf

Add this at the bottom:

    * hard nofile 1000000
    * soft nofile 1000000

Reboot the server.

Which file or folder has the data that is filling up the drive?
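
To answer that question quickly, the largest files under a directory can be listed with standard tools. A rough sketch, assuming `find`, `du`, `sort` and `head` are available; the function name `largest_files` is hypothetical and not something from this thread.

```shell
# List the N largest regular files under a directory, biggest first.
# Sizes are in kilobytes as reported by du -k.
largest_files() {
    dir=$1
    count=${2:-10}
    find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$count"
}

# Example usage:
#   largest_files /var/log 5
```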

Post by progressive.nagiosXI »

Hi,

Thanks tgriep.
After making the changes and observing for 60 minutes, we no longer get the error below in /var/log/gearmand/gearmand.log, which was filling the disk at 2GB/min:

    ERROR 2020-02-03 20:54:45.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691

But now we are getting the following errors in /var/log/gearmand/gearmand.log:

[root@monitoring-nagiosxi gearmand]# tail -f  gearmand.log
  ERROR 2020-02-04 17:30:52.000000 [     3 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:30:52.000000 [     3 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2020-02-04 17:30:52.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:30:52.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2020-02-04 17:31:16.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:31:16.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2020-02-04 17:31:16.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:31:16.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
  ERROR 2020-02-04 17:31:16.000000 [     2 ] lost connection to client recv(EPIPE || ECONNRESET || EHOSTDOWN)(Connection reset by peer) -> libgearman-server/io.cc:100
  ERROR 2020-02-04 17:31:16.000000 [     2 ] closing connection due to previous errno error -> libgearman-server/io.cc:109
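
When a log repeats like this, counting errors by the source location at the end of each line shows which ones dominate. The following is an illustrative sketch only: the function name `summarize_gearmand_errors` is hypothetical, and it assumes every error line ends with ` -> <file>:<line>` as in the excerpt above.

```shell
# Count ERROR lines per source location (the text after " -> "),
# most frequent first.
summarize_gearmand_errors() {
    awk -F' -> ' '/ERROR/ && NF == 2 { count[$2]++ }
        END { for (loc in count) print count[loc], loc }' "$1" | sort -rn
}

# Example usage:
#   summarize_gearmand_errors /var/log/gearmand/gearmand.log
```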

Post by tgriep »

Those errors may be coming from a worker that cannot connect to the gearman server, so run the gearman_top command and verify that all of the workers are listed. If any workers are missing, start looking there.

Otherwise it may be a module.conf setting on the gearman server or a setting in one of the worker.conf files on a worker.
I would need to see those files to check the settings, as well as the output of this command run on the Nagios server:

    gearman_top -b
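
The same queue information can also be read from gearmand's text admin interface, whose `status` command returns one tab-separated line per queue (name, total jobs, running jobs, available workers) followed by a line containing a single dot. The sketch below filters that output for queues with jobs waiting. It is an assumption-laden example rather than part of the advice above: the function name `queues_with_backlog` and the queue names in the test data are hypothetical, and the usage line assumes `nc` is installed and gearmand listens on its default port 4730.

```shell
# Read gearmand "status" output on stdin and print the queues that have
# jobs waiting. Columns: name, total jobs, running jobs, available workers.
# The terminating "." line has only one field and is skipped.
queues_with_backlog() {
    awk -F'\t' 'NF == 4 {
        waiting = $2 - $3
        if (waiting > 0)
            printf "%s waiting=%d running=%d workers=%d\n", $1, waiting, $3, $4
    }'
}

# Example usage:
#   printf 'status\n' | nc -w 2 localhost 4730 | queues_with_backlog
```

A queue that shows waiting jobs with `workers=0` points at a worker that is not connected, which matches the first thing to check above.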

Post by progressive.nagiosXI »

Hi,

All details are attached and shared by PM with tgriep:
1) module.conf from the gearman server
2) output of gearman_top -b
3) worker.conf for the hostgroup where Jobs Waiting is highest

Thanks

Post by tgriep »

Edit the worker.conf file and increase this option from

    min-worker=50

to

    min-worker=250

Restart the Mod Gearman worker on that server.
Run the gearman_top command and see if it starts to process more checks.
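
If the same change has to be made on several workers, it can be scripted. This is a hedged sketch: the helper name `set_conf_option` is hypothetical, the path to worker.conf is not given in this thread and must be filled in, and it assumes the file uses plain `key=value` lines as shown above. Mod-Gearman also has a `max-worker` option; presumably it needs to be at least as large as `min-worker`, so check it at the same time.

```shell
# Set key=value in a key=value style config file.
# Replaces an existing line for the key, or appends one if the key is absent.
# A backup of the original is kept as <file>.bak when a line is replaced.
set_conf_option() {
    file=$1
    key=$2
    value=$3
    if grep -q "^[[:space:]]*$key=" "$file"; then
        sed -i.bak "s|^[[:space:]]*$key=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage (substitute the real path to worker.conf):
#   set_conf_option /path/to/worker.conf min-worker 250
```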

If not, edit the worker.conf file and enable debugging:

    debug=2

Then check this log for errors:

    /var/log/mod_gearman2/mod_gearman_worker.log