gearman : Too many open files
Posted: Wed Nov 30, 2016 11:58 am
Hello everybody,
Our problem looks like this issue: https://support.nagios.com/forum/viewto ... =6&t=31202
Maybe we need to monitor the number of files gearmand has open. The details are below.
We run a distributed monitoring setup with Nagios and Mod-Gearman.
From time to time, we hit the following issue:
In the Thruk interface, the Nagios hosts are shown as down, while the Nagios services keep the status they had before the issue.
The host check output is:
Code:
host check orphaned, is the mod-gearman worker on queue 'XXXXX' running?
In /var/log/messages, we get error messages like these:
Code:
Nov 30 08:58:43 marmara nagios: Warning: The check of service 'Lustre OSS health' on host 'scratchopera-oss2spg01a' looks like it was orphaned (results never came back; last_check=1480441715; next_check=1480492003). I'm scheduling an immediate check of the service...
Nov 30 08:58:43 marmara nagios: Warning: The check of service 'NTP' on host 'scratchopera-oss2spg01b' looks like it was orphaned (results never came back; last_check=1480440279; next_check=1480492003). I'm scheduling an immediate check of the service...
Nov 30 08:58:43 marmara nagios: Warning: The check of service 'disks state' on host 'taiwan060' looks like it was orphaned (results never came back; last_check=1480441706; next_check=1480492003). I'm scheduling an immediate check of the service...
Nov 30 08:58:43 marmara nagios: Warning: The check of service '[META] Service' on host 'vavau' looks like it was orphaned (results never came back; last_check=1480443879; next_check=1480492003). I'm scheduling an immediate check of the service...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-cisco-1' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-cisco-3' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-cisco-4' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-cisco-6' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-cisco-7' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-cisco-8' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-e2600-a' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-qlogic-1' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Nov 30 08:59:42 marmara nagios: Warning: The check of host 'oleron-qlogic-2' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
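To get an overview of warnings like these, a small script could count the orphaned host and service checks and show how old the oldest `last_check` is. This is only a sketch: the regular expression assumes the exact message format shown above, and the names `ORPHAN_RE` and `summarize_orphans` are made up for this example.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches the orphaned-check warnings in the format shown above.
# The "on host" part and last_check only appear on service warnings.
ORPHAN_RE = re.compile(
    r"The check of (?P<kind>host|service) '(?P<name>[^']*)'"
    r"(?: on host '(?P<host>[^']*)')? looks like it was orphaned"
    r"(?:.*?last_check=(?P<last_check>\d+))?"
)


def summarize_orphans(lines):
    """Return (counts per kind, oldest last_check as a UTC datetime or None)."""
    counts = Counter()
    oldest = None
    for line in lines:
        match = ORPHAN_RE.search(line)
        if not match:
            continue
        counts[match.group("kind")] += 1
        if match.group("last_check"):
            ts = int(match.group("last_check"))
            oldest = ts if oldest is None else min(oldest, ts)
    oldest_utc = None
    if oldest is not None:
        oldest_utc = datetime.fromtimestamp(oldest, tz=timezone.utc)
    return counts, oldest_utc


# Possible usage (the log path is the one mentioned in this post):
#   with open("/var/log/messages", errors="replace") as fh:
#       counts, oldest = summarize_orphans(fh)
#   print(dict(counts), oldest)
```

The `last_check` values are Unix timestamps, so converting them shows how long the results have been missing.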
In the gearmand log:
Code:
576808 ERROR 2016-10-29 17:34:40.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
576809 ERROR 2016-10-29 17:34:40.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
576810 ERROR 2016-10-29 17:34:40.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
576811 ERROR 2016-10-29 17:34:40.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
576812 ERROR 2016-10-29 17:34:40.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
576813 ERROR 2016-10-29 17:34:40.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
576814 ERROR 2016-10-29 17:34:40.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:691
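The excerpt above is dated 2016-10-29, while the Nagios warnings are from Nov 30, so it may help to see when these errors actually occur. The sketch below counts the "Too many open files" errors per minute. It assumes the line format shown above, and `EMFILE_RE` and `emfile_errors_per_minute` are names invented for this example.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above:
#   <n> ERROR 2016-10-29 17:34:40.000000 [ main ] accept(Too many open files) -> ...
EMFILE_RE = re.compile(
    r"ERROR (?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ "
    r".*Too many open files"
)


def emfile_errors_per_minute(lines):
    """Count 'Too many open files' errors, grouped by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = EMFILE_RE.search(line)
        if match:
            counts[match.group("minute")] += 1
    return counts


# Possible usage (the log path is a placeholder, not taken from this post):
#   with open("gearmand.log", errors="replace") as fh:
#       for minute, n in sorted(emfile_errors_per_minute(fh).items()):
#           print(minute, n)
```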
Nagios version: Nagios Core 4.1.1
gearmand version: gearmand 0.33
Mod-Gearman version (most installations): 1.4_nagios4, running on libgearman 1.1.12
The open-file limit for gearmand on the Nagios server (/etc/security/limits.conf):
Code:
gearmand hard nofile 22000
gearmand soft nofile 22000
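Entries in limits.conf are applied by pam_limits to PAM sessions, so a daemon started at boot may be running with a different limit than the one configured here. A sketch like the following could check the limit the running gearmand really has and how many file descriptors it currently holds. It assumes Linux with /proc, and the function names and the 80%/90% thresholds are arbitrary choices for this example. The return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import os
import re


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return tuple(
                None if value == "unlimited" else int(value)
                for value in match.groups()
            )
    raise ValueError("no 'Max open files' line found")


def count_open_fds(pid):
    """Count the file descriptors a process holds (needs root or the same user)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def check_fd_usage(open_fds, soft_limit, warn=0.8, crit=0.9):
    """Return a Nagios plugin-style (exit_code, message) pair."""
    if soft_limit is None:
        return 0, f"OK - {open_fds} open files (no limit)"
    ratio = open_fds / soft_limit
    if ratio >= crit:
        code, label = 2, "CRITICAL"
    elif ratio >= warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    return code, f"{label} - {open_fds}/{soft_limit} open files ({ratio:.0%})"


# Possible usage, with the gearmand PID found beforehand (e.g. via pidof):
#   with open(f"/proc/{pid}/limits") as fh:
#       soft, hard = parse_max_open_files(fh.read())
#   code, message = check_fd_usage(count_open_fds(pid), soft)
#   print(message)
#   raise SystemExit(code)
```

If the soft limit reported here is lower than 22000, the limits.conf entries are not reaching the daemon.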
Our workaround is to restart gearmand and Nagios Core.
Sorry for my English. Could you please help?