check creating temp files are getting failed

Posted: Tue Nov 07, 2017 7:09 am
by padu_3891
Hello All,

For the last hour, all the checks related to VMware monitoring have been triggering alerts with the error message "Return code of 24 is out of bounds" (Warning: Return code of 24 for check of service 'VMW_CPU' on host 'XXXXX' was out of bounds.)

When I check /var/log/messages on the system, I see the error messages below. I am new to this error. Could you please look at them and advise me on how to fix it?
nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading
nagios: Error: Unable to create temp file for writing status data: Too many open files

Re: check creating temp files are getting failed

Posted: Tue Nov 07, 2017 2:41 pm
by kyang
Some systems return error 24 if there are too many files open (on Linux, errno 24 is EMFILE, "Too many open files").

What OS? Which version of Nagios core?

Could you show us the end of your nagios.log?

tail -50 /usr/local/nagios/var/nagios.log

Does restarting Nagios fix the issue?

service nagios restart
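
Before restarting, it can also help to compare how many file descriptors the Nagios process currently holds against its limit. The following is only a rough sketch, assuming a Linux /proc filesystem; fd_usage is a hypothetical helper name, not something that ships with Nagios.

```shell
# fd_usage PID
# Report open file descriptors vs. the soft limit for one process.
# Assumes Linux /proc; run as root or as the owner of the process.
fd_usage() {
    pid="$1"
    # Each entry under /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # The "Max open files" row lists the soft limit, then the hard limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid open=$open soft_limit=$soft"
}
```

For example, as root: fd_usage "$(pgrep -o -x nagios)" (the process name "nagios" is an assumption about your install). If open is close to soft_limit, the limit is the likely cause of the error.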

Re: check creating temp files are getting failed

Posted: Wed Nov 08, 2017 10:06 am
by padu_3891
Hi kyang

Yes, restarting Nagios fixes the issue.

Now I am getting the issue about once every 5 hours.

OS version: CentOS release 6.4 (Final)
Nagios Core version: Nagios® Core™ 3.5.1

There are a lot of "Unable to establish communication with Agent" errors, and eventually none of the clients can be pinged from the Nagios server; all hosts show as unreachable.

Some notable errors from nagios.log:
[1510147925] livestatus: Cannot set FD_CLOEXEC on client socket: Bad file descriptor
[1510147925] livestatus: Cannot set FD_CLOEXEC on client socket: Bad file descriptor
There are also a lot of fork() errors ('Resource temporarily unavailable'):
[1510147857] Warning: The check of service 'EVENTLOG-System' on host 'XXXX' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.
[1510147857] Warning: The check of service 'Eventlog-system' on host 'XXXX' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.
[1510147857] Warning: The check of service 'EVENTLOG-System' on host 'XXXX' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.
[1510147857] Warning: The check of service 'EVENTLOG-System' on host 'XXXX' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.
[1510147857] Warning: The check of service 'EVENTLOG-System' on host 'XXXX' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.


Please help me fix this issue.

Re: check creating temp files are getting failed

Posted: Thu Nov 09, 2017 3:37 pm
by npolovenko
@padu_3891, Do you monitor RAM usage on the Nagios server by chance? I'd like to see if the RAM gets maxed out because of this new check. Also what plugin are you using to monitor VMWare? Let's make sure that you're using the latest version.
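
The fork() errors in your log ('Resource temporarily unavailable') mean the system refused to create another process. That can come from a per-user process limit as well as from memory pressure, so it is worth checking both. As a rough sketch (assuming Linux with /proc and GNU stat; proc_count is a hypothetical helper name), this counts the processes owned by one user:

```shell
# proc_count USER
# Count the processes currently owned by USER by scanning /proc.
# Assumes Linux /proc and GNU stat (stat -c).
proc_count() {
    uid=$(id -u "$1") || return 1
    count=0
    for d in /proc/[0-9]*; do
        # /proc/<pid> is normally owned by the effective user of the process.
        [ "$(stat -c %u "$d" 2>/dev/null)" = "$uid" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Running proc_count nagios (the account name is an assumption) and comparing the result with the "Max processes" row in /proc/<pid>/limits for the Nagios daemon shows how close it is to the limit. The kernel also counts threads toward that limit, so treat the number as a lower bound.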

Re: check creating temp files are getting failed

Posted: Fri Nov 10, 2017 2:58 am
by padu_3891
I have doubled the RAM and CPU.

I am in the middle of troubleshooting. The file descriptor and VMware issues are fixed now.

But the main issue is that my Nagios server is a VM, and intermittently it is unable to ping a large number of servers in different VLANs and different networks.

Is this an issue with my Nagios server or with the OS (CentOS 6.4)? All the unreachable servers are up and running fine.

Traceroute runs up to the final destination IP and then returns * *.

Re: check creating temp files are getting failed

Posted: Fri Nov 10, 2017 2:13 pm
by npolovenko
@padu_3891, Can you ping the unreachable servers from your Nagios XI VM? You may need to clean the routing tables on routers.

Re: check creating temp files are getting failed

Posted: Mon Nov 13, 2017 4:55 am
by padu_3891
@npolovenko, for a particular period, maybe a minute or two, the servers are unreachable, and on the next check they are reachable again.

Traceroute to the target IP shows some * * * entries. When I checked with the network team, they said, "All other virtual machines in the same VLAN are running fine without any ping loss," and they want me to check what the issue is with the Nagios server.

Also, I got this error message again around midnight (23:59:59), and many alerts were triggered for the ESXi server checks:
Nov 12 23:59:45 XXXX nagios: Error: Unable to create temp file for writing status data: Too many open files
Nov 12 23:59:46 XXXX xinetd[1880]: EXIT: livestatus status=0 pid=12687 duration=7(sec)
Nov 12 23:59:54 XXXX xinetd[1880]: START: livestatus pid=12805 from=::ffff:10.10.8.232
Nov 12 23:59:55 XXXX nagios: Error: Unable to create temp file for writing status data: Too many open files
Nov 12 23:59:58 XXXX xinetd[1880]: EXIT: livestatus status=0 pid=12805 duration=4(sec)
Due to the above error, there were again a lot of alerts such as "Return code of 24 is out of bounds" and "CRITICAL - could not find Net::SNMP module, wrong device", as well as "nagios: Error: Unable to create temp file for writing status data: Too many open files".

Re: check creating temp files are getting failed

Posted: Mon Nov 13, 2017 6:00 pm
by npolovenko
@padu_3891, Can you post the contents of this conf file? You might need to increase the open file limits.

/etc/security/limits.conf
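
If the file is long, a filter like the one below prints only the entries that would apply to one account. It is just a sketch: limits_for is a hypothetical helper name, it assumes the usual four-column "<domain> <type> <item> <value>" layout, and it ignores group (@name) entries and any files under /etc/security/limits.d/.

```shell
# limits_for FILE USER
# Print the non-comment entries in a limits.conf-style file that apply to USER,
# including "*" wildcard entries.
limits_for() {
    awk -v u="$2" '
        /^[[:space:]]*#/ { next }             # skip comment lines
        NF >= 4 && ($1 == u || $1 == "*")     # print entries for USER or "*"
    ' "$1"
}

# Example (the account name "nagios" is an assumption):
# limits_for /etc/security/limits.conf nagios
```

An entry that raises the open-file limit would look like "nagios soft nofile 2048"; the value is only an illustration. Limits from this file are applied by pam_limits at login, so after restarting the daemon it is worth confirming its actual values in /proc/<pid>/limits.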

Re: check creating temp files are getting failed

Posted: Tue Nov 28, 2017 10:39 am
by padu_3891
For the nagios user, the limit was set to 1024.

I have doubled it now.

I also found that vMotion of my Nagios VM between ESXi servers caused many ping drops, and that was the reason for all these issues.

The VM is now restricted to one ESXi host and the issue is fixed.

Also, lsof sometimes goes above 200, which I still need to look into.

Re: check creating temp files are getting failed

Posted: Tue Nov 28, 2017 5:09 pm
by npolovenko
@padu_3891, I'm glad you figured this out. Do you have any other questions before I close this thread?