files hanging around in checkresults directory

pnewlon
Posts: 86
Joined: Mon May 16, 2011 2:19 pm

Post by pnewlon »

Hundreds of check result files are being left behind in /usr/local/nagios/var/spool/checkresults. I changed max_check_result_file_age to 600 hoping that would help, but it hasn't. Right now there are over 800 files in the directory, the oldest of which is more than 48 hours old. I end up having to stop nagios, delete all the files, and restart nagios every couple of days. Interestingly, my max open files is set to almost 800K.


max_check_result_file_age=600
max_check_result_reaper_time=30


[root@lpnagv03 log]# ps -ef |grep 24916
nagios 24916 1 1 Aug02 ? 00:55:06 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Aug 5 08:56:07 lpnagv03 rsyslogd-2177: imuxsock lost 106 messages from pid 24916 due to rate-limiting
Aug 5 08:56:59 lpnagv03 rsyslogd-2177: imuxsock begins to drop messages from pid 24916 due to rate-limiting
Aug 5 08:57:07 lpnagv03 rsyslogd-2177: imuxsock lost 62 messages from pid 24916 due to rate-limiting
Aug 5 08:57:59 lpnagv03 rsyslogd-2177: imuxsock begins to drop messages from pid 24916 due to rate-limiting
Aug 5 08:58:07 lpnagv03 rsyslogd-2177: imuxsock lost 457 messages from pid 24916 due to rate-limiting

Aug 5 08:59:12 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:17 lpnagv03 nagios: Error: Unable to create temp file for writing status data: Too many open files
Aug 5 08:59:17 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:22 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:27 lpnagv03 nagios: Error: Unable to create temp file for writing status data: Too many open files
Aug 5 08:59:27 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:32 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:37 lpnagv03 nagios: Error: Unable to create temp file for writing status data: Too many open files
Aug 5 08:59:37 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:42 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.

-rw------- 1 nagios users 401 Aug 3 06:18 cqGN9TO
-rw------- 1 nagios users 0 Aug 3 06:18 cqGN9TO.ok
[root@lpnagv03 log]# ls -lt /usr/local/nagios/var/spool/checkresults |wc -l
811

[root@lpnagv03 log]# lsof |wc -l
7568

[root@lpnagv03 log]# sysctl fs.file-nr
fs.file-nr = 3200 0 793765
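
To see how many of those files have outlived the configured max_check_result_file_age, the directory can be checked with find. This is only a sketch: the function name count_stale_results is made up for illustration, and it assumes a find that supports -maxdepth and -mmin (GNU findutils does).

```shell
# Count regular files directly inside a directory whose modification time
# is older than a given age.
# Usage: count_stale_results DIR AGE_SECONDS
# (hypothetical helper, not part of Nagios)
count_stale_results() {
    local dir="$1"
    local age_secs="$2"
    # find's -mmin works in whole minutes, so convert the threshold.
    # Integer division truncates, e.g. 600 seconds becomes 10 minutes.
    local age_mins=$(( age_secs / 60 ))
    find "$dir" -maxdepth 1 -type f -mmin +"$age_mins" | wc -l
}
```

For example, `count_stale_results /usr/local/nagios/var/spool/checkresults 600` would report how many files are older than the 600-second setting shown above. A result file and its .ok marker are counted separately.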
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: files hanging around in checkresults directory

Post by abrist »

Your current ulimits are most likely too low, causing the XI server to run out of an (artificially defined) set of resources. Follow the steps in the FAQ for orphaned files to resolve the issue. Remember to run:

ulimit -a 
after you have made the changes in the faq:
http://support.nagios.com/wiki/index.ph ... g_Orphaned
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Re: files hanging around in checkresults directory

Post by pnewlon »

I had found that doc and made those changes a couple of weeks ago (I set the open file limit to 10K). What I don't understand is that 800 or so files sit there and never get deleted. That is still a couple hundred short of the default open file limit of 1024.

I only changed the open file limit as a starting point since that is what the system is complaining about. I can change the max locked memory, process and stack if need be too.

[root@lpnagv03 pnewl01]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62829
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
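
Note that `ulimit -a` reports the limits of the shell it is run in, which are not necessarily the limits the nagios daemon was started with. Files sitting in the spool directory also do not count against the open file limit; only descriptors the process currently holds open do. A rough sketch for checking both on the live process, assuming a Linux /proc filesystem (the function names are hypothetical):

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# The line looks like: "Max open files  1024  4096  files",
# so the soft limit is the 4th whitespace-separated field.
# Usage: soft_nofile_limit /proc/PID/limits
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Count the file descriptors a process currently holds open.
# Needs to run as root or as the user that owns the process.
# Usage: open_fd_count PID
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}
```

With the PID from the ps output earlier in the thread, that would be `soft_nofile_limit /proc/24916/limits` and `open_fd_count 24916`. If the first number is still 1024, the change to 10000 has not reached the running daemon.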

Re: files hanging around in checkresults directory

Post by abrist »

It could be related to perfdata processing. What is the output of:

tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log