I am getting hundreds of check result files left behind in /usr/local/nagios/var/spool/checkresults. I changed max_check_result_file_age to 600 hoping that would help, but it hasn't. Right now I have over 800 files there, the oldest of which is more than 48 hours old. I end up having to stop nagios, delete all the files, then restart nagios every couple of days. Interestingly, my max open files is set to almost 800K.
max_check_result_file_age=600
max_check_result_reaper_time=30
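For context, max_check_result_file_age is given in seconds, so 600 means result files older than 10 minutes are supposed to be discarded rather than processed. A minimal sketch for listing files already past that age follows; the function name stale_results is made up for illustration, and -maxdepth and -mmin assume GNU or BSD find rather than strict POSIX.

```shell
# List regular files directly inside a directory whose mtime is more than N minutes old.
# stale_results is a hypothetical helper name, not part of Nagios.
stale_results() {
    dir=$1
    minutes=$2
    find "$dir" -maxdepth 1 -type f -mmin +"$minutes"
}

spool=/usr/local/nagios/var/spool/checkresults   # path from the post
if [ -d "$spool" ]; then
    # 600 seconds (max_check_result_file_age) is 10 minutes
    stale_results "$spool" 10 | wc -l
fi
```

If this count keeps growing, the files are not being reaped at all, which points at the reaper failing rather than at the age setting.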
[root@lpnagv03 log]# ps -ef |grep 24916
nagios 24916 1 1 Aug02 ? 00:55:06 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Aug 5 08:56:07 lpnagv03 rsyslogd-2177: imuxsock lost 106 messages from pid 24916 due to rate-limiting
Aug 5 08:56:59 lpnagv03 rsyslogd-2177: imuxsock begins to drop messages from pid 24916 due to rate-limiting
Aug 5 08:57:07 lpnagv03 rsyslogd-2177: imuxsock lost 62 messages from pid 24916 due to rate-limiting
Aug 5 08:57:59 lpnagv03 rsyslogd-2177: imuxsock begins to drop messages from pid 24916 due to rate-limiting
Aug 5 08:58:07 lpnagv03 rsyslogd-2177: imuxsock lost 457 messages from pid 24916 due to rate-limiting
Aug 5 08:59:12 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:17 lpnagv03 nagios: Error: Unable to create temp file for writing status data: Too many open files
Aug 5 08:59:17 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:22 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:27 lpnagv03 nagios: Error: Unable to create temp file for writing status data: Too many open files
Aug 5 08:59:27 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:32 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:37 lpnagv03 nagios: Error: Unable to create temp file for writing status data: Too many open files
Aug 5 08:59:37 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
Aug 5 08:59:42 lpnagv03 nagios: Error: Could not open check result queue directory '/usr/local/nagios/var/spool/checkresults' for reading.
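The rsyslogd lines above mean that part of what PID 24916 logged was dropped, so the excerpt understates how often the errors occur. As a rough sketch (sum_lost is a hypothetical helper, and the line layout is assumed to match the excerpt), the dropped-message counts for one PID can be totalled like this:

```shell
# Sum the "imuxsock lost N messages from pid P" counts for one PID.
# Reads syslog lines on stdin; $1 is the PID to filter on.
sum_lost() {
    awk -v pid="$1" '
        /imuxsock lost [0-9]+ messages from pid/ {
            n = 0
            for (i = 1; i < NF; i++) {
                if ($i == "lost") { n = $(i + 1) }
                if ($i == "pid" && $(i + 1) == pid) { total += n }
            }
        }
        END { print total + 0 }
    '
}

# The three "lost" lines from the excerpt above
sum_lost 24916 <<'EOF'
Aug 5 08:56:07 lpnagv03 rsyslogd-2177: imuxsock lost 106 messages from pid 24916 due to rate-limiting
Aug 5 08:57:07 lpnagv03 rsyslogd-2177: imuxsock lost 62 messages from pid 24916 due to rate-limiting
Aug 5 08:58:07 lpnagv03 rsyslogd-2177: imuxsock lost 457 messages from pid 24916 due to rate-limiting
EOF
```

On a live system the same function could be fed the syslog file instead of the here-document.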
-rw------- 1 nagios users 401 Aug 3 06:18 cqGN9TO
-rw------- 1 nagios users 0 Aug 3 06:18 cqGN9TO.ok
[root@lpnagv03 log]# ls -lt /usr/local/nagios/var/spool/checkresults |wc -l
811
[root@lpnagv03 log]# lsof |wc -l
7568
[root@lpnagv03 log]# sysctl fs.file-nr
fs.file-nr = 3200 0 793765
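For reference, the three fs.file-nr fields are allocated file handles, allocated but unused handles, and the system-wide maximum, so 793765 is the kernel-wide ceiling rather than a per-process limit. The "Too many open files" text in the log is the message for EMFILE, the per-process limit; hitting the system-wide limit (ENFILE) reads "Too many open files in system" instead. A small sketch that labels the fields (parse_file_nr is a hypothetical helper name):

```shell
# Label the three fs.file-nr fields: allocated, allocated-but-unused, system-wide maximum.
parse_file_nr() {
    # Word splitting of $1 is intentional: it separates the three numbers.
    set -- $1
    echo "allocated=$1 unused=$2 system_max=$3"
}

parse_file_nr "3200 0 793765"          # values from the post

# Live values, if the file is available (Linux only)
if [ -r /proc/sys/fs/file-nr ]; then
    parse_file_nr "$(cat /proc/sys/fs/file-nr)"
fi
```

With only 3200 handles allocated system-wide, the kernel-wide limit is nowhere near exhausted, which is consistent with the per-process limit being the one that matters here.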
files hanging around in checkresults directory
Re: files hanging around in checkresults directory
Your current ulimits are most likely too low, causing the XI server to run out of (an artificially defined) set of resources. Follow the steps in the FAQ for orphaned files to resolve the issue. Remember to run:
Code: Select all
ulimit -a
after you have made the changes in the FAQ:
http://support.nagios.com/wiki/index.ph ... g_Orphaned
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: files hanging around in checkresults directory
I had found that doc and made those changes a couple weeks ago (I set the open file limit to 10K). What I don't understand is that 800 or so files sit there and never get deleted. That is still a couple hundred short of the 1024 that the open file limit is set to by default.
I only changed the open file limit as a starting point since that is what the system is complaining about. I can change the max locked memory, process and stack if need be too.
[root@lpnagv03 pnewl01]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62829
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
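Note that ulimit -a reports the limits of the shell it is run in, which may differ from what the running nagios daemon inherited when it was started. A minimal sketch, assuming a Linux /proc filesystem, for reading the limit and the current descriptor count of a specific process follows. The function names are made up for illustration, and reading another user's /proc/<pid>/fd generally requires root.

```shell
# Soft "Max open files" limit of a running process, read from /proc (Linux only).
# Line format: "Max open files  <soft>  <hard>  files", so the soft limit is field 4.
proc_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Number of file descriptors the process currently holds open.
proc_open_fds() {
    ls "/proc/$1/fd" | wc -l
}

pid=$$   # this shell, for demonstration; substitute the nagios PID (24916 in the post)
echo "pid=$pid soft_nofile=$(proc_nofile_limit "$pid") open_fds=$(proc_open_fds "$pid")"
```

If the soft limit printed for the nagios PID is still 1024 while the shell shows 10000, the daemon never picked up the new setting.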
Re: files hanging around in checkresults directory
It could be related to perfdata processing. What is the output of:
Code: Select all
tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log
Former Nagios employee