NSCA tuning - HUGE checkresults directory
First of all, I'm rather new to Nagios and this forum, so please feel free to reassign this if it is in the wrong place, and be patient if I don't know *ANYTHING*. I've been reading a bunch and I think I understand enough to dig into my issue:
I've inherited an old Nagios environment with NSCA implemented in a distributed system on CentOS 5. It is integrated so deeply into my company's datacenter software that we cannot upgrade it. We are building a new infrastructure to replace it, but for now I'm stuck with it. The configuration is several Nagios "distributed" servers running NSCA, sending the check data to a Nagios "Master" cluster. The issue is that the check data on the "distributed" servers (which resides in /nagios/var/spool/checkresults) is constantly at 20,000+ files, 90% of which are older than 30 days. I believe this old data may be caused by the servers getting out of sync due to network interruptions, load spikes, server crashes, etc.
What I'm trying to figure out:
1. Is it safe to delete the old files or not?
2. Is there some setting in NSCA that will trim the old data?
3. Is there some setting in NSCA that defines how old data has to be before it will no longer forward it to the "Master" server?
4. Is there some tuning I can do if the files are just expiring before they can be pushed out in time?
I did find this link (http://www.terminalinflection.com/nagio ... ding-nsca/), which is a decent overview and links to an implementation guide, but that guide is for Nagios XI and this environment is more like Nagios 2 or 3. I have yet to find any documentation that talks about managing the checkresults directory. Please note that the location of our check results directory may be different from the default.
Thanks in advance for any help you can offer!!
Re: NSCA tuning - HUGE checkresults directory
That folder should be cleaned out after the checks are processed.
Can you post your nagios.cfg file and do you see any errors in the nagios.log file?
Re: NSCA tuning - HUGE checkresults directory
"That folder should be cleaned out after the checks are processed."
Hmm, I'm not sure I've ever seen that directory empty. I moved every file older than 5 days to a backup directory, and since then I have seen it hold between 30 and 3k files but never empty, so maybe it isn't finishing processing and therefore never cleans up?
"Can you post your nagios.cfg file"
I uploaded it. I also saw a setting in it which I think answers question #3. Please note that the cfg was scrubbed of any sensitive data, so where you see a string of hashes it's been edited:
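For anyone wanting to repeat the "move anything older than 5 days aside" step, here is a minimal sketch of one way to do it. It assumes GNU find (as shipped with CentOS); the function name and the backup directory are hypothetical, and the 5-day cutoff is just the figure mentioned in this post. Moving the files rather than deleting them keeps them available while the question of whether they are safe to remove is still open.

```shell
#!/bin/sh
# Sketch only: move check result files older than N days into a backup
# directory. The function name and all paths are illustrative placeholders.
archive_old_results() {
    src=$1      # e.g. /nagios/var/spool/checkresults
    dest=$2     # hypothetical backup directory
    days=$3     # age threshold in whole days

    mkdir -p "$dest" || return 1

    # -maxdepth 1 keeps find out of subdirectories.
    # -mtime +N matches files whose age in whole 24-hour periods exceeds N.
    find "$src" -maxdepth 1 -type f -mtime +"$days" -exec mv {} "$dest"/ \;
}

# Example call (commented out so that sourcing this file changes nothing):
# archive_old_results /nagios/var/spool/checkresults /nagios/var/spool/checkresults.old 5
```

Running the same find without the -exec part first shows what would be moved.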
# MAX CHECK RESULT FILE AGE
# This option determines the maximum age (in seconds) which check
# result files are considered to be valid. Files older than this
# threshold will be mercilessly deleted without further processing.
max_check_result_file_age=3600
"and do you see any errors in the nagios.log file"
Assuming this is normal operation:
[1296718567] Finished daemonizing... (New PID=31151)
[1296718627] Caught SIGTERM, shutting down...
[1296718627] Successfully shutdown... (PID=31151)
Other than that the log file is clean
Re: NSCA tuning - HUGE checkresults directory
Are the files that remain in the checkresults directory less than an hour old, or are you seeing an issue with the max_check_result_file_age variable?
Re: NSCA tuning - HUGE checkresults directory
"Are the files less than an hour old that are remaining in the checkresults directory"
Well, I moved everything this morning. So far I see these, which look like they are going to remain in the directory:
-rw------- 1 nagios nagios 301 Feb 4 11:20 checkO0kZs5
-rw------- 1 nagios nagios 301 Feb 4 11:20 checkoYba8F
-rw------- 1 nagios nagios 289 Feb 4 11:19 checkTkQ8X3
-rw------- 1 nagios nagios 296 Feb 4 11:19 checkD2mwna
-rw------- 1 nagios nagios 295 Feb 4 11:18 check4bvxf1
-rw------- 1 nagios nagios 296 Feb 4 11:18 checkaoOs7h
But I also see these. I'm not sure whether they are going to clear or not, but they are less than an hour old. Some of them have a companion file with the .ok extension, and it never leaves any of those behind as far as I can tell. Keep in mind that this is a small snippet of what's in the directory:
-rw------- 1 nagios nagios 379 Feb 4 14:11 cVqNCkr
-rw------- 1 nagios nagios 0 Feb 4 14:11 cVqNCkr.ok
-rw------- 1 nagios nagios 400 Feb 4 14:11 cxeljtN
-rw------- 1 nagios nagios 0 Feb 4 14:11 cxeljtN.ok
-rw------- 1 nagios nagios 304 Feb 4 14:11 checkdlK2j7
-rw------- 1 nagios nagios 303 Feb 4 14:11 check9GciYH
-rw------- 1 nagios nagios 291 Feb 4 14:11 checkoHOcHs
-rw------- 1 nagios nagios 299 Feb 4 14:11 checkPdRTr6
-rw------- 1 nagios nagios 296 Feb 4 14:10 checkcNevSB
-rw------- 1 nagios nagios 304 Feb 4 14:10 checkt8MBcl
-rw------- 1 nagios nagios 297 Feb 4 14:04 check0I8cW4
-rw------- 1 nagios nagios 305 Feb 4 14:03 check7SCN9J
-rw------- 1 nagios nagios 296 Feb 4 14:03 checkOrW9Z9
"are you seeing an issue with the max_check_result_file_age variable?"
I'm not sure how to determine this. How can I confirm whether I'm having an issue with the max_check_result_file_age variable?
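One rough way to check this from the shell is to list the files that are already older than the configured threshold. The sketch below assumes GNU find; the function name is hypothetical, and 3600 is simply the max_check_result_file_age value from the posted nagios.cfg. Going by the comment in that config, files past the threshold are supposed to be deleted, so a list that keeps growing would suggest they are not being reaped.

```shell
#!/bin/sh
# Sketch only: print check result files older than a given age in seconds.
# The function name is illustrative; pass the value of
# max_check_result_file_age from nagios.cfg as the second argument.
stale_results() {
    dir=$1
    max_age_seconds=$2

    # find's -mmin test works in minutes, so convert (integer division).
    minutes=$(( max_age_seconds / 60 ))

    # -mmin +N matches files last modified more than N minutes ago.
    find "$dir" -maxdepth 1 -type f -mmin +"$minutes"
}

# Example call (commented out so that sourcing this file prints nothing):
# stale_results /nagios/var/spool/checkresults 3600 | wc -l
```

Running it a few times over an hour or two should show whether the same file names keep reappearing in the output.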
Re: NSCA tuning - HUGE checkresults directory
Try changing the permissions of the checkresults folder:
Code:
chmod -R 770 /usr/local/nagios/var/spool/checkresults
service httpd restart
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: NSCA tuning - HUGE checkresults directory
"chmod -R 770 /usr/local/nagios/var/spool/checkresults"
Permissions on the file are already more permissive than that (775).
"service httpd restart"
Did you mean to say restart Apache, or was that just a typo or copy/paste error?
Re: NSCA tuning - HUGE checkresults directory
n0b0de wrote: "did you mean to say restart apache or just a typo or copy/paste error"
It depends on the distro generally.
CentOS/RHEL = httpd
Ubuntu/Debian/... = apache/apache2
Re: NSCA tuning - HUGE checkresults directory
n0b0de wrote: "permissions on the file are already more permissive than that (775)"
Can you show us what the permissions are currently on the checkresults folder please:
Code:
ls -lad /usr/local/nagios/var/spool/checkresults
n0b0de wrote: "did you mean to say restart apache or just a typo or copy/paste error"
I wanted you to restart Apache after making the permissions changes I requested. If Apache is producing an error when restarting, then show us the error.
Re: NSCA tuning - HUGE checkresults directory
"I wanted you to restart apache after making the permissions changes I requested. If apache is producing an error when restarting it then show us the error."
There seems to be some confusion here. Apache has no errors when starting; I was asking whether you had made an error, but I understand now, so it's of no consequence!
Here is the ls of the checkresults directory as requested, but I don't think it's permissions. This directory is being written to all the time. I see it constantly up and down in file count from as many as 4k files and as few as 20-30 (now that I have moved the old ones).
Code:
drwxrwxr-x 2 nagios nagios 18M Feb 18 10:29 checkresults
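Since the file count swings so much, it may help to separate the files that are being actively processed from the ones that linger. Going only by the listings earlier in this thread, the files that come with a ".ok" companion get cleaned up, while the "check*" files without one seem to stay behind. The sketch below lists files that have no ".ok" companion; the function name is hypothetical, and the pairing rule is an assumption drawn from those listings rather than from Nagios documentation.

```shell
#!/bin/sh
# Sketch only: print files in a directory that have no ".ok" companion.
# The function name is illustrative, and the pairing rule is inferred from
# the directory listings in this thread, not from Nagios documentation.
unpaired_results() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip non-files and an unmatched glob
        case $f in
            *.ok) continue ;;              # the markers themselves are not results
        esac
        [ -e "$f.ok" ] || printf '%s\n' "$f"
    done
}

# Example call (commented out so that sourcing this file prints nothing):
# unpaired_results /nagios/var/spool/checkresults | wc -l
```

Comparing this count with the total over time would show whether the leftovers are all of the unpaired kind.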