NSCA tuning - HUGE checkresults directory

Post by n0b0de

First of all, I'm rather new to Nagios and this forum, so please feel free to move this if it is in the wrong place, and be patient if I don't know *ANYTHING*. I've been reading a bunch, and I think I understand enough to dig into my issue:

I've inherited an old Nagios environment with NSCA implemented as a distributed system on CentOS 5. It is so deeply integrated into my company's datacenter software that we cannot upgrade it. We are building a new infrastructure to replace it, but for now I'm stuck with it. The configuration is several "distributed" Nagios servers running NSCA, sending their check data to a Nagios "Master" cluster. The issue is that the check data on the "distributed" servers (which resides in /nagios/var/spool/checkresults) is constantly at 20,000+ files, 90% of which are older than 30 days. I believe this old data may be caused by the servers getting out of sync due to things like network interruptions, load spikes or server crashes.
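For anyone wanting to reproduce counts like these, here is a minimal sketch. It assumes GNU findutils and the /nagios/var/spool/checkresults path mentioned above; checkresults_age_report is a hypothetical helper name, not part of Nagios or NSCA.

```shell
# Hypothetical helper (not part of Nagios/NSCA): count all regular files in a
# check result directory and how many were last modified more than N days ago.
checkresults_age_report() {
    local dir="${1:-/nagios/var/spool/checkresults}"
    local days="${2:-30}"
    local total old
    # -maxdepth 1 keeps find inside the spool directory itself
    total=$(( $(find "$dir" -maxdepth 1 -type f | wc -l) ))
    # -mtime +N matches files modified more than N whole days ago
    old=$(( $(find "$dir" -maxdepth 1 -type f -mtime +"$days" | wc -l) ))
    echo "total=$total older_than_${days}d=$old"
}
```

For example, checkresults_age_report /nagios/var/spool/checkresults 30 prints the total file count followed by the number of files older than 30 days.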

What I'm trying to figure out:
1. Is it safe to delete the old files or not?
2. Is there some setting in NSCA that will trim the old data?
3. Is there some setting in NSCA that defines how old data has to be before it will no longer forward it to the "Master" server?
4. Is there some tuning I can do if the files are just expiring before they can be pushed out in time?

I did find this link (http://www.terminalinflection.com/nagio ... ding-nsca/), which is a decent overview and links to an implementation guide, but it's for Nagios XI and this environment is more like Nagios 2 or 3. I have yet to find any documentation that talks about managing the checkresults directory. Please note that the location of our checkresults directory may differ from the default.

Thanks in advance for any help you can offer!!

Re: NSCA tuning - HUGE checkresults directory

Post by tgriep

That folder should be cleaned out after the checks are processed.
Can you post your nagios.cfg file and do you see any errors in the nagios.log file?

Re: NSCA tuning - HUGE checkresults directory

Post by n0b0de

tgriep wrote: That folder should be cleaned out after the checks are processed.
Hrm, I'm not sure I've ever seen that directory empty. I moved every file older than 5 days to a backup directory, and since then I have seen it hold anywhere from 30 to 3k files but never zero, so maybe it isn't finishing processing and therefore never cleans up?
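Moving old files aside rather than deleting them, as described here, can be done with find. A sketch, assuming GNU-style find; archive_old_checkresults and the backup path in the usage line are hypothetical:

```shell
# Hypothetical helper: move files not modified for more than N days out of the
# spool directory into a backup directory instead of deleting them.
archive_old_checkresults() {
    local spool="$1" backup="$2" days="${3:-5}"
    mkdir -p "$backup" || return 1
    # -mtime +N: modified more than N whole days ago; .ok markers are moved too
    # because they match the same age test. One mv per file: slow but simple.
    find "$spool" -maxdepth 1 -type f -mtime +"$days" -exec mv {} "$backup"/ \;
}
```

For example: archive_old_checkresults /nagios/var/spool/checkresults /root/checkresults-backup 5. Keeping the files around means they can still be inspected later if the question of whether they were safe to remove comes up again.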
tgriep wrote: Can you post your nagios.cfg file
I uploaded it. I also saw a setting for this which I think answers question #3. Please note that the cfg was scrubbed of any sensitive data, so where you see a string of hashes it has been edited:

# MAX CHECK RESULT FILE AGE
# This option determines the maximum age (in seconds) which check
# result files are considered to be valid. Files older than this
# threshold will be mercilessly deleted without further processing.

max_check_result_file_age=3600
tgriep wrote: and do you see any errors in the nagios.log file
Assuming this is normal operation:

[1296718567] Finished daemonizing... (New PID=31151)
[1296718627] Caught SIGTERM, shutting down...
[1296718627] Successfully shutdown... (PID=31151)

Other than that, the log file is clean.

Re: NSCA tuning - HUGE checkresults directory

Post by rkennedy

Are the files remaining in the checkresults directory less than an hour old, or are you seeing an issue with the max_check_result_file_age variable?

Re: NSCA tuning - HUGE checkresults directory

Post by n0b0de

rkennedy wrote: Are the files less than an hour old that are remaining in the checkresults directory
Well, I moved everything this morning. So far I see these, which look like they are going to remain in the directory:
-rw------- 1 nagios nagios 301 Feb 4 11:20 checkO0kZs5
-rw------- 1 nagios nagios 301 Feb 4 11:20 checkoYba8F
-rw------- 1 nagios nagios 289 Feb 4 11:19 checkTkQ8X3
-rw------- 1 nagios nagios 296 Feb 4 11:19 checkD2mwna
-rw------- 1 nagios nagios 295 Feb 4 11:18 check4bvxf1
-rw------- 1 nagios nagios 296 Feb 4 11:18 checkaoOs7h

But I also see these. I'm not sure whether they are going to clear, but they are less than an hour old. Some of them have a matching .ok file, and as far as I can tell none of those are ever left behind. Keep in mind that this is a small snippet of what's in the directory:
-rw------- 1 nagios nagios 379 Feb 4 14:11 cVqNCkr
-rw------- 1 nagios nagios 0 Feb 4 14:11 cVqNCkr.ok
-rw------- 1 nagios nagios 400 Feb 4 14:11 cxeljtN
-rw------- 1 nagios nagios 0 Feb 4 14:11 cxeljtN.ok
-rw------- 1 nagios nagios 304 Feb 4 14:11 checkdlK2j7
-rw------- 1 nagios nagios 303 Feb 4 14:11 check9GciYH
-rw------- 1 nagios nagios 291 Feb 4 14:11 checkoHOcHs
-rw------- 1 nagios nagios 299 Feb 4 14:11 checkPdRTr6
-rw------- 1 nagios nagios 296 Feb 4 14:10 checkcNevSB
-rw------- 1 nagios nagios 304 Feb 4 14:10 checkt8MBcl
-rw------- 1 nagios nagios 297 Feb 4 14:04 check0I8cW4
-rw------- 1 nagios nagios 305 Feb 4 14:03 check7SCN9J
-rw------- 1 nagios nagios 296 Feb 4 14:03 checkOrW9Z9
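To see how the directory splits between those kinds of files, here is a rough classification sketch. It assumes, based on our reading of the Nagios 3 result reaper, that only files named "c" followed by six characters are picked up, and only once a matching .ok file exists; the checkXXXXXX names in the listing do not fit that pattern. Treat that as an assumption to verify against your own Nagios version. classify_checkresults is a hypothetical helper:

```shell
# Hypothetical helper. Assumption: the reaper only processes files named
# "c" + six characters, and only once "<name>.ok" exists next to them.
classify_checkresults() {
    local dir="$1" f name ready=0 waiting=0 other=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
        name=${f##*/}
        case $name in
            *.ok)
                ;;                       # markers are counted through their result file
            c??????)
                if [ -e "$f.ok" ]; then
                    ready=$((ready + 1))     # result file with its .ok marker present
                else
                    waiting=$((waiting + 1)) # result file without a .ok marker yet
                fi
                ;;
            *)
                other=$((other + 1))         # e.g. the checkXXXXXX files listed above
                ;;
        esac
    done
    echo "ready=$ready waiting_for_ok=$waiting other=$other"
}
```

Running it a few times a minute apart shows whether the "other" count only ever grows, which would fit files that never get processed.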

rkennedy wrote: are you seeing an issue with the max_check_result_file_age variable?
I'm not sure how to determine that. How can I confirm whether I'm having an issue with the max_check_result_file_age variable?
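One way to check is to compare file ages against the configured threshold: per the nagios.cfg comment quoted earlier, files older than max_check_result_file_age should be deleted without processing, so they should not linger. A sketch, assuming GNU find and the plain key=value layout of nagios.cfg shown above; files_older_than_max_age is hypothetical, and the 3600 fallback is simply the value from this thread:

```shell
# Hypothetical helper: read max_check_result_file_age from nagios.cfg and list
# files in the spool directory that are older than that many seconds.
files_older_than_max_age() {
    local cfg="$1" dir="$2" age minutes
    # Take the last uncommented "max_check_result_file_age=<digits>" line
    age=$(sed -n 's/^max_check_result_file_age=\([0-9]*\).*/\1/p' "$cfg" | tail -n 1)
    age=${age:-3600}    # fallback: the value posted in this thread
    # find's -mmin counts minutes, so round the threshold up to whole minutes
    minutes=$(( (age + 59) / 60 ))
    find "$dir" -maxdepth 1 -type f -mmin +"$minutes"
}
```

For example, files_older_than_max_age /path/to/nagios.cfg /nagios/var/spool/checkresults | wc -l gives a count; a steadily growing number would suggest the threshold is not being applied to those files.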

Re: NSCA tuning - HUGE checkresults directory

Post by Box293

Try changing the permissions of the checkresults folder:

chmod -R 770 /usr/local/nagios/var/spool/checkresults
service httpd restart

Re: NSCA tuning - HUGE checkresults directory

Post by n0b0de

Box293 wrote:
chmod -R 770 /usr/local/nagios/var/spool/checkresults
permissions on the file are already more permissive than that (775)
Box293 wrote:
service httpd restart
Did you mean to restart Apache, or is that just a typo or copy/paste error?

Re: NSCA tuning - HUGE checkresults directory

Post by hsmith

n0b0de wrote: Did you mean to restart Apache, or is that just a typo or copy/paste error?
It generally depends on the distro:

CentOS/RHEL = httpd
Ubuntu/Debian/... = apache/apache2
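If a script has to restart Apache on either family, a sketch that picks the service name from distribution marker files. It assumes /etc/redhat-release exists on CentOS/RHEL and /etc/debian_version on Ubuntu/Debian, and it returns apache2 for the latter (the older apache name listed above is not handled). apache_service_name is a hypothetical helper:

```shell
# Hypothetical helper: guess the Apache service name from distro marker files.
# The optional root argument only exists so it can be tried on a test tree.
apache_service_name() {
    local root="${1:-}"
    if [ -e "$root/etc/redhat-release" ]; then
        echo httpd          # CentOS/RHEL
    elif [ -e "$root/etc/debian_version" ]; then
        echo apache2        # Ubuntu/Debian (assumes the apache2 package)
    else
        return 1            # unknown distribution
    fi
}
```

Usage: service "$(apache_service_name)" restart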

Re: NSCA tuning - HUGE checkresults directory

Post by Box293

n0b0de wrote:
chmod -R 770 /usr/local/nagios/var/spool/checkresults
permissions on the file are already more permissive than that (775)
Can you show us the current permissions on the checkresults folder, please:

ls -lad /usr/local/nagios/var/spool/checkresults
n0b0de wrote:
service httpd restart
Did you mean to restart Apache, or is that just a typo or copy/paste error?
I wanted you to restart Apache after making the permissions changes I requested. If Apache produces an error when restarting, show us the error.

Re: NSCA tuning - HUGE checkresults directory

Post by n0b0de

Box293 wrote: I wanted you to restart Apache after making the permissions changes I requested. If Apache produces an error when restarting, show us the error.
There seems to be some confusion here. Apache starts without errors; I was asking whether you had made an error, but I understand now, so it's of no consequence!

Here is the ls of the checkresults directory as requested, but I don't think it's a permissions issue: this directory is being written to all the time. The file count constantly goes up and down, from as many as 4k files to as few as 20-30 (now that I have moved the old ones).

drwxrwxr-x 2 nagios nagios  18M Feb 18 10:29 checkresults
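To put numbers on that up-and-down movement, a sketch that samples the entry count of the directory a few times. sample_file_count and its defaults (5 samples, 60 seconds apart) are illustrative, not a Nagios tool; note that the count includes .ok marker files.

```shell
# Hypothetical helper: print a timestamp and the number of entries in a
# directory, SAMPLES times, INTERVAL seconds apart.
sample_file_count() {
    local dir="$1" samples="${2:-5}" interval="${3:-60}" i=1 count
    while [ "$i" -le "$samples" ]; do
        # -mindepth 1 excludes the directory itself from the count
        count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)
        echo "$(date '+%H:%M:%S') $((count))"
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, sample_file_count /nagios/var/spool/checkresults 10 30 prints ten counts, 30 seconds apart.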