Nagios Support Forum

Posted: **Wed Feb 03, 2016 10:08 am**

First of all, I'm rather new to Nagios and to this forum, so please feel free to move this if it is in the wrong place, and be patient if I don't know *ANYTHING*. I've been reading a bunch and I think I understand enough to dig into my issue:

I've inherited an old Nagios environment with NSCA implemented in a distributed setup on CentOS 5. It is so tightly integrated into my company's datacenter software that we now cannot upgrade it. We are building a new infrastructure to replace it, but for now I'm stuck with it. The configuration is several Nagios "distributed" servers running NSCA, sending their check data to a Nagios "Master" cluster. The issue we are having is that the check data directory on the "distributed" servers (/nagios/var/spool/checkresults) constantly holds 20,000+ files, 90% of which are older than 30 days. I believe this old data may be caused by the servers getting out of sync due to things like network interruptions, load spikes, or server crashes.
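To put numbers on a backlog like this before touching anything, a count by age is enough. A minimal sketch, assuming GNU find and using the spool path quoted above as a default; `count_checkresults` is a hypothetical helper, not part of Nagios or NSCA:

```shell
# Hypothetical helper (not part of Nagios/NSCA): count regular files directly
# inside a check result spool, and how many are older than N days.
# Assumes GNU find; the default path is the one quoted in this thread.
count_checkresults() {
    local dir="${1:-/nagios/var/spool/checkresults}" days="${2:-30}" total old
    total=$(find "$dir" -maxdepth 1 -type f | wc -l)
    # -mtime +N: age in whole days, rounded down, greater than N
    old=$(find "$dir" -maxdepth 1 -type f -mtime +"$days" | wc -l)
    # $(( )) normalises any padding that some wc implementations add
    echo "total=$((total)) older_than_${days}d=$((old))"
}
```

For example, `count_checkresults /nagios/var/spool/checkresults 30` prints the total and the number older than 30 days. Because `-mtime` rounds the age down to whole days, a file has to be at least 31 days old to be counted.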

What I'm trying to figure out:
1. Is it safe to delete the old files or not?
2. Is there some setting in NSCA that will trim the old data?
3. Is there some setting in NSCA that defines how old data has to be before it will no longer forward it to the "Master" server?
4. Is there some tuning I can do if the files are just expiring before they can be pushed out in time?

I did find this link (http://www.terminalinflection.com/nagio ... ding-nsca/), which is a decent overview and links to an implementation guide, but it's for Nagios XI and this environment is more like Nagios 2 or 3. I have yet to find any documentation that talks about managing the checkresults directory. Please note that the location of our check results directory may be different from the default.

Thanks in advance for any help you can offer!!

Posted: **Wed Feb 03, 2016 6:04 pm**

That folder should be cleaned out after the checks are processed.
Can you post your nagios.cfg file and do you see any errors in the nagios.log file?

Posted: **Thu Feb 04, 2016 1:34 pm**

> That folder should be cleaned out after the checks are processed.

Hrm, I'm not sure I've ever seen that directory empty. I moved every file older than 5 days to a backup directory, and since then I have seen it hold anywhere between 30 and 3k files but never empty, so maybe it isn't finishing processing and therefore never cleans up?
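Moving rather than deleting, as described above, keeps the old results recoverable if they turn out to matter. A minimal sketch of that step, assuming GNU find and a backup directory of your choosing; `archive_old_checkresults` and the paths in the usage line are hypothetical:

```shell
# Hypothetical helper: move (not delete) regular files in SRC that are older
# than N days (default 5) into DEST, creating DEST if needed.
# Assumes GNU find; only the top level of SRC is scanned.
archive_old_checkresults() {
    local src="$1" dest="$2" days="${3:-5}"
    mkdir -p "$dest" || return 1
    # -maxdepth 1 keeps find out of subdirectories; -type f skips SRC itself
    find "$src" -maxdepth 1 -type f -mtime +"$days" -exec mv {} "$dest"/ \;
}
```

Usage: `archive_old_checkresults /nagios/var/spool/checkresults /root/checkresults-backup 5`.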

> Can you post your nagios.cfg file

I uploaded it. I also saw a setting for this which I think answers question #3. Please note that the cfg was scrubbed of any sensitive data, so where you see a string of hashes it's been edited:

# MAX CHECK RESULT FILE AGE
# This option determines the maximum age (in seconds) which check
# result files are considered to be valid. Files older than this
# threshold will be mercilessly deleted without further processing.

max_check_result_file_age=3600
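To confirm which value the configuration actually sets, the directive can be read straight out of nagios.cfg. A minimal sketch; `get_nagios_cfg_value` is a hypothetical helper that only handles plain `key=value` lines (commented-out lines are ignored, and if a key appears more than once it prints the last match):

```shell
# Hypothetical helper: print the value of KEY from a Nagios main config file.
# Lines starting with '#' never match; returns 1 if the key is not set.
get_nagios_cfg_value() {
    local cfg="$1" key="$2" value
    # strip "KEY=" (with optional surrounding whitespace) and print the rest
    value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$cfg" | tail -n 1)
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

`get_nagios_cfg_value /usr/local/nagios/etc/nagios.cfg max_check_result_file_age` should print `3600` for the excerpt above (the path is the usual source-install location and may differ here). Since the spool location may not be the default, the same helper can read `check_result_path` to confirm where the checkresults directory really is.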

> and do you see any errors in the nagios.log file

Assuming this is normal operation:

[1296718567] Finished daemonizing... (New PID=31151)
[1296718627] Caught SIGTERM, shutting down...
[1296718627] Successfully shutdown... (PID=31151)

Other than that, the log file is clean.
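Nagios writes each log line with a Unix epoch timestamp in brackets, which makes it hard to tell at a glance when entries like the ones above were written. A minimal sketch that rewrites the stamp as a UTC date, assuming GNU date (`-d @SECONDS`); `humanize_nagios_log` is a hypothetical helper:

```shell
# Hypothetical helper: read log lines on stdin and replace a leading
# "[EPOCH]" stamp with a UTC date; other lines pass through unchanged.
# Assumes GNU date for "date -d @SECONDS".
humanize_nagios_log() {
    local line ts rest
    while IFS= read -r line; do
        case "$line" in
            \[[0-9]*\]*)
                ts=${line#\[}          # drop the opening bracket
                ts=${ts%%\]*}          # keep everything before the first ']'
                rest=${line#*\]}       # message text after the stamp
                printf '[%s]%s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S UTC')" "$rest"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}
```

Usage: `humanize_nagios_log < /usr/local/nagios/var/nagios.log | tail` (log path assumed). For example, `[1296718567] Finished daemonizing... (New PID=31151)` becomes `[2011-02-03 07:36:07 UTC] Finished daemonizing... (New PID=31151)`.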

Posted: **Thu Feb 04, 2016 2:17 pm**

Are the files remaining in the checkresults directory less than an hour old, or are you seeing an issue with the max_check_result_file_age variable?

Posted: **Thu Feb 04, 2016 3:19 pm**

> Are the files remaining in the checkresults directory less than an hour old

Well, I moved everything this morning. So far I see these, which look like they are going to remain in the directory:
-rw------- 1 nagios nagios 301 Feb 4 11:20 checkO0kZs5
-rw------- 1 nagios nagios 301 Feb 4 11:20 checkoYba8F
-rw------- 1 nagios nagios 289 Feb 4 11:19 checkTkQ8X3
-rw------- 1 nagios nagios 296 Feb 4 11:19 checkD2mwna
-rw------- 1 nagios nagios 295 Feb 4 11:18 check4bvxf1
-rw------- 1 nagios nagios 296 Feb 4 11:18 checkaoOs7h

But I also see these. I'm just not sure whether they are going to clear, but they are less than an hour old. Some of them have the .ok extension, and as far as I can tell it never leaves any of those behind. Keep in mind that this is a small snippet of what's in the directory:
-rw------- 1 nagios nagios 379 Feb 4 14:11 cVqNCkr
-rw------- 1 nagios nagios 0 Feb 4 14:11 cVqNCkr.ok
-rw------- 1 nagios nagios 400 Feb 4 14:11 cxeljtN
-rw------- 1 nagios nagios 0 Feb 4 14:11 cxeljtN.ok
-rw------- 1 nagios nagios 304 Feb 4 14:11 checkdlK2j7
-rw------- 1 nagios nagios 303 Feb 4 14:11 check9GciYH
-rw------- 1 nagios nagios 291 Feb 4 14:11 checkoHOcHs
-rw------- 1 nagios nagios 299 Feb 4 14:11 checkPdRTr6
-rw------- 1 nagios nagios 296 Feb 4 14:10 checkcNevSB
-rw------- 1 nagios nagios 304 Feb 4 14:10 checkt8MBcl
-rw------- 1 nagios nagios 297 Feb 4 14:04 check0I8cW4
-rw------- 1 nagios nagios 305 Feb 4 14:03 check7SCN9J
-rw------- 1 nagios nagios 296 Feb 4 14:03 checkOrW9Z9
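The two naming patterns in this listing, short `cXXXXXX` names with `.ok` companions and longer `checkXXXXXX` names, can be tallied mechanically. A minimal sketch, assuming the file-selection rule of the Nagios 3.x check result reaper (only names exactly seven characters long that start with `c` and have a matching `<name>.ok` file are processed); verify that rule against the source of the version you run. `classify_checkresults` is a hypothetical helper:

```shell
# Hypothetical helper: tally spool files under an ASSUMED Nagios 3.x reaper
# rule: a file is eligible only if its name is exactly 7 characters, starts
# with "c", and a "<name>.ok" companion exists.
classify_checkresults() {
    local dir="$1" f name ready=0 waiting=0 other=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                # also skips an unmatched glob
        name=${f##*/}
        case "$name" in
            *.ok) continue ;;                  # markers are checked per data file below
        esac
        if [ "${#name}" -eq 7 ] && [ "${name#c}" != "$name" ]; then
            if [ -e "$f.ok" ]; then
                ready=$((ready + 1))           # eligible under the assumed rule
            else
                waiting=$((waiting + 1))       # right name, no .ok marker yet
            fi
        else
            other=$((other + 1))               # e.g. checkXXXXXX: never matches
        fi
    done
    echo "ready=$ready waiting=$waiting other=$other"
}
```

Run it as `classify_checkresults /nagios/var/spool/checkresults`. Under the assumed rule, files counted as `other` would never be picked up by the reaper at all, which would be consistent with the `checkXXXXXX` files lingering while the `.ok` pairs disappear.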

> are you seeing an issue with the max_check_result_file_age variable?

I'm not sure how to determine this. How can I confirm whether I'm having an issue with the max_check_result_file_age variable?
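One way to check is to compare each file's modification time against the configured threshold: anything listed is already older than `max_check_result_file_age`, so if the same files keep showing up over repeated runs, they are not being removed by the cleanup that setting describes. A minimal sketch, assuming GNU stat (`-c %Y`); `list_expired_results` is a hypothetical helper:

```shell
# Hypothetical helper: print the names of regular files in DIR whose
# modification time is more than MAX_AGE seconds in the past.
# Assumes GNU stat ("-c %Y" prints mtime as epoch seconds).
list_expired_results() {
    local dir="$1" max_age="$2" now mtime f
    now=$(date +%s)
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        mtime=$(stat -c %Y "$f")
        if [ $((now - mtime)) -gt "$max_age" ]; then
            echo "${f##*/}"
        fi
    done
}
```

Usage: `list_expired_results /nagios/var/spool/checkresults 3600`, with the value taken from nagios.cfg.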

Posted: **Thu Feb 04, 2016 10:23 pm**

Try changing the permissions of the checkresults folder:

Code:

chmod -R 770 /usr/local/nagios/var/spool/checkresults
service httpd restart

Posted: **Fri Feb 05, 2016 2:35 pm**

> chmod -R 770 /usr/local/nagios/var/spool/checkresults

Permissions on the directory are already more permissive than that (775).

> service httpd restart

Did you mean to say restart Apache, or was that a typo or copy/paste error?

Posted: **Fri Feb 05, 2016 2:40 pm**

n0b0de wrote:
> Did you mean to say restart Apache, or was that a typo or copy/paste error?

It depends on the distro generally.

CentOS/RHEL = httpd
Ubuntu/Debian/... = apache/apache2
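The mapping above can be scripted if the same restart step has to run on both families. A minimal sketch; `apache_service_name` is a hypothetical helper, detecting the distro by the presence of `/etc/redhat-release` or `/etc/debian_version` is an assumption, and it returns `apache2` for the Debian family (older releases may use `apache`):

```shell
# Hypothetical helper: print the Apache service name for this distro family.
# ROOT (optional) is a filesystem prefix, mainly useful for testing.
# Detection by marker files is an assumption; returns 1 if neither is found.
apache_service_name() {
    local root="${1:-}"
    if [ -f "$root/etc/redhat-release" ]; then
        echo httpd             # CentOS/RHEL
    elif [ -f "$root/etc/debian_version" ]; then
        echo apache2           # Debian/Ubuntu
    else
        return 1
    fi
}
```

Usage: `service "$(apache_service_name)" restart`.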

Posted: **Sun Feb 07, 2016 7:45 pm**

n0b0de wrote:
> chmod -R 770 /usr/local/nagios/var/spool/checkresults
>
> Permissions on the directory are already more permissive than that (775).

Can you show us the current permissions on the checkresults folder, please?

Code:

ls -lad /usr/local/nagios/var/spool/checkresults
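The same information can be pulled out in a form that is easier to compare with the requested mode. A minimal sketch, assuming GNU stat (`-c`); `check_spool_perms` is a hypothetical helper, and it only looks at the classic mode bits (not ACLs or SELinux contexts):

```shell
# Hypothetical helper: print "MODE OWNER:GROUP STATE" for a directory, where
# MODE is the last three octal digits and STATE says whether the owner has
# the write bit. Assumes GNU stat; ignores ACLs and SELinux.
check_spool_perms() {
    local dir="$1" mode perms owner_digit state
    mode=$(stat -c %a "$dir") || return 1
    perms=${mode#"${mode%???}"}        # last three digits (drops setgid/sticky)
    owner_digit=${perms%??}            # first of those three: owner bits
    if [ $((owner_digit & 2)) -ne 0 ]; then
        state=owner-writable
    else
        state=owner-not-writable
    fi
    echo "$perms $(stat -c '%U:%G' "$dir") $state"
}
```

Usage: `check_spool_perms /usr/local/nagios/var/spool/checkresults` prints something like `775 nagios:nagios owner-writable`.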

n0b0de wrote:
> service httpd restart
>
> Did you mean to say restart Apache, or was that a typo or copy/paste error?

I wanted you to restart Apache after making the permission changes I requested. If Apache produces an error when restarting, show us the error.

Posted: **Thu Feb 18, 2016 11:32 am**

> I wanted you to restart Apache after making the permission changes I requested. If Apache produces an error when restarting, show us the error.

There seems to be some confusion here. Apache has no errors when starting; I was asking whether you had made an error, but I understand now, so it's of no consequence!

Here is the ls of the checkresults directory as requested, but I don't think it's a permissions problem. This directory is being written to all the time: I see its file count constantly going up and down, from as many as 4k files to as few as 20-30 (now that I have moved the old ones).

Code:

drwxrwxr-x 2 nagios nagios  18M Feb 18 10:29 checkresults
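To see how quickly the file count swings between the 4k and 20-30 figures mentioned above, the spool can be sampled over time. A minimal sketch, assuming GNU find; `sample_spool_count` is a hypothetical helper taking a directory, a number of samples, and the interval in seconds:

```shell
# Hypothetical helper: print "HH:MM:SS COUNT" once per sample, where COUNT
# is the number of regular files directly inside DIR.
# Arguments: DIR [SAMPLES (default 5)] [INTERVAL_SECONDS (default 60)].
# Assumes GNU find (-maxdepth).
sample_spool_count() {
    local dir="$1" samples="${2:-5}" interval="${3:-60}" i=0 count
    while [ "$i" -lt "$samples" ]; do
        count=$(find "$dir" -maxdepth 1 -type f | wc -l)
        printf '%s %d\n' "$(date '+%H:%M:%S')" "$count"
        i=$((i + 1))
        # sleep between samples, but not after the last one
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Usage: `sample_spool_count /nagios/var/spool/checkresults 10 30` prints ten timestamped counts, 30 seconds apart.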

NSCA tuning - HUGE checkresults directory