Re: Eval extended but data collection stopped - why?

Posted: Fri Oct 11, 2013 10:59 am
by joe.ward
Output of the commands is in the attached.

/usr/local/nagios/var/perfdata.log does not exist! I ran a 'find' from /usr/local and no matches were found for perfdata.log.

Re: Eval extended but data collection stopped - why?

Posted: Fri Oct 11, 2013 12:50 pm
by sreinhardt
Well, initially I see that you do have some permissions issues. Try the following; the directory being changed below should contain no files once the check results have been reaped.

chown -R nagios:nagios /usr/local/nagios/var/spool/checkresults/
chmod -R 775 /usr/local/nagios/var/spool/checkresults/
service npcd restart

ll -d /usr/local/nagios/var/spool/checkresults/
ll -d /usr/local/nagios/var/spool/
ll -d /usr/local/nagios/var/
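If it helps, the three listings above can be collected in one pass. This is only a sketch: `show_perms` is a made-up helper name, and it assumes GNU `stat` from coreutils is available.

```shell
# show_perms is a hypothetical helper, not part of Nagios or PNP.
# Prints "owner:group octal-mode path" for each directory given,
# or a "missing" line if the directory does not exist.
show_perms() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # GNU stat: %U owner, %G group, %a octal mode, %n name
            stat -c '%U:%G %a %n' "$dir"
        else
            echo "missing: $dir"
        fi
    done
}

# The three directories checked in this thread.
show_perms /usr/local/nagios/var/spool/checkresults \
           /usr/local/nagios/var/spool \
           /usr/local/nagios/var
```

After the chown/chmod above, the checkresults line would be expected to show nagios:nagios and mode 775.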

Re: Eval extended but data collection stopped - why?

Posted: Fri Oct 11, 2013 7:22 pm
by joe.ward
Still no performance graphs :-(

# ll -d /usr/local/nagios/var/spool/checkresults/
drwxrwxr-x 2 nagios nagios 12288 Oct 11 20:21 /usr/local/nagios/var/spool/checkresults/

# ll -d /usr/local/nagios/var/spool
drwxr-xr-x 5 nagios nagios 4096 Aug 7 13:34 /usr/local/nagios/var/spool

# ll -d /usr/local/nagios/var
drwxrwxr-x 6 nagios nagios 4096 Oct 11 20:21 /usr/local/nagios/var

Re: Eval extended but data collection stopped - why?

Posted: Mon Oct 14, 2013 10:58 am
by abrist
You may not have logging enabled for perfdata. Let's enable it.

Edit /usr/local/nagios/etc/pnp/process_perfdata.cfg and change:

LOG_LEVEL = 0

to:

LOG_LEVEL = 1

Edit /usr/local/nagios/etc/pnp/npcd.cfg and change:

log_level = 0

to:

log_level = 1

Restart npcd:

service npcd restart

Now wait 15 to 20 minutes and then tail the logs:

tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log

Post the output of these logs here in code wraps.
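If you would rather not edit the two files by hand, the same change could be scripted roughly as follows. This is a sketch only: `set_log_level` is a hypothetical helper, and it assumes the settings appear as `KEY = number` at the start of a line, as in the snippets above.

```shell
# set_log_level FILE KEY VALUE  (hypothetical helper)
# Rewrites a line of the form "KEY = <number>" to "KEY = VALUE".
# sed keeps a copy of the original file as FILE.bak.
set_log_level() {
    file=$1
    key=$2
    value=$3
    sed -i.bak -E "s/^([[:space:]]*${key}[[:space:]]*=[[:space:]]*)[0-9]+/\1${value}/" "$file"
}

# Intended use on the two files named above (as root, then restart npcd):
#   set_log_level /usr/local/nagios/etc/pnp/process_perfdata.cfg LOG_LEVEL 1
#   set_log_level /usr/local/nagios/etc/pnp/npcd.cfg log_level 1
#   service npcd restart
```

The key is matched case-sensitively, so LOG_LEVEL and log_level have to be passed exactly as they appear in each file.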

Re: Eval extended but data collection stopped - why?

Posted: Tue Oct 22, 2013 8:42 am
by joe.ward
I apologize for the slow response. I have been out of the office for a while :-)

The two log levels you mentioned were set to 0, so I set them to 1.

npcd.log shows repeatedly: "NPCD: No more files to process... waiting for 15 seconds"

perfdata.log does not exist.

This all 'broke' when the original evaluation expired, even though a new evaluation license had been installed during the week before. The extended evaluation license expires in 10 days.

What next?

Re: Eval extended but data collection stopped - why?

Posted: Tue Oct 22, 2013 3:20 pm
by nscott
Well, this got stranger with every post I read. That means it's time to take it back to basics. Is there any failure or warning indicated in /usr/local/nagios/var/nagios.log? Are you seeing actual things happening in this log?

Re: Eval extended but data collection stopped - why?

Posted: Mon Oct 28, 2013 2:58 pm
by joe.ward
The log file looks fine. 2300+ lines since the first date I see in the log, about 24 hours ago. The messages are alerts (up, down, flapping, etc).

This all started when the evaluation was extended. I followed the steps that came with the new license file. The alerts still work but performance graphs are blank. There are 3 days left on the extended evaluation...

Re: Eval extended but data collection stopped - why?

Posted: Tue Oct 29, 2013 11:12 am
by lmiltchev
Can you run the following commands, and show us the output?

ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l

Re: Eval extended but data collection stopped - why?

Posted: Tue Oct 29, 2013 2:18 pm
by joe.ward
Folder listings:
  • $ cd /usr/local/nagios/var/spool
    $ ls -l
    total 12668
    drwxrwxr-x 2 nagios nagios 12288 Oct 29 15:14 checkresults
    drwxr-xr-x 2 nagios nagios 4096 Oct 6 20:00 perfdata
    drwxr-xr-x 2 nagios nagios 12951552 Oct 29 15:14 xidpe
  • $ ls xidpe | wc -l
    262566
    $ ls perfdata | wc -l
    0
    $ ls checkresults | wc -l
    20
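With a directory this large, `ls` has to read and sort every name before `wc` sees anything. A count via `find` skips the sort and only counts regular files. A minimal sketch, where `count_files` is a made-up helper name and the paths are the ones from this thread:

```shell
# count_files DIR  (hypothetical helper)
# Counts regular files directly inside DIR without sorting them.
# Assumes file names contain no newlines, which holds for spool files.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

spool=/usr/local/nagios/var/spool
for name in xidpe perfdata checkresults; do
    if [ -d "$spool/$name" ]; then
        echo "$name: $(count_files "$spool/$name")"
    else
        echo "$name: (no such directory)"
    fi
done
```

A steadily growing xidpe count alongside an empty perfdata directory is the pattern shown in the listing above.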

Re: Eval extended but data collection stopped - why?

Posted: Tue Oct 29, 2013 3:43 pm
by abrist
xidpe has 200k+ files. We can get this server back on track by removing those files, as long as you are not concerned about losing the historical performance data from those pending check results. You most likely cannot remove the files with a standard 'rm *', because the shell expands the wildcard into more arguments than the system allows and the command fails with "Argument list too long". You can remove them with the following commands (WARNING! This is destructive; make sure you understand the implications and type the commands exactly as shown):

cd /usr/local/nagios/var/spool/xidpe/
find . -type f -delete
Then restart npcd, wait 15 minutes, and check your graphs for new perfdata and the npcd and perfdata logs for any errors.

service npcd restart
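For anyone nervous about running a bare `find . -type f -delete` in the wrong directory, the cleanup could be wrapped so that the target is passed explicitly and the result is reported. This is a sketch under assumptions: `purge_spool` is a hypothetical name, and it relies on the `-delete` action found in GNU and BSD `find`. It is just as destructive as the commands above.

```shell
# purge_spool DIR  (hypothetical helper)
# Deletes every regular file directly inside DIR and reports how many
# were removed. Refuses to run if DIR is not an existing directory.
purge_spool() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    before=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    find "$dir" -maxdepth 1 -type f -delete
    after=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    echo "removed $((before - after)) of $before files from $dir"
}

# Intended use for this thread (as root), followed by the npcd restart:
#   purge_spool /usr/local/nagios/var/spool/xidpe
```

Passing the path as an argument means the function never depends on the current working directory, which is the main thing that can go wrong with the `cd` plus `find .` form.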