Yesterday, 26 Feb 2012, we experienced very slow response times.
So I browsed the /var/log and /usr/local/nagios/var folders to see whether there was any indication of issues.
I found repeated TIMEOUT messages in perfdata.log in the /usr/local/nagios/var directory, as follows:
2012-02-26 11:53:42 [31473] [0] *** TIMEOUT: Timeout after 5 secs. ***
2012-02-26 11:53:42 [31473] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2012-02-26 11:53:42 [31473] [0] *** TIMEOUT: Please check your npcd.cfg
2012-02-26 11:53:42 [31473] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//host-perfdata.1330284761-PID-31473 deleted
2012-02-26 11:53:42 [31473] [0] *** Timeout while processing Host: "B56-Sec-RR" Service: "_HOST_"
2012-02-26 11:53:42 [31473] [0] *** process_perfdata.pl terminated on signal ALRM
2012-02-26 11:53:55 [31556] [0] *** TIMEOUT: Timeout after 5 secs. ***
2012-02-26 11:53:57 [31556] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2012-02-26 11:53:57 [31556] [0] *** TIMEOUT: Please check your npcd.cfg
2012-02-26 11:53:57 [31556] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//host-perfdata.1330285522-PID-31556 deleted
2012-02-26 11:53:57 [31556] [0] *** Timeout while processing Host: "B44-Sec-RR" Service: "_HOST_"
2012-02-26 11:53:57 [31556] [0] *** process_perfdata.pl terminated on signal ALRM
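The closing line of each group, "process_perfdata.pl terminated on signal ALRM", points to an alarm-based watchdog: a timer is armed before a spool file is processed, and if the work is still running when the timer expires, SIGALRM aborts it (the log then shows the file being deleted "to avoid NPCD loops"). process_perfdata.pl itself is Perl; the Python sketch below is only an analogy for that general pattern, not PNP's code, and the names `run_with_alarm` and `ProcessingTimeout` are hypothetical. It illustrates why a larger TIMEOUT gives slow files more room before they are discarded.

```python
import signal
import time


class ProcessingTimeout(Exception):
    """Raised when one unit of work exceeds its time budget (hypothetical name)."""


def run_with_alarm(work, timeout_secs):
    """Run work(), aborting it via SIGALRM after timeout_secs whole seconds (Unix, main thread only)."""

    def on_alarm(signum, frame):
        # Analogous to the "*** TIMEOUT: Timeout after N secs. ***" log line.
        raise ProcessingTimeout(f"Timeout after {timeout_secs} secs.")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(timeout_secs)  # schedule SIGALRM; argument must be an int
    try:
        return work()
    finally:
        signal.alarm(0)  # cancel any alarm still pending
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler


if __name__ == "__main__":
    print(run_with_alarm(lambda: "done", 5))  # finishes well inside the budget
    try:
        run_with_alarm(lambda: time.sleep(3), 1)  # stands in for a slow perfdata file
    except ProcessingTimeout as exc:
        print(exc)
```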
I have seen similar TIMEOUTs once in a while, but they happened much more frequently yesterday.
Is this somehow related to the slow response of Nagios XI?
If so, what would be the cause of the problem, and how can I mitigate it?
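To put a number on "much more frequently", a minimal sketch like the following could count the "Timeout after" lines per hour. It assumes every entry follows the format shown above; the function name `timeouts_per_hour` and the script name `count_timeouts.py` are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches lines in the format shown above, e.g.
# 2012-02-26 11:53:42 [31473] [0] *** TIMEOUT: Timeout after 5 secs. ***
TIMEOUT_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2} \[\d+\] \[\d+\] \*\*\* TIMEOUT: Timeout after"
)


def timeouts_per_hour(lines):
    """Count one event per 'Timeout after' line, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_LINE.match(line)
        if m:
            counts[f"{m.group(1)} {m.group(2)}"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only reads a file when a log path is given on the command line.
    with open(sys.argv[1], errors="replace") as f:
        for hour, n in sorted(timeouts_per_hour(f).items()):
            print(f"{hour}:00  {n}")
```

Run it as `python3 count_timeouts.py /usr/local/nagios/var/perfdata.log`; comparing the hourly counts from a normal day against 26 Feb would show whether the spike lines up with the slow-response period.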
TIMEOUT message in /usr/local/nagios/var/perfdata.log
Re: TIMEOUT message in /usr/local/nagios/var/perfdata.log
I would start by editing the following files:
/usr/local/nagios/etc/pnp/process_perfdata.cfg (raise the per-file processing timeout; your log shows it is currently 5 seconds):
TIMEOUT = 15
/usr/local/nagios/etc/pnp/npcd.cfg (the delay between NPCD's scans of the spool directory):
sleep_time = 10
If you set logging to 0 in both files, you'll also notice a performance increase.
This can sometimes happen if there are a lot of files in the /usr/local/nagios/var/spool/perfdata directory. The directory scan for results can back up the processing queue, and things can snowball from there. Changing the configs above should prevent it in the future, but if the issue persists, you may need to clear the contents of the /usr/local/nagios/var/spool/perfdata directory so the system can catch up.
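Before clearing the spool directory, it may help to measure how far behind it is. This sketch, assuming the spool path mentioned above and a hypothetical helper `spool_backlog`, reports how many files are waiting and how old the oldest one is. No threshold for "a lot of files" is given here, so treat the output as a comparison against a normal day rather than a pass/fail check; an oldest-file age far larger than sleep_time would suggest NPCD is not keeping up.

```python
import os
import sys
import time


def spool_backlog(spool_dir, now=None):
    """Return (file_count, age_in_seconds_of_oldest_file) for a spool directory."""
    now = time.time() if now is None else now
    count = 0
    oldest_mtime = None
    with os.scandir(spool_dir) as entries:
        for entry in entries:
            try:
                if not entry.is_file():
                    continue  # ignore subdirectories
                mtime = entry.stat().st_mtime
            except FileNotFoundError:
                continue  # NPCD may delete a file while we are scanning
            count += 1
            if oldest_mtime is None or mtime < oldest_mtime:
                oldest_mtime = mtime
    oldest_age = 0.0 if oldest_mtime is None else now - oldest_mtime
    return count, oldest_age


if __name__ == "__main__" and len(sys.argv) > 1:
    files, age = spool_backlog(sys.argv[1])
    print(f"{files} files waiting; oldest is {age:.0f} s old")
```

For example: `python3 spool_backlog.py /usr/local/nagios/var/spool/perfdata` (the script name is hypothetical).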