Page 1 of 4

High CPU/Load on npcd

Posted: Mon Jun 08, 2015 8:37 am
by cambevis
Hello All,

For the last week, we've been experiencing high CPU/loads on our nagios server, with the attendant lines spamming my npcd.log file

[06-08-2015 08:36:07] NPCD: ERROR: Executed command exits with return code '2'
[06-08-2015 08:36:07] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1433156650.perfdata.service'

Any ideas how I further debug this?

Thanks

Cam

Re: High CPU/Load on npcd

Posted: Mon Jun 08, 2015 9:17 am
by tmcdonald
What version of XI are you using? What is the load on the server?

Re: High CPU/Load on npcd

Posted: Mon Jun 08, 2015 9:26 am
by cambevis
Sorry, duh. I'm running 2014R2.7 on Centos 6.6

The CPU is pretty much maxed any time npcd is running and the load will be between 2 and 4.

I have turned on debug logging for npcd, but it means very little to me.

[root@nagiosxi var]# less npcd.log.debug
[06-08-2015 09:19:21] NPCD: npcd Daemon (0.4.14) started with PID=30133
[06-08-2015 09:19:21] NPCD: Please have a look at 'npcd -V' to get license information
[06-08-2015 09:19:21] NPCD: HINT: load_threshold is enabled - ('10.000000')
[06-08-2015 09:19:21] NPCD: Found 264761 files in /usr/local/nagios/var/spool/perfdata/
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 0/5 File is .
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 0/5 File is ..
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 0/5 File is 1343078689.perfdata.service-PID-1925
[06-08-2015 09:19:21] NPCD: File '1343078689.perfdata.service-PID-1925' is an already in process PNP file. Leaving it untouched.
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 0/5 File is 1345654160.perfdata.host-PID-2007
[06-08-2015 09:19:21] NPCD: File '1345654160.perfdata.host-PID-2007' is an already in process PNP file. Leaving it untouched.
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 0/5 File is 1381343061.perfdata.service-PID-21924
[06-08-2015 09:19:21] NPCD: File '1381343061.perfdata.service-PID-21924' is an already in process PNP file. Leaving it untouched.
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 0/5 File is 1431787362.perfdata.host
[06-08-2015 09:19:21] NPCD: Regular File: 1431787362.perfdata.host
[06-08-2015 09:19:21] NPCD: A thread was started on thread_counter = 0
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: Processing file 1431787362.perfdata.host with ID 139914028746496 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787362.perfdata.host
[06-08-2015 09:19:21] NPCD: ThreadCounter 1/5 File is 1431787362.perfdata.service
[06-08-2015 09:19:21] NPCD: Processing file '1431787362.perfdata.host'
[06-08-2015 09:19:21] NPCD: Regular File: 1431787362.perfdata.service
[06-08-2015 09:19:21] NPCD: A thread was started on thread_counter = 1
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 2/5 File is 1431787377.perfdata.host
[06-08-2015 09:19:21] NPCD: Regular File: 1431787377.perfdata.host
[06-08-2015 09:19:21] NPCD: A thread was started on thread_counter = 2
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 3/5 File is 1431787377.perfdata.service
[06-08-2015 09:19:21] NPCD: Regular File: 1431787377.perfdata.service
[06-08-2015 09:19:21] NPCD: A thread was started on thread_counter = 3
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 4/5 File is 1431787392.perfdata.host
[06-08-2015 09:19:21] NPCD: Regular File: 1431787392.perfdata.host
[06-08-2015 09:19:21] NPCD: A thread was started on thread_counter = 4
[06-08-2015 09:19:21] NPCD: DEBUG: load 0.130000/10.000000
[06-08-2015 09:19:21] NPCD: ThreadCounter 5/5 File is 1431787392.perfdata.service
[06-08-2015 09:19:21] NPCD: Regular File: 1431787392.perfdata.service
[06-08-2015 09:19:21] NPCD: WARN: MAX Thread reached: 1431787392.perfdata.service comes later with ThreadCounter: 5
[06-08-2015 09:19:21] NPCD: DEBUG: Will wait for th['4']
[06-08-2015 09:19:21] NPCD: ERROR: Executed command exits with return code '2'
[06-08-2015 09:19:21] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787362.perfdata.host'
[06-08-2015 09:19:21] NPCD: Processing file 1431787362.perfdata.service with ID 139914018256640 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787362.perfdata.service
[06-08-2015 09:19:21] NPCD: Processing file '1431787362.perfdata.service'
[06-08-2015 09:19:21] NPCD: Processing file 1431787377.perfdata.service with ID 139913928374016 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787377.perfdata.service
[06-08-2015 09:19:21] NPCD: Processing file '1431787377.perfdata.service'
[06-08-2015 09:19:21] NPCD: Processing file 1431787392.perfdata.host with ID 139913917884160 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787392.perfdata.host
[06-08-2015 09:19:21] NPCD: Processing file '1431787392.perfdata.host'
[06-08-2015 09:19:21] NPCD: Processing file 1431787377.perfdata.host with ID 139914007766784 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787377.perfdata.host
[06-08-2015 09:19:21] NPCD: Processing file '1431787377.perfdata.host'
[06-08-2015 09:19:21] NPCD: ERROR: Executed command exits with return code '2'
[06-08-2015 09:19:21] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787362.perfdata.service'
[06-08-2015 09:19:21] NPCD: ERROR: Executed command exits with return code '2'
[06-08-2015 09:19:21] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787392.perfdata.host'
[06-08-2015 09:19:21] NPCD: ERROR: Executed command exits with return code '2'
[06-08-2015 09:19:21] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1431787377.perfdata.service'

Re: High CPU/Load on npcd

Posted: Mon Jun 08, 2015 10:49 am
by abrist
Well, I don't actually think your issues is with npcd load usage, as it looks quite low in those logs. Could you enable and increase the debug level for process_perfdata:
https://support.nagios.com/wiki/index.p ... h_Problems
And then post a large tail of the perfdata log:

Code: Select all

tail -50 /usr/local/nagios/var/perfdata.log

Re: High CPU/Load on npcd

Posted: Mon Jun 08, 2015 1:36 pm
by cambevis
Done, sir. But the whole log looks pretty much like this, unremarkable (shrug)..

Thanks!

Cam


2015-06-08 13:32:32 [19578] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822702.perfdata.host
2015-06-08 13:32:32 [19578] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822702.perfdata.host to /usr/local/nagios/var/spool/perfdata//1431822702.perfdata.host-PID-19578 for bulk update
2015-06-08 13:32:32 [19580] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19579] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19580] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19580] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822717.perfdata.host
2015-06-08 13:32:32 [19580] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822717.perfdata.host to /usr/local/nagios/var/spool/perfdata//1431822717.perfdata.host-PID-19580 for bulk update
2015-06-08 13:32:32 [19579] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19579] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822702.perfdata.service
2015-06-08 13:32:32 [19579] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822702.perfdata.service to /usr/local/nagios/var/spool/perfdata//1431822702.perfdata.service-PID-19579 for bulk update
2015-06-08 13:32:32 [19588] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19588] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19588] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822732.perfdata.service
2015-06-08 13:32:32 [19588] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822732.perfdata.service to /usr/local/nagios/var/spool/perfdata//1431822732.perfdata.service-PID-19588 for bulk update
2015-06-08 13:32:32 [19589] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19589] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19589] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822747.perfdata.host
2015-06-08 13:32:32 [19589] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822747.perfdata.host to /usr/local/nagios/var/spool/perfdata//1431822747.perfdata.host-PID-19589 for bulk update
2015-06-08 13:32:32 [19585] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19584] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19585] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19585] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822732.perfdata.host
2015-06-08 13:32:32 [19585] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822732.perfdata.host to /usr/local/nagios/var/spool/perfdata//1431822732.perfdata.host-PID-19585 for bulk update
2015-06-08 13:32:32 [19584] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19584] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822717.perfdata.service
2015-06-08 13:32:32 [19584] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822717.perfdata.service to /usr/local/nagios/var/spool/perfdata//1431822717.perfdata.service-PID-19584 for bulk update
2015-06-08 13:32:32 [19590] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19590] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19590] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822747.perfdata.service
2015-06-08 13:32:32 [19590] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822747.perfdata.service to /usr/local/nagios/var/spool/perfdata//1431822747.perfdata.service-PID-19590 for bulk update
2015-06-08 13:32:32 [19594] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19594] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19594] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822762.perfdata.host
2015-06-08 13:32:32 [19594] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822762.perfdata.host to /usr/local/nagios/var/spool/perfdata//1431822762.perfdata.host-PID-19594 for bulk update
2015-06-08 13:32:32 [19595] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19595] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19595] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822762.perfdata.service
2015-06-08 13:32:32 [19595] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822762.perfdata.service to /usr/local/nagios/var/spool/perfdata//1431822762.perfdata.service-PID-19595 for bulk update
2015-06-08 13:32:32 [19598] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19598] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19598] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822777.perfdata.host
2015-06-08 13:32:32 [19598] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822777.perfdata.host to /usr/local/nagios/var/spool/perfdata//1431822777.perfdata.host-PID-19598 for bulk update
2015-06-08 13:32:32 [19600] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19599] [2] Using Config File /usr/local/nagios/etc/pnp/process_perfdata.cfg parameters
2015-06-08 13:32:32 [19600] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19600] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822792.perfdata.host
2015-06-08 13:32:32 [19600] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822792.perfdata.host to /usr/local/nagios/var/spool/perfdata//1431822792.perfdata.host-PID-19600 for bulk update
2015-06-08 13:32:32 [19599] [1] process_perfdata.pl-0.4.14 starting in BULK Mode called by NPCD
2015-06-08 13:32:32 [19599] [2] searching for /usr/local/nagios/var/spool/perfdata//1431822777.perfdata.service
2015-06-08 13:32:32 [19599] [2] renaming /usr/local/nagios/var/spool/perfdata//1431822777.perfdata.service to /usr/local/nagios/var/spool/perfdata//1431822777.perfdat

Re: High CPU/Load on npcd

Posted: Mon Jun 08, 2015 5:00 pm
by tmcdonald
Back to basics perhaps, but what is top showing for processes?

Re: High CPU/Load on npcd

Posted: Tue Jun 09, 2015 9:16 am
by cambevis
Nothing get above 5 (mysqld, httpd, etc) until I start up npcd, then npcd goes nuts. I was hoping I'd be able to track it back to a piece of bad data, but no such luck.

Re: High CPU/Load on npcd

Posted: Tue Jun 09, 2015 2:30 pm
by tgriep
Could you run the following and post the output here?

Code: Select all

ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l

Re: High CPU/Load on npcd

Posted: Wed Jun 10, 2015 8:09 am
by cambevis
Looks like there's a lot of perfdata!

[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/xidpe | wc -l
0
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/perfdata | wc -l
287227
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/checkresults | wc -l
0

Re: High CPU/Load on npcd

Posted: Wed Jun 10, 2015 9:17 am
by tgriep
When the npcd daemon isn't running, the performance data gets spooled in this folder

Code: Select all

/usr/local/nagios/var/spool/perfdata
and when it is restarted, it has to go through all of the files in that folder and since you have a lot of files in it, it will load the system until they are all processed.

You can either let the npcd process run through all of the files which could take a while or if you can delete them and start the npcd daemon but if you do this, you will lose the performance data that is in those files.