No performance graph data since August
Posted: Wed Sep 11, 2013 1:44 pm
by akepley
RHEL 6.4
Manually installed Nagios XI; upgraded to 2012R2.3 today, but that didn't fix the issue.
We restarted NPCD today, changed the timeout and load threshold (to 20 and 30.0, respectively) and restarted it again, but the data is still not populating the graphs. It looks like it stopped working in August. We hoped today's upgrade would help, but the graphs are still empty. We've reviewed the Wiki and can't figure out what to do next.
Re: No performance graph data since August
Posted: Wed Sep 11, 2013 2:09 pm
by lmiltchev
Run the following commands, and show the output:
Code:
ls /usr/local/nagios/var/spool/checkresults | wc -l
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
grep "nagiosramdisk" /usr/local/nagios/etc/nagios.cfg
top
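The spool counts above can also be gathered in one pass. This is a minimal sketch, not part of Nagios XI: spool_counts is a hypothetical helper name, the default base path is the one used in the commands above, and GNU find (as on RHEL 6) is assumed. It uses find rather than ls because ls sorts its whole listing before printing, which is slow on a directory holding hundreds of thousands of files.

```shell
# Hypothetical helper (not a Nagios XI tool): count entries in each
# spool directory, like the "ls ... | wc -l" commands above.
spool_counts() {
    local base="${1:-/usr/local/nagios/var/spool}"
    local d n
    for d in checkresults xidpe perfdata; do
        if [ -d "$base/$d" ]; then
            # find does not sort, so it stays fast on huge directories.
            # $(( )) strips any padding wc puts around the number.
            n=$(( $(find "$base/$d" -mindepth 1 -maxdepth 1 | wc -l) ))
            echo "$d: $n"
        else
            echo "$d: missing"
        fi
    done
}
```

For example, spool_counts /usr/local/nagios/var/spool prints one "name: count" line per directory. Unlike ls, find also counts hidden entries, so the numbers can differ slightly from the ls output.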
Re: No performance graph data since August
Posted: Wed Sep 11, 2013 2:20 pm
by akepley
lmiltchev wrote: Run the following commands, and show the output:
Code:
ls /usr/local/nagios/var/spool/checkresults | wc -l
92
lmiltchev wrote:
Code:
ls /usr/local/nagios/var/spool/xidpe | wc -l
2
lmiltchev wrote:
Code:
ls /usr/local/nagios/var/spool/perfdata | wc -l
380122
lmiltchev wrote:
Code:
grep "nagiosramdisk" /usr/local/nagios/etc/nagios.cfg
(no output)
Code:
top - 14:20:21 up 39 min, 2 users, load average: 1.02, 1.41, 1.50
Tasks: 207 total, 1 running, 206 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.8%us, 2.2%sy, 0.0%ni, 88.3%id, 3.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3922928k total, 973940k used, 2948988k free, 50784k buffers
Swap: 1015800k total, 0k used, 1015800k free, 251892k cached
Re: No performance graph data since August
Posted: Wed Sep 11, 2013 2:36 pm
by abrist
Re: No performance graph data since August
Posted: Wed Sep 11, 2013 2:49 pm
by akepley
Code:
[root@nagiosxi ~]# service npcd status
NPCD running (pid 28086).
Re: No performance graph data since August
Posted: Wed Sep 11, 2013 2:53 pm
by abrist
Let's check the perfdata and npcd logs:
Code:
tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log
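When perfdata.log is long, it can help to tally which services are timing out. A minimal sketch, assuming the timeout line format that process_perfdata.pl writes in this thread (*** Timeout while processing Host: "..." Service: "..."); timeouts_by_service is a hypothetical helper name, not a PNP or Nagios XI command.

```shell
# Hypothetical helper: count process_perfdata.pl timeouts per service.
# Expects perfdata.log lines of the form
#   ... *** Timeout while processing Host: "h" Service: "s"
timeouts_by_service() {
    local log="${1:-/usr/local/nagios/var/perfdata.log}"
    # Keep only timeout lines, extract the quoted service name,
    # then count duplicates and sort by count, highest first.
    grep 'Timeout while processing' "$log" \
        | sed -n 's/.*Service: "\([^"]*\)".*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Run with no argument, it reads /usr/local/nagios/var/perfdata.log and prints one "count service" line per service.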
Re: No performance graph data since August
Posted: Wed Sep 11, 2013 2:59 pm
by akepley
perfdata.log (I've replaced hostnames with "host")
Code:
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/perfdata.log
2013-09-11 12:32:37 [1287] [0] *** process_perfdata.pl terminated on signal ALRM
2013-09-11 12:32:45 [1424] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-09-11 12:32:45 [1424] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-09-11 12:32:45 [1424] [0] *** TIMEOUT: Please check your npcd.cfg
2013-09-11 12:32:45 [1424] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata //1375985007.perfdata.service-PID-1424 deleted
2013-09-11 12:32:45 [1424] [0] *** Timeout while processing Host: "host" Service: "__Disk_Usage"
2013-09-11 12:32:45 [1424] [0] *** process_perfdata.pl terminated on signal ALRM
2013-09-11 12:32:54 [1587] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-09-11 12:32:54 [1587] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-09-11 12:32:54 [1587] [0] *** TIMEOUT: Please check your npcd.cfg
2013-09-11 12:32:54 [1587] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata //1375985067.perfdata.service-PID-1587 deleted
2013-09-11 12:32:54 [1587] [0] *** Timeout while processing Host: "host" Service: "Users"
2013-09-11 12:32:54 [1587] [0] *** process_perfdata.pl terminated on signal ALRM
2013-09-11 12:33:04 [1782] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-09-11 12:33:04 [1782] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-09-11 12:33:04 [1782] [0] *** TIMEOUT: Please check your npcd.cfg
2013-09-11 12:33:04 [1782] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata //1375985247.perfdata.service-PID-1782 deleted
2013-09-11 12:33:04 [1782] [0] *** Timeout while processing Host: "host" Service: "Users"
2013-09-11 12:33:04 [1782] [0] *** process_perfdata.pl terminated on signal ALRM
2013-09-11 12:33:12 [2045] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-09-11 12:33:12 [2045] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-09-11 12:33:12 [2045] [0] *** TIMEOUT: Please check your npcd.cfg
2013-09-11 12:33:12 [2045] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata //1375985547.perfdata.service-PID-2045 deleted
2013-09-11 12:33:12 [2045] [0] *** Timeout while processing Host: "host" Service: "CPU_Stats"
2013-09-11 12:33:12 [2045] [0] *** process_perfdata.pl terminated on signal ALRM
It looks like nothing new has been written to that log since we restarted NPCD this afternoon.
npcd.log
Code:
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/npcd.log
[09-11-2013 14:56:08] NPCD: Regular File: 1376107182.perfdata.service
[09-11-2013 14:56:08] NPCD: A thread was started on thread_counter = 2
[09-11-2013 14:56:08] NPCD: Processing file 1376107182.perfdata.service with ID 139949410256640 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1376107182.perfdata.service
[09-11-2013 14:56:08] NPCD: Processing file '1376107182.perfdata.service'
[09-11-2013 14:56:08] NPCD: DEBUG: load 0.700000/30.000000
[09-11-2013 14:56:08] NPCD: ThreadCounter 3/5 File is 1376107197.perfdata.host
[09-11-2013 14:56:08] NPCD: Regular File: 1376107197.perfdata.host
[09-11-2013 14:56:08] NPCD: A thread was started on thread_counter = 3
[09-11-2013 14:56:08] NPCD: DEBUG: load 0.700000/30.000000
[09-11-2013 14:56:08] NPCD: ThreadCounter 4/5 File is 1376107197.perfdata.service
[09-11-2013 14:56:08] NPCD: Regular File: 1376107197.perfdata.service
[09-11-2013 14:56:08] NPCD: A thread was started on thread_counter = 4
[09-11-2013 14:56:08] NPCD: DEBUG: load 0.700000/30.000000
[09-11-2013 14:56:08] NPCD: ThreadCounter 5/5 File is 1376107212.perfdata.host
[09-11-2013 14:56:08] NPCD: Regular File: 1376107212.perfdata.host
[09-11-2013 14:56:08] NPCD: WARN: MAX Thread reached: 1376107212.perfdata.host comes later with ThreadCounter: 5
[09-11-2013 14:56:08] NPCD: DEBUG: Will wait for th['4']
[09-11-2013 14:56:08] NPCD: Processing file 1376107197.perfdata.host with ID 139949399766784 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1376107197.perfdata.host
[09-11-2013 14:56:08] NPCD: Processing file '1376107197.perfdata.host'
[09-11-2013 14:56:08] NPCD: Processing file 1376107197.perfdata.service with ID 139949389276928 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1376107197.perfdata.service
[09-11-2013 14:56:08] NPCD: Processing file '1376107197.perfdata.service'
[09-11-2013 14:56:11] NPCD: DEBUG: Will wait for th['3']
[09-11-2013 14:56:11] NPCD: DEBUG: Will wait for th['2']
[09-11-2013 14:56:11] NPCD: DEBUG: Will wait for th['1']
[09-11-2013 14:56:11] NPCD: DEBUG: Will wait for th['0']
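The spool file names in both logs start with a number such as 1375985007, which looks like a Unix epoch timestamp; read that way, 1375985007 is 2013-08-08 18:03:27 UTC. A minimal sketch that reports the oldest and newest queued file, and so how far behind the queue is, assuming that naming and GNU find and date (as on RHEL 6); spool_age is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Nagios XI or PNP): show the oldest and
# newest timestamped file in the perfdata spool and the gap between them.
# Assumes names start with an epoch timestamp, e.g. 1375985007.perfdata.service.
spool_age() {
    local dir="${1:-/usr/local/nagios/var/spool/perfdata}"
    local stamps oldest newest
    # Bare file names -> numeric prefix -> numeric sort (oldest first).
    stamps=$(find "$dir" -mindepth 1 -maxdepth 1 -type f -printf '%f\n' \
        | grep -E '^[0-9]+\.' | cut -d. -f1 | sort -n)
    if [ -z "$stamps" ]; then
        echo "no timestamped files in $dir"
        return 0
    fi
    oldest=$(printf '%s\n' "$stamps" | head -n 1)
    newest=$(printf '%s\n' "$stamps" | tail -n 1)
    echo "oldest: $oldest ($(date -u -d "@$oldest" '+%Y-%m-%d %H:%M:%S UTC'))"
    echo "newest: $newest ($(date -u -d "@$newest" '+%Y-%m-%d %H:%M:%S UTC'))"
    echo "backlog: $(( (newest - oldest) / 3600 )) hours"
}
```

If the oldest file turns out to be weeks old, that would suggest NPCD is working through a backlog rather than current check results.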
Re: No performance graph data since August
Posted: Wed Sep 11, 2013 4:42 pm
by lmiltchev
What is the log level on the npcd.cfg and process_perfdata.cfg files?
Code:
grep -i "log_level =" /usr/local/nagios/etc/pnp/process_perfdata.cfg
grep -i "log_level =" /usr/local/nagios/etc/pnp/npcd.cfg
If it is "0", change the value to "1" in both files and restart npcd (service npcd restart).
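The log level change can also be made without opening an editor. A minimal sketch, assuming GNU sed; set_cfg_value is a hypothetical helper name. The key is matched case-insensitively because the grep output in this thread shows process_perfdata.cfg using LOG_LEVEL while npcd.cfg uses log_level.

```shell
# Hypothetical helper (not a Nagios XI tool): set "KEY = VALUE" in a
# PNP-style config file such as npcd.cfg or process_perfdata.cfg.
# Commented lines ("# log_level = ...") are left alone because the
# pattern only matches KEY at the start of a line (after optional spaces).
# KEY is used as a regex, and VALUE must not contain "/".
set_cfg_value() {
    local file="$1" key="$2" value="$3"
    # -i.bak edits in place and keeps the original as FILE.bak.
    # The trailing I flag makes the match case-insensitive; the captured
    # "KEY = " prefix keeps its original case.
    sed -i.bak -E "s/^([[:space:]]*${key}[[:space:]]*=[[:space:]]*).*/\1${value}/I" "$file"
}
```

For example, set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg log_level 1, then service npcd restart. The .bak copy makes it easy to roll back.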
Run the following commands, and show the output:
Code:
grep -i "file_processing_interval" /usr/local/nagios/etc/nagios.cfg
ls -l /usr/local/nagios/libexec/process_perfdata.pl
Open the "/usr/local/nagios/etc/pnp/process_perfdata.cfg" in a text editor, and set:
Save, exit, and restart npcd (service npcd restart).
Then tail both logs and show the output:
Code:
tail -30 /usr/local/nagios/var/npcd.log
tail -30 /usr/local/nagios/var/perfdata.log
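With LOG_LEVEL at 1, the perfdata.log output posted in this thread also includes a line per run such as "PNP exiting (runtime 0.067544s) ...". A minimal sketch that summarizes those runtimes so they can be compared with TIMEOUT in process_perfdata.cfg; runtime_stats is a hypothetical helper name, and the line format is assumed from that posted output.

```shell
# Hypothetical helper (not part of PNP): summarize process_perfdata.pl
# runtimes from lines such as
#   ... [1] PNP exiting (runtime 0.067544s) ...
# A max close to TIMEOUT (seconds) means runs are at risk of being killed.
runtime_stats() {
    local log="${1:-/usr/local/nagios/var/perfdata.log}"
    sed -n 's/.*PNP exiting (runtime \([0-9.]*\)s).*/\1/p' "$log" \
        | awk '{ n++; sum += $1; if (n == 1 || $1 > max) max = $1 }
               END { if (n) printf "runs=%d avg=%.3fs max=%.3fs\n", n, sum / n, max; else print "no runtime lines found" }'
}
```

With the two runs shown later in this thread (0.067544s and 0.090943s), it would print runs=2 avg=0.079s max=0.091s.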
Re: No performance graph data since August
Posted: Thu Sep 12, 2013 8:54 am
by akepley
Changed the log level in process_perfdata.cfg, since it was at 0. I left npcd.cfg at -1 because I don't know whether that works as well as 1. Let me know if I should still change it.
Code:
[root@nagiosxi ~]# grep -i "log_level =" /usr/local/nagios/etc/pnp/process_perfdata.cfg
LOG_LEVEL = 0
[root@nagiosxi ~]# grep -i "log_level =" /usr/local/nagios/etc/pnp/npcd.cfg
# log_level = <integer value>
log_level = -1
[root@nagiosxi ~]# grep -i "file_processing_interval" /usr/local/nagios/etc/nagios.cfg
service_perfdata_file_processing_interval=15
host_perfdata_file_processing_interval=15
[root@nagiosxi ~]# ll /usr/local/nagios/libexec/process_perfdata.pl
-rwxr-xr-x. 1 nagios nagios 42724 Dec 5 2012 /usr/local/nagios/libexec/process_perfdata.pl
TIMEOUT = 20
[root@nagiosxi ~]# tail -30 /usr/local/nagios/var/npcd.log
[09-12-2013 08:51:59] NPCD: ThreadCounter 1/5 File is 1377138613.perfdata.host
[09-12-2013 08:51:59] NPCD: Regular File: 1377138613.perfdata.host
[09-12-2013 08:51:59] NPCD: Processing file 1377138598.perfdata.service with ID 140527962314496 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1377138598.perfdata.service
[09-12-2013 08:51:59] NPCD: A thread was started on thread_counter = 1
[09-12-2013 08:51:59] NPCD: Processing file '1377138598.perfdata.service'
[09-12-2013 08:51:59] NPCD: DEBUG: load 0.140000/30.000000
[09-12-2013 08:51:59] NPCD: ThreadCounter 2/5 File is 1377138613.perfdata.service
[09-12-2013 08:51:59] NPCD: Regular File: 1377138613.perfdata.service
[09-12-2013 08:51:59] NPCD: A thread was started on thread_counter = 2
[09-12-2013 08:51:59] NPCD: DEBUG: load 0.140000/30.000000
[09-12-2013 08:51:59] NPCD: ThreadCounter 3/5 File is 1377138628.perfdata.host
[09-12-2013 08:51:59] NPCD: Regular File: 1377138628.perfdata.host
[09-12-2013 08:51:59] NPCD: Processing file 1377138613.perfdata.host with ID 140527951824640 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1377138613.perfdata.host
[09-12-2013 08:51:59] NPCD: Processing file 1377138613.perfdata.service with ID 140527941334784 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1377138613.perfdata.service
[09-12-2013 08:51:59] NPCD: Processing file '1377138613.perfdata.host'
[09-12-2013 08:51:59] NPCD: Processing file '1377138613.perfdata.service'
[09-12-2013 08:51:59] NPCD: Processing file 1377138628.perfdata.host with ID 140527930844928 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1377138628.perfdata.host
[09-12-2013 08:51:59] NPCD: A thread was started on thread_counter = 3
[09-12-2013 08:51:59] NPCD: Processing file '1377138628.perfdata.host'
[09-12-2013 08:51:59] NPCD: DEBUG: load 0.140000/30.000000
[09-12-2013 08:51:59] NPCD: ThreadCounter 4/5 File is 1377138628.perfdata.service
[09-12-2013 08:51:59] NPCD: Regular File: 1377138628.perfdata.service
[09-12-2013 08:51:59] NPCD: A thread was started on thread_counter = 4
[09-12-2013 08:51:59] NPCD: DEBUG: load 0.140000/30.000000
[09-12-2013 08:51:59] NPCD: ThreadCounter 5/5 File is 1377138643.perfdata.host
[09-12-2013 08:51:59] NPCD: Regular File: 1377138643.perfdata.host
[09-12-2013 08:51:59] NPCD: WARN: MAX Thread reached: 1377138643.perfdata.host comes later with ThreadCounter: 5
[09-12-2013 08:51:59] NPCD: DEBUG: Will wait for th['4']
[09-12-2013 08:51:59] NPCD: Processing file 1377138628.perfdata.service with ID 140527920355072 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1377138628.perfdata.service
[09-12-2013 08:51:59] NPCD: Processing file '1377138628.perfdata.service'
[root@nagiosxi ~]# tail -30 /usr/local/nagios/var/perfdata.log
2013-09-12 08:52:27 [9906] [1] Found Performance Data for nasbaorgwp.nasba.int / __Disk_Usage (/=19517MB;31075;34959;0;38844)
2013-09-12 08:52:27 [9912] [1] Found Performance Data for cpacentral.nasba.org / HTTP (time=0.062113s;;;0.000000 size=13622B;;;0)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for web_mantis.nasba.int / Web_Page_Content (time=0.224373s;;;0.000000 size=625B;;;0)
2013-09-12 08:52:27 [9912] [1] Found Performance Data for drcpt.nasba.dr / Open_Files (opened_files=704;29655;49425)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for nasbaorgwp.nasba.int / Swap_Usage (swap=972MB;0;0;0;991)
2013-09-12 08:52:27 [9912] [1] Found Performance Data for nasweb.nasba.int / CPU_Stats (user=0.00% system=0.20% iowait=0.00%;85;95 idle=99.80%)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for phoneftp.nsbaonp.int / Ping (rta=49.530ms;3000.000;5000.000;0; pl=0%;80;100;;)
2013-09-12 08:52:27 [9912] [1] Found Performance Data for cpamobility.nasba.int / Load (load1=0.000;15.000;30.000;0; load5=0.000;10.000;20.000;0; load15=0.000;5.000;10.000;0;)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for web02.nasba.qa / Ping (rta=49.690ms;3000.000;5000.000;0; pl=0%;80;100;;)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for Sonicwall_One_Nashville_Place / Ping (rta=49.158ms;3000.000;5000.000;0; pl=0%;80;100;;)
2013-09-12 08:52:27 [9912] [1] Found Performance Data for cpaesdb01.nasba.int / Open_Files (opened_files=608;979716;1632861)
2013-09-12 08:52:27 [9912] [1] Found Performance Data for drkvm03.nasba.dr / Users (users=0;5;10;0)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for db02qa2.nasba.int / Ping (rta=49.162ms;3000.000;5000.000;0; pl=0%;80;100;;)
2013-09-12 08:52:27 [9912] [1] Found Performance Data for drkvm03.nasba.dr / _boot_Disk_Usage (/boot=90MB;387;435;0;484)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for web04.nasba.qa / Ping (rta=49.143ms;3000.000;5000.000;0; pl=0%;80;100;;)
2013-09-12 08:52:27 [9912] [1] 132 lines processed
2013-09-12 08:52:27 [9912] [1] /usr/local/nagios/var/spool/perfdata//1377139063.perfdata.service-PID-9912 deleted
2013-09-12 08:52:27 [9912] [1] PNP exiting (runtime 0.067544s) ...
2013-09-12 08:52:27 [9906] [1] Found Performance Data for nasbaorgwp.nasba.int / Open_Files (opened_files=960;239567;399279)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for db02qa2.nasba.int / Load (load1=0.000;15.000;30.000;0; load5=0.000;10.000;20.000;0; load15=0.000;5.000;10.000;0;)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for drdb02.nasba.dr / Ping (rta=42.422ms;3000.000;5000.000;0; pl=0%;80;100;;)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for ui02.kpmg.int / __Disk_Usage (/=2366MB;40316;45356;0;50396)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for logserver.nasba.int / __Disk_Usage (/=41073MB;106878;120238;0;133598)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for wordpress02.nasba.int / __Disk_Usage (/=3006MB;30268;34052;0;37836)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for prodrel.nasba.int / Load (load1=0.040;15.000;30.000;0; load5=0.030;10.000;20.000;0; load15=0.000;5.000;10.000;0;)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for cpaesweb04.nasba.int / Memory_Usage (total=7870MB free=7601MB used=533MB shared=0 buffers=106MB cached=264MB)
2013-09-12 08:52:27 [9906] [1] Found Performance Data for ces.nasba.int / Memory_Usage (total=996MB free=112MB used=884MB shared=0 buffers=156MB cached=514MB)
2013-09-12 08:52:27 [9906] [1] 85 lines processed
2013-09-12 08:52:27 [9906] [1] /usr/local/nagios/var/spool/perfdata//1377139048.perfdata.service-PID-9906 deleted
2013-09-12 08:52:27 [9906] [1] PNP exiting (runtime 0.090943s) ...
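With both file_processing_interval values at 15 seconds, and assuming each interval produces one host file and one service file (npcd.log shows pairs such as 1377138613.perfdata.host and 1377138613.perfdata.service), the spool would grow by about 2 * 86400 / 15 = 11,520 files per day. On that assumption, the 380,122 files counted earlier would amount to roughly 33 days of queued data, which would be consistent with the graphs going blank in early August. A minimal sketch of the estimate; backlog_days is a hypothetical helper name.

```shell
# Hypothetical helper: estimate how many days of checks a spool backlog
# holds. Assumes one host file and one service file per processing
# interval, as the paired .perfdata.host/.perfdata.service names suggest.
backlog_days() {
    local files="$1" interval="$2"   # file count; interval in seconds
    awk -v f="$files" -v i="$interval" \
        'BEGIN { per_day = 2 * 86400 / i; printf "%.1f\n", f / per_day }'
}
```

For example, backlog_days 380122 15 prints 33.0.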
Re: No performance graph data since August
Posted: Thu Sep 12, 2013 12:19 pm
by lmiltchev
You can change it to "1" if you wish; "-1" will give you far more detail than you need. Let's try removing all the files in the "/usr/local/nagios/var/spool/perfdata" directory. You will lose the perfdata queued there (380,122 files in your earlier count), but clearing that backlog is what should stop the timeouts. Run the following commands:
Code:
cd /usr/local/nagios/var/spool
rm -rf perfdata
mkdir perfdata
chown nagios:nagios perfdata
chmod 755 perfdata
service npcd restart
Wait 15-20 minutes, then check whether the performance graphs have started to show up.
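One way to check from the shell whether graph data is being written again is to look for recently modified RRD files. A minimal sketch, assuming GNU find; recent_rrds is a hypothetical helper name, and the default directory /usr/local/nagios/share/perfdata is an assumption about a typical Nagios XI install, so confirm it against the RRDPATH setting in process_perfdata.cfg.

```shell
# Hypothetical helper: count RRD files modified in the last N minutes, as a
# quick check that new perfdata is reaching the graphs.
# The default path is an assumption; pass the real RRD directory if different.
recent_rrds() {
    local dir="${1:-/usr/local/nagios/share/perfdata}" minutes="${2:-20}"
    local n
    # -mmin -N matches files modified less than N minutes ago.
    n=$(( $(find "$dir" -type f -name '*.rrd' -mmin "-$minutes" | wc -l) ))
    echo "$n RRD file(s) in $dir updated in the last $minutes minutes"
}
```

A count of 0 after the wait would suggest perfdata is still not reaching the RRD files.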