Performance graph data is missing
Posted: Fri Jul 29, 2016 8:31 am
by hillhealthcenter
We've noticed that historical data is missing from the MSSQL Average Wait Time check. There is no data in the graph prior to 1:36 a.m. today. This also happened once last week. We need help determining the cause and the steps to correct the issue.
Re: Performance graph data is missing
Posted: Fri Jul 29, 2016 10:54 am
by lmiltchev
Are you having issues with this particular service only or with all of the services?
Go to Service Detail->"MSSQL Average Wait Time"->Advanced tab, and show us a screenshot of the page.
Run the following commands, and show the output in code wraps:
Code:
uptime
service npcd status
tail -25 /usr/local/nagios/var/npcd.log
tail -25 /usr/local/nagios/var/perfdata.log
ps -ef | grep [p]erf
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
Re: Performance graph data is missing
Posted: Fri Jul 29, 2016 12:54 pm
by hillhealthcenter
Here's the output from the commands and the screenshot of the performance graph is attached.
Code:
[root@nagiosxi ~]# uptime
13:33:15 up 30 days, 19:44, 1 user, load average: 0.69, 0.48, 0.41
[root@nagiosxi ~]# service npcd status
NPCD running (pid 1700).
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/npcd.log
[07-29-2016 13:33:00] NPCD: No more files to process... waiting for 15 seconds
[07-29-2016 13:33:15] NPCD: Found 6 files in /usr/local/nagios/var/spool/perfdata/
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: ThreadCounter 0/5 File is .
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: ThreadCounter 0/5 File is ..
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: ThreadCounter 0/5 File is 1469038813.perfdata.service-PID-7670
[07-29-2016 13:33:15] NPCD: File '1469038813.perfdata.service-PID-7670' is an already in process PNP file. Leaving it untouched.
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: ThreadCounter 0/5 File is 1469813582.perfdata.host
[07-29-2016 13:33:15] NPCD: Regular File: 1469813582.perfdata.host
[07-29-2016 13:33:15] NPCD: A thread was started on thread_counter = 0
[07-29-2016 13:33:15] NPCD: Processing file 1469813582.perfdata.host with ID -1216414864 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1469813582.perfdata.host
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: Processing file '1469813582.perfdata.host'
[07-29-2016 13:33:15] NPCD: ThreadCounter 1/5 File is 1469813582.perfdata.service
[07-29-2016 13:33:15] NPCD: Regular File: 1469813582.perfdata.service
[07-29-2016 13:33:15] NPCD: A thread was started on thread_counter = 1
[07-29-2016 13:33:15] NPCD: Processing file 1469813582.perfdata.service with ID -1226904720 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1469813582.perfdata.service
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: Processing file '1469813582.perfdata.service'
[07-29-2016 13:33:15] NPCD: ThreadCounter 2/5 File is host-perfdata.1330987694-PID-2109
[07-29-2016 13:33:15] NPCD: File 'host-perfdata.1330987694-PID-2109' is an already in process PNP file. Leaving it untouched.
[07-29-2016 13:33:15] NPCD: Have to wait: Filecounter = 4 - thread_counter = 2
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/perfdata.log
2016-07-28 20:35:05 [22389] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-28 20:35:05 [22389] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-28 20:35:05 [22389] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-28 20:35:05 [22389] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469752478.perfdata.service-PID-22389 deleted
2016-07-28 20:35:05 [22389] [0] *** process_perfdata.pl terminated on signal ALRM
2016-07-28 20:35:25 [30206] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-28 20:35:25 [30206] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-28 20:35:25 [30206] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-28 20:35:25 [30206] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469752493.perfdata.service-PID-30206 deleted
2016-07-28 20:35:25 [30206] [0] *** process_perfdata.pl terminated on signal ALRM
2016-07-28 23:17:48 [31044] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-28 23:17:48 [31044] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-28 23:17:48 [31044] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-28 23:17:48 [31044] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469762243.perfdata.service-PID-31044 deleted
2016-07-28 23:17:48 [31044] [0] *** process_perfdata.pl terminated on signal ALRM
2016-07-29 03:08:25 [12014] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-29 03:08:25 [12014] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-29 03:08:25 [12014] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-29 03:08:25 [12014] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469776082.perfdata.service-PID-12014 deleted
2016-07-29 03:08:25 [12014] [0] *** process_perfdata.pl terminated on signal ALRM
2016-07-29 03:08:26 [12015] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-29 03:08:26 [12015] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-29 03:08:26 [12015] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-29 03:08:26 [12015] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469776082.perfdata.host-PID-12015 deleted
2016-07-29 03:08:26 [12015] [0] *** process_perfdata.pl terminated on signal ALRM
[root@nagiosxi ~]# ps -ef | grep [p]erf
nagios 27936 27924 0 13:33 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios 27944 27936 0 13:33 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios 31893 1700 0 13:33 ? 00:00:00 /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1469813582.perfdata.service
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/xidpe | wc -l
0
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/perfdata | wc -l
3
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/checkresults | wc -l
4
You have new mail in /var/spool/mail/root
[root@nagiosxi ~]#
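The repeated TIMEOUT blocks in perfdata.log can be tallied per day to see how often process_perfdata.pl is being killed. The following is only a minimal sketch, assuming the log format shown in the output above (date in the first field, and one line containing "TIMEOUT: Timeout after" per timeout). The function name count_timeouts_by_day is hypothetical and not part of Nagios XI or PNP.

```shell
#!/bin/sh
# Hypothetical helper: count process_perfdata.pl timeouts per day.
# Assumes each timeout produces one line containing
# "TIMEOUT: Timeout after" and that field 1 is the date (YYYY-MM-DD).
count_timeouts_by_day() {
    awk '/TIMEOUT: Timeout after/ { n[$1]++ }
         END { for (d in n) print d, n[d] }' "$1" | sort
}

# Example usage (log path taken from the thread):
# count_timeouts_by_day /usr/local/nagios/var/perfdata.log
```

A rising daily count would suggest the timeout is being hit regularly rather than during a one-off load spike.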
Re: Performance graph data is missing
Posted: Fri Jul 29, 2016 1:25 pm
by tgriep
When the load on the Nagios system is high, it stops collecting performance data so that the system continues to run smoothly.
You can raise those limits so that performance data is still collected under higher load by doing the following.
Edit /usr/local/nagios/etc/pnp/process_perfdata.cfg
change the default value from:
To:
Edit /usr/local/nagios/etc/pnp/npcd.cfg
change the default value from:
load_threshold = 40.0
To:
load_threshold = 60.0
Save the files, then restart the services by running these commands:
service npcd restart
service nagios restart
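For anyone who prefers to script the edits rather than make them by hand, here is a rough sketch. It assumes both files use simple "KEY = value" lines (for example a TIMEOUT line in process_perfdata.cfg and a load_threshold line in npcd.cfg). The function name set_cfg_value is hypothetical, and the values you pass are your own choice, not official defaults.

```shell
#!/bin/sh
# Hypothetical helper: set KEY to VALUE in a "KEY = value" style config file.
# A backup copy is written to FILE.bak before the file is rewritten.
# Assumes KEY and VALUE contain no '|' or '&' characters (sed delimiters).
set_cfg_value() {
    file=$1
    key=$2
    value=$3
    cp -p "$file" "$file.bak" || return 1
    sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" \
        "$file.bak" > "$file"
}

# Example usage (60.0 is the value discussed later in this thread;
# pick a threshold that suits your own hardware):
# set_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold 60.0
```

After changing either file, npcd and nagios still need to be restarted as shown above.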
Re: Performance graph data is missing
Posted: Fri Jul 29, 2016 2:30 pm
by hillhealthcenter
I made the recommended changes.
BTW, the load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg was set to 10.0, not 40.0. Should I leave it at 60.0?
Re: Performance graph data is missing
Posted: Mon Aug 01, 2016 9:07 am
by lmiltchev
BTW, the load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg was set to 10.0, not 40.0. Should I leave it at 60.0?
It depends on your resources. The value "10.0" is for a single-core CPU machine. You can use "20.0" for dual core, "40.0" for quad core, etc.
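The rule of thumb above (10.0 per CPU core) can be worked out on the server itself. This is just a sketch: the function name suggest_load_threshold is hypothetical, and the result is a starting point rather than a tuned value.

```shell
#!/bin/sh
# Hypothetical helper: apply the "10.0 per core" rule of thumb
# to a given number of CPU cores.
suggest_load_threshold() {
    awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 10 }'
}

# getconf _NPROCESSORS_ONLN reports the number of online CPU cores.
suggest_load_threshold "$(getconf _NPROCESSORS_ONLN)"
```

The printed number is a candidate for load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg.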
Re: Performance graph data is missing
Posted: Mon Oct 24, 2016 3:16 pm
by hillhealthcenter
We upgraded to Nagios XI 5.3.1 this past Friday. Today we noticed that the graph is flat again.
Re: Performance graph data is missing
Posted: Mon Oct 24, 2016 3:25 pm
by ssax
Please run through this KB article and see if you can find a resolution:
https://support.nagios.com/kb/article.php?id=9
If you are still hitting the load_threshold and/or TIMEOUT we'll need to figure out what is causing it.
What does the load look like on your server?
Re: Performance graph data is missing
Posted: Mon Oct 24, 2016 3:29 pm
by hillhealthcenter
Code:
top - 16:29:01 up 27 days, 3:13, 2 users, load average: 1.07, 0.86, 0.59
Tasks: 236 total, 1 running, 235 sleeping, 0 stopped, 0 zombie
Cpu(s): 24.0%us, 9.3%sy, 0.0%ni, 66.3%id, 0.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 4019748k total, 3217860k used, 801888k free, 242328k buffers
Swap: 262136k total, 1068k used, 261068k free, 2191612k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24637 apache 20 0 64208 31m 5004 S 5.6 0.8 13:11.80 httpd
6578 apache 20 0 62136 30m 5108 S 5.3 0.8 13:40.42 httpd
27710 nagios 20 0 46236 22m 6908 S 5.3 0.6 0:00.16 php
9506 apache 20 0 67412 34m 5296 S 5.0 0.9 12:11.77 httpd
2806 apache 20 0 63284 30m 5164 S 4.7 0.8 15:35.92 httpd
9959 apache 20 0 65332 31m 5224 S 4.7 0.8 12:09.20 httpd
27714 nagios 20 0 39492 16m 6824 S 3.3 0.4 0:00.10 php
27715 nagios 20 0 39124 16m 6812 S 3.3 0.4 0:00.10 php
27717 nagios 20 0 39232 16m 6856 S 3.3 0.4 0:00.10 php
27718 nagios 20 0 39232 16m 7244 S 3.3 0.4 0:00.10 php
1573 mysql 20 0 176m 35m 3868 S 3.0 0.9 1747:30 mysqld
27720 nagios 20 0 39236 16m 6816 S 3.0 0.4 0:00.09 php
9738 nagios 20 0 10160 3168 1008 S 1.0 0.1 0:51.46 ndo2db
1764 postgres 20 0 19204 1432 512 S 0.3 0.0 6:22.53 postmaster
1765 ajaxterm 20 0 22992 5280 1344 S 0.3 0.1 16:16.38 python
9726 nagios 20 0 15008 6860 1456 S 0.3 0.2 0:40.83 nagios
9730 nagios 20 0 3496 932 644 S 0.3 0.0 0:02.94 nagios
Re: Performance graph data is missing
Posted: Mon Oct 24, 2016 3:43 pm
by avandemore
Can you PM me or another Nagios support rep your profile? I'd like to have a look at a few things.
XI > Admin > System Profile > Download Profile
Please include the zip file in your response.