Performance graph data is missing

Posted: Fri Jul 29, 2016 8:31 am
by hillhealthcenter
We've noticed that historical data is missing from the MSSQL Average Wait Time check. There is no data in the graph prior to 1:36 a.m. today. This also happened once last week. We need help determining the cause and the steps to correct the issue.

Re: Performance graph data is missing

Posted: Fri Jul 29, 2016 10:54 am
by lmiltchev
Are you having issues with this particular service only or with all of the services?

Go to Service Detail->"MSSQL Average Wait Time"->Advanced tab, and show us a screenshot of the page.

Run the following commands, and show the output in code wraps:

Code:

uptime
service npcd status
tail -25 /usr/local/nagios/var/npcd.log
tail -25 /usr/local/nagios/var/perfdata.log
ps -ef | grep '[p]erf'
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l

Re: Performance graph data is missing

Posted: Fri Jul 29, 2016 12:54 pm
by hillhealthcenter
Here's the output from the commands. A screenshot of the performance graph is attached.

Code:

[root@nagiosxi ~]# uptime
 13:33:15 up 30 days, 19:44,  1 user,  load average: 0.69, 0.48, 0.41
[root@nagiosxi ~]# service npcd status
NPCD running (pid 1700).
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/npcd.log
[07-29-2016 13:33:00] NPCD: No more files to process... waiting for 15 seconds
[07-29-2016 13:33:15] NPCD: Found 6 files in /usr/local/nagios/var/spool/perfdata/
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: ThreadCounter 0/5 File is .
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: ThreadCounter 0/5 File is ..
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: ThreadCounter 0/5 File is 1469038813.perfdata.service-PID-7670
[07-29-2016 13:33:15] NPCD: File '1469038813.perfdata.service-PID-7670' is an already in process PNP file. Leaving it untouched.
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: ThreadCounter 0/5 File is 1469813582.perfdata.host
[07-29-2016 13:33:15] NPCD: Regular File: 1469813582.perfdata.host
[07-29-2016 13:33:15] NPCD: A thread was started on thread_counter = 0
[07-29-2016 13:33:15] NPCD: Processing file 1469813582.perfdata.host with ID -1216414864 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1469813582.perfdata.host
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: Processing file '1469813582.perfdata.host'
[07-29-2016 13:33:15] NPCD: ThreadCounter 1/5 File is 1469813582.perfdata.service
[07-29-2016 13:33:15] NPCD: Regular File: 1469813582.perfdata.service
[07-29-2016 13:33:15] NPCD: A thread was started on thread_counter = 1
[07-29-2016 13:33:15] NPCD: Processing file 1469813582.perfdata.service with ID -1226904720 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1469813582.perfdata.service
[07-29-2016 13:33:15] NPCD: DEBUG: load 0.690000/10.000000
[07-29-2016 13:33:15] NPCD: Processing file '1469813582.perfdata.service'
[07-29-2016 13:33:15] NPCD: ThreadCounter 2/5 File is host-perfdata.1330987694-PID-2109
[07-29-2016 13:33:15] NPCD: File 'host-perfdata.1330987694-PID-2109' is an already in process PNP file. Leaving it untouched.
[07-29-2016 13:33:15] NPCD: Have to wait: Filecounter = 4 - thread_counter = 2
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/perfdata.log
2016-07-28 20:35:05 [22389] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-28 20:35:05 [22389] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-28 20:35:05 [22389] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-28 20:35:05 [22389] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469752478.perfdata.service-PID-22389 deleted
2016-07-28 20:35:05 [22389] [0] *** process_perfdata.pl terminated on signal ALRM
2016-07-28 20:35:25 [30206] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-28 20:35:25 [30206] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-28 20:35:25 [30206] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-28 20:35:25 [30206] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469752493.perfdata.service-PID-30206 deleted
2016-07-28 20:35:25 [30206] [0] *** process_perfdata.pl terminated on signal ALRM
2016-07-28 23:17:48 [31044] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-28 23:17:48 [31044] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-28 23:17:48 [31044] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-28 23:17:48 [31044] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469762243.perfdata.service-PID-31044 deleted
2016-07-28 23:17:48 [31044] [0] *** process_perfdata.pl terminated on signal ALRM
2016-07-29 03:08:25 [12014] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-29 03:08:25 [12014] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-29 03:08:25 [12014] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-29 03:08:25 [12014] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469776082.perfdata.service-PID-12014 deleted
2016-07-29 03:08:25 [12014] [0] *** process_perfdata.pl terminated on signal ALRM
2016-07-29 03:08:26 [12015] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2016-07-29 03:08:26 [12015] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-07-29 03:08:26 [12015] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2016-07-29 03:08:26 [12015] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1469776082.perfdata.host-PID-12015 deleted
2016-07-29 03:08:26 [12015] [0] *** process_perfdata.pl terminated on signal ALRM
[root@nagiosxi ~]# ps -ef | grep [p]erf
nagios   27936 27924  0 13:33 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios   27944 27936  0 13:33 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios   31893  1700  0 13:33 ?        00:00:00 /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1469813582.perfdata.service
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/xidpe | wc -l
0
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/perfdata | wc -l
3
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/checkresults | wc -l
4
You have new mail in /var/spool/mail/root
[root@nagiosxi ~]#
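
The perfdata.log excerpt above shows repeated 5-second timeouts, each of which deletes a spool file. To see how often this happens, the timeout lines can be counted per hour. This is only a rough sketch: `count_timeouts` is a hypothetical helper name, and it assumes the log keeps the "date time [pid] ..." layout shown above.

```shell
#!/bin/sh
# count_timeouts LOGFILE
# Prints "YYYY-MM-DD HH count" for every hour that has at least one
# "Timeout after" entry. Assumes field 1 is the date and field 2 the time.
count_timeouts() {
    grep 'TIMEOUT: Timeout after' "$1" |
        awk '{ split($2, t, ":"); n[$1 " " t[1]]++ }
             END { for (k in n) print k, n[k] }' |
        sort
}

# Log path as used earlier in this thread; skipped if the file is not readable.
log=/usr/local/nagios/var/perfdata.log
if [ -r "$log" ]; then
    count_timeouts "$log"
fi
```

Hours with many timeouts are likely the periods where the graph has gaps.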

Re: Performance graph data is missing

Posted: Fri Jul 29, 2016 1:25 pm
by tgriep
When the load on the Nagios server is high, NPCD stops processing performance data so that the system keeps running smoothly.
You can raise those limits so that performance data is still processed under higher load, as follows.

Edit /usr/local/nagios/etc/pnp/process_perfdata.cfg
and change the default value from:

Code:

TIMEOUT = 5
To:

Code:

TIMEOUT = 20
Edit /usr/local/nagios/etc/pnp/npcd.cfg
and change the default value from:

Code:

load_threshold = 40.0
To:

Code:

load_threshold = 60.0
Save the files, then restart the services by running these commands:

service npcd restart
service nagios restart
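
If you prefer to script the two edits above rather than use an editor, something like the following could work. It is a sketch under assumptions: `set_cfg_value` is a hypothetical helper, `PNP_ETC` is an illustrative variable, and it assumes each setting sits on its own "KEY = value" line. A `.bak` copy of each file is kept. The service restarts above are still needed afterwards.

```shell
#!/bin/sh
# set_cfg_value FILE KEY VALUE
# Rewrites the line "KEY = ..." in FILE to "KEY = VALUE".
# The original file is preserved as FILE.bak.
set_cfg_value() {
    file=$1
    key=$2
    value=$3
    cp -p "$file" "$file.bak" || return 1
    sed "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$file.bak" > "$file"
}

# Directory from this thread; override PNP_ETC to try the helper elsewhere.
PNP_ETC=${PNP_ETC:-/usr/local/nagios/etc/pnp}

if [ -f "$PNP_ETC/process_perfdata.cfg" ]; then
    set_cfg_value "$PNP_ETC/process_perfdata.cfg" TIMEOUT 20
fi
if [ -f "$PNP_ETC/npcd.cfg" ]; then
    set_cfg_value "$PNP_ETC/npcd.cfg" load_threshold 60.0
fi
```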

Re: Performance graph data is missing

Posted: Fri Jul 29, 2016 2:30 pm
by hillhealthcenter
I made the recommended changes.

BTW, the load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg was set to 10.0, not 40.0. Should I leave it at 60.0?

Re: Performance graph data is missing

Posted: Mon Aug 01, 2016 9:07 am
by lmiltchev
hillhealthcenter wrote:
BTW, the load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg was set to 10.0, not 40.0. Should I leave it at 60.0?

It depends on your resources. The value of "10.0" is for a single-core CPU machine. You can use "20.0" for dual-core, "40.0" for quad-core, and so on.
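
The rule of thumb above works out to 10.0 per core. As a small sketch, assuming the pattern continues linearly beyond four cores (the thread only gives values for one, two, and four), the suggestion for a given machine can be computed from `nproc`. `suggested_threshold` is a hypothetical helper name.

```shell
#!/bin/sh
# suggested_threshold CORES
# Prints CORES * 10 with one decimal place, e.g. 4 cores gives 40.0.
suggested_threshold() {
    echo "$(( $1 * 10 )).0"
}

cores=$(nproc)
echo "cores=$cores suggested load_threshold=$(suggested_threshold "$cores")"
```

Treat the result as a starting point and adjust it to what the server actually handles.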

Re: Performance graph data is missing

Posted: Mon Oct 24, 2016 3:16 pm
by hillhealthcenter
We upgraded to Nagios XI 5.3.1 this past Friday. Today we noticed that the graph is flat again.

Re: Performance graph data is missing

Posted: Mon Oct 24, 2016 3:25 pm
by ssax
Please run through this KB article and see if you can find a resolution:

https://support.nagios.com/kb/article.php?id=9

If you are still hitting the load_threshold and/or TIMEOUT, we'll need to figure out what is causing it.

What does the load look like on your server?

Code:

top
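
For a quick non-interactive check, the 1-minute load average can be compared directly against the threshold. This is a sketch only: `load_exceeds` is a hypothetical helper, the 60.0 is the value suggested earlier in this thread (use whatever your npcd.cfg actually contains), and it assumes a Linux system that exposes /proc/loadavg. The earlier npcd.log lines ("load 0.690000/10.000000") suggest NPCD compares against the same 1-minute figure that uptime reports.

```shell
#!/bin/sh
# load_exceeds LOAD THRESHOLD
# Exits 0 when LOAD is strictly greater than THRESHOLD, 1 otherwise.
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 > t + 0) }'
}

threshold=60.0   # assumed value; replace with load_threshold from npcd.cfg

if [ -r /proc/loadavg ]; then
    # First field of /proc/loadavg is the 1-minute load average.
    load=$(cut -d' ' -f1 /proc/loadavg)
    if load_exceeds "$load" "$threshold"; then
        echo "load $load is above load_threshold $threshold"
    else
        echo "load $load is not above load_threshold $threshold"
    fi
fi
```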

Re: Performance graph data is missing

Posted: Mon Oct 24, 2016 3:29 pm
by hillhealthcenter

Code:

top - 16:29:01 up 27 days,  3:13,  2 users,  load average: 1.07, 0.86, 0.59
Tasks: 236 total,   1 running, 235 sleeping,   0 stopped,   0 zombie
Cpu(s): 24.0%us,  9.3%sy,  0.0%ni, 66.3%id,  0.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4019748k total,  3217860k used,   801888k free,   242328k buffers
Swap:   262136k total,     1068k used,   261068k free,  2191612k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24637 apache    20   0 64208  31m 5004 S  5.6  0.8  13:11.80 httpd
 6578 apache    20   0 62136  30m 5108 S  5.3  0.8  13:40.42 httpd
27710 nagios    20   0 46236  22m 6908 S  5.3  0.6   0:00.16 php
 9506 apache    20   0 67412  34m 5296 S  5.0  0.9  12:11.77 httpd
 2806 apache    20   0 63284  30m 5164 S  4.7  0.8  15:35.92 httpd
 9959 apache    20   0 65332  31m 5224 S  4.7  0.8  12:09.20 httpd
27714 nagios    20   0 39492  16m 6824 S  3.3  0.4   0:00.10 php
27715 nagios    20   0 39124  16m 6812 S  3.3  0.4   0:00.10 php
27717 nagios    20   0 39232  16m 6856 S  3.3  0.4   0:00.10 php
27718 nagios    20   0 39232  16m 7244 S  3.3  0.4   0:00.10 php
 1573 mysql     20   0  176m  35m 3868 S  3.0  0.9   1747:30 mysqld
27720 nagios    20   0 39236  16m 6816 S  3.0  0.4   0:00.09 php
 9738 nagios    20   0 10160 3168 1008 S  1.0  0.1   0:51.46 ndo2db
 1764 postgres  20   0 19204 1432  512 S  0.3  0.0   6:22.53 postmaster
 1765 ajaxterm  20   0 22992 5280 1344 S  0.3  0.1  16:16.38 python
 9726 nagios    20   0 15008 6860 1456 S  0.3  0.2   0:40.83 nagios
 9730 nagios    20   0  3496  932  644 S  0.3  0.0   0:02.94 nagios

Re: Performance graph data is missing

Posted: Mon Oct 24, 2016 3:43 pm
by avandemore
Can you PM your system profile to me or another Nagios support rep? I'd like to have a look at a few things.

XI > Admin > System Profile > Download Profile

Please include the zip file in your response.