Re: Performance graph data is missing

Posted: Mon Oct 24, 2016 4:00 pm
by hillhealthcenter
I sent a PM to you with system profile zip file attached.

Re: Performance graph data is missing

Posted: Mon Oct 24, 2016 4:18 pm
by avandemore
I did get a message from you, but it didn't have an attachment. Can you try again?

Re: Performance graph data is missing

Posted: Tue Oct 25, 2016 7:35 am
by hillhealthcenter
I sent it again a few minutes ago. Did you get it?

Re: Performance graph data is missing

Posted: Tue Oct 25, 2016 9:55 am
by avandemore
Ok, I did receive this one. Unfortunately, it seems this profile was generated quite shortly after a reboot. A profile taken during a time period in which the symptoms are occurring is more helpful. Please send one of those when you can.

Is there any predictability to these periods of missing perf data? For example, do they occur at a certain time of day, or are they triggered by a certain action?

We can try bumping different values in the config files, but we'll really be shooting in the dark without a profile generated during a period of bad perf data.

Re: Performance graph data is missing

Posted: Tue Oct 25, 2016 10:20 am
by hillhealthcenter
The issue is that all of the historical trending prior to the upgrade is missing from the graph. The graph is flat prior to the upgrade.

Re: Performance graph data is missing

Posted: Tue Oct 25, 2016 10:54 am
by avandemore
Just to clarify, as there seem to be conflicting reports:
  • All perf data for your MSSQL service is missing pre-upgrade.
  • Other perf data is correct, i.e. it has no gaps.
  • There are not multiple gaps in the MSSQL perf data, just the pre-upgrade one.
Is this correct?

As before, following this earlier request should give a better illustration of the issue:
Go to Service Detail -> "MSSQL Average Wait Time" -> Advanced tab, and show us a screenshot of the page.
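
Independently of the screenshot, one rough way to tell whether the RRD file itself still holds pre-upgrade values is to count the rows that `rrdtool fetch` returns with real numbers rather than NaN. This is only a sketch: `count_known_rows` is a hypothetical helper name, and the RRD path and the one-year window in the usage comment are assumptions about this install.

```bash
# Count rows of `rrdtool fetch` output (read from stdin) that contain
# at least one value other than NaN. Header and blank lines are ignored.
count_known_rows() {
    awk '/^[0-9]+:/ {
             for (i = 2; i <= NF; i++)
                 if (tolower($i) !~ /nan/) { n++; break }
         }
         END { print n + 0 }'
}

# Usage (path and time window are assumptions, adjust to your host):
# rrdtool fetch /usr/local/nagios/share/perfdata/<host_name>/MSSQL_Average_Wait_Time.rrd \
#     AVERAGE --start -1y | count_known_rows
```

A result of 0 would suggest the file holds no stored values for that window. A large count would suggest the data is still there and the problem is more likely on the graphing side. A flat line of stored zeros also counts as known data here, so the screenshot is still useful.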

Re: Performance graph data is missing

Posted: Tue Oct 25, 2016 11:23 am
by hillhealthcenter
mssql_avg_wait_time.JPG

Re: Performance graph data is missing

Posted: Tue Oct 25, 2016 4:49 pm
by mcapra
Just so we're on the same page, did you run through the KB article posted by @ssax?
https://support.nagios.com/kb/article.php?id=9

Can I see the output of the following command (replace <host_name> with the host that is missing performance data):

Code:

ls -al /usr/local/nagios/share/perfdata/<host_name>
I would be interested in seeing the contents of the following file as well in case they were reset by the update for some reason:

Code:

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Also a fresh set of outputs for the previous commands @lmiltchev had you run:

Code:

uptime
service npcd status
tail -25 /usr/local/nagios/var/npcd.log
tail -25 /usr/local/nagios/var/perfdata.log
ps -ef | grep [p]erf
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
Is moving this to an email ticket and scheduling a remote session an option?
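
If it is easier, the commands above can be run in one pass and saved to a single file for pasting or attaching. A minimal sketch follows; `run_and_log` and `collect_perf_diag` are hypothetical helper names, the paths are the ones quoted above, and the output file is whatever you pass in.

```bash
# Append one command's output (stdout and stderr) to a report file,
# preceded by a "### <command>" header. Failures do not stop the run.
run_and_log() {
    _out=$1
    _cmd=$2
    {
        printf '### %s\n' "$_cmd"
        sh -c "$_cmd" 2>&1 </dev/null || true
        printf '\n'
    } >> "$_out"
}

# Run each requested command in turn for the given host name.
collect_perf_diag() {
    _report=$1
    _host=$2
    while IFS= read -r _line; do
        run_and_log "$_report" "$_line"
    done <<EOF
ls -al /usr/local/nagios/share/perfdata/$_host
cat /usr/local/nagios/etc/pnp/process_perfdata.cfg
uptime
service npcd status
tail -25 /usr/local/nagios/var/npcd.log
tail -25 /usr/local/nagios/var/perfdata.log
ps -ef | grep [p]erf
ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
EOF
}

# Usage: collect_perf_diag /tmp/perf_diag.txt <host_name>
```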

Re: Performance graph data is missing

Posted: Wed Oct 26, 2016 12:24 pm
by hillhealthcenter

Code:

[root@nagiosxi ~]# ls -al /usr/local/nagios/share/perfdata/centcshdb
total 15976
drwxrwxr-x    2 nagios nagios   4096 Oct 26 12:45 .
drwxrwxr-x. 402 nagios nagios  20480 Oct 24 10:50 ..
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 centricityps_MSSQL_Database_Size.rrd
-rwxrwxr-x    1 nagios nagios   2465 Jul 29  2014 centricityps_MSSQL_Database_Size.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:43 CPU_Usage.rrd
-rw-rw-r--    1 nagios nagios   2132 Oct 26 12:43 CPU_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:43 Drive_C__Disk_Usage.rrd
-rw-rw-r--    1 nagios nagios   2298 Oct 26 12:43 Drive_C__Disk_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:40 Drive_D__Disk_Usage.rrd
-rw-rw-r--    1 nagios nagios   2320 Oct 26 12:40 Drive_D__Disk_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:45 Drive_E__Disk_Usage.rrd
-rw-rw-r--    1 nagios nagios   2320 Oct 26 12:45 Drive_E__Disk_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Dec 18  2015 Drive_I__Disk_Usage.rrd
-rwxrwxr-x    1 nagios nagios   1930 Dec 18  2015 Drive_I__Disk_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:44 Drive_J__Disk_Usage.rrd
-rw-rw-r--    1 nagios nagios   2317 Oct 26 12:44 Drive_J__Disk_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:40 Drive_K__Disk_Usage.rrd
-rw-rw-r--    1 nagios nagios   2313 Oct 26 12:40 Drive_K__Disk_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:45 Drive_L__Disk_Usage.rrd
-rw-rw-r--    1 nagios nagios   2313 Oct 26 12:45 Drive_L__Disk_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Mar 18  2016 Drive_Z__Disk_Usage.rrd
-rwxrwxr-x    1 nagios nagios   1933 Mar 18  2016 Drive_Z__Disk_Usage.xml
-rwxrwxr-x    1 nagios nagios 768008 Oct 26 12:45 _HOST_.rrd
-rw-rw-r--    1 nagios nagios   3953 Oct 26 12:45 _HOST_.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:44 Memory_Usage.rrd
-rw-rw-r--    1 nagios nagios   2287 Oct 26 12:44 Memory_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 MSSQL_Active_Transactions.rrd
-rwxrwxr-x    1 nagios nagios   2365 Jul 29  2014 MSSQL_Active_Transactions.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:45 MSSQL_Average_Wait_Time.rrd
-rw-rw-r--    1 nagios nagios   2360 Oct 26 12:45 MSSQL_Average_Wait_Time.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:43 MSSQL_Buffer_Hit_Ratio.rrd
-rw-rw-r--    1 nagios nagios   2401 Oct 26 12:43 MSSQL_Buffer_Hit_Ratio.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:40 MSSQL_Checkpoint_Pages_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2402 Oct 26 12:40 MSSQL_Checkpoint_Pages_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:42 MSSQL_Connection_Time.rrd
-rw-rw-r--    1 nagios nagios   2302 Oct 26 12:42 MSSQL_Connection_Time.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:43 MSSQL_Database_Pages_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2447 Oct 26 12:43 MSSQL_Database_Pages_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:44 MSSQL_Deadlocks_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2275 Oct 26 12:44 MSSQL_Deadlocks_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:41 MSSQL_Free_Pages_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2356 Oct 26 12:41 MSSQL_Free_Pages_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:44 MSSQL_Lazy_Writes_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2304 Oct 26 12:44 MSSQL_Lazy_Writes_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:40 MSSQL_Lock_Requests_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2415 Oct 26 12:40 MSSQL_Lock_Requests_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:42 MSSQL_Lock_Timeouts_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2385 Oct 26 12:42 MSSQL_Lock_Timeouts_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:44 MSSQL_Lock_Waits_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2325 Oct 26 12:44 MSSQL_Lock_Waits_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 MSSQL_Log_Cache_Hit_Rate.rrd
-rwxrwxr-x    1 nagios nagios   2456 Jul 29  2014 MSSQL_Log_Cache_Hit_Rate.xml
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 MSSQL_Log_File_Usage.rrd
-rwxrwxr-x    1 nagios nagios   2373 Jul 29  2014 MSSQL_Log_File_Usage.xml
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 MSSQL_Log_Flush_Wait_Time.rrd
-rwxrwxr-x    1 nagios nagios   2366 Jul 29  2014 MSSQL_Log_Flush_Wait_Time.xml
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 MSSQL_Log_Growths.rrd
-rwxrwxr-x    1 nagios nagios   2313 Jul 29  2014 MSSQL_Log_Growths.xml
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 MSSQL_Log_Shrinks.rrd
-rwxrwxr-x    1 nagios nagios   2299 Jul 29  2014 MSSQL_Log_Shrinks.xml
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 MSSQL_Log_Truncations.rrd
-rwxrwxr-x    1 nagios nagios   2397 Jul 29  2014 MSSQL_Log_Truncations.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:41 MSSQL_Page_Looks_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2391 Oct 26 12:41 MSSQL_Page_Looks_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:44 MSSQL_Page_Reads_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2356 Oct 26 12:44 MSSQL_Page_Reads_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:43 MSSQL_Page_Splits_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2349 Oct 26 12:43 MSSQL_Page_Splits_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:45 MSSQL_Page_Writes_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2373 Oct 26 12:45 MSSQL_Page_Writes_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:45 MSSQL_Readaheads_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2361 Oct 26 12:45 MSSQL_Readaheads_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:40 MSSQL_Stolen_Pages_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2402 Oct 26 12:40 MSSQL_Stolen_Pages_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:42 MSSQL_Target_Pages_Per_Sec.rrd
-rw-rw-r--    1 nagios nagios   2397 Oct 26 12:42 MSSQL_Target_Pages_Per_Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Jul 29  2014 MSSQL_Transactions___Sec.rrd
-rwxrwxr-x    1 nagios nagios   2437 Jul 29  2014 MSSQL_Transactions___Sec.xml
-rwxrwxr-x    1 nagios nagios 384736 Oct 26 12:43 Page_File_Usage.rrd
-rw-rw-r--    1 nagios nagios   2405 Oct 26 12:43 Page_File_Usage.xml
-rwxrwxr-x    1 nagios nagios 768008 Oct 26 12:44 Ping.rrd
-rw-rw-r--    1 nagios nagios   4063 Oct 26 12:44 Ping.xml
[root@nagiosxi ~]#
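
Several of the .rrd files in this listing were last written between 2014 and early 2016, while most were updated within minutes of the listing. A quick way to list only the files that have stopped receiving updates might look like the sketch below. `stale_rrds` is a hypothetical helper name, and the 60-minute default threshold is an arbitrary assumption.

```bash
# Print .rrd files directly under a directory whose last modification
# is older than the given number of minutes (default: 60).
stale_rrds() {
    _dir=$1
    _minutes=${2:-60}
    find "$_dir" -maxdepth 1 -type f -name '*.rrd' -mmin +"$_minutes" | sort
}

# Usage: stale_rrds /usr/local/nagios/share/perfdata/centcshdb
```

A stale file only shows that nothing has written to that RRD recently, for example because the service was removed or renamed. It does not say anything about the history stored in files that are still being updated.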

Code:

/usr/local/nagios/etc/pnp/process_perfdata.cfg

#
# Config File for process_perfdata.pl
#
# $Id: process_perfdata.cfg-sample.in 520 2008-09-16 12:50:10Z pitchfork $
#
# process_perfdata.pl Timout 
#
TIMEOUT = 20
#
# Use RRDs Perl Module
#
USE_RRDs = 1 
#
# 
#
RRDPATH = /usr/local/nagios/share/perfdata
#
#
#
RRDTOOL = /usr/bin/rrdtool
#
#
#
CFG_DIR = /usr/local/nagios/etc/pnp
#
#
#
RRD_HEARTBEAT = 8460 
#
#
#
RRA_CFG = /usr/local/nagios/etc/pnp/rra.cfg
#
#
#
RRA_STEP = 60
#
#
#
LOG_FILE = /usr/local/nagios/var/perfdata.log
#
# Loglevel 0=silent 1=normal 2=debug
#
LOG_LEVEL = 0
#
# XML encoding
# The supported encodings are ISO-8859-1, UTF-8 and US-ASCII.
# http://www.php.net/xml-parser-create
XML_ENC = UTF-8
#
# EXPERIMENTAL rrdcached Support
# Use only with rrdtool svn revision 1511+
#
# RRD_DAEMON_OPTS = unix:/tmp/rrdcached.sock

Code:

[root@nagiosxi ~]# uptime
 13:18:31 up 1 day, 20:33,  1 user,  load average: 0.31, 0.33, 0.42
[root@nagiosxi ~]# service npcd status
NPCD running (pid 1713).
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/npcd.log
[10-26-2016 13:18:28] NPCD: DEBUG: load 0.330000/60.000000
[10-26-2016 13:18:28] NPCD: ThreadCounter 0/5 File is ..
[10-26-2016 13:18:28] NPCD: DEBUG: load 0.330000/60.000000
[10-26-2016 13:18:28] NPCD: ThreadCounter 0/5 File is 1469038813.perfdata.service-PID-7670
[10-26-2016 13:18:28] NPCD: File '1469038813.perfdata.service-PID-7670' is an already in process PNP file. Leaving it untouched.
[10-26-2016 13:18:28] NPCD: DEBUG: load 0.330000/60.000000
[10-26-2016 13:18:28] NPCD: ThreadCounter 0/5 File is 1469890210.perfdata.service-PID-16865
[10-26-2016 13:18:28] NPCD: File '1469890210.perfdata.service-PID-16865' is an already in process PNP file. Leaving it untouched.
[10-26-2016 13:18:28] NPCD: DEBUG: load 0.330000/60.000000
[10-26-2016 13:18:28] NPCD: ThreadCounter 0/5 File is 1477502298.perfdata.host
[10-26-2016 13:18:28] NPCD: Regular File: 1477502298.perfdata.host
[10-26-2016 13:18:28] NPCD: A thread was started on thread_counter = 0
[10-26-2016 13:18:28] NPCD: Processing file 1477502298.perfdata.host with ID -1215702160 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1477502298.perfdata.host
[10-26-2016 13:18:28] NPCD: DEBUG: load 0.330000/60.000000
[10-26-2016 13:18:28] NPCD: Processing file '1477502298.perfdata.host'
[10-26-2016 13:18:28] NPCD: ThreadCounter 1/5 File is 1477502298.perfdata.service
[10-26-2016 13:18:28] NPCD: Regular File: 1477502298.perfdata.service
[10-26-2016 13:18:28] NPCD: A thread was started on thread_counter = 1
[10-26-2016 13:18:28] NPCD: Processing file 1477502298.perfdata.service with ID -1226192016 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1477502298.perfdata.service
[10-26-2016 13:18:28] NPCD: DEBUG: load 0.330000/60.000000
[10-26-2016 13:18:28] NPCD: ThreadCounter 2/5 File is host-perfdata.1330987694-PID-2109
[10-26-2016 13:18:28] NPCD: Processing file '1477502298.perfdata.service'
[10-26-2016 13:18:28] NPCD: File 'host-perfdata.1330987694-PID-2109' is an already in process PNP file. Leaving it untouched.
[10-26-2016 13:18:28] NPCD: Have to wait: Filecounter = 5 - thread_counter = 2
[10-26-2016 13:18:28] NPCD: No more files to process... waiting for 15 seconds
[root@nagiosxi ~]# tail -25 /usr/local/nagios/var/perfdata.log
2016-10-19 03:12:10 [17440] [0] *** process_perfdata.pl terminated on signal ALRM
2016-10-21 03:12:33 [2774] [0] *** TIMEOUT: Timeout after 20 secs. ***
2016-10-21 03:12:33 [2774] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-10-21 03:12:33 [2774] [0] *** TIMEOUT: Please check your npcd.cfg
2016-10-21 03:12:33 [2774] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1477033908.perfdata.service-PID-2774 deleted
2016-10-21 03:12:33 [2774] [0] *** Timeout while processing Host: "VMware_Switch_192.168.103.3" Service: "GigabitEthernet2_0_46_Bandwidth"
2016-10-21 03:12:33 [2774] [0] *** process_perfdata.pl terminated on signal ALRM
2016-10-21 03:12:33 [2772] [0] *** TIMEOUT: Timeout after 20 secs. ***
2016-10-21 03:12:33 [2772] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-10-21 03:12:33 [2772] [0] *** TIMEOUT: Please check your npcd.cfg
2016-10-21 03:12:33 [2772] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1477033927.perfdata.service-PID-2772 deleted
2016-10-21 03:12:33 [2772] [0] *** Timeout while processing Host: "Dntl_Basement.Stack_2960" Service: "Port_10636_Bandwidth"
2016-10-21 03:12:33 [2772] [0] *** process_perfdata.pl terminated on signal ALRM
2016-10-22 03:10:19 [20455] [0] *** TIMEOUT: Timeout after 20 secs. ***
2016-10-22 03:10:22 [20455] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-10-22 03:10:22 [20455] [0] *** TIMEOUT: Please check your npcd.cfg
2016-10-22 03:10:22 [20455] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1477120188.perfdata.service-PID-20455 deleted
2016-10-22 03:10:22 [20455] [0] *** Timeout while processing Host: "4500XCore" Service: "TenGigabitEthernet1_11_Bandwidth"
2016-10-22 03:10:22 [20455] [0] *** process_perfdata.pl terminated on signal ALRM
2016-10-24 03:09:19 [7575] [0] *** TIMEOUT: Timeout after 20 secs. ***
2016-10-24 03:09:19 [7575] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2016-10-24 03:09:19 [7575] [0] *** TIMEOUT: Please check your npcd.cfg
2016-10-24 03:09:19 [7575] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1477292916.perfdata.service-PID-7575 deleted
2016-10-24 03:09:19 [7575] [0] *** Timeout while processing Host: "INT.MED-2960-STACK1" Service: "GigabitEthernet5_0_19_Bandwidth"
2016-10-24 03:09:19 [7575] [0] *** process_perfdata.pl terminated on signal ALRM
[root@nagiosxi ~]# ps -ef | grep [p]erf
nagios   14796 14780  0 13:18 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios   14803 14796  0 13:18 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/xidpe | wc -l
0
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/perfdata | wc -l
3
[root@nagiosxi ~]# ls /usr/local/nagios/var/spool/checkresults | wc -l
4
[root@nagiosxi ~]#
Yes, please. Let's move to email and schedule a remote session. I'm available today until 5 pm EST.
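
For reference, the perfdata.log excerpt above shows four "Timeout while processing" entries, all logged between 03:09 and 03:12 on different days and all for *_Bandwidth services. To pull the same summary from the full log, something like the following sketch could be used. `summarize_timeouts` is a hypothetical helper name, and it assumes the log line format shown above.

```bash
# Print "date time host service" for every processing timeout found in
# perfdata.log. Reads the files given as arguments, or stdin if none.
summarize_timeouts() {
    awk -F'"' '/Timeout while processing Host:/ {
                   split($1, t, " ")
                   print t[1], t[2], $2, $4
               }' "$@"
}

# Usage: summarize_timeouts /usr/local/nagios/var/perfdata.log
```

If the full log shows the same early-morning clustering, that might point to something scheduled around that time, though that is only a guess from four entries.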

Re: Performance graph data is missing

Posted: Wed Oct 26, 2016 3:45 pm
by tgriep
To open an email ticket in our system, please send an email to [email protected] and include the link to this forum post in the email.
Also, attach your system profile when you open the email ticket.
To generate the system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and attach it to the email.