Bad charts on one host?
Posted: Thu Oct 10, 2013 4:24 pm
by vAJ
A tale of two Nagios instances. Both monitor each other. The problem described below has been around for a while, but I'm only now getting really tired of it.
vdc_mem.jpg
df_mem.jpg
The top memory chart is the one we all know and love: Linux memory broken down by type. The bottom chart shows only the total. Both servers currently have the same NRPE scripts installed. The bad server may at one time have had a broken NRPE package, but I believe I corrected that a while back. I suspect the bad graphs were created before that fix and may just need to be reset.
Also, the standard RRD charts don't show for datasources beyond 'total'.
df_mem_rrds.jpg
Yet the command output gives all of them:
Code: Select all
COMMAND: /usr/local/nagios/libexec/check_nrpe -H nagios.xxxxxx.com -t 30 -c check_mem -a '-w 20 -c 10'
OUTPUT: OK - 21284 / 24013 MB (88%) Free Memory, Used: 2729 MB, Shared: 0 MB, Buffers: 106 MB, Cached: 574 MB | total=24013MB free=21284MB used=2729MB shared=0 buffers=106MB cached=574MB
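For a quick check of which datasources the plugin actually returns, the labels can be pulled out of the text after the `|`. This is only a minimal sketch: `perfdata_labels` is a made-up helper name, and it assumes labels contain no spaces or quotes, which holds for the output above but not for every plugin.

```shell
# Hypothetical helper: print one perfdata label per line from plugin output.
# Assumes simple labels (no quoted labels containing spaces).
perfdata_labels() {
    printf '%s\n' "$1" |
        sed -n 's/^[^|]*|[[:space:]]*//p' |   # keep only the text after the first '|'
        tr -s ' ' '\n' |                      # one label=value pair per line
        sed 's/=.*$//'                        # drop the value, keep the label
}

# Plugin output copied from the post above.
output="OK - 21284 / 24013 MB (88%) Free Memory, Used: 2729 MB, Shared: 0 MB, Buffers: 106 MB, Cached: 574 MB | total=24013MB free=21284MB used=2729MB shared=0 buffers=106MB cached=574MB"
perfdata_labels "$output"
```

For the output above this lists six labels (total, free, used, shared, buffers, cached), so the check itself is returning everything the good chart needs.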
I've tried removing the host/services for the bad graph server, but the chart data remains. How do I purge this? It appears that GraphExplorer is using the RRD data, but is there a way to fix this without deleting the RRDs and letting Nagios recreate them?
Re: Bad charts on one host?
Posted: Thu Oct 10, 2013 4:25 pm
by vAJ
Both instances are running 2012R2.3
Re: Bad charts on one host?
Posted: Thu Oct 10, 2013 4:34 pm
by abrist
Let's increase the logging level, wait 15 minutes, and then check the logs:
Edit the file:
Code: Select all
/usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:
To:
Edit:
Code: Select all
/usr/local/nagios/etc/pnp/npcd.cfg
Change:
To:
Restart npcd:
Wait 15 minutes, and then check the logs and post the results:
Code: Select all
tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log
WARNING! Make sure to reset the log levels to '0' after we are done troubleshooting.
Re: Bad charts on one host?
Posted: Thu Oct 10, 2013 4:55 pm
by vAJ
Code: Select all
2013-10-10 14:50:54 [28736] [1] Found Performance Data for vweb411 / Active_FTP_Connections (Active Connections: %.f=0.000000%;40.000000;80.000000;)
2013-10-10 14:50:54 [28736] [2] No Custom Template found for check_xi_service_nsclient (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_nsclient.cfg)
2013-10-10 14:50:54 [28736] [2] Template is check_xi_service_nsclient.php
2013-10-10 14:50:54 [28736] [2] data2rrd called
2013-10-10 14:50:54 [28736] [2] RRDs::update /usr/local/nagios/share/perfdata/vweb411/Active_FTP_Connections.rrd 1381441844:0.000000
2013-10-10 14:50:54 [28736] [2] /usr/local/nagios/share/perfdata/vweb411/Active_FTP_Connections.rrd updated
2013-10-10 14:50:54 [28736] [2] Processing Line 2098
2013-10-10 14:50:54 [28736] [2] Datatype set to 'SERVICEPERFDATA'
2013-10-10 14:50:54 [28736] [1] Found Performance Data for vweb201 / Active_FTP_Connections (Active Connections: %.f=0.000000%;40.000000;80.000000;)
2013-10-10 14:50:54 [28736] [2] No Custom Template found for check_xi_service_nsclient (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_nsclient.cfg)
2013-10-10 14:50:54 [28736] [2] Template is check_xi_service_nsclient.php
2013-10-10 14:50:54 [28736] [2] data2rrd called
2013-10-10 14:50:54 [28736] [2] RRDs::update /usr/local/nagios/share/perfdata/vweb201/Active_FTP_Connections.rrd 1381441844:0.000000
2013-10-10 14:50:54 [28736] [2] /usr/local/nagios/share/perfdata/vweb201/Active_FTP_Connections.rrd updated
2013-10-10 14:50:54 [28736] [2] Processing Line 2099
2013-10-10 14:50:54 [28736] [2] Datatype set to 'SERVICEPERFDATA'
2013-10-10 14:50:54 [28736] [1] Found Performance Data for vweb378 / ASP_Requests_Per_Second (Requests /sec: %.f=0.000000%;1800.000000;2500.000000;)
2013-10-10 14:50:54 [28736] [2] No Custom Template found for check_xi_service_nsclient (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_nsclient.cfg)
2013-10-10 14:50:54 [28736] [2] Template is check_xi_service_nsclient.php
2013-10-10 14:50:54 [28736] [2] data2rrd called
2013-10-10 14:50:54 [28736] [2] RRDs::update /usr/local/nagios/share/perfdata/vweb378/ASP_Requests_Per_Second.rrd 1381441844:0.000000
2013-10-10 14:50:54 [28736] [2] /usr/local/nagios/share/perfdata/vweb378/ASP_Requests_Per_Second.rrd updated
2013-10-10 14:50:54 [28736] [1] 2099 lines processed
2013-10-10 14:50:54 [28736] [1] /usr/local/nagios/var/spool/perfdata//1381441849.perfdata.service-PID-28736 deleted
2013-10-10 14:50:54 [28736] [1] PNP exiting (runtime 2.562472s) ...
Code: Select all
[10-10-2013 14:51:09] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:09] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:09] NPCD: No more files to process... waiting for 15 seconds
[10-10-2013 14:51:24] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:24] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:24] NPCD: Processing file '1381441864.perfdata.host'
[10-10-2013 14:51:24] NPCD: Processing file '1381441864.perfdata.service'
[10-10-2013 14:51:24] NPCD: Processing file '1381441879.perfdata.host'
[10-10-2013 14:51:24] NPCD: Processing file '1381441879.perfdata.service'
[10-10-2013 14:51:28] NPCD: No more files to process... waiting for 15 seconds
[10-10-2013 14:51:43] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:43] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:43] NPCD: Processing file '1381441894.perfdata.host'
[10-10-2013 14:51:43] NPCD: Processing file '1381441894.perfdata.service'
[10-10-2013 14:51:44] NPCD: No more files to process... waiting for 15 seconds
[10-10-2013 14:51:59] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:59] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:59] NPCD: No more files to process... waiting for 15 seconds
[10-10-2013 14:52:14] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:52:14] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:52:14] NPCD: Processing file '1381441912.perfdata.host'
[10-10-2013 14:52:14] NPCD: Processing file '1381441912.perfdata.service'
[10-10-2013 14:52:14] NPCD: Processing file '1381441927.perfdata.host'
[10-10-2013 14:52:14] NPCD: Processing file '1381441927.perfdata.service'
[10-10-2013 14:52:15] NPCD: No more files to process... waiting for 15 seconds
Nothing appears to be related to the server I'm looking at.
Wait, should this be done on the target server or the polling server?
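With roughly 2,100 perfdata lines processed per run, the last 25 lines of the log will rarely mention any one host. A rough sketch of filtering the log instead, assuming the line formats shown in the excerpt above; `perfdata_log_for_host` is a hypothetical helper name, and the default path is the one used earlier in this thread.

```shell
# Hypothetical helper: show perfdata.log lines for a single host.
# $1 = host name as Nagios knows it, $2 = log file (optional).
perfdata_log_for_host() {
    # Match both the "Found Performance Data for <host> / ..." lines
    # and the RRD paths under .../perfdata/<host>/ as fixed strings.
    grep -F -e "for $1 / " -e "/perfdata/$1/" \
        "${2:-/usr/local/nagios/var/perfdata.log}"
}

# Usage (host name is a placeholder):
#   perfdata_log_for_host myhost | tail -25
```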
Re: Bad charts on one host?
Posted: Fri Oct 11, 2013 10:44 am
by lmiltchev
What are the timestamps on the RRDs (on the polling server)? Are they being updated?
Re: Bad charts on one host?
Posted: Fri Oct 11, 2013 10:53 am
by vAJ
Code: Select all
-rwxrwxr-x 1 nagios nagios 384952 Oct 11 08:47 Memory_Usage.rrd
-rw-rw-rw- 1 nagios nagios 5538 Oct 11 08:47 Memory_Usage.xml
Yes. I'm wondering if just moving the RRD out of the folder and letting it be recreated might help. A good troubleshooting step, if nothing else?
Re: Bad charts on one host?
Posted: Fri Oct 11, 2013 10:55 am
by slansing
Yes, you can copy them to another folder and then re-create the host/service object to see if it inherits the old RRDs.
Re: Bad charts on one host?
Posted: Fri Oct 11, 2013 11:03 am
by vAJ
I've already done that. I deleted this host and all its services, then recreated them, and they picked up the orphaned RRDs.
I'm wondering if completely removing this one RRD (Memory Used) and letting the process recreate it will pose any problems. I don't need this data as it's broken anyhow. But if I can keep the rest of the RRDs for this host, I'd like to as they all hold good data.
Re: Bad charts on one host?
Posted: Fri Oct 11, 2013 11:11 am
by abrist
vAJ wrote:
I'm wondering if completely removing this one RRD (Memory Used) and letting the process recreate it will pose any problems
It will not create any problems; it will just generate a new RRD after 2 checks. Just a warning: this is not a hard and fast rule, as MRTG bandwidth RRDs can be a bit more tricky. But for this service, removing the RRD should not do anything bad.
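If you would rather keep the old file around than delete it outright, renaming it has the same effect as far as recreation goes. A minimal sketch: `set_aside_rrd` and the `.bad` suffix are hypothetical names chosen for illustration, and the directory layout is the one shown in the log excerpts earlier in the thread.

```shell
# Hypothetical helper: move one RRD out of the way so a fresh one is created.
# $1 = full path to the .rrd file.
set_aside_rrd() {
    rrd="$1"
    if [ ! -f "$rrd" ]; then
        echo "no such file: $rrd" >&2
        return 1
    fi
    # Rename rather than delete, so the old data can be put back if needed.
    mv -- "$rrd" "$rrd.bad"
}

# Usage (<host> is a placeholder for the host's perfdata directory):
#   set_aside_rrd /usr/local/nagios/share/perfdata/<host>/Memory_Usage.rrd
```

Only the one file is touched, so the other RRDs for the host keep their history.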
Re: Bad charts on one host?
Posted: Fri Oct 11, 2013 11:32 am
by vAJ
df_mem_good.jpg
That fixed it. I knew the returned data was good, and the XML was identical to the good server's. So it was just that I didn't rebuild the RRD months ago when the NRPE scripts were corrected on this server...
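For anyone hitting the same thing: one way to spot this kind of mismatch before removing anything is to count the datasources stored in the RRD and compare that with the number of perfdata labels the check returns. A sketch only, assuming `rrdtool` is installed on the polling server; `count_rrd_datasources` is a made-up helper that reads `rrdtool info` output on stdin.

```shell
# Hypothetical helper: count datasources in `rrdtool info` output (stdin).
# Each datasource contributes exactly one line of the form: ds[NAME].type = "..."
count_rrd_datasources() {
    grep -c '^ds\[[^]]*]\.type'
}

# Usage (<host> is a placeholder):
#   rrdtool info /usr/local/nagios/share/perfdata/<host>/Memory_Usage.rrd | count_rrd_datasources
```

If the count is lower than the number of labels in the check output, the RRD was most likely created while the check was returning fewer values, which would match what happened here.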
Thanks for the sounding board. Good to close this out.