Bad charts on one host?

vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Bad charts on one host?

Post by vAJ »

A tale of two Nagios instances. Both monitor each other. The problem described below has been around for a while, but I'm only now getting really tired of it.
[Attachments: vdc_mem.jpg (top chart), df_mem.jpg (bottom chart)]
The top memory chart is the one we all know and love: Linux memory broken down by type. The bottom chart shows only the total. Both servers currently have the same NRPE scripts installed. The bad server may at one time have had a broken NRPE package, but I believe I corrected that a while back. I suspect the bad graphs were created before that fix and may just need to be reset.

Also, the standard RRD charts don't show for datasources beyond 'total'.
[Attachment: df_mem_rrds.jpg]
Yet the command output gives all of them:

Code: Select all

COMMAND: /usr/local/nagios/libexec/check_nrpe -H nagios.xxxxxx.com -t 30 -c check_mem -a '-w 20 -c 10'
OUTPUT: OK - 21284 / 24013 MB (88%) Free Memory, Used: 2729 MB, Shared: 0 MB, Buffers: 106 MB, Cached: 574 MB | total=24013MB free=21284MB used=2729MB shared=0 buffers=106MB cached=574MB
I've tried removing the host/services for the bad graph server, but the chart data remains. How do I purge this? It appears that GraphExplorer is using the RRD data, but is there a way to fix this without deleting the RRDs and letting Nagios recreate them?
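The perfdata portion of that output (everything after the `|`) carries the datasource labels. A minimal shell sketch, using the output line quoted above, lists those labels so they can be compared with what the chart actually shows. The variable names are illustrative only.

```shell
#!/bin/sh
# Plugin output copied from the post above.
output="OK - 21284 / 24013 MB (88%) Free Memory, Used: 2729 MB, Shared: 0 MB, Buffers: 106 MB, Cached: 574 MB | total=24013MB free=21284MB used=2729MB shared=0 buffers=106MB cached=574MB"

# Keep only the text after the first '|' (the perfdata).
perfdata=${output#*|}

# Split on whitespace and keep the label in front of each '='.
labels=$(printf '%s\n' $perfdata | cut -d= -f1)

# Print one label per line.
printf '%s\n' "$labels"
```

If the plugin reports more labels than the chart has datasources, the mismatch is likely in the stored RRD rather than in the check itself.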
Andrew J. - Do you even grok?

Post by vAJ »

Both instances are running 2012R2.3
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Bad charts on one host?

Post by abrist »

Let's increase the logging level, wait 15 minutes, and then check the logs.
Edit the file:

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:

Code: Select all

LOG_LEVEL = 0
To:

Code: Select all

LOG_LEVEL = 2
Edit:

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
Change:

Code: Select all

log_level = 0
To:

Code: Select all

log_level = -1
Restart npcd:

Code: Select all

service npcd restart
Wait 15 minutes, and then check the logs and post the results:

Code: Select all

tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log
WARNING! Make sure to reset both log levels to '0' once we are done troubleshooting.
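Resetting the levels afterwards can be scripted. This is only a sketch: `set_level` is a hypothetical helper, not part of Nagios XI or PNP, and it assumes each setting sits on its own line in the `KEY = value` form shown above. It keeps a `.bak` copy of each file it edits.

```shell
#!/bin/sh
# set_level FILE KEY VALUE
# Rewrites the line "KEY = <anything>" in FILE to "KEY = VALUE".
# A backup of the original is left next to it as FILE.bak.
set_level() {
    sed -i.bak "s/^$2[[:space:]]*=.*/$2 = $3/" "$1"
}

# Intended use once troubleshooting is finished (paths from the steps above):
# set_level /usr/local/nagios/etc/pnp/process_perfdata.cfg LOG_LEVEL 0
# set_level /usr/local/nagios/etc/pnp/npcd.cfg log_level 0
# service npcd restart
```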
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Post by vAJ »

Code: Select all

2013-10-10 14:50:54 [28736] [1] Found Performance Data for vweb411 / Active_FTP_Connections (Active Connections: %.f=0.000000%;40.000000;80.000000;)
2013-10-10 14:50:54 [28736] [2] No Custom Template found for check_xi_service_nsclient (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_nsclient.cfg)
2013-10-10 14:50:54 [28736] [2] Template is check_xi_service_nsclient.php
2013-10-10 14:50:54 [28736] [2] data2rrd called
2013-10-10 14:50:54 [28736] [2] RRDs::update /usr/local/nagios/share/perfdata/vweb411/Active_FTP_Connections.rrd 1381441844:0.000000
2013-10-10 14:50:54 [28736] [2] /usr/local/nagios/share/perfdata/vweb411/Active_FTP_Connections.rrd updated
2013-10-10 14:50:54 [28736] [2] Processing Line 2098
2013-10-10 14:50:54 [28736] [2] Datatype set to 'SERVICEPERFDATA'
2013-10-10 14:50:54 [28736] [1] Found Performance Data for vweb201 / Active_FTP_Connections (Active Connections: %.f=0.000000%;40.000000;80.000000;)
2013-10-10 14:50:54 [28736] [2] No Custom Template found for check_xi_service_nsclient (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_nsclient.cfg)
2013-10-10 14:50:54 [28736] [2] Template is check_xi_service_nsclient.php
2013-10-10 14:50:54 [28736] [2] data2rrd called
2013-10-10 14:50:54 [28736] [2] RRDs::update /usr/local/nagios/share/perfdata/vweb201/Active_FTP_Connections.rrd 1381441844:0.000000
2013-10-10 14:50:54 [28736] [2] /usr/local/nagios/share/perfdata/vweb201/Active_FTP_Connections.rrd updated
2013-10-10 14:50:54 [28736] [2] Processing Line 2099
2013-10-10 14:50:54 [28736] [2] Datatype set to 'SERVICEPERFDATA'
2013-10-10 14:50:54 [28736] [1] Found Performance Data for vweb378 / ASP_Requests_Per_Second (Requests /sec: %.f=0.000000%;1800.000000;2500.000000;)
2013-10-10 14:50:54 [28736] [2] No Custom Template found for check_xi_service_nsclient (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_nsclient.cfg)
2013-10-10 14:50:54 [28736] [2] Template is check_xi_service_nsclient.php
2013-10-10 14:50:54 [28736] [2] data2rrd called
2013-10-10 14:50:54 [28736] [2] RRDs::update /usr/local/nagios/share/perfdata/vweb378/ASP_Requests_Per_Second.rrd 1381441844:0.000000
2013-10-10 14:50:54 [28736] [2] /usr/local/nagios/share/perfdata/vweb378/ASP_Requests_Per_Second.rrd updated
2013-10-10 14:50:54 [28736] [1] 2099 lines processed
2013-10-10 14:50:54 [28736] [1] /usr/local/nagios/var/spool/perfdata//1381441849.perfdata.service-PID-28736 deleted
2013-10-10 14:50:54 [28736] [1] PNP exiting (runtime 2.562472s) ...

Code: Select all

[10-10-2013 14:51:09] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:09] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:09] NPCD: No more files to process... waiting for 15 seconds
[10-10-2013 14:51:24] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:24] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:24] NPCD: Processing file '1381441864.perfdata.host'
[10-10-2013 14:51:24] NPCD: Processing file '1381441864.perfdata.service'
[10-10-2013 14:51:24] NPCD: Processing file '1381441879.perfdata.host'
[10-10-2013 14:51:24] NPCD: Processing file '1381441879.perfdata.service'
[10-10-2013 14:51:28] NPCD: No more files to process... waiting for 15 seconds
[10-10-2013 14:51:43] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:43] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:43] NPCD: Processing file '1381441894.perfdata.host'
[10-10-2013 14:51:43] NPCD: Processing file '1381441894.perfdata.service'
[10-10-2013 14:51:44] NPCD: No more files to process... waiting for 15 seconds
[10-10-2013 14:51:59] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:59] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:51:59] NPCD: No more files to process... waiting for 15 seconds
[10-10-2013 14:52:14] NPCD: File '1346144050.perfdata.host-PID-26456' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:52:14] NPCD: File '1360789252.perfdata.host-PID-22261' is an already in process PNP file. Leaving it untouched.
[10-10-2013 14:52:14] NPCD: Processing file '1381441912.perfdata.host'
[10-10-2013 14:52:14] NPCD: Processing file '1381441912.perfdata.service'
[10-10-2013 14:52:14] NPCD: Processing file '1381441927.perfdata.host'
[10-10-2013 14:52:14] NPCD: Processing file '1381441927.perfdata.service'
[10-10-2013 14:52:15] NPCD: No more files to process... waiting for 15 seconds
Nothing appears to be related to the server I'm looking at.

Wait, should this be done on the target server or the polling server?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Bad charts on one host?

Post by lmiltchev »

What are the timestamps on the RRDs (on the polling server)? Are they being updated?
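One way to answer that quickly is to list any RRD files that have not been modified recently. The sketch below assumes a `find` that supports `-mmin` (GNU and BSD both do). `stale_rrds` is a hypothetical helper name, and the 15-minute default is an arbitrary threshold, not a Nagios setting.

```shell
#!/bin/sh
# stale_rrds DIR [MINUTES]
# Prints every .rrd file under DIR last modified more than MINUTES ago
# (default 15). No output means all RRDs were updated recently.
stale_rrds() {
    find "$1" -name '*.rrd' -mmin +"${2:-15}"
}

# Example (replace <host> with the host's perfdata directory name):
# stale_rrds /usr/local/nagios/share/perfdata/<host> 15
```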

Post by vAJ »

Code: Select all

-rwxrwxr-x 1 nagios nagios  384952 Oct 11 08:47 Memory_Usage.rrd
-rw-rw-rw- 1 nagios nagios    5538 Oct 11 08:47 Memory_Usage.xml
Yes. I'm wondering if just moving the RRD out of the folder and letting it be recreated might help. A good troubleshooting step, if nothing else?
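A fresh timestamp only shows that the file is being written, not how many datasources it holds. `rrdtool info` prints one `ds[<name>].type` line per datasource, so counting those lines gives a number to compare against the labels the plugin returns. `count_ds` is a hypothetical helper that reads that output on stdin.

```shell
#!/bin/sh
# count_ds
# Reads `rrdtool info` output on stdin and prints the number of datasources,
# counting one "ds[<name>].type = ..." line per datasource.
count_ds() {
    grep -c '^ds\[[^]]*]\.type = '
}

# Example (replace <host> with the host's perfdata directory name):
# rrdtool info /usr/local/nagios/share/perfdata/<host>/Memory_Usage.rrd | count_ds
```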
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Bad charts on one host?

Post by slansing »

Yes, you can copy them to another folder and then re-create the host/service object to see if it inherits the old RRDs.

Post by vAJ »

I've already done that. Deleted this host and all services. Recreated and it assumed the orphaned RRDs.

I'm wondering if completely removing this one RRD (Memory Used) and letting the process recreate it will pose any problems. I don't need this data as it's broken anyhow. But if I can keep the rest of the RRDs for this host, I'd like to as they all hold good data.

Post by abrist »

vAJ wrote: I'm wondering if completely removing this one RRD (Memory Used) and letting the process recreate it will pose any problems
It will not create any problems; a new RRD will simply be generated after 2 checks. Just a warning: this is not a hard-and-fast rule, as MRTG bandwidth RRDs can be a bit more tricky. But for this service, removing the RRD should not do anything bad.
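To keep a fallback, the RRD can be moved aside rather than deleted. A minimal sketch: `set_aside_rrd` is a hypothetical helper, and the backup directory in the example is an arbitrary choice.

```shell
#!/bin/sh
# set_aside_rrd RRD_FILE BACKUP_DIR
# Moves a single RRD into BACKUP_DIR so a new one is created on later checks.
# Returns non-zero if the RRD does not exist.
set_aside_rrd() {
    rrd=$1
    backup_dir=$2
    [ -f "$rrd" ] || { echo "no such file: $rrd" >&2; return 1; }
    mkdir -p "$backup_dir" && mv "$rrd" "$backup_dir/"
}

# Example (replace <host>; the backup path is a placeholder):
# set_aside_rrd /usr/local/nagios/share/perfdata/<host>/Memory_Usage.rrd /tmp/rrd_backup
```

Only the named file is touched, so the host's other RRDs keep their history.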

Post by vAJ »

[Attachment: df_mem_good.jpg]
That fixed it. I know the returned data was good and the XML compared identical to the good server. So it was just that I didn't rebuild the RRD months ago when the NRPE scripts were corrected on this server...

Thanks for the sounding board. Good to close this out.