Problem with graphs in Nagios

Posted: Thu Nov 08, 2012 4:40 am
by TSCAdmin
Hi,

We are using Nagios XI 2009R1.3 on CentOS 5.4 (final).

I am having an interesting issue with the graphs for disk monitoring. Multiple disks are monitored on each server; some show correct graphs, while others are blank.

In the /usr/local/nagios/share/perfdata/<server>/ directory there are two files for Disk Monitor:

Disk_Monitor.rrd
Disk_Monitor.xml

On the servers where graphs are displayed, Disk_Monitor.rrd has its permissions set to 666. On some servers, for reasons I could not figure out, Disk_Monitor.rrd has 777 permissions. Setting the permissions to 666 on the servers with problematic graphs does not bring the graphs back. The only thing that works is removing both the rrd and xml files: the graphs then start appearing correctly after some time, but they are completely new and all the old data is gone.

I can also see that the perfdata is being returned properly.

Is there an easy way to display graphs correctly on these hosts without removing the files?

Thanks

Re: Problem with graphs in Nagios

Posted: Thu Nov 08, 2012 8:21 am
by scottwilkerson
Can you show us the owner and group for the graphs that aren't working?

Also, have you ever restored a backup from a different architecture machine (32 to 64 bit)?

Re: Problem with graphs in Nagios

Posted: Thu Nov 08, 2012 9:19 am
by TSCAdmin
The owner and group for every file under /usr/local/nagios/share/perfdata/<server_names>/ are nagios and nagios. Here is a snippet:

-rwxrwxrwx 1 nagios nagios 1151504 Nov  8 09:02 Disk_Monitor.rrd
-rwxrwxrwx 1 nagios nagios    1773 Nov  8 09:02 Disk_Monitor.xml
-rw-rw-rw- 1 nagios nagios  768232 Nov  8 09:10 _HOST_.rrd
-rw-rw-rw- 1 nagios nagios    1389 Nov  8 09:10 _HOST_.xml
-rw-rw-rw- 1 nagios nagios  768232 Nov  8 09:14 Ping.rrd
-rw-rw-rw- 1 nagios nagios    1567 Nov  8 09:14 Ping.xml

I changed the permissions of each Disk_Monitor.{rrd,xml} file to 666 after raising this support request, so now it looks like:

-rw-rw-rw- 1 nagios nagios 1151504 Nov  8 09:02 Disk_Monitor.rrd
-rw-rw-rw- 1 nagios nagios    1773 Nov  8 09:02 Disk_Monitor.xml

But the graphs for this particular service are still blank, while I can see the graphs for Ping and HOST.

We have never restored a backup from anywhere onto this machine. This is the original machine on which Nagios XI was initially installed, and it is still up and running.

I hope this answers your queries. Thanks.

Re: Problem with graphs in Nagios

Posted: Thu Nov 08, 2012 10:28 am
by lmiltchev
Run the following command in terminal:

chmod -R +x /usr/local/nagios/share/perfdata/

Let us know if this fixed your problem.

Re: Problem with graphs in Nagios

Posted: Fri Nov 09, 2012 1:34 am
by TSCAdmin
Hi,

Unfortunately it did not resolve the problem.

I have also attached an image of the perfdata if that helps:
perfdata.png

Thanks

Re: Problem with graphs in Nagios

Posted: Fri Nov 09, 2012 10:24 am
by slansing
Did you have the Nagios server down for any extended periods of time? Or did the graphs just stop responding one day? Did they ever generate performance data?

Re: Problem with graphs in Nagios

Posted: Fri Nov 09, 2012 1:31 pm
by TSCAdmin
Did you have the Nagios server down for any extended periods of time? - NO
Or did the graphs just stop responding one day? - We did not notice previously.
Did they ever generate performance data? - The problem is only with the disk partition graphs. Whether a server has a single partition or several, the perfdata is being returned properly.

Disk graphs are being generated properly for some hosts while they are blank for others. If you look at the image I attached to my previous post, you will notice that it displays the values for the Warning and Critical thresholds.

We are using the same disk monitor command for each server. If I remove the xml and rrd files for the disk monitor on the servers that display blank graphs, fresh, working graphs start being generated, but all the old data is gone.

Does that answer your questions?

Re: Problem with graphs in Nagios

Posted: Fri Nov 09, 2012 3:08 pm
by scottwilkerson
Did you change the plugin at some point? If the RRD file was created when the check returned one metric, and the check was later changed so that it now returns two metrics, the RRD file is no longer in the correct format.

In this case you would need to remove the RRD file as you mentioned and let a new one be created with the appropriate amount of space.
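
If you would rather confirm this kind of mismatch before deleting anything, one rough approach is to compare the number of data sources stored in the RRD with the number the XML file describes. The sketch below is only an illustration under a few assumptions: it expects the text printed by `rrdtool info <file>.rrd` (which lists lines such as `ds[1].type = "GAUGE"`), it assumes the PNP-style XML file holds one `DATASOURCE` element per metric, and it assumes all metrics of the service share one RRD file. The function names are hypothetical, not part of Nagios XI.

```python
import re
import xml.etree.ElementTree as ET

# Lines in `rrdtool info` output that describe a data source look like:
#   ds[1].type = "GAUGE"
DS_LINE = re.compile(r"^ds\[([^\]]+)\]\.")


def rrd_ds_names(info_text):
    """Return the set of data source names found in `rrdtool info` output."""
    names = set()
    for line in info_text.splitlines():
        match = DS_LINE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def xml_ds_count(xml_text):
    """Count DATASOURCE elements (assumed: one per metric) in the XML text."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("DATASOURCE"))


def layout_mismatch(info_text, xml_text):
    """True when the RRD and the XML disagree on the number of metrics."""
    return len(rrd_ds_names(info_text)) != xml_ds_count(xml_text)
```

A host where `layout_mismatch` returns True for its Disk_Monitor files would be a candidate for the format problem described above; a False result does not rule out other causes.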

Re: Problem with graphs in Nagios

Posted: Sun Nov 11, 2012 6:41 am
by TSCAdmin
Did you change the plugin at some point? - NO

In case it was missed in my previous posts: I'm facing this issue only for some hosts, not all of them. Let's say 60 out of 100 are displaying graphs correctly while the remaining 40 aren't. The same plugin, the same command definition and the same service template are used across all Disk checks.

Removing the RRD and XML files is understandable, but is there an easy way to figure out which hosts are displaying blank graphs for the Disk monitor across 1000+ hosts? It would not be a good idea to remove all the RRDs, including the ones that are displaying graphs properly.

Thanks

Re: Problem with graphs in Nagios

Posted: Mon Nov 12, 2012 11:32 am
by scottwilkerson
TSCAdmin wrote: is there an easy way to figure which hosts are displaying blank graphs for Disk monitor across 1000+ hosts?

Maybe... Can you compare the file size of one that is working and one that is not? Since RRDs are a fixed size for the amount of data they are logging, if you have one Disk_Monitor.rrd that is working and one that is not, we may be able to detect the problem hosts by checking whether some have different file sizes.
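
A minimal sketch of that size comparison, assuming the layout described earlier in the thread (one directory per host under /usr/local/nagios/share/perfdata/, each holding a Disk_Monitor.rrd). The function names are hypothetical. Keep in mind that a size difference is only a hint: hosts that monitor a different number of partitions would be expected to have differently sized files even when nothing is wrong.

```python
import os
from collections import defaultdict


def group_rrds_by_size(perfdata_root, rrd_name="Disk_Monitor.rrd"):
    """Map file size in bytes to the sorted list of host directories with that size."""
    sizes = defaultdict(list)
    for host in sorted(os.listdir(perfdata_root)):
        path = os.path.join(perfdata_root, host, rrd_name)
        if os.path.isfile(path):  # skips hosts without this RRD and stray files
            sizes[os.path.getsize(path)].append(host)
    return dict(sizes)


def hosts_differing_from(groups, reference_host):
    """List hosts whose RRD size differs from that of a known-good host."""
    ref_size = next(
        (size for size, hosts in groups.items() if reference_host in hosts), None
    )
    if ref_size is None:
        raise ValueError("reference host has no RRD file: %s" % reference_host)
    return sorted(
        host
        for size, hosts in groups.items()
        if size != ref_size
        for host in hosts
    )
```

Calling `group_rrds_by_size("/usr/local/nagios/share/perfdata")` and then `hosts_differing_from(groups, "<a host with working graphs>")` would give a shortlist to inspect by hand before removing any files.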